Thanks Deenar, both methods work.
I actually tried the second method in spark-shell, but it didn't work at
the time. The likely reason: I registered the DataFrame event_wk as a
temporary table, repartitioned, then registered the table again.
Unfortunately I could not reproduce the failure.
Thanks
The following should work as long as your tables are created using Spark SQL
event_wk.repartition(2).write.partitionBy("eventDate").format("parquet").insertInto("event")
If you want to stick to using "insert overwrite" for Hive compatibility,
then you can repartition the DataFrame first, instead of setting
spark.sql.shuffle.partitions.
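A minimal sketch of that approach, assuming a Spark 1.x HiveContext bound to sqlContext, the DataFrame event_wk from this thread, and a hypothetical temp-table name event_wk_small:

```scala
// Sketch only: repartition to a small number of partitions before the
// dynamic-partition insert, so each Hive partition is written as a few
// files instead of one file per shuffle partition (200 by default).
val compacted = event_wk.repartition(2)
compacted.registerTempTable("event_wk_small") // hypothetical name

sqlContext.sql("""
  |insert overwrite table event
  |partition(eventDate)
  |select user, detail, eventDate
  |from event_wk_small
""".stripMargin)
```

Because the select involves no shuffle, the insert preserves the two partitions of the repartitioned input, so the file count per Hive partition stays small.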
I want to insert into a partitioned table using dynamic partitioning, but I
don't want 200 files per partition, because the files will be small in my
case.
sqlContext.sql("""
  |insert overwrite table event
  |partition(eventDate)
  |select
  |  user,
  |  detail,
  |  eventDate