I want to insert into a partitioned table using dynamic partitioning, but I don't want to end up with 200 files per partition, because in my case those files would be very small.
sqlContext.sql( """ |insert overwrite table event |partition(eventDate) |select | user, | detail, | eventDate |from event_wk """.stripMargin) the table “event_wk” is created from a dataframe by registerTempTable, which is built with some joins. If I set spark.sql.shuffle.partition=2, the join’s performance will be bad because that property seems global. I can do something like this: event_wk.reparitition(2).write.partitionBy("eventDate").format("parquet").save(path) but I have to handle adding partitions by myself. Is there a way you can control the number of files just for this last insert step?