I want to insert into a partitioned table using dynamic partitioning, but I
don’t want to end up with 200 files per partition, because the files would be
small in my case.

sqlContext.sql("""
    |insert overwrite table event
    |partition (eventDate)
    |select
    | user,
    | detail,
    | eventDate
    |from event_wk
  """.stripMargin)

The table “event_wk” is created from a DataFrame with registerTempTable, and
that DataFrame is built from several joins. If I set
spark.sql.shuffle.partitions=2, the joins will perform badly, because that
property seems to be global.
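
To make the concern concrete, here is a minimal sketch of what I mean by the setting being global. It assumes Spark 1.x in spark-shell, where sqlContext is already defined; the value 2 is only an example. As far as I understand, the joins behind event_wk are evaluated lazily, so they would be planned together with the insert and see the same value.

```scala
// Run in spark-shell (Spark 1.x), where sqlContext is predefined.
// The property is set on the whole SQLContext, not on a single statement.
sqlContext.setConf("spark.sql.shuffle.partitions", "2")

// Every shuffle planned from here on reads this value, including the joins
// that feed the temp table when the insert finally triggers them.
println(sqlContext.getConf("spark.sql.shuffle.partitions"))
```

So lowering the value just before the insert would, I believe, also shrink the parallelism of the joins.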

I can do something like this:


but I have to handle adding partitions by myself.

Is there a way to control the number of files just for this last
insert step?
