Re: dynamic coalesce to pick file size

2016-07-26 Thread Pedro Rodriguez
I asked something similar; search for "Tools for Balancing Partitions By Size" (I couldn't find a link in the archives). Unfortunately, there doesn't seem to be a good solution right now other than knowing your job statistics. I am planning on implementing the idea I explained in the last paragraph

dynamic coalesce to pick file size

2016-07-26 Thread Maurin Lenglart
Hi, I am running a SQL query that returns a DataFrame. I then write the result of the query using “df.write”, but the result gets written as a lot of small files (~100 files of 200 KB). So now I am calling “.coalesce(2)” before the write. But the number “2” that I picked is static, is
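
A minimal sketch of replacing the hard-coded "2" with a count derived from a size estimate, in line with the reply's point about knowing your job statistics. The helper names (pick_num_files, write_with_dynamic_coalesce), the 128 MiB target, and the Parquet output format are assumptions for illustration, not anything from the thread. The size estimate has to come from somewhere outside this code, for example the output size of a previous run.

```python
import math

# Hypothetical default: aim for roughly 128 MiB per output file.
DEFAULT_TARGET_FILE_BYTES = 128 * 1024 * 1024


def pick_num_files(estimated_bytes, target_file_bytes=DEFAULT_TARGET_FILE_BYTES):
    """Return how many output files to aim for, never fewer than one."""
    if target_file_bytes <= 0:
        raise ValueError("target_file_bytes must be positive")
    return max(1, math.ceil(estimated_bytes / target_file_bytes))


def write_with_dynamic_coalesce(df, path, estimated_bytes,
                                target_file_bytes=DEFAULT_TARGET_FILE_BYTES):
    """Coalesce a Spark DataFrame to a size-based partition count, then write.

    estimated_bytes is an assumption supplied by the caller (for example,
    taken from the statistics of an earlier run of the same job).
    """
    n = pick_num_files(estimated_bytes, target_file_bytes)
    # coalesce() can only reduce the partition count, so cap the request
    # at the number of partitions the DataFrame already has.
    n = min(n, df.rdd.getNumPartitions())
    df.coalesce(n).write.parquet(path)  # Parquet is assumed here
    return n
```

For the case described above (about 100 files of 200 KB, so roughly 20 MB in total), pick_num_files returns 1 with the default target. Note that coalesce also limits the parallelism of the stage that produces the data, so a very small count can slow the query itself.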