Re: data frame problem preserving sort order with repartition() and coalesce()

2016-03-30 Thread Takeshi Yamamuro
Hi, "csvDF = csvDF.sort(orderByColName, ascending=False)" repartitions the DF using RangePartitioner (the number of partitions depends on "spark.sql.shuffle.partitions"). It seems that, in your case, some empty partitions were removed, so you ended up with 17 partitions. // maropu On Wed, Mar 30, 2016 at 6:49 AM, Andy

data frame problem preserving sort order with repartition() and coalesce()

2016-03-29 Thread Andy Davidson
I have a requirement to write my results out into a series of CSV files. No file may have more than 100 rows of data. In the past my data was not sorted, and I was able to use repartition() or coalesce() to meet the file length requirement. I realize that repartition() causes the data to be
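
For illustration only, the requirement above (sorted output, at most 100 rows per CSV file) can be sketched in plain Python, outside Spark. This is a minimal sketch and not the poster's code: it assumes the rows fit in memory on one machine, and the names `write_sorted_chunks`, `MAX_ROWS_PER_FILE`, and the `part-00000.csv` file naming are hypothetical. The idea is to sort once and then cut the ordered rows into consecutive slices, so each file holds a contiguous run of the sort order.

```python
import csv
from pathlib import Path

MAX_ROWS_PER_FILE = 100  # the limit stated in the post


def write_sorted_chunks(rows, sort_key, out_dir,
                        max_rows=MAX_ROWS_PER_FILE, descending=True):
    """Sort rows, then write consecutive slices of at most max_rows rows
    to numbered CSV files. Returns the file paths in sort order."""
    ordered = sorted(rows, key=sort_key, reverse=descending)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    paths = []
    for start in range(0, len(ordered), max_rows):
        # Hypothetical naming scheme: the file index follows the sort order.
        path = out_dir / f"part-{start // max_rows:05d}.csv"
        with path.open("w", newline="") as handle:
            csv.writer(handle).writerows(ordered[start:start + max_rows])
        paths.append(path)
    return paths
```

Reading the files back in name order reproduces the full sorted sequence, which is the property that is hard to keep when repartitioning shuffles rows between partitions.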