Re: pyspark dataframe sort issue

2016-05-08 Thread Buntu Dev
Thanks Davies, after I did a coalesce(1) to save as single parquet file I was able to get the head() to return the correct order.

On Sun, May 8, 2016 at 12:29 AM, Davies Liu wrote:
> When you have multiple parquet files, the order of all the rows in
> them is not defined.
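
For anyone landing on this thread later, here is a minimal sketch of the workaround described above, plus an alternative that does not depend on how many files were written. The helper names, the `sort_column` and `path` parameters, and the choice to call coalesce(1) after the sort are assumptions for illustration, not taken from the original messages. `spark` stands for whatever entry point is in use (a SQLContext or a SparkSession), since both expose `.read.parquet`.

```python
def save_sorted_single_file(df, sort_column, path):
    """Sort df by sort_column and write it out as one parquet part file.

    Assumption: coalesce(1) is applied after the sort, so the single
    output file holds the rows in sorted order. This mirrors the
    workaround reported in the thread; it is not a documented guarantee.
    """
    df.orderBy(sort_column).coalesce(1).write.parquet(path)


def load_sorted(spark, path, sort_column):
    """Read the parquet data back and sort it again explicitly.

    Re-sorting on read avoids relying on file layout at all, which
    matters when the data was written as multiple part files.
    """
    return spark.read.parquet(path).orderBy(sort_column)
```

Usage would look something like `save_sorted_single_file(df, "ts", "/tmp/out")` followed by `load_sorted(sqlContext, "/tmp/out", "ts").head()`, where "ts" and the path are placeholders. Note that coalesce(1) pushes the whole write through a single task, so the second approach is probably the better fit for large data.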

Re: pyspark dataframe sort issue

2016-05-08 Thread Davies Liu
When you have multiple parquet files, the order of all the rows in them is not defined.

On Sat, May 7, 2016 at 11:48 PM, Buntu Dev wrote:
> I'm using pyspark dataframe api to sort by specific column and then saving
> the dataframe as parquet file. But the resulting parquet