Re: [Spark 1.5.2] All data being written to only one part file rest part files are empty

2016-04-29 Thread Divya Gehlot
Hi, I observed that if I use a subset of the same dataset, or if the dataset is small, it writes to many part files. If the dataset grows, it writes to only one part file and all the remaining part files are empty. Thanks, Divya On 25 April 2016 at 23:15, nguyen duc tuan wrote: > Maybe the problem is the

Re: [Spark 1.5.2] All data being written to only one part file rest part files are empty

2016-04-25 Thread nguyen duc tuan
Maybe the problem is the data itself. For example, the first dataframe might share keys with only one part of the second dataframe. I think you can verify whether you are in this situation by repartitioning one dataframe and then joining it. If this is the true reason, you might see the result distributed more
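
The skew described in this reply can be illustrated without Spark. The following is a minimal sketch in plain Python, not Spark's implementation: it assumes a hash partitioner that sends each join key to `hash(key) % 200`, uses `zlib.crc32` as a stand-in for Spark's own hash function, and uses made-up key values (the string `"IN"` and the integers 0 to 9999). The count of 200 matches the part files reported in the thread, which is also the default value of `spark.sql.shuffle.partitions`.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 200  # number of part files reported in the thread


def partition_of(key, num_partitions=NUM_PARTITIONS):
    """Stand-in hash partitioner: equal keys always land in the same partition.

    Assumption: zlib.crc32 replaces Spark's real hash function for illustration.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def rows_per_partition(keys, num_partitions=NUM_PARTITIONS):
    """Return a list with the number of rows assigned to each partition."""
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Hypothetical joined data in which every row carries the same join key.
skewed_keys = ["IN"] * 10000
# Hypothetical joined data with 10000 distinct join keys.
spread_keys = list(range(10000))

skewed = rows_per_partition(skewed_keys)
spread = rows_per_partition(spread_keys)

non_empty_skewed = sum(1 for n in skewed if n > 0)
non_empty_spread = sum(1 for n in spread if n > 0)

print("single key    -> non-empty partitions:", non_empty_skewed)
print("distinct keys -> non-empty partitions:", non_empty_spread)
```

With a single key, all rows fall into one partition and the other 199 stay empty, which is the symptom reported in the thread. If a real join behaves like this, the likely cause is the distribution of join keys rather than the CSV writer, and counting rows per join key in both dataframes is one way to check.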

[Spark 1.5.2] All data being written to only one part file rest part files are empty

2016-04-24 Thread Divya Gehlot
Hi, after joining two dataframes, I am saving the resulting dataframe using Spark CSV. But all the result data is being written to only one part file: 200 part files are created, and the other 199 are empty. What is the cause of this uneven partitioning? How can I distribute the data evenly?