Hi,
I observed that if I use a subset of the same dataset, or the dataset is small,
it writes to many part files.
If the dataset grows, it writes to only some of the part files and the rest are empty.
Thanks,
Divya
On 25 April 2016 at 23:15, nguyen duc tuan wrote:
Maybe the problem is the data itself. For example, the first dataframe
might have common keys in only one part of the second dataframe. I think you
can verify whether you are in this situation by repartitioning one dataframe
and then joining it. If this is the true reason, you might see the result
distributed more evenly.
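To make the key-skew explanation concrete, here is a small pure-Python simulation (not Spark code) of how a shuffle join assigns rows to output partitions: each row goes to partition hash(key) mod N, and each output partition is written as one part file. The 200 matches Spark SQL's default `spark.sql.shuffle.partitions`. Using `zlib.crc32` in place of Spark's own Murmur3 hash is an assumption, and the helper names are hypothetical.

```python
import zlib
from collections import Counter

# Spark SQL's default value of spark.sql.shuffle.partitions.
NUM_PARTITIONS = 200


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Assign a join key to a partition: hash(key) mod N.

    Assumption: crc32 stands in for Spark's Murmur3 hash; any
    deterministic hash shows the same skew effect.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def rows_per_partition(keys, num_partitions=NUM_PARTITIONS):
    """Count joined rows per partition (each partition -> one part file)."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def non_empty(counts):
    """Number of part files that would actually contain data."""
    return sum(1 for c in counts if c > 0)


# Every joined row shares the same key: all rows hash to one partition.
skewed = rows_per_partition(["k1"] * 10_000)
# Many distinct keys: rows spread across the partitions.
spread = rows_per_partition([f"k{i}" for i in range(10_000)])

print(non_empty(skewed), non_empty(spread))
```

Adding more shuffle partitions does not help in the skewed case, since rows with the same key always hash to the same partition. Counting rows per join key on the real data is a quick way to check whether a few keys dominate.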
Hi,
After joining two dataframes, I am saving the result using Spark CSV.
But all the result data is being written to only one part file, even though
200 part files are created; the other 199 part files are empty.
What is causing the uneven partitioning? How can I distribute the
data evenly?
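One way to spread the rows evenly before writing is to repartition without a key. Spark's `repartition(numPartitions)` on a DataFrame does this with round-robin partitioning, giving partitions of roughly equal size. The pure-Python sketch below (not Spark code) contrasts that with hash partitioning when every row shares one hot key. The helper names, the sample data, and the use of `zlib.crc32` in place of Spark's hash are assumptions for illustration.

```python
import zlib


def hash_partition(rows, key_of, num_partitions):
    """Place each row in partition hash(key) mod N, as a shuffle join does.

    crc32 is an assumed stand-in for Spark's Murmur3 hash.
    """
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        key = str(key_of(row)).encode("utf-8")
        parts[zlib.crc32(key) % num_partitions].append(row)
    return parts


def round_robin_partition(rows, num_partitions):
    """Deal rows out to partitions in turn, ignoring the key entirely."""
    parts = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        parts[i % num_partitions].append(row)
    return parts


# Hypothetical joined result in which every row has the same key.
joined = [("k1", i) for i in range(1_000)]

by_key = hash_partition(joined, key_of=lambda r: r[0], num_partitions=200)
dealt = round_robin_partition(joined, num_partitions=200)

print(sum(1 for p in by_key if p))                           # 1
print(min(len(p) for p in dealt), max(len(p) for p in dealt))  # 5 5
```

The trade-off is an extra full shuffle of the joined data before the write.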