The join is happening successfully, as I am able to do count() after the join.
The error comes only while trying to write in parquet format on HDFS.
Thanks,
Pooja.
On Wed, Jul 1, 2015 at 1:06 PM, Akhil Das ak...@sigmoidanalytics.com wrote:
It says:
Caused by: java.net.ConnectException: Connection refused: slave2/...:54845
By any chance, are you using a time field in your df? Time fields are known
to be notorious in RDD conversion.
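If it is the time field, one workaround is to cast it to a primitive type
before the write. A minimal sketch in Scala (the column name event_time and
the output path are placeholders, not your schema):

    // Hypothetical workaround: cast the TimestampType column to epoch
    // seconds so the parquet writer never handles java.sql.Timestamp.
    val safeDf = joinedDf
      .withColumn("event_time_sec", joinedDf("event_time").cast("long"))
      .drop("event_time")
    safeDf.write.parquet("hdfs:///tmp/joined_output")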
On Jul 1, 2015 6:13 PM, Pooja Jain pooja.ja...@gmail.com wrote:
The join is happening successfully, as I am able to do count() after the join.
The error comes only while trying to write in parquet format on HDFS.
I would still look at your executor logs. A count() is rewritten by the
optimizer to be much more efficient because you don't actually need any of
the columns. Also, writing parquet allocates quite a few large buffers.
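If the executors are in fact dying from memory pressure during the write,
giving them more headroom is a cheap experiment. A hedged example
spark-submit invocation (the class and jar names are placeholders):

    spark-submit --master yarn-cluster \
      --executor-memory 6g \
      --conf spark.yarn.executor.memoryOverhead=1024 \
      --class com.example.JoinJob join-job.jar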
On Wed, Jul 1, 2015 at 5:42 AM, Pooja Jain pooja.ja...@gmail.com wrote:
It says:
Caused by: java.net.ConnectException: Connection refused: slave2/...:54845
Could you look in the executor logs (stderr on slave2) and see what made it
shut down? Since you are doing a join, there's a high possibility of OOM etc.
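On YARN you can also pull all container logs once the application finishes,
for example (the application id below is a placeholder):

    yarn logs -applicationId application_1435700000000_0001 \
      | grep -B 5 ConnectException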
Thanks
Best Regards
On Wed, Jul 1, 2015 at 10:20 AM, Pooja Jain pooja.ja...@gmail.com wrote:
Hi,
We are using Spark 1.4.0 on Hadoop in yarn-cluster mode via spark-submit. We
are facing a parquet write issue after doing dataframe joins. We have a full
data set and then incremental data. We are reading them as dataframes,
joining them, and then writing the data to the HDFS system in parquet format.
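The flow is roughly the following; this is a simplified sketch, and the paths
and the join key "id" are placeholders rather than our real schema:

    val full = sqlContext.read.parquet("hdfs:///data/full")
    val incr = sqlContext.read.parquet("hdfs:///data/incremental")
    val joined = full.join(incr, full("id") === incr("id"))
    // this is the step that fails for us
    joined.write.parquet("hdfs:///data/joined")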