I've created an issue for this but if anyone has any advice, please let me
know.
Basically, on about 10 GB of data, saveAsTextFile() to HDFS hangs on two
remaining tasks (out of 320). Those tasks seem to be waiting on data from
another task on another node. Eventually (about 2 hours later) they
I'll make a comment on the JIRA - thanks for reporting this, let's get
to the bottom of it.
On Thu, Jun 19, 2014 at 11:19 AM, Surendranauth Hiraman
suren.hira...@velos.io wrote:
I've created an issue for this but if anyone has any advice, please let me
know.
Basically, on about 10 GBs of
I have a flow that ends with saveAsTextFile() to HDFS.
It seems all the expected files per partition have been written out, based
on the number of part files and the file sizes.
But the driver logs show 2 tasks still not completed and no further
activity, and the worker logs show no activity for
Looks like there was eventually some type of reset or timeout, and the tasks
have been reassigned. I'm guessing they'll keep failing until they hit the
max failure count.
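
For reference, the "max failure count" mentioned above most likely corresponds to Spark's spark.task.maxFailures setting, which is documented with a default of 4. The following is only a minimal sketch of where that setting would go in a standalone driver; the object name, app name, and the choice to set the value explicitly are assumptions for illustration, not details from this thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver; the thread does not show how the job is configured.
object SaveJobConfigSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("save-as-text-file-job") // assumed name, for illustration only
      // Number of failures of any single task tolerated before the job is aborted.
      // "4" is the documented default; it is set here only to make the limit visible.
      .set("spark.task.maxFailures", "4")

    val sc = new SparkContext(conf)
    try {
      // ... the flow ending in saveAsTextFile() would go here ...
    } finally {
      sc.stop()
    }
  }
}
```

Raising the value would only give the reassigned tasks more retries; it would not explain why they stall in the first place.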
The machine it disconnected from was a remote machine, though I've seen
such failures from connections to itself with other problems. The