I have a job that runs fine on relatively small input datasets but, past a
certain input size, consistently fails with "Fetch failure" as the Failure
Reason, late in the job, during a saveAsTextFile() operation.
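
For reference, the job has roughly this shape (a minimal sketch in
spark-shell; parse() and the HDFS paths are stand-ins, not our actual code):

    // run in spark-shell, where sc is already defined
    def parse(line: String): String = line.split("\t")(0)  // stand-in key extractor
    val data = sc.textFile("hdfs:///input/path")            // placeholder path
    val counts = data
      .map(line => (parse(line), 1))
      .reduceByKey(_ + _)                    // the shuffle with the 3.3 GB read
    counts.saveAsTextFile("hdfs:///output/path")  // "Fetch failure" appears here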

The first error we are seeing on the "Details for Stage" page is
"ExecutorLostFailure".

My Shuffle Read is 3.3 GB, and that's the only metric that seems high. We
have three servers, each configured for this job with 5 GB of executor
memory, and the job is running in spark-shell. The first error in the shell
is "Lost executor 2 on (servername): remote Akka client disassociated".
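
Would raising the number of shuffle partitions help here, so each reduce
task fetches a smaller slice of that 3.3 GB? For example (a sketch; 400 is
an arbitrary guess, not a measured value):

    // same shuffle, but with an explicit reduce-side partition count
    val counts = data
      .map(line => (parse(line), 1))
      .reduceByKey(_ + _, 400)   // second argument = number of reduce partitions
    counts.saveAsTextFile("hdfs:///output/path")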

We are still trying to learn how best to diagnose jobs using the web UI, so
it's likely there is helpful information here that we just don't know how to
interpret. Is there any kind of troubleshooting guide beyond the Spark
Configuration page? I'm not sure whether I'm providing enough info here.

thanks.


