Push epoch updates to executors on fetch failure to avoid fetch retries for missing executors

Juan Rodríguez Hortalá Wed, 25 Oct 2017 10:07:20 -0700

Hi,

I opened https://issues.apache.org/jira/browse/SPARK-22339 a few days ago,
and I would like to get some feedback on it. The idea is to push epoch
updates to the executors after a fetch failure by piggybacking on the
executor heartbeat response, in order to fail faster when an executor and
its shuffle blocks are lost, instead of having to wait for all fetch
retries to fail and a new task to be started on the reader executors. This
can speed up job execution, particularly when executors are lost at the end
of a stage in a Spark application that runs a single action at a time. There
are more details and a draft patch for this in the JIRA.
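
To illustrate the idea, here is a minimal sketch in Python. It is not Spark
code: the class and method names are hypothetical, and sending the set of
lost executors along with the epoch is an assumption made for this example.
The actual draft patch is in the JIRA.

```python
from dataclasses import dataclass, field


@dataclass
class Driver:
    """Toy stand-in for the driver: tracks the epoch and lost executors."""
    epoch: int = 0
    lost: set = field(default_factory=set)

    def report_fetch_failure(self, executor_id: str) -> None:
        # A fetch failure marks the source executor as lost and bumps the epoch.
        if executor_id not in self.lost:
            self.lost.add(executor_id)
            self.epoch += 1

    def heartbeat_response(self) -> dict:
        # Piggyback the current epoch (and, by assumption, the lost executors)
        # on the heartbeat response.
        return {"epoch": self.epoch, "lost": frozenset(self.lost)}


@dataclass
class Executor:
    """Toy reader executor with pending fetches keyed by source executor."""
    epoch: int = 0
    pending: dict = field(default_factory=dict)  # source executor -> block ids

    def on_heartbeat_response(self, response: dict) -> list:
        # Return the pending fetches that can fail right away instead of
        # going through the remaining retries.
        if response["epoch"] <= self.epoch:
            return []  # nothing new since the last heartbeat
        self.epoch = response["epoch"]
        failed = []
        for source in sorted(response["lost"]):
            failed.extend(self.pending.pop(source, []))
        return failed


driver = Driver()
reader = Executor(pending={"exec-1": ["b1", "b2"], "exec-2": ["b3"]})
driver.report_fetch_failure("exec-1")
print(reader.on_heartbeat_response(driver.heartbeat_response()))
```

In this sketch the reader learns about the lost executor on its next
heartbeat and drops the fetches targeting it, while fetches from healthy
executors are left untouched.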


Looking forward to your feedback on this.

Greetings,

Juan
