Hi, I was wondering how Spark handles stage/task failures for a job.
We are running a Spark job that batch-writes to Elasticsearch, and we are seeing one or two stage failures because the ES cluster gets overloaded (expected, since we are testing against a single-node ES cluster). I had assumed that when some of the batch writes to ES fail after a certain number of retries (10), Spark would abort the whole job. Instead, we see the Spark application marked as finished even though a single job failed. How does Spark handle failures when a job or stage is marked as failed? Thanks in advance.

--
Cheers,
Praj
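P.S. In case it helps, here is roughly the configuration in play. This is a sketch, not our exact setup: the property names are from the elasticsearch-hadoop and Spark documentation as I understand them, the hosts/ports are placeholders, and everything except the retry count of 10 is a default or an assumption.

```
# elasticsearch-hadoop connector settings (placeholder node address)
es.nodes=localhost
es.port=9200

# Retries per failed bulk write before the task itself fails
# (this is where our "10 retries" comes from)
es.batch.write.retry.count=10
# Wait between bulk-write retries (connector default)
es.batch.write.retry.wait=10s

# Spark side: how many times a failed task is re-attempted
# before its stage (and the job) is aborted (Spark default is 4)
spark.task.maxFailures=4
```

My expectation was that once a task exhausts `es.batch.write.retry.count` and then `spark.task.maxFailures`, the job should be aborted rather than reported as finished.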