First, the word you are looking for is "straggler", not "strangler" -- very
different words. Second, "idempotent" doesn't mean "only happens once", but
rather "if it does happen more than once, the effect is no different than
if it only happened once".
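
To make the distinction concrete, here is a minimal sketch in plain Python (no Spark involved). The function names `append_record` and `put_record` are made up for illustration; the loop stands in for a Task that ends up running twice.

```python
def append_record(log, record):
    """Not idempotent: every repeated call adds another copy."""
    log.append(record)
    return log


def put_record(store, key, record):
    """Idempotent: repeating the call leaves the store unchanged."""
    store[key] = record
    return store


log = []
store = {}
for _ in range(2):  # the same "task" runs twice
    append_record(log, "row-1")
    put_record(store, "row-1", "row-1")

# log now holds two copies of "row-1"; store holds exactly one entry.
```

Both operations happened twice, but only the keyed write ends up in the same state as a single run would have produced.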

It is possible to insert a nearly limitless variety of side-effecting code
into Spark Tasks, and there is no guarantee from Spark that such code will
execute idempotently. Speculation is one way that a Task can run more than
once, but it is not the only way. A simple FetchFailure (from a lost
Executor or another reason) will mean that a Task has to be re-run in order
to re-compute the missing outputs from a prior execution. In general, Spark
will run a Task as many times as needed to satisfy the requirements of the
Jobs it is requested to fulfill, and you can assume neither that a Task
will run only once nor that it will execute idempotently (unless, of
course, it is side-effect free). Guaranteeing idempotency requires a
higher-level coordinator with access to information on all Task executions. The
OutputCommitCoordinator handles that guarantee for HDFS writes, and the
JIRA discussion associated with the introduction of
the OutputCommitCoordinator covers most of the design issues:
https://issues.apache.org/jira/browse/SPARK-4879
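
As a rough illustration of the coordinator idea, the sketch below uses plain Python and a local directory rather than Spark or HDFS. It is not the actual OutputCommitCoordinator code: the `CommitCoordinator` class, `run_attempt`, and the file naming are all hypothetical. Each attempt writes to its own private file, and only the first attempt to ask for permission is allowed to publish its output for a given partition.

```python
import os
import tempfile
import threading


class CommitCoordinator:
    """Hypothetical arbiter: the first attempt to ask for a partition wins."""

    def __init__(self):
        self._lock = threading.Lock()
        self._winners = {}  # partition id -> attempt id allowed to commit

    def can_commit(self, partition, attempt):
        with self._lock:
            winner = self._winners.setdefault(partition, attempt)
            return winner == attempt


def run_attempt(coordinator, out_dir, partition, attempt, rows):
    """Write to an attempt-private file, then publish it only if authorized."""
    final_path = os.path.join(out_dir, f"part-{partition:05d}")
    tmp_path = f"{final_path}.attempt-{attempt}"
    with open(tmp_path, "w") as f:
        f.writelines(f"{row}\n" for row in rows)
    if coordinator.can_commit(partition, attempt):
        os.replace(tmp_path, final_path)  # atomic rename on the same filesystem
        return True
    os.remove(tmp_path)  # a losing attempt discards its private output
    return False


def demo():
    """Run two attempts of the same task and report what ended up on disk."""
    coordinator = CommitCoordinator()
    with tempfile.TemporaryDirectory() as out_dir:
        results = [
            run_attempt(coordinator, out_dir, 0, attempt, ["a", "b"])
            for attempt in (0, 1)
        ]
        files = sorted(os.listdir(out_dir))
        with open(os.path.join(out_dir, files[0])) as f:
            content = f.read()
    return results, files, content
```

Calling `demo()` runs two attempts for partition 0; the second is denied the commit, so the directory ends up with a single output file no matter how many attempts ran. Two attempts never write to the same path at the same time, which is what avoids the file corruption described in the original question.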

On Thu, Feb 16, 2017 at 10:34 AM, Ji Yan <ji...@drive.ai> wrote:

> Dear spark users,
>
> Is there any mechanism in Spark that does not guarantee the idempotent
> nature? For example, for stranglers, the framework might start another task
> assuming the strangler is slow while the strangler is still running. This
> would be annoying sometime when say the task is writing to a file, but have
> the same tasks running at the same time may corrupt the file. From the
> documentation page, I know that Spark's speculative execution mode is
> turned off by default. Does anyone know any other mechanism in Spark that
> may cause problem in scenario like this?
>
> Thanks
> Ji
