Yes, that's it. If a partition is lost, recomputing it means re-executing
some of the steps that produced it, possibly including the map function in
which you update the accumulator.
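
For example, something like this can over-count (just a sketch; rdd stands
for whatever RDD you are processing):

  val acc = sc.accumulator(0L)
  val mapped = rdd.map { x => acc += 1L; x }
  mapped.count()
  // acc.value equals the number of records only if nothing was
  // re-executed; if a task or stage is recomputed, its increments
  // are applied a second time.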

I think it is safer to do the update in a transformation close to the
action, where a failure is less likely to occur (though that is not always
true...). You can also checkpoint the RDD right after the step that updates
the accumulator, so that transformation is not applied again if some task
fails. But that is rather expensive when all you want is to update a
counter...
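
If you do go the checkpoint route, it would look roughly like this
(untested sketch; the checkpoint directory is just an example):

  sc.setCheckpointDir("hdfs:///tmp/checkpoints")  // any reliable storage

  val acc = sc.accumulator(0L)
  val counted = rdd.map { x => acc += 1L; x }
  counted.cache()       // avoid recomputation while the checkpoint is written
  counted.checkpoint()  // must be called before the first action on the RDD
  counted.count()       // materializes the RDD and writes the checkpoint
  // Later actions read the checkpointed data, so the map above (and its
  // accumulator updates) is not run again if something fails downstream.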

Another idea would be to implement a custom accumulator that holds a map of
partition index -> value, and then merge the values in the map on the
driver side. But I have never tried this, so I am not sure it really works.
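
Roughly what I have in mind (again untested, just a sketch using
AccumulableParam; the names are mine):

  import org.apache.spark.AccumulableParam

  // One map entry per partition; merging overwrites a key instead of
  // summing, so a re-executed task replaces its old value.
  object PerPartitionCounts
      extends AccumulableParam[Map[Int, Long], (Int, Long)] {
    def zero(init: Map[Int, Long]) = Map.empty[Int, Long]
    // Inside one task: add up that task's increments for its partition.
    def addAccumulator(acc: Map[Int, Long], kv: (Int, Long)) =
      acc.updated(kv._1, acc.getOrElse(kv._1, 0L) + kv._2)
    // On the driver: ++ lets a duplicate key (a re-executed task)
    // overwrite the previous value rather than add to it.
    def addInPlace(a: Map[Int, Long], b: Map[Int, Long]) = a ++ b
  }

  val acc = sc.accumulable(Map.empty[Int, Long])(PerPartitionCounts)
  rdd.mapPartitionsWithIndex { (idx, it) =>
    it.map { x => acc += ((idx, 1L)); x }
  }.count()
  val total = acc.value.values.sum  // final merge on the driver

This only helps if the value a task produces for its partition is
deterministic across re-runs.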

Cheers,
Eugen

2015-05-03 14:08 GMT+02:00 xiazhuchang <hk8...@163.com>:

> The official documentation says: "In transformations, users should be
> aware that each task's update may be applied more than once if tasks or
> job stages are re-executed."
> I don't quite understand what this means. Does it mean that if I use an
> accumulator in a transformation (e.g. a map() operation), the update may
> be executed more than once if the task restarts? And that the final
> result could then be a multiple of the real result?
