Hi Kira,

As you suspected, accumulator values are only updated after the task
completes. We do send accumulator updates from the executors to the driver
on periodic heartbeats, but these only concern internal accumulators, not
the ones created by the user.
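
If it helps, here is a rough sketch (Spark 1.6-style API, with a made-up
job and assuming an existing SparkContext `sc`) of the granularity you can
get today: polling the accumulator from a second driver thread only ever
reflects tasks that have already finished, never in-flight updates.

  val acc = sc.accumulator(0L, "records processed")
  val rdd = sc.parallelize(1 to 1000000, numSlices = 100)

  // Poll the driver-side value while the action runs in the main thread.
  val poller = new Thread {
    override def run(): Unit =
      try {
        while (true) {
          println(s"accumulator so far: ${acc.value}")  // grows per completed task
          Thread.sleep(1000)
        }
      } catch { case _: InterruptedException => () }
  }
  poller.setDaemon(true)
  poller.start()

  rdd.foreach(_ => acc += 1L)   // the long-running action (your ac1)
  poller.interrupt()
  println(s"final value: ${acc.value}")

So you can approximate progress at task granularity this way, but nothing
finer-grained than that.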

In short, I'm afraid there is currently no way (in Spark 1.6 and earlier)
to access the accumulator values until after the tasks that updated them
have completed. This will change in Spark 2.0, the next release.

Please let me know if you have more questions.
-Andrew

2016-01-13 11:24 GMT-08:00 Daniel Imberman <daniel.imber...@gmail.com>:

> Hi Kira,
>
> I'm having some trouble understanding your question. Could you please give
> a code example?
>
>
>
> From what I think you're asking, there are two issues with what you're
> looking to do. (Please keep in mind I could be totally wrong on both of
> these assumptions, but this is what I've been led to believe.)
>
> 1. The contract of an accumulator is that you can't actually read the
> value while the function is running, because the values in the
> accumulator don't actually mean anything until they are reduced. If you
> were looking for progress in a local context, you could use
> mapPartitions and keep a local counter per partition (rough sketch after
> this list), but I don't think it's possible to get the actual
> accumulator value in the middle of the map job.
>
> 2. As far as running ac2 while ac1 is "always running", I'm pretty sure
> that's not possible. The way lazy evaluation works in Spark, the
> transformations have to be done serially. Having it any other way would
> actually be really bad, because then you could have ac1 changing the
> data and thereby making ac2's output unpredictable.
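>
> Here is a rough sketch of what I mean in (1). The names are made up: I'm
> assuming an RDD[String] called `lines` and that you only care about a
> running record count per partition.
>
>   val processed = lines.mapPartitionsWithIndex { (pid, iter) =>
>     var localCount = 0L                        // per-partition counter
>     iter.map { line =>
>       localCount += 1
>       if (localCount % 10000 == 0) {
>         // shows up in the executor logs, not on the driver
>         println(s"partition $pid: $localCount records so far")
>       }
>       line.length                              // stand-in for the real work
>     }
>   }
>   processed.count()                            // the action that runs it
>
> The catch is that those counts live on the executors (you'd read them in
> the executor logs), so this gives you local progress only, not a single
> driver-side number.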
>
> That being said, with a more specific example it might be possible to help
> figure out a solution that accomplishes what you are trying to do.
>
> On Wed, Jan 13, 2016 at 5:43 AM Kira <mennou...@gmail.com> wrote:
>
>> Hi,
>>
>> So I have an action on one RDD that is relatively long; let's call it
>> ac1. What I want to do is execute another action (ac2) on the same RDD
>> to follow the progress of the first one (ac1). To that end I want to
>> use an accumulator and read its value progressively (on the fly) while
>> ac1 is still running. My problem is that the accumulator is only
>> updated once ac1 has finished, which is not helpful for me :/
>>
>> I've seen here
>> <http://apache-spark-user-list.1001560.n3.nabble.com/Asynchronous-Broadcast-from-driver-to-workers-is-it-possible-td15758.html>
>> something that may seem like a solution for me, but it doesn't work:
>> "While Spark already offers support for asynchronous reduce (collect
>> data from workers, while not interrupting execution of a parallel
>> transformation) through accumulator"
>>
>> Another post suggested using a SparkListener to do that.
>>
>> Are these solutions correct? If yes, could you give me a simple example?
>> Are there other solutions?
>>
>> Thank you.
>> Regards
>>
>>
>>
