[ 
https://issues.apache.org/jira/browse/SPARK-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212046#comment-14212046
 ] 

Andrew Ash commented on SPARK-664:
----------------------------------

[~irashid] it sounds like your proposal is to batch accumulator updates across 
tasks on the executor before sending them back to the driver?

I agree this would reduce network traffic, but batching would come at the cost 
of higher latency between a task completing and its accumulator update being 
merged on the driver.  Since SPARK-2380 was completed, accumulators are shown 
in the UI, so the added latency would be visible to end users.

If network bandwidth and UI update latency are fundamentally at odds, maybe 
this is a case for a user option to choose to optimize for network or UI, 
something like {{spark.accumulators.mergeUpdatesOnExecutor}} defaulted to false.
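
To make the trade-off concrete, here is a minimal sketch in plain Python, not 
Spark code. It models accumulator updates as dicts of counters; the function 
names, the {{merge_on_executor}} flag (standing in for the option proposed 
above) and the {{batch_size}} parameter are all hypothetical and only 
illustrate the idea.

```python
from collections import Counter

def ship_updates(task_updates, merge_on_executor=False, batch_size=2):
    """Return the list of messages an executor would send to the driver.

    merge_on_executor=False models the behaviour described in this issue:
    one message per finished task. merge_on_executor=True is the
    hypothetical batching mode: up to batch_size task updates are merged
    locally before anything is sent.
    """
    if not merge_on_executor:
        return [dict(update) for update in task_updates]

    messages = []
    for start in range(0, len(task_updates), batch_size):
        merged = Counter()
        for update in task_updates[start:start + batch_size]:
            merged.update(update)  # adds the counts key by key
        messages.append(dict(merged))
    return messages

def driver_merge(messages):
    """Merge every received message into the driver-side totals."""
    total = Counter()
    for message in messages:
        total.update(message)
    return dict(total)

# Four finished tasks, each with its own accumulator update.
task_updates = [
    {"records": 10, "errors": 1},
    {"records": 7},
    {"records": 5, "errors": 2},
    {"records": 3},
]

per_task = ship_updates(task_updates)
batched = ship_updates(task_updates, merge_on_executor=True, batch_size=2)

print(len(per_task), len(batched))                        # messages sent
print(driver_merge(per_task) == driver_merge(batched))    # same totals
```

In this sketch the batched mode sends half as many messages and the driver 
ends up with the same totals, but the first task's update is not visible on 
the driver until the second task in its batch has finished, which is the 
latency cost described above.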

cc [~pwendell] for thoughts

> Accumulator updates should get locally merged before sent to the driver
> -----------------------------------------------------------------------
>
>                 Key: SPARK-664
>                 URL: https://issues.apache.org/jira/browse/SPARK-664
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Imran Rashid
>            Priority: Minor
>
> Whenever a task finishes, the accumulator updates from that task are 
> immediately sent back to the driver.  When the accumulator updates are big, 
> this is inefficient because (a) a lot more data has to be sent to the driver 
> and (b) the driver has to do all the work of merging the updates together.
> Probably doesn't matter for small accumulators / low number of tasks, but if 
> both are big, this could be a big bottleneck.


