[
https://issues.apache.org/jira/browse/SPARK-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15204939#comment-15204939
]
Mridul Muralidharan commented on SPARK-1239:
--------------------------------------------
[~tgraves] For the last part (waiting bit) - why not make the threshold where
you use Broadcast instead of direct serialization such that the problem 'goes
away' ? For my case, I was using a fairly high number, but nothing stopping us
from using say 1mb - which means number of outstanding requests which will
cause memory issue becomes extremely high to the point of being not possible
practically.
In general, I don't like the idea of waiting for IO to complete - different
nodes can carry different loads, and the driver could end up not responding to
fast nodes because slow nodes delay the response from being sent (over time).
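The tradeoff above can be sketched as follows. This is a hypothetical illustration, not Spark's actual implementation: the names (chooseTransport, MIN_SIZE_FOR_BROADCAST, worstCaseDirectBytes) are made up for the example. The point is that once replies above a threshold go through Broadcast, each direct reply is bounded by that threshold, so the driver memory held by N outstanding direct requests is at most N times the threshold:

```java
// Hypothetical sketch of the threshold logic discussed above: map output
// statuses whose serialized size reaches the threshold are shipped via
// Broadcast; anything smaller is serialized directly into the RPC reply.
public class MapOutputReplySketch {

    // Hypothetical threshold; the comment suggests something as low as 1 MB.
    static final long MIN_SIZE_FOR_BROADCAST = 1L << 20; // 1 MB

    enum Transport { DIRECT, BROADCAST }

    static Transport chooseTransport(long serializedSize) {
        return serializedSize >= MIN_SIZE_FOR_BROADCAST
                ? Transport.BROADCAST
                : Transport.DIRECT;
    }

    // Upper bound on driver memory held by n outstanding direct replies:
    // each direct reply is strictly below the threshold by construction.
    static long worstCaseDirectBytes(int outstandingRequests) {
        return (long) outstandingRequests * MIN_SIZE_FOR_BROADCAST;
    }

    public static void main(String[] args) {
        System.out.println(chooseTransport(512 * 1024)); // small status: DIRECT
        System.out.println(chooseTransport(8L << 20));   // large status: BROADCAST
        // Even 1000 concurrent direct requests are capped at ~1 GB total:
        System.out.println(worstCaseDirectBytes(1000));
    }
}
```

With a 1 MB threshold, driving the driver out of memory via direct replies alone would require an impractically large number of simultaneous requests, which is the comment's argument for lowering the threshold instead of throttling on IO completion.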
> Don't fetch all map output statuses at each reducer during shuffles
> -------------------------------------------------------------------
>
> Key: SPARK-1239
> URL: https://issues.apache.org/jira/browse/SPARK-1239
> Project: Spark
> Issue Type: Improvement
> Components: Shuffle, Spark Core
> Affects Versions: 1.0.2, 1.1.0
> Reporter: Patrick Wendell
> Assignee: Thomas Graves
>
> Instead we should modify the way we fetch map output statuses to take both a
> mapper and a reducer - or we should just piggyback the statuses on each task.