[ https://issues.apache.org/jira/browse/SPARK-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205279#comment-15205279 ]

Thomas Graves commented on SPARK-1239:
--------------------------------------

I do like the idea of broadcast. When I originally tried it I hit the issue 
mentioned in the second bullet point, but as long as we synchronize the 
requests so we only broadcast once, we should be ok (a sketch of that 
single-broadcast gate is below).
It does seem to have some further constraints, though. With a sufficiently 
large job I don't think it matters, but if we only have a small number of 
reducers we end up broadcasting to all executors when only a couple need it. 
I guess that doesn't hurt much unless the other executors start fetching from 
the executors your reducers are on and add load to them. That should be pretty 
minimal, though.
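
Roughly what I have in mind for the synchronization, as a Scala sketch (the 
class and method names here are hypothetical, not the actual MapOutputTracker 
code):

    import org.apache.spark.SparkContext
    import org.apache.spark.broadcast.Broadcast

    // Hypothetical driver-side cache: broadcast the serialized map statuses for
    // a shuffle at most once, even if many reducers request them concurrently.
    class MapStatusBroadcastCache(sc: SparkContext) {
      private val cache = new java.util.HashMap[Int, Broadcast[Array[Byte]]]()

      def getOrBroadcast(shuffleId: Int,
                         serialize: () => Array[Byte]): Broadcast[Array[Byte]] =
        cache.synchronized {
          val existing = cache.get(shuffleId)
          if (existing != null) {
            existing
          } else {
            // Only the first request pays the serialization and broadcast cost;
            // later requests reuse the cached Broadcast handle.
            val bc = sc.broadcast(serialize())
            cache.put(shuffleId, bc)
            bc
          }
        }
    }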
Broadcast also seems to make less sense with dynamic allocation. At least I've 
seen issues when executors go away: the fetch from that executor fails, has to 
retry, etc., adding time. We recently fixed one issue with this so that it goes 
and gets the locations again after a certain number of failures (roughly the 
behavior sketched below). That time should be lower now that we fixed it, but 
I'll have to run the numbers.
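
For reference, that retry-with-refresh behavior looks roughly like this 
hand-wavy Scala sketch (the fetch/refresh callbacks are placeholders, not the 
real shuffle client API):

    import java.io.IOException

    // Hypothetical reducer-side helper: after a few failed fetches, go back to
    // the driver for fresh locations (the old executor may be gone under dynamic
    // allocation) instead of retrying a dead address until we give up entirely.
    def fetchWithRefresh[T](maxAttempts: Int,
                            refreshAfter: Int,
                            fetch: () => T,
                            refreshLocations: () => Unit): T = {
      var failures = 0
      while (failures < maxAttempts) {
        try {
          return fetch()
        } catch {
          case _: IOException =>
            failures += 1
            if (failures % refreshAfter == 0) {
              // Assume the location is stale; ask the driver again.
              refreshLocations()
            }
        }
      }
      throw new IOException(s"giving up after $failures failed fetches")
    }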

I'll do some more analysis/testing of this and see if that really matters.  

With a sufficient number of threads I don't think a few slow nodes would make 
much of a difference here; if you have that many slow nodes, the shuffle itself 
is going to be impacted, which I would see as the larger effect. The slow nodes 
could just as well affect the broadcast. Hopefully you skip them since it takes 
them longer to get a chunk, but it's possible that once a slow node has a chunk 
or two, more and more executors start going to it for the broadcast data 
instead of the driver, slowing down more transfers.

But it's a good point, and my current method would truly block (for a certain 
time) rather than just being slow. Note that there is a timeout on waiting for 
the send to happen; when it fires, the connection is closed and the executor 
retries (see the sketch below). You don't have to worry about that with 
broadcast.
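
To make that concrete, the bounded blocking in the current approach is roughly 
the pattern below (a sketch only; these names are made up, not the real map 
output server code):

    import java.util.concurrent.{Semaphore, TimeUnit}

    // Hypothetical sketch: cap the number of in-flight status sends, and if no
    // slot frees up within the timeout, close the connection so the executor's
    // retry logic takes over instead of the request hanging indefinitely.
    class BoundedStatusSender(maxInFlight: Int, timeoutMs: Long) {
      private val slots = new Semaphore(maxInFlight)

      def trySend(send: () => Unit, closeConnection: () => Unit): Unit = {
        if (slots.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS)) {
          try send() finally slots.release()
        } else {
          // Timed out waiting for capacity: drop the connection and let the
          // executor retry later.
          closeConnection()
        }
      }
    }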

I'll do some more analysis with that approach.

I wish Netty had some other built-in mechanisms for flow control.
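The main thing it does give you is the write-buffer watermarks and channel 
writability, which you can use for a crude form of backpressure, something 
like this sketch (the chunk source is a placeholder I made up):

    import io.netty.channel.{ChannelHandlerContext, ChannelInboundHandlerAdapter}

    // Write chunks only while the outbound buffer is below the high watermark;
    // resume when Netty signals that the channel is writable again.
    class WritabilityBackpressureHandler(nextChunk: () => Option[AnyRef])
      extends ChannelInboundHandlerAdapter {

      override def channelActive(ctx: ChannelHandlerContext): Unit = {
        pump(ctx)
        super.channelActive(ctx)
      }

      override def channelWritabilityChanged(ctx: ChannelHandlerContext): Unit = {
        pump(ctx)
        super.channelWritabilityChanged(ctx)
      }

      private def pump(ctx: ChannelHandlerContext): Unit = {
        // Stop queueing as soon as the channel reports it is not writable.
        while (ctx.channel().isWritable) {
          nextChunk() match {
            case Some(chunk) => ctx.writeAndFlush(chunk)
            case None        => return
          }
        }
      }
    }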

> Don't fetch all map output statuses at each reducer during shuffles
> -------------------------------------------------------------------
>
>                 Key: SPARK-1239
>                 URL: https://issues.apache.org/jira/browse/SPARK-1239
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>    Affects Versions: 1.0.2, 1.1.0
>            Reporter: Patrick Wendell
>            Assignee: Thomas Graves
>
> Instead we should modify the way we fetch map output statuses to take both a 
> mapper and a reducer - or we should just piggyback the statuses on each task. 


