Re: Spark remote communication pattern

Reynold Xin Thu, 09 Apr 2015 01:25:50 -0700

Take a look at the following two files:

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/shuffle/hash/BlockStoreShuffleFetcher.scala
and

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
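Both files sit on the read side of a shuffle. A task whose RDD depends on shuffled data asks for the blocks of its partition, and those blocks are fetched from the local block manager or from remote ones. The following is a minimal toy model of that planning step, loosely inspired by the local/remote split in ShuffleBlockFetcherIterator. `Address`, `BlockInfo`, `FetchRequest`, `FetchPlanner` and the batching rule are hypothetical and are not Spark's actual classes or its exact logic, which also varies across versions.

```scala
// Hypothetical stand-in for a block manager address (host plus executor id).
final case class Address(host: String, executorId: String)

// Hypothetical shuffle block id plus its size in bytes, as a reducer might
// learn them from map output statuses (a model, not Spark's real classes).
final case class BlockInfo(id: String, size: Long)

// One remote request: a batch of blocks fetched from a single address.
final case class FetchRequest(address: Address, blocks: Seq[BlockInfo])

object FetchPlanner {
  /** Splits blocks into local block ids and remote fetch requests.
    * Empty blocks are skipped. Each remote address's blocks are batched so
    * that a request does not exceed targetRequestSize bytes, except when a
    * single block is larger than that on its own. */
  def split(
      localAddress: Address,
      blocksByAddress: Seq[(Address, Seq[BlockInfo])],
      targetRequestSize: Long): (Seq[String], Seq[FetchRequest]) = {
    // Blocks held by this executor are read locally, with no network request.
    val local = blocksByAddress.collect {
      case (addr, blocks) if addr == localAddress =>
        blocks.filter(_.size > 0).map(_.id)
    }.flatten

    // Blocks held elsewhere are grouped into size-bounded requests per address.
    val remote = blocksByAddress.filter(_._1 != localAddress).flatMap {
      case (addr, blocks) =>
        val requests = Seq.newBuilder[FetchRequest]
        var current = Vector.empty[BlockInfo]
        var currentSize = 0L
        for (b <- blocks if b.size > 0) {
          // Close the current batch if adding this block would overflow it.
          if (current.nonEmpty && currentSize + b.size > targetRequestSize) {
            requests += FetchRequest(addr, current)
            current = Vector.empty
            currentSize = 0L
          }
          current :+= b
          currentSize += b.size
        }
        if (current.nonEmpty) requests += FetchRequest(addr, current)
        requests.result()
    }
    (local, remote)
  }
}
```

Logging the `address` of each `FetchRequest` is one way to record which remote locations a shuffle read touches, which is the kind of information the question below asks about.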

On Thu, Apr 9, 2015 at 1:15 AM, Zoltán Zvara <zoltan.zv...@gmail.com> wrote:

> Dear Developers,
>
> I'm trying to investigate the communication pattern of data flow during
> the execution of a Spark program defined by an RDD chain. I'm looking at
> it from the task's point of view, and found that a ResultTask (when
> retrieving the iterator of its RDD for a given partition) effectively asks
> the BlockManager to get the block from a local or remote location. What I
> do there is include the actual location in BlockResult, so the task can
> tell where it retrieved the data from. As far as I can tell, this is the
> only case in which a ResultTask can issue a data flow.
>
> What about ShuffleMapTask? What happens there? I'm trying to log the
> locations involved in the shuffle process, and I would be happy to receive
> a few hints about where remote communication is managed in the case of
> ShuffleMapTask.
>
> Thanks!
>
> Zoltán
>
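For the location logging Zoltán describes, one minimal sketch assumes that a task's input arrives as several iterators, each paired with the location it was read from. It wraps them in a single iterator that counts records per source. `SourcedIterator`, `recordsBySource` and the string location labels are hypothetical and are not Spark APIs.

```scala
import scala.collection.mutable

// Hypothetical: concatenates per-source iterators while counting how many
// records were consumed from each source location.
final class SourcedIterator[T](sources: Seq[(String, Iterator[T])]) extends Iterator[T] {
  private val counts = mutable.LinkedHashMap.empty[String, Long]
  private val remaining = sources.iterator
  private var currentSource: String = ""
  private var current: Iterator[T] = Iterator.empty

  // Advance to the next non-exhausted source, if any.
  override def hasNext: Boolean = {
    while (!current.hasNext && remaining.hasNext) {
      val (src, it) = remaining.next()
      currentSource = src
      current = it
    }
    current.hasNext
  }

  // Count the record against the source it came from, then return it.
  override def next(): T = {
    if (!hasNext) throw new NoSuchElementException("no more records")
    counts(currentSource) = counts.getOrElse(currentSource, 0L) + 1L
    current.next()
  }

  /** Records consumed so far, keyed by source location. */
  def recordsBySource: Map[String, Long] = counts.toMap
}
```

Because the counts are updated lazily in `next()`, they only reflect records the task actually consumed, which matters for operators that stop reading early.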
