Take a look at the following two files:

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/shuffle/hash/BlockStoreShuffleFetcher.scala

and

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala

On Thu, Apr 9, 2015 at 1:15 AM, Zoltán Zvara <zoltan.zv...@gmail.com> wrote:

> Dear Developers,
>
> I'm trying to investigate the communication pattern regarding data-flow
> during execution of a Spark program defined by an RDD chain. I'm
> investigating from the Task point of view, and found out that the task type
> ResultTask (as retrieving the iterator for its RDD for a given partition),
> effectively asks the BlockManager to get the block from local or remote
> location. What I do there is to include actual location data in BlockResult
> so the task can tell where it retrieved the data from. I've found out that
> ResultTask can issue a data-flow only in this case.
>
> What's the case with the ShuffleMapTask? What happens there? I'm trying to
> log locations which are included in the shuffle process. I would be happy
> to receive a few hints regarding where remote communication is managed in
> case of ShuffleMapTask.
>
> Thanks!
>
> Zoltán
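
For logging locations, the part of those two files you probably care about is the split between blocks that are already on the local executor and blocks that have to be fetched from a remote one. Below is a rough, self-contained sketch of that idea only. It is a toy model, not Spark code: Location, BlockRequest and FetchPlan are hypothetical names I made up for illustration, and the real classes carry more state and work differently in detail.

```scala
// Toy model only: these types are hypothetical stand-ins, not Spark classes.
final case class Location(executorId: String, host: String, port: Int)
final case class BlockRequest(blockId: String, sizeBytes: Long, location: Location)

object FetchPlan {
  // Split the requested blocks into those already on this executor and those
  // that must be pulled over the network, with the latter grouped by the
  // location they would be fetched from.
  def split(
      local: Location,
      requests: Seq[BlockRequest]): (Seq[BlockRequest], Map[Location, Seq[BlockRequest]]) = {
    val (localBlocks, remoteBlocks) =
      requests.partition(_.location.executorId == local.executorId)
    (localBlocks, remoteBlocks.groupBy(_.location))
  }

  def main(args: Array[String]): Unit = {
    val here = Location("exec-1", "host-a", 7337)
    val there = Location("exec-2", "host-b", 7337)
    val requests = Seq(
      BlockRequest("shuffle_0_0_3", 1024L, here),
      BlockRequest("shuffle_0_1_3", 2048L, there),
      BlockRequest("shuffle_0_2_3", 512L, there))

    val (localBlocks, remoteByLocation) = split(here, requests)

    // This is the point where a location could be logged per block.
    localBlocks.foreach(b => println(s"local  ${b.blockId}"))
    remoteByLocation.foreach { case (loc, blocks) =>
      println(s"remote ${loc.host}:${loc.port} -> ${blocks.map(_.blockId).mkString(", ")}")
    }
  }
}
```

In this sketch the place where the local and remote groups are built is where the location of each block is known, so that is where I would expect a log statement to go in the real code as well.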