Thanks! I've found the fetcher! Are there any other places or cases where blocks travel over the network?

Zvara Zoltán

mail, hangout, skype: zoltan.zv...@gmail.com
mobile, viber: +36203129543
bank: 10918001-00000021-50480008
address: Hungary, 2475 Kápolnásnyék, Kossuth 6/a
elte: HSKSJZ (ZVZOAAI.ELTE)

2015-04-09 10:24 GMT+02:00 Reynold Xin <r...@databricks.com>:

> Take a look at the following two files:
>
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/shuffle/hash/BlockStoreShuffleFetcher.scala
>
> and
>
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
>
> On Thu, Apr 9, 2015 at 1:15 AM, Zoltán Zvara <zoltan.zv...@gmail.com>
> wrote:
>
>> Dear Developers,
>>
>> I'm trying to investigate the communication pattern regarding data-flow
>> during execution of a Spark program defined by an RDD chain. I'm
>> investigating from the Task point of view, and found out that the task
>> type ResultTask (as retrieving the iterator for its RDD for a given
>> partition), effectively asks the BlockManager to get the block from
>> local or remote location. What I do there is to include actual location
>> data in BlockResult so the task can tell where it retrieved the data
>> from. I've found out that ResultTask can issue a data-flow only in this
>> case.
>>
>> What's the case with the ShuffleMapTask? What happens there? I'm trying
>> to log locations which are included in the shuffle process. I would be
>> happy to receive a few hints regarding where remote communication is
>> managed in case of ShuffleMapTask.
>>
>> Thanks!
>>
>> Zoltán