Re: Spark remote communication pattern

Reynold Xin Thu, 09 Apr 2015 10:06:08 -0700

For torrent broadcast, data are read directly through the block manager:

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala#L167
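
As a rough illustration of that pattern, here is a toy model, not Spark's actual code. The names ToyBlockManager and read_broadcast, and the block id format, are hypothetical stand-ins chosen for the example. It assumes each piece of a broadcast value is looked up in the local block store first, fetched from a peer only on a miss, and then cached locally. The sketch also records where each piece came from, which is the kind of location information asked about in this thread.

```python
import random


class ToyBlockManager:
    """Hypothetical per-executor block store; not Spark's BlockManager API."""

    def __init__(self, name, peers=None):
        self.name = name
        self.blocks = {}  # block id -> bytes
        self.peers = peers if peers is not None else []

    def get_local(self, block_id):
        # Returns the bytes if this store holds the block, else None.
        return self.blocks.get(block_id)

    def get_remote(self, block_id):
        # Ask each peer in turn; return (data, peer name) for the first hit.
        for peer in self.peers:
            data = peer.get_local(block_id)
            if data is not None:
                return data, peer.name
        return None

    def put(self, block_id, data):
        self.blocks[block_id] = data


def read_broadcast(bm, broadcast_id, num_pieces):
    """Reassemble a broadcast value and report the source of each piece."""
    pieces = [None] * num_pieces
    sources = {}
    order = list(range(num_pieces))
    random.shuffle(order)  # visit pieces in random order to spread load
    for i in order:
        block_id = f"broadcast_{broadcast_id}_piece{i}"  # illustrative id format
        data = bm.get_local(block_id)
        if data is not None:
            sources[block_id] = "local"
        else:
            remote = bm.get_remote(block_id)
            if remote is None:
                raise KeyError(f"piece not found anywhere: {block_id}")
            data, peer_name = remote
            sources[block_id] = peer_name
            bm.put(block_id, data)  # cache so later reads stay local
        pieces[i] = data
    return b"".join(pieces), sources


# Usage: the driver holds all pieces, the executor starts empty.
driver = ToyBlockManager("driver")
for i, chunk in enumerate([b"he", b"ll", b"o"]):
    driver.put(f"broadcast_0_piece{i}", chunk)
executor = ToyBlockManager("executor-1", peers=[driver])
value, sources = read_broadcast(executor, 0, 3)
```

On the first read every piece is reported as coming from the peer; a second read on the same executor is served entirely from its local store.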




On Thu, Apr 9, 2015 at 7:27 AM, Zoltán Zvara <zoltan.zv...@gmail.com> wrote:

> Thanks! I've found the fetcher! Are there any other places and cases where
> blocks travel over the network?
>
> Zvara Zoltán
>
> 2015-04-09 10:24 GMT+02:00 Reynold Xin <r...@databricks.com>:
>
>> Take a look at the following two files:
>>
>>
>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/shuffle/hash/BlockStoreShuffleFetcher.scala
>>
>> and
>>
>>
>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
>>
>> On Thu, Apr 9, 2015 at 1:15 AM, Zoltán Zvara <zoltan.zv...@gmail.com>
>> wrote:
>>
>>> Dear Developers,
>>>
>>> I'm investigating the communication pattern of the data flow during
>>> execution of a Spark program defined by an RDD chain. Looking at it from
>>> the Task point of view, I found that a ResultTask (when retrieving the
>>> iterator for its RDD for a given partition) effectively asks the
>>> BlockManager to get the block from a local or remote location. What I do
>>> there is include the actual location data in BlockResult, so the task can
>>> tell where it retrieved the data from. As far as I can tell, this is the
>>> only case in which a ResultTask can trigger a data flow.
>>>
>>> What is the case with ShuffleMapTask? What happens there? I'm trying to
>>> log the locations involved in the shuffle process. I would be happy
>>> to receive a few hints about where remote communication is managed in
>>> the case of ShuffleMapTask.
>>>
>>> Thanks!
>>>
>>> Zoltán
>>>
>>
>>
>
