[jira] [Updated] (SPARK-2288) Hide ShuffleBlockManager behind ShuffleManager

Raymond Liu (JIRA) Wed, 27 Aug 2014 22:38:54 -0700

     [ https://issues.apache.org/jira/browse/SPARK-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Raymond Liu updated SPARK-2288:
-------------------------------

    Attachment: shuffleblockmanager.pdf

> Hide ShuffleBlockManager behind ShuffleManager
> ----------------------------------------------
>
>                 Key: SPARK-2288
>                 URL: https://issues.apache.org/jira/browse/SPARK-2288
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Block Manager, Shuffle
>            Reporter: Raymond Liu
>            Assignee: Raymond Liu
>         Attachments: shuffleblockmanager.pdf
>
>
> This is a sub-task of SPARK-2275.
> At present, in the shuffle write path, the shuffle block manager manages the
> mapping from some blockIds to a FileSegment in order to support consolidated
> shuffle, and in doing so it bypasses the block store's blockId-based access
> mode. Then, in the read path, when reading shuffle block data, the disk store
> queries shuffleBlockManager to override the normal blockId-to-file mapping so
> that it reads the correct data from the file. This leads to a lot of
> bi-directional dependencies between modules, and the code logic is somewhat
> tangled. Neither the shuffle block manager nor the blockManager/DiskStore
> fully controls the read path; they are tightly coupled in low-level code.
> This also makes it hard to implement other shuffle manager logic, e.g. a
> sort-based shuffle, which might merge all output from one map partition into
> a single file. That would require hacking further into the
> diskStore/diskBlockManager etc. to find the right data to read.
> Possible approach:
> I think it might be better to expose a FileSegment-based read interface for
> DiskStore in addition to the current blockId-based interface.
> Then the logic that maps a blockId to a FileSegment can all reside in the
> specific shuffle manager, if it does need to merge data into one single
> object. The shuffle manager takes care of the mapping logic in both the read
> and write paths and takes responsibility for reading and writing shuffle data.
> The BlockStore itself should just take care of reads and writes as
> requested; it should not be involved in the data mapping logic at all. This
> might make the interfaces between modules clearer and decouple the modules
> from each other more cleanly.
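
The split of responsibilities proposed above could look roughly like the
sketch below. This is only an illustration, not code from Spark or from the
attached design document: Segment, SketchDiskStore and SketchShuffleManager
are hypothetical names, and the block id strings are just sample values. The
store reads and appends byte ranges and knows nothing about block ids, while
the shuffle manager owns the blockId-to-segment mapping on both the write and
read paths.

```scala
import java.io.{File, RandomAccessFile}
import java.nio.file.Files
import scala.collection.mutable

// Hypothetical stand-in for a FileSegment: a byte range inside a file.
final case class Segment(file: File, offset: Long, length: Int)

// Hypothetical disk store: it only appends and reads byte ranges and has no
// knowledge of how block ids map to segments.
object SketchDiskStore {
  def append(file: File, bytes: Array[Byte]): Segment = {
    val raf = new RandomAccessFile(file, "rw")
    try {
      val offset = raf.length()
      raf.seek(offset)
      raf.write(bytes)
      Segment(file, offset, bytes.length)
    } finally raf.close()
  }

  def read(segment: Segment): Array[Byte] = {
    val raf = new RandomAccessFile(segment.file, "r")
    try {
      val buf = new Array[Byte](segment.length)
      raf.seek(segment.offset)
      raf.readFully(buf)
      buf
    } finally raf.close()
  }
}

// Hypothetical shuffle manager that merges all blocks into one file and keeps
// the blockId -> segment mapping to itself for both writing and reading.
final class SketchShuffleManager(dataFile: File) {
  private val index = mutable.Map.empty[String, Segment]

  def writeBlock(blockId: String, bytes: Array[Byte]): Unit =
    index(blockId) = SketchDiskStore.append(dataFile, bytes)

  def readBlock(blockId: String): Option[Array[Byte]] =
    index.get(blockId).map(SketchDiskStore.read)
}

object Demo {
  def main(args: Array[String]): Unit = {
    val file = Files.createTempFile("shuffle-sketch", ".data").toFile
    file.deleteOnExit()
    val manager = new SketchShuffleManager(file)
    // Two blocks are written into the same file, then one is read back.
    manager.writeBlock("shuffle_0_0_0", "first".getBytes("UTF-8"))
    manager.writeBlock("shuffle_0_0_1", "second".getBytes("UTF-8"))
    println(manager.readBlock("shuffle_0_0_1").map(new String(_, "UTF-8")))
  }
}
```

In this arrangement a different shuffle manager (for example a sort-based one)
could use its own mapping without any change to the store, which is the
decoupling the description asks for.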



--
This message was sent by Atlassian JIRA
(v6.2#6252)

