[jira] [Commented] (SPARK-2288) Hide ShuffleBlockManager behind ShuffleManager

2014-08-30 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116249#comment-14116249
 ] 

Reynold Xin commented on SPARK-2288:


Thanks for the design doc, Raymond. Next time it would be better to also 
comment on the new block type you are adding. Cheers.


 Hide ShuffleBlockManager behind ShuffleManager
 --

 Key: SPARK-2288
 URL: https://issues.apache.org/jira/browse/SPARK-2288
 Project: Spark
  Issue Type: Sub-task
  Components: Block Manager, Shuffle
Reporter: Raymond Liu
Assignee: Raymond Liu
 Attachments: shuffleblockmanager.pdf


 This is a sub-task of SPARK-2275. 
 At present, on the shuffle write path, the shuffle block manager maintains the 
 mapping from a blockId to a FileSegment for the benefit of consolidated 
 shuffle; this bypasses the block store's blockId-based access mode. Then, 
 on the read path, when reading shuffle block data, the disk store queries the 
 shuffleBlockManager to override the normal blockId-to-file mapping in order to 
 read the correct data from the file. This leads to many bi-directional 
 dependencies between modules, and the code logic is somewhat messy. Neither 
 the shuffle block manager nor the blockManager/DiskStore fully controls the 
 read path; they are tightly coupled in low-level code modules. This makes it 
 hard to implement other shuffle manager logic, e.g. a sort-based shuffle 
 that merges all output from one map partition into a single file, which 
 would require hacking even deeper into DiskStore/DiskBlockManager etc. to find 
 the right data to read.
 Possible approach:
 I think it might be better to expose a FileSegment-based read 
 interface on DiskStore in addition to the current blockId-based interface.
 Then all the blockId-to-FileSegment mapping logic can reside in the 
 specific shuffle manager: if it needs to merge data into a single 
 file, it handles the mapping logic on both the read and write paths and takes 
 responsibility for reading and writing shuffle data.
 The BlockStore itself should just handle reads and writes as requested and 
 should not be involved in the data mapping logic at all. This would make the 
 interfaces between modules clearer and decouple them more cleanly.
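 To make the proposed split concrete, here is a hypothetical sketch in Scala 
 (simplified types and names, not Spark's actual API): the shuffle manager 
 owns the blockId-to-FileSegment mapping via a resolver, while the disk store 
 would only ever be handed a file plus a byte range to read.

```scala
import java.io.File

// Hypothetical, simplified types for illustration -- not Spark's real classes.
case class FileSegment(file: File, offset: Long, length: Long)
case class ShuffleBlockId(shuffleId: Int, mapId: Int, reduceId: Int)

// Sketch of the proposed boundary: the shuffle manager implements this
// interface, so the blockId -> FileSegment mapping never leaks into the
// disk store's read path.
trait ShuffleBlockResolver {
  def getBlockSegment(blockId: ShuffleBlockId): FileSegment
}

// A resolver for consolidated output: all reduce partitions of one map
// task live in a single file, delimited by per-partition offsets, where
// offsets(i) .. offsets(i + 1) bounds reduce partition i.
class ConsolidatingResolver(dataFile: File, offsets: Array[Long])
    extends ShuffleBlockResolver {
  def getBlockSegment(blockId: ShuffleBlockId): FileSegment = {
    val start = offsets(blockId.reduceId)
    val end   = offsets(blockId.reduceId + 1)
    FileSegment(dataFile, start, end - start)
  }
}

object Demo extends App {
  val resolver = new ConsolidatingResolver(
    new File("/tmp/shuffle_0_0.data"), Array(0L, 100L, 250L, 400L))
  // Reduce partition 1 occupies bytes [100, 250) of the merged file.
  val seg = resolver.getBlockSegment(ShuffleBlockId(0, 0, 1))
  assert(seg.offset == 100L && seg.length == 150L)
  println(s"offset=${seg.offset} length=${seg.length}")
}
```

 With such an interface, a FileSegment-based read method on DiskStore would 
 simply read the given byte range, and a sort-based shuffle could plug in its 
 own resolver without touching DiskStore/DiskBlockManager internals.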



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2288) Hide ShuffleBlockManager behind ShuffleManager

2014-06-26 Thread Raymond Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045446#comment-14045446
 ] 

Raymond Liu commented on SPARK-2288:


Hi 

pull request at https://github.com/apache/spark/pull/1241

 Hide ShuffleBlockManager behind ShuffleManager
 --

 Key: SPARK-2288
 URL: https://issues.apache.org/jira/browse/SPARK-2288
 Project: Spark
  Issue Type: Sub-task
  Components: Block Manager, Shuffle
Reporter: Raymond Liu




--
This message was sent by Atlassian JIRA
(v6.2#6252)