[ https://issues.apache.org/jira/browse/SPARK-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387275#comment-14387275 ]

Imran Rashid commented on SPARK-6190:
-------------------------------------

I've updated the design doc to include some code.

No, we shouldn't be making changes willy-nilly, but we've also got to commit to 
fixing major bugs like this.  Part of the reason the proposal for 
LargeByteBuffer focused on what was missing was to emphasize the limited set of 
functionality.  It's much easier to expose more later than to strip things out.
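
To make the "limited set of functionality" point concrete, here's a rough 
sketch of the shape I have in mind -- the names and method signatures below are 
illustrative only, not the final API (the real proposal is in the attached 
design doc):

{code:scala}
import java.nio.ByteBuffer

// A read-only view over data that may exceed 2GB, deliberately exposing only
// a minimal set of operations.  Everything here is a sketch, not the final API.
trait LargeByteBuffer {
  def size: Long                       // total bytes; may exceed Int.MaxValue
  def remaining: Long                  // bytes left to read
  def get(): Byte                      // read one byte, advancing the position
  def nioBuffers(): Array[ByteBuffer]  // underlying chunks, each < 2GB, for io
}

// The obvious implementation: wrap a sequence of ByteBuffers.
class WrappedLargeByteBuffer(chunks: Array[ByteBuffer]) extends LargeByteBuffer {

  override val size: Long = chunks.map(_.remaining().toLong).sum
  private var bytesRead = 0L
  private var idx = 0

  override def remaining: Long = size - bytesRead

  override def get(): Byte = {
    // skip past exhausted chunks; callers should check remaining first
    while (idx < chunks.length - 1 && !chunks(idx).hasRemaining) idx += 1
    bytesRead += 1
    chunks(idx).get()
  }

  override def nioBuffers(): Array[ByteBuffer] = chunks.map(_.duplicate())
}
{code}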

I've looked into limiting all blocks to < 2GB.  Pretty quickly you see that 
each block must belong to a {{BlockGroup}}, which doesn't have a 2GB limit.  
If you try to cache a partition over 2GB, then it would need to create 
multiple blocks.  Later on, when you try to fetch the data for that partition, 
you have no idea which blocks to fetch -- you only know the BlockGroup.  
Similarly, for dropping blocks, it's pretty useless to only drop some of the 
blocks in a BlockGroup, since you'd need to recreate the entire BlockGroup in 
any case.  So we'd have to support {{put}}, {{drop}}, and {{get}} for 
BlockGroups (though {{get}} could be streaming-ish, e.g. returning an iterator 
over blocks).
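
Concretely, the manager-facing API would have to grow something like the 
following (hypothetical names, just to show the shape of {{put}} / {{drop}} / 
{{get}} at the BlockGroup level):

{code:scala}
import java.nio.ByteBuffer

// Hypothetical identifiers -- every block belongs to exactly one group, and
// the group (not the individual block) is what callers know about.
case class BlockGroupId(name: String)
case class BlockId(group: BlockGroupId, index: Int)

trait BlockGroupManager {
  // caching a partition over 2GB creates multiple blocks under one group
  def put(id: BlockGroupId, data: Iterator[ByteBuffer]): Unit

  // dropping only some blocks of a group is useless, so drop the whole thing
  def drop(id: BlockGroupId): Unit

  // streaming-ish get: an iterator over the constituent blocks, in order
  def get(id: BlockGroupId): Option[Iterator[ByteBuffer]]
}
{code}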

Nonetheless, the BlockGroup approach might (a) make the changes to the network 
layer simpler and (b) open the door for future optimizations, such as 
auto-splitting large partitions.  I don't think that introducing 
LargeByteBuffer would make it any harder to eventually move to blocks < 2GB, 
but I'm open to either approach.

> create LargeByteBuffer abstraction for eliminating 2GB limit on blocks
> ----------------------------------------------------------------------
>
>                 Key: SPARK-6190
>                 URL: https://issues.apache.org/jira/browse/SPARK-6190
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>            Reporter: Imran Rashid
>            Assignee: Imran Rashid
>         Attachments: LargeByteBuffer_v2.pdf
>
>
> A key component in eliminating the 2GB limit on blocks is creating a proper 
> abstraction for storing more than 2GB.  Currently Spark is limited by its 
> reliance on nio ByteBuffer and netty ByteBuf, both of which are limited to 
> 2GB.  This task will introduce the new abstraction and the relevant 
> implementation and utilities, without affecting the existing implementation 
> at all.
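
(For anyone skimming the quoted description: the 2GB ceiling comes straight 
from the JDK and netty APIs, which index buffers with Int.  A quick 
illustration:)

{code:scala}
import java.nio.ByteBuffer

object TwoGBLimit {
  def main(args: Array[String]): Unit = {
    val wanted: Long = 2L * 1024 * 1024 * 1024  // 2147483648, one past Int.MaxValue
    // allocate() takes an Int, so a capacity over Int.MaxValue can't even be
    // expressed; the lossy cast overflows to a negative size and allocate()
    // throws IllegalArgumentException
    val buf = ByteBuffer.allocate(wanted.toInt)
    println(buf.capacity())  // never reached
  }
}
{code}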


