[ https://issues.apache.org/jira/browse/SPARK-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387275#comment-14387275 ]
Imran Rashid commented on SPARK-6190:
-------------------------------------

I've updated the design doc to include some code. No, we shouldn't be making changes willy-nilly, but we've also got to commit to fixing major bugs like this. Part of the reason the proposal for LargeByteBuffer focused on what was missing was to emphasize the limited set of functionality. It's much easier to expose more later than to strip things out.

I've looked into limiting all blocks to < 2GB. Pretty quickly you see that each block must belong to a "{{BlockGroup}}", which doesn't have a 2GB limit. If you try to cache a partition over 2GB, then it would need to create multiple blocks. Later on, when you try to fetch the data for that partition, you have no idea which blocks to fetch -- you only know the BlockGroup. Similarly, for dropping blocks, it's pretty useless to drop only some of the blocks in a BlockGroup, since you'd need to recreate the entire BlockGroup in any case. So we'd have to support {{put}}, {{drop}}, and {{get}} for BlockGroups (though {{get}} could be streaming-ish, e.g. returning an iterator over blocks).

Nonetheless, that approach might (a) make the changes to the network layer simpler and (b) open the door for future optimizations, such as auto-splitting large partitions. I don't think that introducing LargeByteBuffer would make it any harder to eventually move to blocks < 2GB, but I'm open to either approach.

> create LargeByteBuffer abstraction for eliminating 2GB limit on blocks
> ----------------------------------------------------------------------
>
>          Key: SPARK-6190
>          URL: https://issues.apache.org/jira/browse/SPARK-6190
>      Project: Spark
>   Issue Type: Sub-task
>   Components: Spark Core
>     Reporter: Imran Rashid
>     Assignee: Imran Rashid
>  Attachments: LargeByteBuffer_v2.pdf
>
> A key component in eliminating the 2GB limit on blocks is creating a proper
> abstraction for storing more than 2GB.
> Currently Spark is limited by a reliance on nio ByteBuffer and netty ByteBuf,
> both of which are limited at 2GB. This task will introduce the new abstraction
> and the relevant implementation and utilities, without affecting the existing
> implementation at all.
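As a rough illustration of the kind of abstraction being proposed (this is a hypothetical sketch, not the API from the attached design doc; all names here are invented): since {{ByteBuffer}} is indexed by {{int}}, a buffer over 2GB can instead be backed by several chunks, each under {{Integer.MAX_VALUE}} bytes, addressed with a {{long}} offset.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch: a read-only buffer that can logically exceed 2GB,
 * backed by multiple ByteBuffer chunks that each stay under the int limit.
 */
class ChunkedLargeByteBuffer {
    private final ByteBuffer[] chunks;
    private final long[] chunkOffsets; // absolute start offset of each chunk
    private final long size;

    ChunkedLargeByteBuffer(ByteBuffer[] chunks) {
        this.chunks = chunks;
        this.chunkOffsets = new long[chunks.length];
        long total = 0;
        for (int i = 0; i < chunks.length; i++) {
            chunkOffsets[i] = total;
            total += chunks[i].remaining();
        }
        this.size = total;
    }

    long size() { return size; }

    /** Read one byte at a long offset by locating the owning chunk. */
    byte get(long pos) {
        if (pos < 0 || pos >= size) {
            throw new IndexOutOfBoundsException(Long.toString(pos));
        }
        int i = chunks.length - 1;
        while (chunkOffsets[i] > pos) i--; // linear scan; binary search in practice
        return chunks[i].get((int) (pos - chunkOffsets[i]));
    }
}
```

The point is only that long-based addressing over int-limited chunks is mechanical; the hard design questions are which operations to expose at all, which is why the proposal deliberately kept the surface small.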