[ https://issues.apache.org/jira/browse/SPARK-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16406875#comment-16406875 ]
Matthew Porter commented on SPARK-6190:
---------------------------------------

Experiencing similar frustrations to Brian, we have well-partitioned datasets that are just massive in size. Every once in a while Spark fails and we spend much longer than I would like doing nothing but tweaking partition values and crossing our fingers that the next run succeeds. Again, it is very hard to explain to higher management that despite having hundreds of GBs of RAM at our disposal, we are limited to 2 GB during data shuffles. Are there any plans or intentions to resolve this bug? This and https://issues.apache.org/jira/browse/SPARK-5928 have been "In Progress" for more than 3 years now with no visible progress.

> create LargeByteBuffer abstraction for eliminating 2GB limit on blocks
> ----------------------------------------------------------------------
>
>                 Key: SPARK-6190
>                 URL: https://issues.apache.org/jira/browse/SPARK-6190
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>            Reporter: Imran Rashid
>            Assignee: Imran Rashid
>            Priority: Major
>         Attachments: LargeByteBuffer_v3.pdf
>
>
> A key component in eliminating the 2GB limit on blocks is creating a proper
> abstraction for storing more than 2GB. Currently Spark is limited by a
> reliance on nio ByteBuffer and netty ByteBuf, both of which are limited to
> 2GB. This task will introduce the new abstraction and the relevant
> implementation and utilities, without affecting the existing implementation
> at all.
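For illustration only, and not taken from the attached LargeByteBuffer_v3.pdf design: a minimal sketch of the kind of abstraction the issue describes, assuming it is a Long-addressable view over several nio ByteBuffer chunks that each stay below the 2 GB Int-index limit. The class and method names below are hypothetical.

    import java.nio.ByteBuffer

    // Hypothetical sketch: a Long-addressable buffer composed of
    // ByteBuffer chunks, each individually below the 2 GB Int limit.
    class LargeByteBufferSketch(chunks: Array[ByteBuffer]) {
      // Total size is a Long, so it may exceed Integer.MAX_VALUE.
      val size: Long = chunks.map(_.remaining().toLong).sum

      // Absolute read at a global Long offset: walk to the owning chunk,
      // then index into it with an Int offset that is guaranteed to fit.
      def get(offset: Long): Byte = {
        require(offset >= 0 && offset < size, s"offset $offset out of range")
        var rest = offset
        var i = 0
        while (rest >= chunks(i).remaining()) {
          rest -= chunks(i).remaining()
          i += 1
        }
        chunks(i).get(chunks(i).position() + rest.toInt)
      }
    }

The point of the sketch is only that chunking keeps every individual ByteBuffer (and any netty ByteBuf it might wrap) within the Int-indexed 2 GB range while exposing Long offsets to callers.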