[ https://issues.apache.org/jira/browse/SPARK-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099072#comment-14099072 ]
Mridul Muralidharan commented on SPARK-1476:
--------------------------------------------

Based on discussions we had with others, 1.1 was apparently not a good vehicle for this proposal.
Further, since there was no interest in this JIRA and no comments on the proposal, we put the effort on the back burner.

We plan to push at least some of the bugs fixed as part of this effort: consolidated shuffle did get resolved in 1.1, and a few more might be contributed back in 1.2, time permitting (disk-backed map output tracking, for example, looks like a good candidate).
But the bulk of the change is pervasive, at times a bit invasive, and at odds with some of the other changes (for example, zero-copy); shepherding it might be a bit time-consuming for me given other deliverables.

If there is renewed interest in getting this integrated into a Spark release, I can try to push for it to be resurrected and submitted.

> 2GB limit in spark for blocks
> -----------------------------
>
>                 Key: SPARK-1476
>                 URL: https://issues.apache.org/jira/browse/SPARK-1476
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>         Environment: all
>            Reporter: Mridul Muralidharan
>            Assignee: Mridul Muralidharan
>            Priority: Critical
>         Attachments: 2g_fix_proposal.pdf
>
>
> The underlying abstraction for blocks in Spark is a ByteBuffer, which limits
> the size of a block to 2GB.
> This has implications not just for managed blocks in use, but also for shuffle
> blocks (memory-mapped blocks are limited to 2GB, even though the API allows
> for long), ser-deser via byte-array-backed output streams (SPARK-1391), etc.
> This is a severe limitation when Spark is used on non-trivial
> datasets.
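
To illustrate the limit described in the issue, here is a minimal Java sketch (not Spark code, and not taken from the attached proposal). It relies on two JDK facts: `ByteBuffer` capacities are `int`, and `FileChannel.map` takes a `long` size but rejects anything above `Integer.MAX_VALUE`. The class and method names are hypothetical and exist only for this example.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ByteBufferLimitSketch {
    // ByteBuffer capacity is an int, so the largest possible buffer is
    // Integer.MAX_VALUE bytes (one byte short of 2 GiB).
    static final long MAX_BUFFER_BYTES = Integer.MAX_VALUE;

    // Returns true if FileChannel.map rejects a region one byte larger than
    // Integer.MAX_VALUE, even though its size parameter is declared as long.
    // The size check happens before the file length matters, so an empty
    // temporary file is enough for the demonstration.
    static boolean mapRejectsOversizedRegion() throws IOException {
        Path tmp = Files.createTempFile("block", ".bin");
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            ch.map(FileChannel.MapMode.READ_ONLY, 0L, MAX_BUFFER_BYTES + 1L);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("max ByteBuffer bytes: " + MAX_BUFFER_BYTES);
        System.out.println("oversized map rejected: " + mapRejectsOversizedRegion());
    }
}
```

Any block representation built on a single `ByteBuffer` or a single mapped region inherits this ceiling, which is why the issue touches managed blocks, shuffle blocks, and serialization buffers alike.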