[jira] [Commented] (SPARK-1476) 2GB limit in spark for blocks

Mridul Muralidharan (JIRA) Fri, 15 Aug 2014 13:33:46 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099072#comment-14099072
 ]


Mridul Muralidharan commented on SPARK-1476:
--------------------------------------------

Based on discussions we had with others, 1.1 was apparently not a good vehicle 
for this proposal.
Further, since there was no interest in this JIRA and no comments on the 
proposal, we put the effort on the back burner.

We plan to push at least some of the bug fixes made as part of this effort: 
consolidated shuffle did get resolved in 1.1, and a few more might be 
contributed back in 1.2, time permitting (disk-backed map output tracking, for 
example, looks like a good candidate).
But the bulk of the change is pervasive, at times invasive, and at odds with 
some of the other changes (for example, zero-copy); shepherding it might be 
time-consuming for me given other deliverables.

If there is renewed interest in getting this integrated into a Spark release, 
I can try to push for it to be resurrected and submitted.

> 2GB limit in spark for blocks
> -----------------------------
>
>                 Key: SPARK-1476
>                 URL: https://issues.apache.org/jira/browse/SPARK-1476
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>         Environment: all
>            Reporter: Mridul Muralidharan
>            Assignee: Mridul Muralidharan
>            Priority: Critical
>         Attachments: 2g_fix_proposal.pdf
>
>
> The underlying abstraction for blocks in Spark is a ByteBuffer, which limits 
> the size of a block to 2GB.
> This has implications not just for managed blocks in use, but also for 
> shuffle blocks (memory-mapped blocks are limited to 2GB, even though the API 
> allows for a long) and for ser-deser via byte-array-backed output streams 
> (SPARK-1391), etc.
> This is a severe limitation when Spark is used on non-trivial datasets.
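
For readers unfamiliar with where the 2GB ceiling comes from, here is a 
minimal sketch in plain Java NIO rather than Spark code. It assumes only the 
standard java.nio API: FileChannel.map accepts a long size but rejects 
anything above Integer.MAX_VALUE, because the resulting MappedByteBuffer is 
indexed by int. The class and method names (BlockSizeLimit, chunksNeeded, 
mapRejects) are hypothetical and exist only for illustration; they are not 
taken from the attached proposal.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; not part of Spark.
final class BlockSizeLimit {

    // Largest size a single ByteBuffer (heap, direct, or mapped) can have.
    static final long MAX_CHUNK = Integer.MAX_VALUE;

    // Number of ByteBuffer-sized chunks needed to cover a block of
    // blockSize bytes (assumed non-negative).
    static long chunksNeeded(long blockSize) {
        return blockSize / MAX_CHUNK + (blockSize % MAX_CHUNK == 0 ? 0 : 1);
    }

    // Returns true if FileChannel.map refuses the requested mapping size
    // with an IllegalArgumentException, false if the mapping succeeds.
    static boolean mapRejects(long size) {
        try {
            Path tmp = Files.createTempFile("block", ".bin");
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                // The size parameter is declared as long, yet values above
                // Integer.MAX_VALUE are rejected before any mapping happens.
                ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
                return false;
            } catch (IllegalArgumentException e) {
                return true;
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long threeGiB = 3L * 1024 * 1024 * 1024;
        System.out.println("3 GiB mapping rejected: " + mapRejects(threeGiB));
        System.out.println("chunks needed for 3 GiB: " + chunksNeeded(threeGiB));
    }
}
```

Any block larger than Integer.MAX_VALUE bytes therefore has to be represented 
as several buffers, which is one way to read why a fix touches so many code 
paths.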



--
This message was sent by Atlassian JIRA
(v6.2#6252)

