[ 
https://issues.apache.org/jira/browse/SPARK-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968974#comment-13968974
 ] 

Patrick Wendell commented on SPARK-1476:
----------------------------------------

Okay sounds good - a POC like that would be really helpful. We've run some very 
large shuffles recently and couldn't isolate any problems except for 
SPARK-1239, which should only be a small change.

If there are some smaller fixes you run into then by all means submit them 
directly. If it requires large architectural changes I'd recommend having a 
design doc before submitting a pull request, because people will want to 
discuss the overall approach. I.e. we should avoid being "break-fix" and think 
about the long-term design implications of changes.

> 2GB limit in spark for blocks
> -----------------------------
>
>                 Key: SPARK-1476
>                 URL: https://issues.apache.org/jira/browse/SPARK-1476
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>         Environment: all
>            Reporter: Mridul Muralidharan
>            Assignee: Mridul Muralidharan
>            Priority: Critical
>             Fix For: 1.1.0
>
>
> The underlying abstraction for blocks in Spark is a ByteBuffer, which limits 
> the size of a block to 2GB.
> This has implications not just for managed blocks in use, but also for shuffle 
> blocks (memory-mapped blocks are limited to 2GB, even though the API allows 
> for a long), ser-deser via byte-array-backed output streams (SPARK-1391), etc.
> This is a severe limitation when Spark is used on non-trivial datasets.
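
For anyone following along, here is a minimal sketch of where the 2GB ceiling
in the description comes from at the JDK level. It is not Spark code: the
class and method names (BlockLimitSketch, fitsInOneBuffer, chunksNeeded,
mapRejects) are made up for illustration. It only relies on two facts:
ByteBuffer capacities are ints, and FileChannel.map takes a long size but
rejects anything above Integer.MAX_VALUE.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative only: hypothetical helper names, not part of Spark.
class BlockLimitSketch {
    // A single ByteBuffer is indexed by int, so this is its largest capacity.
    static final long MAX_BUFFER_BYTES = Integer.MAX_VALUE;

    // True if a block of blockSize bytes fits in one ByteBuffer.
    static boolean fitsInOneBuffer(long blockSize) {
        return blockSize >= 0 && blockSize <= MAX_BUFFER_BYTES;
    }

    // Number of ByteBuffer-sized chunks needed to cover blockSize bytes
    // (ceiling division; assumes blockSize >= 0).
    static long chunksNeeded(long blockSize) {
        return (blockSize + MAX_BUFFER_BYTES - 1) / MAX_BUFFER_BYTES;
    }

    // Asks the JDK to map `size` bytes of an empty temp file and reports
    // whether the request was rejected with IllegalArgumentException.
    static boolean mapRejects(long size) {
        try {
            Path tmp = Files.createTempFile("block", ".bin");
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
                return false;
            } catch (IllegalArgumentException e) {
                // The size parameter is a long, but values above
                // Integer.MAX_VALUE fail the precondition check.
                return true;
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long fiveGb = 5L * 1024 * 1024 * 1024;
        System.out.println("5GB fits in one buffer: " + fitsInOneBuffer(fiveGb));
        System.out.println("chunks needed for 5GB: " + chunksNeeded(fiveGb));
        System.out.println("map of 2^31 bytes rejected: "
                + mapRejects(Integer.MAX_VALUE + 1L));
    }
}
```

Splitting a large block into several buffers, as chunksNeeded hints at, is
just one possible direction and not necessarily the approach a design doc for
this issue would settle on.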



--
This message was sent by Atlassian JIRA
(v6.2#6252)
