[ https://issues.apache.org/jira/browse/SPARK-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967911#comment-13967911 ]

Patrick Wendell commented on SPARK-1476:
----------------------------------------

[~mrid...@yahoo-inc.com] I think the proposed change would benefit from a 
design doc to explain exactly the cases we want to fix and what trade-offs we 
are willing to make in terms of complexity.

Agreed that there is definitely room for improvement in the out-of-the-box 
behavior here.

Right now the limits, as I understand them, are: (a) the shuffle output from one 
mapper to one reducer cannot be more than 2GB, and (b) a single partition of an 
RDD cannot exceed 2GB.

I see (a) as the bigger of the two issues. It would be helpful to have specific 
examples of workloads where this causes a problem, the associated data sizes, 
etc. For instance, say I want to do a 1 terabyte shuffle. Right now the number 
of (mappers * reducers) needs to be > ~1000 for this to work (e.g. 100 mappers 
and 10 reducers), assuming uniform partitioning. That doesn't seem like too 
crazy an assumption, but if you have skew this becomes a much bigger problem.
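
To make that arithmetic concrete, here is a rough back-of-the-envelope sketch 
(purely illustrative; the 2GB cap and the uniform-partitioning assumption are 
the ones discussed above):

{code:scala}
object ShuffleBlockSizing {
  // Hard cap on a single block, since blocks are backed by Int-indexed ByteBuffers.
  val MaxBlockBytes: Long = Int.MaxValue.toLong  // ~2GB

  // Average size of one mapper -> reducer output block, assuming uniform partitioning.
  def avgBlockBytes(totalShuffleBytes: Long, mappers: Int, reducers: Int): Long =
    totalShuffleBytes / (mappers.toLong * reducers)

  def main(args: Array[String]): Unit = {
    val oneTerabyte = 1L << 40
    // 100 mappers x 10 reducers = 1000 blocks of ~1GB each: fits under the cap.
    println(avgBlockBytes(oneTerabyte, 100, 10) <= MaxBlockBytes)  // true
    // 20 mappers x 10 reducers = 200 blocks of ~5GB each: blows past it.
    println(avgBlockBytes(oneTerabyte, 20, 10) <= MaxBlockBytes)   // false
  }
}
{code}

With skew the average is misleading, of course; a single hot reducer can push one 
block over the cap even when the totals look safe.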

Popping up a bit - I think our goal should be to handle reasonable workloads 
and not to be 100% compliant with the semantics of Hadoop MapReduce. After all, 
in-memory RDDs are not even a concept in MapReduce. And keep in mind that 
MapReduce became such a bloated/complex project that it is no longer possible 
to make substantial changes to it today. That's something we definitely want to 
avoid by keeping Spark's internals as simple as possible.

> 2GB limit in spark for blocks
> -----------------------------
>
>                 Key: SPARK-1476
>                 URL: https://issues.apache.org/jira/browse/SPARK-1476
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>         Environment: all
>            Reporter: Mridul Muralidharan
>            Priority: Critical
>             Fix For: 1.1.0
>
>
> The underlying abstraction for blocks in Spark is a ByteBuffer, which limits 
> the size of a block to 2GB.
> This has implications not just for managed blocks in use, but also for shuffle 
> blocks (memory-mapped blocks are limited to 2GB, even though the API accepts a 
> long), ser/deser via byte-array-backed output streams (SPARK-1391), etc.
> This is a severe limitation for using Spark on non-trivial datasets.
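
For reference, a minimal Scala sketch of where that ceiling comes from in the 
underlying java.nio APIs: ByteBuffer capacities are Ints, and FileChannel.map() 
rejects regions larger than Integer.MAX_VALUE even though its size parameter is 
a long. The file path below is just a placeholder.

{code:scala}
import java.io.RandomAccessFile
import java.nio.ByteBuffer
import java.nio.channels.FileChannel

object TwoGBLimitDemo {
  def main(args: Array[String]): Unit = {
    // ByteBuffer capacities are Ints, so a block larger than 2GB cannot even be expressed:
    // ByteBuffer.allocate(Int.MaxValue.toLong + 1)  // does not compile: allocate() takes an Int
    println(ByteBuffer.allocate(16).capacity())      // small blocks are fine

    // FileChannel.map() takes a long size but rejects anything over Integer.MAX_VALUE,
    // so memory-mapped shuffle blocks hit the same wall at runtime.
    val channel = new RandomAccessFile("/tmp/example-block", "rw").getChannel
    try {
      channel.map(FileChannel.MapMode.READ_WRITE, 0L, Int.MaxValue.toLong + 1)
    } catch {
      case e: IllegalArgumentException =>
        println("map() refused a >2GB region: " + e.getMessage)
    } finally {
      channel.close()
    }
  }
}
{code}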



--
This message was sent by Atlassian JIRA
(v6.2#6252)
