[ https://issues.apache.org/jira/browse/SPARK-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968147#comment-13968147 ]

Mridul Muralidharan commented on SPARK-1476:
--------------------------------------------

[~pwendell] IMO both are limitations (shuffle block < 2G vs any RDD block < 
2G), though the former is slightly trickier to work around at times (when 
there is skew, etc.).
Note that trying to work around the latter means more partitions, which also 
has an impact on functionality and shuffle performance (enforced ulimit 
limits will mean the job fails again).
Ideally though, I would prefer that we did not need to work around these 
issues at all.
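
To make the workaround concrete, here is a minimal sketch of bumping the 
partition count so that each cached block stays under the 2G ByteBuffer 
limit - the input path and partition count are made up for illustration, and 
the comments note where the extra shuffle files come from:

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

object RepartitionWorkaround {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("repartition-sketch"))

    // Hypothetical input; assume the default partitioning would yield
    // partitions well over 2GB once cached.
    val data = sc.textFile("hdfs:///some/large/input")

    // Raise the partition count so each block stays below the int-indexed
    // ByteBuffer limit. The cost: every extra partition means extra shuffle
    // files, which is exactly where the ulimit failures show up.
    val safe = data.repartition(4096)
    safe.cache()
    println(safe.count())

    sc.stop()
  }
}
{code}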

This is one of two JIRAs we want to work on for the next release - the other 
being improving shuffle performance, particularly in the context of the 
number of files that are created and/or opened.
Since that is more involved work, [~tgraves] and I will file it later.


Agree about the need for a design document - unfortunately, we are flying 
blind right now since we don't know everything that will need to change (the 
Tachyon integration I mentioned above was something we hit recently). The 
basic idea is known, but we are not sure what else this will impact.
The current plan is to do a POC, iron out the issues seen, and run it on one 
of the jobs that is failing right now: address the performance and/or 
functionality issues seen, and then submit a PR based on the conclusions.
I filed this JIRA to get basic input from other developers and/or users, and 
to use it as a sounding board in case we hit particularly thorny issues we 
can't resolve due to lack of sufficient context - crowdsourcing 
design/implementation issues :-) 

[~matei] Interesting that you should mention splitting the output of a map 
into multiple blocks.
We are actually thinking about that in a different context - akin to 
MultipleOutputs in Hadoop or SPLIT in Pig: without needing to cache the 
intermediate output, we would directly emit values to different blocks/RDDs 
based on the output of a map or some such.
In the context of splitting the output into multiple blocks, I was not so 
sure - particularly given the need for compression, custom serialization, etc.
Also, we have degenerate cases where the value for a single key actually 
becomes > 2G (lists of sparse vectors which become denser as iterations 
increase, etc.).
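
For those degenerate > 2G values, one direction we have been toying with - 
purely a sketch, not a worked-out proposal; the chunk size and names are 
invented here - is to back a block by a sequence of int-sized ByteBuffers 
instead of a single one:

{code:scala}
import java.io.InputStream
import java.nio.ByteBuffer
import scala.collection.mutable.ArrayBuffer

object ChunkedBuffers {
  // Read an arbitrarily large stream into a sequence of ByteBuffers, each at
  // most chunkSize bytes, so the total payload is not bounded by a single
  // buffer's int-indexed capacity.
  def readChunked(in: InputStream,
                  chunkSize: Int = 64 * 1024 * 1024): Seq[ByteBuffer] = {
    val chunks = ArrayBuffer.empty[ByteBuffer]
    val tmp = new Array[Byte](8192)
    var current = ByteBuffer.allocate(chunkSize)
    var n = in.read(tmp)
    while (n != -1) {
      var off = 0
      while (off < n) {
        // Copy as much as fits into the current chunk, then roll over.
        val toCopy = math.min(n - off, current.remaining())
        current.put(tmp, off, toCopy)
        off += toCopy
        if (!current.hasRemaining) {
          current.flip()
          chunks += current
          current = ByteBuffer.allocate(chunkSize)
        }
      }
      n = in.read(tmp)
    }
    if (current.position() > 0) {
      current.flip()
      chunks += current
    }
    chunks.toSeq
  }
}
{code}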

> 2GB limit in spark for blocks
> -----------------------------
>
>                 Key: SPARK-1476
>                 URL: https://issues.apache.org/jira/browse/SPARK-1476
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>         Environment: all
>            Reporter: Mridul Muralidharan
>            Assignee: Mridul Muralidharan
>            Priority: Critical
>             Fix For: 1.1.0
>
>
> The underlying abstraction for blocks in Spark is a ByteBuffer, which limits 
> the size of a block to 2GB.
> This has implications not just for managed blocks in use, but also for 
> shuffle blocks (memory-mapped blocks are limited to 2GB, even though the API 
> allows for a long size), ser/deser via byte-array-backed output streams 
> (SPARK-1391), etc.
> This is a severe limitation for the use of Spark on non-trivial datasets.
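
For anyone who wants to see the limit from the issue description directly: a 
minimal standalone sketch (the file path is invented, and the file is assumed 
to be larger than 2GB) showing that FileChannel.map rejects regions past 
Integer.MAX_VALUE even though its size parameter is a long:

{code:scala}
import java.io.RandomAccessFile
import java.nio.channels.FileChannel

object MmapLimit {
  def main(args: Array[String]): Unit = {
    // Hypothetical file, assumed to be larger than 2GB.
    val raf = new RandomAccessFile("/tmp/big-block.bin", "r")
    val channel = raf.getChannel
    val size = channel.size() // a long - the API signature is not the limit
    try {
      // Throws IllegalArgumentException ("Size exceeds Integer.MAX_VALUE")
      // for anything past 2GB - 1 bytes, since the resulting
      // MappedByteBuffer is int-indexed.
      val buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, size)
      println(s"Mapped ${buf.capacity()} bytes")
    } catch {
      case e: IllegalArgumentException =>
        println(s"Cannot map $size bytes into one ByteBuffer: ${e.getMessage}")
    } finally {
      channel.close()
      raf.close()
    }
  }
}
{code}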


