[ https://issues.apache.org/jira/browse/SPARK-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967723#comment-13967723 ]

Matei Zaharia commented on SPARK-1476:
--------------------------------------

I agree; it would be good to understand what kinds of operations this arises in. Do 
you have cached RDD partitions that are this large, or is it shuffle blocks? Is it 
skew in the shuffle data?

The main concern I see with this is that it would complicate the deserializer 
and block sender code paths, but maybe it's worth it.

> 2GB limit in spark for blocks
> -----------------------------
>
>                 Key: SPARK-1476
>                 URL: https://issues.apache.org/jira/browse/SPARK-1476
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>         Environment: all
>            Reporter: Mridul Muralidharan
>            Priority: Critical
>             Fix For: 1.1.0
>
>
> The underlying abstraction for blocks in Spark is a ByteBuffer, which limits 
> the size of a block to 2 GB.
> This has implications not just for managed blocks in use, but also for shuffle 
> blocks (memory-mapped blocks are limited to 2 GB, even though the API accepts 
> a long), ser/deser via byte-array-backed output streams (SPARK-1391), etc.
> This is a severe limitation when Spark is used on non-trivial datasets.
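
(For readers unfamiliar with where the ceiling comes from, a minimal non-Spark
sketch of the underlying JDK constraint: ByteBuffer capacities and positions are
Ints, and FileChannel.map declares a long size but rejects anything above
Integer.MAX_VALUE, so both the in-memory and memory-mapped paths top out around
2 GB. The file path and object name below are placeholders for illustration.)

    import java.io.RandomAccessFile
    import java.nio.channels.FileChannel

    object TwoGbLimitSketch {
      def main(args: Array[String]): Unit = {
        // In-memory path: ByteBuffer.allocate(capacity: Int) takes an Int,
        // so Int.MaxValue bytes (~2 GB) is the largest block that can even
        // be requested through that API.

        // Memory-mapped path: FileChannel.map takes `size: Long`, but its
        // precondition check throws IllegalArgumentException for sizes
        // above Integer.MAX_VALUE, before the file is even touched.
        val file = new RandomAccessFile("/tmp/placeholder-block", "r")
        try {
          val threeGb = 3L * 1024 * 1024 * 1024
          file.getChannel.map(FileChannel.MapMode.READ_ONLY, 0, threeGb)
        } catch {
          case e: IllegalArgumentException =>
            // Typically "Size exceeds Integer.MAX_VALUE"
            println(s"map() refused: ${e.getMessage}")
        } finally {
          file.close()
        }
      }
    }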



--
This message was sent by Atlassian JIRA
(v6.2#6252)