[ https://issues.apache.org/jira/browse/SPARK-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164645#comment-14164645 ]

Andrew Ash commented on SPARK-3135:
-----------------------------------

[~pwendell] how do you feel about backporting this commit (db16067) to 
branch-1.1?

I'm seeing the exception below on a 1.1.0 cluster and think this commit might 
fix it.  The failing call broadcasts a collection of ~100k objects, estimated 
to take ~1.5GB in memory.  Xmx is 20G on the driver and 28G on the executors, 
so plain heap exhaustion seems unlikely.  I think the 2GB limit on broadcast 
object size that [~rxin] mentions in the PR summary might be the cause.

{noformat}
INFO  | jvm 2    | 2014/10/08 22:09:56 | java.lang.OutOfMemoryError: Requested array size exceeds VM limit
INFO  | jvm 2    | 2014/10/08 22:09:56 |      at java.util.Arrays.copyOf(Arrays.java:2271)
INFO  | jvm 2    | 2014/10/08 22:09:56 |      at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
INFO  | jvm 2    | 2014/10/08 22:09:56 |      at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
INFO  | jvm 2    | 2014/10/08 22:09:56 |      at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
INFO  | jvm 2    | 2014/10/08 22:09:56 |      at org.xerial.snappy.SnappyOutputStream.dump(SnappyOutputStream.java:297)
INFO  | jvm 2    | 2014/10/08 22:09:56 |      at org.xerial.snappy.SnappyOutputStream.rawWrite(SnappyOutputStream.java:244)
INFO  | jvm 2    | 2014/10/08 22:09:56 |      at org.xerial.snappy.SnappyOutputStream.write(SnappyOutputStream.java:99)
INFO  | jvm 2    | 2014/10/08 22:09:56 |      at com.esotericsoftware.kryo.io.Output.flush(Output.java:155)
INFO  | jvm 2    | 2014/10/08 22:09:56 |      at com.esotericsoftware.kryo.io.Output.require(Output.java:135)
INFO  | jvm 2    | 2014/10/08 22:09:56 |      at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:220)
INFO  | jvm 2    | 2014/10/08 22:09:56 |      at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:206)
INFO  | jvm 2    | 2014/10/08 22:09:56 |      at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.write(DefaultArraySerializers.java:29)
INFO  | jvm 2    | 2014/10/08 22:09:56 |      at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.write(DefaultArraySerializers.java:18)
INFO  | jvm 2    | 2014/10/08 22:09:56 |      at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
INFO  | jvm 2    | 2014/10/08 22:09:56 |      at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:75)
INFO  | jvm 2    | 2014/10/08 22:09:56 |      at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:18)
INFO  | jvm 2    | 2014/10/08 22:09:56 |      at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
INFO  | jvm 2    | 2014/10/08 22:09:56 |      at com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:86)
INFO  | jvm 2    | 2014/10/08 22:09:56 |      at com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:17)
INFO  | jvm 2    | 2014/10/08 22:09:56 |      at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
INFO  | jvm 2    | 2014/10/08 22:09:56 |      at org.apache.spark.serializer.KryoSerializationStream.writeObject(KryoSerializer.scala:119)
INFO  | jvm 2    | 2014/10/08 22:09:56 |      at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:210)
INFO  | jvm 2    | 2014/10/08 22:09:56 |      at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:83)
INFO  | jvm 2    | 2014/10/08 22:09:56 |      at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:68)
INFO  | jvm 2    | 2014/10/08 22:09:56 |      at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36)
INFO  | jvm 2    | 2014/10/08 22:09:56 |      at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
INFO  | jvm 2    | 2014/10/08 22:09:56 |      at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
INFO  | jvm 2    | 2014/10/08 22:09:56 |      at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809)
INFO  | jvm 2    | 2014/10/08 22:09:56 |      at org.apache.spark.api.java.JavaSparkContext.broadcast(JavaSparkContext.scala:530)
{noformat}
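
One possible reading of the trace above, offered as a sketch rather than a diagnosis: if the in-memory buffer grows by doubling its backing array (an assumption about the JDK in use; the trace only shows that grow() calls Arrays.copyOf), then any buffer that has passed 1GiB will next request an array at or near Integer.MAX_VALUE elements, which a JVM may refuse even when plenty of heap is free.  That would let a payload well under 2GB fail.  The class and method names below are hypothetical and only illustrate the arithmetic.

```java
/** Hypothetical illustration of a doubling growth policy; not the JDK's code. */
final class GrowthSketch {
    /**
     * Capacity a doubling buffer would request next, clamped to the int range.
     * Uses long arithmetic so the doubling itself cannot overflow.
     */
    static int nextCapacity(int oldCapacity, int minCapacity) {
        long doubled = 2L * oldCapacity;
        long wanted = Math.max(doubled, (long) minCapacity);
        return (int) Math.min(wanted, Integer.MAX_VALUE);
    }
}
```

Under this assumption, a buffer of exactly 1GiB that needs one more byte asks for Integer.MAX_VALUE bytes, not 1GiB + 1.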

> Avoid memory copy in TorrentBroadcast serialization
> ---------------------------------------------------
>
>                 Key: SPARK-3135
>                 URL: https://issues.apache.org/jira/browse/SPARK-3135
>             Project: Spark
>          Issue Type: Sub-task
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>              Labels: starter
>             Fix For: 1.2.0
>
>
> TorrentBroadcast.blockifyObject uses a ByteArrayOutputStream to serialize 
> the broadcast object into a single giant byte array, and then splits it into 
> smaller chunks.  We should implement a new OutputStream that writes 
> serialized bytes directly into chunks of byte arrays so we don't need the 
> extra memory copy.
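
The idea in the description can be sketched roughly as follows.  This is a minimal illustration, not the code from commit db16067: the class name ChunkedOutputStream, the toChunks() method, and the fixed chunk size are all assumptions made for the example.

```java
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: fills fixed-size chunks instead of one giant array. */
class ChunkedOutputStream extends OutputStream {
    private final int chunkSize;
    private final List<byte[]> chunks = new ArrayList<byte[]>();
    private int posInLast; // bytes used in the last chunk

    ChunkedOutputStream(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
        this.posInLast = chunkSize; // forces allocation on the first write
    }

    // Allocate a fresh chunk when the current one is full.
    private void ensureRoom() {
        if (posInLast == chunkSize) {
            chunks.add(new byte[chunkSize]);
            posInLast = 0;
        }
    }

    @Override
    public void write(int b) {
        ensureRoom();
        chunks.get(chunks.size() - 1)[posInLast++] = (byte) b;
    }

    @Override
    public void write(byte[] src, int off, int len) {
        if (off < 0 || len < 0 || len > src.length - off) {
            throw new IndexOutOfBoundsException();
        }
        while (len > 0) {
            ensureRoom();
            int n = Math.min(len, chunkSize - posInLast);
            System.arraycopy(src, off, chunks.get(chunks.size() - 1), posInLast, n);
            posInLast += n;
            off += n;
            len -= n;
        }
    }

    /** Returns the chunks; the last one is trimmed to the bytes actually written. */
    List<byte[]> toChunks() {
        List<byte[]> out = new ArrayList<byte[]>(chunks);
        if (!out.isEmpty() && posInLast < chunkSize) {
            int last = out.size() - 1;
            out.set(last, Arrays.copyOf(out.get(last), posInLast));
        }
        return out;
    }
}
```

A serializer writing into a stream like this never asks for a single array larger than chunkSize, so the write path avoids both the extra copy and the per-array size limit.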



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

