[ https://issues.apache.org/jira/browse/SPARK-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164645#comment-14164645 ]
Andrew Ash commented on SPARK-3135: ----------------------------------- [~pwendell] how do you feel about backporting this commit (db16067) to branch-1.1? I'm seeing the below exception on a 1.1.0 cluster and think this commit might fix it. It's a broadcast of a collection of ~100k objects, estimated to take ~1.5GB in memory. Xmx on the driver is 20G, and Xmx on the executors is 28G. I think the 2GB limit on broadcast object size that [~rxin] mentions in the PR summary might be the cause. {noformat} INFO | jvm 2 | 2014/10/08 22:09:56 | java.lang.OutOfMemoryError: Requested array size exceeds VM limit INFO | jvm 2 | 2014/10/08 22:09:56 | at java.util.Arrays.copyOf(Arrays.java:2271) INFO | jvm 2 | 2014/10/08 22:09:56 | at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113) INFO | jvm 2 | 2014/10/08 22:09:56 | at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93) INFO | jvm 2 | 2014/10/08 22:09:56 | at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140) INFO | jvm 2 | 2014/10/08 22:09:56 | at org.xerial.snappy.SnappyOutputStream.dump(SnappyOutputStream.java:297) INFO | jvm 2 | 2014/10/08 22:09:56 | at org.xerial.snappy.SnappyOutputStream.rawWrite(SnappyOutputStream.java:244) INFO | jvm 2 | 2014/10/08 22:09:56 | at org.xerial.snappy.SnappyOutputStream.write(SnappyOutputStream.java:99) INFO | jvm 2 | 2014/10/08 22:09:56 | at com.esotericsoftware.kryo.io.Output.flush(Output.java:155) INFO | jvm 2 | 2014/10/08 22:09:56 | at com.esotericsoftware.kryo.io.Output.require(Output.java:135) INFO | jvm 2 | 2014/10/08 22:09:56 | at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:220) INFO | jvm 2 | 2014/10/08 22:09:56 | at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:206) INFO | jvm 2 | 2014/10/08 22:09:56 | at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.write(DefaultArraySerializers.java:29) INFO | jvm 2 | 2014/10/08 22:09:56 | at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.write(DefaultArraySerializers.java:18) INFO | jvm 2 | 2014/10/08 22:09:56 | at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568) INFO | jvm 2 | 2014/10/08 22:09:56 | at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:75) INFO | jvm 2 | 2014/10/08 22:09:56 | at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:18) INFO | jvm 2 | 2014/10/08 22:09:56 | at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568) INFO | jvm 2 | 2014/10/08 22:09:56 | at com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:86) INFO | jvm 2 | 2014/10/08 22:09:56 | at com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:17) INFO | jvm 2 | 2014/10/08 22:09:56 | at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568) INFO | jvm 2 | 2014/10/08 22:09:56 | at org.apache.spark.serializer.KryoSerializationStream.writeObject(KryoSerializer.scala:119) INFO | jvm 2 | 2014/10/08 22:09:56 | at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:210) INFO | jvm 2 | 2014/10/08 22:09:56 | at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:83) INFO | jvm 2 | 2014/10/08 22:09:56 | at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:68) INFO | jvm 2 | 2014/10/08 22:09:56 | at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36) INFO | jvm 2 | 2014/10/08 22:09:56 | at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29) INFO | jvm 2 | 2014/10/08 22:09:56 | at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62) INFO | jvm 2 | 2014/10/08 22:09:56 | at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809) INFO | jvm 2 | 2014/10/08 22:09:56 | at org.apache.spark.api.java.JavaSparkContext.broadcast(JavaSparkContext.scala:530) {noformat} > Avoid memory copy in TorrentBroadcast serialization > --------------------------------------------------- > > Key: SPARK-3135 > URL: https://issues.apache.org/jira/browse/SPARK-3135 > Project: Spark > Issue Type: Sub-task > Reporter: Reynold Xin > Assignee: Reynold Xin > Labels: starter > Fix For: 1.2.0 > > > TorrentBroadcast.blockifyObject uses a ByteArrayOutputStream to serialize > broadcast object into a single giant byte array, and then separates it into > smaller chunks. We should implement a new OutputStream that writes > serialized bytes directly into chunks of byte arrays so we don't need the > extra memory copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org