This might be caused by a few large Map objects that Spark is trying to serialize as part of the task closure. They are not broadcast variables or anything like that; they're just regular objects referenced inside the map operation.
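Looking at the trace again: the OutOfMemoryError comes from ByteArrayOutputStream.hugeCapacity, which is thrown when the serialization buffer would grow past the maximum Java array size (about 2 GB), so this can happen even with ~100G of heap free. For concreteness, the pattern I mean looks roughly like this (a simplified sketch with hypothetical names and input path, not the actual abc.scala code):

  import org.apache.spark.{SparkConf, SparkContext}

  object ClosureSketch {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(
        new SparkConf().setAppName("closure-sketch").setMaster("local[30]"))

      // Hypothetical stand-ins for the large lookup tables; in the real
      // job these hold a very large number of entries.
      val bigMapA: Map[String, Int] = Map("k1" -> 1)
      val bigMapB: Map[String, Int] = Map("k2" -> 2)

      val records = sc.textFile("/path/to/input") // hypothetical input path

      // The closure below references bigMapA and bigMapB directly, so
      // ClosureCleaner java-serializes both maps into one in-memory byte
      // array before the stage starts -- the ByteArrayOutputStream that
      // overflows in the stack trace.
      val enriched = records.map { rec =>
        (rec, bigMapA.getOrElse(rec, 0), bigMapB.getOrElse(rec, 0))
      }
      enriched.count()
      sc.stop()
    }
  }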
Would it help if I further indexed these maps into a two-level Map, i.e. Map[String, Map[String, Int]]? Or would the total serialized size still count against me? What if I manually split them up into numerous smaller Map variables? (A sketch of what the broadcast alternative would look like is at the end of this message.)

On Mon, Aug 15, 2016 at 2:12 PM, Arun Luthra <arun.lut...@gmail.com> wrote:

> I got this OOM error in Spark local mode. The error seems to have occurred
> at the start of a stage (all of the stages on the UI showed as complete;
> there were more stages to do, but they had not shown up on the UI yet).
>
> There appears to be ~100G of free memory at the time of the error.
>
> Spark 2.0.0
> 200G driver memory
> local[30]
> 8 /mntX/tmp directories for spark.local.dir
> "spark.sql.shuffle.partitions", "500"
> "spark.driver.maxResultSize", "500"
> "spark.default.parallelism", "1000"
>
> The line number for the error points at an RDD map operation where there
> are some potentially large Map objects that are going to be accessed by
> each record. Does it matter whether they are broadcast variables or not?
> I imagine not, because in local mode they should be available in memory
> to every executor/core.
>
> Possibly related:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-ClosureCleaner-or-java-serializer-OOM-when-trying-to-grow-td24796.html
>
> Exception in thread "main" java.lang.OutOfMemoryError
>   at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
>   at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
>   at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>   at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
>   at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1877)
>   at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1786)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1189)
>   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
>   at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
>   at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
>   at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
>   at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
>   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
>   at org.apache.spark.SparkContext.clean(SparkContext.scala:2037)
>   at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:366)
>   at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:365)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
>   at org.apache.spark.rdd.RDD.map(RDD.scala:365)
>   at abc.Abc$.main(abc.scala:395)
>   at abc.Abc.main(abc.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
>   at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
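For reference, here is a minimal sketch of the broadcast alternative asked about above (again with hypothetical names and input path, not the actual abc.scala code):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.broadcast.Broadcast

  object BroadcastSketch {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(
        new SparkConf().setAppName("broadcast-sketch").setMaster("local[30]"))

      // Hypothetical stand-in for the large lookup table.
      val bigMapA: Map[String, Int] = Map("k1" -> 1)

      // Broadcast the map once. Each task closure then captures only the
      // small Broadcast handle, so the java-serialized closure stays tiny
      // instead of carrying the whole map.
      val bigMapABc: Broadcast[Map[String, Int]] = sc.broadcast(bigMapA)

      val records = sc.textFile("/path/to/input") // hypothetical input path
      val enriched = records.map(rec => (rec, bigMapABc.value.getOrElse(rec, 0)))
      enriched.count()
      sc.stop()
    }
  }

If I understand the mechanics, even in local mode the difference is that the map is shipped through the broadcast machinery instead of being written into one giant closure byte array, which is exactly the buffer that overflowed above.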