Re: java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit exceeded)

2014-07-08 Thread Aaron Davidson

There is a difference between actual GC overhead, which can be reduced by
reusing objects, and this error, which actually means you ran out of
memory. This error can probably be relieved by increasing your executor
heap size, unless your data is corrupt and it is allocating huge arrays, or
you are otherwise keeping too much memory around.

For your other question: you can reuse objects as in MapReduce
(HadoopRDD does this by actually using Hadoop's Writables, for instance),
but the general Spark APIs don't support this because mutable objects are
not friendly to caching or serializing.
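To make the caching concern concrete, here is a minimal plain-Java sketch with no Spark involved. `MutableRecord`, `reusingIterator`, `bufferReferences` and `bufferCopies` are hypothetical names used only for illustration. The iterator hands out the same mutable object for every element, roughly the way HadoopRDD hands out reused Writables; keeping those references around, as an in-memory cache would, leaves every slot pointing at the last value, while copying each element first keeps the data intact at the cost of one allocation per record again.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ReuseHazard {

    // Hypothetical mutable record, similar in spirit to a Hadoop Writable.
    static final class MutableRecord {
        int value;

        MutableRecord copy() {
            MutableRecord c = new MutableRecord();
            c.value = value;
            return c;
        }
    }

    // Yields n elements but reuses a single MutableRecord instance for all of them.
    static Iterator<MutableRecord> reusingIterator(final int n) {
        final MutableRecord shared = new MutableRecord();
        return new Iterator<MutableRecord>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < n;
            }

            @Override
            public MutableRecord next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.value = i++; // overwrite the shared instance in place
                return shared;
            }
        };
    }

    // Keeps references as a cache would: every slot aliases the same object.
    static List<Integer> bufferReferences(int n) {
        List<MutableRecord> buffer = new ArrayList<>();
        Iterator<MutableRecord> it = reusingIterator(n);
        while (it.hasNext()) {
            buffer.add(it.next());
        }
        return values(buffer);
    }

    // Copies each element before keeping it: correct, but allocates per record.
    static List<Integer> bufferCopies(int n) {
        List<MutableRecord> buffer = new ArrayList<>();
        Iterator<MutableRecord> it = reusingIterator(n);
        while (it.hasNext()) {
            buffer.add(it.next().copy());
        }
        return values(buffer);
    }

    private static List<Integer> values(List<MutableRecord> records) {
        List<Integer> out = new ArrayList<>();
        for (MutableRecord r : records) {
            out.add(r.value);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(bufferReferences(3)); // [2, 2, 2]
        System.out.println(bufferCopies(3));     // [0, 1, 2]
    }
}
```

Reuse is only safe when each element is fully consumed before the next one is produced, which is exactly what caching or buffering breaks.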


On Tue, Jul 8, 2014 at 9:27 AM, Konstantin Kudryavtsev 
kudryavtsev.konstan...@gmail.com wrote:

 Hi all,

 I ran into the following exception during a map step:
 java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit
 exceeded)
 java.lang.reflect.Array.newInstance(Array.java:70)
 com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:325)
 com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
 com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
 com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
 com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
 com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
 com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
 com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
 com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
 com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
 com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
 com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
 com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
 com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:43)
 com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:34)
 com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
 org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:115)
 org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:125)
 org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
 scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
 org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
 org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
 org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:155)
 org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:154)
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
 I'm using Spark 1.0. In map I create a new object each time. As I
 understand it, I can't reuse objects the way I would in MapReduce
 development? Could you point me to how to avoid the GC overhead? Thank you
 in advance.

 Thank you,
 Konstantin Kudryavtsev

Re: java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit exceeded)

2014-07-08 Thread Jerry Lam

Hi Konstantin,

I just ran into the same problem. I mitigated the issue by reducing the
number of cores when I executed the job; otherwise it would not
finish.

Contrary to what many people believe, it does not necessarily mean that you
ran out of memory. A better answer can be found at
http://stackoverflow.com/questions/4371505/gc-overhead-limit-exceeded and is
copied here for reference:

Excessive GC Time and OutOfMemoryError

The concurrent collector will throw an OutOfMemoryError if too much time is
being spent in garbage collection: if more than 98% of the total time is
spent in garbage collection and less than 2% of the heap is recovered, an
OutOfMemoryError will be thrown. This feature is designed to prevent
applications from running for an extended period of time while making
little or no progress because the heap is too small. If necessary, this
feature can be disabled by adding the option -XX:-UseGCOverheadLimit to the
command line.

The policy is the same as that in the parallel collector, except that time
spent performing concurrent collections is not counted toward the 98% time
limit. In other words, only collections performed while the application is
stopped count toward excessive GC time. Such collections are typically due
to a concurrent mode failure or an explicit collection request (e.g., a
call to System.gc()).
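The quoted policy can be written down as a small decision function. This is only a model of the documented rule, not HotSpot's actual implementation; `GcOverheadRule`, `exceedsLimit` and the sample numbers are hypothetical. Following the note about the concurrent collector, only stop-the-world pause time counts toward the 98% threshold, and concurrent collection time is ignored.

```java
public class GcOverheadRule {

    // Thresholds taken from the documentation quoted above.
    static final double TIME_LIMIT = 0.98;      // fraction of elapsed time spent in GC
    static final double HEAP_FREE_LIMIT = 0.02; // fraction of the heap recovered

    /**
     * Models the documented policy for the concurrent collector.
     *
     * @param pauseMillis      time spent in stop-the-world collections
     * @param concurrentMillis time spent in concurrent collections (not counted)
     * @param elapsedMillis    total elapsed time of the observation window
     * @param heapRecovered    fraction of the heap freed by those collections
     */
    static boolean exceedsLimit(long pauseMillis, long concurrentMillis,
                                long elapsedMillis, double heapRecovered) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        // concurrentMillis is deliberately ignored, per the quoted policy.
        double gcTimeFraction = (double) pauseMillis / elapsedMillis;
        return gcTimeFraction > TIME_LIMIT && heapRecovered < HEAP_FREE_LIMIT;
    }

    public static void main(String[] args) {
        // 99% of the time in pauses, 1% of the heap recovered: the error is thrown.
        System.out.println(exceedsLimit(9_900, 0, 10_000, 0.01));   // true
        // Same pause share but 10% of the heap recovered: no error.
        System.out.println(exceedsLimit(9_900, 0, 10_000, 0.10));   // false
        // Mostly concurrent collection time: it does not count toward the limit.
        System.out.println(exceedsLimit(100, 9_800, 10_000, 0.01)); // false
    }
}
```

Both conditions must hold at once, which is why the error usually signals a heap that is nearly full of live data rather than merely a busy collector.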

It could be that there are many tasks running on the same node and they all
compete for CPU to run their GCs, which slows things down and triggers the
error you saw. By reducing the number of cores, more CPU resources are
available to each task, so the GC can finish before the error is thrown.
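Before changing the core count, it can help to measure how much of a JVM's uptime actually went to collection. A minimal sketch using the standard `java.lang.management` API; `GcTimeShare` and `gcTimeFraction` are hypothetical names. `getCollectionTime()` reports an approximate accumulated time per collector and may not separate stop-the-world pauses from concurrent work, so treat the result as a rough indicator, not the exact quantity the JVM checks.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTimeShare {

    // Approximate fraction of JVM uptime spent in garbage collection,
    // summed over all collectors and clamped to 1.0.
    static double gcTimeFraction() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if undefined for this collector
            if (t > 0) {
                gcMillis += t;
            }
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return Math.min(1.0, (double) gcMillis / uptimeMillis);
    }

    public static void main(String[] args) {
        System.out.printf("GC time share since JVM start: %.1f%%%n", 100 * gcTimeFraction());
    }
}
```

Sampling the totals periodically and differencing them gives the share over a recent window rather than since startup.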

HTH,

Jerry


On Tue, Jul 8, 2014 at 1:35 PM, Aaron Davidson ilike...@gmail.com wrote:

 There is a difference from actual GC overhead, which can be reduced by
 reusing objects, versus this error, which actually means you ran out of
 memory. This error can probably be relieved by increasing your executor
 heap size, unless your data is corrupt and it is allocating huge arrays, or
 you are otherwise keeping too much memory around.

 For your other question, you can reuse objects similar to MapReduce
 (HadoopRDD does this by actually using Hadoop's Writables, for instance),
 but the general Spark APIs don't support this because mutable objects are
 not friendly to caching or serializing.


 On Tue, Jul 8, 2014 at 9:27 AM, Konstantin Kudryavtsev 
 kudryavtsev.konstan...@gmail.com wrote:

 Hi all,

 I faced with the next exception during map step:
 java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit
 exceeded)
 java.lang.reflect.Array.newInstance(Array.java:70)
 com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:325)
 com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
 com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
 com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
 com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
 com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
 com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
 com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
 com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
 com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
 com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
 com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
 com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
 com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:43)
 com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:34)
 com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
 org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:115)
 org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:125)
 org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
 scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
 org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
 org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
 org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:155)

Re: java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit exceeded)

2014-07-08 Thread Aaron Davidson

This seems almost equivalent to a heap size error: since GCs are
stop-the-world events, the fact that we were unable to release more than 2%
of the heap suggests that almost all the memory is *currently in use* (i.e.,
live).
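One way to check whether the memory is genuinely live, rather than merely churning, is to look at heap occupancy right after the most recent collection. A minimal sketch using `MemoryPoolMXBean.getCollectionUsage()`, which reports a pool's usage after the JVM last collected it; `LiveHeapAfterGc` is a hypothetical name, and the result is approximate because different pools may have been collected at different times. A value that stays close to 1.0 suggests the live data does not fit the heap.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class LiveHeapAfterGc {

    // Approximate fraction of the max heap still occupied after the most
    // recent collection of each heap pool.
    static double liveFractionAfterGc() {
        long usedAfterGc = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as code cache or metaspace
            }
            MemoryUsage afterGc = pool.getCollectionUsage(); // null if unsupported
            if (afterGc != null) {
                usedAfterGc += afterGc.getUsed();
            }
        }
        long maxHeap = Runtime.getRuntime().maxMemory(); // Long.MAX_VALUE if unbounded
        return Math.min(1.0, (double) usedAfterGc / maxHeap);
    }

    public static void main(String[] args) {
        System.out.printf("Heap occupied after last GC: %.1f%%%n", 100 * liveFractionAfterGc());
    }
}
```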

Decreasing the number of cores is another solution that reduces memory
pressure, because each core requires its own set of buffers (for instance,
each Kryo serializer has a certain buffer allocated to it) and has its own
working set of data (some subset of a partition). Thus, decreasing the
number of cores in use decreases memory contention.
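A back-of-the-envelope model of why fewer cores helps: every concurrently running task needs its own serializer buffer and its own working set, and all of them share one executor heap. The sketch below splits whatever heap is not already taken by cached data among the running tasks. It is a deliberate simplification, not Spark's actual memory accounting; `PerTaskHeadroom`, `perTaskMb` and the example figures (4096 MB heap, 2400 MB cached, 2 MB buffer per task) are hypothetical.

```java
public class PerTaskHeadroom {

    /**
     * Rough per-task memory budget in MB: the heap left after cached data,
     * minus one serializer buffer per concurrent task, split evenly across tasks.
     */
    static long perTaskMb(long heapMb, long cachedMb, long bufferMbPerTask, int concurrentTasks) {
        if (concurrentTasks <= 0) {
            throw new IllegalArgumentException("concurrentTasks must be positive");
        }
        long free = heapMb - cachedMb - bufferMbPerTask * concurrentTasks;
        return Math.max(0, free) / concurrentTasks;
    }

    public static void main(String[] args) {
        // Hypothetical executor: 4096 MB heap, 2400 MB cached, 2 MB buffer per task.
        System.out.println(perTaskMb(4096, 2400, 2, 8)); // 210
        System.out.println(perTaskMb(4096, 2400, 2, 4)); // 422
    }
}
```

In this model, halving the number of concurrent tasks roughly doubles each task's working room, which is consistent with reducing cores letting the job finish.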


On Tue, Jul 8, 2014 at 10:44 AM, Jerry Lam chiling...@gmail.com wrote:

