If you give the executor 22GB, it will run with "... -Xmx22g". If the JVM
heap gets nearly full, the process will almost certainly consume more than
22GB of physical memory, because the JVM needs memory for more than just
the heap. But in this scenario YARN was only asked for 22GB, so the
container gets killed. This is exactly the problem the overhead setting
solves.
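
To make that concrete (the numbers are just illustrative):

  --executor-memory 22g, overhead 0 (hypothetically):
    YARN container request  = 22GB
    JVM heap (-Xmx)         = 22GB
    JVM off-heap usage (thread stacks, direct buffers, permgen, ...) > 0
    physical memory used    > 22GB  -> container killed by YARN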

The default is 7%, not 9%, which works out to about 1.4GB for a 20GB
executor heap, so a 2GB overhead is a bump up. It may or may not be
sufficient; I just guessed. Any JVM program with X heap can potentially use
more than X physical memory. The overhead setting tries to account for that
so that you don't have to set both values yourself. But sometimes you need
to manually increase the overhead cushion if you see YARN killing your
program for using too much physical memory. That's not the same as the JVM
running out of heap.
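
If the 2GB guess is reasonable, the submit command would look something like
this (untested; the overhead value is in megabytes):

  spark-submit --class ConnectedComponentsTest --master yarn-cluster \
    --num-executors 7 --executor-cores 1 \
    --executor-memory 20g \
    --conf spark.yarn.executor.memoryOverhead=2048 \
    target/scala-2.10/connectedcomponentstest_2.10-1.0.jar

That asks YARN for roughly 22GB per executor container: a 20GB heap plus a
2GB cushion for everything the JVM uses outside the heap.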

On Thu, Jan 15, 2015 at 5:54 PM, Nitin kak <nitinkak...@gmail.com> wrote:

> Is this "Overhead memory" allocation used for any specific purpose?
>
> For example, will it be any different if I do "--executor-memory 22G" with
> overhead set to 0% (hypothetically), vs.
> "--executor-memory 20G" with the overhead left at the default (9%), which
> eventually brings the total memory asked for by Spark to approximately 22G?
>
>
>
> On Thu, Jan 15, 2015 at 12:10 PM, Sean Owen <so...@cloudera.com> wrote:
>
>> This is a YARN setting. It just controls how much any container can
>> reserve, including Spark executors. That is not the problem.
>>
>> You need Spark to ask for more memory from YARN, on top of the memory
>> that is requested by --executor-memory. Your output indicates the default
>> of 7% is too little. For example you can ask for 20GB for executors and ask
>> for 2GB of overhead. Spark will ask for 22GB from YARN. (Of course, YARN
>> needs to be set to allow containers of at least 22GB!)
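>>
>> Concretely (the 2GB here is just a guess at a sufficient cushion):
>>
>>   container request = executor-memory + memoryOverhead = 20GB + 2GB = 22GB
>>
>> and yarn.scheduler.maximum-allocation-mb must allow a container that large.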
>>
>> On Thu, Jan 15, 2015 at 4:31 PM, Nitin kak <nitinkak...@gmail.com> wrote:
>>
>>> Thanks for sticking to this thread.
>>>
>>> I am guessing that the memory my app requests and the memory YARN
>>> requests on my behalf should be the same, and that both are determined by
>>> the value of --executor-memory, which I had set to 20G. Or can the two
>>> values be different?
>>>
>>> I checked the YARN configuration (below), so I think that fits well
>>> within the container memory limit.
>>>
>>>
>>> Container Memory Maximum
>>> yarn.scheduler.maximum-allocation-mb
>>> (default value: 64 GiB)
>>>
>>> The largest amount of physical memory, in MiB, that can be requested for
>>> a container.
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Jan 15, 2015 at 10:28 AM, Sean Owen <so...@cloudera.com> wrote:
>>>
>>>> Those settings aren't relevant, I think. You're concerned with what
>>>> your app requests, and what Spark requests of YARN on your behalf. (Of
>>>> course, you can't request more than what your cluster allows for a
>>>> YARN container for example, but that doesn't seem to be what is
>>>> happening here.)
>>>>
>>>> You do not want to omit --executor-memory if you need large executor
>>>> memory heaps, since then you just request the default and that is
>>>> evidently not enough memory for your app.
>>>>
>>>> Look at http://spark.apache.org/docs/latest/running-on-yarn.html and
>>>> spark.yarn.executor.memoryOverhead. By default it's 7% of your 20G, or
>>>> about 1.4G. You might set it higher, to 2G, to give more overhead.
>>>>
>>>> See the --conf property=value syntax documented in
>>>> http://spark.apache.org/docs/latest/submitting-applications.html
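>>>>
>>>> For example, something like (the value is in megabytes, so 2G = 2048):
>>>>
>>>>   spark-submit ... --conf spark.yarn.executor.memoryOverhead=2048 ...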
>>>>
>>>> On Thu, Jan 15, 2015 at 3:47 AM, Nitin kak <nitinkak...@gmail.com>
>>>> wrote:
>>>> > Thanks Sean.
>>>> >
>>>> > I guess Cloudera Manager has parameters executor_total_max_heapsize and
>>>> > worker_max_heapsize which point to the parameters you mentioned above.
>>>> >
>>>> > How much should that cushion between the JVM heap size and the YARN
>>>> > memory limit be?
>>>> >
>>>> > I tried setting the JVM memory to 20g and the YARN limit to 24g, but it
>>>> > gave the same error as above.
>>>> >
>>>> > Then, I removed the --executor-memory option:
>>>> >
>>>> > spark-submit --class ConnectedComponentsTest --master yarn-cluster
>>>> > --num-executors 7 --executor-cores 1
>>>> > target/scala-2.10/connectedcomponentstest_2.10-1.0.jar
>>>> >
>>>> > That is now giving a GC / out-of-memory exception:
>>>> >
>>>> > 15/01/14 21:20:33 WARN channel.DefaultChannelPipeline: An exception
>>>> was
>>>> > thrown by a user handler while handling an exception event ([id:
>>>> 0x362d65d4,
>>>> > /10.1.1.33:35463 => /10.1.1.73:43389] EXCEPTION:
>>>> java.lang.OutOfMemoryError:
>>>> > GC overhead limit exceeded)
>>>> > java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>> >       at java.lang.Object.clone(Native Method)
>>>> >       at akka.util.CompactByteString$.apply(ByteString.scala:410)
>>>> >       at akka.util.ByteString$.apply(ByteString.scala:22)
>>>> >       at
>>>> >
>>>> akka.remote.transport.netty.TcpHandlers$class.onMessage(TcpSupport.scala:45)
>>>> >       at
>>>> >
>>>> akka.remote.transport.netty.TcpServerHandler.onMessage(TcpSupport.scala:57)
>>>> >       at
>>>> >
>>>> akka.remote.transport.netty.NettyServerHelpers$class.messageReceived(NettyHelpers.scala:43)
>>>> >       at
>>>> >
>>>> akka.remote.transport.netty.ServerHandler.messageReceived(NettyTransport.scala:179)
>>>> >       at
>>>> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
>>>> >       at
>>>> >
>>>> org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
>>>> >       at
>>>> >
>>>> org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
>>>> >       at
>>>> >
>>>> org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
>>>> >       at
>>>> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
>>>> >       at
>>>> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
>>>> >       at
>>>> org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
>>>> >       at
>>>> >
>>>> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
>>>> >       at
>>>> >
>>>> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
>>>> >       at
>>>> >
>>>> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
>>>> >       at
>>>> org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>>>> >       at
>>>> >
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>> >       at
>>>> >
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> >       at java.lang.Thread.run(Thread.java:745)
>>>> > 15/01/14 21:20:33 ERROR util.Utils: Uncaught exception in thread
>>>> > SparkListenerBus
>>>> > java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>> >       at
>>>> scala.collection.mutable.ListBuffer.$plus$eq(ListBuffer.scala:168)
>>>> >       at
>>>> scala.collection.mutable.ListBuffer.$plus$eq(ListBuffer.scala:45)
>>>> >       at
>>>> >
>>>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>>> >       at
>>>> >
>>>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>>> >       at scala.collection.immutable.List.foreach(List.scala:318)
>>>> >       at
>>>> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>>>> >       at
>>>> scala.collection.AbstractTraversable.map(Traversable.scala:105)
>>>> >       at org.json4s.JsonDSL$class.seq2jvalue(JsonDSL.scala:68)
>>>> >       at org.json4s.JsonDSL$.seq2jvalue(JsonDSL.scala:61)
>>>> >       at
>>>> >
>>>> org.apache.spark.util.JsonProtocol$$anonfun$jobStartToJson$3.apply(JsonProtocol.scala:127)
>>>> >       at
>>>> >
>>>> org.apache.spark.util.JsonProtocol$$anonfun$jobStartToJson$3.apply(JsonProtocol.scala:127)
>>>> >       at org.json4s.JsonDSL$class.pair2jvalue(JsonDSL.scala:79)
>>>> >       at org.json4s.JsonDSL$.pair2jvalue(JsonDSL.scala:61)
>>>> >       at
>>>> >
>>>> org.apache.spark.util.JsonProtocol$.jobStartToJson(JsonProtocol.scala:127)
>>>> >       at
>>>> >
>>>> org.apache.spark.util.JsonProtocol$.sparkEventToJson(JsonProtocol.scala:59)
>>>> >       at
>>>> >
>>>> org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:92)
>>>> >       at
>>>> >
>>>> org.apache.spark.scheduler.EventLoggingListener.onJobStart(EventLoggingListener.scala:118)
>>>> >       at
>>>> >
>>>> org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$3.apply(SparkListenerBus.scala:50)
>>>> >       at
>>>> >
>>>> org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$3.apply(SparkListenerBus.scala:50)
>>>> >       at
>>>> >
>>>> org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:83)
>>>> >       at
>>>> >
>>>> org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:81)
>>>> >       at
>>>> >
>>>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>> >       at
>>>> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>> >       at
>>>> >
>>>> org.apache.spark.scheduler.SparkListenerBus$class.foreachListener(SparkListenerBus.scala:81)
>>>> >       at
>>>> >
>>>> org.apache.spark.scheduler.SparkListenerBus$class.postToAll(SparkListenerBus.scala:50)
>>>> >       at
>>>> >
>>>> org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:32)
>>>> >       at
>>>> >
>>>> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56)
>>>> >       at
>>>> >
>>>> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56)
>>>> >       at scala.Option.foreach(Option.scala:236)
>>>> >       at
>>>> >
>>>> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:56)
>>>> >       at
>>>> >
>>>> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
>>>> >       at
>>>> >
>>>> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
>>>> > Exception in thread "SparkListenerBus" java.lang.OutOfMemoryError: GC
>>>> > overhead limit exceeded
>>>> >       at
>>>> scala.collection.mutable.ListBuffer.$plus$eq(ListBuffer.scala:168)
>>>> >       at
>>>> scala.collection.mutable.ListBuffer.$plus$eq(ListBuffer.scala:45)
>>>> >       at
>>>> >
>>>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>>> >       at
>>>> >
>>>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>>> >       at scala.collection.immutable.List.foreach(List.scala:318)
>>>> >       at
>>>> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>>>> >       at
>>>> scala.collection.AbstractTraversable.map(Traversable.scala:105)
>>>> >       at org.json4s.JsonDSL$class.seq2jvalue(JsonDSL.scala:68)
>>>> >       at org.json4s.JsonDSL$.seq2jvalue(JsonDSL.scala:61)
>>>> >       at
>>>> >
>>>> org.apache.spark.util.JsonProtocol$$anonfun$jobStartToJson$3.apply(JsonProtocol.scala:127)
>>>> >       at
>>>> >
>>>> org.apache.spark.util.JsonProtocol$$anonfun$jobStartToJson$3.apply(JsonProtocol.scala:127)
>>>> >       at org.json4s.JsonDSL$class.pair2jvalue(JsonDSL.scala:79)
>>>> >       at org.json4s.JsonDSL$.pair2jvalue(JsonDSL.scala:61)
>>>> >       at
>>>> >
>>>> org.apache.spark.util.JsonProtocol$.jobStartToJson(JsonProtocol.scala:127)
>>>> >       at
>>>> >
>>>> org.apache.spark.util.JsonProtocol$.sparkEventToJson(JsonProtocol.scala:59)
>>>> >       at
>>>> >
>>>> org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:92)
>>>> >       at
>>>> >
>>>> org.apache.spark.scheduler.EventLoggingListener.onJobStart(EventLoggingListener.scala:118)
>>>> >       at
>>>> >
>>>> org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$3.apply(SparkListenerBus.scala:50)
>>>> >       at
>>>> >
>>>> org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$3.apply(SparkListenerBus.scala:50)
>>>> >       at
>>>> >
>>>> org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:83)
>>>> >       at
>>>> >
>>>> org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:81)
>>>> >       at
>>>> >
>>>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>> >       at
>>>> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>> >       at
>>>> >
>>>> org.apache.spark.scheduler.SparkListenerBus$class.foreachListener(SparkListenerBus.scala:81)
>>>> >       at
>>>> >
>>>> org.apache.spark.scheduler.SparkListenerBus$class.postToAll(SparkListenerBus.scala:50)
>>>> >       at
>>>> >
>>>> org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:32)
>>>> >       at
>>>> >
>>>> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56)
>>>> >       at
>>>> >
>>>> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56)
>>>> >       at scala.Option.foreach(Option.scala:236)
>>>> >       at
>>>> >
>>>> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:56)
>>>> >       at
>>>> >
>>>> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
>>>> >       at
>>>> >
>>>> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
>>>> >
>>>> >
>>>> > On Wed, Jan 14, 2015 at 4:44 PM, Sean Owen <so...@cloudera.com>
>>>> wrote:
>>>> >>
>>>> >> That's not quite what that error means. Spark is not out of memory.
>>>> It
>>>> >> means that Spark is using more memory than it asked YARN for. That in
>>>> >> turn is because the default amount of cushion established between the
>>>> >> YARN allowed container size and the JVM heap size is too small. See
>>>> >> spark.yarn.executor.memoryOverhead in
>>>> >> http://spark.apache.org/docs/latest/running-on-yarn.html
>>>> >>
>>>> >> On Wed, Jan 14, 2015 at 9:18 PM, nitinkak001 <nitinkak...@gmail.com>
>>>> >> wrote:
>>>> >> > I am trying to run the connected components algorithm in Spark. The
>>>> >> > graph has roughly 28M edges and 3.2M vertices. Here is the code I am
>>>> >> > using:
>>>> >> >
>>>> >> >     val inputFile =
>>>> >> >       "/user/hive/warehouse/spark_poc.db/window_compare_output_text/000000_0"
>>>> >> >     val conf = new SparkConf().setAppName("ConnectedComponentsTest")
>>>> >> >     val sc = new SparkContext(conf)
>>>> >> >     val graph = GraphLoader.edgeListFile(sc, inputFile, true, 7,
>>>> >> >       StorageLevel.MEMORY_AND_DISK, StorageLevel.MEMORY_AND_DISK);
>>>> >> >     graph.cache();
>>>> >> >     val cc = graph.connectedComponents();
>>>> >> >     graph.edges.saveAsTextFile("/user/kakn/output");
>>>> >> >
>>>> >> > and here is the command:
>>>> >> >
>>>> >> > spark-submit --class ConnectedComponentsTest --master yarn-cluster
>>>> >> > --num-executors 7 --driver-memory 6g --executor-memory 8g
>>>> >> > --executor-cores 1
>>>> >> > target/scala-2.10/connectedcomponentstest_2.10-1.0.jar
>>>> >> >
>>>> >> > It runs for about an hour and then fails with the error below. Isn't
>>>> >> > Spark supposed to spill to disk if the RDDs don't fit in memory?
>>>> >> >
>>>> >> > Application application_1418082773407_8587 failed 2 times due to AM
>>>> >> > Container for appattempt_1418082773407_8587_000002 exited with
>>>> exitCode:
>>>> >> > -104 due to: Container
>>>> >> > [pid=19790,containerID=container_1418082773407_8587_02_000001] is
>>>> >> > running
>>>> >> > beyond physical memory limits. Current usage: 6.5 GB of 6.5 GB
>>>> physical
>>>> >> > memory used; 8.9 GB of 13.6 GB virtual memory used. Killing
>>>> container.
>>>> >> > Dump of the process-tree for
>>>> container_1418082773407_8587_02_000001 :
>>>> >> > |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
>>>> >> > SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES)
>>>> FULL_CMD_LINE
>>>> >> > |- 19790 19788 19790 19790 (bash) 0 0 110809088 336 /bin/bash -c
>>>> >> > /usr/java/jdk1.7.0_67-cloudera/bin/java -server -Xmx6144m
>>>> >> >
>>>> >> >
>>>> -Djava.io.tmpdir=/mnt/DATA1/yarn/nm/usercache/kakn/appcache/application_1418082773407_8587/container_1418082773407_8587_02_000001/tmp
>>>> >> > '-Dspark.executor.memory=8g' '-Dspark.eventLog.enabled=true'
>>>> >> > '-Dspark.yarn.secondary.jars='
>>>> >> > '-Dspark.app.name=ConnectedComponentsTest'
>>>> >> >
>>>> >> >
>>>> '-Dspark.eventLog.dir=hdfs://<server-name-replaced>:8020/user/spark/applicationHistory'
>>>> >> > '-Dspark.master=yarn-cluster'
>>>> >> > org.apache.spark.deploy.yarn.ApplicationMaster
>>>> >> > --class 'ConnectedComponentsTest' --jar
>>>> >> >
>>>> >> >
>>>> 'file:/home/kakn01/Spark/SparkSource/target/scala-2.10/connectedcomponentstest_2.10-1.0.jar'
>>>> >> > --executor-memory 8192 --executor-cores 1 --num-executors 7 1>
>>>> >> >
>>>> >> >
>>>> /var/log/hadoop-yarn/container/application_1418082773407_8587/container_1418082773407_8587_02_000001/stdout
>>>> >> > 2>
>>>> >> >
>>>> >> >
>>>> /var/log/hadoop-yarn/container/application_1418082773407_8587/container_1418082773407_8587_02_000001/stderr
>>>> >> > |- 19794 19790 19790 19790 (java) 205066 9152 9477726208 1707599
>>>> >> > /usr/java/jdk1.7.0_67-cloudera/bin/java -server -Xmx6144m
>>>> >> >
>>>> >> >
>>>> -Djava.io.tmpdir=/mnt/DATA1/yarn/nm/usercache/kakn/appcache/application_1418082773407_8587/container_1418082773407_8587_02_000001/tmp
>>>> >> > -Dspark.executor.memory=8g -Dspark.eventLog.enabled=true
>>>> >> > -Dspark.yarn.secondary.jars= -Dspark.app.name
>>>> =ConnectedComponentsTest
>>>> >> >
>>>> >> >
>>>> -Dspark.eventLog.dir=hdfs://<server-name-replaced>:8020/user/spark/applicationHistory
>>>> >> > -Dspark.master=yarn-cluster
>>>> >> > org.apache.spark.deploy.yarn.ApplicationMaster
>>>> >> > --class ConnectedComponentsTest --jar
>>>> >> >
>>>> >> >
>>>> file:/home/kakn01/Spark/SparkSource/target/scala-2.10/connectedcomponentstest_2.10-1.0.jar
>>>> >> > --executor-memory 8192 --executor-cores 1 --num-executors 7
>>>> >> > Container killed on request. Exit code is 143
>>>> >> > Container exited with a non-zero exit code 143
>>>> >> > .Failing this attempt.. Failing the application.
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >
>>>> >
>>>>
>>>
>>>
>>
>
