Replying to all... Is this "Overhead memory" allocation used for any specific purpose?
For example, will it be any different if I do *--executor-memory 22G* with overhead set to 0% (hypothetically) vs. *--executor-memory 20G* with the overhead memory at its default (9%), which eventually brings the total memory asked for by Spark to approximately 22G?

On Thu, Jan 15, 2015 at 12:54 PM, Nitin kak <nitinkak...@gmail.com> wrote:

> Is this "Overhead memory" allocation used for any specific purpose?
>
> For example, will it be any different if I do *--executor-memory 22G* with
> overhead set to 0% (hypothetically) vs. *--executor-memory 20G* with the
> overhead memory at its default (9%), which eventually brings the total
> memory asked for by Spark to approximately 22G?
>
> On Thu, Jan 15, 2015 at 12:10 PM, Sean Owen <so...@cloudera.com> wrote:
>
>> This is a YARN setting. It just controls how much any container can
>> reserve, including Spark executors. That is not the problem.
>>
>> You need Spark to ask for more memory from YARN, on top of the memory
>> that is requested by --executor-memory. Your output indicates the default
>> of 7% is too little. For example, you can ask for 20GB for executors plus
>> 2GB of overhead, and Spark will ask for 22GB from YARN. (Of course, YARN
>> needs to be set to allow containers of at least 22GB!)
>>
>> On Thu, Jan 15, 2015 at 4:31 PM, Nitin kak <nitinkak...@gmail.com> wrote:
>>
>>> Thanks for sticking with this thread.
>>>
>>> I am guessing the memory my app requests and what YARN requests on my
>>> behalf should be the same, both determined by the value of
>>> *--executor-memory*, which I had set to *20G*. Or can the two values be
>>> different?
>>>
>>> I checked the YARN configuration (below), so I think that fits well
>>> within the memory limits.
>>>
>>> Container Memory Maximum (yarn.scheduler.maximum-allocation-mb): 64 GiB
>>> The largest amount of physical memory, in MiB, that can be requested
>>> for a container.
>>>
>>> On Thu, Jan 15, 2015 at 10:28 AM, Sean Owen <so...@cloudera.com> wrote:
>>>
>>>> Those settings aren't relevant, I think. You're concerned with what
>>>> your app requests, and what Spark requests of YARN on your behalf. (Of
>>>> course, you can't request more than what your cluster allows for a
>>>> YARN container, for example, but that doesn't seem to be what is
>>>> happening here.)
>>>>
>>>> You do not want to omit --executor-memory if you need large executor
>>>> memory heaps, since then you just request the default and that is
>>>> evidently not enough memory for your app.
>>>>
>>>> Look at http://spark.apache.org/docs/latest/running-on-yarn.html and
>>>> spark.yarn.executor.memoryOverhead. By default it's 7% of your 20G, or
>>>> about 1.4G. You might set this higher, to 2G, to give more overhead.
>>>>
>>>> See the --conf property=value syntax documented in
>>>> http://spark.apache.org/docs/latest/submitting-applications.html
>>>>
>>>> On Thu, Jan 15, 2015 at 3:47 AM, Nitin kak <nitinkak...@gmail.com>
>>>> wrote:
>>>> > Thanks Sean.
>>>> >
>>>> > I guess Cloudera Manager has the parameters executor_total_max_heapsize
>>>> > and worker_max_heapsize, which point to the parameters you mentioned
>>>> > above.
>>>> > >>>> > How much should that cushon between the jvm heap size and yarn memory >>>> limit >>>> > be? >>>> > >>>> > I tried setting jvm memory to 20g and yarn to 24g, but it gave the >>>> same >>>> > error as above. >>>> > >>>> > Then, I removed the "--executor-memory" clause >>>> > >>>> > spark-submit --class ConnectedComponentsTest --master yarn-cluster >>>> > --num-executors 7 --executor-cores 1 >>>> > target/scala-2.10/connectedcomponentstest_2.10-1.0.jar >>>> > >>>> > That is not giving GC, Out of memory exception >>>> > >>>> > 15/01/14 21:20:33 WARN channel.DefaultChannelPipeline: An exception >>>> was >>>> > thrown by a user handler while handling an exception event ([id: >>>> 0x362d65d4, >>>> > /10.1.1.33:35463 => /10.1.1.73:43389] EXCEPTION: >>>> java.lang.OutOfMemoryError: >>>> > GC overhead limit exceeded) >>>> > java.lang.OutOfMemoryError: GC overhead limit exceeded >>>> > at java.lang.Object.clone(Native Method) >>>> > at akka.util.CompactByteString$.apply(ByteString.scala:410) >>>> > at akka.util.ByteString$.apply(ByteString.scala:22) >>>> > at >>>> > >>>> akka.remote.transport.netty.TcpHandlers$class.onMessage(TcpSupport.scala:45) >>>> > at >>>> > >>>> akka.remote.transport.netty.TcpServerHandler.onMessage(TcpSupport.scala:57) >>>> > at >>>> > >>>> akka.remote.transport.netty.NettyServerHelpers$class.messageReceived(NettyHelpers.scala:43) >>>> > at >>>> > >>>> akka.remote.transport.netty.ServerHandler.messageReceived(NettyTransport.scala:179) >>>> > at >>>> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) >>>> > at >>>> > >>>> org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462) >>>> > at >>>> > >>>> org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443) >>>> > at >>>> > >>>> org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303) >>>> > at >>>> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) >>>> > at >>>> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) >>>> > at >>>> org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) >>>> > at >>>> > >>>> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109) >>>> > at >>>> > >>>> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312) >>>> > at >>>> > >>>> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90) >>>> > at >>>> org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) >>>> > at >>>> > >>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >>>> > at >>>> > >>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >>>> > at java.lang.Thread.run(Thread.java:745) >>>> > 15/01/14 21:20:33 ERROR util.Utils: Uncaught exception in thread >>>> > SparkListenerBus >>>> > java.lang.OutOfMemoryError: GC overhead limit exceeded >>>> > at >>>> scala.collection.mutable.ListBuffer.$plus$eq(ListBuffer.scala:168) >>>> > at >>>> scala.collection.mutable.ListBuffer.$plus$eq(ListBuffer.scala:45) >>>> > at >>>> > >>>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) >>>> > at >>>> > >>>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) >>>> > at scala.collection.immutable.List.foreach(List.scala:318) >>>> > at >>>> scala.collection.TraversableLike$class.map(TraversableLike.scala:244) >>>> > at >>>> 
scala.collection.AbstractTraversable.map(Traversable.scala:105) >>>> > at org.json4s.JsonDSL$class.seq2jvalue(JsonDSL.scala:68) >>>> > at org.json4s.JsonDSL$.seq2jvalue(JsonDSL.scala:61) >>>> > at >>>> > >>>> org.apache.spark.util.JsonProtocol$$anonfun$jobStartToJson$3.apply(JsonProtocol.scala:127) >>>> > at >>>> > >>>> org.apache.spark.util.JsonProtocol$$anonfun$jobStartToJson$3.apply(JsonProtocol.scala:127) >>>> > at org.json4s.JsonDSL$class.pair2jvalue(JsonDSL.scala:79) >>>> > at org.json4s.JsonDSL$.pair2jvalue(JsonDSL.scala:61) >>>> > at >>>> > >>>> org.apache.spark.util.JsonProtocol$.jobStartToJson(JsonProtocol.scala:127) >>>> > at >>>> > >>>> org.apache.spark.util.JsonProtocol$.sparkEventToJson(JsonProtocol.scala:59) >>>> > at >>>> > >>>> org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:92) >>>> > at >>>> > >>>> org.apache.spark.scheduler.EventLoggingListener.onJobStart(EventLoggingListener.scala:118) >>>> > at >>>> > >>>> org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$3.apply(SparkListenerBus.scala:50) >>>> > at >>>> > >>>> org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$3.apply(SparkListenerBus.scala:50) >>>> > at >>>> > >>>> org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:83) >>>> > at >>>> > >>>> org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:81) >>>> > at >>>> > >>>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) >>>> > at >>>> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) >>>> > at >>>> > >>>> org.apache.spark.scheduler.SparkListenerBus$class.foreachListener(SparkListenerBus.scala:81) >>>> > at >>>> > >>>> org.apache.spark.scheduler.SparkListenerBus$class.postToAll(SparkListenerBus.scala:50) >>>> > at >>>> > >>>> org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:32) >>>> > at >>>> > >>>> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56) >>>> > at >>>> > >>>> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56) >>>> > at scala.Option.foreach(Option.scala:236) >>>> > at >>>> > >>>> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:56) >>>> > at >>>> > >>>> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47) >>>> > at >>>> > >>>> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47) >>>> > Exception in thread "SparkListenerBus" java.lang.OutOfMemoryError: GC >>>> > overhead limit exceeded >>>> > at >>>> scala.collection.mutable.ListBuffer.$plus$eq(ListBuffer.scala:168) >>>> > at >>>> scala.collection.mutable.ListBuffer.$plus$eq(ListBuffer.scala:45) >>>> > at >>>> > >>>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) >>>> > at >>>> > >>>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) >>>> > at scala.collection.immutable.List.foreach(List.scala:318) >>>> > at >>>> scala.collection.TraversableLike$class.map(TraversableLike.scala:244) >>>> > at >>>> scala.collection.AbstractTraversable.map(Traversable.scala:105) >>>> > at org.json4s.JsonDSL$class.seq2jvalue(JsonDSL.scala:68) >>>> > at org.json4s.JsonDSL$.seq2jvalue(JsonDSL.scala:61) >>>> > at >>>> > >>>> 
org.apache.spark.util.JsonProtocol$$anonfun$jobStartToJson$3.apply(JsonProtocol.scala:127) >>>> > at >>>> > >>>> org.apache.spark.util.JsonProtocol$$anonfun$jobStartToJson$3.apply(JsonProtocol.scala:127) >>>> > at org.json4s.JsonDSL$class.pair2jvalue(JsonDSL.scala:79) >>>> > at org.json4s.JsonDSL$.pair2jvalue(JsonDSL.scala:61) >>>> > at >>>> > >>>> org.apache.spark.util.JsonProtocol$.jobStartToJson(JsonProtocol.scala:127) >>>> > at >>>> > >>>> org.apache.spark.util.JsonProtocol$.sparkEventToJson(JsonProtocol.scala:59) >>>> > at >>>> > >>>> org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:92) >>>> > at >>>> > >>>> org.apache.spark.scheduler.EventLoggingListener.onJobStart(EventLoggingListener.scala:118) >>>> > at >>>> > >>>> org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$3.apply(SparkListenerBus.scala:50) >>>> > at >>>> > >>>> org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$3.apply(SparkListenerBus.scala:50) >>>> > at >>>> > >>>> org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:83) >>>> > at >>>> > >>>> org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:81) >>>> > at >>>> > >>>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) >>>> > at >>>> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) >>>> > at >>>> > >>>> org.apache.spark.scheduler.SparkListenerBus$class.foreachListener(SparkListenerBus.scala:81) >>>> > at >>>> > >>>> org.apache.spark.scheduler.SparkListenerBus$class.postToAll(SparkListenerBus.scala:50) >>>> > at >>>> > >>>> org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:32) >>>> > at >>>> > >>>> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56) >>>> > at >>>> > >>>> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56) >>>> > at scala.Option.foreach(Option.scala:236) >>>> > at >>>> > >>>> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:56) >>>> > at >>>> > >>>> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47) >>>> > at >>>> > >>>> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47) >>>> > >>>> > >>>> > On Wed, Jan 14, 2015 at 4:44 PM, Sean Owen <so...@cloudera.com> >>>> wrote: >>>> >> >>>> >> That's not quite what that error means. Spark is not out of memory. >>>> It >>>> >> means that Spark is using more memory than it asked YARN for. That in >>>> >> turn is because the default amount of cushion established between the >>>> >> YARN allowed container size and the JVM heap size is too small. See >>>> >> spark.yarn.executor.memoryOverhead in >>>> >> http://spark.apache.org/docs/latest/running-on-yarn.html >>>> >> >>>> >> On Wed, Jan 14, 2015 at 9:18 PM, nitinkak001 <nitinkak...@gmail.com> >>>> >> wrote: >>>> >> > I am trying to run connected components algorithm in Spark. The >>>> graph >>>> >> > has >>>> >> > roughly 28M edges and 3.2M vertices. 
Here is the code I am using >>>> >> > >>>> >> > /val inputFile = >>>> >> > >>>> "/user/hive/warehouse/spark_poc.db/window_compare_output_text/000000_0" >>>> >> > val conf = new >>>> SparkConf().setAppName("ConnectedComponentsTest") >>>> >> > val sc = new SparkContext(conf) >>>> >> > val graph = GraphLoader.edgeListFile(sc, inputFile, true, 7, >>>> >> > StorageLevel.MEMORY_AND_DISK, StorageLevel.MEMORY_AND_DISK); >>>> >> > graph.cache(); >>>> >> > val cc = graph.connectedComponents(); >>>> >> > graph.edges.saveAsTextFile("/user/kakn/output");/ >>>> >> > >>>> >> > and here is the command: >>>> >> > >>>> >> > /spark-submit --class ConnectedComponentsTest --master yarn-cluster >>>> >> > --num-executors 7 --driver-memory 6g --executor-memory 8g >>>> >> > --executor-cores 1 >>>> >> > target/scala-2.10/connectedcomponentstest_2.10-1.0.jar/ >>>> >> > >>>> >> > It runs for about an hour and then fails with below error. *Isnt >>>> Spark >>>> >> > supposed to spill on disk if the RDDs dont fit into the memory?* >>>> >> > >>>> >> > Application application_1418082773407_8587 failed 2 times due to AM >>>> >> > Container for appattempt_1418082773407_8587_000002 exited with >>>> exitCode: >>>> >> > -104 due to: Container >>>> >> > [pid=19790,containerID=container_1418082773407_8587_02_000001] is >>>> >> > running >>>> >> > beyond physical memory limits. Current usage: 6.5 GB of 6.5 GB >>>> physical >>>> >> > memory used; 8.9 GB of 13.6 GB virtual memory used. Killing >>>> container. >>>> >> > Dump of the process-tree for >>>> container_1418082773407_8587_02_000001 : >>>> >> > |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) >>>> >> > SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) >>>> FULL_CMD_LINE >>>> >> > |- 19790 19788 19790 19790 (bash) 0 0 110809088 336 /bin/bash -c >>>> >> > /usr/java/jdk1.7.0_67-cloudera/bin/java -server -Xmx6144m >>>> >> > >>>> >> > >>>> -Djava.io.tmpdir=/mnt/DATA1/yarn/nm/usercache/kakn/appcache/application_1418082773407_8587/container_1418082773407_8587_02_000001/tmp >>>> >> > '-Dspark.executor.memory=8g' '-Dspark.eventLog.enabled=true' >>>> >> > '-Dspark.yarn.secondary.jars=' >>>> >> > '-Dspark.app.name=ConnectedComponentsTest' >>>> >> > >>>> >> > >>>> '-Dspark.eventLog.dir=hdfs://<server-name-replaced>:8020/user/spark/applicationHistory' >>>> >> > '-Dspark.master=yarn-cluster' >>>> >> > org.apache.spark.deploy.yarn.ApplicationMaster >>>> >> > --class 'ConnectedComponentsTest' --jar >>>> >> > >>>> >> > >>>> 'file:/home/kakn01/Spark/SparkSource/target/scala-2.10/connectedcomponentstest_2.10-1.0.jar' >>>> >> > --executor-memory 8192 --executor-cores 1 --num-executors 7 1> >>>> >> > >>>> >> > >>>> /var/log/hadoop-yarn/container/application_1418082773407_8587/container_1418082773407_8587_02_000001/stdout >>>> >> > 2> >>>> >> > >>>> >> > >>>> /var/log/hadoop-yarn/container/application_1418082773407_8587/container_1418082773407_8587_02_000001/stderr >>>> >> > |- 19794 19790 19790 19790 (java) 205066 9152 9477726208 1707599 >>>> >> > /usr/java/jdk1.7.0_67-cloudera/bin/java -server -Xmx6144m >>>> >> > >>>> >> > >>>> -Djava.io.tmpdir=/mnt/DATA1/yarn/nm/usercache/kakn/appcache/application_1418082773407_8587/container_1418082773407_8587_02_000001/tmp >>>> >> > -Dspark.executor.memory=8g -Dspark.eventLog.enabled=true >>>> >> > -Dspark.yarn.secondary.jars= -Dspark.app.name >>>> =ConnectedComponentsTest >>>> >> > >>>> >> > >>>> -Dspark.eventLog.dir=hdfs://<server-name-replaced>:8020/user/spark/applicationHistory >>>> >> > -Dspark.master=yarn-cluster >>>> >> > 
org.apache.spark.deploy.yarn.ApplicationMaster >>>> >> > --class ConnectedComponentsTest --jar >>>> >> > >>>> >> > >>>> file:/home/kakn01/Spark/SparkSource/target/scala-2.10/connectedcomponentstest_2.10-1.0.jar >>>> >> > --executor-memory 8192 --executor-cores 1 --num-executors 7 >>>> >> > Container killed on request. Exit code is 143 >>>> >> > Container exited with a non-zero exit code 143 >>>> >> > .Failing this attempt.. Failing the application. >>>> >> > >>>> >> > >>>> >> > >>>> >> > -- >>>> >> > View this message in context: >>>> >> > >>>> http://apache-spark-user-list.1001560.n3.nabble.com/Running-beyond-memory-limits-in-ConnectedComponents-tp21139.html >>>> >> > Sent from the Apache Spark User List mailing list archive at >>>> Nabble.com. >>>> >> > >>>> >> > >>>> --------------------------------------------------------------------- >>>> >> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org >>>> >> > For additional commands, e-mail: user-h...@spark.apache.org >>>> >> > >>>> > >>>> > >>>> >>> >>> >> >
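For reference, here is a sketch of the invocation Sean describes above, assuming Spark 1.x on YARN (the class name, jar, and executor counts are simply the ones quoted in this thread). With --executor-memory 20g, the default overhead is max(384 MB, 0.07 * 20480 MB) ≈ 1434 MB, so Spark asks YARN for roughly 21.4 GB per executor; setting the overhead explicitly to 2048 MB raises the container request to about 22 GB, which still has to fit under yarn.scheduler.maximum-allocation-mb (64 GiB here).

    spark-submit --class ConnectedComponentsTest \
      --master yarn-cluster \
      --num-executors 7 \
      --executor-cores 1 \
      --driver-memory 6g \
      --executor-memory 20g \
      --conf spark.yarn.executor.memoryOverhead=2048 \
      target/scala-2.10/connectedcomponentstest_2.10-1.0.jar

Note that the container killed in the dump above (container_1418082773407_8587_02_000001) is running the ApplicationMaster; if it is the AM rather than an executor that keeps exceeding its limit, the analogous knob is spark.yarn.driver.memoryOverhead.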
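And a minimal sketch of the job itself, as I read the code quoted in this thread (same input path, GraphLoader arguments, and output path). One thing worth flagging: the quoted code computes cc but then saves graph.edges; if the goal is to persist the component assignments, it is presumably cc.vertices that should be written out.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.graphx.GraphLoader
    import org.apache.spark.storage.StorageLevel

    object ConnectedComponentsTest {
      def main(args: Array[String]): Unit = {
        val inputFile =
          "/user/hive/warehouse/spark_poc.db/window_compare_output_text/000000_0"
        val conf = new SparkConf().setAppName("ConnectedComponentsTest")
        val sc = new SparkContext(conf)

        // Edge list split into 7 partitions; edges and vertices persisted at
        // MEMORY_AND_DISK so partitions that do not fit in the heap can spill.
        val graph = GraphLoader.edgeListFile(sc, inputFile, true, 7,
          StorageLevel.MEMORY_AND_DISK, StorageLevel.MEMORY_AND_DISK)

        // connectedComponents() labels each vertex with the smallest vertex id
        // in its component, so cc.vertices is an RDD of (vertexId, componentId).
        val cc = graph.connectedComponents()
        cc.vertices.saveAsTextFile("/user/kakn/output")

        sc.stop()
      }
    }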