[ https://issues.apache.org/jira/browse/SPARK-27511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821847#comment-16821847 ]
Hyukjin Kwon commented on SPARK-27511:
--------------------------------------

Let's ask questions on the mailing lists rather than filing an issue here. You are likely to get a better answer there.

> Spark Streaming Driver Memory
> -----------------------------
>
>                 Key: SPARK-27511
>                 URL: https://issues.apache.org/jira/browse/SPARK-27511
>             Project: Spark
>          Issue Type: Question
>          Components: DStreams
>    Affects Versions: 2.4.0
>            Reporter: Badri Krishnan
>            Priority: Major
>
> Hello Apache Spark Community.
> We are currently facing an issue with one of our Spark Streaming jobs, which consumes data from an IBM MQ. It runs on an AWS EMR cluster using DStreams and checkpointing.
> The job failed with several containers exiting with error code 143. Checking the container logs, one of the killed container's stdout logs [1] shows the error below (exit code from container container_1553356041292_0001_15_000004 is 143):
>
> 2019-03-28 19:32:26,569 ERROR [dispatcher-event-loop-3] org.apache.spark.streaming.receiver.ReceiverSupervisorImpl: Error stopping receiver 2
> org.apache.spark.SparkException: Exception thrown in awaitResult:
>         at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
>         ....
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Failed to connect to ip-**-***-*.***.***.com/**.**.***.**:*****
>         at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
>         at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
>         at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
>         at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
>         at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         ... 3 more
>
> These containers exited with code 143 because they were not able to reach the Application Master (the driver process).
> Amazon mentioned that the Application Master is consuming more memory and recommended doubling it. Since the AM runs with the driver, we were asked to increase spark.driver.memory from 1.4G to 3G. The question left unanswered is whether increasing the memory will fix the problem or merely delay the failure. As this is an always-running streaming application, do we need to check whether memory usage builds up over time, and are there any properties that need to be set specific to how the AM (Application Master) works for a streaming application? Any inputs on how to track the AM memory usage? Any insights will be helpful.
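
For context on the configuration being discussed: in YARN cluster mode the driver runs inside the Application Master, so spark.driver.memory (plus spark.driver.memoryOverhead) is what sizes the AM container, and those values normally have to be supplied at submit time (for example via spark-submit --conf) rather than from application code. The sketch below is a minimal, hypothetical outline of a DStreams job with checkpointing of the kind the report describes; the checkpoint path, application name, batch interval, and MQ receiver are assumptions, not the reporter's actual code.

    // Minimal sketch (assumed checkpoint path, app name, batch interval; MQ receiver not shown).
    // The driver-memory values appear on SparkConf here only for illustration; on YARN they are
    // normally passed at submit time, because the driver JVM is already running when this executes.
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object MqStreamingJob {
      val checkpointDir = "hdfs:///checkpoints/mq-stream"   // assumed path

      def createContext(): StreamingContext = {
        val conf = new SparkConf()
          .setAppName("mq-dstream-job")
          .set("spark.driver.memory", "3g")            // raised from 1.4g, per the recommendation
          .set("spark.driver.memoryOverhead", "512m")  // off-heap headroom for the AM container
        val ssc = new StreamingContext(conf, Seconds(30))  // assumed batch interval
        ssc.checkpoint(checkpointDir)
        // val lines = ssc.receiverStream(new MqReceiver(...))  // hypothetical IBM MQ receiver
        // lines.foreachRDD { rdd => /* processing */ }
        ssc
      }

      def main(args: Array[String]): Unit = {
        // Recover from an existing checkpoint if present, otherwise build a fresh context.
        val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
        ssc.start()
        ssc.awaitTermination()
      }
    }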
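
On the question of tracking AM memory usage: Spark's metrics system (a metrics.properties file with, for example, a JMX or CSV sink) can expose driver JVM metrics, and YARN reports per-container physical memory in the ResourceManager UI. As a lightweight alternative, a small scheduled task inside the driver can log heap usage so that growth over time is visible in the AM logs. The helper below is a hypothetical sketch, not part of Spark; the name, interval, and log format are assumptions.

    // Hypothetical helper for watching driver (AM) heap usage over time; not a Spark facility.
    // It logs JVM heap figures from inside the driver at a fixed interval.
    import java.util.concurrent.{Executors, ThreadFactory, TimeUnit}

    object DriverHeapLogger {
      def start(intervalSeconds: Long = 60): Unit = {
        val scheduler = Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
          override def newThread(r: Runnable): Thread = {
            val t = new Thread(r, "driver-heap-logger")
            t.setDaemon(true)   // don't keep the JVM alive on shutdown
            t
          }
        })
        scheduler.scheduleAtFixedRate(new Runnable {
          override def run(): Unit = {
            val rt = Runtime.getRuntime
            val usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024)
            val maxMb = rt.maxMemory() / (1024 * 1024)
            // Shows up in the AM/driver stdout or log4j output on YARN.
            println(s"[driver-heap] used=${usedMb}MB max=${maxMb}MB")
          }
        }, 0, intervalSeconds, TimeUnit.SECONDS)
      }
    }

Calling DriverHeapLogger.start() early in the driver main (before ssc.start()) would make the figures appear in the AM's stdout on YARN, so they can be compared across hours or days to see whether usage actually builds up or stays flat.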