[ https://issues.apache.org/jira/browse/SPARK-27511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821847#comment-16821847 ]

Hyukjin Kwon commented on SPARK-27511:
--------------------------------------

Let's ask questions on the mailing lists rather than filing an issue here. You 
will likely get a better answer there.

> Spark Streaming Driver Memory
> -----------------------------
>
>                 Key: SPARK-27511
>                 URL: https://issues.apache.org/jira/browse/SPARK-27511
>             Project: Spark
>          Issue Type: Question
>          Components: DStreams
>    Affects Versions: 2.4.0
>            Reporter: Badri Krishnan
>            Priority: Major
>
> Hello Apache Spark Community.
> We are currently facing an issue with one of our Spark Streaming jobs, which 
> consumes data from an IBM MQ queue. The job runs on an AWS EMR cluster and 
> uses DStreams with checkpointing.
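> For reference, the job is structured roughly like the sketch below (the 
> class, application, and checkpoint-path names are illustrative, and the 
> real receiver wraps the IBM MQ client rather than emitting a placeholder 
> string):
>
>     import org.apache.spark.SparkConf
>     import org.apache.spark.storage.StorageLevel
>     import org.apache.spark.streaming.{Seconds, StreamingContext}
>     import org.apache.spark.streaming.receiver.Receiver
>
>     // Stand-in for our custom IBM MQ receiver.
>     class MQReceiver
>         extends Receiver[String](StorageLevel.MEMORY_AND_DISK_SER_2) {
>       def onStart(): Unit = new Thread("mq-receiver") {
>         override def run(): Unit =
>           while (!isStopped()) { store("msg"); Thread.sleep(100) }
>       }.start()
>       def onStop(): Unit = ()
>     }
>
>     object StreamingJob {
>       val checkpointDir = "hdfs:///checkpoints/mq-job" // illustrative path
>
>       def createContext(): StreamingContext = {
>         val conf = new SparkConf().setAppName("mq-streaming-job")
>         val ssc  = new StreamingContext(conf, Seconds(30))
>         ssc.checkpoint(checkpointDir)
>         ssc.receiverStream(new MQReceiver).count().print()
>         ssc
>       }
>
>       def main(args: Array[String]): Unit = {
>         // Restore from the checkpoint after a restart, else build fresh.
>         val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
>         ssc.start()
>         ssc.awaitTermination()
>       }
>     }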
> Our Spark Streaming job failed with several containers exiting with error 
> code 143. We checked the container logs; for example, one of the killed 
> containers' stdout logs [1] shows the following error: (Exit code from 
> container container_1553356041292_0001_15_000004 is : 143)
> 2019-03-28 19:32:26,569 ERROR [dispatcher-event-loop-3] 
> org.apache.spark.streaming.receiver.ReceiverSupervisorImpl:Error stopping 
> receiver 2 org.apache.spark.SparkException: Exception thrown in awaitResult:
> at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
> ....
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Failed to connect to 
> ip-**-***-*.***.***.com/**.**.***.**:*****
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
> at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
> at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
> at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ... 3 more
> These containers exited with code 143 because they were not able to reach 
> the Application Master (the driver process).
> Amazon mentioned that the Application Master was consuming more memory and 
> therefore recommended that we double it. As the AM runs on the driver, we 
> were asked to increase spark.driver.memory from 1.4G to 3G. The question 
> left unanswered, however, was whether increasing the memory would solve the 
> problem or merely delay the failure. As this is an always-running streaming 
> application, should we be checking whether memory usage builds up over 
> time, and are there any properties that need to be set specifically for how 
> the AM (Application Master) behaves in a streaming application? Any inputs 
> on how to track the AM memory usage would be helpful.
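> For now we are considering sampling the driver heap ourselves with 
> something like the sketch below (our own idea, not an official Spark 
> facility; it uses the standard JMX MemoryMXBean, so it only observes the 
> driver JVM heap, not YARN's accounting of the whole AM container). We 
> would call DriverHeapLogger.start() once from main before starting the 
> context:
>
>     import java.lang.management.ManagementFactory
>     import java.util.concurrent.{Executors, TimeUnit}
>
>     object DriverHeapLogger {
>       // Periodically log driver heap usage so that growth over time is
>       // visible in the driver's stdout logs.
>       def start(periodSeconds: Long = 60L): Unit = {
>         val mem = ManagementFactory.getMemoryMXBean
>         val mb  = 1024L * 1024L
>         Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(
>           new Runnable {
>             def run(): Unit = {
>               val h = mem.getHeapMemoryUsage
>               println(s"driver heap MB: used=${h.getUsed / mb} " +
>                 s"committed=${h.getCommitted / mb} max=${h.getMax / mb}")
>             }
>           }, 0L, periodSeconds, TimeUnit.SECONDS)
>       }
>     }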
>  
>  


