Hi,

The off-heap memory usage of the 3 Spark executor processes keeps increasing steadily until it hits the limits of the physical RAM. This first happened two weeks ago, at which point the system came to a grinding halt because it was unable to spawn new processes. At such a moment, restarting Spark is the obvious solution.

In the collectd memory usage graph linked below (1) we see the two moments we restarted Spark: last week, when we upgraded Spark from 1.4.1 to 1.5.1, and two weeks ago, when the physical memory was exhausted.

(1) http://i.stack.imgur.com/P4DE3.png
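For reference, the off-heap figure discussed in this mail follows from the ps and jps -v output at the bottom: resident set size (RSS) minus the configured max heap. A minimal sketch of that arithmetic, with the numbers copied from that output:

```java
import java.util.Locale;

// Back-of-the-envelope off-heap estimate: process RSS (the RSS column of
// `ps aux`, in KB) minus the configured max heap (-Xmx20480M). Both numbers
// are taken from the command output at the bottom of this mail.
public class OffHeapEstimate {
    public static void main(String[] args) {
        long rssBytes = 62181644L * 1024L;          // RSS column of ps (KB)
        long heapMaxBytes = 20480L * 1024L * 1024L; // -Xmx20480M
        double offHeapGb = (rssBytes - heapMaxBytes) / 1e9;
        // prints "off-heap ~= 42.2 GB"
        System.out.printf(Locale.ROOT, "off-heap ~= %.1f GB%n", offHeapGb);
    }
}
```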
As can be seen at the bottom of this mail (2), the Spark executor process uses approx. 62 GB of memory, while the max heap size is set to 20 GB. This means the off-heap memory usage is approx. 42 GB.

Some info:
- We use the Spark Streaming library.
- Our code is written in Java.
- We run Oracle Java v1.7.0_76.
- Data is read from Kafka (Kafka runs on different boxes).
- Data is written to Cassandra (Cassandra runs on different boxes).
- 1 Spark master and 3 Spark executors/workers, running on 4 separate boxes.
- We recently upgraded Spark to 1.4.1 and then to 1.5.1; the memory usage pattern is identical on all those versions.

What can be the cause of this ever-increasing off-heap memory use?

PS: I posted this question on StackOverflow yesterday:
http://stackoverflow.com/questions/33668035/spark-executors-off-heap-memory-usage-keeps-increasing

(2)

$ ps aux | grep 40724
USER     PID   %CPU %MEM VSZ      RSS      TTY STAT START TIME     COMMAND
apache-+ 40724 140  47.1 75678780 62181644 ?   Sl   Nov06 11782:27 /usr/lib/jvm/java-7-oracle/jre/bin/java -cp /opt/spark-1.5.1-bin-hadoop2.4/conf/:/opt/spark-1.5.1-bin-hadoop2.4/lib/spark-assembly-1.5.1-hadoop2.4.0.jar:/opt/spark-1.5.1-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar:/opt/spark-1.5.1-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/opt/spark-1.5.1-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar -Xms20480M -Xmx20480M -Dspark.driver.port=7201 -Dspark.blockManager.port=7206 -Dspark.executor.port=7202 -Dspark.broadcast.port=7204 -Dspark.fileserver.port=7203 -Dspark.replClassServer.port=7205 -XX:MaxPermSize=256m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url akka.tcp://sparkdri...@xxx.xxx.xxx.xxx:7201/user/CoarseGrainedScheduler --executor-id 2 --hostname xxx.xxx.xxx.xxx --cores 10 --app-id app-20151106125547-0000 --worker-url akka.tcp://sparkwor...@xxx.xxx.xxx.xxx:7200/user/Worker

$ sudo -u apache-spark jps
40724 CoarseGrainedExecutorBackend
40517 Worker
30664 Jps

$ sudo -u apache-spark jstat -gc 40724
 S0C      S1C      S0U      S1U  EC        EU        OC         OU        PC      PU      YGC   YGCT     FGC  FGCT   GCT
 158720.0 157184.0 110339.8 0.0  6674944.0 1708036.1 13981184.0 2733206.2 59904.0 59551.9 41944 1737.864 39   13.464 1751.328

$ sudo -u apache-spark jps -v
40724 CoarseGrainedExecutorBackend -Xms20480M -Xmx20480M -Dspark.driver.port=7201 -Dspark.blockManager.port=7206 -Dspark.executor.port=7202 -Dspark.broadcast.port=7204 -Dspark.fileserver.port=7203 -Dspark.replClassServer.port=7205 -XX:MaxPermSize=256m
40517 Worker -Xms2048m -Xmx2048m -XX:MaxPermSize=256m
10693 Jps -Dapplication.home=/usr/lib/jvm/java-7-oracle -Xms8m

Kind regards,

Balthasar Schopman
Software Developer
LeaseWeb Technologies B.V.

T: +31 20 316 0232
E: b.schop...@tech.leaseweb.com
W: http://www.leaseweb.com

Luttenbergweg 8, 1101 EC Amsterdam, Netherlands

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
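One class of memory that shows up in a JVM's RSS but never in jstat's heap columns is direct (off-heap) ByteBuffer allocations, which JVM networking stacks lean on heavily. A minimal, hypothetical illustration (not taken from the application described above) that such allocations are invisible to -Xmx accounting:

```java
import java.nio.ByteBuffer;

public class DirectBufferDemo {
    public static void main(String[] args) {
        // 64 MB allocated outside the Java heap: it raises the process RSS,
        // but is not reflected in jstat's EU/OU (heap usage) columns.
        ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024 * 1024);
        System.out.println("isDirect=" + buf.isDirect()
                + " capacity=" + buf.capacity());
    }
}
```

The native memory behind such buffers is released only when the owning buffer objects are garbage collected, so a large heap that collects infrequently can let native allocations accumulate well beyond the heap limit.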