Re: Spark Cluster: RECEIVED SIGNAL 15: SIGTERM
That's why I think it's the OOM killer. There are several cases of memory overuse / errors:

1 - The application tries to allocate more than the heap limit and the GC cannot free enough memory. Result: the JVM throws OutOfMemoryError: Java heap space.

2 - The JVM is configured with a max heap size larger than the available memory. At some point the application needs to allocate memory in the JVM, the JVM tries to extend its heap and allocate real memory (or maybe the OS is configured to overcommit virtual memory), but the allocation fails. Result: "Kill process or sacrifice child" (or other outcomes, depending on various factors: https://plumbr.eu/outofmemoryerror).

3 - The JVM allocated its memory at startup and the allocation was served, but other processes then start starving for memory. The memory pressure grows beyond the threshold configured in the OOM killer, and boom, the Java process is selected as the sacrifice because it is the main memory consumer.

Guillaume

--
eXenSa
*Guillaume PITEL, Président*
+33(0)626 222 431
eXenSa S.A.S. http://www.exensa.com/
41, rue Périer - 92120 Montrouge - FRANCE
Tel +33(0)184 163 677 / Fax +33(0)972 283 705
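One way to check whether the OOM killer was involved is to search the kernel log for its announcement. Below is a minimal sketch, assuming the kernel log text has already been captured (for example from `dmesg`). The function name, the regular expression, and the sample lines are illustrative assumptions; the exact message wording varies between kernel versions, so the pattern may need adapting.

```python
import re
from typing import List, Tuple

# Assumed shape of the OOM-killer announcement, for example:
#   "Out of memory: Kill process 1234 (java) score 900 or sacrifice child"
# Newer kernels print "Killed process" instead; both forms are accepted here.
OOM_LINE = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_victims(kernel_log: str) -> List[Tuple[int, str]]:
    """Return (pid, command) pairs for processes the OOM killer selected."""
    victims = []
    for line in kernel_log.splitlines():
        match = OOM_LINE.search(line)
        if match:
            victims.append((int(match.group(1)), match.group(2)))
    return victims


# Illustrative sample only, not taken from the cluster discussed in this thread.
SAMPLE_LOG = """\
[1000.000000] some unrelated kernel message
[1001.000000] Out of memory: Kill process 1234 (java) score 900 or sacrifice child
[1001.000100] Killed process 1234 (java) total-vm:100kB, anon-rss:50kB, file-rss:0kB
"""

print(find_oom_victims(SAMPLE_LOG))
```

If the list comes back empty for the time window around the worker's exit, the OOM killer is a less likely explanation and the signal probably came from somewhere else.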
Spark Cluster: RECEIVED SIGNAL 15: SIGTERM
Any idea what this means? Many thanks.

== logs/spark-.-org.apache.spark.deploy.worker.Worker-1-09.out.1 ==
15/04/13 07:07:22 INFO Worker: Starting Spark worker 09:39910 with 4 cores, 6.6 GB RAM
15/04/13 07:07:22 INFO Worker: Running Spark version 1.3.0
15/04/13 07:07:22 INFO Worker: Spark home: /remote/users//work/tools/spark-1.3.0-bin-hadoop2.4
15/04/13 07:07:22 INFO Server: jetty-8.y.z-SNAPSHOT
15/04/13 07:07:22 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:8081
15/04/13 07:07:22 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
15/04/13 07:07:22 INFO WorkerWebUI: Started WorkerWebUI at http://09:8081
15/04/13 07:07:22 INFO Worker: Connecting to master akka.tcp://sparkMaster@nceuhamnr08:7077/user/Master...
15/04/13 07:07:22 INFO Worker: Successfully registered with master spark://08:7077
*15/04/13 08:35:07 ERROR Worker: RECEIVED SIGNAL 15: SIGTERM*
Re: Spark Cluster: RECEIVED SIGNAL 15: SIGTERM
Very likely to be this: http://www.linuxdevcenter.com/pub/a/linux/2006/11/30/linux-out-of-memory.html?page=2

Your worker ran out of memory: maybe you're asking for too much memory for the JVM, or something else is running on the worker.

Guillaume
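To test the "asking for too much memory" hypothesis, one can compare the heap the JVM is allowed to use with what the machine actually has available. The sketch below parses text in the format of /proc/meminfo and checks whether a requested heap fits. The function names, the 1 GiB reserve for non-heap JVM memory and other processes, and the sample values are all assumptions chosen for illustration.

```python
import re
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into a {field: kilobytes} mapping."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"([^:]+):\s+(\d+)\s*kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def heap_fits(max_heap_kb: int, meminfo: Dict[str, int],
              reserve_kb: int = 1024 * 1024) -> bool:
    """True if the heap plus a reserve (assumed 1 GiB) fits in available memory."""
    # MemAvailable is missing on older kernels, so fall back to MemFree.
    available = meminfo.get("MemAvailable", meminfo.get("MemFree", 0))
    return max_heap_kb + reserve_kb <= available


# Illustrative values only; on a real worker, read the text from /proc/meminfo.
SAMPLE_MEMINFO = """\
MemTotal:        6920601 kB
MemFree:          512000 kB
MemAvailable:    2097152 kB
"""

info = parse_meminfo(SAMPLE_MEMINFO)
print(heap_fits(6 * 1024 * 1024, info))  # 6 GiB heap against the sample values
print(heap_fits(512 * 1024, info))       # 512 MiB heap against the sample values
```

A heap that does not fit corresponds to the situation where the machine runs short once the JVM actually touches its memory, or once something else on the worker starts competing for it.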
Re: Spark Cluster: RECEIVED SIGNAL 15: SIGTERM
Linux OOM throws SIGTERM, but if I remember correctly the JVM handles heap memory limits differently: it throws OutOfMemoryError and eventually sends SIGINT. Not sure what happened, but the worker simply received a SIGTERM signal, so perhaps the daemon was terminated by someone or by a parent process. Just my guess.

Tim
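The difference between signals matters for reading the log line. A process can install a handler for SIGTERM and record it before exiting, which is consistent with the worker logging "RECEIVED SIGNAL 15: SIGTERM". SIGKILL cannot be caught at all, and as far as I know current Linux kernels have the OOM killer deliver SIGKILL, in which case the victim would not get a chance to log anything. The sketch below illustrates this with a Python process signalling itself; it is not Spark's handler, and the function name is made up for the example.

```python
import os
import signal
import time
from typing import List, Tuple


def demo_signal_handling() -> Tuple[List[str], bool]:
    """Send SIGTERM to this process, then check whether SIGKILL can be caught."""
    received: List[str] = []

    def on_signal(signum, frame):
        # Record the signal instead of exiting, loosely like a daemon logging it.
        received.append(signal.Signals(signum).name)

    previous = signal.signal(signal.SIGTERM, on_signal)
    try:
        os.kill(os.getpid(), signal.SIGTERM)
        # Give the interpreter a moment to run the handler.
        deadline = time.time() + 2.0
        while not received and time.time() < deadline:
            time.sleep(0.01)
    finally:
        if previous is not None:
            signal.signal(signal.SIGTERM, previous)

    # Installing a handler for SIGKILL is refused by the operating system.
    try:
        signal.signal(signal.SIGKILL, on_signal)
        sigkill_catchable = True
    except (OSError, ValueError):
        sigkill_catchable = False

    return received, sigkill_catchable


if __name__ == "__main__":
    print(demo_signal_handling())
```

So a logged SIGTERM points toward something that asked the process to stop (a user, a stop script, a parent process, or a resource manager), which can be cross-checked against the kernel log.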