My Spark SQL job keeps failing with the error "Container exited with a non-zero exit code 143". I know that it is caused by the memory used exceeding the limit set by spark.yarn.executor.memoryOverhead. As shown below, a memory allocation request failed at 18/11/08 17:36:05, and then the executor RECEIVED SIGNAL TERM. Is there a way to keep the Spark executor from being killed?
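For context, the two numbers involved can be checked with a little arithmetic. The sketch below (illustrative only) decodes exit code 143 using the common convention that a code above 128 means "terminated by signal (code - 128)", and adds the executor heap and the overhead from the configuration to get the memory requested per YARN container. The helper names are hypothetical, and the sketch ignores any rounding YARN applies to the request (for example to its minimum allocation increment).

```python
import signal

# Values taken from the spark-submit options in the question.
EXECUTOR_MEMORY_MB = 10 * 1024   # --executor-memory 10G
MEMORY_OVERHEAD_MB = 2048        # spark.yarn.executor.memoryOverhead=2048


def decode_exit_code(code: int) -> str:
    """Exit codes above 128 conventionally mean 'killed by signal (code - 128)'."""
    if code > 128:
        sig = signal.Signals(code - 128)  # 143 - 128 = 15 -> SIGTERM
        return f"killed by {sig.name}"
    return f"exited with status {code}"


def container_request_mb(executor_mb: int, overhead_mb: int) -> int:
    """Memory asked of YARN per executor container, before scheduler rounding."""
    return executor_mb + overhead_mb


print(decode_exit_code(143))
print(container_request_mb(EXECUTOR_MEMORY_MB, MEMORY_OVERHEAD_MB))
```

Running it prints `killed by SIGTERM` and `12288`: each executor container is sized at about 12 GB in total, and exit code 143 only says the process was stopped by SIGTERM, which matches the "RECEIVED SIGNAL TERM" line in the log.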
My configuration:

    --master yarn-client \
    --driver-memory 10G \
    --executor-memory 10G \
    --executor-cores 5 \
    --num-executors 12 \
    --conf "spark.executor.extraJavaOptions= -XX:MaxPermSize=256M" \
    --conf "spark.sql.shuffle.partitions=200" \
    --conf "spark.scheduler.mode=FAIR" \
    --conf "spark.yarn.executor.memoryOverhead=2048" \

=============================================================

    18/11/08 17:35:52 INFO [Executor task launch worker for task 13694] FileScanRDD: Reading File path: hdfs://phive.smyprd.com:8020/user/hive/warehouse/events/dt=20180103/part-00000, range: 134217728-268435456, partition values: [20180103]
    18/11/08 17:35:52 INFO [Executor task launch worker for task 13700] FileScanRDD: Reading File path: hdfs://phive.smyprd.com:8020/user/hive/warehouse/events/dt=20180104/part-00000, range: 402653184-536870912, partition values: [20180104]
    18/11/08 17:35:52 INFO [Executor task launch worker for task 13688] FileScanRDD: Reading File path: hdfs://phive.smyprd.com:8020/user/hive/warehouse/events/dt=20180101/part-00000, range: 134217728-268435456, partition values: [20180101]
    18/11/08 17:35:52 INFO [Executor task launch worker for task 13694] TorrentBroadcast: Started reading broadcast variable 135
    18/11/08 17:35:52 INFO [Executor task launch worker for task 13694] MemoryStore: Block broadcast_135_piece0 stored as bytes in memory (estimated size 27.2 KB, free 1822.3 MB)
    18/11/08 17:35:52 INFO [Executor task launch worker for task 13694] TorrentBroadcast: Reading broadcast variable 135 took 3 ms
    18/11/08 17:35:52 INFO [Executor task launch worker for task 13694] MemoryStore: Block broadcast_135 stored as values in memory (estimated size 365.6 KB, free 1821.9 MB)
    18/11/08 17:36:00 INFO [Executor task launch worker for task 13700] ShuffleExternalSorter: Thread 1100 spilling sort data of 580.0 MB to disk (0 time so far)
    18/11/08 17:36:03 INFO [Executor task launch worker for task 13688] ShuffleExternalSorter: Thread 1098 spilling sort data of 580.0 MB to disk (0 time so far)
    18/11/08 17:36:05 WARN [Executor task launch worker for task 13694] TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
    18/11/08 17:36:05 INFO [Executor task launch worker for task 13694] ShuffleExternalSorter: Thread 1099 spilling sort data of 514.0 MB to disk (0 time so far)
    18/11/08 17:36:05 ERROR [SIGTERM handler] CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
    18/11/08 17:36:05 INFO [Thread-2] DiskBlockManager: Shutdown hook called
    18/11/08 17:36:10 INFO [Thread-2] ShutdownHookManager: Shutdown hook called
    18/11/08 17:36:10 INFO [Thread-2] ShutdownHookManager: Deleting directory /data5/yarn/nm/usercache/admin/appcache/application_1527906298937_39699/spark-e6345a15-d684-440a-a4f7-d23884ee9806
    18/11/08 17:36:10 INFO [Thread-2] ShutdownHookManager: Deleting directory /data9/yarn/nm/usercache/admin/appcache/application_1527906298937_39699/spark-23870d5b-9e6f-4587-bf01-eaf4ea986293
    18/11/08 17:36:10 INFO [Thread-2] ShutdownHookManager: Deleting directory /data7/yarn/nm/usercache/admin/appcache/application_1527906298937_39699/spark-65f184dc-af68-422b-9d2b-e09941ff2679
    18/11/08 17:36:10 INFO [Thread-2] ShutdownHookManager: Deleting directory /data15/yarn/nm/usercache/admin/appcache/application_1527906298937_39699/spark-17488560-736e-4ba4-9ae3-f07e1e33afda
    18/11/08 17:36:10 INFO [Thread-2] ShutdownHookManager: Deleting directory /data4/yarn/nm/usercache/admin/appcache/application_1527906298937_39699/spark-745de0ee-aa39-4cea-b05e-6f924006d4a9
    18/11/08 17:36:10 INFO [Thread-2] ShutdownHookManager: Deleting directory /data6/yarn/nm/usercache/admin/appcache/application_1527906298937_39699/spark-2db274c9-0c45-4e15-ad42-7bce16329b31
    18/11/08 17:36:10 INFO [Thread-2] ShutdownHookManager: Deleting directory /data10/yarn/nm/usercache/admin/appcache/application_1527906298937_39699/spark-6f41703c-e844-4130-9800-1cde62e8bf8c
    18/11/08 17:36:10 INFO [Thread-2] ShutdownHookManager: Deleting directory /data3/yarn/nm/usercache/admin/appcache/application_1527906298937_39699/spark-6eb2ce0e-a4d6-4300-8154-965847e671ef
    18/11/08 17:36:10 INFO [Thread-2] ShutdownHookManager: Deleting directory /data12/yarn/nm/usercache/admin/appcache/application_1527906298937_39699/spark-cd6c6d05-052e-4316-b7ff-342c5cfac817
    18/11/08 17:36:10 INFO [Thread-2] ShutdownHookManager: Deleting directory /data2/yarn/nm/usercache/admin/appcache/application_1527906298937_39699/spark-c702d40f-1997-4742-80ea-30a15c6ec738
    18/11/08 17:36:10 INFO [Thread-2] ShutdownHookManager: Deleting directory /data11/yarn/nm/usercache/admin/appcache/application_1527906298937_39699/spark-74777ef3-13c4-43d6-bd84-47cc0aba195e
    18/11/08 17:36:10 INFO [Thread-2] ShutdownHookManager: Deleting directory /data13/yarn/nm/usercache/admin/appcache/application_1527906298937_39699/spark-690ac7f2-9ffe-437a-a4d7-7426b85993ca
    18/11/08 17:36:10 INFO [Thread-2] ShutdownHookManager: Deleting directory /data14/yarn/nm/usercache/admin/appcache/application_1527906298937_39699/spark-9eee2e05-7d13-45fb-abe3-5583942af555
    18/11/08 17:36:10 INFO [Thread-2] ShutdownHookManager: Deleting directory /data8/yarn/nm/usercache/admin/appcache/application_1527906298937_39699/spark-5c2d57a1-2201-4fe0-bbbf-8aeaa124cbbf

=============================================================
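To put the TaskMemoryManager and ShuffleExternalSorter lines in perspective, here is a rough sketch of the on-heap execution-memory arithmetic. It assumes Spark 2.x's unified memory manager with default settings (300 MB reserved, spark.memory.fraction = 0.6) and spark.task.cpus = 1, so --executor-cores 5 allows up to 5 concurrent tasks. Under that model each of N active tasks can claim up to about 1/N of the pool and is guaranteed roughly 1/(2N) before it is forced to spill. The function names are hypothetical, and real figures will be somewhat lower because the JVM reports less usable heap than -Xmx.

```python
from typing import Tuple

# Assumed Spark 2.x defaults (not taken from the question's configuration).
RESERVED_MB = 300        # fixed reserved system memory
MEMORY_FRACTION = 0.6    # spark.memory.fraction default


def execution_pool_mb(heap_mb: float) -> float:
    """Approximate size of the unified execution + storage pool for a given heap."""
    return (heap_mb - RESERVED_MB) * MEMORY_FRACTION


def per_task_share_mb(pool_mb: float, active_tasks: int) -> Tuple[float, float]:
    """(approximate guaranteed minimum, maximum) execution memory per active task."""
    return pool_mb / (2 * active_tasks), pool_mb / active_tasks


pool = execution_pool_mb(10 * 1024)         # --executor-memory 10G
low, high = per_task_share_mb(pool, 5)      # --executor-cores 5
print(f"pool ~ {pool:.0f} MB, per task {low:.0f}-{high:.0f} MB")
print(67108864 // (1024 * 1024), "MB page")  # page size from the TaskMemoryManager warning
```

This prints `pool ~ 5964 MB, per task 596-1193 MB` and `64 MB page`. Note that this pool lives inside the 10 GB heap; spark.yarn.executor.memoryOverhead is a separate allowance that YARN adds on top of the heap when sizing the container.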