[ https://issues.apache.org/jira/browse/SPARK-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15007981#comment-15007981 ]
LingZhou commented on SPARK-11617:
----------------------------------

While running the sort workload in spark-perf there are still "Container killed by YARN for exceeding memory limits" errors. I am running it in yarn-client mode.

15/11/17 10:48:43 INFO scheduler.TaskSetManager: Finished task 339.0 in stage 3.0 (TID 19539) in 8887 ms on gsr493 (3/6400)
15/11/17 10:48:43 INFO scheduler.TaskSetManager: Starting task 503.0 in stage 3.0 (TID 19703, gsr493, partition 503,PROCESS_LOCAL, 1961 bytes)
15/11/17 10:48:43 INFO scheduler.TaskSetManager: Finished task 11.0 in stage 3.0 (TID 19211) in 8974 ms on gsr493 (4/6400)
15/11/17 10:48:43 INFO scheduler.TaskSetManager: Starting task 504.0 in stage 3.0 (TID 19704, gsr493, partition 504,PROCESS_LOCAL, 1961 bytes)
15/11/17 10:48:43 INFO scheduler.TaskSetManager: Finished task 211.0 in stage 3.0 (TID 19411) in 8967 ms on gsr493 (5/6400)
15/11/17 10:48:43 INFO cluster.YarnClientSchedulerBackend: Disabling executor 92.
15/11/17 10:48:43 ERROR cluster.YarnScheduler: Lost executor 92 on gsr489: Pending loss reason.
15/11/17 10:48:43 INFO scheduler.DAGScheduler: Executor lost: 92 (epoch 1)
15/11/17 10:48:43 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 92 from BlockManagerMaster.
15/11/17 10:48:43 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(92, gsr489, 45506)
15/11/17 10:48:43 INFO storage.BlockManagerMaster: Removed 92 successfully in removeExecutor
15/11/17 10:48:43 INFO scheduler.ShuffleMapStage: ShuffleMapStage 2 is now unavailable on executor 92 (6336/6400, false)
15/11/17 10:48:43 ERROR cluster.YarnScheduler: Actual reason for lost executor 92: Container killed by YARN for exceeding memory limits. 9.1 GB of 9 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
15/11/17 10:48:43 WARN scheduler.TaskSetManager: Lost task 68.0 in stage 3.0 (TID 19268, gsr489): ExecutorLostFailure (executor 92 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 9.1 GB of 9 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
15/11/17 10:48:43 WARN scheduler.TaskSetManager: Lost task 268.0 in stage 3.0 (TID 19468, gsr489): ExecutorLostFailure (executor 92 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 9.1 GB of 9 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
15/11/17 10:48:43 WARN scheduler.TaskSetManager: Lost task 468.0 in stage 3.0 (TID 19668, gsr489): ExecutorLostFailure (executor 92 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 9.1 GB of 9 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
15/11/17 10:48:43 WARN scheduler.TaskSetManager: Lost task 168.0 in stage 3.0 (TID 19368, gsr489): ExecutorLostFailure (executor 92 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 9.1 GB of 9 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
15/11/17 10:48:43 WARN scheduler.TaskSetManager: Lost task 368.0 in stage 3.0 (TID 19568, gsr489): ExecutorLostFailure (executor 92 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 9.1 GB of 9 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
15/11/17 10:48:43 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container killed by YARN for exceeding memory limits. 9.1 GB of 9 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
15/11/17 10:48:43 INFO cluster.YarnClientSchedulerBackend: Asked to remove non-existent executor 92
15/11/17 10:48:44 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on gsr444:54110 in memory (size: 1462.0 B, free: 125.8 MB)
15/11/17 10:48:44 INFO scheduler.TaskSetManager: Starting task 368.1 in stage 3.0 (TID 19705, gsr493, partition 368,PROCESS_LOCAL, 1961 bytes)
15/11/17 10:48:44 INFO scheduler.TaskSetManager: Finished task 439.0 in stage 3.0 (TID 19639) in 9696 ms on gsr493 (6/6400)
15/11/17 10:48:44 INFO cluster.YarnClientSchedulerBackend: Disabling executor 84.
15/11/17 10:48:44 ERROR cluster.YarnScheduler: Lost executor 84 on gsr491: Pending loss reason.
15/11/17 10:48:44 INFO scheduler.DAGScheduler: Executor lost: 84 (epoch 2)
15/11/17 10:48:44 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 84 from BlockManagerMaster.
15/11/17 10:48:44 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(84, gsr491, 38469)
15/11/17 10:48:44 INFO storage.BlockManagerMaster: Removed 84 successfully in removeExecutor
15/11/17 10:48:44 INFO scheduler.ShuffleMapStage: ShuffleMapStage 2 is now unavailable on executor 84 (6271/6400, false)
15/11/17 10:48:44 ERROR cluster.YarnScheduler: Actual reason for lost executor 84: Container killed by YARN for exceeding memory limits. 9.0 GB of 9 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
15/11/17 10:48:44 WARN scheduler.TaskSetManager: Lost task 155.0 in stage 3.0 (TID 19355, gsr491): ExecutorLostFailure (executor 84 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 9.0 GB of 9 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
15/11/17 10:48:44 WARN scheduler.TaskSetManager: Lost task 355.0 in stage 3.0 (TID 19555, gsr491): ExecutorLostFailure (executor 84 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 9.0 GB of 9 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
15/11/17 10:48:44 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container killed by YARN for exceeding memory limits. 9.0 GB of 9 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
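For reference, a minimal Scala sketch of the overhead bump that the log keeps suggesting. The 8g heap / 2048 MB overhead values are only illustrative, not something verified here, and if the ByteBuf leak is the real cause this may only delay the container kill:

import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch, not a verified fix: the values below are illustrative.
// spark.yarn.executor.memoryOverhead is in MB in Spark 1.6 and defaults to
// max(384 MB, 10% of spark.executor.memory); the 9 GB container limit in the
// log is the executor heap plus this overhead, so raising the overhead leaves
// more room for off-heap allocations such as Netty direct buffers.
val conf = new SparkConf()
  .setAppName("spark-perf-sort")
  .set("spark.executor.memory", "8g")
  .set("spark.yarn.executor.memoryOverhead", "2048") // value is in MB

val sc = new SparkContext(conf)

The same setting can also be passed at submit time with --conf spark.yarn.executor.memoryOverhead=2048 instead of hard-coding it in the job.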
> MEMORY LEAK: ByteBuf.release() was not called before it's garbage-collected
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-11617
>                 URL: https://issues.apache.org/jira/browse/SPARK-11617
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, YARN
>    Affects Versions: 1.6.0
>            Reporter: LingZhou
>             Fix For: 1.6.0
>
> The problem may be related to [SPARK-11235][NETWORK] Add ability to stream data using network lib.
> While running on yarn-client mode, there are error messages:
> 15/11/09 10:23:55 ERROR util.ResourceLeakDetector: LEAK: ByteBuf.release() was not called before it's garbage-collected. Enable advanced leak reporting to find out where the leak occurred. To enable advanced leak reporting, specify the JVM option '-Dio.netty.leakDetectionLevel=advanced' or call ResourceLeakDetector.setLevel() See http://netty.io/wiki/reference-counted-objects.html for more information.
> and then it will cause
> cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container killed by YARN for exceeding memory limits. 9.0 GB of 9 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
> and WARN scheduler.TaskSetManager: Lost task 105.0 in stage 1.0 (TID 2616, gsr489): java.lang.IndexOutOfBoundsException: index: 130828, length: 16833 (expected: range(0, 524288)).
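To pin down where the buffers leak, the advanced leak reporting mentioned in the description can be turned on for the executors. A minimal sketch in the same style; the app name is a placeholder:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: pass Netty's leak-detection flag to the executor JVMs so the next
// "LEAK: ByteBuf.release() was not called" message also records where the sampled
// buffer was last accessed (the "advanced" level adds some tracking overhead).
val conf = new SparkConf()
  .setAppName("spark-perf-sort-leak-debug") // placeholder name
  .set("spark.executor.extraJavaOptions", "-Dio.netty.leakDetectionLevel=advanced")

val sc = new SparkContext(conf)

// In yarn-client mode the driver JVM is already running at this point, so the same
// flag for the driver side has to go through spark-submit --driver-java-options.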