[ https://issues.apache.org/jira/browse/SPARK-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15007981#comment-15007981 ]

LingZhou commented on SPARK-11617:
----------------------------------

While running the sort workload in spark-perf, I still see "exceeding memory 
limits" errors. I am running in yarn-client mode.

15/11/17 10:48:43 INFO scheduler.TaskSetManager: Finished task 339.0 in stage 
3.0 (TID 19539) in 8887 ms on gsr493 (3/6400)
15/11/17 10:48:43 INFO scheduler.TaskSetManager: Starting task 503.0 in stage 
3.0 (TID 19703, gsr493, partition 503,PROCESS_LOCAL, 1961 bytes)
15/11/17 10:48:43 INFO scheduler.TaskSetManager: Finished task 11.0 in stage 
3.0 (TID 19211) in 8974 ms on gsr493 (4/6400)
15/11/17 10:48:43 INFO scheduler.TaskSetManager: Starting task 504.0 in stage 
3.0 (TID 19704, gsr493, partition 504,PROCESS_LOCAL, 1961 bytes)
15/11/17 10:48:43 INFO scheduler.TaskSetManager: Finished task 211.0 in stage 
3.0 (TID 19411) in 8967 ms on gsr493 (5/6400)
15/11/17 10:48:43 INFO cluster.YarnClientSchedulerBackend: Disabling executor 
92.
15/11/17 10:48:43 ERROR cluster.YarnScheduler: Lost executor 92 on gsr489: 
Pending loss reason.
15/11/17 10:48:43 INFO scheduler.DAGScheduler: Executor lost: 92 (epoch 1)
15/11/17 10:48:43 INFO storage.BlockManagerMasterEndpoint: Trying to remove 
executor 92 from BlockManagerMaster.
15/11/17 10:48:43 INFO storage.BlockManagerMasterEndpoint: Removing block 
manager BlockManagerId(92, gsr489, 45506)
15/11/17 10:48:43 INFO storage.BlockManagerMaster: Removed 92 successfully in 
removeExecutor
15/11/17 10:48:43 INFO scheduler.ShuffleMapStage: ShuffleMapStage 2 is now 
unavailable on executor 92 (6336/6400, false)
15/11/17 10:48:43 ERROR cluster.YarnScheduler: Actual reason for lost executor 
92: Container killed by YARN for exceeding memory limits. 9.1 GB of 9 GB 
physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
15/11/17 10:48:43 WARN scheduler.TaskSetManager: Lost task 68.0 in stage 3.0 
(TID 19268, gsr489): ExecutorLostFailure (executor 92 exited caused by one of 
the running tasks) Reason: Container killed by YARN for exceeding memory 
limits. 9.1 GB of 9 GB physical memory used. Consider boosting 
spark.yarn.executor.memoryOverhead.
15/11/17 10:48:43 WARN scheduler.TaskSetManager: Lost task 268.0 in stage 3.0 
(TID 19468, gsr489): ExecutorLostFailure (executor 92 exited caused by one of 
the running tasks) Reason: Container killed by YARN for exceeding memory 
limits. 9.1 GB of 9 GB physical memory used. Consider boosting 
spark.yarn.executor.memoryOverhead.
15/11/17 10:48:43 WARN scheduler.TaskSetManager: Lost task 468.0 in stage 3.0 
(TID 19668, gsr489): ExecutorLostFailure (executor 92 exited caused by one of 
the running tasks) Reason: Container killed by YARN for exceeding memory 
limits. 9.1 GB of 9 GB physical memory used. Consider boosting 
spark.yarn.executor.memoryOverhead.
15/11/17 10:48:43 WARN scheduler.TaskSetManager: Lost task 168.0 in stage 3.0 
(TID 19368, gsr489): ExecutorLostFailure (executor 92 exited caused by one of 
the running tasks) Reason: Container killed by YARN for exceeding memory 
limits. 9.1 GB of 9 GB physical memory used. Consider boosting 
spark.yarn.executor.memoryOverhead.
15/11/17 10:48:43 WARN scheduler.TaskSetManager: Lost task 368.0 in stage 3.0 
(TID 19568, gsr489): ExecutorLostFailure (executor 92 exited caused by one of 
the running tasks) Reason: Container killed by YARN for exceeding memory 
limits. 9.1 GB of 9 GB physical memory used. Consider boosting 
spark.yarn.executor.memoryOverhead.
15/11/17 10:48:43 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: 
Container killed by YARN for exceeding memory limits. 9.1 GB of 9 GB physical 
memory used. Consider boosting spark.yarn.executor.memoryOverhead.
15/11/17 10:48:43 INFO cluster.YarnClientSchedulerBackend: Asked to remove 
non-existent executor 92
15/11/17 10:48:44 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on 
gsr444:54110 in memory (size: 1462.0 B, free: 125.8 MB)
15/11/17 10:48:44 INFO scheduler.TaskSetManager: Starting task 368.1 in stage 
3.0 (TID 19705, gsr493, partition 368,PROCESS_LOCAL, 1961 bytes)
15/11/17 10:48:44 INFO scheduler.TaskSetManager: Finished task 439.0 in stage 
3.0 (TID 19639) in 9696 ms on gsr493 (6/6400)
15/11/17 10:48:44 INFO cluster.YarnClientSchedulerBackend: Disabling executor 
84.
15/11/17 10:48:44 ERROR cluster.YarnScheduler: Lost executor 84 on gsr491: 
Pending loss reason.
15/11/17 10:48:44 INFO scheduler.DAGScheduler: Executor lost: 84 (epoch 2)
15/11/17 10:48:44 INFO storage.BlockManagerMasterEndpoint: Trying to remove 
executor 84 from BlockManagerMaster.
15/11/17 10:48:44 INFO storage.BlockManagerMasterEndpoint: Removing block 
manager BlockManagerId(84, gsr491, 38469)
15/11/17 10:48:44 INFO storage.BlockManagerMaster: Removed 84 successfully in 
removeExecutor
15/11/17 10:48:44 INFO scheduler.ShuffleMapStage: ShuffleMapStage 2 is now 
unavailable on executor 84 (6271/6400, false)
15/11/17 10:48:44 ERROR cluster.YarnScheduler: Actual reason for lost executor 
84: Container killed by YARN for exceeding memory limits. 9.0 GB of 9 GB 
physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
15/11/17 10:48:44 WARN scheduler.TaskSetManager: Lost task 155.0 in stage 3.0 
(TID 19355, gsr491): ExecutorLostFailure (executor 84 exited caused by one of 
the running tasks) Reason: Container killed by YARN for exceeding memory 
limits. 9.0 GB of 9 GB physical memory used. Consider boosting 
spark.yarn.executor.memoryOverhead.
15/11/17 10:48:44 WARN scheduler.TaskSetManager: Lost task 355.0 in stage 3.0 
(TID 19555, gsr491): ExecutorLostFailure (executor 84 exited caused by one of 
the running tasks) Reason: Container killed by YARN for exceeding memory 
limits. 9.0 GB of 9 GB physical memory used. Consider boosting 
spark.yarn.executor.memoryOverhead.
15/11/17 10:48:44 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: 
Container killed by YARN for exceeding memory limits. 9.0 GB of 9 GB physical 
memory used. Consider boosting spark.yarn.executor.memoryOverhead.
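
If anyone wants to act on the hint in the YARN message, a minimal sketch of 
raising the overhead is below. This assumes a Spark 1.6-style YARN deployment; 
the sizes are purely illustrative, not values verified against this run.

    // Illustrative only: give each YARN container more non-heap headroom.
    // In Spark 1.6 the overhead defaults to max(384 MB, 0.10 * spark.executor.memory).
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("spark-perf-sort")
      .set("spark.executor.memory", "8g")                 // heap portion of the container
      .set("spark.yarn.executor.memoryOverhead", "2048")  // extra MB YARN reserves beyond the heap

    val sc = new SparkContext(conf)

The same setting can also be passed on spark-submit with 
--conf spark.yarn.executor.memoryOverhead=2048 instead of in code.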

> MEMORY LEAK: ByteBuf.release() was not called before it's garbage-collected
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-11617
>                 URL: https://issues.apache.org/jira/browse/SPARK-11617
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, YARN
>    Affects Versions: 1.6.0
>            Reporter: LingZhou
>             Fix For: 1.6.0
>
>
> The problem may be related to [SPARK-11235][NETWORK] "Add ability to stream 
> data using network lib". While running in yarn-client mode, there are error 
> messages:
> 15/11/09 10:23:55 ERROR util.ResourceLeakDetector: LEAK: ByteBuf.release() 
> was not called before it's garbage-collected. Enable advanced leak reporting 
> to find out where the leak occurred. To enable advanced leak reporting, 
> specify the JVM option '-Dio.netty.leakDetectionLevel=advanced' or call 
> ResourceLeakDetector.setLevel() See 
> http://netty.io/wiki/reference-counted-objects.html for more information.
> and then it causes:
> cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container killed by YARN 
> for exceeding memory limits. 9.0 GB of 9 GB physical memory used. Consider 
> boosting spark.yarn.executor.memoryOverhead.
> and WARN scheduler.TaskSetManager: Lost task 105.0 in stage 1.0 (TID 2616, 
> gsr489): java.lang.IndexOutOfBoundsException: index: 130828, length: 16833 
> (expected: range(0, 524288)).
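
For reference, the "advanced leak reporting" that the Netty message above asks 
for can be turned on by passing that JVM option to the executor JVMs. A sketch, 
assuming the option is set through the Spark configuration; only the 
-Dio.netty.leakDetectionLevel flag comes from the message itself, the rest is 
illustrative:

    // Illustrative only: make Netty record where a leaked ByteBuf was allocated
    // and last touched, so the LEAK message points at a call site.
    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.executor.extraJavaOptions", "-Dio.netty.leakDetectionLevel=advanced")

In yarn-client mode the driver JVM is already running by the time this 
configuration is read, so to get the same report on the driver the flag has to 
go on the driver's own launch command (for example via --driver-java-options) 
rather than into spark.driver.extraJavaOptions set in code.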


