[ https://issues.apache.org/jira/browse/SPARK-21733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136260#comment-16136260 ]
Jepson edited comment on SPARK-21733 at 8/22/17 4:29 AM:
---------------------------------------------------------

*The nodemanager log detail:*
{code:java}
2017-08-22 11:20:07,984 DEBUG org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: [ 17040 16747 ]
2017-08-22 11:20:07,984 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 16747 for container-id container_e56_1503371613444_0001_01_000002: 586.8 MB of 3 GB physical memory used; 4.5 GB of 6.3 GB virtual memory used
2017-08-22 11:20:07,984 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Constructing ProcessTree for : PID = 16766 ContainerId = container_e56_1503371613444_0001_01_000003
2017-08-22 11:20:07,992 DEBUG org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: [ 17066 16766 ]
2017-08-22 11:20:07,992 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 16766 for container-id container_e56_1503371613444_0001_01_000003: 580.4 MB of 3 GB physical memory used; 4.6 GB of 6.3 GB virtual memory used
2017-08-22 11:20:08,716 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node's health-status : true,
2017-08-22 11:20:08,717 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out 3 container statuses: [ContainerStatus: [ContainerId: container_e56_1503371613444_0001_01_000001, State: RUNNING, Diagnostics: , ExitStatus: -1000, ], ContainerStatus: [ContainerId: container_e56_1503371613444_0001_01_000002, State: RUNNING, Diagnostics: , ExitStatus: -1000, ], ContainerStatus: [ContainerId: container_e56_1503371613444_0001_01_000003, State: RUNNING, Diagnostics: , ExitStatus: -1000, ]]
2017-08-22 11:20:08,717 TRACE org.apache.hadoop.ipc.ProtobufRpcEngine: 102: Call -> hadoop37.jiuye/192.168.17.37:8031: nodeHeartbeat {node_status { node_id { host: "hadoop44.jiuye" port: 8041 } response_id: 389 containersStatuses { container_id { app_attempt_id { application_id { id: 1 cluster_timestamp: 1503371613444 } attemptId: 1 } id: 61572651155457 } state: C_RUNNING diagnostics: "" exit_status: -1000 } containersStatuses { container_id { app_attempt_id { application_id { id: 1 cluster_timestamp: 1503371613444 } attemptId: 1 } id: 61572651155458 } state: C_RUNNING diagnostics: "" exit_status: -1000 } containersStatuses { container_id { app_attempt_id { application_id { id: 1 cluster_timestamp: 1503371613444 } attemptId: 1 } id: 61572651155459 } state: C_RUNNING diagnostics: "" exit_status: -1000 } nodeHealthStatus { is_node_healthy: true health_report: "" last_health_report_time: 1503371969299 } } last_known_container_token_master_key { key_id: -966413074 bytes: "a\021&\346gs\031n" } last_known_nm_token_master_key { key_id: -1126930838 bytes: "$j@\322\331dr`" }}
2017-08-22 11:20:08,717 DEBUG org.apache.hadoop.ipc.Client: IPC Client (1778801068) connection to hadoop37.jiuye/192.168.17.37:8031 from yarn sending #851
2017-08-22 11:20:08,720 DEBUG org.apache.hadoop.ipc.Client: IPC Client (1778801068) connection to hadoop37.jiuye/192.168.17.37:8031 from yarn got value #851
2017-08-22 11:20:08,720 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: nodeHeartbeat took 3ms
2017-08-22 11:20:08,720 TRACE org.apache.hadoop.ipc.ProtobufRpcEngine: 102: Response <- hadoop37.jiuye/192.168.17.37:8031: nodeHeartbeat {response_id: 390 nodeAction: NORMAL containers_to_cleanup { app_attempt_id { application_id { id: 1 cluster_timestamp: 1503371613444 } attemptId: 1 } id: 61572651155458 } containers_to_cleanup { app_attempt_id { application_id { id: 1 cluster_timestamp: 1503371613444 } attemptId: 1 } id: 61572651155459 } nextHeartBeatInterval: 1000}
2017-08-22 11:20:08,721 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.nodemanager.CMgrCompletedContainersEvent.EventType: FINISH_CONTAINERS
*{color:#f6c342}2017-08-22 11:20:08,722 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerKillEvent.EventType: KILL_CONTAINER{color}*
2017-08-22 11:20:08,722 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Processing container_e56_1503371613444_0001_01_000002 of type KILL_CONTAINER
2017-08-22 11:20:08,722 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_e56_1503371613444_0001_01_000002 transitioned from RUNNING to KILLING
2017-08-22 11:20:08,722 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerKillEvent.EventType: KILL_CONTAINER
2017-08-22 11:20:08,722 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Processing container_e56_1503371613444_0001_01_000003 of type KILL_CONTAINER
2017-08-22 11:20:08,722 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_e56_1503371613444_0001_01_000003 transitioned from RUNNING to KILLING
2017-08-22 11:20:08,722 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncherEvent.EventType: CLEANUP_CONTAINER
2017-08-22 11:20:08,722 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_e56_1503371613444_0001_01_000002
2017-08-22 11:20:08,722 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Marking container container_e56_1503371613444_0001_01_000002 as *inactive*
{code}

> ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
> -----------------------------------------------------------------
>
>                 Key: SPARK-21733
>                 URL: https://issues.apache.org/jira/browse/SPARK-21733
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.1.1
>        Environment: Apache Spark 2.1.1
>                     CDH 5.12.0 YARN
>            Reporter: Jepson
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Kafka + Spark Streaming throws these errors:
> {code:java}
> 17/08/15 09:34:14 INFO memory.MemoryStore: Block broadcast_8003_piece0 stored as bytes in memory (estimated size 1895.0 B, free 1643.2 MB)
> 17/08/15 09:34:14 INFO broadcast.TorrentBroadcast: Reading broadcast variable 8003 took 11 ms
> 17/08/15 09:34:14 INFO memory.MemoryStore: Block broadcast_8003 stored as values in memory (estimated size 2.9 KB, free 1643.2 MB)
> 17/08/15 09:34:14 INFO kafka010.KafkaRDD: Beginning offset 10130733 is the same as ending offset skipping kssh 5
> 17/08/15 09:34:14 INFO executor.Executor: Finished task 7.0 in stage 8003.0 (TID 64178). 1740 bytes result sent to driver
> 17/08/15 09:34:21 INFO storage.BlockManager: Removing RDD 8002
> 17/08/15 09:34:21 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 64186
> 17/08/15 09:34:21 INFO executor.Executor: Running task 7.0 in stage 8004.0 (TID 64186)
> 17/08/15 09:34:21 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 8004
> 17/08/15 09:34:21 INFO memory.MemoryStore: Block broadcast_8004_piece0 stored as bytes in memory (estimated size 1895.0 B, free 1643.2 MB)
> 17/08/15 09:34:21 INFO broadcast.TorrentBroadcast: Reading broadcast variable 8004 took 8 ms
> 17/08/15 09:34:21 INFO memory.MemoryStore: Block broadcast_8004 stored as values in memory (estimated size 2.9 KB, free 1643.2 MB)
> 17/08/15 09:34:21 INFO kafka010.KafkaRDD: Beginning offset 10130733 is the same as ending offset skipping kssh 5
> 17/08/15 09:34:21 INFO executor.Executor: Finished task 7.0 in stage 8004.0 (TID 64186). 1740 bytes result sent to driver
> 17/08/15 09:34:29 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
> 17/08/15 09:34:29 INFO storage.DiskBlockManager: Shutdown hook called
> 17/08/15 09:34:29 INFO util.ShutdownHookManager: Shutdown hook called
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
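For anyone triaging this: the "RECEIVED SIGNAL TERM" line is produced by the signal handler that the executor registers at startup (Spark's SignalUtils), logging the SIGTERM that the NodeManager delivers once the ResourceManager lists the container under containers_to_cleanup, as the NM log above shows. A minimal Python sketch of that handler behavior (an analogue for illustration, not Spark's actual Scala code):

```python
import os
import signal

received = []

def log_signal(signum, frame):
    # Analogue of Spark's SignalUtils handler: log the signal name,
    # then let the registered shutdown hooks perform cleanup.
    name = signal.Signals(signum).name.replace("SIG", "", 1)
    received.append(name)
    print(f"ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL {name}")

signal.signal(signal.SIGTERM, log_signal)

# Simulate the NodeManager killing the container process with SIGTERM:
os.kill(os.getpid(), signal.SIGTERM)
```

Because the executor log shows no error before the TERM, the cause is on the resource-management side (e.g. preemption, dynamic-allocation idle timeout, or the RM releasing the containers) rather than inside the executor itself, which matches the RM-initiated containers_to_cleanup seen in the heartbeat response.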