[ https://issues.apache.org/jira/browse/SPARK-21733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136400#comment-16136400 ]
Jepson edited comment on SPARK-21733 at 8/22/17 7:47 AM:
---------------------------------------------------------

[~jerryshao] Thanks for your quick reply.
The Spark Streaming with Kafka Scala code:
{code}
scc.start()
scc.awaitTermination()
{code}

*1. And I set the parameters:*
{code}
--driver-memory 4g \
--executor-memory 4g \
--executor-cores 4 \
--num-executors 4 \
--conf "spark.yarn.am.memory=1024m" \
--conf "spark.yarn.am.memoryOverhead=1024m" \
--conf "spark.yarn.driver.memoryOverhead=4096m" \
--conf "spark.yarn.executor.memoryOverhead=4096m" \
{code}

*2. The error again.*

was (Author: 1028344...@qq.com):
[~jerryshao] Thanks for your quick reply.
The Spark Streaming with Kafka Scala code:
{code}
scc.start()
scc.awaitTermination()
{code}

*1. And I set the parameters:*
{code}
--driver-memory 4g \
--executor-memory 4g \
--executor-cores 4 \
--num-executors 4 \
--conf "spark.yarn.am.memory=1024m" \
--conf "spark.yarn.am.memoryOverhead=1024m" \
--conf "spark.yarn.driver.memoryOverhead=4096m" \
--conf "spark.yarn.executor.memoryOverhead=4096m" \
{code}

*2. The error again:*

2017-08-22 15:06:32,082 *INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 5382 for container-id container_e65_1503383442059_0002_01_000006: 573.9 MB of 8 GB physical memory used; 6.2 GB of 40 GB virtual memory used*
2017-08-22 15:06:33,026 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node's health-status : true,
2017-08-22 15:06:33,026 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out 1 container statuses: [ContainerStatus: [ContainerId: container_e65_1503383442059_0002_01_000006, State: RUNNING, Diagnostics: , ExitStatus: -1000, ]]
2017-08-22 15:06:33,026 DEBUG org.apache.hadoop.ipc.Client: IPC Client (2036704540) connection to hadoop37.jiuye/192.168.17.37:8031 from yarn sending #3069
2017-08-22 15:06:33,027 DEBUG org.apache.hadoop.ipc.Client: IPC Client (2036704540) connection to hadoop37.jiuye/192.168.17.37:8031 from yarn got value #3069
2017-08-22 15:06:33,027 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: nodeHeartbeat took 1ms
2017-08-22 15:06:34,028 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node's health-status : true,
2017-08-22 15:06:34,028 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out 1 container statuses: [ContainerStatus: [ContainerId: container_e65_1503383442059_0002_01_000006, State: RUNNING, Diagnostics: , ExitStatus: -1000, ]]
2017-08-22 15:06:34,028 DEBUG org.apache.hadoop.ipc.Client: IPC Client (2036704540) connection to hadoop37.jiuye/192.168.17.37:8031 from yarn sending #3070
2017-08-22 15:06:34,029 DEBUG org.apache.hadoop.ipc.Client: IPC Client (2036704540) connection to hadoop37.jiuye/192.168.17.37:8031 from yarn got value #3070
2017-08-22 15:06:34,029 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: nodeHeartbeat took 1ms
2017-08-22 15:06:35,030 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node's health-status : true,
2017-08-22 15:06:35,030 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out 1 container statuses: [ContainerStatus: [ContainerId: container_e65_1503383442059_0002_01_000006, State: RUNNING, Diagnostics: , ExitStatus: -1000, ]]
2017-08-22 15:06:35,030 DEBUG org.apache.hadoop.ipc.Client: IPC Client (2036704540) connection to hadoop37.jiuye/192.168.17.37:8031 from yarn sending #3071
2017-08-22 15:06:35,031 DEBUG org.apache.hadoop.ipc.Client: IPC Client (2036704540) connection to hadoop37.jiuye/192.168.17.37:8031 from yarn got value #3071
2017-08-22 15:06:35,031 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: nodeHeartbeat took 1ms
2017-08-22 15:06:35,084 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Current ProcessTree list : [ 5382 ]
2017-08-22 15:06:35,084 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Constructing ProcessTree for : PID = 5382 ContainerId = container_e65_1503383442059_0002_01_000006
2017-08-22 15:06:35,092 DEBUG org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: [ 5382 5532 ]
2017-08-22 15:06:35,092 *INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 5382 for container-id container_e65_1503383442059_0002_01_000006: 573.9 MB of 8 GB physical memory used; 6.2 GB of 40 GB virtual memory used*
2017-08-22 15:06:36,031 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node's health-status : true,
2017-08-22 15:06:36,032 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out 1 container statuses: [ContainerStatus: [ContainerId: container_e65_1503383442059_0002_01_000006, State: RUNNING, Diagnostics: , ExitStatus: -1000, ]]
2017-08-22 15:06:36,032 DEBUG org.apache.hadoop.ipc.Client: IPC Client (2036704540) connection to hadoop37.jiuye/192.168.17.37:8031 from yarn sending #3072
2017-08-22 15:06:36,032 DEBUG org.apache.hadoop.ipc.Client: IPC Client (2036704540) connection to hadoop37.jiuye/192.168.17.37:8031 from yarn got value #3072
2017-08-22 15:06:36,033 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: nodeHeartbeat took 1ms
2017-08-22 15:06:37,037 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node's health-status : true,
2017-08-22 15:06:37,037 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out 1 container statuses: [ContainerStatus: [ContainerId: container_e65_1503383442059_0002_01_000006, State: RUNNING, Diagnostics: , ExitStatus: -1000, ]]
2017-08-22 15:06:37,037 DEBUG org.apache.hadoop.ipc.Client: IPC Client (2036704540) connection to hadoop37.jiuye/192.168.17.37:8031 from yarn sending #3073
2017-08-22 15:06:37,038 DEBUG org.apache.hadoop.ipc.Client: IPC Client (2036704540) connection to hadoop37.jiuye/192.168.17.37:8031 from yarn got value #3073
2017-08-22 15:06:37,038 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: nodeHeartbeat took 1ms
2017-08-22 15:06:37,564 DEBUG org.apache.hadoop.ipc.Server: IPC Server idle connection scanner for port 8040: task running
2017-08-22 15:06:37,691 DEBUG org.apache.hadoop.ipc.Server: IPC Server idle connection scanner for port 8041: task running
2017-08-22 15:06:38,040 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node's health-status : true,
2017-08-22 15:06:38,040 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out 1 container statuses: [ContainerStatus: [ContainerId: container_e65_1503383442059_0002_01_000006, State: RUNNING, Diagnostics: , ExitStatus: -1000, ]]
2017-08-22 15:06:38,040 DEBUG org.apache.hadoop.ipc.Client: IPC Client (2036704540) connection to hadoop37.jiuye/192.168.17.37:8031 from yarn sending #3074
2017-08-22 15:06:38,041 DEBUG org.apache.hadoop.ipc.Client: IPC Client (2036704540) connection to hadoop37.jiuye/192.168.17.37:8031 from yarn got value #3074
2017-08-22 15:06:38,041 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: nodeHeartbeat took 1ms
2017-08-22 15:06:38,041 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.nodemanager.CMgrCompletedContainersEvent.EventType: FINISH_CONTAINERS
2017-08-22 15:06:38,041 *DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerKillEvent.EventType: KILL_CONTAINER*
2017-08-22 15:06:38,041 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Processing container_e65_1503383442059_0002_01_000006 of type KILL_CONTAINER
2017-08-22 15:06:38,042 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_e65_1503383442059_0002_01_000006 transitioned from RUNNING to KILLING
2017-08-22 15:06:38,042 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncherEvent.EventType: CLEANUP_CONTAINER
2017-08-22 15:06:38,042 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_e65_1503383442059_0002_01_000006
2017-08-22 15:06:38,042 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Marking container container_e65_1503383442059_0002_01_000006 as inactive
2017-08-22 15:06:38,042 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Getting pid for container container_e65_1503383442059_0002_01_000006 to kill from pid file /yarn/nm/nmPrivate/application_1503383442059_0002/container_e65_1503383442059_0002_01_000006/container_e65_1503383442059_0002_01_000006.pid
2017-08-22 15:06:38,042 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Accessing pid for container container_e65_1503383442059_0002_01_000006 from pid file /yarn/nm/nmPrivate/application_1503383442059_0002/container_e65_1503383442059_0002_01_000006/container_e65_1503383442059_0002_01_000006.pid
2017-08-22 15:06:38,042 DEBUG org.apache.hadoop.yarn.server.nodemanager.util.ProcessIdFileReader: Accessing pid from pid file /yarn/nm/nmPrivate/application_1503383442059_0002/container_e65_1503383442059_0002_01_000006/container_e65_1503383442059_0002_01_000006.pid
2017-08-22 15:06:38,042 DEBUG org.apache.hadoop.yarn.server.nodemanager.util.ProcessIdFileReader: Got pid 5382 from path /yarn/nm/nmPrivate/application_1503383442059_0002/container_e65_1503383442059_0002_01_000006/container_e65_1503383442059_0002_01_000006.pid
2017-08-22 15:06:38,042 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Got pid 5382 for container container_e65_1503383442059_0002_01_000006
2017-08-22 15:06:38,042 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Sending signal to pid 5382 as user hdfs for container container_e65_1503383442059_0002_01_000006
2017-08-22 15:06:38,042 DEBUG org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Sending signal 15 to pid 5382 as user hdfs
2017-08-22 15:06:38,046 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Sent signal SIGTERM to pid 5382 as user hdfs for container container_e65_1503383442059_0002_01_000006, result=success
2017-08-22 15:06:38,046 DEBUG org.apache.hadoop.security.UserGroupInformation: PrivilegedAction as:yarn (auth:SIMPLE) from:org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:338)
2017-08-22 15:06:38,048 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_e65_1503383442059_0002_01_000006 is : 143
2017-08-22 15:06:38,048 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Processing container_e65_1503383442059_0002_01_000006 of type UPDATE_DIAGNOSTICS_MSG
2017-08-22 15:06:38,048 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container container_e65_1503383442059_0002_01_000006 completed with exit code 143
2017-08-22 15:06:38,067 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerExitEvent.EventType: CONTAINER_KILLED_ON_REQUEST
2017-08-22 15:06:38,067 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Processing container_e65_1503383442059_0002_01_000006 of type CONTAINER_KILLED_ON_REQUEST
2017-08-22 15:06:38,067 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_e65_1503383442059_0002_01_000006 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL
2017-08-22 15:06:38,067 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.ContainerLocalizationCleanupEvent.EventType: CLEANUP_CONTAINER_RESOURCES

> ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
> -----------------------------------------------------------------
>
> Key: SPARK-21733
> URL: https://issues.apache.org/jira/browse/SPARK-21733
> Project: Spark
> Issue Type: Bug
> Components: DStreams
> Affects Versions: 2.1.1
> Environment: Apache Spark 2.1.1
> CDH5.12.0 Yarn
> Reporter: Jepson
> Original Estimate: 96h
> Remaining Estimate: 96h
>
> Kafka + Spark Streaming throws these errors:
> {code:java}
> 17/08/15 09:34:14 INFO memory.MemoryStore: Block broadcast_8003_piece0 stored as bytes in memory (estimated size 1895.0 B, free 1643.2 MB)
> 17/08/15 09:34:14 INFO broadcast.TorrentBroadcast: Reading broadcast variable 8003 took 11 ms
> 17/08/15 09:34:14 INFO memory.MemoryStore: Block broadcast_8003 stored as values in memory (estimated size 2.9 KB, free 1643.2 MB)
> 17/08/15 09:34:14 INFO kafka010.KafkaRDD: Beginning offset 10130733 is the same as ending offset skipping kssh 5
> 17/08/15 09:34:14 INFO executor.Executor: Finished task 7.0 in stage 8003.0 (TID 64178). 1740 bytes result sent to driver
> 17/08/15 09:34:21 INFO storage.BlockManager: Removing RDD 8002
> 17/08/15 09:34:21 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 64186
> 17/08/15 09:34:21 INFO executor.Executor: Running task 7.0 in stage 8004.0 (TID 64186)
> 17/08/15 09:34:21 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 8004
> 17/08/15 09:34:21 INFO memory.MemoryStore: Block broadcast_8004_piece0 stored as bytes in memory (estimated size 1895.0 B, free 1643.2 MB)
> 17/08/15 09:34:21 INFO broadcast.TorrentBroadcast: Reading broadcast variable 8004 took 8 ms
> 17/08/15 09:34:21 INFO memory.MemoryStore: Block broadcast_8004 stored as values in memory (estimated size 2.9 KB, free 1643.2 MB)
> 17/08/15 09:34:21 INFO kafka010.KafkaRDD: Beginning offset 10130733 is the same as ending offset skipping kssh 5
> 17/08/15 09:34:21 INFO executor.Executor: Finished task 7.0 in stage 8004.0 (TID 64186). 1740 bytes result sent to driver
> 17/08/15 09:34:29 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
> 17/08/15 09:34:29 INFO storage.DiskBlockManager: Shutdown hook called
> 17/08/15 09:34:29 INFO util.ShutdownHookManager: Shutdown hook called
> {code}
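The submit flags and the ContainersMonitorImpl line can be cross-checked against each other. A minimal sketch, assuming each executor container is sized as `--executor-memory` plus `spark.yarn.executor.memoryOverhead` and ignoring any rounding the YARN scheduler may apply to the request: 4096 MB + 4096 MB = 8192 MB, which matches the "of 8 GB physical memory" limit in the log, and the reported 573.9 MB is roughly 7% of that. `ContainerSizing` and `executorContainerMb` are hypothetical names used only for this sketch, not Spark or YARN APIs.

```scala
// Hypothetical helper (not a Spark or YARN API): estimates the physical-memory
// limit of one executor container, ASSUMING limit = executor heap + overhead
// and no rounding by the YARN scheduler.
object ContainerSizing {
  /** Estimated container size in MB for one executor. */
  def executorContainerMb(executorMemoryMb: Int, memoryOverheadMb: Int): Int =
    executorMemoryMb + memoryOverheadMb

  def main(args: Array[String]): Unit = {
    val executorMemoryMb = 4 * 1024 // --executor-memory 4g
    val overheadMb = 4096           // --conf "spark.yarn.executor.memoryOverhead=4096m"
    val limitMb = executorContainerMb(executorMemoryMb, overheadMb)

    // Physical usage reported by ContainersMonitorImpl shortly before the kill.
    val usedMb = 573.9
    val usedPct = usedMb / limitMb * 100

    println(s"estimated limit: ${limitMb / 1024} GB") // integer GB from 8192 MB
    println(f"reported usage: $usedMb%.1f MB ($usedPct%.1f%% of the limit)")
  }
}
```

Under this assumption the container was far below its physical-memory limit when the NodeManager started killing it.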
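The NodeManager log shows "Sending signal 15 to pid 5382" followed by "Exit code from container ... is : 143", and the executor itself logs "RECEIVED SIGNAL TERM". Under the common convention that a process terminated by signal n reports exit status 128 + n, 143 = 128 + 15, i.e. SIGTERM (137 would point to SIGKILL instead). A small sketch of that decoding; `ExitStatus`, `signalOf` and `describe` are illustrative names, not part of Spark or Hadoop, and only the two signals mentioned here are named.

```scala
// Sketch only: decodes an exit status using the "128 + signal number"
// convention (an assumption about how the status was produced).
object ExitStatus {
  private val SignalNames = Map(9 -> "SIGKILL", 15 -> "SIGTERM")

  /** Returns the signal number if the exit code looks like 128 + n, else None. */
  def signalOf(exitCode: Int): Option[Int] =
    if (exitCode > 128 && exitCode < 256) Some(exitCode - 128) else None

  /** Human-readable description of an exit code. */
  def describe(exitCode: Int): String = signalOf(exitCode) match {
    case Some(sig) => s"terminated by signal $sig (${SignalNames.getOrElse(sig, "unknown")})"
    case None      => s"exited with status $exitCode"
  }

  def main(args: Array[String]): Unit =
    println(describe(143)) // prints: terminated by signal 15 (SIGTERM)
}
```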