Robert Metzger created FLINK-3078: ------------------------------------- Summary: JobManager does not shutdown when checkpointed jobs are running Key: FLINK-3078 URL: https://issues.apache.org/jira/browse/FLINK-3078 Project: Flink Issue Type: Bug Components: JobManager Affects Versions: 0.10.0, 0.10.1 Reporter: Robert Metzger
While testing the 0.10.1 release, I found that the JobManager does not shutdown when I'm stopping it while a streaming job is running. It seems that the checkpoint coordinator and the execution graph are still logging, even though the JobManager actor system and other services have been shut down. This is a log file of an affected JobManager: https://gist.github.com/rmetzger/a1532c18eb7081977cee {code} 11:58:04,406 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Sink: Unnamed (10/10) (c6544ca6d88e2d1acdec5c838d5fce06) switched from CANCELING to FAILED 11:58:04,406 DEBUG org.apache.flink.runtime.executiongraph.ExecutionGraph - Kafka Consumer Topology switched from FAILING to RESTARTING. 11:58:04,407 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Delaying retry of job execution for 100000 ms ... 11:58:04,417 INFO org.apache.flink.runtime.blob.BlobServer - Stopped BLOB server at 0.0.0.0:44904 11:58:04,421 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon. 11:58:04,422 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut down; proceeding with flushing remote transports. 11:58:04,446 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down. 11:58:04,473 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Removing web root dir /tmp/flink-web-2039bed3-d9f9-4950-83ab-6fb70f7fc302 11:58:04,590 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 66 @ 1448452684590 11:58:04,590 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: Custom Source (1/10) is not being executed at the moment. Aborting checkpoint. 11:58:05,091 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 67 @ 1448452685091 11:58:05,091 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: Custom Source (1/10) is not being executed at the moment. Aborting checkpoint. 11:58:05,590 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 68 @ 1448452685590 11:58:05,590 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: Custom Source (1/10) is not being executed at the moment. Aborting checkpoint. 11:58:06,090 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 69 @ 1448452686090 11:58:06,091 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: Custom Source (1/10) is not being executed at the moment. Aborting checkpoint. 11:58:06,590 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 70 @ 1448452686590 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)