[ https://issues.apache.org/jira/browse/FLINK-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119634#comment-17119634 ]
Fritz Budiyanto commented on FLINK-17853:
-----------------------------------------

Thanks. We will migrate to 1.10. Feel free to close this ticket. I'll re-open it if it still happens in 1.10.

> JobGraph is not getting deleted after Job cancelation
> -----------------------------------------------------
>
> Key: FLINK-17853
> URL: https://issues.apache.org/jira/browse/FLINK-17853
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.9.2
> Environment: Flink 1.9.2, Zookeeper from AWS MSK
> Reporter: Fritz Budiyanto
> Priority: Major
> Attachments: flinkissue.txt
>
>
> I have been seeing this issue several times: JobGraphs are not cleaned up properly after job deletion. Job deletion is performed with the "flink stop" command. As a result, the JobGraph node lingers in ZK, and when the Flink cluster is restarted, it attempts HA restoration from a non-existent checkpoint, which prevents the Flink cluster from coming up.
>
> 2020-05-19 19:56:21,471 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and sending final execution state FINISHED to JobManager for task Source: kafkaConsumer[update_server] -> (DetectedUpdateMessageConverter -> Sink: update_server.detected_updates, DrivenCoordinatesMessageConverter -> Sink: update_server.driven_coordinates) 588902a8096f49845b09fa1f595d6065.
> 2020-05-19 19:56:21,622 INFO org.apache.flink.runtime.taskexecutor.slot.TaskSlotTable - Free slot TaskSlot(index:0, state:ACTIVE, resource profile: ResourceProfile{cpuCores=1.7976931348623157E308, heapMemoryInMB=2147483647, directMemoryInMB=2147483647, nativeMemoryInMB=2147483647, networkMemoryInMB=2147483647, managedMemoryInMB=642}, allocationId: 29f6a5f83c832486f2d7ebe5c779fa32, jobId: 86a028b3f7aada8ffe59859ca71d6385).
> 2020-05-19 19:56:21,622 INFO org.apache.flink.runtime.taskexecutor.JobLeaderService - Remove job 86a028b3f7aada8ffe59859ca71d6385 from job leader monitoring.
> 2020-05-19 19:56:21,622 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Stopping ZooKeeperLeaderRetrievalService /leader/86a028b3f7aada8ffe59859ca71d6385/job_manager_lock.
> 2020-05-19 19:56:21,623 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Close JobManager connection for job 86a028b3f7aada8ffe59859ca71d6385.
> 2020-05-19 19:56:21,624 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Close JobManager connection for job 86a028b3f7aada8ffe59859ca71d6385.
> 2020-05-19 19:56:21,624 INFO org.apache.flink.runtime.taskexecutor.JobLeaderService - Cannot reconnect to job 86a028b3f7aada8ffe59859ca71d6385 because it is not registered.
> ...
>
> Zookeeper CLI:
> ls /flink/cluster_update/jobgraphs
> [86a028b3f7aada8ffe59859ca71d6385]
>
> Attached are the Flink logs, in reverse order.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
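The manual check in the report (listing `/flink/cluster_update/jobgraphs` after `flink stop`) amounts to a set difference: any JobGraph znode whose job ID is not among the jobs the operator still expects to be running is a leftover that HA recovery would try to restore on restart. The sketch below expresses that check, assuming you already have the zkCli `ls` output as a string and know which job IDs should still be active. The helper names `parse_zk_ls_output` and `find_lingering_jobgraphs` are hypothetical and not part of Flink or ZooKeeper, and the 32-hex-character job ID filter is an assumption based on the IDs seen in the logs above.

```python
import re
from typing import Iterable, List, Set

# Assumption: job IDs look like the ones in the logs above,
# i.e. 32 lowercase hex characters (e.g. 86a028b3f7aada8ffe59859ca71d6385).
JOB_ID_PATTERN = re.compile(r"^[0-9a-f]{32}$")


def parse_zk_ls_output(line: str) -> List[str]:
    """Parse a zkCli `ls` result such as "[a, b]" into a list of child names.

    Hypothetical helper: zkCli prints children as a bracketed,
    comma-separated list; an empty node prints "[]".
    """
    inner = line.strip().strip("[]")
    return [child.strip() for child in inner.split(",") if child.strip()]


def find_lingering_jobgraphs(children: Iterable[str], active_job_ids: Set[str]) -> List[str]:
    """Return JobGraph znode names that belong to no job expected to be running.

    Hypothetical helper, not a Flink API: children that do not look like
    job IDs are ignored, everything else not in `active_job_ids` is reported.
    """
    lingering = []
    for child in children:
        if not JOB_ID_PATTERN.match(child):
            # Skip entries that do not look like job IDs (e.g. lock nodes).
            continue
        if child not in active_job_ids:
            lingering.append(child)
    return sorted(lingering)


if __name__ == "__main__":
    # Output of `ls /flink/cluster_update/jobgraphs` as shown in the report.
    zk_output = "[86a028b3f7aada8ffe59859ca71d6385]"
    children = parse_zk_ls_output(zk_output)
    # The job was stopped with `flink stop`, so no job is expected to be active.
    print(find_lingering_jobgraphs(children, active_job_ids=set()))
```

With the report's ZooKeeper output and no active jobs, the sketch flags `86a028b3f7aada8ffe59859ca71d6385` as lingering. It only reports candidates; whether it is safe to delete such a node by hand depends on the cluster's HA setup and is outside the scope of this sketch.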