[jira] [Commented] (FLINK-11336) Flink HA didn't remove ZK metadata
    [ https://issues.apache.org/jira/browse/FLINK-11336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781683#comment-16781683 ]

shengjk1 commented on FLINK-11336:
----------------------------------

[~till.rohrmann] Yes, I think so too. Thank you.

> Flink HA didn't remove ZK metadata
> ----------------------------------
>
>                 Key: FLINK-11336
>                 URL: https://issues.apache.org/jira/browse/FLINK-11336
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>            Reporter: shengjk1
>            Assignee: Till Rohrmann
>            Priority: Major
>         Attachments: image-2019-01-15-19-42-21-902.png
>
>
> Flink HA didn't remove ZK metadata
> such as
> go to zk cli : ls /flinkone
> !image-2019-01-15-19-42-21-902.png!
>
> i suggest we should delete this metadata when the application cancel or
> throw exception

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (FLINK-11336) Flink HA didn't remove ZK metadata
    [ https://issues.apache.org/jira/browse/FLINK-11336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781656#comment-16781656 ]

Till Rohrmann commented on FLINK-11336:
---------------------------------------

[~shengjk1], I've opened FLINK-11789 to track the checkpoint directory cleanup.
[jira] [Commented] (FLINK-11336) Flink HA didn't remove ZK metadata
    [ https://issues.apache.org/jira/browse/FLINK-11336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781641#comment-16781641 ]

Till Rohrmann commented on FLINK-11336:
---------------------------------------

Hi [~shengjk1],

I think you are right that we should also delete the checkpoint directories {{jobId/shared}} and {{jobId/taskowned}} if the job reaches a globally terminal state. To keep the scope of this issue from growing, however, I would suggest creating a separate issue for the cleanup of these directories. This issue addresses only the problems with the ZooKeeper metadata cleanup.
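The directory cleanup discussed in the comment above could look roughly like the following. This is only a minimal sketch under stated assumptions, not Flink's implementation: the class and method names are hypothetical, the `globallyTerminal` flag stands in for the job status check, and a local filesystem accessed through `java.nio.file` stands in for HDFS. It removes only `<jobId>/shared` and `<jobId>/taskowned` and leaves everything else under the job directory alone.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch; these names are illustrative and do not exist in Flink. */
final class CheckpointDirCleanupSketch {

    /**
     * Deletes <checkpointRoot>/<jobId>/shared and <checkpointRoot>/<jobId>/taskowned,
     * but only if the job is globally terminal. Returns the directories that were removed.
     */
    static List<Path> cleanUp(Path checkpointRoot, String jobId, boolean globallyTerminal)
            throws IOException {
        List<Path> removed = new ArrayList<>();
        if (!globallyTerminal) {
            return removed; // the job may still be recovered, so keep its state
        }
        for (String name : new String[] {"shared", "taskowned"}) {
            Path dir = checkpointRoot.resolve(jobId).resolve(name);
            if (Files.isDirectory(dir)) {
                deleteRecursively(dir);
                removed.add(dir);
            }
        }
        return removed;
    }

    /** Deletes children before their parents by visiting paths in reverse order. */
    private static void deleteRecursively(Path dir) throws IOException {
        List<Path> deepestFirst;
        try (Stream<Path> walk = Files.walk(dir)) {
            deepestFirst = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path path : deepestFirst) {
            Files.delete(path);
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway directory layout to exercise the sketch.
        Path root = Files.createTempDirectory("checkpoints");
        Files.createDirectories(root.resolve("job-1").resolve("shared"));
        Files.createDirectories(root.resolve("job-1").resolve("taskowned"));
        System.out.println("removed: " + cleanUp(root, "job-1", true));
    }
}
```

Tying the deletion to the globally terminal check matters because these directories may still be needed while the job can be recovered.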
[jira] [Commented] (FLINK-11336) Flink HA didn't remove ZK metadata
    [ https://issues.apache.org/jira/browse/FLINK-11336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781618#comment-16781618 ]

shengjk1 commented on FLINK-11336:
----------------------------------

Hi [~till.rohrmann],

I have some further questions and suggestions:

1. Will invalid directories on HDFS also be deleted, similar to the ZK metadata, for example when a job has failed? I ask because most of the HA metadata is stored on HDFS.
2. When a job is canceled, the job's metadata is deleted by default, but I think the corresponding directories, such as {{jobId}}/shared and {{jobId}}/taskowned, should be deleted as well.
[jira] [Commented] (FLINK-11336) Flink HA didn't remove ZK metadata
    [ https://issues.apache.org/jira/browse/FLINK-11336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16749548#comment-16749548 ]

shengjk1 commented on FLINK-11336:
----------------------------------

Yarn (per job or as a session)

> Flink HA didn't remove ZK metadata
> ----------------------------------
>
>                 Key: FLINK-11336
>                 URL: https://issues.apache.org/jira/browse/FLINK-11336
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: shengjk1
>            Priority: Major
>         Attachments: image-2019-01-15-19-42-21-902.png
>
>
> Flink HA didn't remove ZK metadata
> such as
> go to zk cli : ls /flinkone
> !image-2019-01-15-19-42-21-902.png!
>
> i suggest we should delete this metadata when the application cancel or
> throw exception
[jira] [Commented] (FLINK-11336) Flink HA didn't remove ZK metadata
    [ https://issues.apache.org/jira/browse/FLINK-11336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748851#comment-16748851 ]

Stephan Ewen commented on FLINK-11336:
--------------------------------------

How did you start Flink?
 - standalone
 - Yarn (per job or as a session)
 - Mesos
 - Container
[jira] [Commented] (FLINK-11336) Flink HA didn't remove ZK metadata
    [ https://issues.apache.org/jira/browse/FLINK-11336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747322#comment-16747322 ]

shengjk1 commented on FLINK-11336:
----------------------------------

This is the behavior I observed:

1. No matter how Flink is stopped (cancel, failed with no further retries, or kill), the metadata is not deleted.
2. After a cancel, a failure with no further retries, or a kill, manually deleting the metadata has no effect on newly launched programs, even if there is a savepoint.
[jira] [Commented] (FLINK-11336) Flink HA didn't remove ZK metadata
    [ https://issues.apache.org/jira/browse/FLINK-11336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746275#comment-16746275 ]

Stephan Ewen commented on FLINK-11336:
--------------------------------------

Sorry, I cannot follow. What is the behavior you observed?
[jira] [Commented] (FLINK-11336) Flink HA didn't remove ZK metadata
    [ https://issues.apache.org/jira/browse/FLINK-11336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745782#comment-16745782 ]

shengjk1 commented on FLINK-11336:
----------------------------------

I am not familiar with batch and bounded streams, so I cannot draw a conclusion for them. For unbounded streams, however, we can remove the metadata when the job has failed with no further retries or has been cancelled.

As for how to start the job: you can start it normally. I have already tried it and saw no problems with 1.8.0_151, Flink 1.7.1, CDH 5.13.1.
[jira] [Commented] (FLINK-11336) Flink HA didn't remove ZK metadata
    [ https://issues.apache.org/jira/browse/FLINK-11336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745020#comment-16745020 ]

Stephan Ewen commented on FLINK-11336:
--------------------------------------

Flink should remove the metadata when the job terminates, which means
 - finished (for batch and bounded streams)
 - failed with no further retries
 - cancelled

It does not remove the metadata if you just kill the YARN application or stop all containers. In that case Flink cannot tell that this was an intended shutdown rather than a failure.

Can you confirm that this was a proper termination (as described above)? If yes, which way did you start the Flink job?
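The rule described in the comment above can be written down as a small decision function. This is a minimal sketch for illustration only: the class, enum, and method names are hypothetical and are not Flink's own types. It encodes just the distinction made in the comment between a proper termination and a killed process.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch; these names are illustrative and do not exist in Flink. */
final class HaMetadataCleanupSketch {

    /** The ways a job can stop, as listed in the comment above. */
    enum StopReason {
        FINISHED,          // batch or bounded stream ran to completion
        FAILED_NO_RETRIES, // failed with no further retries
        CANCELLED,         // cancelled by the user
        PROCESS_KILLED     // YARN application killed or all containers stopped
    }

    /** Proper terminations: the job itself reached a final state. */
    private static final Set<StopReason> PROPER_TERMINATIONS =
            EnumSet.of(StopReason.FINISHED, StopReason.FAILED_NO_RETRIES, StopReason.CANCELLED);

    /**
     * Returns true if the HA metadata should be removed. A killed process looks
     * like a failure from the outside, so the metadata is kept for recovery.
     */
    static boolean shouldRemoveMetadata(StopReason reason) {
        return PROPER_TERMINATIONS.contains(reason);
    }

    public static void main(String[] args) {
        for (StopReason reason : StopReason.values()) {
            System.out.println(reason + " -> remove metadata: " + shouldRemoveMetadata(reason));
        }
    }
}
```

Killing the application is deliberately excluded from the set: from the outside it is indistinguishable from a crash, and keeping the metadata is what allows recovery after a crash.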