[jira] [Commented] (FLINK-11336) Flink HA didn't remove ZK metadata

2019-03-01 Thread shengjk1 (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781683#comment-16781683
 ] 

shengjk1 commented on FLINK-11336:
--

[~till.rohrmann] Yes, I think so too. Thank you.

> Flink HA didn't remove ZK metadata
> --
>
> Key: FLINK-11336
> URL: https://issues.apache.org/jira/browse/FLINK-11336
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Reporter: shengjk1
>Assignee: Till Rohrmann
>Priority: Major
> Attachments: image-2019-01-15-19-42-21-902.png
>
>
> Flink HA does not remove its ZooKeeper metadata.
> For example, in the ZooKeeper CLI: ls /flinkone
> !image-2019-01-15-19-42-21-902.png!
>  
> I suggest that we delete this metadata when the application is cancelled or 
> throws an exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-11336) Flink HA didn't remove ZK metadata

2019-03-01 Thread Till Rohrmann (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781656#comment-16781656
 ] 

Till Rohrmann commented on FLINK-11336:
---

I've opened FLINK-11789 to track the checkpoint directory cleanup, 
[~shengjk1].



[jira] [Commented] (FLINK-11336) Flink HA didn't remove ZK metadata

2019-03-01 Thread Till Rohrmann (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781641#comment-16781641
 ] 

Till Rohrmann commented on FLINK-11336:
---

Hi [~shengjk1], I think you are right that we should also delete the checkpoint 
directories {{jobId/shared}} and {{jobId/taskowned}} if the job reaches a 
globally terminal state.

In order not to blow up the scope of this issue, I would, however, suggest 
creating a separate issue for the cleanup of these directories. This issue tries 
to address the problems of the ZooKeeper metadata cleanup.
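
The directory cleanup proposed in this comment can be illustrated with a small sketch. It is not Flink code: the function name, the use of a local filesystem in place of HDFS, and the assumed layout <checkpoint root>/<jobId>/{shared,taskowned} are assumptions made for illustration. The status names mirror the globally terminal states of Flink's JobStatus (FINISHED, FAILED, CANCELED) but are hard-coded here.

```python
import shutil
from pathlib import Path
from typing import List

# Hard-coded for illustration; mirrors the globally terminal job states.
GLOBALLY_TERMINAL = frozenset({"FINISHED", "FAILED", "CANCELED"})


def cleanup_job_checkpoint_dirs(checkpoint_root: Path, job_id: str, job_status: str) -> List[Path]:
    """Delete <root>/<job_id>/shared and <root>/<job_id>/taskowned.

    Hypothetical helper: nothing is deleted unless the job has reached a
    globally terminal state. Returns the directories that were removed.
    """
    if job_status not in GLOBALLY_TERMINAL:
        return []

    removed = []
    for name in ("shared", "taskowned"):
        target = checkpoint_root / job_id / name
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(target)
    return removed
```

The status check comes first so that a job that is still running, or that may yet be recovered, keeps the state it needs for a restore.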



[jira] [Commented] (FLINK-11336) Flink HA didn't remove ZK metadata

2019-03-01 Thread shengjk1 (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781618#comment-16781618
 ] 

shengjk1 commented on FLINK-11336:
--

Hi [~till.rohrmann],

I have some further questions and suggestions:

 1. Will Flink also delete stale directories on HDFS, similar to the ZK 
metadata, for example when a job has failed? I ask because most of the HA 
metadata is stored on HDFS.

 2. When a job is canceled, its metadata is deleted by default, but I think 
the corresponding directories, such as {{jobId}}/shared and 
{{jobId}}/taskowned, should be deleted as well.



[jira] [Commented] (FLINK-11336) Flink HA didn't remove ZK metadata

2019-01-22 Thread shengjk1 (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16749548#comment-16749548
 ] 

shengjk1 commented on FLINK-11336:
--

Yarn (per job or as a session)

> Flink HA didn't remove ZK metadata
> --
>
> Key: FLINK-11336
> URL: https://issues.apache.org/jira/browse/FLINK-11336
> Project: Flink
>  Issue Type: Improvement
>Reporter: shengjk1
>Priority: Major
> Attachments: image-2019-01-15-19-42-21-902.png


[jira] [Commented] (FLINK-11336) Flink HA didn't remove ZK metadata

2019-01-22 Thread Stephan Ewen (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748851#comment-16748851
 ] 

Stephan Ewen commented on FLINK-11336:
--

How did you start Flink?

  - standalone
  - Yarn (per job or as a session)
  - Mesos
  - Container



[jira] [Commented] (FLINK-11336) Flink HA didn't remove ZK metadata

2019-01-19 Thread shengjk1 (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747322#comment-16747322
 ] 

shengjk1 commented on FLINK-11336:
--

1. No matter how Flink is stopped (cancel, failure with no further retries, or 
kill), the metadata is not deleted.

2. After a cancel, a failure with no further retries, or a kill, manually 
deleting the metadata has no effect on newly launched programs, even if there 
is a savepoint.

This is the behavior I observed.



[jira] [Commented] (FLINK-11336) Flink HA didn't remove ZK metadata

2019-01-18 Thread Stephan Ewen (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746275#comment-16746275
 ] 

Stephan Ewen commented on FLINK-11336:
--

Sorry, I cannot follow. What behavior did you observe?



[jira] [Commented] (FLINK-11336) Flink HA didn't remove ZK metadata

2019-01-17 Thread shengjk1 (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745782#comment-16745782
 ] 

shengjk1 commented on FLINK-11336:
--

I am not familiar with batch and bounded streams, so I cannot draw a 
conclusion for them. For unbounded streams, however, when the job has

    failed with no further retries

    been cancelled

we can remove the metadata. As for how to start the job, you can start it 
normally. I have already tried it and saw no problems with 1.8.0_151, 
Flink 1.7.1, CDH 5.13.1.



[jira] [Commented] (FLINK-11336) Flink HA didn't remove ZK metadata

2019-01-17 Thread Stephan Ewen (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745020#comment-16745020
 ] 

Stephan Ewen commented on FLINK-11336:
--

Flink should remove the metadata when the job terminates, which means
  - finished (for batch and bounded streams)
  - failed with no further retries
  - cancelled

It does not remove the metadata if you just kill the YARN application or stop 
all containers. In that case Flink does not know that this was not a failure, 
but an intended shutdown.

Can you confirm that this was a proper termination (as described above)?
If yes, how did you start the Flink job?
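
The rule described in this comment can be written down as a small decision function. This is only a sketch of the stated behavior, not Flink's implementation; the enum and function names are made up for illustration.

```python
from enum import Enum


class JobOutcome(Enum):
    """Ways a job can stop, as listed in the comment (names are illustrative)."""
    FINISHED = "finished (batch and bounded streams)"
    FAILED_NO_RETRIES = "failed with no further retries"
    CANCELLED = "cancelled"
    KILLED_EXTERNALLY = "YARN application killed or all containers stopped"


# Proper terminations: Flink knows the job is done for good.
PROPER_TERMINATIONS = frozenset({
    JobOutcome.FINISHED,
    JobOutcome.FAILED_NO_RETRIES,
    JobOutcome.CANCELLED,
})


def should_remove_ha_metadata(outcome: JobOutcome) -> bool:
    """Return True if the HA metadata is expected to be removed.

    An external kill looks like a failure to Flink, so the metadata is
    kept in that case to allow recovery.
    """
    return outcome in PROPER_TERMINATIONS
```

For example, should_remove_ha_metadata(JobOutcome.KILLED_EXTERNALLY) is False, which matches the case where the YARN application is killed and the ZooKeeper entries stay behind.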

