[jira] [Comment Edited] (FLINK-17571) A better way to show the files used in currently checkpoints

2020-07-20 Thread Congxian Qiu(klion26) (Jira)


[ https://issues.apache.org/jira/browse/FLINK-17571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161719#comment-17161719 ]

Congxian Qiu(klion26) edited comment on FLINK-17571 at 7/21/20, 5:33 AM:
-

[~pnowojski] could you please assign this to me, if that's OK?

I'd like to add a command-line command, {{./bin/flink checkpoint list $path}}, where $path is either the parent directory of {{_metadata}} or the path of the {{_metadata}} file itself.

The command would output all the files used by the specified checkpoint/savepoint.
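A minimal sketch of how the proposed command could normalize its $path argument, assuming it accepts either a checkpoint/savepoint directory or the {{_metadata}} file itself. The helper name {{resolve_metadata_path}} is hypothetical and not part of Flink; a real implementation would go through Flink's own FileSystem abstraction rather than plain path strings.

```python
import posixpath

METADATA_FILE_NAME = "_metadata"  # file written into each checkpoint/savepoint directory


def resolve_metadata_path(path: str) -> str:
    """Hypothetical helper: map a checkpoint directory or a _metadata
    file path to the path of the _metadata file."""
    normalized = path.rstrip("/")
    if posixpath.basename(normalized) == METADATA_FILE_NAME:
        # The user already pointed at the _metadata file itself.
        return normalized
    # Otherwise treat the argument as the parent directory of _metadata.
    return posixpath.join(normalized, METADATA_FILE_NAME)
```

For example, both {{s3://bucket/cp/job/chk-7}} and {{s3://bucket/cp/job/chk-7/_metadata}} resolve to the same {{_metadata}} path, so the rest of the command only has to deal with one form.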


was (Author: klion26):
[~pnowojski] could you please assign this to me, if that's OK?

I'd like to add a command-line command, {{./bin/flink savepoint list $path}}, where $path is either the parent directory of {{_metadata}} or the path of the {{_metadata}} file itself.

The command would output all the files used by the specified checkpoint/savepoint.

> A better way to show the files used in currently checkpoints
> 
>
> Key: FLINK-17571
> URL: https://issues.apache.org/jira/browse/FLINK-17571
> Project: Flink
>  Issue Type: New Feature
>  Components: Command Line Client, Runtime / Checkpointing
>Reporter: Congxian Qiu(klion26)
>Priority: Major
>
> Inspired by this 
> [user mail|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Shared-Checkpoint-Cleanup-and-S3-Lifecycle-Policy-tt34965.html].
> Currently, there are [three types of directory|https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/state/checkpoints.html#directory-structure]
>  for a checkpoint. Files in the TASKOWNED and EXCLUSIVE directories can be 
> deleted safely, but users can't safely delete files in the SHARED directory 
> (those files may have been created a long time ago).
> I think we should give users a better way to know which files are 
> currently in use (so that the others are known to be unused).
> A command-line command such as the one below may be enough to support this 
> feature:
> {{./bin/flink checkpoint list $checkpointDir  # list all the files used in the checkpoint}}
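To make the request concrete, here is a sketch of the check it describes, assuming the documented per-job layout ({{shared/}}, {{taskowned/}} and {{chk-<n>/}}) with paths given relative to the job's checkpoint directory, and assuming the set of files referenced by the retained checkpoints is already known (for example, from the output of the proposed {{checkpoint list}} command). {{classify}} and {{unreferenced_shared_files}} are hypothetical names.

```python
import re
from typing import Iterable, Set

_EXCLUSIVE_DIR = re.compile(r"chk-\d+")


def classify(relative_path: str) -> str:
    """Directory type of a path relative to the job's checkpoint directory."""
    top = relative_path.split("/", 1)[0]
    if top == "shared":
        return "SHARED"
    if top == "taskowned":
        return "TASKOWNED"
    if _EXCLUSIVE_DIR.fullmatch(top):
        return "EXCLUSIVE"
    return "UNKNOWN"


def unreferenced_shared_files(all_files: Iterable[str], referenced: Set[str]) -> Set[str]:
    """SHARED files that no retained checkpoint references (deletion candidates)."""
    return {f for f in all_files if classify(f) == "SHARED" and f not in referenced}
```

An unreferenced shared file is only a candidate: a checkpoint that is still in progress may already have uploaded it without any {{_metadata}} file pointing to it yet.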



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-17571) A better way to show the files used in currently checkpoints

2020-07-20 Thread Congxian Qiu(klion26) (Jira)


[ https://issues.apache.org/jira/browse/FLINK-17571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161719#comment-17161719 ]

Congxian Qiu(klion26) edited comment on FLINK-17571 at 7/21/20, 5:33 AM:
-

[~pnowojski] could you please assign this to me, if that's OK?

I'd like to add a command-line command, {{./bin/flink checkpoint list $path}}, where $path is either the parent directory of {{_metadata}} or the path of the {{_metadata}} file itself.

The command would output all the files used by the specified checkpoint.


was (Author: klion26):
[~pnowojski] could you please assign this to me, if that's OK?

I'd like to add a command-line command, {{./bin/flink checkpoint list $path}}, where $path is either the parent directory of {{_metadata}} or the path of the {{_metadata}} file itself.

The command would output all the files used by the specified checkpoint/savepoint.



[jira] [Comment Edited] (FLINK-17571) A better way to show the files used in currently checkpoints

2020-06-04 Thread Jiayi Liao (Jira)


[ https://issues.apache.org/jira/browse/FLINK-17571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126311#comment-17126311 ]

Jiayi Liao edited comment on FLINK-17571 at 6/5/20, 2:12 AM:
-

I think we might also need to add an option for cleaning up orphaned (unused) 
files. Recently I had a job that failed to take checkpoints for a few 
hours, and thousands of empty chk-x directories were created on HDFS 
(FsStateBackend). The checkpoint interval is short because of our particular 
business scenario.
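As a rough illustration of the orphan cleanup suggested above, the sketch below finds empty {{chk-<n>}} directories under a job's checkpoint directory. It assumes the directory is reachable through the local filesystem API (for HDFS this would need an HDFS client or Flink's FileSystem abstraction instead), and {{find_empty_checkpoint_dirs}} is a hypothetical helper. It only reports candidates and deletes nothing, since the directory of a checkpoint that is still in progress can also be empty.

```python
import os
import re
from typing import List

_CHK_DIR = re.compile(r"chk-(\d+)")


def find_empty_checkpoint_dirs(job_checkpoint_dir: str) -> List[str]:
    """Return the paths of empty chk-<n> directories, sorted by checkpoint id."""
    found = []
    with os.scandir(job_checkpoint_dir) as entries:
        for entry in entries:
            match = _CHK_DIR.fullmatch(entry.name)
            # Only plain chk-<n> directories with no content are reported.
            if match and entry.is_dir() and not os.listdir(entry.path):
                found.append((int(match.group(1)), entry.path))
    return [path for _, path in sorted(found)]
```

Sorting by the numeric id (rather than by name) keeps {{chk-2}} ahead of {{chk-10}}, which makes it easier to spot the time window in which checkpoints were failing.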


was (Author: wind_ljy):
I think we might also need to add an option for cleaning up orphaned (unused) 
files. Recently I had a job that failed to take checkpoints for a few 
hours, and thousands of empty chk-x directories were created on HDFS. 
(FsStateBackend)



[jira] [Comment Edited] (FLINK-17571) A better way to show the files used in currently checkpoints

2020-05-11 Thread Piotr Nowojski (Jira)


[ https://issues.apache.org/jira/browse/FLINK-17571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104136#comment-17104136 ]

Piotr Nowojski edited comment on FLINK-17571 at 5/11/20, 10:13 AM:
---

[~klion26] I'm not familiar with when, how, and by whom those files are removed, so 
this is mostly guesswork on my part. Besides the scenarios that you described, could 
this also affect savepoints?


was (Author: pnowojski):
[~stevenz3wu] I'm not familiar with when, how, and by whom those files are removed, so 
this is mostly guesswork on my part. Besides the scenarios that you described, could 
this also affect savepoints?

> A better way to show the files used in currently checkpoints
> 
>
> Key: FLINK-17571
> URL: https://issues.apache.org/jira/browse/FLINK-17571
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Checkpointing
>Reporter: Congxian Qiu(klion26)
>Priority: Major
>
> Inspired by this 
> [user mail|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Shared-Checkpoint-Cleanup-and-S3-Lifecycle-Policy-tt34965.html].
> Currently, there are [three types of directory|https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/state/checkpoints.html#directory-structure]
>  for a checkpoint. Files in the TASKOWNED and EXCLUSIVE directories can be 
> deleted safely, but users can't safely delete files in the SHARED directory 
> (those files may have been created a long time ago).
> I think we should give users a better way to know which files are 
> currently in use (so that the others are known to be unused).
> A command-line command such as the one below may be enough to support this 
> feature:
> {{./bin/flink checkpoint list $checkpointDir  # list all the files used in the checkpoint}}



[jira] [Comment Edited] (FLINK-17571) A better way to show the files used in currently checkpoints

2020-05-09 Thread Steven Zhen Wu (Jira)


[ https://issues.apache.org/jira/browse/FLINK-17571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17103518#comment-17103518 ]

Steven Zhen Wu edited comment on FLINK-17571 at 5/9/20, 10:59 PM:
--

[~pnowojski] what would the remove command be used for?

Please correct my understanding of incremental checkpoints.
 * They remove S3 files when their reference count reaches zero, so normally there shouldn't be orphaned checkpoint files lingering around. In some rare cases, the reference-count-based cleanup may not happen or may not succeed, so there is a small chance of orphaned files here.
 * We don't always restore from an externalized checkpoint and continue the same checkpoint lineage (with incremental checkpoints and reference counting). For example, we can restore from a savepoint or from empty state. Those abandoned checkpoint lineages can then leave significant garbage behind.
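The two cases above can be illustrated with a toy reference-counting registry. This is not Flink's actual shared-state registry; the class and method names are hypothetical and only model the idea: each retained checkpoint holds one reference to every shared file it uses, a file is deleted when its count drops to zero, and the files of a lineage that is abandoned are simply never released.

```python
from collections import Counter
from typing import Dict, List, Set


class ToySharedRegistry:
    """Toy model of reference-counted shared checkpoint files (hypothetical)."""

    def __init__(self) -> None:
        self.ref_counts: Counter = Counter()
        self.checkpoints: Dict[int, Set[str]] = {}
        self.deleted: List[str] = []  # files this registry has removed

    def register(self, checkpoint_id: int, files: Set[str]) -> None:
        # A completed checkpoint takes one reference on each shared file it uses.
        self.checkpoints[checkpoint_id] = set(files)
        for f in files:
            self.ref_counts[f] += 1

    def subsume(self, checkpoint_id: int) -> None:
        # Dropping a checkpoint releases its references; files reaching zero are deleted.
        for f in self.checkpoints.pop(checkpoint_id):
            self.ref_counts[f] -= 1
            if self.ref_counts[f] == 0:
                del self.ref_counts[f]
                self.deleted.append(f)
```

In this toy model, if checkpoint 2 is never subsumed because the job was restarted from a savepoint with a fresh registry, the files it still references stay in storage forever, which is the garbage described in the second bullet.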

Here is what I am thinking for the GC:
 # Trace from the roots (the retained externalized checkpoints) to find all live files.
 # Find all files in the S3 bucket/prefix. I heard S3 can send a daily report, so we wouldn't have to list objects.
 # Compute the diff and remove the non-live files (with some safety threshold, e.g. only files older than 30 days).
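The three GC steps above amount to a mark-and-sweep over object storage. A minimal sketch of step 3, assuming the live set from step 1 has already been computed (for example, by listing the files of every retained checkpoint) and the listing from step 2 is available as (key, last-modified) pairs, e.g. parsed from a daily S3 inventory report. {{select_garbage}} is a hypothetical name, and the 30-day default mirrors the threshold suggested above.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Set, Tuple


def select_garbage(
    live_files: Set[str],
    listed_objects: Iterable[Tuple[str, datetime]],
    min_age: timedelta = timedelta(days=30),
    now: Optional[datetime] = None,
) -> List[str]:
    """Objects that are not live and are older than the safety threshold."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - min_age
    return sorted(
        key
        for key, last_modified in listed_objects
        if key not in live_files and last_modified < cutoff
    )
```

The age threshold is what makes the sweep tolerant of a stale report and of in-flight checkpoints: a recently uploaded file that no retained checkpoint references yet is kept rather than deleted.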


was (Author: stevenz3wu):
[~pnowojski] what would the remove command be used for?

Please correct my understanding of incremental checkpoints.
 * They remove S3 files when their reference count reaches zero, so normally there shouldn't be orphaned checkpoint files lingering around. In some rare cases, the reference-count-based cleanup may not happen or may not succeed, so there is a small chance of orphaned files here.
 * We don't always restore from an externalized checkpoint and continue the same checkpoint lineage. For example, we can restore from a savepoint or from empty state. Those abandoned checkpoint lineages can then leave significant garbage behind.

Here is what I am thinking for the GC:
 # Trace from the roots (the retained externalized checkpoints) to find all live files.
 # Find all files in the S3 bucket/prefix. I heard S3 can send a daily report, so we wouldn't have to list objects.
 # Compute the diff and remove the non-live files (with some safety threshold, e.g. only files older than 30 days).



[jira] [Comment Edited] (FLINK-17571) A better way to show the files used in currently checkpoints

2020-05-08 Thread Piotr Nowojski (Jira)


[ https://issues.apache.org/jira/browse/FLINK-17571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102482#comment-17102482 ]

Piotr Nowojski edited comment on FLINK-17571 at 5/8/20, 11:22 AM:
--

Do you mean that:
{noformat}
./bin/flink checkpoint list $checkpointDir  # list all the files used in the checkpoint
{noformat}
would also return the paths to shared files from different checkpoints? If so, 
yes, that would be helpful :)



was (Author: pnowojski):
Do you mean that:
{noformat}
./bin/flink checkpoint list $checkpointDir  # list all the files used in the checkpoint
{noformat}
would also return the paths to shared files from different checkpoints?

