[jira] [Commented] (FLINK-9465) Specify a separate savepoint timeout option via CLI

2021-10-17 Thread Feifan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429781#comment-17429781
 ] 

Feifan Wang commented on FLINK-9465:


Hi [~trohrmann], the [PR|https://github.com/apache/flink/pull/17443] is ready.
Could you help review it?

> Specify a separate savepoint timeout option via CLI
> ---
>
> Key: FLINK-9465
> URL: https://issues.apache.org/jira/browse/FLINK-9465
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Checkpointing
> Affects Versions: 1.5.0
> Reporter: Truong Duc Kien
> Assignee: Feifan Wang
> Priority: Minor
> Labels: auto-deprioritized-major, pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Savepoints can take much longer to complete than checkpoints, especially
> with incremental checkpointing enabled. This causes two problems:
>  * For our job, we currently have to set the checkpoint timeout much larger
> than necessary; otherwise we would be unable to take a savepoint.
>  * During rush hour, our cluster encounters a high rate of checkpoint
> timeouts due to backpressure, but we are unable to migrate to a larger
> configuration because savepoints also time out.
> In my opinion, the savepoint timeout should be configurable separately,
> both in the config file and as a parameter to the savepoint command.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-9465) Specify a separate savepoint timeout option via CLI

2021-10-15 Thread Feifan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429326#comment-17429326
 ] 

Feifan Wang commented on FLINK-9465:


I am still working on this, [~flink-jira-bot].



[jira] [Commented] (FLINK-9465) Specify a separate savepoint timeout option via CLI

2021-10-09 Thread Feifan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17426559#comment-17426559
 ] 

Feifan Wang commented on FLINK-9465:


Hi [~trohrmann], I have opened a pull request to resolve this, but I think some
unit tests still need to be completed. Could you take a look at the PR and
give me some guidance on the unit tests?



[jira] [Commented] (FLINK-9465) Specify a separate savepoint timeout option via CLI

2021-10-08 Thread Feifan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17426306#comment-17426306
 ] 

Feifan Wang commented on FLINK-9465:


Hi [~trohrmann], thanks for pointing me to FLINK-15787. After reading it, I
fully agree. I will name the parameter "savepointTimeout" in all four places
mentioned above.



[jira] [Commented] (FLINK-9465) Specify a separate savepoint timeout option via CLI

2021-10-08 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17426276#comment-17426276
 ] 

Till Rohrmann commented on FLINK-9465:
--

Hi [~Feifan Wang], sorry for my late reply. I think we should use camel case 
for the parameter. I think we have agreed on this in FLINK-15787. For the CLI 
we should also name it {{savepointTimeout}}.
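
To make the naming decision concrete, here is a minimal sketch of what the two REST request bodies might look like with the camel-case parameter added. It is an illustration under stated assumptions, not the implementation from the PR: the class and method names are hypothetical, the existing field names ("target-directory", "cancel-job", "targetDirectory", "drain") are taken from my reading of the 1.14 REST documentation, {{savepointTimeout}} is only the name proposed in this thread, and treating the value as milliseconds is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class SavepointRequestBodies {

    // Renders a flat map as a JSON object. Values are emitted verbatim, so
    // string values must already be quoted by the caller. No escaping is done,
    // which is acceptable for a sketch but not for production code.
    static String toJson(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":" + e.getValue())
                .collect(Collectors.joining(",", "{", "}"));
    }

    // Body for POST /jobs/:jobid/savepoints. The existing fields are assumed
    // to use kebab case; savepointTimeout is the hypothetical new field and is
    // omitted entirely when no value is given.
    static String triggerSavepointBody(String targetDirectory, boolean cancelJob,
                                       Long savepointTimeoutMs) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("target-directory", "\"" + targetDirectory + "\"");
        fields.put("cancel-job", String.valueOf(cancelJob));
        if (savepointTimeoutMs != null) {
            fields.put("savepointTimeout", String.valueOf(savepointTimeoutMs));
        }
        return toJson(fields);
    }

    // Body for POST /jobs/:jobid/stop. The existing fields are assumed to use
    // camel case already, so the new field matches their style.
    static String stopWithSavepointBody(String targetDirectory, boolean drain,
                                        Long savepointTimeoutMs) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("targetDirectory", "\"" + targetDirectory + "\"");
        fields.put("drain", String.valueOf(drain));
        if (savepointTimeoutMs != null) {
            fields.put("savepointTimeout", String.valueOf(savepointTimeoutMs));
        }
        return toJson(fields);
    }

    public static void main(String[] args) {
        System.out.println(triggerSavepointBody("s3://bucket/savepoints", false, 1_800_000L));
        System.out.println(stopWithSavepointBody("s3://bucket/savepoints", false, null));
    }
}
```

Leaving the field out when no value is supplied keeps requests from existing clients unchanged.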



[jira] [Commented] (FLINK-9465) Specify a separate savepoint timeout option via CLI

2021-10-05 Thread Feifan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17424536#comment-17424536
 ] 

Feifan Wang commented on FLINK-9465:


Hi [~trohrmann],

Since the two REST APIs mentioned above use the POST method, I would prefer to
add the parameter to the body of the HTTP request, like the other parameters.

I would like to name the parameter simply "savepoint-timeout" or
"savepointTimeout":
 * "savepoint-timeout" for
[REST API: /jobs/:jobid/savepoints|https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/ops/rest_api/#jobs-jobid-savepoints]
 and
[CLI: Creating a Savepoint|https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/cli/#creating-a-savepoint]
 * "savepointTimeout" for
[REST API: /jobs/:jobid/stop|https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/ops/rest_api/#jobs-jobid-stop]
 and
[CLI: Stopping a Job Gracefully Creating a Final Savepoint|https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/cli/#stopping-a-job-gracefully-creating-a-final-savepoint]

The parameter should be optional in all four places. If it is absent, the
checkpoint timeout applies.

What do you think?
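
The fallback rule described in this comment can be sketched as follows. This is only an illustration of the intended semantics, not code from Flink or from the PR: the class and method names are hypothetical, and the durations in `main` are made-up values.

```java
import java.time.Duration;
import java.util.Optional;

final class SavepointTimeoutFallback {

    // Returns the timeout that applies to one savepoint: the per-request value
    // if the caller supplied one, otherwise the job's checkpoint timeout.
    static Duration effectiveTimeout(Optional<Duration> savepointTimeout,
                                     Duration checkpointTimeout) {
        return savepointTimeout.orElse(checkpointTimeout);
    }

    public static void main(String[] args) {
        Duration checkpointTimeout = Duration.ofMinutes(10); // illustrative value

        // No savepoint timeout in the request: the checkpoint timeout applies.
        System.out.println(effectiveTimeout(Optional.empty(), checkpointTimeout));

        // The request carries its own, longer timeout for this savepoint only.
        System.out.println(
                effectiveTimeout(Optional.of(Duration.ofMinutes(30)), checkpointTimeout));
    }
}
```

With this rule the checkpoint timeout can stay short for regular checkpoints, while an individual savepoint request may ask for more time.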



[jira] [Commented] (FLINK-9465) Specify a separate savepoint timeout option via CLI

2021-10-04 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423992#comment-17423992
 ] 

Till Rohrmann commented on FLINK-9465:
--

Do you want to add the parameter as a query parameter or make it part of the
body of the HTTP request?

For the changes in the REST API I would stick to how it's done there. Same for
the CLI parameter formatting.

How would you name the parameter in the CLI and REST API?

The places you suggested make sense for the introduction of the timeout
parameter.



[jira] [Commented] (FLINK-9465) Specify a separate savepoint timeout option via CLI

2021-10-02 Thread Feifan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423539#comment-17423539
 ] 

Feifan Wang commented on FLINK-9465:


Hi [~trohrmann], thanks for the reply. I think we can add the
"savepoint-timeout" parameter in the following four places:

REST API:
 * [/jobs/:jobid/savepoints|https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/ops/rest_api/#jobs-jobid-savepoints]
 * [/jobs/:jobid/stop|https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/ops/rest_api/#jobs-jobid-stop]

Command-Line Interface:
 * [Creating a Savepoint|https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/cli/#creating-a-savepoint]
 * [Stopping a Job Gracefully Creating a Final Savepoint|https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/cli/#stopping-a-job-gracefully-creating-a-final-savepoint]

BTW, I noticed that the REST API and the CLI use different parameter naming
styles: some parameters are in camel case and others are in kebab case. Should
we use a uniform format?



[jira] [Commented] (FLINK-9465) Specify a separate savepoint timeout option via CLI

2021-10-01 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423266#comment-17423266
 ] 

Till Rohrmann commented on FLINK-9465:
--

Hi [~Feifan Wang], I've assigned the ticket to you. Before you start coding 
could you quickly explain how you intend to solve the problem (e.g. which REST 
parameters to add, whether to add a CLI option and if yes what's its name?).



[jira] [Commented] (FLINK-9465) Specify a separate savepoint timeout option via CLI

2021-09-27 Thread Feifan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420579#comment-17420579
 ] 

Feifan Wang commented on FLINK-9465:


Hi [~trohrmann] [~twalthr], this problem bothers us as well. I strongly agree
that it should be possible to specify a value different from the configured
checkpoint timeout via the CLI or REST API. I would be glad to work on it.
Could you assign this issue to me?

> Specify a separate savepoint timeout option via CLI
> ---
>
> Key: FLINK-9465
> URL: https://issues.apache.org/jira/browse/FLINK-9465
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.5.0
>Reporter: Truong Duc Kien
>Priority: Minor
>  Labels: auto-deprioritized-major, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h


[jira] [Commented] (FLINK-9465) Specify a separate savepoint timeout option via CLI

2021-04-29 Thread Flink Jira Bot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336627#comment-17336627
 ] 

Flink Jira Bot commented on FLINK-9465:
---

This issue was labeled "stale-major" 7 days ago and has not received any
updates, so it is being deprioritized. If this ticket is actually Major, please
raise the priority and ask a committer to assign you the issue or revive the
public discussion.


> Specify a separate savepoint timeout option via CLI
> ---
>
> Key: FLINK-9465
> URL: https://issues.apache.org/jira/browse/FLINK-9465
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.5.0
>Reporter: Truong Duc Kien
>Priority: Major
>  Labels: pull-request-available, stale-major
>  Time Spent: 10m
>  Remaining Estimate: 0h


[jira] [Commented] (FLINK-9465) Specify a separate savepoint timeout option via CLI

2021-04-22 Thread Flink Jira Bot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17328626#comment-17328626
 ] 

Flink Jira Bot commented on FLINK-9465:
---

This major issue is unassigned, and neither it nor any of its Sub-Tasks has
been updated for 30 days, so it has been labeled "stale-major". If this ticket
is indeed "major", please either assign yourself or give an update. Afterwards,
please remove the label. In 7 days the issue will be deprioritized.



[jira] [Commented] (FLINK-9465) Specify a separate savepoint timeout option via CLI

2021-01-18 Thread Timo Walther (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17267135#comment-17267135
 ] 

Timo Walther commented on FLINK-9465:
-

It seems nobody is working on this issue anymore, so I marked it as unassigned.
There was also a thread on the user@ ML on this topic recently:
https://lists.apache.org/thread.html/rac24855efe372b09b025a1eeb1c8111c9bc8c216265ce94cbf0d3880%40%3Cuser.flink.apache.org%3E

> Specify a separate savepoint timeout option via CLI
> ---
>
> Key: FLINK-9465
> URL: https://issues.apache.org/jira/browse/FLINK-9465
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.5.0
>Reporter: Truong Duc Kien
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h