[jira] [Comment Edited] (FLINK-29634) Support periodic checkpoint triggering

2023-01-02 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653712#comment-17653712
 ] 

Jiale Tan edited comment on FLINK-29634 at 1/3/23 12:50 AM:


I see. [~gyfora] Thanks for the input. I checked the pom files again and your 
solution makes sense.

One more question: how can we deal with the 
{{io.fabric8.kubernetes.client.server.mock.*}} classes? 
They are not part of {{flink-kubernetes}}, but are introduced 
[here|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L78],
so there are no shaded versions of them.

I am running into issues 
[here|https://github.com/apache/flink-kubernetes-operator/blob/7ced741f51a99f2093ce8a45c8c92879a247f836/flink-kubernetes-standalone/src/test/java/org/apache/flink/kubernetes/operator/kubeclient/Fabric8FlinkStandaloneKubeClientTest.java#L58]:
the code expects an 
{{org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.NamespacedKubernetesClient}}
object for {{kubernetesClient}}, but {{mockServer.createClient()}} creates an 
{{io.fabric8.kubernetes.client.NamespacedKubernetesClient}}.
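
For context, the mock-server classes come in as a plain test-scoped dependency, roughly like the sketch below (the version shown is an assumption, not taken from the actual pom), which is why the shading in {{flink-kubernetes}} never touches them:

```xml
<!-- Hypothetical sketch of the test-only dependency: it is resolved directly
     from io.fabric8, so its classes are never shaded or relocated the way the
     client classes bundled in flink-kubernetes 1.17 are. -->
<dependency>
  <groupId>io.fabric8</groupId>
  <artifactId>kubernetes-server-mock</artifactId>
  <version>6.2.0</version>
  <scope>test</scope>
</dependency>
```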



> Support periodic checkpoint triggering
> --
>
> Key: FLINK-29634
> URL: https://issues.apache.org/jira/browse/FLINK-29634
> Project: Flink
>  Issue Type: New Feature
>  Components: Kubernetes Operator
>Reporter: Thomas Weise
>Assignee: Jiale Tan
>Priority: Major
>
> Similar to the support for periodic savepoints, the operator should support 
> triggering periodic checkpoints to break the incremental checkpoint chain.
> Support for external triggering will come with 1.17: 
> https://issues.apache.org/jira/browse/FLINK-27101 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)





[jira] [Comment Edited] (FLINK-29634) Support periodic checkpoint triggering

2023-01-01 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653483#comment-17653483
 ] 

Jiale Tan edited comment on FLINK-29634 at 1/2/23 3:28 AM:
---

[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally so I can 
build on the checkpoint REST API classes in 1.17 and start coding. However, I 
found the dependencies a little tricky to deal with, since in Flink 1.17 the 
{{io.fabric8}} classes are shaded and relocated inside {{flink-kubernetes}} by 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
commit.

This causes some imported {{io.fabric8}} classes to no longer be found (these 
classes used to be provided only by {{flink-kubernetes}}). Fixing those is 
comparatively easy: just change the imports from {{io.fabric8}} to 
{{org.apache.flink.kubernetes.shaded.io.fabric8}}.

Things get trickier for {{io.fabric8}} classes that 
{{flink-kubernetes-operator}} declares itself (like 
[these|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L79])
and that are also available in {{flink-kubernetes}}. The relocation in 
Flink 1.17 means {{flink-kubernetes-operator}} functions (like 
[this|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/src/test/java/org/apache/flink/kubernetes/operator/kubeclient/parameters/ParametersTestBase.java#L92])
that take {{io.fabric8}} arguments now require 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} arguments. Meanwhile, 
{{flink-kubernetes}} and {{flink-kubernetes-operator}} use different 
{{io.fabric8}} versions (5.12.3 and 6.2.0, respectively), so changing the 
import packages as in the previous case would downgrade the {{io.fabric8}} 
classes.

There are also certain classes that are not available in {{flink-kubernetes}} 
at all, like 
[these|https://github.com/apache/flink-kubernetes-operator/blob/7ced741f51a99f2093ce8a45c8c92879a247f836/flink-kubernetes-standalone/src/test/java/org/apache/flink/kubernetes/operator/kubeclient/Fabric8FlinkStandaloneKubeClientTest.java#L33-L34]
({{io.fabric8.kubernetes.client.server.mock.*}} classes). Calling functions on 
those classes creates {{io.fabric8}} objects that do not work with the rest of 
the code, which only accepts {{org.apache.flink.kubernetes.shaded.io.fabric8}} 
objects.

Two workarounds I can think of:
 # Have a dedicated subproject just to relocate the {{io.fabric8}} classes in 
{{flink-kubernetes}} back (or, the other way around, relocate everything to 
{{org.apache.flink.kubernetes.shaded.io.fabric8}}); both seem a little ugly.
 # Ignore the {{io.fabric8}} classes from {{flink-kubernetes}} and add all 
required {{io.fabric8}} dependencies to the pom.xml; I am not sure it is OK to 
upgrade all of them to 6.2.0.

I am wondering if you folks have any advice on how to deal with this. 
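
To make the first workaround concrete, here is a rough sketch of what the relocation could look like with the maven-shade-plugin; the direction of relocation, the pattern, and where the plugin would live are illustrative assumptions, not a concrete proposal:

```xml
<!-- Hypothetical shade configuration: relocate the operator's own io.fabric8
     classes to the same prefix that flink-kubernetes 1.17 uses, so both sides
     agree on a single package. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>io.fabric8</pattern>
            <shadedPattern>org.apache.flink.kubernetes.shaded.io.fabric8</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```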


ugly.
 # ignore the {{io.fabric8}} classes from {{flink-kubernetes}} and add all 
required {{io.fabric8}} dependencies to pom.xml , however it seems 
{{flink-kubernetes}} vs {{flink-kubernetes-operator}} are using different 
{{io.fabric8}} versions (5.12.3 vs 6.2.0 respectively) , not sure if it is ok 
to upgrade all dependencies version to 6.2.0.

I am wondering if you folks have some good advices how to deal with this. 


was (Author: JIRAUSER290356):
[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes arguments. The tricky 
thing is: there are certain {{io.fabric8}} classes in 
{{flink-kubernetes-operator}} like 
[these|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L79]
 (not provided by {{{}flink-kubernetes{}}}), so their package name will stick 
with {{io.fabric8}} and lead to compilation failure.

2 work arounds I can think of
 # to have a dedicated subproject just to relocate back the {{io.fabric8}} 
classes in {{{}flink-kubernetes{}}}, (or other way around relocate things to 
{{{}org.apache.flink.kubernetes.shaded.io.fabric8{}}}) which seems a little bit 
ugly.
 # ignore the {{io.fabric8}} classes from {{flink-kubernetes}} and add all 
required {{io.fabric8}} dependencies to pom.xml , however it seems 
{{flink-kubernetes}} vs {{flink-kubernetes-operator}} are using different 
{{io.fabric8}} versions (5.12.3 vs 6.2.0 respectively) , not sure if it is ok 
to upgrade all dependencies version to 6.2.0.

I am wondering if you folks have some good advices how to deal with this. 

> Support periodic checkpoint triggering
> --
>
> Key: FLINK-29634
> URL: https://issues.apache.org/jira/browse/FLINK-29634
> Project: Flink
>  Issue Type: New Feature
>  Components: Kubernetes Operator
>Reporter: Thomas Weise
>Assignee: Jiale Tan
>Priority: Major
>
> Similar to the support for periodic savepoints, the operator should support 
> triggering periodic checkpoints to break the incremental checkpoint chain.
> Support for external triggering will come with 1.17: 
> https://issues.apache.org/jira/browse/FLINK-27101 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-29634) Support periodic checkpoint triggering

2023-01-01 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653483#comment-17653483
 ] 

Jiale Tan edited comment on FLINK-29634 at 1/2/23 2:44 AM:
---

[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes arguments. The tricky 
thing is: there are certain {{io.fabric8}} classes in 
{{flink-kubernetes-operator}} like 
[these|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L79]
 (not provided by {{{}flink-kubernetes{}}}), so their package name will stick 
with {{io.fabric8}} and lead to compilation failure.

2 work arounds I can think of
 # to have a dedicated subproject just to relocate back the {{io.fabric8}} 
classes in {{{}flink-kubernetes{}}}, (or other way around relocate things to 
{{{}org.apache.flink.kubernetes.shaded.io.fabric8{}}}) which seems a little bit 
ugly.
 # ignore the {{io.fabric8}} classes from {{flink-kubernetes}} and add all 
required {{io.fabric8}} dependencies to pom.xml , however it seems 
{{flink-kubernetes}} vs {{flink-kubernetes-operator}} are using different 
{{io.fabric8}} versions (5.12.3 vs 6.2.0 respectively) , not sure if it is ok 
to upgrade all dependencies version to 6.2.0.

I am wondering if you folks have some good advices how to deal with this. 


was (Author: JIRAUSER290356):
[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes arguments. The tricky 
thing is: there are certain {{io.fabric8}} classes in 
{{flink-kubernetes-operator}} like 
[these|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L79]
 (not provided by {{{}flink-kubernetes{}}}), so their package name will stick 
with {{io.fabric8}} and lead to compilation failure.

2 work arounds I can think of
 # to have a dedicated subproject just to relocate back the {{io.fabric8}} 
classes in {{{}flink-kubernetes{}}}, (or other way around relocate things to 
{{{}org.apache.flink.kubernetes.shaded.io.fabric8{}}}) which seems a little bit 
ugly.
 # ignore the {{io.fabric8}} classes from {{flink-kubernetes}} and add all 
required {{io.fabric8}} dependencies to pom.xml , however it seems 
{{flink-kubernetes}} vs {{flink-kubernetes-operator}} are using different 
{{io.fabric8}} versions (5.12.3 vs 6.2.0 respectively) , not sure if it is ok 
to upgrade all dependencies version to 6.2.0.

I am wondering if you folks have some good practices to deal with this. 

> Support periodic checkpoint triggering
> --
>
> Key: FLINK-29634
> URL: https://issues.apache.org/jira/browse/FLINK-29634
> Project: Flink
>  Issue Type: New Feature
>  Components: Kubernetes Operator
>Reporter: Thomas Weise
>Assignee: Jiale Tan
>Priority: Major
>
> Similar to the support for periodic savepoints, the operator should support 
> triggering periodic checkpoints to break the incremental checkpoint chain.
> Support for external triggering will come with 1.17: 
> https://issues.apache.org/jira/browse/FLINK-27101 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-29634) Support periodic checkpoint triggering

2023-01-01 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653483#comment-17653483
 ] 

Jiale Tan edited comment on FLINK-29634 at 1/2/23 2:43 AM:
---

[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes arguments. The tricky 
thing is: there are certain {{io.fabric8}} classes in 
{{flink-kubernetes-operator}} like 
[these|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L79]
 (not provided by {{{}flink-kubernetes{}}}), so their package name will stick 
with {{io.fabric8}} and lead to compilation failure.

2 work arounds I can think of
 # to have a dedicated subproject just to relocate back the {{io.fabric8}} 
classes in {{{}flink-kubernetes{}}}, (or other way around relocate things to 
{{{}org.apache.flink.kubernetes.shaded.io.fabric8{}}}) which seems a little bit 
ugly.
 # ignore the {{io.fabric8}} classes from {{flink-kubernetes}} and add all 
required {{io.fabric8}} dependencies to pom.xml , however it seems 
{{flink-kubernetes}} vs {{flink-kubernetes-operator}} are using different 
{{io.fabric8}} versions (5.12.3 vs 6.2.0 respectively) , not sure if it is ok 
to upgrade all dependencies version to 6.2.0.

I am wondering if you folks have some good practices to deal with this. 


was (Author: JIRAUSER290356):
[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes arguments. The tricky 
thing is: there are certain {{io.fabric8}} classes in 
{{flink-kubernetes-operator}} like 
[these|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L79]
 (not provided by {{flink-kubernetes}}), so their package name will stick with 
{{io.fabric8}} and lead to compilation failure.

2 work arounds I can think of
 # to have a dedicated subproject just to relocate back the {{io.fabric8}} 
classes in {{flink-kubernetes}}, (or other way around relocate things to 
{{org.apache.flink.kubernetes.shaded.io.fabric8}}) which seems a little bit 
ugly.
 # ignore the {{io.fabric8}} classes from {{flink-kubernetes}} and add all 
required {{io.fabric8}} dependencies to pom.xml , however it seems 
{{flink-kubernetes}} (5.12.3) vs {{flink-kubernetes-operator}} (6.2.0) are 
using different {{io.fabric8}} versions, not sure if it is ok to upgrade all 
dependencies version to 6.2.0.

I am wondering if you folks have some good practices to deal with this. 

> Support periodic checkpoint triggering
> --
>
> Key: FLINK-29634
> URL: https://issues.apache.org/jira/browse/FLINK-29634
> Project: Flink
>  Issue Type: New Feature
>  Components: Kubernetes Operator
>Reporter: Thomas Weise
>Assignee: Jiale Tan
>Priority: Major
>
> Similar to the support for periodic savepoints, the operator should support 
> triggering periodic checkpoints to break the incremental checkpoint chain.
> Support for external triggering will come with 1.17: 
> https://issues.apache.org/jira/browse/FLINK-27101 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-29634) Support periodic checkpoint triggering

2023-01-01 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653483#comment-17653483
 ] 

Jiale Tan edited comment on FLINK-29634 at 1/2/23 2:41 AM:
---

[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes arguments. The tricky 
thing is: there are certain {{io.fabric8}} classes in 
{{flink-kubernetes-operator}} like 
[these|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L79]
 (not provided by {{flink-kubernetes}}), so their package name will stick with 
{{io.fabric8}} and lead to compilation failure.

2 work arounds I can think of
 # to have a dedicated subproject just to relocate back the {{io.fabric8}} 
classes in {{flink-kubernetes}}, (or other way around relocate things to 
{{org.apache.flink.kubernetes.shaded.io.fabric8}}) which seems a little bit 
ugly.
 # ignore the {{io.fabric8}} classes from {{flink-kubernetes}} and add all 
required {{io.fabric8}} dependencies to pom.xml , however it seems 
{{flink-kubernetes}} (5.12.3) vs {{flink-kubernetes-operator}} (6.2.0) are 
using different {{io.fabric8}} versions, not sure if it is ok to upgrade all 
dependencies version to 6.2.0.

I am wondering if you folks have some good practices to deal with this. 


was (Author: JIRAUSER290356):
[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes arguments. The tricky 
thing is: there are certain {{io.fabric8}} classes in 
{{flink-kubernetes-operator}} like 
[these|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L79]
 (not provided by {{{}flink-kubernetes{}}}), so their package name will stick 
with {{io.fabric8}} and lead to compilation failure.

2 work arounds I can think of
 # to have a dedicated subproject just to relocate back the {{io.fabric8}} 
classes in {{{}flink-kubernetes{}}}, (or other way around relocate things to 
{{{}org.apache.flink.kubernetes.shaded.io.fabric8{}}}) which seems a little bit 
ugly.
 # ignore the {{io.fabric8 }}classes from{{ }}{{flink-kubernetes }}and add all 
required{{ }}{{io.fabric8 }}dependencies to pom.xml , however it seems 
{{flink-kubernetes }} (5.12.3) vs {{flink-kubernetes-operator}} (6.2.0) are 
using different {{io.fabric8}} versions, not sure if it is ok to upgrade all 
dependencies version to 6.2.0.{{ }}{{{}{}}}{{{}{}}}

I am wondering if you folks have some good practices to deal with this. 

> Support periodic checkpoint triggering
> --
>
> Key: FLINK-29634
> URL: https://issues.apache.org/jira/browse/FLINK-29634
> Project: Flink
>  Issue Type: New Feature
>  Components: Kubernetes Operator
>Reporter: Thomas Weise
>Assignee: Jiale Tan
>Priority: Major
>
> Similar to the support for periodic savepoints, the operator should support 
> triggering periodic checkpoints to break the incremental checkpoint chain.
> Support for external triggering will come with 1.17: 
> https://issues.apache.org/jira/browse/FLINK-27101 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-29634) Support periodic checkpoint triggering

2023-01-01 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653483#comment-17653483
 ] 

Jiale Tan edited comment on FLINK-29634 at 1/2/23 2:38 AM:
---

[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes arguments. The tricky 
thing is: there are certain {{io.fabric8}} classes in 
{{flink-kubernetes-operator}} like 
[these|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L79]
 (not provided by {{{}flink-kubernetes{}}}), so their package name will stick 
with {{io.fabric8}} and lead to compilation failure.

2 work arounds I can think of
 # to have a dedicated subproject just to relocate back the {{io.fabric8}} 
classes in {{{}flink-kubernetes{}}}, (or other way around relocate things to 
{{{}org.apache.flink.kubernetes.shaded.io.fabric8{}}}) which seems a little bit 
ugly.
 # ignore the {{io.fabric8 }}classes from{{ }}{{flink-kubernetes }}and add all 
required{{ }}{{io.fabric8 }}dependencies to pom.xml , however it seems 
{{flink-kubernetes }} (5.12.3) vs {{flink-kubernetes-operator}} (6.2.0) are 
using different {{io.fabric8}} versions, not sure if it is ok to upgrade all 
dependencies version to 6.2.0.{{ }}{{{}{}}}{{{}{}}}

I am wondering if you folks have some good practices to deal with this. 


was (Author: JIRAUSER290356):
[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes arguments. The tricky 
thing is: there are certain {{io.fabric8}} classes in 
{{flink-kubernetes-operator}} like 
[these|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L79]
 (not provided by {{{}flink-kubernetes{}}}), so their package name will stick 
with {{io.fabric8}} and lead to compilation failure.

One work around I can think of is to have a dedicated subproject just to 
relocate back the {{io.fabric8}} classes in {{{}flink-kubernetes{}}}, (or other 
way around relocate things to 
{{{}org.apache.flink.kubernetes.shaded.io.fabric8{}}}) which seems a little bit 
ugly. 

I am wondering if you folks have some good practices to deal with this. 

> Support periodic checkpoint triggering
> --
>
> Key: FLINK-29634
> URL: https://issues.apache.org/jira/browse/FLINK-29634
> Project: Flink
>  Issue Type: New Feature
>  Components: Kubernetes Operator
>Reporter: Thomas Weise
>Assignee: Jiale Tan
>Priority: Major
>
> Similar to the support for periodic savepoints, the operator should support 
> triggering periodic checkpoints to break the incremental checkpoint chain.
> Support for external triggering will come with 1.17: 
> https://issues.apache.org/jira/browse/FLINK-27101 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-29634) Support periodic checkpoint triggering

2023-01-01 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653483#comment-17653483
 ] 

Jiale Tan edited comment on FLINK-29634 at 1/2/23 2:28 AM:
---

[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes arguments. The tricky 
thing is: there are certain {{io.fabric8}} classes in 
{{flink-kubernetes-operator}} like 
[these|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L79]
 (not provided by {{{}flink-kubernetes{}}}), so their package name will stick 
with io.fabric8 and lead to compilation failure.

One work around I can think of is to have a dedicated subproject just to 
relocate back the {{io.fabric8}} classes in {{{}flink-kubernetes{}}}, (or other 
way around relocate things to 
{{{}org.apache.flink.kubernetes.shaded.io.fabric8{}}}) which seems a little bit 
ugly. 

I am wondering if you folks have some good practices to deal with this. 


was (Author: JIRAUSER290356):
[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes arguments. The tricky 
thing is: there are certain {{io.fabric8}} classes in 
{{flink-kubernetes-operator}} like 
[these|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L79]
 (not provided by {{{}flink-kubernetes{}}}), so their package name will stick 
with {{io.fabric8 }}and lead to compilation failure.

One work around I can think of is to have a dedicated subproject just to 
relocate back the {{io.fabric8}} classes in {{{}flink-kubernetes{}}}, (or other 
way around relocate things to 
{{{}org.apache.flink.kubernetes.shaded.io.fabric8{}}}) which seems a little bit 
ugly. {{}}

I am wondering if you folks have some good practices to deal with this. 

> Support periodic checkpoint triggering
> --
>
> Key: FLINK-29634
> URL: https://issues.apache.org/jira/browse/FLINK-29634
> Project: Flink
>  Issue Type: New Feature
>  Components: Kubernetes Operator
>Reporter: Thomas Weise
>Assignee: Jiale Tan
>Priority: Major
>
> Similar to the support for periodic savepoints, the operator should support 
> triggering periodic checkpoints to break the incremental checkpoint chain.
> Support for external triggering will come with 1.17: 
> https://issues.apache.org/jira/browse/FLINK-27101 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-29634) Support periodic checkpoint triggering

2023-01-01 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653483#comment-17653483
 ] 

Jiale Tan edited comment on FLINK-29634 at 1/2/23 2:28 AM:
---

[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes arguments. The tricky 
thing is: there are certain {{io.fabric8}} classes in 
{{flink-kubernetes-operator}} like 
[these|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L79]
 (not provided by {{{}flink-kubernetes{}}}), so their package name will stick 
with {{io.fabric8}} and lead to compilation failure.

One work around I can think of is to have a dedicated subproject just to 
relocate back the {{io.fabric8}} classes in {{{}flink-kubernetes{}}}, (or other 
way around relocate things to 
{{{}org.apache.flink.kubernetes.shaded.io.fabric8{}}}) which seems a little bit 
ugly. 

I am wondering if you folks have some good practices to deal with this. 


was (Author: JIRAUSER290356):
[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes arguments. The tricky 
thing is: there are certain {{io.fabric8}} classes in 
{{flink-kubernetes-operator}} like 
[these|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L79]
 (not provided by {{{}flink-kubernetes{}}}), so their package name will stick 
with io.fabric8 and lead to compilation failure.

One work around I can think of is to have a dedicated subproject just to 
relocate back the {{io.fabric8}} classes in {{{}flink-kubernetes{}}}, (or other 
way around relocate things to 
{{{}org.apache.flink.kubernetes.shaded.io.fabric8{}}}) which seems a little bit 
ugly. 

I am wondering if you folks have some good practices to deal with this. 

> Support periodic checkpoint triggering
> --
>
> Key: FLINK-29634
> URL: https://issues.apache.org/jira/browse/FLINK-29634
> Project: Flink
>  Issue Type: New Feature
>  Components: Kubernetes Operator
>Reporter: Thomas Weise
>Assignee: Jiale Tan
>Priority: Major
>
> Similar to the support for periodic savepoints, the operator should support 
> triggering periodic checkpoints to break the incremental checkpoint chain.
> Support for external triggering will come with 1.17: 
> https://issues.apache.org/jira/browse/FLINK-27101 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-29634) Support periodic checkpoint triggering

2023-01-01 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653483#comment-17653483
 ] 

Jiale Tan edited comment on FLINK-29634 at 1/2/23 2:27 AM:
---

[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes arguments. The tricky 
thing is: there are certain {{io.fabric8}} classes in 
{{flink-kubernetes-operator}} like 
[these|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L79]
 (not provided by {{{}flink-kubernetes{}}}), so their package name will stick 
with {{io.fabric8 }}and lead to compilation failure.

One work around I can think of is to have a dedicated subproject just to 
relocate back the {{io.fabric8}} classes in {{{}flink-kubernetes{}}}, (or other 
way around relocate things to 
{{{}org.apache.flink.kubernetes.shaded.io.fabric8{}}}) which seems a little bit 
ugly. {{}}

I am wondering if you folks have some good practices to deal with this. 


> Support periodic checkpoint triggering
> --
>
> Key: FLINK-29634
> URL: https://issues.apache.org/jira/browse/FLINK-29634
> Project: Flink
>  Issue Type: New Feature
>  Components: Kubernetes Operator
>Reporter: Thomas Weise
>Assignee: Jiale Tan
>Priority: Major
>
> Similar to the support for periodic savepoints, the operator should support 
> triggering periodic checkpoints to break the incremental checkpoint chain.
> Support for external triggering will come with 1.17: 
> https://issues.apache.org/jira/browse/FLINK-27101 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-29634) Support periodic checkpoint triggering

2023-01-01 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653483#comment-17653483
 ] 

Jiale Tan edited comment on FLINK-29634 at 1/2/23 2:23 AM:
---

[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes arguments. The tricky 
thing is: there are certain {{io.fabric8}} classes in 
{{flink-kubernetes-operator}} like 
[these|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L79]
 (not provided by {{{}flink-kubernetes{}}}), so their package name will stick 
with {{io.fabric8 and lead to compilation failure.}}

One work around I can think of is to have a dedicated subproject just to 
relocate back the {{io.fabric8}} classes in {{{}flink-kubernetes{}}}, which 
seems a little bit ugly.

I am wondering if you folks have some good practices to deal with this. 


was (Author: JIRAUSER290356):
[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes arguments. The tricky 
thing is: there are certain {{io.fabric8}} classes in 
{{flink-kubernetes-operator}} like 
[these,|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L79]
 so their package name will stick with {{io.fabric8 and lead to compilation 
failure.}}

One work around I can think of is to have a dedicated subproject just to 
relocate back the {{io.fabric8}} classes in {{{}flink-kubernetes{}}}, which 
seems a little bit ugly.

I am wondering if you folks have some good practices to deal with this. 

> Support periodic checkpoint triggering
> --
>
> Key: FLINK-29634
> URL: https://issues.apache.org/jira/browse/FLINK-29634
> Project: Flink
>  Issue Type: New Feature
>  Components: Kubernetes Operator
>Reporter: Thomas Weise
>Assignee: Jiale Tan
>Priority: Major
>
> Similar to the support for periodic savepoints, the operator should support 
> triggering periodic checkpoints to break the incremental checkpoint chain.
> Support for external triggering will come with 1.17: 
> https://issues.apache.org/jira/browse/FLINK-27101 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-29634) Support periodic checkpoint triggering

2023-01-01 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653483#comment-17653483
 ] 

Jiale Tan edited comment on FLINK-29634 at 1/2/23 2:22 AM:
---

[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes arguments. The tricky 
thing is: there are certain {{io.fabric8}} classes in 
{{flink-kubernetes-operator}} like 
[these,|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L79]
 so their package name will stick with {{io.fabric8 and lead to compilation 
failure.}}

One work around I can think of is to have a dedicated subproject just to 
relocate back the {{io.fabric8}} classes in {{{}flink-kubernetes{}}}, which 
seems a little bit ugly.

I am wondering if you folks have some good practices to deal with this. 


was (Author: JIRAUSER290356):
[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes arguments. The tricky 
thing is: there are certain {{io.fabric8}} classes in 
{{flink-kubernetes-operator }}which are not provided by{{ flink-kubernetes}} 
like 
[these,|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L79]
 so their package name will stick with {{io.fabric8 and lead to compilation 
failure.}}

One work around I can think of is to have a dedicated subproject just to 
relocate back the {{io.fabric8}} classes in {{{}flink-kubernetes{}}}, which 
seems a little bit ugly.

I am wondering if you folks have some good practices to deal with this. 

> Support periodic checkpoint triggering
> --
>
> Key: FLINK-29634
> URL: https://issues.apache.org/jira/browse/FLINK-29634
> Project: Flink
>  Issue Type: New Feature
>  Components: Kubernetes Operator
>Reporter: Thomas Weise
>Assignee: Jiale Tan
>Priority: Major
>
> Similar to the support for periodic savepoints, the operator should support 
> triggering periodic checkpoints to break the incremental checkpoint chain.
> Support for external triggering will come with 1.17: 
> https://issues.apache.org/jira/browse/FLINK-27101 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-29634) Support periodic checkpoint triggering

2023-01-01 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653483#comment-17653483
 ] 

Jiale Tan edited comment on FLINK-29634 at 1/2/23 2:21 AM:
---

[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes arguments. The tricky 
thing is: there are certain {{io.fabric8}} classes in 
{{flink-kubernetes-operator }}which are not provided by{{ flink-kubernetes}} 
like 
[these,|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L79]
 so their package name will stick with {{io.fabric8 and lead to compilation 
failure.}}

One work around I can think of is to have a dedicated subproject just to 
relocate back the {{io.fabric8}} classes in {{{}flink-kubernetes{}}}, which 
seems a little bit ugly.

I am wondering if you folks have some good practices to deal with this. 


was (Author: JIRAUSER290356):
[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes arguments. The tricky 
thing is: there are certain {{io.fabric8}} classes in 
{{flink-kubernetes-operator  flink-kubernetes}} like 
[these,|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L79]
 so their package name will stick with {{io.fabric8 and lead to compilation 
failure.}}

One work around I can think of is to have a dedicated subproject just to 
relocate back the {{io.fabric8}} classes in {{{}flink-kubernetes{}}}, which 
seems a little bit ugly.

I am wondering if you folks have some good practices to deal with this. 

> Support periodic checkpoint triggering
> --
>
> Key: FLINK-29634
> URL: https://issues.apache.org/jira/browse/FLINK-29634
> Project: Flink
>  Issue Type: New Feature
>  Components: Kubernetes Operator
>Reporter: Thomas Weise
>Assignee: Jiale Tan
>Priority: Major
>
> Similar to the support for periodic savepoints, the operator should support 
> triggering periodic checkpoints to break the incremental checkpoint chain.
> Support for external triggering will come with 1.17: 
> https://issues.apache.org/jira/browse/FLINK-27101 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-29634) Support periodic checkpoint triggering

2023-01-01 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653483#comment-17653483
 ] 

Jiale Tan edited comment on FLINK-29634 at 1/2/23 2:21 AM:
---

[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes arguments. The tricky 
thing is: there are certain {{io.fabric8}} classes in 
{{flink-kubernetes-operator  flink-kubernetes}} like 
[these,|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L79]
 so their package name will stick with {{io.fabric8 and lead to compilation 
failure.}}

One work around I can think of is to have a dedicated subproject just to 
relocate back the {{io.fabric8}} classes in {{{}flink-kubernetes{}}}, which 
seems a little bit ugly.

I am wondering if you folks have some good practices to deal with this. 


was (Author: JIRAUSER290356):
[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes arguments. The tricky 
thing is: there are certain {{io.fabric8}} classes in 
{{flink-kubernetes-operator }}which are not provided by{{ flink-kubernetes}} 
like 
[these,|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L79]
 so their package name will stick with {{io.fabric8 and lead to compilation 
failure.}}

One work around I can think of is to have a dedicated subproject just to 
relocate back the {{io.fabric8}} classes in {{{}flink-kubernetes{}}}, which 
seems a little bit ugly.

I am wondering if you folks have some good practices to deal with this. 

> Support periodic checkpoint triggering
> --
>
> Key: FLINK-29634
> URL: https://issues.apache.org/jira/browse/FLINK-29634
> Project: Flink
>  Issue Type: New Feature
>  Components: Kubernetes Operator
>Reporter: Thomas Weise
>Assignee: Jiale Tan
>Priority: Major
>
> Similar to the support for periodic savepoints, the operator should support 
> triggering periodic checkpoints to break the incremental checkpoint chain.
> Support for external triggering will come with 1.17: 
> https://issues.apache.org/jira/browse/FLINK-27101 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-29634) Support periodic checkpoint triggering

2023-01-01 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653483#comment-17653483
 ] 

Jiale Tan edited comment on FLINK-29634 at 1/2/23 2:20 AM:
---

[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes arguments. The tricky 
thing is: there are certain {{io.fabric8}} classes in 
{{flink-kubernetes-operator }}which are not provided by{{ flink-kubernetes}} 
like 
[these,|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L79]
 so their package name will stick with {{io.fabric8 and lead to compilation 
failure.}}

One work around I can think of is to have a dedicated subproject just to 
relocate back the {{io.fabric8}} classes in {{{}flink-kubernetes{}}}, which 
seems a little bit ugly.

I am wondering if you folks have some good practices to deal with this. 


was (Author: JIRAUSER290356):
[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes. The tricky thing is: 
there are certain {{io.fabric8}} classes in {{flink-kubernetes-operator which 
are not provided by flink-kubernetes}} like 
[these,|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L79]
 so their package name will stick with {{io.fabric8 and lead to compilation 
failure.}}

One work around I can think of is to have a dedicated subproject just to 
relocate back the {{io.fabric8}} classes in {{{}flink-kubernetes{}}}, which 
seems a little bit ugly.

I am wondering if you folks have some good practices to deal with this. 

> Support periodic checkpoint triggering
> --
>
> Key: FLINK-29634
> URL: https://issues.apache.org/jira/browse/FLINK-29634
> Project: Flink
>  Issue Type: New Feature
>  Components: Kubernetes Operator
>Reporter: Thomas Weise
>Assignee: Jiale Tan
>Priority: Major
>
> Similar to the support for periodic savepoints, the operator should support 
> triggering periodic checkpoints to break the incremental checkpoint chain.
> Support for external triggering will come with 1.17: 
> https://issues.apache.org/jira/browse/FLINK-27101 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-29634) Support periodic checkpoint triggering

2023-01-01 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653483#comment-17653483
 ] 

Jiale Tan edited comment on FLINK-29634 at 1/2/23 2:20 AM:
---

[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes. The tricky thing is: 
there are certain {{io.fabric8}} classes in {{flink-kubernetes-operator which 
are not provided by flink-kubernetes}} like 
[these,|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L79]
 so their package name will stick with {{io.fabric8 and lead to compilation 
failure.}}

One work around I can think of is to have a dedicated subproject just to 
relocate back the {{io.fabric8}} classes in {{{}flink-kubernetes{}}}, which 
seems a little bit ugly.

I am wondering if you folks have some good practices to deal with this. 


was (Author: JIRAUSER290356):
[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes. The tricky thing is: 
there are certain {{io.fabric8}} classes in {{flink-kubernetes-operator }}which 
are not provided by{{ }}{{flink-kubernetes}} like 
[these,|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L79]
 so their package name will stick with {{{}io.fabric8 and lead to compilation 
failure.\{{{}{{

One work around I can think of is to have a dedicated subproject just to 
relocate back the {{io.fabric8}} classes in {{{}flink-kubernetes{}}}, which 
seems a little bit ugly.

I am wondering if you folks have some good practices to deal with this. 

> Support periodic checkpoint triggering
> --
>
> Key: FLINK-29634
> URL: https://issues.apache.org/jira/browse/FLINK-29634
> Project: Flink
>  Issue Type: New Feature
>  Components: Kubernetes Operator
>Reporter: Thomas Weise
>Assignee: Jiale Tan
>Priority: Major
>
> Similar to the support for periodic savepoints, the operator should support 
> triggering periodic checkpoints to break the incremental checkpoint chain.
> Support for external triggering will come with 1.17: 
> https://issues.apache.org/jira/browse/FLINK-27101 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-29634) Support periodic checkpoint triggering

2023-01-01 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653483#comment-17653483
 ] 

Jiale Tan edited comment on FLINK-29634 at 1/2/23 2:19 AM:
---

[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes. The tricky thing is: 
there are certain {{io.fabric8}} classes in {{flink-kubernetes-operator }}which 
are not provided by {{flink-kubernetes}} like 
[these,|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L79]
 so their package name will stick with {{io.fabric8 }}and lead to compilation 
failure.{{{}{}}}

One work around I can think of is to have a dedicated subproject just to 
relocate back the {{io.fabric8}} classes in {{{}flink-kubernetes{}}}, which 
seems a little bit ugly.

I am wondering if you folks have some good practices to deal with this. 


was (Author: JIRAUSER290356):
[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes, and fail the 
compilation.

I am wondering if you folks have some good practices to deal with this. 

> Support periodic checkpoint triggering
> --
>
> Key: FLINK-29634
> URL: https://issues.apache.org/jira/browse/FLINK-29634
> Project: Flink
>  Issue Type: New Feature
>  Components: Kubernetes Operator
>Reporter: Thomas Weise
>Assignee: Jiale Tan
>Priority: Major
>
> Similar to the support for periodic savepoints, the operator should support 
> triggering periodic checkpoints to break the incremental checkpoint chain.
> Support for external triggering will come with 1.17: 
> https://issues.apache.org/jira/browse/FLINK-27101 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-29634) Support periodic checkpoint triggering

2023-01-01 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653483#comment-17653483
 ] 

Jiale Tan edited comment on FLINK-29634 at 1/2/23 2:19 AM:
---

[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes. The tricky thing is: 
there are certain {{io.fabric8}} classes in {{flink-kubernetes-operator }}which 
are not provided by{{ }}{{flink-kubernetes}} like 
[these,|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L79]
 so their package name will stick with {{{}io.fabric8 and lead to compilation 
failure.\{{{}{{

One work around I can think of is to have a dedicated subproject just to 
relocate back the {{io.fabric8}} classes in {{{}flink-kubernetes{}}}, which 
seems a little bit ugly.

I am wondering if you folks have some good practices to deal with this. 


was (Author: JIRAUSER290356):
[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes. The tricky thing is: 
there are certain {{io.fabric8}} classes in {{flink-kubernetes-operator }}which 
are not provided by {{flink-kubernetes}} like 
[these,|https://github.com/apache/flink-kubernetes-operator/blob/a1842d4c0170feb008293963ec51c0343f42771d/flink-kubernetes-standalone/pom.xml#L74-L79]
 so their package name will stick with {{io.fabric8 }}and lead to compilation 
failure.{{{}{}}}

One work around I can think of is to have a dedicated subproject just to 
relocate back the {{io.fabric8}} classes in {{{}flink-kubernetes{}}}, which 
seems a little bit ugly.

I am wondering if you folks have some good practices to deal with this. 

> Support periodic checkpoint triggering
> --
>
> Key: FLINK-29634
> URL: https://issues.apache.org/jira/browse/FLINK-29634
> Project: Flink
>  Issue Type: New Feature
>  Components: Kubernetes Operator
>Reporter: Thomas Weise
>Assignee: Jiale Tan
>Priority: Major
>
> Similar to the support for periodic savepoints, the operator should support 
> triggering periodic checkpoints to break the incremental checkpoint chain.
> Support for external triggering will come with 1.17: 
> https://issues.apache.org/jira/browse/FLINK-27101 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-29634) Support periodic checkpoint triggering

2023-01-01 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653483#comment-17653483
 ] 

Jiale Tan edited comment on FLINK-29634 at 1/2/23 2:12 AM:
---

[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17 and start coding. However 
I found it a little bit tricky to deal with the dependencies, since in flink 
1.17 {{io.fabric8}} classes are shaded and relocated in the 
{{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes, and fail the 
compilation.

I am wondering if you folks have some good practices to deal with this. 


was (Author: JIRAUSER290356):
[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint rest API classes in 1.17. However I found it a 
little bit tricky to deal with the dependencies, since in flink 1.17 
{{io.fabric8}} classes are shaded and relocated in the {{flink-kubernetes}} in 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change will also cause {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now requiring 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes, and fail the 
compilation.

I am wondering if you folks have some good practices to deal with this. 

> Support periodic checkpoint triggering
> --
>
> Key: FLINK-29634
> URL: https://issues.apache.org/jira/browse/FLINK-29634
> Project: Flink
>  Issue Type: New Feature
>  Components: Kubernetes Operator
>Reporter: Thomas Weise
>Assignee: Jiale Tan
>Priority: Major
>
> Similar to the support for periodic savepoints, the operator should support 
> triggering periodic checkpoints to break the incremental checkpoint chain.
> Support for external triggering will come with 1.17: 
> https://issues.apache.org/jira/browse/FLINK-27101 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-29634) Support periodic checkpoint triggering

2023-01-01 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653483#comment-17653483
 ] 

Jiale Tan edited comment on FLINK-29634 at 1/2/23 2:12 AM:
---

[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint REST API classes in 1.17. However, I found it a 
bit tricky to deal with the dependencies, since in Flink 1.17 the 
{{io.fabric8}} classes are shaded and relocated in {{flink-kubernetes}} by 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

This change also means that {{flink-kubernetes-operator}} functions with 
arguments of {{io.fabric8}} classes now require 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes, which breaks 
compilation.

I am wondering if you folks have some good practices to deal with this. 


was (Author: JIRAUSER290356):
[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint REST API classes in 1.17. However, I found it a 
bit tricky to deal with the dependencies, since in Flink 1.17 the 
{{io.fabric8}} classes are shaded and relocated in {{flink-kubernetes}} by 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

Changing the imported package names is relatively easy. But this change also 
means that {{flink-kubernetes-operator}} functions with arguments of 
{{io.fabric8}} classes now require 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes, which breaks 
compilation.

I am wondering if you folks have some good practices to deal with this. 

> Support periodic checkpoint triggering
> --
>
> Key: FLINK-29634
> URL: https://issues.apache.org/jira/browse/FLINK-29634
> Project: Flink
>  Issue Type: New Feature
>  Components: Kubernetes Operator
>Reporter: Thomas Weise
>Assignee: Jiale Tan
>Priority: Major
>
> Similar to the support for periodic savepoints, the operator should support 
> triggering periodic checkpoints to break the incremental checkpoint chain.
> Support for external triggering will come with 1.17: 
> https://issues.apache.org/jira/browse/FLINK-27101 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-29634) Support periodic checkpoint triggering

2023-01-01 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653483#comment-17653483
 ] 

Jiale Tan edited comment on FLINK-29634 at 1/2/23 2:02 AM:
---

[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint REST API classes in 1.17. However, I found it a 
bit tricky to deal with the dependencies, since in Flink 1.17 the 
{{io.fabric8}} classes are shaded and relocated in {{flink-kubernetes}} by 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

Changing the imported package names is relatively easy. But this change also 
means that {{flink-kubernetes-operator}} functions with arguments of 
{{io.fabric8}} classes now require 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes, which breaks 
compilation.

I am wondering if you folks have some good practices to deal with this. 


was (Author: JIRAUSER290356):
[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint REST API classes in 1.17. However, I found it a 
bit tricky to deal with the dependencies, since in Flink 1.17 the 
{{io.fabric8}} classes are shaded and relocated in {{flink-kubernetes}} by 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

Changing the imported package names is relatively easy. But this change also 
means that {{flink-kubernetes-operator}} functions with arguments of 
{{io.fabric8}} classes now require 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes, which breaks 
compilation.

I am wondering if you folks have some good practices to deal with this. 

> Support periodic checkpoint triggering
> --
>
> Key: FLINK-29634
> URL: https://issues.apache.org/jira/browse/FLINK-29634
> Project: Flink
>  Issue Type: New Feature
>  Components: Kubernetes Operator
>Reporter: Thomas Weise
>Assignee: Jiale Tan
>Priority: Major
>
> Similar to the support for periodic savepoints, the operator should support 
> triggering periodic checkpoints to break the incremental checkpoint chain.
> Support for external triggering will come with 1.17: 
> https://issues.apache.org/jira/browse/FLINK-27101 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-29634) Support periodic checkpoint triggering

2023-01-01 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653483#comment-17653483
 ] 

Jiale Tan edited comment on FLINK-29634 at 1/2/23 2:01 AM:
---

[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint REST API classes in 1.17. However, I found it a 
bit tricky to deal with the dependencies, since in Flink 1.17 the 
{{io.fabric8}} classes are shaded and relocated in {{flink-kubernetes}} by 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

Changing the imported package names is relatively easy. But this change also 
means that {{flink-kubernetes-operator}} functions with arguments of 
{{io.fabric8}} classes now require 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes, which breaks 
compilation.

I am wondering if you folks have some good practices to deal with this. 


was (Author: JIRAUSER290356):
[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint REST API classes in 1.17. However, I found it a 
bit tricky to deal with the dependencies, since in Flink 1.17 the 
{{io.fabric8}} classes are shaded and relocated in {{flink-kubernetes}} by 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

Changing the imported package names is relatively easy. But this change also 
means that {{flink-kubernetes-operator}} functions with arguments of 
{{io.fabric8}} classes now require 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes, which breaks 
compilation.

I am wondering if you folks have some good practices to deal with this. 

> Support periodic checkpoint triggering
> --
>
> Key: FLINK-29634
> URL: https://issues.apache.org/jira/browse/FLINK-29634
> Project: Flink
>  Issue Type: New Feature
>  Components: Kubernetes Operator
>Reporter: Thomas Weise
>Assignee: Jiale Tan
>Priority: Major
>
> Similar to the support for periodic savepoints, the operator should support 
> triggering periodic checkpoints to break the incremental checkpoint chain.
> Support for external triggering will come with 1.17: 
> https://issues.apache.org/jira/browse/FLINK-27101 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29634) Support periodic checkpoint triggering

2023-01-01 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653483#comment-17653483
 ] 

Jiale Tan commented on FLINK-29634:
---

[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint REST API classes in 1.17. However, I found it a 
bit tricky to deal with the dependencies, since in Flink 1.17 the 
{{io.fabric8}} classes are shaded and relocated in {{flink-kubernetes}} by 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

Changing the imported package names is relatively easy. But this change also 
means that {{flink-kubernetes-operator}} functions with arguments of 
{{io.fabric8}} classes now require 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes, which breaks 
compilation.

I am wondering if you folks have some good practices to deal with this. 

> Support periodic checkpoint triggering
> --
>
> Key: FLINK-29634
> URL: https://issues.apache.org/jira/browse/FLINK-29634
> Project: Flink
>  Issue Type: New Feature
>  Components: Kubernetes Operator
>Reporter: Thomas Weise
>Assignee: Jiale Tan
>Priority: Major
>
> Similar to the support for periodic savepoints, the operator should support 
> triggering periodic checkpoints to break the incremental checkpoint chain.
> Support for external triggering will come with 1.17: 
> https://issues.apache.org/jira/browse/FLINK-27101 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-29634) Support periodic checkpoint triggering

2023-01-01 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653483#comment-17653483
 ] 

Jiale Tan edited comment on FLINK-29634 at 1/2/23 2:00 AM:
---

[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint REST API classes in 1.17. However, I found it a 
bit tricky to deal with the dependencies, since in Flink 1.17 the 
{{io.fabric8}} classes are shaded and relocated in {{flink-kubernetes}} by 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

Changing the imported package names is relatively easy. But this change also 
means that {{flink-kubernetes-operator}} functions with arguments of 
{{io.fabric8}} classes now require 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes, which breaks 
compilation.

I am wondering if you folks have some good practices to deal with this. 


was (Author: JIRAUSER290356):
[~gyfora] [~thw] Thanks for the guidance!

I am trying to bump {{flink.version}} up to 1.17 (SNAPSHOT) locally to take 
advantage of the checkpoint REST API classes in 1.17. However, I found it a 
bit tricky to deal with the dependencies, since in Flink 1.17 the 
{{io.fabric8}} classes are shaded and relocated in {{flink-kubernetes}} by 
[this|https://github.com/apache/flink/commit/17d7c39bb2a9fcbaac1ead42073c099a52171d7d]
 commit.

Changing the imported package names is relatively easy. But this change also 
means that {{flink-kubernetes-operator}} functions with arguments of 
{{io.fabric8}} classes now require 
{{org.apache.flink.kubernetes.shaded.io.fabric8}} classes, which breaks 
compilation.

I am wondering if you folks have some good practices to deal with this. 

> Support periodic checkpoint triggering
> --
>
> Key: FLINK-29634
> URL: https://issues.apache.org/jira/browse/FLINK-29634
> Project: Flink
>  Issue Type: New Feature
>  Components: Kubernetes Operator
>Reporter: Thomas Weise
>Assignee: Jiale Tan
>Priority: Major
>
> Similar to the support for periodic savepoints, the operator should support 
> triggering periodic checkpoints to break the incremental checkpoint chain.
> Support for external triggering will come with 1.17: 
> https://issues.apache.org/jira/browse/FLINK-27101 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-29610) Infinite timeout is used in SavepointHandlers and CheckpointTriggerHandler calls to RestfulGateway

2022-11-04 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17628690#comment-17628690
 ] 

Jiale Tan edited comment on FLINK-29610 at 11/4/22 6:16 AM:


[~gaoyunhaii] thanks for the info!

I created a PR [https://github.com/apache/flink/pull/21239] based on my 
understanding; it would be great if you could take a look. [~gaoyunhaii] [~chesnay]

 

Meanwhile, I traced how this timeout is used all the way to 
[here|https://github.com/apache/flink/blob/e9f3ec93aad7cec795c765c937ee71807f5478cf/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java#L856-L879],
 and it seems none of those timeouts are actually used later? 

 

FYI [~thomasWeise] this is part of unfinished work from FLINK-27101


was (Author: JIRAUSER290356):
[~gaoyunhaii] thanks for the info!

I created a PR [https://github.com/apache/flink/pull/21239] based on my 
understanding; it would be great if you could take a look. [~gaoyunhaii] [~chesnay]

 

Meanwhile, I traced how this timeout is used all the way to 
[here|https://github.com/apache/flink/blob/e9f3ec93aad7cec795c765c937ee71807f5478cf/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java#L856-L879],
 and it seems none of those timeouts are actually used? 

 

FYI [~thomasWeise] this is part of unfinished work from FLINK-27101

> Infinite timeout is used in SavepointHandlers and CheckpointTriggerHandler 
> calls to RestfulGateway
> --
>
> Key: FLINK-29610
> URL: https://issues.apache.org/jira/browse/FLINK-29610
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST
>Reporter: Jiale Tan
>Assignee: Jiale Tan
>Priority: Major
>  Labels: pull-request-available
>
> In {{{}SavepointHandlers{}}}, both 
> {{[StopWithSavepointHandler|https://github.com/apache/flink/blob/cd8ea8d5b207569f68acc5a3c8db95cd2ca47ba6/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/savepoints/SavepointHandlers.java#L214]}}
>  and 
> {{[SavepointTriggerHandler|https://github.com/apache/flink/blob/cd8ea8d5b207569f68acc5a3c8db95cd2ca47ba6/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/savepoints/SavepointHandlers.java#L258]}}
>  are calling {{RestfulGateway}} with {{RpcUtils.INF_TIMEOUT}}
> Same thing happens in the 
> {{[CheckpointTriggerHandler|https://github.com/apache/flink/blob/8e66be89dfcb54b7256d51e9d89222ae6701061f/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/checkpoints/CheckpointHandlers.java#L146]}}
> As pointed out in 
> [this|https://github.com/apache/flink/pull/20852#discussion_r992218970] 
> discussion, we will need to either figure out why {{RpcUtils.INF_TIMEOUT}} is 
> used, or remove it if there is no strong reason to use it.
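
The practical risk of {{RpcUtils.INF_TIMEOUT}} is that a caller blocked on the gateway future can hang forever. A minimal JDK-only sketch (plain {{CompletableFuture}}, not Flink's {{RestfulGateway}} API) of how a finite timeout bounds such a call:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch: a gateway call that never responds is simulated by a future that
// never completes. With orTimeout the caller fails fast instead of hanging.
public class FiniteTimeoutDemo {
    public static void main(String[] args) throws Exception {
        CompletableFuture<String> neverCompletes = new CompletableFuture<>();
        // Bound the wait to 100 ms instead of an effectively infinite timeout.
        CompletableFuture<String> bounded = neverCompletes.orTimeout(100, TimeUnit.MILLISECONDS);
        try {
            bounded.get();
            System.out.println("completed");
        } catch (ExecutionException e) {
            // orTimeout completes the future exceptionally with TimeoutException.
            System.out.println(e.getCause() instanceof TimeoutException ? "timed out" : "other failure");
        }
    }
}
```

This prints "timed out" after roughly 100 ms; with an infinite timeout the `get()` would block indefinitely.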



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-29610) Infinite timeout is used in SavepointHandlers and CheckpointTriggerHandler calls to RestfulGateway

2022-11-04 Thread Jiale Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiale Tan updated FLINK-29610:
--
Description: 
In {{{}SavepointHandlers{}}}, both 
{{[StopWithSavepointHandler|https://github.com/apache/flink/blob/cd8ea8d5b207569f68acc5a3c8db95cd2ca47ba6/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/savepoints/SavepointHandlers.java#L214]}}
 and 
{{[SavepointTriggerHandler|https://github.com/apache/flink/blob/cd8ea8d5b207569f68acc5a3c8db95cd2ca47ba6/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/savepoints/SavepointHandlers.java#L258]}}
 are calling {{RestfulGateway}} with {{RpcUtils.INF_TIMEOUT}}

Same thing happens in the 
{{[CheckpointTriggerHandler|https://github.com/apache/flink/blob/8e66be89dfcb54b7256d51e9d89222ae6701061f/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/checkpoints/CheckpointHandlers.java#L146]}}

As pointed out in 
[this|https://github.com/apache/flink/pull/20852#discussion_r992218970] 
discussion, we will need to either figure out why {{RpcUtils.INF_TIMEOUT}} is 
used, or remove it if there is no strong reason to use it.

  was:
In {{{}SavepointHandlers{}}}, both 
{{[StopWithSavepointHandler|https://github.com/apache/flink/blob/cd8ea8d5b207569f68acc5a3c8db95cd2ca47ba6/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/savepoints/SavepointHandlers.java#L214]}}
 and 
{{[SavepointTriggerHandler|https://github.com/apache/flink/blob/cd8ea8d5b207569f68acc5a3c8db95cd2ca47ba6/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/savepoints/SavepointHandlers.java#L258]}}
 are calling {{RestfulGateway}} with {{RpcUtils.INF_TIMEOUT}}

 

As pointed out in 
[this|https://github.com/apache/flink/pull/20852#discussion_r992218970] 
discussion, we will need to either figure out why {{RpcUtils.INF_TIMEOUT}} is 
used, or remove it if there is no strong reason to use it.


> Infinite timeout is used in SavepointHandlers and CheckpointTriggerHandler 
> calls to RestfulGateway
> --
>
> Key: FLINK-29610
> URL: https://issues.apache.org/jira/browse/FLINK-29610
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST
>Reporter: Jiale Tan
>Assignee: Jiale Tan
>Priority: Major
>  Labels: pull-request-available
>
> In {{{}SavepointHandlers{}}}, both 
> {{[StopWithSavepointHandler|https://github.com/apache/flink/blob/cd8ea8d5b207569f68acc5a3c8db95cd2ca47ba6/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/savepoints/SavepointHandlers.java#L214]}}
>  and 
> {{[SavepointTriggerHandler|https://github.com/apache/flink/blob/cd8ea8d5b207569f68acc5a3c8db95cd2ca47ba6/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/savepoints/SavepointHandlers.java#L258]}}
>  are calling {{RestfulGateway}} with {{RpcUtils.INF_TIMEOUT}}
> Same thing happens in the 
> {{[CheckpointTriggerHandler|https://github.com/apache/flink/blob/8e66be89dfcb54b7256d51e9d89222ae6701061f/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/checkpoints/CheckpointHandlers.java#L146]}}
> As pointed out in 
> [this|https://github.com/apache/flink/pull/20852#discussion_r992218970] 
> discussion, we will need to either figure out why {{RpcUtils.INF_TIMEOUT}} is 
> used, or remove it if there is no strong reason to use it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-29610) Infinite timeout is used in SavepointHandlers and CheckpointTriggerHandler calls to RestfulGateway

2022-11-04 Thread Jiale Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiale Tan updated FLINK-29610:
--
Summary: Infinite timeout is used in SavepointHandlers and 
CheckpointTriggerHandler calls to RestfulGateway  (was: Infinite timeout is 
used in SavepointHandlers calls to RestfulGateway)

> Infinite timeout is used in SavepointHandlers and CheckpointTriggerHandler 
> calls to RestfulGateway
> --
>
> Key: FLINK-29610
> URL: https://issues.apache.org/jira/browse/FLINK-29610
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST
>Reporter: Jiale Tan
>Assignee: Jiale Tan
>Priority: Major
>  Labels: pull-request-available
>
> In {{{}SavepointHandlers{}}}, both 
> {{[StopWithSavepointHandler|https://github.com/apache/flink/blob/cd8ea8d5b207569f68acc5a3c8db95cd2ca47ba6/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/savepoints/SavepointHandlers.java#L214]}}
>  and 
> {{[SavepointTriggerHandler|https://github.com/apache/flink/blob/cd8ea8d5b207569f68acc5a3c8db95cd2ca47ba6/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/savepoints/SavepointHandlers.java#L258]}}
>  are calling {{RestfulGateway}} with {{RpcUtils.INF_TIMEOUT}}
>  
> As pointed out in 
> [this|https://github.com/apache/flink/pull/20852#discussion_r992218970] 
> discussion, we will need to either figure out why {{RpcUtils.INF_TIMEOUT}} is 
> used, or remove it if there is no strong reason to use it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29610) Infinite timeout is used in SavepointHandlers calls to RestfulGateway

2022-11-04 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17628690#comment-17628690
 ] 

Jiale Tan commented on FLINK-29610:
---

[~gaoyunhaii] thanks for the info!

I created a PR [https://github.com/apache/flink/pull/21239] based on my 
understanding; it would be great if you could take a look. [~gaoyunhaii] [~chesnay]

 

Meanwhile, I traced how this timeout is used all the way to 
[here|https://github.com/apache/flink/blob/e9f3ec93aad7cec795c765c937ee71807f5478cf/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java#L856-L879],
 and it seems none of those timeouts are actually used? 

 

FYI [~thomasWeise] this is part of unfinished work from FLINK-27101

> Infinite timeout is used in SavepointHandlers calls to RestfulGateway
> -
>
> Key: FLINK-29610
> URL: https://issues.apache.org/jira/browse/FLINK-29610
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST
>Reporter: Jiale Tan
>Assignee: Jiale Tan
>Priority: Major
>  Labels: pull-request-available
>
> In {{{}SavepointHandlers{}}}, both 
> {{[StopWithSavepointHandler|https://github.com/apache/flink/blob/cd8ea8d5b207569f68acc5a3c8db95cd2ca47ba6/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/savepoints/SavepointHandlers.java#L214]}}
>  and 
> {{[SavepointTriggerHandler|https://github.com/apache/flink/blob/cd8ea8d5b207569f68acc5a3c8db95cd2ca47ba6/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/savepoints/SavepointHandlers.java#L258]}}
>  are calling {{RestfulGateway}} with {{RpcUtils.INF_TIMEOUT}}
>  
> As pointed out in 
> [this|https://github.com/apache/flink/pull/20852#discussion_r992218970] 
> discussion, we will need to either figure out why {{RpcUtils.INF_TIMEOUT}} is 
> used, or remove it if there is no strong reason to use it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-29610) Infinite timeout is used in SavepointHandlers calls to RestfulGateway

2022-10-28 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625989#comment-17625989
 ] 

Jiale Tan edited comment on FLINK-29610 at 10/29/22 3:56 AM:
-

[~gaoyunhaii] I am curious what the difference is between ASK_TIMEOUT_DURATION 
and 
[RestConfiguration.timeout|https://github.com/apache/flink/blob/5d66e82915eace9342c175163b17f610bfbf7fa4/flink-runtime/src/main/java/org/apache/flink/runtime/webmonitor/WebMonitorEndpoint.java#L282],
 and whether it is OK to use the latter for the fix?


was (Author: JIRAUSER290356):
[~gaoyunhaii] I am curious what the difference is between ASK_TIMEOUT_DURATION 
and 
[RestConfiguration.timeout|https://github.com/apache/flink/blob/5d66e82915eace9342c175163b17f610bfbf7fa4/flink-runtime/src/main/java/org/apache/flink/runtime/webmonitor/WebMonitorEndpoint.java#L282]
 and why it is preferred to use ASK_TIMEOUT_DURATION in this case?

> Infinite timeout is used in SavepointHandlers calls to RestfulGateway
> -
>
> Key: FLINK-29610
> URL: https://issues.apache.org/jira/browse/FLINK-29610
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST
>Reporter: Jiale Tan
>Assignee: Jiale Tan
>Priority: Major
>  Labels: pull-request-available
>
> In {{{}SavepointHandlers{}}}, both 
> {{[StopWithSavepointHandler|https://github.com/apache/flink/blob/cd8ea8d5b207569f68acc5a3c8db95cd2ca47ba6/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/savepoints/SavepointHandlers.java#L214]}}
>  and 
> {{[SavepointTriggerHandler|https://github.com/apache/flink/blob/cd8ea8d5b207569f68acc5a3c8db95cd2ca47ba6/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/savepoints/SavepointHandlers.java#L258]}}
>  are calling {{RestfulGateway}} with {{RpcUtils.INF_TIMEOUT}}
>  
> As pointed out in 
> [this|https://github.com/apache/flink/pull/20852#discussion_r992218970] 
> discussion, we will need to either figure out why {{RpcUtils.INF_TIMEOUT}} is 
> used, or remove it if there is no strong reason to use it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29610) Infinite timeout is used in SavepointHandlers calls to RestfulGateway

2022-10-28 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625989#comment-17625989
 ] 

Jiale Tan commented on FLINK-29610:
---

[~gaoyunhaii] I am curious what the difference is between ASK_TIMEOUT_DURATION 
and 
[RestConfiguration.timeout|https://github.com/apache/flink/blob/5d66e82915eace9342c175163b17f610bfbf7fa4/flink-runtime/src/main/java/org/apache/flink/runtime/webmonitor/WebMonitorEndpoint.java#L282]
 and why it is preferred to use ASK_TIMEOUT_DURATION in this case?

> Infinite timeout is used in SavepointHandlers calls to RestfulGateway
> -
>
> Key: FLINK-29610
> URL: https://issues.apache.org/jira/browse/FLINK-29610
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST
>Reporter: Jiale Tan
>Assignee: Jiale Tan
>Priority: Major
>  Labels: pull-request-available
>
> In {{{}SavepointHandlers{}}}, both 
> {{[StopWithSavepointHandler|https://github.com/apache/flink/blob/cd8ea8d5b207569f68acc5a3c8db95cd2ca47ba6/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/savepoints/SavepointHandlers.java#L214]}}
>  and 
> {{[SavepointTriggerHandler|https://github.com/apache/flink/blob/cd8ea8d5b207569f68acc5a3c8db95cd2ca47ba6/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/savepoints/SavepointHandlers.java#L258]}}
>  are calling {{RestfulGateway}} with {{RpcUtils.INF_TIMEOUT}}
>  
> As pointed out in 
> [this|https://github.com/apache/flink/pull/20852#discussion_r992218970] 
> discussion, we will need to either figure out why {{RpcUtils.INF_TIMEOUT}} is 
> used, or remove it if there is no strong reason to use it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29610) Infinite timeout is used in SavepointHandlers calls to RestfulGateway

2022-10-20 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17621406#comment-17621406
 ] 

Jiale Tan commented on FLINK-29610:
---

Thanks [~gaoyunhaii] for the extra context. Will look into this and potentially 
follow up here with a PR.

> Infinite timeout is used in SavepointHandlers calls to RestfulGateway
> -
>
> Key: FLINK-29610
> URL: https://issues.apache.org/jira/browse/FLINK-29610
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST
>Reporter: Jiale Tan
>Priority: Major
>
> In {{{}SavepointHandlers{}}}, both 
> {{[StopWithSavepointHandler|https://github.com/apache/flink/blob/cd8ea8d5b207569f68acc5a3c8db95cd2ca47ba6/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/savepoints/SavepointHandlers.java#L214]}}
>  and 
> {{[SavepointTriggerHandler|https://github.com/apache/flink/blob/cd8ea8d5b207569f68acc5a3c8db95cd2ca47ba6/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/savepoints/SavepointHandlers.java#L258]}}
>  are calling {{RestfulGateway}} with {{RpcUtils.INF_TIMEOUT}}
>  
> As pointed out in 
> [this|https://github.com/apache/flink/pull/20852#discussion_r992218970] 
> discussion, we will need to either figure out why {{RpcUtils.INF_TIMEOUT}} is 
> used, or remove it if there is no strong reason to use it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-29634) Support periodic checkpoint triggering

2022-10-13 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617319#comment-17617319
 ] 

Jiale Tan edited comment on FLINK-29634 at 10/13/22 10:04 PM:
--

[Here|https://github.com/apache/flink-kubernetes-operator/pull/249/files] is 
the PR by [~gyfora] that I found for periodic savepoint support in the Flink 
operator.

[~thw], could you please confirm that PR is the one you were referring to? 
Thanks!


was (Author: JIRAUSER290356):
[Here|https://github.com/apache/flink-kubernetes-operator/pull/249/files] is 
the PR by [~gyfora] that I found for periodic savepoint support in the Flink 
operator.

[~thw], could you please confirm that PR is the one you were referring to? 
Thanks!

> Support periodic checkpoint triggering
> --
>
> Key: FLINK-29634
> URL: https://issues.apache.org/jira/browse/FLINK-29634
> Project: Flink
>  Issue Type: New Feature
>  Components: Kubernetes Operator
>Reporter: Thomas Weise
>Assignee: Jiale Tan
>Priority: Major
>
> Similar to the support for periodic savepoints, the operator should support 
> triggering periodic checkpoints to break the incremental checkpoint chain.
> Support for external triggering will come with 1.17: 
> https://issues.apache.org/jira/browse/FLINK-27101 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29634) Support periodic checkpoint triggering

2022-10-13 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617319#comment-17617319
 ] 

Jiale Tan commented on FLINK-29634:
---

[Here|https://github.com/apache/flink-kubernetes-operator/pull/249/files] is 
the PR by [~gyfora] that I found for periodic savepoint support in the Flink 
operator.

[~thw], could you please confirm that PR is the one you were referring to? 
Thanks!

> Support periodic checkpoint triggering
> --
>
> Key: FLINK-29634
> URL: https://issues.apache.org/jira/browse/FLINK-29634
> Project: Flink
>  Issue Type: New Feature
>  Components: Kubernetes Operator
>Reporter: Thomas Weise
>Assignee: Jiale Tan
>Priority: Major
>
> Similar to the support for periodic savepoints, the operator should support 
> triggering periodic checkpoints to break the incremental checkpoint chain.
> Support for external triggering will come with 1.17: 
> https://issues.apache.org/jira/browse/FLINK-27101 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29610) Infinite timeout is used in SavepointHandlers calls to RestfulGateway

2022-10-12 Thread Jiale Tan (Jira)
Jiale Tan created FLINK-29610:
-

 Summary: Infinite timeout is used in SavepointHandlers calls to 
RestfulGateway
 Key: FLINK-29610
 URL: https://issues.apache.org/jira/browse/FLINK-29610
 Project: Flink
  Issue Type: Bug
  Components: Runtime / REST
Reporter: Jiale Tan


In {{{}SavepointHandlers{}}}, both 
{{[StopWithSavepointHandler|https://github.com/apache/flink/blob/cd8ea8d5b207569f68acc5a3c8db95cd2ca47ba6/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/savepoints/SavepointHandlers.java#L214]}}
 and 
{{[SavepointTriggerHandler|https://github.com/apache/flink/blob/cd8ea8d5b207569f68acc5a3c8db95cd2ca47ba6/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/savepoints/SavepointHandlers.java#L258]}}
 are calling {{RestfulGateway}} with {{RpcUtils.INF_TIMEOUT}}

 

As pointed out in 
[this|https://github.com/apache/flink/pull/20852#discussion_r992218970] 
discussion, we will need to either figure out why {{RpcUtils.INF_TIMEOUT}} is 
used, or remove it if there is no strong reason to use it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-27101) Periodically break the chain of incremental checkpoint (trigger checkpoints via REST API)

2022-09-21 Thread Jiale Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17606485#comment-17606485
 ] 

Jiale Tan edited comment on FLINK-27101 at 9/22/22 3:12 AM:


Hi folks, 

I got [this|https://github.com/apache/flink/pull/20852] draft PR for option 3 
as discussed above:

??Expose triggering checkpoint via CLI and/or REST API with some parameters to 
choose incremental/full checkpoint.??

 

The API and implementation are very similar to the savepoint trigger. 

 

I am new to contributing to Flink; please let me know if I am headed in the 
right direction. If needed, I may start a small FLIP / dev mailing list discussion.


was (Author: JIRAUSER290356):
Hi folks, 

I got [this|https://github.com/apache/flink/pull/20852] draft PR for option 3 
as discussed above:

??Expose triggering checkpoint via CLI and/or REST API with some parameters to 
choose incremental/full checkpoint.??

 

The API and implementation is very similar to save point trigger. 

 

I am new to contributing to flink, please let me know if I am in the right 
direction. If yes, may start a small FLIP / dev mailing list discussion

> Periodically break the chain of incremental checkpoint (trigger checkpoints 
> via REST API)
> -
>
> Key: FLINK-27101
> URL: https://issues.apache.org/jira/browse/FLINK-27101
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Checkpointing, Runtime / REST
>Reporter: Steven Zhen Wu
>Assignee: Jiale Tan
>Priority: Major
>  Labels: pull-request-available
>
> Incremental checkpoint is almost a must for large-state jobs. It greatly 
> reduces the bytes uploaded to DFS per checkpoint. However, there are a few 
> implications of incremental checkpoint that are problematic for production 
> operations. We will use S3 as the example DFS for the rest of the description.
> 1. Because there is no way to deterministically know how far back an 
> incremental checkpoint can refer to files uploaded to S3, it is very 
> difficult to set an S3 bucket/object TTL. In one application, we have observed 
> a Flink checkpoint referring to files uploaded over 6 months ago. S3 TTL can 
> therefore corrupt the Flink checkpoints.
> S3 TTL is important for a few reasons:
> - Purging orphaned files (like external checkpoints from previous 
> deployments) to keep the storage cost in check. This problem can be addressed 
> by implementing proper garbage collection (similar to the JVM's): traversing 
> the retained checkpoints from all jobs and following the file references. But 
> that is an expensive solution from an engineering cost perspective.
> - Security and privacy. E.g., there may be a requirement that Flink state 
> can't keep data for more than some duration threshold (hours/days/weeks). 
> The application is expected to purge keys to satisfy the requirement. 
> However, with incremental checkpoint and how deletion works in RocksDB, it is 
> hard to set an S3 TTL to purge S3 files. Even though those old S3 files don't 
> contain live keys, they may still be referenced by retained Flink checkpoints.
> 2. Occasionally, corrupted checkpoint files (on S3) are observed. As a 
> result, restoring from a checkpoint fails. With incremental checkpoint, it 
> usually doesn't help to try other, older checkpoints, because they may refer 
> to the same corrupted file. It is unclear whether the corruption happened 
> before or during the S3 upload. This risk can be mitigated with periodic 
> savepoints.
> It all boils down to a periodic full snapshot (checkpoint or savepoint) to 
> deterministically break the chain of incremental checkpoints. Searching the 
> Jira history, the behavior that FLINK-23949 [1] is trying to fix is actually 
> close to what we would need here.
> There are a few options:
> 1. Periodically trigger savepoints (via the control plane). This is actually 
> not a bad practice and might be appealing to some people. The problem is that 
> it requires a job deployment to break the chain of incremental checkpoints, 
> and periodic job deployment may sound hacky. If we make the behavior of a 
> full checkpoint after a savepoint (fixed in FLINK-23949) configurable, it 
> might be an acceptable compromise. The benefit is that no job deployment is 
> required after savepoints.
> 2. Build the feature into Flink's incremental checkpointing. Periodically 
> (with some cron-style config) trigger a full checkpoint to break the 
> incremental chain. If the full checkpoint fails (for whatever reason), the 
> following checkpoints should attempt a full checkpoint as well, until one 
> successful full checkpoint is completed.
> 3. For the security/privacy requirement, the main thing is to apply 
> compaction on the deleted keys. That could probably avoid references to the 
> old files. Is there any RocksDB compaction that can achieve full compaction, 
> removing old delete markers? (Recent delete markers are fine.)
> [1] https://issues.apache.org/jira/browse/FLINK-23949
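The retry behavior described in option 2 above (keep forcing full checkpoints until one succeeds, then return to incremental) can be sketched as a small piece of planning logic. All names here are illustrative, not Flink APIs:

```python
def plan_checkpoint_types(full_due, succeeded):
    """Sketch of option 2's retry rule.

    full_due:  per-checkpoint flags from the cron-style schedule
               (True when a full checkpoint is due).
    succeeded: whether each checkpoint attempt completed successfully.
    Returns the type chosen for each checkpoint.
    """
    plan = []
    force_full = False
    for due, ok in zip(full_due, succeeded):
        # Once a full checkpoint is due, keep forcing FULL on every
        # subsequent attempt until one actually succeeds.
        force_full = force_full or due
        plan.append("FULL" if force_full else "INCREMENTAL")
        if force_full and ok:
            force_full = False  # chain broken; incremental is safe again
    return plan
```

For example, if the scheduled full checkpoint fails once, the next attempt is forced to FULL as well, and only after it succeeds does the job fall back to incremental checkpoints.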
