Re: [PR] [FLINK-27529] Fix Intger Comparison For Source Index in Hybrid Source [flink]
varun1729DD commented on PR #23703: URL: https://github.com/apache/flink/pull/23703#issuecomment-1837396406 @flinkbot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-33210][core] Introduce job status changed listener for lineage [flink]
JingGe commented on PR #23695: URL: https://github.com/apache/flink/pull/23695#issuecomment-1837390803 @FangYongs are you still working on this PR? FYI: Flink doc needs to be generated again after new configOption is introduced. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-22014) Flink JobManager failed to restart after failure in kubernetes HA setup
[ https://issues.apache.org/jira/browse/FLINK-22014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792489#comment-17792489 ] Yuexin Chen commented on FLINK-22014: - [~mason6345] Is it caused by this issue FLINK-33011? we use flink-kubernetes-operator 1.6 for flink tasks deployment. > Flink JobManager failed to restart after failure in kubernetes HA setup > --- > > Key: FLINK-22014 > URL: https://issues.apache.org/jira/browse/FLINK-22014 > Project: Flink > Issue Type: Bug > Components: Deployment / Kubernetes >Affects Versions: 1.11.3, 1.12.2, 1.13.0 >Reporter: Mikalai Lushchytski >Priority: Major > Labels: k8s-ha, pull-request-available > Attachments: flink-logs.txt.zip, image-2021-04-19-11-17-58-215.png, > scalyr-logs (1).txt > > > After the JobManager pod failed and the new one started, it was not able to > recover jobs due to the absence of recovery data in storage - config map > pointed at not existing file. > > Due to this the JobManager pod entered into the `CrashLoopBackOff`state and > was not able to recover - each attempt failed with the same error so the > whole cluster became unrecoverable and not operating. > > I had to manually delete the config map and start the jobs again without the > save point. > > If I tried to emulate the failure further by deleting job manager pod > manually, the new pod every time recovered well and issue was not > reproducible anymore artificially. > > Below is the failure log: > {code:java} > 2021-03-26 08:22:57,925 INFO > org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - > Starting the SlotManager. > 2021-03-26 08:22:57,928 INFO > org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - > Starting DefaultLeaderRetrievalService with KubernetesLeaderRetrievalDriver > {configMapName='stellar-flink-cluster-dispatcher-leader'}. > 2021-03-26 08:22:57,931 INFO > org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieved job > ids [198c46bac791e73ebcc565a550fa4ff6, 344f5ebc1b5c3a566b4b2837813e4940, > 96c4603a0822d10884f7fe536703d811, d9ded24224aab7c7041420b3efc1b6ba] from > KubernetesStateHandleStore{configMapName='stellar-flink-cluster-dispatcher-leader'} > 2021-03-26 08:22:57,933 INFO > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] > - Trying to recover job with job id 198c46bac791e73ebcc565a550fa4ff6. > 2021-03-26 08:22:58,029 INFO > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] > - Stopping SessionDispatcherLeaderProcess. > 2021-03-26 08:28:22,677 INFO > org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Stopping > DefaultJobGraphStore. 2021-03-26 08:28:22,681 ERROR > org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error > occurred in the cluster entrypoint. java.util.concurrent.CompletionException: > org.apache.flink.util.FlinkRuntimeException: Could not recover job with job > id 198c46bac791e73ebcc565a550fa4ff6. >at java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source) > ~[?:?] >at java.util.concurrent.CompletableFuture.completeThrowable(Unknown > Source) [?:?] >at java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source) > [?:?] >at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?] >at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?] >at java.lang.Thread.run(Unknown Source) [?:?] Caused by: > org.apache.flink.util.FlinkRuntimeException: Could not recover job with job > id 198c46bac791e73ebcc565a550fa4ff6. >at > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJob(SessionDispatcherLeaderProcess.java:144 > undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] >at > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobs(SessionDispatcherLeaderProcess.java:122 > undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] >at > org.apache.flink.runtime.dispatcher.runner.AbstractDispatcherLeaderProcess.supplyUnsynchronizedIfRunning(AbstractDispatcherLeaderProcess.java:198 > undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] >at > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobsIfRunning(SessionDispatcherLeaderProcess.java:113 > undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] ... 4 more > Caused by: org.apache.flink.util.FlinkException: Could not retrieve submitted > JobGraph from state handle under jobGraph-198c46bac791e73ebcc565a550fa4ff6. > This indicates that the retrieved state handle is broken. Try cleaning the > state handle store. >at >
[jira] (FLINK-22014) Flink JobManager failed to restart after failure in kubernetes HA setup
[ https://issues.apache.org/jira/browse/FLINK-22014 ] Yuexin Chen deleted comment on FLINK-22014: - was (Author: JIRAUSER295713): [~mason6345] Is it caused by this issue issues.apache.org/jira/browse/FLINK-33011, we use flink-kubernetes-operator 1.6 for flink tasks deployment. > Flink JobManager failed to restart after failure in kubernetes HA setup > --- > > Key: FLINK-22014 > URL: https://issues.apache.org/jira/browse/FLINK-22014 > Project: Flink > Issue Type: Bug > Components: Deployment / Kubernetes >Affects Versions: 1.11.3, 1.12.2, 1.13.0 >Reporter: Mikalai Lushchytski >Priority: Major > Labels: k8s-ha, pull-request-available > Attachments: flink-logs.txt.zip, image-2021-04-19-11-17-58-215.png, > scalyr-logs (1).txt > > > After the JobManager pod failed and the new one started, it was not able to > recover jobs due to the absence of recovery data in storage - config map > pointed at not existing file. > > Due to this the JobManager pod entered into the `CrashLoopBackOff`state and > was not able to recover - each attempt failed with the same error so the > whole cluster became unrecoverable and not operating. > > I had to manually delete the config map and start the jobs again without the > save point. > > If I tried to emulate the failure further by deleting job manager pod > manually, the new pod every time recovered well and issue was not > reproducible anymore artificially. > > Below is the failure log: > {code:java} > 2021-03-26 08:22:57,925 INFO > org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - > Starting the SlotManager. > 2021-03-26 08:22:57,928 INFO > org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - > Starting DefaultLeaderRetrievalService with KubernetesLeaderRetrievalDriver > {configMapName='stellar-flink-cluster-dispatcher-leader'}. > 2021-03-26 08:22:57,931 INFO > org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieved job > ids [198c46bac791e73ebcc565a550fa4ff6, 344f5ebc1b5c3a566b4b2837813e4940, > 96c4603a0822d10884f7fe536703d811, d9ded24224aab7c7041420b3efc1b6ba] from > KubernetesStateHandleStore{configMapName='stellar-flink-cluster-dispatcher-leader'} > 2021-03-26 08:22:57,933 INFO > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] > - Trying to recover job with job id 198c46bac791e73ebcc565a550fa4ff6. > 2021-03-26 08:22:58,029 INFO > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] > - Stopping SessionDispatcherLeaderProcess. > 2021-03-26 08:28:22,677 INFO > org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Stopping > DefaultJobGraphStore. 2021-03-26 08:28:22,681 ERROR > org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error > occurred in the cluster entrypoint. java.util.concurrent.CompletionException: > org.apache.flink.util.FlinkRuntimeException: Could not recover job with job > id 198c46bac791e73ebcc565a550fa4ff6. >at java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source) > ~[?:?] >at java.util.concurrent.CompletableFuture.completeThrowable(Unknown > Source) [?:?] >at java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source) > [?:?] >at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?] >at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?] >at java.lang.Thread.run(Unknown Source) [?:?] Caused by: > org.apache.flink.util.FlinkRuntimeException: Could not recover job with job > id 198c46bac791e73ebcc565a550fa4ff6. >at > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJob(SessionDispatcherLeaderProcess.java:144 > undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] >at > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobs(SessionDispatcherLeaderProcess.java:122 > undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] >at > org.apache.flink.runtime.dispatcher.runner.AbstractDispatcherLeaderProcess.supplyUnsynchronizedIfRunning(AbstractDispatcherLeaderProcess.java:198 > undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] >at > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobsIfRunning(SessionDispatcherLeaderProcess.java:113 > undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] ... 4 more > Caused by: org.apache.flink.util.FlinkException: Could not retrieve submitted > JobGraph from state handle under jobGraph-198c46bac791e73ebcc565a550fa4ff6. > This indicates that the retrieved state handle is broken. Try cleaning the > state handle store. >at > org.apache.flink.runtime.jobmanager.DefaultJobGraphStore.recoverJobGraph(DefaultJobGraphStore.java:171
[jira] [Commented] (FLINK-22014) Flink JobManager failed to restart after failure in kubernetes HA setup
[ https://issues.apache.org/jira/browse/FLINK-22014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792488#comment-17792488 ] Yuexin Chen commented on FLINK-22014: - [~mason6345] Is it caused by this issue issues.apache.org/jira/browse/FLINK-33011, we use flink-kubernetes-operator 1.6 for flink tasks deployment. > Flink JobManager failed to restart after failure in kubernetes HA setup > --- > > Key: FLINK-22014 > URL: https://issues.apache.org/jira/browse/FLINK-22014 > Project: Flink > Issue Type: Bug > Components: Deployment / Kubernetes >Affects Versions: 1.11.3, 1.12.2, 1.13.0 >Reporter: Mikalai Lushchytski >Priority: Major > Labels: k8s-ha, pull-request-available > Attachments: flink-logs.txt.zip, image-2021-04-19-11-17-58-215.png, > scalyr-logs (1).txt > > > After the JobManager pod failed and the new one started, it was not able to > recover jobs due to the absence of recovery data in storage - config map > pointed at not existing file. > > Due to this the JobManager pod entered into the `CrashLoopBackOff`state and > was not able to recover - each attempt failed with the same error so the > whole cluster became unrecoverable and not operating. > > I had to manually delete the config map and start the jobs again without the > save point. > > If I tried to emulate the failure further by deleting job manager pod > manually, the new pod every time recovered well and issue was not > reproducible anymore artificially. > > Below is the failure log: > {code:java} > 2021-03-26 08:22:57,925 INFO > org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - > Starting the SlotManager. > 2021-03-26 08:22:57,928 INFO > org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - > Starting DefaultLeaderRetrievalService with KubernetesLeaderRetrievalDriver > {configMapName='stellar-flink-cluster-dispatcher-leader'}. > 2021-03-26 08:22:57,931 INFO > org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieved job > ids [198c46bac791e73ebcc565a550fa4ff6, 344f5ebc1b5c3a566b4b2837813e4940, > 96c4603a0822d10884f7fe536703d811, d9ded24224aab7c7041420b3efc1b6ba] from > KubernetesStateHandleStore{configMapName='stellar-flink-cluster-dispatcher-leader'} > 2021-03-26 08:22:57,933 INFO > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] > - Trying to recover job with job id 198c46bac791e73ebcc565a550fa4ff6. > 2021-03-26 08:22:58,029 INFO > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] > - Stopping SessionDispatcherLeaderProcess. > 2021-03-26 08:28:22,677 INFO > org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Stopping > DefaultJobGraphStore. 2021-03-26 08:28:22,681 ERROR > org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error > occurred in the cluster entrypoint. java.util.concurrent.CompletionException: > org.apache.flink.util.FlinkRuntimeException: Could not recover job with job > id 198c46bac791e73ebcc565a550fa4ff6. >at java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source) > ~[?:?] >at java.util.concurrent.CompletableFuture.completeThrowable(Unknown > Source) [?:?] >at java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source) > [?:?] >at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?] >at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?] >at java.lang.Thread.run(Unknown Source) [?:?] Caused by: > org.apache.flink.util.FlinkRuntimeException: Could not recover job with job > id 198c46bac791e73ebcc565a550fa4ff6. >at > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJob(SessionDispatcherLeaderProcess.java:144 > undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] >at > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobs(SessionDispatcherLeaderProcess.java:122 > undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] >at > org.apache.flink.runtime.dispatcher.runner.AbstractDispatcherLeaderProcess.supplyUnsynchronizedIfRunning(AbstractDispatcherLeaderProcess.java:198 > undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] >at > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobsIfRunning(SessionDispatcherLeaderProcess.java:113 > undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] ... 4 more > Caused by: org.apache.flink.util.FlinkException: Could not retrieve submitted > JobGraph from state handle under jobGraph-198c46bac791e73ebcc565a550fa4ff6. > This indicates that the retrieved state handle is broken. Try cleaning the > state handle store. >at >
[jira] (FLINK-22014) Flink JobManager failed to restart after failure in kubernetes HA setup
[ https://issues.apache.org/jira/browse/FLINK-22014 ] Yuexin Chen deleted comment on FLINK-22014: - was (Author: JIRAUSER295713): [~mason6345] [~novakov.alex] have you found the issue of the problem? > Flink JobManager failed to restart after failure in kubernetes HA setup > --- > > Key: FLINK-22014 > URL: https://issues.apache.org/jira/browse/FLINK-22014 > Project: Flink > Issue Type: Bug > Components: Deployment / Kubernetes >Affects Versions: 1.11.3, 1.12.2, 1.13.0 >Reporter: Mikalai Lushchytski >Priority: Major > Labels: k8s-ha, pull-request-available > Attachments: flink-logs.txt.zip, image-2021-04-19-11-17-58-215.png, > scalyr-logs (1).txt > > > After the JobManager pod failed and the new one started, it was not able to > recover jobs due to the absence of recovery data in storage - config map > pointed at not existing file. > > Due to this the JobManager pod entered into the `CrashLoopBackOff`state and > was not able to recover - each attempt failed with the same error so the > whole cluster became unrecoverable and not operating. > > I had to manually delete the config map and start the jobs again without the > save point. > > If I tried to emulate the failure further by deleting job manager pod > manually, the new pod every time recovered well and issue was not > reproducible anymore artificially. > > Below is the failure log: > {code:java} > 2021-03-26 08:22:57,925 INFO > org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - > Starting the SlotManager. > 2021-03-26 08:22:57,928 INFO > org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - > Starting DefaultLeaderRetrievalService with KubernetesLeaderRetrievalDriver > {configMapName='stellar-flink-cluster-dispatcher-leader'}. > 2021-03-26 08:22:57,931 INFO > org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieved job > ids [198c46bac791e73ebcc565a550fa4ff6, 344f5ebc1b5c3a566b4b2837813e4940, > 96c4603a0822d10884f7fe536703d811, d9ded24224aab7c7041420b3efc1b6ba] from > KubernetesStateHandleStore{configMapName='stellar-flink-cluster-dispatcher-leader'} > 2021-03-26 08:22:57,933 INFO > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] > - Trying to recover job with job id 198c46bac791e73ebcc565a550fa4ff6. > 2021-03-26 08:22:58,029 INFO > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] > - Stopping SessionDispatcherLeaderProcess. > 2021-03-26 08:28:22,677 INFO > org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Stopping > DefaultJobGraphStore. 2021-03-26 08:28:22,681 ERROR > org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error > occurred in the cluster entrypoint. java.util.concurrent.CompletionException: > org.apache.flink.util.FlinkRuntimeException: Could not recover job with job > id 198c46bac791e73ebcc565a550fa4ff6. >at java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source) > ~[?:?] >at java.util.concurrent.CompletableFuture.completeThrowable(Unknown > Source) [?:?] >at java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source) > [?:?] >at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?] >at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?] >at java.lang.Thread.run(Unknown Source) [?:?] Caused by: > org.apache.flink.util.FlinkRuntimeException: Could not recover job with job > id 198c46bac791e73ebcc565a550fa4ff6. >at > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJob(SessionDispatcherLeaderProcess.java:144 > undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] >at > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobs(SessionDispatcherLeaderProcess.java:122 > undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] >at > org.apache.flink.runtime.dispatcher.runner.AbstractDispatcherLeaderProcess.supplyUnsynchronizedIfRunning(AbstractDispatcherLeaderProcess.java:198 > undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] >at > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobsIfRunning(SessionDispatcherLeaderProcess.java:113 > undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] ... 4 more > Caused by: org.apache.flink.util.FlinkException: Could not retrieve submitted > JobGraph from state handle under jobGraph-198c46bac791e73ebcc565a550fa4ff6. > This indicates that the retrieved state handle is broken. Try cleaning the > state handle store. >at > org.apache.flink.runtime.jobmanager.DefaultJobGraphStore.recoverJobGraph(DefaultJobGraphStore.java:171 > undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] >at >
Re: [PR] [FLINK-33727][table] Use different sink names for restore tests [flink]
JingGe commented on code in PR #23858: URL: https://github.com/apache/flink/pull/23858#discussion_r1413009264 ## flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/plan/nodes/exec/testutils/JoinTestPrograms.java: ## @@ -142,7 +142,7 @@ public class JoinTestPrograms { .setupTableSource(SOURCE_T1) .setupTableSource(SOURCE_T2) .setupTableSink( -SinkTestStep.newBuilder("MySink") + SinkTestStep.newBuilder("NON_WINDOW_INNER_JOIN_WITH_NULL_Sink") Review Comment: It is a great idea to use data warehouse like naming convention to improve the readability. It would be even better to follow the naming convention that table names commonly use lower case letters. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-22014) Flink JobManager failed to restart after failure in kubernetes HA setup
[ https://issues.apache.org/jira/browse/FLINK-22014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792483#comment-17792483 ] chenyuexin commented on FLINK-22014: [~mason6345] [~novakov.alex] have you found the issue of the problem? > Flink JobManager failed to restart after failure in kubernetes HA setup > --- > > Key: FLINK-22014 > URL: https://issues.apache.org/jira/browse/FLINK-22014 > Project: Flink > Issue Type: Bug > Components: Deployment / Kubernetes >Affects Versions: 1.11.3, 1.12.2, 1.13.0 >Reporter: Mikalai Lushchytski >Priority: Major > Labels: k8s-ha, pull-request-available > Attachments: flink-logs.txt.zip, image-2021-04-19-11-17-58-215.png, > scalyr-logs (1).txt > > > After the JobManager pod failed and the new one started, it was not able to > recover jobs due to the absence of recovery data in storage - config map > pointed at not existing file. > > Due to this the JobManager pod entered into the `CrashLoopBackOff`state and > was not able to recover - each attempt failed with the same error so the > whole cluster became unrecoverable and not operating. > > I had to manually delete the config map and start the jobs again without the > save point. > > If I tried to emulate the failure further by deleting job manager pod > manually, the new pod every time recovered well and issue was not > reproducible anymore artificially. > > Below is the failure log: > {code:java} > 2021-03-26 08:22:57,925 INFO > org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - > Starting the SlotManager. > 2021-03-26 08:22:57,928 INFO > org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - > Starting DefaultLeaderRetrievalService with KubernetesLeaderRetrievalDriver > {configMapName='stellar-flink-cluster-dispatcher-leader'}. > 2021-03-26 08:22:57,931 INFO > org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieved job > ids [198c46bac791e73ebcc565a550fa4ff6, 344f5ebc1b5c3a566b4b2837813e4940, > 96c4603a0822d10884f7fe536703d811, d9ded24224aab7c7041420b3efc1b6ba] from > KubernetesStateHandleStore{configMapName='stellar-flink-cluster-dispatcher-leader'} > 2021-03-26 08:22:57,933 INFO > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] > - Trying to recover job with job id 198c46bac791e73ebcc565a550fa4ff6. > 2021-03-26 08:22:58,029 INFO > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] > - Stopping SessionDispatcherLeaderProcess. > 2021-03-26 08:28:22,677 INFO > org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Stopping > DefaultJobGraphStore. 2021-03-26 08:28:22,681 ERROR > org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error > occurred in the cluster entrypoint. java.util.concurrent.CompletionException: > org.apache.flink.util.FlinkRuntimeException: Could not recover job with job > id 198c46bac791e73ebcc565a550fa4ff6. >at java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source) > ~[?:?] >at java.util.concurrent.CompletableFuture.completeThrowable(Unknown > Source) [?:?] >at java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source) > [?:?] >at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?] >at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?] >at java.lang.Thread.run(Unknown Source) [?:?] Caused by: > org.apache.flink.util.FlinkRuntimeException: Could not recover job with job > id 198c46bac791e73ebcc565a550fa4ff6. >at > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJob(SessionDispatcherLeaderProcess.java:144 > undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] >at > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobs(SessionDispatcherLeaderProcess.java:122 > undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] >at > org.apache.flink.runtime.dispatcher.runner.AbstractDispatcherLeaderProcess.supplyUnsynchronizedIfRunning(AbstractDispatcherLeaderProcess.java:198 > undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] >at > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobsIfRunning(SessionDispatcherLeaderProcess.java:113 > undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] ... 4 more > Caused by: org.apache.flink.util.FlinkException: Could not retrieve submitted > JobGraph from state handle under jobGraph-198c46bac791e73ebcc565a550fa4ff6. > This indicates that the retrieved state handle is broken. Try cleaning the > state handle store. >at > org.apache.flink.runtime.jobmanager.DefaultJobGraphStore.recoverJobGraph(DefaultJobGraphStore.java:171 > undefined)
[jira] [Commented] (FLINK-22014) Flink JobManager failed to restart after failure in kubernetes HA setup
[ https://issues.apache.org/jira/browse/FLINK-22014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792482#comment-17792482 ] chenyuexin commented on FLINK-22014: The same issue in my case with flink-1.15.4, we found that our running flink tasks some have ’submittedJobGraph’ file and ‘job-result-store‘ folder, some tasks all don’t exist, we use S3 storage and K8S HA > Flink JobManager failed to restart after failure in kubernetes HA setup > --- > > Key: FLINK-22014 > URL: https://issues.apache.org/jira/browse/FLINK-22014 > Project: Flink > Issue Type: Bug > Components: Deployment / Kubernetes >Affects Versions: 1.11.3, 1.12.2, 1.13.0 >Reporter: Mikalai Lushchytski >Priority: Major > Labels: k8s-ha, pull-request-available > Attachments: flink-logs.txt.zip, image-2021-04-19-11-17-58-215.png, > scalyr-logs (1).txt > > > After the JobManager pod failed and the new one started, it was not able to > recover jobs due to the absence of recovery data in storage - config map > pointed at not existing file. > > Due to this the JobManager pod entered into the `CrashLoopBackOff`state and > was not able to recover - each attempt failed with the same error so the > whole cluster became unrecoverable and not operating. > > I had to manually delete the config map and start the jobs again without the > save point. > > If I tried to emulate the failure further by deleting job manager pod > manually, the new pod every time recovered well and issue was not > reproducible anymore artificially. > > Below is the failure log: > {code:java} > 2021-03-26 08:22:57,925 INFO > org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - > Starting the SlotManager. > 2021-03-26 08:22:57,928 INFO > org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - > Starting DefaultLeaderRetrievalService with KubernetesLeaderRetrievalDriver > {configMapName='stellar-flink-cluster-dispatcher-leader'}. > 2021-03-26 08:22:57,931 INFO > org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieved job > ids [198c46bac791e73ebcc565a550fa4ff6, 344f5ebc1b5c3a566b4b2837813e4940, > 96c4603a0822d10884f7fe536703d811, d9ded24224aab7c7041420b3efc1b6ba] from > KubernetesStateHandleStore{configMapName='stellar-flink-cluster-dispatcher-leader'} > 2021-03-26 08:22:57,933 INFO > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] > - Trying to recover job with job id 198c46bac791e73ebcc565a550fa4ff6. > 2021-03-26 08:22:58,029 INFO > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] > - Stopping SessionDispatcherLeaderProcess. > 2021-03-26 08:28:22,677 INFO > org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Stopping > DefaultJobGraphStore. 2021-03-26 08:28:22,681 ERROR > org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error > occurred in the cluster entrypoint. java.util.concurrent.CompletionException: > org.apache.flink.util.FlinkRuntimeException: Could not recover job with job > id 198c46bac791e73ebcc565a550fa4ff6. >at java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source) > ~[?:?] >at java.util.concurrent.CompletableFuture.completeThrowable(Unknown > Source) [?:?] >at java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source) > [?:?] >at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?] >at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?] >at java.lang.Thread.run(Unknown Source) [?:?] Caused by: > org.apache.flink.util.FlinkRuntimeException: Could not recover job with job > id 198c46bac791e73ebcc565a550fa4ff6. >at > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJob(SessionDispatcherLeaderProcess.java:144 > undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] >at > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobs(SessionDispatcherLeaderProcess.java:122 > undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] >at > org.apache.flink.runtime.dispatcher.runner.AbstractDispatcherLeaderProcess.supplyUnsynchronizedIfRunning(AbstractDispatcherLeaderProcess.java:198 > undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] >at > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobsIfRunning(SessionDispatcherLeaderProcess.java:113 > undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] ... 4 more > Caused by: org.apache.flink.util.FlinkException: Could not retrieve submitted > JobGraph from state handle under jobGraph-198c46bac791e73ebcc565a550fa4ff6. > This indicates that the retrieved state handle is broken. Try cleaning the > state handle store.
[jira] [Comment Edited] (FLINK-33727) JoinRestoreTest is failing on AZP
[ https://issues.apache.org/jira/browse/FLINK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792458#comment-17792458 ] Sergey Nuyanzin edited comment on FLINK-33727 at 12/2/23 11:29 PM: --- Based on local tests seems these {{MySink}} is kind of "state holder" for the tests... Renaming helps {quote} (Right now, the DeduplicationTestPrograms and JoinTestPrograms both use sinks called "MySink".) {quote} is there any reason to have same name? Also I guess it is worth noting: there is a number of other tests with same potential issue e.g. with sink name "sink_t", probably something else was (Author: sergey nuyanzin): Based on local tests seems these {{MySink}} is kind of "state holder" for the tests... {quote} (Right now, the DeduplicationTestPrograms and JoinTestPrograms both use sinks called "MySink".) {quote} is there any reason to have same name? > JoinRestoreTest is failing on AZP > - > > Key: FLINK-33727 > URL: https://issues.apache.org/jira/browse/FLINK-33727 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.19.0 >Reporter: Sergey Nuyanzin >Priority: Critical > Labels: pull-request-available, test-stability > > Since {{JoinRestoreTest}} was introduced in FLINK-33470 it seems to be a > reason > {noformat} > Dec 02 04:42:26 04:42:26.408 [ERROR] Failures: > Dec 02 04:42:26 04:42:26.408 [ERROR] > JoinRestoreTest>RestoreTestBase.testRestore:283 > Dec 02 04:42:26 Expecting actual: > Dec 02 04:42:26 ["+I[9, carol, apple, 9000]", > Dec 02 04:42:26 "+I[8, bill, banana, 8000]", > Dec 02 04:42:26 "+I[6, jerry, pen, 6000]"] > Dec 02 04:42:26 to contain exactly in any order: > Dec 02 04:42:26 ["+I[Adam, null]", > Dec 02 04:42:26 "+I[Baker, Research]", > Dec 02 04:42:26 "+I[Charlie, Human Resources]", > Dec 02 04:42:26 "+I[Charlie, HR]", > Dec 02 04:42:26 "+I[Don, Sales]", > Dec 02 04:42:26 "+I[Victor, null]", > Dec 02 04:42:26 "+I[Helena, Engineering]", > Dec 02 04:42:26 "+I[Juliet, Engineering]", > Dec 02 04:42:26 "+I[Ivana, Research]", > Dec 02 04:42:26 "+I[Charlie, People Operations]"] > {noformat} > examples of failures > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55120=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099 > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55129=logs=a9db68b9-a7e0-54b6-0f98-010e0aff39e2=cdd32e0b-6047-565b-c58f-14054472f1be=11786 > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55136=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099 > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55137=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=11779 -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [FLINK-33727][table] Use different sink names for restore tests [flink]
flinkbot commented on PR #23858: URL: https://github.com/apache/flink/pull/23858#issuecomment-1837279346 ## CI report: * da62de304dcaeef07bda7d8da7bfc082737b14fa UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-33710] Prevent triggering cluster upgrades for permutations of the same overrides [flink-kubernetes-operator]
gyfora commented on code in PR #721: URL: https://github.com/apache/flink-kubernetes-operator/pull/721#discussion_r1412894510 ## flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/deployment/AbstractFlinkResourceReconciler.java: ## @@ -186,7 +187,41 @@ private void applyAutoscaler(FlinkResourceContext ctx) throws Exception { ctx.getResource().getSpec().getJob() != null && ctx.getObserveConfig().getBoolean(AUTOSCALER_ENABLED); autoScalerCtx.getConfiguration().set(AUTOSCALER_ENABLED, autoscalerEnabled); + autoscaler.scale(autoScalerCtx); +putBackOldParallelismOverridesIfNewOnesAreMerelyAPermutation(ctx); +} + +private static < +CR extends AbstractFlinkResource, +SPEC extends AbstractFlinkSpec, +STATUS extends CommonStatus> +void putBackOldParallelismOverridesIfNewOnesAreMerelyAPermutation( Review Comment: Could we move this logic to the `ScalingRealizer`? https://github.com/apache/flink-kubernetes-operator/commit/158cbe29169cbfb7fa7ad676fb0273fd7ef6d25e adds the logic there and this issue comes from that change. I feel these 2 changes logically belong together and it's weird that we break something in once place and fix it in another while it could be simply next to each other. It will likely also make the whole logic a bit simpler ## flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/deployment/AbstractFlinkResourceReconciler.java: ## @@ -186,7 +187,41 @@ private void applyAutoscaler(FlinkResourceContext ctx) throws Exception { ctx.getResource().getSpec().getJob() != null && ctx.getObserveConfig().getBoolean(AUTOSCALER_ENABLED); autoScalerCtx.getConfiguration().set(AUTOSCALER_ENABLED, autoscalerEnabled); + autoscaler.scale(autoScalerCtx); +putBackOldParallelismOverridesIfNewOnesAreMerelyAPermutation(ctx); +} + +private static < +CR extends AbstractFlinkResource, +SPEC extends AbstractFlinkSpec, +STATUS extends CommonStatus> Review Comment: Can we rename this to `resetParallelismOverridesIfUnchanged` ? The current naming is unusually verbose for the codebase. It may be better to add a javadoc comment instead. But this may be irrelevant if you check my other comment. We don't need to replace it if we don't set it in the first place, but for that we need to move the logic -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-33727) JoinRestoreTest is failing on AZP
[ https://issues.apache.org/jira/browse/FLINK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792458#comment-17792458 ] Sergey Nuyanzin commented on FLINK-33727: - Seems these {{MySink}} is kind of "state holder" for the tests... {quote} (Right now, the DeduplicationTestPrograms and JoinTestPrograms both use sinks called "MySink".) {quote} is there any reason to have same name? > JoinRestoreTest is failing on AZP > - > > Key: FLINK-33727 > URL: https://issues.apache.org/jira/browse/FLINK-33727 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.19.0 >Reporter: Sergey Nuyanzin >Priority: Critical > Labels: pull-request-available, test-stability > > Since {{JoinRestoreTest}} was introduced in FLINK-33470 it seems to be a > reason > {noformat} > Dec 02 04:42:26 04:42:26.408 [ERROR] Failures: > Dec 02 04:42:26 04:42:26.408 [ERROR] > JoinRestoreTest>RestoreTestBase.testRestore:283 > Dec 02 04:42:26 Expecting actual: > Dec 02 04:42:26 ["+I[9, carol, apple, 9000]", > Dec 02 04:42:26 "+I[8, bill, banana, 8000]", > Dec 02 04:42:26 "+I[6, jerry, pen, 6000]"] > Dec 02 04:42:26 to contain exactly in any order: > Dec 02 04:42:26 ["+I[Adam, null]", > Dec 02 04:42:26 "+I[Baker, Research]", > Dec 02 04:42:26 "+I[Charlie, Human Resources]", > Dec 02 04:42:26 "+I[Charlie, HR]", > Dec 02 04:42:26 "+I[Don, Sales]", > Dec 02 04:42:26 "+I[Victor, null]", > Dec 02 04:42:26 "+I[Helena, Engineering]", > Dec 02 04:42:26 "+I[Juliet, Engineering]", > Dec 02 04:42:26 "+I[Ivana, Research]", > Dec 02 04:42:26 "+I[Charlie, People Operations]"] > {noformat} > examples of failures > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55120=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099 > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55129=logs=a9db68b9-a7e0-54b6-0f98-010e0aff39e2=cdd32e0b-6047-565b-c58f-14054472f1be=11786 > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55136=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099 > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55137=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=11779 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-33727) JoinRestoreTest is failing on AZP
[ https://issues.apache.org/jira/browse/FLINK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792458#comment-17792458 ] Sergey Nuyanzin edited comment on FLINK-33727 at 12/2/23 11:23 PM: --- Based on local tests seems these {{MySink}} is kind of "state holder" for the tests... {quote} (Right now, the DeduplicationTestPrograms and JoinTestPrograms both use sinks called "MySink".) {quote} is there any reason to have same name? was (Author: sergey nuyanzin): Seems these {{MySink}} is kind of "state holder" for the tests... {quote} (Right now, the DeduplicationTestPrograms and JoinTestPrograms both use sinks called "MySink".) {quote} is there any reason to have same name? > JoinRestoreTest is failing on AZP > - > > Key: FLINK-33727 > URL: https://issues.apache.org/jira/browse/FLINK-33727 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.19.0 >Reporter: Sergey Nuyanzin >Priority: Critical > Labels: pull-request-available, test-stability > > Since {{JoinRestoreTest}} was introduced in FLINK-33470 it seems to be a > reason > {noformat} > Dec 02 04:42:26 04:42:26.408 [ERROR] Failures: > Dec 02 04:42:26 04:42:26.408 [ERROR] > JoinRestoreTest>RestoreTestBase.testRestore:283 > Dec 02 04:42:26 Expecting actual: > Dec 02 04:42:26 ["+I[9, carol, apple, 9000]", > Dec 02 04:42:26 "+I[8, bill, banana, 8000]", > Dec 02 04:42:26 "+I[6, jerry, pen, 6000]"] > Dec 02 04:42:26 to contain exactly in any order: > Dec 02 04:42:26 ["+I[Adam, null]", > Dec 02 04:42:26 "+I[Baker, Research]", > Dec 02 04:42:26 "+I[Charlie, Human Resources]", > Dec 02 04:42:26 "+I[Charlie, HR]", > Dec 02 04:42:26 "+I[Don, Sales]", > Dec 02 04:42:26 "+I[Victor, null]", > Dec 02 04:42:26 "+I[Helena, Engineering]", > Dec 02 04:42:26 "+I[Juliet, Engineering]", > Dec 02 04:42:26 "+I[Ivana, Research]", > Dec 02 04:42:26 "+I[Charlie, People Operations]"] > {noformat} > examples of failures > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55120=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099 > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55129=logs=a9db68b9-a7e0-54b6-0f98-010e0aff39e2=cdd32e0b-6047-565b-c58f-14054472f1be=11786 > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55136=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099 > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55137=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=11779 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-33727) JoinRestoreTest is failing on AZP
[ https://issues.apache.org/jira/browse/FLINK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-33727: --- Labels: pull-request-available test-stability (was: test-stability) > JoinRestoreTest is failing on AZP > - > > Key: FLINK-33727 > URL: https://issues.apache.org/jira/browse/FLINK-33727 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.19.0 >Reporter: Sergey Nuyanzin >Priority: Critical > Labels: pull-request-available, test-stability > > Since {{JoinRestoreTest}} was introduced in FLINK-33470 it seems to be a > reason > {noformat} > Dec 02 04:42:26 04:42:26.408 [ERROR] Failures: > Dec 02 04:42:26 04:42:26.408 [ERROR] > JoinRestoreTest>RestoreTestBase.testRestore:283 > Dec 02 04:42:26 Expecting actual: > Dec 02 04:42:26 ["+I[9, carol, apple, 9000]", > Dec 02 04:42:26 "+I[8, bill, banana, 8000]", > Dec 02 04:42:26 "+I[6, jerry, pen, 6000]"] > Dec 02 04:42:26 to contain exactly in any order: > Dec 02 04:42:26 ["+I[Adam, null]", > Dec 02 04:42:26 "+I[Baker, Research]", > Dec 02 04:42:26 "+I[Charlie, Human Resources]", > Dec 02 04:42:26 "+I[Charlie, HR]", > Dec 02 04:42:26 "+I[Don, Sales]", > Dec 02 04:42:26 "+I[Victor, null]", > Dec 02 04:42:26 "+I[Helena, Engineering]", > Dec 02 04:42:26 "+I[Juliet, Engineering]", > Dec 02 04:42:26 "+I[Ivana, Research]", > Dec 02 04:42:26 "+I[Charlie, People Operations]"] > {noformat} > examples of failures > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55120=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099 > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55129=logs=a9db68b9-a7e0-54b6-0f98-010e0aff39e2=cdd32e0b-6047-565b-c58f-14054472f1be=11786 > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55136=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099 > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55137=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=11779 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] [FLINK-33727][table] Use different sink names for restore tests [flink]
snuyanzin opened a new pull request, #23858: URL: https://github.com/apache/flink/pull/23858 ## What is the purpose of the change In restore tests there is `MySink` name for sinks and it seems to be the problem for restore tests. For more details see jira. The PR just renames these sinks for every table program ## Verifying this change The issue was 100% reproduced by executing all tests for `RestoreTestBase` In same way it could be tested ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): ( no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: ( no) - The serializers: (no ) - The runtime per-record code paths (performance sensitive): (no ) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no ) - The S3 file system connector: (no ) ## Documentation - Does this pull request introduce a new feature? (no) - If yes, how is the feature documented? (not applicable) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-33727) JoinRestoreTest is failing on AZP
[ https://issues.apache.org/jira/browse/FLINK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792449#comment-17792449 ] Sergey Nuyanzin commented on FLINK-33727: - I don't think it is related to concurrent execution. I was able to find a way to reproduce it locally with 100%. Just open IntellijIDEA and run all tests for {{RestoreTestBase}} Even more, I started commenting tests and realised if there at least one test e.g. {{ExpandRestoreTest}} before {{JoinRestoreTest}} then {{JoinRestoreTest}} fails with 100% at least for my env. If I comment out also {{ExpandRestoreTest}} then it starts passing. It seems it relies on some internal state... > JoinRestoreTest is failing on AZP > - > > Key: FLINK-33727 > URL: https://issues.apache.org/jira/browse/FLINK-33727 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.19.0 >Reporter: Sergey Nuyanzin >Priority: Critical > Labels: test-stability > > Since {{JoinRestoreTest}} was introduced in FLINK-33470 it seems to be a > reason > {noformat} > Dec 02 04:42:26 04:42:26.408 [ERROR] Failures: > Dec 02 04:42:26 04:42:26.408 [ERROR] > JoinRestoreTest>RestoreTestBase.testRestore:283 > Dec 02 04:42:26 Expecting actual: > Dec 02 04:42:26 ["+I[9, carol, apple, 9000]", > Dec 02 04:42:26 "+I[8, bill, banana, 8000]", > Dec 02 04:42:26 "+I[6, jerry, pen, 6000]"] > Dec 02 04:42:26 to contain exactly in any order: > Dec 02 04:42:26 ["+I[Adam, null]", > Dec 02 04:42:26 "+I[Baker, Research]", > Dec 02 04:42:26 "+I[Charlie, Human Resources]", > Dec 02 04:42:26 "+I[Charlie, HR]", > Dec 02 04:42:26 "+I[Don, Sales]", > Dec 02 04:42:26 "+I[Victor, null]", > Dec 02 04:42:26 "+I[Helena, Engineering]", > Dec 02 04:42:26 "+I[Juliet, Engineering]", > Dec 02 04:42:26 "+I[Ivana, Research]", > Dec 02 04:42:26 "+I[Charlie, People Operations]"] > {noformat} > examples of failures > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55120=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099 > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55129=logs=a9db68b9-a7e0-54b6-0f98-010e0aff39e2=cdd32e0b-6047-565b-c58f-14054472f1be=11786 > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55136=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099 > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55137=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=11779 -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [FLINK-33470] Implement restore tests for Join node [flink]
jnh5y commented on PR #23680: URL: https://github.com/apache/flink/pull/23680#issuecomment-1837217788 > @dawidwys @jnh5y after merging this ci on master failed 4 times out of 6 e.g. > > ``` > Dec 02 04:42:26 04:42:26.408 [ERROR] Failures: > Dec 02 04:42:26 04:42:26.408 [ERROR] JoinRestoreTest>RestoreTestBase.testRestore:283 > Dec 02 04:42:26 Expecting actual: > Dec 02 04:42:26 ["+I[9, carol, apple, 9000]", > Dec 02 04:42:26 "+I[8, bill, banana, 8000]", > Dec 02 04:42:26 "+I[6, jerry, pen, 6000]"] > Dec 02 04:42:26 to contain exactly in any order: > Dec 02 04:42:26 ["+I[Adam, null]", > Dec 02 04:42:26 "+I[Baker, Research]", > Dec 02 04:42:26 "+I[Charlie, Human Resources]", > Dec 02 04:42:26 "+I[Charlie, HR]", > Dec 02 04:42:26 "+I[Don, Sales]", > Dec 02 04:42:26 "+I[Victor, null]", > Dec 02 04:42:26 "+I[Helena, Engineering]", > Dec 02 04:42:26 "+I[Juliet, Engineering]", > Dec 02 04:42:26 "+I[Ivana, Research]", > Dec 02 04:42:26 "+I[Charlie, People Operations]"] > ``` > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55120=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099 https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55129=logs=a9db68b9-a7e0-54b6-0f98-010e0aff39e2=cdd32e0b-6047-565b-c58f-14054472f1be=11786 https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55136=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099 https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55137=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=11779 > > could you please have a look? [FLINK-33727](https://issues.apache.org/jira/browse/FLINK-33727) Yes. I've commented on the Flink JIRA; I think the issue is concurrency with the RestoreTestBase implementations. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-33727) JoinRestoreTest is failing on AZP
[ https://issues.apache.org/jira/browse/FLINK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792413#comment-17792413 ] Jim Hughes commented on FLINK-33727: >From a quick look, the data is coming from `DeduplicationTestPrograms.java`. I believe that this shows that the various `RestoreTest`s are being executed concurrently and are interfering with each other. Two obvious ideas would be: 1. Have each RestoreTest use differently named sinks/sources. (Right now, the DeduplicationTestPrograms and JoinTestPrograms both use sinks called "MySink".) 2. Do something at the JUnit level so that implementations of RestoreTestBase do not run concurrently. Thoughts? > JoinRestoreTest is failing on AZP > - > > Key: FLINK-33727 > URL: https://issues.apache.org/jira/browse/FLINK-33727 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.19.0 >Reporter: Sergey Nuyanzin >Priority: Critical > Labels: test-stability > > Since {{JoinRestoreTest}} was introduced in FLINK-33470 it seems to be a > reason > {noformat} > Dec 02 04:42:26 04:42:26.408 [ERROR] Failures: > Dec 02 04:42:26 04:42:26.408 [ERROR] > JoinRestoreTest>RestoreTestBase.testRestore:283 > Dec 02 04:42:26 Expecting actual: > Dec 02 04:42:26 ["+I[9, carol, apple, 9000]", > Dec 02 04:42:26 "+I[8, bill, banana, 8000]", > Dec 02 04:42:26 "+I[6, jerry, pen, 6000]"] > Dec 02 04:42:26 to contain exactly in any order: > Dec 02 04:42:26 ["+I[Adam, null]", > Dec 02 04:42:26 "+I[Baker, Research]", > Dec 02 04:42:26 "+I[Charlie, Human Resources]", > Dec 02 04:42:26 "+I[Charlie, HR]", > Dec 02 04:42:26 "+I[Don, Sales]", > Dec 02 04:42:26 "+I[Victor, null]", > Dec 02 04:42:26 "+I[Helena, Engineering]", > Dec 02 04:42:26 "+I[Juliet, Engineering]", > Dec 02 04:42:26 "+I[Ivana, Research]", > Dec 02 04:42:26 "+I[Charlie, People Operations]"] > {noformat} > examples of failures > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55120=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099 > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55129=logs=a9db68b9-a7e0-54b6-0f98-010e0aff39e2=cdd32e0b-6047-565b-c58f-14054472f1be=11786 > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55136=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099 > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55137=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=11779 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-33727) JoinRestoreTest is failing on AZP
[ https://issues.apache.org/jira/browse/FLINK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Nuyanzin updated FLINK-33727: Description: Since {{JoinRestoreTest}} was introduced in FLINK-33470 it seems to be a reason {noformat} Dec 02 04:42:26 04:42:26.408 [ERROR] Failures: Dec 02 04:42:26 04:42:26.408 [ERROR] JoinRestoreTest>RestoreTestBase.testRestore:283 Dec 02 04:42:26 Expecting actual: Dec 02 04:42:26 ["+I[9, carol, apple, 9000]", Dec 02 04:42:26 "+I[8, bill, banana, 8000]", Dec 02 04:42:26 "+I[6, jerry, pen, 6000]"] Dec 02 04:42:26 to contain exactly in any order: Dec 02 04:42:26 ["+I[Adam, null]", Dec 02 04:42:26 "+I[Baker, Research]", Dec 02 04:42:26 "+I[Charlie, Human Resources]", Dec 02 04:42:26 "+I[Charlie, HR]", Dec 02 04:42:26 "+I[Don, Sales]", Dec 02 04:42:26 "+I[Victor, null]", Dec 02 04:42:26 "+I[Helena, Engineering]", Dec 02 04:42:26 "+I[Juliet, Engineering]", Dec 02 04:42:26 "+I[Ivana, Research]", Dec 02 04:42:26 "+I[Charlie, People Operations]"] {noformat} examples of failures https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55120=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099 https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55129=logs=a9db68b9-a7e0-54b6-0f98-010e0aff39e2=cdd32e0b-6047-565b-c58f-14054472f1be=11786 https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55136=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099 https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55137=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=11779 was: Since it was introduced in FLINK-33470 it seems to be a reason {noformat} Dec 02 04:42:26 04:42:26.408 [ERROR] Failures: Dec 02 04:42:26 04:42:26.408 [ERROR] JoinRestoreTest>RestoreTestBase.testRestore:283 Dec 02 04:42:26 Expecting actual: Dec 02 04:42:26 ["+I[9, carol, apple, 9000]", Dec 02 04:42:26 "+I[8, bill, banana, 8000]", Dec 02 04:42:26 "+I[6, jerry, pen, 6000]"] Dec 02 04:42:26 to contain exactly in any order: Dec 02 04:42:26 ["+I[Adam, null]", Dec 02 04:42:26 "+I[Baker, Research]", Dec 02 04:42:26 "+I[Charlie, Human Resources]", Dec 02 04:42:26 "+I[Charlie, HR]", Dec 02 04:42:26 "+I[Don, Sales]", Dec 02 04:42:26 "+I[Victor, null]", Dec 02 04:42:26 "+I[Helena, Engineering]", Dec 02 04:42:26 "+I[Juliet, Engineering]", Dec 02 04:42:26 "+I[Ivana, Research]", Dec 02 04:42:26 "+I[Charlie, People Operations]"] {noformat} examples of failures https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55120=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099 https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55129=logs=a9db68b9-a7e0-54b6-0f98-010e0aff39e2=cdd32e0b-6047-565b-c58f-14054472f1be=11786 https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55136=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099 https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55137=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=11779 > JoinRestoreTest is failing on AZP > - > > Key: FLINK-33727 > URL: https://issues.apache.org/jira/browse/FLINK-33727 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.19.0 >Reporter: Sergey Nuyanzin >Priority: Critical > Labels: test-stability > > Since {{JoinRestoreTest}} was introduced in FLINK-33470 it seems to be a > reason > {noformat} > Dec 02 04:42:26 04:42:26.408 [ERROR] Failures: > Dec 02 04:42:26 04:42:26.408 [ERROR] > JoinRestoreTest>RestoreTestBase.testRestore:283 > Dec 02 04:42:26 Expecting actual: > Dec 02 04:42:26 ["+I[9, carol, apple, 9000]", > Dec 02 04:42:26 "+I[8, bill, banana, 8000]", > Dec 02 04:42:26 "+I[6, jerry, pen, 6000]"] > Dec 02 04:42:26 to contain exactly in any order: > Dec 02 04:42:26 ["+I[Adam, null]", > Dec 02 04:42:26 "+I[Baker, Research]", > Dec 02 04:42:26 "+I[Charlie, Human Resources]", > Dec 02 04:42:26 "+I[Charlie, HR]", > Dec 02 04:42:26 "+I[Don, Sales]", > Dec 02 04:42:26 "+I[Victor, null]", > Dec 02 04:42:26 "+I[Helena, Engineering]", > Dec 02 04:42:26 "+I[Juliet, Engineering]", > Dec 02 04:42:26 "+I[Ivana, Research]", > Dec 02 04:42:26 "+I[Charlie, People Operations]"] > {noformat} > examples of failures > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55120=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099 >
[jira] [Commented] (FLINK-32986) The new createTemporaryFunction has some regression of type inference compare to the deprecated registerFunction
[ https://issues.apache.org/jira/browse/FLINK-32986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792406#comment-17792406 ] Jeyhun Karimov commented on FLINK-32986: Hi [~lincoln.86xy] could you please assign this task to me or give me an access to self-assign the task? Thanks! > The new createTemporaryFunction has some regression of type inference compare > to the deprecated registerFunction > > > Key: FLINK-32986 > URL: https://issues.apache.org/jira/browse/FLINK-32986 > Project: Flink > Issue Type: Technical Debt > Components: Table SQL / API >Affects Versions: 1.18.0, 1.17.1 >Reporter: lincoln lee >Priority: Major > Labels: pull-request-available > > Current `LookupJoinITCase#testJoinTemporalTableWithUdfFilter` uses a legacy > form function registration: > {code} > tEnv.registerFunction("add", new TestAddWithOpen) > {code} > it works fine with the SQL call `add(T.id, 2) > 3` but fails when swith to > the new api: > {code} > tEnv.createTemporaryFunction("add", classOf[TestAddWithOpen]) > // or this > tEnv.createTemporaryFunction("add", new TestAddWithOpen) > {code} > exception: > {code} > Caused by: org.apache.flink.table.api.ValidationException: Invalid function > call: > default_catalog.default_database.add(BIGINT, INT NOT NULL) > at > org.apache.flink.table.types.inference.TypeInferenceUtil.createInvalidCallException(TypeInferenceUtil.java:193) > at > org.apache.flink.table.planner.functions.inference.TypeInferenceOperandChecker.checkOperandTypes(TypeInferenceOperandChecker.java:89) > at > org.apache.calcite.sql.SqlOperator.checkOperandTypes(SqlOperator.java:753) > at > org.apache.calcite.sql.SqlOperator.validateOperands(SqlOperator.java:499) > at org.apache.calcite.sql.SqlFunction.deriveType(SqlFunction.java:335) > at org.apache.calcite.sql.SqlFunction.deriveType(SqlFunction.java:231) > at > org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:6302) > at > org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:6287) > at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:161) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.deriveTypeImpl(SqlValidatorImpl.java:1869) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.deriveType(SqlValidatorImpl.java:1860) > at > org.apache.calcite.sql.type.SqlTypeUtil.deriveType(SqlTypeUtil.java:200) > at > org.apache.calcite.sql.type.InferTypes.lambda$static$0(InferTypes.java:47) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.inferUnknownTypes(SqlValidatorImpl.java:2050) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.inferUnknownTypes(SqlValidatorImpl.java:2055) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateWhereOrOn(SqlValidatorImpl.java:4338) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateJoin(SqlValidatorImpl.java:3410) > at > org.apache.flink.table.planner.calcite.FlinkCalciteSqlValidator.validateJoin(FlinkCalciteSqlValidator.java:154) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3282) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3603) > at > org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:64) > at > org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:89) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:1050) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:1025) > at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:248) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:1000) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:749) > at > org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$validate(FlinkPlannerImpl.scala:196) > ... 49 more > Caused by: org.apache.flink.table.api.ValidationException: Invalid input > arguments. Expected signatures are: > default_catalog.default_database.add(a BIGINT NOT NULL, b INT NOT NULL) > default_catalog.default_database.add(a BIGINT NOT NULL, b BIGINT NOT NULL) > at > org.apache.flink.table.types.inference.TypeInferenceUtil.createInvalidInputException(TypeInferenceUtil.java:180) > at >
[jira] [Commented] (FLINK-31481) Support enhanced show databases syntax
[ https://issues.apache.org/jira/browse/FLINK-31481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792404#comment-17792404 ] Jeyhun Karimov commented on FLINK-31481: Hi [~taoran] could you please assign this task to me or give me an access to self-assign the task? Thanks! > Support enhanced show databases syntax > -- > > Key: FLINK-31481 > URL: https://issues.apache.org/jira/browse/FLINK-31481 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API, Table SQL / Planner >Reporter: Ran Tao >Priority: Major > Labels: pull-request-available > > As FLIP discussed. To avoid bloat, this ticket supports ShowDatabases. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [hotfix][table] refactoring to template method to separate concerns [flink]
flinkbot commented on PR #23857: URL: https://github.com/apache/flink/pull/23857#issuecomment-1837186832 ## CI report: * b7f62e249f5ad4280b0a994bf87ef64323d2dee3 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] [hotfix][table] refactoring to template method to separate concerns [flink]
JingGe opened a new pull request, #23857: URL: https://github.com/apache/flink/pull/23857 ## What is the purpose of the change refactoring to template method to separate concerns ## Verifying this change This change is already covered by existing tests, such as TableauStyleTest. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / **no**) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**) - The serializers: (yes / **no** / don't know) - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't know) - The S3 file system connector: (yes / **no** / don't know) ## Documentation - Does this pull request introduce a new feature? (yes / **no**) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-33669][doc] Update the documentation for RestartStrategy, Checkpoint Storage, and State Backend. [flink]
JingGe commented on code in PR #23847: URL: https://github.com/apache/flink/pull/23847#discussion_r1412809280 ## docs/content.zh/docs/deployment/filesystems/gcs.md: ## @@ -44,7 +44,10 @@ env.readTextFile("gs:///"); stream.writeAsText("gs:///"); // Use GCS as checkpoint storage Review Comment: Thanks for the clarification. If we take a look at other case like azure at line 66, oss at line 51, s3 at line 49, they are all translated into Chinese. It is a question of consistency. WDYT? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-33240) Generate docs for deprecated options as well
[ https://issues.apache.org/jira/browse/FLINK-33240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792394#comment-17792394 ] Jing Ge commented on FLINK-33240: - please check FLINK-30862 for some objections. > Generate docs for deprecated options as well > > > Key: FLINK-33240 > URL: https://issues.apache.org/jira/browse/FLINK-33240 > Project: Flink > Issue Type: Improvement > Components: Documentation >Reporter: Zhanghao Chen >Priority: Major > Fix For: 1.19.0 > > > Currently, Flink will skip doc generation for deprecated options (See > {{{}ConfigOptionsDocGenerator#{}}}{{{}shouldBeDocumented{}}}). As a result, > the deprecated options can no longer be found in the new version of Flink > document. This might confuse users upgrading from an older version of Flink > and they have to either carefully read the release notes or check the source > code for upgrading guidance on deprecated options. I suggest generating doc > for deprecated options as well, and we should scan through the code to make > sure that proper upgrading guidance is provided for the deprecated options. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33727) JoinRestoreTest is failing on AZP
[ https://issues.apache.org/jira/browse/FLINK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792391#comment-17792391 ] Sergey Nuyanzin commented on FLINK-33727: - [~dwysakowicz], [~jhughes] could you please have a look here please? > JoinRestoreTest is failing on AZP > - > > Key: FLINK-33727 > URL: https://issues.apache.org/jira/browse/FLINK-33727 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.19.0 >Reporter: Sergey Nuyanzin >Priority: Critical > Labels: test-stability > > Since it was introduced in FLINK-33470 it seems to be a reason > {noformat} > Dec 02 04:42:26 04:42:26.408 [ERROR] Failures: > Dec 02 04:42:26 04:42:26.408 [ERROR] > JoinRestoreTest>RestoreTestBase.testRestore:283 > Dec 02 04:42:26 Expecting actual: > Dec 02 04:42:26 ["+I[9, carol, apple, 9000]", > Dec 02 04:42:26 "+I[8, bill, banana, 8000]", > Dec 02 04:42:26 "+I[6, jerry, pen, 6000]"] > Dec 02 04:42:26 to contain exactly in any order: > Dec 02 04:42:26 ["+I[Adam, null]", > Dec 02 04:42:26 "+I[Baker, Research]", > Dec 02 04:42:26 "+I[Charlie, Human Resources]", > Dec 02 04:42:26 "+I[Charlie, HR]", > Dec 02 04:42:26 "+I[Don, Sales]", > Dec 02 04:42:26 "+I[Victor, null]", > Dec 02 04:42:26 "+I[Helena, Engineering]", > Dec 02 04:42:26 "+I[Juliet, Engineering]", > Dec 02 04:42:26 "+I[Ivana, Research]", > Dec 02 04:42:26 "+I[Charlie, People Operations]"] > {noformat} > examples of failures > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55120=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099 > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55129=logs=a9db68b9-a7e0-54b6-0f98-010e0aff39e2=cdd32e0b-6047-565b-c58f-14054472f1be=11786 > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55136=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099 > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55137=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=11779 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-33727) JoinRestoreTest is failing on AZP
Sergey Nuyanzin created FLINK-33727: --- Summary: JoinRestoreTest is failing on AZP Key: FLINK-33727 URL: https://issues.apache.org/jira/browse/FLINK-33727 Project: Flink Issue Type: Bug Components: Table SQL / Planner Affects Versions: 1.19.0 Reporter: Sergey Nuyanzin Since it was introduced in FLINK-33470 it seems to be a reason {noformat} Dec 02 04:42:26 04:42:26.408 [ERROR] Failures: Dec 02 04:42:26 04:42:26.408 [ERROR] JoinRestoreTest>RestoreTestBase.testRestore:283 Dec 02 04:42:26 Expecting actual: Dec 02 04:42:26 ["+I[9, carol, apple, 9000]", Dec 02 04:42:26 "+I[8, bill, banana, 8000]", Dec 02 04:42:26 "+I[6, jerry, pen, 6000]"] Dec 02 04:42:26 to contain exactly in any order: Dec 02 04:42:26 ["+I[Adam, null]", Dec 02 04:42:26 "+I[Baker, Research]", Dec 02 04:42:26 "+I[Charlie, Human Resources]", Dec 02 04:42:26 "+I[Charlie, HR]", Dec 02 04:42:26 "+I[Don, Sales]", Dec 02 04:42:26 "+I[Victor, null]", Dec 02 04:42:26 "+I[Helena, Engineering]", Dec 02 04:42:26 "+I[Juliet, Engineering]", Dec 02 04:42:26 "+I[Ivana, Research]", Dec 02 04:42:26 "+I[Charlie, People Operations]"] {noformat} examples of failures https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55120=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099 https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55129=logs=a9db68b9-a7e0-54b6-0f98-010e0aff39e2=cdd32e0b-6047-565b-c58f-14054472f1be=11786 https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55136=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099 https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55137=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=11779 -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [FLINK-33470] Implement restore tests for Join node [flink]
snuyanzin commented on PR #23680: URL: https://github.com/apache/flink/pull/23680#issuecomment-1837157958 @dawidwys @jnh5y after merging this ci on master failed 4 times out of 6 e.g. ``` Dec 02 04:42:26 04:42:26.408 [ERROR] Failures: Dec 02 04:42:26 04:42:26.408 [ERROR] JoinRestoreTest>RestoreTestBase.testRestore:283 Dec 02 04:42:26 Expecting actual: Dec 02 04:42:26 ["+I[9, carol, apple, 9000]", Dec 02 04:42:26 "+I[8, bill, banana, 8000]", Dec 02 04:42:26 "+I[6, jerry, pen, 6000]"] Dec 02 04:42:26 to contain exactly in any order: Dec 02 04:42:26 ["+I[Adam, null]", Dec 02 04:42:26 "+I[Baker, Research]", Dec 02 04:42:26 "+I[Charlie, Human Resources]", Dec 02 04:42:26 "+I[Charlie, HR]", Dec 02 04:42:26 "+I[Don, Sales]", Dec 02 04:42:26 "+I[Victor, null]", Dec 02 04:42:26 "+I[Helena, Engineering]", Dec 02 04:42:26 "+I[Juliet, Engineering]", Dec 02 04:42:26 "+I[Ivana, Research]", Dec 02 04:42:26 "+I[Charlie, People Operations]"] ``` https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55120=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099 https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55129=logs=a9db68b9-a7e0-54b6-0f98-010e0aff39e2=cdd32e0b-6047-565b-c58f-14054472f1be=11786 https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55136=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099 https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55137=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=11779 could you please have a look? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-33190) [Umbrella]Externalized Connectors should directly depend on 3rd-party libs instead of shaded repo
[ https://issues.apache.org/jira/browse/FLINK-33190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792386#comment-17792386 ] Jing Ge commented on FLINK-33190: - [~martijnvisser] thanks for the hint! > [Umbrella]Externalized Connectors should directly depend on 3rd-party libs > instead of shaded repo > -- > > Key: FLINK-33190 > URL: https://issues.apache.org/jira/browse/FLINK-33190 > Project: Flink > Issue Type: Improvement > Components: Build System >Affects Versions: 1.18.0 >Reporter: Jing Ge >Priority: Major > > Connectors shouldn't depend on flink-shaded. > The overhead and/or risks of doing/supporting that right now far > outweigh the benefits. > ( Because we either have to encode the full version for all dependencies > into the package, or accept the risk of minor/patch dependency clashes) > Connectors are small enough in scope that depending directly on > guava/jackson/etc. is a fine approach, and they have plenty of other > dependencies that they need to manage anyway; let's treat these the same > way. > > https://lists.apache.org/thread/mtypmprz2b5p20gj064d0wsz3k0ofpco -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (FLINK-33193) JDBC Connector should directly depend on 3rd-party libs instead of flink-shaded repo
[ https://issues.apache.org/jira/browse/FLINK-33193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Ge closed FLINK-33193. --- Resolution: Won't Fix > JDBC Connector should directly depend on 3rd-party libs instead of > flink-shaded repo > > > Key: FLINK-33193 > URL: https://issues.apache.org/jira/browse/FLINK-33193 > Project: Flink > Issue Type: Sub-task > Components: Connectors / JDBC >Affects Versions: 1.18.0 >Reporter: Jing Ge >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (FLINK-33195) ElasticSearch Connector should directly depend on 3rd-party libs instead of flink-shaded repo
[ https://issues.apache.org/jira/browse/FLINK-33195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Ge closed FLINK-33195. --- Resolution: Won't Fix > ElasticSearch Connector should directly depend on 3rd-party libs instead of > flink-shaded repo > - > > Key: FLINK-33195 > URL: https://issues.apache.org/jira/browse/FLINK-33195 > Project: Flink > Issue Type: Sub-task > Components: Connectors / ElasticSearch >Affects Versions: 1.18.0 >Reporter: Jing Ge >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-33698) Fix the backoff time calculation in ExponentialBackoffDelayRetryStrategy
[ https://issues.apache.org/jira/browse/FLINK-33698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-33698: - Description: The backoff time calculation in `ExponentialBackoffDelayRetryStrategy` is problematic due to the lack of a reset when reusing the object. Current Version: {code:java} @Override public long getBackoffTimeMillis(int currentAttempts) { if (currentAttempts <= 1) { // equivalent to initial delay return lastRetryDelay; } long backoff = Math.min((long) (lastRetryDelay * multiplier), maxRetryDelay); this.lastRetryDelay = backoff; return backoff; } {code} Fixed Version: {code:java} @Override public long getBackoffTimeMillis(int currentAttempts) { if (currentAttempts <= 1) { // reset to initialDelay this.lastRetryDelay = initialDelay; return lastRetryDelay; } long backoff = Math.min((long) (lastRetryDelay * multiplier), maxRetryDelay); this.lastRetryDelay = backoff; return backoff; }{code} was: The backoff time calculation in `ExponentialBackoffDelayRetryStrategy` should consider currentAttempts. Current Version: {code:java} @Override public long getBackoffTimeMillis(int currentAttempts) { if (currentAttempts <= 1) { // equivalent to initial delay return lastRetryDelay; } long backoff = Math.min((long) (lastRetryDelay * multiplier), maxRetryDelay); this.lastRetryDelay = backoff; return backoff; } {code} Fixed Version: {code:java} @Override public long getBackoffTimeMillis(int currentAttempts) { if (currentAttempts <= 1) { // reset to initialDelay this.lastRetryDelay = initialDelay; return lastRetryDelay; } long backoff = Math.min((long) (lastRetryDelay * multiplier), maxRetryDelay); this.lastRetryDelay = backoff; return backoff; }{code} > Fix the backoff time calculation in ExponentialBackoffDelayRetryStrategy > > > Key: FLINK-33698 > URL: https://issues.apache.org/jira/browse/FLINK-33698 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Reporter: xiangyu feng >Assignee: xiangyu feng >Priority: Major > Labels: pull-request-available > Fix For: 1.19.0 > > > The backoff time calculation in `ExponentialBackoffDelayRetryStrategy` is > problematic due to the lack of a reset when reusing the object. > > Current Version: > {code:java} > @Override > public long getBackoffTimeMillis(int currentAttempts) { > if (currentAttempts <= 1) { > // equivalent to initial delay > return lastRetryDelay; > } > long backoff = Math.min((long) (lastRetryDelay * multiplier), > maxRetryDelay); > this.lastRetryDelay = backoff; > return backoff; > } {code} > Fixed Version: > {code:java} > @Override > public long getBackoffTimeMillis(int currentAttempts) { > if (currentAttempts <= 1) { > // reset to initialDelay > this.lastRetryDelay = initialDelay; > return lastRetryDelay; > } > long backoff = Math.min((long) (lastRetryDelay * multiplier), > maxRetryDelay); > this.lastRetryDelay = backoff; > return backoff; > }{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-33698) Fix the backoff time calculation in ExponentialBackoffDelayRetryStrategy
[ https://issues.apache.org/jira/browse/FLINK-33698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-33698: - Description: The backoff time calculation in `ExponentialBackoffDelayRetryStrategy` should consider currentAttempts. Current Version: {code:java} @Override public long getBackoffTimeMillis(int currentAttempts) { if (currentAttempts <= 1) { // equivalent to initial delay return lastRetryDelay; } long backoff = Math.min((long) (lastRetryDelay * multiplier), maxRetryDelay); this.lastRetryDelay = backoff; return backoff; } {code} Fixed Version: {code:java} @Override public long getBackoffTimeMillis(int currentAttempts) { if (currentAttempts <= 1) { // reset to initialDelay this.lastRetryDelay = initialDelay; return lastRetryDelay; } long backoff = Math.min((long) (lastRetryDelay * multiplier), maxRetryDelay); this.lastRetryDelay = backoff; return backoff; }{code} was: The backoff time calculation in `ExponentialBackoffDelayRetryStrategy` should consider currentAttempts. Current Version: {code:java} @Override public long getBackoffTimeMillis(int currentAttempts) { if (currentAttempts <= 1) { // equivalent to initial delay return lastRetryDelay; } long backoff = Math.min((long) (lastRetryDelay * multiplier), maxRetryDelay); this.lastRetryDelay = backoff; return backoff; } {code} Fixed Version: {code:java} @Override public long getBackoffTimeMillis(int currentAttempts) { if (currentAttempts <= 1) { // equivalent to initial delay return initialDelay; } long backoff = Math.min( (long) (initialDelay * Math.pow(multiplier, currentAttempts - 1)), maxRetryDelay); return backoff; } {code} > Fix the backoff time calculation in ExponentialBackoffDelayRetryStrategy > > > Key: FLINK-33698 > URL: https://issues.apache.org/jira/browse/FLINK-33698 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Reporter: xiangyu feng >Assignee: xiangyu feng >Priority: Major > Labels: pull-request-available > Fix For: 1.19.0 > > > The backoff time calculation in `ExponentialBackoffDelayRetryStrategy` should > consider currentAttempts. > > Current Version: > {code:java} > @Override > public long getBackoffTimeMillis(int currentAttempts) { > if (currentAttempts <= 1) { > // equivalent to initial delay > return lastRetryDelay; > } > long backoff = Math.min((long) (lastRetryDelay * multiplier), > maxRetryDelay); > this.lastRetryDelay = backoff; > return backoff; > } {code} > Fixed Version: > {code:java} > @Override > public long getBackoffTimeMillis(int currentAttempts) { > if (currentAttempts <= 1) { > // reset to initialDelay > this.lastRetryDelay = initialDelay; > return lastRetryDelay; > } > long backoff = Math.min((long) (lastRetryDelay * multiplier), > maxRetryDelay); > this.lastRetryDelay = backoff; > return backoff; > }{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [FLINK-33710] Prevent triggering cluster upgrades for permutations of the same overrides [flink-kubernetes-operator]
afedulov commented on code in PR #721: URL: https://github.com/apache/flink-kubernetes-operator/pull/721#discussion_r1412799765 ## flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/deployment/AbstractFlinkResourceReconciler.java: ## @@ -180,13 +190,28 @@ public void reconcile(FlinkResourceContext ctx) throws Exception { } } -private void applyAutoscaler(FlinkResourceContext ctx) throws Exception { +private void applyAutoscaler(FlinkResourceContext ctx, @Nullable String existingOverrides) +throws Exception { var autoScalerCtx = ctx.getJobAutoScalerContext(); boolean autoscalerEnabled = ctx.getResource().getSpec().getJob() != null && ctx.getObserveConfig().getBoolean(AUTOSCALER_ENABLED); autoScalerCtx.getConfiguration().set(AUTOSCALER_ENABLED, autoscalerEnabled); + autoscaler.scale(autoScalerCtx); + +// Check that the overrides actually changed and not merely the String representation Review Comment: ```suggestion // Prevents subsequent unneeded spec updates when the `scale` operation only changes the order of the parallelism overrides (required because an unsorted map was used in the past). ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-33684) Improve the retry strategy in CollectResultFetcher
[ https://issues.apache.org/jira/browse/FLINK-33684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-33684: --- Labels: pull-request-available (was: ) > Improve the retry strategy in CollectResultFetcher > -- > > Key: FLINK-33684 > URL: https://issues.apache.org/jira/browse/FLINK-33684 > Project: Flink > Issue Type: Sub-task > Components: API / DataStream >Reporter: xiangyu feng >Priority: Major > Labels: pull-request-available > > Currently CollectResultFetcher use a fixed retry interval. > {code:java} > private void sleepBeforeRetry() { > if (retryMillis <= 0) { > return; > } > try { > // TODO a more proper retry strategy? > Thread.sleep(retryMillis); > } catch (InterruptedException e) { > LOG.warn("Interrupted when sleeping before a retry", e); > } > } {code} > This can be improved with a RetryStrategy. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [FLINK-33669][doc] Update the documentation for RestartStrategy, Checkpoint Storage, and State Backend. [flink]
1996fanrui commented on code in PR #23847: URL: https://github.com/apache/flink/pull/23847#discussion_r1412799696 ## docs/content/docs/ops/state/task_failure_recovery.md: ## @@ -117,11 +116,11 @@ The fixed delay restart strategy can also be set programmatically: {{< tabs "73f5d009-b9af-4bfe-be22-d1c4659fd1ec" >}} {{< tab "Java" >}} ```java -StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); -env.setRestartStrategy(RestartStrategies.fixedDelayRestart( - 3, // number of restart attempts - Time.of(10, TimeUnit.SECONDS) // delay -)); +Configuration config = new Configuration(); +config.set(RestartStrategyOptions.RESTART_STRATEGY, "fixed-delay"); +config.set(RestartStrategyOptions.RESTART_STRATEGY_FIXED_DELAY_ATTEMPTS, 3); // number of restart attempts +config.set(RestartStrategyOptions.RESTART_STRATEGY_FIXED_DELAY_DELAY, Duration.ofSeconds(10)); // delay +StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(config); ``` Review Comment: There is already a yaml demo above, maybe we don't need a demo for programmatic set? Not sure, looking forward to your feedback. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-33684] Use IncrementalDelayRetryStrategy in CollectResultFetcher [flink]
flinkbot commented on PR #23856: URL: https://github.com/apache/flink/pull/23856#issuecomment-1837148341 ## CI report: * 7a0b6a4637b823329768e45f0f156051dfa05356 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] Flink 33684 [flink]
xiangyuf opened a new pull request, #23856: URL: https://github.com/apache/flink/pull/23856 ## What is the purpose of the change Currently `CollectResultFetcher` uses a fixed retry interval. This can be improved with `IncrementalDelayRetryStrategy`. `IncrementalDelayRetryStrategy` is a balance between job E2E latency and request count. ## Brief change log *(for example:)* - Use `IncrementalDelayRetryStrategy` in `CollectResultFetcher` to decide the retry interval between requests. ## Verifying this change - *Added integration tests `CollectResultFetcherRetryITCase` to test the `CollectResultFetcher` retry behavior ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): no - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no - The serializers: no - The runtime per-record code paths (performance sensitive): no - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no - The S3 file system connector: no ## Documentation - Does this pull request introduce a new feature? no -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-33669][doc] Update the documentation for RestartStrategy, Checkpoint Storage, and State Backend. [flink]
1996fanrui commented on code in PR #23847: URL: https://github.com/apache/flink/pull/23847#discussion_r1412797465 ## docs/content.zh/docs/ops/state/task_failure_recovery.md: ## @@ -49,20 +49,18 @@ Flink 作业如果没有定义重启策略,则会遵循集群启动时加载 {{< generated/restart_strategy_configuration >}} 除了定义默认的重启策略以外,还可以为每个 Flink 作业单独定义重启策略。 -这个重启策略通过在程序中的 `StreamExecutionEnvironment` 对象上调用 `setRestartStrategy` 方法来设置。 -当然,对于 `StreamExecutionEnvironment` 也同样适用。 下例展示了如何给我们的作业设置固定延时重启策略。 如果发生故障,系统会重启作业 3 次,每两次连续的重启尝试之间等待 10 秒钟。 {{< tabs "2b011473-9a34-4e7b-943b-be4a9071fe3c" >}} {{< tab "Java" >}} Review Comment: Hey @JunRuiLee , I'm thinking should we still have 3 parts(Java, Scala, Python) here? We recommend users use the options instead of api for all languages, it includes Scala and Python. I think only keeping options is enough here even if we didn't depreate the api for scala and python. WDYT? Note: if you agree it, the comments should take effect for all doc related to this FLIP. ## docs/content.zh/docs/ops/state/task_failure_recovery.md: ## @@ -49,20 +49,18 @@ Flink 作业如果没有定义重启策略,则会遵循集群启动时加载 {{< generated/restart_strategy_configuration >}} 除了定义默认的重启策略以外,还可以为每个 Flink 作业单独定义重启策略。 -这个重启策略通过在程序中的 `StreamExecutionEnvironment` 对象上调用 `setRestartStrategy` 方法来设置。 -当然,对于 `StreamExecutionEnvironment` 也同样适用。 下例展示了如何给我们的作业设置固定延时重启策略。 如果发生故障,系统会重启作业 3 次,每两次连续的重启尝试之间等待 10 秒钟。 {{< tabs "2b011473-9a34-4e7b-943b-be4a9071fe3c" >}} {{< tab "Java" >}} Review Comment: Hey @JunRuiLee , I'm thinking should we still have 3 parts(Java, Scala, Python) here? We recommend users use the options instead of api for all languages, it includes Scala and Python. I think only keeping options is enough here even if we didn't depreate the api for scala and python. WDYT? Note: if you agree it, this comment should take effect for all doc related to this FLIP. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-33536] Fix Flink Table API CSV streaming sink fails with IOException: Stream closed [flink]
Samrat002 commented on PR #23725: URL: https://github.com/apache/flink/pull/23725#issuecomment-1837144127 @PrabhuJoseph Please review whenever time -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-16175][table sql/ api][WIP]Add config option to switch case sensitive for column names in SQL [flink]
leonardBang closed pull request #11535: [FLINK-16175][table sql/ api][WIP]Add config option to switch case sensitive for column names in SQL URL: https://github.com/apache/flink/pull/11535 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Resolved] (FLINK-33625) FLIP-390: Support System out and err to be redirected to LOG or discarded
[ https://issues.apache.org/jira/browse/FLINK-33625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Fan resolved FLINK-33625. - Resolution: Fixed > FLIP-390: Support System out and err to be redirected to LOG or discarded > - > > Key: FLINK-33625 > URL: https://issues.apache.org/jira/browse/FLINK-33625 > Project: Flink > Issue Type: New Feature > Components: Runtime / Configuration >Reporter: Rui Fan >Assignee: Rui Fan >Priority: Major > Labels: pull-request-available > Fix For: 1.19.0 > > > Get more from https://cwiki.apache.org/confluence/x/4guZE -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33625) FLIP-390: Support System out and err to be redirected to LOG or discarded
[ https://issues.apache.org/jira/browse/FLINK-33625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792364#comment-17792364 ] Rui Fan commented on FLINK-33625: - Merged master<1.19.0> via: 186ed0eb0449a7bf3c216067798dc47a0c5b36a6 > FLIP-390: Support System out and err to be redirected to LOG or discarded > - > > Key: FLINK-33625 > URL: https://issues.apache.org/jira/browse/FLINK-33625 > Project: Flink > Issue Type: New Feature > Components: Runtime / Configuration >Reporter: Rui Fan >Assignee: Rui Fan >Priority: Major > Labels: pull-request-available > Fix For: 1.19.0 > > > Get more from https://cwiki.apache.org/confluence/x/4guZE -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [FLINK-33625][runtime] Support System out and err to be redirected to LOG or discarded [flink]
1996fanrui merged PR #23800: URL: https://github.com/apache/flink/pull/23800 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-33584][Filesystems] Update Hadoop Filesystem dependencies to 3.3.6 [flink]
MartijnVisser commented on PR #23844: URL: https://github.com/apache/flink/pull/23844#issuecomment-1837112592 I'll run the S3 tests locally before merging it! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-33726) Print cost time for stream queries in SQL Client
[ https://issues.apache.org/jira/browse/FLINK-33726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Ge updated FLINK-33726: Description: The time cost information is expected when executing stream queries in SQL CLI. For example: {code:java} Flink SQL> select * from (values ('abc', 123)); +++ | EXPR$0 | EXPR$1 | +++ |abc |123 | +++ Received a total of 1 rows (0.22 seconds){code} > Print cost time for stream queries in SQL Client > > > Key: FLINK-33726 > URL: https://issues.apache.org/jira/browse/FLINK-33726 > Project: Flink > Issue Type: Improvement > Components: Table SQL / Client >Reporter: Jing Ge >Assignee: Jing Ge >Priority: Major > > The time cost information is expected when executing stream queries in SQL > CLI. > For example: > {code:java} > Flink SQL> select * from (values ('abc', 123)); > +++ > | EXPR$0 | EXPR$1 | > +++ > |abc |123 | > +++ > Received a total of 1 rows (0.22 seconds){code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-33726) Print cost time for stream queries in SQL Client
Jing Ge created FLINK-33726: --- Summary: Print cost time for stream queries in SQL Client Key: FLINK-33726 URL: https://issues.apache.org/jira/browse/FLINK-33726 Project: Flink Issue Type: Improvement Components: Table SQL / Client Reporter: Jing Ge Assignee: Jing Ge -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33600) Print cost time for batch queries in SQL Client
[ https://issues.apache.org/jira/browse/FLINK-33600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792356#comment-17792356 ] Jing Ge commented on FLINK-33600: - master: c20c13fb5cb78eff2cbd08ea48f1cb7cf9a1981c > Print cost time for batch queries in SQL Client > --- > > Key: FLINK-33600 > URL: https://issues.apache.org/jira/browse/FLINK-33600 > Project: Flink > Issue Type: New Feature > Components: Table SQL / Client >Reporter: Jark Wu >Assignee: Jing Ge >Priority: Major > Labels: pull-request-available > Fix For: 1.19.0 > > > Currently, there is no cost time information when executing batch queries in > SQL CLI. But this is very helpful in OLAP/ad-hoc scenarios. > For example: > {code} > Flink SQL> select * from (values ('abc', 123)); > +++ > | EXPR$0 | EXPR$1 | > +++ > |abc |123 | > +++ > 1 row in set (0.22 seconds) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (FLINK-33600) Print cost time for batch queries in SQL Client
[ https://issues.apache.org/jira/browse/FLINK-33600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Ge resolved FLINK-33600. - Resolution: Fixed > Print cost time for batch queries in SQL Client > --- > > Key: FLINK-33600 > URL: https://issues.apache.org/jira/browse/FLINK-33600 > Project: Flink > Issue Type: New Feature > Components: Table SQL / Client >Reporter: Jark Wu >Assignee: Jing Ge >Priority: Major > Labels: pull-request-available > Fix For: 1.19.0 > > > Currently, there is no cost time information when executing batch queries in > SQL CLI. But this is very helpful in OLAP/ad-hoc scenarios. > For example: > {code} > Flink SQL> select * from (values ('abc', 123)); > +++ > | EXPR$0 | EXPR$1 | > +++ > |abc |123 | > +++ > 1 row in set (0.22 seconds) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] FLINK-33600][table] print the query time cost for batch query in cli [flink]
JingGe merged PR #23809: URL: https://github.com/apache/flink/pull/23809 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-33534) PipelineOptions.PARALLELISM_OVERRIDES is not picked up from jar submission request
[ https://issues.apache.org/jira/browse/FLINK-33534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792351#comment-17792351 ] Gyula Fora commented on FLINK-33534: I assigned it to you! I haven’t tried to repro this outside the operator > PipelineOptions.PARALLELISM_OVERRIDES is not picked up from jar submission > request > -- > > Key: FLINK-33534 > URL: https://issues.apache.org/jira/browse/FLINK-33534 > Project: Flink > Issue Type: Bug > Components: Runtime / Configuration, Runtime / REST >Affects Versions: 1.18.0, 1.17.1 >Reporter: Gyula Fora >Assignee: Yunfeng Zhou >Priority: Major > > PARALLELISM_OVERRIDES are currently only applied when they are part of the > JobManager / Cluster configuration. > When this config is provided as part of the JarRunRequestBody it is > completely ignored and does not take effect. > The main reason is that the dispatcher reads this value from it's own > configuration object and does not include the extra configs passed through > the rest request. > This is a blocker for supporting the autoscaler properly for FlinkSessionJobs > in the autoscaler -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (FLINK-33534) PipelineOptions.PARALLELISM_OVERRIDES is not picked up from jar submission request
[ https://issues.apache.org/jira/browse/FLINK-33534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyula Fora reassigned FLINK-33534: -- Assignee: Yunfeng Zhou > PipelineOptions.PARALLELISM_OVERRIDES is not picked up from jar submission > request > -- > > Key: FLINK-33534 > URL: https://issues.apache.org/jira/browse/FLINK-33534 > Project: Flink > Issue Type: Bug > Components: Runtime / Configuration, Runtime / REST >Affects Versions: 1.18.0, 1.17.1 >Reporter: Gyula Fora >Assignee: Yunfeng Zhou >Priority: Major > > PARALLELISM_OVERRIDES are currently only applied when they are part of the > JobManager / Cluster configuration. > When this config is provided as part of the JarRunRequestBody it is > completely ignored and does not take effect. > The main reason is that the dispatcher reads this value from it's own > configuration object and does not include the extra configs passed through > the rest request. > This is a blocker for supporting the autoscaler properly for FlinkSessionJobs > in the autoscaler -- This message was sent by Atlassian Jira (v8.20.10#820010)