date:20231202

Re: [PR] [FLINK-27529] Fix Intger Comparison For Source Index in Hybrid Source [flink]

2023-12-02 Thread via GitHub



varun1729DD commented on PR #23703:
URL: https://github.com/apache/flink/pull/23703#issuecomment-1837396406

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Re: [PR] [FLINK-33210][core] Introduce job status changed listener for lineage [flink]

2023-12-02 Thread via GitHub



JingGe commented on PR #23695:
URL: https://github.com/apache/flink/pull/23695#issuecomment-1837390803

   @FangYongs are you still working on this PR? FYI: Flink doc needs to be 
generated again after new configOption is introduced.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Commented] (FLINK-22014) Flink JobManager failed to restart after failure in kubernetes HA setup

2023-12-02 Thread Yuexin Chen (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-22014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792489#comment-17792489
 ] 

Yuexin Chen commented on FLINK-22014:
-

[~mason6345] Is it caused by this issue FLINK-33011? we use 
flink-kubernetes-operator 1.6 for flink tasks deployment.

> Flink JobManager failed to restart after failure in kubernetes HA setup
> ---
>
> Key: FLINK-22014
> URL: https://issues.apache.org/jira/browse/FLINK-22014
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.11.3, 1.12.2, 1.13.0
>Reporter: Mikalai Lushchytski
>Priority: Major
>  Labels: k8s-ha, pull-request-available
> Attachments: flink-logs.txt.zip, image-2021-04-19-11-17-58-215.png, 
> scalyr-logs (1).txt
>
>
> After the JobManager pod failed and the new one started, it was not able to 
> recover jobs due to the absence of recovery data in storage - config map 
> pointed at not existing file.
>   
>  Due to this the JobManager pod entered into the `CrashLoopBackOff`state and 
> was not able to recover - each attempt failed with the same error so the 
> whole cluster became unrecoverable and not operating.
>   
>  I had to manually delete the config map and start the jobs again without the 
> save point.
>   
>  If I tried to emulate the failure further by deleting job manager pod 
> manually, the new pod every time recovered well and issue was not 
> reproducible anymore artificially.
>   
>  Below is the failure log:
> {code:java}
> 2021-03-26 08:22:57,925 INFO 
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - 
> Starting the SlotManager.
>  2021-03-26 08:22:57,928 INFO 
> org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - 
> Starting DefaultLeaderRetrievalService with KubernetesLeaderRetrievalDriver
> {configMapName='stellar-flink-cluster-dispatcher-leader'}.
>  2021-03-26 08:22:57,931 INFO 
> org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieved job 
> ids [198c46bac791e73ebcc565a550fa4ff6, 344f5ebc1b5c3a566b4b2837813e4940, 
> 96c4603a0822d10884f7fe536703d811, d9ded24224aab7c7041420b3efc1b6ba] from 
> KubernetesStateHandleStore{configMapName='stellar-flink-cluster-dispatcher-leader'}
> 2021-03-26 08:22:57,933 INFO 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] 
> - Trying to recover job with job id 198c46bac791e73ebcc565a550fa4ff6.
>  2021-03-26 08:22:58,029 INFO 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] 
> - Stopping SessionDispatcherLeaderProcess.
>  2021-03-26 08:28:22,677 INFO 
> org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Stopping 
> DefaultJobGraphStore. 2021-03-26 08:28:22,681 ERROR 
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error 
> occurred in the cluster entrypoint. java.util.concurrent.CompletionException: 
> org.apache.flink.util.FlinkRuntimeException: Could not recover job with job 
> id 198c46bac791e73ebcc565a550fa4ff6.
>at java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source) 
> ~[?:?]
>at java.util.concurrent.CompletableFuture.completeThrowable(Unknown 
> Source) [?:?]
>at java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source) 
> [?:?]
>at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
>at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
>at java.lang.Thread.run(Unknown Source) [?:?] Caused by: 
> org.apache.flink.util.FlinkRuntimeException: Could not recover job with job 
> id 198c46bac791e73ebcc565a550fa4ff6.
>at 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJob(SessionDispatcherLeaderProcess.java:144
>  undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>at 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobs(SessionDispatcherLeaderProcess.java:122
>  undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>at 
> org.apache.flink.runtime.dispatcher.runner.AbstractDispatcherLeaderProcess.supplyUnsynchronizedIfRunning(AbstractDispatcherLeaderProcess.java:198
>  undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>at 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobsIfRunning(SessionDispatcherLeaderProcess.java:113
>  undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] ... 4 more 
> Caused by: org.apache.flink.util.FlinkException: Could not retrieve submitted 
> JobGraph from state handle under jobGraph-198c46bac791e73ebcc565a550fa4ff6. 
> This indicates that the retrieved state handle is broken. Try cleaning the 
> state handle store.
>at 
>

[jira] (FLINK-22014) Flink JobManager failed to restart after failure in kubernetes HA setup

2023-12-02 Thread Yuexin Chen (Jira)



[ https://issues.apache.org/jira/browse/FLINK-22014 ]


Yuexin Chen deleted comment on FLINK-22014:
-

was (Author: JIRAUSER295713):
[~mason6345] Is it caused by this issue 
issues.apache.org/jira/browse/FLINK-33011, we use flink-kubernetes-operator 1.6 
for flink tasks deployment.

> Flink JobManager failed to restart after failure in kubernetes HA setup
> ---
>
> Key: FLINK-22014
> URL: https://issues.apache.org/jira/browse/FLINK-22014
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.11.3, 1.12.2, 1.13.0
>Reporter: Mikalai Lushchytski
>Priority: Major
>  Labels: k8s-ha, pull-request-available
> Attachments: flink-logs.txt.zip, image-2021-04-19-11-17-58-215.png, 
> scalyr-logs (1).txt
>
>
> After the JobManager pod failed and the new one started, it was not able to 
> recover jobs due to the absence of recovery data in storage - config map 
> pointed at not existing file.
>   
>  Due to this the JobManager pod entered into the `CrashLoopBackOff`state and 
> was not able to recover - each attempt failed with the same error so the 
> whole cluster became unrecoverable and not operating.
>   
>  I had to manually delete the config map and start the jobs again without the 
> save point.
>   
>  If I tried to emulate the failure further by deleting job manager pod 
> manually, the new pod every time recovered well and issue was not 
> reproducible anymore artificially.
>   
>  Below is the failure log:
> {code:java}
> 2021-03-26 08:22:57,925 INFO 
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - 
> Starting the SlotManager.
>  2021-03-26 08:22:57,928 INFO 
> org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - 
> Starting DefaultLeaderRetrievalService with KubernetesLeaderRetrievalDriver
> {configMapName='stellar-flink-cluster-dispatcher-leader'}.
>  2021-03-26 08:22:57,931 INFO 
> org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieved job 
> ids [198c46bac791e73ebcc565a550fa4ff6, 344f5ebc1b5c3a566b4b2837813e4940, 
> 96c4603a0822d10884f7fe536703d811, d9ded24224aab7c7041420b3efc1b6ba] from 
> KubernetesStateHandleStore{configMapName='stellar-flink-cluster-dispatcher-leader'}
> 2021-03-26 08:22:57,933 INFO 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] 
> - Trying to recover job with job id 198c46bac791e73ebcc565a550fa4ff6.
>  2021-03-26 08:22:58,029 INFO 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] 
> - Stopping SessionDispatcherLeaderProcess.
>  2021-03-26 08:28:22,677 INFO 
> org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Stopping 
> DefaultJobGraphStore. 2021-03-26 08:28:22,681 ERROR 
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error 
> occurred in the cluster entrypoint. java.util.concurrent.CompletionException: 
> org.apache.flink.util.FlinkRuntimeException: Could not recover job with job 
> id 198c46bac791e73ebcc565a550fa4ff6.
>at java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source) 
> ~[?:?]
>at java.util.concurrent.CompletableFuture.completeThrowable(Unknown 
> Source) [?:?]
>at java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source) 
> [?:?]
>at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
>at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
>at java.lang.Thread.run(Unknown Source) [?:?] Caused by: 
> org.apache.flink.util.FlinkRuntimeException: Could not recover job with job 
> id 198c46bac791e73ebcc565a550fa4ff6.
>at 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJob(SessionDispatcherLeaderProcess.java:144
>  undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>at 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobs(SessionDispatcherLeaderProcess.java:122
>  undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>at 
> org.apache.flink.runtime.dispatcher.runner.AbstractDispatcherLeaderProcess.supplyUnsynchronizedIfRunning(AbstractDispatcherLeaderProcess.java:198
>  undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>at 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobsIfRunning(SessionDispatcherLeaderProcess.java:113
>  undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] ... 4 more 
> Caused by: org.apache.flink.util.FlinkException: Could not retrieve submitted 
> JobGraph from state handle under jobGraph-198c46bac791e73ebcc565a550fa4ff6. 
> This indicates that the retrieved state handle is broken. Try cleaning the 
> state handle store.
>at 
> org.apache.flink.runtime.jobmanager.DefaultJobGraphStore.recoverJobGraph(DefaultJobGraphStore.java:171

[jira] [Commented] (FLINK-22014) Flink JobManager failed to restart after failure in kubernetes HA setup

2023-12-02 Thread Yuexin Chen (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-22014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792488#comment-17792488
 ] 

Yuexin Chen commented on FLINK-22014:
-

[~mason6345] Is it caused by this issue 
issues.apache.org/jira/browse/FLINK-33011, we use flink-kubernetes-operator 1.6 
for flink tasks deployment.

> Flink JobManager failed to restart after failure in kubernetes HA setup
> ---
>
> Key: FLINK-22014
> URL: https://issues.apache.org/jira/browse/FLINK-22014
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.11.3, 1.12.2, 1.13.0
>Reporter: Mikalai Lushchytski
>Priority: Major
>  Labels: k8s-ha, pull-request-available
> Attachments: flink-logs.txt.zip, image-2021-04-19-11-17-58-215.png, 
> scalyr-logs (1).txt
>
>
> After the JobManager pod failed and the new one started, it was not able to 
> recover jobs due to the absence of recovery data in storage - config map 
> pointed at not existing file.
>   
>  Due to this the JobManager pod entered into the `CrashLoopBackOff`state and 
> was not able to recover - each attempt failed with the same error so the 
> whole cluster became unrecoverable and not operating.
>   
>  I had to manually delete the config map and start the jobs again without the 
> save point.
>   
>  If I tried to emulate the failure further by deleting job manager pod 
> manually, the new pod every time recovered well and issue was not 
> reproducible anymore artificially.
>   
>  Below is the failure log:
> {code:java}
> 2021-03-26 08:22:57,925 INFO 
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - 
> Starting the SlotManager.
>  2021-03-26 08:22:57,928 INFO 
> org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - 
> Starting DefaultLeaderRetrievalService with KubernetesLeaderRetrievalDriver
> {configMapName='stellar-flink-cluster-dispatcher-leader'}.
>  2021-03-26 08:22:57,931 INFO 
> org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieved job 
> ids [198c46bac791e73ebcc565a550fa4ff6, 344f5ebc1b5c3a566b4b2837813e4940, 
> 96c4603a0822d10884f7fe536703d811, d9ded24224aab7c7041420b3efc1b6ba] from 
> KubernetesStateHandleStore{configMapName='stellar-flink-cluster-dispatcher-leader'}
> 2021-03-26 08:22:57,933 INFO 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] 
> - Trying to recover job with job id 198c46bac791e73ebcc565a550fa4ff6.
>  2021-03-26 08:22:58,029 INFO 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] 
> - Stopping SessionDispatcherLeaderProcess.
>  2021-03-26 08:28:22,677 INFO 
> org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Stopping 
> DefaultJobGraphStore. 2021-03-26 08:28:22,681 ERROR 
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error 
> occurred in the cluster entrypoint. java.util.concurrent.CompletionException: 
> org.apache.flink.util.FlinkRuntimeException: Could not recover job with job 
> id 198c46bac791e73ebcc565a550fa4ff6.
>at java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source) 
> ~[?:?]
>at java.util.concurrent.CompletableFuture.completeThrowable(Unknown 
> Source) [?:?]
>at java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source) 
> [?:?]
>at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
>at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
>at java.lang.Thread.run(Unknown Source) [?:?] Caused by: 
> org.apache.flink.util.FlinkRuntimeException: Could not recover job with job 
> id 198c46bac791e73ebcc565a550fa4ff6.
>at 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJob(SessionDispatcherLeaderProcess.java:144
>  undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>at 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobs(SessionDispatcherLeaderProcess.java:122
>  undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>at 
> org.apache.flink.runtime.dispatcher.runner.AbstractDispatcherLeaderProcess.supplyUnsynchronizedIfRunning(AbstractDispatcherLeaderProcess.java:198
>  undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>at 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobsIfRunning(SessionDispatcherLeaderProcess.java:113
>  undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] ... 4 more 
> Caused by: org.apache.flink.util.FlinkException: Could not retrieve submitted 
> JobGraph from state handle under jobGraph-198c46bac791e73ebcc565a550fa4ff6. 
> This indicates that the retrieved state handle is broken. Try cleaning the 
> state handle store.
>at 
>

[jira] (FLINK-22014) Flink JobManager failed to restart after failure in kubernetes HA setup

2023-12-02 Thread Yuexin Chen (Jira)



[ https://issues.apache.org/jira/browse/FLINK-22014 ]


Yuexin Chen deleted comment on FLINK-22014:
-

was (Author: JIRAUSER295713):
[~mason6345] [~novakov.alex] have you found the issue of the problem?

> Flink JobManager failed to restart after failure in kubernetes HA setup
> ---
>
> Key: FLINK-22014
> URL: https://issues.apache.org/jira/browse/FLINK-22014
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.11.3, 1.12.2, 1.13.0
>Reporter: Mikalai Lushchytski
>Priority: Major
>  Labels: k8s-ha, pull-request-available
> Attachments: flink-logs.txt.zip, image-2021-04-19-11-17-58-215.png, 
> scalyr-logs (1).txt
>
>
> After the JobManager pod failed and the new one started, it was not able to 
> recover jobs due to the absence of recovery data in storage - config map 
> pointed at not existing file.
>   
>  Due to this the JobManager pod entered into the `CrashLoopBackOff`state and 
> was not able to recover - each attempt failed with the same error so the 
> whole cluster became unrecoverable and not operating.
>   
>  I had to manually delete the config map and start the jobs again without the 
> save point.
>   
>  If I tried to emulate the failure further by deleting job manager pod 
> manually, the new pod every time recovered well and issue was not 
> reproducible anymore artificially.
>   
>  Below is the failure log:
> {code:java}
> 2021-03-26 08:22:57,925 INFO 
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - 
> Starting the SlotManager.
>  2021-03-26 08:22:57,928 INFO 
> org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - 
> Starting DefaultLeaderRetrievalService with KubernetesLeaderRetrievalDriver
> {configMapName='stellar-flink-cluster-dispatcher-leader'}.
>  2021-03-26 08:22:57,931 INFO 
> org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieved job 
> ids [198c46bac791e73ebcc565a550fa4ff6, 344f5ebc1b5c3a566b4b2837813e4940, 
> 96c4603a0822d10884f7fe536703d811, d9ded24224aab7c7041420b3efc1b6ba] from 
> KubernetesStateHandleStore{configMapName='stellar-flink-cluster-dispatcher-leader'}
> 2021-03-26 08:22:57,933 INFO 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] 
> - Trying to recover job with job id 198c46bac791e73ebcc565a550fa4ff6.
>  2021-03-26 08:22:58,029 INFO 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] 
> - Stopping SessionDispatcherLeaderProcess.
>  2021-03-26 08:28:22,677 INFO 
> org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Stopping 
> DefaultJobGraphStore. 2021-03-26 08:28:22,681 ERROR 
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error 
> occurred in the cluster entrypoint. java.util.concurrent.CompletionException: 
> org.apache.flink.util.FlinkRuntimeException: Could not recover job with job 
> id 198c46bac791e73ebcc565a550fa4ff6.
>at java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source) 
> ~[?:?]
>at java.util.concurrent.CompletableFuture.completeThrowable(Unknown 
> Source) [?:?]
>at java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source) 
> [?:?]
>at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
>at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
>at java.lang.Thread.run(Unknown Source) [?:?] Caused by: 
> org.apache.flink.util.FlinkRuntimeException: Could not recover job with job 
> id 198c46bac791e73ebcc565a550fa4ff6.
>at 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJob(SessionDispatcherLeaderProcess.java:144
>  undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>at 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobs(SessionDispatcherLeaderProcess.java:122
>  undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>at 
> org.apache.flink.runtime.dispatcher.runner.AbstractDispatcherLeaderProcess.supplyUnsynchronizedIfRunning(AbstractDispatcherLeaderProcess.java:198
>  undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>at 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobsIfRunning(SessionDispatcherLeaderProcess.java:113
>  undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] ... 4 more 
> Caused by: org.apache.flink.util.FlinkException: Could not retrieve submitted 
> JobGraph from state handle under jobGraph-198c46bac791e73ebcc565a550fa4ff6. 
> This indicates that the retrieved state handle is broken. Try cleaning the 
> state handle store.
>at 
> org.apache.flink.runtime.jobmanager.DefaultJobGraphStore.recoverJobGraph(DefaultJobGraphStore.java:171
>  undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>at 
>

Re: [PR] [FLINK-33727][table] Use different sink names for restore tests [flink]

2023-12-02 Thread via GitHub



JingGe commented on code in PR #23858:
URL: https://github.com/apache/flink/pull/23858#discussion_r1413009264


##
flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/plan/nodes/exec/testutils/JoinTestPrograms.java:
##
@@ -142,7 +142,7 @@ public class JoinTestPrograms {
 .setupTableSource(SOURCE_T1)
 .setupTableSource(SOURCE_T2)
 .setupTableSink(
-SinkTestStep.newBuilder("MySink")
+
SinkTestStep.newBuilder("NON_WINDOW_INNER_JOIN_WITH_NULL_Sink")

Review Comment:
   It is a great idea to use data warehouse like naming convention to improve 
the readability. It would be even better to follow the naming convention that 
table names commonly use lower case letters.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Commented] (FLINK-22014) Flink JobManager failed to restart after failure in kubernetes HA setup

2023-12-02 Thread chenyuexin (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-22014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792483#comment-17792483
 ] 

chenyuexin commented on FLINK-22014:


[~mason6345] [~novakov.alex] have you found the issue of the problem?

> Flink JobManager failed to restart after failure in kubernetes HA setup
> ---
>
> Key: FLINK-22014
> URL: https://issues.apache.org/jira/browse/FLINK-22014
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.11.3, 1.12.2, 1.13.0
>Reporter: Mikalai Lushchytski
>Priority: Major
>  Labels: k8s-ha, pull-request-available
> Attachments: flink-logs.txt.zip, image-2021-04-19-11-17-58-215.png, 
> scalyr-logs (1).txt
>
>
> After the JobManager pod failed and the new one started, it was not able to 
> recover jobs due to the absence of recovery data in storage - config map 
> pointed at not existing file.
>   
>  Due to this the JobManager pod entered into the `CrashLoopBackOff`state and 
> was not able to recover - each attempt failed with the same error so the 
> whole cluster became unrecoverable and not operating.
>   
>  I had to manually delete the config map and start the jobs again without the 
> save point.
>   
>  If I tried to emulate the failure further by deleting job manager pod 
> manually, the new pod every time recovered well and issue was not 
> reproducible anymore artificially.
>   
>  Below is the failure log:
> {code:java}
> 2021-03-26 08:22:57,925 INFO 
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - 
> Starting the SlotManager.
>  2021-03-26 08:22:57,928 INFO 
> org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - 
> Starting DefaultLeaderRetrievalService with KubernetesLeaderRetrievalDriver
> {configMapName='stellar-flink-cluster-dispatcher-leader'}.
>  2021-03-26 08:22:57,931 INFO 
> org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieved job 
> ids [198c46bac791e73ebcc565a550fa4ff6, 344f5ebc1b5c3a566b4b2837813e4940, 
> 96c4603a0822d10884f7fe536703d811, d9ded24224aab7c7041420b3efc1b6ba] from 
> KubernetesStateHandleStore{configMapName='stellar-flink-cluster-dispatcher-leader'}
> 2021-03-26 08:22:57,933 INFO 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] 
> - Trying to recover job with job id 198c46bac791e73ebcc565a550fa4ff6.
>  2021-03-26 08:22:58,029 INFO 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] 
> - Stopping SessionDispatcherLeaderProcess.
>  2021-03-26 08:28:22,677 INFO 
> org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Stopping 
> DefaultJobGraphStore. 2021-03-26 08:28:22,681 ERROR 
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error 
> occurred in the cluster entrypoint. java.util.concurrent.CompletionException: 
> org.apache.flink.util.FlinkRuntimeException: Could not recover job with job 
> id 198c46bac791e73ebcc565a550fa4ff6.
>at java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source) 
> ~[?:?]
>at java.util.concurrent.CompletableFuture.completeThrowable(Unknown 
> Source) [?:?]
>at java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source) 
> [?:?]
>at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
>at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
>at java.lang.Thread.run(Unknown Source) [?:?] Caused by: 
> org.apache.flink.util.FlinkRuntimeException: Could not recover job with job 
> id 198c46bac791e73ebcc565a550fa4ff6.
>at 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJob(SessionDispatcherLeaderProcess.java:144
>  undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>at 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobs(SessionDispatcherLeaderProcess.java:122
>  undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>at 
> org.apache.flink.runtime.dispatcher.runner.AbstractDispatcherLeaderProcess.supplyUnsynchronizedIfRunning(AbstractDispatcherLeaderProcess.java:198
>  undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>at 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobsIfRunning(SessionDispatcherLeaderProcess.java:113
>  undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] ... 4 more 
> Caused by: org.apache.flink.util.FlinkException: Could not retrieve submitted 
> JobGraph from state handle under jobGraph-198c46bac791e73ebcc565a550fa4ff6. 
> This indicates that the retrieved state handle is broken. Try cleaning the 
> state handle store.
>at 
> org.apache.flink.runtime.jobmanager.DefaultJobGraphStore.recoverJobGraph(DefaultJobGraphStore.java:171
>  undefined)

[jira] [Commented] (FLINK-22014) Flink JobManager failed to restart after failure in kubernetes HA setup

2023-12-02 Thread chenyuexin (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-22014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792482#comment-17792482
 ] 

chenyuexin commented on FLINK-22014:


The same issue in my case with flink-1.15.4, we found that our running flink 
tasks some have ’submittedJobGraph’ file and ‘job-result-store‘ folder, 
some tasks all don’t exist, we use S3 storage and K8S HA

> Flink JobManager failed to restart after failure in kubernetes HA setup
> ---
>
> Key: FLINK-22014
> URL: https://issues.apache.org/jira/browse/FLINK-22014
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.11.3, 1.12.2, 1.13.0
>Reporter: Mikalai Lushchytski
>Priority: Major
>  Labels: k8s-ha, pull-request-available
> Attachments: flink-logs.txt.zip, image-2021-04-19-11-17-58-215.png, 
> scalyr-logs (1).txt
>
>
> After the JobManager pod failed and the new one started, it was not able to 
> recover jobs due to the absence of recovery data in storage - config map 
> pointed at not existing file.
>   
>  Due to this the JobManager pod entered into the `CrashLoopBackOff`state and 
> was not able to recover - each attempt failed with the same error so the 
> whole cluster became unrecoverable and not operating.
>   
>  I had to manually delete the config map and start the jobs again without the 
> save point.
>   
>  If I tried to emulate the failure further by deleting job manager pod 
> manually, the new pod every time recovered well and issue was not 
> reproducible anymore artificially.
>   
>  Below is the failure log:
> {code:java}
> 2021-03-26 08:22:57,925 INFO 
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - 
> Starting the SlotManager.
>  2021-03-26 08:22:57,928 INFO 
> org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - 
> Starting DefaultLeaderRetrievalService with KubernetesLeaderRetrievalDriver
> {configMapName='stellar-flink-cluster-dispatcher-leader'}.
>  2021-03-26 08:22:57,931 INFO 
> org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieved job 
> ids [198c46bac791e73ebcc565a550fa4ff6, 344f5ebc1b5c3a566b4b2837813e4940, 
> 96c4603a0822d10884f7fe536703d811, d9ded24224aab7c7041420b3efc1b6ba] from 
> KubernetesStateHandleStore{configMapName='stellar-flink-cluster-dispatcher-leader'}
> 2021-03-26 08:22:57,933 INFO 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] 
> - Trying to recover job with job id 198c46bac791e73ebcc565a550fa4ff6.
>  2021-03-26 08:22:58,029 INFO 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] 
> - Stopping SessionDispatcherLeaderProcess.
>  2021-03-26 08:28:22,677 INFO 
> org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Stopping 
> DefaultJobGraphStore. 2021-03-26 08:28:22,681 ERROR 
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error 
> occurred in the cluster entrypoint. java.util.concurrent.CompletionException: 
> org.apache.flink.util.FlinkRuntimeException: Could not recover job with job 
> id 198c46bac791e73ebcc565a550fa4ff6.
>at java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source) 
> ~[?:?]
>at java.util.concurrent.CompletableFuture.completeThrowable(Unknown 
> Source) [?:?]
>at java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source) 
> [?:?]
>at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
>at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
>at java.lang.Thread.run(Unknown Source) [?:?] Caused by: 
> org.apache.flink.util.FlinkRuntimeException: Could not recover job with job 
> id 198c46bac791e73ebcc565a550fa4ff6.
>at 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJob(SessionDispatcherLeaderProcess.java:144
>  undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>at 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobs(SessionDispatcherLeaderProcess.java:122
>  undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>at 
> org.apache.flink.runtime.dispatcher.runner.AbstractDispatcherLeaderProcess.supplyUnsynchronizedIfRunning(AbstractDispatcherLeaderProcess.java:198
>  undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>at 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobsIfRunning(SessionDispatcherLeaderProcess.java:113
>  undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] ... 4 more 
> Caused by: org.apache.flink.util.FlinkException: Could not retrieve submitted 
> JobGraph from state handle under jobGraph-198c46bac791e73ebcc565a550fa4ff6. 
> This indicates that the retrieved state handle is broken. Try cleaning the 
> state handle store.

[jira] [Comment Edited] (FLINK-33727) JoinRestoreTest is failing on AZP

2023-12-02 Thread Sergey Nuyanzin (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792458#comment-17792458
 ] 

Sergey Nuyanzin edited comment on FLINK-33727 at 12/2/23 11:29 PM:
---

Based on local tests seems these {{MySink}} is kind of "state holder" for the 
tests...

Renaming helps
{quote}
  (Right now, the DeduplicationTestPrograms and JoinTestPrograms both use sinks 
called "MySink".)
{quote}
is there any reason to have same name?

Also I guess it is worth noting: there is a number of other tests with same 
potential issue e.g. with sink name "sink_t", probably something else


was (Author: sergey nuyanzin):
Based on local tests seems these {{MySink}} is kind of "state holder" for the 
tests...

{quote}
  (Right now, the DeduplicationTestPrograms and JoinTestPrograms both use sinks 
called "MySink".)
{quote}
is there any reason to have same name?

> JoinRestoreTest is failing on AZP
> -
>
> Key: FLINK-33727
> URL: https://issues.apache.org/jira/browse/FLINK-33727
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.19.0
>Reporter: Sergey Nuyanzin
>Priority: Critical
>  Labels: pull-request-available, test-stability
>
> Since {{JoinRestoreTest}} was introduced in FLINK-33470 it seems to be a 
> reason
> {noformat}
> Dec 02 04:42:26 04:42:26.408 [ERROR] Failures: 
> Dec 02 04:42:26 04:42:26.408 [ERROR]   
> JoinRestoreTest>RestoreTestBase.testRestore:283 
> Dec 02 04:42:26 Expecting actual:
> Dec 02 04:42:26   ["+I[9, carol, apple, 9000]",
> Dec 02 04:42:26 "+I[8, bill, banana, 8000]",
> Dec 02 04:42:26 "+I[6, jerry, pen, 6000]"]
> Dec 02 04:42:26 to contain exactly in any order:
> Dec 02 04:42:26   ["+I[Adam, null]",
> Dec 02 04:42:26 "+I[Baker, Research]",
> Dec 02 04:42:26 "+I[Charlie, Human Resources]",
> Dec 02 04:42:26 "+I[Charlie, HR]",
> Dec 02 04:42:26 "+I[Don, Sales]",
> Dec 02 04:42:26 "+I[Victor, null]",
> Dec 02 04:42:26 "+I[Helena, Engineering]",
> Dec 02 04:42:26 "+I[Juliet, Engineering]",
> Dec 02 04:42:26 "+I[Ivana, Research]",
> Dec 02 04:42:26 "+I[Charlie, People Operations]"]
> {noformat}
> examples of failures
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55120=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55129=logs=a9db68b9-a7e0-54b6-0f98-010e0aff39e2=cdd32e0b-6047-565b-c58f-14054472f1be=11786
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55136=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55137=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=11779



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: [PR] [FLINK-33727][table] Use different sink names for restore tests [flink]

2023-12-02 Thread via GitHub



flinkbot commented on PR #23858:
URL: https://github.com/apache/flink/pull/23858#issuecomment-1837279346

   
   ## CI report:
   
   * da62de304dcaeef07bda7d8da7bfc082737b14fa UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Re: [PR] [FLINK-33710] Prevent triggering cluster upgrades for permutations of the same overrides [flink-kubernetes-operator]

2023-12-02 Thread via GitHub



gyfora commented on code in PR #721:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/721#discussion_r1412894510


##
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/deployment/AbstractFlinkResourceReconciler.java:
##
@@ -186,7 +187,41 @@ private void applyAutoscaler(FlinkResourceContext ctx) 
throws Exception {
 ctx.getResource().getSpec().getJob() != null
 && 
ctx.getObserveConfig().getBoolean(AUTOSCALER_ENABLED);
 autoScalerCtx.getConfiguration().set(AUTOSCALER_ENABLED, 
autoscalerEnabled);
+
 autoscaler.scale(autoScalerCtx);
+putBackOldParallelismOverridesIfNewOnesAreMerelyAPermutation(ctx);
+}
+
+private static <
+CR extends AbstractFlinkResource,
+SPEC extends AbstractFlinkSpec,
+STATUS extends CommonStatus>
+void putBackOldParallelismOverridesIfNewOnesAreMerelyAPermutation(

Review Comment:
   Could we move this logic to the `ScalingRealizer`? 
https://github.com/apache/flink-kubernetes-operator/commit/158cbe29169cbfb7fa7ad676fb0273fd7ef6d25e
 adds the logic there and this issue comes from that change. I feel these 2 
changes logically belong together and it's weird that we break something in 
once place and fix it in another while it could be simply next to each other.
   
   It will likely also make the whole logic a bit simpler



##
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/deployment/AbstractFlinkResourceReconciler.java:
##
@@ -186,7 +187,41 @@ private void applyAutoscaler(FlinkResourceContext ctx) 
throws Exception {
 ctx.getResource().getSpec().getJob() != null
 && 
ctx.getObserveConfig().getBoolean(AUTOSCALER_ENABLED);
 autoScalerCtx.getConfiguration().set(AUTOSCALER_ENABLED, 
autoscalerEnabled);
+
 autoscaler.scale(autoScalerCtx);
+putBackOldParallelismOverridesIfNewOnesAreMerelyAPermutation(ctx);
+}
+
+private static <
+CR extends AbstractFlinkResource,
+SPEC extends AbstractFlinkSpec,
+STATUS extends CommonStatus>

Review Comment:
   Can we rename this to `resetParallelismOverridesIfUnchanged` ? The current 
naming is unusually verbose for the codebase. It may be better to add a javadoc 
comment instead. But this may be irrelevant if you check my other comment. We 
don't need to replace it if we don't set it in the first place, but for that we 
need to move the logic



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Commented] (FLINK-33727) JoinRestoreTest is failing on AZP

2023-12-02 Thread Sergey Nuyanzin (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792458#comment-17792458
 ] 

Sergey Nuyanzin commented on FLINK-33727:
-

Seems these {{MySink}} is kind of "state holder" for the tests...

{quote}
  (Right now, the DeduplicationTestPrograms and JoinTestPrograms both use sinks 
called "MySink".)
{quote}
is there any reason to have same name?

> JoinRestoreTest is failing on AZP
> -
>
> Key: FLINK-33727
> URL: https://issues.apache.org/jira/browse/FLINK-33727
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.19.0
>Reporter: Sergey Nuyanzin
>Priority: Critical
>  Labels: pull-request-available, test-stability
>
> Since {{JoinRestoreTest}} was introduced in FLINK-33470 it seems to be a 
> reason
> {noformat}
> Dec 02 04:42:26 04:42:26.408 [ERROR] Failures: 
> Dec 02 04:42:26 04:42:26.408 [ERROR]   
> JoinRestoreTest>RestoreTestBase.testRestore:283 
> Dec 02 04:42:26 Expecting actual:
> Dec 02 04:42:26   ["+I[9, carol, apple, 9000]",
> Dec 02 04:42:26 "+I[8, bill, banana, 8000]",
> Dec 02 04:42:26 "+I[6, jerry, pen, 6000]"]
> Dec 02 04:42:26 to contain exactly in any order:
> Dec 02 04:42:26   ["+I[Adam, null]",
> Dec 02 04:42:26 "+I[Baker, Research]",
> Dec 02 04:42:26 "+I[Charlie, Human Resources]",
> Dec 02 04:42:26 "+I[Charlie, HR]",
> Dec 02 04:42:26 "+I[Don, Sales]",
> Dec 02 04:42:26 "+I[Victor, null]",
> Dec 02 04:42:26 "+I[Helena, Engineering]",
> Dec 02 04:42:26 "+I[Juliet, Engineering]",
> Dec 02 04:42:26 "+I[Ivana, Research]",
> Dec 02 04:42:26 "+I[Charlie, People Operations]"]
> {noformat}
> examples of failures
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55120=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55129=logs=a9db68b9-a7e0-54b6-0f98-010e0aff39e2=cdd32e0b-6047-565b-c58f-14054472f1be=11786
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55136=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55137=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=11779



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Comment Edited] (FLINK-33727) JoinRestoreTest is failing on AZP

2023-12-02 Thread Sergey Nuyanzin (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792458#comment-17792458
 ] 

Sergey Nuyanzin edited comment on FLINK-33727 at 12/2/23 11:23 PM:
---

Based on local tests seems these {{MySink}} is kind of "state holder" for the 
tests...

{quote}
  (Right now, the DeduplicationTestPrograms and JoinTestPrograms both use sinks 
called "MySink".)
{quote}
is there any reason to have same name?


was (Author: sergey nuyanzin):
Seems these {{MySink}} is kind of "state holder" for the tests...

{quote}
  (Right now, the DeduplicationTestPrograms and JoinTestPrograms both use sinks 
called "MySink".)
{quote}
is there any reason to have same name?

> JoinRestoreTest is failing on AZP
> -
>
> Key: FLINK-33727
> URL: https://issues.apache.org/jira/browse/FLINK-33727
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.19.0
>Reporter: Sergey Nuyanzin
>Priority: Critical
>  Labels: pull-request-available, test-stability
>
> Since {{JoinRestoreTest}} was introduced in FLINK-33470 it seems to be a 
> reason
> {noformat}
> Dec 02 04:42:26 04:42:26.408 [ERROR] Failures: 
> Dec 02 04:42:26 04:42:26.408 [ERROR]   
> JoinRestoreTest>RestoreTestBase.testRestore:283 
> Dec 02 04:42:26 Expecting actual:
> Dec 02 04:42:26   ["+I[9, carol, apple, 9000]",
> Dec 02 04:42:26 "+I[8, bill, banana, 8000]",
> Dec 02 04:42:26 "+I[6, jerry, pen, 6000]"]
> Dec 02 04:42:26 to contain exactly in any order:
> Dec 02 04:42:26   ["+I[Adam, null]",
> Dec 02 04:42:26 "+I[Baker, Research]",
> Dec 02 04:42:26 "+I[Charlie, Human Resources]",
> Dec 02 04:42:26 "+I[Charlie, HR]",
> Dec 02 04:42:26 "+I[Don, Sales]",
> Dec 02 04:42:26 "+I[Victor, null]",
> Dec 02 04:42:26 "+I[Helena, Engineering]",
> Dec 02 04:42:26 "+I[Juliet, Engineering]",
> Dec 02 04:42:26 "+I[Ivana, Research]",
> Dec 02 04:42:26 "+I[Charlie, People Operations]"]
> {noformat}
> examples of failures
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55120=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55129=logs=a9db68b9-a7e0-54b6-0f98-010e0aff39e2=cdd32e0b-6047-565b-c58f-14054472f1be=11786
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55136=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55137=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=11779



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (FLINK-33727) JoinRestoreTest is failing on AZP

2023-12-02 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-33727:
---
Labels: pull-request-available test-stability  (was: test-stability)

> JoinRestoreTest is failing on AZP
> -
>
> Key: FLINK-33727
> URL: https://issues.apache.org/jira/browse/FLINK-33727
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.19.0
>Reporter: Sergey Nuyanzin
>Priority: Critical
>  Labels: pull-request-available, test-stability
>
> Since {{JoinRestoreTest}} was introduced in FLINK-33470 it seems to be a 
> reason
> {noformat}
> Dec 02 04:42:26 04:42:26.408 [ERROR] Failures: 
> Dec 02 04:42:26 04:42:26.408 [ERROR]   
> JoinRestoreTest>RestoreTestBase.testRestore:283 
> Dec 02 04:42:26 Expecting actual:
> Dec 02 04:42:26   ["+I[9, carol, apple, 9000]",
> Dec 02 04:42:26 "+I[8, bill, banana, 8000]",
> Dec 02 04:42:26 "+I[6, jerry, pen, 6000]"]
> Dec 02 04:42:26 to contain exactly in any order:
> Dec 02 04:42:26   ["+I[Adam, null]",
> Dec 02 04:42:26 "+I[Baker, Research]",
> Dec 02 04:42:26 "+I[Charlie, Human Resources]",
> Dec 02 04:42:26 "+I[Charlie, HR]",
> Dec 02 04:42:26 "+I[Don, Sales]",
> Dec 02 04:42:26 "+I[Victor, null]",
> Dec 02 04:42:26 "+I[Helena, Engineering]",
> Dec 02 04:42:26 "+I[Juliet, Engineering]",
> Dec 02 04:42:26 "+I[Ivana, Research]",
> Dec 02 04:42:26 "+I[Charlie, People Operations]"]
> {noformat}
> examples of failures
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55120=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55129=logs=a9db68b9-a7e0-54b6-0f98-010e0aff39e2=cdd32e0b-6047-565b-c58f-14054472f1be=11786
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55136=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55137=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=11779



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[PR] [FLINK-33727][table] Use different sink names for restore tests [flink]

2023-12-02 Thread via GitHub



snuyanzin opened a new pull request, #23858:
URL: https://github.com/apache/flink/pull/23858

   
   ## What is the purpose of the change
   
   In restore  tests there is `MySink` name for sinks and it seems to be the 
problem for restore tests. For more details see jira.
   The PR just renames these sinks for every table program
   
   
   ## Verifying this change
   
   The issue was 100% reproduced by executing all tests for `RestoreTestBase`
   In same way it could be tested
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): ( no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: ( no)
 - The serializers: (no )
 - The runtime per-record code paths (performance sensitive): (no )
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no )
 - The S3 file system connector: (no )
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? (not applicable)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Commented] (FLINK-33727) JoinRestoreTest is failing on AZP

2023-12-02 Thread Sergey Nuyanzin (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792449#comment-17792449
 ] 

Sergey Nuyanzin commented on FLINK-33727:
-

I don't think it is related to concurrent execution.

I was able to find a way to reproduce it locally with 100%.
Just open IntellijIDEA and run all tests for {{RestoreTestBase}}

Even  more, I started commenting tests and realised if there at least one test 
e.g. {{ExpandRestoreTest}} before {{JoinRestoreTest}} then {{JoinRestoreTest}} 
fails with 100% at least for my env. If I comment out also 
{{ExpandRestoreTest}} then it starts passing.
It seems it relies on some internal state...


> JoinRestoreTest is failing on AZP
> -
>
> Key: FLINK-33727
> URL: https://issues.apache.org/jira/browse/FLINK-33727
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.19.0
>Reporter: Sergey Nuyanzin
>Priority: Critical
>  Labels: test-stability
>
> Since {{JoinRestoreTest}} was introduced in FLINK-33470 it seems to be a 
> reason
> {noformat}
> Dec 02 04:42:26 04:42:26.408 [ERROR] Failures: 
> Dec 02 04:42:26 04:42:26.408 [ERROR]   
> JoinRestoreTest>RestoreTestBase.testRestore:283 
> Dec 02 04:42:26 Expecting actual:
> Dec 02 04:42:26   ["+I[9, carol, apple, 9000]",
> Dec 02 04:42:26 "+I[8, bill, banana, 8000]",
> Dec 02 04:42:26 "+I[6, jerry, pen, 6000]"]
> Dec 02 04:42:26 to contain exactly in any order:
> Dec 02 04:42:26   ["+I[Adam, null]",
> Dec 02 04:42:26 "+I[Baker, Research]",
> Dec 02 04:42:26 "+I[Charlie, Human Resources]",
> Dec 02 04:42:26 "+I[Charlie, HR]",
> Dec 02 04:42:26 "+I[Don, Sales]",
> Dec 02 04:42:26 "+I[Victor, null]",
> Dec 02 04:42:26 "+I[Helena, Engineering]",
> Dec 02 04:42:26 "+I[Juliet, Engineering]",
> Dec 02 04:42:26 "+I[Ivana, Research]",
> Dec 02 04:42:26 "+I[Charlie, People Operations]"]
> {noformat}
> examples of failures
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55120=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55129=logs=a9db68b9-a7e0-54b6-0f98-010e0aff39e2=cdd32e0b-6047-565b-c58f-14054472f1be=11786
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55136=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55137=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=11779



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: [PR] [FLINK-33470] Implement restore tests for Join node [flink]

2023-12-02 Thread via GitHub



jnh5y commented on PR #23680:
URL: https://github.com/apache/flink/pull/23680#issuecomment-1837217788

   > @dawidwys @jnh5y after merging this ci on master failed 4 times out of 6 
e.g.
   > 
   > ```
   > Dec 02 04:42:26 04:42:26.408 [ERROR] Failures: 
   > Dec 02 04:42:26 04:42:26.408 [ERROR]   
JoinRestoreTest>RestoreTestBase.testRestore:283 
   > Dec 02 04:42:26 Expecting actual:
   > Dec 02 04:42:26   ["+I[9, carol, apple, 9000]",
   > Dec 02 04:42:26 "+I[8, bill, banana, 8000]",
   > Dec 02 04:42:26 "+I[6, jerry, pen, 6000]"]
   > Dec 02 04:42:26 to contain exactly in any order:
   > Dec 02 04:42:26   ["+I[Adam, null]",
   > Dec 02 04:42:26 "+I[Baker, Research]",
   > Dec 02 04:42:26 "+I[Charlie, Human Resources]",
   > Dec 02 04:42:26 "+I[Charlie, HR]",
   > Dec 02 04:42:26 "+I[Don, Sales]",
   > Dec 02 04:42:26 "+I[Victor, null]",
   > Dec 02 04:42:26 "+I[Helena, Engineering]",
   > Dec 02 04:42:26 "+I[Juliet, Engineering]",
   > Dec 02 04:42:26 "+I[Ivana, Research]",
   > Dec 02 04:42:26 "+I[Charlie, People Operations]"]
   > ```
   > 
   > 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55120=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099
 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55129=logs=a9db68b9-a7e0-54b6-0f98-010e0aff39e2=cdd32e0b-6047-565b-c58f-14054472f1be=11786
 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55136=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099
 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55137=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=11779
   > 
   > could you please have a look? 
[FLINK-33727](https://issues.apache.org/jira/browse/FLINK-33727)
   
   Yes.  I've commented on the Flink JIRA; I think the issue is concurrency 
with the RestoreTestBase implementations.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Commented] (FLINK-33727) JoinRestoreTest is failing on AZP

2023-12-02 Thread Jim Hughes (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792413#comment-17792413
 ] 

Jim Hughes commented on FLINK-33727:


>From a quick look, the data is coming from `DeduplicationTestPrograms.java`.  

I believe that this shows that the various `RestoreTest`s are being executed 
concurrently and are interfering with each other.  

Two obvious ideas would be:
1. Have each RestoreTest use differently named sinks/sources.  (Right now, the 
DeduplicationTestPrograms and JoinTestPrograms both use sinks called "MySink".)
2. Do something at the JUnit level so that implementations of RestoreTestBase 
do not run concurrently.

Thoughts?

> JoinRestoreTest is failing on AZP
> -
>
> Key: FLINK-33727
> URL: https://issues.apache.org/jira/browse/FLINK-33727
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.19.0
>Reporter: Sergey Nuyanzin
>Priority: Critical
>  Labels: test-stability
>
> Since {{JoinRestoreTest}} was introduced in FLINK-33470 it seems to be a 
> reason
> {noformat}
> Dec 02 04:42:26 04:42:26.408 [ERROR] Failures: 
> Dec 02 04:42:26 04:42:26.408 [ERROR]   
> JoinRestoreTest>RestoreTestBase.testRestore:283 
> Dec 02 04:42:26 Expecting actual:
> Dec 02 04:42:26   ["+I[9, carol, apple, 9000]",
> Dec 02 04:42:26 "+I[8, bill, banana, 8000]",
> Dec 02 04:42:26 "+I[6, jerry, pen, 6000]"]
> Dec 02 04:42:26 to contain exactly in any order:
> Dec 02 04:42:26   ["+I[Adam, null]",
> Dec 02 04:42:26 "+I[Baker, Research]",
> Dec 02 04:42:26 "+I[Charlie, Human Resources]",
> Dec 02 04:42:26 "+I[Charlie, HR]",
> Dec 02 04:42:26 "+I[Don, Sales]",
> Dec 02 04:42:26 "+I[Victor, null]",
> Dec 02 04:42:26 "+I[Helena, Engineering]",
> Dec 02 04:42:26 "+I[Juliet, Engineering]",
> Dec 02 04:42:26 "+I[Ivana, Research]",
> Dec 02 04:42:26 "+I[Charlie, People Operations]"]
> {noformat}
> examples of failures
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55120=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55129=logs=a9db68b9-a7e0-54b6-0f98-010e0aff39e2=cdd32e0b-6047-565b-c58f-14054472f1be=11786
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55136=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55137=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=11779



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (FLINK-33727) JoinRestoreTest is failing on AZP

2023-12-02 Thread Sergey Nuyanzin (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Nuyanzin updated FLINK-33727:

Description: 
Since {{JoinRestoreTest}} was introduced in FLINK-33470 it seems to be a reason

{noformat}
Dec 02 04:42:26 04:42:26.408 [ERROR] Failures: 
Dec 02 04:42:26 04:42:26.408 [ERROR]   
JoinRestoreTest>RestoreTestBase.testRestore:283 
Dec 02 04:42:26 Expecting actual:
Dec 02 04:42:26   ["+I[9, carol, apple, 9000]",
Dec 02 04:42:26 "+I[8, bill, banana, 8000]",
Dec 02 04:42:26 "+I[6, jerry, pen, 6000]"]
Dec 02 04:42:26 to contain exactly in any order:
Dec 02 04:42:26   ["+I[Adam, null]",
Dec 02 04:42:26 "+I[Baker, Research]",
Dec 02 04:42:26 "+I[Charlie, Human Resources]",
Dec 02 04:42:26 "+I[Charlie, HR]",
Dec 02 04:42:26 "+I[Don, Sales]",
Dec 02 04:42:26 "+I[Victor, null]",
Dec 02 04:42:26 "+I[Helena, Engineering]",
Dec 02 04:42:26 "+I[Juliet, Engineering]",
Dec 02 04:42:26 "+I[Ivana, Research]",
Dec 02 04:42:26 "+I[Charlie, People Operations]"]
{noformat}
examples of failures

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55120=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55129=logs=a9db68b9-a7e0-54b6-0f98-010e0aff39e2=cdd32e0b-6047-565b-c58f-14054472f1be=11786
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55136=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55137=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=11779

  was:
Since it was introduced in FLINK-33470 it seems to be a reason

{noformat}
Dec 02 04:42:26 04:42:26.408 [ERROR] Failures: 
Dec 02 04:42:26 04:42:26.408 [ERROR]   
JoinRestoreTest>RestoreTestBase.testRestore:283 
Dec 02 04:42:26 Expecting actual:
Dec 02 04:42:26   ["+I[9, carol, apple, 9000]",
Dec 02 04:42:26 "+I[8, bill, banana, 8000]",
Dec 02 04:42:26 "+I[6, jerry, pen, 6000]"]
Dec 02 04:42:26 to contain exactly in any order:
Dec 02 04:42:26   ["+I[Adam, null]",
Dec 02 04:42:26 "+I[Baker, Research]",
Dec 02 04:42:26 "+I[Charlie, Human Resources]",
Dec 02 04:42:26 "+I[Charlie, HR]",
Dec 02 04:42:26 "+I[Don, Sales]",
Dec 02 04:42:26 "+I[Victor, null]",
Dec 02 04:42:26 "+I[Helena, Engineering]",
Dec 02 04:42:26 "+I[Juliet, Engineering]",
Dec 02 04:42:26 "+I[Ivana, Research]",
Dec 02 04:42:26 "+I[Charlie, People Operations]"]
{noformat}
examples of failures

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55120=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55129=logs=a9db68b9-a7e0-54b6-0f98-010e0aff39e2=cdd32e0b-6047-565b-c58f-14054472f1be=11786
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55136=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55137=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=11779


> JoinRestoreTest is failing on AZP
> -
>
> Key: FLINK-33727
> URL: https://issues.apache.org/jira/browse/FLINK-33727
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.19.0
>Reporter: Sergey Nuyanzin
>Priority: Critical
>  Labels: test-stability
>
> Since {{JoinRestoreTest}} was introduced in FLINK-33470 it seems to be a 
> reason
> {noformat}
> Dec 02 04:42:26 04:42:26.408 [ERROR] Failures: 
> Dec 02 04:42:26 04:42:26.408 [ERROR]   
> JoinRestoreTest>RestoreTestBase.testRestore:283 
> Dec 02 04:42:26 Expecting actual:
> Dec 02 04:42:26   ["+I[9, carol, apple, 9000]",
> Dec 02 04:42:26 "+I[8, bill, banana, 8000]",
> Dec 02 04:42:26 "+I[6, jerry, pen, 6000]"]
> Dec 02 04:42:26 to contain exactly in any order:
> Dec 02 04:42:26   ["+I[Adam, null]",
> Dec 02 04:42:26 "+I[Baker, Research]",
> Dec 02 04:42:26 "+I[Charlie, Human Resources]",
> Dec 02 04:42:26 "+I[Charlie, HR]",
> Dec 02 04:42:26 "+I[Don, Sales]",
> Dec 02 04:42:26 "+I[Victor, null]",
> Dec 02 04:42:26 "+I[Helena, Engineering]",
> Dec 02 04:42:26 "+I[Juliet, Engineering]",
> Dec 02 04:42:26 "+I[Ivana, Research]",
> Dec 02 04:42:26 "+I[Charlie, People Operations]"]
> {noformat}
> examples of failures
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55120=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099
>

[jira] [Commented] (FLINK-32986) The new createTemporaryFunction has some regression of type inference compare to the deprecated registerFunction

2023-12-02 Thread Jeyhun Karimov (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-32986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792406#comment-17792406
 ] 

Jeyhun Karimov commented on FLINK-32986:


Hi [~lincoln.86xy]  could you please assign this task to me or give me an 
access to self-assign the task? Thanks!

> The new createTemporaryFunction has some regression of type inference compare 
> to the deprecated registerFunction
> 
>
> Key: FLINK-32986
> URL: https://issues.apache.org/jira/browse/FLINK-32986
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Table SQL / API
>Affects Versions: 1.18.0, 1.17.1
>Reporter: lincoln lee
>Priority: Major
>  Labels: pull-request-available
>
> Current `LookupJoinITCase#testJoinTemporalTableWithUdfFilter` uses a legacy 
> form function registration:
> {code}
> tEnv.registerFunction("add", new TestAddWithOpen)
> {code}
> it works fine with the SQL call `add(T.id, 2) > 3` but fails when swith to 
> the new api:
> {code}
> tEnv.createTemporaryFunction("add", classOf[TestAddWithOpen])
> // or this
> tEnv.createTemporaryFunction("add", new TestAddWithOpen)
> {code}
> exception:
> {code}
> Caused by: org.apache.flink.table.api.ValidationException: Invalid function 
> call:
> default_catalog.default_database.add(BIGINT, INT NOT NULL)
>   at 
> org.apache.flink.table.types.inference.TypeInferenceUtil.createInvalidCallException(TypeInferenceUtil.java:193)
>   at 
> org.apache.flink.table.planner.functions.inference.TypeInferenceOperandChecker.checkOperandTypes(TypeInferenceOperandChecker.java:89)
>   at 
> org.apache.calcite.sql.SqlOperator.checkOperandTypes(SqlOperator.java:753)
>   at 
> org.apache.calcite.sql.SqlOperator.validateOperands(SqlOperator.java:499)
>   at org.apache.calcite.sql.SqlFunction.deriveType(SqlFunction.java:335)
>   at org.apache.calcite.sql.SqlFunction.deriveType(SqlFunction.java:231)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:6302)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:6287)
>   at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:161)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.deriveTypeImpl(SqlValidatorImpl.java:1869)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.deriveType(SqlValidatorImpl.java:1860)
>   at 
> org.apache.calcite.sql.type.SqlTypeUtil.deriveType(SqlTypeUtil.java:200)
>   at 
> org.apache.calcite.sql.type.InferTypes.lambda$static$0(InferTypes.java:47)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.inferUnknownTypes(SqlValidatorImpl.java:2050)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.inferUnknownTypes(SqlValidatorImpl.java:2055)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateWhereOrOn(SqlValidatorImpl.java:4338)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateJoin(SqlValidatorImpl.java:3410)
>   at 
> org.apache.flink.table.planner.calcite.FlinkCalciteSqlValidator.validateJoin(FlinkCalciteSqlValidator.java:154)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3282)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3603)
>   at 
> org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:64)
>   at 
> org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:89)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:1050)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:1025)
>   at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:248)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:1000)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:749)
>   at 
> org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$validate(FlinkPlannerImpl.scala:196)
>   ... 49 more
> Caused by: org.apache.flink.table.api.ValidationException: Invalid input 
> arguments. Expected signatures are:
> default_catalog.default_database.add(a BIGINT NOT NULL, b INT NOT NULL)
> default_catalog.default_database.add(a BIGINT NOT NULL, b BIGINT NOT NULL)
>   at 
> org.apache.flink.table.types.inference.TypeInferenceUtil.createInvalidInputException(TypeInferenceUtil.java:180)
>   at 
>

[jira] [Commented] (FLINK-31481) Support enhanced show databases syntax

2023-12-02 Thread Jeyhun Karimov (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-31481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792404#comment-17792404
 ] 

Jeyhun Karimov commented on FLINK-31481:


Hi [~taoran] could you please assign this task to me or give me an access to 
self-assign the task? Thanks!

> Support enhanced show databases syntax
> --
>
> Key: FLINK-31481
> URL: https://issues.apache.org/jira/browse/FLINK-31481
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API, Table SQL / Planner
>Reporter: Ran Tao
>Priority: Major
>  Labels: pull-request-available
>
> As FLIP discussed. To avoid bloat, this ticket supports ShowDatabases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: [PR] [hotfix][table] refactoring to template method to separate concerns [flink]

2023-12-02 Thread via GitHub



flinkbot commented on PR #23857:
URL: https://github.com/apache/flink/pull/23857#issuecomment-1837186832

   
   ## CI report:
   
   * b7f62e249f5ad4280b0a994bf87ef64323d2dee3 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[PR] [hotfix][table] refactoring to template method to separate concerns [flink]

2023-12-02 Thread via GitHub



JingGe opened a new pull request, #23857:
URL: https://github.com/apache/flink/pull/23857

   ## What is the purpose of the change
   
   refactoring to template method to separate concerns
   
   ## Verifying this change
   
   This change is already covered by existing tests, such as TableauStyleTest.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't 
know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Re: [PR] [FLINK-33669][doc] Update the documentation for RestartStrategy, Checkpoint Storage, and State Backend. [flink]

2023-12-02 Thread via GitHub



JingGe commented on code in PR #23847:
URL: https://github.com/apache/flink/pull/23847#discussion_r1412809280


##
docs/content.zh/docs/deployment/filesystems/gcs.md:
##
@@ -44,7 +44,10 @@ env.readTextFile("gs:///");
 stream.writeAsText("gs:///");
 
 // Use GCS as checkpoint storage

Review Comment:
   Thanks for the clarification. If we take a look at other case like azure at 
line 66, oss at line 51, s3 at line 49, they are all translated into Chinese. 
It is a question of consistency. WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Commented] (FLINK-33240) Generate docs for deprecated options as well

2023-12-02 Thread Jing Ge (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-33240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792394#comment-17792394
 ] 

Jing Ge commented on FLINK-33240:
-

please check FLINK-30862 for some objections.

> Generate docs for deprecated options as well
> 
>
> Key: FLINK-33240
> URL: https://issues.apache.org/jira/browse/FLINK-33240
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Zhanghao Chen
>Priority: Major
> Fix For: 1.19.0
>
>
> Currently, Flink will skip doc generation for deprecated options (See 
> {{{}ConfigOptionsDocGenerator#{}}}{{{}shouldBeDocumented{}}}). As a result, 
> the deprecated options can no longer be found in the new version of Flink 
> document. This might confuse users upgrading from an older version of Flink 
> and they have to either carefully read the release notes or check the source 
> code for upgrading guidance on deprecated options. I suggest generating doc 
> for deprecated options as well, and we should scan through the code to make 
> sure that proper upgrading guidance is provided for the deprecated options.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (FLINK-33727) JoinRestoreTest is failing on AZP

2023-12-02 Thread Sergey Nuyanzin (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792391#comment-17792391
 ] 

Sergey Nuyanzin commented on FLINK-33727:
-

[~dwysakowicz], [~jhughes] could you please have a look here please?

> JoinRestoreTest is failing on AZP
> -
>
> Key: FLINK-33727
> URL: https://issues.apache.org/jira/browse/FLINK-33727
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.19.0
>Reporter: Sergey Nuyanzin
>Priority: Critical
>  Labels: test-stability
>
> Since it was introduced in FLINK-33470 it seems to be a reason
> {noformat}
> Dec 02 04:42:26 04:42:26.408 [ERROR] Failures: 
> Dec 02 04:42:26 04:42:26.408 [ERROR]   
> JoinRestoreTest>RestoreTestBase.testRestore:283 
> Dec 02 04:42:26 Expecting actual:
> Dec 02 04:42:26   ["+I[9, carol, apple, 9000]",
> Dec 02 04:42:26 "+I[8, bill, banana, 8000]",
> Dec 02 04:42:26 "+I[6, jerry, pen, 6000]"]
> Dec 02 04:42:26 to contain exactly in any order:
> Dec 02 04:42:26   ["+I[Adam, null]",
> Dec 02 04:42:26 "+I[Baker, Research]",
> Dec 02 04:42:26 "+I[Charlie, Human Resources]",
> Dec 02 04:42:26 "+I[Charlie, HR]",
> Dec 02 04:42:26 "+I[Don, Sales]",
> Dec 02 04:42:26 "+I[Victor, null]",
> Dec 02 04:42:26 "+I[Helena, Engineering]",
> Dec 02 04:42:26 "+I[Juliet, Engineering]",
> Dec 02 04:42:26 "+I[Ivana, Research]",
> Dec 02 04:42:26 "+I[Charlie, People Operations]"]
> {noformat}
> examples of failures
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55120=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55129=logs=a9db68b9-a7e0-54b6-0f98-010e0aff39e2=cdd32e0b-6047-565b-c58f-14054472f1be=11786
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55136=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55137=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=11779



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-33727) JoinRestoreTest is failing on AZP

2023-12-02 Thread Sergey Nuyanzin (Jira)

Sergey Nuyanzin created FLINK-33727:
---

 Summary: JoinRestoreTest is failing on AZP
 Key: FLINK-33727
 URL: https://issues.apache.org/jira/browse/FLINK-33727
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.19.0
Reporter: Sergey Nuyanzin


Since it was introduced in FLINK-33470 it seems to be a reason

{noformat}
Dec 02 04:42:26 04:42:26.408 [ERROR] Failures: 
Dec 02 04:42:26 04:42:26.408 [ERROR]   
JoinRestoreTest>RestoreTestBase.testRestore:283 
Dec 02 04:42:26 Expecting actual:
Dec 02 04:42:26   ["+I[9, carol, apple, 9000]",
Dec 02 04:42:26 "+I[8, bill, banana, 8000]",
Dec 02 04:42:26 "+I[6, jerry, pen, 6000]"]
Dec 02 04:42:26 to contain exactly in any order:
Dec 02 04:42:26   ["+I[Adam, null]",
Dec 02 04:42:26 "+I[Baker, Research]",
Dec 02 04:42:26 "+I[Charlie, Human Resources]",
Dec 02 04:42:26 "+I[Charlie, HR]",
Dec 02 04:42:26 "+I[Don, Sales]",
Dec 02 04:42:26 "+I[Victor, null]",
Dec 02 04:42:26 "+I[Helena, Engineering]",
Dec 02 04:42:26 "+I[Juliet, Engineering]",
Dec 02 04:42:26 "+I[Ivana, Research]",
Dec 02 04:42:26 "+I[Charlie, People Operations]"]
{noformat}
examples of failures

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55120=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55129=logs=a9db68b9-a7e0-54b6-0f98-010e0aff39e2=cdd32e0b-6047-565b-c58f-14054472f1be=11786
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55136=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55137=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=11779



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: [PR] [FLINK-33470] Implement restore tests for Join node [flink]

2023-12-02 Thread via GitHub



snuyanzin commented on PR #23680:
URL: https://github.com/apache/flink/pull/23680#issuecomment-1837157958

   @dawidwys @jnh5y after merging this ci on master failed 4 times out of 6
   e.g.
   ```
   Dec 02 04:42:26 04:42:26.408 [ERROR] Failures: 
   Dec 02 04:42:26 04:42:26.408 [ERROR]   
JoinRestoreTest>RestoreTestBase.testRestore:283 
   Dec 02 04:42:26 Expecting actual:
   Dec 02 04:42:26   ["+I[9, carol, apple, 9000]",
   Dec 02 04:42:26 "+I[8, bill, banana, 8000]",
   Dec 02 04:42:26 "+I[6, jerry, pen, 6000]"]
   Dec 02 04:42:26 to contain exactly in any order:
   Dec 02 04:42:26   ["+I[Adam, null]",
   Dec 02 04:42:26 "+I[Baker, Research]",
   Dec 02 04:42:26 "+I[Charlie, Human Resources]",
   Dec 02 04:42:26 "+I[Charlie, HR]",
   Dec 02 04:42:26 "+I[Don, Sales]",
   Dec 02 04:42:26 "+I[Victor, null]",
   Dec 02 04:42:26 "+I[Helena, Engineering]",
   Dec 02 04:42:26 "+I[Juliet, Engineering]",
   Dec 02 04:42:26 "+I[Ivana, Research]",
   Dec 02 04:42:26 "+I[Charlie, People Operations]"]
   ```
   
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55120=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099
   
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55129=logs=a9db68b9-a7e0-54b6-0f98-010e0aff39e2=cdd32e0b-6047-565b-c58f-14054472f1be=11786
   
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55136=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=12099
   
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55137=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=11779
   
   
   could you please have a look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Commented] (FLINK-33190) [Umbrella]Externalized Connectors should directly depend on 3rd-party libs instead of shaded repo

2023-12-02 Thread Jing Ge (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-33190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792386#comment-17792386
 ] 

Jing Ge commented on FLINK-33190:
-

[~martijnvisser] thanks for the hint!

> [Umbrella]Externalized Connectors should directly depend on 3rd-party libs 
> instead of shaded repo 
> --
>
> Key: FLINK-33190
> URL: https://issues.apache.org/jira/browse/FLINK-33190
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Affects Versions: 1.18.0
>Reporter: Jing Ge
>Priority: Major
>
> Connectors shouldn't depend on flink-shaded.
> The overhead and/or risks of doing/supporting that right now far
> outweigh the benefits.
> ( Because we either have to encode the full version for all dependencies
> into the package, or accept the risk of minor/patch dependency clashes)
> Connectors are small enough in scope that depending directly on
> guava/jackson/etc. is a fine approach, and they have plenty of other
> dependencies that they need to manage anyway; let's treat these the same
> way.
>  
> https://lists.apache.org/thread/mtypmprz2b5p20gj064d0wsz3k0ofpco



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Closed] (FLINK-33193) JDBC Connector should directly depend on 3rd-party libs instead of flink-shaded repo

2023-12-02 Thread Jing Ge (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-33193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Ge closed FLINK-33193.
---
Resolution: Won't Fix

> JDBC Connector should directly depend on 3rd-party libs instead of 
> flink-shaded repo
> 
>
> Key: FLINK-33193
> URL: https://issues.apache.org/jira/browse/FLINK-33193
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / JDBC
>Affects Versions: 1.18.0
>Reporter: Jing Ge
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Closed] (FLINK-33195) ElasticSearch Connector should directly depend on 3rd-party libs instead of flink-shaded repo

2023-12-02 Thread Jing Ge (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-33195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Ge closed FLINK-33195.
---
Resolution: Won't Fix

> ElasticSearch Connector should directly depend on 3rd-party libs instead of 
> flink-shaded repo
> -
>
> Key: FLINK-33195
> URL: https://issues.apache.org/jira/browse/FLINK-33195
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / ElasticSearch
>Affects Versions: 1.18.0
>Reporter: Jing Ge
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (FLINK-33698) Fix the backoff time calculation in ExponentialBackoffDelayRetryStrategy

2023-12-02 Thread xiangyu feng (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-33698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiangyu feng updated FLINK-33698:
-
Description: 
The backoff time calculation in `ExponentialBackoffDelayRetryStrategy` is 
problematic due to the lack of a reset when reusing the object.

 

Current Version:
{code:java}
@Override
public long getBackoffTimeMillis(int currentAttempts) {
if (currentAttempts <= 1) {
// equivalent to initial delay
return lastRetryDelay;
}
long backoff = Math.min((long) (lastRetryDelay * multiplier), 
maxRetryDelay);
this.lastRetryDelay = backoff;
return backoff;
} {code}
Fixed Version:
{code:java}
@Override
public long getBackoffTimeMillis(int currentAttempts) {
if (currentAttempts <= 1) {
// reset to initialDelay
this.lastRetryDelay = initialDelay;
return lastRetryDelay;
}

long backoff = Math.min((long) (lastRetryDelay * multiplier), 
maxRetryDelay);
this.lastRetryDelay = backoff;
return backoff;
}{code}

  was:
The backoff time calculation in `ExponentialBackoffDelayRetryStrategy` should 
consider currentAttempts. 

 

Current Version:
{code:java}
@Override
public long getBackoffTimeMillis(int currentAttempts) {
if (currentAttempts <= 1) {
// equivalent to initial delay
return lastRetryDelay;
}
long backoff = Math.min((long) (lastRetryDelay * multiplier), 
maxRetryDelay);
this.lastRetryDelay = backoff;
return backoff;
} {code}
Fixed Version:
{code:java}
@Override
public long getBackoffTimeMillis(int currentAttempts) {
if (currentAttempts <= 1) {
// reset to initialDelay
this.lastRetryDelay = initialDelay;
return lastRetryDelay;
}

long backoff = Math.min((long) (lastRetryDelay * multiplier), 
maxRetryDelay);
this.lastRetryDelay = backoff;
return backoff;
}{code}


> Fix the backoff time calculation in ExponentialBackoffDelayRetryStrategy
> 
>
> Key: FLINK-33698
> URL: https://issues.apache.org/jira/browse/FLINK-33698
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Reporter: xiangyu feng
>Assignee: xiangyu feng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> The backoff time calculation in `ExponentialBackoffDelayRetryStrategy` is 
> problematic due to the lack of a reset when reusing the object.
>  
> Current Version:
> {code:java}
> @Override
> public long getBackoffTimeMillis(int currentAttempts) {
> if (currentAttempts <= 1) {
> // equivalent to initial delay
> return lastRetryDelay;
> }
> long backoff = Math.min((long) (lastRetryDelay * multiplier), 
> maxRetryDelay);
> this.lastRetryDelay = backoff;
> return backoff;
> } {code}
> Fixed Version:
> {code:java}
> @Override
> public long getBackoffTimeMillis(int currentAttempts) {
> if (currentAttempts <= 1) {
> // reset to initialDelay
> this.lastRetryDelay = initialDelay;
> return lastRetryDelay;
> }
> long backoff = Math.min((long) (lastRetryDelay * multiplier), 
> maxRetryDelay);
> this.lastRetryDelay = backoff;
> return backoff;
> }{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (FLINK-33698) Fix the backoff time calculation in ExponentialBackoffDelayRetryStrategy

2023-12-02 Thread xiangyu feng (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-33698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiangyu feng updated FLINK-33698:
-
Description: 
The backoff time calculation in `ExponentialBackoffDelayRetryStrategy` should 
consider currentAttempts. 

 

Current Version:
{code:java}
@Override
public long getBackoffTimeMillis(int currentAttempts) {
if (currentAttempts <= 1) {
// equivalent to initial delay
return lastRetryDelay;
}
long backoff = Math.min((long) (lastRetryDelay * multiplier), 
maxRetryDelay);
this.lastRetryDelay = backoff;
return backoff;
} {code}
Fixed Version:
{code:java}
@Override
public long getBackoffTimeMillis(int currentAttempts) {
if (currentAttempts <= 1) {
// reset to initialDelay
this.lastRetryDelay = initialDelay;
return lastRetryDelay;
}

long backoff = Math.min((long) (lastRetryDelay * multiplier), 
maxRetryDelay);
this.lastRetryDelay = backoff;
return backoff;
}{code}

  was:
The backoff time calculation in `ExponentialBackoffDelayRetryStrategy` should 
consider currentAttempts. 

 

Current Version:
{code:java}
@Override
public long getBackoffTimeMillis(int currentAttempts) {
if (currentAttempts <= 1) {
// equivalent to initial delay
return lastRetryDelay;
}
long backoff = Math.min((long) (lastRetryDelay * multiplier), 
maxRetryDelay);
this.lastRetryDelay = backoff;
return backoff;
} {code}
Fixed Version:
{code:java}
@Override
public long getBackoffTimeMillis(int currentAttempts) {
if (currentAttempts <= 1) {
// equivalent to initial delay
return initialDelay;
}
long backoff =
Math.min(
(long) (initialDelay * Math.pow(multiplier, currentAttempts 
- 1)),
maxRetryDelay);
return backoff;
} {code}


> Fix the backoff time calculation in ExponentialBackoffDelayRetryStrategy
> 
>
> Key: FLINK-33698
> URL: https://issues.apache.org/jira/browse/FLINK-33698
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Reporter: xiangyu feng
>Assignee: xiangyu feng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> The backoff time calculation in `ExponentialBackoffDelayRetryStrategy` should 
> consider currentAttempts. 
>  
> Current Version:
> {code:java}
> @Override
> public long getBackoffTimeMillis(int currentAttempts) {
> if (currentAttempts <= 1) {
> // equivalent to initial delay
> return lastRetryDelay;
> }
> long backoff = Math.min((long) (lastRetryDelay * multiplier), 
> maxRetryDelay);
> this.lastRetryDelay = backoff;
> return backoff;
> } {code}
> Fixed Version:
> {code:java}
> @Override
> public long getBackoffTimeMillis(int currentAttempts) {
> if (currentAttempts <= 1) {
> // reset to initialDelay
> this.lastRetryDelay = initialDelay;
> return lastRetryDelay;
> }
> long backoff = Math.min((long) (lastRetryDelay * multiplier), 
> maxRetryDelay);
> this.lastRetryDelay = backoff;
> return backoff;
> }{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: [PR] [FLINK-33710] Prevent triggering cluster upgrades for permutations of the same overrides [flink-kubernetes-operator]

2023-12-02 Thread via GitHub



afedulov commented on code in PR #721:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/721#discussion_r1412799765


##
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/deployment/AbstractFlinkResourceReconciler.java:
##
@@ -180,13 +190,28 @@ public void reconcile(FlinkResourceContext ctx) 
throws Exception {
 }
 }
 
-private void applyAutoscaler(FlinkResourceContext ctx) throws 
Exception {
+private void applyAutoscaler(FlinkResourceContext ctx, @Nullable 
String existingOverrides)
+throws Exception {
 var autoScalerCtx = ctx.getJobAutoScalerContext();
 boolean autoscalerEnabled =
 ctx.getResource().getSpec().getJob() != null
 && 
ctx.getObserveConfig().getBoolean(AUTOSCALER_ENABLED);
 autoScalerCtx.getConfiguration().set(AUTOSCALER_ENABLED, 
autoscalerEnabled);
+
 autoscaler.scale(autoScalerCtx);
+
+// Check that the overrides actually changed and not merely the String 
representation

Review Comment:
   ```suggestion
   // Prevents subsequent unneeded spec updates when the `scale` 
operation only changes the order of the parallelism overrides (required because 
an unsorted map was used in the past).
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Updated] (FLINK-33684) Improve the retry strategy in CollectResultFetcher

2023-12-02 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-33684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-33684:
---
Labels: pull-request-available  (was: )

> Improve the retry strategy in CollectResultFetcher
> --
>
> Key: FLINK-33684
> URL: https://issues.apache.org/jira/browse/FLINK-33684
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / DataStream
>Reporter: xiangyu feng
>Priority: Major
>  Labels: pull-request-available
>
> Currently  CollectResultFetcher use a fixed retry interval.
> {code:java}
> private void sleepBeforeRetry() {
> if (retryMillis <= 0) {
> return;
> }
> try {
> // TODO a more proper retry strategy?
> Thread.sleep(retryMillis);
> } catch (InterruptedException e) {
> LOG.warn("Interrupted when sleeping before a retry", e);
> }
> } {code}
> This can be improved with a RetryStrategy.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: [PR] [FLINK-33669][doc] Update the documentation for RestartStrategy, Checkpoint Storage, and State Backend. [flink]

2023-12-02 Thread via GitHub



1996fanrui commented on code in PR #23847:
URL: https://github.com/apache/flink/pull/23847#discussion_r1412799696


##
docs/content/docs/ops/state/task_failure_recovery.md:
##
@@ -117,11 +116,11 @@ The fixed delay restart strategy can also be set 
programmatically:
 {{< tabs "73f5d009-b9af-4bfe-be22-d1c4659fd1ec" >}}
 {{< tab "Java" >}}
 ```java
-StreamExecutionEnvironment env = 
StreamExecutionEnvironment.getExecutionEnvironment();
-env.setRestartStrategy(RestartStrategies.fixedDelayRestart(
-  3, // number of restart attempts
-  Time.of(10, TimeUnit.SECONDS) // delay
-));
+Configuration config = new Configuration();
+config.set(RestartStrategyOptions.RESTART_STRATEGY, "fixed-delay");
+config.set(RestartStrategyOptions.RESTART_STRATEGY_FIXED_DELAY_ATTEMPTS, 3); 
// number of restart attempts
+config.set(RestartStrategyOptions.RESTART_STRATEGY_FIXED_DELAY_DELAY, 
Duration.ofSeconds(10)); // delay
+StreamExecutionEnvironment env = 
StreamExecutionEnvironment.getExecutionEnvironment(config);
 ```

Review Comment:
   There is already a yaml demo above, maybe we don't need a demo for 
programmatic set? Not sure, looking forward to your feedback.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Re: [PR] [FLINK-33684] Use IncrementalDelayRetryStrategy in CollectResultFetcher [flink]

2023-12-02 Thread via GitHub



flinkbot commented on PR #23856:
URL: https://github.com/apache/flink/pull/23856#issuecomment-1837148341

   
   ## CI report:
   
   * 7a0b6a4637b823329768e45f0f156051dfa05356 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[PR] Flink 33684 [flink]

2023-12-02 Thread via GitHub



xiangyuf opened a new pull request, #23856:
URL: https://github.com/apache/flink/pull/23856

   
   ## What is the purpose of the change
   
   Currently  `CollectResultFetcher` uses a fixed retry interval. This can be 
improved with `IncrementalDelayRetryStrategy`.  `IncrementalDelayRetryStrategy` 
is a balance between job E2E latency and request count.
   
   
   ## Brief change log
   
   *(for example:)*
 - Use `IncrementalDelayRetryStrategy` in `CollectResultFetcher` to decide 
the retry interval  between requests.
   
   
   ## Verifying this change
   
 - *Added integration tests `CollectResultFetcherRetryITCase` to test the 
`CollectResultFetcher` retry behavior
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Re: [PR] [FLINK-33669][doc] Update the documentation for RestartStrategy, Checkpoint Storage, and State Backend. [flink]

2023-12-02 Thread via GitHub



1996fanrui commented on code in PR #23847:
URL: https://github.com/apache/flink/pull/23847#discussion_r1412797465


##
docs/content.zh/docs/ops/state/task_failure_recovery.md:
##
@@ -49,20 +49,18 @@ Flink 作业如果没有定义重启策略，则会遵循集群启动时加载
 {{< generated/restart_strategy_configuration >}}
 
 除了定义默认的重启策略以外，还可以为每个 Flink 作业单独定义重启策略。
-这个重启策略通过在程序中的 `StreamExecutionEnvironment` 对象上调用 `setRestartStrategy` 方法来设置。
-当然，对于 `StreamExecutionEnvironment` 也同样适用。
 
 下例展示了如何给我们的作业设置固定延时重启策略。
 如果发生故障，系统会重启作业 3 次，每两次连续的重启尝试之间等待 10 秒钟。
 
 {{< tabs "2b011473-9a34-4e7b-943b-be4a9071fe3c" >}}
 {{< tab "Java" >}}

Review Comment:
   Hey @JunRuiLee , I'm thinking should we still have 3 parts(Java, Scala, 
Python) here? 
   
   We recommend users use the options instead of api for all languages, it 
includes Scala and Python. I think only keeping options is enough here even if 
we didn't depreate the api for scala and python.
   
   WDYT?
   
   Note: if you agree it, the comments should take effect for all doc related 
to this FLIP.



##
docs/content.zh/docs/ops/state/task_failure_recovery.md:
##
@@ -49,20 +49,18 @@ Flink 作业如果没有定义重启策略，则会遵循集群启动时加载
 {{< generated/restart_strategy_configuration >}}
 
 除了定义默认的重启策略以外，还可以为每个 Flink 作业单独定义重启策略。
-这个重启策略通过在程序中的 `StreamExecutionEnvironment` 对象上调用 `setRestartStrategy` 方法来设置。
-当然，对于 `StreamExecutionEnvironment` 也同样适用。
 
 下例展示了如何给我们的作业设置固定延时重启策略。
 如果发生故障，系统会重启作业 3 次，每两次连续的重启尝试之间等待 10 秒钟。
 
 {{< tabs "2b011473-9a34-4e7b-943b-be4a9071fe3c" >}}
 {{< tab "Java" >}}

Review Comment:
   Hey @JunRuiLee , I'm thinking should we still have 3 parts(Java, Scala, 
Python) here? 
   
   We recommend users use the options instead of api for all languages, it 
includes Scala and Python. I think only keeping options is enough here even if 
we didn't depreate the api for scala and python.
   
   WDYT?
   
   Note: if you agree it, this comment should take effect for all doc related 
to this FLIP.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Re: [PR] [FLINK-33536] Fix Flink Table API CSV streaming sink fails with IOException: Stream closed [flink]

2023-12-02 Thread via GitHub



Samrat002 commented on PR #23725:
URL: https://github.com/apache/flink/pull/23725#issuecomment-1837144127

   @PrabhuJoseph Please review whenever time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Re: [PR] [FLINK-16175][table sql/ api][WIP]Add config option to switch case sensitive for column names in SQL [flink]

2023-12-02 Thread via GitHub



leonardBang closed pull request #11535: [FLINK-16175][table sql/ api][WIP]Add 
config option to switch case sensitive for column names in SQL
URL: https://github.com/apache/flink/pull/11535


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Resolved] (FLINK-33625) FLIP-390: Support System out and err to be redirected to LOG or discarded

2023-12-02 Thread Rui Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-33625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Fan resolved FLINK-33625.
-
Resolution: Fixed

> FLIP-390: Support System out and err to be redirected to LOG or discarded
> -
>
> Key: FLINK-33625
> URL: https://issues.apache.org/jira/browse/FLINK-33625
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Configuration
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Get more from https://cwiki.apache.org/confluence/x/4guZE



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (FLINK-33625) FLIP-390: Support System out and err to be redirected to LOG or discarded

2023-12-02 Thread Rui Fan (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-33625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792364#comment-17792364
 ] 

Rui Fan commented on FLINK-33625:
-

Merged master<1.19.0> via: 186ed0eb0449a7bf3c216067798dc47a0c5b36a6

> FLIP-390: Support System out and err to be redirected to LOG or discarded
> -
>
> Key: FLINK-33625
> URL: https://issues.apache.org/jira/browse/FLINK-33625
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Configuration
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Get more from https://cwiki.apache.org/confluence/x/4guZE



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: [PR] [FLINK-33625][runtime] Support System out and err to be redirected to LOG or discarded [flink]

2023-12-02 Thread via GitHub



1996fanrui merged PR #23800:
URL: https://github.com/apache/flink/pull/23800


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Re: [PR] [FLINK-33584][Filesystems] Update Hadoop Filesystem dependencies to 3.3.6 [flink]

2023-12-02 Thread via GitHub



MartijnVisser commented on PR #23844:
URL: https://github.com/apache/flink/pull/23844#issuecomment-1837112592

   I'll run the S3 tests locally before merging it! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Updated] (FLINK-33726) Print cost time for stream queries in SQL Client

2023-12-02 Thread Jing Ge (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-33726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Ge updated FLINK-33726:

Description: 
The time cost information is expected when executing stream queries in SQL CLI. 

For example:
{code:java}
Flink SQL> select * from (values ('abc', 123));
+++
| EXPR$0 | EXPR$1 |
+++
|abc |123 |
+++ 
Received a total of 1 rows  (0.22 seconds){code}

> Print cost time for stream queries in SQL Client
> 
>
> Key: FLINK-33726
> URL: https://issues.apache.org/jira/browse/FLINK-33726
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Client
>Reporter: Jing Ge
>Assignee: Jing Ge
>Priority: Major
>
> The time cost information is expected when executing stream queries in SQL 
> CLI. 
> For example:
> {code:java}
> Flink SQL> select * from (values ('abc', 123));
> +++
> | EXPR$0 | EXPR$1 |
> +++
> |abc |123 |
> +++ 
> Received a total of 1 rows  (0.22 seconds){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-33726) Print cost time for stream queries in SQL Client

2023-12-02 Thread Jing Ge (Jira)

Jing Ge created FLINK-33726:
---

 Summary: Print cost time for stream queries in SQL Client
 Key: FLINK-33726
 URL: https://issues.apache.org/jira/browse/FLINK-33726
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Client
Reporter: Jing Ge
Assignee: Jing Ge






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (FLINK-33600) Print cost time for batch queries in SQL Client

2023-12-02 Thread Jing Ge (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-33600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792356#comment-17792356
 ] 

Jing Ge commented on FLINK-33600:
-

master: c20c13fb5cb78eff2cbd08ea48f1cb7cf9a1981c

> Print cost time for batch queries in SQL Client
> ---
>
> Key: FLINK-33600
> URL: https://issues.apache.org/jira/browse/FLINK-33600
> Project: Flink
>  Issue Type: New Feature
>  Components: Table SQL / Client
>Reporter: Jark Wu
>Assignee: Jing Ge
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, there is no cost time information when executing batch queries in 
> SQL CLI. But this is very helpful in OLAP/ad-hoc scenarios. 
> For example: 
> {code}
> Flink SQL> select * from (values ('abc', 123));
> +++
> | EXPR$0 | EXPR$1 |
> +++
> |abc |123 |
> +++
> 1 row in set  (0.22 seconds)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Resolved] (FLINK-33600) Print cost time for batch queries in SQL Client

2023-12-02 Thread Jing Ge (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-33600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Ge resolved FLINK-33600.
-
Resolution: Fixed

> Print cost time for batch queries in SQL Client
> ---
>
> Key: FLINK-33600
> URL: https://issues.apache.org/jira/browse/FLINK-33600
> Project: Flink
>  Issue Type: New Feature
>  Components: Table SQL / Client
>Reporter: Jark Wu
>Assignee: Jing Ge
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, there is no cost time information when executing batch queries in 
> SQL CLI. But this is very helpful in OLAP/ad-hoc scenarios. 
> For example: 
> {code}
> Flink SQL> select * from (values ('abc', 123));
> +++
> | EXPR$0 | EXPR$1 |
> +++
> |abc |123 |
> +++
> 1 row in set  (0.22 seconds)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: [PR] FLINK-33600][table] print the query time cost for batch query in cli [flink]

2023-12-02 Thread via GitHub



JingGe merged PR #23809:
URL: https://github.com/apache/flink/pull/23809


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Commented] (FLINK-33534) PipelineOptions.PARALLELISM_OVERRIDES is not picked up from jar submission request

2023-12-02 Thread Gyula Fora (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-33534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792351#comment-17792351
 ] 

Gyula Fora commented on FLINK-33534:


I assigned it to you! I haven’t tried to repro this outside the operator 

> PipelineOptions.PARALLELISM_OVERRIDES is not picked up from jar submission 
> request
> --
>
> Key: FLINK-33534
> URL: https://issues.apache.org/jira/browse/FLINK-33534
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Configuration, Runtime / REST
>Affects Versions: 1.18.0, 1.17.1
>Reporter: Gyula Fora
>Assignee: Yunfeng Zhou
>Priority: Major
>
> PARALLELISM_OVERRIDES are currently only applied when they are part of the 
> JobManager / Cluster configuration.
> When this config is provided as part of the JarRunRequestBody it is 
> completely ignored and does not take effect. 
> The main reason is that the dispatcher reads this value from it's own 
> configuration object and does not include the extra configs passed through 
> the rest request.
> This is a blocker for supporting the autoscaler properly for FlinkSessionJobs 
> in the autoscaler



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Assigned] (FLINK-33534) PipelineOptions.PARALLELISM_OVERRIDES is not picked up from jar submission request

2023-12-02 Thread Gyula Fora (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-33534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gyula Fora reassigned FLINK-33534:
--

Assignee: Yunfeng Zhou

> PipelineOptions.PARALLELISM_OVERRIDES is not picked up from jar submission 
> request
> --
>
> Key: FLINK-33534
> URL: https://issues.apache.org/jira/browse/FLINK-33534
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Configuration, Runtime / REST
>Affects Versions: 1.18.0, 1.17.1
>Reporter: Gyula Fora
>Assignee: Yunfeng Zhou
>Priority: Major
>
> PARALLELISM_OVERRIDES are currently only applied when they are part of the 
> JobManager / Cluster configuration.
> When this config is provided as part of the JarRunRequestBody it is 
> completely ignored and does not take effect. 
> The main reason is that the dispatcher reads this value from it's own 
> configuration object and does not include the extra configs passed through 
> the rest request.
> This is a blocker for supporting the autoscaler properly for FlinkSessionJobs 
> in the autoscaler



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

53 matches

Mail list logo