[GitHub] [flink] JunRuiLee commented on pull request #23181: W.I.P only used to run ci.

2023-08-28 Thread via GitHub


JunRuiLee commented on PR #23181:
URL: https://github.com/apache/flink/pull/23181#issuecomment-1696796786

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Comment Edited] (FLINK-32678) Release Testing: Stress-Test to cover multiple low-level changes in Flink

2023-08-28 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759570#comment-17759570
 ] 

Yang Wang edited comment on FLINK-32678 at 8/29/23 5:20 AM:


*Stress Test*
Run 1000 Flink jobs with 1 JM and 1 TM each.
1. Flink version 1.15.4 with {{high-availability.use-old-ha-services=true}}
The Flink JobManager has 4 leader electors (RestServer, ResourceManager, Dispatcher, JobManager) that periodically update the K8s ConfigMap. So the QPS of {{PUT ConfigMap}} for 1000 jobs will be roughly 800 req/s ≈ 4 (leader electors) * 1000 (Flink JobManager pods) / 5 (renew interval). The QPS of {{GET ConfigMap}} is twice that of {{PUT}}.

2. Flink version 1.17.1 (same as 1.15.4 with {{high-availability.use-old-ha-services=false}})
Flink will only have one shared leader elector. So the QPS of {{PUT ConfigMap}} for 1000 jobs will be roughly 200 req/s ≈ 1 (leader elector) * 1000 (Flink JobManager pods) / 5 (renew interval). The QPS of {{GET ConfigMap}} is twice that of {{PUT}}.
 
3. Flink version 1.18-snapshot
Flink will only have one shared leader elector. So the QPS of {{PATCH ConfigMap}} for 1000 jobs will be roughly 200 req/s ≈ 1 (leader elector) * 1000 (Flink JobManager pods) / 5 (renew interval). The QPS of {{GET ConfigMap}} is the same as {{PATCH}}.
 
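As a back-of-the-envelope check, a minimal Java sketch of the estimates above (the constants are the assumptions stated in this comment: electors per JM, 1000 JM pods, and a 5s renew interval; they are not measured values):
{code:java}
/** Back-of-the-envelope ConfigMap write QPS, as estimated above. */
public class LeaderElectionQpsEstimate {

    static double writeQps(int electorsPerJobManager, int jobManagerPods, int renewIntervalSeconds) {
        // Each leader elector renews its lease once per renew interval.
        return (double) electorsPerJobManager * jobManagerPods / renewIntervalSeconds;
    }

    public static void main(String[] args) {
        System.out.println("1.15 old HA PUT QPS: " + writeQps(4, 1000, 5)); // 800.0
        System.out.println("1.17 PUT QPS:        " + writeQps(1, 1000, 5)); // 200.0
        System.out.println("1.18 PATCH QPS:      " + writeQps(1, 1000, 5)); // 200.0
    }
}
{code}
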
!qps-configmap-put-115.png|width=694,height=176!
!qps-configmap-put-117.jpg|width=694,height=176!
!qps-configmap-patch-118.png|width=694,height=176!

From the above pictures, we could verify that the new leader elector in 1.18 only sends a quarter of the write requests of the old one in 1.15 to the K8s APIServer. This significantly reduces the stress on the K8s APIServer.

 

!qps-configmap-get-115.png|width=694,height=176!
!qps-configmap-get-117.jpg|width=694,height=176!
!qps-configmap-get-118.png|width=694,height=176!

We could also find that the read requests of 1.18 are half of those in 1.17. The root cause is that fabric8 6.6.2 (FLINK-31997) introduced the PATCH HTTP method for updating the leader annotation, which saves a GET request for each update.

 
||Flink Version||PUT/PATCH QPS||GET QPS||
|1.15.4 with old HA|800|1600|
|1.17.1|200|400|
|1.18.0|200|200|

 

All in all, Flink 1.18 puts less stress on the K8s APIServer while all 1000 Flink jobs run as normally as before.


was (Author: fly_in_gis):
*Stress Test*
Run 1000 Flink jobs with 1 JM and 1 TM each.
1. Flink version 1.15.4 with {{high-availability.use-old-ha-services=true}}
The Flink JobManager has 4 leader electors (RestServer, ResourceManager, Dispatcher, JobManager) that periodically update the K8s ConfigMap. So the QPS of {{PUT ConfigMap}} for 1000 jobs will be roughly 800 req/s ≈ 4 (leader electors) * 1000 (Flink JobManager pods) / 5 (renew interval). The QPS of {{GET ConfigMap}} is twice that of {{PUT}}.

2. Flink version 1.17.1 (same as 1.15.4 with {{high-availability.use-old-ha-services=false}})
Flink will only have one shared leader elector. So the QPS of {{PUT ConfigMap}} for 1000 jobs will be roughly 200 req/s ≈ 1 (leader elector) * 1000 (Flink JobManager pods) / 5 (renew interval). The QPS of {{GET ConfigMap}} is twice that of {{PUT}}.
 
3. Flink version 1.18-snapshot
Flink will only have one shared leader elector. So the QPS of {{PATCH ConfigMap}} for 1000 jobs will be roughly 200 req/s ≈ 1 (leader elector) * 1000 (Flink JobManager pods) / 5 (renew interval). The QPS of {{GET ConfigMap}} is the same as {{PATCH}}.
 
!qps-configmap-put-115.png|width=694,height=176!
!qps-configmap-put-117.jpg|width=694,height=176!
!qps-configmap-patch-118.png|width=694,height=176!

From the above pictures, we could verify that the new leader elector in 1.18 only sends a quarter of the write requests of the old one in 1.15 to the K8s APIServer. This significantly reduces the stress on the K8s APIServer.

 

!qps-configmap-get-115.png|width=694,height=176!
!qps-configmap-get-117.jpg|width=694,height=176!
!qps-configmap-get-118.png|width=694,height=176!

We could also find that the read requests of 1.18 are half of those in 1.17. The root cause is that fabric8 6.6.2 (FLINK-31997) introduced the PATCH HTTP method for updating the leader annotation, which saves a GET request for each update.

 
||Flink Version||PUT/PATCH QPS||GET QPS||
|1.15.4 with old HA|800|1600|
|1.17.1|200|400|
|1.18.0|200|200|

 

All in all, Flink 1.18 puts less stress on the K8s APIServer while all 1000 Flink jobs run as normally as before.

> Release Testing: Stress-Test to cover multiple low-level changes in Flink
> -
>
> Key: FLINK-32678
> URL: https://issues.apache.org/jira/browse/FLINK-32678
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.18.0
>  

[jira] [Comment Edited] (FLINK-32678) Release Testing: Stress-Test to cover multiple low-level changes in Flink

2023-08-28 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759570#comment-17759570
 ] 

Yang Wang edited comment on FLINK-32678 at 8/29/23 5:20 AM:


*Stress Test*
Run 1000 Flink jobs with 1 JM and 1 TM each.
1. Flink version 1.15.4 with {{high-availability.use-old-ha-services=true}}
The Flink JobManager has 4 leader electors (RestServer, ResourceManager, Dispatcher, JobManager) that periodically update the K8s ConfigMap. So the QPS of {{PUT ConfigMap}} for 1000 jobs will be roughly 800 req/s ≈ 4 (leader electors) * 1000 (Flink JobManager pods) / 5 (renew interval). The QPS of {{GET ConfigMap}} is twice that of {{PUT}}.

2. Flink version 1.17.1 (same as 1.15.4 with {{high-availability.use-old-ha-services=false}})
Flink will only have one shared leader elector. So the QPS of {{PUT ConfigMap}} for 1000 jobs will be roughly 200 req/s ≈ 1 (leader elector) * 1000 (Flink JobManager pods) / 5 (renew interval). The QPS of {{GET ConfigMap}} is twice that of {{PUT}}.
 
3. Flink version 1.18-snapshot
Flink will only have one shared leader elector. So the QPS of {{PATCH ConfigMap}} for 1000 jobs will be roughly 200 req/s ≈ 1 (leader elector) * 1000 (Flink JobManager pods) / 5 (renew interval). The QPS of {{GET ConfigMap}} is the same as {{PATCH}}.
 
!qps-configmap-put-115.png|width=694,height=176!
!qps-configmap-put-117.jpg|width=694,height=176!
!qps-configmap-patch-118.png|width=694,height=176!

From the above pictures, we could verify that the new leader elector in 1.18 only sends a quarter of the write requests of the old one in 1.15 to the K8s APIServer. This significantly reduces the stress on the K8s APIServer.

 

!qps-configmap-get-115.png|width=694,height=176!
!qps-configmap-get-117.jpg|width=694,height=176!
!qps-configmap-get-118.png|width=694,height=176!

We could also find that the read requests of 1.18 are half of those in 1.17. The root cause is that fabric8 6.6.2 (FLINK-31997) introduced the PATCH HTTP method for updating the leader annotation, which saves a GET request for each update.

 
||Flink Version||PUT/PATCH QPS||GET QPS||
|1.15.4 with old HA|800|1600|
|1.17.1|200|400|
|1.18.0|200|200|

 

All in all, Flink 1.18 puts less stress on the K8s APIServer while all 1000 Flink jobs run as normally as before.


was (Author: fly_in_gis):
*Stress Test*
Run 1000 Flink jobs with 1 JM and 1 TM each.
1. Flink version 1.15.4 with {{high-availability.use-old-ha-services=true}}
The Flink JobManager has 4 leader electors (RestServer, ResourceManager, Dispatcher, JobManager) that periodically update the K8s ConfigMap. So the QPS of {{PUT ConfigMap}} for 1000 jobs will be roughly 800 req/s ≈ 4 (leader electors) * 1000 (Flink JobManager pods) / 5 (renew interval). The QPS of {{GET ConfigMap}} is twice that of {{PUT}}.

2. Flink version 1.17.1 (same as 1.15.4 with {{high-availability.use-old-ha-services=false}})
Flink will only have one shared leader elector. So the QPS of {{PUT ConfigMap}} for 1000 jobs will be roughly 200 req/s ≈ 1 (leader elector) * 1000 (Flink JobManager pods) / 5 (renew interval). The QPS of {{GET ConfigMap}} is twice that of {{PUT}}.
 
3. Flink version 1.18-snapshot
Flink will only have one shared leader elector. So the QPS of {{PATCH ConfigMap}} for 1000 jobs will be roughly 200 req/s ≈ 1 (leader elector) * 1000 (Flink JobManager pods) / 5 (renew interval). The QPS of {{GET ConfigMap}} is the same as {{PATCH}}.
 
!qps-configmap-put-115.png|width=694,height=176!
!qps-configmap-put-117.png|width=694,height=176!
!qps-configmap-patch-118.png|width=694,height=176!

From the above pictures, we could verify that the new leader elector in 1.18 only sends a quarter of the write requests of the old one in 1.15 to the K8s APIServer. This significantly reduces the stress on the K8s APIServer.

 

!qps-configmap-get-115.png|width=694,height=176!
!qps-configmap-get-117.png|width=694,height=176!
!qps-configmap-get-118.png|width=694,height=176!

We could also find that the read requests of 1.18 are half of those in 1.17. The root cause is that fabric8 6.6.2 (FLINK-31997) introduced the PATCH HTTP method for updating the leader annotation, which saves a GET request for each update.

 

All in all, Flink 1.18 puts less stress on the K8s APIServer while all 1000 Flink jobs run as normally as before.

> Release Testing: Stress-Test to cover multiple low-level changes in Flink
> -
>
> Key: FLINK-32678
> URL: https://issues.apache.org/jira/browse/FLINK-32678
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.18.0
>Reporter: Matthias Pohl
>Assignee: Yang Wang
>Priority: Major
>  Labels: release-testing
> 

[jira] [Comment Edited] (FLINK-32678) Release Testing: Stress-Test to cover multiple low-level changes in Flink

2023-08-28 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759570#comment-17759570
 ] 

Yang Wang edited comment on FLINK-32678 at 8/29/23 5:15 AM:


*Stress Test*
Run 1000 Flink jobs with 1 JM and 1 TM each.
1. Flink version 1.15.4 with {{high-availability.use-old-ha-services=true}}
The Flink JobManager has 4 leader electors (RestServer, ResourceManager, Dispatcher, JobManager) that periodically update the K8s ConfigMap. So the QPS of {{PUT ConfigMap}} for 1000 jobs will be roughly 800 req/s ≈ 4 (leader electors) * 1000 (Flink JobManager pods) / 5 (renew interval). The QPS of {{GET ConfigMap}} is twice that of {{PUT}}.

2. Flink version 1.17.1 (same as 1.15.4 with {{high-availability.use-old-ha-services=false}})
Flink will only have one shared leader elector. So the QPS of {{PUT ConfigMap}} for 1000 jobs will be roughly 200 req/s ≈ 1 (leader elector) * 1000 (Flink JobManager pods) / 5 (renew interval). The QPS of {{GET ConfigMap}} is twice that of {{PUT}}.
 
3. Flink version 1.18-snapshot
Flink will only have one shared leader elector. So the QPS of {{PATCH ConfigMap}} for 1000 jobs will be roughly 200 req/s ≈ 1 (leader elector) * 1000 (Flink JobManager pods) / 5 (renew interval). The QPS of {{GET ConfigMap}} is the same as {{PATCH}}.
 
!qps-configmap-put-115.png|width=694,height=176!
!qps-configmap-put-117.png|width=694,height=176!
!qps-configmap-patch-118.png|width=694,height=176!

From the above pictures, we could verify that the new leader elector in 1.18 only sends a quarter of the write requests of the old one in 1.15 to the K8s APIServer. This significantly reduces the stress on the K8s APIServer.

 

!qps-configmap-get-115.png|width=694,height=176!
!qps-configmap-get-117.png|width=694,height=176!
!qps-configmap-get-118.png|width=694,height=176!

We could also find that the read requests of 1.18 are half of those in 1.17. The root cause is that fabric8 6.6.2 (FLINK-31997) introduced the PATCH HTTP method for updating the leader annotation, which saves a GET request for each update.

 

All in all, Flink 1.18 puts less stress on the K8s APIServer while all 1000 Flink jobs run as normally as before.


was (Author: fly_in_gis):
*Stress Test*
Run 1000 Flink jobs with 1 JM and 1 TM each.
1. Flink version 1.15.4 with {{high-availability.use-old-ha-services=true}}
The Flink JobManager has 4 leader electors (RestServer, ResourceManager, Dispatcher, JobManager) that periodically update the K8s ConfigMap. So the QPS of {{PUT ConfigMap}} for 1000 jobs will be roughly 800 req/s ≈ 4 (leader electors) * 1000 (Flink JobManager pods) / 5 (renew interval).
 
2. Flink version 1.18-snapshot
Flink will only have one shared leader elector. So the QPS of {{PATCH ConfigMap}} for 1000 jobs will be roughly 200 req/s ≈ 1 (leader elector) * 1000 (Flink JobManager pods) / 5 (renew interval).
 
!qos-configmap-put-115.png|width=694,height=176!
!qos-configmap-patch-118.png|width=694,height=176!

From the above two pictures, we could verify that the new leader elector in 1.18 only sends a quarter of the write requests of the old one in 1.15 to the K8s APIServer. This significantly reduces the stress on the K8s APIServer.

 

!qos-configmap-get-115.png|width=694,height=176!

!qos-configmap-get-118.png|width=694,height=176!

We also find that the read requests are only 1/8 of the old ones. The root cause is that fabric8 6.6.2 (FLINK-31997) introduced the PATCH HTTP method for updating the leader annotation, which saves a GET request for each update.

 

All in all, Flink 1.18 puts less stress on the K8s APIServer while all 1000 Flink jobs run as normally as before.

> Release Testing: Stress-Test to cover multiple low-level changes in Flink
> -
>
> Key: FLINK-32678
> URL: https://issues.apache.org/jira/browse/FLINK-32678
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.18.0
>Reporter: Matthias Pohl
>Assignee: Yang Wang
>Priority: Major
>  Labels: release-testing
> Fix For: 1.18.0
>
> Attachments: qps-configmap-get-115.png, qps-configmap-get-117.jpg, 
> qps-configmap-get-118.png, qps-configmap-patch-118.png, 
> qps-configmap-put-115.png, qps-configmap-put-117.jpg
>
>
> -We decided to do another round of testing for the LeaderElection refactoring 
> which happened in 
> [FLIP-285|https://cwiki.apache.org/confluence/display/FLINK/FLIP-285%3A+refactoring+leaderelection+to+make+flink+support+multi-component+leader+election+out-of-the-box].-
> This release testing task is about running a bigger amount of jobs in a Flink 
> environment to look for unusual behavior. This Jira issue shall cover the 
> following 1.18 efforts:
>  * 

[GitHub] [flink-kubernetes-operator] gyfora commented on a diff in pull request #648: [FLINK-32700] Support job drain for Savepoint upgrade mode

2023-08-28 Thread via GitHub


gyfora commented on code in PR #648:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/648#discussion_r1308206545


##
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/FlinkOperatorConfiguration.java:
##
@@ -185,6 +186,9 @@ public static FlinkOperatorConfiguration fromConfiguration(Configuration operatorConfig
         boolean savepointOnDeletion =
                 operatorConfig.get(KubernetesOperatorConfigOptions.SAVEPOINT_ON_DELETION);
 
+        boolean drainJobOnSavepointDeletion =
+                operatorConfig.get(KubernetesOperatorConfigOptions.DRAIN_ON_SAVEPOINT_DELETION);

Review Comment:
   Similar to this fix PR: 
https://github.com/apache/flink-kubernetes-operator/pull/659
   
   These configs should not be part of the `FlinkOperatorConfiguration`, as that prevents setting them at a per-resource level. They should be accessed from the observeConfig directly.
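   
   A minimal sketch of the suggested direction (assuming a resource-scoped `Configuration observeConfig` is in scope at the reconciler call site; only `KubernetesOperatorConfigOptions.DRAIN_ON_SAVEPOINT_DELETION` comes from the diff above, the rest is illustrative):
   
   ```java
   // Sketch: read the option from the per-resource observe config rather than
   // the global FlinkOperatorConfiguration, so per-resource overrides apply.
   boolean drainJobOnSavepointDeletion =
           observeConfig.get(KubernetesOperatorConfigOptions.DRAIN_ON_SAVEPOINT_DELETION);
   ```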



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-32678) Release Testing: Stress-Test to cover multiple low-level changes in Flink

2023-08-28 Thread Yang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang updated FLINK-32678:
--
Attachment: (was: qos-configmap-get-115.png)

> Release Testing: Stress-Test to cover multiple low-level changes in Flink
> -
>
> Key: FLINK-32678
> URL: https://issues.apache.org/jira/browse/FLINK-32678
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.18.0
>Reporter: Matthias Pohl
>Assignee: Yang Wang
>Priority: Major
>  Labels: release-testing
> Fix For: 1.18.0
>
> Attachments: qps-configmap-get-115.png, qps-configmap-get-117.jpg, 
> qps-configmap-get-118.png, qps-configmap-patch-118.png, 
> qps-configmap-put-115.png, qps-configmap-put-117.jpg
>
>
> -We decided to do another round of testing for the LeaderElection refactoring 
> which happened in 
> [FLIP-285|https://cwiki.apache.org/confluence/display/FLINK/FLIP-285%3A+refactoring+leaderelection+to+make+flink+support+multi-component+leader+election+out-of-the-box].-
> This release testing task is about running a bigger amount of jobs in a Flink 
> environment to look for unusual behavior. This Jira issue shall cover the 
> following 1.18 efforts:
>  * Leader Election refactoring 
> ([FLIP-285|https://cwiki.apache.org/confluence/display/FLINK/FLIP-285%3A+refactoring+leaderelection+to+make+flink+support+multi-component+leader+election+out-of-the-box],
>  FLINK-26522)
>  * Akka to Pekko transition (FLINK-32468)
>  * flink-shaded 17.0 updates (FLINK-32032)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32678) Release Testing: Stress-Test to cover multiple low-level changes in Flink

2023-08-28 Thread Yang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang updated FLINK-32678:
--
Attachment: (was: qos-configmap-put-115.png)

> Release Testing: Stress-Test to cover multiple low-level changes in Flink
> -
>
> Key: FLINK-32678
> URL: https://issues.apache.org/jira/browse/FLINK-32678
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.18.0
>Reporter: Matthias Pohl
>Assignee: Yang Wang
>Priority: Major
>  Labels: release-testing
> Fix For: 1.18.0
>
> Attachments: qps-configmap-get-115.png, qps-configmap-get-117.jpg, 
> qps-configmap-get-118.png, qps-configmap-patch-118.png, 
> qps-configmap-put-115.png, qps-configmap-put-117.jpg
>
>
> -We decided to do another round of testing for the LeaderElection refactoring 
> which happened in 
> [FLIP-285|https://cwiki.apache.org/confluence/display/FLINK/FLIP-285%3A+refactoring+leaderelection+to+make+flink+support+multi-component+leader+election+out-of-the-box].-
> This release testing task is about running a bigger amount of jobs in a Flink 
> environment to look for unusual behavior. This Jira issue shall cover the 
> following 1.18 efforts:
>  * Leader Election refactoring 
> ([FLIP-285|https://cwiki.apache.org/confluence/display/FLINK/FLIP-285%3A+refactoring+leaderelection+to+make+flink+support+multi-component+leader+election+out-of-the-box],
>  FLINK-26522)
>  * Akka to Pekko transition (FLINK-32468)
>  * flink-shaded 17.0 updates (FLINK-32032)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32678) Release Testing: Stress-Test to cover multiple low-level changes in Flink

2023-08-28 Thread Yang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang updated FLINK-32678:
--
Attachment: qps-configmap-get-118.png
qps-configmap-get-115.png
qps-configmap-put-115.png
qps-configmap-get-117.jpg
qps-configmap-put-117.jpg
qps-configmap-patch-118.png

> Release Testing: Stress-Test to cover multiple low-level changes in Flink
> -
>
> Key: FLINK-32678
> URL: https://issues.apache.org/jira/browse/FLINK-32678
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.18.0
>Reporter: Matthias Pohl
>Assignee: Yang Wang
>Priority: Major
>  Labels: release-testing
> Fix For: 1.18.0
>
> Attachments: qps-configmap-get-115.png, qps-configmap-get-117.jpg, 
> qps-configmap-get-118.png, qps-configmap-patch-118.png, 
> qps-configmap-put-115.png, qps-configmap-put-117.jpg
>
>
> -We decided to do another round of testing for the LeaderElection refactoring 
> which happened in 
> [FLIP-285|https://cwiki.apache.org/confluence/display/FLINK/FLIP-285%3A+refactoring+leaderelection+to+make+flink+support+multi-component+leader+election+out-of-the-box].-
> This release testing task is about running a bigger amount of jobs in a Flink 
> environment to look for unusual behavior. This Jira issue shall cover the 
> following 1.18 efforts:
>  * Leader Election refactoring 
> ([FLIP-285|https://cwiki.apache.org/confluence/display/FLINK/FLIP-285%3A+refactoring+leaderelection+to+make+flink+support+multi-component+leader+election+out-of-the-box],
>  FLINK-26522)
>  * Akka to Pekko transition (FLINK-32468)
>  * flink-shaded 17.0 updates (FLINK-32032)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32678) Release Testing: Stress-Test to cover multiple low-level changes in Flink

2023-08-28 Thread Yang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang updated FLINK-32678:
--
Attachment: (was: qos-configmap-patch-118.png)

> Release Testing: Stress-Test to cover multiple low-level changes in Flink
> -
>
> Key: FLINK-32678
> URL: https://issues.apache.org/jira/browse/FLINK-32678
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.18.0
>Reporter: Matthias Pohl
>Assignee: Yang Wang
>Priority: Major
>  Labels: release-testing
> Fix For: 1.18.0
>
> Attachments: qps-configmap-get-115.png, qps-configmap-get-117.jpg, 
> qps-configmap-get-118.png, qps-configmap-patch-118.png, 
> qps-configmap-put-115.png, qps-configmap-put-117.jpg
>
>
> -We decided to do another round of testing for the LeaderElection refactoring 
> which happened in 
> [FLIP-285|https://cwiki.apache.org/confluence/display/FLINK/FLIP-285%3A+refactoring+leaderelection+to+make+flink+support+multi-component+leader+election+out-of-the-box].-
> This release testing task is about running a bigger amount of jobs in a Flink 
> environment to look for unusual behavior. This Jira issue shall cover the 
> following 1.18 efforts:
>  * Leader Election refactoring 
> ([FLIP-285|https://cwiki.apache.org/confluence/display/FLINK/FLIP-285%3A+refactoring+leaderelection+to+make+flink+support+multi-component+leader+election+out-of-the-box],
>  FLINK-26522)
>  * Akka to Pekko transition (FLINK-32468)
>  * flink-shaded 17.0 updates (FLINK-32032)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32678) Release Testing: Stress-Test to cover multiple low-level changes in Flink

2023-08-28 Thread Yang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang updated FLINK-32678:
--
Attachment: (was: qos-configmap-get-118.png)

> Release Testing: Stress-Test to cover multiple low-level changes in Flink
> -
>
> Key: FLINK-32678
> URL: https://issues.apache.org/jira/browse/FLINK-32678
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.18.0
>Reporter: Matthias Pohl
>Assignee: Yang Wang
>Priority: Major
>  Labels: release-testing
> Fix For: 1.18.0
>
> Attachments: qps-configmap-get-115.png, qps-configmap-get-117.jpg, 
> qps-configmap-get-118.png, qps-configmap-patch-118.png, 
> qps-configmap-put-115.png, qps-configmap-put-117.jpg
>
>
> -We decided to do another round of testing for the LeaderElection refactoring 
> which happened in 
> [FLIP-285|https://cwiki.apache.org/confluence/display/FLINK/FLIP-285%3A+refactoring+leaderelection+to+make+flink+support+multi-component+leader+election+out-of-the-box].-
> This release testing task is about running a bigger amount of jobs in a Flink 
> environment to look for unusual behavior. This Jira issue shall cover the 
> following 1.18 efforts:
>  * Leader Election refactoring 
> ([FLIP-285|https://cwiki.apache.org/confluence/display/FLINK/FLIP-285%3A+refactoring+leaderelection+to+make+flink+support+multi-component+leader+election+out-of-the-box],
>  FLINK-26522)
>  * Akka to Pekko transition (FLINK-32468)
>  * flink-shaded 17.0 updates (FLINK-32032)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] lsyldliu commented on a diff in pull request #23282: [FlINK-32865][table-planner] Add ExecutionOrderEnforcer to exec plan and put it into BatchExecMultipleInput

2023-08-28 Thread via GitHub


lsyldliu commented on code in PR #23282:
URL: https://github.com/apache/flink/pull/23282#discussion_r1308172368


##
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/nodes/exec/processor/DynamicFilteringDependencyProcessor.java:
##
@@ -52,50 +55,91 @@ public class DynamicFilteringDependencyProcessor implements ExecNodeGraphProcessor
 
     @Override
     public ExecNodeGraph process(ExecNodeGraph execGraph, ProcessorContext context) {
-        ExecNodeGraph factSideProcessedGraph = checkIfFactSourceNeedEnforceDependency(execGraph);
+        ExecNodeGraph factSideProcessedGraph =
+                checkIfFactSourceNeedEnforceDependency(execGraph, context);
         return enforceDimSideBlockingExchange(factSideProcessedGraph, context);
     }
 
-    private ExecNodeGraph checkIfFactSourceNeedEnforceDependency(ExecNodeGraph execGraph) {
-        Map<BatchExecTableSourceScan, List<ExecNode<?>>> dynamicFilteringScanDescendants =
+    private ExecNodeGraph checkIfFactSourceNeedEnforceDependency(
+            ExecNodeGraph execGraph, ProcessorContext context) {
+        Map<BatchExecTableSourceScan, List<DescendantInfo>> dynamicFilteringScanDescendants =
                 new HashMap<>();
 
         AbstractExecNodeExactlyOnceVisitor dynamicFilteringScanCollector =
                 new AbstractExecNodeExactlyOnceVisitor() {
                     @Override
                     protected void visitNode(ExecNode<?> node) {
-                        node.getInputEdges().stream()
-                                .map(ExecEdge::getSource)
-                                .forEach(
-                                        input -> {
-                                            // The character of the dynamic filter scan is that it
-                                            // has an input.
-                                            if (input instanceof BatchExecTableSourceScan
-                                                    && input.getInputEdges().size() > 0) {
-                                                dynamicFilteringScanDescendants
-                                                        .computeIfAbsent(
-                                                                (BatchExecTableSourceScan) input,
-                                                                ignored -> new ArrayList<>())
-                                                        .add(node);
-                                            }
-                                        });
+                        for (int i = 0; i < node.getInputEdges().size(); ++i) {
+                            ExecEdge edge = node.getInputEdges().get(i);
+                            ExecNode<?> input = edge.getSource();
+
+                            // The character of the dynamic filter scan is that it
+                            // has an input.
+                            if (input instanceof BatchExecTableSourceScan
+                                    && input.getInputEdges().size() > 0) {
+                                dynamicFilteringScanDescendants
+                                        .computeIfAbsent(
+                                                (BatchExecTableSourceScan) input,
+                                                ignored -> new ArrayList<>())
+                                        .add(new DescendantInfo(node, i));
+                            }
+                        }
 
                         visitInputs(node);
                     }
                 };
         execGraph.getRootNodes().forEach(node -> node.accept(dynamicFilteringScanCollector));
 
-        for (Map.Entry<BatchExecTableSourceScan, List<ExecNode<?>>> entry :
+        for (Map.Entry<BatchExecTableSourceScan, List<DescendantInfo>> entry :
                 dynamicFilteringScanDescendants.entrySet()) {
-            if (entry.getValue().size() == 1) {
-                ExecNode<?> next = entry.getValue().get(0);
-                if (next instanceof BatchExecMultipleInput) {
-                    // the source can be chained with BatchExecMultipleInput
-                    continue;
-                }
-            }
-            // otherwise we need dependencies
-            entry.getKey().setNeedDynamicFilteringDependency(true);
+            BatchExecTableSourceScan tableSourceScan = entry.getKey();
+            BatchExecDynamicFilteringDataCollector dynamicFilteringDataCollector =
+                    getDynamicFilteringDataCollector(tableSourceScan);
+
+            // Add exchange between collector and enforcer
+            BatchExecExchange exchange =
+                    new BatchExecExchange(
+                            context.getPlanner().getTableConfig(),
+                            InputProperty.builder()
+                                    .requiredDistribution(InputProperty.ANY_DISTRIBUTION)
+                                    .damBehavior(InputProperty.DamBehavior.BLOCKING)
+                                    .build(),
+                            (RowType) dynamicFilteringDataCollector.getOutputType(),
+                            "Exchange");
+

[GitHub] [flink] WencongLiu closed pull request #23314: Test

2023-08-28 Thread via GitHub


WencongLiu closed pull request #23314: Test
URL: https://github.com/apache/flink/pull/23314


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] 1996fanrui commented on a diff in pull request #23219: [FLINK-20681][yarn] Support specifying the hdfs path when ship archives or files

2023-08-28 Thread via GitHub


1996fanrui commented on code in PR #23219:
URL: https://github.com/apache/flink/pull/23219#discussion_r1308166223


##
flink-yarn/src/test/java/org/apache/flink/yarn/YarnClusterDescriptorTest.java:
##
@@ -543,22 +546,62 @@ void testExplicitFileShipping() throws Exception {
                     Files.createTempDirectory(temporaryFolder, UUID.randomUUID().toString())
                             .toFile();
 
-            assertThat(descriptor.getShipFiles()).doesNotContain(libFile, libFolder);
+            assertThat(descriptor.getShipFiles())
+                    .doesNotContain(getPathFromLocalFile(libFile), getPathFromLocalFile(libFolder));
 
-            List<File> shipFiles = new ArrayList<>();
-            shipFiles.add(libFile);
-            shipFiles.add(libFolder);
+            List<Path> shipFiles = new ArrayList<>();
+            shipFiles.add(getPathFromLocalFile(libFile));
+            shipFiles.add(getPathFromLocalFile(libFolder));
 
             descriptor.addShipFiles(shipFiles);
 
-            assertThat(descriptor.getShipFiles()).contains(libFile, libFolder);
+            assertThat(descriptor.getShipFiles())
+                    .contains(getPathFromLocalFile(libFile), getPathFromLocalFile(libFolder));
 
             // only execute part of the deployment to test for shipped files
-            Set<File> effectiveShipFiles = new HashSet<>();
+            Set<Path> effectiveShipFiles = new HashSet<>();
             descriptor.addLibFoldersToShipFiles(effectiveShipFiles);
 
             assertThat(effectiveShipFiles).isEmpty();
-            assertThat(descriptor.getShipFiles()).hasSize(2).contains(libFile, libFolder);
+            assertThat(descriptor.getShipFiles())
+                    .hasSize(2)
+                    .contains(getPathFromLocalFile(libFile), getPathFromLocalFile(libFolder));
+        }
+
+        String hdfsDir = "hdfs:///flink/hdfs_dir";
+        String hdfsFile = "hdfs:///flink/hdfs_file";
+        File libFile = Files.createTempFile(temporaryFolder, "libFile", ".jar").toFile();
+        File libFolder =
+                Files.createTempDirectory(temporaryFolder, UUID.randomUUID().toString()).toFile();
+        final org.apache.hadoop.conf.Configuration hdConf =
+                new org.apache.hadoop.conf.Configuration();
+        hdConf.set(
+                MiniDFSCluster.HDFS_MINIDFS_BASEDIR, temporaryFolder.toAbsolutePath().toString());
+        try (final MiniDFSCluster hdfsCluster = new MiniDFSCluster.Builder(hdConf).build()) {
+            final org.apache.hadoop.fs.Path hdfsRootPath =
+                    new org.apache.hadoop.fs.Path(hdfsCluster.getURI());
+            hdfsCluster.getFileSystem().mkdirs(new org.apache.hadoop.fs.Path(hdfsDir));
+            hdfsCluster.getFileSystem().createNewFile(new org.apache.hadoop.fs.Path(hdfsFile));
+
+            Configuration flinkConfiguration = new Configuration();
+            flinkConfiguration.set(
+                    YarnConfigOptions.SHIP_FILES,
+                    Arrays.asList(
+                            libFile.getAbsolutePath(),
+                            libFolder.getAbsolutePath(),
+                            hdfsDir,
+                            hdfsFile));
+            final YarnConfiguration yarnConfig = new YarnConfiguration();
+            yarnConfig.set(
+                    CommonConfigurationKeysPublic.FS_DEFAULT_NAME_KEY, hdfsRootPath.toString());
+            YarnClusterDescriptor descriptor =
+                    createYarnClusterDescriptor(flinkConfiguration, yarnConfig);
+            assertThat(descriptor.getShipFiles())
+                    .containsExactly(
+                            getPathFromLocalFile(libFile),
+                            getPathFromLocalFile(libFolder),
+                            new Path(hdfsDir),
+                            new Path(hdfsFile));

Review Comment:
   These new tests can be moved to a separate test method.
   
   The old test exercises `YarnClusterDescriptor.addShipFiles`, as its method name indicates, while your new test exercises `YarnConfigOptions.SHIP_FILES`, so it is not suitable to add the new assertions to the old test method. A possible shape for the split is sketched below.
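   
   A sketch only (the method name `testShipFilesViaConfigOption` and the JUnit 5 wiring are illustrative assumptions; the body reuses the MiniDFSCluster setup from the diff above):
   
   ```java
   @Test
   void testShipFilesViaConfigOption() throws Exception {
       // Hypothetical split: keep testExplicitFileShipping() focused on
       // YarnClusterDescriptor.addShipFiles, and move the
       // YarnConfigOptions.SHIP_FILES assertions here.
       Configuration flinkConfiguration = new Configuration();
       flinkConfiguration.set(
               YarnConfigOptions.SHIP_FILES,
               Arrays.asList("hdfs:///flink/hdfs_dir", "hdfs:///flink/hdfs_file"));
       // ... then build the MiniDFSCluster and YarnClusterDescriptor exactly as in
       // the diff above, and assert on descriptor.getShipFiles().
   }
   ```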



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (FLINK-32824) Port Calcite's fix for the sql like operator

2023-08-28 Thread lincoln lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lincoln lee closed FLINK-32824.
---
Resolution: Fixed

> Port Calcite's fix for the sql like operator
> 
>
> Key: FLINK-32824
> URL: https://issues.apache.org/jira/browse/FLINK-32824
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.18.0, 1.17.1
>Reporter: lincoln lee
>Assignee: Shuai Xu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0, 1.19.0
>
>
> we should port the bugfix of the SQL LIKE operator: 
> https://issues.apache.org/jira/browse/CALCITE-1898
> {code}
> The LIKE operator must match '.' (period) literally, not treat it as a 
> wild-card. Currently it treats it the same as '_'.
> {code}
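
For illustration, a hedged sketch of those semantics (not Calcite's or Flink's actual implementation; the class and method are made up): '%' matches any sequence, '_' matches any single character, and everything else, including '.', must match literally.
{code:java}
import java.util.regex.Pattern;

/** Minimal illustration of SQL LIKE semantics with a literal '.'. */
public class LikeSketch {
    static boolean like(String value, String pattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '%') {
                regex.append(".*");          // '%' -> any sequence
            } else if (c == '_') {
                regex.append('.');           // '_' -> any single character
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal, incl. '.'
            }
        }
        return Pattern.matches(regex.toString(), value);
    }

    public static void main(String[] args) {
        System.out.println(like("a.c", "a.c")); // true: '.' matches literally
        System.out.println(like("abc", "a.c")); // false after the fix (true when '.' behaved like '_')
        System.out.println(like("abc", "a_c")); // true
    }
}
{code}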



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32824) Port Calcite's fix for the sql like operator

2023-08-28 Thread lincoln lee (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759797#comment-17759797
 ] 

lincoln lee commented on FLINK-32824:
-

fixed in release-1.18: e9566267639a33adfd6ced6df0c44d19f435366d

> Port Calcite's fix for the sql like operator
> 
>
> Key: FLINK-32824
> URL: https://issues.apache.org/jira/browse/FLINK-32824
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.18.0, 1.17.1
>Reporter: lincoln lee
>Assignee: Shuai Xu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0, 1.19.0
>
>
> we should port the bugfix of the SQL LIKE operator: 
> https://issues.apache.org/jira/browse/CALCITE-1898
> {code}
> The LIKE operator must match '.' (period) literally, not treat it as a 
> wild-card. Currently it treats it the same as '_'.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] lincoln-lil merged pull request #23312: [FLINK-32824] Port Calcite's fix for the sql like operator.

2023-08-28 Thread via GitHub


lincoln-lil merged PR #23312:
URL: https://github.com/apache/flink/pull/23312


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] flinkbot commented on pull request #23316: [FLINK-18445][table] Add pre-filter optimization for lookup join

2023-08-28 Thread via GitHub


flinkbot commented on PR #23316:
URL: https://github.com/apache/flink/pull/23316#issuecomment-1696726227

   
   ## CI report:
   
   * f282ba755b54d24ef35730755918ecc9b8e169a2 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-18445) Short circuit join condition for lookup join

2023-08-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-18445:
---
Labels: auto-deprioritized-major auto-deprioritized-minor 
pull-request-available  (was: auto-deprioritized-major auto-deprioritized-minor)

> Short circuit join condition for lookup join
> 
>
> Key: FLINK-18445
> URL: https://issues.apache.org/jira/browse/FLINK-18445
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Runtime
>Reporter: Rui Li
>Assignee: lincoln lee
>Priority: Minor
>  Labels: auto-deprioritized-major, auto-deprioritized-minor, 
> pull-request-available
> Fix For: 1.19.0
>
>
> Consider the following query:
> {code}
> select *
> from probe
> left join
> build for system_time as of probe.ts
> on probe.key=build.key and probe.col is not null
> {code}
> In current implementation, we lookup each probe.key in build to decide 
> whether a match is found. A possible optimization is to skip the lookup for 
> rows whose {{col}} is null.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] lincoln-lil opened a new pull request, #23316: [FLINK-18445][table] Add pre-filter optimization for lookup join

2023-08-28 Thread via GitHub


lincoln-lil opened a new pull request, #23316:
URL: https://github.com/apache/flink/pull/23316

   ## What is the purpose of the change
   As the issue shows, there is an opportunity to optimize the lookup join when doing a left join (and maybe a full outer join as well in the future) that has a filter condition on the left input within the join condition. We can achieve this by adding a pre-filter in the lookup join operator, which is what this PR does; a conceptual sketch follows.
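   
   A hypothetical sketch of the short-circuit (illustrative only, not the generated code; `prefilter`, `lookup`, `padWithNulls`, and `join` are made-up names):
   
   ```java
   // For "probe LEFT JOIN build FOR SYSTEM_TIME AS OF probe.ts
   //      ON probe.key = build.key AND probe.col IS NOT NULL":
   void processProbeRow(Row probe) {
       if (!prefilter(probe)) {           // e.g. probe.col is null: the condition can never hold
           emit(padWithNulls(probe));     // a left join still emits the probe row, null-padded
           return;                        // the (possibly remote) lookup is skipped entirely
       }
       Collection<Row> matches = lookup(probe);
       if (matches.isEmpty()) {
           emit(padWithNulls(probe));
       } else {
           matches.forEach(build -> emit(join(probe, build)));
       }
   }
   ```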
   
   ## Brief change log
   * add a new FilterCondition to the codegen part
   * add pre-filter (via codegen) for lookup join operator
   
   ## Verifying this change
   * json plan test (LookupJoinJsonPlanTest)
   * lookup join operator tests (LookupJoinHarnessTest 
KeyedLookupJoinHarnessTest AsyncLookupJoinHarnessTest)
   * lookup join itcase (LookupJoinITCase)
   
   ## Does this pull request potentially affect one of the following parts:
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
@Public(Evolving): (no)
 - The serializers: (no )
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
 - Does this pull request introduce a new feature? (no)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] flinkbot commented on pull request #23315: [FLINK-32945][runtime] Fix NPE when task reached end-of-data but checkpoint failed

2023-08-28 Thread via GitHub


flinkbot commented on PR #23315:
URL: https://github.com/apache/flink/pull/23315#issuecomment-1696717433

   
   ## CI report:
   
   * b84a51e67193e892c87fb8458d028abc22d4f57f UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] jiangxin369 opened a new pull request, #23315: [FLINK-32945][runtime] Fix NPE when task reached end-of-data but checkpoint failed

2023-08-28 Thread via GitHub


jiangxin369 opened a new pull request, #23315:
URL: https://github.com/apache/flink/pull/23315

   
   
   ## What is the purpose of the change
   
   Fix NullPointerException when executing TopSpeedWindowing example with 
checkpointing enabled. 
   
   The bug happens when some vertices have reached the end of data but do not transition to FINISHED because their final checkpoint fails and causes a failover. The `VertexEndOfDataListener` removes the end-of-data vertices from its state; however, in that case these vertices are restarted and try to access the `VertexEndOfDataListener` again, which causes the NPE.
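   
   A hedged sketch of the idea (illustrative only; the class below is not the actual `VertexEndOfDataListener` API, and the `JobVertexID`-keyed layout is an assumption):
   
   ```java
   // Illustration: keep an entry per vertex and reset it on restart instead of
   // removing it, so a restarted vertex never dereferences a missing entry.
   class EndOfDataStateSketch {
       private final Map<JobVertexID, BitSet> subtasksAtEndOfData = new HashMap<>();
   
       void recordEndOfData(JobVertexID vertex, int subtaskIndex) {
           subtasksAtEndOfData.computeIfAbsent(vertex, v -> new BitSet()).set(subtaskIndex);
       }
   
       // The fix's direction: restore (reset) the state for tasks that are restarted.
       void restoreForRestart(JobVertexID vertex, int subtaskIndex) {
           subtasksAtEndOfData.computeIfAbsent(vertex, v -> new BitSet()).clear(subtaskIndex);
       }
   }
   ```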
   
   ## Brief change log
   
 - Restore the state of VertexEndOfDataListener when some tasks are going 
to be restarted.
   
   ## Verifying this change
   
   This change added tests and can be verified as follows:
   
 - Added test that validates that the state of VertexEndOfDataListener 
would be restored when failover happens.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes )
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? (not applicable )
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] hackergin commented on a diff in pull request #23307: [FLINK-32968][table] Fix doc for catalog modification listener

2023-08-28 Thread via GitHub


hackergin commented on code in PR #23307:
URL: https://github.com/apache/flink/pull/23307#discussion_r1308135813


##
docs/content/docs/dev/table/catalogs.md:
##
@@ -801,10 +806,9 @@ env.executeSql("CREATE TABLE ...").wait();
 ```
 
 For sql-gateway, you can add the option `table.catalog-modification.listeners` 
in the `flink-conf.yaml` and start
-the gateway, or you can also use `SET` to specify the listener for ddl, for example, in sql-client or jdbc-driver.
+the gateway, or you can also start sql-gateway with dynamic parameter, then you can use sql-client to perform ddl directly.
 
-```
-Flink SQL> SET 'table.catalog-modification.listeners' = 'your_factory';
+```sql
 Flink SQL> CREATE TABLE test_table(...);

Review Comment:
   I think this line can be removed directly; keeping only this part seems a bit strange here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-31650) Incorrect busyMsTimePerSecond metric value for FINISHED task

2023-08-28 Thread Zhanghao Chen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759786#comment-17759786
 ] 

Zhanghao Chen commented on FLINK-31650:
---

Hi [~JunRuiLi], we encountered the same problem. Are you still working on it? 
If not, I'd be willing to take over this ticket.

> Incorrect busyMsTimePerSecond metric value for FINISHED task
> 
>
> Key: FLINK-31650
> URL: https://issues.apache.org/jira/browse/FLINK-31650
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Metrics, Runtime / REST
>Affects Versions: 1.17.0, 1.16.1, 1.15.4
>Reporter: Lijie Wang
>Assignee: Junrui Li
>Priority: Major
> Attachments: busyMsTimePerSecond.png
>
>
> As shown in the figure below, the busyMsTimePerSecond of the FINISHED task is 
> 100%, which is obviously unreasonable.
> !busyMsTimePerSecond.png|width=1048,height=432!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-32984) Display the host port information on the subtasks page of SubtaskExecution

2023-08-28 Thread Weihua Hu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weihua Hu closed FLINK-32984.
-
Resolution: Duplicate

> Display the host port information on the subtasks page of SubtaskExecution
> --
>
> Key: FLINK-32984
> URL: https://issues.apache.org/jira/browse/FLINK-32984
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Web Frontend
>Affects Versions: 1.7.2, 1.8.0
>Reporter: Matt Wang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.19.0
>
> Attachments: image-2023-08-28-21-04-05-297.png, 
> image-2023-08-28-21-04-27-396.png
>
>
> Currently in the Flink UI, the Host in the task SubTasks page will not 
> display TM port information, but the Host in TaskManagers will display port 
> information. 
> !image-2023-08-28-21-04-05-297.png|width=850,height=144!
> !image-2023-08-28-21-04-27-396.png|width=949,height=109!
> Since multiple TMs will run on one Host, I think the port information of the 
> TM should also be displayed in SubTasks, so that it is convenient to locate 
> the TM running the subtask.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32980) Support env.java.opts.all & env.java.opts.cli config for starting Session cluster CLIs

2023-08-28 Thread Weihua Hu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weihua Hu updated FLINK-32980:
--
Release Note:   (was: Resolved in master: 
6dbb9b8d3fcfe88a85b22e2f177e8fea7585702e)

> Support env.java.opts.all & env.java.opts.cli config for starting Session 
> cluster CLIs
> --
>
> Key: FLINK-32980
> URL: https://issues.apache.org/jira/browse/FLINK-32980
> Project: Flink
>  Issue Type: Improvement
>  Components: Command Line Client, Deployment / Scripts
>Affects Versions: 1.18.0
>Reporter: Zhanghao Chen
>Assignee: Zhanghao Chen
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> *Problem*
> The following configs are supposed to be supported:
> |h5. env.java.opts.all|(none)|String|Java options to start the JVM of all 
> Flink processes with.|
> |h5. env.java.opts.client|(none)|String|Java options to start the JVM of the 
> Flink Client with.|
> However, the two configs do not take effect for starting Flink session 
> clusters using kubernetes-session.sh and yarn-session.sh. This can lead to 
> problems in complex production envs. For example, in my company, some nodes 
> are IPv6-only, and the connection between Flink client and K8s/YARN control 
> plane is via a domain name whose backend is on IPv4/v6 dual stack, and the 
> JVM arg -Djava.net.preferIPv6Addresses=true needs to be set to make Java 
> connect to IPv6 addresses in favor of IPv4 ones; otherwise the K8s/YARN 
> control plane is inaccessible.
>  
> *Proposal*
> The fix is straightforward: simply apply the following changes to the 
> session scripts:
> `
> FLINK_ENV_JAVA_OPTS="${FLINK_ENV_JAVA_OPTS} ${FLINK_ENV_JAVA_OPTS_CLI}"
> exec $JAVA_RUN $JVM_ARGS $FLINK_ENV_JAVA_OPTS xxx
> `



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32980) Support env.java.opts.all & env.java.opts.cli config for starting Session cluster CLIs

2023-08-28 Thread Weihua Hu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759785#comment-17759785
 ] 

Weihua Hu commented on FLINK-32980:
---

Resolved in master: 6dbb9b8d3fcfe88a85b22e2f177e8fea7585702e

> Support env.java.opts.all & env.java.opts.cli config for starting Session 
> cluster CLIs
> --
>
> Key: FLINK-32980
> URL: https://issues.apache.org/jira/browse/FLINK-32980
> Project: Flink
>  Issue Type: Improvement
>  Components: Command Line Client, Deployment / Scripts
>Affects Versions: 1.18.0
>Reporter: Zhanghao Chen
>Assignee: Zhanghao Chen
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> *Problem*
> The following configs are supposed to be supported:
> |h5. env.java.opts.all|(none)|String|Java options to start the JVM of all 
> Flink processes with.|
> |h5. env.java.opts.client|(none)|String|Java options to start the JVM of the 
> Flink Client with.|
> However, the two configs do not take effect for starting Flink session 
> clusters using kubernetes-session.sh and yarn-session.sh. This can lead to 
> problems in complex production envs. For example, in my company, some nodes 
> are IPv6-only, and the connection between Flink client and K8s/YARN control 
> plane is via a domain name whose backend is on IPv4/v6 dual stack, and the 
> JVM arg -Djava.net.preferIPv6Addresses=true needs to be set to make Java 
> connect to IPv6 addresses in favor of IPv4 ones; otherwise the K8s/YARN 
> control plane is inaccessible.
>  
> *Proposal*
> The fix is straightforward: simply apply the following changes to the 
> session scripts:
> `
> FLINK_ENV_JAVA_OPTS="${FLINK_ENV_JAVA_OPTS} ${FLINK_ENV_JAVA_OPTS_CLI}"
> exec $JAVA_RUN $JVM_ARGS $FLINK_ENV_JAVA_OPTS xxx
> `



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-32980) Support env.java.opts.all & env.java.opts.cli config for starting Session cluster CLIs

2023-08-28 Thread Weihua Hu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weihua Hu resolved FLINK-32980.
---
Release Note: Resolved in master: 6dbb9b8d3fcfe88a85b22e2f177e8fea7585702e
  Resolution: Fixed

> Support env.java.opts.all & env.java.opts.cli config for starting Session 
> cluster CLIs
> --
>
> Key: FLINK-32980
> URL: https://issues.apache.org/jira/browse/FLINK-32980
> Project: Flink
>  Issue Type: Improvement
>  Components: Command Line Client, Deployment / Scripts
>Affects Versions: 1.18.0
>Reporter: Zhanghao Chen
>Assignee: Zhanghao Chen
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> *Problem*
> The following configs are supposed to be supported:
> |h5. env.java.opts.all|(none)|String|Java options to start the JVM of all 
> Flink processes with.|
> |h5. env.java.opts.client|(none)|String|Java options to start the JVM of the 
> Flink Client with.|
> However, the two configs do not take effect for starting Flink session 
> clusters using kubernetes-session.sh and yarn-session.sh. This can lead to 
> problems in complex production envs. For example, in my company, some nodes 
> are IPv6-only, and the connection between Flink client and K8s/YARN control 
> plane is via a domain name whose backend is on IPv4/v6 dual stack, and the 
> JVM arg -Djava.net.preferIPv6Addresses=true needs to be set to make Java 
> connect to IPv6 addresses in favor of IPv4 ones; otherwise the K8s/YARN 
> control plane is inaccessible.
>  
> *Proposal*
> The fix is straightforward: simply apply the following changes to the 
> session scripts:
> `
> FLINK_ENV_JAVA_OPTS="${FLINK_ENV_JAVA_OPTS} ${FLINK_ENV_JAVA_OPTS_CLI}"
> exec $JAVA_RUN $JVM_ARGS $FLINK_ENV_JAVA_OPTS xxx
> `



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] huwh merged pull request #23308: [FLINK-32980][scripts] Support env.java.opts.all & env.java.opts.cli config for starting Session cluster CLIs

2023-08-28 Thread via GitHub


huwh merged PR #23308:
URL: https://github.com/apache/flink/pull/23308


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Comment Edited] (FLINK-32780) Release Testing: Verify FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-08-28 Thread dalongliu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759578#comment-17759578
 ] 

dalongliu edited comment on FLINK-32780 at 8/29/23 2:40 AM:


RuntimeFilter was validated against the TPC-DS test set with the table.optimizer.runtime-filter.enabled option turned on. The following validations were done separately:
1. Verified the plans of the queries and confirmed from the plans that the RuntimeFilter was inserted into many queries.
2. For the whole TPC-DS dataset, the total time is 5141s before enabling the runtime filter and 4883s after, a gain of about 5%. The queries with significant gains are q88 (140s -> 62s), q93 (117s -> 101s), and q95 (218s -> 70s); other queries show limited gains.
3. q24 and q72 showed performance regressions, especially q72. By checking the plan of q72 and comparing it with the plan without RuntimeFilter, we found a dependency between the upstream and downstream operators, which prevents the source node from being executed in parallel and thus leads to the regression. I don't think we should insert the RuntimeFilter operator for this pattern, because it doesn't by itself filter the amount of data that the Join operator needs to process.

!image-2023-08-28-20-50-26-687.png!
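
For reference, a minimal sketch of how the option is enabled for such a run (a sketch only, assuming a batch {{TableEnvironment}} named {{tEnv}}; the option key is the one named in this ticket):
{code:java}
// Runtime filter injection for Flink batch jobs is disabled by default.
tEnv.getConfig().getConfiguration()
        .setString("table.optimizer.runtime-filter.enabled", "true");
{code}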


was (Author: lsy):
RuntimeFilter was validated against the TPC-DS test set with the table.optimizer.runtime-filter.enabled option turned on. The following validations were done separately:
1. Verified the plans of the queries and confirmed from the plans that the RuntimeFilter was inserted into many queries.
2. For the whole TPC-DS dataset, the total time is 5141s before enabling the runtime filter and 4883s after, a gain of about 5%. The queries with significant gains are q88, q93, and q95; other queries show limited gains.
3. q24 and q72 showed performance regressions, especially q72. By checking the plan of q72 and comparing it with the plan without RuntimeFilter, we found a dependency between the upstream and downstream operators, which prevents the source node from being executed in parallel and thus leads to the regression. I don't think we should insert the RuntimeFilter operator for this pattern, because it doesn't by itself filter the amount of data that the Join operator needs to process.

!image-2023-08-28-20-50-26-687.png!

> Release Testing: Verify FLIP-324: Introduce Runtime Filter for Flink Batch 
> Jobs
> ---
>
> Key: FLINK-32780
> URL: https://issues.apache.org/jira/browse/FLINK-32780
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.18.0
>Reporter: Qingsheng Ren
>Assignee: dalongliu
>Priority: Major
> Fix For: 1.18.0
>
> Attachments: image-2023-08-28-20-50-26-687.png
>
>
> This issue aims to verify FLIP-324: 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-324%3A+Introduce+Runtime+Filter+for+Flink+Batch+Jobs
> We can enable runtime filter by set: table.optimizer.runtime-filter.enabled: 
> true
> 1. Create two tables, one small table (small amount of data), one large table 
> (large amount of data), and then run join query on these two tables(such as 
> the example in FLIP doc: SELECT * FROM fact, dim WHERE x = a AND z = 2). The 
> Flink table planner should be able to obtain the statistical information of 
> these two tables (for example, Hive table), and the data volume of the small 
> table should be less than 
> "table.optimizer.runtime-filter.max-build-data-size", and the data volume of 
> the large table should be larger than 
> "table.optimizer.runtime-filter.min-probe-data-size".
> 2. Show the plan of the join query. The plan should include nodes such as 
> LocalRuntimeFilterBuilder, GlobalRuntimeFilterBuilder and RuntimeFilter. We 
> can also verify plan for the various variants of above query.
> 3. Execute the above plan, and: 
> * Check whether the data in the large table has been successfully filtered  
> * Verify the execution result; it should be the same as the result of the 
> execution plan with runtime filter disabled.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32986) The new createTemporaryFunction has some regression of type inference compared to the deprecated registerFunction

2023-08-28 Thread lincoln lee (Jira)
lincoln lee created FLINK-32986:
---

 Summary: The new createTemporaryFunction has some regression of 
type inference compared to the deprecated registerFunction
 Key: FLINK-32986
 URL: https://issues.apache.org/jira/browse/FLINK-32986
 Project: Flink
  Issue Type: Technical Debt
  Components: Table SQL / API
Affects Versions: 1.17.1, 1.18.0
Reporter: lincoln lee


Currently `LookupJoinITCase#testJoinTemporalTableWithUdfFilter` uses the legacy 
form of function registration:
{code}
tEnv.registerFunction("add", new TestAddWithOpen)
{code}
It works fine with the SQL call `add(T.id, 2) > 3` but fails when switching to the 
new API:
{code}
tEnv.createTemporaryFunction("add", classOf[TestAddWithOpen])
// or this
tEnv.createTemporaryFunction("add", new TestAddWithOpen)
{code}
exception:
{code}
Caused by: org.apache.flink.table.api.ValidationException: Invalid function 
call:
default_catalog.default_database.add(BIGINT, INT NOT NULL)
at 
org.apache.flink.table.types.inference.TypeInferenceUtil.createInvalidCallException(TypeInferenceUtil.java:193)
at 
org.apache.flink.table.planner.functions.inference.TypeInferenceOperandChecker.checkOperandTypes(TypeInferenceOperandChecker.java:89)
at 
org.apache.calcite.sql.SqlOperator.checkOperandTypes(SqlOperator.java:753)
at 
org.apache.calcite.sql.SqlOperator.validateOperands(SqlOperator.java:499)
at org.apache.calcite.sql.SqlFunction.deriveType(SqlFunction.java:335)
at org.apache.calcite.sql.SqlFunction.deriveType(SqlFunction.java:231)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:6302)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:6287)
at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:161)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.deriveTypeImpl(SqlValidatorImpl.java:1869)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.deriveType(SqlValidatorImpl.java:1860)
at 
org.apache.calcite.sql.type.SqlTypeUtil.deriveType(SqlTypeUtil.java:200)
at 
org.apache.calcite.sql.type.InferTypes.lambda$static$0(InferTypes.java:47)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.inferUnknownTypes(SqlValidatorImpl.java:2050)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.inferUnknownTypes(SqlValidatorImpl.java:2055)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateWhereOrOn(SqlValidatorImpl.java:4338)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateJoin(SqlValidatorImpl.java:3410)
at 
org.apache.flink.table.planner.calcite.FlinkCalciteSqlValidator.validateJoin(FlinkCalciteSqlValidator.java:154)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3282)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3603)
at 
org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:64)
at 
org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:89)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:1050)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:1025)
at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:248)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:1000)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:749)
at 
org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$validate(FlinkPlannerImpl.scala:196)
... 49 more
Caused by: org.apache.flink.table.api.ValidationException: Invalid input 
arguments. Expected signatures are:
default_catalog.default_database.add(a BIGINT NOT NULL, b INT NOT NULL)
default_catalog.default_database.add(a BIGINT NOT NULL, b BIGINT NOT NULL)
at 
org.apache.flink.table.types.inference.TypeInferenceUtil.createInvalidInputException(TypeInferenceUtil.java:180)
at 
org.apache.flink.table.planner.functions.inference.TypeInferenceOperandChecker.checkOperandTypesOrError(TypeInferenceOperandChecker.java:124)
at 
org.apache.flink.table.planner.functions.inference.TypeInferenceOperandChecker.checkOperandTypes(TypeInferenceOperandChecker.java:86)
... 75 more
Caused by: org.apache.flink.table.api.ValidationException: Invalid input 
arguments.
at 
org.apache.flink.table.types.inference.TypeInferenceUtil.inferInputTypes(TypeInferenceUtil.java:442)
at 
org.apache.flink.table.types.inference.TypeInferenceUtil.adaptArguments(TypeInferenceUtil.java:123)
at 

[GitHub] [flink] hackergin commented on pull request #23109: [FLINK-32475][docs] Add doc for time travel

2023-08-28 Thread via GitHub


hackergin commented on PR #23109:
URL: https://github.com/apache/flink/pull/23109#issuecomment-1696677420

   @luoyuxia Thanks for the review, and sorry for the late update. PTAL when you 
are free. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] fredia commented on pull request #22890: [FLINK-32072][checkpoint] Create and wire FileMergingSnapshotManager with TaskManagerServices

2023-08-28 Thread via GitHub


fredia commented on PR #22890:
URL: https://github.com/apache/flink/pull/22890#issuecomment-1696673932

   @Zakelly Thanks for the review. I moved the initialization of file merging 
into `initializeBaseLocationsForCheckpoint()` to address some failing tests; 
file merging will only take effect if checkpointing is enabled. Please take a look 
again when you are free.
   
   And my local CI is green now: 
https://dev.azure.com/fredia/flink/_build/results?buildId=585=results


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-32758) PyFlink bounds are overly restrictive and outdated

2023-08-28 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759777#comment-17759777
 ] 

Dian Fu commented on FLINK-32758:
-

[~deepyaman] Actually the tests have passed on the CI:
!image-2023-08-29-10-19-37-977.png!

It's failing when building the wheel packages for macOS.

It uses the third-party tool **cibuildwheel** to build the wheel packages; see 
[https://github.com/apache/flink/blob/master/tools/azure-pipelines/build-python-wheels.yml#L41]
 for more details. Currently it's using version 2.8.0. I will check whether the latest 
version (2.15.0) works on my CI.

 

> PyFlink bounds are overly restrictive and outdated
> --
>
> Key: FLINK-32758
> URL: https://issues.apache.org/jira/browse/FLINK-32758
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.17.1, 1.19.0
>Reporter: Deepyaman Datta
>Assignee: Deepyaman Datta
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Attachments: image-2023-08-29-10-19-37-977.png
>
>
> Hi! I am part of a team building the Flink backend for Ibis 
> ([https://github.com/ibis-project/ibis]). We would like to leverage PyFlink 
> under the hood for execution; however, PyFlink's requirements are 
> incompatible with several other Ibis requirements. Beyond Ibis, PyFlink's 
> outdated and restrictive requirements prevent it from being used alongside 
> most recent releases of Python data libraries.
> Some of the major libraries we (and likely others in the Python community 
> interested in using PyFlink alongside other libraries) need compatibility 
> with:
>  * PyArrow (at least >=10.0.0, but there's no reason not to also be 
> compatible with the latest)
>  * pandas (should be compatible with 2.x series, but also probably with 
> 1.4.x, released January 2022, and 1.5.x)
>  * numpy (1.22 was released in December 2021)
>  * Newer releases of Apache Beam
>  * Newer releases of cython
> Furthermore, uncapped dependencies could be more generally preferable, as 
> they avoid the need for frequent PyFlink releases as newer versions of 
> libraries are released. A common (and great) argument for not upper-bounding 
> dependencies, especially for libraries: 
> [https://iscinumpy.dev/post/bound-version-constraints/]
> I am currently testing removing upper bounds in 
> [https://github.com/apache/flink/pull/23141]; so far, builds pass without 
> issue in 
> [b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581],
>  and I'm currently waiting on 
> [c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6]
>  to see if I can get PyArrow to resolve >=10.0.0. Solving the proposed 
> dependencies results in:
> {{#}}
> {{# This file is autogenerated by pip-compile with Python 3.8}}
> {{# by the following command:}}
> {{#}}
> {{#    pip-compile --config=pyproject.toml 
> --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt}}
> {{#}}
> {{apache-beam==2.49.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{avro-python3==1.10.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{certifi==2023.7.22}}
> {{    # via requests}}
> {{charset-normalizer==3.2.0}}
> {{    # via requests}}
> {{cloudpickle==2.2.1}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{crcmod==1.7}}
> {{    # via apache-beam}}
> {{cython==3.0.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{dill==0.3.1.1}}
> {{    # via apache-beam}}
> {{dnspython==2.4.1}}
> {{    # via pymongo}}
> {{docopt==0.6.2}}
> {{    # via hdfs}}
> {{exceptiongroup==1.1.2}}
> {{    # via pytest}}
> {{fastavro==1.8.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{fasteners==0.18}}
> {{    # via apache-beam}}
> {{find-libpython==0.3.0}}
> {{    # via pemja}}
> {{grpcio==1.56.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{grpcio-tools==1.56.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{hdfs==2.7.0}}
> {{    # via apache-beam}}
> {{httplib2==0.22.0}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{idna==3.4}}
> {{    # via requests}}
> {{iniconfig==2.0.0}}
> {{    # via pytest}}
> {{numpy==1.24.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   pandas}}
> {{    #   pyarrow}}
> {{objsize==0.6.1}}
> {{    # via apache-beam}}
> {{orjson==3.9.2}}
> {{    # via apache-beam}}
> {{packaging==23.1}}
> {{    # via pytest}}
> {{pandas==2.0.3}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pemja==0.3.0 ; platform_system != "Windows"}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pluggy==1.2.0}}
> {{    # 

[jira] [Updated] (FLINK-32758) PyFlink bounds are overly restrictive and outdated

2023-08-28 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-32758:

Attachment: image-2023-08-29-10-19-37-977.png

> PyFlink bounds are overly restrictive and outdated
> --
>
> Key: FLINK-32758
> URL: https://issues.apache.org/jira/browse/FLINK-32758
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.17.1, 1.19.0
>Reporter: Deepyaman Datta
>Assignee: Deepyaman Datta
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Attachments: image-2023-08-29-10-19-37-977.png
>
>
> Hi! I am part of a team building the Flink backend for Ibis 
> ([https://github.com/ibis-project/ibis]). We would like to leverage PyFlink 
> under the hood for execution; however, PyFlink's requirements are 
> incompatible with several other Ibis requirements. Beyond Ibis, PyFlink's 
> outdated and restrictive requirements prevent it from being used alongside 
> most recent releases of Python data libraries.
> Some of the major libraries we (and likely others in the Python community 
> interested in using PyFlink alongside other libraries) need compatibility 
> with:
>  * PyArrow (at least >=10.0.0, but there's no reason not to also be 
> compatible with the latest)
>  * pandas (should be compatible with 2.x series, but also probably with 
> 1.4.x, released January 2022, and 1.5.x)
>  * numpy (1.22 was released in December 2021)
>  * Newer releases of Apache Beam
>  * Newer releases of cython
> Furthermore, uncapped dependencies could be more generally preferable, as 
> they avoid the need for frequent PyFlink releases as newer versions of 
> libraries are released. A common (and great) argument for not upper-bounding 
> dependencies, especially for libraries: 
> [https://iscinumpy.dev/post/bound-version-constraints/]
> I am currently testing removing upper bounds in 
> [https://github.com/apache/flink/pull/23141]; so far, builds pass without 
> issue in 
> [b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581],
>  and I'm currently waiting on 
> [c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6]
>  to see if I can get PyArrow to resolve >=10.0.0. Solving the proposed 
> dependencies results in:
> {{#}}
> {{# This file is autogenerated by pip-compile with Python 3.8}}
> {{# by the following command:}}
> {{#}}
> {{#    pip-compile --config=pyproject.toml 
> --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt}}
> {{#}}
> {{apache-beam==2.49.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{avro-python3==1.10.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{certifi==2023.7.22}}
> {{    # via requests}}
> {{charset-normalizer==3.2.0}}
> {{    # via requests}}
> {{cloudpickle==2.2.1}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{crcmod==1.7}}
> {{    # via apache-beam}}
> {{cython==3.0.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{dill==0.3.1.1}}
> {{    # via apache-beam}}
> {{dnspython==2.4.1}}
> {{    # via pymongo}}
> {{docopt==0.6.2}}
> {{    # via hdfs}}
> {{exceptiongroup==1.1.2}}
> {{    # via pytest}}
> {{fastavro==1.8.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{fasteners==0.18}}
> {{    # via apache-beam}}
> {{find-libpython==0.3.0}}
> {{    # via pemja}}
> {{grpcio==1.56.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{grpcio-tools==1.56.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{hdfs==2.7.0}}
> {{    # via apache-beam}}
> {{httplib2==0.22.0}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{idna==3.4}}
> {{    # via requests}}
> {{iniconfig==2.0.0}}
> {{    # via pytest}}
> {{numpy==1.24.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   pandas}}
> {{    #   pyarrow}}
> {{objsize==0.6.1}}
> {{    # via apache-beam}}
> {{orjson==3.9.2}}
> {{    # via apache-beam}}
> {{packaging==23.1}}
> {{    # via pytest}}
> {{pandas==2.0.3}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pemja==0.3.0 ; platform_system != "Windows"}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pluggy==1.2.0}}
> {{    # via pytest}}
> {{proto-plus==1.22.3}}
> {{    # via apache-beam}}
> {{protobuf==4.23.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{    #   proto-plus}}
> {{py4j==0.10.9.7}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pyarrow==11.0.0}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{pydot==1.4.2}}
> {{    # 

[GitHub] [flink] fredia commented on a diff in pull request #22890: [FLINK-32072][checkpoint] Create and wire FileMergingSnapshotManager with TaskManagerServices

2023-08-28 Thread via GitHub


fredia commented on code in PR #22890:
URL: https://github.com/apache/flink/pull/22890#discussion_r1308123867


##
flink-runtime/src/main/java/org/apache/flink/runtime/state/TaskExecutorFileMergingManager.java:
##
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.state;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.JobID;
+import org.apache.flink.configuration.Configuration;
+import 
org.apache.flink.runtime.checkpoint.filemerging.FileMergingSnapshotManager;
+import org.apache.flink.util.Preconditions;
+import org.apache.flink.util.ShutdownHookUtil;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nonnull;
+import javax.annotation.concurrent.GuardedBy;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * There is one {@link FileMergingSnapshotManager} for each job per task 
manager. This class holds
+ * all {@link FileMergingSnapshotManager} objects for a task executor 
(manager).
+ */
+public class TaskExecutorFileMergingManager {
+/** Logger for this class. */
+private static final Logger LOG = 
LoggerFactory.getLogger(TaskExecutorFileMergingManager.class);
+
+/**
+ * This map holds all FileMergingSnapshotManager for tasks running on this 
task
+ * manager(executor).
+ */
+@GuardedBy("lock")
+private final Map<JobID, FileMergingSnapshotManager> 
fileMergingSnapshotManagerByJobId;
+
+@GuardedBy("lock")
+private boolean closed;
+
+private final Object lock = new Object();
+
+/** Shutdown hook for this manager. */
+private final Thread shutdownHook;
+
+public TaskExecutorFileMergingManager() {
+this.fileMergingSnapshotManagerByJobId = new HashMap<>();
+this.closed = false;
+this.shutdownHook =
+ShutdownHookUtil.addShutdownHook(this::shutdown, 
getClass().getSimpleName(), LOG);
+}
+
+/**
+ * Initialize file merging snapshot manager for each job according 
configurations when {@link
+ * org.apache.flink.runtime.taskexecutor.TaskExecutor#submitTask}.
+ */
+public FileMergingSnapshotManager fileMergingSnapshotManagerForJob(
+@Nonnull JobID jobId,
+Configuration clusterConfiguration,
+Configuration jobConfiguration) {
+synchronized (lock) {
+if (closed) {
+throw new IllegalStateException(
+"TaskExecutorFileMergingManager is already closed and 
cannot "
++ "register a new 
FileMergingSnapshotManager.");
+}
+FileMergingSnapshotManager fileMergingSnapshotManager =
+fileMergingSnapshotManagerByJobId.get(jobId);
+if (fileMergingSnapshotManager == null) {
+// TODO FLINK-32440: choose different 
FileMergingSnapshotManager by configuration
+LOG.info("Registered new file merging snapshot manager for job 
{}.", jobId);
+}
+return fileMergingSnapshotManager;
+}
+}
+
+@VisibleForTesting
+FileMergingSnapshotManager fileMergingSnapshotManagerForJob(
+@Nonnull JobID jobId, FileMergingSnapshotManager 
mergingSnapshotManager) {

Review Comment:
   After moving file merging initialization into 
`initializeBaseLocationsForCheckpoint()`, this function is no longer needed, 
remove it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] jiexray commented on a diff in pull request #21410: [FLINK-29776][flink-statebackend-changelog][JUnit5 Migration]Module: flink-statebackend-changelog.

2023-08-28 Thread via GitHub


jiexray commented on code in PR #21410:
URL: https://github.com/apache/flink/pull/21410#discussion_r1308122484


##
flink-state-backends/flink-statebackend-rocksdb/src/test/java/org/apache/flink/contrib/streaming/state/EmbeddedRocksDBStateBackendTest.java:
##
@@ -46,18 +46,17 @@
 import org.apache.flink.runtime.state.storage.JobManagerCheckpointStorage;
 import org.apache.flink.runtime.util.BlockerCheckpointStreamFactory;
 import org.apache.flink.runtime.util.BlockingCheckpointOutputStream;
+import org.apache.flink.testutils.junit.extensions.parameterized.Parameter;
+import org.apache.flink.testutils.junit.extensions.parameterized.Parameters;
 import org.apache.flink.util.FlinkRuntimeException;
 import org.apache.flink.util.IOUtils;
 import org.apache.flink.util.function.SupplierWithException;
 
 import org.apache.commons.io.FileUtils;
 import org.apache.commons.io.filefilter.IOFileFilter;
-import org.junit.After;
-import org.junit.ClassRule;
-import org.junit.Test;
-import org.junit.rules.TemporaryFolder;
-import org.junit.runner.RunWith;
-import org.junit.runners.Parameterized;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.TestTemplate;
+import org.junit.jupiter.api.io.TempDir;
 import org.mockito.Mockito;

Review Comment:
   Good point. I think this would better be changed in another pr.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] jiexray commented on pull request #23305: [FLINK-32975][state/changelog] Enhance equal() for ChangelogMapState's iterator

2023-08-28 Thread via GitHub


jiexray commented on PR #23305:
URL: https://github.com/apache/flink/pull/23305#issuecomment-1696665750

   @masteryhx Thank you for the review. I have added a new test in 
`ChangelogMapStateTest`. The test asserts the following:
   ```
   state1.iterate().next() equals expected; // assert-1
   state2.iterate().next() equals expected; // assert-2
   
   state1.iterate().next() equals state2.iterate().next(); // assert-3
   ```
   
   `assert-1` and `assert-2` pass without the overridden `equals()`. However, 
`assert-3` will fail without the overridden `equals()`. 
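
   For reference, a minimal sketch of the `Map.Entry` equality contract that the 
overridden `equals()` follows (the class name is illustrative, not the actual 
changelog wrapper):
   ```java
   import java.util.Map;
   import java.util.Objects;
   
   /** Illustrative entry view; equality follows the java.util.Map.Entry contract. */
   final class EntryView<K, V> implements Map.Entry<K, V> {
       private final K key;
       private V value;
   
       EntryView(K key, V value) {
           this.key = key;
           this.value = value;
       }
   
       @Override
       public K getKey() {
           return key;
       }
   
       @Override
       public V getValue() {
           return value;
       }
   
       @Override
       public V setValue(V newValue) {
           V old = value;
           value = newValue;
           return old;
       }
   
       @Override
       public boolean equals(Object o) {
           // Map.Entry contract: two entries are equal iff keys and values are equal.
           // Without this override, two distinct wrapper instances holding the same
           // key/value compare unequal (reference equality), which is why assert-3
           // fails while assert-1 and assert-2 still pass.
           if (!(o instanceof Map.Entry)) {
               return false;
           }
           Map.Entry<?, ?> e = (Map.Entry<?, ?>) o;
           return Objects.equals(key, e.getKey()) && Objects.equals(value, e.getValue());
       }
   
       @Override
       public int hashCode() {
           // Matches the Map.Entry contract so that equal entries hash equally.
           return (key == null ? 0 : key.hashCode()) ^ (value == null ? 0 : value.hashCode());
       }
   }
   ```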


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-32965) Remove the deprecated MutableObjectMode

2023-08-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-32965:
---
Labels: pull-request-available  (was: )

> Remove the deprecated MutableObjectMode
> ---
>
> Key: FLINK-32965
> URL: https://issues.apache.org/jira/browse/FLINK-32965
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Runtime / Configuration
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> FLINK-1285 replaced the MutableObjectMode with ObjectReuse. 
> TaskConfig#getMutableObjectMode() has not been used for a long time, and it's 
> an internal class, so it can be removed directly.
> Also, we should update the GroupReduceDriverTest:
>  * Updating the _testAllReduceDriverAccumulatingImmutable()_ from 
> `context.setMutableObjectMode(false);` to 
> `context.getExecutionConfig().disableObjectReuse();`.
>  * Updating the _testAllReduceDriverIncorrectlyAccumulatingMutable()_ to 
> `context.getExecutionConfig().disableObjectReuse();`, and fix the assert.
>  
> More details can be found at 
> [https://github.com/apache/flink/pull/23233#discussion_r1304595960]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] 1996fanrui merged pull request #23303: [FLINK-32965][flink-runtime] Removing the deprecated MutableObjectMode and restore related tests

2023-08-28 Thread via GitHub


1996fanrui merged PR #23303:
URL: https://github.com/apache/flink/pull/23303


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-32758) PyFlink bounds are overly restrictive and outdated

2023-08-28 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759774#comment-17759774
 ] 

Dian Fu commented on FLINK-32758:
-

[~Sergey Nuyanzin] [~deepyaman] I have submitted a hotfix temporarily limiting 
fastavro < 1.8 to make the CI green (verified on my CI): 
[https://github.com/apache/flink/commit/345dece9a8fd58d6ea1c829052fb2f3c68516b48]

> PyFlink bounds are overly restrictive and outdated
> --
>
> Key: FLINK-32758
> URL: https://issues.apache.org/jira/browse/FLINK-32758
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.17.1, 1.19.0
>Reporter: Deepyaman Datta
>Assignee: Deepyaman Datta
>Priority: Blocker
>  Labels: pull-request-available, test-stability
>
> Hi! I am part of a team building the Flink backend for Ibis 
> ([https://github.com/ibis-project/ibis]). We would like to leverage PyFlink 
> under the hood for execution; however, PyFlink's requirements are 
> incompatible with several other Ibis requirements. Beyond Ibis, PyFlink's 
> outdated and restrictive requirements prevent it from being used alongside 
> most recent releases of Python data libraries.
> Some of the major libraries we (and likely others in the Python community 
> interested in using PyFlink alongside other libraries) need compatibility 
> with:
>  * PyArrow (at least >=10.0.0, but there's no reason not to also be 
> compatible with the latest)
>  * pandas (should be compatible with 2.x series, but also probably with 
> 1.4.x, released January 2022, and 1.5.x)
>  * numpy (1.22 was released in December 2021)
>  * Newer releases of Apache Beam
>  * Newer releases of cython
> Furthermore, uncapped dependencies could be more generally preferable, as 
> they avoid the need for frequent PyFlink releases as newer versions of 
> libraries are released. A common (and great) argument for not upper-bounding 
> dependencies, especially for libraries: 
> [https://iscinumpy.dev/post/bound-version-constraints/]
> I am currently testing removing upper bounds in 
> [https://github.com/apache/flink/pull/23141]; so far, builds pass without 
> issue in 
> [b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581],
>  and I'm currently waiting on 
> [c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6]
>  to see if I can get PyArrow to resolve >=10.0.0. Solving the proposed 
> dependencies results in:
> {{#}}
> {{# This file is autogenerated by pip-compile with Python 3.8}}
> {{# by the following command:}}
> {{#}}
> {{#    pip-compile --config=pyproject.toml 
> --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt}}
> {{#}}
> {{apache-beam==2.49.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{avro-python3==1.10.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{certifi==2023.7.22}}
> {{    # via requests}}
> {{charset-normalizer==3.2.0}}
> {{    # via requests}}
> {{cloudpickle==2.2.1}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{crcmod==1.7}}
> {{    # via apache-beam}}
> {{cython==3.0.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{dill==0.3.1.1}}
> {{    # via apache-beam}}
> {{dnspython==2.4.1}}
> {{    # via pymongo}}
> {{docopt==0.6.2}}
> {{    # via hdfs}}
> {{exceptiongroup==1.1.2}}
> {{    # via pytest}}
> {{fastavro==1.8.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{fasteners==0.18}}
> {{    # via apache-beam}}
> {{find-libpython==0.3.0}}
> {{    # via pemja}}
> {{grpcio==1.56.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{grpcio-tools==1.56.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{hdfs==2.7.0}}
> {{    # via apache-beam}}
> {{httplib2==0.22.0}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{idna==3.4}}
> {{    # via requests}}
> {{iniconfig==2.0.0}}
> {{    # via pytest}}
> {{numpy==1.24.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   pandas}}
> {{    #   pyarrow}}
> {{objsize==0.6.1}}
> {{    # via apache-beam}}
> {{orjson==3.9.2}}
> {{    # via apache-beam}}
> {{packaging==23.1}}
> {{    # via pytest}}
> {{pandas==2.0.3}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pemja==0.3.0 ; platform_system != "Windows"}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pluggy==1.2.0}}
> {{    # via pytest}}
> {{proto-plus==1.22.3}}
> {{    # via apache-beam}}
> {{protobuf==4.23.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{    #   proto-plus}}
> {{py4j==0.10.9.7}}
> {{    # via -r 

[jira] [Resolved] (FLINK-32965) Remove the deprecated MutableObjectMode

2023-08-28 Thread Rui Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Fan resolved FLINK-32965.
-
Fix Version/s: 1.19.0
   Resolution: Fixed

> Remove the deprecated MutableObjectMode
> ---
>
> Key: FLINK-32965
> URL: https://issues.apache.org/jira/browse/FLINK-32965
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Runtime / Configuration
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Major
> Fix For: 1.19.0
>
>
> FLINK-1285 replaced the MutableObjectMode with ObjectReuse. 
> TaskConfig#getMutableObjectMode() has not been used for a long time, and it's 
> an internal class, so it can be removed directly.
> Also, we should update the GroupReduceDriverTest:
>  * Updating the _testAllReduceDriverAccumulatingImmutable()_ from 
> `context.setMutableObjectMode(false);` to 
> `context.getExecutionConfig().disableObjectReuse();`.
>  * Updating the _testAllReduceDriverIncorrectlyAccumulatingMutable()_ to 
> `context.getExecutionConfig().disableObjectReuse();`, and fix the assert.
>  
> More details can be found at 
> [https://github.com/apache/flink/pull/23233#discussion_r1304595960]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32965) Remove the deprecated MutableObjectMode

2023-08-28 Thread Rui Fan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759773#comment-17759773
 ] 

Rui Fan commented on FLINK-32965:
-

Merged via 3fa046ef7c461554b50d2d791dc4846cd8ff75b7

> Remove the deprecated MutableObjectMode
> ---
>
> Key: FLINK-32965
> URL: https://issues.apache.org/jira/browse/FLINK-32965
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Runtime / Configuration
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Major
>
> FLINK-1285 replaced the MutableObjectMode with ObjectReuse. 
> TaskConfig#getMutableObjectMode() has not been used for a long time, and it's 
> an internal class, so it can be removed directly.
> Also, we should update the GroupReduceDriverTest:
>  * Updating the _testAllReduceDriverAccumulatingImmutable()_ from 
> `context.setMutableObjectMode(false);` to 
> `context.getExecutionConfig().disableObjectReuse();`.
>  * Updating the _testAllReduceDriverIncorrectlyAccumulatingMutable()_ to 
> `context.getExecutionConfig().disableObjectReuse();`, and fix the assert.
>  
> More details can be found at 
> [https://github.com/apache/flink/pull/23233#discussion_r1304595960]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] flinkbot commented on pull request #23314: Test

2023-08-28 Thread via GitHub


flinkbot commented on PR #23314:
URL: https://github.com/apache/flink/pull/23314#issuecomment-1696659499

   
   ## CI report:
   
   * 4e7859ec86f75ec9187a89210460c0d3b9782ffc UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] WencongLiu opened a new pull request, #23314: Test

2023-08-28 Thread via GitHub


WencongLiu opened a new pull request, #23314:
URL: https://github.com/apache/flink/pull/23314

   
   
   ## What is the purpose of the change
   
   *(For example: This pull request makes task deployment go through the blob 
server, rather than through RPC. That way we avoid re-transferring them on each 
deployment (during recovery).)*
   
   
   ## Brief change log
   
   *(for example:)*
 - *The TaskInfo is stored in the blob store on job creation time as a 
persistent artifact*
 - *Deployments RPC transmits only the blob storage reference*
 - *TaskManagers retrieve the TaskInfo from the blob cache*
   
   
   ## Verifying this change
   
   Please make sure both new and modified tests in this PR follows the 
conventions defined in our code quality guide: 
https://flink.apache.org/contributing/code-style-and-quality-common.html#testing
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe 
tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
 - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
 - *Extended integration test for recovery after master (JobManager) 
failure*
 - *Added test that validates that TaskInfo is transferred only once across 
recoveries*
 - *Manually verified the change by running a 4 node cluster with 2 
JobManagers and 4 TaskManagers, a stateful streaming program, and killing one 
JobManager and two TaskManagers during the execution, verifying that recovery 
happens correctly.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no)
 - The serializers: (yes / no / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
 - The S3 file system connector: (yes / no / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] xintongsong commented on pull request #23292: [FLINK-32817] Harnessing Jackson for Secure Serialization of YarnLocalResourceDescriptor

2023-08-28 Thread via GitHub


xintongsong commented on PR #23292:
URL: https://github.com/apache/flink/pull/23292#issuecomment-1696654555

   @YesOrNo828, please take a look into the CI failures. The failures are all 
from the Yarn deployment, which are likely related to the changes of this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] flinkbot commented on pull request #23313: [FLINK-20767][table planner] Support filter push down on nested fields

2023-08-28 Thread via GitHub


flinkbot commented on PR #23313:
URL: https://github.com/apache/flink/pull/23313#issuecomment-1696630993

   
   ## CI report:
   
   * 5470431742bf63292142aaa7c96b4c894b3a7c82 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-20767) add nested field support for SupportsFilterPushDown

2023-08-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-20767:
---
Labels: pull-request-available  (was: )

> add nested field support for SupportsFilterPushDown
> ---
>
> Key: FLINK-20767
> URL: https://issues.apache.org/jira/browse/FLINK-20767
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Affects Versions: 1.12.0
>Reporter: Jun Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>
> I think we should add nested field support for SupportsFilterPushDown



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] venkata91 opened a new pull request, #23313: [FLINK-20767][table planner] Support filter push down on nested fields

2023-08-28 Thread via GitHub


venkata91 opened a new pull request, #23313:
URL: https://github.com/apache/flink/pull/23313

   
   
   ## What is the purpose of the change
   
   Support filter push down on nested fields.
   For example:
   `SELECT * FROM users WHERE user.age > 18`
   In the above SQL, the `user.age > 18` filter can be pushed down to the table source 
to avoid scanning records that don't match the predicate. This is especially 
useful in analytics with columnar formats as well as with JDBC data sources.
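   
   As a rough sketch, the pushdown can be checked by explaining the query and 
inspecting the source scan in the optimized plan (the schema and connector below 
are illustrative assumptions; real pushdown requires a source that implements 
`SupportsFilterPushDown`):
   ```java
   import org.apache.flink.table.api.EnvironmentSettings;
   import org.apache.flink.table.api.TableEnvironment;
   
   public class NestedFilterPushdownCheck {
       public static void main(String[] args) {
           TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inBatchMode());
           // Hypothetical table with a nested ROW column named `user`.
           tEnv.executeSql(
                   "CREATE TABLE users ("
                           + "  `user` ROW<name STRING, age INT>"
                           + ") WITH ("
                           + "  'connector' = 'datagen'" // placeholder connector
                           + ")");
           // With nested pushdown support, the predicate should move into the
           // TableSourceScan node instead of staying in a separate Filter node.
           System.out.println(tEnv.explainSql("SELECT * FROM users WHERE `user`.age > 18"));
       }
   }
   ```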
   
   ## Brief change log
   
   - Introduce a new `ResolvedExpression` called 
`NestedFieldReferenceExpression` to handle filters on nested fields
   - Changes to `RexNodeToExpressionConverter` and `ExpressionConverter` to 
convert between `RexFieldAccess` and `NestedFieldReferenceExpression`.
   
   ## Verifying this change
   
   This change added tests and can be verified as follows:
   
   Added unit tests in `PushFilterIntoTableSourceScanRuleTest`
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? 
 - If yes, how is the feature documented? not applicable
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-30630) Python: BatchModeDataStreamTests.test_keyed_map failed

2023-08-28 Thread Flink Jira Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-30630:
---
  Labels: auto-deprioritized-critical test-stability  (was: stale-critical 
test-stability)
Priority: Major  (was: Critical)

This issue was labeled "stale-critical" 7 days ago and has not received any 
updates so it is being deprioritized. If this ticket is actually Critical, 
please raise the priority and ask a committer to assign you the issue or revive 
the public discussion.


> Python: BatchModeDataStreamTests.test_keyed_map failed
> --
>
> Key: FLINK-30630
> URL: https://issues.apache.org/jira/browse/FLINK-30630
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.15.3
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: auto-deprioritized-critical, test-stability
>
> {{BatchModeDataStreamTests.test_keyed_map}} failed in 1.15:
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=44691=logs=821b528f-1eed-5598-a3b4-7f748b13f261=6bb545dd-772d-5d8c-f258-f5085fba3295



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32476) Support configuring object-reuse for internal operators

2023-08-28 Thread Flink Jira Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-32476:
---
Labels: pull-request-available stale-major  (was: pull-request-available)

I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help 
the community manage its development. I see this issue has been marked as 
Major but is unassigned, and neither it nor its Sub-Tasks have been updated 
for 60 days. I have gone ahead and added a "stale-major" label to the issue. If this 
ticket is a Major, please either assign yourself or give an update. Afterwards, 
please remove the label or in 7 days the issue will be deprioritized.


> Support configuring object-reuse for internal operators
> ---
>
> Key: FLINK-32476
> URL: https://issues.apache.org/jira/browse/FLINK-32476
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Task
>Reporter: Xuannan Su
>Priority: Major
>  Labels: pull-request-available, stale-major
>
> Currently, object reuse is disabled by default for streaming jobs in order to 
> prevent unexpected behavior. Object reuse becomes problematic when the 
> upstream operator stores its output while the downstream operator modifies 
> the input.
> However, many operators implemented by Flink, such as Flink SQL operators, do 
> not modify the input. This implies that it is safe to reuse the input object 
> in such cases. Therefore, we intend to enable object reuse specifically for 
> operators that do not modify the input.
> As the first step, we will focus on the operators implemented within Flink. 
> We will create the FLIP to introduce the API that allows user-defined 
> operators to enable object reuse in the future.
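
A minimal, self-contained sketch of the hazard described above (all names and values 
are illustrative):
{code:java}
import java.util.ArrayList;
import java.util.List;

/** Shows why object reuse is unsafe when the upstream stores its output. */
public class ObjectReuseHazard {

    static final class Event {
        long value;
    }

    public static void main(String[] args) {
        List<Event> storedByUpstream = new ArrayList<>();

        Event reused = new Event(); // a single record object that gets reused
        for (long i = 0; i < 3; i++) {
            reused.value = i;             // "emit" by mutating the reused object
            storedByUpstream.add(reused); // upstream also keeps the reference
        }

        // All stored entries alias the same object and print the last value (2, 2, 2),
        // which is the kind of unexpected behavior described above.
        storedByUpstream.forEach(e -> System.out.println(e.value));
    }
}
{code}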



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32396) Support timestamp for jdbc driver and gateway

2023-08-28 Thread Flink Jira Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-32396:
---
Labels: pull-request-available stale-major  (was: pull-request-available)

I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help 
the community manage its development. I see this issue has been marked as 
Major but is unassigned, and neither it nor its Sub-Tasks have been updated 
for 60 days. I have gone ahead and added a "stale-major" label to the issue. If this 
ticket is a Major, please either assign yourself or give an update. Afterwards, 
please remove the label or in 7 days the issue will be deprioritized.


> Support timestamp for jdbc driver and gateway
> -
>
> Key: FLINK-32396
> URL: https://issues.apache.org/jira/browse/FLINK-32396
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / JDBC
>Affects Versions: 1.18.0
>Reporter: Fang Yong
>Priority: Major
>  Labels: pull-request-available, stale-major
>
> Support the timestamp and timestamp_ltz data types for the jdbc driver and sql-gateway



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-kubernetes-operator] mananmangal commented on a diff in pull request #648: [FLINK-32700] Support job drain for Savepoint upgrade mode

2023-08-28 Thread via GitHub


mananmangal commented on code in PR #648:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/648#discussion_r1291624743


##
flink-kubernetes-operator/src/test/java/org/apache/flink/kubernetes/operator/reconciler/deployment/ApplicationReconcilerTest.java:
##
@@ -1070,4 +1070,33 @@ public void testUpgradeReconciledGeneration() throws 
Exception {
 .getMetadata()
 .getGeneration());
 }
+
+@ParameterizedTest
+
@MethodSource("org.apache.flink.kubernetes.operator.TestUtils#flinkVersions")
+public void testSubmitAndDrainOnCleanUpWithSavepoint(FlinkVersion 
flinkVersion)
+throws Exception {
+var conf = configManager.getDefaultConfig();
+conf.set(KubernetesOperatorConfigOptions.SAVEPOINT_ON_DELETION, true);
+conf.set(KubernetesOperatorConfigOptions.DRAIN_ON_SAVEPOINT_DELETION, 
true);
+configManager.updateDefaultConfig(conf);
+
+FlinkDeployment deployment = 
TestUtils.buildApplicationCluster(flinkVersion);
+
+// session ready
+reconciler.reconcile(deployment, 
TestUtils.createContextWithReadyFlinkDeployment());
+verifyAndSetRunningJobsToStatus(deployment, flinkService.listJobs());
+
+// clean up
+assertEquals(
+null, 
deployment.getStatus().getJobStatus().getSavepointInfo().getLastSavepoint());
+reconciler.cleanup(deployment, 
TestUtils.createContextWithReadyFlinkDeployment());
+assertEquals(
+"savepoint_0",
+deployment
+.getStatus()
+.getJobStatus()
+.getSavepointInfo()
+.getLastSavepoint()
+.getLocation());
+}

Review Comment:
   I understand, I will make these changes.



##
flink-kubernetes-operator/src/test/java/org/apache/flink/kubernetes/operator/reconciler/deployment/ApplicationReconcilerTest.java:
##
@@ -1070,4 +1070,33 @@ public void testUpgradeReconciledGeneration() throws 
Exception {
 .getMetadata()
 .getGeneration());
 }
+
+@ParameterizedTest
+
@MethodSource("org.apache.flink.kubernetes.operator.TestUtils#flinkVersions")
+public void testSubmitAndDrainOnCleanUpWithSavepoint(FlinkVersion 
flinkVersion)
+throws Exception {
+var conf = configManager.getDefaultConfig();
+conf.set(KubernetesOperatorConfigOptions.SAVEPOINT_ON_DELETION, true);
+conf.set(KubernetesOperatorConfigOptions.DRAIN_ON_SAVEPOINT_DELETION, 
true);
+configManager.updateDefaultConfig(conf);
+
+FlinkDeployment deployment = 
TestUtils.buildApplicationCluster(flinkVersion);
+
+// session ready
+reconciler.reconcile(deployment, 
TestUtils.createContextWithReadyFlinkDeployment());
+verifyAndSetRunningJobsToStatus(deployment, flinkService.listJobs());
+
+// clean up
+assertEquals(
+null, 
deployment.getStatus().getJobStatus().getSavepointInfo().getLastSavepoint());
+reconciler.cleanup(deployment, 
TestUtils.createContextWithReadyFlinkDeployment());
+assertEquals(
+"savepoint_0",
+deployment
+.getStatus()
+.getJobStatus()
+.getSavepointInfo()
+.getLastSavepoint()
+.getLocation());
+}

Review Comment:
   Moved the unit tests to AbstractFlinkServiceTests.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-32758) PyFlink bounds are overly restrictive and outdated

2023-08-28 Thread Sergey Nuyanzin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759693#comment-17759693
 ] 

Sergey Nuyanzin commented on FLINK-32758:
-

> I'm not sure how I can verify that a potential fix would work, if I try? Can 
> I trigger these tests manually?
If you have set up your own CI, then this could help (it just moves the job from the 
nightly build to the normal CI): 
https://github.com/apache/flink/pull/23045/commits/16b65720306ca820dfd83f1ccaacb4b0aed850ac

If you haven't set up your own CI, then a rename of the job is also required. Or, once a 
fix is ready, I can help schedule it on my own CI.

> PyFlink bounds are overly restrictive and outdated
> --
>
> Key: FLINK-32758
> URL: https://issues.apache.org/jira/browse/FLINK-32758
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.17.1, 1.19.0
>Reporter: Deepyaman Datta
>Assignee: Deepyaman Datta
>Priority: Blocker
>  Labels: pull-request-available, test-stability
>
> Hi! I am part of a team building the Flink backend for Ibis 
> ([https://github.com/ibis-project/ibis]). We would like to leverage PyFlink 
> under the hood for execution; however, PyFlink's requirements are 
> incompatible with several other Ibis requirements. Beyond Ibis, PyFlink's 
> outdated and restrictive requirements prevent it from being used alongside 
> most recent releases of Python data libraries.
> Some of the major libraries we (and likely others in the Python community 
> interested in using PyFlink alongside other libraries) need compatibility 
> with:
>  * PyArrow (at least >=10.0.0, but there's no reason not to also be 
> compatible with the latest)
>  * pandas (should be compatible with 2.x series, but also probably with 
> 1.4.x, released January 2022, and 1.5.x)
>  * numpy (1.22 was released in December 2021)
>  * Newer releases of Apache Beam
>  * Newer releases of cython
> Furthermore, uncapped dependencies could be more generally preferable, as 
> they avoid the need for frequent PyFlink releases as newer versions of 
> libraries are released. A common (and great) argument for not upper-bounding 
> dependencies, especially for libraries: 
> [https://iscinumpy.dev/post/bound-version-constraints/]
> I am currently testing removing upper bounds in 
> [https://github.com/apache/flink/pull/23141]; so far, builds pass without 
> issue in 
> [b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581],
>  and I'm currently waiting on 
> [c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6]
>  to see if I can get PyArrow to resolve >=10.0.0. Solving the proposed 
> dependencies results in:
> {{#}}
> {{# This file is autogenerated by pip-compile with Python 3.8}}
> {{# by the following command:}}
> {{#}}
> {{#    pip-compile --config=pyproject.toml 
> --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt}}
> {{#}}
> {{apache-beam==2.49.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{avro-python3==1.10.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{certifi==2023.7.22}}
> {{    # via requests}}
> {{charset-normalizer==3.2.0}}
> {{    # via requests}}
> {{cloudpickle==2.2.1}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{crcmod==1.7}}
> {{    # via apache-beam}}
> {{cython==3.0.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{dill==0.3.1.1}}
> {{    # via apache-beam}}
> {{dnspython==2.4.1}}
> {{    # via pymongo}}
> {{docopt==0.6.2}}
> {{    # via hdfs}}
> {{exceptiongroup==1.1.2}}
> {{    # via pytest}}
> {{fastavro==1.8.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{fasteners==0.18}}
> {{    # via apache-beam}}
> {{find-libpython==0.3.0}}
> {{    # via pemja}}
> {{grpcio==1.56.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{grpcio-tools==1.56.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{hdfs==2.7.0}}
> {{    # via apache-beam}}
> {{httplib2==0.22.0}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{idna==3.4}}
> {{    # via requests}}
> {{iniconfig==2.0.0}}
> {{    # via pytest}}
> {{numpy==1.24.4}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   pandas}}
> {{    #   pyarrow}}
> {{objsize==0.6.1}}
> {{    # via apache-beam}}
> {{orjson==3.9.2}}
> {{    # via apache-beam}}
> {{packaging==23.1}}
> {{    # via pytest}}
> {{pandas==2.0.3}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pemja==0.3.0 ; platform_system != "Windows"}}
> {{    # via -r dev/dev-requirements.txt}}
> {{pluggy==1.2.0}}
> {{    # via pytest}}
> {{proto-plus==1.22.3}}
> 

[jira] [Comment Edited] (FLINK-32976) NullPointerException when starting flink cluster

2023-08-28 Thread Feng Jin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759670#comment-17759670
 ] 

Feng Jin edited comment on FLINK-32976 at 8/28/23 5:41 PM:
---

[~martijnvisser] Sorry for the unclear description. I have updated the 
reproduction steps.   

 

The corresponding code is as follows:

https://github.com/hackergin/flink/blob/78136133fbec4ca145dec66d4bc0c324c8e16d82/flink-runtime/src/main/java/org/apache/flink/runtime/security/token/hadoop/HadoopFSDelegationTokenProvider.java#L173
{code:java}
// code placeholder

if (flinkConfiguration.getString(DeploymentOptions.TARGET).toLowerCase().contains("yarn")) {
    LOG.debug("Running on YARN, trying to add staging directory to file systems to access");
    String yarnStagingDirectory =
            flinkConfiguration.getString("yarn.staging-directory", "");
    if (!StringUtils.isBlank(yarnStagingDirectory)) {
        LOG.debug(
                "Adding staging directory to file systems to access {}",
                yarnStagingDirectory);
        result.add(new Path(yarnStagingDirectory).getFileSystem(hadoopConfiguration));
        LOG.debug("Staging directory added to file systems to access successfully");
    } else {
        LOG.debug(
                "Staging directory is not set or empty so not added to file systems to access");
    }
} {code}
When starting a standalone cluster, the TARGET option is not set; since the code 
does not check for its existence, a NullPointerException is thrown.

 

Therefore, we can easily fix this issue by adding a null check. 
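
A minimal sketch of such a check (illustrative only; the actual fix may differ):
{code:java}
String target = flinkConfiguration.getString(DeploymentOptions.TARGET);
if (target != null && target.toLowerCase().contains("yarn")) {
    // Only then resolve yarn.staging-directory and add it to the file systems
    // to access; a standalone cluster simply skips this branch.
}
{code}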

 

Running a Flink standalone cluster with a Hadoop configuration is not a common 
practice. Most Flink jobs run either on YARN or on Kubernetes, and in these two 
modes the TARGET option is correctly set, so this issue does not occur.

 


was (Author: hackergin):
[~martijnvisser] Sorry for the unclear description. I have updated the 
reproduction steps.   

 

The corresponding code is as follows:

 
{code:java}
// code placeholder

if (flinkConfiguration.getString(DeploymentOptions.TARGET).toLowerCase().contains("yarn")) {
    LOG.debug("Running on YARN, trying to add staging directory to file systems to access");
    String yarnStagingDirectory =
            flinkConfiguration.getString("yarn.staging-directory", "");
    if (!StringUtils.isBlank(yarnStagingDirectory)) {
        LOG.debug(
                "Adding staging directory to file systems to access {}",
                yarnStagingDirectory);
        result.add(new Path(yarnStagingDirectory).getFileSystem(hadoopConfiguration));
        LOG.debug("Staging directory added to file systems to access successfully");
    } else {
        LOG.debug(
                "Staging directory is not set or empty so not added to file systems to access");
    }
} {code}
When the standalone cluster starts, the TARGET parameter is not set; since the 
code never checks whether TARGET exists, a null pointer error occurs.

 

Therefore, we can easily fix this issue by adding a null check. 

 

Running a Flink standalone cluster with a Hadoop configuration is not common 
practice: most Flink jobs run on either YARN or Kubernetes, and in those two 
modes the TARGET parameter is set correctly, so this issue does not occur.

 

> NullPointException when starting flink cluster
> --
>
> Key: FLINK-32976
> URL: https://issues.apache.org/jira/browse/FLINK-32976
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Configuration
>Affects Versions: 1.17.1
>Reporter: Feng Jin
>Priority: Major
>
> It can be reproduced when starting a Flink cluster with a Hadoop configuration. 
>  
> {code:java}
> // Set up the Hadoop conf and Hadoop classpath
> // Start the JobManager
> ./jobmanager.sh start-foreground {code}
>  
> The error message is as follows: 
>  
> {code:java}
> Caused by: java.lang.NullPointerException
> at org.apache.flink.runtime.security.token.hadoop.HadoopFSDelegationTokenProvider.getFileSystemsToAccess(HadoopFSDelegationTokenProvider.java:173) ~[flink-dist-1.17.1.jar:1.17.1]
> at org.apache.flink.runtime.security.token.hadoop.HadoopFSDelegationTokenProvider.lambda$obtainDelegationTokens$1(HadoopFSDelegationTokenProvider.java:113) ~[flink-dist-1.17.1.jar:1.17.1]
> at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_281]
> at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_281]
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) ~[flink-shaded-hadoop-3-uber-3.1.1.7.2.1.0-327-9.0.jar:3.1.1.7.2.1.0-327-9.0]
> at org.apache.flink.runtime.security.token.hadoop.HadoopFSDelegationTokenProvider.obtainDelegationTokens(HadoopFSDelegationTokenProvider.java:108) ~[flink-dist-1.17.1.jar:1.17.1]
> at org.apache.flink.runtime.

[jira] [Commented] (FLINK-32976) NullPointException when starting flink cluster

2023-08-28 Thread Feng Jin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759670#comment-17759670
 ] 

Feng Jin commented on FLINK-32976:
--

[~martijnvisser] Sorry for the unclear description; I have updated the 
reproduction steps.

 

The corresponding code is as follows:

 
{code:java}
if (flinkConfiguration.getString(DeploymentOptions.TARGET).toLowerCase().contains("yarn")) {
    LOG.debug("Running on YARN, trying to add staging directory to file systems to access");
    String yarnStagingDirectory =
            flinkConfiguration.getString("yarn.staging-directory", "");
    if (!StringUtils.isBlank(yarnStagingDirectory)) {
        LOG.debug("Adding staging directory to file systems to access {}", yarnStagingDirectory);
        result.add(new Path(yarnStagingDirectory).getFileSystem(hadoopConfiguration));
        LOG.debug("Staging directory added to file systems to access successfully");
    } else {
        LOG.debug("Staging directory is not set or empty so not added to file systems to access");
    }
} {code}
When the standalone cluster starts, the TARGET parameter is not set; since the 
code never checks whether TARGET exists, a null pointer error occurs.

 

Therefore, we can easily fix this issue by adding a null check. 

 

Running a Flink standalone cluster with a Hadoop configuration is not common 
practice: most Flink jobs run on either YARN or Kubernetes, and in those two 
modes the TARGET parameter is set correctly, so this issue does not occur.

 

> NullPointException when starting flink cluster
> --
>
> Key: FLINK-32976
> URL: https://issues.apache.org/jira/browse/FLINK-32976
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Configuration
>Affects Versions: 1.17.1
>Reporter: Feng Jin
>Priority: Major
>
> It can be reproduced when starting a Flink cluster with a Hadoop configuration. 
>  
> {code:java}
> // Set up the Hadoop conf and Hadoop classpath
> // Start the JobManager
> ./jobmanager.sh start-foreground {code}
>  
> The error message is as follows: 
>  
> {code:java}
> Caused by: java.lang.NullPointerException
> at org.apache.flink.runtime.security.token.hadoop.HadoopFSDelegationTokenProvider.getFileSystemsToAccess(HadoopFSDelegationTokenProvider.java:173) ~[flink-dist-1.17.1.jar:1.17.1]
> at org.apache.flink.runtime.security.token.hadoop.HadoopFSDelegationTokenProvider.lambda$obtainDelegationTokens$1(HadoopFSDelegationTokenProvider.java:113) ~[flink-dist-1.17.1.jar:1.17.1]
> at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_281]
> at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_281]
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) ~[flink-shaded-hadoop-3-uber-3.1.1.7.2.1.0-327-9.0.jar:3.1.1.7.2.1.0-327-9.0]
> at org.apache.flink.runtime.security.token.hadoop.HadoopFSDelegationTokenProvider.obtainDelegationTokens(HadoopFSDelegationTokenProvider.java:108) ~[flink-dist-1.17.1.jar:1.17.1]
> at org.apache.flink.runtime.security.token.DefaultDelegationTokenManager.lambda$obtainDelegationTokensAndGetNextRenewal$1(DefaultDelegationTokenManager.java:264) ~[flink-dist-1.17.1.jar:1.17.1]
> at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_281]
> at java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1628) ~[?:1.8.0_281]
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[?:1.8.0_281]
> at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_281]
> at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[?:1.8.0_281]
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_281]
> at java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:479) ~[?:1.8.0_281]
> at java.util.stream.ReferencePipeline.min(ReferencePipeline.java:520) ~[?:1.8.0_281]
> at org.apache.flink.runtime.security.token.DefaultDelegationTokenManager.obtainDelegationTokensAndGetNextRenewal(DefaultDelegationTokenManager.java:286) ~[flink-dist-1.17.1.jar:1.17.1]
> at org.apache.flink.runtime.security.token.DefaultDelegationTokenManager.obtainDelegationTokens(DefaultDelegationTokenManager.java:242) ~[flink-dist-1.17.1.jar:1.17.1]
> at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:...) ~[flink-dist-1.17.1.jar:1.17.1]
> at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:...) ~[flink-dist-1.17.1.jar:1.17.1]
> at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:232) ~[flink-dist-1.17.1.jar:1.17.1]
> at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_281]
> at

[jira] [Updated] (FLINK-32976) NullPointException when starting flink cluster

2023-08-28 Thread Feng Jin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Jin updated FLINK-32976:
-
Description: 
It can be reproduced when starting a Flink cluster with a Hadoop configuration. 

 
{code:java}
// Set up the Hadoop conf and Hadoop classpath
// Start the JobManager
./jobmanager.sh start-foreground {code}
 

The error message is as follows: 

 
{code:java}
Caused by: java.lang.NullPointerException
at org.apache.flink.runtime.security.token.hadoop.HadoopFSDelegationTokenProvider.getFileSystemsToAccess(HadoopFSDelegationTokenProvider.java:173) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.runtime.security.token.hadoop.HadoopFSDelegationTokenProvider.lambda$obtainDelegationTokens$1(HadoopFSDelegationTokenProvider.java:113) ~[flink-dist-1.17.1.jar:1.17.1]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_281]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_281]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) ~[flink-shaded-hadoop-3-uber-3.1.1.7.2.1.0-327-9.0.jar:3.1.1.7.2.1.0-327-9.0]
at org.apache.flink.runtime.security.token.hadoop.HadoopFSDelegationTokenProvider.obtainDelegationTokens(HadoopFSDelegationTokenProvider.java:108) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.runtime.security.token.DefaultDelegationTokenManager.lambda$obtainDelegationTokensAndGetNextRenewal$1(DefaultDelegationTokenManager.java:264) ~[flink-dist-1.17.1.jar:1.17.1]
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_281]
at java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1628) ~[?:1.8.0_281]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[?:1.8.0_281]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_281]
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[?:1.8.0_281]
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_281]
at java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:479) ~[?:1.8.0_281]
at java.util.stream.ReferencePipeline.min(ReferencePipeline.java:520) ~[?:1.8.0_281]
at org.apache.flink.runtime.security.token.DefaultDelegationTokenManager.obtainDelegationTokensAndGetNextRenewal(DefaultDelegationTokenManager.java:286) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.runtime.security.token.DefaultDelegationTokenManager.obtainDelegationTokens(DefaultDelegationTokenManager.java:242) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:...) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:...) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:232) ~[flink-dist-1.17.1.jar:1.17.1]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_281]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_281]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) ~[flink-shaded-hadoop-3-uber-3.1.1.7.2.1.0-327-9.0.jar:3.1.1.7.2.1.0-327-9.0]
at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:229) ~[flink-dist-1.17.1.jar:1.17.1]
... 2 more{code}
 

 

  was:
The error message is as follows: 

 

 
{code:java}
Caused by: java.lang.NullPointerException
at org.apache.flink.runtime.security.token.hadoop.HadoopFSDelegationTokenProvider.getFileSystemsToAccess(HadoopFSDelegationTokenProvider.java:173) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.runtime.security.token.hadoop.HadoopFSDelegationTokenProvider.lambda$obtainDelegationTokens$1(HadoopFSDelegationTokenProvider.java:113) ~[flink-dist-1.17.1.jar:1.17.1]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_281]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_281]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) ~[flink-shaded-hadoop-3-uber-3.1.1.7.2.1.0-327-9.0.jar:3.1.1.7.2.1.0-327-9.0]
at org.apache.flink.runtime.security.token.hadoop.HadoopFSDelegationTokenProvider.obtainDelegationTokens(HadoopFSDelegationTokenProvider.java:108) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.runtime.security.token.DefaultDelegationTokenManager.lambda$obtainDelegationTokensAndGetNextRenewal$1(DefaultDelegationTokenManager.java:264) ~[flink-dist-1.17.1.jar:1.17.1]
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_281]
at java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1628) ~[?:1.8.0_281]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[?:1.8.0_281]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)

[GitHub] [flink] Jiabao-Sun commented on pull request #23218: [FLINK-32854][flink-runtime][JUnit5 Migration] The state package of flink-runtime module

2023-08-28 Thread via GitHub


Jiabao-Sun commented on PR #23218:
URL: https://github.com/apache/flink/pull/23218#issuecomment-1696047403

   Thanks @1996fanrui for the thorough review, and sorry for the oversight.
   Please take another look when you have time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] flinkbot commented on pull request #23312: [FLINK-32824] Port Calcite's fix for the sql like operator.

2023-08-28 Thread via GitHub


flinkbot commented on PR #23312:
URL: https://github.com/apache/flink/pull/23312#issuecomment-1695987276

   
   ## CI report:
   
   * 4505846115102c1e09ae495076f7d270f53f3da5 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] xishuaidelin opened a new pull request, #23312: [FLINK-32824] Port Calcite's fix for the sql like operator.

2023-08-28 Thread via GitHub


xishuaidelin opened a new pull request, #23312:
URL: https://github.com/apache/flink/pull/23312

   
   
   ## What is the purpose of the change
   
   This pull request aims to port Calcite's fix for the sql like operator and 
relates to FLINK-32824.
   
   
   ## Brief change log
   
Modified the JAVA_REGEX_SPECIALS in the SqlLikeUtils class.
   
   ## Verifying this change
   
   Please make sure both new and modified tests in this PR follow the 
conventions defined in our code quality guide: 
https://flink.apache.org/contributing/code-style-and-quality-common.html#testing
   
   This change added tests and can be verified as follows:
   
 - *testSqlLike in ParserImplTest for validation.*
 - *testShowColumnsWithLike in TableEnvironmentTest in the table layer for 
validation.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-aws] hlteoh37 opened a new pull request, #92: [hotfix] Add MiniClusterExtension to ITCase tests

2023-08-28 Thread via GitHub


hlteoh37 opened a new pull request, #92:
URL: https://github.com/apache/flink-connector-aws/pull/92

   ## Purpose of the change
   
   * [Text for ITCase MiniCluster ArchUnitTest has been changed for Flink 
1.18](https://github.com/apache/flink/pull/22399). This PR removes the need for 
that violation exception by registering a local MiniCluster environment in the 
Firehose ITCase.
   * The change in the GitHub workflow is merely to show that this case passes 
against `1.18-SNAPSHOT`; it will be removed before merging.
   
   ## Verifying this change
   This change is to fix breaking ArchUnitTests.
   
   ## Significant changes
   *(Please check any boxes [x] if the answer is "yes". You can first publish 
the PR and check them afterwards, for convenience.)*
   - [ ] Dependencies have been added or upgraded
   - [ ] Public API has been changed (Public API is any class annotated with 
`@Public(Evolving)`)
   - [ ] Serializers have been changed
   - [ ] New feature has been introduced
 - If yes, how is this documented? (not applicable / docs / JavaDocs / not 
documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-32976) NullpointException when starting flink cluster

2023-08-28 Thread Feng Jin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Jin updated FLINK-32976:
-
Description: 
The error message is as follows: 

 

 
{code:java}
Caused by: java.lang.NullPointerException
at org.apache.flink.runtime.security.token.hadoop.HadoopFSDelegationTokenProvider.getFileSystemsToAccess(HadoopFSDelegationTokenProvider.java:173) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.runtime.security.token.hadoop.HadoopFSDelegationTokenProvider.lambda$obtainDelegationTokens$1(HadoopFSDelegationTokenProvider.java:113) ~[flink-dist-1.17.1.jar:1.17.1]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_281]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_281]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) ~[flink-shaded-hadoop-3-uber-3.1.1.7.2.1.0-327-9.0.jar:3.1.1.7.2.1.0-327-9.0]
at org.apache.flink.runtime.security.token.hadoop.HadoopFSDelegationTokenProvider.obtainDelegationTokens(HadoopFSDelegationTokenProvider.java:108) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.runtime.security.token.DefaultDelegationTokenManager.lambda$obtainDelegationTokensAndGetNextRenewal$1(DefaultDelegationTokenManager.java:264) ~[flink-dist-1.17.1.jar:1.17.1]
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_281]
at java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1628) ~[?:1.8.0_281]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[?:1.8.0_281]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_281]
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[?:1.8.0_281]
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_281]
at java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:479) ~[?:1.8.0_281]
at java.util.stream.ReferencePipeline.min(ReferencePipeline.java:520) ~[?:1.8.0_281]
at org.apache.flink.runtime.security.token.DefaultDelegationTokenManager.obtainDelegationTokensAndGetNextRenewal(DefaultDelegationTokenManager.java:286) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.runtime.security.token.DefaultDelegationTokenManager.obtainDelegationTokens(DefaultDelegationTokenManager.java:242) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:...) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:...) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:232) ~[flink-dist-1.17.1.jar:1.17.1]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_281]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_281]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) ~[flink-shaded-hadoop-3-uber-3.1.1.7.2.1.0-327-9.0.jar:3.1.1.7.2.1.0-327-9.0]
at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:229) ~[flink-dist-1.17.1.jar:1.17.1]
... 2 more{code}
 

 

  was:
The error message is as follows: 

 

 
{code:java}
Caused by: java.lang.NullPointerException
at org.apache.flink.runtime.security.token.hadoop.HadoopFSDelegationTokenProvider.getFileSystemsToAccess(HadoopFSDelegationTokenProvider.java:173) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.runtime.security.token.hadoop.HadoopFSDelegationTokenProvider.lambda$obtainDelegationTokens$1(HadoopFSDelegationTokenProvider.java:113) ~[flink-dist-1.17.1.jar:1.17.1]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_281]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_281]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) ~[flink-shaded-hadoop-3-uber-3.1.1.7.2.1.0-327-9.0.jar:3.1.1.7.2.1.0-327-9.0]
at org.apache.flink.runtime.security.token.hadoop.HadoopFSDelegationTokenProvider.obtainDelegationTokens(HadoopFSDelegationTokenProvider.java:108) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.runtime.security.token.DefaultDelegationTokenManager.lambda$obtainDelegationTokensAndGetNextRenewal$1(DefaultDelegationTokenManager.java:264) ~[flink-dist-1.17.1.jar:1.17.1]
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_281]
at java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1628) ~[?:1.8.0_281]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[?:1.8.0_281]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_281]
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[?:1.8.0_281]
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_281]
at

[jira] [Updated] (FLINK-32976) NullPointException when starting flink cluster

2023-08-28 Thread Feng Jin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Jin updated FLINK-32976:
-
Summary: NullPointException when starting flink cluster  (was: 
NullpointException when starting flink cluster)

> NullPointException when starting flink cluster
> --
>
> Key: FLINK-32976
> URL: https://issues.apache.org/jira/browse/FLINK-32976
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Configuration
>Affects Versions: 1.17.1
>Reporter: Feng Jin
>Priority: Major
>
> The error message is as follows: 
>  
>  
> {code:java}
> Caused by: java.lang.NullPointerException
> at org.apache.flink.runtime.security.token.hadoop.HadoopFSDelegationTokenProvider.getFileSystemsToAccess(HadoopFSDelegationTokenProvider.java:173) ~[flink-dist-1.17.1.jar:1.17.1]
> at org.apache.flink.runtime.security.token.hadoop.HadoopFSDelegationTokenProvider.lambda$obtainDelegationTokens$1(HadoopFSDelegationTokenProvider.java:113) ~[flink-dist-1.17.1.jar:1.17.1]
> at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_281]
> at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_281]
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) ~[flink-shaded-hadoop-3-uber-3.1.1.7.2.1.0-327-9.0.jar:3.1.1.7.2.1.0-327-9.0]
> at org.apache.flink.runtime.security.token.hadoop.HadoopFSDelegationTokenProvider.obtainDelegationTokens(HadoopFSDelegationTokenProvider.java:108) ~[flink-dist-1.17.1.jar:1.17.1]
> at org.apache.flink.runtime.security.token.DefaultDelegationTokenManager.lambda$obtainDelegationTokensAndGetNextRenewal$1(DefaultDelegationTokenManager.java:264) ~[flink-dist-1.17.1.jar:1.17.1]
> at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_281]
> at java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1628) ~[?:1.8.0_281]
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[?:1.8.0_281]
> at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_281]
> at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[?:1.8.0_281]
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_281]
> at java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:479) ~[?:1.8.0_281]
> at java.util.stream.ReferencePipeline.min(ReferencePipeline.java:520) ~[?:1.8.0_281]
> at org.apache.flink.runtime.security.token.DefaultDelegationTokenManager.obtainDelegationTokensAndGetNextRenewal(DefaultDelegationTokenManager.java:286) ~[flink-dist-1.17.1.jar:1.17.1]
> at org.apache.flink.runtime.security.token.DefaultDelegationTokenManager.obtainDelegationTokens(DefaultDelegationTokenManager.java:242) ~[flink-dist-1.17.1.jar:1.17.1]
> at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:...) ~[flink-dist-1.17.1.jar:1.17.1]
> at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:...) ~[flink-dist-1.17.1.jar:1.17.1]
> at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:232) ~[flink-dist-1.17.1.jar:1.17.1]
> at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_281]
> at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_281]
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) ~[flink-shaded-hadoop-3-uber-3.1.1.7.2.1.0-327-9.0.jar:3.1.1.7.2.1.0-327-9.0]
> at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) ~[flink-dist-1.17.1.jar:1.17.1]
> at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:229) ~[flink-dist-1.17.1.jar:1.17.1]
> ... 2 more{code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] 1996fanrui commented on a diff in pull request #22985: [FLINK-21883][scheduler] Implement cooldown period for adaptive scheduler

2023-08-28 Thread via GitHub


1996fanrui commented on code in PR #22985:
URL: https://github.com/apache/flink/pull/22985#discussion_r1307539956


##
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/Executing.java:
##
@@ -144,6 +188,17 @@ private void maybeRescale() {
 }
 }
 
+private Duration timeSinceLastRescale() {
+return Duration.between(lastRescale, Instant.now());
+}
+
+private void rescaleWhenCooldownPeriodIsOver() {

Review Comment:
   If we don't drop it or apply any filter, the `scaling-interval.min` cannot 
be guaranteed, right?
   
   The interval between `08:01:40` and `08:01:45` is 5 s, while the 
`scaling-interval.min` is 30 s. It may not work as expected. Do you think so?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] echauchot commented on a diff in pull request #22985: [FLINK-21883][scheduler] Implement cooldown period for adaptive scheduler

2023-08-28 Thread via GitHub


echauchot commented on code in PR #22985:
URL: https://github.com/apache/flink/pull/22985#discussion_r1307494302


##
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/Executing.java:
##
@@ -144,6 +188,17 @@ private void maybeRescale() {
 }
 }
 
+private Duration timeSinceLastRescale() {
+return Duration.between(lastRescale, Instant.now());
+}
+
+private void rescaleWhenCooldownPeriodIsOver() {

Review Comment:
   ok, thanks for pointing that out. That being said, with the period restart 
behavior (see previous comment), in the above case `maybeRescale` will be 
scheduled for 08:01:40 on the first call and for 08:01:45 on the second call. 
Do you want us to simply drop the last schedule so that only one `maybeRescale` 
is scheduled at a time? A minimal sketch of that idea follows.
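   
   A rough sketch of the "drop duplicate schedules" idea (hypothetical, not the 
final patch; `rescaleScheduled`, `scalingIntervalMin`, and `context` are assumed 
fields, and `Executing` itself is assumed to be the expected `State` passed to 
the existing `runIfState` helper):
   
   ```java
   // Hypothetical sketch: defer maybeRescale() at most once per cooldown window.
   private boolean rescaleScheduled = false;

   private void rescaleWhenCooldownPeriodIsOver() {
       final Duration sinceLastRescale = timeSinceLastRescale();
       if (sinceLastRescale.compareTo(scalingIntervalMin) >= 0) {
           maybeRescale();
       } else if (!rescaleScheduled) {
           rescaleScheduled = true; // further triggers are dropped until this one fires
           context.runIfState(
                   this,
                   () -> {
                       rescaleScheduled = false;
                       maybeRescale();
                   },
                   scalingIntervalMin.minus(sinceLastRescale));
       }
   }
   ```
   
   This way at most one deferred `maybeRescale` exists at a time, so 
`scaling-interval.min` is respected regardless of how many change events arrive 
during the cooldown.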



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] 1996fanrui commented on a diff in pull request #23218: [FLINK-32854][flink-runtime][JUnit5 Migration] The state package of flink-runtime module

2023-08-28 Thread via GitHub


1996fanrui commented on code in PR #23218:
URL: https://github.com/apache/flink/pull/23218#discussion_r1307500363


##
flink-runtime/src/test/java/org/apache/flink/runtime/state/OperatorStateOutputCheckpointStreamTest.java:
##
@@ -54,41 +55,40 @@ private OperatorStateHandle writeAllTestKeyGroups(
 }
 
 @Test
-public void testCloseNotPropagated() throws Exception {
+void testCloseNotPropagated() throws Exception {
 OperatorStateCheckpointOutputStream stream = createStream();
 TestMemoryCheckpointOutputStream innerStream =
 (TestMemoryCheckpointOutputStream) stream.getDelegate();
 stream.close();
-Assert.assertFalse(innerStream.isClosed());
+assertThat(innerStream.isClosed()).isFalse();
 innerStream.close();
 }
 
 @Test
-public void testEmptyOperatorStream() throws Exception {
+void testEmptyOperatorStream() throws Exception {
 OperatorStateCheckpointOutputStream stream = createStream();
 TestMemoryCheckpointOutputStream innerStream =
 (TestMemoryCheckpointOutputStream) stream.getDelegate();
 OperatorStateHandle emptyHandle = stream.closeAndGetHandle();
-Assert.assertTrue(innerStream.isClosed());
-Assert.assertEquals(0, stream.getNumberOfPartitions());
-Assert.assertEquals(null, emptyHandle);
+assertThat(innerStream.isClosed()).isTrue();
+assertThat(stream.getNumberOfPartitions()).isZero();
+assertThat(emptyHandle).isNull();
 }
 
 @Test
 public void testWriteReadRoundtrip() throws Exception {

Review Comment:
   ```suggestion
  void testWriteReadRoundtrip() throws Exception {
   ```



##
flink-runtime/src/test/java/org/apache/flink/runtime/state/filesystem/FileStateHandleTest.java:
##
@@ -147,7 +144,7 @@ public void 
testDiscardStateWithDeletionFailureThroughException() throws Excepti
 }
 
 @Test
-public void testDiscardStateWithDeletionFailureThroughReturnValue() throws 
Exception {
+void testDiscardStateWithDeletionFailureThroughReturnValue() throws 
Exception {

Review Comment:
   The public of `testDiscardStateWithDeletionFailureThroughException` can be 
removed.
   
   And the public of the `FsCheckpointStreamFactoryTest` class can be removed.



##
flink-runtime/src/test/java/org/apache/flink/runtime/state/KeyGroupsStateHandleTest.java:
##
@@ -20,28 +20,26 @@
 
 import org.apache.flink.runtime.state.memory.ByteStreamStateHandle;
 
-import org.junit.Test;
+import org.junit.jupiter.api.Test;
 
-import static org.junit.Assert.assertEquals;
-import static org.junit.Assert.assertNotNull;
-import static org.junit.Assert.assertNull;
+import static org.assertj.core.api.Assertions.assertThat;
 
 /** A test for {@link KeyGroupsStateHandle} */
-public class KeyGroupsStateHandleTest {
+class KeyGroupsStateHandleTest {
 
 @Test
-public void testNonEmptyIntersection() {
+void testNonEmptyIntersection() {
 KeyGroupRangeOffsets offsets = new KeyGroupRangeOffsets(0, 7);
 byte[] dummy = new byte[10];
 StreamStateHandle streamHandle = new ByteStreamStateHandle("test", 
dummy);
 KeyGroupsStateHandle handle = new KeyGroupsStateHandle(offsets, 
streamHandle);
 
 KeyGroupRange expectedRange = new KeyGroupRange(0, 3);
 KeyGroupsStateHandle newHandle = handle.getIntersection(expectedRange);
-assertNotNull(newHandle);
-assertEquals(streamHandle, newHandle.getDelegateStateHandle());
-assertEquals(expectedRange, newHandle.getKeyGroupRange());
-assertEquals(handle.getStateHandleId(), newHandle.getStateHandleId());
+assertThat(newHandle).isNotNull();
+assertThat(newHandle.getDelegateStateHandle()).isEqualTo(streamHandle);
+assertThat(newHandle.getKeyGroupRange()).isEqualTo(expectedRange);
+
assertThat(newHandle.getStateHandleId()).isEqualTo(handle.getStateHandleId());
 }
 
 @Test

Review Comment:
   The public can be removed.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] echauchot commented on a diff in pull request #22985: [FLINK-21883][scheduler] Implement cooldown period for adaptive scheduler

2023-08-28 Thread via GitHub


echauchot commented on code in PR #22985:
URL: https://github.com/apache/flink/pull/22985#discussion_r1307494302


##
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/Executing.java:
##
@@ -144,6 +188,17 @@ private void maybeRescale() {
 }
 }
 
+private Duration timeSinceLastRescale() {
+return Duration.between(lastRescale, Instant.now());
+}
+
+private void rescaleWhenCooldownPeriodIsOver() {

Review Comment:
   ok, thanks for pointing that out. That being said, with the period restart 
behavior (see previous comment), in the above case `maybeRescale` will be 
scheduled for 08:01:40 on the first call and for 08:01:45 on the second call. 
Do you want us to simply drop the last schedule so that only one `maybeRescale` 
is scheduled at a time?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] echauchot commented on a diff in pull request #22985: [FLINK-21883][scheduler] Implement cooldown period for adaptive scheduler

2023-08-28 Thread via GitHub


echauchot commented on code in PR #22985:
URL: https://github.com/apache/flink/pull/22985#discussion_r1307494302


##
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/Executing.java:
##
@@ -144,6 +188,17 @@ private void maybeRescale() {
 }
 }
 
+private Duration timeSinceLastRescale() {
+return Duration.between(lastRescale, Instant.now());
+}
+
+private void rescaleWhenCooldownPeriodIsOver() {

Review Comment:
   ok, thanks for pointing that out. That being said, with the period reset 
behavior (see previous comment), in the above case `maybeRescale` will be 
scheduled for 08:01:40 on the first call and for 08:01:45 on the second call. 
Do you want us to simply drop the last schedule so that only one `maybeRescale` 
is scheduled at a time?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] zentol commented on a diff in pull request #23296: [FLINK-32751][collect] Fixes race condition between close and request handling processes in CollectSinkOperatorCoordinator

2023-08-28 Thread via GitHub


zentol commented on code in PR #23296:
URL: https://github.com/apache/flink/pull/23296#discussion_r1307474170


##
flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/CollectSinkOperatorCoordinator.java:
##
@@ -83,7 +104,10 @@ public void start() throws Exception {
 @Override
 public void close() throws Exception {
 LOG.info("Closing the CollectSinkOperatorCoordinator.");
+running = false;
 this.executorService.shutdownNow();
+ongoingRequests.forEach(ft -> ft.cancel(true));

Review Comment:
   could it be that this alone is actually the solution?
   If multiple requests were already queued into the executor we weren't 
failing them anywhere.



##
flink-streaming-java/src/test/java/org/apache/flink/streaming/api/operators/collect/CollectSinkOperatorCoordinatorTest.java:
##
@@ -105,54 +102,147 @@ void testCloseAfterRequestIsReceived() throws Exception {
 }
 
 @Test
-void testServerFailure() throws Exception {
-CollectSinkOperatorCoordinator coordinator =
-new CollectSinkOperatorCoordinator(SOCKET_TIMEOUT_MILLIS);
-coordinator.start();
-
-final String versionOfFailedRequest = "version3";
-final CompletableFuture failedResponseFuture;
-try (final TestingSocketServer socketServer =
-
TestingSocketServer.createSocketServerAndInitializeCoordinator(coordinator)) {
-
-// a normal response
-final List expectedData0 = Arrays.asList(Row.of(1, "aaa"), 
Row.of(2, "bbb"));
-final CompletableFuture responseFuture0 =
-coordinator.handleCoordinationRequest(
-createRequestForServerGeneratedResponse());
-socketServer.handleRequest(expectedData0);
-assertResponseWithDefaultMetadataFromServer(responseFuture0, 
expectedData0);
+void testSuccessfulResponse() throws Exception {
+try (CollectSinkOperatorCoordinator testInstance = new 
CollectSinkOperatorCoordinator();
+final TestingSocketServer socketServer =
+
TestingSocketServer.createSocketServerAndInitializeCoordinator(
+testInstance)) {
+testInstance.start();
 
-// a normal response
-final List expectedData1 =
-Arrays.asList(Row.of(3, "ccc"), Row.of(4, "ddd"), 
Row.of(5, "eee"));
-final CompletableFuture responseFuture1 =
-coordinator.handleCoordinationRequest(
+final List expectedData = Arrays.asList(Row.of(1, "aaa"), 
Row.of(2, "bbb"));
+final CompletableFuture responseFuture =
+testInstance.handleCoordinationRequest(
 createRequestForServerGeneratedResponse());
-socketServer.handleRequest(expectedData1);
-assertResponseWithDefaultMetadataFromServer(responseFuture1, 
expectedData1);
+assertThat(responseFuture).isNotDone();
+
+socketServer.handleRequest(expectedData);
+
+assertResponseWithDefaultMetadataFromServer(responseFuture, 
expectedData);
+}
+}
+
+@Test
+void testServerSideClosingTheServerSocket() throws Exception {
+try (CollectSinkOperatorCoordinator coordinator = new 
CollectSinkOperatorCoordinator()) {
+coordinator.start();
+
+final CompletableFuture responseFuture;
+try (final TestingSocketServer socketServer =
+
TestingSocketServer.createSocketServerAndInitializeCoordinator(coordinator)) {
+final String version = "version";
+responseFuture =
+coordinator.handleCoordinationRequest(
+
createRequestForClientGeneratedResponse(version));
+assertThat(responseFuture).isNotDone();
+
+
FlinkAssertions.assertThatFuture(socketServer.handleRequestWithoutResponse())
+.eventuallySucceeds();
+}
+assertEmptyResponseGeneratedFromServer(responseFuture);
+}
+}
+
+@Test
+void testServerSideClosingTheAcceptingSocket() throws Exception {
+try (CollectSinkOperatorCoordinator coordinator = new 
CollectSinkOperatorCoordinator()) {
+coordinator.start();
+
+try (final TestingSocketServer socketServer =
+
TestingSocketServer.createSocketServerAndInitializeCoordinator(coordinator)) {
+final String version = "version";
+final CompletableFuture responseFuture =
+coordinator.handleCoordinationRequest(
+
createRequestForClientGeneratedResponse(version));
+assertThat(responseFuture).isNotDone();
+
+final CompletableFuture> dataFuture = new 
CompletableFuture<>();
+ 

[jira] [Updated] (FLINK-32961) A new FileSystemFactory that support two high available hdfs

2023-08-28 Thread Martijn Visser (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn Visser updated FLINK-32961:
---
Fix Version/s: (was: 1.18.0)

> A new FileSystemFactory that support two high available hdfs
> 
>
> Key: FLINK-32961
> URL: https://issues.apache.org/jira/browse/FLINK-32961
> Project: Flink
>  Issue Type: Improvement
>  Components: FileSystems
>Reporter: 王茂军
>Priority: Major
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I run a realtime ETL program with Flink on YARN. The ETL program sinks user 
> logs to the master HDFS and writes checkpoints to another micro HDFS. Both 
> the master HDFS and the micro HDFS are highly available.
> By default, the ETL program cannot resolve the dfs.nameservices of the micro 
> HDFS.
> I plan to write a custom org.apache.flink.core.fs.FileSystemFactory that 
> supports two or more HA HDFS clusters, so that I can sink user logs to the 
> master HDFS and save checkpoint data to the micro HDFS. A minimal sketch of 
> the idea is shown below.
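> A minimal sketch of such a factory (hypothetical: the scheme, the config-file 
> paths, and the class name are assumptions; it only illustrates the idea):
> {code:java}
> import java.io.IOException;
> import java.net.URI;
> 
> import org.apache.flink.core.fs.FileSystem;
> import org.apache.flink.core.fs.FileSystemFactory;
> import org.apache.flink.runtime.fs.hdfs.HadoopFileSystem;
> 
> /** Hypothetical factory that serves a second HA HDFS under its own scheme. */
> public class MicroHdfsFileSystemFactory implements FileSystemFactory {
> 
>     @Override
>     public String getScheme() {
>         return "microhdfs"; // assumed scheme, e.g. microhdfs://micro-ns/checkpoints
>     }
> 
>     @Override
>     public FileSystem create(URI fsUri) throws IOException {
>         // Load the micro cluster's dfs.nameservices from dedicated config
>         // files (paths are assumptions).
>         org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
>         conf.addResource(new org.apache.hadoop.fs.Path("/etc/micro-hdfs/core-site.xml"));
>         conf.addResource(new org.apache.hadoop.fs.Path("/etc/micro-hdfs/hdfs-site.xml"));
>         URI hdfsUri = URI.create("hdfs://" + fsUri.getAuthority());
>         return new HadoopFileSystem(org.apache.hadoop.fs.FileSystem.get(hdfsUri, conf));
>     }
> }
> {code}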



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32970) could not load external class using "-C" option

2023-08-28 Thread Martijn Visser (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759602#comment-17759602
 ] 

Martijn Visser commented on FLINK-32970:


[~SpongebobZ] Can you please try this with the latest versions of Flink, given 
that the community doesn't support 1.14 anymore?

> could not load external class using "-C" option
> ---
>
> Key: FLINK-32970
> URL: https://issues.apache.org/jira/browse/FLINK-32970
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core
>Affects Versions: 1.14.6
>Reporter: Spongebob
>Priority: Major
>
> Firstly, "connectors.jar" contains the "test-connector" and I put it in the 
> user libs.
> Then, I started a TableEnvironment in an operator function of the 
> StreamExecutionEnvironment.
> In the TableEnvironment I declared a table using the "test-connector".
> Finally, I ran the application and loaded "connectors.jar" using "-C 
> connectors.jar". When the table's creation statement was executed, I got a 
> class-not-found exception like the one below (please note that if I put 
> "connectors.jar" in the Flink lib directory, the application runs normally):
> {code:java}
> SLF4J: Found binding in 
> [jar:file:/data/hadoop-3.3.5/tmpdata/nm-local-dir/usercache/root/appcache/application_1690443774859_0439/filecache/13/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J:
>  Found binding in 
> [jar:file:/data/hadoop-3.3.5/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J:
>  See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.SLF4J: Actual binding is of type 
> [org.apache.logging.slf4j.Log4jLoggerFactory]java.util.concurrent.ExecutionException:
>  org.apache.flink.table.api.TableException: Failed to wait job finish
> at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) 
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>   at 
> org.apache.flink.table.api.internal.TableResultImpl.awaitInternal(TableResultImpl.java:129)
>   at 
> org.apache.flink.table.api.internal.TableResultImpl.await(TableResultImpl.java:92)
>at 
> com.xctech.cone.data.sql.model.runner.ModelRunner.executeStatementSet(ModelRunner.java:58)
>at 
> com.xctech.cone.data.versionedStarRocks.MicroBatchModelRunner.run(MicroBatchModelRunner.java:60)
>  at 
> com.xctech.cone.data.versionedStarRocks.ExecuteSQLFunction.invoke(ExecuteSQLFunction.java:103)
>at 
> com.xctech.cone.data.versionedStarRocks.ExecuteSQLFunction.invoke(ExecuteSQLFunction.java:25)
> at 
> org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:54)
>at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:82)
>  at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:57)
> at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29)
> at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:56)
>   at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:29)
>   at 
> org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)
>   at 
> org.apache.flink.streaming.api.functions.windowing.PassThroughAllWindowFunction.apply(PassThroughAllWindowFunction.java:35)
>   at 
> org.apache.flink.streaming.runtime.operators.windowing.functions.InternalSingleValueAllWindowFunction.process(InternalSingleValueAllWindowFunction.java:48)
>   at 
> org.apache.flink.streaming.runtime.operators.windowing.functions.InternalSingleValueAllWindowFunction.process(InternalSingleValueAllWindowFunction.java:34)
>   at 
> org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.emitWindowContents(WindowOperator.java:568)
> at 
> org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.onProcessingTime(WindowOperator.java:524)
>   at 
> org.apache.flink.streaming.api.operators.InternalTimerServiceImpl.onProcessingTime(InternalTimerServiceImpl.java:284)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invokeProcessingTimeCallback(StreamTask.java:1693)
>at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$null$22(StreamTask.java:1684)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
> at 
> org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)  
> at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:338)
> at 

[jira] [Closed] (FLINK-32976) NullpointException when starting flink cluster

2023-08-28 Thread Martijn Visser (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn Visser closed FLINK-32976.
--
Resolution: Cannot Reproduce

[~hackergin] Please provide info on what you've tried and how to reproduce this 
issue: there's currently not enough information to determine what the issue is.

> NullpointException when starting flink cluster
> --
>
> Key: FLINK-32976
> URL: https://issues.apache.org/jira/browse/FLINK-32976
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Configuration
>Affects Versions: 1.17.1
>Reporter: Feng Jin
>Priority: Major
>
> The error message is as follows: 
>  
>  
> {code:java}
> Caused by: java.lang.NullPointerException
> at org.apache.flink.runtime.security.token.hadoop.HadoopFSDelegationTokenProvider.getFileSystemsToAccess(HadoopFSDelegationTokenProvider.java:173) ~[flink-dist-1.17.1.jar:1.17.1]
> at org.apache.flink.runtime.security.token.hadoop.HadoopFSDelegationTokenProvider.lambda$obtainDelegationTokens$1(HadoopFSDelegationTokenProvider.java:113) ~[flink-dist-1.17.1.jar:1.17.1]
> at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_281]
> at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_281]
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) ~[flink-shaded-hadoop-3-uber-3.1.1.7.2.1.0-327-9.0.jar:3.1.1.7.2.1.0-327-9.0]
> at org.apache.flink.runtime.security.token.hadoop.HadoopFSDelegationTokenProvider.obtainDelegationTokens(HadoopFSDelegationTokenProvider.java:108) ~[flink-dist-1.17.1.jar:1.17.1]
> at org.apache.flink.runtime.security.token.DefaultDelegationTokenManager.lambda$obtainDelegationTokensAndGetNextRenewal$1(DefaultDelegationTokenManager.java:264) ~[flink-dist-1.17.1.jar:1.17.1]
> at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_281]
> at java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1628) ~[?:1.8.0_281]
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[?:1.8.0_281]
> at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_281]
> at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[?:1.8.0_281]
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_281]
> at java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:479) ~[?:1.8.0_281]
> at java.util.stream.ReferencePipeline.min(ReferencePipeline.java:520) ~[?:1.8.0_281]
> at org.apache.flink.runtime.security.token.DefaultDelegationTokenManager.obtainDelegationTokensAndGetNextRenewal(DefaultDelegationTokenManager.java:286) ~[flink-dist-1.17.1.jar:1.17.1]
> at org.apache.flink.runtime.security.token.DefaultDelegationTokenManager.obtainDelegationTokens(DefaultDelegationTokenManager.java:242) ~[flink-dist-1.17.1.jar:1.17.1]
> at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:...) ~[flink-dist-1.17.1.jar:1.17.1]
> at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:...) ~[flink-dist-1.17.1.jar:1.17.1]
> at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:232) ~[flink-dist-1.17.1.jar:1.17.1]
> at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_281]
> at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_281]
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) ~[flink-shaded-hadoop-3-uber-3.1.1.7.2.1.0-327-9.0.jar:3.1.1.7.2.1.0-327-9.0]
> at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) ~[flink-dist-1.17.1.jar:1.17.1]
> at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:229) ~[flink-dist-1.17.1.jar:1.17.1]
> ... 2 more{code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] echauchot commented on a diff in pull request #22985: [FLINK-21883][scheduler] Implement cooldown period for adaptive scheduler

2023-08-28 Thread via GitHub


echauchot commented on code in PR #22985:
URL: https://github.com/apache/flink/pull/22985#discussion_r1307494302


##
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/Executing.java:
##
@@ -144,6 +188,17 @@ private void maybeRescale() {
 }
 }
 
+private Duration timeSinceLastRescale() {
+return Duration.between(lastRescale, Instant.now());
+}
+
+private void rescaleWhenCooldownPeriodIsOver() {

Review Comment:
   ok thanks for pointing out.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (FLINK-32981) Add python dynamic Flink home detection

2023-08-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/FLINK-32981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Márton Balassi closed FLINK-32981.
--
Fix Version/s: 1.19.0
   Resolution: Fixed

63a4db2 in master

> Add python dynamic Flink home detection
> ---
>
> Key: FLINK-32981
> URL: https://issues.apache.org/jira/browse/FLINK-32981
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.19.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> During `pyflink` library compilation, the Flink home is calculated from the 
> provided `pyflink` version, which is normally something like `1.19.dev0`. In 
> such a case `.dev0` is replaced with `-SNAPSHOT`, which ends up as the 
> hardcoded home directory 
> `../../flink-dist/target/flink-1.18-SNAPSHOT-bin/flink-1.18-SNAPSHOT`. This 
> is fine as long as one uses the basic version types described 
> [here](https://peps.python.org/pep-0440/#developmental-releases). In order to 
> support any kind of `pyflink` version, one can dynamically find the Flink 
> home directory through globbing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] echauchot commented on a diff in pull request #22985: [FLINK-21883][scheduler] Implement cooldown period for adaptive scheduler

2023-08-28 Thread via GitHub


echauchot commented on code in PR #22985:
URL: https://github.com/apache/flink/pull/22985#discussion_r1307486764


##
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/Executing.java:
##
@@ -210,6 +265,15 @@ interface Context
  * @return a ScheduledFuture representing pending completion of the 
task
  */
 ScheduledFuture runIfState(State expectedState, Runnable action, 
Duration delay);
+
+/**
+ * Runs the given action immediately or after a delay depending on the 
given condition.
+ *
+ * @param condition if met, the action is executed immediately or 
scheduled otherwise
+ * @param action action to run
+ * @param delay delay after which to run the action if the condition 
is not met
+ */
+void runIfConditionOrSchedule(boolean condition, Runnable action, 
Duration delay);

Review Comment:
   Agree, thanks
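   
   For reference, a minimal sketch of what the helper could look like 
(hypothetical, not the actual patch; the `expectedState` parameter is added 
here only to keep the sketch self-contained, while the real interface binds 
the state internally):
   
   ```java
   // Hypothetical sketch: run now if the condition holds, otherwise defer via
   // the existing delayed-run primitive.
   void runIfConditionOrSchedule(
           State expectedState, boolean condition, Runnable action, Duration delay) {
       if (condition) {
           action.run();
       } else {
           runIfState(expectedState, action, delay);
       }
   }
   ```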



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] X-czh commented on a diff in pull request #23302: [FLINK-32848][tests][JUnit5 migration] Migrate flink-runtime/shuffle tests to JUnit5

2023-08-28 Thread via GitHub


X-czh commented on code in PR #23302:
URL: https://github.com/apache/flink/pull/23302#discussion_r1307486343


##
flink-runtime/src/test/java/org/apache/flink/runtime/shuffle/ShuffleMasterTest.java:
##
@@ -99,7 +95,7 @@ public void testStopTrackingPartition() throws Exception {
 EXTERNAL_PARTITION_RELEASE_EVENT,
 EXTERNAL_PARTITION_RELEASE_EVENT,
 };
-assertArrayEquals(expectedPartitionEvents, 
TestShuffleMaster.partitionEvents.toArray());
+
assertThat(TestShuffleMaster.partitionEvents.toArray()).isEqualTo(expectedPartitionEvents);

Review Comment:
   Thanks for pointing that out; I've prepared an improvement commit.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (FLINK-32971) Add pyflink proper development version support

2023-08-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/FLINK-32971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Márton Balassi closed FLINK-32971.
--
Fix Version/s: 1.19.0
   Resolution: Fixed

171524f in master

> Add pyflink proper development version support
> --
>
> Key: FLINK-32971
> URL: https://issues.apache.org/jira/browse/FLINK-32971
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.19.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> The following doc describes the python version specification: 
> https://peps.python.org/pep-0440/#developmental-releases
> Extract:
> {code:java}
> Developmental releases are also permitted for pre-releases and post-releases:
> X.YaN.devM   # Developmental release of an alpha release
> X.YbN.devM   # Developmental release of a beta release
> X.YrcN.devM  # Developmental release of a release candidate
> X.Y.postN.devM   # Developmental release of a post-release
> {code}
> At the moment pyflink supports only the `.dev0` development version.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] mbalassi merged pull request #23304: [FLINK-32971][PYTHON] Add pyflink proper development version support

2023-08-28 Thread via GitHub


mbalassi merged PR #23304:
URL: https://github.com/apache/flink/pull/23304


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] ferenc-csaky commented on pull request #23304: [FLINK-32971][PYTHON] Add pyflink proper development version support

2023-08-28 Thread via GitHub


ferenc-csaky commented on PR #23304:
URL: https://github.com/apache/flink/pull/23304#issuecomment-1695757103

   > > Any specific reason to be more lenient and not use \d+ instead?
   > 
   > I personally don't think that Flink must re-implement the versioning 
standard. I know in this case it's just changing 2 chars to a different 2, but 
the main direction is to not have a Python version check.
   
   That is a fair point, thanks!





[GitHub] [flink] wangzzu closed pull request #23311: [FLINK-32984] Display the host port information on the subtasks page of SubtaskExecution

2023-08-28 Thread via GitHub


wangzzu closed pull request #23311: [FLINK-32984] Display the host port 
information on the subtasks page of SubtaskExecution
URL: https://github.com/apache/flink/pull/23311





[GitHub] [flink] wangzzu commented on pull request #23311: [FLINK-32984] Display the host port information on the subtasks page of SubtaskExecution

2023-08-28 Thread via GitHub


wangzzu commented on PR #23311:
URL: https://github.com/apache/flink/pull/23311#issuecomment-1695750654

   Duplicate of another Jira issue.





[GitHub] [flink] gaborgsomogyi commented on a diff in pull request #23304: [FLINK-32971][PYTHON] Add pyflink proper development version support

2023-08-28 Thread via GitHub


gaborgsomogyi commented on code in PR #23304:
URL: https://github.com/apache/flink/pull/23304#discussion_r1307448863


##
flink-python/setup.py:
##
@@ -252,7 +253,7 @@ def extracted_output_files(base_dir, file_path, output_directory):
   "is complete, or do this in the flink-python directory of the flink source "
   "directory.")
 sys.exit(-1)
-if VERSION.find('dev0') != -1:
+if re.search('dev.*$', VERSION) is not None:

Review Comment:
   I see examples of both; I'll leave it like this until an additional +1 :)



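To make the leniency trade-off discussed above concrete, a small sketch comparing the pattern from the diff with the stricter `\d+` alternative; the input strings are invented for illustration:

{code:python}
import re

# Compare the lenient pattern from the diff ('dev.*$') with the stricter
# alternative raised in review ('dev\d+$'). Inputs are made up.
for v in ['1.19.dev0', '1.19.dev12', '1.19.devfoo', '1.19.0']:
    lenient = re.search(r'dev.*$', v) is not None
    strict = re.search(r'dev\d+$', v) is not None
    print('{0:12} lenient={1} strict={2}'.format(v, lenient, strict))
# Only '1.19.devfoo' separates the two: the lenient pattern accepts it,
# the strict one rejects it.
{code}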



[GitHub] [flink] flinkbot commented on pull request #23311: [FLINK-32984] Display the host port information on the subtasks page of SubtaskExecution

2023-08-28 Thread via GitHub


flinkbot commented on PR #23311:
URL: https://github.com/apache/flink/pull/23311#issuecomment-1695726241

   
   ## CI report:
   
   * 1da9de02438237d60917cc0f84a727e26af55a7d UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   





[jira] [Updated] (FLINK-32984) Display the host port information on the subtasks page of SubtaskExecution

2023-08-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-32984:
---
Labels: pull-request-available  (was: )

> Display the host port information on the subtasks page of SubtaskExecution
> --
>
> Key: FLINK-32984
> URL: https://issues.apache.org/jira/browse/FLINK-32984
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Web Frontend
>Affects Versions: 1.7.2, 1.8.0
>Reporter: Matt Wang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.19.0
>
> Attachments: image-2023-08-28-21-04-05-297.png, 
> image-2023-08-28-21-04-27-396.png
>
>
> Currently in the Flink UI, the Host in the task SubTasks page will not 
> display TM port information, but the Host in TaskManagers will display port 
> information. 
> !image-2023-08-28-21-04-05-297.png|width=850,height=144!
> !image-2023-08-28-21-04-27-396.png|width=949,height=109!
> Since multiple TMs will run on one Host, I think the port information of the 
> TM should also be displayed in SubTasks, so that it is convenient to locate 
> the TM running on the subtask





[GitHub] [flink] wangzzu opened a new pull request, #23311: [FLINK-32984] Display the host port information on the subtasks page of SubtaskExecution

2023-08-28 Thread via GitHub


wangzzu opened a new pull request, #23311:
URL: https://github.com/apache/flink/pull/23311

   
   
   ## What is the purpose of the change
   
   Display the host port information on the SubTasks page of SubtaskExecution
   
   ## Brief change log
   
   - add the port info to the host field of SubtaskExecutionAttemptDetailsInfo
   
   ## Verifying this change
   
   This change is already covered by existing tests, such as 
*SubtaskCurrentAttemptDetailsHandlerTest/testHandleRequest*.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
 - If yes, how is the feature documented? not applicable
   





[jira] [Commented] (FLINK-32758) PyFlink bounds are overly restrictive and outdated

2023-08-28 Thread Deepyaman Datta (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759585#comment-17759585
 ] 

Deepyaman Datta commented on FLINK-32758:
-

[~Sergey Nuyanzin] This looks to be related to 
[https://github.com/fastavro/fastavro/issues/701]; while we pin `cython<3` for 
PyFlink, `fastavro` is getting built separately with Cython 3. One possible 
solution is to do something like 
[https://stackoverflow.com/a/76837035/1093967], where `cython<3` is installed 
globally in the environment and used for building all of the libraries (I 
think). I'm not sure how you all feel about that, but I can try to raise a PR 
with that, if helpful. It seems the failing test is in the nightly build that 
runs a lot more checks; I'm not sure how I can verify that a potential fix 
would work if I try. Can I trigger these tests manually?

The other possibility is to check why `fastavro>=1.8.1` isn't getting picked 
up and why it's using `fastavro==1.8.0` instead. The newer versions have the 
Cython pin in their build requirements, and we wouldn't need to do a `pip 
wheel --no-build-isolation`. I can try to check this later today.
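As a quick local sanity check for the pin question above, a sketch using the `packaging` library; the specifier below is an illustrative assumption, not the exact content of dev-requirements.txt:

{code:python}
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# Check which available fastavro versions satisfy a hypothetical pin.
pin = SpecifierSet('>=1.8.1')
for v in ['1.8.0', '1.8.1', '1.8.2']:
    ok = Version(v) in pin
    print(v, 'satisfies' if ok else 'does not satisfy', pin)
# If the resolver still installs 1.8.0 despite such a pin, the constraint
# is likely coming from a different requirements source.
{code}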

> PyFlink bounds are overly restrictive and outdated
> --
>
> Key: FLINK-32758
> URL: https://issues.apache.org/jira/browse/FLINK-32758
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.17.1, 1.19.0
>Reporter: Deepyaman Datta
>Assignee: Deepyaman Datta
>Priority: Blocker
>  Labels: pull-request-available, test-stability
>
> Hi! I am part of a team building the Flink backend for Ibis 
> ([https://github.com/ibis-project/ibis]). We would like to leverage PyFlink 
> under the hood for execution; however, PyFlink's requirements are 
> incompatible with several other Ibis requirements. Beyond Ibis, PyFlink's 
> outdated and restrictive requirements prevent it from being used alongside 
> most recent releases of Python data libraries.
> Some of the major libraries we (and likely others in the Python community 
> interested in using PyFlink alongside other libraries) need compatibility 
> with:
>  * PyArrow (at least >=10.0.0, but there's no reason not to be also be 
> compatible with latest)
>  * pandas (should be compatible with 2.x series, but also probably with 
> 1.4.x, released January 2022, and 1.5.x)
>  * numpy (1.22 was released in December 2022)
>  * Newer releases of Apache Beam
>  * Newer releases of cython
> Furthermore, uncapped dependencies could be more generally preferable, as 
> they avoid the need for frequent PyFlink releases as newer versions of 
> libraries are released. A common (and great) argument for not upper-bounding 
> dependencies, especially for libraries: 
> [https://iscinumpy.dev/post/bound-version-constraints/]
> I am currently testing removing upper bounds in 
> [https://github.com/apache/flink/pull/23141]; so far, builds pass without 
> issue in 
> [b65c072|https://github.com/apache/flink/pull/23141/commits/b65c0723ed66e01e83d718f770aa916f41f34581],
>  and I'm currently waiting on 
> [c8eb15c|https://github.com/apache/flink/pull/23141/commits/c8eb15cbc371dc259fb4fda5395f0f55e08ea9c6]
>  to see if I can get PyArrow to resolve >=10.0.0. Solving the proposed 
> dependencies results in:
> {{#}}
> {{# This file is autogenerated by pip-compile with Python 3.8}}
> {{# by the following command:}}
> {{#}}
> {{#    pip-compile --config=pyproject.toml 
> --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt}}
> {{#}}
> {{apache-beam==2.49.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{avro-python3==1.10.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{certifi==2023.7.22}}
> {{    # via requests}}
> {{charset-normalizer==3.2.0}}
> {{    # via requests}}
> {{cloudpickle==2.2.1}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{crcmod==1.7}}
> {{    # via apache-beam}}
> {{cython==3.0.0}}
> {{    # via -r dev/dev-requirements.txt}}
> {{dill==0.3.1.1}}
> {{    # via apache-beam}}
> {{dnspython==2.4.1}}
> {{    # via pymongo}}
> {{docopt==0.6.2}}
> {{    # via hdfs}}
> {{exceptiongroup==1.1.2}}
> {{    # via pytest}}
> {{fastavro==1.8.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{fasteners==0.18}}
> {{    # via apache-beam}}
> {{find-libpython==0.3.0}}
> {{    # via pemja}}
> {{grpcio==1.56.2}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{    #   grpcio-tools}}
> {{grpcio-tools==1.56.2}}
> {{    # via -r dev/dev-requirements.txt}}
> {{hdfs==2.7.0}}
> {{    # via apache-beam}}
> {{httplib2==0.22.0}}
> {{    # via}}
> {{    #   -r dev/dev-requirements.txt}}
> {{    #   apache-beam}}
> {{idna==3.4}}
> {{    # via requests}}
> {{iniconfig==2.0.0}}
> {{    # via 

[GitHub] [flink] gaborgsomogyi commented on pull request #23304: [FLINK-32971][PYTHON] Add pyflink proper development version support

2023-08-28 Thread via GitHub


gaborgsomogyi commented on PR #23304:
URL: https://github.com/apache/flink/pull/23304#issuecomment-1695707098

   > Any specific reason to be more lenient and not use \d+ instead?
   
   I personally don't think that Flink must re-implement the versioning standard. I know in this case it's just changing 2 characters to a different 2, but the main direction is to not have a Python version check.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Resolved] (FLINK-32963) Make the test "testKeyedMapStateStateMigration" stable

2023-08-28 Thread Hangxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hangxiang Yu resolved FLINK-32963.
--
Fix Version/s: 1.19.0
   Resolution: Fixed

Merged 
[ab41e4f4|https://github.com/apache/flink/commit/ab41e4f4f1406ca349f0407cbbe05e0647132c81]
 into master.

> Make the test "testKeyedMapStateStateMigration" stable
> --
>
> Key: FLINK-32963
> URL: https://issues.apache.org/jira/browse/FLINK-32963
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.17.1
>Reporter: Asha Boyapati
>Assignee: Asha Boyapati
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> We are proposing to make the following test stable:
> {{org.apache.flink.runtime.state.FileStateBackendMigrationTest.testKeyedMapStateStateMigration}}
> The test is currently flaky because the order of elements returned by the 
> iterator is non-deterministic.
> The following PR fixes the flaky test by making it independent of the order 
> of elements returned by the iterator:
> [https://github.com/apache/flink/pull/23298]
> We detected this with the NonDex tool, using the following command:
> {{mvn edu.illinois:nondex-maven-plugin:2.1.1:nondex -pl flink-runtime 
> -DnondexRuns=10 
> -Dtest=org.apache.flink.runtime.state.FileStateBackendMigrationTest#testKeyedMapStateStateMigration}}
> Please see the following Continuous Integration log that shows the flakiness:
> [https://github.com/asha-boyapati/flink/actions/runs/5909136145/job/16029377793]
> Please see the following Continuous Integration log that shows that the 
> flakiness is fixed by this change:
> [https://github.com/asha-boyapati/flink/actions/runs/5909183468/job/16029467973]



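The idea behind the fix, as a minimal sketch in Python (the actual change is in the Java test; the helper below is hypothetical):

{code:python}
from collections import Counter

# An order-sensitive assertion fails when an iterator's ordering is
# non-deterministic; comparing multisets removes the flakiness while still
# catching missing or duplicated elements (unlike a plain set comparison).
def assert_same_elements(actual, expected):
    assert Counter(actual) == Counter(expected), (actual, expected)

# Two orderings of the same state entries both pass:
assert_same_elements(['k1=v1', 'k2=v2'], ['k2=v2', 'k1=v1'])
{code}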


[GitHub] [flink] masteryhx closed pull request #23298: [FLINK-32963][Test] Remove flakiness from testKeyedMapStateStateMigration

2023-08-28 Thread via GitHub


masteryhx closed pull request #23298: [FLINK-32963][Test] Remove flakiness from 
testKeyedMapStateStateMigration
URL: https://github.com/apache/flink/pull/23298





[jira] [Updated] (FLINK-32985) Support setting env.java.opts.sql-gateway to specify the jvm opts for sql gateway

2023-08-28 Thread xiangyu feng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiangyu feng updated FLINK-32985:
-
Component/s: Runtime / Configuration

> Support setting env.java.opts.sql-gateway to specify the jvm opts for sql 
> gateway
> -
>
> Key: FLINK-32985
> URL: https://issues.apache.org/jira/browse/FLINK-32985
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Configuration, Table SQL / Gateway
>Reporter: xiangyu feng
>Priority: Major
>






[jira] [Created] (FLINK-32985) Support setting env.java.opts.sql-gateway to specify the jvm opts for sql gateway

2023-08-28 Thread xiangyu feng (Jira)
xiangyu feng created FLINK-32985:


 Summary: Support setting env.java.opts.sql-gateway to specify the 
jvm opts for sql gateway
 Key: FLINK-32985
 URL: https://issues.apache.org/jira/browse/FLINK-32985
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Gateway
Reporter: xiangyu feng








[GitHub] [flink] gaborgsomogyi commented on a diff in pull request #23306: [FLINK-32981][PYTHON] Add python dynamic Flink home detection

2023-08-28 Thread via GitHub


gaborgsomogyi commented on code in PR #23306:
URL: https://github.com/apache/flink/pull/23306#discussion_r1307408172


##
flink-python/apache-flink-libraries/setup.py:
##
@@ -98,8 +98,12 @@ def find_file_path(pattern):
   file=sys.stderr)
 sys.exit(-1)
 flink_version = VERSION.replace(".dev0", "-SNAPSHOT")
-FLINK_HOME = os.path.abspath(
-"../../flink-dist/target/flink-%s-bin/flink-%s" % (flink_version, 
flink_version))
+flink_homes = glob.glob('../../flink-dist/target/flink-*-bin/flink-*')
+if len(flink_homes) != 1:
+print("More than one Flink home directory found 
{0}".format(flink_homes),

Review Comment:
   Good point, I tried it out only w/ more than one dir. Changing...






[GitHub] [flink] ferenc-csaky commented on a diff in pull request #23306: [FLINK-32981][PYTHON] Add python dynamic Flink home detection

2023-08-28 Thread via GitHub


ferenc-csaky commented on code in PR #23306:
URL: https://github.com/apache/flink/pull/23306#discussion_r1307401154


##
flink-python/apache-flink-libraries/setup.py:
##
@@ -98,8 +98,12 @@ def find_file_path(pattern):
   file=sys.stderr)
 sys.exit(-1)
 flink_version = VERSION.replace(".dev0", "-SNAPSHOT")
-FLINK_HOME = os.path.abspath(
-"../../flink-dist/target/flink-%s-bin/flink-%s" % (flink_version, 
flink_version))
+flink_homes = glob.glob('../../flink-dist/target/flink-*-bin/flink-*')
+if len(flink_homes) != 1:
+print("More than one Flink home directory found 
{0}".format(flink_homes),

Review Comment:
   Is it possible to get to this line if we did not package Flink yet? Even if 
not, I could end up cleaning up the dist target dir, so there is a possibility 
that `len(flink_homes)` will be `0` and in that case the "More than one..." 
error msg. is misleading. I think this can be rephrased in a more general way, 
maybe "Ambiguous Flink home directory found: ..." or something.



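A sketch of the detection logic with the review feedback applied, covering both the zero-match and the multi-match case; the path pattern is taken from the diff above, while the wording of the error message is an assumption:

{code:python}
import glob
import os
import sys

# Dynamic Flink home detection: exactly one flink-*-bin directory is
# expected under flink-dist/target after packaging.
flink_homes = glob.glob('../../flink-dist/target/flink-*-bin/flink-*')
if len(flink_homes) != 1:
    # Covers both 0 and >1 matches, as suggested in review.
    print('Ambiguous Flink home directory, found: {0}. Package flink-dist '
          'first and make sure exactly one flink-*-bin directory '
          'exists.'.format(flink_homes), file=sys.stderr)
    sys.exit(-1)
FLINK_HOME = os.path.abspath(flink_homes[0])
{code}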



[jira] [Created] (FLINK-32984) Display the host port information on the subtasks page of SubtaskExecution

2023-08-28 Thread Matt Wang (Jira)
Matt Wang created FLINK-32984:
-

 Summary: Display the host port information on the subtasks page of 
SubtaskExecution
 Key: FLINK-32984
 URL: https://issues.apache.org/jira/browse/FLINK-32984
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Web Frontend
Affects Versions: 1.8.0, 1.7.2
Reporter: Matt Wang
 Fix For: 1.19.0
 Attachments: image-2023-08-28-21-04-05-297.png, 
image-2023-08-28-21-04-27-396.png

Currently in the Flink UI, the Host in the task SubTasks page will not display 
TM port information, but the Host in TaskManagers will display port 
information. 

!image-2023-08-28-21-04-05-297.png|width=850,height=144!
!image-2023-08-28-21-04-27-396.png|width=949,height=109!

Since multiple TMs will run on one Host, I think the port information of the TM 
should also be displayed in SubTasks, so that it is convenient to locate the TM 
running on the subtask





[jira] [Comment Edited] (FLINK-32780) Release Testing: Verify FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-08-28 Thread dalongliu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759578#comment-17759578
 ] 

dalongliu edited comment on FLINK-32780 at 8/28/23 12:55 PM:
-

RuntimeFilter was validated against the TPC-DS test set with the 
table.optimizer.runtime-filter.enabled option enabled. The following 
validations were done separately:
1. Verified the query plans and confirmed through the plans that the 
RuntimeFilter was inserted into many queries.
2. For the whole TPC-DS dataset, the total time before enabling the runtime 
filter is 5141s and 4883s after, so the gain of the RuntimeFilter is 5%; the 
queries with significant gain are q88, q93 and q95, while other queries show 
limited gain.
3. q24 and q72 showed performance regression, especially q72. By checking the 
plan of q72 and comparing it with the plan without the RuntimeFilter, we found 
that there is a dependency between the upstream and downstream operators, 
which prevents the source node from being executed in parallel and thus causes 
the performance regression. I don't think we should insert the RuntimeFilter 
operator for this pattern, because it doesn't reduce the amount of data that 
the Join operator itself needs to process.

!image-2023-08-28-20-50-26-687.png!


was (Author: lsy):
RuntimeFilter was validated against the TPC-DS test set. Enabling the 
table.optimizer.runtime-filter.enabled option. The following validations were 
done separately:
1. verified the plan of the query, and confirmed that many queries were 
inserted into the RuntimeFilter through the plan.
2. For the whole TPC-DS dataset, before enabling runtime, total time is 5141, 
after is 4883, the gain of RuntimeFilter is 5%, the queries with significant 
gain are q88, q93, q95, other queries with limited gain.
3. q24, q72 showed performance regression, especially q72. By checking the plan 
of q72, compared with not turning on RuntimeFilter, we found that there is a 
dependency between the upstream and downstream operators, which leads to the 
source node not being able to be executed in parallel, thus leading to the 
performance regression. I don't think we should insert the RuntimeFilter 
operator for this pattern because it doesn't filter the amount of data that the 
Join operator needs to process by itself.

!image-2023-08-28-20-50-26-687.png!

> Release Testing: Verify FLIP-324: Introduce Runtime Filter for Flink Batch 
> Jobs
> ---
>
> Key: FLINK-32780
> URL: https://issues.apache.org/jira/browse/FLINK-32780
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.18.0
>Reporter: Qingsheng Ren
>Assignee: dalongliu
>Priority: Major
> Fix For: 1.18.0
>
> Attachments: image-2023-08-28-20-50-26-687.png
>
>
> This issue aims to verify FLIP-324: 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-324%3A+Introduce+Runtime+Filter+for+Flink+Batch+Jobs
> We can enable runtime filter by set: table.optimizer.runtime-filter.enabled: 
> true
> 1. Create two tables, one small table (small amount of data), one large table 
> (large amount of data), and then run join query on these two tables(such as 
> the example in FLIP doc: SELECT * FROM fact, dim WHERE x = a AND z = 2). The 
> Flink table planner should be able to obtain the statistical information of 
> these two tables (for example, Hive table), and the data volume of the small 
> table should be less than 
> "table.optimizer.runtime-filter.max-build-data-size", and the data volume of 
> the large table should be larger than 
> "table.optimizer.runtime-filter.min-probe-data-size".
> 2. Show the plan of the join query. The plan should include nodes such as 
> LocalRuntimeFilterBuilder, GlobalRuntimeFilterBuilder and RuntimeFilter. We 
> can also verify plan for the various variants of above query.
> 3. Execute the above plan, and: 
> * Check whether the data in the large table has been successfully filtered  
> * Verify the execution result, the execution result should be same with the 
> execution plan which disable runtime filter.



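For anyone reproducing the verification steps above, a hedged PyFlink sketch; the table names and the query are placeholders from the FLIP example and assume the tables already exist in the catalog:

{code:python}
from pyflink.table import EnvironmentSettings, TableEnvironment

# Enable the runtime filter and inspect the batch plan, as in the
# release-testing steps. 'fact' and 'dim' are hypothetical tables.
t_env = TableEnvironment.create(EnvironmentSettings.in_batch_mode())
t_env.get_config().set('table.optimizer.runtime-filter.enabled', 'true')

# When the filter applies, the printed plan should contain
# LocalRuntimeFilterBuilder, GlobalRuntimeFilterBuilder and RuntimeFilter.
print(t_env.explain_sql('SELECT * FROM fact, dim WHERE x = a AND z = 2'))
{code}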


[jira] [Comment Edited] (FLINK-32780) Release Testing: Verify FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-08-28 Thread dalongliu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759578#comment-17759578
 ] 

dalongliu edited comment on FLINK-32780 at 8/28/23 12:54 PM:
-

RuntimeFilter was validated against the TPC-DS test set with the 
table.optimizer.runtime-filter.enabled option enabled. The following 
validations were done separately:
1. Verified the query plans and confirmed through the plans that the 
RuntimeFilter was inserted into many queries.
2. For the whole TPC-DS dataset, the total time before enabling the runtime 
filter is 5141 and 4883 after, so the gain of the RuntimeFilter is 5%; the 
queries with significant gain are q88, q93 and q95, while other queries show 
limited gain.
3. q24 and q72 showed performance regression, especially q72. By checking the 
plan of q72 and comparing it with the plan without the RuntimeFilter, we found 
that there is a dependency between the upstream and downstream operators, 
which prevents the source node from being executed in parallel and thus causes 
the performance regression. I don't think we should insert the RuntimeFilter 
operator for this pattern, because it doesn't reduce the amount of data that 
the Join operator itself needs to process.

!image-2023-08-28-20-50-26-687.png!


was (Author: lsy):
RuntimeFilter was validated against the TPC-DS test set. Enabling the 
table.optimizer.runtime-filter.enabled option. The following validations were 
done separately:
1. verified the plan of the query, and confirmed that many queries were 
inserted into the RuntimeFilter through the plan.
2. For the whole TPC-DS dataset, the gain of RuntimeFilter is 5%, the queries 
with significant gain are q88, q93, q95, other queries with limited gain.
3. q24, q72 showed performance regression, especially q72. By checking the plan 
of q72, compared with not turning on RuntimeFilter, we found that there is a 
dependency between the upstream and downstream operators, which leads to the 
source node not being able to be executed in parallel, thus leading to the 
performance regression. I don't think we should insert the RuntimeFilter 
operator for this pattern because it doesn't filter the amount of data that the 
Join operator needs to process by itself.

!image-2023-08-28-20-50-26-687.png!

> Release Testing: Verify FLIP-324: Introduce Runtime Filter for Flink Batch 
> Jobs
> ---
>
> Key: FLINK-32780
> URL: https://issues.apache.org/jira/browse/FLINK-32780
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.18.0
>Reporter: Qingsheng Ren
>Assignee: dalongliu
>Priority: Major
> Fix For: 1.18.0
>
> Attachments: image-2023-08-28-20-50-26-687.png
>
>
> This issue aims to verify FLIP-324: 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-324%3A+Introduce+Runtime+Filter+for+Flink+Batch+Jobs
> We can enable runtime filter by set: table.optimizer.runtime-filter.enabled: 
> true
> 1. Create two tables, one small table (small amount of data), one large table 
> (large amount of data), and then run join query on these two tables(such as 
> the example in FLIP doc: SELECT * FROM fact, dim WHERE x = a AND z = 2). The 
> Flink table planner should be able to obtain the statistical information of 
> these two tables (for example, Hive table), and the data volume of the small 
> table should be less than 
> "table.optimizer.runtime-filter.max-build-data-size", and the data volume of 
> the large table should be larger than 
> "table.optimizer.runtime-filter.min-probe-data-size".
> 2. Show the plan of the join query. The plan should include nodes such as 
> LocalRuntimeFilterBuilder, GlobalRuntimeFilterBuilder and RuntimeFilter. We 
> can also verify plan for the various variants of above query.
> 3. Execute the above plan, and: 
> * Check whether the data in the large table has been successfully filtered  
> * Verify the execution result, the execution result should be same with the 
> execution plan which disable runtime filter.





[jira] [Resolved] (FLINK-32780) Release Testing: Verify FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-08-28 Thread dalongliu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dalongliu resolved FLINK-32780.
---
Resolution: Fixed

> Release Testing: Verify FLIP-324: Introduce Runtime Filter for Flink Batch 
> Jobs
> ---
>
> Key: FLINK-32780
> URL: https://issues.apache.org/jira/browse/FLINK-32780
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.18.0
>Reporter: Qingsheng Ren
>Assignee: dalongliu
>Priority: Major
> Fix For: 1.18.0
>
> Attachments: image-2023-08-28-20-50-26-687.png
>
>
> This issue aims to verify FLIP-324: 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-324%3A+Introduce+Runtime+Filter+for+Flink+Batch+Jobs
> We can enable runtime filter by set: table.optimizer.runtime-filter.enabled: 
> true
> 1. Create two tables, one small table (small amount of data), one large table 
> (large amount of data), and then run join query on these two tables(such as 
> the example in FLIP doc: SELECT * FROM fact, dim WHERE x = a AND z = 2). The 
> Flink table planner should be able to obtain the statistical information of 
> these two tables (for example, Hive table), and the data volume of the small 
> table should be less than 
> "table.optimizer.runtime-filter.max-build-data-size", and the data volume of 
> the large table should be larger than 
> "table.optimizer.runtime-filter.min-probe-data-size".
> 2. Show the plan of the join query. The plan should include nodes such as 
> LocalRuntimeFilterBuilder, GlobalRuntimeFilterBuilder and RuntimeFilter. We 
> can also verify plan for the various variants of above query.
> 3. Execute the above plan, and: 
> * Check whether the data in the large table has been successfully filtered  
> * Verify the execution result, the execution result should be same with the 
> execution plan which disable runtime filter.





[jira] [Comment Edited] (FLINK-32780) Release Testing: Verify FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-08-28 Thread dalongliu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759578#comment-17759578
 ] 

dalongliu edited comment on FLINK-32780 at 8/28/23 12:50 PM:
-

RuntimeFilter was validated against the TPC-DS test set with the 
table.optimizer.runtime-filter.enabled option enabled. The following 
validations were done separately:
1. Verified the query plans and confirmed through the plans that the 
RuntimeFilter was inserted into many queries.
2. For the whole TPC-DS dataset, the gain of the RuntimeFilter is 5%; the 
queries with significant gain are q88, q93 and q95, while other queries show 
limited gain.
3. q24 and q72 showed performance regression, especially q72. By checking the 
plan of q72 and comparing it with the plan without the RuntimeFilter, we found 
that there is a dependency between the upstream and downstream operators, 
which prevents the source node from being executed in parallel and thus causes 
the performance regression. I don't think we should insert the RuntimeFilter 
operator for this pattern, because it doesn't reduce the amount of data that 
the Join operator itself needs to process.

!image-2023-08-28-20-50-26-687.png!


was (Author: lsy):
RuntimeFilter was validated against the TPC-DS test set. Enabling the 
table.optimizer.runtime-filter.enabled option. The following validations were 
done separately:
1. verified the plan of the query, and confirmed that many queries were 
inserted into the RuntimeFilter through the plan.
2. For the whole TPC-DS dataset, the gain of RuntimeFilter is 5%, the queries 
with significant gain are q88, q93, q95, other queries with limited gain.
3. q24, q72 showed performance regression, especially q72. By checking the plan 
of q72, compared with not turning on RuntimeFilter, we found that there is a 
dependency between the upstream and downstream operators, which leads to the 
source node not being able to be executed in parallel, thus leading to the 
performance regression. I don't think we should insert the RuntimeFilter 
operator for this pattern because it doesn't filter the amount of data that the 
Join operator needs to process by itself.

!https://alidocs.oss-cn-zhangjiakou.aliyuncs.com/res/8oLl97DWKaLzqapY/img/fb011d3a-0e16-41e4-90eb-24d5be7b509e.png#255!

> Release Testing: Verify FLIP-324: Introduce Runtime Filter for Flink Batch 
> Jobs
> ---
>
> Key: FLINK-32780
> URL: https://issues.apache.org/jira/browse/FLINK-32780
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.18.0
>Reporter: Qingsheng Ren
>Assignee: dalongliu
>Priority: Major
> Fix For: 1.18.0
>
> Attachments: image-2023-08-28-20-50-26-687.png
>
>
> This issue aims to verify FLIP-324: 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-324%3A+Introduce+Runtime+Filter+for+Flink+Batch+Jobs
> We can enable runtime filter by set: table.optimizer.runtime-filter.enabled: 
> true
> 1. Create two tables, one small table (small amount of data), one large table 
> (large amount of data), and then run join query on these two tables(such as 
> the example in FLIP doc: SELECT * FROM fact, dim WHERE x = a AND z = 2). The 
> Flink table planner should be able to obtain the statistical information of 
> these two tables (for example, Hive table), and the data volume of the small 
> table should be less than 
> "table.optimizer.runtime-filter.max-build-data-size", and the data volume of 
> the large table should be larger than 
> "table.optimizer.runtime-filter.min-probe-data-size".
> 2. Show the plan of the join query. The plan should include nodes such as 
> LocalRuntimeFilterBuilder, GlobalRuntimeFilterBuilder and RuntimeFilter. We 
> can also verify plan for the various variants of above query.
> 3. Execute the above plan, and: 
> * Check whether the data in the large table has been successfully filtered  
> * Verify the execution result, the execution result should be same with the 
> execution plan which disable runtime filter.





[jira] [Commented] (FLINK-32780) Release Testing: Verify FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

2023-08-28 Thread dalongliu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759578#comment-17759578
 ] 

dalongliu commented on FLINK-32780:
---

RuntimeFilter was validated against the TPC-DS test set with the 
table.optimizer.runtime-filter.enabled option enabled. The following 
validations were done separately:
1. Verified the query plans and confirmed through the plans that the 
RuntimeFilter was inserted into many queries.
2. For the whole TPC-DS dataset, the gain of the RuntimeFilter is 5%; the 
queries with significant gain are q88, q93 and q95, while other queries show 
limited gain.
3. q24 and q72 showed performance regression, especially q72. By checking the 
plan of q72 and comparing it with the plan without the RuntimeFilter, we found 
that there is a dependency between the upstream and downstream operators, 
which prevents the source node from being executed in parallel and thus causes 
the performance regression. I don't think we should insert the RuntimeFilter 
operator for this pattern, because it doesn't reduce the amount of data that 
the Join operator itself needs to process.

!https://alidocs.oss-cn-zhangjiakou.aliyuncs.com/res/8oLl97DWKaLzqapY/img/fb011d3a-0e16-41e4-90eb-24d5be7b509e.png#255!

> Release Testing: Verify FLIP-324: Introduce Runtime Filter for Flink Batch 
> Jobs
> ---
>
> Key: FLINK-32780
> URL: https://issues.apache.org/jira/browse/FLINK-32780
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.18.0
>Reporter: Qingsheng Ren
>Assignee: dalongliu
>Priority: Major
> Fix For: 1.18.0
>
>
> This issue aims to verify FLIP-324: 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-324%3A+Introduce+Runtime+Filter+for+Flink+Batch+Jobs
> We can enable runtime filter by set: table.optimizer.runtime-filter.enabled: 
> true
> 1. Create two tables, one small table (small amount of data), one large table 
> (large amount of data), and then run join query on these two tables(such as 
> the example in FLIP doc: SELECT * FROM fact, dim WHERE x = a AND z = 2). The 
> Flink table planner should be able to obtain the statistical information of 
> these two tables (for example, Hive table), and the data volume of the small 
> table should be less than 
> "table.optimizer.runtime-filter.max-build-data-size", and the data volume of 
> the large table should be larger than 
> "table.optimizer.runtime-filter.min-probe-data-size".
> 2. Show the plan of the join query. The plan should include nodes such as 
> LocalRuntimeFilterBuilder, GlobalRuntimeFilterBuilder and RuntimeFilter. We 
> can also verify plan for the various variants of above query.
> 3. Execute the above plan, and: 
> * Check whether the data in the large table has been successfully filtered  
> * Verify the execution result, the execution result should be same with the 
> execution plan which disable runtime filter.





[GitHub] [flink] ferenc-csaky commented on a diff in pull request #23304: [FLINK-32971][PYTHON] Add pyflink proper development version support

2023-08-28 Thread via GitHub


ferenc-csaky commented on code in PR #23304:
URL: https://github.com/apache/flink/pull/23304#discussion_r1307380348


##
flink-python/setup.py:
##
@@ -252,7 +253,7 @@ def extracted_output_files(base_dir, file_path, output_directory):
   "is complete, or do this in the flink-python directory of the flink source "
   "directory.")
 sys.exit(-1)
-if VERSION.find('dev0') != -1:
+if re.search('dev.*$', VERSION) is not None:

Review Comment:
   nit: Probably this can be argued, so feel free to ignore, but in general I think `if re.search(...):` (without the `is not None`) is more readable. (And objectively more concise :)).






[jira] [Updated] (FLINK-18445) Short circuit join condition for lookup join

2023-08-28 Thread lincoln lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lincoln lee updated FLINK-18445:

Priority: Minor  (was: Not a Priority)

> Short circuit join condition for lookup join
> 
>
> Key: FLINK-18445
> URL: https://issues.apache.org/jira/browse/FLINK-18445
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Runtime
>Reporter: Rui Li
>Assignee: lincoln lee
>Priority: Minor
>  Labels: auto-deprioritized-major, auto-deprioritized-minor
> Fix For: 1.19.0
>
>
> Consider the following query:
> {code}
> select *
> from probe
> left join
> build for system_time as of probe.ts
> on probe.key=build.key and probe.col is not null
> {code}
> In the current implementation, we look up each probe.key in build to decide 
> whether a match is found. A possible optimization is to skip the lookup for 
> rows whose {{col}} is null.



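To illustrate the proposed short circuit, a hedged Python sketch (the function and names below are invented; Flink's actual lookup-join operator is implemented in Java):

{code:python}
# Evaluate the non-lookup part of the join condition first and skip the
# potentially expensive external lookup when it already fails.
def lookup_join_row(probe_row, lookup_dim):
    # 'probe.col is not null' can be decided without touching the build side.
    if probe_row['col'] is None:
        return None  # a left join would emit probe_row padded with nulls
    return lookup_dim(probe_row['key'])

# Usage with a toy dimension "table":
dim = {'k1': {'key': 'k1', 'val': 42}}
print(lookup_join_row({'key': 'k1', 'col': 1}, dim.get))     # lookup happens
print(lookup_join_row({'key': 'k1', 'col': None}, dim.get))  # lookup skipped
{code}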

