[jira] [Updated] (FLINK-34751) RestClusterClient APIs doesn't work with running Flink application on YARN

2024-03-19 Thread Venkata krishnan Sowrirajan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata krishnan Sowrirajan updated FLINK-34751:

Description: 
Apache YARN uses a web proxy in the Resource Manager to expose the endpoints 
served by the AM process (in this case, the RestServerEndpoint that runs as 
part of the AM). Note: this is in the context of running a Flink cluster in 
YARN application mode.

For example, in the case of RestClusterClient#listJobs:

Standalone {{listJobs}} makes the request as 
{{https://<jm-host>:<jm-port>/v1/jobs/overview}}

On YARN, the same request has to be proxied as 
{{https://<rm-host>:<rm-port>/proxy/<app-id>/v1/jobs/overview?proxyapproved=true}}

  was:
Apache YARN uses a web proxy in the Resource Manager to expose the endpoints 
served by the AM process (in this case, the RestServerEndpoint that runs as 
part of the AM). Note: this is in the context of running a Flink cluster in 
YARN application mode.

For example, in the case of RestClusterClient#listJobs:

Standalone {{listJobs}} makes the request as 
{{https://<jm-host>:<jm-port>/v1/jobs/overview}}

On YARN, the same request has to be proxied as 
{{https://<rm-host>:<rm-port>/proxy/<app-id>/v1/jobs/overview?proxyapproved=true}}


> RestClusterClient APIs doesn't work with running Flink application on YARN
> --
>
> Key: FLINK-34751
> URL: https://issues.apache.org/jira/browse/FLINK-34751
> Project: Flink
>  Issue Type: Bug
>Reporter: Venkata krishnan Sowrirajan
>Priority: Major
>
> Apache YARN uses a web proxy in the Resource Manager to expose the endpoints 
> served by the AM process (in this case, the RestServerEndpoint that runs as 
> part of the AM). Note: this is in the context of running a Flink cluster in 
> YARN application mode.
> For example, in the case of RestClusterClient#listJobs:
> Standalone {{listJobs}} makes the request as 
> {{https://<jm-host>:<jm-port>/v1/jobs/overview}}
> On YARN, the same request has to be proxied as 
> {{https://<rm-host>:<rm-port>/proxy/<app-id>/v1/jobs/overview?proxyapproved=true}}
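
For illustration, a minimal sketch of the two URL shapes above (all host names, ports, and the application id are hypothetical placeholders, not values from a real deployment):

{code:java}
// Minimal sketch of the two URL shapes described above.
public class YarnProxyUrlSketch {

    // Standalone: the REST endpoint is reached directly on the JobManager.
    static String standaloneListJobsUrl(String jmHost, int jmPort) {
        return String.format("https://%s:%d/v1/jobs/overview", jmHost, jmPort);
    }

    // YARN application mode: the same call must go through the RM web proxy,
    // scoped to the application id and carrying the proxy-approval flag.
    static String yarnProxiedListJobsUrl(String rmHost, int rmPort, String appId) {
        return String.format(
                "https://%s:%d/proxy/%s/v1/jobs/overview?proxyapproved=true",
                rmHost, rmPort, appId);
    }

    public static void main(String[] args) {
        System.out.println(standaloneListJobsUrl("jobmanager.example.com", 8081));
        System.out.println(
                yarnProxiedListJobsUrl("rm.example.com", 8088, "application_1710000000000_0001"));
    }
}
{code}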



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34751) RestClusterClient APIs doesn't work with running Flink application on YARN

2024-03-19 Thread Venkata krishnan Sowrirajan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata krishnan Sowrirajan updated FLINK-34751:

Description: 
Apache YARN uses a web proxy in the Resource Manager to expose the endpoints 
served by the AM process (in this case, the RestServerEndpoint that runs as 
part of the AM). Note: this is in the context of running a Flink cluster in 
YARN application mode.

For example, in the case of RestClusterClient#listJobs:

Standalone {{listJobs}} makes the request as 
{{https://<jm-host>:<jm-port>/v1/jobs/overview}}

On YARN, the same request has to be proxied as 
{{https://<rm-host>:<rm-port>/proxy/<app-id>/v1/jobs/overview?proxyapproved=true}}

  was:Apache YARN uses a web proxy in the Resource Manager to expose the endpoints 
served by the AM process (in this case, the RestServerEndpoint that runs as 
part of the AM). Note: this is in the context of running a Flink cluster in 
YARN application mode.


> RestClusterClient APIs doesn't work with running Flink application on YARN
> --
>
> Key: FLINK-34751
> URL: https://issues.apache.org/jira/browse/FLINK-34751
> Project: Flink
>  Issue Type: Bug
>Reporter: Venkata krishnan Sowrirajan
>Priority: Major
>
> Apache YARN uses a web proxy in the Resource Manager to expose the endpoints 
> served by the AM process (in this case, the RestServerEndpoint that runs as 
> part of the AM). Note: this is in the context of running a Flink cluster in 
> YARN application mode.
>  
> For example, in the case of RestClusterClient#listJobs:
> Standalone {{listJobs}} makes the request as 
> {{https://<jm-host>:<jm-port>/v1/jobs/overview}}
> On YARN, the same request has to be proxied as 
> {{https://<rm-host>:<rm-port>/proxy/<app-id>/v1/jobs/overview?proxyapproved=true}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34751) RestClusterClient APIs doesn't work with running Flink application on YARN

2024-03-19 Thread Venkata krishnan Sowrirajan (Jira)
Venkata krishnan Sowrirajan created FLINK-34751:
---

 Summary: RestClusterClient APIs doesn't work with running Flink 
application on YARN
 Key: FLINK-34751
 URL: https://issues.apache.org/jira/browse/FLINK-34751
 Project: Flink
  Issue Type: Bug
Reporter: Venkata krishnan Sowrirajan


Apache YARN uses a web proxy in the Resource Manager to expose the endpoints 
served by the AM process (in this case, the RestServerEndpoint that runs as 
part of the AM). Note: this is in the context of running a Flink cluster in 
YARN application mode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-34734) Translate directory and page titles of Flink CDC docs to Chinese

2024-03-19 Thread Leonard Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonard Xu resolved FLINK-34734.

Resolution: Implemented

flink-cdc master: 8fdd151a0606f8bb707ce11b969bdb62f22c7182

> Translate directory and page titles of Flink CDC docs to Chinese
> 
>
> Key: FLINK-34734
> URL: https://issues.apache.org/jira/browse/FLINK-34734
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation, Flink CDC
>Affects Versions: 3.1.0
>Reporter: LvYanquan
>Assignee: LvYanquan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0
>
>
> The titles are used to build directory and document names.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-34734) Translate directory and page titles of Flink CDC docs to Chinese

2024-03-19 Thread Leonard Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonard Xu reassigned FLINK-34734:
--

Assignee: LvYanquan

> Translate directory and page titles of Flink CDC docs to Chinese
> 
>
> Key: FLINK-34734
> URL: https://issues.apache.org/jira/browse/FLINK-34734
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation, Flink CDC
>Affects Versions: 3.1.0
>Reporter: LvYanquan
>Assignee: LvYanquan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0
>
>
> The titles are used to build directory and document names.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-34734][cdc][doc] update the title of content.zh to Chinese. [flink-cdc]

2024-03-19 Thread via GitHub


leonardBang merged PR #3172:
URL: https://github.com/apache/flink-cdc/pull/3172


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-34750) "legacy-flink-cdc-sources" Page of DB2 for Flink CDC Chinese Documentation.

2024-03-19 Thread LvYanquan (Jira)
LvYanquan created FLINK-34750:
-

 Summary: "legacy-flink-cdc-sources" Page of DB2 for Flink CDC 
Chinese Documentation.
 Key: FLINK-34750
 URL: https://issues.apache.org/jira/browse/FLINK-34750
 Project: Flink
  Issue Type: Sub-task
  Components: chinese-translation, Documentation, Flink CDC
Affects Versions: cdc-3.1.0
Reporter: LvYanquan
 Fix For: cdc-3.1.0


Translate the legacy-flink-cdc-sources page 
https://github.com/apache/flink-cdc/blob/master/docs/content/docs/connectors/legacy-flink-cdc-sources/postgres-cdc.md
into Chinese.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34749) "legacy-flink-cdc-sources" Page of SQLServer for Flink CDC Chinese Documentation.

2024-03-19 Thread LvYanquan (Jira)
LvYanquan created FLINK-34749:
-

 Summary: "legacy-flink-cdc-sources" Page of SQLServer for Flink 
CDC Chinese Documentation.
 Key: FLINK-34749
 URL: https://issues.apache.org/jira/browse/FLINK-34749
 Project: Flink
  Issue Type: Sub-task
  Components: chinese-translation, Documentation, Flink CDC
Affects Versions: cdc-3.1.0
Reporter: LvYanquan
 Fix For: cdc-3.1.0


Translate the legacy-flink-cdc-sources page 
https://github.com/apache/flink-cdc/blob/master/docs/content/docs/connectors/legacy-flink-cdc-sources/sqlserver-cdc.md
into Chinese.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34748) "legacy-flink-cdc-sources" Page of Oracle for Flink CDC Chinese Documentation.

2024-03-19 Thread LvYanquan (Jira)
LvYanquan created FLINK-34748:
-

 Summary: "legacy-flink-cdc-sources" Page of Oracle for Flink CDC 
Chinese Documentation.
 Key: FLINK-34748
 URL: https://issues.apache.org/jira/browse/FLINK-34748
 Project: Flink
  Issue Type: Sub-task
  Components: chinese-translation, Documentation, Flink CDC
Affects Versions: cdc-3.1.0
Reporter: LvYanquan
 Fix For: cdc-3.1.0


Translate the legacy-flink-cdc-sources page 
https://github.com/apache/flink-cdc/blob/master/docs/content/docs/connectors/legacy-flink-cdc-sources/oracle-cdc.md
into Chinese.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34747) "legacy-flink-cdc-sources" Page of DB2 for Flink CDC Chinese Documentation.

2024-03-19 Thread LvYanquan (Jira)
LvYanquan created FLINK-34747:
-

 Summary: "legacy-flink-cdc-sources" Page of DB2 for Flink CDC 
Chinese Documentation.
 Key: FLINK-34747
 URL: https://issues.apache.org/jira/browse/FLINK-34747
 Project: Flink
  Issue Type: Sub-task
  Components: chinese-translation, Documentation, Flink CDC
Affects Versions: cdc-3.1.0
Reporter: LvYanquan
 Fix For: cdc-3.1.0


Translate the legacy-flink-cdc-sources page 
https://github.com/apache/flink-cdc/blob/master/docs/content/docs/connectors/legacy-flink-cdc-sources/db2-cdc.md
into Chinese.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34740) "legacy-flink-cdc-sources" Pages for Flink CDC Chinese Documentation existed in 2.x version.

2024-03-19 Thread LvYanquan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LvYanquan updated FLINK-34740:
--
Description: 
Translate legacy-flink-cdc-sources pages of 
[https://github.com/apache/flink-cdc/tree/master/docs/content/docs/connectors/legacy-flink-cdc-sources]
 into Chinese.

This includes the legacy MySQL/MongoDB/OceanBase sources.

  was:Translate legacy-flink-cdc-sources pages of 
[https://github.com/apache/flink-cdc/tree/master/docs/content/docs/connectors/legacy-flink-cdc-sources]
 into Chinese.


> "legacy-flink-cdc-sources" Pages for Flink CDC Chinese Documentation existed 
> in 2.x version.
> 
>
> Key: FLINK-34740
> URL: https://issues.apache.org/jira/browse/FLINK-34740
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation, Flink CDC
>Affects Versions: cdc-3.1.0
>Reporter: LvYanquan
>Priority: Minor
> Fix For: cdc-3.1.0
>
>
> Translate legacy-flink-cdc-sources pages of 
> [https://github.com/apache/flink-cdc/tree/master/docs/content/docs/connectors/legacy-flink-cdc-sources]
>  into Chinese.
> This includes the legacy MySQL/MongoDB/OceanBase sources.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34702) Rank should not convert to StreamExecDuplicate when the input is not insert only

2024-03-19 Thread Jacky Lau (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828590#comment-17828590
 ] 

Jacky Lau commented on FLINK-34702:
---

We have three proposed solutions:

*Solution 1*: Add a switch parameter, enabled by default. When a compilation 
error occurs, the switch is turned off based on the error message. However, 
this is noticeable to the user.

*Solution 2*: Make the deduplication operator support changelog messages. 
Currently, the deduplication operator is divided into eight combinations: 
processing time / event time, first row / last row, mini-batch / 
non-mini-batch. Supporting changelog would effectively double these 
combinations to sixteen, significantly complicating the code logic. 
Furthermore, taking processing time's first row as an example, supporting 
changelog would require recording all historical data, which could severely 
degrade performance. Additionally, the deduplication operator is essentially a 
special case of the Rank operator, but simply adapting the logic to use Rank 
isn't feasible either, as the Rank operator currently does not support 
mini-batch. This change would be imperceptible to users.

*Solution 3*: An improvement over Solution 1. Since the logical-to-physical 
stage cannot detect whether there are changelog messages, the transformation 
could be done during the physical rewrite phase, which comes after the 
FlinkChangelogModeInferenceProgram.

After discussing with [~lincoln.86xy] offline, we agree on *Solution 3*.

> Rank should not convert to StreamExecDuplicate when the input is not insert 
> only
> 
>
> Key: FLINK-34702
> URL: https://issues.apache.org/jira/browse/FLINK-34702
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.20.0
>Reporter: Jacky Lau
>Priority: Major
> Fix For: 1.20.0
>
>
> {code:java}
> @Test
> def testSimpleFirstRowOnBuiltinProctime1(): Unit = {
>   val sqlQuery =
> """
>   |SELECT *
>   |FROM (
>   |  SELECT *,
>   |ROW_NUMBER() OVER (PARTITION BY a ORDER BY PROCTIME() ASC) as rowNum
>   |  FROM (select a, count(b) as b from MyTable group by a)
>   |)
>   |WHERE rowNum = 1
> """.stripMargin
>   util.verifyExecPlan(sqlQuery)
> } {code}
> Exception:
> org.apache.flink.table.api.TableException: StreamPhysicalDeduplicate doesn't 
> support consuming update changes which is produced by node 
> GroupAggregate(groupBy=[a], select=[a, COUNT(b) AS b])
> This happens because StreamPhysicalDeduplicate cannot consume update changes 
> yet, while StreamExecRank can,
> so we should not convert the FlinkLogicalRank to StreamPhysicalDeduplicate in 
> this case. We can determine whether the input contains update changes in the 
> "optimize the physical plan" phase,
> so we can add an option to solve it, and once StreamPhysicalDeduplicate 
> supports consuming update changes, we can deprecate the option.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34740) "legacy-flink-cdc-sources" Pages for Flink CDC Chinese Documentation existed in 2.x version.

2024-03-19 Thread LvYanquan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LvYanquan updated FLINK-34740:
--
Summary: "legacy-flink-cdc-sources" Pages for Flink CDC Chinese 
Documentation existed in 2.x version.  (was: "legacy-flink-cdc-sources" Pages 
for Flink CDC Chinese Documentation)

> "legacy-flink-cdc-sources" Pages for Flink CDC Chinese Documentation existed 
> in 2.x version.
> 
>
> Key: FLINK-34740
> URL: https://issues.apache.org/jira/browse/FLINK-34740
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation, Flink CDC
>Affects Versions: cdc-3.1.0
>Reporter: LvYanquan
>Priority: Minor
> Fix For: cdc-3.1.0
>
>
> Translate legacy-flink-cdc-sources pages of 
> [https://github.com/apache/flink-cdc/tree/master/docs/content/docs/connectors/legacy-flink-cdc-sources]
>  into Chinese.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-34743][autoscaler] Memory tuning takes effect even if the parallelism isn't changed [flink-kubernetes-operator]

2024-03-19 Thread via GitHub


1996fanrui commented on PR #799:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/799#issuecomment-2008613406

   Thanks @mxm for the quick review and valuable suggestion!
   
   I originally planned to add a memory scaling threshold in a subsequent PR 
(it would be better to get it done in 1.9.0). If you think it's needed in this 
PR, let me do it soon.
   
   How about reusing the `AutoScalerOptions#TARGET_UTILIZATION_BOUNDARY` 
option? It's 0.3 by default, meaning we would ignore the tuning when the total 
memory saving is less than 30%.
   
   I prefer reusing the option because it lowers the barrier for users. WDYT?
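
   For concreteness, a sketch of the guard that reusing the option would imply 
(the class, method name, and signature below are hypothetical, not the 
operator's actual code):

   ```java
   // Illustrative sketch: skip memory tuning when the relative saving is below
   // the configured boundary (TARGET_UTILIZATION_BOUNDARY, 0.3 by default).
   public final class MemoryTuningGuard {

       static boolean shouldApplyMemoryTuning(
               long currentTotalBytes, long tunedTotalBytes, double boundary) {
           if (currentTotalBytes <= 0 || tunedTotalBytes >= currentTotalBytes) {
               return false; // no saving at all
           }
           double relativeSaving =
                   (double) (currentTotalBytes - tunedTotalBytes) / currentTotalBytes;
           // With boundary = 0.3, tunings saving less than 30% are ignored.
           return relativeSaving >= boundary;
       }

       public static void main(String[] args) {
           long current = 4L << 30; // 4 GiB
           System.out.println(shouldApplyMemoryTuning(current, 3L << 30, 0.3)); // 25% -> false
           System.out.println(shouldApplyMemoryTuning(current, 2L << 30, 0.3)); // 50% -> true
       }
   }
   ```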


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34734][cdc][doc] update the title of content.zh to Chinese. [flink-cdc]

2024-03-19 Thread via GitHub


lvyanquan commented on PR #3172:
URL: https://github.com/apache/flink-cdc/pull/3172#issuecomment-2008603814

   @leonardBang @PatrickRen  PTAL.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-34734) Translate directory and page titles of Flink CDC docs to Chinese

2024-03-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-34734:
---
Labels: pull-request-available  (was: )

> Translate directory and page titles of Flink CDC docs to Chinese
> 
>
> Key: FLINK-34734
> URL: https://issues.apache.org/jira/browse/FLINK-34734
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation, Flink CDC
>Affects Versions: 3.1.0
>Reporter: LvYanquan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0
>
>
> The titles are used to build directory and document names.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [FLINK-34734][cdc][doc] update the title of content.zh to Chinese. [flink-cdc]

2024-03-19 Thread via GitHub


lvyanquan opened a new pull request, #3172:
URL: https://github.com/apache/flink-cdc/pull/3172

   Display page:
   https://github.com/apache/flink-cdc/assets/38547014/a005559c-7c25-4be5-9a90-bc4a8da13779
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-34667) Changelog state backend support local rescaling

2024-03-19 Thread Yanfei Lei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828573#comment-17828573
 ] 

Yanfei Lei commented on FLINK-34667:


Merged via 501de48

> Changelog state backend support local rescaling
> ---
>
> Key: FLINK-34667
> URL: https://issues.apache.org/jira/browse/FLINK-34667
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.20.0
>Reporter: Yanfei Lei
>Assignee: Yanfei Lei
>Priority: Major
>  Labels: pull-request-available
>
> FLINK-33341 uses the available local keyed state for rescaling; this will 
> cause the changelog state backend to incorrectly treat part of the local 
> state as the complete local state.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34667) Changelog state backend support local rescaling

2024-03-19 Thread Yanfei Lei (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanfei Lei closed FLINK-34667.
--

> Changelog state backend support local rescaling
> ---
>
> Key: FLINK-34667
> URL: https://issues.apache.org/jira/browse/FLINK-34667
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.20.0
>Reporter: Yanfei Lei
>Assignee: Yanfei Lei
>Priority: Major
>  Labels: pull-request-available
>
> FLINK-33341 uses the available local keyed state for rescaling; this will 
> cause the changelog state backend to incorrectly treat part of the local 
> state as the complete local state.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-34667) Changelog state backend support local rescaling

2024-03-19 Thread Yanfei Lei (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanfei Lei resolved FLINK-34667.

Resolution: Fixed

> Changelog state backend support local rescaling
> ---
>
> Key: FLINK-34667
> URL: https://issues.apache.org/jira/browse/FLINK-34667
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.20.0
>Reporter: Yanfei Lei
>Assignee: Yanfei Lei
>Priority: Major
>  Labels: pull-request-available
>
> FLINK-33341 uses the available local keyed state for rescaling; this will 
> cause the changelog state backend to incorrectly treat part of the local 
> state as the complete local state.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-34667][state/changelog] Changelog state backend supports local rescaling [flink]

2024-03-19 Thread via GitHub


fredia merged PR #24516:
URL: https://github.com/apache/flink/pull/24516


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34732][cdc][ci] Add document dead link check for Flink CDC Documentation [flink-cdc]

2024-03-19 Thread via GitHub


GOODBOY008 commented on code in PR #3168:
URL: https://github.com/apache/flink-cdc/pull/3168#discussion_r1531451850


##
.dlc.json:
##
@@ -0,0 +1,44 @@
+{
+  "ignorePatterns": [
+    {
+      "pattern": "^http://localhost"
+    },
+    {
+      "pattern": "^#"
+    },
+    {
+      "pattern": "^{"
+    },
+    {
+      "pattern": "^https://repo1.maven.org/maven2/org/apache/flink.*SNAPSHOT.*"
+    },
+    {
+      "pattern": "^https://mvnrepository.com"
+    },
+    {
+      "pattern": "^https://img.shields.io"
+    },
+    {
+      "pattern": "^https://tokei.rs"
+    },
+    {
+      "pattern": "^https://json.org/"
+    },
+    {
+      "pattern": "^https://opencollective.com"
+    },
+    {
+      "pattern": "^https://twitter.com*"
+    }

Review Comment:
   I will polish this config.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-34746) Switching to the Apache CDN for Dockerfile

2024-03-19 Thread lincoln lee (Jira)
lincoln lee created FLINK-34746:
---

 Summary: Switching to the Apache CDN for Dockerfile
 Key: FLINK-34746
 URL: https://issues.apache.org/jira/browse/FLINK-34746
 Project: Flink
  Issue Type: Improvement
  Components: flink-docker
Reporter: lincoln lee


While publishing the official image, we received some comments suggesting a 
switch to the Apache CDN.

See

https://github.com/docker-library/official-images/pull/16114

https://github.com/docker-library/official-images/pull/16430

Reason for switching: [https://apache.org/history/mirror-history.html] (also 
[https://www.apache.org/dyn/closer.cgi] and [https://www.apache.org/mirrors])



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-34732][cdc][ci] Add document dead link check for Flink CDC Documentation [flink-cdc]

2024-03-19 Thread via GitHub


GOODBOY008 commented on code in PR #3168:
URL: https://github.com/apache/flink-cdc/pull/3168#discussion_r1531450125


##
docs/content.zh/docs/connectors/legacy-flink-cdc-sources/db2-cdc.md:
##
@@ -32,9 +32,9 @@ describes how to setup the db2 CDC connector to run SQL 
queries against Db2 data
 
 ## Supported Databases
 
-| Connector | Database | Driver |
-|-----------|----------|--------|
-| [Db2-cdc](../db2-cdc) |  [Db2](https://www.ibm.com/products/db2): 11.5 | Db2 Driver: 11.5.0.0 |
+| Connector | Database | Driver |
+|-----------|----------|--------|
+| Db2-cdc   |  [Db2](https://www.ibm.com/products/db2): 11.5 | Db2 Driver: 11.5.0.0 |

Review Comment:
   This just links to the current page, so I think there is no need to add this link.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-34744) autoscaling-dynamic cannot run

2024-03-19 Thread Rui Fan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828571#comment-17828571
 ] 

Rui Fan commented on FLINK-34744:
-

Merged to main(1.9.0) via: b584b08806c7e8366519acdb92bfb4725faaebba

> autoscaling-dynamic cannot run
> --
>
> Key: FLINK-34744
> URL: https://issues.apache.org/jira/browse/FLINK-34744
> Project: Flink
>  Issue Type: Bug
>  Components: Autoscaler
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: kubernetes-operator-1.9.0
>
> Attachments: image-2024-03-19-21-46-15-530.png
>
>
> autoscaling-dynamic cannot run on my Mac
>  !image-2024-03-19-21-46-15-530.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-34744) autoscaling-dynamic cannot run

2024-03-19 Thread Rui Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Fan resolved FLINK-34744.
-
Resolution: Fixed

> autoscaling-dynamic cannot run
> --
>
> Key: FLINK-34744
> URL: https://issues.apache.org/jira/browse/FLINK-34744
> Project: Flink
>  Issue Type: Bug
>  Components: Autoscaler
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: kubernetes-operator-1.9.0
>
> Attachments: image-2024-03-19-21-46-15-530.png
>
>
> autoscaling-dynamic cannot run on my Mac
>  !image-2024-03-19-21-46-15-530.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-34732][cdc][ci] Add document dead link check for Flink CDC Documentation [flink-cdc]

2024-03-19 Thread via GitHub


PatrickRen commented on code in PR #3168:
URL: https://github.com/apache/flink-cdc/pull/3168#discussion_r1531445066


##
docs/content.zh/docs/connectors/legacy-flink-cdc-sources/db2-cdc.md:
##
@@ -32,9 +32,9 @@ describes how to setup the db2 CDC connector to run SQL 
queries against Db2 data
 
 ## Supported Databases
 
-| Connector | Database | Driver |
-|-----------|----------|--------|
-| [Db2-cdc](../db2-cdc) |  [Db2](https://www.ibm.com/products/db2): 11.5 | Db2 Driver: 11.5.0.0 |
+| Connector | Database | Driver |
+|-----------|----------|--------|
+| Db2-cdc   |  [Db2](https://www.ibm.com/products/db2): 11.5 | Db2 Driver: 11.5.0.0 |

Review Comment:
   What about using `{{ < ref > }}` here instead of removing the link?
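
   For instance, something along these lines (the ref path is illustrative, 
assuming the docs use Hugo's `ref` shortcode for internal links): 
`[Db2-cdc]({{< ref "docs/connectors/legacy-flink-cdc-sources/db2-cdc" >}})`.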



##
docs/content/docs/connectors/legacy-flink-cdc-sources/mysql-cdc.md:
##
@@ -31,9 +31,9 @@ The MySQL CDC connector allows for reading snapshot data and 
incremental data fr
 
 ## Supported Databases
 
-| Connector | Database | Driver |
-|-----------|----------|--------|
-| [mysql-cdc](../mysql-cdc) |  [MySQL](https://dev.mysql.com/doc): 5.6, 5.7, 8.0.x  [RDS MySQL](https://www.aliyun.com/product/rds/mysql): 5.6, 5.7, 8.0.x  [PolarDB MySQL](https://www.aliyun.com/product/polardb): 5.6, 5.7, 8.0.x  [Aurora MySQL](https://aws.amazon.com/cn/rds/aurora): 5.6, 5.7, 8.0.x  [MariaDB](https://mariadb.org): 10.x  [PolarDB X](https://github.com/ApsaraDB/galaxysql): 2.0.1 | JDBC Driver: 8.0.27 |
+| Connector | Database | Driver |
+|-----------|----------|--------|
+| mysql-cdc |  [MySQL](https://dev.mysql.com/doc): 5.6, 5.7, 8.0.x  [RDS MySQL](https://www.aliyun.com/product/rds/mysql): 5.6, 5.7, 8.0.x  [PolarDB MySQL](https://www.aliyun.com/product/polardb): 5.6, 5.7, 8.0.x  [Aurora MySQL](https://aws.amazon.com/cn/rds/aurora): 5.6, 5.7, 8.0.x  [MariaDB](https://mariadb.org): 10.x  [PolarDB X](https://github.com/ApsaraDB/galaxysql): 2.0.1 | JDBC Driver: 8.0.27 |

Review Comment:
   Use {{< ref >}} here



##
docs/content.zh/docs/connectors/legacy-flink-cdc-sources/mysql-cdc.md:
##
@@ -31,9 +31,9 @@ The MySQL CDC connector allows for reading snapshot data and 
incremental data fr
 
 ## Supported Databases
 
-| Connector | Database | Driver |
-|-----------|----------|--------|
Re: [PR] [FLINK-34744][autoscaler] Fix the issue that autoscaling-dynamic cannot run [flink-kubernetes-operator]

2024-03-19 Thread via GitHub


1996fanrui merged PR #800:
URL: https://github.com/apache/flink-kubernetes-operator/pull/800


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] (FLINK-34740) "legacy-flink-cdc-sources" Pages for Flink CDC Chinese Documentation

2024-03-19 Thread Hongshun Wang (Jira)


[ https://issues.apache.org/jira/browse/FLINK-34740 ]


Hongshun Wang deleted comment on FLINK-34740:
---

was (Author: JIRAUSER298968):
I am willing to do this.

> "legacy-flink-cdc-sources" Pages for Flink CDC Chinese Documentation
> 
>
> Key: FLINK-34740
> URL: https://issues.apache.org/jira/browse/FLINK-34740
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation, Flink CDC
>Affects Versions: cdc-3.1.0
>Reporter: LvYanquan
>Priority: Minor
> Fix For: cdc-3.1.0
>
>
> Translate legacy-flink-cdc-sources pages of 
> [https://github.com/apache/flink-cdc/tree/master/docs/content/docs/connectors/legacy-flink-cdc-sources]
>  into Chinese.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34740) "legacy-flink-cdc-sources" Pages for Flink CDC Chinese Documentation

2024-03-19 Thread Hongshun Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828570#comment-17828570
 ] 

Hongshun Wang commented on FLINK-34740:
---

I am willing to do this.

> "legacy-flink-cdc-sources" Pages for Flink CDC Chinese Documentation
> 
>
> Key: FLINK-34740
> URL: https://issues.apache.org/jira/browse/FLINK-34740
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation, Flink CDC
>Affects Versions: cdc-3.1.0
>Reporter: LvYanquan
>Priority: Minor
> Fix For: cdc-3.1.0
>
>
> Translate legacy-flink-cdc-sources pages of 
> [https://github.com/apache/flink-cdc/tree/master/docs/content/docs/connectors/legacy-flink-cdc-sources]
>  into Chinese.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34741) "get-started" Page for Flink CDC Chinese Documentation

2024-03-19 Thread Hongshun Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828569#comment-17828569
 ] 

Hongshun Wang commented on FLINK-34741:
---

I am willing to do this.

> "get-started" Page for Flink CDC Chinese Documentation
> --
>
> Key: FLINK-34741
> URL: https://issues.apache.org/jira/browse/FLINK-34741
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation, Flink CDC
>Affects Versions: cdc-3.1.0
>Reporter: LvYanquan
>Priority: Minor
> Fix For: cdc-3.1.0
>
>
> Translate 
> [https://github.com/apache/flink-cdc/tree/master/docs/content/docs/get-started]
>  pages into Chinese.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34742) Translate "FAQ" Page for Flink CDC Chinese Documentation

2024-03-19 Thread Hongshun Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828568#comment-17828568
 ] 

Hongshun Wang commented on FLINK-34742:
---

I am willing to do this.

> Translate "FAQ" Page for Flink CDC Chinese Documentation
> 
>
> Key: FLINK-34742
> URL: https://issues.apache.org/jira/browse/FLINK-34742
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation, Flink CDC
>Affects Versions: cdc-3.1.0
>Reporter: LvYanquan
>Priority: Minor
> Fix For: cdc-3.1.0
>
>
> Translate 
> [https://github.com/apache/flink-cdc/blob/master/docs/content/docs/faq/faq.md]
>  page into Chinese.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34729) "Core Concept" Pages for Flink CDC Chinese Documentation

2024-03-19 Thread LvYanquan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828566#comment-17828566
 ] 

LvYanquan commented on FLINK-34729:
---

I am willing to do this.

> "Core Concept" Pages for Flink CDC Chinese Documentation
> 
>
> Key: FLINK-34729
> URL: https://issues.apache.org/jira/browse/FLINK-34729
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation, Flink CDC
>Reporter: LvYanquan
>Priority: Minor
>
> Translate [Core 
> Concept|https://github.com/apache/flink-cdc/tree/master/docs/content/docs/core-concept]
>  Pages into Chinese.
> This includes data-pipeline/data-source/data-sink/route/transform/table-id.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34735) "Developer Guide - Understanding Flink CDC API" Page for Flink CDC Chinese Documentation

2024-03-19 Thread LvYanquan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828567#comment-17828567
 ] 

LvYanquan commented on FLINK-34735:
---

I am willing to do this.

> "Developer Guide - Understanding Flink CDC API" Page for Flink CDC Chinese 
> Documentation
> 
>
> Key: FLINK-34735
> URL: https://issues.apache.org/jira/browse/FLINK-34735
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation, Flink CDC
>Affects Versions: 3.1.0
>Reporter: LvYanquan
>Priority: Minor
> Fix For: 3.1.0
>
>
> Translate 
> [https://github.com/apache/flink-cdc/blob/master/docs/content/docs/developer-guide/understand-flink-cdc-api.md]
>  into Chinese.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-25537] [JUnit5 Migration] Module: flink-core [flink]

2024-03-19 Thread via GitHub


GOODBOY008 commented on PR #24523:
URL: https://github.com/apache/flink/pull/24523#issuecomment-2008545415

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-28693][table] Fix janino compile failed because the code generated refers the class in table-planner [flink]

2024-03-19 Thread via GitHub


xuyangzhong commented on PR #24280:
URL: https://github.com/apache/flink/pull/24280#issuecomment-2008542204

   > This is a pr we would like merged. It looks like @snuyanzin asked for 
tests to be added. @xuyangzhong are you looking at adding the tests?
   
   Hi, @davidradl . I'm attempting to add tests for it, but my recent schedule 
has been quite tight. I will do my best to re-push the PR within the next few 
days. Considering your dependency on this PR, another quick solution would be 
to cherry-pick this PR to your branch and repackage Flink. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34745] Improve validations for a period in Time Travel [flink]

2024-03-19 Thread via GitHub


snuyanzin commented on code in PR #24534:
URL: https://github.com/apache/flink/pull/24534#discussion_r1531189991


##
flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/planner/plan/stream/sql/join/TemporalJoinTest.scala:
##
@@ -508,6 +508,27 @@ class TemporalJoinTest extends TableTestBase {
 " table, but the rowtime types are TIMESTAMP_LTZ(3) *ROWTIME* and 
TIMESTAMP(3) *ROWTIME*.",
   classOf[ValidationException]
 )
+
+val sqlQuery9 = "SELECT * " +
+  "FROM Orders AS o JOIN " +
+  "RatesHistoryWithPK FOR SYSTEM_TIME AS OF 'o.rowtime' AS r " +
+  "ON o.currency = r.currency"
+expectExceptionThrown(
+  sqlQuery9,
+  "The system time period specification expects Timestamp type but is 
'CHAR'",

Review Comment:
   IIRC this message is defined in a resource file in Calcite, and we (a 
downstream project) can redefine it.
   However, I guess that is out of scope for this PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34745] Improve validations for a period in Time Travel [flink]

2024-03-19 Thread via GitHub


snuyanzin commented on code in PR #24534:
URL: https://github.com/apache/flink/pull/24534#discussion_r1531183660


##
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/calcite/FlinkCalciteSqlValidator.java:
##
@@ -222,14 +223,23 @@ protected void registerNamespace(
 Collections.singletonList(simplifiedRexNode),
 reducedNodes);
 // check whether period is the unsupported expression
-if (!(reducedNodes.get(0) instanceof RexLiteral)) {
-throw new UnsupportedOperationException(
+final RexNode reducedNode = reducedNodes.get(0);
+if (!(reducedNode instanceof RexLiteral)) {
+throw new ValidationException(
 String.format(
 "Unsupported time travel expression: %s for 
the expression can not be reduced to a constant by Flink.",
 periodNode));
 }
 
-RexLiteral rexLiteral = (RexLiteral) (reducedNodes).get(0);
+RexLiteral rexLiteral = (RexLiteral) reducedNode;
+final RelDataType sqlType = rexLiteral.getType();
+if (!SqlTypeUtil.isTimestamp(rexLiteral.getType())) {

Review Comment:
   nit: 
   ```suggestion
   final RelDataType sqlType = rexLiteral.getType();
   if (!SqlTypeUtil.isTimestamp(sqlType)) {
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34745] Improve validations for a period in Time Travel [flink]

2024-03-19 Thread via GitHub


dawidwys commented on code in PR #24534:
URL: https://github.com/apache/flink/pull/24534#discussion_r1530780573


##
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/calcite/FlinkCalciteSqlValidator.java:
##
@@ -222,14 +223,23 @@ protected void registerNamespace(
 Collections.singletonList(simplifiedRexNode),
 reducedNodes);
 // check whether period is the unsupported expression
-if (!(reducedNodes.get(0) instanceof RexLiteral)) {
-throw new UnsupportedOperationException(
+final RexNode reducedNode = reducedNodes.get(0);
+if (!(reducedNode instanceof RexLiteral)) {
+throw new ValidationException(
 String.format(
 "Unsupported time travel expression: %s for 
the expression can not be reduced to a constant by Flink.",
 periodNode));
 }
 
-RexLiteral rexLiteral = (RexLiteral) (reducedNodes).get(0);
+RexLiteral rexLiteral = (RexLiteral) reducedNode;
+final SqlTypeName sqlTypeName = rexLiteral.getTypeName();
+if (!(sqlTypeName == SqlTypeName.TIMESTAMP_WITH_LOCAL_TIME_ZONE
+|| sqlTypeName == SqlTypeName.TIMESTAMP)) {

Review Comment:
   Good idea! I was looking for sth like it, but I failed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Comment Edited] (FLINK-34696) GSRecoverableWriterCommitter is generating excessive data blobs

2024-03-19 Thread Galen Warren (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828412#comment-17828412
 ] 

Galen Warren edited comment on FLINK-34696 at 3/19/24 5:07 PM:
---

I think your proposed algorithm is just another way of implementing the "delete 
intermediate blobs that are no longer necessary as soon as possible" idea we've 
considered. You accomplish it by overwriting the staging blob on each 
iteration; a similar effect could be achieved by writing to a new intermediate 
staging blob on each iteration (as is done now) and deleting the old one right 
away (which is not done now; instead, this deletion occurs shortly thereafter, 
once the commit succeeds).

Aside: I'm not sure whether it's possible to overwrite the existing staging 
blob like you suggest. The 
[docs|https://cloud.google.com/storage/docs/json_api/v1/objects/compose] for 
the compose operation say:
{quote}Concatenates a list of existing objects into a new object in the same 
bucket. The existing source objects are unaffected by this operation
{quote}
In your proposal, the same staging blob is both the target of the compose 
operation and one of the input blobs to be composed. That _might_ work, but 
would have to be tested to see if it's allowed. If it isn't, writing to a new 
blob and deleting the old one would have essentially the same effect. That's 
almost certainly what happens behind the scenes anyway, since blobs are 
immutable.

Bigger picture, if you're trying to combine millions of immutable blobs 
together in one step, 32 at a time, I don't see how you avoid having lots of 
intermediate composed blobs, one way or another, at least temporarily. The main 
question would be how quickly ones that are no longer needed are discarded.
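
As a point of reference, a sketch of composing 32 at a time while discarding 
each finished generation of intermediates right away (the helpers passed in 
are hypothetical stand-ins for the GCS compose/delete calls, not the 
connector's actual code):

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

public final class ComposeSketch {

    // GCS's documented limit on source objects per compose request.
    static final int MAX_SOURCES_PER_COMPOSE = 32;

    static String composeAll(
            List<String> rawBlobs,
            BiFunction<String, List<String>, String> compose, // (targetName, sources) -> targetName
            Consumer<String> delete) {
        List<String> current = new ArrayList<>(rawBlobs);
        int generation = 0;
        while (current.size() > 1) {
            List<String> next = new ArrayList<>();
            for (int i = 0; i < current.size(); i += MAX_SOURCES_PER_COMPOSE) {
                List<String> batch =
                        current.subList(i, Math.min(i + MAX_SOURCES_PER_COMPOSE, current.size()));
                next.add(compose.apply("staging-" + generation + "-" + next.size(), batch));
            }
            if (generation > 0) {
                // The previous generation of intermediate blobs is no longer needed;
                // deleting here, rather than after the final commit, bounds the
                // lifetime of orphan candidates. Raw input blobs are kept so that a
                // retried commit still has its complete input list.
                current.forEach(delete);
            }
            current = next;
            generation++;
        }
        return current.get(0);
    }
}
{code}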
{quote}To streamline this, we could modify {{GSCommitRecoverable}} to update 
the list of {{{}componentObjectIds{}}}, allowing the removal of blobs already 
appended to the {{stagingBlob}}
{quote}
Not quite following you here; the GSCommitRecoverable is provided as an input 
to the GSRecoverableWriterCommitter, providing the list of "raw" (i.e. 
uncomposed) temporary blobs that need to be composed and committed. If you 
removed those raw blobs from that list as they're added to the staging blob, 
what would that accomplish? The GSCommitRecoverable provided to the 
GSRecoverableWriterCommitter isn't persisted anywhere after the commit succeeds 
or fails, and if the commit were to be retried it would need the complete list 
anyway.
{quote}I notice another issue with the code: storage.compose 
might throw a StorageException. With the current code this would mean the 
intermediate composition blobs are not cleaned up.

If the code above is implemented, I think there is no longer a need for the 
TTL feature since all necessary blobs should be written to a final blob. Any 
leftover blobs post-job completion would indicate a failed state.
{quote}
There is no real way to prevent the orphaning of intermediate blobs in all 
failure scenarios. Even in your proposed algorithm, the staging blob could be 
orphaned if the composition process failed partway through. This is what the 
temp bucket and TTL mechanism are for. It's optional, so you don't have to use 
it, but yes, you would have to keep an eye out for orphaned intermediate blobs 
via some other mechanism if you choose not to use it.

Honestly, I think some of your difficulties are coming from trying to combine 
so much data together at once vs. doing it along the way, with commits at 
incremental checkpoints. I was going to suggest you reduce your checkpoint 
interval, to aggregate more frequently, but obviously that doesn't work if 
you're not checkpointing at all.

I do think the RecoverableWriter mechanism is designed to make writing 
predictable and repeatable in checkpoint/recovery situations (hence the name); 
if you're not doing any of that, you may run into some challenges. If you were 
to use checkpoints, you would aggregate more frequently, meaning fewer 
intermediate blobs with shorter lifetimes, and you'd be less likely to run up 
against the 5TB limit, as you would end up with a series of smaller files, 
written at each checkpoint, vs. one huge file.

If we do want to consider changes, here's what I suggest:
 * Test whether GCP allows composition to overwrite an existing blob that is 
also an input to the compose operation. If that works, then we could use that 
technique in the composeBlobs function. As mentioned above, I doubt that this 
actually reduces the number of blobs that exist temporarily – since blobs are 
immutable objects – but it would minimize their lifetime, effectively deleting 
them as soon as possible. If overwriting doesn't work, then we could achieve a 
similar effect by deleting intermediate blobs as soon as they are not needed 
anymore, 

[jira] [Comment Edited] (FLINK-34696) GSRecoverableWriterCommitter is generating excessive data blobs

2024-03-19 Thread Galen Warren (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828412#comment-17828412
 ] 

Galen Warren edited comment on FLINK-34696 at 3/19/24 5:07 PM:
---

I think your proposed algorithm is just another way of implementing the "delete 
intermediate blobs that are no longer necessary as soon as possible" idea we've 
considered. You accomplish it by overwriting the staging blob on each 
iteration; a similar effect could be achieved by writing to a new intermediate 
staging blob on each iteration (as is done now) and deleting the old one right 
away (which is not done now. instead this deletion occurs shortly thereafter, 
after the commit succeeds).

Aside: I'm not sure whether it's possible to overwrite the existing staging 
blob like you suggest. The [docs |[Objects: compose  |  Cloud Storage  |  
Google 
Cloud|https://cloud.google.com/storage/docs/json_api/v1/objects/compose]] for 
the compose operation say:
{quote}Concatenates a list of existing objects into a new object in the same 
bucket. The existing source objects are unaffected by this operation
{quote}
In your proposal, the same staging blob is both the target of the compose 
operation and one of the input blobs to be composed. That _might_ work, but 
would have to be tested to see if it's allowed. If it isn't, writing to a new 
blob and deleting the old one would have essentially the same effect. That's 
almost certainly what happens behind the scenes anyway, since blobs are 
immutable.

Bigger picture, if you're trying to combine millions of immutable blobs 
together in one step, 32 at a time, I don't see how you avoid having lots of 
intermediate composed blobs, one way or another, at least temporarily. The main 
question would be how quickly ones that are no longer needed are discarded.
{quote}To streamline this, we could modify {{GSCommitRecoverable}} to update 
the list of {{{}componentObjectIds{}}}, allowing the removal of blobs already 
appended to the {{stagingBlob}}
{quote}
Not quite following you here; the GSCommitRecoverable is provided as an input 
to the GSRecoverableWriterCommitter, providing the list of "raw" (i.e. 
uncomposed) temporary blobs that need to be composed and committed. If you 
removed those raw blobs from that list as they're added to the staging blob, 
what would that accomplish? The GSCommitRecoverable provided to the 
GSRecoverableWriterCommitter isn't persisted anywhere after the commit succeeds 
or fails, and if the commit were to be retried it would need the complete list 
anyway.
{quote}I notice an other issue with the code - because the storage.compose 
might throw a StorageException. With the current code this would mean the 
intermediate composition blobs are not cleaned up. 

If the code above is implemented, I think there is no longer a need the TTL 
feature since all necessary blobs should be written to a final blob. Any 
leftover blobs post-job completion would indicate a failed state.
{quote}
There is no real way to prevent the orphaning of intermediate blobs in all 
failure scenarios. Even in your proposed algorithm, the staging blob could be 
orphaned if the composition process failed partway through. This is what the 
temp bucket and TTL mechanism are for. It's optional, so you don't have to use 
it, but yes you would have to keep an eye out for orphaned intermediate blobs 
via some other mechanism f you choose not to use it.

Honestly, I think some of your difficulties are coming from trying to combine 
so much data together at once vs. doing it along the way, with commits at 
incremental checkpoints. I was going to suggest you reduce your checkpoint 
interval, to aggregate more frequently, but obviously that doesn't work if 
you're not checkpointing at all.

I do think the RecoverableWriter mechanism is designed to make writing 
predictable and repeatable in checkpoint/recovery situations (hence the name); 
if you're not doing any of that, you may run into some challenges. If you were 
to use checkpoints, you would aggregate more frequently, meaning fewer 
intermediate blobs with shorter lifetimes, and you'd be less likely to run up 
against the 5TB limit, as you would end up with a series of smaller files, 
written at each checkpoint, vs. one huge file.

If we do want to consider changes, here's what I suggest:
 * Test whether GCP allows composition to overwrite an existing blob that is 
also an input to the compose operation. If that works, then we could use that 
technique in the composeBlobs function. As mentioned above, I doubt that this 
actually reduces the number of blobs that exist temporarily – since blobs are 
immutable objects – but it would minimize their lifetime, effectively deleting 
them as soon as possible. If overwriting doesn't work, then we could achieve a 
similar effect by deleting intermediate blobs as soon as they are not needed 
anymore, 

[jira] [Comment Edited] (FLINK-34696) GSRecoverableWriterCommitter is generating excessive data blobs

2024-03-19 Thread Galen Warren (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828412#comment-17828412
 ] 

Galen Warren edited comment on FLINK-34696 at 3/19/24 5:06 PM:
---

I think your proposed algorithm is just another way of implementing the "delete 
intermediate blobs that are no longer necessary as soon as possible" idea we've 
considered. You accomplish it by overwriting the staging blob on each 
iteration; a similar effect could be achieved by writing to a new intermediate 
staging blob on each iteration (as is done now) and deleting the old one right 
away (which is not done now. instead this deletion occurs shortly thereafter, 
after the commit succeeds).

Aside: I'm not sure whether it's possible to overwrite the existing staging 
blob like you suggest. The [docs |[Objects: compose  |  Cloud Storage  |  
Google 
Cloud|https://cloud.google.com/storage/docs/json_api/v1/objects/compose]] for 
the compose operation say:
{quote}Concatenates a list of existing objects into a new object in the same 
bucket. The existing source objects are unaffected by this operation
{quote}
In your proposal, the same staging blob is both the target of the compose 
operation and one of the input blobs to be composed. That _might_ work, but 
would have to be tested to see if it's allowed. If it isn't, writing to a new 
blob and deleting the old one would have essentially the same effect. That's 
almost certainly what happens behind the scenes anyway, since blobs are 
immutable.

Bigger picture, if you're trying to combine millions of immutable blobs 
together in one step, 32 at a time, I don't see how you avoid having lots of 
intermediate composed blobs, one way or another, at least temporarily. The main 
question would be how quickly ones that are no longer needed are discarded.
{quote}To streamline this, we could modify {{GSCommitRecoverable}} to update 
the list of {{{}componentObjectIds{}}}, allowing the removal of blobs already 
appended to the {{stagingBlob}}
{quote}
Not quite following you here; the GSCommitRecoverable is provided as an input 
to the GSRecoverableWriterCommitter, providing the list of "raw" (i.e. 
uncomposed) temporary blobs that need to be composed and committed. If you 
removed those raw blobs from that list as they're added to the staging blob, 
what would that accomplish? The GSCommitRecoverable provided to the 
GSRecoverableWriterCommitter isn't persisted anywhere after the commit succeeds 
or fails, and if the commit were to be retried it would need the complete list 
anyway.
{quote}I notice an other issue with the code - because the storage.compose 
might throw a StorageException. With the current code this would mean the 
intermediate composition blobs are not cleaned up. 

If the code above is implemented, I think there is no longer a need the TTL 
feature since all necessary blobs should be written to a final blob. Any 
leftover blobs post-job completion would indicate a failed state.
{quote}
There is no real way to prevent the orphaning of intermediate blobs in all 
failure scenarios. Even in your proposed algorithm, the staging blob could be 
orphaned if the composition process failed partway through. This is what the 
temp bucket and TTL mechanism are for. It's optional, so you don't have to use 
it, but yes you would have to keep an eye out for orphaned intermediate blobs 
via some other mechanism f you choose not to use it.

Honestly, I think some of your difficulties are coming from trying to combine 
so much data together at once vs. doing it along the way, with commits at 
incremental checkpoints. I was going to suggest you reduce your checkpoint 
interval, to aggregate more frequently, but obviously that doesn't work if 
you're not checkpointing at all.

I do think the RecoverableWriter mechanism is designed to make writing 
predictable and repeatable in checkpoint/recovery situations (hence the name); 
if you're not doing any of that, you may run into some challenges. If you were 
to use checkpoints, you would aggregate more frequently, meaning fewer 
intermediate blobs with shorter lifetimes, and you'd be less likely to run up 
against the 5TB limit, as you would end up with a series of smaller files, 
written at each checkpoint, vs. one huge file.

If we do want to consider changes, here's what I suggest:
 * Test whether GCP allows composition to overwrite an existing blob that is 
also an input to the compose operation. If that works, then we could use that 
technique in the composeBlobs function. As mentioned above, I doubt that this 
actually reduces the number of blobs that exist temporarily – since blobs are 
immutable objects – but it would minimize their lifetime, effectively deleting 
them as soon as possible. If overwriting doesn't work, then we could achieve a 
similar effect by deleting intermediate blobs as soon as they are not needed 
anymore, rather than waiting until the commit succeeds.
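
To make the eager-deletion variant concrete, here is a minimal sketch against 
the google-cloud-storage Java client ({{Storage#compose}} and 
{{Storage#delete}}); the method and the staging-blob naming scheme are 
illustrative, not the connector's actual code:
{code:java}
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;

import java.util.ArrayList;
import java.util.List;

public class EagerComposeSketch {

    private static final int MAX_SOURCES_PER_COMPOSE = 32; // GCS hard limit

    /**
     * Composes the given source blobs into finalBlob, 32 at a time, deleting
     * each round's staging blobs as soon as the next round has consumed them,
     * instead of waiting until the commit succeeds.
     */
    static void composeEagerly(
            Storage storage, String bucket, List<String> sources, String finalBlob) {
        List<String> current = new ArrayList<>(sources);
        int round = 0;
        while (current.size() > 1) {
            List<String> next = new ArrayList<>();
            for (int i = 0; i < current.size(); i += MAX_SOURCES_PER_COMPOSE) {
                List<String> chunk =
                        current.subList(
                                i, Math.min(i + MAX_SOURCES_PER_COMPOSE, current.size()));
                // The last round writes straight to the final blob.
                String target =
                        current.size() <= MAX_SOURCES_PER_COMPOSE
                                ? finalBlob
                                : finalBlob + ".staging." + round + "." + i;
                Storage.ComposeRequest.Builder request =
                        Storage.ComposeRequest.newBuilder()
                                .setTarget(BlobInfo.newBuilder(bucket, target).build());
                chunk.forEach(request::addSource);
                storage.compose(request.build());
                next.add(target);
            }
            // Staging blobs from the previous round are no longer needed.
            // Never delete the original raw blobs here: a retried commit
            // would still need the complete list.
            if (round > 0) {
                current.forEach(name -> storage.delete(BlobId.of(bucket, name)));
            }
            current = next;
            round++;
        }
    }
}
{code}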

Re: [PR] [FLINK-34745] Improve validations for a period in Time Travel [flink]

2024-03-19 Thread via GitHub


snuyanzin commented on code in PR #24534:
URL: https://github.com/apache/flink/pull/24534#discussion_r1530744301


##
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/calcite/FlinkCalciteSqlValidator.java:
##
@@ -222,14 +223,23 @@ protected void registerNamespace(
 Collections.singletonList(simplifiedRexNode),
 reducedNodes);
 // check whether period is the unsupported expression
-if (!(reducedNodes.get(0) instanceof RexLiteral)) {
-throw new UnsupportedOperationException(
+final RexNode reducedNode = reducedNodes.get(0);
+if (!(reducedNode instanceof RexLiteral)) {
+throw new ValidationException(
 String.format(
 "Unsupported time travel expression: %s for 
the expression can not be reduced to a constant by Flink.",
 periodNode));
 }
 
-RexLiteral rexLiteral = (RexLiteral) (reducedNodes).get(0);
+RexLiteral rexLiteral = (RexLiteral) reducedNode;
+final SqlTypeName sqlTypeName = rexLiteral.getTypeName();
+if (!(sqlTypeName == SqlTypeName.TIMESTAMP_WITH_LOCAL_TIME_ZONE
+|| sqlTypeName == SqlTypeName.TIMESTAMP)) {

Review Comment:
   ```suggestion
   if (!SqlTypeUtil.isTimestamp(sqlTypeName)) {
   ```
   I think `org.apache.calcite.sql.type.SqlTypeUtil#isTimestamp` could be 
reused here
   as defined at 
   
https://github.com/apache/calcite/blob/413eded693a9087402cc1a6eefeca7a29445d337/core/src/main/java/org/apache/calcite/sql/type/SqlTypeUtil.java#L390-L393
   
   and 
   
https://github.com/apache/calcite/blob/413eded693a9087402cc1a6eefeca7a29445d337/core/src/main/java/org/apache/calcite/sql/type/SqlTypeFamily.java#L182-L183
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Comment Edited] (FLINK-34575) Vulnerabilities in commons-compress 1.24.0; upgrade to 1.26.0 needed.

2024-03-19 Thread Adrian Vasiliu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824327#comment-17824327
 ] 

Adrian Vasiliu edited comment on FLINK-34575 at 3/19/24 4:39 PM:
-

Thanks Marijn for getting 
[https://github.com/apache/flink-connector-hbase/pull/41] merged. I suppose at 
some point this will be complemented by 
[https://github.com/apache/flink/pull/24352].



> I don't think a security fix is necessary, since Flink isn't affected 
> directly by it.

Good to know. Now, vulnerability scanners don't know that, and in an 
enterprise context vulnerabilities create trouble / extra processes even when 
the vulnerability can't really be exploited.


was (Author: JIRAUSER280892):
Thanks Marijn for getting 
[https://github.com/apache/flink-connector-hbase/pull/41] merged. I suppose at 
some point this will be complemented by 
[https://github.com/apache/flink/pull/24352.]

> I don't think a security fix is necessary, since Flink isn't affected 
> directly by it.

Good to know. Now, vulnerability scanners don't know it, and in enterprise 
context vulnerabilities create trouble / extra processes even when the 
vulnerability can't really be exploited.

> Vulnerabilities in commons-compress 1.24.0; upgrade to 1.26.0 needed.
> -
>
> Key: FLINK-34575
> URL: https://issues.apache.org/jira/browse/FLINK-34575
> Project: Flink
>  Issue Type: Technical Debt
>Affects Versions: 1.18.1
>Reporter: Adrian Vasiliu
>Priority: Major
> Fix For: hbase-4.0.0
>
>
> Since Feb. 19, medium/high CVEs have been found for commons-compress 1.24.0:
> [https://nvd.nist.gov/vuln/detail/CVE-2024-25710]
> https://nvd.nist.gov/vuln/detail/CVE-2024-26308
> [https://github.com/apache/flink/pull/24352] has been opened automatically on 
> Feb. 21 by dependabot for bumping commons-compress to v1.26.0 which fixes the 
> CVEs, but two CI checks are red on the PR.
> Flink's dependency on commons-compress has been upgraded to v1.24.0 in Oct 
> 2023 (https://issues.apache.org/jira/browse/FLINK-33329).
> v1.24.0 is the version currently in the master branch: 
> [https://github.com/apache/flink/blob/master/pom.xml#L727-L729].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-34745] Improve validations for a period in Time Travel [flink]

2024-03-19 Thread via GitHub


dawidwys commented on code in PR #24534:
URL: https://github.com/apache/flink/pull/24534#discussion_r1530725274


##
flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/planner/plan/stream/sql/join/TemporalJoinTest.scala:
##
@@ -508,6 +508,27 @@ class TemporalJoinTest extends TableTestBase {
 " table, but the rowtime types are TIMESTAMP_LTZ(3) *ROWTIME* and 
TIMESTAMP(3) *ROWTIME*.",
   classOf[ValidationException]
 )
+
+val sqlQuery9 = "SELECT * " +
+  "FROM Orders AS o JOIN " +
+  "RatesHistoryWithPK FOR SYSTEM_TIME AS OF 'o.rowtime' AS r " +
+  "ON o.currency = r.currency"
+expectExceptionThrown(
+  sqlQuery9,
+  "The system time period specification expects Timestamp type but is 
'CHAR'",

Review Comment:
   Unfortunately, this message is actually defined in Calcite. We'd need to 
change it there. I could change it for this particular case, but it's also 
thrown from other locations in Calcite. I'd rather keep it consistent.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34745] Improve validations for a period in Time Travel [flink]

2024-03-19 Thread via GitHub


dawidwys commented on code in PR #24534:
URL: https://github.com/apache/flink/pull/24534#discussion_r1530725846


##
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/calcite/FlinkCalciteSqlValidator.java:
##
@@ -222,15 +223,25 @@ protected void registerNamespace(
 Collections.singletonList(simplifiedRexNode),
 reducedNodes);
 // check whether period is the unsupported expression
-if (!(reducedNodes.get(0) instanceof RexLiteral)) {
-throw new UnsupportedOperationException(
+final RexNode reducedNode = reducedNodes.get(0);
+if (!(reducedNode instanceof RexLiteral)) {
+throw new ValidationException(
 String.format(
 "Unsupported time travel expression: %s for 
the expression can not be reduced to a constant by Flink.",
 periodNode));
 }
 
-RexLiteral rexLiteral = (RexLiteral) (reducedNodes).get(0);
-TimestampString timestampString = 
rexLiteral.getValueAs(TimestampString.class);
+final SqlTypeName sqlTypeName = ((RexLiteral) 
reducedNode).getTypeName();
+if (!(sqlTypeName == SqlTypeName.TIMESTAMP_WITH_LOCAL_TIME_ZONE
+|| sqlTypeName == SqlTypeName.TIMESTAMP)) {
+throw newValidationError(
+periodNode,
+
Static.RESOURCE.illegalExpressionForTemporal(sqlTypeName.getName()));
+}
+
+RexLiteral rexLiteral = (RexLiteral) reducedNode;
+TimestampString timestampString =
+((RexLiteral) 
reducedNode).getValueAs(TimestampString.class);

Review Comment:
   ah, I wanted to delete the line with `rexLiteral` 🤦
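
   Putting the review threads together (drop the duplicated cast and reuse 
Calcite's type check), the tail of the block might end up roughly like this 
sketch; note that `org.apache.calcite.sql.type.SqlTypeUtil#isTimestamp` takes 
a `RelDataType`, so the literal's type is passed rather than its `SqlTypeName`:

   ```java
   final RexLiteral rexLiteral = (RexLiteral) reducedNode;
   final SqlTypeName sqlTypeName = rexLiteral.getTypeName();
   if (!SqlTypeUtil.isTimestamp(rexLiteral.getType())) {
       throw newValidationError(
               periodNode,
               Static.RESOURCE.illegalExpressionForTemporal(sqlTypeName.getName()));
   }
   final TimestampString timestampString =
           rexLiteral.getValueAs(TimestampString.class);
   ```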



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34745] Improve validations for a period in Time Travel [flink]

2024-03-19 Thread via GitHub


dawidwys commented on code in PR #24534:
URL: https://github.com/apache/flink/pull/24534#discussion_r1530725274


##
flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/planner/plan/stream/sql/join/TemporalJoinTest.scala:
##
@@ -508,6 +508,27 @@ class TemporalJoinTest extends TableTestBase {
 " table, but the rowtime types are TIMESTAMP_LTZ(3) *ROWTIME* and 
TIMESTAMP(3) *ROWTIME*.",
   classOf[ValidationException]
 )
+
+val sqlQuery9 = "SELECT * " +
+  "FROM Orders AS o JOIN " +
+  "RatesHistoryWithPK FOR SYSTEM_TIME AS OF 'o.rowtime' AS r " +
+  "ON o.currency = r.currency"
+expectExceptionThrown(
+  sqlQuery9,
+  "The system time period specification expects Timestamp type but is 
'CHAR'",

Review Comment:
   Unfortunately, this message is actually defined in Calcite. We'd need to 
change it there.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34718][tests] Exclude operators not meant for adaptive scheduler [flink]

2024-03-19 Thread via GitHub


XComp commented on PR #24533:
URL: https://github.com/apache/flink/pull/24533#issuecomment-2007544762

   You don't have to wait. You could run both profiles in the same CI build 
(similar to what is done in the nightly workflows)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34718][tests] Exclude operators not meant for adaptive scheduler [flink]

2024-03-19 Thread via GitHub


RyanSkraba commented on PR #24533:
URL: https://github.com/apache/flink/pull/24533#issuecomment-2007540901

   It's in progress -- I guess I'll let it run through to show the tests are 
executed outside of the adaptive scheduler, then push it again with the fake 
commit to run through again to show they are excluded.  :thinking: 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Comment Edited] (FLINK-34696) GSRecoverableWriterCommitter is generating excessive data blobs

2024-03-19 Thread Galen Warren (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828412#comment-17828412
 ] 

Galen Warren edited comment on FLINK-34696 at 3/19/24 3:32 PM:
---

I think your proposed algorithm is just another way of implementing the "delete 
intermediate blobs that are no longer necessary as soon as possible" idea we've 
considered. You accomplish it by overwriting the staging blob on each 
iteration; a similar effect could be achieved by writing to a new intermediate 
staging blob on each iteration (as is done now) and deleting the old one right 
away (which is not done now; instead, this deletion occurs shortly thereafter, 
after the commit succeeds).

Aside: I'm not sure whether it's possible to overwrite the existing staging 
blob like you suggest. The 
[docs|https://cloud.google.com/storage/docs/json_api/v1/objects/compose] for 
the compose operation say:
{quote}Concatenates a list of existing objects into a new object in the same 
bucket. The existing source objects are unaffected by this operation
{quote}
In your proposal, the same staging blob is both the target of the compose 
operation and one of the input blobs to be composed. That _might_ work, but 
would have to be tested to see if it's allowed. If it isn't, writing to a new 
blob and deleting the old one would have essentially the same effect. That's 
almost certainly what happens behind the scenes anyway, since blobs are 
immutable.

Bigger picture, if you're trying to combine millions of immutable blobs 
together in one step, 32 at a time, I don't see how you avoid having lots of 
intermediate composed blobs, one way or another, at least temporarily. The main 
question would be how quickly ones that are no longer needed are discarded.
{quote}To streamline this, we could modify {{GSCommitRecoverable}} to update 
the list of {{{}componentObjectIds{}}}, allowing the removal of blobs already 
appended to the {{stagingBlob}}
{quote}
Not quite following you here; the GSCommitRecoverable is provided as an input 
to the GSRecoverableWriterCommitter, providing the list of "raw" (i.e. 
uncomposed) temporary blobs that need to be composed and committed. If you 
removed those raw blobs from that list as they're added to the staging blob, 
what would that accomplish? The GSCommitRecoverable provided to the 
GSRecoverableWriterCommitter isn't persisted anywhere after the commit succeeds 
or fails, and if the commit were to be retried it would need the complete list 
anyway.
{quote}I notice another issue with the code: storage.compose might throw a 
StorageException. With the current code this would mean the intermediate 
composition blobs are not cleaned up.

If the code above is implemented, I think there is no longer a need for the 
TTL feature, since all necessary blobs should be written to a final blob. Any 
leftover blobs post-job completion would indicate a failed state.
{quote}
There is no real way to prevent the orphaning of intermediate blobs in all 
failure scenarios. Even in your proposed algorithm, the staging blob could be 
orphaned if the composition process failed partway through. This is what the 
temp bucket and TTL mechanism are for. It's optional, so you don't have to use 
it, but yes you would have to keep an eye out for orphaned intermediate blobs 
via some other mechanism if you choose not to use it.

Honestly, I think some of your difficulties are coming from trying to combine 
so much data together at once vs. doing it along the way, with commits at 
incremental checkpoints. I was going to suggest you reduce your checkpoint 
interval, to aggregate more frequently, but obviously that doesn't work if 
you're not checkpointing at all.

I do think the RecoverableWriter mechanism is designed to make writing 
predictable and repeatable in checkpoint/recovery situations (hence the name); 
if you're not doing any of that, you may run into some challenges. If you were 
to use checkpoints, you would aggregate more frequently, meaning fewer 
intermediate blobs with shorter lifetimes, and you'd be less likely to run up 
against the 5TB limit, as you would end up with a series of smaller files, 
written at each checkpoint, vs. one huge file.

If we do want to consider changes, here's what I suggest:
 * Test whether GCP allows composition to overwrite an existing blob that is 
also an input to the compose operation. If that works, then we could use that 
technique in the composeBlobs function. As mentioned above, I doubt that this 
actually reduces the number of blobs that exist temporarily – since blobs are 
immutable objects – but it would minimize their lifetime, effectively deleting 
them as soon as possible. If overwriting doesn't work, then we could achieve a 
similar effect by deleting intermediate blobs as soon as they are not needed 
anymore, rather than waiting until the commit succeeds.

[jira] [Commented] (FLINK-34696) GSRecoverableWriterCommitter is generating excessive data blobs

2024-03-19 Thread Galen Warren (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828412#comment-17828412
 ] 

Galen Warren commented on FLINK-34696:
--

I think your proposed algorithm is just another way of implementing the "delete 
intermediate blobs that are no longer necessary as soon as possible" idea we've 
considered. You accomplish it by overwriting the staging blob on each 
iteration; a similar effect could be achieved by writing to a new intermediate 
staging blob on each iteration (as is done now) and deleting the old one right 
away (which is not done now; instead, this deletion occurs shortly thereafter, 
after the commit succeeds).

Aside: I'm not sure whether it's possible to overwrite the existing staging 
blob like you suggest. The 
[docs|https://cloud.google.com/storage/docs/json_api/v1/objects/compose] for 
the compose operation say:
{quote}Concatenates a list of existing objects into a new object in the same 
bucket. The existing source objects are unaffected by this operation
{quote}
In your proposal, the same staging blob is both the target of the compose 
operation and one of the input blobs to be composed. That _might_ work, but 
would have to be tested to see if it's allowed. If it isn't, writing to a new 
blob and deleting the old one would have essentially the same effect. That's 
almost certainly what happens behind the scenes anyway, since blobs are 
immutable.

Bigger picture, if you're trying to combine millions of immutable blobs 
together in one step, 32 at a time, I don't see how you avoid having lots of 
intermediate composed blobs, one way or another, at least temporarily. The main 
question would be how quickly ones that are no longer needed are discarded.
{quote}To streamline this, we could modify {{GSCommitRecoverable}} to update 
the list of {{{}componentObjectIds{}}}, allowing the removal of blobs already 
appended to the {{stagingBlob}}
{quote}
Not quite following you here; the GSCommitRecoverable is provided as an input 
to the GSRecoverableWriterCommitter, providing the list of "raw" (i.e. 
uncomposed) temporary blobs that need to be composed and committed. If you 
removed those raw blobs from that list as they're added to the staging blob, 
what would that accomplish? The GSCommitRecoverable provided to the 
GSRecoverableWriterCommitter isn't persisted anywhere after the commit succeeds 
or fails, and if the commit were to be retried it would need the complete list 
anyway.
{quote}I notice another issue with the code: storage.compose might throw a 
StorageException. With the current code this would mean the intermediate 
composition blobs are not cleaned up.

If the code above is implemented, I think there is no longer a need for the 
TTL feature, since all necessary blobs should be written to a final blob. Any 
leftover blobs post-job completion would indicate a failed state.
{quote}
There is no real way to prevent the orphaning of intermediate blobs in all 
failure scenarios. Even in your proposed algorithm, the staging blob could be 
orphaned if the composition process failed partway through. This is what the 
temp bucket and TTL mechanism are for. It's optional, so you don't have to use 
it, but yes you would have to keep an eye out for orphaned intermediate blobs 
via some other mechanism if you choose not to use it.

Honestly, I think some of your difficulties are coming from trying to combine 
so much data together at once vs. doing it along the way, with commits at 
incremental checkpoints. I was going to suggest you reduce your checkpoint 
interval, to aggregate more frequently, but obviously that doesn't work if 
you're not checkpointing at all.

I do think the RecoverableWriter mechanism is designed to make writing 
predictable and repeatable in checkpoint/recovery situations (hence the name); 
if you're not doing any of that, you may run into some challenges. If you were 
to use checkpoints, you would aggregate more frequently, meaning fewer 
intermediate blobs with shorter lifetimes, and you'd be less likely to run up 
against the 5TB limit, as you would end up with a series of smaller files, 
written at each checkpoint, vs. one huge file.

If we do want to consider changes, here's what I suggest:
 * Test whether GCP allows composition to overwrite an existing blob that is 
also an input to the compose operation. If that works, then we could use that 
technique in the composeBlobs function. As mentioned above, I doubt that this 
actually reduces the number of blobs that exist temporarily – since blobs are 
immutable objects – but it would minimize their lifetime, effectively deleting 
them as soon as possible. If overwriting doesn't work, then we could achieve a 
similar effect by deleting intermediate blobs as soon as they are not needed 
anymore, rather than waiting until the commit succeeds.

 

Re: [PR] [FLINK-20398][e2e] Migrate test_batch_sql.sh to Java e2e tests framework [flink]

2024-03-19 Thread via GitHub


affo commented on PR #24471:
URL: https://github.com/apache/flink/pull/24471#issuecomment-2007495060

   @XComp 
   
   Required quite an effort honestly, but here we are with the JUnit 5 version 
of what I had before.
   
   This also made it possible to avoid starting a separate jar and to include 
the code directly in the test, running it against the MiniCluster obtained.
   
   Thank you for your detailed review
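
   For context, here is a minimal sketch of that in-process pattern, assuming 
Flink's JUnit 5 test support (`MiniClusterExtension`); the class name, slot 
counts, and job body are placeholders, not the actual PR code:

   ```java
   import org.apache.flink.runtime.testutils.MiniClusterResourceConfiguration;
   import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
   import org.apache.flink.test.junit5.MiniClusterExtension;

   import org.junit.jupiter.api.Test;
   import org.junit.jupiter.api.extension.RegisterExtension;

   class BatchSqlITCaseSketch {

       // Shared in-process cluster; no separate jar submission needed.
       @RegisterExtension
       static final MiniClusterExtension MINI_CLUSTER =
               new MiniClusterExtension(
                       new MiniClusterResourceConfiguration.Builder()
                               .setNumberTaskManagers(1)
                               .setNumberSlotsPerTaskManager(4)
                               .build());

       @Test
       void testBatchSql() throws Exception {
           // The extension wires environments created here to the MiniCluster.
           StreamExecutionEnvironment env =
                   StreamExecutionEnvironment.getExecutionEnvironment();
           env.fromSequence(0, 9).print();
           env.execute();
       }
   }
   ```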


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-20398][e2e] Migrate test_batch_sql.sh to Java e2e tests framework [flink]

2024-03-19 Thread via GitHub


affo commented on code in PR #24471:
URL: https://github.com/apache/flink/pull/24471#discussion_r1530613302


##
flink-end-to-end-tests/flink-end-to-end-tests-common/src/main/java/org/apache/flink/tests/util/flink/FlinkDistribution.java:
##
@@ -234,10 +234,7 @@ public JobID submitJob(final JobSubmission jobSubmission, 
Duration timeout) thro
 
 LOG.info("Running {}.", commands.stream().collect(Collectors.joining(" 
")));
 
-final Pattern pattern =
-jobSubmission.isDetached()
-? Pattern.compile("Job has been submitted with JobID 
(.*)")
-: Pattern.compile("Job with JobID (.*) has finished.");
+final Pattern pattern = Pattern.compile("Job has been submitted with 
JobID (.*)");

Review Comment:
   It used to be, but apparently that string matching is now obsolete.
   With the latest changes this is not relevant anymore.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-20398][e2e] Migrate test_batch_sql.sh to Java e2e tests framework [flink]

2024-03-19 Thread via GitHub


affo commented on code in PR #24471:
URL: https://github.com/apache/flink/pull/24471#discussion_r1530612428


##
flink-end-to-end-tests/flink-batch-sql-test/src/test/resources/log4j2-test.properties:
##
@@ -0,0 +1,31 @@
+#
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+#
+# Set root logger level to OFF to not flood build logs
+# set manually to INFO for debugging purposes
+rootLogger.level=OFF
+rootLogger.appenderRef.test.ref=TestLogger
+appender.testlogger.name=TestLogger
+appender.testlogger.type=CONSOLE
+appender.testlogger.target=SYSTEM_ERR
+appender.testlogger.layout.type=PatternLayout
+appender.testlogger.layout.pattern=%-4r [%t] %-5p %c %x - %m%n
+# Uncomment to enable codegen logging
+#loggers = testlogger
+#logger.testlogger.name =org.apache.flink.table.planner.codegen
+#logger.testlogger.level = TRACE
+#logger.testlogger.appenderRefs = TestLogger

Review Comment:
   This is quite useful when running tests (switch from `OFF` to `INFO`), and 
`OFF` prevents cluttering the logs in CI runs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34745] Improve validations for a period in Time Travel [flink]

2024-03-19 Thread via GitHub


alexey-lv commented on code in PR #24534:
URL: https://github.com/apache/flink/pull/24534#discussion_r1530610432


##
flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/planner/plan/stream/sql/join/TemporalJoinTest.scala:
##
@@ -508,6 +508,27 @@ class TemporalJoinTest extends TableTestBase {
 " table, but the rowtime types are TIMESTAMP_LTZ(3) *ROWTIME* and 
TIMESTAMP(3) *ROWTIME*.",
   classOf[ValidationException]
 )
+
+val sqlQuery9 = "SELECT * " +
+  "FROM Orders AS o JOIN " +
+  "RatesHistoryWithPK FOR SYSTEM_TIME AS OF 'o.rowtime' AS r " +
+  "ON o.currency = r.currency"
+expectExceptionThrown(
+  sqlQuery9,
+  "The system time period specification expects Timestamp type but is 
'CHAR'",

Review Comment:
   nit: the error message would be better if it was changed `...expects 
Timestamp type but is...` --> `...expects Timestamp type, but got ...`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34745] Improve validations for a period in Time Travel [flink]

2024-03-19 Thread via GitHub


alexey-lv commented on code in PR #24534:
URL: https://github.com/apache/flink/pull/24534#discussion_r1530607371


##
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/calcite/FlinkCalciteSqlValidator.java:
##
@@ -222,15 +223,25 @@ protected void registerNamespace(
 Collections.singletonList(simplifiedRexNode),
 reducedNodes);
 // check whether period is the unsupported expression
-if (!(reducedNodes.get(0) instanceof RexLiteral)) {
-throw new UnsupportedOperationException(
+final RexNode reducedNode = reducedNodes.get(0);
+if (!(reducedNode instanceof RexLiteral)) {
+throw new ValidationException(
 String.format(
 "Unsupported time travel expression: %s for 
the expression can not be reduced to a constant by Flink.",
 periodNode));
 }
 
-RexLiteral rexLiteral = (RexLiteral) (reducedNodes).get(0);
-TimestampString timestampString = 
rexLiteral.getValueAs(TimestampString.class);
+final SqlTypeName sqlTypeName = ((RexLiteral) 
reducedNode).getTypeName();
+if (!(sqlTypeName == SqlTypeName.TIMESTAMP_WITH_LOCAL_TIME_ZONE
+|| sqlTypeName == SqlTypeName.TIMESTAMP)) {
+throw newValidationError(
+periodNode,
+
Static.RESOURCE.illegalExpressionForTemporal(sqlTypeName.getName()));
+}
+
+RexLiteral rexLiteral = (RexLiteral) reducedNode;
+TimestampString timestampString =
+((RexLiteral) 
reducedNode).getValueAs(TimestampString.class);

Review Comment:
   Use rexLiteral here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34745] Improve validations for a period in Time Travel [flink]

2024-03-19 Thread via GitHub


alexey-lv commented on code in PR #24534:
URL: https://github.com/apache/flink/pull/24534#discussion_r1530606670


##
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/calcite/FlinkCalciteSqlValidator.java:
##
@@ -222,15 +223,25 @@ protected void registerNamespace(
 Collections.singletonList(simplifiedRexNode),
 reducedNodes);
 // check whether period is the unsupported expression
-if (!(reducedNodes.get(0) instanceof RexLiteral)) {
-throw new UnsupportedOperationException(
+final RexNode reducedNode = reducedNodes.get(0);
+if (!(reducedNode instanceof RexLiteral)) {
+throw new ValidationException(
 String.format(
 "Unsupported time travel expression: %s for 
the expression can not be reduced to a constant by Flink.",
 periodNode));
 }
 
-RexLiteral rexLiteral = (RexLiteral) (reducedNodes).get(0);
-TimestampString timestampString = 
rexLiteral.getValueAs(TimestampString.class);
+final SqlTypeName sqlTypeName = ((RexLiteral) 
reducedNode).getTypeName();
+if (!(sqlTypeName == SqlTypeName.TIMESTAMP_WITH_LOCAL_TIME_ZONE
+|| sqlTypeName == SqlTypeName.TIMESTAMP)) {
+throw newValidationError(
+periodNode,
+
Static.RESOURCE.illegalExpressionForTemporal(sqlTypeName.getName()));
+}
+
+RexLiteral rexLiteral = (RexLiteral) reducedNode;

Review Comment:
   (nit) move `RexLiteral rexLiteral = (RexLiteral) reducedNode` above 
   `final SqlTypeName sqlTypeName = ((RexLiteral) reducedNode).getTypeName();` 
-- this would help simplify the readability of the latter a bit.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [hotfix] Fix maven property typo in root pom.xml [flink]

2024-03-19 Thread via GitHub


flinkbot commented on PR #24535:
URL: https://github.com/apache/flink/pull/24535#issuecomment-2007483505

   
   ## CI report:
   
   * 1eb406be2e3bc9e67e92b7e8b7d579b584f4b22e UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [hotfix] Fix maven property typo in root pom.xml [flink]

2024-03-19 Thread via GitHub


RyanSkraba opened a new pull request, #24535:
URL: https://github.com/apache/flink/pull/24535

   ## What is the purpose of the change
   
   There's a missing brace in the root `pom.xml` that would cause tests that 
should be excluded on a specific JDK version (`FailsOnJavaXxx`) to run anyway 
when the `enable-adaptive-scheduler` profile is enabled.  There currently 
aren't any such tests, but this should be corrected (and maybe backported)?
   
   ## Brief change log
   
   Add the missing brace.
   
   ## Verifying this change
   
   This change is a trivial rework without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): **no**
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: **no**
 - The serializers: **no**
 - The runtime per-record code paths (performance sensitive): **no**
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: **no**
 - The S3 file system connector: **no**
   
   ## Documentation
   
 - Does this pull request introduce a new feature? **no**
 - If yes, how is the feature documented? **not documented**
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34745] Improve validations for a period in Time Travel [flink]

2024-03-19 Thread via GitHub


flinkbot commented on PR #24534:
URL: https://github.com/apache/flink/pull/24534#issuecomment-2007457908

   
   ## CI report:
   
   * 16df9b27a63c67e540905939e98775c3c57d99c5 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-34745) Parsing temporal table join throws cryptic exceptions

2024-03-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-34745:
---
Labels: pull-request-available  (was: )

> Parsing temporal table join throws cryptic exceptions
> -
>
> Key: FLINK-34745
> URL: https://issues.apache.org/jira/browse/FLINK-34745
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.19.0
>Reporter: Dawid Wysakowicz
>Assignee: Dawid Wysakowicz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> 1. Wrong expression type in {{AS OF}}:
> {code}
> SELECT * " +
>   "FROM Orders AS o JOIN " +
>   "RatesHistoryWithPK FOR SYSTEM_TIME AS OF 'o.rowtime' AS r " +
>   "ON o.currency = r.currency
> {code}
> throws: 
> {code}
> java.lang.AssertionError: cannot convert CHAR literal to class 
> org.apache.calcite.util.TimestampString
> {code}
> 2. Not a simple table reference in {{AS OF}}
> {code}
> SELECT * " +
>   "FROM Orders AS o JOIN " +
>   "RatesHistoryWithPK FOR SYSTEM_TIME AS OF o.rowtime + INTERVAL '1' 
> SECOND AS r " +
>   "ON o.currency = r.currency
> {code}
> throws:
> {code}
> java.lang.AssertionError: no unique expression found for {id: o.rowtime, 
> prefix: 1}; count is 0
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [FLINK-34745] Improve validations for a period in Time Travel [flink]

2024-03-19 Thread via GitHub


dawidwys opened a new pull request, #24534:
URL: https://github.com/apache/flink/pull/24534

   ## What is the purpose of the change
   
   The provided SQL statements are invalid; however, the current code base 
fails in unexpected locations, giving unhelpful messages. This PR improves the 
validation, which results in proper error messages.
   
   ## Verifying this change
   
   Added tests.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't 
know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34718][tests] Exclude operators not meant for adaptive scheduler [flink]

2024-03-19 Thread via GitHub


flinkbot commented on PR #24533:
URL: https://github.com/apache/flink/pull/24533#issuecomment-2007442964

   
   ## CI report:
   
   * 555032883dddaa40c52c3b789a108de7f5adf55e UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34718][tests] Exclude operators not meant for adaptive scheduler [flink]

2024-03-19 Thread via GitHub


XComp commented on PR #24533:
URL: https://github.com/apache/flink/pull/24533#issuecomment-2007438599

   @RyanSkraba The tests will pass either way because the PR CI run will only 
utilize the `DefaultScheduler`. You could test this by enabling the 
`AdaptiveScheduler` profile in your PR or by triggering a nightly GHA workflow 
on this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34718][tests] Exclude operators not meant for adaptive scheduler [flink]

2024-03-19 Thread via GitHub


RyanSkraba commented on PR #24533:
URL: https://github.com/apache/flink/pull/24533#issuecomment-2007422589

   @WencongLiu If this passes, can you take a look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-34718) KeyedPartitionWindowedStream and NonPartitionWindowedStream IllegalStateException in AZP

2024-03-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-34718:
---
Labels: pull-request-available test-stability  (was: test-stability)

> KeyedPartitionWindowedStream and NonPartitionWindowedStream 
> IllegalStateException in AZP
> 
>
> Key: FLINK-34718
> URL: https://issues.apache.org/jira/browse/FLINK-34718
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Affects Versions: 1.20.0
>Reporter: Ryan Skraba
>Priority: Critical
>  Labels: pull-request-available, test-stability
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58320=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=ea7cf968-e585-52cb-e0fc-f48de023a7ca=9646]
> 18 of the KeyedPartitionWindowedStreamITCase and 
> NonKeyedPartitionWindowedStreamITCase unit tests introduced in FLINK-34543 
> are failing in the adaptive scheduler profile, with errors similar to:
> {code:java}
> Mar 15 01:54:12 Caused by: java.lang.IllegalStateException: The adaptive 
> scheduler supports pipelined data exchanges (violated by MapPartition 
> (org.apache.flink.streaming.runtime.tasks.OneInputStreamTask) -> 
> ddb598ad156ed281023ba4eebbe487e3).
> Mar 15 01:54:12   at 
> org.apache.flink.util.Preconditions.checkState(Preconditions.java:215)
> Mar 15 01:54:12   at 
> org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler.assertPreconditions(AdaptiveScheduler.java:438)
> Mar 15 01:54:12   at 
> org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler.(AdaptiveScheduler.java:356)
> Mar 15 01:54:12   at 
> org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerFactory.createInstance(AdaptiveSchedulerFactory.java:124)
> Mar 15 01:54:12   at 
> org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:121)
> Mar 15 01:54:12   at 
> org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:384)
> Mar 15 01:54:12   at 
> org.apache.flink.runtime.jobmaster.JobMaster.(JobMaster.java:361)
> Mar 15 01:54:12   at 
> org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:128)
> Mar 15 01:54:12   at 
> org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:100)
> Mar 15 01:54:12   at 
> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112)
> Mar 15 01:54:12   at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
> Mar 15 01:54:12   ... 4 more
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [FLINK-34718][tests] Exclude operators not meant for adaptive scheduler [flink]

2024-03-19 Thread via GitHub


RyanSkraba opened a new pull request, #24533:
URL: https://github.com/apache/flink/pull/24533

   ## What is the purpose of the change
   
   * Some of the new unit tests introduced in FLINK-34543 are failing in the 
**adaptive scheduler** stage.  The operators they exercise are not suitable 
for the adaptive scheduler, so the tests should be excluded.
   
   ## Brief change log
   
   Add `@Tag` to exclude the `KeyedPartitionWindowedStreamITCase` and 
`NonKeyedPartitionWindowedStreamITCase`.
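
   A sketch of the tagging pattern (the tag string below is an assumption for 
illustration, not necessarily the exact tag used in this PR):

   ```java
   import org.junit.jupiter.api.Tag;
   import org.junit.jupiter.api.Test;

   // Hypothetical tag: the CI profile excludes tagged classes (e.g. via
   // surefire's excludedGroups) when the adaptive scheduler is enabled.
   @Tag("org.apache.flink.testutils.junit.FailsWithAdaptiveScheduler")
   class PartitionWindowedStreamITCaseSketch {

       @Test
       void mapPartitionRequiresBlockingExchange() {
           // exercises an operator that needs non-pipelined data exchanges,
           // which the adaptive scheduler rejects (see the
           // IllegalStateException quoted in FLINK-34718)
       }
   }
   ```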
   
   ## Verifying this change
   
   This change can be verified in the Azure and GitHub Actions pipelines by 
running a full build: the tests should run in stages that don't have the 
system property "flink.tests.enable-adaptive-scheduler" set to true (and be 
excluded from the **adaptive scheduler** stage).
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): **no**
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: **no**
 - The serializers: **no**
 - The runtime per-record code paths (performance sensitive): **no**
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: **no**
 - The S3 file system connector **no**
   
   ## Documentation
   
 - Does this pull request introduce a new feature? **no**
 - If yes, how is the feature documented? **not applicable**
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-34745) Parsing temporal table join throws cryptic exceptions

2024-03-19 Thread Dawid Wysakowicz (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Wysakowicz updated FLINK-34745:
-
Description: 
1. Wrong expression type in {{AS OF}}:
{code}
SELECT * " +
  "FROM Orders AS o JOIN " +
  "RatesHistoryWithPK FOR SYSTEM_TIME AS OF 'o.rowtime' AS r " +
  "ON o.currency = r.currency
{code}

throws: 

{code}
java.lang.AssertionError: cannot convert CHAR literal to class 
org.apache.calcite.util.TimestampString
{code}

2. Not a simple table reference in {{AS OF}}
{code}
SELECT * " +
  "FROM Orders AS o JOIN " +
  "RatesHistoryWithPK FOR SYSTEM_TIME AS OF o.rowtime + INTERVAL '1' SECOND 
AS r " +
  "ON o.currency = r.currency
{code}

throws:
{code}
java.lang.AssertionError: no unique expression found for {id: o.rowtime, 
prefix: 1}; count is 0
{code}

  was:
1. Wrong expression type in `AS OF`:
{code}
SELECT * " +
  "FROM Orders AS o JOIN " +
  "RatesHistoryWithPK FOR SYSTEM_TIME AS OF 'o.rowtime' AS r " +
  "ON o.currency = r.currency
{code}

throws: 

{code}
java.lang.AssertionError: cannot convert CHAR literal to class 
org.apache.calcite.util.TimestampString
{code}

2. Not a table simple table reference
{code}
SELECT * " +
  "FROM Orders AS o JOIN " +
  "RatesHistoryWithPK FOR SYSTEM_TIME AS OF o.rowtime + INTERVAL '1' SECOND 
AS r " +
  "ON o.currency = r.currency
{code}

throws:
{code}
java.lang.AssertionError: no unique expression found for {id: o.rowtime, 
prefix: 1}; count is 0
{code}


> Parsing temporal table join throws cryptic exceptions
> -
>
> Key: FLINK-34745
> URL: https://issues.apache.org/jira/browse/FLINK-34745
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.19.0
>Reporter: Dawid Wysakowicz
>Assignee: Dawid Wysakowicz
>Priority: Major
> Fix For: 1.20.0
>
>
> 1. Wrong expression type in {{AS OF}}:
> {code}
> SELECT * " +
>   "FROM Orders AS o JOIN " +
>   "RatesHistoryWithPK FOR SYSTEM_TIME AS OF 'o.rowtime' AS r " +
>   "ON o.currency = r.currency
> {code}
> throws: 
> {code}
> java.lang.AssertionError: cannot convert CHAR literal to class 
> org.apache.calcite.util.TimestampString
> {code}
> 2. Not a simple table reference in {{AS OF}}
> {code}
> SELECT * " +
>   "FROM Orders AS o JOIN " +
>   "RatesHistoryWithPK FOR SYSTEM_TIME AS OF o.rowtime + INTERVAL '1' 
> SECOND AS r " +
>   "ON o.currency = r.currency
> {code}
> throws:
> {code}
> java.lang.AssertionError: no unique expression found for {id: o.rowtime, 
> prefix: 1}; count is 0
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34745) Parsing temporal table join throws cryptic exceptions

2024-03-19 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-34745:


 Summary: Parsing temporal table join throws cryptic exceptions
 Key: FLINK-34745
 URL: https://issues.apache.org/jira/browse/FLINK-34745
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.19.0
Reporter: Dawid Wysakowicz
Assignee: Dawid Wysakowicz
 Fix For: 1.20.0


1. Wrong expression type in `AS OF`:
{code}
SELECT * " +
  "FROM Orders AS o JOIN " +
  "RatesHistoryWithPK FOR SYSTEM_TIME AS OF 'o.rowtime' AS r " +
  "ON o.currency = r.currency
{code}

throws: 

{code}
java.lang.AssertionError: cannot convert CHAR literal to class 
org.apache.calcite.util.TimestampString
{code}

2. Not a table simple table reference
{code}
SELECT * " +
  "FROM Orders AS o JOIN " +
  "RatesHistoryWithPK FOR SYSTEM_TIME AS OF o.rowtime + INTERVAL '1' SECOND 
AS r " +
  "ON o.currency = r.currency
{code}

throws:
{code}
java.lang.AssertionError: no unique expression found for {id: o.rowtime, 
prefix: 1}; count is 0
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34718) KeyedPartitionWindowedStream and NonPartitionWindowedStream IllegalStateException in AZP

2024-03-19 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828371#comment-17828371
 ] 

Ryan Skraba commented on FLINK-34718:
-

I haven't been following FLIP-331 – if these operators are never appropriate 
for the AdaptiveScheduler, then we should exclude them from that stage in the 
pipeline.  If I understand correctly, this is a relatively easy thing to do for 
both Azure and GitHub actions.  I'll take a look.

> KeyedPartitionWindowedStream and NonPartitionWindowedStream 
> IllegalStateException in AZP
> 
>
> Key: FLINK-34718
> URL: https://issues.apache.org/jira/browse/FLINK-34718
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Affects Versions: 1.20.0
>Reporter: Ryan Skraba
>Priority: Critical
>  Labels: test-stability
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58320=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=ea7cf968-e585-52cb-e0fc-f48de023a7ca=9646]
> 18 of the KeyedPartitionWindowedStreamITCase and 
> NonKeyedPartitionWindowedStreamITCase unit tests introduced in FLINK-34543 
> are failing in the adaptive scheduler profile, with errors similar to:
> {code:java}
> Mar 15 01:54:12 Caused by: java.lang.IllegalStateException: The adaptive 
> scheduler supports pipelined data exchanges (violated by MapPartition 
> (org.apache.flink.streaming.runtime.tasks.OneInputStreamTask) -> 
> ddb598ad156ed281023ba4eebbe487e3).
> Mar 15 01:54:12   at 
> org.apache.flink.util.Preconditions.checkState(Preconditions.java:215)
> Mar 15 01:54:12   at 
> org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler.assertPreconditions(AdaptiveScheduler.java:438)
> Mar 15 01:54:12   at 
> org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler.(AdaptiveScheduler.java:356)
> Mar 15 01:54:12   at 
> org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerFactory.createInstance(AdaptiveSchedulerFactory.java:124)
> Mar 15 01:54:12   at 
> org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:121)
> Mar 15 01:54:12   at 
> org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:384)
> Mar 15 01:54:12   at 
> org.apache.flink.runtime.jobmaster.JobMaster.(JobMaster.java:361)
> Mar 15 01:54:12   at 
> org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:128)
> Mar 15 01:54:12   at 
> org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:100)
> Mar 15 01:54:12   at 
> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112)
> Mar 15 01:54:12   at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
> Mar 15 01:54:12   ... 4 more
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34727) RestClusterClient.requestJobResult throw ConnectionClosedException when the accumulator data is large

2024-03-19 Thread Wancheng Xiao (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wancheng Xiao updated FLINK-34727:
--
Description: 
The task succeeded, but "RestClusterClient.requestJobResult()" encountered an 
error reporting ConnectionClosedException (Channel became inactive).
After debugging, it is speculated that the problem lies on the Flink server 
side, in "AbstractRestHandler.respondToRequest()": the 
"response.thenAccept(resp -> HandlerUtils.sendResponse())" call does not 
propagate the future returned by sendResponse, so the server shutdown process 
can begin before the response has been fully sent. I suspect that 
"thenAccept()" needs to be replaced with "thenCompose()".
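
A self-contained sketch of the suspected race (the sendResponse below is a 
hypothetical stand-in for HandlerUtils.sendResponse(), i.e. an async write 
that finishes some time after it is started):
{code:java}
import java.util.concurrent.CompletableFuture;

public class ThenAcceptVsThenCompose {

    // Stand-in for HandlerUtils.sendResponse(): an async write that
    // completes some time after it has been started.
    static CompletableFuture<Void> sendResponse(String resp) {
        return CompletableFuture.runAsync(() -> {
            try {
                Thread.sleep(100); // simulate the netty write
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    public static void main(String[] args) {
        CompletableFuture<String> response =
                CompletableFuture.completedFuture("job result");

        // Current behavior: completes as soon as sendResponse() has been
        // *invoked*; shutdown chained on this future may close the channel
        // while the write is still in flight.
        CompletableFuture<Void> acceptDone =
                response.thenAccept(resp -> sendResponse(resp));

        // Suggested fix: completes only once the write's own future
        // completes, so shutdown waits for the response to be flushed.
        CompletableFuture<Void> composeDone =
                response.thenCompose(resp -> sendResponse(resp));

        System.out.println(acceptDone.isDone());  // true: write still in flight
        System.out.println(composeDone.isDone()); // false: still writing
        composeDone.join();
    }
}
{code}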

The details are as follows:

 

*Pseudocode:*

!image-2024-03-19-15-51-20-150.png|width=802,height=222!

 

*Server handling steps:*

netty-thread: got request
flink-dispatcher-thread: exec requestJobResult[6] and complete 
shutDownFuture[8], then call HandlerUtils.sendResponse[13](netty async write)
netty-thread: write some data to channel.(not done)
flink-dispatcher-thread: call inFlightRequestTracker.deregisterRequest[15]
netty-thread: write some data to channel failed, channel not active

I added some logging to trace this bug:

!AbstractHandler.png|width=406,height=313!

!AbstractRestHandler.png|width=418,height=322!

!MiniDispatcher.png|width=419,height=277!

!RestServerEndpoint.png|width=419,height=279!

then I got:

/{*}then call requestJobResult and shutDownFuture.complete; (close channel when 
request deregistered){*}/
2024-03-17 18:01:34.788 [flink-akka.actor.default-dispatcher-20] INFO  
o.a.flink.runtime.rest.handler.job.JobExecutionResultHandler  - 
JobExecutionResultHandler gateway.requestJobStatus complete. 
[jobStatus=FINISHED]
/{*}submit sendResponse{*}/
2024-03-17 18:01:34.821 [flink-akka.actor.default-dispatcher-20] INFO  
o.a.flink.runtime.rest.handler.job.JobExecutionResultHandler  - submit 
HandlerUtils.sendResponse().
/{*}thenAccept(sendResponse()) is complete, will call inFlightRequestTracker, 
but sendResponse's return future not completed{*}  /
2024-03-17 18:01:34.821 [flink-akka.actor.default-dispatcher-20] INFO  
o.a.flink.runtime.rest.handler.job.JobExecutionResultHandler  - 
requestProcessingFuture complete. 
[requestProcessingFuture=java.util.concurrent.CompletableFuture@1329aca5[Completed
 normally]]
/{*}sendResponse's write task is still running{*}/
2024-03-17 18:01:34.822 [flink-rest-server-netty-worker-thread-10] INFO  
o.a.f.s.netty4.io.netty.handler.stream.ChunkedWriteHandler  - write
/{*}deregister request and then shut down, then channel close{*}/
2024-03-17 18:01:34.826 [flink-akka.actor.default-dispatcher-20] INFO  
o.a.flink.runtime.rest.handler.job.JobExecutionResultHandler  - call 
inFlightRequestTracker.deregisterRequest() done
2024-03-17 18:01:34.827 [flink-rest-server-netty-worker-thread-10] INFO  
o.a.f.shaded.netty4.io.netty.channel.DefaultChannelPipeline  - pipeline close.
2024-03-17 18:01:34.827 [flink-rest-server-netty-worker-thread-10] INFO  
org.apache.flink.runtime.rest.handler.util.HandlerUtils  - lastContentFuture 
complete. [future=DefaultChannelPromise@621f03ea(failure: 
java.nio.channels.ClosedChannelException)]

*more details in flink_bug_complex.log*

 

 

 

Additionally:

During the process of investigating this bug, 
FutureUtils.retryOperationWithDelay swallowed the first occurrence of the 
"Channel became inactive" exception and, after several retries, the server 
was shut down; the client then threw a "Connection refused" exception, which 
had some impact on the troubleshooting process. Could we consider adding some 
logging here to aid in future diagnostics?

  was:
The task was succeed, but "RestClusterClient.requestJobResult()" encountered an 
error reporting ConnectionClosedException. (Channel became inactive)
After debugging, it is speculated that the problem occurred in the flink task 
server-side "AbstractRestHandler.respondToRequest()" with the 
"response.thenAccept(resp -> HandlerUtils.sendResponse())", this "thenAccept()" 
did not pass the future returned by sendResponse, causing the server shutdown 
process before the request was sent. I suspect that "thenAccept()" needs to be 
replaced with "thenCompose()"

The details are as follows:

 

*Pseudocode:*

!image-2024-03-19-15-51-20-150.png!

 

*Server handling steps:*

netty-thread: got request
flink-dispatcher-thread: exec requestJobResult[6] and complete 
shutDownFuture[8], then call HandlerUtils.sendResponse[13](netty async write)
netty-thread: write some data to channel.(not done)
flink-dispatcher-thread: call inFlightRequestTracker.deregisterRequest[15]
netty-thread: write some data to channel failed, channel not active

i added some log to trace this bug:  !AbstractHandler.png!

!AbstractRestHandler.png!

!MiniDispatcher.png!

!RestServerEndpoint.png!

then i got:

/{*}then call requestJobResult and 

[jira] [Assigned] (FLINK-34723) Parquet writer should restrict map keys to be not null

2024-03-19 Thread Xingcan Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xingcan Cui reassigned FLINK-34723:
---

Assignee: (was: Xingcan Cui)

> Parquet writer should restrict map keys to be not null
> --
>
> Key: FLINK-34723
> URL: https://issues.apache.org/jira/browse/FLINK-34723
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.19.0, 1.18.1
>Reporter: Xingcan Cui
>Priority: Major
>  Labels: pull-request-available
>
> We got the following exception when reading a parquet file (with map types) 
> generated by Flink.
> {code:java}
> Map keys must be annotated as required.{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34718) KeyedPartitionWindowedStream and NonPartitionWindowedStream IllegalStateException in AZP

2024-03-19 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828348#comment-17828348
 ] 

Ryan Skraba commented on FLINK-34718:
-

* 
[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58398=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=ea7cf968-e585-52cb-e0fc-f48de023a7ca=9670]

> KeyedPartitionWindowedStream and NonPartitionWindowedStream 
> IllegalStateException in AZP
> 
>
> Key: FLINK-34718
> URL: https://issues.apache.org/jira/browse/FLINK-34718
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Affects Versions: 1.20.0
>Reporter: Ryan Skraba
>Priority: Critical
>  Labels: test-stability
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58320=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=ea7cf968-e585-52cb-e0fc-f48de023a7ca=9646]
> 18 of the KeyedPartitionWindowedStreamITCase and 
> NonKeyedPartitionWindowedStreamITCase unit tests introduced in FLINK-34543 
> are failing in the adaptive scheduler profile, with errors similar to:
> {code:java}
> Mar 15 01:54:12 Caused by: java.lang.IllegalStateException: The adaptive 
> scheduler supports pipelined data exchanges (violated by MapPartition 
> (org.apache.flink.streaming.runtime.tasks.OneInputStreamTask) -> 
> ddb598ad156ed281023ba4eebbe487e3).
> Mar 15 01:54:12   at 
> org.apache.flink.util.Preconditions.checkState(Preconditions.java:215)
> Mar 15 01:54:12   at 
> org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler.assertPreconditions(AdaptiveScheduler.java:438)
> Mar 15 01:54:12   at 
> org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler.(AdaptiveScheduler.java:356)
> Mar 15 01:54:12   at 
> org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerFactory.createInstance(AdaptiveSchedulerFactory.java:124)
> Mar 15 01:54:12   at 
> org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:121)
> Mar 15 01:54:12   at 
> org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:384)
> Mar 15 01:54:12   at 
> org.apache.flink.runtime.jobmaster.JobMaster.(JobMaster.java:361)
> Mar 15 01:54:12   at 
> org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:128)
> Mar 15 01:54:12   at 
> org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:100)
> Mar 15 01:54:12   at 
> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112)
> Mar 15 01:54:12   at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
> Mar 15 01:54:12   ... 4 more
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (FLINK-34643) JobIDLoggingITCase failed

2024-03-19 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl reopened FLINK-34643:
---

I'm reopening the issue. [~roman] is this due to the fact that we haven't 
collected all the logs that should be considered (due to the "empirical nature" 
of this test)?

> JobIDLoggingITCase failed
> -
>
> Key: FLINK-34643
> URL: https://issues.apache.org/jira/browse/FLINK-34643
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.20.0
>Reporter: Matthias Pohl
>Assignee: Roman Khachatryan
>Priority: Major
>  Labels: pull-request-available, test-stability
> Fix For: 1.20.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58187=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=ea7cf968-e585-52cb-e0fc-f48de023a7ca=7897
> {code}
> Mar 09 01:24:23 01:24:23.498 [ERROR] Tests run: 1, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 4.209 s <<< FAILURE! -- in 
> org.apache.flink.test.misc.JobIDLoggingITCase
> Mar 09 01:24:23 01:24:23.498 [ERROR] 
> org.apache.flink.test.misc.JobIDLoggingITCase.testJobIDLogging(ClusterClient) 
> -- Time elapsed: 1.459 s <<< ERROR!
> Mar 09 01:24:23 java.lang.IllegalStateException: Too few log events recorded 
> for org.apache.flink.runtime.jobmaster.JobMaster (12) - this must be a bug in 
> the test code
> Mar 09 01:24:23   at 
> org.apache.flink.util.Preconditions.checkState(Preconditions.java:215)
> Mar 09 01:24:23   at 
> org.apache.flink.test.misc.JobIDLoggingITCase.assertJobIDPresent(JobIDLoggingITCase.java:148)
> Mar 09 01:24:23   at 
> org.apache.flink.test.misc.JobIDLoggingITCase.testJobIDLogging(JobIDLoggingITCase.java:132)
> Mar 09 01:24:23   at java.lang.reflect.Method.invoke(Method.java:498)
> Mar 09 01:24:23   at 
> java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
> Mar 09 01:24:23   at 
> java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> Mar 09 01:24:23   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> Mar 09 01:24:23   at 
> java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> Mar 09 01:24:23   at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> Mar 09 01:24:23 
> {code}
> The other test failures of this build were also caused by the same test:
> * 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58187=logs=2c3cbe13-dee0-5837-cf47-3053da9a8a78=b78d9d30-509a-5cea-1fef-db7abaa325ae=8349
> * 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58187=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=8209



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34744) autoscaling-dynamic cannot run

2024-03-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-34744:
---
Labels: pull-request-available  (was: )

> autoscaling-dynamic cannot run
> --
>
> Key: FLINK-34744
> URL: https://issues.apache.org/jira/browse/FLINK-34744
> Project: Flink
>  Issue Type: Bug
>  Components: Autoscaler
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: kubernetes-operator-1.9.0
>
> Attachments: image-2024-03-19-21-46-15-530.png
>
>
> autoscaling-dynamic cannot run on my Mac
>  !image-2024-03-19-21-46-15-530.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [FLINK-34744][autoscaler] Fix the issue that autoscaling-dynamic cannot run [flink-kubernetes-operator]

2024-03-19 Thread via GitHub


1996fanrui opened a new pull request, #800:
URL: https://github.com/apache/flink-kubernetes-operator/pull/800

   autoscaling-dynamic cannot run in my local environment
   
   I guess YAML cannot handle the `\n` properly with multiple lines.
   
   (screenshot: https://github.com/apache/flink-kubernetes-operator/assets/38427477/2abe2602-3d16-478a-992d-49d7e536c48f)
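   
   If that guess is right, a small illustration of the YAML behavior likely 
   involved (keys invented): `\n` escapes are only interpreted inside 
   double-quoted scalars, so a multi-line value is safer as a literal block 
   scalar.
   
{code:yaml}
single: 'line1\nline2'   # single quotes keep the two characters \ and n
double: "line1\nline2"   # double quotes interpret \n as a newline
block: |                 # literal block scalar: real newlines, no escapes
  line1
  line2
{code}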
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-34744) autoscaling-dynamic cannot run

2024-03-19 Thread Rui Fan (Jira)
Rui Fan created FLINK-34744:
---

 Summary: autoscaling-dynamic cannot run
 Key: FLINK-34744
 URL: https://issues.apache.org/jira/browse/FLINK-34744
 Project: Flink
  Issue Type: Bug
  Components: Autoscaler
Reporter: Rui Fan
Assignee: Rui Fan
 Fix For: kubernetes-operator-1.9.0
 Attachments: image-2024-03-19-21-46-15-530.png

autoscaling-dynamic cannot run on my Mac

 !image-2024-03-19-21-46-15-530.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34404) GroupWindowAggregateProcTimeRestoreTest#testRestore times out

2024-03-19 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828345#comment-17828345
 ] 

Ryan Skraba commented on FLINK-34404:
-

This also occurs on AZP:
* 
[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58399=logs=f2c100be-250b-5e85-7bbe-176f68fcddc5=05efd11e-5400-54a4-0d27-a4663be008a9=12626]
 (1.19)

> GroupWindowAggregateProcTimeRestoreTest#testRestore times out
> -
>
> Key: FLINK-34404
> URL: https://issues.apache.org/jira/browse/FLINK-34404
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.19.0, 1.20.0
>Reporter: Matthias Pohl
>Assignee: Alan Sheinberg
>Priority: Critical
>  Labels: test-stability
> Attachments: FLINK-34404.failure.log, FLINK-34404.success.log
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57357=logs=32715a4c-21b8-59a3-4171-744e5ab107eb=ff64056b-5320-5afe-c22c-6fa339e59586=11603
> {code}
> Feb 07 02:17:40 "ForkJoinPool-74-worker-1" #382 daemon prio=5 os_prio=0 
> cpu=282.22ms elapsed=961.78s tid=0x7f880a485c00 nid=0x6745 waiting on 
> condition  [0x7f878a6f9000]
> Feb 07 02:17:40java.lang.Thread.State: WAITING (parking)
> Feb 07 02:17:40   at 
> jdk.internal.misc.Unsafe.park(java.base@17.0.7/Native Method)
> Feb 07 02:17:40   - parking to wait for  <0xff73d060> (a 
> java.util.concurrent.CompletableFuture$Signaller)
> Feb 07 02:17:40   at 
> java.util.concurrent.locks.LockSupport.park(java.base@17.0.7/LockSupport.java:211)
> Feb 07 02:17:40   at 
> java.util.concurrent.CompletableFuture$Signaller.block(java.base@17.0.7/CompletableFuture.java:1864)
> Feb 07 02:17:40   at 
> java.util.concurrent.ForkJoinPool.compensatedBlock(java.base@17.0.7/ForkJoinPool.java:3449)
> Feb 07 02:17:40   at 
> java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.7/ForkJoinPool.java:3432)
> Feb 07 02:17:40   at 
> java.util.concurrent.CompletableFuture.waitingGet(java.base@17.0.7/CompletableFuture.java:1898)
> Feb 07 02:17:40   at 
> java.util.concurrent.CompletableFuture.get(java.base@17.0.7/CompletableFuture.java:2072)
> Feb 07 02:17:40   at 
> org.apache.flink.table.planner.plan.nodes.exec.testutils.RestoreTestBase.testRestore(RestoreTestBase.java:292)
> Feb 07 02:17:40   at 
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@17.0.7/Native 
> Method)
> Feb 07 02:17:40   at 
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@17.0.7/NativeMethodAccessorImpl.java:77)
> Feb 07 02:17:40   at 
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@17.0.7/DelegatingMethodAccessorImpl.java:43)
> Feb 07 02:17:40   at 
> java.lang.reflect.Method.invoke(java.base@17.0.7/Method.java:568)
> Feb 07 02:17:40   at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728)
> [...]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-33186) CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished fails on AZP

2024-03-19 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828331#comment-17828331
 ] 

Ryan Skraba edited comment on FLINK-33186 at 3/19/24 1:35 PM:
--

* 
[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58399=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b=8063]
 (1.20)


was (Author: ryanskraba):
* 
[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58399=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b=8063]

>  CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished 
> fails on AZP
> -
>
> Key: FLINK-33186
> URL: https://issues.apache.org/jira/browse/FLINK-33186
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.19.0, 1.18.1
>Reporter: Sergey Nuyanzin
>Assignee: Jiang Xin
>Priority: Critical
>  Labels: test-stability
>
> This build 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53509=logs=baf26b34-3c6a-54e8-f93f-cf269b32f802=8c9d126d-57d2-5a9e-a8c8-ff53f7b35cd9=8762
> fails as
> {noformat}
> Sep 28 01:23:43 Caused by: 
> org.apache.flink.runtime.checkpoint.CheckpointException: Task local 
> checkpoint failure.
> Sep 28 01:23:43   at 
> org.apache.flink.runtime.checkpoint.PendingCheckpoint.abort(PendingCheckpoint.java:550)
> Sep 28 01:23:43   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2248)
> Sep 28 01:23:43   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2235)
> Sep 28 01:23:43   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.lambda$null$9(CheckpointCoordinator.java:817)
> Sep 28 01:23:43   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> Sep 28 01:23:43   at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> Sep 28 01:23:43   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> Sep 28 01:23:43   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> Sep 28 01:23:43   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> Sep 28 01:23:43   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> Sep 28 01:23:43   at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33186) CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished fails on AZP

2024-03-19 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828331#comment-17828331
 ] 

Ryan Skraba commented on FLINK-33186:
-

* 
[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58399=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b=8063]

>  CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished 
> fails on AZP
> -
>
> Key: FLINK-33186
> URL: https://issues.apache.org/jira/browse/FLINK-33186
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.19.0, 1.18.1
>Reporter: Sergey Nuyanzin
>Assignee: Jiang Xin
>Priority: Critical
>  Labels: test-stability
>
> This build 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53509=logs=baf26b34-3c6a-54e8-f93f-cf269b32f802=8c9d126d-57d2-5a9e-a8c8-ff53f7b35cd9=8762
> fails as
> {noformat}
> Sep 28 01:23:43 Caused by: 
> org.apache.flink.runtime.checkpoint.CheckpointException: Task local 
> checkpoint failure.
> Sep 28 01:23:43   at 
> org.apache.flink.runtime.checkpoint.PendingCheckpoint.abort(PendingCheckpoint.java:550)
> Sep 28 01:23:43   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2248)
> Sep 28 01:23:43   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2235)
> Sep 28 01:23:43   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.lambda$null$9(CheckpointCoordinator.java:817)
> Sep 28 01:23:43   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> Sep 28 01:23:43   at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> Sep 28 01:23:43   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> Sep 28 01:23:43   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> Sep 28 01:23:43   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> Sep 28 01:23:43   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> Sep 28 01:23:43   at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34720) Deploy Maven Snapshot failed on AZP

2024-03-19 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828330#comment-17828330
 ] 

Ryan Skraba commented on FLINK-34720:
-

* 
[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58398=logs=eca6b3a6-1600-56cc-916a-c549b3cde3ff=7b3c1df5-9194-5183-5ebd-5567f52d5f8f=34]
 (before the revert)

> Deploy Maven Snapshot failed on AZP
> ---
>
> Key: FLINK-34720
> URL: https://issues.apache.org/jira/browse/FLINK-34720
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / CI
>Affects Versions: 1.20.0
>Reporter: Ryan Skraba
>Priority: Blocker
>  Labels: pull-request-available, test-stability
>
>  
> There isn't any obvious reason that {{mvn: command not found}} could have 
> occurred, but we saw it three times this weekend.
>  * 
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58352=logs=eca6b3a6-1600-56cc-916a-c549b3cde3ff=7b3c1df5-9194-5183-5ebd-5567f52d5f8f]
>   
>  * 
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58359=logs=eca6b3a6-1600-56cc-916a-c549b3cde3ff=7b3c1df5-9194-5183-5ebd-5567f52d5f8f=36]
>  * 
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58366=logs=eca6b3a6-1600-56cc-916a-c549b3cde3ff=7b3c1df5-9194-5183-5ebd-5567f52d5f8f=36]
>  
> {code:java}
> + [[ tools != \t\o\o\l\s ]]
> + cd ..
> + echo 'Deploying to repository.apache.org'
> + COMMON_OPTIONS='-Prelease,docs-and-source -DskipTests 
> -DretryFailedDeploymentCount=10 -Dmaven.repo.local=/__w/1/.m2/repository 
> -Dmaven.wagon.http.pool=false -Dorg.slf4j.simpleLogger.showDateTime=true 
> -Dorg.slf4j.simpleLogger.dateTimeFormat=HH:mm:ss.SSS 
> -Dorg.slf4j.simpleLogger.log.org.apache.maven.cli.transfer.Slf4jMavenTransferListener=warn
>  --no-snapshot-updates -B   -Dgpg.skip -Drat.skip -Dcheckstyle.skip 
> --settings /__w/1/s/tools/deploy-settings.xml'
> + mvn clean deploy -Prelease,docs-and-source -DskipTests 
> -DretryFailedDeploymentCount=10 -Dmaven.repo.local=/__w/1/.m2/repository 
> -Dmaven.wagon.http.pool=false -Dorg.slf4j.simpleLogger.showDateTime=true 
> -Dorg.slf4j.simpleLogger.dateTimeFormat=HH:mm:ss.SSS 
> -Dorg.slf4j.simpleLogger.log.org.apache.maven.cli.transfer.Slf4jMavenTransferListener=warn
>  --no-snapshot-updates -B -Dgpg.skip -Drat.skip -Dcheckstyle.skip --settings 
> /__w/1/s/tools/deploy-settings.xml
> Deploying to repository.apache.org
> ./releasing/deploy_staging_jars.sh: line 46: mvn: command not found
> ##[error]Bash exited with code '127'.
> Finishing: Deploy maven snapshot
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-30535) Introduce TTL state based benchmarks

2024-03-19 Thread Zakelly Lan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828329#comment-17828329
 ] 

Zakelly Lan commented on FLINK-30535:
-

[~rovboyko] Actually this has been resolved. Closing...

> Introduce TTL state based benchmarks
> 
>
> Key: FLINK-30535
> URL: https://issues.apache.org/jira/browse/FLINK-30535
> Project: Flink
>  Issue Type: New Feature
>  Components: Benchmarks
>Reporter: Yun Tang
>Assignee: Zakelly Lan
>Priority: Major
>  Labels: pull-request-available
>
> This ticket is inspired by https://issues.apache.org/jira/browse/FLINK-30088 
> which wants to optimize the TTL state performance. I think it would be useful 
> to introduce state benchmarks based on TTL as Flink has some overhead to 
> support TTL.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-30535) Introduce TTL state based benchmarks

2024-03-19 Thread Zakelly Lan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zakelly Lan resolved FLINK-30535.
-
Resolution: Fixed

> Introduce TTL state based benchmarks
> 
>
> Key: FLINK-30535
> URL: https://issues.apache.org/jira/browse/FLINK-30535
> Project: Flink
>  Issue Type: New Feature
>  Components: Benchmarks
>Reporter: Yun Tang
>Assignee: Zakelly Lan
>Priority: Major
>  Labels: pull-request-available
>
> This ticket is inspired by https://issues.apache.org/jira/browse/FLINK-30088 
> which wants to optimize the TTL state performance. I think it would be useful 
> to introduce state benchmarks based on TTL as Flink has some overhead to 
> support TTL.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-34732][cdc][ci] Add document dead link check for Flink CDC Documentation [flink-cdc]

2024-03-19 Thread via GitHub


leonardBang commented on PR #3168:
URL: https://github.com/apache/flink-cdc/pull/3168#issuecomment-2007184430

   > @leonardBang @PatrickRen First, I added this CI and quickly fixed some dead 
links. Some dead links may not be fixable currently.
   > 
   > The link like under: [✖] 
https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-mysql/3.0.0/flink-cdc-pipeline-connector-mysql-3.0.0.jar
 [✖] 
https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-starrocks/3.0.0/flink-cdc-pipeline-connector-starrocks-3.0.0.jar
   
   Could we use the `/com/ververica` path to make the URL work?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-34743) Memory tuning takes effect even if the parallelism isn't changed

2024-03-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-34743:
---
Labels: pull-request-available  (was: )

> Memory tuning takes effect even if the parallelism isn't changed
> 
>
> Key: FLINK-34743
> URL: https://issues.apache.org/jira/browse/FLINK-34743
> Project: Flink
>  Issue Type: Improvement
>  Components: Autoscaler
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: kubernetes-operator-1.9.0
>
>
> Currently, the memory tuning related logic is only called when the 
> parallelism is changed.
> See ScalingExecutor#scaleResource to get more details.
> It's better to let the memory tuning take effect even if the parallelism 
> isn't changed. For example, a Flink job may run with the desired parallelism, 
> but still waste memory.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [FLINK-34743][autoscaler] Memory tuning takes effect even if the parallelism isn't changed [flink-kubernetes-operator]

2024-03-19 Thread via GitHub


1996fanrui opened a new pull request, #799:
URL: https://github.com/apache/flink-kubernetes-operator/pull/799

   ## What is the purpose of the change
   
   Currently, the memory tuning related logic is only called when the 
parallelism is changed.
   
   See ScalingExecutor#scaleResource to get more details.
   
   It's better to let the memory tuning take effect even if the parallelism 
isn't changed. For example, a Flink job may run with the desired parallelism, 
but still waste memory.
   
   
   ## Brief change log
   
   [FLINK-34743][autoscaler] Memory tuning takes effect even if the parallelism 
isn't changed
   
   - Do scaling when parallelism or memory configuration is changed.
   
   ## Verifying this change
   
   Manual testing is done.
   
   The unit tests are being written.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changes to the `CustomResourceDescriptors`: 
no
 - Core observer or reconciler logic that is regularly executed:  no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-34743) Memory tuning takes effect even if the parallelism isn't changed

2024-03-19 Thread Rui Fan (Jira)
Rui Fan created FLINK-34743:
---

 Summary: Memory tuning takes effect even if the parallelism isn't 
changed
 Key: FLINK-34743
 URL: https://issues.apache.org/jira/browse/FLINK-34743
 Project: Flink
  Issue Type: Improvement
  Components: Autoscaler
Reporter: Rui Fan
Assignee: Rui Fan
 Fix For: kubernetes-operator-1.9.0


Currently, the memory tuning related logic is only called when the parallelism 
is changed.

See ScalingExecutor#scaleResource to get more details.

It's better to let the memory tuning take effect even if the parallelism isn't 
changed. For example, a Flink job may run with the desired parallelism, but 
still waste memory.





--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [BP-3.0][FLINK-34677][cdc][docs] Refactor the structure of Flink CDC website documentation [flink-cdc]

2024-03-19 Thread via GitHub


leonardBang merged PR #3171:
URL: https://github.com/apache/flink-cdc/pull/3171


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-34643) JobIDLoggingITCase failed

2024-03-19 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828316#comment-17828316
 ] 

Ryan Skraba commented on FLINK-34643:
-

Weird – I collected a lot of build logs yesterday from over the weekend that 
resemble this error, but apparently my comment didn't get added :/  I'll go 
back and find those links.

In the meantime, [~roman]: we are still seeing failures in the same test that 
seem very related to this issue.  Is it possible that this fix is incomplete 
and should be reopened, or would you prefer that I raise a new JIRA?
 * 
[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58398=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=ea7cf968-e585-52cb-e0fc-f48de023a7ca=8249]

{code:java}
Mar 19 01:23:06 [not all expected events logged by 
org.apache.flink.runtime.jobmaster.JobMaster, logged:
Mar 19 01:23:06 [Logger=org.apache.flink.runtime.jobmaster.JobMaster Level=INFO 
Message=Initializing job 'Flink Streaming Job' 
(2ef7e557551a93ef716b6c3ba580bcd6)., 
Logger=org.apache.flink.runtime.jobmaster.JobMaster Level=INFO Message=Using 
restart back off time strategy NoRestartBackoffTimeStrategy for Flink Streaming 
Job (2ef7e557551a93ef716b6c3ba580bcd6)., 
Logger=org.apache.flink.runtime.jobmaster.JobMaster Level=INFO Message=Starting 
execution of job 'Flink Streaming Job' (2ef7e557551a93ef716b6c3ba580bcd6) under 
job master id 90514ce7689864236ebeb94380dc474d., 
Logger=org.apache.flink.runtime.jobmaster.JobMaster Level=DEBUG Message=Trigger 
heartbeat request., Logger=org.apache.flink.runtime.jobmaster.JobMaster 
Level=INFO Message=Connecting to ResourceManager 
pekko://flink/user/rpc/resourcemanager_1(8eee414f9dea640cb3668826c12e4976), 
Logger=org.apache.flink.runtime.jobmaster.JobMaster Level=INFO Message=Resolved 
ResourceManager address, beginning registration, 
Logger=org.apache.flink.runtime.jobmaster.JobMaster Level=DEBUG 
Message=Registration at ResourceManager attempt 1 (timeout=100ms), 
Logger=org.apache.flink.runtime.jobmaster.JobMaster Level=DEBUG 
Message=Registration with ResourceManager at 
pekko://flink/user/rpc/resourcemanager_1 was successful., 
Logger=org.apache.flink.runtime.jobmaster.JobMaster Level=INFO 
Message=JobManager successfully registered at ResourceManager, leader id: 
8eee414f9dea640cb3668826c12e4976., 
Logger=org.apache.flink.runtime.jobmaster.JobMaster Level=INFO Message=Stopping 
the JobMaster for job 'Flink Streaming Job' 
(2ef7e557551a93ef716b6c3ba580bcd6)., 
Logger=org.apache.flink.runtime.jobmaster.JobMaster Level=INFO 
Message=Disconnect TaskExecutor 23ae1952-8d6f-476e-b23b-4fad48feec15 because: 
Stopping JobMaster for job 'Flink Streaming Job' 
(2ef7e557551a93ef716b6c3ba580bcd6)., 
Logger=org.apache.flink.runtime.jobmaster.JobMaster Level=DEBUG Message=Close 
ResourceManager connection 58e840ebb5c16d7fb17f233b9e93cb3c.]] 
Mar 19 01:23:06 Expecting empty but was: [Checkpoint storage is set to .*,
Mar 19 01:23:06 Running initialization on master for job .*,
Mar 19 01:23:06 Starting scheduling.*,
Mar 19 01:23:06 State backend is set to .*,
Mar 19 01:23:06 Successfully created execution graph from job graph .*,
Mar 19 01:23:06 Successfully ran initialization on master.*,
Mar 19 01:23:06 Triggering a manual checkpoint for job .*.,
Mar 19 01:23:06 Using failover strategy .*]
Mar 19 01:23:06 at 
org.apache.flink.test.misc.JobIDLoggingITCase.assertJobIDPresent(JobIDLoggingITCase.java:241)
Mar 19 01:23:06 at 
org.apache.flink.test.misc.JobIDLoggingITCase.testJobIDLogging(JobIDLoggingITCase.java:170)
Mar 19 01:23:06 at java.lang.reflect.Method.invoke(Method.java:498)
Mar 19 01:23:06 at 
java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
Mar 19 01:23:06 at 
java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
Mar 19 01:23:06 at 
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
Mar 19 01:23:06 at 
java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
Mar 19 01:23:06 at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
{code}
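
For context, a hedged sketch of the style of check behind this failure 
(invented names; the real test uses Flink's own log-capture utilities): each 
expected regex must match at least one recorded message, and the assertion 
reports whatever patterns remain unmatched.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only -- not JobIDLoggingITCase's actual code.
final class LogPatternCheck {
    static List<Pattern> unmatched(List<String> loggedMessages, List<Pattern> expected) {
        List<Pattern> remaining = new ArrayList<>(expected);
        // Drop every pattern that matches at least one recorded log message;
        // the "Expecting empty but was" failure above corresponds to this
        // returned list being non-empty.
        remaining.removeIf(
                pattern -> loggedMessages.stream().anyMatch(msg -> pattern.matcher(msg).find()));
        return remaining;
    }
}
{code}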

> JobIDLoggingITCase failed
> -
>
> Key: FLINK-34643
> URL: https://issues.apache.org/jira/browse/FLINK-34643
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.20.0
>Reporter: Matthias Pohl
>Assignee: Roman Khachatryan
>Priority: Major
>  Labels: pull-request-available, test-stability
> Fix For: 1.20.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58187=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=ea7cf968-e585-52cb-e0fc-f48de023a7ca=7897
> {code}
> Mar 09 01:24:23 01:24:23.498 [ERROR] Tests run: 1, Failures: 0, Errors: 1, 
> 

[jira] [Commented] (FLINK-30535) Introduce TTL state based benchmarks

2024-03-19 Thread Roman Boyko (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828309#comment-17828309
 ] 

Roman Boyko commented on FLINK-30535:
-

[~Zakelly], thanks for this functionality!

One question - why is it still in OPEN status?

> Introduce TTL state based benchmarks
> 
>
> Key: FLINK-30535
> URL: https://issues.apache.org/jira/browse/FLINK-30535
> Project: Flink
>  Issue Type: New Feature
>  Components: Benchmarks
>Reporter: Yun Tang
>Assignee: Zakelly Lan
>Priority: Major
>  Labels: pull-request-available
>
> This ticket is inspired by https://issues.apache.org/jira/browse/FLINK-30088 
> which wants to optimize the TTL state performance. I think it would be useful 
> to introduce state benchmarks based on TTL as Flink has some overhead to 
> support TTL.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-31664] Implement ARRAY_INTERSECT function [flink]

2024-03-19 Thread via GitHub


liuyongvs commented on PR #24526:
URL: https://github.com/apache/flink/pull/24526#issuecomment-2007045444

   hi @dawidwys will you help review this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-28693][table] Fix janino compile failed because the code generated refers the class in table-planner [flink]

2024-03-19 Thread via GitHub


davidradl commented on PR #24280:
URL: https://github.com/apache/flink/pull/24280#issuecomment-2006990240

   This is a PR we would like merged. It looks like @snuyanzin asked for tests 
to be added. @xuyangzhong are you looking at adding the tests? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34562][docs-zh] Port Debezium Avro Confluent changes (FLINK-34509) to Chinese [flink]

2024-03-19 Thread via GitHub


Vincent-Woo commented on PR #24508:
URL: https://github.com/apache/flink/pull/24508#issuecomment-2006981943

   @wuchong @xccui @klion26  Excuse me, do you have time to take a look at this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-25537] [JUnit5 Migration] Module: flink-core [flink]

2024-03-19 Thread via GitHub


GOODBOY008 commented on PR #24523:
URL: https://github.com/apache/flink/pull/24523#issuecomment-2006975240

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34732][cdc][ci] Add document dead link check for Flink CDC Documentation [flink-cdc]

2024-03-19 Thread via GitHub


GOODBOY008 commented on PR #3168:
URL: https://github.com/apache/flink-cdc/pull/3168#issuecomment-2006967612

   @leonardBang @PatrickRen First, I added this CI and quickly fixed some dead 
links. Some dead links may not be fixable currently.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-26088][Connectors/ElasticSearch] Add Elasticsearch 8.0 support [flink-connector-elasticsearch]

2024-03-19 Thread via GitHub


drorventura commented on PR #53:
URL: 
https://github.com/apache/flink-connector-elasticsearch/pull/53#issuecomment-2006807958

   Thank you @mtfelisb 
   @reswqa could you please approve?  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Comment Edited] (FLINK-34726) Flink Kubernetes Operator has some room for optimizing performance.

2024-03-19 Thread Fei Feng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828262#comment-17828262
 ] 

Fei Feng edited comment on FLINK-34726 at 3/19/24 10:40 AM:


I think these two things (the rest cluster client and the sessionjob's secondary 
resource) should be kept in the xxxContext, so we do not need to create them 
again and again, and we can avoid unnecessary runtime overhead and GC 
pressure.

The difficulty lies in how to promptly update them when changes occur. For 
example, the rest cluster client should be recreated or updated if the 
jobmanager rest address changes, and if the FlinkDeployment object changes, the 
sessionjob's SecondaryResource should be updated too.
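
A minimal sketch of that caching idea (all names invented for illustration, 
not the operator's actual code): one client per JobManager REST address, 
invalidated when the address changes so a stale client is never reused.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical holder: one REST client per JobManager address, recreated
// only when the address changes.
final class RestClientCache<C extends AutoCloseable> {

    private final Map<String, C> clientsByAddress = new ConcurrentHashMap<>();
    private final Function<String, C> clientFactory;

    RestClientCache(Function<String, C> clientFactory) {
        this.clientFactory = clientFactory;
    }

    /** Reuses the cached client for this REST address, creating it at most once. */
    C getOrCreate(String restAddress) {
        return clientsByAddress.computeIfAbsent(restAddress, clientFactory);
    }

    /** Called when the JobManager REST address changed: close and forget the old client. */
    void invalidate(String restAddress) throws Exception {
        C stale = clientsByAddress.remove(restAddress);
        if (stale != null) {
            stale.close();
        }
    }
}
{code}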
 


was (Author: fei feng):
I think these two things (the rest cluster client and the sessionjob's secondary 
resource) should be kept in the xxxContext, so we do not need to create them 
again and again, and we can avoid unnecessary runtime overhead and GC 
pressure.

The difficulty lies in how to promptly update them when changes occur. For 
example, the rest cluster client should be recreated or updated if the 
jobmanager rest address changes, and if the FlinkDeployment object changes, the 
sessionjob's SecondaryResource should be updated.
 

> Flink Kubernetes Operator has some room for optimizing performance.
> ---
>
> Key: FLINK-34726
> URL: https://issues.apache.org/jira/browse/FLINK-34726
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.5.0, kubernetes-operator-1.6.0, 
> kubernetes-operator-1.7.0
>Reporter: Fei Feng
>Priority: Major
> Attachments: operator_no_submit_no_kill.flamegraph.html
>
>
> When there is a huge number of FlinkDeployment and FlinkSessionJob in a 
> kubernetes cluster, there will be a significant delay between when an event is 
> submitted into the reconcile thread pool and when the event is processed. 
> This is our test: we gave the operator enough resources (cpu: 10 cores, memory: 
> 20g, reconcile thread pool size 200) and we first deployed 1 jobs (one 
> FlinkDeployment and one SessionJob per job), then ran submit/delete job tests. 
> We found that:
> 1. It took about 2 min between creating the new FlinkDeployment and 
> FlinkSessionJob CRs in k8s and the flink job being submitted to the jobmanager.
> 2. It took about 1 min between deleting a FlinkDeployment and FlinkSessionJob CR 
> and the flink job and session cluster being cleared.
>  
> I used async-profiler to get a flamegraph when there is a huge number of 
> FlinkDeployment and FlinkSessionJob resources. I found two obvious areas for 
> optimization:
> 1. For FlinkDeployment: in the observe step, we call 
> AbstractFlinkService.getClusterInfo/listJobs/getTaskManagerInfo; every time we 
> call these methods we need to create a RestClusterClient, send requests, and 
> close it. I think we should reuse the RestClusterClient as much as possible to 
> avoid frequently creating objects and reduce GC pressure.
> 2. For FlinkSessionJob (this issue is more obvious): in the whole reconcile 
> loop, we call getSecondaryResource 5 times to get the FlinkDeployment resource 
> info. Based on my current understanding of the Flink Operator, I think we do 
> not need to call it 5 times in a single reconcile loop; calling it once is 
> enough. If yes, we could save 30% cpu usage (every getSecondaryResource call 
> costs 6% cpu usage).
> [^operator_no_submit_no_kill.flamegraph.html]
> I hope we can discuss solutions to address this problem together. I'm very 
> willing to optimize and resolve this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-34726) Flink Kubernetes Operator has some room for optimizing performance.

2024-03-19 Thread Fei Feng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828262#comment-17828262
 ] 

Fei Feng edited comment on FLINK-34726 at 3/19/24 10:39 AM:


I think these two things (the rest cluster client and the sessionjob's secondary 
resource) should be kept in the xxxContext, so we do not need to create them 
again and again, and we can avoid unnecessary runtime overhead and GC 
pressure.

The difficulty lies in how to promptly update them when changes occur. For 
example, the rest cluster client should be recreated or updated if the 
jobmanager rest address changes, and if the FlinkDeployment object changes, the 
sessionjob's SecondaryResource should be updated.
 


was (Author: fei feng):
I think these two things (the rest cluster client and the flink deployment 
resource info) should be kept in the Context, to avoid unnecessary runtime 
overhead and GC pressure. The difficulty lies in how to promptly update them 
when changes occur. For example, the rest cluster client should be recreated or 
updated if the jobmanager rest address changes, and if the FlinkDeployment 
object changes, the sessionjob's SecondaryResource should be updated.
 

> Flink Kubernetes Operator has some room for optimizing performance.
> ---
>
> Key: FLINK-34726
> URL: https://issues.apache.org/jira/browse/FLINK-34726
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.5.0, kubernetes-operator-1.6.0, 
> kubernetes-operator-1.7.0
>Reporter: Fei Feng
>Priority: Major
> Attachments: operator_no_submit_no_kill.flamegraph.html
>
>
> When there is a huge number of FlinkDeployment and FlinkSessionJob in a 
> kubernetes cluster, there will be a significant delay between when an event is 
> submitted into the reconcile thread pool and when the event is processed. 
> This is our test: we gave the operator enough resources (cpu: 10 cores, memory: 
> 20g, reconcile thread pool size 200) and we first deployed 1 jobs (one 
> FlinkDeployment and one SessionJob per job), then ran submit/delete job tests. 
> We found that:
> 1. It took about 2 min between creating the new FlinkDeployment and 
> FlinkSessionJob CRs in k8s and the flink job being submitted to the jobmanager.
> 2. It took about 1 min between deleting a FlinkDeployment and FlinkSessionJob CR 
> and the flink job and session cluster being cleared.
>  
> I used async-profiler to get a flamegraph when there is a huge number of 
> FlinkDeployment and FlinkSessionJob resources. I found two obvious areas for 
> optimization:
> 1. For FlinkDeployment: in the observe step, we call 
> AbstractFlinkService.getClusterInfo/listJobs/getTaskManagerInfo; every time we 
> call these methods we need to create a RestClusterClient, send requests, and 
> close it. I think we should reuse the RestClusterClient as much as possible to 
> avoid frequently creating objects and reduce GC pressure.
> 2. For FlinkSessionJob (this issue is more obvious): in the whole reconcile 
> loop, we call getSecondaryResource 5 times to get the FlinkDeployment resource 
> info. Based on my current understanding of the Flink Operator, I think we do 
> not need to call it 5 times in a single reconcile loop; calling it once is 
> enough. If yes, we could save 30% cpu usage (every getSecondaryResource call 
> costs 6% cpu usage).
> [^operator_no_submit_no_kill.flamegraph.html]
> I hope we can discuss solutions to address this problem together. I'm very 
> willing to optimize and resolve this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34728) operator does not need to upload and download the jar when deploying session job

2024-03-19 Thread Fei Feng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828263#comment-17828263
 ] 

Fei Feng commented on FLINK-34728:
--

Yes, this is not an issue on the Flink kubernetes operator side; the operator 
is just a client, and the server side needs to provide a new API in order to 
complete this lightweight submission rework.
 

> operator does not need to upload and download the jar when deploying session 
> job
> 
>
> Key: FLINK-34728
> URL: https://issues.apache.org/jira/browse/FLINK-34728
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes, Kubernetes Operator
>Affects Versions: kubernetes-operator-1.5.0, kubernetes-operator-1.6.0, 
> kubernetes-operator-1.7.0
>Reporter: Fei Feng
>Priority: Major
> Attachments: image-2024-03-19-15-59-20-933.png
>
>
> Problem:
> By reading the source code of the sessionjob's first reconciliation in the 
> session mode of the flink kubernetes operator, a clear single point of 
> bottleneck can be identified. When submitting a session job, the operator 
> needs to first download the job jar from the jarURL to the kubernetes pod's 
> local storage, then upload the jar to the job manager through the 
> `/jars/upload` rest api, and finally call the `/jars/:jarid/run` rest api to 
> launch the job.
> In this process, the operator needs to first download the jar and then upload 
> the jar. When multiple jobs are submitted to the session cluster 
> simultaneously, the operator can become a single point of bottleneck, which 
> may be limited by the network traffic or other resource constraints of the 
> operator pod.
>  
> [https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/service/AbstractFlinkService.java#L824]
> !image-2024-03-19-15-59-20-933.png|width=548,height=432!
>  
> Solution:
> We can modify the job submission process in the session mode. The jobmanager 
> can provide a `/jars/run` rest api that supports self-downloading the job 
> jar, and the operator only needs to send a rest request to submit the job, 
> without downloading and uploading the job jar. In this way, the submission pressure 
> of the operator can be distributed to each job manager. 
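
For contrast, the current two-step flow through the documented REST endpoints 
looks roughly like this (host, port, and jar id are placeholders):

{code}
# 1) upload the jar from the operator pod to the JobManager
curl -X POST -H "Expect:" -F "jarfile=@/path/to/job.jar" http://<jobmanager>:8081/jars/upload
# 2) run the uploaded jar using the id returned by the upload call
curl -X POST "http://<jobmanager>:8081/jars/<jar-id>/run"
{code}

The proposal would collapse this into a single request that passes the jar URL 
directly to the jobmanager.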



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34726) Flink Kubernetes Operator has some room for optimizing performance.

2024-03-19 Thread Fei Feng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828262#comment-17828262
 ] 

Fei Feng commented on FLINK-34726:
--

I think these two things (the rest cluster client and the flink deployment 
resource info) should be kept in the Context, to avoid unnecessary runtime 
overhead and GC pressure. The difficulty lies in how to promptly update them 
when changes occur. For example, the rest cluster client should be recreated or 
updated if the jobmanager rest address changes, and if the FlinkDeployment 
object changes, the sessionjob's SecondaryResource should be updated.
 

> Flink Kubernetes Operator has some room for optimizing performance.
> ---
>
> Key: FLINK-34726
> URL: https://issues.apache.org/jira/browse/FLINK-34726
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.5.0, kubernetes-operator-1.6.0, 
> kubernetes-operator-1.7.0
>Reporter: Fei Feng
>Priority: Major
> Attachments: operator_no_submit_no_kill.flamegraph.html
>
>
> When there is a huge number of FlinkDeployment and FlinkSessionJob in a 
> kubernetes cluster, there will be a significant delay between when an event is 
> submitted into the reconcile thread pool and when the event is processed. 
> This is our test: we gave the operator enough resources (cpu: 10 cores, memory: 
> 20g, reconcile thread pool size 200) and we first deployed 1 jobs (one 
> FlinkDeployment and one SessionJob per job), then ran submit/delete job tests. 
> We found that:
> 1. It took about 2 min between creating the new FlinkDeployment and 
> FlinkSessionJob CRs in k8s and the flink job being submitted to the jobmanager.
> 2. It took about 1 min between deleting a FlinkDeployment and FlinkSessionJob CR 
> and the flink job and session cluster being cleared.
>  
> I used async-profiler to get a flamegraph when there is a huge number of 
> FlinkDeployment and FlinkSessionJob resources. I found two obvious areas for 
> optimization:
> 1. For FlinkDeployment: in the observe step, we call 
> AbstractFlinkService.getClusterInfo/listJobs/getTaskManagerInfo; every time we 
> call these methods we need to create a RestClusterClient, send requests, and 
> close it. I think we should reuse the RestClusterClient as much as possible to 
> avoid frequently creating objects and reduce GC pressure.
> 2. For FlinkSessionJob (this issue is more obvious): in the whole reconcile 
> loop, we call getSecondaryResource 5 times to get the FlinkDeployment resource 
> info. Based on my current understanding of the Flink Operator, I think we do 
> not need to call it 5 times in a single reconcile loop; calling it once is 
> enough. If yes, we could save 30% cpu usage (every getSecondaryResource call 
> costs 6% cpu usage).
> [^operator_no_submit_no_kill.flamegraph.html]
> I hope we can discuss solutions to address this problem together. I'm very 
> willing to optimize and resolve this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

