[jira] [Updated] (FLINK-34751) RestClusterClient APIs doesn't work with running Flink application on YARN
[ https://issues.apache.org/jira/browse/FLINK-34751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkata krishnan Sowrirajan updated FLINK-34751: Description: Apache YARN uses a web proxy in the Resource Manager to expose the endpoints available through the AM process (in this case the RestServerEndpoint that runs as part of the AM). Note: this is in the context of running a Flink cluster in YARN application mode. For example, in the case of RestClusterClient#listJobs - {{Standalone listJobs}} makes the request as - {{https://<host>:<port>/v1/jobs/overview}} On YARN, the same request has to be proxied as - {{https://<rm-host>:<rm-port>/proxy/<application-id>/v1/jobs/overview?proxyapproved=true}} was: Apache YARN uses a web proxy in the Resource Manager to expose the endpoints available through the AM process (in this case the RestServerEndpoint that runs as part of the AM). Note: this is in the context of running a Flink cluster in YARN application mode. For example, in the case of RestClusterClient#listJobs - {{Standalone listJobs}} makes the request as - {{https://<host>:<port>/v1/jobs/overview}} On YARN, the same request has to be proxied as - {{https://<rm-host>:<rm-port>/proxy/<application-id>/v1/jobs/overview?proxyapproved=true}} > RestClusterClient APIs doesn't work with running Flink application on YARN > -- > > Key: FLINK-34751 > URL: https://issues.apache.org/jira/browse/FLINK-34751 > Project: Flink > Issue Type: Bug > Reporter: Venkata krishnan Sowrirajan > Priority: Major > > Apache YARN uses a web proxy in the Resource Manager to expose the endpoints > available through the AM process (in this case the RestServerEndpoint that runs as > part of the AM). Note: this is in the context of running a Flink cluster in YARN > application mode.
> For example, in the case of RestClusterClient#listJobs - > {{Standalone listJobs}} makes the request as - > {{https://<host>:<port>/v1/jobs/overview}} > On YARN, the same request has to be proxied as - > {{https://<rm-host>:<rm-port>/proxy/<application-id>/v1/jobs/overview?proxyapproved=true}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
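The URL rewriting the report describes can be sketched as follows. This is a minimal illustration, not Flink's RestClusterClient API; the helper names and the placeholder host/port/application-id values are hypothetical:

```python
def standalone_url(host, port, path, scheme="https"):
    # Direct request against the Flink RestServerEndpoint (standalone mode).
    return f"{scheme}://{host}:{port}{path}"

def yarn_proxied_url(rm_host, rm_port, app_id, path, scheme="https"):
    # On YARN, the same REST path goes through the RM web proxy: the path is
    # prefixed with /proxy/<application-id> and, per the issue description,
    # the proxyapproved flag is appended.
    return f"{scheme}://{rm_host}:{rm_port}/proxy/{app_id}{path}?proxyapproved=true"
```

For example, `yarn_proxied_url("rm", 8088, "application_1710000000000_0001", "/v1/jobs/overview")` yields the proxied form of the `listJobs` request.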
[jira] [Updated] (FLINK-34751) RestClusterClient APIs doesn't work with running Flink application on YARN
[ https://issues.apache.org/jira/browse/FLINK-34751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkata krishnan Sowrirajan updated FLINK-34751: Description: Apache YARN uses a web proxy in the Resource Manager to expose the endpoints available through the AM process (in this case the RestServerEndpoint that runs as part of the AM). Note: this is in the context of running a Flink cluster in YARN application mode. For example, in the case of RestClusterClient#listJobs - {{Standalone listJobs}} makes the request as - {{https://<host>:<port>/v1/jobs/overview}} On YARN, the same request has to be proxied as - {{https://<rm-host>:<rm-port>/proxy/<application-id>/v1/jobs/overview?proxyapproved=true}} was: Apache YARN uses a web proxy in the Resource Manager to expose the endpoints available through the AM process (in this case the RestServerEndpoint that runs as part of the AM). Note: this is in the context of running a Flink cluster in YARN application mode. > RestClusterClient APIs doesn't work with running Flink application on YARN > -- > > Key: FLINK-34751 > URL: https://issues.apache.org/jira/browse/FLINK-34751 > Project: Flink > Issue Type: Bug > Reporter: Venkata krishnan Sowrirajan > Priority: Major > > Apache YARN uses a web proxy in the Resource Manager to expose the endpoints > available through the AM process (in this case the RestServerEndpoint that runs as > part of the AM). Note: this is in the context of running a Flink cluster in YARN > application mode. > > For example, in the case of RestClusterClient#listJobs - > {{Standalone listJobs}} makes the request as - > {{https://<host>:<port>/v1/jobs/overview}} > On YARN, the same request has to be proxied as - > {{https://<rm-host>:<rm-port>/proxy/<application-id>/v1/jobs/overview?proxyapproved=true}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34751) RestClusterClient APIs doesn't work with running Flink application on YARN
Venkata krishnan Sowrirajan created FLINK-34751: --- Summary: RestClusterClient APIs doesn't work with running Flink application on YARN Key: FLINK-34751 URL: https://issues.apache.org/jira/browse/FLINK-34751 Project: Flink Issue Type: Bug Reporter: Venkata krishnan Sowrirajan Apache YARN uses a web proxy in the Resource Manager to expose the endpoints available through the AM process (in this case the RestServerEndpoint that runs as part of the AM). Note: this is in the context of running a Flink cluster in YARN application mode. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (FLINK-34734) Translate directory and page titles of Flink CDC docs to Chinese
[ https://issues.apache.org/jira/browse/FLINK-34734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leonard Xu resolved FLINK-34734. Resolution: Implemented flink-cdc master: 8fdd151a0606f8bb707ce11b969bdb62f22c7182 > Translate directory and page titles of Flink CDC docs to Chinese > > > Key: FLINK-34734 > URL: https://issues.apache.org/jira/browse/FLINK-34734 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Documentation, Flink CDC > Affects Versions: 3.1.0 > Reporter: LvYanquan > Assignee: LvYanquan > Priority: Major > Labels: pull-request-available > Fix For: 3.1.0 > > > The titles are used to build directory and document names. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (FLINK-34734) Translate directory and page titles of Flink CDC docs to Chinese
[ https://issues.apache.org/jira/browse/FLINK-34734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leonard Xu reassigned FLINK-34734: -- Assignee: LvYanquan > Translate directory and page titles of Flink CDC docs to Chinese > > > Key: FLINK-34734 > URL: https://issues.apache.org/jira/browse/FLINK-34734 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Documentation, Flink CDC > Affects Versions: 3.1.0 > Reporter: LvYanquan > Assignee: LvYanquan > Priority: Major > Labels: pull-request-available > Fix For: 3.1.0 > > > The titles are used to build directory and document names. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [FLINK-34734][cdc][doc] update the title of content.zh to Chinese. [flink-cdc]
leonardBang merged PR #3172: URL: https://github.com/apache/flink-cdc/pull/3172 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (FLINK-34750) "legacy-flink-cdc-sources" Page of PostgreSQL for Flink CDC Chinese Documentation.
LvYanquan created FLINK-34750: - Summary: "legacy-flink-cdc-sources" Page of PostgreSQL for Flink CDC Chinese Documentation. Key: FLINK-34750 URL: https://issues.apache.org/jira/browse/FLINK-34750 Project: Flink Issue Type: Sub-task Components: chinese-translation, Documentation, Flink CDC Affects Versions: cdc-3.1.0 Reporter: LvYanquan Fix For: cdc-3.1.0 Translate the legacy-flink-cdc-sources page [https://github.com/apache/flink-cdc/blob/master/docs/content/docs/connectors/legacy-flink-cdc-sources/postgres-cdc.md] into Chinese. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34749) "legacy-flink-cdc-sources" Page of SQLServer for Flink CDC Chinese Documentation.
LvYanquan created FLINK-34749: - Summary: "legacy-flink-cdc-sources" Page of SQLServer for Flink CDC Chinese Documentation. Key: FLINK-34749 URL: https://issues.apache.org/jira/browse/FLINK-34749 Project: Flink Issue Type: Sub-task Components: chinese-translation, Documentation, Flink CDC Affects Versions: cdc-3.1.0 Reporter: LvYanquan Fix For: cdc-3.1.0 Translate the legacy-flink-cdc-sources page [https://github.com/apache/flink-cdc/blob/master/docs/content/docs/connectors/legacy-flink-cdc-sources/sqlserver-cdc.md] into Chinese. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34748) "legacy-flink-cdc-sources" Page of Oracle for Flink CDC Chinese Documentation.
LvYanquan created FLINK-34748: - Summary: "legacy-flink-cdc-sources" Page of Oracle for Flink CDC Chinese Documentation. Key: FLINK-34748 URL: https://issues.apache.org/jira/browse/FLINK-34748 Project: Flink Issue Type: Sub-task Components: chinese-translation, Documentation, Flink CDC Affects Versions: cdc-3.1.0 Reporter: LvYanquan Fix For: cdc-3.1.0 Translate the legacy-flink-cdc-sources page [https://github.com/apache/flink-cdc/blob/master/docs/content/docs/connectors/legacy-flink-cdc-sources/oracle-cdc.md] into Chinese. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34747) "legacy-flink-cdc-sources" Page of DB2 for Flink CDC Chinese Documentation.
LvYanquan created FLINK-34747: - Summary: "legacy-flink-cdc-sources" Page of DB2 for Flink CDC Chinese Documentation. Key: FLINK-34747 URL: https://issues.apache.org/jira/browse/FLINK-34747 Project: Flink Issue Type: Sub-task Components: chinese-translation, Documentation, Flink CDC Affects Versions: cdc-3.1.0 Reporter: LvYanquan Fix For: cdc-3.1.0 Translate the legacy-flink-cdc-sources page [https://github.com/apache/flink-cdc/blob/master/docs/content/docs/connectors/legacy-flink-cdc-sources/db2-cdc.md] into Chinese. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-34740) "legacy-flink-cdc-sources" Pages for Flink CDC Chinese Documentation existed in 2.x version.
[ https://issues.apache.org/jira/browse/FLINK-34740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LvYanquan updated FLINK-34740: -- Description: Translate legacy-flink-cdc-sources pages of [https://github.com/apache/flink-cdc/tree/master/docs/content/docs/connectors/legacy-flink-cdc-sources] into Chinese. This includes the legacy MySQL/MongoDB/OceanBase sources. was: Translate legacy-flink-cdc-sources pages of [https://github.com/apache/flink-cdc/tree/master/docs/content/docs/connectors/legacy-flink-cdc-sources] into Chinese. > "legacy-flink-cdc-sources" Pages for Flink CDC Chinese Documentation existed > in 2.x version. > > > Key: FLINK-34740 > URL: https://issues.apache.org/jira/browse/FLINK-34740 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Documentation, Flink CDC > Affects Versions: cdc-3.1.0 > Reporter: LvYanquan > Priority: Minor > Fix For: cdc-3.1.0 > > > Translate legacy-flink-cdc-sources pages of > [https://github.com/apache/flink-cdc/tree/master/docs/content/docs/connectors/legacy-flink-cdc-sources] > into Chinese. > This includes the legacy MySQL/MongoDB/OceanBase sources. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34702) Rank should not convert to StreamExecDuplicate when the input is not insert only
[ https://issues.apache.org/jira/browse/FLINK-34702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828590#comment-17828590 ] Jacky Lau commented on FLINK-34702: --- We have three proposed solutions: {*}Solution 1{*}: Add a switch parameter, enabled by default. When a compilation error occurs, the switch is turned off based on the error message. However, this is noticeable to the user. {*}Solution 2{*}: Make the deduplication operator support changelog messages. Currently, the deduplication operator is divided into eight combinations: processing time / event time, first row / last row, mini-batch / non-mini-batch. Supporting changelog would effectively double these combinations to sixteen, significantly complicating the code logic. Furthermore, taking processing time's first row as an example, if changelog were supported, it would require recording all historical data, which could severely degrade performance. Additionally, the deduplication operator is essentially a special case of the Rank operator, but simply adapting the logic to use Rank isn't feasible either, as the Rank operator currently does not support mini-batch. This change would be imperceptible to users. {*}Solution 3{*}: An improvement over Solution 1: since the logical-to-physical translation stage cannot detect whether the input contains changelog messages, the transformation could be done during the physical rewrite phase, which comes after the FlinkChangelogModeInferenceProgram.
After discussion with [~lincoln.86xy] offline, we agree with *Solution 3* > Rank should not convert to StreamExecDuplicate when the input is not insert > only > > > Key: FLINK-34702 > URL: https://issues.apache.org/jira/browse/FLINK-34702 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.20.0 >Reporter: Jacky Lau >Priority: Major > Fix For: 1.20.0 > > > {code:java} > @Test > def testSimpleFirstRowOnBuiltinProctime1(): Unit = { > val sqlQuery = > """ > |SELECT * > |FROM ( > | SELECT *, > |ROW_NUMBER() OVER (PARTITION BY a ORDER BY PROCTIME() ASC) as > rowNum > | FROM (select a, count(b) as b from MyTable group by a) > |) > |WHERE rowNum = 1 > """.stripMargin > util.verifyExecPlan(sqlQuery) > } {code} > Exception: > org.apache.flink.table.api.TableException: StreamPhysicalDeduplicate doesn't > support consuming update changes which is produced by node > GroupAggregate(groupBy=[a], select=[a, COUNT(b) AS b]) > because the StreamPhysicalDeduplicate can not consuming update changes now > while StreamExecRank can. > so we should not convert the FlinkLogicalRank to StreamPhysicalDeduplicate in > this case. and we can defer whether input contains update change in the > "optimize the physical plan" phase. > so we can add an option to solve it. and when the StreamPhysicalDeduplicate > can support consuming update changes , we can deprecate it -- This message was sent by Atlassian Jira (v8.20.10#820010)
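Solution 3's core decision can be sketched as follows. This is illustrative Python only, not actual Flink planner code; the function name is hypothetical and the node names stand in for the planner classes discussed above:

```python
def pick_physical_node(input_insert_only: bool) -> str:
    # Illustrative only: during the physical rewrite phase (after
    # FlinkChangelogModeInferenceProgram has inferred changelog modes),
    # fall back to Rank when the input is not insert-only, because the
    # deduplication operator cannot consume update changes.
    if input_insert_only:
        return "StreamPhysicalDeduplicate"
    return "StreamPhysicalRank"
```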
[jira] [Updated] (FLINK-34740) "legacy-flink-cdc-sources" Pages for Flink CDC Chinese Documentation existed in 2.x version.
[ https://issues.apache.org/jira/browse/FLINK-34740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LvYanquan updated FLINK-34740: -- Summary: "legacy-flink-cdc-sources" Pages for Flink CDC Chinese Documentation existed in 2.x version. (was: "legacy-flink-cdc-sources" Pages for Flink CDC Chinese Documentation) > "legacy-flink-cdc-sources" Pages for Flink CDC Chinese Documentation existed > in 2.x version. > > > Key: FLINK-34740 > URL: https://issues.apache.org/jira/browse/FLINK-34740 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Documentation, Flink CDC >Affects Versions: cdc-3.1.0 >Reporter: LvYanquan >Priority: Minor > Fix For: cdc-3.1.0 > > > Translate legacy-flink-cdc-sources pages of > [https://github.com/apache/flink-cdc/tree/master/docs/content/docs/connectors/legacy-flink-cdc-sources] > into Chinese. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [FLINK-34743][autoscaler] Memory tuning takes effect even if the parallelism isn't changed [flink-kubernetes-operator]
1996fanrui commented on PR #799: URL: https://github.com/apache/flink-kubernetes-operator/pull/799#issuecomment-2008613406 Thanks @mxm for the quick review and valuable suggestion! I originally planned to add a memory scaling threshold in a subsequent PR (it's better to be done in 1.9.0). If you think it's needed in this PR, I can do it now. How about reusing the `AutoScalerOptions#TARGET_UTILIZATION_BOUNDARY` option? It's 0.3 by default, which means we would ignore this tuning when the total memory saving is less than 30%. I prefer reusing the existing option because it lowers the barrier for users. WDYT? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
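The gating proposed in the comment above could look like this sketch. The helper name is hypothetical and only assumes the semantics described in the comment (skip tuning when the relative saving is below the reused boundary, 0.3 by default):

```python
def should_apply_memory_tuning(current_total_mb: float, tuned_total_mb: float,
                               boundary: float = 0.3) -> bool:
    # Hypothetical sketch: apply the tuning only when the relative memory
    # saving reaches the reused TARGET_UTILIZATION_BOUNDARY (0.3 by default),
    # i.e. at least 30% of the current total memory would be saved.
    saving = (current_total_mb - tuned_total_mb) / current_total_mb
    return saving >= boundary
```

For example, tuning from 1000 MB down to 800 MB (a 20% saving) would be ignored, while tuning down to 600 MB (a 40% saving) would be applied.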
Re: [PR] [FLINK-34734][cdc][doc] update the title of content.zh to Chinese. [flink-cdc]
lvyanquan commented on PR #3172: URL: https://github.com/apache/flink-cdc/pull/3172#issuecomment-2008603814 @leonardBang @PatrickRen PTAL. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-34734) Translate directory and page titles of Flink CDC docs to Chinese
[ https://issues.apache.org/jira/browse/FLINK-34734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-34734: --- Labels: pull-request-available (was: ) > Translate directory and page titles of Flink CDC docs to Chinese > > > Key: FLINK-34734 > URL: https://issues.apache.org/jira/browse/FLINK-34734 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Documentation, Flink CDC >Affects Versions: 3.1.0 >Reporter: LvYanquan >Priority: Major > Labels: pull-request-available > Fix For: 3.1.0 > > > The titles is used to build directory and document names. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] [FLINK-34734][cdc][doc] update the title of content.zh to Chinese. [flink-cdc]
lvyanquan opened a new pull request, #3172: URL: https://github.com/apache/flink-cdc/pull/3172 Display page: https://github.com/apache/flink-cdc/assets/38547014/a005559c-7c25-4be5-9a90-bc4a8da13779 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-34667) Changelog state backend support local rescaling
[ https://issues.apache.org/jira/browse/FLINK-34667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828573#comment-17828573 ] Yanfei Lei commented on FLINK-34667: Merged via 501de48 > Changelog state backend support local rescaling > --- > > Key: FLINK-34667 > URL: https://issues.apache.org/jira/browse/FLINK-34667 > Project: Flink > Issue Type: Bug > Components: Runtime / State Backends > Affects Versions: 1.20.0 > Reporter: Yanfei Lei > Assignee: Yanfei Lei > Priority: Major > Labels: pull-request-available > > FLINK-33341 uses the available local keyed state for rescaling; this will > cause changelog state to incorrectly treat part of the local state as the > complete local state. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (FLINK-34667) Changelog state backend support local rescaling
[ https://issues.apache.org/jira/browse/FLINK-34667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanfei Lei closed FLINK-34667. -- > Changelog state backend support local rescaling > --- > > Key: FLINK-34667 > URL: https://issues.apache.org/jira/browse/FLINK-34667 > Project: Flink > Issue Type: Bug > Components: Runtime / State Backends > Affects Versions: 1.20.0 > Reporter: Yanfei Lei > Assignee: Yanfei Lei > Priority: Major > Labels: pull-request-available > > FLINK-33341 uses the available local keyed state for rescaling; this will > cause changelog state to incorrectly treat part of the local state as the > complete local state. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (FLINK-34667) Changelog state backend support local rescaling
[ https://issues.apache.org/jira/browse/FLINK-34667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanfei Lei resolved FLINK-34667. Resolution: Fixed > Changelog state backend support local rescaling > --- > > Key: FLINK-34667 > URL: https://issues.apache.org/jira/browse/FLINK-34667 > Project: Flink > Issue Type: Bug > Components: Runtime / State Backends > Affects Versions: 1.20.0 > Reporter: Yanfei Lei > Assignee: Yanfei Lei > Priority: Major > Labels: pull-request-available > > FLINK-33341 uses the available local keyed state for rescaling; this will > cause changelog state to incorrectly treat part of the local state as the > complete local state. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [FLINK-34667][state/changelog] Changelog state backend supports local rescaling [flink]
fredia merged PR #24516: URL: https://github.com/apache/flink/pull/24516 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-34732][cdc][ci] Add document dead link check for Flink CDC Documentation [flink-cdc]
GOODBOY008 commented on code in PR #3168: URL: https://github.com/apache/flink-cdc/pull/3168#discussion_r1531451850

## .dlc.json: ##
@@ -0,0 +1,44 @@
+{
+  "ignorePatterns": [
+    {
+      "pattern": "^http://localhost"
+    },
+    {
+      "pattern": "^#"
+    },
+    {
+      "pattern": "^{"
+    },
+    {
+      "pattern": "^https://repo1.maven.org/maven2/org/apache/flink.*SNAPSHOT.*"
+    },
+    {
+      "pattern": "^https://mvnrepository.com"
+    },
+    {
+      "pattern": "^https://img.shields.io"
+    },
+    {
+      "pattern": "^https://tokei.rs"
+    },
+    {
+      "pattern": "^https://json.org/"
+    },
+    {
+      "pattern": "^https://opencollective.com"
+    },
+    {
+      "pattern": "^https://twitter.com*"
+    }

Review Comment: I will polish this config. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
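For context on how a `.dlc.json`-style `ignorePatterns` list is typically applied: each extracted link is matched against the patterns as regexes, and a link matching any of them is skipped. A minimal sketch (not the checker's actual implementation; names and the pattern subset are illustrative):

```python
import re

# A subset of the ignore patterns from the config above, used as regexes.
IGNORE_PATTERNS = [r"^http://localhost", r"^#", r"^https://img\.shields\.io"]

def is_ignored(link: str) -> bool:
    # A link is skipped when any ignore pattern matches it (the patterns
    # themselves carry the ^ anchor, as in the config).
    return any(re.search(p, link) for p in IGNORE_PATTERNS)
```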
[jira] [Created] (FLINK-34746) Switching to the Apache CDN for Dockerfile
lincoln lee created FLINK-34746: --- Summary: Switching to the Apache CDN for Dockerfile Key: FLINK-34746 URL: https://issues.apache.org/jira/browse/FLINK-34746 Project: Flink Issue Type: Improvement Components: flink-docker Reporter: lincoln lee While publishing the official image, we received some comments about switching to the Apache CDN. See https://github.com/docker-library/official-images/pull/16114 https://github.com/docker-library/official-images/pull/16430 Reason for switching: [https://apache.org/history/mirror-history.html] (also [https://www.apache.org/dyn/closer.cgi] and [https://www.apache.org/mirrors]) -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [FLINK-34732][cdc][ci] Add document dead link check for Flink CDC Documentation [flink-cdc]
GOODBOY008 commented on code in PR #3168: URL: https://github.com/apache/flink-cdc/pull/3168#discussion_r1531450125 ## docs/content.zh/docs/connectors/legacy-flink-cdc-sources/db2-cdc.md: ## @@ -32,9 +32,9 @@ describes how to setup the db2 CDC connector to run SQL queries against Db2 data ## Supported Databases -| Connector | Database | Driver | -|---||--| -| [Db2-cdc](../db2-cdc) | [Db2](https://www.ibm.com/products/db2): 11.5 | Db2 Driver: 11.5.0.0 | +| Connector | Database | Driver | +|---||--| +| Db2-cdc | [Db2](https://www.ibm.com/products/db2): 11.5 | Db2 Driver: 11.5.0.0 | Review Comment: This just link to current page ,so I think there is no need to add this link. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-34744) autoscaling-dynamic cannot run
[ https://issues.apache.org/jira/browse/FLINK-34744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828571#comment-17828571 ] Rui Fan commented on FLINK-34744: - Merged to main(1.9.0) via: b584b08806c7e8366519acdb92bfb4725faaebba > autoscaling-dynamic cannot run > -- > > Key: FLINK-34744 > URL: https://issues.apache.org/jira/browse/FLINK-34744 > Project: Flink > Issue Type: Bug > Components: Autoscaler >Reporter: Rui Fan >Assignee: Rui Fan >Priority: Major > Labels: pull-request-available > Fix For: kubernetes-operator-1.9.0 > > Attachments: image-2024-03-19-21-46-15-530.png > > > autoscaling-dynamic cannot run on my Mac > !image-2024-03-19-21-46-15-530.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (FLINK-34744) autoscaling-dynamic cannot run
[ https://issues.apache.org/jira/browse/FLINK-34744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Fan resolved FLINK-34744. - Resolution: Fixed > autoscaling-dynamic cannot run > -- > > Key: FLINK-34744 > URL: https://issues.apache.org/jira/browse/FLINK-34744 > Project: Flink > Issue Type: Bug > Components: Autoscaler >Reporter: Rui Fan >Assignee: Rui Fan >Priority: Major > Labels: pull-request-available > Fix For: kubernetes-operator-1.9.0 > > Attachments: image-2024-03-19-21-46-15-530.png > > > autoscaling-dynamic cannot run on my Mac > !image-2024-03-19-21-46-15-530.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [FLINK-34732][cdc][ci] Add document dead link check for Flink CDC Documentation [flink-cdc]
PatrickRen commented on code in PR #3168: URL: https://github.com/apache/flink-cdc/pull/3168#discussion_r1531445066

## docs/content.zh/docs/connectors/legacy-flink-cdc-sources/db2-cdc.md: ##
@@ -32,9 +32,9 @@ describes how to setup the db2 CDC connector to run SQL queries against Db2 data

## Supported Databases

-| Connector | Database | Driver |
-|-----------|----------|--------|
-| [Db2-cdc](../db2-cdc) | [Db2](https://www.ibm.com/products/db2): 11.5 | Db2 Driver: 11.5.0.0 |
+| Connector | Database | Driver |
+|-----------|----------|--------|
+| Db2-cdc | [Db2](https://www.ibm.com/products/db2): 11.5 | Db2 Driver: 11.5.0.0 |

Review Comment: What about using `{{< ref >}}` here instead of removing the link?

## docs/content/docs/connectors/legacy-flink-cdc-sources/mysql-cdc.md: ##
@@ -31,9 +31,9 @@ The MySQL CDC connector allows for reading snapshot data and incremental data fr

## Supported Databases

-| Connector | Database | Driver |
-|-----------|----------|--------|
-| [mysql-cdc](../mysql-cdc) | [MySQL](https://dev.mysql.com/doc): 5.6, 5.7, 8.0.x [RDS MySQL](https://www.aliyun.com/product/rds/mysql): 5.6, 5.7, 8.0.x [PolarDB MySQL](https://www.aliyun.com/product/polardb): 5.6, 5.7, 8.0.x [Aurora MySQL](https://aws.amazon.com/cn/rds/aurora): 5.6, 5.7, 8.0.x [MariaDB](https://mariadb.org): 10.x [PolarDB X](https://github.com/ApsaraDB/galaxysql): 2.0.1 | JDBC Driver: 8.0.27 |
+| Connector | Database | Driver |
+|-----------|----------|--------|
+| mysql-cdc | [MySQL](https://dev.mysql.com/doc): 5.6, 5.7, 8.0.x [RDS MySQL](https://www.aliyun.com/product/rds/mysql): 5.6, 5.7, 8.0.x [PolarDB MySQL](https://www.aliyun.com/product/polardb): 5.6, 5.7, 8.0.x [Aurora MySQL](https://aws.amazon.com/cn/rds/aurora): 5.6, 5.7, 8.0.x [MariaDB](https://mariadb.org): 10.x [PolarDB X](https://github.com/ApsaraDB/galaxysql): 2.0.1 | JDBC Driver: 8.0.27 |

Review Comment: Use `{{< ref >}}` here

## docs/content.zh/docs/connectors/legacy-flink-cdc-sources/mysql-cdc.md: ##
@@ -31,9 +31,9 @@ The MySQL CDC connector allows for reading snapshot data and incremental data fr

## Supported Databases

-| Connector | Database | Driver |
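For context, `{{< ref >}}` in the review comments above refers to Hugo's shortcode for resolving internal links at build time. A hedged sketch of what the suggested table cell could look like (the exact ref path depends on the docs site layout and is assumed here, not taken from the PR):

```markdown
| Connector | Database | Driver |
|-----------|----------|--------|
| [Db2-cdc]({{< ref "docs/connectors/legacy-flink-cdc-sources/db2-cdc" >}}) | [Db2](https://www.ibm.com/products/db2): 11.5 | Db2 Driver: 11.5.0.0 |
```

Unlike the relative link `(../db2-cdc)`, a `ref` shortcode fails the build if the target page moves, which is why reviewers often prefer it to removing links outright.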
Re: [PR] [FLINK-34744][autoscaler] Fix the issue that autoscaling-dynamic cannot run [flink-kubernetes-operator]
1996fanrui merged PR #800: URL: https://github.com/apache/flink-kubernetes-operator/pull/800 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] (FLINK-34740) "legacy-flink-cdc-sources" Pages for Flink CDC Chinese Documentation
[ https://issues.apache.org/jira/browse/FLINK-34740 ] Hongshun Wang deleted comment on FLINK-34740: --- was (Author: JIRAUSER298968): I am willing to do this. > "legacy-flink-cdc-sources" Pages for Flink CDC Chinese Documentation > > > Key: FLINK-34740 > URL: https://issues.apache.org/jira/browse/FLINK-34740 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Documentation, Flink CDC >Affects Versions: cdc-3.1.0 >Reporter: LvYanquan >Priority: Minor > Fix For: cdc-3.1.0 > > > Translate legacy-flink-cdc-sources pages of > [https://github.com/apache/flink-cdc/tree/master/docs/content/docs/connectors/legacy-flink-cdc-sources] > into Chinese. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34740) "legacy-flink-cdc-sources" Pages for Flink CDC Chinese Documentation
[ https://issues.apache.org/jira/browse/FLINK-34740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828570#comment-17828570 ] Hongshun Wang commented on FLINK-34740: --- I am willing to do this. > "legacy-flink-cdc-sources" Pages for Flink CDC Chinese Documentation > > > Key: FLINK-34740 > URL: https://issues.apache.org/jira/browse/FLINK-34740 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Documentation, Flink CDC >Affects Versions: cdc-3.1.0 >Reporter: LvYanquan >Priority: Minor > Fix For: cdc-3.1.0 > > > Translate legacy-flink-cdc-sources pages of > [https://github.com/apache/flink-cdc/tree/master/docs/content/docs/connectors/legacy-flink-cdc-sources] > into Chinese. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34741) "get-started" Page for Flink CDC Chinese Documentation
[ https://issues.apache.org/jira/browse/FLINK-34741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828569#comment-17828569 ] Hongshun Wang commented on FLINK-34741: --- I am willing to do this. > "get-started" Page for Flink CDC Chinese Documentation > -- > > Key: FLINK-34741 > URL: https://issues.apache.org/jira/browse/FLINK-34741 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Documentation, Flink CDC >Affects Versions: cdc-3.1.0 >Reporter: LvYanquan >Priority: Minor > Fix For: cdc-3.1.0 > > > Translate > [https://github.com/apache/flink-cdc/tree/master/docs/content/docs/get-started] > pages into Chinese. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34742) Translate "FAQ" Page for Flink CDC Chinese Documentation
[ https://issues.apache.org/jira/browse/FLINK-34742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828568#comment-17828568 ] Hongshun Wang commented on FLINK-34742: --- I am willing to do this. > Translate "FAQ" Page for Flink CDC Chinese Documentation > > > Key: FLINK-34742 > URL: https://issues.apache.org/jira/browse/FLINK-34742 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Documentation, Flink CDC >Affects Versions: cdc-3.1.0 >Reporter: LvYanquan >Priority: Minor > Fix For: cdc-3.1.0 > > > Translate > [https://github.com/apache/flink-cdc/blob/master/docs/content/docs/faq/faq.md] > page into Chinese. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34729) "Core Concept" Pages for Flink CDC Chinese Documentation
[ https://issues.apache.org/jira/browse/FLINK-34729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828566#comment-17828566 ] LvYanquan commented on FLINK-34729: --- I am willing to do this. > "Core Concept" Pages for Flink CDC Chinese Documentation > > > Key: FLINK-34729 > URL: https://issues.apache.org/jira/browse/FLINK-34729 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Documentation, Flink CDC >Reporter: LvYanquan >Priority: Minor > > Translate [Core > Concept|https://github.com/apache/flink-cdc/tree/master/docs/content/docs/core-concept] > Pages into Chinese. > Include data-pipeline/data-source/data-sink/route/transform/table-id. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34735) "Developer Guide - Understanding Flink CDC API" Page for Flink CDC Chinese Documentation
[ https://issues.apache.org/jira/browse/FLINK-34735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828567#comment-17828567 ] LvYanquan commented on FLINK-34735: --- I am willing to do this. > "Developer Guide - Understanding Flink CDC API" Page for Flink CDC Chinese > Documentation > > > Key: FLINK-34735 > URL: https://issues.apache.org/jira/browse/FLINK-34735 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Documentation, Flink CDC >Affects Versions: 3.1.0 >Reporter: LvYanquan >Priority: Minor > Fix For: 3.1.0 > > > Translate > [https://github.com/apache/flink-cdc/blob/master/docs/content/docs/developer-guide/understand-flink-cdc-api.md] > into Chinese. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [FLINK-25537] [JUnit5 Migration] Module: flink-core [flink]
GOODBOY008 commented on PR #24523: URL: https://github.com/apache/flink/pull/24523#issuecomment-2008545415 @flinkbot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-28693][table] Fix janino compile failed because the code generated refers the class in table-planner [flink]
xuyangzhong commented on PR #24280: URL: https://github.com/apache/flink/pull/24280#issuecomment-2008542204 > This is a pr we would like merged. It looks like @snuyanzin asked for tests to be added. @xuyangzhong are you looking at adding the tests? Hi, @davidradl . I'm attempting to add tests for it, but my recent schedule has been quite tight. I will do my best to re-push the PR within the next few days. Considering your dependency on this PR, another quick solution would be to cherry-pick this PR to your branch and repackage Flink. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-34745] Improve validations for a period in Time Travel [flink]
snuyanzin commented on code in PR #24534: URL: https://github.com/apache/flink/pull/24534#discussion_r1531189991 ## flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/planner/plan/stream/sql/join/TemporalJoinTest.scala: ## @@ -508,6 +508,27 @@ class TemporalJoinTest extends TableTestBase { " table, but the rowtime types are TIMESTAMP_LTZ(3) *ROWTIME* and TIMESTAMP(3) *ROWTIME*.", classOf[ValidationException] ) + +val sqlQuery9 = "SELECT * " + + "FROM Orders AS o JOIN " + + "RatesHistoryWithPK FOR SYSTEM_TIME AS OF 'o.rowtime' AS r " + + "ON o.currency = r.currency" +expectExceptionThrown( + sqlQuery9, + "The system time period specification expects Timestamp type but is 'CHAR'", Review Comment: IIRC this message is defined in resource file in Calcite and we (downstream project) can redefine it However i guess it is out of scope for this PR -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-34745] Improve validations for a period in Time Travel [flink]
snuyanzin commented on code in PR #24534: URL: https://github.com/apache/flink/pull/24534#discussion_r1531183660 ## flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/calcite/FlinkCalciteSqlValidator.java: ## @@ -222,14 +223,23 @@ protected void registerNamespace( Collections.singletonList(simplifiedRexNode), reducedNodes); // check whether period is the unsupported expression -if (!(reducedNodes.get(0) instanceof RexLiteral)) { -throw new UnsupportedOperationException( +final RexNode reducedNode = reducedNodes.get(0); +if (!(reducedNode instanceof RexLiteral)) { +throw new ValidationException( String.format( "Unsupported time travel expression: %s for the expression can not be reduced to a constant by Flink.", periodNode)); } -RexLiteral rexLiteral = (RexLiteral) (reducedNodes).get(0); +RexLiteral rexLiteral = (RexLiteral) reducedNode; +final RelDataType sqlType = rexLiteral.getType(); +if (!SqlTypeUtil.isTimestamp(rexLiteral.getType())) { Review Comment: nit: ```suggestion final RelDataType sqlType = rexLiteral.getType(); if (!SqlTypeUtil.isTimestamp(sqlType)) { ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-34745] Improve validations for a period in Time Travel [flink]
dawidwys commented on code in PR #24534: URL: https://github.com/apache/flink/pull/24534#discussion_r1530780573 ## flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/calcite/FlinkCalciteSqlValidator.java: ## @@ -222,14 +223,23 @@ protected void registerNamespace( Collections.singletonList(simplifiedRexNode), reducedNodes); // check whether period is the unsupported expression -if (!(reducedNodes.get(0) instanceof RexLiteral)) { -throw new UnsupportedOperationException( +final RexNode reducedNode = reducedNodes.get(0); +if (!(reducedNode instanceof RexLiteral)) { +throw new ValidationException( String.format( "Unsupported time travel expression: %s for the expression can not be reduced to a constant by Flink.", periodNode)); } -RexLiteral rexLiteral = (RexLiteral) (reducedNodes).get(0); +RexLiteral rexLiteral = (RexLiteral) reducedNode; +final SqlTypeName sqlTypeName = rexLiteral.getTypeName(); +if (!(sqlTypeName == SqlTypeName.TIMESTAMP_WITH_LOCAL_TIME_ZONE +|| sqlTypeName == SqlTypeName.TIMESTAMP)) { Review Comment: Good idea! I was looking for sth like it, but I failed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Comment Edited] (FLINK-34696) GSRecoverableWriterCommitter is generating excessive data blobs
[ https://issues.apache.org/jira/browse/FLINK-34696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828412#comment-17828412 ] Galen Warren edited comment on FLINK-34696 at 3/19/24 5:07 PM: --- I think your proposed algorithm is just another way of implementing the "delete intermediate blobs that are no longer necessary as soon as possible" idea we've considered. You accomplish it by overwriting the staging blob on each iteration; a similar effect could be achieved by writing to a new intermediate staging blob on each iteration (as is done now) and deleting the old one right away (which is not done now; instead, this deletion occurs shortly thereafter, after the commit succeeds). Aside: I'm not sure whether it's possible to overwrite the existing staging blob like you suggest. The [docs|https://cloud.google.com/storage/docs/json_api/v1/objects/compose] for the compose operation say: {quote}Concatenates a list of existing objects into a new object in the same bucket. The existing source objects are unaffected by this operation {quote} In your proposal, the same staging blob is both the target of the compose operation and one of the input blobs to be composed. That _might_ work, but would have to be tested to see if it's allowed. If it isn't, writing to a new blob and deleting the old one would have essentially the same effect. That's almost certainly what happens behind the scenes anyway, since blobs are immutable. Bigger picture, if you're trying to combine millions of immutable blobs together in one step, 32 at a time, I don't see how you avoid having lots of intermediate composed blobs, one way or another, at least temporarily. The main question would be how quickly ones that are no longer needed are discarded.
{quote}To streamline this, we could modify {{GSCommitRecoverable}} to update the list of {{componentObjectIds}}, allowing the removal of blobs already appended to the {{stagingBlob}} {quote} Not quite following you here; the GSCommitRecoverable is provided as an input to the GSRecoverableWriterCommitter, providing the list of "raw" (i.e. uncomposed) temporary blobs that need to be composed and committed. If you removed those raw blobs from that list as they're added to the staging blob, what would that accomplish? The GSCommitRecoverable provided to the GSRecoverableWriterCommitter isn't persisted anywhere after the commit succeeds or fails, and if the commit were to be retried it would need the complete list anyway. {quote}I notice another issue with the code: storage.compose might throw a StorageException. With the current code this would mean the intermediate composition blobs are not cleaned up. If the code above is implemented, I think there is no longer a need for the TTL feature since all necessary blobs should be written to a final blob. Any leftover blobs post-job completion would indicate a failed state. {quote} There is no real way to prevent the orphaning of intermediate blobs in all failure scenarios. Even in your proposed algorithm, the staging blob could be orphaned if the composition process failed partway through. This is what the temp bucket and TTL mechanism are for. It's optional, so you don't have to use it, but yes, you would have to keep an eye out for orphaned intermediate blobs via some other mechanism if you choose not to use it. Honestly, I think some of your difficulties are coming from trying to combine so much data together at once vs. doing it along the way, with commits at incremental checkpoints. I was going to suggest you reduce your checkpoint interval, to aggregate more frequently, but obviously that doesn't work if you're not checkpointing at all.
I do think the RecoverableWriter mechanism is designed to make writing predictable and repeatable in checkpoint/recovery situations (hence the name); if you're not doing any of that, you may run into some challenges. If you were to use checkpoints, you would aggregate more frequently, meaning fewer intermediate blobs with shorter lifetimes, and you'd be less likely to run up against the 5TB limit, as you would end up with a series of smaller files, written at each checkpoint, vs. one huge file. If we do want to consider changes, here's what I suggest: * Test whether GCP allows composition to overwrite an existing blob that is also an input to the compose operation. If that works, then we could use that technique in the composeBlobs function. As mentioned above, I doubt that this actually reduces the number of blobs that exist temporarily – since blobs are immutable objects – but it would minimize their lifetime, effectively deleting them as soon as possible. If overwriting doesn't work, then we could achieve a similar effect by deleting intermediate blobs as soon as they are not needed anymore,
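The delete-as-you-go composition discussed above — at most 32 sources per compose call, with the previous staging object discarded as soon as its successor exists — can be sketched in plain Java. This is an illustration of the batching arithmetic only: the byte-array `concat` is a stand-in for the server-side `storage.compose` call, and the class and method names here are invented for the sketch, not the GCS client API.

```java
import java.util.ArrayList;
import java.util.List;

class ComposeSketch {
    // GCS compose accepts at most 32 source objects per call
    static final int MAX_COMPOSE = 32;

    /**
     * Compose all sources into one result, keeping at most one intermediate
     * "staging blob" alive at any time. Each round combines the previous
     * staging blob with up to 31 new sources; the old staging blob becomes
     * garbage immediately (in real GCS: delete it right after the compose).
     */
    static byte[] composeAll(List<byte[]> sources) {
        byte[] staging = new byte[0];
        for (int i = 0; i < sources.size(); i += MAX_COMPOSE - 1) {
            int end = Math.min(i + MAX_COMPOSE - 1, sources.size());
            List<byte[]> batch = new ArrayList<>();
            batch.add(staging); // previous staging blob is one of the 32 inputs
            batch.addAll(sources.subList(i, end));
            staging = concat(batch); // stand-in for storage.compose(...)
        }
        return staging;
    }

    // concatenate byte arrays in order, mimicking object composition
    static byte[] concat(List<byte[]> parts) {
        int len = parts.stream().mapToInt(p -> p.length).sum();
        byte[] out = new byte[len];
        int pos = 0;
        for (byte[] p : parts) {
            System.arraycopy(p, 0, out, pos, p.length);
            pos += p.length;
        }
        return out;
    }
}
```

Whether the real compose operation may name an existing staging blob as both target and source is exactly the open question above; if it may not, the sketch still holds with "write new staging blob, then delete the old one" in each round.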
Re: [PR] [FLINK-34745] Improve validations for a period in Time Travel [flink]
snuyanzin commented on code in PR #24534: URL: https://github.com/apache/flink/pull/24534#discussion_r1530744301 ## flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/calcite/FlinkCalciteSqlValidator.java: ## @@ -222,14 +223,23 @@ protected void registerNamespace( Collections.singletonList(simplifiedRexNode), reducedNodes); // check whether period is the unsupported expression -if (!(reducedNodes.get(0) instanceof RexLiteral)) { -throw new UnsupportedOperationException( +final RexNode reducedNode = reducedNodes.get(0); +if (!(reducedNode instanceof RexLiteral)) { +throw new ValidationException( String.format( "Unsupported time travel expression: %s for the expression can not be reduced to a constant by Flink.", periodNode)); } -RexLiteral rexLiteral = (RexLiteral) (reducedNodes).get(0); +RexLiteral rexLiteral = (RexLiteral) reducedNode; +final SqlTypeName sqlTypeName = rexLiteral.getTypeName(); +if (!(sqlTypeName == SqlTypeName.TIMESTAMP_WITH_LOCAL_TIME_ZONE +|| sqlTypeName == SqlTypeName.TIMESTAMP)) { Review Comment: ```suggestion if (!SqlTypeUtil.isTimestamp(sqlTypeName) { ``` I think `org.apache.calcite.sql.type.SqlTypeUtil#isTimestamp` could be reused here as defined at https://github.com/apache/calcite/blob/413eded693a9087402cc1a6eefeca7a29445d337/core/src/main/java/org/apache/calcite/sql/type/SqlTypeUtil.java#L390-L393 and https://github.com/apache/calcite/blob/413eded693a9087402cc1a6eefeca7a29445d337/core/src/main/java/org/apache/calcite/sql/type/SqlTypeFamily.java#L182-L183 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
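The family-based check suggested above — testing timestamp-family membership once rather than enumerating the two timestamp type names — can be illustrated without Calcite on the classpath. The enum below is a hypothetical stand-in for Calcite's SqlTypeName; only the shape of the check mirrors SqlTypeUtil#isTimestamp.

```java
class TimeTravelCheck {
    // stand-in for Calcite's SqlTypeName; not the real class
    enum TypeName { CHAR, INTEGER, TIMESTAMP, TIMESTAMP_WITH_LOCAL_TIME_ZONE }

    // mirrors SqlTypeUtil.isTimestamp: membership in the TIMESTAMP family
    static boolean isTimestamp(TypeName t) {
        return t == TypeName.TIMESTAMP
                || t == TypeName.TIMESTAMP_WITH_LOCAL_TIME_ZONE;
    }

    // one membership test replaces the explicit two-constant comparison
    static void validatePeriodType(TypeName t) {
        if (!isTimestamp(t)) {
            throw new IllegalArgumentException(
                    "The system time period specification expects Timestamp type but is '"
                            + t + "'");
        }
    }
}
```

The advantage of delegating to a family predicate is that the validator stays correct if further timestamp variants join the family later.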
[jira] [Comment Edited] (FLINK-34575) Vulnerabilities in commons-compress 1.24.0; upgrade to 1.26.0 needed.
[ https://issues.apache.org/jira/browse/FLINK-34575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824327#comment-17824327 ] Adrian Vasiliu edited comment on FLINK-34575 at 3/19/24 4:39 PM: - Thanks Marijn for getting [https://github.com/apache/flink-connector-hbase/pull/41] merged. I suppose at some point this will be complemented by [https://github.com/apache/flink/pull/24352|https://github.com/apache/flink/pull/24352] . > I don't think a security fix is necessary, since Flink isn't affected > directly by it. Good to know. Now, vulnerability scanners don't know it, and in enterprise context vulnerabilities create trouble / extra processes even when the vulnerability can't really be exploited. was (Author: JIRAUSER280892): Thanks Marijn for getting [https://github.com/apache/flink-connector-hbase/pull/41] merged. I suppose at some point this will be complemented by [https://github.com/apache/flink/pull/24352.] > I don't think a security fix is necessary, since Flink isn't affected > directly by it. Good to know. Now, vulnerability scanners don't know it, and in enterprise context vulnerabilities create trouble / extra processes even when the vulnerability can't really be exploited. > Vulnerabilities in commons-compress 1.24.0; upgrade to 1.26.0 needed. > - > > Key: FLINK-34575 > URL: https://issues.apache.org/jira/browse/FLINK-34575 > Project: Flink > Issue Type: Technical Debt >Affects Versions: 1.18.1 >Reporter: Adrian Vasiliu >Priority: Major > Fix For: hbase-4.0.0 > > > Since Feb. 19, medium/high CVEs have been found for commons-compress 1.24.0: > [https://nvd.nist.gov/vuln/detail/CVE-2024-25710] > https://nvd.nist.gov/vuln/detail/CVE-2024-26308 > [https://github.com/apache/flink/pull/24352] has been opened automatically on > Feb. 21 by dependabot for bumping commons-compress to v1.26.0 which fixes the > CVEs, but two CI checks are red on the PR. 
> Flink's dependency on commons-compress has been upgraded to v1.24.0 in Oct > 2023 (https://issues.apache.org/jira/browse/FLINK-33329). > v1.24.0 is the version currently in the master > branch: [https://github.com/apache/flink/blob/master/pom.xml#L727-L729]. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [FLINK-34745] Improve validations for a period in Time Travel [flink]
dawidwys commented on code in PR #24534: URL: https://github.com/apache/flink/pull/24534#discussion_r1530725274 ## flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/planner/plan/stream/sql/join/TemporalJoinTest.scala: ## @@ -508,6 +508,27 @@ class TemporalJoinTest extends TableTestBase { " table, but the rowtime types are TIMESTAMP_LTZ(3) *ROWTIME* and TIMESTAMP(3) *ROWTIME*.", classOf[ValidationException] ) + +val sqlQuery9 = "SELECT * " + + "FROM Orders AS o JOIN " + + "RatesHistoryWithPK FOR SYSTEM_TIME AS OF 'o.rowtime' AS r " + + "ON o.currency = r.currency" +expectExceptionThrown( + sqlQuery9, + "The system time period specification expects Timestamp type but is 'CHAR'", Review Comment: Unfortunately, this message is actually defined in Calcite. We'd need to change it there. I could change it for this particular case, but it's also thrown from other locations in Calcite. I'd rather keep it consistent. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-34745] Improve validations for a period in Time Travel [flink]
dawidwys commented on code in PR #24534: URL: https://github.com/apache/flink/pull/24534#discussion_r1530725846 ## flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/calcite/FlinkCalciteSqlValidator.java: ## @@ -222,15 +223,25 @@ protected void registerNamespace( Collections.singletonList(simplifiedRexNode), reducedNodes); // check whether period is the unsupported expression -if (!(reducedNodes.get(0) instanceof RexLiteral)) { -throw new UnsupportedOperationException( +final RexNode reducedNode = reducedNodes.get(0); +if (!(reducedNode instanceof RexLiteral)) { +throw new ValidationException( String.format( "Unsupported time travel expression: %s for the expression can not be reduced to a constant by Flink.", periodNode)); } -RexLiteral rexLiteral = (RexLiteral) (reducedNodes).get(0); -TimestampString timestampString = rexLiteral.getValueAs(TimestampString.class); +final SqlTypeName sqlTypeName = ((RexLiteral) reducedNode).getTypeName(); +if (!(sqlTypeName == SqlTypeName.TIMESTAMP_WITH_LOCAL_TIME_ZONE +|| sqlTypeName == SqlTypeName.TIMESTAMP)) { +throw newValidationError( +periodNode, + Static.RESOURCE.illegalExpressionForTemporal(sqlTypeName.getName())); +} + +RexLiteral rexLiteral = (RexLiteral) reducedNode; +TimestampString timestampString = +((RexLiteral) reducedNode).getValueAs(TimestampString.class); Review Comment: ah, I wanted to delete the line with `rexLiteral` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-34718][tests] Exclude operators not meant for adaptive scheduler [flink]
XComp commented on PR #24533: URL: https://github.com/apache/flink/pull/24533#issuecomment-2007544762 You don't have to wait. You could run both profiles in the same CI build (similar to what is done in the nightly workflows) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-34718][tests] Exclude operators not meant for adaptive scheduler [flink]
RyanSkraba commented on PR #24533: URL: https://github.com/apache/flink/pull/24533#issuecomment-2007540901 It's in process -- I guess I'll let it run through to show the tests are executed outside of the adaptive scheduler, then push it again with the fake commit to run through again to show they are excluded. :thinking: -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-34696) GSRecoverableWriterCommitter is generating excessive data blobs
[ https://issues.apache.org/jira/browse/FLINK-34696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828412#comment-17828412 ] Galen Warren commented on FLINK-34696: -- I think your proposed algorithm is just another way of implementing the "delete intermediate blobs that are no longer necessary as soon as possible" idea we've considered. You accomplish it by overwriting the staging blob on each iteration; a similar effect could be achieved by writing to a new intermediate staging blob on each iteration (as is done now) and deleting the old one right away (which is not done now; instead, this deletion occurs shortly thereafter, after the commit succeeds). Aside: I'm not sure whether it's possible to overwrite the existing staging blob like you suggest. The [docs|https://cloud.google.com/storage/docs/json_api/v1/objects/compose] for the compose operation say: {quote}Concatenates a list of existing objects into a new object in the same bucket. The existing source objects are unaffected by this operation {quote} In your proposal, the same staging blob is both the target of the compose operation and one of the input blobs to be composed. That _might_ work, but would have to be tested to see if it's allowed. If it isn't, writing to a new blob and deleting the old one would have essentially the same effect. That's almost certainly what happens behind the scenes anyway, since blobs are immutable. Bigger picture, if you're trying to combine millions of immutable blobs together in one step, 32 at a time, I don't see how you avoid having lots of intermediate composed blobs, one way or another, at least temporarily. The main question would be how quickly ones that are no longer needed are discarded. 
{quote}To streamline this, we could modify {{GSCommitRecoverable}} to update the list of {{{}componentObjectIds{}}}, allowing the removal of blobs already appended to the {{stagingBlob}} {quote} Not quite following you here; the GSCommitRecoverable is provided as an input to the GSRecoverableWriterCommitter, providing the list of "raw" (i.e. uncomposed) temporary blobs that need to be composed and committed. If you removed those raw blobs from that list as they're added to the staging blob, what would that accomplish? The GSCommitRecoverable provided to the GSRecoverableWriterCommitter isn't persisted anywhere after the commit succeeds or fails, and if the commit were to be retried it would need the complete list anyway. {quote}I noticed another issue with the code - the storage.compose might throw a StorageException. With the current code this would mean the intermediate composition blobs are not cleaned up. If the code above is implemented, I think there is no longer a need for the TTL feature since all necessary blobs should be written to a final blob. Any leftover blobs post-job completion would indicate a failed state. {quote} There is no real way to prevent the orphaning of intermediate blobs in all failure scenarios. Even in your proposed algorithm, the staging blob could be orphaned if the composition process failed partway through. This is what the temp bucket and TTL mechanism are for. It's optional, so you don't have to use it, but yes you would have to keep an eye out for orphaned intermediate blobs via some other mechanism if you choose not to use it. Honestly, I think some of your difficulties are coming from trying to combine so much data together at once vs. doing it along the way, with commits at incremental checkpoints. I was going to suggest you reduce your checkpoint interval, to aggregate more frequently, but obviously that doesn't work if you're not checkpointing at all. 
I do think the RecoverableWriter mechanism is designed to make writing predictable and repeatable in checkpoint/recovery situations (hence the name); if you're not doing any of that, you may run into some challenges. If you were to use checkpoints, you would aggregate more frequently, meaning fewer intermediate blobs with shorter lifetimes, and you'd be less likely to run up against the 5TB limit, as you would end up with a series of smaller files, written at each checkpoint, vs. one huge file. If we do want to consider changes, here's what I suggest: * Test whether GCP allows composition to overwrite an existing blob that is also an input to the compose operation. If that works, then we could use that technique in the composeBlobs function. As mentioned above, I doubt that this actually reduces the number of blobs that exist temporarily – since blobs are immutable objects – but it would minimize their lifetime, effectively deleting them as soon as possible. If overwriting doesn't work, then we could achieve a similar effect by deleting intermediate blobs as soon as they are not needed anymore, rather than waiting until the commit succeeds. *
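The compose-and-delete trade-off discussed above can be sketched as a small simulation. This is pure Python for illustration only — not the actual flink-gs-fs-hadoop Java code — and the blob names are made up; it only models GCS's 32-source compose limit to show how eager deletion bounds the number of live intermediate blobs without changing how many are created overall:

```python
COMPOSE_LIMIT = 32  # GCS compose accepts at most 32 source objects per call

def compose_all(components, delete_eagerly):
    """Fold component blobs into a single staging blob, 32 sources at a time.

    Returns (final_blob, max_live_intermediates, total_intermediates_created).
    """
    live = set()       # intermediate blobs currently in existence
    max_live = 0
    total_created = 0
    remaining = list(components)
    staging = None     # name of the current staging blob, if any
    while remaining:
        # Each round composes the previous staging blob (if any) plus up to
        # 31 more components into a NEW blob -- blobs are immutable, so the
        # result of every compose call is a fresh object.
        take = COMPOSE_LIMIT - (1 if staging else 0)
        remaining = remaining[take:]   # these components are consumed this round
        total_created += 1
        new_staging = f"intermediate-{total_created}"
        if staging is not None and delete_eagerly:
            live.discard(staging)      # delete the old staging blob right away
        live.add(new_staging)
        max_live = max(max_live, len(live))
        staging = new_staging
    return staging, max_live, total_created
```

For 1,000 component blobs this creates 33 intermediate blobs either way; eager deletion just keeps at most one of them alive at any moment, which is the "minimize their lifetime" effect described above, while the non-eager variant lets all 33 linger until the commit succeeds.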
Re: [PR] [FLINK-20398][e2e] Migrate test_batch_sql.sh to Java e2e tests framework [flink]
affo commented on PR #24471: URL: https://github.com/apache/flink/pull/24471#issuecomment-2007495060 @XComp It required quite an effort honestly, but here we are with the JUnit5 version of what I had before. This also made it possible to avoid starting a separate jar, and instead include the code directly in the test and run it against the MiniCluster obtained. Thank you for your detailed review -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-20398][e2e] Migrate test_batch_sql.sh to Java e2e tests framework [flink]
affo commented on code in PR #24471: URL: https://github.com/apache/flink/pull/24471#discussion_r1530613302 ## flink-end-to-end-tests/flink-end-to-end-tests-common/src/main/java/org/apache/flink/tests/util/flink/FlinkDistribution.java: ## @@ -234,10 +234,7 @@ public JobID submitJob(final JobSubmission jobSubmission, Duration timeout) thro LOG.info("Running {}.", commands.stream().collect(Collectors.joining(" "))); -final Pattern pattern = -jobSubmission.isDetached() -? Pattern.compile("Job has been submitted with JobID (.*)") -: Pattern.compile("Job with JobID (.*) has finished."); +final Pattern pattern = Pattern.compile("Job has been submitted with JobID (.*)"); Review Comment: It used to be, as apparently that string matching is now obsolete. With latest changes this is not relevant anymore -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
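As a side note, the surviving pattern is a plain capture over the CLI output. A minimal sketch of the extraction it performs (Python here for brevity — the actual code is Java in `FlinkDistribution` — and the sample JobID is made up):

```python
import re

# The single pattern kept after this change: detached-submission output only.
JOB_SUBMITTED = re.compile(r"Job has been submitted with JobID (.*)")

def extract_job_id(cli_output: str):
    """Return the JobID from `flink run` output, or None if no line matches."""
    for line in cli_output.splitlines():
        match = JOB_SUBMITTED.search(line)
        if match:
            return match.group(1).strip()
    return None

sample = "Job has been submitted with JobID a1b2c3d4e5f60718293a4b5c6d7e8f90"
```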
Re: [PR] [FLINK-20398][e2e] Migrate test_batch_sql.sh to Java e2e tests framework [flink]
affo commented on code in PR #24471: URL: https://github.com/apache/flink/pull/24471#discussion_r1530612428 ## flink-end-to-end-tests/flink-batch-sql-test/src/test/resources/log4j2-test.properties: ## @@ -0,0 +1,31 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# Set root logger level to OFF to not flood build logs +# set manually to INFO for debugging purposes +rootLogger.level=OFF +rootLogger.appenderRef.test.ref=TestLogger +appender.testlogger.name=TestLogger +appender.testlogger.type=CONSOLE +appender.testlogger.target=SYSTEM_ERR +appender.testlogger.layout.type=PatternLayout +appender.testlogger.layout.pattern=%-4r [%t] %-5p %c %x - %m%n +# Uncomment to enable codegen logging +#loggers = testlogger +#logger.testlogger.name =org.apache.flink.table.planner.codegen +#logger.testlogger.level = TRACE +#logger.testlogger.appenderRefs = TestLogger Review Comment: This is quite useful for running tests (switch from `OFF` to `INFO`), and `OFF` prevents cluttering the logs in CI runs -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-34745] Improve validations for a period in Time Travel [flink]
alexey-lv commented on code in PR #24534: URL: https://github.com/apache/flink/pull/24534#discussion_r1530610432 ## flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/planner/plan/stream/sql/join/TemporalJoinTest.scala: ## @@ -508,6 +508,27 @@ class TemporalJoinTest extends TableTestBase { " table, but the rowtime types are TIMESTAMP_LTZ(3) *ROWTIME* and TIMESTAMP(3) *ROWTIME*.", classOf[ValidationException] ) + +val sqlQuery9 = "SELECT * " + + "FROM Orders AS o JOIN " + + "RatesHistoryWithPK FOR SYSTEM_TIME AS OF 'o.rowtime' AS r " + + "ON o.currency = r.currency" +expectExceptionThrown( + sqlQuery9, + "The system time period specification expects Timestamp type but is 'CHAR'", Review Comment: nit: the error message would be better if it was changed `...expects Timestamp type but is...` --> `...expects Timestamp type, but got ...` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-34745] Improve validations for a period in Time Travel [flink]
alexey-lv commented on code in PR #24534: URL: https://github.com/apache/flink/pull/24534#discussion_r1530607371 ## flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/calcite/FlinkCalciteSqlValidator.java: ## @@ -222,15 +223,25 @@ protected void registerNamespace( Collections.singletonList(simplifiedRexNode), reducedNodes); // check whether period is the unsupported expression -if (!(reducedNodes.get(0) instanceof RexLiteral)) { -throw new UnsupportedOperationException( +final RexNode reducedNode = reducedNodes.get(0); +if (!(reducedNode instanceof RexLiteral)) { +throw new ValidationException( String.format( "Unsupported time travel expression: %s for the expression can not be reduced to a constant by Flink.", periodNode)); } -RexLiteral rexLiteral = (RexLiteral) (reducedNodes).get(0); -TimestampString timestampString = rexLiteral.getValueAs(TimestampString.class); +final SqlTypeName sqlTypeName = ((RexLiteral) reducedNode).getTypeName(); +if (!(sqlTypeName == SqlTypeName.TIMESTAMP_WITH_LOCAL_TIME_ZONE +|| sqlTypeName == SqlTypeName.TIMESTAMP)) { +throw newValidationError( +periodNode, + Static.RESOURCE.illegalExpressionForTemporal(sqlTypeName.getName())); +} + +RexLiteral rexLiteral = (RexLiteral) reducedNode; +TimestampString timestampString = +((RexLiteral) reducedNode).getValueAs(TimestampString.class); Review Comment: Use rexLiteral here -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-34745] Improve validations for a period in Time Travel [flink]
alexey-lv commented on code in PR #24534: URL: https://github.com/apache/flink/pull/24534#discussion_r1530606670 ## flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/calcite/FlinkCalciteSqlValidator.java: ## @@ -222,15 +223,25 @@ protected void registerNamespace( Collections.singletonList(simplifiedRexNode), reducedNodes); // check whether period is the unsupported expression -if (!(reducedNodes.get(0) instanceof RexLiteral)) { -throw new UnsupportedOperationException( +final RexNode reducedNode = reducedNodes.get(0); +if (!(reducedNode instanceof RexLiteral)) { +throw new ValidationException( String.format( "Unsupported time travel expression: %s for the expression can not be reduced to a constant by Flink.", periodNode)); } -RexLiteral rexLiteral = (RexLiteral) (reducedNodes).get(0); -TimestampString timestampString = rexLiteral.getValueAs(TimestampString.class); +final SqlTypeName sqlTypeName = ((RexLiteral) reducedNode).getTypeName(); +if (!(sqlTypeName == SqlTypeName.TIMESTAMP_WITH_LOCAL_TIME_ZONE +|| sqlTypeName == SqlTypeName.TIMESTAMP)) { +throw newValidationError( +periodNode, + Static.RESOURCE.illegalExpressionForTemporal(sqlTypeName.getName())); +} + +RexLiteral rexLiteral = (RexLiteral) reducedNode; Review Comment: (nit) move `RexLiteral rexLiteral = (RexLiteral) reducedNode` above `final SqlTypeName sqlTypeName = ((RexLiteral) reducedNode).getTypeName();` -- this would help to simplify readability of the latter a bit. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [hotfix] Fix maven property typo in root pom.xml [flink]
flinkbot commented on PR #24535: URL: https://github.com/apache/flink/pull/24535#issuecomment-2007483505 ## CI report: * 1eb406be2e3bc9e67e92b7e8b7d579b584f4b22e UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] [hotfix] Fix maven property typo in root pom.xml [flink]
RyanSkraba opened a new pull request, #24535: URL: https://github.com/apache/flink/pull/24535 ## What is the purpose of the change There's a missing brace in the root `pom.xml` that would cause tests meant to be excluded on a specific JDK version (`FailsOnJavaXxx`) to run anyway when the `enable-adaptive-scheduler` profile is enabled. There currently aren't any, but this should be corrected (and maybe backported)? ## Brief change log Add the missing brace. ## Verifying this change This change is a trivial rework without any test coverage. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): **no** - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: **no** - The serializers: **no** - The runtime per-record code paths (performance sensitive): **no** - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: **no** - The S3 file system connector: **no** ## Documentation - Does this pull request introduce a new feature? **no** - If yes, how is the feature documented? **not documented** -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-34745] Improve validations for a period in Time Travel [flink]
flinkbot commented on PR #24534: URL: https://github.com/apache/flink/pull/24534#issuecomment-2007457908 ## CI report: * 16df9b27a63c67e540905939e98775c3c57d99c5 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-34745) Parsing temporal table join throws cryptic exceptions
[ https://issues.apache.org/jira/browse/FLINK-34745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-34745: --- Labels: pull-request-available (was: ) > Parsing temporal table join throws cryptic exceptions > - > > Key: FLINK-34745 > URL: https://issues.apache.org/jira/browse/FLINK-34745 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.19.0 >Reporter: Dawid Wysakowicz >Assignee: Dawid Wysakowicz >Priority: Major > Labels: pull-request-available > Fix For: 1.20.0 > > > 1. Wrong expression type in {{AS OF}}: > {code} > SELECT * " + > "FROM Orders AS o JOIN " + > "RatesHistoryWithPK FOR SYSTEM_TIME AS OF 'o.rowtime' AS r " + > "ON o.currency = r.currency > {code} > throws: > {code} > java.lang.AssertionError: cannot convert CHAR literal to class > org.apache.calcite.util.TimestampString > {code} > 2. Not a simple table reference in {{AS OF}} > {code} > SELECT * " + > "FROM Orders AS o JOIN " + > "RatesHistoryWithPK FOR SYSTEM_TIME AS OF o.rowtime + INTERVAL '1' > SECOND AS r " + > "ON o.currency = r.currency > {code} > throws: > {code} > java.lang.AssertionError: no unique expression found for {id: o.rowtime, > prefix: 1}; count is 0 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] [FLINK-34745] Improve validations for a period in Time Travel [flink]
dawidwys opened a new pull request, #24534: URL: https://github.com/apache/flink/pull/24534 ## What is the purpose of the change The provided SQLs are invalid, however the current code base fails on unexpected locations giving unhelpful messages. This PR improves the validation which results in proper error messages. ## Verifying this change Added tests. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / **no**) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**) - The serializers: (yes / **no** / don't know) - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't know) - The S3 file system connector: (yes / **no** / don't know) ## Documentation - Does this pull request introduce a new feature? (yes / **no**) - If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-34718][tests] Exclude operators not meant for adaptive scheduler [flink]
flinkbot commented on PR #24533: URL: https://github.com/apache/flink/pull/24533#issuecomment-2007442964 ## CI report: * 555032883dddaa40c52c3b789a108de7f5adf55e UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-34718][tests] Exclude operators not meant for adaptive scheduler [flink]
XComp commented on PR #24533: URL: https://github.com/apache/flink/pull/24533#issuecomment-2007438599 @RyanSkraba The tests will pass in any way because the PR CI run will only utilize the `DefaultScheduler`. You could test this by enabling the `AdaptiveScheduler` profile in your PR or by triggering a nightly GHA workflow on this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-34718][tests] Exclude operators not meant for adaptive scheduler [flink]
RyanSkraba commented on PR #24533: URL: https://github.com/apache/flink/pull/24533#issuecomment-2007422589 @WencongLiu If this passes, can you take a look? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-34718) KeyedPartitionWindowedStream and NonPartitionWindowedStream IllegalStateException in AZP
[ https://issues.apache.org/jira/browse/FLINK-34718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-34718: --- Labels: pull-request-available test-stability (was: test-stability) > KeyedPartitionWindowedStream and NonPartitionWindowedStream > IllegalStateException in AZP > > > Key: FLINK-34718 > URL: https://issues.apache.org/jira/browse/FLINK-34718 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Affects Versions: 1.20.0 >Reporter: Ryan Skraba >Priority: Critical > Labels: pull-request-available, test-stability > > [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58320=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=ea7cf968-e585-52cb-e0fc-f48de023a7ca=9646] > 18 of the KeyedPartitionWindowedStreamITCase and > NonKeyedPartitionWindowedStreamITCase unit tests introduced in FLINK-34543 > are failing in the adaptive scheduler profile, with errors similar to: > {code:java} > Mar 15 01:54:12 Caused by: java.lang.IllegalStateException: The adaptive > scheduler supports pipelined data exchanges (violated by MapPartition > (org.apache.flink.streaming.runtime.tasks.OneInputStreamTask) -> > ddb598ad156ed281023ba4eebbe487e3). 
> Mar 15 01:54:12 at > org.apache.flink.util.Preconditions.checkState(Preconditions.java:215) > Mar 15 01:54:12 at > org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler.assertPreconditions(AdaptiveScheduler.java:438) > Mar 15 01:54:12 at > org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler.(AdaptiveScheduler.java:356) > Mar 15 01:54:12 at > org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerFactory.createInstance(AdaptiveSchedulerFactory.java:124) > Mar 15 01:54:12 at > org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:121) > Mar 15 01:54:12 at > org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:384) > Mar 15 01:54:12 at > org.apache.flink.runtime.jobmaster.JobMaster.(JobMaster.java:361) > Mar 15 01:54:12 at > org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:128) > Mar 15 01:54:12 at > org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:100) > Mar 15 01:54:12 at > org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112) > Mar 15 01:54:12 at > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) > Mar 15 01:54:12 ... 4 more > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] [FLINK-34718][tests] Exclude operators not meant for adaptive scheduler [flink]
RyanSkraba opened a new pull request, #24533: URL: https://github.com/apache/flink/pull/24533 ## What is the purpose of the change * Some of the new tests for the unit tests introduced in FLINK-34543 are failing in the **adaptive scheduler** stage. These operators are not suitable for the adaptive scheduler and should be excluded. ## Brief change log Add `@Tag` to exclude the `KeyedPartitionWindowedStreamITCase` and `NonKeyedPartitionWindowedStreamITCase`. ## Verifying this change This change can be verified in the Azure and GitHub Actions pipeline by verifying the execution of a full build: they should be run in stages that don't have the system property "flink.tests.enable-adaptive-scheduler" set to true (and definitely excluded from the **adaptive scheduler** stage) ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): **no** - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: **no** - The serializers: **no** - The runtime per-record code paths (performance sensitive): **no** - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: **no** - The S3 file system connector **no** ## Documentation - Does this pull request introduce a new feature? **no** - If yes, how is the feature documented? **not applicable** -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-34745) Parsing temporal table join throws cryptic exceptions
[ https://issues.apache.org/jira/browse/FLINK-34745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Wysakowicz updated FLINK-34745: - Description: 1. Wrong expression type in {{AS OF}}: {code} SELECT * " + "FROM Orders AS o JOIN " + "RatesHistoryWithPK FOR SYSTEM_TIME AS OF 'o.rowtime' AS r " + "ON o.currency = r.currency {code} throws: {code} java.lang.AssertionError: cannot convert CHAR literal to class org.apache.calcite.util.TimestampString {code} 2. Not a simple table reference in {{AS OF}} {code} SELECT * " + "FROM Orders AS o JOIN " + "RatesHistoryWithPK FOR SYSTEM_TIME AS OF o.rowtime + INTERVAL '1' SECOND AS r " + "ON o.currency = r.currency {code} throws: {code} java.lang.AssertionError: no unique expression found for {id: o.rowtime, prefix: 1}; count is 0 {code} was: 1. Wrong expression type in `AS OF`: {code} SELECT * " + "FROM Orders AS o JOIN " + "RatesHistoryWithPK FOR SYSTEM_TIME AS OF 'o.rowtime' AS r " + "ON o.currency = r.currency {code} throws: {code} java.lang.AssertionError: cannot convert CHAR literal to class org.apache.calcite.util.TimestampString {code} 2. Not a table simple table reference {code} SELECT * " + "FROM Orders AS o JOIN " + "RatesHistoryWithPK FOR SYSTEM_TIME AS OF o.rowtime + INTERVAL '1' SECOND AS r " + "ON o.currency = r.currency {code} throws: {code} java.lang.AssertionError: no unique expression found for {id: o.rowtime, prefix: 1}; count is 0 {code} > Parsing temporal table join throws cryptic exceptions > - > > Key: FLINK-34745 > URL: https://issues.apache.org/jira/browse/FLINK-34745 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.19.0 >Reporter: Dawid Wysakowicz >Assignee: Dawid Wysakowicz >Priority: Major > Fix For: 1.20.0 > > > 1. 
Wrong expression type in {{AS OF}}: > {code} > SELECT * " + > "FROM Orders AS o JOIN " + > "RatesHistoryWithPK FOR SYSTEM_TIME AS OF 'o.rowtime' AS r " + > "ON o.currency = r.currency > {code} > throws: > {code} > java.lang.AssertionError: cannot convert CHAR literal to class > org.apache.calcite.util.TimestampString > {code} > 2. Not a simple table reference in {{AS OF}} > {code} > SELECT * " + > "FROM Orders AS o JOIN " + > "RatesHistoryWithPK FOR SYSTEM_TIME AS OF o.rowtime + INTERVAL '1' > SECOND AS r " + > "ON o.currency = r.currency > {code} > throws: > {code} > java.lang.AssertionError: no unique expression found for {id: o.rowtime, > prefix: 1}; count is 0 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34745) Parsing temporal table join throws cryptic exceptions
Dawid Wysakowicz created FLINK-34745: Summary: Parsing temporal table join throws cryptic exceptions Key: FLINK-34745 URL: https://issues.apache.org/jira/browse/FLINK-34745 Project: Flink Issue Type: Bug Components: Table SQL / Planner Affects Versions: 1.19.0 Reporter: Dawid Wysakowicz Assignee: Dawid Wysakowicz Fix For: 1.20.0 1. Wrong expression type in `AS OF`: {code} SELECT * " + "FROM Orders AS o JOIN " + "RatesHistoryWithPK FOR SYSTEM_TIME AS OF 'o.rowtime' AS r " + "ON o.currency = r.currency {code} throws: {code} java.lang.AssertionError: cannot convert CHAR literal to class org.apache.calcite.util.TimestampString {code} 2. Not a simple table reference in `AS OF`: {code} SELECT * " + "FROM Orders AS o JOIN " + "RatesHistoryWithPK FOR SYSTEM_TIME AS OF o.rowtime + INTERVAL '1' SECOND AS r " + "ON o.currency = r.currency {code} throws: {code} java.lang.AssertionError: no unique expression found for {id: o.rowtime, prefix: 1}; count is 0 {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34718) KeyedPartitionWindowedStream and NonPartitionWindowedStream IllegalStateException in AZP
[ https://issues.apache.org/jira/browse/FLINK-34718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828371#comment-17828371 ] Ryan Skraba commented on FLINK-34718: - I haven't been following FLIP-331 – if these operators are never appropriate for the AdaptiveScheduler, then we should exclude them from that stage in the pipeline. If I understand correctly, this is a relatively easy thing to do for both Azure and GitHub actions. I'll take a look. > KeyedPartitionWindowedStream and NonPartitionWindowedStream > IllegalStateException in AZP > > > Key: FLINK-34718 > URL: https://issues.apache.org/jira/browse/FLINK-34718 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Affects Versions: 1.20.0 >Reporter: Ryan Skraba >Priority: Critical > Labels: test-stability > > [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58320=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=ea7cf968-e585-52cb-e0fc-f48de023a7ca=9646] > 18 of the KeyedPartitionWindowedStreamITCase and > NonKeyedPartitionWindowedStreamITCase unit tests introduced in FLINK-34543 > are failing in the adaptive scheduler profile, with errors similar to: > {code:java} > Mar 15 01:54:12 Caused by: java.lang.IllegalStateException: The adaptive > scheduler supports pipelined data exchanges (violated by MapPartition > (org.apache.flink.streaming.runtime.tasks.OneInputStreamTask) -> > ddb598ad156ed281023ba4eebbe487e3). 
> Mar 15 01:54:12 at > org.apache.flink.util.Preconditions.checkState(Preconditions.java:215) > Mar 15 01:54:12 at > org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler.assertPreconditions(AdaptiveScheduler.java:438) > Mar 15 01:54:12 at > org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler.(AdaptiveScheduler.java:356) > Mar 15 01:54:12 at > org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerFactory.createInstance(AdaptiveSchedulerFactory.java:124) > Mar 15 01:54:12 at > org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:121) > Mar 15 01:54:12 at > org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:384) > Mar 15 01:54:12 at > org.apache.flink.runtime.jobmaster.JobMaster.(JobMaster.java:361) > Mar 15 01:54:12 at > org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:128) > Mar 15 01:54:12 at > org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:100) > Mar 15 01:54:12 at > org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112) > Mar 15 01:54:12 at > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) > Mar 15 01:54:12 ... 4 more > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-34727) RestClusterClient.requestJobResult throw ConnectionClosedException when the accumulator data is large
[ https://issues.apache.org/jira/browse/FLINK-34727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wancheng Xiao updated FLINK-34727: -- Description: The job succeeded, but "RestClusterClient.requestJobResult()" failed with a ConnectionClosedException ("Channel became inactive"). After debugging, the problem appears to be on the server side of the Flink job, in "AbstractRestHandler.respondToRequest()": "response.thenAccept(resp -> HandlerUtils.sendResponse())" does not propagate the future returned by sendResponse, so the server shutdown can proceed before the response has been fully sent. I suspect that "thenAccept()" needs to be replaced with "thenCompose()". The details are as follows: *Pseudocode:* !image-2024-03-19-15-51-20-150.png|width=802,height=222! *Server handling steps:* netty-thread: got request flink-dispatcher-thread: exec requestJobResult[6] and complete shutDownFuture[8], then call HandlerUtils.sendResponse[13] (netty async write) netty-thread: write some data to channel (not done) flink-dispatcher-thread: call inFlightRequestTracker.deregisterRequest[15] netty-thread: write some data to channel failed, channel not active. I added some logging to trace this bug: !AbstractHandler.png|width=406,height=313! !AbstractRestHandler.png|width=418,height=322! !MiniDispatcher.png|width=419,height=277! !RestServerEndpoint.png|width=419,height=279! Then I got: /{*}then call requestJobResult and shutDownFuture.complete; (close channel when request deregistered){*}/ 2024-03-17 18:01:34.788 [flink-akka.actor.default-dispatcher-20] INFO o.a.flink.runtime.rest.handler.job.JobExecutionResultHandler - JobExecutionResultHandler gateway.requestJobStatus complete. [jobStatus=FINISHED] /{*}submit sendResponse{*}/ 2024-03-17 18:01:34.821 [flink-akka.actor.default-dispatcher-20] INFO o.a.flink.runtime.rest.handler.job.JobExecutionResultHandler - submit HandlerUtils.sendResponse(). 
/{*}thenAccept(sendResponse()) is complete, will call inFlightRequestTracker, but sendResponse's returned future is not completed{*}/ 2024-03-17 18:01:34.821 [flink-akka.actor.default-dispatcher-20] INFO o.a.flink.runtime.rest.handler.job.JobExecutionResultHandler - requestProcessingFuture complete. [requestProcessingFuture=java.util.concurrent.CompletableFuture@1329aca5[Completed normally]] /{*}sendResponse's write task is still running{*}/ 2024-03-17 18:01:34.822 [flink-rest-server-netty-worker-thread-10] INFO o.a.f.s.netty4.io.netty.handler.stream.ChunkedWriteHandler - write /{*}deregister request and then shut down, then channel close{*}/ 2024-03-17 18:01:34.826 [flink-akka.actor.default-dispatcher-20] INFO o.a.flink.runtime.rest.handler.job.JobExecutionResultHandler - call inFlightRequestTracker.deregisterRequest() done 2024-03-17 18:01:34.827 [flink-rest-server-netty-worker-thread-10] INFO o.a.f.shaded.netty4.io.netty.channel.DefaultChannelPipeline - pipeline close. 2024-03-17 18:01:34.827 [flink-rest-server-netty-worker-thread-10] INFO org.apache.flink.runtime.rest.handler.util.HandlerUtils - lastContentFuture complete. [future=DefaultChannelPromise@621f03ea(failure: java.nio.channels.ClosedChannelException)] *more details in flink_bug_complex.log* Additionally: while investigating this bug, FutureUtils.retryOperationWithDelay swallowed the first occurrence of the "Channel became inactive" exception; after several retries the server was shut down, and the client then threw a "Connection refused" exception, which made troubleshooting harder. Could we consider adding some logging here to aid in future diagnostics? was: The task was succeed, but "RestClusterClient.requestJobResult()" encountered an error reporting ConnectionClosedException. 
(Channel became inactive) After debugging, it is speculated that the problem occurred in the flink task server-side "AbstractRestHandler.respondToRequest()" with the "response.thenAccept(resp -> HandlerUtils.sendResponse())", this "thenAccept()" did not pass the future returned by sendResponse, causing the server shutdown process before the request was sent. I suspect that "thenAccept()" needs to be replaced with "thenCompose()" The details are as follows: *Pseudocode:* !image-2024-03-19-15-51-20-150.png! *Server handling steps:* netty-thread: got request flink-dispatcher-thread: exec requestJobResult[6] and complete shutDownFuture[8], then call HandlerUtils.sendResponse[13](netty async write) netty-thread: write some data to channel.(not done) flink-dispatcher-thread: call inFlightRequestTracker.deregisterRequest[15] netty-thread: write some data to channel failed, channel not active i added some log to trace this bug: !AbstractHandler.png! !AbstractRestHandler.png! !MiniDispatcher.png! !RestServerEndpoint.png! then i got: /{*}then call requestJobResult and
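The thenAccept-vs-thenCompose distinction described in the report can be illustrated with a minimal, self-contained sketch using plain `CompletableFuture` (no Flink classes; `sendResponse` here is a hypothetical stand-in for `HandlerUtils.sendResponse`, and `writeDone` models the asynchronous netty write):

```java
import java.util.concurrent.CompletableFuture;

public class ThenComposeDemo {
    // Models the asynchronous netty write: completes only once the
    // response has actually been flushed to the channel.
    static final CompletableFuture<Void> writeDone = new CompletableFuture<>();

    // Hypothetical stand-in for HandlerUtils.sendResponse(): starts the
    // write and returns a future tracking its completion.
    static CompletableFuture<Void> sendResponse() {
        return writeDone;
    }

    public static void main(String[] args) {
        // Buggy shape: thenAccept discards the inner future, so the
        // request-processing future completes while the write is still in
        // flight, letting deregisterRequest/shutdown race the write.
        CompletableFuture<Void> buggy =
                CompletableFuture.completedFuture("resp").thenAccept(r -> sendResponse());
        System.out.println("thenAccept done before write: " + buggy.isDone());   // true

        // Suggested fix: thenCompose flattens the inner future into the
        // chain, so completion is deferred until the write finishes.
        CompletableFuture<Void> fixed =
                CompletableFuture.completedFuture("resp").thenCompose(r -> sendResponse());
        System.out.println("thenCompose done before write: " + fixed.isDone()); // false

        writeDone.complete(null); // the netty write finishes
        System.out.println("thenCompose done after write: " + fixed.isDone());  // true
    }
}
```

This is only a sketch of the completion-ordering issue, not the actual server code paths; the real fix would need to chain the future returned by sendResponse into the request-tracking logic.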
[jira] [Assigned] (FLINK-34723) Parquet writer should restrict map keys to be not null
[ https://issues.apache.org/jira/browse/FLINK-34723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xingcan Cui reassigned FLINK-34723: --- Assignee: (was: Xingcan Cui) > Parquet writer should restrict map keys to be not null > -- > > Key: FLINK-34723 > URL: https://issues.apache.org/jira/browse/FLINK-34723 > Project: Flink > Issue Type: Bug > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Affects Versions: 1.19.0, 1.18.1 >Reporter: Xingcan Cui >Priority: Major > Labels: pull-request-available > > We got the following exception when reading a parquet file (with map types) > generated by Flink. > {code:java} > Map keys must be annotated as required.{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34718) KeyedPartitionWindowedStream and NonPartitionWindowedStream IllegalStateException in AZP
[ https://issues.apache.org/jira/browse/FLINK-34718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828348#comment-17828348 ] Ryan Skraba commented on FLINK-34718: - * [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58398=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=ea7cf968-e585-52cb-e0fc-f48de023a7ca=9670] > KeyedPartitionWindowedStream and NonPartitionWindowedStream > IllegalStateException in AZP > > > Key: FLINK-34718 > URL: https://issues.apache.org/jira/browse/FLINK-34718 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Affects Versions: 1.20.0 >Reporter: Ryan Skraba >Priority: Critical > Labels: test-stability > > [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58320=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=ea7cf968-e585-52cb-e0fc-f48de023a7ca=9646] > 18 of the KeyedPartitionWindowedStreamITCase and > NonKeyedPartitionWindowedStreamITCase unit tests introduced in FLINK-34543 > are failing in the adaptive scheduler profile, with errors similar to: > {code:java} > Mar 15 01:54:12 Caused by: java.lang.IllegalStateException: The adaptive > scheduler supports pipelined data exchanges (violated by MapPartition > (org.apache.flink.streaming.runtime.tasks.OneInputStreamTask) -> > ddb598ad156ed281023ba4eebbe487e3). 
> Mar 15 01:54:12 at > org.apache.flink.util.Preconditions.checkState(Preconditions.java:215) > Mar 15 01:54:12 at > org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler.assertPreconditions(AdaptiveScheduler.java:438) > Mar 15 01:54:12 at > org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler.(AdaptiveScheduler.java:356) > Mar 15 01:54:12 at > org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerFactory.createInstance(AdaptiveSchedulerFactory.java:124) > Mar 15 01:54:12 at > org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:121) > Mar 15 01:54:12 at > org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:384) > Mar 15 01:54:12 at > org.apache.flink.runtime.jobmaster.JobMaster.(JobMaster.java:361) > Mar 15 01:54:12 at > org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:128) > Mar 15 01:54:12 at > org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:100) > Mar 15 01:54:12 at > org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112) > Mar 15 01:54:12 at > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) > Mar 15 01:54:12 ... 4 more > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Reopened] (FLINK-34643) JobIDLoggingITCase failed
[ https://issues.apache.org/jira/browse/FLINK-34643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl reopened FLINK-34643: --- I'm reopening the issue. [~roman] is this due to the fact that we haven't collected all the logs that should be considered (due to the "empirical nature" of this test)? > JobIDLoggingITCase failed > - > > Key: FLINK-34643 > URL: https://issues.apache.org/jira/browse/FLINK-34643 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.20.0 >Reporter: Matthias Pohl >Assignee: Roman Khachatryan >Priority: Major > Labels: pull-request-available, test-stability > Fix For: 1.20.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58187=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=ea7cf968-e585-52cb-e0fc-f48de023a7ca=7897 > {code} > Mar 09 01:24:23 01:24:23.498 [ERROR] Tests run: 1, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 4.209 s <<< FAILURE! -- in > org.apache.flink.test.misc.JobIDLoggingITCase > Mar 09 01:24:23 01:24:23.498 [ERROR] > org.apache.flink.test.misc.JobIDLoggingITCase.testJobIDLogging(ClusterClient) > -- Time elapsed: 1.459 s <<< ERROR! 
> Mar 09 01:24:23 java.lang.IllegalStateException: Too few log events recorded > for org.apache.flink.runtime.jobmaster.JobMaster (12) - this must be a bug in > the test code > Mar 09 01:24:23 at > org.apache.flink.util.Preconditions.checkState(Preconditions.java:215) > Mar 09 01:24:23 at > org.apache.flink.test.misc.JobIDLoggingITCase.assertJobIDPresent(JobIDLoggingITCase.java:148) > Mar 09 01:24:23 at > org.apache.flink.test.misc.JobIDLoggingITCase.testJobIDLogging(JobIDLoggingITCase.java:132) > Mar 09 01:24:23 at java.lang.reflect.Method.invoke(Method.java:498) > Mar 09 01:24:23 at > java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189) > Mar 09 01:24:23 at > java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) > Mar 09 01:24:23 at > java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) > Mar 09 01:24:23 at > java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) > Mar 09 01:24:23 at > java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) > Mar 09 01:24:23 > {code} > The other test failures of this build were also caused by the same test: > * > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58187=logs=2c3cbe13-dee0-5837-cf47-3053da9a8a78=b78d9d30-509a-5cea-1fef-db7abaa325ae=8349 > * > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58187=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=8209 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-34744) autoscaling-dynamic cannot run
[ https://issues.apache.org/jira/browse/FLINK-34744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-34744: --- Labels: pull-request-available (was: ) > autoscaling-dynamic cannot run > -- > > Key: FLINK-34744 > URL: https://issues.apache.org/jira/browse/FLINK-34744 > Project: Flink > Issue Type: Bug > Components: Autoscaler >Reporter: Rui Fan >Assignee: Rui Fan >Priority: Major > Labels: pull-request-available > Fix For: kubernetes-operator-1.9.0 > > Attachments: image-2024-03-19-21-46-15-530.png > > > autoscaling-dynamic cannot run on my Mac > !image-2024-03-19-21-46-15-530.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] [FLINK-34744][autoscaler] Fix the issue that autoscaling-dynamic cannot run [flink-kubernetes-operator]
1996fanrui opened a new pull request, #800: URL: https://github.com/apache/flink-kubernetes-operator/pull/800 autoscaling-dynamic cannot run on my local machine. I guess YAML cannot handle the `\n` properly across multiple lines. https://github.com/apache/flink-kubernetes-operator/assets/38427477/2abe2602-3d16-478a-992d-49d7e536c48f -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (FLINK-34744) autoscaling-dynamic cannot run
Rui Fan created FLINK-34744: --- Summary: autoscaling-dynamic cannot run Key: FLINK-34744 URL: https://issues.apache.org/jira/browse/FLINK-34744 Project: Flink Issue Type: Bug Components: Autoscaler Reporter: Rui Fan Assignee: Rui Fan Fix For: kubernetes-operator-1.9.0 Attachments: image-2024-03-19-21-46-15-530.png autoscaling-dynamic cannot run on my Mac !image-2024-03-19-21-46-15-530.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34404) GroupWindowAggregateProcTimeRestoreTest#testRestore times out
[ https://issues.apache.org/jira/browse/FLINK-34404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828345#comment-17828345 ] Ryan Skraba commented on FLINK-34404: - This also occurs on AZP: * [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58399=logs=f2c100be-250b-5e85-7bbe-176f68fcddc5=05efd11e-5400-54a4-0d27-a4663be008a9=12626] (1.19) > GroupWindowAggregateProcTimeRestoreTest#testRestore times out > - > > Key: FLINK-34404 > URL: https://issues.apache.org/jira/browse/FLINK-34404 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.19.0, 1.20.0 >Reporter: Matthias Pohl >Assignee: Alan Sheinberg >Priority: Critical > Labels: test-stability > Attachments: FLINK-34404.failure.log, FLINK-34404.success.log > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57357=logs=32715a4c-21b8-59a3-4171-744e5ab107eb=ff64056b-5320-5afe-c22c-6fa339e59586=11603 > {code} > Feb 07 02:17:40 "ForkJoinPool-74-worker-1" #382 daemon prio=5 os_prio=0 > cpu=282.22ms elapsed=961.78s tid=0x7f880a485c00 nid=0x6745 waiting on > condition [0x7f878a6f9000] > Feb 07 02:17:40java.lang.Thread.State: WAITING (parking) > Feb 07 02:17:40 at > jdk.internal.misc.Unsafe.park(java.base@17.0.7/Native Method) > Feb 07 02:17:40 - parking to wait for <0xff73d060> (a > java.util.concurrent.CompletableFuture$Signaller) > Feb 07 02:17:40 at > java.util.concurrent.locks.LockSupport.park(java.base@17.0.7/LockSupport.java:211) > Feb 07 02:17:40 at > java.util.concurrent.CompletableFuture$Signaller.block(java.base@17.0.7/CompletableFuture.java:1864) > Feb 07 02:17:40 at > java.util.concurrent.ForkJoinPool.compensatedBlock(java.base@17.0.7/ForkJoinPool.java:3449) > Feb 07 02:17:40 at > 
java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.7/ForkJoinPool.java:3432) > Feb 07 02:17:40 at > java.util.concurrent.CompletableFuture.waitingGet(java.base@17.0.7/CompletableFuture.java:1898) > Feb 07 02:17:40 at > java.util.concurrent.CompletableFuture.get(java.base@17.0.7/CompletableFuture.java:2072) > Feb 07 02:17:40 at > org.apache.flink.table.planner.plan.nodes.exec.testutils.RestoreTestBase.testRestore(RestoreTestBase.java:292) > Feb 07 02:17:40 at > jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@17.0.7/Native > Method) > Feb 07 02:17:40 at > jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@17.0.7/NativeMethodAccessorImpl.java:77) > Feb 07 02:17:40 at > jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@17.0.7/DelegatingMethodAccessorImpl.java:43) > Feb 07 02:17:40 at > java.lang.reflect.Method.invoke(java.base@17.0.7/Method.java:568) > Feb 07 02:17:40 at > org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728) > [...] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-33186) CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-33186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828331#comment-17828331 ] Ryan Skraba edited comment on FLINK-33186 at 3/19/24 1:35 PM: -- * [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58399=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b=8063] (1.20) was (Author: ryanskraba): * [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58399=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b=8063] > CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished > fails on AZP > - > > Key: FLINK-33186 > URL: https://issues.apache.org/jira/browse/FLINK-33186 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.19.0, 1.18.1 >Reporter: Sergey Nuyanzin >Assignee: Jiang Xin >Priority: Critical > Labels: test-stability > > This build > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53509=logs=baf26b34-3c6a-54e8-f93f-cf269b32f802=8c9d126d-57d2-5a9e-a8c8-ff53f7b35cd9=8762 > fails as > {noformat} > Sep 28 01:23:43 Caused by: > org.apache.flink.runtime.checkpoint.CheckpointException: Task local > checkpoint failure. 
> Sep 28 01:23:43 at > org.apache.flink.runtime.checkpoint.PendingCheckpoint.abort(PendingCheckpoint.java:550) > Sep 28 01:23:43 at > org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2248) > Sep 28 01:23:43 at > org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2235) > Sep 28 01:23:43 at > org.apache.flink.runtime.checkpoint.CheckpointCoordinator.lambda$null$9(CheckpointCoordinator.java:817) > Sep 28 01:23:43 at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > Sep 28 01:23:43 at > java.util.concurrent.FutureTask.run(FutureTask.java:266) > Sep 28 01:23:43 at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > Sep 28 01:23:43 at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > Sep 28 01:23:43 at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > Sep 28 01:23:43 at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > Sep 28 01:23:43 at java.lang.Thread.run(Thread.java:748) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33186) CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-33186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828331#comment-17828331 ] Ryan Skraba commented on FLINK-33186: - * [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58399=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b=8063] > CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished > fails on AZP > - > > Key: FLINK-33186 > URL: https://issues.apache.org/jira/browse/FLINK-33186 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.19.0, 1.18.1 >Reporter: Sergey Nuyanzin >Assignee: Jiang Xin >Priority: Critical > Labels: test-stability > > This build > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53509=logs=baf26b34-3c6a-54e8-f93f-cf269b32f802=8c9d126d-57d2-5a9e-a8c8-ff53f7b35cd9=8762 > fails as > {noformat} > Sep 28 01:23:43 Caused by: > org.apache.flink.runtime.checkpoint.CheckpointException: Task local > checkpoint failure. 
> Sep 28 01:23:43 at > org.apache.flink.runtime.checkpoint.PendingCheckpoint.abort(PendingCheckpoint.java:550) > Sep 28 01:23:43 at > org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2248) > Sep 28 01:23:43 at > org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2235) > Sep 28 01:23:43 at > org.apache.flink.runtime.checkpoint.CheckpointCoordinator.lambda$null$9(CheckpointCoordinator.java:817) > Sep 28 01:23:43 at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > Sep 28 01:23:43 at > java.util.concurrent.FutureTask.run(FutureTask.java:266) > Sep 28 01:23:43 at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > Sep 28 01:23:43 at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > Sep 28 01:23:43 at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > Sep 28 01:23:43 at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > Sep 28 01:23:43 at java.lang.Thread.run(Thread.java:748) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34720) Deploy Maven Snapshot failed on AZP
[ https://issues.apache.org/jira/browse/FLINK-34720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828330#comment-17828330 ] Ryan Skraba commented on FLINK-34720: - * [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58398=logs=eca6b3a6-1600-56cc-916a-c549b3cde3ff=7b3c1df5-9194-5183-5ebd-5567f52d5f8f=34] (before the revert) > Deploy Maven Snapshot failed on AZP > --- > > Key: FLINK-34720 > URL: https://issues.apache.org/jira/browse/FLINK-34720 > Project: Flink > Issue Type: Bug > Components: Build System / CI >Affects Versions: 1.20.0 >Reporter: Ryan Skraba >Priority: Blocker > Labels: pull-request-available, test-stability > > > There isn't any obvious reason that {{mvn: command not found}} could have > occurred, but we saw it three times this weekend. > * > [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58352=logs=eca6b3a6-1600-56cc-916a-c549b3cde3ff=7b3c1df5-9194-5183-5ebd-5567f52d5f8f] > > * > [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58359=logs=eca6b3a6-1600-56cc-916a-c549b3cde3ff=7b3c1df5-9194-5183-5ebd-5567f52d5f8f=36] > * > [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58366=logs=eca6b3a6-1600-56cc-916a-c549b3cde3ff=7b3c1df5-9194-5183-5ebd-5567f52d5f8f=36] > > {code:java} > + [[ tools != \t\o\o\l\s ]] > + cd .. 
> + echo 'Deploying to repository.apache.org' > + COMMON_OPTIONS='-Prelease,docs-and-source -DskipTests > -DretryFailedDeploymentCount=10 -Dmaven.repo.local=/__w/1/.m2/repository > -Dmaven.wagon.http.pool=false -Dorg.slf4j.simpleLogger.showDateTime=true > -Dorg.slf4j.simpleLogger.dateTimeFormat=HH:mm:ss.SSS > -Dorg.slf4j.simpleLogger.log.org.apache.maven.cli.transfer.Slf4jMavenTransferListener=warn > --no-snapshot-updates -B -Dgpg.skip -Drat.skip -Dcheckstyle.skip > --settings /__w/1/s/tools/deploy-settings.xml' > + mvn clean deploy -Prelease,docs-and-source -DskipTests > -DretryFailedDeploymentCount=10 -Dmaven.repo.local=/__w/1/.m2/repository > -Dmaven.wagon.http.pool=false -Dorg.slf4j.simpleLogger.showDateTime=true > -Dorg.slf4j.simpleLogger.dateTimeFormat=HH:mm:ss.SSS > -Dorg.slf4j.simpleLogger.log.org.apache.maven.cli.transfer.Slf4jMavenTransferListener=warn > --no-snapshot-updates -B -Dgpg.skip -Drat.skip -Dcheckstyle.skip --settings > /__w/1/s/tools/deploy-settings.xml > Deploying to repository.apache.org > ./releasing/deploy_staging_jars.sh: line 46: mvn: command not found > ##[error]Bash exited with code '127'. > Finishing: Deploy maven snapshot > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-30535) Introduce TTL state based benchmarks
[ https://issues.apache.org/jira/browse/FLINK-30535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828329#comment-17828329 ] Zakelly Lan commented on FLINK-30535: - [~rovboyko] Actually this has been resolved. Closing... > Introduce TTL state based benchmarks > > > Key: FLINK-30535 > URL: https://issues.apache.org/jira/browse/FLINK-30535 > Project: Flink > Issue Type: New Feature > Components: Benchmarks >Reporter: Yun Tang >Assignee: Zakelly Lan >Priority: Major > Labels: pull-request-available > > This ticket is inspired by https://issues.apache.org/jira/browse/FLINK-30088 > which wants to optimize the TTL state performance. I think it would be useful > to introduce state benchmarks based on TTL as Flink has some overhead to > support TTL. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (FLINK-30535) Introduce TTL state based benchmarks
[ https://issues.apache.org/jira/browse/FLINK-30535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zakelly Lan resolved FLINK-30535. - Resolution: Fixed > Introduce TTL state based benchmarks > > > Key: FLINK-30535 > URL: https://issues.apache.org/jira/browse/FLINK-30535 > Project: Flink > Issue Type: New Feature > Components: Benchmarks >Reporter: Yun Tang >Assignee: Zakelly Lan >Priority: Major > Labels: pull-request-available > > This ticket is inspired by https://issues.apache.org/jira/browse/FLINK-30088 > which wants to optimize the TTL state performance. I think it would be useful > to introduce state benchmarks based on TTL as Flink has some overhead to > support TTL. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [FLINK-34732][cdc][ci] Add document dead link check for Flink CDC Documentation [flink-cdc]
leonardBang commented on PR #3168: URL: https://github.com/apache/flink-cdc/pull/3168#issuecomment-2007184430 > @leonardBang @PatrickRen First ,I add this ci and quick fix some dead link. Some dead link maybe cannot fix currently. > > The link like under: [✖] https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-mysql/3.0.0/flink-cdc-pipeline-connector-mysql-3.0.0.jar [✖] https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-starrocks/3.0.0/flink-cdc-pipeline-connector-starrocks-3.0.0.jar Could we use the `/com/ververica` path to make the URLs work? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-34743) Memory tuning takes effect even if the parallelism isn't changed
[ https://issues.apache.org/jira/browse/FLINK-34743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-34743: --- Labels: pull-request-available (was: ) > Memory tuning takes effect even if the parallelism isn't changed > > > Key: FLINK-34743 > URL: https://issues.apache.org/jira/browse/FLINK-34743 > Project: Flink > Issue Type: Improvement > Components: Autoscaler >Reporter: Rui Fan >Assignee: Rui Fan >Priority: Major > Labels: pull-request-available > Fix For: kubernetes-operator-1.9.0 > > > Currently, the memory-tuning logic is only called when the parallelism is changed. > See ScalingExecutor#scaleResource for more details. > It would be better to let memory tuning take effect even if the parallelism isn't changed. For example, a Flink job may run at the desired parallelism but still waste memory. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] [FLINK-34743][autoscaler] Memory tuning takes effect even if the parallelism isn't changed [flink-kubernetes-operator]
1996fanrui opened a new pull request, #799: URL: https://github.com/apache/flink-kubernetes-operator/pull/799 ## What is the purpose of the change Currently, the memory-tuning logic is only called when the parallelism is changed. See ScalingExecutor#scaleResource for more details. It would be better to let memory tuning take effect even if the parallelism isn't changed. For example, a Flink job may run at the desired parallelism but still waste memory. ## Brief change log [FLINK-34743][autoscaler] Memory tuning takes effect even if the parallelism isn't changed - Do scaling when the parallelism or the memory configuration is changed. ## Verifying this change Manual testing done. The unit tests are being written. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): no - The public API, i.e., are there any changes to the `CustomResourceDescriptors`: no - Core observer or reconciler logic that is regularly executed: no ## Documentation - Does this pull request introduce a new feature? no -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (FLINK-34743) Memory tuning takes effect even if the parallelism isn't changed
Rui Fan created FLINK-34743: --- Summary: Memory tuning takes effect even if the parallelism isn't changed Key: FLINK-34743 URL: https://issues.apache.org/jira/browse/FLINK-34743 Project: Flink Issue Type: Improvement Components: Autoscaler Reporter: Rui Fan Assignee: Rui Fan Fix For: kubernetes-operator-1.9.0 Currently, the memory-tuning logic is only called when the parallelism is changed. See ScalingExecutor#scaleResource for more details. It would be better to let memory tuning take effect even if the parallelism isn't changed. For example, a Flink job may run at the desired parallelism but still waste memory. -- This message was sent by Atlassian Jira (v8.20.10#820010)
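The change proposed in FLINK-34743 amounts to widening the rescale condition from "parallelism changed" to "parallelism or memory configuration changed". A minimal sketch of that condition, assuming illustrative map shapes rather than the operator's actual `ScalingExecutor` types:

```java
import java.util.Map;

public class RescaleDecisionSketch {

    /**
     * Rescale when either the per-vertex parallelism overrides or the tuned
     * memory settings differ from what is currently deployed.
     */
    static boolean shouldRescale(
            Map<String, Integer> deployedParallelism,
            Map<String, Integer> targetParallelism,
            Map<String, String> deployedMemory,
            Map<String, String> tunedMemory) {
        return !deployedParallelism.equals(targetParallelism)
                || !deployedMemory.equals(tunedMemory);
    }

    public static void main(String[] args) {
        // Parallelism is already at the desired value, but tuning shrinks the
        // TaskManager memory: the job should still be redeployed.
        boolean rescale = shouldRescale(
                Map.of("source", 4), Map.of("source", 4),
                Map.of("taskmanager.memory.process.size", "4g"),
                Map.of("taskmanager.memory.process.size", "2g"));
        System.out.println(rescale); // true
    }
}
```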
Re: [PR] [BP-3.0][FLINK-34677][cdc][docs] Refactor the structure of Flink CDC website documentation [flink-cdc]
leonardBang merged PR #3171: URL: https://github.com/apache/flink-cdc/pull/3171 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-34643) JobIDLoggingITCase failed
[ https://issues.apache.org/jira/browse/FLINK-34643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828316#comment-17828316 ] Ryan Skraba commented on FLINK-34643: - Weird – I collected a lot of build logs yesterday from over the weekend that resemble this error, but apparently my comment didn't get added :/ I'll go back and find those links. In the meantime, [~roman]: we are still seeing failures in the same test that seem very related to this issue. Is it possible that this fix is incomplete and should be reopened, or would you prefer that I raise a new JIRA? * [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58398=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=ea7cf968-e585-52cb-e0fc-f48de023a7ca=8249] {code:java} Mar 19 01:23:06 [not all expected events logged by org.apache.flink.runtime.jobmaster.JobMaster, logged: Mar 19 01:23:06 [Logger=org.apache.flink.runtime.jobmaster.JobMaster Level=INFO Message=Initializing job 'Flink Streaming Job' (2ef7e557551a93ef716b6c3ba580bcd6)., Logger=org.apache.flink.runtime.jobmaster.JobMaster Level=INFO Message=Using restart back off time strategy NoRestartBackoffTimeStrategy for Flink Streaming Job (2ef7e557551a93ef716b6c3ba580bcd6)., Logger=org.apache.flink.runtime.jobmaster.JobMaster Level=INFO Message=Starting execution of job 'Flink Streaming Job' (2ef7e557551a93ef716b6c3ba580bcd6) under job master id 90514ce7689864236ebeb94380dc474d., Logger=org.apache.flink.runtime.jobmaster.JobMaster Level=DEBUG Message=Trigger heartbeat request., Logger=org.apache.flink.runtime.jobmaster.JobMaster Level=INFO Message=Connecting to ResourceManager pekko://flink/user/rpc/resourcemanager_1(8eee414f9dea640cb3668826c12e4976), Logger=org.apache.flink.runtime.jobmaster.JobMaster Level=INFO Message=Resolved ResourceManager address, beginning registration, Logger=org.apache.flink.runtime.jobmaster.JobMaster Level=DEBUG Message=Registration at ResourceManager attempt 1 (timeout=100ms), 
Logger=org.apache.flink.runtime.jobmaster.JobMaster Level=DEBUG Message=Registration with ResourceManager at pekko://flink/user/rpc/resourcemanager_1 was successful., Logger=org.apache.flink.runtime.jobmaster.JobMaster Level=INFO Message=JobManager successfully registered at ResourceManager, leader id: 8eee414f9dea640cb3668826c12e4976., Logger=org.apache.flink.runtime.jobmaster.JobMaster Level=INFO Message=Stopping the JobMaster for job 'Flink Streaming Job' (2ef7e557551a93ef716b6c3ba580bcd6)., Logger=org.apache.flink.runtime.jobmaster.JobMaster Level=INFO Message=Disconnect TaskExecutor 23ae1952-8d6f-476e-b23b-4fad48feec15 because: Stopping JobMaster for job 'Flink Streaming Job' (2ef7e557551a93ef716b6c3ba580bcd6)., Logger=org.apache.flink.runtime.jobmaster.JobMaster Level=DEBUG Message=Close ResourceManager connection 58e840ebb5c16d7fb17f233b9e93cb3c.]] Mar 19 01:23:06 Expecting empty but was: [Checkpoint storage is set to .*, Mar 19 01:23:06 Running initialization on master for job .*, Mar 19 01:23:06 Starting scheduling.*, Mar 19 01:23:06 State backend is set to .*, Mar 19 01:23:06 Successfully created execution graph from job graph .*, Mar 19 01:23:06 Successfully ran initialization on master.*, Mar 19 01:23:06 Triggering a manual checkpoint for job .*., Mar 19 01:23:06 Using failover strategy .*] Mar 19 01:23:06 at org.apache.flink.test.misc.JobIDLoggingITCase.assertJobIDPresent(JobIDLoggingITCase.java:241) Mar 19 01:23:06 at org.apache.flink.test.misc.JobIDLoggingITCase.testJobIDLogging(JobIDLoggingITCase.java:170) Mar 19 01:23:06 at java.lang.reflect.Method.invoke(Method.java:498) Mar 19 01:23:06 at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189) Mar 19 01:23:06 at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) Mar 19 01:23:06 at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) Mar 19 01:23:06 at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) Mar 19 01:23:06 at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) {code} > JobIDLoggingITCase failed > - > > Key: FLINK-34643 > URL: https://issues.apache.org/jira/browse/FLINK-34643 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.20.0 >Reporter: Matthias Pohl >Assignee: Roman Khachatryan >Priority: Major > Labels: pull-request-available, test-stability > Fix For: 1.20.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58187=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=ea7cf968-e585-52cb-e0fc-f48de023a7ca=7897 > {code} > Mar 09 01:24:23 01:24:23.498 [ERROR] Tests run: 1, Failures: 0, Errors: 1, >
[jira] [Commented] (FLINK-30535) Introduce TTL state based benchmarks
[ https://issues.apache.org/jira/browse/FLINK-30535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828309#comment-17828309 ] Roman Boyko commented on FLINK-30535: - [~Zakelly], thanks for this functionality! One question: why is it still in OPEN status? > Introduce TTL state based benchmarks > > > Key: FLINK-30535 > URL: https://issues.apache.org/jira/browse/FLINK-30535 > Project: Flink > Issue Type: New Feature > Components: Benchmarks >Reporter: Yun Tang >Assignee: Zakelly Lan >Priority: Major > Labels: pull-request-available > > This ticket is inspired by https://issues.apache.org/jira/browse/FLINK-30088 > which wants to optimize the TTL state performance. I think it would be useful > to introduce state benchmarks based on TTL as Flink has some overhead to > support TTL. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [FLINK-31664] Implement ARRAY_INTERSECT function [flink]
liuyongvs commented on PR #24526: URL: https://github.com/apache/flink/pull/24526#issuecomment-2007045444 hi @dawidwys will you help review this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-28693][table] Fix janino compile failed because the code generated refers the class in table-planner [flink]
davidradl commented on PR #24280: URL: https://github.com/apache/flink/pull/24280#issuecomment-2006990240 This is a PR we would like to see merged. It looks like @snuyanzin asked for tests to be added. @xuyangzhong are you looking at adding the tests? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-34562][docs-zh] Port Debezium Avro Confluent changes (FLINK-34509) to Chinese [flink]
Vincent-Woo commented on PR #24508: URL: https://github.com/apache/flink/pull/24508#issuecomment-2006981943 @wuchong @xccui @klion26 Excuse me, do you have time to take a look at this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-25537] [JUnit5 Migration] Module: flink-core [flink]
GOODBOY008 commented on PR #24523: URL: https://github.com/apache/flink/pull/24523#issuecomment-2006975240 @flinkbot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-34732][cdc][ci] Add document dead link check for Flink CDC Documentation [flink-cdc]
GOODBOY008 commented on PR #3168: URL: https://github.com/apache/flink-cdc/pull/3168#issuecomment-2006967612 @leonardBang @PatrickRen First, I added this CI and quickly fixed some dead links. Some dead links may not be fixable currently. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-26088][Connectors/ElasticSearch] Add Elasticsearch 8.0 support [flink-connector-elasticsearch]
drorventura commented on PR #53: URL: https://github.com/apache/flink-connector-elasticsearch/pull/53#issuecomment-2006807958 Thank you @mtfelisb @reswqa could you please approve? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Comment Edited] (FLINK-34726) Flink Kubernetes Operator has some room for optimizing performance.
[ https://issues.apache.org/jira/browse/FLINK-34726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828262#comment-17828262 ] Fei Feng edited comment on FLINK-34726 at 3/19/24 10:40 AM: I think these two things (the rest cluster client and the session job's secondary resource) should be held in an xxxContext, so we do not need to create them again and again and can finally avoid unnecessary runtime overhead and GC pressure. The difficulty lies in how to promptly update them when changes occur: for example, the rest cluster client should be recreated or updated if the jobmanager rest address changes, and if the FlinkDeployment object changes, the session job's SecondaryResource should be updated too was (Author: fei feng): I think this two things (rest cluster client and sessionjob's secondary resource) should be contained in xxxContext , then we do not need to create them again and again, and we can avoid unnecessary runtime overhead and GC pressure finally. The difficulty lies in how to promptly update when changes occur. for example, rest cluster client should be recreate or update if jobmanager rest address changed, and if FlinkDeployment object changed, sessionjob's SecondaryResource should be update > Flink Kubernetes Operator has some room for optimizing performance. > --- > > Key: FLINK-34726 > URL: https://issues.apache.org/jira/browse/FLINK-34726 > Project: Flink > Issue Type: Improvement > Components: Kubernetes Operator >Affects Versions: kubernetes-operator-1.5.0, kubernetes-operator-1.6.0, > kubernetes-operator-1.7.0 >Reporter: Fei Feng >Priority: Major > Attachments: operator_no_submit_no_kill.flamegraph.html > > > When there is a huge number of FlinkDeployment and FlinkSessionJob in a > kubernetes cluster, there will be a significant delay between event submit > into reconcile thread pool and event is processed. 
> this is our test: we give the operator enough resources (cpu: 10 cores, memory: 20g, > reconcile thread pool size was 200) and we deployed 1 jobs firstly (one > FlinkDeployment and one SessionJob per job), then we do submit/delete job > tests. we found that > 1. it cost about 2min between creating a new FlinkDeployment and FlinkSessionJob > CR in k8s and the flink job being submitted to the jobmanager. > 2. it cost about 1min between deleting a FlinkDeployment and FlinkSessionJob CR > and the flink job and session cluster being cleared. > > I used async-profiler to get a flamegraph when there is a huge number of > FlinkDeployment and FlinkSessionJob. I found two obvious areas for > optimization > 1. For FlinkDeployment: in the observe step, we call > AbstractFlinkService.getClusterInfo/listJobs/getTaskManagerInfo , and every time > we call these methods we need to create a RestClusterClient, send requests, and close it. > I think we should reuse the RestClusterClient as much as possible to avoid > frequently creating objects and reduce GC pressure > 2. For FlinkSessionJob (this issue is more obvious): in the whole reconcile > loop, we call getSecondaryResource 5 times to get the FlinkDeployment resource > info. Based on my current understanding of the Flink Operator, I think we do > not need to call it 5 times in a single reconcile loop; calling it once is > enough. If yes, we could save 30% cpu usage (every getSecondaryResource costs > 6% cpu usage) > [^operator_no_submit_no_kill.flamegraph.html] > I hope we can discuss solutions to address this problem together. I'm very > willing to optimize and resolve this issue. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-34726) Flink Kubernetes Operator has some room for optimizing performance.
[ https://issues.apache.org/jira/browse/FLINK-34726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828262#comment-17828262 ] Fei Feng edited comment on FLINK-34726 at 3/19/24 10:39 AM: I think these two things (the rest cluster client and the session job's secondary resource) should be held in an xxxContext, so we do not need to create them again and again and can finally avoid unnecessary runtime overhead and GC pressure. The difficulty lies in how to promptly update them when changes occur: for example, the rest cluster client should be recreated or updated if the jobmanager rest address changes, and if the FlinkDeployment object changes, the session job's SecondaryResource should be updated was (Author: fei feng): I think this two things (rest cluster client and flink deployment resource info ) should be contained in Context, to avoid unnecessary runtime overhead and GC pressure. The difficulty lies in how to promptly update when changes occur. for example, rest cluster client should be recreate or update if jobmanager rest address changed, and if FlinkDeployment object changed, sessionjob's SecondaryResource should be update > Flink Kubernetes Operator has some room for optimizing performance. > --- > > Key: FLINK-34726 > URL: https://issues.apache.org/jira/browse/FLINK-34726 > Project: Flink > Issue Type: Improvement > Components: Kubernetes Operator >Affects Versions: kubernetes-operator-1.5.0, kubernetes-operator-1.6.0, > kubernetes-operator-1.7.0 >Reporter: Fei Feng >Priority: Major > Attachments: operator_no_submit_no_kill.flamegraph.html > > > When there is a huge number of FlinkDeployment and FlinkSessionJob in a > kubernetes cluster, there will be a significant delay between event submit > into reconcile thread pool and event is processed. 
> this is our test: we give the operator enough resources (cpu: 10 cores, memory: 20g, > reconcile thread pool size was 200) and we deployed 1 jobs firstly (one > FlinkDeployment and one SessionJob per job), then we do submit/delete job > tests. we found that > 1. it cost about 2min between creating a new FlinkDeployment and FlinkSessionJob > CR in k8s and the flink job being submitted to the jobmanager. > 2. it cost about 1min between deleting a FlinkDeployment and FlinkSessionJob CR > and the flink job and session cluster being cleared. > > I used async-profiler to get a flamegraph when there is a huge number of > FlinkDeployment and FlinkSessionJob. I found two obvious areas for > optimization > 1. For FlinkDeployment: in the observe step, we call > AbstractFlinkService.getClusterInfo/listJobs/getTaskManagerInfo , and every time > we call these methods we need to create a RestClusterClient, send requests, and close it. > I think we should reuse the RestClusterClient as much as possible to avoid > frequently creating objects and reduce GC pressure > 2. For FlinkSessionJob (this issue is more obvious): in the whole reconcile > loop, we call getSecondaryResource 5 times to get the FlinkDeployment resource > info. Based on my current understanding of the Flink Operator, I think we do > not need to call it 5 times in a single reconcile loop; calling it once is > enough. If yes, we could save 30% cpu usage (every getSecondaryResource costs > 6% cpu usage) > [^operator_no_submit_no_kill.flamegraph.html] > I hope we can discuss solutions to address this problem together. I'm very > willing to optimize and resolve this issue. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34728) operator does not need to upload and download the jar when deploying session job
[ https://issues.apache.org/jira/browse/FLINK-34728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828263#comment-17828263 ] Fei Feng commented on FLINK-34728: -- Yes, this is not an issue on the Flink Kubernetes operator side; the operator is just a client, and the server side needs to provide a new API in order to complete this lightweight-submission rework. > operator does not need to upload and download the jar when deploying session > job > > > Key: FLINK-34728 > URL: https://issues.apache.org/jira/browse/FLINK-34728 > Project: Flink > Issue Type: Improvement > Components: Deployment / Kubernetes, Kubernetes Operator >Affects Versions: kubernetes-operator-1.5.0, kubernetes-operator-1.6.0, > kubernetes-operator-1.7.0 >Reporter: Fei Feng >Priority: Major > Attachments: image-2024-03-19-15-59-20-933.png > > > Problem: > By reading the source code of the sessionjob's first reconciliation in the > session mode of the flink kubernetes operator, a clear bottleneck can be identified. When submitting a session job, the operator > needs to first download the job jar from the jarURL to the local storage of the > kubernetes pod, then upload the jar to the job manager through the > `/jars/upload` rest api, and finally call the `/jars/:jarid/run` rest api to > launch the job. > In this process, the operator needs to first download the jar and then upload > the jar. When multiple jobs are submitted to the session cluster > simultaneously, the operator can become a bottleneck, as it > may be limited by the network traffic or other resource constraints of the > operator pod. > > [https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/service/AbstractFlinkService.java#L824] > !image-2024-03-19-15-59-20-933.png|width=548,height=432! > > Solution: > We can modify the job submission process in the session mode. 
The jobmanager > can provide a `/jars/run` rest api that downloads the job > jar itself, so the operator only needs to send a rest request to submit the job, > without downloading and uploading the job jar. In this way, the submission pressure > on the operator can be distributed to each job manager. -- This message was sent by Atlassian Jira (v8.20.10#820010)
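The proposed flow could look roughly like the sketch below: the operator builds a single run request that carries the jar's location, and the JobManager fetches the jar itself. The endpoint semantics and the JSON body here are hypothetical, taken from the proposal rather than from an existing Flink REST API (today's API only offers `/jars/upload` followed by `/jars/:jarid/run`):

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class RemoteJarSubmitSketch {

    /**
     * Builds the single submission request of the proposal: the JobManager,
     * not the operator, would download the jar from {@code jarUri}.
     */
    static HttpRequest buildRunRequest(String restBase, String jarUri, String entryClass) {
        // Hypothetical body shape; the real request schema would be defined
        // by the new server-side API.
        String body = String.format(
                "{\"jarUri\":\"%s\",\"entryClass\":\"%s\"}", jarUri, entryClass);
        return HttpRequest.newBuilder(URI.create(restBase + "/jars/run"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = buildRunRequest(
                "http://session-cluster-rest:8081",
                "https://repo.example.com/jobs/my-job-1.0.jar",
                "com.example.MyJob");
        System.out.println(req.method() + " " + req.uri());
    }
}
```

The request is only built here, not sent, so the sketch has no network dependency.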
[jira] [Commented] (FLINK-34726) Flink Kubernetes Operator has some room for optimizing performance.
[ https://issues.apache.org/jira/browse/FLINK-34726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828262#comment-17828262 ] Fei Feng commented on FLINK-34726: -- I think these two things (the rest cluster client and the FlinkDeployment resource info) should be held in the Context, to avoid unnecessary runtime overhead and GC pressure. The difficulty lies in how to promptly update them when changes occur: for example, the rest cluster client should be recreated or updated if the jobmanager rest address changes, and if the FlinkDeployment object changes, the session job's SecondaryResource should be updated > Flink Kubernetes Operator has some room for optimizing performance. > --- > > Key: FLINK-34726 > URL: https://issues.apache.org/jira/browse/FLINK-34726 > Project: Flink > Issue Type: Improvement > Components: Kubernetes Operator >Affects Versions: kubernetes-operator-1.5.0, kubernetes-operator-1.6.0, > kubernetes-operator-1.7.0 >Reporter: Fei Feng >Priority: Major > Attachments: operator_no_submit_no_kill.flamegraph.html > > > When there is a huge number of FlinkDeployment and FlinkSessionJob in a > kubernetes cluster, there will be a significant delay between event submit > into reconcile thread pool and event is processed. > this is our test: we give the operator enough resources (cpu: 10 cores, memory: 20g, > reconcile thread pool size was 200) and we deployed 1 jobs firstly (one > FlinkDeployment and one SessionJob per job), then we do submit/delete job > tests. we found that > 1. it cost about 2min between creating a new FlinkDeployment and FlinkSessionJob > CR in k8s and the flink job being submitted to the jobmanager. > 2. it cost about 1min between deleting a FlinkDeployment and FlinkSessionJob CR > and the flink job and session cluster being cleared. > > I used async-profiler to get a flamegraph when there is a huge number of > FlinkDeployment and FlinkSessionJob. I found two obvious areas for > optimization > 1. 
For FlinkDeployment: in the observe step, we call > AbstractFlinkService.getClusterInfo/listJobs/getTaskManagerInfo , and every time > we call these methods we need to create a RestClusterClient, send requests, and close it. > I think we should reuse the RestClusterClient as much as possible to avoid > frequently creating objects and reduce GC pressure > 2. For FlinkSessionJob (this issue is more obvious): in the whole reconcile > loop, we call getSecondaryResource 5 times to get the FlinkDeployment resource > info. Based on my current understanding of the Flink Operator, I think we do > not need to call it 5 times in a single reconcile loop; calling it once is > enough. If yes, we could save 30% cpu usage (every getSecondaryResource costs > 6% cpu usage) > [^operator_no_submit_no_kill.flamegraph.html] > I hope we can discuss solutions to address this problem together. I'm very > willing to optimize and resolve this issue. -- This message was sent by Atlassian Jira (v8.20.10#820010)
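The client-reuse idea in point 1 above can be sketched as a small cache keyed by the JobManager rest address, invalidated when that address changes. `Client` below is a stand-in interface, not Flink's `RestClusterClient`, and the cache shape is only an assumption about how the operator might hold it:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class ClientCacheSketch {

    /** Stand-in for an expensive-to-create REST client. */
    interface Client extends AutoCloseable {
        @Override
        void close();
    }

    private final Map<String, Client> clients = new ConcurrentHashMap<>();
    private final Function<String, Client> factory;

    ClientCacheSketch(Function<String, Client> factory) {
        this.factory = factory;
    }

    /** Returns the cached client for this rest address, creating it at most once. */
    Client get(String restAddress) {
        return clients.computeIfAbsent(restAddress, factory);
    }

    /** Closes and drops the client when the JobManager rest address changed. */
    void invalidate(String restAddress) {
        Client stale = clients.remove(restAddress);
        if (stale != null) {
            stale.close();
        }
    }

    public static void main(String[] args) {
        int[] created = {0};
        ClientCacheSketch cache = new ClientCacheSketch(addr -> {
            created[0]++;
            return () -> { /* release connections here */ };
        });
        cache.get("jm-a:8081");
        cache.get("jm-a:8081"); // reused, no second construction
        System.out.println(created[0]); // 1
    }
}
```

Keeping creation behind `computeIfAbsent` makes the reuse safe under the operator's concurrent reconcile threads; the invalidation hook is where an address-change watch would plug in.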