[jira] [Created] (FLINK-34359) "Kerberized YARN per-job on Docker test (default input)" failed due to IllegalStateException
Matthias Pohl created FLINK-34359: - Summary: "Kerberized YARN per-job on Docker test (default input)" failed due to IllegalStateException Key: FLINK-34359 URL: https://issues.apache.org/jira/browse/FLINK-34359 Project: Flink Issue Type: Bug Components: Deployment / YARN Affects Versions: 1.18.1 Reporter: Matthias Pohl This looks similar to FLINK-34357 because it's also due to some YARN issue. But the e2e test "Kerberized YARN per-job on Docker test (default input)" is causing the failure: {code} [...] Exception in thread "Thread-4" java.lang.IllegalStateException: Trying to access closed classloader. Please check if you store classloaders directly or indirectly in static fields. If the stacktrace suggests that the leak occurs in a third party library and cannot be fixed immediately, you can disable this check with the configuration 'classloader.check-leaked-classloader'. at org.apache.flink.util.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:184) at org.apache.flink.util.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.getResource(FlinkUserCodeClassLoaders.java:208) at org.apache.hadoop.conf.Configuration.getResource(Configuration.java:2570) at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2801) at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2776) at org.apache.hadoop.conf.Configuration.loadProps(Configuration.java:2654) at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2636) at org.apache.hadoop.conf.Configuration.get(Configuration.java:1100) at org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1707) at org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1688) at org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183) at org.apache.hadoop.util.ShutdownHookManager.shutdownExecutor(ShutdownHookManager.java:145) at 
org.apache.hadoop.util.ShutdownHookManager.access$300(ShutdownHookManager.java:65) at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:102) {code} https://github.com/apache/flink/actions/runs/7770984519/job/21191905887#step:14:11720 -- This message was sent by Atlassian Jira (v8.20.10#820010)
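The exception message itself names the relevant knob. As a stop-gap (it only suppresses the leaked-classloader check; it does not fix the underlying leak in the Hadoop shutdown hook), the check can be disabled in the Flink configuration:

```yaml
# flink-conf.yaml — workaround named in the exception above; illustrative only.
# This hides the symptom rather than fixing the classloader leak.
classloader.check-leaked-classloader: false
```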
[jira] [Created] (FLINK-34358) flink-connector-jdbc nightly fails with "Expecting code to raise a throwable"
Martijn Visser created FLINK-34358: -- Summary: flink-connector-jdbc nightly fails with "Expecting code to raise a throwable" Key: FLINK-34358 URL: https://issues.apache.org/jira/browse/FLINK-34358 Project: Flink Issue Type: Bug Components: Connectors / JDBC Reporter: Martijn Visser https://github.com/apache/flink-connector-jdbc/actions/runs/7770283211/job/21190280602#step:14:346 {code:java} [INFO] Running org.apache.flink.connector.jdbc.dialect.cratedb.CrateDBDialectTypeTest Error: Tests run: 19, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.554 s <<< FAILURE! - in org.apache.flink.connector.jdbc.dialect.cratedb.CrateDBDialectTypeTest Error: org.apache.flink.connector.jdbc.dialect.cratedb.CrateDBDialectTypeTest.testDataTypeValidate(TestItem)[19] Time elapsed: 0.018 s <<< FAILURE! java.lang.AssertionError: Expecting code to raise a throwable. [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.002 s - in org.apache.flink.connector.jdbc.catalog.JdbcCatalogUtilsTest [INFO] Running org.apache.flink.architecture.ProductionCodeArchitectureTest [INFO] Running org.apache.flink.architecture.ProductionCodeArchitectureBase [INFO] Running org.apache.flink.architecture.rules.ApiAnnotationRules [INFO] Tests run: 20, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.155 s - in org.apache.flink.connector.jdbc.dialect.JdbcDialectTypeTest [INFO] Running org.apache.flink.architecture.TestCodeArchitectureTest [INFO] Running org.apache.flink.architecture.TestCodeArchitectureTestBase [INFO] Running org.apache.flink.architecture.rules.ITCaseRules [INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.109 s - in org.apache.flink.architecture.rules.ApiAnnotationRules [INFO] Running org.apache.flink.architecture.rules.TableApiRules [INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.024 s - in org.apache.flink.architecture.rules.TableApiRules [INFO] Running org.apache.flink.architecture.rules.ConnectorRules [INFO] Tests 
run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.31 s - in org.apache.flink.architecture.rules.ConnectorRules [INFO] Tests run: 0, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.464 s - in org.apache.flink.architecture.ProductionCodeArchitectureBase [INFO] Tests run: 0, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.468 s - in org.apache.flink.architecture.ProductionCodeArchitectureTest [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.758 s - in org.apache.flink.architecture.rules.ITCaseRules [INFO] Tests run: 0, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.761 s - in org.apache.flink.architecture.TestCodeArchitectureTestBase [INFO] Tests run: 0, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.775 s - in org.apache.flink.architecture.TestCodeArchitectureTest [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 110.38 s - in org.apache.flink.connector.jdbc.databases.oracle.xa.OracleExactlyOnceSinkE2eTest [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 172.591 s - in org.apache.flink.connector.jdbc.databases.db2.xa.Db2ExactlyOnceSinkE2eTest [INFO] [INFO] Results: [INFO] Error: Failures: Error:PostgresDialectTypeTest>JdbcDialectTypeTest.testDataTypeValidate:102 Expecting code to raise a throwable. Error:TrinoDialectTypeTest>JdbcDialectTypeTest.testDataTypeValidate:102 Expecting code to raise a throwable. Error:CrateDBDialectTypeTest>JdbcDialectTypeTest.testDataTypeValidate:102 Expecting code to raise a throwable. [INFO] Error: Tests run: 394, Failures: 3, Errors: 0, Skipped: 1 {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
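"Expecting code to raise a throwable." is AssertJ's standard failure message when an `assertThatThrownBy(...)` block completes without throwing. As a stdlib-only sketch (the method name and precision limit below are hypothetical, not the connector's actual API), the failing contract is equivalent to:

```java
public class ExpectThrowableDemo {

    // Hypothetical stand-in for a dialect type check: the JdbcDialectTypeTest
    // cases expect validation of an unsupported type definition to throw.
    static void validateDecimalPrecision(int precision) {
        final int maxSupported = 38; // assumed limit, for illustration only
        if (precision > maxSupported) {
            throw new IllegalArgumentException("Unsupported precision: " + precision);
        }
    }

    public static void main(String[] args) {
        boolean raised = false;
        try {
            validateDecimalPrecision(100); // expected to be rejected
        } catch (IllegalArgumentException e) {
            raised = true;
        }
        // AssertJ's assertThatThrownBy fails with "Expecting code to raise a
        // throwable." exactly when 'raised' would be false here.
        System.out.println("throwable raised: " + raised);
    }
}
```

The nightly failure therefore means the Postgres, Trino, and CrateDB dialects accepted a type definition that the shared `testDataTypeValidate` case expected them to reject.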
[jira] [Created] (FLINK-34357) IllegalAnnotationsException causes "PyFlink YARN per-job on Docker test" e2e test to fail
Matthias Pohl created FLINK-34357: - Summary: IllegalAnnotationsException causes "PyFlink YARN per-job on Docker test" e2e test to fail Key: FLINK-34357 URL: https://issues.apache.org/jira/browse/FLINK-34357 Project: Flink Issue Type: Bug Components: Deployment / YARN Affects Versions: 1.18.1 Reporter: Matthias Pohl https://github.com/apache/flink/actions/runs/7763815214/job/21176570116#step:14:10009 {code} Feb 03 03:29:04 SEVERE: Failed to generate the schema for the JAX-B elements Feb 03 03:29:04 javax.xml.bind.JAXBException Feb 03 03:29:04 - with linked exception: Feb 03 03:29:04 [java.lang.reflect.InvocationTargetException] Feb 03 03:29:04 at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:262) Feb 03 03:29:04 at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:234) [...] Feb 03 03:29:04 at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) Feb 03 03:29:04 Caused by: java.lang.reflect.InvocationTargetException Feb 03 03:29:04 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) Feb 03 03:29:04 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) Feb 03 03:29:04 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) Feb 03 03:29:04 at java.lang.reflect.Method.invoke(Method.java:498) Feb 03 03:29:04 at org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.ContextFactory.createContext(ContextFactory.java:44) Feb 03 03:29:04 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) Feb 03 03:29:04 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) Feb 03 03:29:04 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) Feb 03 03:29:04 at java.lang.reflect.Method.invoke(Method.java:498) Feb 03 03:29:04 at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:247) Feb 03 03:29:04 ... 
57 more Feb 03 03:29:04 Caused by: com.sun.xml.internal.bind.v2.runtime.IllegalAnnotationsException: 1 counts of IllegalAnnotationExceptions Feb 03 03:29:04 java.util.Set is an interface, and JAXB can't handle interfaces. Feb 03 03:29:04 this problem is related to the following location: Feb 03 03:29:04 at java.util.Set Feb 03 03:29:04 at public java.util.HashMap org.apache.hadoop.yarn.api.records.timeline.TimelineEntity.getPrimaryFiltersJAXB() Feb 03 03:29:04 at org.apache.hadoop.yarn.api.records.timeline.TimelineEntity Feb 03 03:29:04 at public java.util.List org.apache.hadoop.yarn.api.records.timeline.TimelineEntities.getEntities() Feb 03 03:29:04 at org.apache.hadoop.yarn.api.records.timeline.TimelineEntities Feb 03 03:29:04 Feb 03 03:29:04 at com.sun.xml.internal.bind.v2.runtime.IllegalAnnotationsException$Builder.check(IllegalAnnotationsException.java:91) Feb 03 03:29:04 at com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.getTypeInfoSet(JAXBContextImpl.java:445) Feb 03 03:29:04 at com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.(JAXBContextImpl.java:277) Feb 03 03:29:04 at com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.(JAXBContextImpl.java:124) Feb 03 03:29:04 at com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl$JAXBContextBuilder.build(JAXBContextImpl.java:1123) Feb 03 03:29:04 at com.sun.xml.internal.bind.v2.ContextFactory.createContext(ContextFactory.java:147) Feb 03 03:29:04 ... 67 more {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
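The root-cause line is explicit: JAXB must reflectively instantiate the declared property type, and an interface such as `java.util.Set` has no constructor. A stdlib-only illustration of that constraint (not the Hadoop timeline code itself):

```java
import java.util.HashSet;
import java.util.Set;

public class InterfaceBindingDemo {

    // JAXB needs a concrete, no-arg-constructible class for each bound
    // property; an interface type cannot be instantiated reflectively.
    static boolean canInstantiate(Class<?> type) {
        try {
            type.getDeclaredConstructor().newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Set instantiable: " + canInstantiate(Set.class));
        System.out.println("HashSet instantiable: " + canInstantiate(HashSet.class));
    }
}
```

The stack trace points at `Set` appearing in `TimelineEntity`'s JAXB-bound properties, so the problem lies in the Hadoop/JAXB combination on the test classpath rather than in Flink code.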
Re: [ANNOUNCE] Flink 1.19 feature freeze & sync summary on 01/30/2024
> My opinion would be to follow the process by default, and to make exceptions only if there're good reasons. Sounds make sense, I will merge it after 1.19 branch cutting. Thanks Xintong for the explanation! And sorry for bothering. Best, Rui On Mon, Feb 5, 2024 at 1:20 PM Xintong Song wrote: > Thanks for the info. > > My opinion would be to follow the process by default, and to make > exceptions only if there're good reasons. From your description, it sounds > like merging the PR in or after 1.19 doesn't really make a difference. In > that case, I'd suggest to merge it for the next release (i.e. merge it into > master after the 1.19 branch cutting). > > Best, > > Xintong > > > > On Mon, Feb 5, 2024 at 12:52 PM Rui Fan <1996fan...@gmail.com> wrote: > > > Thanks Xintong for the reply. > > > > They are Flink internal classes, and they are not used anymore. > > So I think they don't affect users, the benefit of removing them > > is to simplify Flink's code and reduce maintenance costs. > > > > If we just merge some user-related PRs recently, I could merge > > it after 1.19. Thank you again~ > > > > Best, > > Rui > > > > On Mon, Feb 5, 2024 at 12:21 PM Xintong Song > > wrote: > > > > > Hi Rui, > > > > > > Quick question, would there be any downside if this PR doesn't go into > > > 1.19? Or any user benefit from getting it into this release? > > > > > > Best, > > > > > > Xintong > > > > > > > > > > > > On Sun, Feb 4, 2024 at 10:16 AM Rui Fan <1996fan...@gmail.com> wrote: > > > > > > > Hi release managers, > > > > > > > > > The feature freeze of 1.19 has started now. That means that no new > > > > features > > > > > or improvements should now be merged into the master branch unless > > you > > > > ask > > > > > the release managers first, which has already been done for PRs, or > > > > pending > > > > > on CI to pass. Bug fixes and documentation PRs can still be merged. > > > > > > > > I'm curious whether the code cleanup could be merged? 
> > > > FLINK-31449[1] removed DeclarativeSlotManager related logic. > > > > Some other classes are not used anymore after FLINK-31449. > > > > FLINK-34345[2][3] will remove them. > > > > > > > > I checked these classes are not used in the master branch. > > > > And the PR[3] is reviewed for now, could I merge it now or > > > > after flink-1.19? > > > > > > > > Looking forward to your feedback, thanks~ > > > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-31449 > > > > [2] https://issues.apache.org/jira/browse/FLINK-34345 > > > > [3] https://github.com/apache/flink/pull/24257 > > > > > > > > Best, > > > > Rui > > > > > > > > On Wed, Jan 31, 2024 at 5:20 PM Lincoln Lee > > > > wrote: > > > > > > > >> Hi Matthias, > > > >> > > > >> Thanks for letting us know! After discussed with 1.19 release > > managers, > > > we > > > >> agreed to merge these pr. > > > >> > > > >> Thank you for the work on GHA workflows! > > > >> > > > >> Best, > > > >> Yun, Jing, Martijn and Lincoln > > > >> > > > >> > > > >> Matthias Pohl 于2024年1月30日周二 22:20写道: > > > >> > > > >> > Thanks for the update, Lincoln. > > > >> > > > > >> > fyi: I merged FLINK-32684 (deprecating AkkaOptions) [1] since we > > > agreed > > > >> in > > > >> > today's meeting that this change is still ok to go in. > > > >> > > > > >> > The beta version of the GitHub Actions workflows (FLIP-396 [2]) > are > > > also > > > >> > finalized (see related PRs for basic CI [3], nightly master [4] > and > > > >> nightly > > > >> > scheduling [5]). I'd like to merge the changes before creating the > > > >> > release-1.19 branch. That would enable us to see whether we miss > > > >> anything > > > >> > in the GHA workflows setup when creating a new release branch. > > > >> > > > > >> > The changes are limited to a few CI scripts that are also used for > > > Azure > > > >> > Pipelines (see [3]). The majority of the changes are GHA-specific > > and > > > >> > shouldn't affect the Azure Pipelines CI setup. 
> > > >> > > > > >> > Therefore, I'm requesting the approval from the 1.19 release > > managers > > > to > > > >> > go ahead with merging the mentioned PRs [3, 4, 5]. > > > >> > > > > >> > Matthias > > > >> > > > > >> > > > > >> > [1] https://issues.apache.org/jira/browse/FLINK-32684 > > > >> > [2] > > > >> > > > > >> > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-396%3A+Trial+to+test+GitHub+Actions+as+an+alternative+for+Flink%27s+current+Azure+CI+infrastructure > > > >> > [3] https://github.com/apache/flink/pull/23970 > > > >> > [4] https://github.com/apache/flink/pull/23971 > > > >> > [5] https://github.com/apache/flink/pull/23972 > > > >> > > > > >> > On Tue, Jan 30, 2024 at 1:51 PM Lincoln Lee < > lincoln.8...@gmail.com > > > > > > >> > wrote: > > > >> > > > > >> >> Hi everyone, > > > >> >> > > > >> >> (Since feature freeze and release sync are on the same day, we > > merged > > > >> the > > > >> >> announcement and sync summary together) > > > >> >> > > > >> >> > > > >> >> *- Feature freeze* > > > >> >>
Re: [ANNOUNCE] Flink 1.19 feature freeze & sync summary on 01/30/2024
Thanks for the info. My opinion would be to follow the process by default, and to make exceptions only if there're good reasons. From your description, it sounds like merging the PR in or after 1.19 doesn't really make a difference. In that case, I'd suggest to merge it for the next release (i.e. merge it into master after the 1.19 branch cutting). Best, Xintong On Mon, Feb 5, 2024 at 12:52 PM Rui Fan <1996fan...@gmail.com> wrote: > Thanks Xintong for the reply. > > They are Flink internal classes, and they are not used anymore. > So I think they don't affect users, the benefit of removing them > is to simplify Flink's code and reduce maintenance costs. > > If we just merge some user-related PRs recently, I could merge > it after 1.19. Thank you again~ > > Best, > Rui > > On Mon, Feb 5, 2024 at 12:21 PM Xintong Song > wrote: > > > Hi Rui, > > > > Quick question, would there be any downside if this PR doesn't go into > > 1.19? Or any user benefit from getting it into this release? > > > > Best, > > > > Xintong > > > > > > > > On Sun, Feb 4, 2024 at 10:16 AM Rui Fan <1996fan...@gmail.com> wrote: > > > > > Hi release managers, > > > > > > > The feature freeze of 1.19 has started now. That means that no new > > > features > > > > or improvements should now be merged into the master branch unless > you > > > ask > > > > the release managers first, which has already been done for PRs, or > > > pending > > > > on CI to pass. Bug fixes and documentation PRs can still be merged. > > > > > > I'm curious whether the code cleanup could be merged? > > > FLINK-31449[1] removed DeclarativeSlotManager related logic. > > > Some other classes are not used anymore after FLINK-31449. > > > FLINK-34345[2][3] will remove them. > > > > > > I checked these classes are not used in the master branch. > > > And the PR[3] is reviewed for now, could I merge it now or > > > after flink-1.19? 
> > > > > > Looking forward to your feedback, thanks~ > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-31449 > > > [2] https://issues.apache.org/jira/browse/FLINK-34345 > > > [3] https://github.com/apache/flink/pull/24257 > > > > > > Best, > > > Rui > > > > > > On Wed, Jan 31, 2024 at 5:20 PM Lincoln Lee > > > wrote: > > > > > >> Hi Matthias, > > >> > > >> Thanks for letting us know! After discussed with 1.19 release > managers, > > we > > >> agreed to merge these pr. > > >> > > >> Thank you for the work on GHA workflows! > > >> > > >> Best, > > >> Yun, Jing, Martijn and Lincoln > > >> > > >> > > >> Matthias Pohl 于2024年1月30日周二 22:20写道: > > >> > > >> > Thanks for the update, Lincoln. > > >> > > > >> > fyi: I merged FLINK-32684 (deprecating AkkaOptions) [1] since we > > agreed > > >> in > > >> > today's meeting that this change is still ok to go in. > > >> > > > >> > The beta version of the GitHub Actions workflows (FLIP-396 [2]) are > > also > > >> > finalized (see related PRs for basic CI [3], nightly master [4] and > > >> nightly > > >> > scheduling [5]). I'd like to merge the changes before creating the > > >> > release-1.19 branch. That would enable us to see whether we miss > > >> anything > > >> > in the GHA workflows setup when creating a new release branch. > > >> > > > >> > The changes are limited to a few CI scripts that are also used for > > Azure > > >> > Pipelines (see [3]). The majority of the changes are GHA-specific > and > > >> > shouldn't affect the Azure Pipelines CI setup. > > >> > > > >> > Therefore, I'm requesting the approval from the 1.19 release > managers > > to > > >> > go ahead with merging the mentioned PRs [3, 4, 5]. 
> > >> > > > >> > Matthias > > >> > > > >> > > > >> > [1] https://issues.apache.org/jira/browse/FLINK-32684 > > >> > [2] > > >> > > > >> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-396%3A+Trial+to+test+GitHub+Actions+as+an+alternative+for+Flink%27s+current+Azure+CI+infrastructure > > >> > [3] https://github.com/apache/flink/pull/23970 > > >> > [4] https://github.com/apache/flink/pull/23971 > > >> > [5] https://github.com/apache/flink/pull/23972 > > >> > > > >> > On Tue, Jan 30, 2024 at 1:51 PM Lincoln Lee > > > >> > wrote: > > >> > > > >> >> Hi everyone, > > >> >> > > >> >> (Since feature freeze and release sync are on the same day, we > merged > > >> the > > >> >> announcement and sync summary together) > > >> >> > > >> >> > > >> >> *- Feature freeze* > > >> >> The feature freeze of 1.19 has started now. That means that no new > > >> >> features > > >> >> or improvements should now be merged into the master branch unless > > you > > >> ask > > >> >> the release managers first, which has already been done for PRs, or > > >> >> pending > > >> >> on CI to pass. Bug fixes and documentation PRs can still be merged. > > >> >> > > >> >> > > >> >> *- Cutting release branch* > > >> >> Currently we have three blocker issues[1][2][3], and will try to > > close > > >> >> them this Friday. > > >> >> We are planning to cut the release branch on next Monday (Feb 6th) > if > > >> no
Re: Frequent Flink JM restarts due to Kube API server errors.
Hi,

A few more details: we are running GKE version 1.27.7-gke.1121002 and Flink version 1.14.3.

Thanks!

On Mon, 5 Feb 2024 at 12:05, Lavkesh Lahngir wrote:
> Hi all,
>
> We run a Flink operator on GKE, deploying one Flink job per JobManager. We use
> org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
> for high availability. The JobManager uses ConfigMaps for checkpointing
> and leader election. If, at any point, the Kube API server returns an error
> (5xx or 4xx), the JM pod is restarted. This occurrence is sporadic,
> happening every 1-2 days for some jobs among the 400 running in the same
> cluster, each with its own JobManager pod.
>
> What might be causing these errors from the Kube API server? One possibility is that
> when the JM writes a ConfigMap and attempts to retrieve it immediately
> afterwards, it could result in a 404 error.
> Are there any configurations to increase heartbeats or timeouts that would
> tolerate temporary disconnections from the Kube API server?
>
> Thank you!
[jira] [Created] (FLINK-34356) Release Testing: Verify FLINK-33768 Support dynamic source parallelism inference for batch jobs
lincoln lee created FLINK-34356: --- Summary: Release Testing: Verify FLINK-33768 Support dynamic source parallelism inference for batch jobs Key: FLINK-34356 URL: https://issues.apache.org/jira/browse/FLINK-34356 Project: Flink Issue Type: Sub-task Components: Runtime / Coordination Affects Versions: 1.19.0 Reporter: lincoln lee Assignee: xingbe Fix For: 1.19.0
[jira] [Created] (FLINK-34355) Release Testing: Verify FLINK-34054 Support named parameters for functions and procedures
lincoln lee created FLINK-34355: --- Summary: Release Testing: Verify FLINK-34054 Support named parameters for functions and procedures Key: FLINK-34355 URL: https://issues.apache.org/jira/browse/FLINK-34355 Project: Flink Issue Type: Sub-task Components: Table SQL / API Affects Versions: 1.19.0 Reporter: lincoln lee Assignee: Feng Jin Fix For: 1.19.0
Frequent Flink JM restarts due to Kube API server errors.
Hi all,

We run a Flink operator on GKE, deploying one Flink job per JobManager, and use org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory for high availability. The JobManager uses ConfigMaps for checkpointing and leader election. If, at any point, the Kube API server returns an error (5xx or 4xx), the JM pod is restarted. This occurrence is sporadic, happening every 1-2 days for some jobs among the 400 running in the same cluster, each with its own JobManager pod.

What might be causing these errors from the Kube API server? One possibility is that when the JM writes a ConfigMap and attempts to retrieve it immediately afterwards, it could result in a 404 error.

Are there any configurations to increase heartbeats or timeouts that would tolerate temporary disconnections from the Kube API server?

Thank you!
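Regarding the timeout question: the following is a hedged sketch of options that exist in Flink's Kubernetes HA configuration (verify names and defaults against the documentation for your Flink version; the values are illustrative, not recommendations):

```yaml
# flink-conf.yaml — illustrative values only
high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory

# Retry budget for ConfigMap operations against the API server
# before the JobManager gives up (default is 5):
kubernetes.transactional-operation.max-retries: 10

# Leader-election timing; longer lease/renew windows tolerate brief
# API-server unavailability at the cost of slower failover:
high-availability.kubernetes.leader-election.lease-duration: 60 s
high-availability.kubernetes.leader-election.renew-deadline: 60 s
high-availability.kubernetes.leader-election.retry-period: 5 s
```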
Re: [ANNOUNCE] Flink 1.19 feature freeze & sync summary on 01/30/2024
Thanks Xintong for the reply. They are Flink internal classes, and they are not used anymore. So I think they don't affect users, the benefit of removing them is to simplify Flink's code and reduce maintenance costs. If we just merge some user-related PRs recently, I could merge it after 1.19. Thank you again~ Best, Rui On Mon, Feb 5, 2024 at 12:21 PM Xintong Song wrote: > Hi Rui, > > Quick question, would there be any downside if this PR doesn't go into > 1.19? Or any user benefit from getting it into this release? > > Best, > > Xintong > > > > On Sun, Feb 4, 2024 at 10:16 AM Rui Fan <1996fan...@gmail.com> wrote: > > > Hi release managers, > > > > > The feature freeze of 1.19 has started now. That means that no new > > features > > > or improvements should now be merged into the master branch unless you > > ask > > > the release managers first, which has already been done for PRs, or > > pending > > > on CI to pass. Bug fixes and documentation PRs can still be merged. > > > > I'm curious whether the code cleanup could be merged? > > FLINK-31449[1] removed DeclarativeSlotManager related logic. > > Some other classes are not used anymore after FLINK-31449. > > FLINK-34345[2][3] will remove them. > > > > I checked these classes are not used in the master branch. > > And the PR[3] is reviewed for now, could I merge it now or > > after flink-1.19? > > > > Looking forward to your feedback, thanks~ > > > > [1] https://issues.apache.org/jira/browse/FLINK-31449 > > [2] https://issues.apache.org/jira/browse/FLINK-34345 > > [3] https://github.com/apache/flink/pull/24257 > > > > Best, > > Rui > > > > On Wed, Jan 31, 2024 at 5:20 PM Lincoln Lee > > wrote: > > > >> Hi Matthias, > >> > >> Thanks for letting us know! After discussed with 1.19 release managers, > we > >> agreed to merge these pr. > >> > >> Thank you for the work on GHA workflows! 
> >> > >> Best, > >> Yun, Jing, Martijn and Lincoln > >> > >> > >> Matthias Pohl 于2024年1月30日周二 22:20写道: > >> > >> > Thanks for the update, Lincoln. > >> > > >> > fyi: I merged FLINK-32684 (deprecating AkkaOptions) [1] since we > agreed > >> in > >> > today's meeting that this change is still ok to go in. > >> > > >> > The beta version of the GitHub Actions workflows (FLIP-396 [2]) are > also > >> > finalized (see related PRs for basic CI [3], nightly master [4] and > >> nightly > >> > scheduling [5]). I'd like to merge the changes before creating the > >> > release-1.19 branch. That would enable us to see whether we miss > >> anything > >> > in the GHA workflows setup when creating a new release branch. > >> > > >> > The changes are limited to a few CI scripts that are also used for > Azure > >> > Pipelines (see [3]). The majority of the changes are GHA-specific and > >> > shouldn't affect the Azure Pipelines CI setup. > >> > > >> > Therefore, I'm requesting the approval from the 1.19 release managers > to > >> > go ahead with merging the mentioned PRs [3, 4, 5]. > >> > > >> > Matthias > >> > > >> > > >> > [1] https://issues.apache.org/jira/browse/FLINK-32684 > >> > [2] > >> > > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-396%3A+Trial+to+test+GitHub+Actions+as+an+alternative+for+Flink%27s+current+Azure+CI+infrastructure > >> > [3] https://github.com/apache/flink/pull/23970 > >> > [4] https://github.com/apache/flink/pull/23971 > >> > [5] https://github.com/apache/flink/pull/23972 > >> > > >> > On Tue, Jan 30, 2024 at 1:51 PM Lincoln Lee > >> > wrote: > >> > > >> >> Hi everyone, > >> >> > >> >> (Since feature freeze and release sync are on the same day, we merged > >> the > >> >> announcement and sync summary together) > >> >> > >> >> > >> >> *- Feature freeze* > >> >> The feature freeze of 1.19 has started now. 
That means that no new > >> >> features > >> >> or improvements should now be merged into the master branch unless > you > >> ask > >> >> the release managers first, which has already been done for PRs, or > >> >> pending > >> >> on CI to pass. Bug fixes and documentation PRs can still be merged. > >> >> > >> >> > >> >> *- Cutting release branch* > >> >> Currently we have three blocker issues[1][2][3], and will try to > close > >> >> them this Friday. > >> >> We are planning to cut the release branch on next Monday (Feb 6th) if > >> no > >> >> new test instabilities, > >> >> and we'll make another announcement in the dev mailing list then. > >> >> > >> >> > >> >> *- Cross-team testing* > >> >> Release testing is expected to start next week as soon as we cut the > >> >> release branch. > >> >> As a prerequisite, please Before we start testing, please make sure > >> >> 1. Whether the feature needs a cross-team testing > >> >> 2. If yes, please the documentation completed > >> >> There's an umbrella ticket[4] for tracking the 1.19 testing, RM will > >> >> create all tickets for completed features listed on the 1.19 wiki > >> page[5] > >> >> and assign to the feature's Responsible Contributor, > >> >> also contributors are encouraged to create tickets
Re: [ANNOUNCE] Flink 1.19 feature freeze & sync summary on 01/30/2024
Hi Rui, Quick question, would there be any downside if this PR doesn't go into 1.19? Or any user benefit from getting it into this release? Best, Xintong On Sun, Feb 4, 2024 at 10:16 AM Rui Fan <1996fan...@gmail.com> wrote: > Hi release managers, > > > The feature freeze of 1.19 has started now. That means that no new > features > > or improvements should now be merged into the master branch unless you > ask > > the release managers first, which has already been done for PRs, or > pending > > on CI to pass. Bug fixes and documentation PRs can still be merged. > > I'm curious whether the code cleanup could be merged? > FLINK-31449[1] removed DeclarativeSlotManager related logic. > Some other classes are not used anymore after FLINK-31449. > FLINK-34345[2][3] will remove them. > > I checked these classes are not used in the master branch. > And the PR[3] is reviewed for now, could I merge it now or > after flink-1.19? > > Looking forward to your feedback, thanks~ > > [1] https://issues.apache.org/jira/browse/FLINK-31449 > [2] https://issues.apache.org/jira/browse/FLINK-34345 > [3] https://github.com/apache/flink/pull/24257 > > Best, > Rui > > On Wed, Jan 31, 2024 at 5:20 PM Lincoln Lee > wrote: > >> Hi Matthias, >> >> Thanks for letting us know! After discussed with 1.19 release managers, we >> agreed to merge these pr. >> >> Thank you for the work on GHA workflows! >> >> Best, >> Yun, Jing, Martijn and Lincoln >> >> >> Matthias Pohl 于2024年1月30日周二 22:20写道: >> >> > Thanks for the update, Lincoln. >> > >> > fyi: I merged FLINK-32684 (deprecating AkkaOptions) [1] since we agreed >> in >> > today's meeting that this change is still ok to go in. >> > >> > The beta version of the GitHub Actions workflows (FLIP-396 [2]) are also >> > finalized (see related PRs for basic CI [3], nightly master [4] and >> nightly >> > scheduling [5]). I'd like to merge the changes before creating the >> > release-1.19 branch. 
That would enable us to see whether we miss >> anything >> > in the GHA workflows setup when creating a new release branch. >> > >> > The changes are limited to a few CI scripts that are also used for Azure >> > Pipelines (see [3]). The majority of the changes are GHA-specific and >> > shouldn't affect the Azure Pipelines CI setup. >> > >> > Therefore, I'm requesting the approval from the 1.19 release managers to >> > go ahead with merging the mentioned PRs [3, 4, 5]. >> > >> > Matthias >> > >> > >> > [1] https://issues.apache.org/jira/browse/FLINK-32684 >> > [2] >> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-396%3A+Trial+to+test+GitHub+Actions+as+an+alternative+for+Flink%27s+current+Azure+CI+infrastructure >> > [3] https://github.com/apache/flink/pull/23970 >> > [4] https://github.com/apache/flink/pull/23971 >> > [5] https://github.com/apache/flink/pull/23972 >> > >> > On Tue, Jan 30, 2024 at 1:51 PM Lincoln Lee >> > wrote: >> > >> >> Hi everyone, >> >> >> >> (Since feature freeze and release sync are on the same day, we merged >> the >> >> announcement and sync summary together) >> >> >> >> >> >> *- Feature freeze* >> >> The feature freeze of 1.19 has started now. That means that no new >> >> features >> >> or improvements should now be merged into the master branch unless you >> ask >> >> the release managers first, which has already been done for PRs, or >> >> pending >> >> on CI to pass. Bug fixes and documentation PRs can still be merged. >> >> >> >> >> >> *- Cutting release branch* >> >> Currently we have three blocker issues[1][2][3], and will try to close >> >> them this Friday. >> >> We are planning to cut the release branch on next Monday (Feb 6th) if >> no >> >> new test instabilities, >> >> and we'll make another announcement in the dev mailing list then. >> >> >> >> >> >> *- Cross-team testing* >> >> Release testing is expected to start next week as soon as we cut the >> >> release branch. 
>> >> Before we start testing, please check:
>> >> 1. whether the feature needs cross-team testing, and
>> >> 2. if yes, that the documentation is completed.
>> >> There's an umbrella ticket[4] for tracking the 1.19 testing. The RMs will
>> >> create tickets for all completed features listed on the 1.19 wiki page[5]
>> >> and assign them to each feature's responsible contributor;
>> >> contributors are also encouraged to create tickets following the steps in
>> >> the umbrella ticket if there are other features that need to be cross-team
>> >> tested.
>> >>
>> >> *- Release notes*
>> >>
>> >> All new features and behavior changes require authors to fill out the
>> >> 'Release Note' field in the JIRA (click the Edit button and scroll the page
>> >> to the middle),
>> >> especially since 1.19 involves a lot of deprecations, which are important
>> >> for users and will be part of the release announcement.
>> >>
>> >> - *Sync meeting* (https://meet.google.com/vcx-arzs-trv)
>> >>
>> >> We've already switched to a weekly release sync, so the next release sync
>> >> will be on Feb
Re: [VOTE] FLIP-331: Support EndOfStreamTrigger and isOutputOnlyAfterEndOfStream operator attribute to optimize task deployment
+1 (non-binding) Best, Yuxin Hang Ruan wrote on Mon, Feb 5, 2024 at 11:22: > +1 (non-binding) > > Best, > Hang > > Dong Lin wrote on Mon, Feb 5, 2024 at 11:08: > > > Thanks for the FLIP. > > > > +1 (binding) > > > > Best, > > Dong > > > > On Wed, Jan 31, 2024 at 11:41 AM Xuannan Su > wrote: > > > > > Hi everyone, > > > > > > Thanks for all the feedback about FLIP-331: Support > > > EndOfStreamTrigger and isOutputOnlyAfterEndOfStream operator attribute > > > to optimize task deployment [1] [2]. > > > > > > I'd like to start a vote for it. The vote will be open for at least 72 > > > hours (excluding weekends, until Feb 5, 12:00AM GMT) unless there is an > > > objection or an insufficient number of votes. > > > > > > [1] > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-331%3A+Support+EndOfStreamTrigger+and+isOutputOnlyAfterEndOfStream+operator+attribute+to+optimize+task+deployment > > > [2] https://lists.apache.org/thread/qq39rmg3f23ysx5m094s4c4cq0m4tdj5 > > > > > > > > > Best, > > > Xuannan > > > > > >
Re: [VOTE] FLIP-331: Support EndOfStreamTrigger and isOutputOnlyAfterEndOfStream operator attribute to optimize task deployment
+1 (non-binding) Best, Hang Dong Lin wrote on Mon, Feb 5, 2024 at 11:08: > Thanks for the FLIP. > > +1 (binding) > > Best, > Dong > > On Wed, Jan 31, 2024 at 11:41 AM Xuannan Su wrote: > > Hi everyone, > > > > Thanks for all the feedback about FLIP-331: Support > > EndOfStreamTrigger and isOutputOnlyAfterEndOfStream operator attribute > > to optimize task deployment [1] [2]. > > > > I'd like to start a vote for it. The vote will be open for at least 72 > > hours (excluding weekends, until Feb 5, 12:00AM GMT) unless there is an > > objection or an insufficient number of votes. > > > > [1] > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-331%3A+Support+EndOfStreamTrigger+and+isOutputOnlyAfterEndOfStream+operator+attribute+to+optimize+task+deployment > > [2] https://lists.apache.org/thread/qq39rmg3f23ysx5m094s4c4cq0m4tdj5 > > > > > > Best, > > Xuannan > > >
Re: [VOTE] FLIP-331: Support EndOfStreamTrigger and isOutputOnlyAfterEndOfStream operator attribute to optimize task deployment
Thanks for the FLIP. +1 (binding) Best, Dong On Wed, Jan 31, 2024 at 11:41 AM Xuannan Su wrote: > Hi everyone, > > Thanks for all the feedback about FLIP-331: Support > EndOfStreamTrigger and isOutputOnlyAfterEndOfStream operator attribute > to optimize task deployment [1] [2]. > > I'd like to start a vote for it. The vote will be open for at least 72 > hours (excluding weekends, until Feb 5, 12:00AM GMT) unless there is an > objection or an insufficient number of votes. > > [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-331%3A+Support+EndOfStreamTrigger+and+isOutputOnlyAfterEndOfStream+operator+attribute+to+optimize+task+deployment > [2] https://lists.apache.org/thread/qq39rmg3f23ysx5m094s4c4cq0m4tdj5 > > > Best, > Xuannan >
Re: [DISCUSS] FLIP-409: DataStream V2 Building Blocks: DataStream, Partitioning and ProcessFunction
Thanks for updating the FLIP, Weijie. I think separating the TwoInputProcessFunction according to whether the input stream contains BroadcastStream makes sense. I have a few more comments. 1. I'd suggest the names `TwoInputNonBroadcastStreamProcessFunction` and `TwoInputBroadcastStreamProcessFunction` for the separated methods. 2. I'd suggest making `NonPartitionedContext` extend `RuntimeContext`. Otherwise, for all the functionalities that `RuntimeContext` provides, we need to duplicate them for `NonPartitionedContext`. 3. Some of these changes also affect FLIP-410. I noticed that FLIP-410 is also updated accordingly. It would be nice to also mention those changes in the FLIP-410 discussion thread. Best, Xintong On Sun, Feb 4, 2024 at 11:23 AM weijie guo wrote: > Hi Xuannan and Xintong, > > Good point! After further consideration, I feel that we should make the > Broadcast + NonKeyed/Keyed process function different from the normal > TwoInputProcessFunction, because records from the broadcast input indeed > correspond to all partitions, while records from the non-broadcast edge > have explicit partitions. > > When we consider data from the broadcast input, it is only valid to do > something on all the partitions at once, such as > `applyToKeyedState`. Similarly, other operations (e.g., endOfInput) that cannot > determine the current partition should also only be performed > on all partitions. This FLIP has been updated. > > Best regards, > > Weijie > > > Xintong Song wrote on Thu, Feb 1, 2024 at 11:31: > > > OK, I see your point. > > > > I think the demand for updating states and emitting outputs upon > receiving > > a broadcast record makes sense. However, the way > > `KeyedBroadcastProcessFunction` supports this may not be optimal. E.g., > if > > `Collector#collect` is called in `processBroadcastElement` but outside of > > `Context#applyToKeyedState`, the result can be undefined. 
> > > > Currently in this FLIP, a `TwoInputStreamProcessFunction` is not aware of > > which input is KeyedStream and which is BroadcastStream, which makes > > supporting things like `applyToKeyedState` difficult. I think we can > > provide a built-in function similar to `KeyedBroadcastProcessFunction` on > > top of `TwoInputStreamProcessFunction` to address this demand. > > > > WDYT? > > > > > > Best, > > > > Xintong > > > > > > > > On Thu, Feb 1, 2024 at 10:41 AM Xuannan Su > wrote: > > > > > Hi Weijie and Xingtong, > > > > > > Thanks for the reply! Please see my comments below. > > > > > > > Does this mean if we want to support (KeyedStream, BroadcastStream) > -> > > > > (KeyedStream), we must make sure that no data can be output upon > > > processing > > > > records from the input BroadcastStream? That's probably a reasonable > > > > limitation. > > > > > > I don't think that the requirement for supporting (KeyedStream, > > > BroadcastStream) -> (KeyedStream) is that no data can be output upon > > > processing the BroadcastStream. For instance, in the current > > > `KeyedBroadcastProcessFunction`, we use Context#applyToKeyedState to > > > produce output results, which can be keyed in the same manner as the > > > keyed input stream, upon processing data from the BroadcastStream. > > > Therefore, I believe it only requires that the user must ensure that > > > the output is keyed in the same way as the input, in this case, the > > > same way as the keyed input stream. I think this requirement is > > > consistent with that of (KeyedStream, KeyedStream) -> (KeyedStream). > > > Thus, I believe that supporting (KeyedStream, BroadcastStream) -> > > > (KeyedStream) will not introduce complexity for the users. WDYT? > > > > > > Best regards, > > > Xuannan > > > > > > > > > On Tue, Jan 30, 2024 at 3:12 PM weijie guo > > > wrote: > > > > > > > > Hi Xintong, > > > > > > > > Thanks for your reply. 
> > > > > > > > > Does this mean if we want to support (KeyedStream, BroadcastStream) > > -> > > > > (KeyedStream), we must make sure that no data can be output upon > > > processing > > > > records from the input BroadcastStream? That's probably a reasonable > > > > limitation. > > > > > > > > I think so; this is the restriction that has to be imposed in order > to > > > > avoid re-partitioning (i.e., a shuffle). > > > > If one just wants to get a keyed stream and doesn't care about the data > > > > distribution, then explicit KeyBy partitioning works as expected. > > > > > > > > > The problem is whether this limitation would be too implicit for users > to > > > > understand. > > > > > > > > Since we can't check for this limitation at compile time, if we were > to > > > add > > > > support for this case, we would have to introduce additional runtime > > > checks > > > > to ensure program correctness. For now, I'm inclined not to support > it, > > > as > > > > it's hard for users to understand this restriction unless we have > > > something > > > > better. And we can always add it later if we do realize there's a > > strong > > > >
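The partitioning semantics debated in this thread can be sketched outside Flink in a few lines of plain Python. This is only an illustration of the intended behaviour (a keyed record touches exactly one partition, while a broadcast record may only act on all partitions at once, as with `applyToKeyedState`); the class and method names below are made up for the sketch and are not the proposed FLIP-409 API.

```python
# Plain-Python toy model (not Flink code) of a two-input function with one
# keyed input and one broadcast input. All names here are hypothetical.
from collections import defaultdict


class TwoInputBroadcastSketch:
    """Keyed input updates one partition; broadcast input applies to all."""

    def __init__(self):
        self.state = defaultdict(list)  # partition key -> records seen

    def process_keyed_record(self, key, value):
        # A keyed record has an explicit partition: touch only that key's state.
        self.state[key].append(value)

    def process_broadcast_record(self, apply_fn):
        # A broadcast record corresponds to every partition, so the only valid
        # operation is applying a function across all existing partitions --
        # there is no "current" partition to emit from.
        for key, values in self.state.items():
            apply_fn(key, values)


sketch = TwoInputBroadcastSketch()
sketch.process_keyed_record("a", 1)
sketch.process_keyed_record("b", 2)
sketch.process_keyed_record("a", 3)

totals = {}
sketch.process_broadcast_record(lambda k, v: totals.__setitem__(k, sum(v)))
print(totals)  # {'a': 4, 'b': 2}
```

The sketch makes the restriction concrete: if `process_broadcast_record` were allowed to emit into a single partition, the output's partitioning would no longer match the keyed input, which is exactly the correctness concern behind separating the broadcast variant.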
FW: RE: Flink JDBC connector release
Hi Sergey, Sorry for the typos. I meant: Yes that is right, I am talking about the jdbc connector rc3 and Flink 1.18. I am looking into finding a simple way to reproduce the issue, and will raise a Jira if I can, Kind regards, David. From: Sergey Nuyanzin Date: Friday, 2 February 2024 at 19:46 To: dev@flink.apache.org Subject: [EXTERNAL] Re: Flink JDBC connector release Hi David thanks for testing I assume you are talking about jdbc connector rc3 and Flink 1.18. In case you think there is a bug it would make sense to raise a jira issue and provide steps to reproduce it On Fri, Feb 2, 2024 at 5:42 PM David Radley wrote: > Hi, > > We have been doing some testing on flink jdbc connector rc2. We are > testing with Flink 1.1.8 jars and are using the TableEnvironment > TableResult< > https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/table/api/TableResult.html > > > executeSql(String< > http://docs.oracle.com/javase/7/docs/api/java/lang/String.html?is-external=true > > > statement). We hit some strange behaviour testing the lookup join, we got a > null pointer Exception on > https://github.com/apache/flink-connector-jdbc/blob/390d7bc9139204fbfc48fe275a69eb60c4807fb5/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/table/JdbcRowDataLookupFunction.java#L192 > > and that the constructor did not seem to be driven. We changed the > release that the pom file was building against to 1.18 and it works; so our > issue appeared to be mismatching jar levels. This issue did not appear > running the SQL client against rc2. > > > > I am attempting to put a simple java test together to show this issue. > > > > WDYT? > > Kind regards. > > > > > > > > Unless otherwise stated above: > > IBM United Kingdom Limited > Registered in England and Wales with number 741598 > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. 
PO6 3AU > -- Best regards, Sergey Unless otherwise stated above: IBM United Kingdom Limited Registered in England and Wales with number 741598 Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
RE: Flink JDBC connector release
Hi Sergey, Yes that is right I am talking about are talking about jdbc connector rc3 and Flink 1.18. I am looking into finding a simple way to reproduce it, and will raise a Jira if I can, Kind regards, David. From: Sergey Nuyanzin Date: Friday, 2 February 2024 at 19:46 To: dev@flink.apache.org Subject: [EXTERNAL] Re: Flink JDBC connector release Hi David thanks for testing I assume you are talking about jdbc connector rc3 and Flink 1.18. In case you think there is a bug it would make sense to raise a jira issue and provide steps to reproduce it On Fri, Feb 2, 2024 at 5:42 PM David Radley wrote: > Hi, > > We have been doing some testing on flink jdbc connector rc2. We are > testing with Flink 1.1.8 jars and are using the TableEnvironment > TableResult< > https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/table/api/TableResult.html > > > executeSql(String< > http://docs.oracle.com/javase/7/docs/api/java/lang/String.html?is-external=true > > > statement). We hit some strange behaviour testing the lookup join, we got a > null pointer Exception on > https://github.com/apache/flink-connector-jdbc/blob/390d7bc9139204fbfc48fe275a69eb60c4807fb5/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/table/JdbcRowDataLookupFunction.java#L192 > > and that the constructor did not seem to be driven. We changed the > release that the pom file was building against to 1.18 and it works; so our > issue appeared to be mismatching jar levels. This issue did not appear > running the SQL client against rc2. > > > > I am attempting to put a simple java test together to show this issue. > > > > WDYT? > > Kind regards. > > > > > > > > Unless otherwise stated above: > > IBM United Kingdom Limited > Registered in England and Wales with number 741598 > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. 
PO6 3AU > -- Best regards, Sergey Unless otherwise stated above: IBM United Kingdom Limited Registered in England and Wales with number 741598 Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
[jira] [Created] (FLINK-34354) Release Testing: Verify FLINK-34037 Improve Serialization Configuration and Usage in Flink
Zhanghao Chen created FLINK-34354: - Summary: Release Testing: Verify FLINK-34037 Improve Serialization Configuration and Usage in Flink Key: FLINK-34354 URL: https://issues.apache.org/jira/browse/FLINK-34354 Project: Flink Issue Type: Sub-task Components: API / Type Serialization System Affects Versions: 1.19.0 Reporter: Zhanghao Chen Fix For: 1.19.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34353) A strange exception will be thrown if minibatch size is not set while using mini-batch join
xuyang created FLINK-34353: -- Summary: A strange exception will be thrown if minibatch size is not set while using mini-batch join Key: FLINK-34353 URL: https://issues.apache.org/jira/browse/FLINK-34353 Project: Flink Issue Type: Bug Components: Table SQL / Planner Affects Versions: 1.19.0 Reporter: xuyang Fix For: 1.19.0
[jira] [Created] (FLINK-34352) Improve the documentation of allowNonRestoredState
Hangxiang Yu created FLINK-34352: Summary: Improve the documentation of allowNonRestoredState Key: FLINK-34352 URL: https://issues.apache.org/jira/browse/FLINK-34352 Project: Flink Issue Type: Improvement Components: Documentation Reporter: Hangxiang Yu Assignee: Hangxiang Yu The current documentation of allowNonRestoredState is not clear; we should clarify: 1. It can lead to serious issues with correctness if it's used incorrectly. 2. By default, correctness after removing an operator is related to the topological order and the logic of the job. 3. For DataStream jobs, the operator uid can be assigned explicitly to avoid the reassignment of operator uids.
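The behaviour this ticket wants documented can be modelled with a toy restore function: saved state is matched to operators by uid, and allowNonRestoredState decides whether unmatched state aborts the restore or is silently dropped. This is a hedged plain-Python sketch of that matching rule, not Flink code; all names and data below are invented.

```python
# Toy model (not Flink code) of savepoint restore matching state to operators
# by uid, and what the allowNonRestoredState flag changes.

def restore(savepoint_state, job_operator_uids, allow_non_restored=False):
    """Map saved state onto the new job graph's operators by uid."""
    leftover = set(savepoint_state) - set(job_operator_uids)
    if leftover and not allow_non_restored:
        # Default behaviour: refuse to start if any saved state has no
        # matching operator in the new job graph.
        raise ValueError(f"No operator for saved state: {sorted(leftover)}")
    # With the flag set, unmatched state is silently discarded -- which is
    # where correctness problems creep in if a uid changed accidentally.
    return {uid: savepoint_state[uid]
            for uid in job_operator_uids if uid in savepoint_state}


saved = {"source-uid": b"offsets", "agg-uid": b"counters"}

# Operator removed (or its auto-generated uid changed): default restore fails.
try:
    restore(saved, ["source-uid", "new-agg-uid"])
except ValueError as e:
    print("refused:", e)

# Same restore with allowNonRestoredState: the job starts, but the state
# under "agg-uid" is dropped without any error.
print(restore(saved, ["source-uid", "new-agg-uid"], allow_non_restored=True))
# {'source-uid': b'offsets'}
```

This is why the ticket's third point matters: assigning uids explicitly in DataStream jobs keeps the matching stable, so the flag only ever drops state you genuinely intended to remove.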