Re: Re: [VOTE] FLIP-417: Expose JobManagerOperatorMetrics via REST API
Hi voters and devs,

I'm inclined to close the voting thread with the additional minor details added to the FLIP. Please chime in if there are any objections!

Best,
Mason

On Wed, Feb 7, 2024 at 11:49 AM Mason Chen wrote:
> Hi Voters,
>
> JFYI, I have modified the proposed REST API path and added changes to the
> metric scope configuration -- you can find the reasoning and discussion in
> the `[DISCUSS]` thread and FLIP doc. Please let me know if there are any
> concerns.
>
> Best,
> Mason
>
> On Mon, Jan 29, 2024 at 5:32 AM Thomas Weise wrote:
>> +1 (binding)
>>
>> On Mon, Jan 29, 2024 at 5:45 AM Maximilian Michels wrote:
>>> +1 (binding)
>>>
>>> On Fri, Jan 26, 2024 at 6:03 AM Rui Fan <1996fan...@gmail.com> wrote:
>>>> +1 (binding)
>>>>
>>>> Best,
>>>> Rui
>>>>
>>>> On Fri, Jan 26, 2024 at 11:55 AM Xuyang wrote:
>>>>> +1 (non-binding)
>>>>>
>>>>> Best!
>>>>> Xuyang
>>>>>
>>>>> On 2024-01-26 10:12:34, "Hang Ruan" wrote:
>>>>>> Thanks for the FLIP.
>>>>>>
>>>>>> +1 (non-binding)
>>>>>>
>>>>>> Best,
>>>>>> Hang
>>>>>>
>>>>>> On Fri, Jan 26, 2024 at 04:51, Mason Chen wrote:
>>>>>>> Hi Devs,
>>>>>>>
>>>>>>> I would like to start a vote on FLIP-417: Expose JobManagerOperatorMetrics
>>>>>>> via REST API [1], which has been discussed in this thread [2].
>>>>>>>
>>>>>>> The vote will be open for at least 72 hours unless there is an objection or
>>>>>>> not enough votes.
>>>>>>>
>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-417%3A+Expose+JobManagerOperatorMetrics+via+REST+API
>>>>>>> [2] https://lists.apache.org/thread/tt0hf6kf5lcxd7g62v9dhpn3z978pxw0
>>>>>>>
>>>>>>> Best,
>>>>>>> Mason
[jira] [Created] (FLINK-34442) Support optimizations for pre-partitioned [external] data sources
Jeyhun Karimov created FLINK-34442:
--------------------------------------

             Summary: Support optimizations for pre-partitioned [external] data sources
                 Key: FLINK-34442
                 URL: https://issues.apache.org/jira/browse/FLINK-34442
             Project: Flink
          Issue Type: Improvement
          Components: Table SQL / API, Table SQL / Planner
    Affects Versions: 1.18.1
            Reporter: Jeyhun Karimov

There are some use cases in which data sources are pre-partitioned:
- The Kafka topic is already partitioned w.r.t. some key
- There are multiple Flink jobs that materialize their outputs and subsequently read them as input

One of the main benefits is that we might avoid unnecessary shuffling. There is already an experimental feature in DataStream to support a subset of these cases [1]. We should support this for Flink Table/SQL as well.

[1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/experimental/

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
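The benefit the ticket describes rests on one invariant: if both sides of an equi-join are already partitioned by the same key function with the same parallelism, matching keys always land in the same partition, so each partition can be joined independently without a network shuffle. A minimal, Flink-independent sketch of that invariant (all names and data here are invented for illustration):

```java
import java.util.List;
import java.util.Map;

public class PrePartitionedJoinSketch {

    // The partitioner both producers are assumed to share (e.g. the Kafka
    // default key partitioner modulo the partition count).
    static int partition(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    public static void main(String[] args) {
        int parallelism = 4;
        List<Map.Entry<String, Integer>> orders =
                List.of(Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3));
        List<Map.Entry<String, String>> users =
                List.of(Map.entry("a", "Alice"), Map.entry("b", "Bob"));

        // For every join match, both records sit in the same partition, so a
        // planner aware of the pre-partitioning could skip the exchange step.
        for (Map.Entry<String, Integer> o : orders) {
            for (Map.Entry<String, String> u : users) {
                if (o.getKey().equals(u.getKey())) {
                    assert partition(o.getKey(), parallelism)
                            == partition(u.getKey(), parallelism);
                }
            }
        }
        System.out.println("co-partitioned: matching keys always share a partition");
    }
}
```

This is what the linked DataStream experimental feature exploits for reinterpreting a pre-partitioned stream as keyed; the ticket asks for the Table/SQL planner to be able to make the same assumption.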
[jira] [Created] (FLINK-34441) Add Documentation for flink-sql-runner-example in Kubernetes Operator Documentation
Prakash Tiwari created FLINK-34441:
--------------------------------------

             Summary: Add Documentation for flink-sql-runner-example in Kubernetes Operator Documentation
                 Key: FLINK-34441
                 URL: https://issues.apache.org/jira/browse/FLINK-34441
             Project: Flink
          Issue Type: Improvement
          Components: Kubernetes Operator
            Reporter: Prakash Tiwari

There isn't a direct way to submit SQL-script-based jobs to the Flink Kubernetes Operator, so we have created a [flink-sql-runner-example|https://github.com/apache/flink-kubernetes-operator/tree/release-1.7/examples/flink-sql-runner-example] that helps run Flink SQL scripts as Table API jobs. I believe it's a very useful and important example, and information about this job is missing from the Kubernetes Operator's documentation. Hence I've created this issue to update the documentation to include this example.

The prospect for this issue was discussed here: https://github.com/apache/flink-kubernetes-operator/pull/596
[jira] [Created] (FLINK-34440) Support Debezium Protobuf Confluent Format
Kevin Lam created FLINK-34440:
---------------------------------

             Summary: Support Debezium Protobuf Confluent Format
                 Key: FLINK-34440
                 URL: https://issues.apache.org/jira/browse/FLINK-34440
             Project: Flink
          Issue Type: New Feature
          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
    Affects Versions: 1.18.1, 1.19.0
            Reporter: Kevin Lam

*Motivation*

Debezium and the Confluent Schema Registry can be used to emit Protobuf-encoded messages to Kafka, but Flink does not easily support consuming these messages through a connector.

*Definition of Done*

Add a format `debezium-protobuf-confluent`, provided by a DebeziumProtobufFormatFactory, that supports Debezium messages encoded using Protocol Buffers and the Confluent Schema Registry.

To consider:
* Mirror the implementation of the `debezium-avro-confluent` format: first implement a `protobuf-confluent` format similar to the existing [Confluent Avro|https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/connectors/table/formats/avro-confluent/] format provided today, which allows reading/writing Protobuf using the Confluent Schema Registry.
[DISCUSS] Externalized Google Cloud Connectors
Hi Devs!

I'd like to kick off a discussion on setting up a repo for a new fleet of Google Cloud connectors. A bit of context:

- We have a team of Google engineers who are looking to build/maintain 5-10 GCP connectors for Flink.
- We are wondering if it would make sense to host our connectors under the ASF umbrella, following a similar repo structure as AWS (https://github.com/apache/flink-connector-aws). In our case: apache/flink-connectors-gcp.
- Currently, we have no Flink committers on our team. We are actively involved in the Apache Beam community and have a number of ASF members on the team.

We saw that one of the original motivations for externalizing connectors was to encourage more activity and contributions around connectors by easing the contribution overhead. We understand that the decision was ultimately made to host the externalized connector repos under the ASF organization. For the same reasons (release infra, quality assurance, integration with the community, etc.), we would like all GCP connectors to live under the ASF organization.

We want to ask the Flink community what you all think of this idea, and what would be the best way for us to go about contributing something like this. We are excited to contribute and want to learn and follow your practices.

A specific issue we know of is that our changes need approval from Flink committers. Do you have a suggestion for how best to go about a new contribution like ours from a team that does not have committers? Is it possible, for example, to partner with a committer (or a small cohort) for tight engagement? We also know about ASF voting and the release process, but that doesn't seem to be as much of a potential hurdle.

Huge thanks in advance for sharing your thoughts!

Claire
[jira] [Created] (FLINK-34439) Move chown operations to COPY commands in Dockerfile
Mate Czagany created FLINK-34439:
------------------------------------

             Summary: Move chown operations to COPY commands in Dockerfile
                 Key: FLINK-34439
                 URL: https://issues.apache.org/jira/browse/FLINK-34439
             Project: Flink
          Issue Type: Improvement
          Components: Kubernetes Operator
            Reporter: Mate Czagany

We can lower the size of the resulting operator container image if we don't run 'chown' commands in separate RUN commands inside the Dockerfile, but instead use the '--chown' argument of the COPY command.

Using 'RUN chown ...' will copy all the affected files, with their whole size, into a new layer, duplicating the files from the previous COPY command. Example:

{code:java}
$ docker image history ghcr.io/apache/flink-kubernetes-operator:ccb10b8
...
3 months ago   RUN /bin/sh -c chown -R flink:flink $FLINK...   116MB   buildkit.dockerfile.v0
...
{code}

This would mean a 20% reduction in image size.
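The change the ticket proposes can be sketched as follows. This is a hedged illustration, not the operator's actual Dockerfile: the jar name and destination path are placeholders.

```dockerfile
# Hypothetical sketch -- file name and path are placeholders.

# Before: COPY writes the files into one layer, then RUN chown rewrites
# every affected file into a second layer, roughly doubling their footprint:
#   COPY flink-kubernetes-operator.jar /opt/flink/
#   RUN chown -R flink:flink /opt/flink

# After: ownership is set in the same layer as the copy,
# so each file is stored only once:
COPY --chown=flink:flink flink-kubernetes-operator.jar /opt/flink/
```

`COPY --chown` is supported by both the classic builder and BuildKit for Linux images, so no separate layer is ever created for the ownership change.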
[jira] [Created] (FLINK-34438) Kubernetes Operator doesn't wait for TaskManager deletion in native mode
Mate Czagany created FLINK-34438:
------------------------------------

             Summary: Kubernetes Operator doesn't wait for TaskManager deletion in native mode
                 Key: FLINK-34438
                 URL: https://issues.apache.org/jira/browse/FLINK-34438
             Project: Flink
          Issue Type: Bug
          Components: Kubernetes Operator
    Affects Versions: kubernetes-operator-1.6.1, kubernetes-operator-1.7.0, kubernetes-operator-1.8.0
            Reporter: Mate Czagany

This issue was partly fixed in FLINK-32334, but native mode was not included in the fix. I don't see any downsides to adding the same check to native deployment mode, which would make sure that all TaskManagers are deleted when we shut down a Flink cluster.

There should also be some logs indicating that the timeout was exceeded, instead of silently returning when waiting for the cluster to shut down.

An issue that seems to be related to this was also mentioned on the mailing list: https://lists.apache.org/thread/4gwj4ob4n9zg7b90vnqohj8x1p0bb5cb
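The behavior the ticket asks for can be sketched generically: poll until no TaskManager pods remain, and log when the timeout is exceeded rather than returning silently. This is a simplified JDK-only sketch, not the operator's actual code; the method and parameter names are invented.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

public class TaskManagerShutdownSketch {

    // Hypothetical sketch of the desired check (cf. FLINK-32334 for the
    // standalone-mode fix): poll the remaining TaskManager pod count until it
    // reaches zero, and emit a log line on timeout instead of silently returning.
    static boolean waitForTaskManagerDeletion(IntSupplier remainingPods, Duration timeout)
            throws InterruptedException {
        Instant deadline = Instant.now().plus(timeout);
        while (remainingPods.getAsInt() > 0) {
            if (Instant.now().isAfter(deadline)) {
                System.err.println(
                        "Timed out after " + timeout + " waiting for TaskManager pods to be deleted");
                return false;
            }
            Thread.sleep(10); // real code would use the operator's reconcile interval
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate three pods disappearing one per poll.
        AtomicInteger pods = new AtomicInteger(3);
        boolean deleted = waitForTaskManagerDeletion(pods::getAndDecrement, Duration.ofSeconds(5));
        System.out.println(deleted); // true
    }
}
```

In the real operator the pod count would come from a Kubernetes client listing TaskManager pods by label selector; the sketch only shows the wait-and-log structure.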
[jira] [Created] (FLINK-34437) Typo in SQL Client - `s/succeed/succeeded`
Robin Moffatt created FLINK-34437:
-------------------------------------

             Summary: Typo in SQL Client - `s/succeed/succeeded`
                 Key: FLINK-34437
                 URL: https://issues.apache.org/jira/browse/FLINK-34437
             Project: Flink
          Issue Type: Bug
          Components: Table SQL / Client
    Affects Versions: 1.18.1
            Reporter: Robin Moffatt

```sql
Flink SQL> CREATE CATALOG c_new WITH ('type'='generic_in_memory');
[INFO] Execute statement succeed.
```

`Execute statement succeed.` is grammatically incorrect and should read `Execute statement succeeded.`

https://github.com/apache/flink/blob/5844092408d21023a738077d0922cc75f1e634d7/flink-table/flink-sql-client/src/main/java/org/apache/flink/table/client/cli/CliStrings.java#L214
[jira] [Created] (FLINK-34436) Avro schema evolution and compatibility issues
Jacek Wislicki created FLINK-34436:
--------------------------------------

             Summary: Avro schema evolution and compatibility issues
                 Key: FLINK-34436
                 URL: https://issues.apache.org/jira/browse/FLINK-34436
             Project: Flink
          Issue Type: Bug
          Components: Connectors / Pulsar
    Affects Versions: 1.17.2
            Reporter: Jacek Wislicki

We noticed a couple of critical issues in the Pulsar-Flink connector related to schema evolution and compatibility. Please see the MRE available at https://github.com/JacekWislicki/test11. More details are in the project's README file; here is the summary:

Library versions:
* Pulsar 3.0.1
* Flink 1.17.2
* Pulsar-Flink connector 4.1.0-1.17

Problems:
* An exception is thrown when the schema's fields are added/removed
* Avro's enum default value is ignored; the last known value is applied instead

I believe I observed the same behaviour in Pulsar itself, but for now we are focusing on the connector, where I was able to document the problems.
[jira] [Created] (FLINK-34435) Bump org.yaml:snakeyaml from 1.31 to 2.2 for flink-connector-elasticsearch
Martijn Visser created FLINK-34435:
--------------------------------------

             Summary: Bump org.yaml:snakeyaml from 1.31 to 2.2 for flink-connector-elasticsearch
                 Key: FLINK-34435
                 URL: https://issues.apache.org/jira/browse/FLINK-34435
             Project: Flink
          Issue Type: Technical Debt
          Components: Connectors / ElasticSearch
            Reporter: Martijn Visser
            Assignee: Martijn Visser

https://github.com/apache/flink-connector-elasticsearch/pull/90
Re: [VOTE] Release flink-connector-parent, release candidate #1
Hi all,

I fixed the source release [1] as requested; it no longer contains the tools/release/shared directory. I found out why it contained that directory: the parent_pom branch was referring to an incorrect sub-module mount point for the release_utils branch (cf. FLINK-34364 [2]). Here is the fixing PR [3].

By the way, I noticed that all the connectors' source releases contain an empty tools/releasing directory, because only tools/releasing/shared is excluded in the source release script and not the whole tools/releasing directory. It seems a bit messy to me, so I think we should fix that in the release scripts later on for the next connector releases.

I also found out that the RC1 tag was pointing to my fork instead of the main repo, so I remade the tag [4].

Apart from that, the code and artifacts have not changed, so I did not invalidate RC1. Please confirm that I can proceed with the release.

Best,
Etienne

[1] https://dist.apache.org/repos/dist/dev/flink/flink-connector-parent-1.1.0-rc1/
[2] https://issues.apache.org/jira/browse/FLINK-34364
[3] https://github.com/apache/flink-connector-shared-utils/pull/36
[4] https://github.com/apache/flink-connector-shared-utils/releases/tag/v1.1.0-rc1

On 05/02/2024 at 12:36, Etienne Chauchot wrote:
> Hi,
>
> I just got back from vacation. I'll close the vote thread and proceed to the release later this week. Here is the ticket: https://issues.apache.org/jira/browse/FLINK-34364
>
> Best,
> Etienne
>
> On 04/02/2024 at 05:06, Qingsheng Ren wrote:
>> +1 (binding)
>>
>> - Verified checksum and signature
>> - Verified pom content
>> - Built flink-connector-kafka from source with the parent pom in staging
>>
>> Best,
>> Qingsheng
>>
>> On Thu, Feb 1, 2024 at 11:19 PM Chesnay Schepler wrote:
>>> - checked source/maven pom contents
>>>
>>> Please file a ticket to exclude tools/release from the source release.
>>>
>>> +1 (binding)
>>>
>>> On 29/01/2024 15:59, Maximilian Michels wrote:
>>>> - Inspected the source for licenses and corresponding headers
>>>> - Checksums and signature OK
>>>>
>>>> +1 (binding)
>>>>
>>>> On Tue, Jan 23, 2024 at 4:08 PM Etienne Chauchot wrote:
>>>>> Hi everyone,
>>>>>
>>>>> Please review and vote on the release candidate #1 for the version 1.1.0, as follows:
>>>>> [ ] +1, Approve the release
>>>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>>>
>>>>> The complete staging area is available for your review, which includes:
>>>>> * JIRA release notes [1],
>>>>> * the official Apache source release to be deployed to dist.apache.org [2], which is signed with the key with fingerprint D1A76BA19D6294DD0033F6843A019F0B8DD163EA [3],
>>>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>>>> * source code tag v1.1.0-rc1 [5],
>>>>> * website pull request listing the new release [6],
>>>>> * Confluence wiki: connector parent upgrade to version 1.1.0, which will be validated after the artifact is released (there is no PR mechanism on the wiki) [7]
>>>>>
>>>>> The vote will be open for at least 72 hours. It is adopted by majority approval, with at least 3 PMC affirmative votes.
>>>>>
>>>>> Thanks,
>>>>> Etienne
>>>>>
>>>>> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353442
>>>>> [2] https://dist.apache.org/repos/dist/dev/flink/flink-connector-parent-1.1.0-rc1
>>>>> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
>>>>> [4] https://repository.apache.org/content/repositories/orgapacheflink-1698/
>>>>> [5] https://github.com/apache/flink-connector-shared-utils/releases/tag/v1.1.0-rc1
>>>>> [6] https://github.com/apache/flink-web/pull/717
>>>>> [7] https://cwiki.apache.org/confluence/display/FLINK/Externalized+Connector+development
[jira] [Created] (FLINK-34434) DefaultSlotStatusSyncer doesn't complete the returned future
Matthias Pohl created FLINK-34434:
-------------------------------------

             Summary: DefaultSlotStatusSyncer doesn't complete the returned future
                 Key: FLINK-34434
                 URL: https://issues.apache.org/jira/browse/FLINK-34434
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Coordination
    Affects Versions: 1.18.1, 1.17.2, 1.19.0, 1.20.0
            Reporter: Matthias Pohl

When looking into FLINK-34427 (unrelated), I noticed an odd line in [DefaultSlotStatusSyncer:155|https://github.com/apache/flink/blob/15fe1653acec45d7c7bac17071e9773a4aa690a4/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/DefaultSlotStatusSyncer.java#L155] where we complete a future that should already be completed (because the callback is triggered after the {{requestFuture}} has already completed in some way). Shouldn't we complete the {{returnedFuture}} instead?

I'm keeping the priority at {{Major}} because it doesn't seem to have been an issue in the past.
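The pattern the ticket describes can be illustrated with a heavily simplified, JDK-only sketch; this is not the actual DefaultSlotStatusSyncer code, and the method name is invented. An inner {{requestFuture}} is chained to an outer {{returnedFuture}} that the caller observes, and the point is that inside the callback the inner future is by definition already complete, so completing it again is a no-op:

```java
import java.util.concurrent.CompletableFuture;

public class SlotStatusSyncerSketch {

    // Simplified sketch of the future-chaining shape discussed in the ticket.
    static CompletableFuture<Void> allocateSlot(CompletableFuture<Void> requestFuture) {
        CompletableFuture<Void> returnedFuture = new CompletableFuture<>();
        requestFuture.whenComplete(
                (result, throwable) -> {
                    if (throwable != null) {
                        returnedFuture.completeExceptionally(throwable);
                    } else {
                        // By the time this callback runs, requestFuture is already
                        // complete, so requestFuture.complete(null) here would be a
                        // no-op. The outer future is the one the caller waits on:
                        returnedFuture.complete(null);
                    }
                });
        return returnedFuture;
    }

    public static void main(String[] args) {
        CompletableFuture<Void> request = new CompletableFuture<>();
        CompletableFuture<Void> returned = allocateSlot(request);
        System.out.println(returned.isDone()); // false: caller is still waiting
        request.complete(null);
        System.out.println(returned.isDone()); // true: the right future unblocks the caller
    }
}
```

If the callback instead completed the inner future, the outer future would never complete on the success path, which matches the symptom the ticket title describes.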
[jira] [Created] (FLINK-34433) CollectionFunctionsITCase.test failed due to job restart
Matthias Pohl created FLINK-34433:
-------------------------------------

             Summary: CollectionFunctionsITCase.test failed due to job restart
                 Key: FLINK-34433
                 URL: https://issues.apache.org/jira/browse/FLINK-34433
             Project: Flink
          Issue Type: Bug
          Components: Table SQL / Planner
    Affects Versions: 1.19.0, 1.20.0
            Reporter: Matthias Pohl

https://github.com/apache/flink/actions/runs/7880739697/job/21503460772#step:10:11312

{code}
Error: 02:33:24 02:33:24.955 [ERROR] Tests run: 439, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 56.57 s <<< FAILURE! -- in org.apache.flink.table.planner.functions.CollectionFunctionsITCase
Error: 02:33:24 02:33:24.956 [ERROR] org.apache.flink.table.planner.functions.CollectionFunctionsITCase.test(TestCase)[81] -- Time elapsed: 1.141 s <<< ERROR!
Feb 13 02:33:24 java.lang.RuntimeException: Job restarted
Feb 13 02:33:24 	at org.apache.flink.streaming.api.operators.collect.UncheckpointedCollectResultBuffer.sinkRestarted(UncheckpointedCollectResultBuffer.java:42)
Feb 13 02:33:24 	at org.apache.flink.streaming.api.operators.collect.AbstractCollectResultBuffer.dealWithResponse(AbstractCollectResultBuffer.java:87)
Feb 13 02:33:24 	at org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.next(CollectResultFetcher.java:124)
Feb 13 02:33:24 	at org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:126)
Feb 13 02:33:24 	at org.apache.flink.streaming.api.operators.collect.CollectResultIterator.hasNext(CollectResultIterator.java:100)
Feb 13 02:33:24 	at org.apache.flink.table.planner.connectors.CollectDynamicSink$CloseableRowIteratorWrapper.hasNext(CollectDynamicSink.java:247)
Feb 13 02:33:24 	at org.assertj.core.internal.Iterators.assertHasNext(Iterators.java:49)
Feb 13 02:33:24 	at org.assertj.core.api.AbstractIteratorAssert.hasNext(AbstractIteratorAssert.java:60)
Feb 13 02:33:24 	at org.apache.flink.table.planner.functions.BuiltInFunctionTestBase$ResultTestItem.test(BuiltInFunctionTestBase.java:383)
Feb 13 02:33:24 	at org.apache.flink.table.planner.functions.BuiltInFunctionTestBase$TestSetSpec.lambda$getTestCase$4(BuiltInFunctionTestBase.java:341)
Feb 13 02:33:24 	at org.apache.flink.table.planner.functions.BuiltInFunctionTestBase$TestCase.execute(BuiltInFunctionTestBase.java:119)
Feb 13 02:33:24 	at org.apache.flink.table.planner.functions.BuiltInFunctionTestBase.test(BuiltInFunctionTestBase.java:99)
Feb 13 02:33:24 	at java.lang.reflect.Method.invoke(Method.java:498)
Feb 13 02:33:24 	at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
Feb 13 02:33:24 	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
Feb 13 02:33:24 	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
Feb 13 02:33:24 	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
Feb 13 02:33:24 	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
{code}
[ANNOUNCE] Flink 1.19 Cross-team testing & sync summary on 02/13/2024
Hi folks,

Enclosed please find the summary of the sync-up meeting.

*- Cross-team testing*

Unassigned release testing tickets [1] are looking for contributors to pick them up. The guideline for how to test each feature can be found in the ticket description. There are 3 release testing instruction tickets [2] still waiting for a response, since the authors are away for the Chinese New Year (the Year of the Loong).

*- Release-1.19 nightly build*

The Azure CI pipeline has been configured to trigger the 1.19 release branch nightly build; 1.16 has been removed [3].

*- CI issues*

There are some CI test or instability issues that are under evaluation [4]. A huge shoutout to Matthias for observing CI stability and continuously improving the release process.

Best regards,
Martijn, Lincoln, Yun, and Jing

[1] https://issues.apache.org/jira/browse/FLINK-34399?jql=project%20%3D%20FLINK%20AND%20parent%20%3D%20FLINK-34285%20AND%20labels%20%3D%20release-testing%20AND%20assignee%20is%20EMPTY%20%20ORDER%20BY%20updatedDate
[2] https://issues.apache.org/jira/browse/FLINK-34391?jql=project%20%3D%20FLINK%20AND%20parent%20%3D%20FLINK-34285%20AND%20labels%20is%20EMPTY%20AND%20status%20%3D%20OPEN%20%20ORDER%20BY%20updatedDate
[3] https://issues.apache.org/jira/browse/FLINK-34282
[4] https://cwiki.apache.org/confluence/display/FLINK/1.19+Release
[DISCUSS] FLIP suggestion: Flink SQL Scalar Functions.
Following this thread, https://lists.apache.org/thread/rkpvlnwj9gv1hvx1dyklx6k88qpnvk2t, I want to suggest a FLIP via Google Doc: https://docs.google.com/document/d/1Os0aRLAXYxmcO-GkaAfxlZgi3kipntA28tFoA4kdzno/edit?usp=sharing
[jira] [Created] (FLINK-34432) Re-enable forkReuse for flink-table-planner
Martijn Visser created FLINK-34432:
--------------------------------------

             Summary: Re-enable forkReuse for flink-table-planner
                 Key: FLINK-34432
                 URL: https://issues.apache.org/jira/browse/FLINK-34432
             Project: Flink
          Issue Type: Technical Debt
          Components: Table SQL / Client, Test Infrastructure, Tests
    Affects Versions: 1.19.0, 1.18.2, 1.20.0
            Reporter: Martijn Visser

With FLINK-18356 resolved, we should re-enable forkReuse for flink-table-planner to speed up the tests.