Re: [DISCUSS] Move CheckpointingMode to flink-core
Hi Lincoln,

Given that we have finished the testing for 1.19, I agree it is better not to merge this into 1.19. Thanks for the RMs' attention!

Hi Chesnay and Junrui,

Thanks for your advice. My original intention was to move the class as well as change the package, to make it clean, but that involves much more effort. Here are several options we have:

1. Move CheckpointingMode to flink-core and keep the same package. No deprecation and no API changes, but it leaves an 'org.apache.flink.streaming.api' package in flink-core.
2. Introduce a new CheckpointingMode in package 'org.apache.flink.core.execution' and deprecate the old one. Deprecate the corresponding getter/setter of 'CheckpointConfig' and introduce new ones with a similar but different name (e.g. set/getCheckpointMode). We would discuss the removal of those deprecations later in 2.x.
3. Based on 1, move CheckpointingMode to package 'org.apache.flink.core.execution' in 2.0. This is a breaking change that needs more discussion.

All of these would work. I'm slightly inclined to option 1, or option 3 if we all agree, since the new getter/setter may also cause confusion, so we cannot make the API entirely clean. WDYT?

Best,
Zakelly

On Wed, Feb 28, 2024 at 10:14 AM Junrui Lee wrote: > Hi Zakelly, > > I agree with Chesnay's response. I would suggest that during the process of > moving CheckpointingMode from the flink-streaming-java module to the > flink-core module, we should keep the package name unchanged. This approach > would be completely transparent to users. In fact, this practice should be > applicable to many of our desired moves from flink-streaming-java to > higher-level modules, such as flink-runtime and flink-core. > > Best, > Junrui > > Chesnay Schepler 于2024年2月28日周三 05:18写道: > > > Moving classes (== keep the same package) to a module higher up in the > > dependency tree should not be a breaking change and can imo be done > > anytime without any risk to users. 
> > > > On 27/02/2024 17:01, Lincoln Lee wrote: > > > Hi Zakelly, > > > > > > Thanks for letting us 1.19 RMs know about this! > > > > > > This change has been discussed during today's release sync meeting, we > > > suggest not merge it into 1.19. > > > We can continue discussing the removal in 2.x separately. > > > > > > Best, > > > Lincoln Lee > > > > > > > > > Hangxiang Yu 于2024年2月27日周二 11:28写道: > > > > > >> Hi, Zakelly. > > >> Thanks for driving this. > > >> Moving this class to flink-core makes sense to me which could make the > > code > > >> path and configs clearer. > > >> It's marked as @Public from 1.0 and 1.20 should be the next long-term > > >> version, so 1.19 should have been a suitable version to do it. > > >> And also look forward to thoughts of other developers/RMs since 1.19 > is > > >> currently under a feature freeze status. > > >> > > >> On Mon, Feb 26, 2024 at 6:42 PM Zakelly Lan > > wrote: > > >> > > >>> Hi devs, > > >>> > > >>> When working on the FLIP-406[1], I realized that moving all options > of > > >>> ExecutionCheckpointingOptions(flink-streaming-java) to > > >>> CheckpointingOptions(flink-core) depends on relocating the > > >>> enum CheckpointingMode(flink-streaming-java) to flink-core module. > > >> However, > > >>> the CheckpointingMode is annotated as @Public and used by datastream > > api > > >>> like 'CheckpointConfig#setCheckpointingMode'. So I'd like to start a > > >>> discussion on moving the CheckpointingMode to flink-core. It is in a > > >> little > > >>> bit of a hurry if we want the old enum to be entirely removed in > Flink > > >> 2.x > > >>> series, since the deprecation should be shipped in the upcoming Flink > > >> 1.19. > > >>> I suggest not creating a dedicated FLIP and treating this as a > sub-task > > >> of > > >>> FLIP-406. > > >>> > > >>> I prepared a minimal change of providing new APIs and deprecating the > > old > > >>> ones[2], which could be merged to 1.19 if we agree to do so. 
> > >>> > > >>> Looking forward to your thoughts! Also cc RMs of 1.19 about this. > > >>> > > >>> [1] > > >>> > > >> > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284789560 > > >>> [2] > > >>> > > >>> > > >> > > > https://github.com/apache/flink/commit/9bdd237d0322df8853f1b9e6ae658f77b9175237 > > >>> Best, > > >>> Zakelly > > >>> > > >> > > >> -- > > >> Best, > > >> Hangxiang. > > >> > > > > >
Re: [ANNOUNCE] Flink 1.19 Cross-team testing completed & sync summary on 02/27/2024
Thanks Lincoln and all contributors! Best, Jingsong On Wed, Feb 28, 2024 at 12:15 PM Xintong Song wrote: > > Matthias has already said it on the release sync, but I still want to say > it again. It's amazing how smooth the release testing goes for this > release. Great thanks to the release managers and all contributors who make > this happen. > > Best, > > Xintong > > > > On Wed, Feb 28, 2024 at 10:10 AM Lincoln Lee wrote: > > > Hi devs, > > > > I'd like to share some highlights from the release sync on 02/27/2024 > > > > - Cross-team testing > > > > We've finished all of the testing work[1]. Huge thanks to all contributors > > and volunteers for the effort on this! > > > > - Blockers > > > > Two api change merge requests[2][3] had been discussed, there was an > > agreement on the second pr, as it is a fix for an > > unintended behavior newly introduced in 1.19, and we need to avoid > > releasing it to users. For the 1st pr, we suggest continue the discussing > > separately. > > So we will wait for [3] done and then create the first release candidate > > 1.19.0-rc1(expecting within this week if no new blockers). > > > > - Release notes > > > > Revision to the draft version of the release note[4] has been closed, and > > the formal pr[5] has been submitted, > > also the release announcement pr will be ready later this week, please > > continue to help review before 1.19 release, thanks! > > > > - Sync meeting (https://meet.google.com/vcx-arzs-trv) > > > > We've already switched to weekly release sync, so the next release sync > > will be on Mar 5th, 2024. Feel free to join! 
> > > > [1] https://issues.apache.org/jira/browse/FLINK-34285 > > [2] https://lists.apache.org/thread/2llhhbkcx5w7chp3d6cthoqc8kwfvw6x > > [3] > > https://github.com/apache/flink/pull/24387#pullrequestreview-1902749309 > > [4] > > https://docs.google.com/document/d/1HLF4Nhvkln4zALKJdwRErCnPzufh7Z3BhhkWlk9Zh7w > > [5] https://github.com/apache/flink/pull/24394 > > > > Best, > > Yun, Jing, Martijn and Lincoln > >
Re: [ANNOUNCE] Flink 1.19 Cross-team testing completed & sync summary on 02/27/2024
Matthias has already said it on the release sync, but I still want to say it again. It's amazing how smooth the release testing goes for this release. Great thanks to the release managers and all contributors who make this happen. Best, Xintong On Wed, Feb 28, 2024 at 10:10 AM Lincoln Lee wrote: > Hi devs, > > I'd like to share some highlights from the release sync on 02/27/2024 > > - Cross-team testing > > We've finished all of the testing work[1]. Huge thanks to all contributors > and volunteers for the effort on this! > > - Blockers > > Two api change merge requests[2][3] had been discussed, there was an > agreement on the second pr, as it is a fix for an > unintended behavior newly introduced in 1.19, and we need to avoid > releasing it to users. For the 1st pr, we suggest continue the discussing > separately. > So we will wait for [3] done and then create the first release candidate > 1.19.0-rc1(expecting within this week if no new blockers). > > - Release notes > > Revision to the draft version of the release note[4] has been closed, and > the formal pr[5] has been submitted, > also the release announcement pr will be ready later this week, please > continue to help review before 1.19 release, thanks! > > - Sync meeting (https://meet.google.com/vcx-arzs-trv) > > We've already switched to weekly release sync, so the next release sync > will be on Mar 5th, 2024. Feel free to join! > > [1] https://issues.apache.org/jira/browse/FLINK-34285 > [2] https://lists.apache.org/thread/2llhhbkcx5w7chp3d6cthoqc8kwfvw6x > [3] > https://github.com/apache/flink/pull/24387#pullrequestreview-1902749309 > [4] > https://docs.google.com/document/d/1HLF4Nhvkln4zALKJdwRErCnPzufh7Z3BhhkWlk9Zh7w > [5] https://github.com/apache/flink/pull/24394 > > Best, > Yun, Jing, Martijn and Lincoln >
Re: [VOTE] FLIP-314: Support Customized Job Lineage Listener
+1 (non-binding) Thanks for consolidating the opinions from both Flink and OpenLineage communities. Look forward to the collaboration. Best Regards Peter Huang On Tue, Feb 27, 2024 at 6:13 PM Yong Fang wrote: > Hi devs, > > I would like to restart a vote about FLIP-314: Support Customized Job > Lineage Listener[1]. > > Previously, we added lineage related interfaces in FLIP-314. Before the > interfaces were developed and merged into the master, @Maciej and > @Zhenqiu provided valuable suggestions for the interface from the > perspective of the lineage system. So we updated the interfaces of FLIP-314 > and discussed them again in the discussion thread [2]. > > So I am here to initiate a new vote on FLIP-314, the vote will be open for > at least 72 hours unless there is an objection or insufficient votes > > [1] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener > [2] https://lists.apache.org/thread/wopprvp3ww243mtw23nj59p57cghh7mc > > Best, > Fang Yong >
[jira] [Created] (FLINK-34535) Support JobPlanInfo for the explain result
yuanfenghu created FLINK-34535: -- Summary: Support JobPlanInfo for the explain result Key: FLINK-34535 URL: https://issues.apache.org/jira/browse/FLINK-34535 Project: Flink Issue Type: Improvement Components: Table SQL / Planner Reporter: yuanfenghu

In the Flink SQL EXPLAIN syntax, we can set ExplainDetails to get the JSON_EXECUTION_PLAN, but we cannot get the JobPlanInfo. If the EXPLAIN result could include this information, referring to JobPlanInfo, I could combine it with the parameter `pipeline.jobvertex-parallelism-overrides` to set up my task parallelism.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
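For context, `pipeline.jobvertex-parallelism-overrides` takes per-vertex overrides keyed by job vertex id. A hedged sketch of what this might look like in the Flink configuration — the vertex ids below are placeholders, and the exact `id:parallelism` separator syntax should be checked against the option's documentation:

```yaml
# Hypothetical override: pin two job vertices (placeholder ids taken
# from a job plan) to parallelism 4 and 8 respectively.
pipeline.jobvertex-parallelism-overrides: bc764cd8ddf7a0cff126f51c16239658:4,ea632d67b7d595e5b851708ae9ad79d6:8
```

This is why the reporter wants the vertex ids exposed via EXPLAIN: without them, the override map cannot be written ahead of submission.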
Flink kinesis connector v4.2.0-1.18 has issue when consuming from EFO consumer
Hi,

I used the flink-connector-kinesis (4.0.2-1.18) to consume from Kinesis. The job can start but will fail within 1 hour; the detailed error log is attached. When I changed the version of the flink-connector-kinesis to `1.15.2`, everything settled. Any ideas on how to fix it?

create table kafka_event_v1 (
  `timestamp` bigint,
  serverTimestamp bigint,
  name string,
  data string,
  app string,
  `user` string,
  context string,
  browser row<
    url string,
    referrer string,
    userAgent string,
    `language` string,
    title string,
    viewportWidth int,
    viewportHeight int,
    contentWidth int,
    contentHeight int,
    cookies map,
    name string,
    version string,
    device row<model string, type string, vendor string>,
    engine row<name string, version string>,
    os row<name string, version string>
  >,
  abtests map,
  apikey string,
  lifecycleId string,
  sessionId string,
  instanceId string,
  requestId string,
  eventId string,
  `trigger` string,
  virtualId string,
  accountId string,
  ip string,
  serverTimestampLtz as to_timestamp(from_unixtime(serverTimestamp / 1000)),
  watermark for serverTimestampLtz as serverTimestampLtz - interval '5' second
) with (
  'connector' = 'kafka',
  'properties.bootstrap.servers' = 'shared.kafka.smartnews.internal:9093',
  'topic' = 'shared-cluster-sn-pixel-event-v1-dev',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json',
  'json.ignore-parse-errors' = 'true',
  'properties.group.id' = 'event_v2',
  'properties.security.protocol' = 'SASL_SSL',
  'properties.sasl.mechanism' = 'SNTOKEN',
  'properties.sasl.login.class' = 'com.smartnews.dp.kafka.security.sn.auth.SnTokenLogin',
  'properties.sasl.login.callback.handler.class' = 'com.smartnews.dp.kafka.security.sn.auth.SnTokenCallbackHandler',
  'properties.sasl.client.callback.handler.class' = 'com.smartnews.dp.kafka.security.sn.sasl.SnTokenSaslClientCallbackHandler',
  'properties.sasl.jaas.config' = 'com.smartnews.dp.kafka.security.sn.auth.SnTokenLoginModule required username="sn-pixel" password="aQXJcNUsCuIZpICHO9bQ" env="prd";',
  'properties.ssl.truststore.type' = 'PEM'
);

create catalog iceberg_dev with (
  'type' = 'iceberg',
  'catalog-type' = 'hive',
  'uri' = 'thrift://dev-hive-metastore.smartnews.internal:9083',
  'warehouse' = 's3a://smartnews-dmp/warehouse/development'
);

insert into iceberg_dev.pixel.event_v2 /*+ options(
  'partition.time-extractor.timestamp-pattern' = '$dt 00:00:00',
  'sink.partition-commit.policy.kind' = 'metastore,success-file',
  'auto-compaction' = 'true'
) */
select
  `timestamp`, serverTimestamp, data, app, `user`, context, browser, abtests,
  lifecycleId, sessionId, instanceId, requestId, eventId, `trigger`,
  virtualId, accountId, ip,
  date_format(serverTimestampLtz, 'yyyy-MM-dd') dt,
  apikey, name
from default_catalog.default_database.kafka_event_v1;
Re: Requesting link to join Slack Community
Hi, Geetesh. Here is the invite link : https://join.slack.com/t/apache-flink/shared_invite/zt-1t4khgllz-Fm1CnXzdBbUchBz4HzJCAg . I will raise a PR to update the link. Best, Hang Geetesh Nikhade 于2024年2月28日周三 04:49写道: > Hi Folks, > > I would like to join the Apache Flink Community on Slack, but it looks > like the link shared on official flink website seems to have expired. Can > someone please share a new join link? or let me know what would be the best > way to get that link? > > Thanks in advance. > > Best, > Geetesh >
Re: [DISCUSS] Move CheckpointingMode to flink-core
Hi Zakelly, I agree with Chesnay's response. I would suggest that during the process of moving CheckpointingMode from the flink-streaming-java module to the flink-core module, we should keep the package name unchanged. This approach would be completely transparent to users. In fact, this practice should be applicable to many of our desired moves from flink-streaming-java to higher-level modules, such as flink-runtime and flink-core. Best, Junrui Chesnay Schepler 于2024年2月28日周三 05:18写道: > Moving classes (== keep the same package) to a module higher up in the > dependency tree should not be a breaking change and can imo be done > anytime without any risk to users. > > On 27/02/2024 17:01, Lincoln Lee wrote: > > Hi Zakelly, > > > > Thanks for letting us 1.19 RMs know about this! > > > > This change has been discussed during today's release sync meeting, we > > suggest not merge it into 1.19. > > We can continue discussing the removal in 2.x separately. > > > > Best, > > Lincoln Lee > > > > > > Hangxiang Yu 于2024年2月27日周二 11:28写道: > > > >> Hi, Zakelly. > >> Thanks for driving this. > >> Moving this class to flink-core makes sense to me which could make the > code > >> path and configs clearer. > >> It's marked as @Public from 1.0 and 1.20 should be the next long-term > >> version, so 1.19 should have been a suitable version to do it. > >> And also look forward to thoughts of other developers/RMs since 1.19 is > >> currently under a feature freeze status. > >> > >> On Mon, Feb 26, 2024 at 6:42 PM Zakelly Lan > wrote: > >> > >>> Hi devs, > >>> > >>> When working on the FLIP-406[1], I realized that moving all options of > >>> ExecutionCheckpointingOptions(flink-streaming-java) to > >>> CheckpointingOptions(flink-core) depends on relocating the > >>> enum CheckpointingMode(flink-streaming-java) to flink-core module. > >> However, > >>> the CheckpointingMode is annotated as @Public and used by datastream > api > >>> like 'CheckpointConfig#setCheckpointingMode'. 
So I'd like to start a > >>> discussion on moving the CheckpointingMode to flink-core. It is in a > >> little > >>> bit of a hurry if we want the old enum to be entirely removed in Flink > >> 2.x > >>> series, since the deprecation should be shipped in the upcoming Flink > >> 1.19. > >>> I suggest not creating a dedicated FLIP and treating this as a sub-task > >> of > >>> FLIP-406. > >>> > >>> I prepared a minimal change of providing new APIs and deprecating the > old > >>> ones[2], which could be merged to 1.19 if we agree to do so. > >>> > >>> Looking forward to your thoughts! Also cc RMs of 1.19 about this. > >>> > >>> [1] > >>> > >> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284789560 > >>> [2] > >>> > >>> > >> > https://github.com/apache/flink/commit/9bdd237d0322df8853f1b9e6ae658f77b9175237 > >>> Best, > >>> Zakelly > >>> > >> > >> -- > >> Best, > >> Hangxiang. > >> > >
[VOTE] FLIP-314: Support Customized Job Lineage Listener
Hi devs,

I would like to restart a vote on FLIP-314: Support Customized Job Lineage Listener[1].

Previously, we added lineage-related interfaces in FLIP-314. Before the interfaces were developed and merged into master, @Maciej and @Zhenqiu provided valuable suggestions for the interfaces from the perspective of the lineage system, so we updated the interfaces of FLIP-314 and discussed them again in the discussion thread [2].

So I am here to initiate a new vote on FLIP-314. The vote will be open for at least 72 hours unless there is an objection or insufficient votes.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener
[2] https://lists.apache.org/thread/wopprvp3ww243mtw23nj59p57cghh7mc

Best,
Fang Yong
[ANNOUNCE] Flink 1.19 Cross-team testing completed & sync summary on 02/27/2024
Hi devs,

I'd like to share some highlights from the release sync on 02/27/2024.

- Cross-team testing

We've finished all of the testing work[1]. Huge thanks to all contributors and volunteers for the effort on this!

- Blockers

Two API-change merge requests[2][3] were discussed. There was agreement on the second PR, as it fixes an unintended behavior newly introduced in 1.19 that we need to avoid releasing to users. For the first PR, we suggest continuing the discussion separately. So we will wait for [3] to be done and then create the first release candidate 1.19.0-rc1 (expected within this week if no new blockers).

- Release notes

Review of the draft release notes[4] has been closed and the formal PR[5] has been submitted; the release announcement PR will also be ready later this week. Please continue to help review before the 1.19 release, thanks!

- Sync meeting (https://meet.google.com/vcx-arzs-trv)

We've already switched to a weekly release sync, so the next release sync will be on Mar 5th, 2024. Feel free to join!

[1] https://issues.apache.org/jira/browse/FLINK-34285
[2] https://lists.apache.org/thread/2llhhbkcx5w7chp3d6cthoqc8kwfvw6x
[3] https://github.com/apache/flink/pull/24387#pullrequestreview-1902749309
[4] https://docs.google.com/document/d/1HLF4Nhvkln4zALKJdwRErCnPzufh7Z3BhhkWlk9Zh7w
[5] https://github.com/apache/flink/pull/24394

Best,
Yun, Jing, Martijn and Lincoln
Re: [DISCUSS] Move CheckpointingMode to flink-core
Moving classes (== keep the same package) to a module higher up in the dependency tree should not be a breaking change and can imo be done anytime without any risk to users. On 27/02/2024 17:01, Lincoln Lee wrote: Hi Zakelly, Thanks for letting us 1.19 RMs know about this! This change has been discussed during today's release sync meeting, we suggest not merge it into 1.19. We can continue discussing the removal in 2.x separately. Best, Lincoln Lee Hangxiang Yu 于2024年2月27日周二 11:28写道: Hi, Zakelly. Thanks for driving this. Moving this class to flink-core makes sense to me which could make the code path and configs clearer. It's marked as @Public from 1.0 and 1.20 should be the next long-term version, so 1.19 should have been a suitable version to do it. And also look forward to thoughts of other developers/RMs since 1.19 is currently under a feature freeze status. On Mon, Feb 26, 2024 at 6:42 PM Zakelly Lan wrote: Hi devs, When working on the FLIP-406[1], I realized that moving all options of ExecutionCheckpointingOptions(flink-streaming-java) to CheckpointingOptions(flink-core) depends on relocating the enum CheckpointingMode(flink-streaming-java) to flink-core module. However, the CheckpointingMode is annotated as @Public and used by datastream api like 'CheckpointConfig#setCheckpointingMode'. So I'd like to start a discussion on moving the CheckpointingMode to flink-core. It is in a little bit of a hurry if we want the old enum to be entirely removed in Flink 2.x series, since the deprecation should be shipped in the upcoming Flink 1.19. I suggest not creating a dedicated FLIP and treating this as a sub-task of FLIP-406. I prepared a minimal change of providing new APIs and deprecating the old ones[2], which could be merged to 1.19 if we agree to do so. Looking forward to your thoughts! Also cc RMs of 1.19 about this. 
[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284789560 [2] https://github.com/apache/flink/commit/9bdd237d0322df8853f1b9e6ae658f77b9175237 Best, Zakelly -- Best, Hangxiang.
Requesting link to join Slack Community
Hi Folks,

I would like to join the Apache Flink community on Slack, but it looks like the link shared on the official Flink website has expired. Could someone please share a new join link, or let me know the best way to get one?

Thanks in advance.

Best,
Geetesh
Re: [DISCUSS] FLINK-34440 Support Debezium Protobuf Confluent Format
Hey Robert,

Thanks for your response. I have a partial implementation, just for the decoding portion. The code I have is pretty rough and doesn't include any of the refactors I mentioned, but the decoder logic does pull the schema from the schema registry and uses it to deserialize the DynamicMessage before converting it to RowData using a DynamicMessageToRowDataConverter class. For the other aspects, I would need to start from scratch on the encoder.

I would be very happy to see you drive the contribution back to open source from Confluent, or to collaborate on this.

Another topic is Protobuf's field ids. Ideally Flink would be idiomatic about not renumbering them in incompatible ways, similar to what's discussed on this Schema Registry issue: https://github.com/confluentinc/schema-registry/issues/2551

On Tue, Feb 27, 2024 at 5:51 AM Robert Metzger wrote: > Hi all, > > +1 to support the format in Flink. > > @Kevin: Do you already have an implementation for this inhouse that you are > looking to upstream, or would you start from scratch? > I'm asking because my current employer, Confluent, has a Protobuf Schema > registry implementation for Flink, and I could help drive contributing this > back to open source. > If you already have an implementation, let's decide which one to use :) > > Best, > Robert > > On Thu, Feb 22, 2024 at 2:05 PM David Radley > wrote: > > > Hi Kevin, > > Some thoughts on this. > > I suggested an Apicurio registry format in the dev list, and was advised > > to raise a FLIP for this, I suggest the same would apply here (or the > > alternative to FLIPs if you cannot raise one). I am prototyping an Avro > > Apicurio format, prior to raising the Flip, and notice that the > readSchema > > in the SchemaCoder only takes a byte array, but I need to pass down the > > Kafka headers (where the Apicurio globalId identifying the schema lives). 
> > > > I assume: > > > > * for the confluent Protobuf format you would extend the Protobuf > > format to drive some Schema Registry logic for Protobuf (similar to the way > > Avro does it) where the magic byte + schema id can be obtained and the > > schema looked up using the Confluent Schema registry. > > * It would be good if any protobuf format enhancements for Schema > > registries pass down the Kafka headers (I am thinking of a Map<String, > > Object> for Avro) as well as the message payload so Apicurio registry > could > > work with this. > > * It would make sense to have the Confluent schema lookup in common > > code, which is part of the SchemaCoder readSchema logic. > > * I assume the ProtobufSchemaCoder readSchema would return a Protobuf > > Schema object. > > > > > > > > I also wondered whether these Kafka only formats should be moved to the > > Kafka connector repo, or whether they might in the future be used outside > > Kafka – e.g. Avro/Protobuf files in a database. > > Kind regards, David. > > > > > > From: Kevin Lam > > Date: Wednesday, 21 February 2024 at 18:51 > > To: dev@flink.apache.org > > Subject: [EXTERNAL] [DISCUSS] FLINK-34440 Support Debezium Protobuf > > Confluent Format > > I would love to get some feedback from the community on this JIRA issue: > > https://issues.apache.org/jira/projects/FLINK/issues/FLINK-34440 > > > > I am looking into creating a PR and would appreciate some review on the > > approach. > > > > In terms of design I think we can mirror the `debezium-avro-confluent` and > > `avro-confluent` formats already available in Flink: > > > >1. `protobuf-confluent` format which uses DynamicMessage > >< > > > https://protobuf.dev/reference/java/api-docs/com/google/protobuf/DynamicMessage > > > > >for encoding and decoding. > > - For encoding the Flink RowType will be used to dynamically > create a > > Protobuf Schema and register it with the Confluent Schema > > Registry. 
It will > > use the same schema to construct a DynamicMessage and serialize it. > > - For decoding, the schema will be fetched from the registry and > use > > DynamicMessage to deserialize and convert the Protobuf object to a > > Flink > > RowData. > > - Note: here there is no external .proto file > >2. `debezium-avro-confluent` format which unpacks the Debezium > Envelope > >and collects the appropriate UPDATE_BEFORE, UPDATE_AFTER, INSERT, > DELETE > >events. > > - We may be able to refactor and reuse code from the existing > > DebeziumAvroDeserializationSchema + DebeziumAvroSerializationSchema > > since > > the deser logic is largely delegated to and these Schemas are > > concerned > > with the handling the Debezium envelope. > >3. Move the Confluent Schema Registry Client code to a separate maven > >module, flink-formats/flink-confluent-common, and extend it to support > >ProtobufSchemaProvider > >< > > >
Re: [DISCUSS] Move CheckpointingMode to flink-core
Hi Zakelly, Thanks for letting us 1.19 RMs know about this! This change has been discussed during today's release sync meeting, we suggest not merge it into 1.19. We can continue discussing the removal in 2.x separately. Best, Lincoln Lee Hangxiang Yu 于2024年2月27日周二 11:28写道: > Hi, Zakelly. > Thanks for driving this. > Moving this class to flink-core makes sense to me which could make the code > path and configs clearer. > It's marked as @Public from 1.0 and 1.20 should be the next long-term > version, so 1.19 should have been a suitable version to do it. > And also look forward to thoughts of other developers/RMs since 1.19 is > currently under a feature freeze status. > > On Mon, Feb 26, 2024 at 6:42 PM Zakelly Lan wrote: > > > Hi devs, > > > > When working on the FLIP-406[1], I realized that moving all options of > > ExecutionCheckpointingOptions(flink-streaming-java) to > > CheckpointingOptions(flink-core) depends on relocating the > > enum CheckpointingMode(flink-streaming-java) to flink-core module. > However, > > the CheckpointingMode is annotated as @Public and used by datastream api > > like 'CheckpointConfig#setCheckpointingMode'. So I'd like to start a > > discussion on moving the CheckpointingMode to flink-core. It is in a > little > > bit of a hurry if we want the old enum to be entirely removed in Flink > 2.x > > series, since the deprecation should be shipped in the upcoming Flink > 1.19. > > I suggest not creating a dedicated FLIP and treating this as a sub-task > of > > FLIP-406. > > > > I prepared a minimal change of providing new APIs and deprecating the old > > ones[2], which could be merged to 1.19 if we agree to do so. > > > > Looking forward to your thoughts! Also cc RMs of 1.19 about this. > > > > [1] > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284789560 > > [2] > > > > > https://github.com/apache/flink/commit/9bdd237d0322df8853f1b9e6ae658f77b9175237 > > > > Best, > > Zakelly > > > > > -- > Best, > Hangxiang. >
[jira] [Created] (FLINK-34534) CLONE - Vote on the release candidate
lincoln lee created FLINK-34534: --- Summary: CLONE - Vote on the release candidate Key: FLINK-34534 URL: https://issues.apache.org/jira/browse/FLINK-34534 Project: Flink Issue Type: Sub-task Affects Versions: 1.17.0 Reporter: Matthias Pohl Assignee: Qingsheng Ren Fix For: 1.17.0 Once you have built and individually reviewed the release candidate, please share it for the community-wide review. Please review foundation-wide [voting guidelines|http://www.apache.org/foundation/voting.html] for more information. Start the review-and-vote thread on the dev@ mailing list. Here’s an email template; please adjust as you see fit. {quote}From: Release Manager To: dev@flink.apache.org Subject: [VOTE] Release 1.2.3, release candidate #3 Hi everyone, Please review and vote on the release candidate #3 for the version 1.2.3, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) The complete staging area is available for your review, which includes: * JIRA release notes [1], * the official Apache source release and binary convenience releases to be deployed to dist.apache.org [2], which are signed with the key with fingerprint [3], * all artifacts to be deployed to the Maven Central Repository [4], * source code tag "release-1.2.3-rc3" [5], * website pull request listing the new release and adding announcement blog post [6]. The vote will be open for at least 72 hours. It is adopted by majority approval, with at least 3 PMC affirmative votes. Thanks, Release Manager [1] link [2] link [3] [https://dist.apache.org/repos/dist/release/flink/KEYS] [4] link [5] link [6] link {quote} *If there are any issues found in the release candidate, reply on the vote thread to cancel the vote.* There’s no need to wait 72 hours. Proceed to the Fix Issues step below and address the problem. However, some issues don’t require cancellation. 
For example, if an issue is found in the website pull request, just correct it on the spot and the vote can continue as-is. For cancelling a release, the release manager needs to send an email to the release candidate thread, stating that the release candidate is officially cancelled. Next, all artifacts created specifically for the RC in the previous steps need to be removed: * Delete the staging repository in Nexus * Remove the source / binary RC files from dist.apache.org * Delete the source code tag in git *If there are no issues, reply on the vote thread to close the voting.* Then, tally the votes in a separate email. Here’s an email template; please adjust as you see fit. {quote}From: Release Manager To: dev@flink.apache.org Subject: [RESULT] [VOTE] Release 1.2.3, release candidate #3 I'm happy to announce that we have unanimously approved this release. There are XXX approving votes, XXX of which are binding: * approver 1 * approver 2 * approver 3 * approver 4 There are no disapproving votes. Thanks everyone! {quote} h3. Expectations * Community votes to release the proposed candidate, with at least three approving PMC votes Any issues that are raised till the vote is over should be either resolved or moved into the next release (if applicable). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34533) CLONE - Propose a pull request for website updates
lincoln lee created FLINK-34533: --- Summary: CLONE - Propose a pull request for website updates Key: FLINK-34533 URL: https://issues.apache.org/jira/browse/FLINK-34533 Project: Flink Issue Type: Sub-task Affects Versions: 1.17.0 Reporter: Matthias Pohl Assignee: Qingsheng Ren Fix For: 1.17.0 The final step of building the candidate is to propose a website pull request containing the following changes: # update [apache/flink-web:_config.yml|https://github.com/apache/flink-web/blob/asf-site/_config.yml] ## update {{FLINK_VERSION_STABLE}} and {{FLINK_VERSION_STABLE_SHORT}} as required ## update version references in quickstarts ({{{}q/{}}} directory) as required ## (major only) add a new entry to {{flink_releases}} for the release binaries and sources ## (minor only) update the entry for the previous release in the series in {{flink_releases}} ### Please pay attention to the ids assigned to the download entries. They should be unique and reflect their corresponding version number. ## add a new entry to {{release_archive.flink}} # add a blog post announcing the release in _posts # add an organized release notes page under docs/content/release-notes and docs/content.zh/release-notes (like [https://nightlies.apache.org/flink/flink-docs-release-1.15/release-notes/flink-1.15/]). The page is based on the non-empty release notes collected from the issues, and only the issues that affect existing users should be included (e.g., behavior changes, as opposed to new functionality). It should be in a separate PR since it would be merged to the flink project. (!) Don’t merge the PRs before finalizing the release. h3. Expectations * Website pull request proposed to list the [release|http://flink.apache.org/downloads.html] * (major only) Check {{docs/config.toml}} to ensure that ** the version constants refer to the new version ** the {{baseurl}} does not point to {{flink-docs-master}} but {{flink-docs-release-X.Y}} instead -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34532) CLONE - Stage source and binary releases on dist.apache.org
lincoln lee created FLINK-34532: --- Summary: CLONE - Stage source and binary releases on dist.apache.org Key: FLINK-34532 URL: https://issues.apache.org/jira/browse/FLINK-34532 Project: Flink Issue Type: Sub-task Reporter: Matthias Pohl Assignee: Qingsheng Ren Copy the source release to the dev repository of dist.apache.org: # If you have not already, check out the Flink section of the dev repository on dist.apache.org via Subversion. In a fresh directory: {code:bash} $ svn checkout https://dist.apache.org/repos/dist/dev/flink --depth=immediates {code} # Make a directory for the new release and copy all the artifacts (Flink source/binary distributions, hashes, GPG signatures and the python subdirectory) into that newly created directory: {code:bash} $ mkdir flink/flink-${RELEASE_VERSION}-rc${RC_NUM} $ mv /tools/releasing/release/* flink/flink-${RELEASE_VERSION}-rc${RC_NUM} {code} # Add and commit all the files. {code:bash} $ cd flink flink $ svn add flink-${RELEASE_VERSION}-rc${RC_NUM} flink $ svn commit -m "Add flink-${RELEASE_VERSION}-rc${RC_NUM}" {code} # Verify that files are present under [https://dist.apache.org/repos/dist/dev/flink|https://dist.apache.org/repos/dist/dev/flink]. # Push the release tag if not done already (the following command assumes to be called from within the apache/flink checkout): {code:bash} $ git push refs/tags/release-${RELEASE_VERSION}-rc${RC_NUM} {code} h3. Expectations * Maven artifacts deployed to the staging repository of [repository.apache.org|https://repository.apache.org/content/repositories/] * Source distribution deployed to the dev repository of [dist.apache.org|https://dist.apache.org/repos/dist/dev/flink/] * Check hashes (e.g. shasum -c *.sha512) * Check signatures (e.g. {{{}gpg --verify flink-1.2.3-source-release.tar.gz.asc flink-1.2.3-source-release.tar.gz{}}}) * {{grep}} for legal headers in each file. 
* If time allows, check in advance the NOTICE files of the modules whose dependencies have changed in this release, since license issues pop up from time to time during voting. See the "Checking License" section of [Verifying a Flink Release|https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Release]. -- This message was sent by Atlassian Jira (v8.20.10#820010)
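The hash and signature checks listed under Expectations can be run in one pass over the RC directory. A self-contained sketch of the checksum part, using a dummy artifact in place of the real release files (in the actual flink-${RELEASE_VERSION}-rc${RC_NUM} directory the loop runs over the staged .sha512 files as-is):

```shell
# Sketch of the "Check hashes" expectation. The dummy artifact below only
# makes the sketch runnable; the verification loop is the real check.
set -e
workdir=$(mktemp -d)
cd "$workdir"
echo "dummy artifact" > flink-1.2.3-src.tgz
sha512sum flink-1.2.3-src.tgz > flink-1.2.3-src.tgz.sha512
for f in *.sha512; do
  sha512sum -c "$f"        # prints "<file>: OK" on success, fails otherwise
done
# Signatures are checked the same way with gpg (requires the release
# manager's key from the KEYS file to be imported first):
#   for f in *.asc; do gpg --verify "$f" "${f%.asc}"; done
```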
[jira] [Created] (FLINK-34531) CLONE - Build and stage Java and Python artifacts
lincoln lee created FLINK-34531: --- Summary: CLONE - Build and stage Java and Python artifacts Key: FLINK-34531 URL: https://issues.apache.org/jira/browse/FLINK-34531 Project: Flink Issue Type: Sub-task Reporter: Matthias Pohl Assignee: Qingsheng Ren # Create a local release branch ((!) this step can not be skipped for minor releases): {code:bash} $ cd ./tools tools/ $ OLD_VERSION=$CURRENT_SNAPSHOT_VERSION NEW_VERSION=$RELEASE_VERSION RELEASE_CANDIDATE=$RC_NUM releasing/create_release_branch.sh {code} # Tag the release commit: {code:bash} $ git tag -s ${TAG} -m "${TAG}" {code} # We now need to do several things: ## Create the source release archive ## Deploy jar artefacts to the [Apache Nexus Repository|https://repository.apache.org/], which is the staging area for deploying the jars to Maven Central ## Build PyFlink wheel packages You might want to create a directory on your local machine for collecting the various source and binary releases before uploading them. Creating the binary releases is a lengthy process but you can do this on another machine (for example, in the "cloud"). When doing this, you can skip signing the release files on the remote machine, download them to your local machine and sign them there. # Build the source release: {code:bash} tools $ RELEASE_VERSION=$RELEASE_VERSION releasing/create_source_release.sh {code} # Stage the maven artifacts: {code:bash} tools $ releasing/deploy_staging_jars.sh {code} Review all staged artifacts ([https://repository.apache.org/]). They should contain all relevant parts for each module, including pom.xml, jar, test jar, source, test source, javadoc, etc. Carefully review any new artifacts. # Close the staging repository on Apache Nexus. When prompted for a description, enter “Apache Flink, version X, release candidate Y”. Then, you need to build the PyFlink wheel packages (since 1.11): # Set up an azure pipeline in your own Azure account. 
You can refer to [Azure Pipelines|https://cwiki.apache.org/confluence/display/FLINK/Azure+Pipelines#AzurePipelines-Tutorial:SettingupAzurePipelinesforaforkoftheFlinkrepository] for more details on how to set up azure pipeline for a fork of the Flink repository. Note that a google cloud mirror in Europe is used for downloading maven artifacts, therefore it is recommended to set your [Azure organization region|https://docs.microsoft.com/en-us/azure/devops/organizations/accounts/change-organization-location] to Europe to speed up the downloads. # Push the release candidate branch to your forked personal Flink repository, e.g. {code:bash} tools $ git push refs/heads/release-${RELEASE_VERSION}-rc${RC_NUM}:release-${RELEASE_VERSION}-rc${RC_NUM} {code} # Trigger the Azure Pipelines manually to build the PyFlink wheel packages ## Go to your Azure Pipelines Flink project → Pipelines ## Click the "New pipeline" button on the top right ## Select "GitHub" → your GitHub Flink repository → "Existing Azure Pipelines YAML file" ## Select your branch → Set path to "/azure-pipelines.yaml" → click on "Continue" → click on "Variables" ## Then click "New Variable" button, fill the name with "MODE", and the value with "release". Click "OK" to set the variable and the "Save" button to save the variables, then back on the "Review your pipeline" screen click "Run" to trigger the build. ## You should now see a build where only the "CI build (release)" is running # Download the PyFlink wheel packages from the build result page after the jobs of "build_wheels mac" and "build_wheels linux" have finished. 
## Download the PyFlink wheel packages ### Open the build result page of the pipeline ### Go to the {{Artifacts}} page (build_wheels linux -> 1 artifact) ### Click {{wheel_Darwin_build_wheels mac}} and {{wheel_Linux_build_wheels linux}} separately to download the zip files ## Unzip these two zip files {code:bash} $ cd /path/to/downloaded_wheel_packages $ unzip wheel_Linux_build_wheels\ linux.zip $ unzip wheel_Darwin_build_wheels\ mac.zip{code} ## Create directory {{./dist}} under the directory of {{{}flink-python{}}}: {code:bash} $ cd $ mkdir flink-python/dist{code} ## Move the unzipped wheel packages to the directory of {{{}flink-python/dist{}}}: {code:java} $ mv /path/to/wheel_Darwin_build_wheels\ mac/* flink-python/dist/ $ mv /path/to/wheel_Linux_build_wheels\ linux/* flink-python/dist/ $ cd tools{code} Finally, we create the binary convenience release files: {code:bash} tools $ RELEASE_VERSION=$RELEASE_VERSION releasing/create_binary_release.sh {code} If you want to run this step in parallel on a remote machine you have to make the release commit available there (for example by pushing to a repository). *This is important: the commit inside the binary builds has to match the commit of the source builds and the tagged release commit.* When building
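The bolded requirement above — the commit inside the binary builds must match the tagged release commit — can be verified with a quick comparison before kicking off a remote build. A self-contained sketch (a throwaway repo and an example tag name stand in for the real apache/flink checkout and ${TAG}):

```shell
# Sketch: confirm HEAD matches the release tag before building binaries
# remotely. The temp repo below exists only to make the sketch runnable.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git -c user.email=rm@example.invalid -c user.name=RM \
    commit -q --allow-empty -m "release commit"
TAG="release-1.19.0-rc1"   # example value; the real process uses git tag -s ${TAG}
git tag "${TAG}"
# The actual check: the commit the tag points at must equal HEAD.
[ "$(git rev-parse "${TAG}^{commit}")" = "$(git rev-parse HEAD)" ] \
  && echo "HEAD matches ${TAG}"
```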
[jira] [Created] (FLINK-34530) Build Release Candidate: 1.19.0-rc1
lincoln lee created FLINK-34530: --- Summary: Build Release Candidate: 1.19.0-rc1 Key: FLINK-34530 URL: https://issues.apache.org/jira/browse/FLINK-34530 Project: Flink Issue Type: New Feature Affects Versions: 1.17.0 Reporter: Matthias Pohl Assignee: Jing Ge Fix For: 1.17.0 The core of the release process is the build-vote-fix cycle. Each cycle produces one release candidate. The Release Manager repeats this cycle until the community approves one release candidate, which is then finalized. h4. Prerequisites Set up a few environment variables to simplify Maven commands that follow. This identifies the release candidate being built. Start with {{RC_NUM}} equal to 1 and increment it for each candidate: {code} RC_NUM="1" TAG="release-${RELEASE_VERSION}-rc${RC_NUM}" {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
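The prerequisite variables compose mechanically into the tag name used throughout the later steps. A minimal sketch, assuming RELEASE_VERSION=1.19.0 purely as an example (the Release Manager sets the real value):

```shell
# Sketch of the RC naming scheme from the prerequisites. 1.19.0 is an
# example value; RC_NUM starts at 1 and increments per candidate.
RELEASE_VERSION="1.19.0"
RC_NUM="1"
TAG="release-${RELEASE_VERSION}-rc${RC_NUM}"
echo "${TAG}"   # → release-1.19.0-rc1
```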
Re: [VOTE] Release flink-connector-parent 1.1.0 release candidate #2
+1 - pom contents - source contents - Website PR On 19/02/2024 18:33, Etienne Chauchot wrote: Hi everyone, Please review and vote on the release candidate #2 for the version 1.1.0, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) The complete staging area is available for your review, which includes: * JIRA release notes [1], * the official Apache source release to be deployed to dist.apache.org [2], which are signed with the key with fingerprint D1A76BA19D6294DD0033F6843A019F0B8DD163EA [3], * all artifacts to be deployed to the Maven Central Repository [4], * source code tag v1.1.0-rc2 [5], * website pull request listing the new release [6]. * confluence wiki: connector parent upgrade to version 1.1.0 that will be validated after the artifact is released (there is no PR mechanism on the wiki) [7] The vote will be open for at least 72 hours. It is adopted by majority approval, with at least 3 PMC affirmative votes. Thanks, Etienne [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353442 [2] https://dist.apache.org/repos/dist/dev/flink/flink-connector-parent-1.1.0-rc2 [3] https://dist.apache.org/repos/dist/release/flink/KEYS [4] https://repository.apache.org/content/repositories/orgapacheflink-1707 [5] https://github.com/apache/flink-connector-shared-utils/releases/tag/v1.1.0-rc2 [6] https://github.com/apache/flink-web/pull/717 [7] https://cwiki.apache.org/confluence/display/FLINK/Externalized+Connector+development
Re: [DISCUSS] FLINK-34440 Support Debezium Protobuf Confluent Format
Hi all, +1 to support the format in Flink. @Kevin: Do you already have an implementation for this in-house that you are looking to upstream, or would you start from scratch? I'm asking because my current employer, Confluent, has a Protobuf Schema registry implementation for Flink, and I could help drive contributing this back to open source. If you already have an implementation, let's decide which one to use :) Best, Robert On Thu, Feb 22, 2024 at 2:05 PM David Radley wrote: > Hi Kevin, > Some thoughts on this. > I suggested an Apicurio registry format in the dev list, and was advised > to raise a FLIP for this, I suggest the same would apply here (or the > alternative to FLIPs if you cannot raise one). I am prototyping an Avro > Apicurio format, prior to raising the FLIP, and notice that the readSchema > in the SchemaCoder only takes a byte array, but I need to pass down the > Kafka headers (where the Apicurio globalId identifying the schema lives). > > I assume: > > * for the confluent Protobuf format you would extend the Protobuf > format to drive some Schema Registry logic for Protobuf (similar to the way > Avro does it) where the magic byte + schema id can be obtained and the > schema looked up using the Confluent Schema registry. > * It would be good if any protobuf format enhancements for Schema > registries pass down the Kafka headers (I am thinking a Map<String, Object> for Avro) as well as the message payload so Apicurio registry could > work with this. > * It would make sense to have the Confluent schema lookup in common > code, which is part of the SchemaCoder readSchema logic. > * I assume the ProtobufSchemaCoder readSchema would return a Protobuf > Schema object. > > > > I also wondered whether these Kafka only formats should be moved to the > Kafka connector repo, or whether they might in the future be used outside > Kafka – e.g. Avro/Protobuf files in a database. > Kind regards, David. 
> > > From: Kevin Lam > Date: Wednesday, 21 February 2024 at 18:51 > To: dev@flink.apache.org > Subject: [EXTERNAL] [DISCUSS] FLINK-34440 Support Debezium Protobuf > Confluent Format > I would love to get some feedback from the community on this JIRA issue: > https://issues.apache.org/jira/projects/FLINK/issues/FLINK-34440 > > I am looking into creating a PR and would appreciate some review on the > approach. > > In terms of design I think we can mirror the `debezium-avro-confluent` and > `avro-confluent` formats already available in Flink: > >1. `protobuf-confluent` format which uses DynamicMessage >< > https://protobuf.dev/reference/java/api-docs/com/google/protobuf/DynamicMessage > > >for encoding and decoding. > - For encoding the Flink RowType will be used to dynamically create a > Protobuf Schema and register it with the Confluent Schema > Registry. It will > use the same schema to construct a DynamicMessage and serialize it. > - For decoding, the schema will be fetched from the registry and use > DynamicMessage to deserialize and convert the Protobuf object to a > Flink > RowData. > - Note: here there is no external .proto file >2. `debezium-avro-confluent` format which unpacks the Debezium Envelope >and collects the appropriate UPDATE_BEFORE, UPDATE_AFTER, INSERT, DELETE >events. > - We may be able to refactor and reuse code from the existing > DebeziumAvroDeserializationSchema + DebeziumAvroSerializationSchema > since > the deser logic is largely delegated to and these Schemas are > concerned > with the handling the Debezium envelope. >3. Move the Confluent Schema Registry Client code to a separate maven >module, flink-formats/flink-confluent-common, and extend it to support >ProtobufSchemaProvider >< > https://github.com/confluentinc/schema-registry/blob/ca226f2e1e2091c67b372338221b57fdd435d9f2/protobuf-provider/src/main/java/io/confluent/kafka/schemaregistry/protobuf/ProtobufSchemaProvider.java#L26 > > >. 
> > > Does anyone have any feedback or objections to this approach? > > Unless otherwise stated above: > > IBM United Kingdom Limited > Registered in England and Wales with number 741598 > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU >
[jira] [Created] (FLINK-34529) Projection cannot be pushed down through rank operator.
yisha zhou created FLINK-34529: -- Summary: Projection cannot be pushed down through rank operator. Key: FLINK-34529 URL: https://issues.apache.org/jira/browse/FLINK-34529 Project: Flink Issue Type: Bug Components: Table SQL / Planner Affects Versions: 1.19.0 Reporter: yisha zhou When there is a rank/deduplicate operator, the projection based on output of this operator cannot be pushed down to the input of it. The following code can help reproducing the issue: {code:java} val util = streamTestUtil() util.addTableSource[(String, Int, String)]("T1", 'a, 'b, 'c) util.addTableSource[(String, Int, String)]("T2", 'd, 'e, 'f) val sql = """ |SELECT a FROM ( | SELECT a, f, | ROW_NUMBER() OVER (PARTITION BY f ORDER BY c DESC) as rank_num | FROM T1, T2 | WHERE T1.a = T2.d |) |WHERE rank_num = 1 """.stripMargin util.verifyPlan(sql){code} The plan is expected to be: {code:java} Calc(select=[a]) +- Rank(strategy=[AppendFastStrategy], rankType=[ROW_NUMBER], rankRange=[rankStart=1, rankEnd=1], partitionBy=[f], orderBy=[c DESC], select=[a, c, f]) +- Exchange(distribution=[hash[f]]) +- Calc(select=[a, c, f]) +- Join(joinType=[InnerJoin], where=[=(a, d)], select=[a, c, d, f], leftInputSpec=[NoUniqueKey], rightInputSpec=[NoUniqueKey]) :- Exchange(distribution=[hash[a]]) : +- Calc(select=[a, c]) : +- LegacyTableSourceScan(table=[[default_catalog, default_database, T1, source: [TestTableSource(a, b, c)]]], fields=[a, b, c]) +- Exchange(distribution=[hash[d]]) +- Calc(select=[d, f]) +- LegacyTableSourceScan(table=[[default_catalog, default_database, T2, source: [TestTableSource(d, e, f)]]], fields=[d, e, f]) {code} Notice that the 'select' of Join operator is [a, c, d, f]. 
However the actual plan is: {code:java} Calc(select=[a]) +- Rank(strategy=[AppendFastStrategy], rankType=[ROW_NUMBER], rankRange=[rankStart=1, rankEnd=1], partitionBy=[f], orderBy=[c DESC], select=[a, c, f]) +- Exchange(distribution=[hash[f]]) +- Calc(select=[a, c, f]) +- Join(joinType=[InnerJoin], where=[=(a, d)], select=[a, b, c, d, e, f], leftInputSpec=[NoUniqueKey], rightInputSpec=[NoUniqueKey]) :- Exchange(distribution=[hash[a]]) : +- LegacyTableSourceScan(table=[[default_catalog, default_database, T1, source: [TestTableSource(a, b, c)]]], fields=[a, b, c]) +- Exchange(distribution=[hash[d]]) +- LegacyTableSourceScan(table=[[default_catalog, default_database, T2, source: [TestTableSource(d, e, f)]]], fields=[d, e, f]) {code} the 'select' of Join operator is [a, b, c, d, e, f], which means the projection in the final Calc is not passed through the Rank. And I think an easy way to fix this issue is to add org.apache.calcite.rel.rules.ProjectWindowTransposeRule into FlinkStreamRuleSets.LOGICAL_OPT_RULES. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [DISCUSS] Apache Bahir retired
Thank you Leonard for sharing your thoughts on this topic. I agree that complying with the Flink community connector development process would be a must. If there are no legal or copyright issues, I would be happy to take that task for this particular case. I am no legal/copyright expert myself, but Bahir uses the Apache 2.0 license as well, so I believe it should be possible without too many complications; I will try to look for help on that front. FYI we are using and supporting a downstream fork of the Kudu connector on top of Flink 1.18 without any major modifications, so it is pretty up to date upstream as well. Regards, Ferenc On Monday, February 26th, 2024 at 10:29, Leonard Xu wrote: > > > Hey, Ferenc > > Thanks for initiating this discussion. Apache Bahir is a great project that > provided significant assistance to many Apache Flink/Spark users. It's a pity > that it has been retired. > > I believe that connectivity is crucial for building the ecosystem of a > computing engine such as Flink. The community, or at least I, would actively > support the introduction and maintenance of new connectors. Therefore, adding > a Kudu connector or other connectors from Bahir makes sense to me, as long as > we adhere to the development process for connectors in the Flink community[1]. > I recently visited the Bahir Flink repository. Although the last release of > Bahir Flink was in August ’22[2] which is compatible with Flink 1.14, its > latest code is compatible with Flink 1.17[3]. So, based on the existing > codebase, developing an official Apache Flink connector for Kudu or other > connectors should be manageable. One point to consider is that if we're not > developing a connector entirely from scratch but based on an existing > repository, we must ensure that there are no copyright issues. Here, "no > issues" means satisfying both Apache Bahir's and Apache Flink's copyright > requirements. Honestly, I'm not an expert in copyright or legal matters. 
If > you're interested in contributing to the Kudu connector, it might be > necessary to attract other experienced community members to participate in > this aspect. > > Best, > Leonard > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP+Connector+Template > [2] https://github.com/apache/bahir-flink/releases/tag/v1.1.0 > [3] https://github.com/apache/bahir-flink/blob/master/pom.xml#L116 > > > > > On Feb 22, 2024 at 6:37 PM, Ferenc Csaky ferenc.cs...@pm.me.INVALID wrote: > > > Hello devs, > > > > Just saw that the Bahir project is retired [1]. Any plans on what's > > happening with the Flink connectors that were part of this project? We > > specifically use the Kudu connector and integrate it to our platform at > > Cloudera, so we would be okay to maintain it. Would it be possible to carry > > it over as a separate connector repo under the Apache umbrella, similarly to > > what happened with the external connectors previously? > > > > Thanks, > > Ferenc
[jira] [Created] (FLINK-34528) With shuffling data, when a Task Manager is killed, restarting the Flink job takes a considerable amount of time
junzhong qin created FLINK-34528: Summary: With shuffling data, when a Task Manager is killed, restarting the Flink job takes a considerable amount of time Key: FLINK-34528 URL: https://issues.apache.org/jira/browse/FLINK-34528 Project: Flink Issue Type: Sub-task Reporter: junzhong qin Attachments: image-2024-02-27-16-35-04-464.png In the test case, the pipeline looks like: !image-2024-02-27-16-35-04-464.png! The Source: Custom Source generates strings, and the job keyBy the strings to Sink: Unnamed. # parallelism = 100 # taskmanager.numberOfTaskSlots = 2 # disable checkpoint The worker was killed at {code:java} 2024-02-27 16:41:49,982 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink: Unnamed (6/100) (2f1c7b22098a273f5471e3e8f794e1d3_0a448493b4782967b150582570326227_5_0) switched from RUNNING to FAILED on container_e2472_1705993319725_62292_01_46 @ xxx (dataPort=38827).org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connection unexpectedly closed by remote task manager 'xxx/10.169.18.138:35983 [ container_e2472_1705993319725_62292_01_10(xxx:5454) ] '. 
This might indicate that the remote task manager was lost.at org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.channelInactive(CreditBasedPartitionRequestClientHandler.java:134) at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) at org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81) at org.apache.flink.runtime.io.network.netty.NettyMessageClientDecoderDelegate.channelInactive(NettyMessageClientDecoderDelegate.java:94) at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405) at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901) at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:813) 
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174) at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167) at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) at org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:403) at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_252]{code} The job took about 16 seconds to restart. {code:java} // The task was scheduled to a task manager that had already been killed 2024-02-27 16:41:53,506 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Deploying Source: Custom Source (16/100) (attempt #3) with attempt id 2f1c7b22098a273f5471e3e8f794e1d3_bc764cd8ddf7a0cff126f51c16239658_15_3 and vertex id bc764cd8ddf7a0cff126f51c16239658_15 to container_e2472_1705993319725_62292_01_10 @ xxx (dataPort=35983) with allocation id 975dded4548ad15b36d0e5e6aac8f5b6 // The last task switched from INITIALIZING to RUNNING 2024-02-27 16:42:05,176 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink: Unnamed (64/100) (2f1c7b22098a273f5471e3e8f794e1d3_0a448493b4782967b150582570326227_63_14) switched from INITIALIZING to RUNNING. {code} --
[jira] [Created] (FLINK-34527) Deprecate Time classes also in PyFlink
Matthias Pohl created FLINK-34527: - Summary: Deprecate Time classes also in PyFlink Key: FLINK-34527 URL: https://issues.apache.org/jira/browse/FLINK-34527 Project: Flink Issue Type: Bug Components: API / Python Affects Versions: 1.20.0 Reporter: Matthias Pohl FLINK-32570 deprecated the Time classes, but we missed the corresponding PyFlink-related APIs. -- This message was sent by Atlassian Jira (v8.20.10#820010)