Re: [DISCUSS] Move CheckpointingMode to flink-core

2024-02-27 Thread Zakelly Lan
Hi Lincoln,

Given that we have finished the testing for 1.19, I agree it is better not
to merge this into 1.19. Thanks for the RMs' attention!

Hi Chesnay and Junrui,

Thanks for your advice. My original intention was to move the class and also
change the package to keep things clean, but that involves much more effort.
Here are several options we have:

   1. Move CheckpointingMode to flink-core and keep the same package. No
   further deprecation or API changes are needed, but it will leave an
   'org.apache.flink.streaming.api' package in flink-core.
   2. Introduce a new CheckpointingMode in package
   'org.apache.flink.core.execution' and deprecate the old one. Deprecate the
   corresponding getter/setter of 'CheckpointConfig' and introduce new ones
   with a similar but different name (e.g. set/getCheckpointMode). We will
   discuss the removal of those deprecated APIs later in 2.x.
   3. Based on 1, move CheckpointingMode to package
   'org.apache.flink.core.execution' in 2.0. This is a breaking change that
   needs more discussion.

All of these options work. I'm slightly inclined to option 1, or option 3 if
we all agree, since the new getter/setter may also cause confusion, meaning we
cannot make the API purely clean anyway. WDYT?
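
For illustration, here is a minimal sketch of what option 2 could look like
(all names below are assumptions for discussion, not a decided API; the two
nested enums stand in for the old and the relocated CheckpointingMode):

public class CheckpointConfigSketch {

    /** Stand-in for the relocated enum in org.apache.flink.core.execution. */
    public enum NewCheckpointingMode { EXACTLY_ONCE, AT_LEAST_ONCE }

    /** Stand-in for the old enum in org.apache.flink.streaming.api. */
    public enum OldCheckpointingMode { EXACTLY_ONCE, AT_LEAST_ONCE }

    private NewCheckpointingMode mode = NewCheckpointingMode.EXACTLY_ONCE;

    /** @deprecated use {@link #setCheckpointMode(NewCheckpointingMode)}. */
    @Deprecated
    public void setCheckpointingMode(OldCheckpointingMode oldMode) {
        // Bridge the old enum onto the new one; values mirror each other.
        this.mode = NewCheckpointingMode.valueOf(oldMode.name());
    }

    /** New accessor with a similar but different name, as in option 2. */
    public void setCheckpointMode(NewCheckpointingMode newMode) {
        this.mode = newMode;
    }

    public NewCheckpointingMode getCheckpointMode() {
        return mode;
    }
}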


Best,
Zakelly

On Wed, Feb 28, 2024 at 10:14 AM Junrui Lee  wrote:

> Hi Zakelly,
>
> I agree with Chesnay's response. I would suggest that during the process of
> moving CheckpointingMode from the flink-streaming-java module to the
> flink-core module, we should keep the package name unchanged. This approach
> would be completely transparent to users. In fact, this practice should be
> applicable to many of our desired moves from flink-streaming-java to
> higher-level modules, such as flink-runtime and flink-core.
>
> Best,
> Junrui
>
> Chesnay Schepler  于2024年2月28日周三 05:18写道:
>
> > Moving classes (== keep the same package) to a module higher up in the
> > dependency tree should not be a breaking change and can imo be done
> > anytime without any risk to users.
> >
> > On 27/02/2024 17:01, Lincoln Lee wrote:
> > > Hi Zakelly,
> > >
> > > Thanks for letting us 1.19 RMs know about this!
> > >
> > > This change has been discussed during today's release sync meeting; we
> > > suggest not merging it into 1.19.
> > > We can continue discussing the removal in 2.x separately.
> > >
> > > Best,
> > > Lincoln Lee
> > >
> > >
> > > Hangxiang Yu  于2024年2月27日周二 11:28写道:
> > >
> > >> Hi, Zakelly.
> > >> Thanks for driving this.
> > >> Moving this class to flink-core makes sense to me, as it could make
> > >> the code path and configs clearer.
> > >> It's marked as @Public since 1.0, and 1.20 should be the next long-term
> > >> version, so 1.19 should be a suitable version to do it.
> > >> I also look forward to the thoughts of other developers/RMs since 1.19
> > >> is currently in feature freeze.
> > >>
> > >> On Mon, Feb 26, 2024 at 6:42 PM Zakelly Lan  wrote:
> > >>
> > >>> Hi devs,
> > >>>
> > >>> When working on the FLIP-406[1], I realized that moving all options of
> > >>> ExecutionCheckpointingOptions(flink-streaming-java) to
> > >>> CheckpointingOptions(flink-core) depends on relocating the
> > >>> enum CheckpointingMode(flink-streaming-java) to the flink-core module.
> > >>> However, the CheckpointingMode is annotated as @Public and used by
> > >>> datastream api like 'CheckpointConfig#setCheckpointingMode'. So I'd like
> > >>> to start a discussion on moving the CheckpointingMode to flink-core.
> > >>> Timing is a little tight if we want the old enum to be entirely removed
> > >>> in the Flink 2.x series, since the deprecation should be shipped in the
> > >>> upcoming Flink 1.19. I suggest not creating a dedicated FLIP and
> > >>> treating this as a sub-task of FLIP-406.
> > >>>
> > >>> I prepared a minimal change providing new APIs and deprecating the old
> > >>> ones[2], which could be merged to 1.19 if we agree to do so.
> > >>>
> > >>> Looking forward to your thoughts! Also cc'ing the RMs of 1.19 about this.
> > >>>
> > >>> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284789560
> > >>> [2] https://github.com/apache/flink/commit/9bdd237d0322df8853f1b9e6ae658f77b9175237
> > >>>
> > >>> Best,
> > >>> Zakelly
> > >>>
> > >>
> > >> --
> > >> Best,
> > >> Hangxiang.
> > >>
> >
> >
>


Re: [ANNOUNCE] Flink 1.19 Cross-team testing completed & sync summary on 02/27/2024

2024-02-27 Thread Jingsong Li
Thanks Lincoln and all contributors!

Best,
Jingsong

On Wed, Feb 28, 2024 at 12:15 PM Xintong Song  wrote:
>
> Matthias has already said it on the release sync, but I still want to say
> it again. It's amazing how smoothly the release testing went for this
> release. Great thanks to the release managers and all contributors who made
> this happen.
>
> Best,
>
> Xintong
>
>
>
> On Wed, Feb 28, 2024 at 10:10 AM Lincoln Lee  wrote:
>
> > Hi devs,
> >
> > I'd like to share some highlights from the release sync on 02/27/2024
> >
> > - Cross-team testing
> >
> > We've finished all of the testing work[1]. Huge thanks to all contributors
> > and volunteers for the effort on this!
> >
> > - Blockers
> >
> > Two API-change merge requests[2][3] were discussed. There was agreement
> > on the second PR, as it fixes an unintended behavior newly introduced in
> > 1.19 that we need to avoid releasing to users. For the first PR, we
> > suggest continuing the discussion separately.
> > So we will wait for [3] to be done and then create the first release
> > candidate, 1.19.0-rc1 (expected within this week if there are no new
> > blockers).
> >
> > - Release notes
> >
> > Revisions to the draft version of the release notes[4] have been closed,
> > and the formal PR[5] has been submitted.
> > The release announcement PR will also be ready later this week; please
> > continue to help review before the 1.19 release, thanks!
> >
> > - Sync meeting (https://meet.google.com/vcx-arzs-trv)
> >
> > We've already switched to weekly release sync, so the next release sync
> > will be on Mar 5th, 2024. Feel free to join!
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-34285
> > [2] https://lists.apache.org/thread/2llhhbkcx5w7chp3d6cthoqc8kwfvw6x
> > [3]
> > https://github.com/apache/flink/pull/24387#pullrequestreview-1902749309
> > [4]
> > https://docs.google.com/document/d/1HLF4Nhvkln4zALKJdwRErCnPzufh7Z3BhhkWlk9Zh7w
> > [5] https://github.com/apache/flink/pull/24394
> >
> > Best,
> > Yun, Jing, Martijn and Lincoln
> >


Re: [ANNOUNCE] Flink 1.19 Cross-team testing completed & sync summary on 02/27/2024

2024-02-27 Thread Xintong Song
Matthias has already said it on the release sync, but I still want to say
it again. It's amazing how smoothly the release testing went for this
release. Great thanks to the release managers and all contributors who made
this happen.

Best,

Xintong



On Wed, Feb 28, 2024 at 10:10 AM Lincoln Lee  wrote:

> Hi devs,
>
> I'd like to share some highlights from the release sync on 02/27/2024
>
> - Cross-team testing
>
> We've finished all of the testing work[1]. Huge thanks to all contributors
> and volunteers for the effort on this!
>
> - Blockers
>
> Two API-change merge requests[2][3] were discussed. There was agreement on
> the second PR, as it fixes an unintended behavior newly introduced in 1.19
> that we need to avoid releasing to users. For the first PR, we suggest
> continuing the discussion separately.
> So we will wait for [3] to be done and then create the first release
> candidate, 1.19.0-rc1 (expected within this week if there are no new
> blockers).
>
> - Release notes
>
> Revisions to the draft version of the release notes[4] have been closed,
> and the formal PR[5] has been submitted.
> The release announcement PR will also be ready later this week; please
> continue to help review before the 1.19 release, thanks!
>
> - Sync meeting (https://meet.google.com/vcx-arzs-trv)
>
> We've already switched to weekly release sync, so the next release sync
> will be on Mar 5th, 2024. Feel free to join!
>
> [1] https://issues.apache.org/jira/browse/FLINK-34285
> [2] https://lists.apache.org/thread/2llhhbkcx5w7chp3d6cthoqc8kwfvw6x
> [3]
> https://github.com/apache/flink/pull/24387#pullrequestreview-1902749309
> [4]
> https://docs.google.com/document/d/1HLF4Nhvkln4zALKJdwRErCnPzufh7Z3BhhkWlk9Zh7w
> [5] https://github.com/apache/flink/pull/24394
>
> Best,
> Yun, Jing, Martijn and Lincoln
>


Re: [VOTE] FLIP-314: Support Customized Job Lineage Listener

2024-02-27 Thread Peter Huang
+1 (non-binding)

Thanks for consolidating the opinions from both the Flink and OpenLineage
communities. Looking forward to the collaboration.


Best Regards
Peter Huang

On Tue, Feb 27, 2024 at 6:13 PM Yong Fang  wrote:

> Hi devs,
>
> I would like to restart a vote about FLIP-314: Support Customized Job
> Lineage Listener[1].
>
> Previously, we added lineage related interfaces in FLIP-314. Before the
> interfaces were developed and merged into the master, @Maciej and
> @Zhenqiu provided valuable suggestions for the interface from the
> perspective of the lineage system. So we updated the interfaces of FLIP-314
> and discussed them again in the discussion thread [2].
>
> So I am here to initiate a new vote on FLIP-314. The vote will be open for
> at least 72 hours unless there is an objection or insufficient votes.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener
> [2] https://lists.apache.org/thread/wopprvp3ww243mtw23nj59p57cghh7mc
>
> Best,
> Fang Yong
>


[jira] [Created] (FLINK-34535) Support JobPlanInfo for the explain result

2024-02-27 Thread yuanfenghu (Jira)
yuanfenghu created FLINK-34535:
--

 Summary: Support JobPlanInfo for the explain result
 Key: FLINK-34535
 URL: https://issues.apache.org/jira/browse/FLINK-34535
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Reporter: yuanfenghu


In the Flink SQL EXPLAIN syntax, we can set ExplainDetails to produce the 
JSON_EXECUTION_PLAN, but we cannot produce the JobPlanInfo. If the explain 
result could include this information (referring to JobPlanInfo), I could 
combine it with the parameter `pipeline.jobvertex-parallelism-overrides` to 
set up my task parallelism.
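
For reference, a minimal sketch of what is possible today via the Table API 
(the table name and query are made up for illustration); note there is 
currently no ExplainDetail that exposes JobPlanInfo, which is what this issue 
asks for:
{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.ExplainDetail;
import org.apache.flink.table.api.TableEnvironment;

public class ExplainExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());
        tEnv.executeSql(
                "CREATE TABLE src (id INT, name STRING) WITH ('connector' = 'datagen')");
        // Today, EXPLAIN can include the JSON execution plan ...
        System.out.println(
                tEnv.explainSql(
                        "SELECT name, COUNT(*) FROM src GROUP BY name",
                        ExplainDetail.JSON_EXECUTION_PLAN));
        // ... but no detail option exposes JobPlanInfo, which is what would
        // be needed to derive vertex ids for
        // pipeline.jobvertex-parallelism-overrides.
    }
}
{code}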



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Flink kinesis connector v4.2.0-1.18 has issue when consuming from EFO consumer

2024-02-27 Thread Xiaolong Wang
Hi,

I used the flink-connector-kinesis (4.0.2-1.18) to consume from Kinesis.
The job can start but will fail within 1 hour. A detailed error log
is attached.

When I changed the version of the flink-connector-kinesis to `1.15.2`,
the problem went away.

Any ideas on how to fix it? The job definition follows:
create table kafka_event_v1 (
  `timestamp` bigint,
  serverTimestamp bigint,

  name string,
  data string,
  app string,
  `user` string,
  context string,
  browser row<
url string,
referrer string,
userAgent string,
`language` string,
title string,
viewportWidth int,
viewportHeight int,
contentWidth int,
contentHeight int,
cookies map<string, string>,
name string,
version string,
device row<
  model string,
  type string,
  vendor string
>,
engine row<
  name string,
  version string
>,
os row<
  name string,
  version string
>
  >,
  abtests map<string, string>,
  apikey string,
  lifecycleId string,
  sessionId string,
  instanceId string,
  requestId string,
  eventId string,
  `trigger` string,

  virtualId string,
  accountId string,
  ip string,

  serverTimestampLtz as to_timestamp(from_unixtime(serverTimestamp / 1000)),
  watermark for serverTimestampLtz as serverTimestampLtz - interval '5' second
) with (
  'connector' = 'kafka',
  'properties.bootstrap.servers' = 'shared.kafka.smartnews.internal:9093',
  'topic' = 'shared-cluster-sn-pixel-event-v1-dev',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json',
  'json.ignore-parse-errors' = 'true',

  'properties.group.id' = 'event_v2',
  'properties.security.protocol' = 'SASL_SSL',
  'properties.sasl.mechanism' = 'SNTOKEN',
  'properties.sasl.login.class' = 
'com.smartnews.dp.kafka.security.sn.auth.SnTokenLogin',
  'properties.sasl.login.callback.handler.class' = 
'com.smartnews.dp.kafka.security.sn.auth.SnTokenCallbackHandler',
  'properties.sasl.client.callback.handler.class' = 
'com.smartnews.dp.kafka.security.sn.sasl.SnTokenSaslClientCallbackHandler',
  'properties.sasl.jaas.config' = 
'com.smartnews.dp.kafka.security.sn.auth.SnTokenLoginModule required 
username="sn-pixel" password="aQXJcNUsCuIZpICHO9bQ" env="prd";',
  'properties.ssl.truststore.type' = 'PEM'
);

create catalog iceberg_dev with (
  'type'='iceberg',
  'catalog-type'='hive',
  'uri'='thrift://dev-hive-metastore.smartnews.internal:9083',
  'warehouse'='s3a://smartnews-dmp/warehouse/development'
);

insert into iceberg_dev.pixel.event_v2 /*+ options(
  'partition.time-extractor.timestamp-pattern'='$dt 00:00:00',
  'sink.partition-commit.policy.kind'='metastore,success-file',
  'auto-compaction'='true'
) */
select
  `timestamp`,
  serverTimestamp,
  data,
  app,
  `user`,
  context,
  browser,
  abtests,
  lifecycleId,
  sessionId,
  instanceId,
  requestId,
  eventId,
  `trigger`,

  virtualId,
  accountId,
  ip,

  date_format(serverTimestampLtz, '-MM-dd') dt,
  apikey,
  name
from
  default_catalog.default_database.kafka_event_v1
;





Re: Requesting link to join Slack Community

2024-02-27 Thread Hang Ruan
Hi, Geetesh.

Here is the invite link:
https://join.slack.com/t/apache-flink/shared_invite/zt-1t4khgllz-Fm1CnXzdBbUchBz4HzJCAg
I will raise a PR to update the link.

Best,
Hang

Geetesh Nikhade  于2024年2月28日周三 04:49写道:

> Hi Folks,
>
> I would like to join the Apache Flink Community on Slack, but the link
> shared on the official Flink website seems to have expired. Can someone
> please share a new join link, or let me know the best way to get one?
>
> Thanks in advance.
>
> Best,
> Geetesh
>


Re: [DISCUSS] Move CheckpointingMode to flink-core

2024-02-27 Thread Junrui Lee
Hi Zakelly,

I agree with Chesnay's response. I would suggest that during the process of
moving CheckpointingMode from the flink-streaming-java module to the
flink-core module, we should keep the package name unchanged. This approach
would be completely transparent to users. In fact, this practice should be
applicable to many of our desired moves from flink-streaming-java to
higher-level modules, such as flink-runtime and flink-core.

Best,
Junrui

Chesnay Schepler  于2024年2月28日周三 05:18写道:

> Moving classes (== keep the same package) to a module higher up in the
> dependency tree should not be a breaking change and can imo be done
> anytime without any risk to users.
>
> On 27/02/2024 17:01, Lincoln Lee wrote:
> > Hi Zakelly,
> >
> > Thanks for letting us 1.19 RMs know about this!
> >
> > This change has been discussed during today's release sync meeting; we
> > suggest not merging it into 1.19.
> > We can continue discussing the removal in 2.x separately.
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Hangxiang Yu  于2024年2月27日周二 11:28写道:
> >
> >> Hi, Zakelly.
> >> Thanks for driving this.
> >> Moving this class to flink-core makes sense to me, as it could make the
> >> code path and configs clearer.
> >> It's marked as @Public since 1.0, and 1.20 should be the next long-term
> >> version, so 1.19 should be a suitable version to do it.
> >> I also look forward to the thoughts of other developers/RMs since 1.19 is
> >> currently in feature freeze.
> >>
> >> On Mon, Feb 26, 2024 at 6:42 PM Zakelly Lan  wrote:
> >>
> >>> Hi devs,
> >>>
> >>> When working on the FLIP-406[1], I realized that moving all options of
> >>> ExecutionCheckpointingOptions(flink-streaming-java) to
> >>> CheckpointingOptions(flink-core) depends on relocating the
> >>> enum CheckpointingMode(flink-streaming-java) to the flink-core module.
> >>> However, the CheckpointingMode is annotated as @Public and used by
> >>> datastream api like 'CheckpointConfig#setCheckpointingMode'. So I'd like
> >>> to start a discussion on moving the CheckpointingMode to flink-core.
> >>> Timing is a little tight if we want the old enum to be entirely removed
> >>> in the Flink 2.x series, since the deprecation should be shipped in the
> >>> upcoming Flink 1.19. I suggest not creating a dedicated FLIP and
> >>> treating this as a sub-task of FLIP-406.
> >>>
> >>> I prepared a minimal change providing new APIs and deprecating the old
> >>> ones[2], which could be merged to 1.19 if we agree to do so.
> >>>
> >>> Looking forward to your thoughts! Also cc'ing the RMs of 1.19 about this.
> >>>
> >>> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284789560
> >>> [2] https://github.com/apache/flink/commit/9bdd237d0322df8853f1b9e6ae658f77b9175237
> >>>
> >>> Best,
> >>> Zakelly
> >>>
> >>
> >> --
> >> Best,
> >> Hangxiang.
> >>
>
>


[VOTE] FLIP-314: Support Customized Job Lineage Listener

2024-02-27 Thread Yong Fang
Hi devs,

I would like to restart a vote about FLIP-314: Support Customized Job
Lineage Listener[1].

Previously, we added lineage related interfaces in FLIP-314. Before the
interfaces were developed and merged into the master, @Maciej and
@Zhenqiu provided valuable suggestions for the interface from the
perspective of the lineage system. So we updated the interfaces of FLIP-314
and discussed them again in the discussion thread [2].

So I am here to initiate a new vote on FLIP-314. The vote will be open for
at least 72 hours unless there is an objection or insufficient votes.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener
[2] https://lists.apache.org/thread/wopprvp3ww243mtw23nj59p57cghh7mc

Best,
Fang Yong


[ANNOUNCE] Flink 1.19 Cross-team testing completed & sync summary on 02/27/2024

2024-02-27 Thread Lincoln Lee
Hi devs,

I'd like to share some highlights from the release sync on 02/27/2024

- Cross-team testing

We've finished all of the testing work[1]. Huge thanks to all contributors
and volunteers for the effort on this!

- Blockers

Two API-change merge requests[2][3] were discussed. There was agreement on
the second PR, as it fixes an unintended behavior newly introduced in 1.19
that we need to avoid releasing to users. For the first PR, we suggest
continuing the discussion separately.
So we will wait for [3] to be done and then create the first release
candidate, 1.19.0-rc1 (expected within this week if there are no new
blockers).

- Release notes

Revisions to the draft version of the release notes[4] have been closed, and
the formal PR[5] has been submitted.
The release announcement PR will also be ready later this week; please
continue to help review before the 1.19 release, thanks!

- Sync meeting (https://meet.google.com/vcx-arzs-trv)

We've already switched to weekly release sync, so the next release sync
will be on Mar 5th, 2024. Feel free to join!

[1] https://issues.apache.org/jira/browse/FLINK-34285
[2] https://lists.apache.org/thread/2llhhbkcx5w7chp3d6cthoqc8kwfvw6x
[3] https://github.com/apache/flink/pull/24387#pullrequestreview-1902749309
[4]
https://docs.google.com/document/d/1HLF4Nhvkln4zALKJdwRErCnPzufh7Z3BhhkWlk9Zh7w
[5] https://github.com/apache/flink/pull/24394

Best,
Yun, Jing, Martijn and Lincoln


Re: [DISCUSS] Move CheckpointingMode to flink-core

2024-02-27 Thread Chesnay Schepler
Moving classes (== keep the same package) to a module higher up in the 
dependency tree should not be a breaking change and can imo be done 
anytime without any risk to users.
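
To make this concrete, a hedged illustration: user code refers only to the
fully qualified class name, so relocating the class while keeping the package
stays source- and binary-compatible for a program like the following
(assuming the user's dependencies still pull in the module that ships the
class):

import org.apache.flink.streaming.api.CheckpointingMode;

public class UsesCheckpointingMode {
    public static void main(String[] args) {
        // Resolves to the same fully qualified name regardless of whether
        // the class ships in flink-streaming-java or flink-core.
        CheckpointingMode mode = CheckpointingMode.EXACTLY_ONCE;
        System.out.println("Configured mode: " + mode);
    }
}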


On 27/02/2024 17:01, Lincoln Lee wrote:

Hi Zakelly,

Thanks for letting us 1.19 RMs know about this!

This change has been discussed during today's release sync meeting; we
suggest not merging it into 1.19.
We can continue discussing the removal in 2.x separately.

Best,
Lincoln Lee


Hangxiang Yu  于2024年2月27日周二 11:28写道:


Hi, Zakelly.
Thanks for driving this.
Moving this class to flink-core makes sense to me, as it could make the code
path and configs clearer.
It's marked as @Public since 1.0, and 1.20 should be the next long-term
version, so 1.19 should be a suitable version to do it.
I also look forward to the thoughts of other developers/RMs since 1.19 is
currently in feature freeze.

On Mon, Feb 26, 2024 at 6:42 PM Zakelly Lan  wrote:


Hi devs,

When working on the FLIP-406[1], I realized that moving all options of
ExecutionCheckpointingOptions(flink-streaming-java) to
CheckpointingOptions(flink-core) depends on relocating the
enum CheckpointingMode(flink-streaming-java) to the flink-core module.
However, the CheckpointingMode is annotated as @Public and used by datastream
api like 'CheckpointConfig#setCheckpointingMode'. So I'd like to start a
discussion on moving the CheckpointingMode to flink-core. Timing is a little
tight if we want the old enum to be entirely removed in the Flink 2.x series,
since the deprecation should be shipped in the upcoming Flink 1.19. I suggest
not creating a dedicated FLIP and treating this as a sub-task of FLIP-406.

I prepared a minimal change providing new APIs and deprecating the old
ones[2], which could be merged to 1.19 if we agree to do so.

Looking forward to your thoughts! Also cc'ing the RMs of 1.19 about this.

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284789560
[2] https://github.com/apache/flink/commit/9bdd237d0322df8853f1b9e6ae658f77b9175237

Best,
Zakelly



--
Best,
Hangxiang.





Requesting link to join Slack Community

2024-02-27 Thread Geetesh Nikhade
Hi Folks,

I would like to join the Apache Flink Community on Slack, but the link shared
on the official Flink website seems to have expired. Can someone please share
a new join link, or let me know the best way to get one?

Thanks in advance.

Best,
Geetesh


Re: [DISCUSS] FLINK-34440 Support Debezium Protobuf Confluent Format

2024-02-27 Thread Kevin Lam
Hey Robert,

Thanks for your response. I have a partial implementation, just for the
decoding portion.

The code I have is pretty rough and doesn't do any of the refactors I
mentioned, but the decoder logic does pull the schema from the schema
registry and uses it to deserialize the DynamicMessage before converting it
to RowData using a DynamicMessageToRowDataConverter class. For the other
aspects, I would need to start from scratch for the encoder.
aspects, I would need to start from scratch for the encoder.

Would be very happy to see you drive the contribution back to open source
from Confluent, or collaborate on this.

Another topic I had is Protobuf's field ids. Ideally, Flink would be
idiomatic about not renumbering them in incompatible ways, similar to what's
discussed in the Schema Registry issue here:
https://github.com/confluentinc/schema-registry/issues/2551
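
To make the decode path above concrete, here is a rough sketch (the class
name mirrors the proposal; everything else is an assumption, and the
Confluent wire format's Protobuf message-index varints are glossed over):

import com.google.protobuf.Descriptors.Descriptor;
import com.google.protobuf.DynamicMessage;
import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;
import io.confluent.kafka.schemaregistry.protobuf.ProtobufSchema;

import java.nio.ByteBuffer;

public class ProtobufConfluentDecodeSketch {

    private final SchemaRegistryClient client;

    public ProtobufConfluentDecodeSketch(SchemaRegistryClient client) {
        this.client = client;
    }

    /** Wire bytes -> DynamicMessage; the DynamicMessage -> RowData step
     *  (DynamicMessageToRowDataConverter in the proposal) is omitted here. */
    public DynamicMessage decode(byte[] message) throws Exception {
        ByteBuffer buffer = ByteBuffer.wrap(message);
        buffer.get();                    // magic byte (0 in the wire format)
        int schemaId = buffer.getInt();  // 4-byte schema id
        // NOTE: a real decoder must also consume the message-index varint
        // list that follows the schema id; skipped for brevity.
        ProtobufSchema schema = (ProtobufSchema) client.getSchemaById(schemaId);
        Descriptor descriptor = schema.toDescriptor();
        byte[] payload = new byte[buffer.remaining()];
        buffer.get(payload);
        return DynamicMessage.parseFrom(descriptor, payload);
    }
}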


On Tue, Feb 27, 2024 at 5:51 AM Robert Metzger  wrote:

> Hi all,
>
> +1 to support the format in Flink.
>
> @Kevin: Do you already have an implementation for this inhouse that you are
> looking to upstream, or would you start from scratch?
> I'm asking because my current employer, Confluent, has a Protobuf Schema
> registry implementation for Flink, and I could help drive contributing this
> back to open source.
> If you already have an implementation, let's decide which one to use :)
>
> Best,
> Robert
>
> On Thu, Feb 22, 2024 at 2:05 PM David Radley 
> wrote:
>
> > Hi Kevin,
> > Some thoughts on this.
> > I suggested an Apicurio registry format in the dev list, and was advised
> > to raise a FLIP for this, I suggest the same would apply here (or the
> > alternative to FLIPs if you cannot raise one). I am prototyping an Avro
> Apicurio format, prior to raising the FLIP, and notice that the readSchema
> in the SchemaCoder only takes a byte array, but I need to pass down the
> > Kafka headers (where the Apicurio globalId identifying the schema lives).
> >
> > I assume:
> >
> >   *   for the confluent Protobuf format you would extend the Protobuf
> format to drive some Schema Registry logic for Protobuf (similar to the
> way Avro does it) where the magic byte + schema id can be obtained and the
> schema looked up using the Confluent Schema registry.
> >   *   It would be good if any protobuf format enhancements for Schema
> registries pass down the Kafka headers (I am thinking as a Map<String,
> Object> for Avro) as well as the message payload so Apicurio registry
> could work with this.
> >   *   It would make sense to have the Confluent schema lookup in common
> > code, which is part of the SchemaCoder readSchema  logic.
> >   *   I assume the ProtobufSchemaCoder readSchema would return a Protobuf
> > Schema object.
> >
> >
> >
> > I also wondered whether these Kafka only formats should be moved to the
> > Kafka connector repo, or whether they might in the future be used outside
> > Kafka – e.g. Avro/Protobuf files in a database.
> >Kind regards, David.
> >
> >
> > From: Kevin Lam 
> > Date: Wednesday, 21 February 2024 at 18:51
> > To: dev@flink.apache.org 
> > Subject: [EXTERNAL] [DISCUSS] FLINK-34440 Support Debezium Protobuf
> > Confluent Format
> > I would love to get some feedback from the community on this JIRA issue:
> > https://issues.apache.org/jira/projects/FLINK/issues/FLINK-34440
> >
> > I am looking into creating a PR and would appreciate some review on the
> > approach.
> >
> > In terms of design I think we can mirror the `debezium-avro-confluent`
> and
> > `avro-confluent` formats already available in Flink:
> >
> >1. `protobuf-confluent` format which uses DynamicMessage
> ><
> >
> https://protobuf.dev/reference/java/api-docs/com/google/protobuf/DynamicMessage
> > >
> >for encoding and decoding.
> >   - For encoding the Flink RowType will be used to dynamically
> create a
> >   Protobuf Schema and register it with the Confluent Schema
> > Registry. It will
> >   use the same schema to construct a DynamicMessage and serialize it.
> >   - For decoding, the schema will be fetched from the registry and
> use
> >   DynamicMessage to deserialize and convert the Protobuf object to a
> > Flink
> >   RowData.
> >   - Note: here there is no external .proto file
> >2. `debezium-avro-confluent` format which unpacks the Debezium
> Envelope
> >and collects the appropriate UPDATE_BEFORE, UPDATE_AFTER, INSERT,
> DELETE
> >events.
> >   - We may be able to refactor and reuse code from the existing
> >   DebeziumAvroDeserializationSchema + DebeziumAvroSerializationSchema
> > since
> >   the deser logic is largely delegated to and these Schemas are
> > concerned
>   with handling the Debezium envelope.
> >3. Move the Confluent Schema Registry Client code to a separate maven
> >module, flink-formats/flink-confluent-common, and extend it to support
> >ProtobufSchemaProvider
> ><
> >
> 

Re: [DISCUSS] Move CheckpointingMode to flink-core

2024-02-27 Thread Lincoln Lee
Hi Zakelly,

Thanks for letting us 1.19 RMs know about this!

This change has been discussed during today's release sync meeting; we
suggest not merging it into 1.19.
We can continue discussing the removal in 2.x separately.

Best,
Lincoln Lee


Hangxiang Yu  于2024年2月27日周二 11:28写道:

> Hi, Zakelly.
> Thanks for driving this.
> Moving this class to flink-core makes sense to me, as it could make the
> code path and configs clearer.
> It's marked as @Public since 1.0, and 1.20 should be the next long-term
> version, so 1.19 should be a suitable version to do it.
> I also look forward to the thoughts of other developers/RMs since 1.19 is
> currently in feature freeze.
>
> On Mon, Feb 26, 2024 at 6:42 PM Zakelly Lan  wrote:
>
> > Hi devs,
> >
> > When working on the FLIP-406[1], I realized that moving all options of
> > ExecutionCheckpointingOptions(flink-streaming-java) to
> > CheckpointingOptions(flink-core) depends on relocating the
> > enum CheckpointingMode(flink-streaming-java) to the flink-core module.
> > However, the CheckpointingMode is annotated as @Public and used by
> > datastream api like 'CheckpointConfig#setCheckpointingMode'. So I'd like
> > to start a discussion on moving the CheckpointingMode to flink-core.
> > Timing is a little tight if we want the old enum to be entirely removed
> > in the Flink 2.x series, since the deprecation should be shipped in the
> > upcoming Flink 1.19. I suggest not creating a dedicated FLIP and treating
> > this as a sub-task of FLIP-406.
> >
> > I prepared a minimal change providing new APIs and deprecating the old
> > ones[2], which could be merged to 1.19 if we agree to do so.
> >
> > Looking forward to your thoughts! Also cc'ing the RMs of 1.19 about this.
> >
> > [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284789560
> > [2] https://github.com/apache/flink/commit/9bdd237d0322df8853f1b9e6ae658f77b9175237
> >
> > Best,
> > Zakelly
> >
>
>
> --
> Best,
> Hangxiang.
>


[jira] [Created] (FLINK-34534) CLONE - Vote on the release candidate

2024-02-27 Thread lincoln lee (Jira)
lincoln lee created FLINK-34534:
---

 Summary: CLONE - Vote on the release candidate
 Key: FLINK-34534
 URL: https://issues.apache.org/jira/browse/FLINK-34534
 Project: Flink
  Issue Type: Sub-task
Affects Versions: 1.17.0
Reporter: Matthias Pohl
Assignee: Qingsheng Ren
 Fix For: 1.17.0


Once you have built and individually reviewed the release candidate, please 
share it for the community-wide review. Please review foundation-wide [voting 
guidelines|http://www.apache.org/foundation/voting.html] for more information.

Start the review-and-vote thread on the dev@ mailing list. Here’s an email 
template; please adjust as you see fit.
{quote}From: Release Manager
To: dev@flink.apache.org
Subject: [VOTE] Release 1.2.3, release candidate #3

Hi everyone,
Please review and vote on the release candidate #3 for the version 1.2.3, as 
follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:
 * JIRA release notes [1],
 * the official Apache source release and binary convenience releases to be 
deployed to dist.apache.org [2], which are signed with the key with fingerprint 
 [3],
 * all artifacts to be deployed to the Maven Central Repository [4],
 * source code tag "release-1.2.3-rc3" [5],
 * website pull request listing the new release and adding announcement blog 
post [6].

The vote will be open for at least 72 hours. It is adopted by majority 
approval, with at least 3 PMC affirmative votes.

Thanks,
Release Manager

[1] link
[2] link
[3] [https://dist.apache.org/repos/dist/release/flink/KEYS]
[4] link
[5] link
[6] link
{quote}
*If there are any issues found in the release candidate, reply on the vote 
thread to cancel the vote.* There’s no need to wait 72 hours. Proceed to the 
Fix Issues step below and address the problem. However, some issues don’t 
require cancellation. For example, if an issue is found in the website pull 
request, just correct it on the spot and the vote can continue as-is.

For cancelling a release, the release manager needs to send an email to the 
release candidate thread, stating that the release candidate is officially 
cancelled. Next, all artifacts created specifically for the RC in the previous 
steps need to be removed:
 * Delete the staging repository in Nexus
 * Remove the source / binary RC files from dist.apache.org
 * Delete the source code tag in git

*If there are no issues, reply on the vote thread to close the voting.* Then, 
tally the votes in a separate email. Here’s an email template; please adjust as 
you see fit.
{quote}From: Release Manager
To: dev@flink.apache.org
Subject: [RESULT] [VOTE] Release 1.2.3, release candidate #3

I'm happy to announce that we have unanimously approved this release.

There are XXX approving votes, XXX of which are binding:
 * approver 1
 * approver 2
 * approver 3
 * approver 4

There are no disapproving votes.

Thanks everyone!
{quote}
 

h3. Expectations
 * Community votes to release the proposed candidate, with at least three 
approving PMC votes

Any issues that are raised till the vote is over should be either resolved or 
moved into the next release (if applicable).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34533) CLONE - Propose a pull request for website updates

2024-02-27 Thread lincoln lee (Jira)
lincoln lee created FLINK-34533:
---

 Summary: CLONE - Propose a pull request for website updates
 Key: FLINK-34533
 URL: https://issues.apache.org/jira/browse/FLINK-34533
 Project: Flink
  Issue Type: Sub-task
Affects Versions: 1.17.0
Reporter: Matthias Pohl
Assignee: Qingsheng Ren
 Fix For: 1.17.0


The final step of building the candidate is to propose a website pull request 
containing the following changes:
 # update 
[apache/flink-web:_config.yml|https://github.com/apache/flink-web/blob/asf-site/_config.yml]
 ## update {{FLINK_VERSION_STABLE}} and {{FLINK_VERSION_STABLE_SHORT}} as 
required
 ## update version references in quickstarts ({{{}q/{}}} directory) as required
 ## (major only) add a new entry to {{flink_releases}} for the release binaries 
and sources
 ## (minor only) update the entry for the previous release in the series in 
{{flink_releases}}
 ### Please pay attention to the ids assigned to the download entries. They should 
be unique and reflect their corresponding version number.
 ## add a new entry to {{release_archive.flink}}
 # add a blog post announcing the release in _posts
 # add an organized release notes page under docs/content/release-notes and 
docs/content.zh/release-notes (like 
[https://nightlies.apache.org/flink/flink-docs-release-1.15/release-notes/flink-1.15/]).
 The page is based on the non-empty release notes collected from the issues; 
only issues that affect existing users (e.g., behavior changes rather than new 
functionality) should be included. It should be in a separate PR since it will 
be merged to the flink project.

(!) Don’t merge the PRs before finalizing the release.

 

h3. Expectations
 * Website pull request proposed to list the 
[release|http://flink.apache.org/downloads.html]
 * (major only) Check {{docs/config.toml}} to ensure that
 ** the version constants refer to the new version
 ** the {{baseurl}} does not point to {{flink-docs-master}}  but 
{{flink-docs-release-X.Y}} instead



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34532) CLONE - Stage source and binary releases on dist.apache.org

2024-02-27 Thread lincoln lee (Jira)
lincoln lee created FLINK-34532:
---

 Summary: CLONE - Stage source and binary releases on 
dist.apache.org
 Key: FLINK-34532
 URL: https://issues.apache.org/jira/browse/FLINK-34532
 Project: Flink
  Issue Type: Sub-task
Reporter: Matthias Pohl
Assignee: Qingsheng Ren


Copy the source release to the dev repository of dist.apache.org:
# If you have not already, check out the Flink section of the dev repository on 
dist.apache.org via Subversion. In a fresh directory:
{code:bash}
$ svn checkout https://dist.apache.org/repos/dist/dev/flink --depth=immediates
{code}
# Make a directory for the new release and copy all the artifacts (Flink 
source/binary distributions, hashes, GPG signatures and the python 
subdirectory) into that newly created directory:
{code:bash}
$ mkdir flink/flink-${RELEASE_VERSION}-rc${RC_NUM}
$ mv /tools/releasing/release/* 
flink/flink-${RELEASE_VERSION}-rc${RC_NUM}
{code}
# Add and commit all the files.
{code:bash}
$ cd flink
flink $ svn add flink-${RELEASE_VERSION}-rc${RC_NUM}
flink $ svn commit -m "Add flink-${RELEASE_VERSION}-rc${RC_NUM}"
{code}
# Verify that files are present under 
[https://dist.apache.org/repos/dist/dev/flink|https://dist.apache.org/repos/dist/dev/flink].
# Push the release tag if not done already (the following command assumes it is 
called from within the apache/flink checkout):
{code:bash}
$ git push  refs/tags/release-${RELEASE_VERSION}-rc${RC_NUM}
{code}

 

h3. Expectations
 * Maven artifacts deployed to the staging repository of 
[repository.apache.org|https://repository.apache.org/content/repositories/]
 * Source distribution deployed to the dev repository of 
[dist.apache.org|https://dist.apache.org/repos/dist/dev/flink/]
 * Check hashes (e.g. shasum -c *.sha512)
 * Check signatures (e.g. {{{}gpg --verify 
flink-1.2.3-source-release.tar.gz.asc flink-1.2.3-source-release.tar.gz{}}})
 * {{grep}} for legal headers in each file.
 * If time allows, check in advance the NOTICE files of the modules whose 
dependencies have been changed in this release, since license issues pop up 
from time to time during voting. See the "Checking License" section of 
[Verifying a Flink 
Release|https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Release].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34531) CLONE - Build and stage Java and Python artifacts

2024-02-27 Thread lincoln lee (Jira)
lincoln lee created FLINK-34531:
---

 Summary: CLONE - Build and stage Java and Python artifacts
 Key: FLINK-34531
 URL: https://issues.apache.org/jira/browse/FLINK-34531
 Project: Flink
  Issue Type: Sub-task
Reporter: Matthias Pohl
Assignee: Qingsheng Ren


# Create a local release branch ((!) this step can not be skipped for minor 
releases):
{code:bash}
$ cd ./tools
tools/ $ OLD_VERSION=$CURRENT_SNAPSHOT_VERSION NEW_VERSION=$RELEASE_VERSION 
RELEASE_CANDIDATE=$RC_NUM releasing/create_release_branch.sh
{code}
 # Tag the release commit:
{code:bash}
$ git tag -s ${TAG} -m "${TAG}"
{code}
 # We now need to do several things:
 ## Create the source release archive
 ## Deploy jar artefacts to the [Apache Nexus 
Repository|https://repository.apache.org/], which is the staging area for 
deploying the jars to Maven Central
 ## Build PyFlink wheel packages
You might want to create a directory on your local machine for collecting the 
various source and binary releases before uploading them. Creating the binary 
releases is a lengthy process but you can do this on another machine (for 
example, in the "cloud"). When doing this, you can skip signing the release 
files on the remote machine, download them to your local machine and sign them 
there.
 # Build the source release:
{code:bash}
tools $ RELEASE_VERSION=$RELEASE_VERSION releasing/create_source_release.sh
{code}
 # Stage the maven artifacts:
{code:bash}
tools $ releasing/deploy_staging_jars.sh
{code}
Review all staged artifacts ([https://repository.apache.org/]). They should 
contain all relevant parts for each module, including pom.xml, jar, test jar, 
source, test source, javadoc, etc. Carefully review any new artifacts.
 # Close the staging repository on Apache Nexus. When prompted for a 
description, enter “Apache Flink, version X, release candidate Y”.
Then, you need to build the PyFlink wheel packages (since 1.11):
 # Set up an azure pipeline in your own Azure account. You can refer to [Azure 
Pipelines|https://cwiki.apache.org/confluence/display/FLINK/Azure+Pipelines#AzurePipelines-Tutorial:SettingupAzurePipelinesforaforkoftheFlinkrepository]
 for more details on how to set up azure pipeline for a fork of the Flink 
repository. Note that a google cloud mirror in Europe is used for downloading 
maven artifacts, therefore it is recommended to set your [Azure organization 
region|https://docs.microsoft.com/en-us/azure/devops/organizations/accounts/change-organization-location]
 to Europe to speed up the downloads.
 # Push the release candidate branch to your forked personal Flink repository, 
e.g.
{code:bash}
tools $ git push  
refs/heads/release-${RELEASE_VERSION}-rc${RC_NUM}:release-${RELEASE_VERSION}-rc${RC_NUM}
{code}
 # Trigger the Azure Pipelines manually to build the PyFlink wheel packages
 ## Go to your Azure Pipelines Flink project → Pipelines
 ## Click the "New pipeline" button on the top right
 ## Select "GitHub" → your GitHub Flink repository → "Existing Azure Pipelines 
YAML file"
 ## Select your branch → Set path to "/azure-pipelines.yaml" → click on 
"Continue" → click on "Variables"
 ## Then click "New Variable" button, fill the name with "MODE", and the value 
with "release". Click "OK" to set the variable and the "Save" button to save 
the variables, then back on the "Review your pipeline" screen click "Run" to 
trigger the build.
 ## You should now see a build where only the "CI build (release)" is running
 # Download the PyFlink wheel packages from the build result page after the 
jobs of "build_wheels mac" and "build_wheels linux" have finished.
 ## Download the PyFlink wheel packages
 ### Open the build result page of the pipeline
 ### Go to the {{Artifacts}} page (build_wheels linux -> 1 artifact)
 ### Click {{wheel_Darwin_build_wheels mac}} and {{wheel_Linux_build_wheels 
linux}} separately to download the zip files
 ## Unzip these two zip files
{code:bash}
$ cd /path/to/downloaded_wheel_packages
$ unzip wheel_Linux_build_wheels\ linux.zip
$ unzip wheel_Darwin_build_wheels\ mac.zip{code}
 ## Create directory {{./dist}} under the directory of {{{}flink-python{}}}:
{code:bash}
$ cd 
$ mkdir flink-python/dist{code}
 ## Move the unzipped wheel packages to the directory of 
{{{}flink-python/dist{}}}:
{code:java}
$ mv /path/to/wheel_Darwin_build_wheels\ mac/* flink-python/dist/
$ mv /path/to/wheel_Linux_build_wheels\ linux/* flink-python/dist/
$ cd tools{code}

Finally, we create the binary convenience release files:
{code:bash}
tools $ RELEASE_VERSION=$RELEASE_VERSION releasing/create_binary_release.sh
{code}
If you want to run this step in parallel on a remote machine you have to make 
the release commit available there (for example by pushing to a repository). 
*This is important: the commit inside the binary builds has to match the commit 
of the source builds and the tagged release commit.* 
When building 

[jira] [Created] (FLINK-34530) Build Release Candidate: 1.19.0-rc1

2024-02-27 Thread lincoln lee (Jira)
lincoln lee created FLINK-34530:
---

 Summary: Build Release Candidate: 1.19.0-rc1
 Key: FLINK-34530
 URL: https://issues.apache.org/jira/browse/FLINK-34530
 Project: Flink
  Issue Type: New Feature
Affects Versions: 1.17.0
Reporter: Matthias Pohl
Assignee: Jing Ge
 Fix For: 1.17.0


The core of the release process is the build-vote-fix cycle. Each cycle 
produces one release candidate. The Release Manager repeats this cycle until 
the community approves one release candidate, which is then finalized.

h4. Prerequisites
Set up a few environment variables to simplify Maven commands that follow. This 
identifies the release candidate being built. Start with {{RC_NUM}} equal to 1 
and increment it for each candidate:
{code}
RC_NUM="1"
TAG="release-${RELEASE_VERSION}-rc${RC_NUM}"
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Release flink-connector-parent 1.1.0 release candidate #2

2024-02-27 Thread Chesnay Schepler

+1
- pom contents
- source contents
- Website PR

On 19/02/2024 18:33, Etienne Chauchot wrote:

Hi everyone,
Please review and vote on the release candidate #2 for the version 
1.1.0, as follows:

[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org 
[2], which are signed with the key with fingerprint 
D1A76BA19D6294DD0033F6843A019F0B8DD163EA [3],

* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag v1.1.0-rc2 [5],
* website pull request listing the new release [6].

* confluence wiki: connector parent upgrade to version 1.1.0 that will 
be validated after the artifact is released (there is no PR mechanism 
on the wiki) [7]



The vote will be open for at least 72 hours. It is adopted by majority 
approval, with at least 3 PMC affirmative votes.


Thanks,
Etienne

[1] 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12353442
[2] 
https://dist.apache.org/repos/dist/dev/flink/flink-connector-parent-1.1.0-rc2

[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] 
https://repository.apache.org/content/repositories/orgapacheflink-1707
[5] 
https://github.com/apache/flink-connector-shared-utils/releases/tag/v1.1.0-rc2


[6] https://github.com/apache/flink-web/pull/717

[7] 
https://cwiki.apache.org/confluence/display/FLINK/Externalized+Connector+development






Re: [DISCUSS] FLINK-34440 Support Debezium Protobuf Confluent Format

2024-02-27 Thread Robert Metzger
Hi all,

+1 to support the format in Flink.

@Kevin: Do you already have an implementation for this inhouse that you are
looking to upstream, or would you start from scratch?
I'm asking because my current employer, Confluent, has a Protobuf Schema
registry implementation for Flink, and I could help drive contributing this
back to open source.
If you already have an implementation, let's decide which one to use :)

Best,
Robert

On Thu, Feb 22, 2024 at 2:05 PM David Radley 
wrote:

> Hi Kevin,
> Some thoughts on this.
> I suggested an Apicurio registry format in the dev list, and was advised
> to raise a FLIP for this, I suggest the same would apply here (or the
> alternative to FLIPs if you cannot raise one). I am prototyping an Avro
> Apicurio format, prior to raising the FLIP, and notice that the readSchema
> in the SchemaCoder only takes a byte array, but I need to pass down the
> Kafka headers (where the Apicurio globalId identifying the schema lives).
>
> I assume:
>
>   *   for the confluent Protobuf format you would extend the Protobuf
> format to drive some Schema Registry logic for Protobuf (similar to the way
> Avro does it) where the magic byte + schema id can be obtained and the
> schema looked up using the Confluent Schema registry.
>   *   It would be good if any protobuf format enhancements for Schema
> registries pass down the Kafka headers (I am thinking as a Map<String,
> Object> for Avro) as well as the message payload so Apicurio registry could
> work with this.
>   *   It would make sense to have the Confluent schema lookup in common
> code, which is part of the SchemaCoder readSchema  logic.
>   *   I assume the ProtobufSchemaCoder readSchema would return a Protobuf
> Schema object.
>
>
>
> I also wondered whether these Kafka only formats should be moved to the
> Kafka connector repo, or whether they might in the future be used outside
> Kafka – e.g. Avro/Protobuf files in a database.
>Kind regards, David.
>
>
> From: Kevin Lam 
> Date: Wednesday, 21 February 2024 at 18:51
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] [DISCUSS] FLINK-34440 Support Debezium Protobuf
> Confluent Format
> I would love to get some feedback from the community on this JIRA issue:
> https://issues.apache.org/jira/projects/FLINK/issues/FLINK-34440
>
> I am looking into creating a PR and would appreciate some review on the
> approach.
>
> In terms of design I think we can mirror the `debezium-avro-confluent` and
> `avro-confluent` formats already available in Flink:
>
>1. `protobuf-confluent` format which uses DynamicMessage
><
> https://protobuf.dev/reference/java/api-docs/com/google/protobuf/DynamicMessage
> >
>for encoding and decoding.
>   - For encoding the Flink RowType will be used to dynamically create a
>   Protobuf Schema and register it with the Confluent Schema
> Registry. It will
>   use the same schema to construct a DynamicMessage and serialize it.
>   - For decoding, the schema will be fetched from the registry and use
>   DynamicMessage to deserialize and convert the Protobuf object to a
> Flink
>   RowData.
>   - Note: here there is no external .proto file
>2. `debezium-avro-confluent` format which unpacks the Debezium Envelope
>and collects the appropriate UPDATE_BEFORE, UPDATE_AFTER, INSERT, DELETE
>events.
>   - We may be able to refactor and reuse code from the existing
>   DebeziumAvroDeserializationSchema + DebeziumAvroSerializationSchema
> since
>   the deser logic is largely delegated to and these Schemas are
> concerned
>   with handling the Debezium envelope.
>3. Move the Confluent Schema Registry Client code to a separate maven
>module, flink-formats/flink-confluent-common, and extend it to support
>ProtobufSchemaProvider
><
> https://github.com/confluentinc/schema-registry/blob/ca226f2e1e2091c67b372338221b57fdd435d9f2/protobuf-provider/src/main/java/io/confluent/kafka/schemaregistry/protobuf/ProtobufSchemaProvider.java#L26
> >
>.
>
>
> Does anyone have any feedback or objections to this approach?
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>


[jira] [Created] (FLINK-34529) Projection cannot be pushed down through rank operator.

2024-02-27 Thread yisha zhou (Jira)
yisha zhou created FLINK-34529:
--

 Summary: Projection cannot be pushed down through rank operator.
 Key: FLINK-34529
 URL: https://issues.apache.org/jira/browse/FLINK-34529
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.19.0
Reporter: yisha zhou


When there is a rank/deduplicate operator, a projection based on the output 
of this operator cannot be pushed down to its input.

The following code reproduces the issue:
{code:java}
val util = streamTestUtil() 
util.addTableSource[(String, Int, String)]("T1", 'a, 'b, 'c)
util.addTableSource[(String, Int, String)]("T2", 'd, 'e, 'f)
val sql =
  """
|SELECT a FROM (
|  SELECT a, f,
|  ROW_NUMBER() OVER (PARTITION BY f ORDER BY c DESC) as rank_num
|  FROM  T1, T2
|  WHERE T1.a = T2.d
|)
|WHERE rank_num = 1
  """.stripMargin

util.verifyPlan(sql){code}
The plan is expected to be:
{code:java}
Calc(select=[a])
+- Rank(strategy=[AppendFastStrategy], rankType=[ROW_NUMBER], 
rankRange=[rankStart=1, rankEnd=1], partitionBy=[f], orderBy=[c DESC], 
select=[a, c, f])
   +- Exchange(distribution=[hash[f]])
  +- Calc(select=[a, c, f])
 +- Join(joinType=[InnerJoin], where=[=(a, d)], select=[a, c, d, f], 
leftInputSpec=[NoUniqueKey], rightInputSpec=[NoUniqueKey])
:- Exchange(distribution=[hash[a]])
:  +- Calc(select=[a, c])
: +- LegacyTableSourceScan(table=[[default_catalog, 
default_database, T1, source: [TestTableSource(a, b, c)]]], fields=[a, b, c])
+- Exchange(distribution=[hash[d]])
   +- Calc(select=[d, f])
  +- LegacyTableSourceScan(table=[[default_catalog, 
default_database, T2, source: [TestTableSource(d, e, f)]]], fields=[d, e, f]) 
{code}
Notice that the 'select' of Join operator is [a, c, d, f]. However the actual 
plan is:
{code:java}
Calc(select=[a])
+- Rank(strategy=[AppendFastStrategy], rankType=[ROW_NUMBER], 
rankRange=[rankStart=1, rankEnd=1], partitionBy=[f], orderBy=[c DESC], 
select=[a, c, f])
   +- Exchange(distribution=[hash[f]])
  +- Calc(select=[a, c, f])
 +- Join(joinType=[InnerJoin], where=[=(a, d)], select=[a, b, c, d, e, 
f], leftInputSpec=[NoUniqueKey], rightInputSpec=[NoUniqueKey])
:- Exchange(distribution=[hash[a]])
:  +- LegacyTableSourceScan(table=[[default_catalog, 
default_database, T1, source: [TestTableSource(a, b, c)]]], fields=[a, b, c])
+- Exchange(distribution=[hash[d]])
   +- LegacyTableSourceScan(table=[[default_catalog, 
default_database, T2, source: [TestTableSource(d, e, f)]]], fields=[d, e, f])
 {code}
the 'select' of the Join operator is [a, b, c, d, e, f], which means the 
projection in the final Calc is not pushed through the Rank.

And I think an easy way to fix this issue is to add 
org.apache.calcite.rel.rules.ProjectWindowTransposeRule into 
FlinkStreamRuleSets.LOGICAL_OPT_RULES.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Apache Bahir retired

2024-02-27 Thread Ferenc Csaky
Thank you Leonard for sharing your thoughts on this topic.

I agree that complying with the Flink community connector
development process would be a must, if there are no legal or
copyright issues, I would be happy to take that task for this
particular case.

I am no legal/copyright expert myself, but Bahir uses the Apache
2.0 license as well, so I believe it should be possible without too many 
complications; I will look for help on that front.

FYI we are using and supporting a downstream fork of the Kudu connector on top 
of Flink 1.18 without any major modifications, so it is pretty up to date 
upstream as well.

Regards,
Ferenc




On Monday, February 26th, 2024 at 10:29, Leonard Xu  wrote:

> 
> 
> Hey, Ferenc
> 
> Thanks for initiating this discussion. Apache Bahir is a great project that 
> provided significant assistance to many Apache Flink/Spark users. It's a pity 
> that it has been retired.
> 
> I believe that connectivity is crucial for building the ecosystem of the 
> Flink such a computing engine. The community, or at least I, would actively 
> support the introduction and maintenance of new connectors. Therefore, adding 
> a Kudu connector or other connectors from Bahir makes sense to me, as long as 
> we adhere to the development process for connectors in the Flink community[1].
> I recently visited the Bahir Flink repository. Although the last release of 
> Bahir Flink (August ’22[2]) is compatible only with Flink 1.14, its 
> latest code is compatible with Flink 1.17[3]. So, based on the existing 
> codebase, developing an official Apache Flink connector for Kudu or other 
> connectors should be manageable. One point to consider is that if we're not 
> developing a connector entirely from scratch but based on an existing 
> repository, we must ensure that there are no copyright issues. Here, "no 
> issues" means satisfying both Apache Bahir's and Apache Flink's copyright 
> requirements. Honestly, I'm not an expert in copyright or legal matters. If 
> you're interested in contributing to the Kudu connector, it might be 
> necessary to attract other experienced community members to participate in 
> this aspect.
> 
> Best,
> Leonard
> 
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP+Connector+Template
> [2] https://github.com/apache/bahir-flink/releases/tag/v1.1.0
> [3] https://github.com/apache/bahir-flink/blob/master/pom.xml#L116
> 
> 
> 
> > 2024年2月22日 下午6:37,Ferenc Csaky ferenc.cs...@pm.me.INVALID 写道:
> > 
> > Hello devs,
> > 
> > Just saw that the Bahir project is retired [1]. Any plans on what's 
> > happening with the Flink connectors that were part of this project? We 
> > specifically use the Kudu connector and integrate it into our platform at 
> > Cloudera, so we would be okay to maintain it. Would it be possible to carry 
> > it over as a separate connector repo under the Apache umbrella, similarly 
> > to what happened with the external connectors previously?
> > 
> > Thanks,
> > Ferenc


[jira] [Created] (FLINK-34528) With shuffling data, when a Task Manager is killed, restarting the Flink job takes a considerable amount of time

2024-02-27 Thread junzhong qin (Jira)
junzhong qin created FLINK-34528:


 Summary: With shuffling data, when a Task Manager is killed, 
restarting the Flink job takes a considerable amount of time
 Key: FLINK-34528
 URL: https://issues.apache.org/jira/browse/FLINK-34528
 Project: Flink
  Issue Type: Sub-task
Reporter: junzhong qin
 Attachments: image-2024-02-27-16-35-04-464.png

In the test case, the pipeline looks like:

!image-2024-02-27-16-35-04-464.png!

The Source: Custom Source generates strings, and the job applies keyBy on the 
strings before Sink: Unnamed (a minimal sketch follows the list below).
 # parallelism = 100
 # taskmanager.numberOfTaskSlots = 2
 # disable checkpoint
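
A minimal sketch of a comparable pipeline (the actual test job is not included 
here, so the source and sink bodies below are assumptions; 
taskmanager.numberOfTaskSlots=2 is cluster configuration, not job code):
{code:java}
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.DiscardingSink;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class ShuffleRestartRepro {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(100); // matches the test setup; checkpointing is
                                 // simply never enabled ("disable checkpoint")

        DataStream<String> source =
                env.addSource(
                        new SourceFunction<String>() {
                            private volatile boolean running = true;

                            @Override
                            public void run(SourceContext<String> ctx) {
                                long i = 0;
                                while (running) {
                                    ctx.collect("key-" + (i++ % 1000));
                                }
                            }

                            @Override
                            public void cancel() {
                                running = false;
                            }
                        });

        // keyBy forces a network shuffle between Source and Sink, which is
        // what makes losing a task manager expensive to recover from here.
        source.keyBy(s -> s).addSink(new DiscardingSink<>());

        env.execute("shuffle-restart-repro");
    }
}
{code}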

The worker was killed at:
{code:java}
2024-02-27 16:41:49,982 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - Sink: Unnamed 
(6/100) (2f1c7b22098a273f5471e3e8f794e1d3_0a448493b4782967b150582570326227_5_0) 
switched from RUNNING to FAILED on 
container_e2472_1705993319725_62292_01_46 @ xxx 
(dataPort=38827).org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
 Connection unexpectedly closed by remote task manager 'xxx/10.169.18.138:35983 
[ container_e2472_1705993319725_62292_01_10(xxx:5454) ] '. This might 
indicate that the remote task manager was lost.at 
org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.channelInactive(CreditBasedPartitionRequestClientHandler.java:134)
at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
at 
org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)
at 
org.apache.flink.runtime.io.network.netty.NettyMessageClientDecoderDelegate.channelInactive(NettyMessageClientDecoderDelegate.java:94)
at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
at 
org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
at 
org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:813)
at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
at 
org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:403)
at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at 
org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_252]{code}
The job took about 16 seconds to restart.
{code:java}
// The task was scheduled to a task manager that had already been killed
2024-02-27 16:41:53,506 INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Deploying Source: 
Custom Source (16/100) (attempt #3) with attempt id 
2f1c7b22098a273f5471e3e8f794e1d3_bc764cd8ddf7a0cff126f51c16239658_15_3 and 
vertex id bc764cd8ddf7a0cff126f51c16239658_15 to 
container_e2472_1705993319725_62292_01_10 @ xxx (dataPort=35983) with 
allocation id 975dded4548ad15b36d0e5e6aac8f5b6

// The last task switched from INITIALIZING to RUNNING
2024-02-27 16:42:05,176 INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink: Unnamed 
(64/100) 
(2f1c7b22098a273f5471e3e8f794e1d3_0a448493b4782967b150582570326227_63_14) 
switched from INITIALIZING to RUNNING. {code}



--

[jira] [Created] (FLINK-34527) Deprecate Time classes also in PyFlink

2024-02-27 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-34527:
-

 Summary: Deprecate Time classes also in PyFlink
 Key: FLINK-34527
 URL: https://issues.apache.org/jira/browse/FLINK-34527
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.20.0
Reporter: Matthias Pohl


FLINK-32570 deprecated the Time classes, but we missed the 
PyFlink-related APIs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)