Re: [DISCUSS] FLIP-453: Promote Unified Sink API V2 to Public and Deprecate SinkFunction

2024-05-02 Thread Péter Váry
Hi Martijn,

We might want to add FLIP-371 [1] to the list. (Or we aim only for higher
level FLIPs?)

We are in the process of using the new API in Iceberg connector [2] - so
far, so good.

I know of one minor known issue about the sink [3], which should be ready
for the release.

All-in-all, I think we are in good shape, and we could move forward with
the promotion.

Thanks,
Peter

[1] -
https://cwiki.apache.org/confluence/plugins/servlet/mobile?contentId=263430387
[2] - https://github.com/apache/iceberg/pull/10179
[3] - https://issues.apache.org/jira/browse/FLINK-35149


On Thu, May 2, 2024, 09:47 Muhammet Orazov 
wrote:

> Got it, thanks!
>
> On 2024-05-02 06:53, Martijn Visser wrote:
> > Hi Muhammet,
> >
> > Thanks for joining the discussion! The changes in this FLIP would be
> > targeted for Flink 1.19, since it's only a matter of changing the
> > annotation.
> >
> > Best regards,
> >
> > Martijn
> >
> > On Thu, May 2, 2024 at 7:26 AM Muhammet Orazov 
> > wrote:
> >
> >> Hello Martijn,
> >>
> >> Thanks for the FLIP and detailed history of changes, +1.
> >>
> >> Would FLIP changes target for 2.0? I think it would be good
> >> to have clear APIs on 2.0 release.
> >>
> >> Best,
> >> Muhammet
> >>
> >> On 2024-05-01 15:30, Martijn Visser wrote:
> >> > Hi everyone,
> >> >
> >> > I would like to start a discussion on FLIP-453: Promote Unified Sink
> >> > API V2
> >> > to Public and Deprecate SinkFunction
> >> > https://cwiki.apache.org/confluence/x/rIobEg
> >> >
> >> > This FLIP proposes to promote the Unified Sink API V2 from
> >> > PublicEvolving
> >> > to Public and to mark the SinkFunction as Deprecated.
> >> >
> >> > I'm looking forward to your thoughts.
> >> >
> >> > Best regards,
> >> >
> >> > Martijn
> >>
>


[jira] [Created] (FLINK-35286) Cannot discover Hive connector outside Hive catalog

2024-05-02 Thread Ryan Goldenberg (Jira)
Ryan Goldenberg created FLINK-35286:
---

 Summary: Cannot discover Hive connector outside Hive catalog
 Key: FLINK-35286
 URL: https://issues.apache.org/jira/browse/FLINK-35286
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Hive
 Environment: Flink 1.18.1
Hive 2.3.9
flink-sql-connector-hive-2.3.9_2.12-1.18.1.jar
Reporter: Ryan Goldenberg


*Problem*

Referencing tables with the 'hive' connector outside of HiveCatalog gives the 
error

 
{code:java}
org.apache.flink.table.api.ValidationException: Could not find any factory for 
identifier 'hive' that implements 
'org.apache.flink.table.factories.DynamicTableFactory' in the classpath{code}
 

For example, when using a table created like

 
{code:java}
CREATE TABLE my_table LIKE hive.db.table WITH (...);{code}
Whereas the 'hive' connector is available if the table is referenced via 
HiveCatalog.

 

*Desired Behavior*
The 'hive' connector should be available for tables outside of HiveCatalog, for 
example in the default catalog.

*Benefits*
* Can refer to Hive tables without fully qualified path `catalog.db.table` 
outside of HiveCatalog, useful when it is not the only catalog or data source.
* Can modify Hive tables without changing Hive metastore or using SQL hints 
[here|https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/connectors/table/hive/hive_read_write/#reading]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Apache Flink CDC Release 3.1.0, release candidate #1

2024-05-02 Thread Muhammet Orazov

Hey Qingsheng,

Thanks a lot! +1 (non-binding)

- Checked sha512sum hash
- Checked GPG signature
- Reviewed release notes
- Reviewed GitHub web pr (added minor suggestions)
- Built the source with JDK 11 & 8
- Checked that src doesn't contain binary files

Best,
Muhammet

On 2024-04-30 05:11, Qingsheng Ren wrote:

Hi everyone,

Please review and vote on the release candidate #1 for the version 
3.1.0 of

Apache Flink CDC, as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

**Release Overview**

As an overview, the release consists of the following:
a) Flink CDC source release to be deployed to dist.apache.org
b) Maven artifacts to be deployed to the Maven Central Repository

**Staging Areas to Review**

The staging areas containing the above mentioned artifacts are as 
follows,

for your review:
* All artifacts for a) can be found in the corresponding dev repository 
at

dist.apache.org [1], which are signed with the key with fingerprint
A1BD477F79D036D2C30CA7DBCA8AEEC2F6EB040B [2]
* All artifacts for b) can be found at the Apache Nexus Repository [3]

Other links for your review:
* JIRA release notes [4]
* Source code tag "release-3.1.0-rc1" with commit hash
63b42cb937d481f558209ab3c8547959cf039643 [5]
* PR for release announcement blog post of Flink CDC 3.1.0 in flink-web 
[6]


**Vote Duration**

The voting time will run for at least 72 hours, adopted by majority
approval with at least 3 PMC affirmative votes.

Thanks,
Qingsheng Ren

[1] https://dist.apache.org/repos/dist/dev/flink/flink-cdc-3.1.0-rc1/
[2] https://dist.apache.org/repos/dist/release/flink/KEYS
[3] 
https://repository.apache.org/content/repositories/orgapacheflink-1731

[4]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12354387
[5] https://github.com/apache/flink-cdc/releases/tag/release-3.1.0-rc1
[6] https://github.com/apache/flink-web/pull/739


[jira] [Created] (FLINK-35285) Autoscaler key group optimization can interfere with scale-down.max-factor

2024-05-02 Thread Trystan (Jira)
Trystan created FLINK-35285:
---

 Summary: Autoscaler key group optimization can interfere with 
scale-down.max-factor
 Key: FLINK-35285
 URL: https://issues.apache.org/jira/browse/FLINK-35285
 Project: Flink
  Issue Type: Bug
Reporter: Trystan


When setting a less aggressive scale down limit, the key group optimization can 
prevent a vertex from scaling down at all. It will hunt from target upwards to 
maxParallelism/2, and will always find the same parallelism again.

 

A simple test trying to scale down from a parallelism of 60 with a 
scale-down.max-factor of 0.2:
{code:java}
assertEquals(48, JobVertexScaler.scale(60, inputShipStrategies, 360, .8, 8, 
360)); {code}
 

It seems reasonable to make a good attempt to spread data across subtasks, but 
not at the expense of total deadlock. The problem is that during scale down it 
doesn't actually ensure that it newParallelism will be < currentParallelism.

 

Clunky, but something to ensure it can make at least some progress. There is 
another test that now fails, but just to illustrate the point:
{code:java}
for (int p = newParallelism; p <= maxParallelism / 2 && p <= upperBound; p++) {
if ((scaleFactor < 1 && p < currentParallelism) || (scaleFactor > 1 && p > 
currentParallelism)) {
if (maxParallelism % p == 0) {
return p;
}
}
} {code}
 

Perhaps this is by design and not a bug, but total failure to scale down in 
order to keep optimized key groups does not seem ideal.

 

Key group optimization block:

https://github.com/apache/flink-kubernetes-operator/blob/fe3d24e4500d6fcaed55250ccc816546886fd1cf/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/JobVertexScaler.java#L296C1-L303C10



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35284) Streaming File Sink end-to-end test times out

2024-05-02 Thread Ryan Skraba (Jira)
Ryan Skraba created FLINK-35284:
---

 Summary: Streaming File Sink end-to-end test times out
 Key: FLINK-35284
 URL: https://issues.apache.org/jira/browse/FLINK-35284
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.20.0
Reporter: Ryan Skraba


1.20 e2e_2_cron_adaptive_scheduler 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59303=logs=fb37c667-81b7-5c22-dd91-846535e99a97=011e961e-597c-5c96-04fe-7941c8b83f23=3076

{code}
May 01 01:08:42 Test (pid: 127498) did not finish after 900 seconds.
May 01 01:08:42 Printing Flink logs and killing it:
{code}

This looks like a consequence of hundreds of {{RecipientUnreachableException}}s 
like: 

{code}
2024-05-01 00:55:00,496 WARN  
org.apache.flink.runtime.resourcemanager.slotmanager.DefaultSlotStatusSyncer [] 
- Slot allocation for allocation 2ec550d8331cd53c32fd899e1e9a0fa5 for job 
5654b195450b352be998673f1637fc43 failed.
org.apache.flink.runtime.rpc.exceptions.RecipientUnreachableException: Could 
not send message [RemoteRpcInvocation(TaskExecutorGateway.requestSlot(SlotID, 
JobID, AllocationID, ResourceProfile, String, ResourceManagerId, Time))] from 
sender [Actor[pekko://flink/temp/taskmanager_0$De]] to recipient 
[Actor[pekko.ssl.tcp://flink@localhost:40665/user/rpc/taskmanager_0#-299862847]],
 because the recipient is unreachable. This can either mean that the recipient 
has been terminated or that the remote RpcService is currently not reachable.
at 
org.apache.flink.runtime.rpc.pekko.DeadLettersActor.handleDeadLetter(DeadLettersActor.java:61)
 ~[flink-rpc-akkafe85d469-8ced-4732-922e-62c82b554871.jar:1.20-SNAPSHOT]
at 
org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33) 
~[flink-rpc-akkafe85d469-8ced-4732-922e-62c82b554871.jar:1.20-SNAPSHOT]
at 
org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29) 
~[flink-rpc-akkafe85d469-8ced-4732-922e-62c82b554871.jar:1.20-SNAPSHOT]
at scala.PartialFunction.applyOrElse(PartialFunction.scala:127) 
~[flink-rpc-akkafe85d469-8ced-4732-922e-62c82b554871.jar:1.20-SNAPSHOT]
{code}





--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-454: New Apicurio Avro format

2024-05-02 Thread Nic Townsend
+1

I believe expanding the supported list of schema registries is a valuable 
addition.

--
Kind regards

Nic Townsend
Sofware Developer – IBM Event Automation

Slack: @nictownsend
X: @nict0wnsend


From: David Radley 
Date: Thursday, 2 May 2024 at 10:43
To: dev@flink.apache.org 
Subject: [EXTERNAL] [VOTE] FLIP-454: New Apicurio Avro format
Hi everyone,

I'd like to start a vote on the FLIP-454: New Apicurio Avro format
[1]. The discussion thread is here [2].

The vote will be open for at least 72 hours unless there is an
objection
or
insufficient votes.

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-454%3A+New+Apicurio+Avro+format
[2] https://lists.apache.org/thread/wtkl4yn847tdd0wrqm5xgv9wc0cb0kr8


Kind regards, David.

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


Re: [DISCUSS] FLIP-444: Native file copy support

2024-05-02 Thread Aleksandr Pilipenko
Hi Piotr,

Thanks for the proposal.
How adding a s5cmd will affect memory footprint? Since this is a native
binary, memory consumption will not be controlled by JVM or Flink.

Thanks,
Aleksandr

On Thu, 2 May 2024 at 11:12, Hong Liang  wrote:

> Hi Piotr,
>
> Thanks for the FLIP! Nice to see work to improve the filesystem
> performance. +1 to future work to improve the upload speed as well. This
> would be useful for jobs with large state and high Async checkpointing
> times.
>
> Some thoughts on the configuration, it might be good for us to introduce 2x
> points of configurability for future proofing:
> 1/ Configure the implementation of PathsCopyingFileSystem used, maybe by
> config, or by ServiceResources (this would allow us to use this for
> alternative clouds/Implement S3 SDKv2 support if we want this in the
> future). Also this could be used as a feature flag to determine if we
> should be using this new native file copy support.
> 2/ Configure the location of the s5cmd binary (version control etc.), as
> you have mentioned in the FLIP.
>
> Regards,
> Hong
>
>
> On Thu, May 2, 2024 at 9:40 AM Muhammet Orazov
>  wrote:
>
> > Hey Piotr,
> >
> > Thanks for the proposal! It would be great improvement!
> >
> > Some questions from my side:
> >
> > > In order to configure s5cmd Flink’s user would need
> > > to specify path to the s5cmd binary.
> >
> > Could you please also add the configuration property
> > for this? An example showing how users would set this
> > parameter would be helpful.
> >
> > Would this affect any filesystem connectors that use
> > FileSystem[1][2] dependencies?
> >
> > Best,
> > Muhammet
> >
> > [1]:
> >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/s3/
> > [2]:
> >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/filesystem/
> >
> > On 2024-04-30 13:15, Piotr Nowojski wrote:
> > > Hi all!
> > >
> > > I would like to put under discussion:
> > >
> > > FLIP-444: Native file copy support
> > > https://cwiki.apache.org/confluence/x/rAn9EQ
> > >
> > > This proposal aims to speed up Flink recovery times, by speeding up
> > > state
> > > download times. However in the future, the same mechanism could be also
> > > used to speed up state uploading (checkpointing/savepointing).
> > >
> > > I'm curious to hear your thoughts.
> > >
> > > Best,
> > > Piotrek
> >
>


Re: [DISCUSS] FLIP-444: Native file copy support

2024-05-02 Thread Hong Liang
Hi Piotr,

Thanks for the FLIP! Nice to see work to improve the filesystem
performance. +1 to future work to improve the upload speed as well. This
would be useful for jobs with large state and high Async checkpointing
times.

Some thoughts on the configuration, it might be good for us to introduce 2x
points of configurability for future proofing:
1/ Configure the implementation of PathsCopyingFileSystem used, maybe by
config, or by ServiceResources (this would allow us to use this for
alternative clouds/Implement S3 SDKv2 support if we want this in the
future). Also this could be used as a feature flag to determine if we
should be using this new native file copy support.
2/ Configure the location of the s5cmd binary (version control etc.), as
you have mentioned in the FLIP.

Regards,
Hong


On Thu, May 2, 2024 at 9:40 AM Muhammet Orazov
 wrote:

> Hey Piotr,
>
> Thanks for the proposal! It would be great improvement!
>
> Some questions from my side:
>
> > In order to configure s5cmd Flink’s user would need
> > to specify path to the s5cmd binary.
>
> Could you please also add the configuration property
> for this? An example showing how users would set this
> parameter would be helpful.
>
> Would this affect any filesystem connectors that use
> FileSystem[1][2] dependencies?
>
> Best,
> Muhammet
>
> [1]:
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/s3/
> [2]:
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/filesystem/
>
> On 2024-04-30 13:15, Piotr Nowojski wrote:
> > Hi all!
> >
> > I would like to put under discussion:
> >
> > FLIP-444: Native file copy support
> > https://cwiki.apache.org/confluence/x/rAn9EQ
> >
> > This proposal aims to speed up Flink recovery times, by speeding up
> > state
> > download times. However in the future, the same mechanism could be also
> > used to speed up state uploading (checkpointing/savepointing).
> >
> > I'm curious to hear your thoughts.
> >
> > Best,
> > Piotrek
>


[jira] [Created] (FLINK-35283) Add support unique Kafka producer client ids

2024-05-02 Thread Francis (Jira)
Francis created FLINK-35283:
---

 Summary: Add support unique Kafka producer client ids 
 Key: FLINK-35283
 URL: https://issues.apache.org/jira/browse/FLINK-35283
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Kafka
Reporter: Francis


This issue came out of debuging a warning we're seeing in our Flink logs. We're 
running Flink 1.18 and have an application that uses Kafka topics as a source 
and a sink. We're running with several tasks. The warning we're seeing in the 
logs is:

```
WARN org.apache.kafka.common.utils.AppInfoParser - Error registering AppInfo 
mbean
javax.management.InstanceAlreadyExistsException: 
kafka.producer:type=app-info,id=kafka producer client id
```

I've spent a bit of time debugging, and it looks like the root cause of this 
warning is the Flink `KafkaSink` creating multiple `KafkaWriter`s that, in 
turn, create multiple `KafkaProducer`s with the same Kafka producer 
`client.id`. Since the value for `client.id` is used when registering the 
`AppInfo` MBean — when multiple `KafkaProducer`s with the same `client.id` are 
registered we get the above `InstanceAlreadyExistsException`. Since we're 
running with several tasks and we get a Kafka producer per task this duplicate 
registration exception makes sense to me.


I'm wondering if the fix would be to update the {{KafkaSink.builder}} by adding 
a {{setClientIdPrefix}} method, similar to what we have already on the 
{{{}KafkaSource.builder{}}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


RE: FW: RE: [DISCUSS] FLIP-XXX Apicurio-avro format

2024-05-02 Thread David Radley
Hi Martijn,
I have started a vote thread – please could you update the Flip with the link 
to the vote thread,
  Kind regards, David.


From: David Radley 
Date: Thursday, 2 May 2024 at 10:39
To: dev@flink.apache.org 
Subject: [EXTERNAL] RE: FW: RE: [DISCUSS] FLIP-XXX Apicurio-avro format
Fabulous, thanks Martijn 

From: Martijn Visser 
Date: Thursday, 2 May 2024 at 10:08
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: FW: RE: [DISCUSS] FLIP-XXX Apicurio-avro format
Done :)

On Thu, May 2, 2024 at 11:01 AM David Radley 
wrote:

> Hi Martijn,
> Thank you very much for looking at this. In response to your feedback; I
> produced a reduced version which is on this link.
>
>
> https://docs.google.com/document/d/1J1E-cE-X2H3-kw4rNjLn71OGPQk_Yl1iGX4-eCHWLgE/edit?usp=sharing
>
> The original version you have copied is a bit out-dated and verbose.
> Please could you replace the Flip with content from the above link,
> Kind regards, David,
>
> From: Martijn Visser 
> Date: Wednesday, 1 May 2024 at 16:31
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] Re: FW: RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> Hi David,
>
> I've copied and pasted it into
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-454%3A+New+Apicurio+Avro+format
> ;
> please take a look if it's as expected.
>
> Best regards,
>
> Martijn
>
> On Wed, May 1, 2024 at 3:47 PM David Radley 
> wrote:
>
> > Hi Martijn,
> > Any news?
> >Kind regards, David.
> >
> >
> > From: David Radley 
> > Date: Monday, 22 April 2024 at 09:48
> > To: dev@flink.apache.org 
> > Subject: FW: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> > Hi Martijn,
> > A gentle nudge, is this ok for you or one of the PMC or committers to
> > create a Flip now?
> >Kind regards, David.
> >
> > From: David Radley 
> > Date: Monday, 15 April 2024 at 12:29
> > To: dev@flink.apache.org 
> > Subject: Re: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> > Hi Martijn,
> > Thanks for looking at this. I have used the template in a new  Google Doc
> >
> https://docs.google.com/document/d/1J1E-cE-X2H3-kw4rNjLn71OGPQk_Yl1iGX4-eCHWLgE/edit?usp=sharing
> .
> > I have significantly reduced the content in the Flip, in line with what I
> > see as the template and its usage. If this it too much or too little, I
> can
> > amend,
> >
> > Kind regards, David.
> >
> > From: Martijn Visser 
> > Date: Friday, 12 April 2024 at 18:11
> > To: dev@flink.apache.org 
> > Subject: Re: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> > Hi David,
> >
> > I tried, but the format wasn't as the FLIP template expects, so I ended
> up
> > needing to change the entire formatting and that was just too much work
> to
> > be honest. If you could make sure that especially the headers match with
> > the FLIP template, and that all of the contents from the FLIP template is
> > there, that would make things much easier.
> >
> > Thanks,
> >
> > Martijn
> >
> > On Fri, Apr 12, 2024 at 6:08 PM David Radley 
> > wrote:
> >
> > > Hi,
> > > A gentle nudge. Please could a committer/PMC member raise the Flip for
> > > this,
> > >   Kind regards, David.
> > >
> > >
> > > From: David Radley 
> > > Date: Monday, 8 April 2024 at 09:40
> > > To: dev@flink.apache.org 
> > > Subject: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> > > Hi,
> > > I have posted a Google Doc [0] to the mailing list for a discussion
> > thread
> > > for a Flip proposal to introduce a Apicurio-avro format. The
> discussions
> > > have been resolved, please could a committer/PMC member copy the
> contents
> > > from the Google Doc, and create a FLIP number for this,. as per the
> > process
> > > [1],
> > >   Kind regards, David.
> > > [0]
> > >
> > >
> >
> https://docs.google.com/document/d/14LWZPVFQ7F9mryJPdKXb4l32n7B0iWYkcOdEd1xTC7w/edit?usp=sharing
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals#FlinkImprovementProposals-CreateyourOwnFLIP
> > >
> > > From: Jeyhun Karimov 
> > > Date: Friday, 22 March 2024 at 13:05
> > > To: dev@flink.apache.org 
> > > Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Apicurio-avro format
> > > Hi David,
> > >
> > > Thanks a lot for clarification.
> > > Sounds good to me.
> > >
> > > Regards,
> > > Jeyhun
> > >
> > > On Fri, Mar 22, 2024 at 10:54 AM David Radley  >
> > > wrote:
> > >
> > > > Hi Jeyhun,
> > > > Thanks for your feedback.
> > > >
> > > > So for outbound messages, the message includes the global ID. We
> > register
> > > > the schema and match on the artifact id. So if the schema then
> evolved,
> > > > adding a new  version, the global ID would still be unique and the
> same
> > > > version would be targeted. If you wanted to change the Flink table
> > > > definition in line with a higher version, then you could do this –
> the
> > > > artifact id would need to match for it to use the same schema and a
> > > higher
> > > > artifact version would need 

[VOTE] FLIP-454: New Apicurio Avro format

2024-05-02 Thread David Radley
Hi everyone,

I'd like to start a vote on the FLIP-454: New Apicurio Avro format
[1]. The discussion thread is here [2].

The vote will be open for at least 72 hours unless there is an
objection
or
insufficient votes.

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-454%3A+New+Apicurio+Avro+format
[2] https://lists.apache.org/thread/wtkl4yn847tdd0wrqm5xgv9wc0cb0kr8


Kind regards, David.

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


RE: FW: RE: [DISCUSS] FLIP-XXX Apicurio-avro format

2024-05-02 Thread David Radley
Fabulous, thanks Martijn 

From: Martijn Visser 
Date: Thursday, 2 May 2024 at 10:08
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: FW: RE: [DISCUSS] FLIP-XXX Apicurio-avro format
Done :)

On Thu, May 2, 2024 at 11:01 AM David Radley 
wrote:

> Hi Martijn,
> Thank you very much for looking at this. In response to your feedback; I
> produced a reduced version which is on this link.
>
>
> https://docs.google.com/document/d/1J1E-cE-X2H3-kw4rNjLn71OGPQk_Yl1iGX4-eCHWLgE/edit?usp=sharing
>
> The original version you have copied is a bit out-dated and verbose.
> Please could you replace the Flip with content from the above link,
> Kind regards, David,
>
> From: Martijn Visser 
> Date: Wednesday, 1 May 2024 at 16:31
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] Re: FW: RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> Hi David,
>
> I've copied and pasted it into
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-454%3A+New+Apicurio+Avro+format
> ;
> please take a look if it's as expected.
>
> Best regards,
>
> Martijn
>
> On Wed, May 1, 2024 at 3:47 PM David Radley 
> wrote:
>
> > Hi Martijn,
> > Any news?
> >Kind regards, David.
> >
> >
> > From: David Radley 
> > Date: Monday, 22 April 2024 at 09:48
> > To: dev@flink.apache.org 
> > Subject: FW: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> > Hi Martijn,
> > A gentle nudge, is this ok for you or one of the PMC or committers to
> > create a Flip now?
> >Kind regards, David.
> >
> > From: David Radley 
> > Date: Monday, 15 April 2024 at 12:29
> > To: dev@flink.apache.org 
> > Subject: Re: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> > Hi Martijn,
> > Thanks for looking at this. I have used the template in a new  Google Doc
> >
> https://docs.google.com/document/d/1J1E-cE-X2H3-kw4rNjLn71OGPQk_Yl1iGX4-eCHWLgE/edit?usp=sharing
> .
> > I have significantly reduced the content in the Flip, in line with what I
> > see as the template and its usage. If this it too much or too little, I
> can
> > amend,
> >
> > Kind regards, David.
> >
> > From: Martijn Visser 
> > Date: Friday, 12 April 2024 at 18:11
> > To: dev@flink.apache.org 
> > Subject: Re: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> > Hi David,
> >
> > I tried, but the format wasn't as the FLIP template expects, so I ended
> up
> > needing to change the entire formatting and that was just too much work
> to
> > be honest. If you could make sure that especially the headers match with
> > the FLIP template, and that all of the contents from the FLIP template is
> > there, that would make things much easier.
> >
> > Thanks,
> >
> > Martijn
> >
> > On Fri, Apr 12, 2024 at 6:08 PM David Radley 
> > wrote:
> >
> > > Hi,
> > > A gentle nudge. Please could a committer/PMC member raise the Flip for
> > > this,
> > >   Kind regards, David.
> > >
> > >
> > > From: David Radley 
> > > Date: Monday, 8 April 2024 at 09:40
> > > To: dev@flink.apache.org 
> > > Subject: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> > > Hi,
> > > I have posted a Google Doc [0] to the mailing list for a discussion
> > thread
> > > for a Flip proposal to introduce a Apicurio-avro format. The
> discussions
> > > have been resolved, please could a committer/PMC member copy the
> contents
> > > from the Google Doc, and create a FLIP number for this,. as per the
> > process
> > > [1],
> > >   Kind regards, David.
> > > [0]
> > >
> > >
> >
> https://docs.google.com/document/d/14LWZPVFQ7F9mryJPdKXb4l32n7B0iWYkcOdEd1xTC7w/edit?usp=sharing
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals#FlinkImprovementProposals-CreateyourOwnFLIP
> > >
> > > From: Jeyhun Karimov 
> > > Date: Friday, 22 March 2024 at 13:05
> > > To: dev@flink.apache.org 
> > > Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Apicurio-avro format
> > > Hi David,
> > >
> > > Thanks a lot for clarification.
> > > Sounds good to me.
> > >
> > > Regards,
> > > Jeyhun
> > >
> > > On Fri, Mar 22, 2024 at 10:54 AM David Radley  >
> > > wrote:
> > >
> > > > Hi Jeyhun,
> > > > Thanks for your feedback.
> > > >
> > > > So for outbound messages, the message includes the global ID. We
> > register
> > > > the schema and match on the artifact id. So if the schema then
> evolved,
> > > > adding a new  version, the global ID would still be unique and the
> same
> > > > version would be targeted. If you wanted to change the Flink table
> > > > definition in line with a higher version, then you could do this –
> the
> > > > artifact id would need to match for it to use the same schema and a
> > > higher
> > > > artifact version would need to be provided. I notice that Apicurio
> has
> > > > rules around compatibility that you can configure, I suppose if we
> > > attempt
> > > > to create an artifact that breaks these rules , then the register
> > schema
> > > > will fail and the associated operation should fail (e.g. an 

[jira] [Created] (FLINK-35282) Support for Apache Beam > 2.49

2024-05-02 Thread APA (Jira)
APA created FLINK-35282:
---

 Summary: Support for Apache Beam > 2.49
 Key: FLINK-35282
 URL: https://issues.apache.org/jira/browse/FLINK-35282
 Project: Flink
  Issue Type: Improvement
Reporter: APA


>From what I see PyFlink still has the requirement of Apache Beam => 2.43.0 and 
><= 2.49.0 which subsequently results in a requirement of PyArrow <= 12.0.0. 
>That keeps us exposed to [https://nvd.nist.gov/vuln/detail/CVE-2023-47248]

I'm not deep enough familiar with the PyFlink code base to understand why 
Apache Beam's upper dependency limit can't be lifted. From all the existing 
issues I haven't seen one addressing this. Therefore I created one now. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: FW: RE: [DISCUSS] FLIP-XXX Apicurio-avro format

2024-05-02 Thread Martijn Visser
Done :)

On Thu, May 2, 2024 at 11:01 AM David Radley 
wrote:

> Hi Martijn,
> Thank you very much for looking at this. In response to your feedback; I
> produced a reduced version which is on this link.
>
>
> https://docs.google.com/document/d/1J1E-cE-X2H3-kw4rNjLn71OGPQk_Yl1iGX4-eCHWLgE/edit?usp=sharing
>
> The original version you have copied is a bit out-dated and verbose.
> Please could you replace the Flip with content from the above link,
> Kind regards, David,
>
> From: Martijn Visser 
> Date: Wednesday, 1 May 2024 at 16:31
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] Re: FW: RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> Hi David,
>
> I've copied and pasted it into
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-454%3A+New+Apicurio+Avro+format
> ;
> please take a look if it's as expected.
>
> Best regards,
>
> Martijn
>
> On Wed, May 1, 2024 at 3:47 PM David Radley 
> wrote:
>
> > Hi Martijn,
> > Any news?
> >Kind regards, David.
> >
> >
> > From: David Radley 
> > Date: Monday, 22 April 2024 at 09:48
> > To: dev@flink.apache.org 
> > Subject: FW: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> > Hi Martijn,
> > A gentle nudge, is this ok for you or one of the PMC or committers to
> > create a Flip now?
> >Kind regards, David.
> >
> > From: David Radley 
> > Date: Monday, 15 April 2024 at 12:29
> > To: dev@flink.apache.org 
> > Subject: Re: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> > Hi Martijn,
> > Thanks for looking at this. I have used the template in a new  Google Doc
> >
> https://docs.google.com/document/d/1J1E-cE-X2H3-kw4rNjLn71OGPQk_Yl1iGX4-eCHWLgE/edit?usp=sharing
> .
> > I have significantly reduced the content in the Flip, in line with what I
> > see as the template and its usage. If this it too much or too little, I
> can
> > amend,
> >
> > Kind regards, David.
> >
> > From: Martijn Visser 
> > Date: Friday, 12 April 2024 at 18:11
> > To: dev@flink.apache.org 
> > Subject: Re: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> > Hi David,
> >
> > I tried, but the format wasn't as the FLIP template expects, so I ended
> up
> > needing to change the entire formatting and that was just too much work
> to
> > be honest. If you could make sure that especially the headers match with
> > the FLIP template, and that all of the contents from the FLIP template is
> > there, that would make things much easier.
> >
> > Thanks,
> >
> > Martijn
> >
> > On Fri, Apr 12, 2024 at 6:08 PM David Radley 
> > wrote:
> >
> > > Hi,
> > > A gentle nudge. Please could a committer/PMC member raise the Flip for
> > > this,
> > >   Kind regards, David.
> > >
> > >
> > > From: David Radley 
> > > Date: Monday, 8 April 2024 at 09:40
> > > To: dev@flink.apache.org 
> > > Subject: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> > > Hi,
> > > I have posted a Google Doc [0] to the mailing list for a discussion
> > thread
> > > for a Flip proposal to introduce a Apicurio-avro format. The
> discussions
> > > have been resolved, please could a committer/PMC member copy the
> contents
> > > from the Google Doc, and create a FLIP number for this,. as per the
> > process
> > > [1],
> > >   Kind regards, David.
> > > [0]
> > >
> > >
> >
> https://docs.google.com/document/d/14LWZPVFQ7F9mryJPdKXb4l32n7B0iWYkcOdEd1xTC7w/edit?usp=sharing
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals#FlinkImprovementProposals-CreateyourOwnFLIP
> > >
> > > From: Jeyhun Karimov 
> > > Date: Friday, 22 March 2024 at 13:05
> > > To: dev@flink.apache.org 
> > > Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Apicurio-avro format
> > > Hi David,
> > >
> > > Thanks a lot for clarification.
> > > Sounds good to me.
> > >
> > > Regards,
> > > Jeyhun
> > >
> > > On Fri, Mar 22, 2024 at 10:54 AM David Radley  >
> > > wrote:
> > >
> > > > Hi Jeyhun,
> > > > Thanks for your feedback.
> > > >
> > > > So for outbound messages, the message includes the global ID. We
> > register
> > > > the schema and match on the artifact id. So if the schema then
> evolved,
> > > > adding a new  version, the global ID would still be unique and the
> same
> > > > version would be targeted. If you wanted to change the Flink table
> > > > definition in line with a higher version, then you could do this –
> the
> > > > artifact id would need to match for it to use the same schema and a
> > > higher
> > > > artifact version would need to be provided. I notice that Apicurio
> has
> > > > rules around compatibility that you can configure, I suppose if we
> > > attempt
> > > > to create an artifact that breaks these rules , then the register
> > schema
> > > > will fail and the associated operation should fail (e.g. an insert).
> I
> > > have
> > > > not tried this.
> > > >
> > > >
> > > > For inbound messages, using the global id in the header – this
> targets
> > > one
> > > > version of the schema. I 

RE: FW: RE: [DISCUSS] FLIP-XXX Apicurio-avro format

2024-05-02 Thread David Radley
Hi Martijn,
Thank you very much for looking at this. In response to your feedback; I 
produced a reduced version which is on this link.

https://docs.google.com/document/d/1J1E-cE-X2H3-kw4rNjLn71OGPQk_Yl1iGX4-eCHWLgE/edit?usp=sharing

The original version you have copied is a bit out-dated and verbose. Please 
could you replace the Flip with content from the above link,
Kind regards, David,

From: Martijn Visser 
Date: Wednesday, 1 May 2024 at 16:31
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: FW: RE: [DISCUSS] FLIP-XXX Apicurio-avro format
Hi David,

I've copied and pasted it into
https://cwiki.apache.org/confluence/display/FLINK/FLIP-454%3A+New+Apicurio+Avro+format;
please take a look if it's as expected.

Best regards,

Martijn

On Wed, May 1, 2024 at 3:47 PM David Radley  wrote:

> Hi Martijn,
> Any news?
>Kind regards, David.
>
>
> From: David Radley 
> Date: Monday, 22 April 2024 at 09:48
> To: dev@flink.apache.org 
> Subject: FW: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> Hi Martijn,
> A gentle nudge, is this ok for you or one of the PMC or committers to
> create a Flip now?
>Kind regards, David.
>
> From: David Radley 
> Date: Monday, 15 April 2024 at 12:29
> To: dev@flink.apache.org 
> Subject: Re: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> Hi Martijn,
> Thanks for looking at this. I have used the template in a new  Google Doc
> https://docs.google.com/document/d/1J1E-cE-X2H3-kw4rNjLn71OGPQk_Yl1iGX4-eCHWLgE/edit?usp=sharing.
> I have significantly reduced the content in the Flip, in line with what I
> see as the template and its usage. If this it too much or too little, I can
> amend,
>
> Kind regards, David.
>
> From: Martijn Visser 
> Date: Friday, 12 April 2024 at 18:11
> To: dev@flink.apache.org 
> Subject: Re: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> Hi David,
>
> I tried, but the format wasn't as the FLIP template expects, so I ended up
> needing to change the entire formatting and that was just too much work to
> be honest. If you could make sure that especially the headers match with
> the FLIP template, and that all of the contents from the FLIP template is
> there, that would make things much easier.
>
> Thanks,
>
> Martijn
>
> On Fri, Apr 12, 2024 at 6:08 PM David Radley 
> wrote:
>
> > Hi,
> > A gentle nudge. Please could a committer/PMC member raise the Flip for
> > this,
> >   Kind regards, David.
> >
> >
> > From: David Radley 
> > Date: Monday, 8 April 2024 at 09:40
> > To: dev@flink.apache.org 
> > Subject: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
> > Hi,
> > I have posted a Google Doc [0] to the mailing list for a discussion
> thread
> > for a Flip proposal to introduce a Apicurio-avro format. The discussions
> > have been resolved, please could a committer/PMC member copy the contents
> > from the Google Doc, and create a FLIP number for this,. as per the
> process
> > [1],
> >   Kind regards, David.
> > [0]
> >
> >
> https://docs.google.com/document/d/14LWZPVFQ7F9mryJPdKXb4l32n7B0iWYkcOdEd1xTC7w/edit?usp=sharing
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals#FlinkImprovementProposals-CreateyourOwnFLIP
> >
> > From: Jeyhun Karimov 
> > Date: Friday, 22 March 2024 at 13:05
> > To: dev@flink.apache.org 
> > Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Apicurio-avro format
> > Hi David,
> >
> > Thanks a lot for clarification.
> > Sounds good to me.
> >
> > Regards,
> > Jeyhun
> >
> > On Fri, Mar 22, 2024 at 10:54 AM David Radley 
> > wrote:
> >
> > > Hi Jeyhun,
> > > Thanks for your feedback.
> > >
> > > So for outbound messages, the message includes the global ID. We
> register
> > > the schema and match on the artifact id. So if the schema then evolved,
> > > adding a new  version, the global ID would still be unique and the same
> > > version would be targeted. If you wanted to change the Flink table
> > > definition in line with a higher version, then you could do this – the
> > > artifact id would need to match for it to use the same schema and a
> > higher
> > > artifact version would need to be provided. I notice that Apicurio has
> > > rules around compatibility that you can configure, I suppose if we
> > attempt
> > > to create an artifact that breaks these rules , then the register
> schema
> > > will fail and the associated operation should fail (e.g. an insert). I
> > have
> > > not tried this.
> > >
> > >
> > > For inbound messages, using the global id in the header – this targets
> > one
> > > version of the schema. I can create different messages on the topic
> built
> > > with different schema versions, and I can create different tables in
> > Flink,
> > > as long as the reader and writer schemas are compatible as per the
> > >
> >
> 

Re: [DISCUSS] FLIP-444: Native file copy support

2024-05-02 Thread Muhammet Orazov

Hey Piotr,

Thanks for the proposal! It would be great improvement!

Some questions from my side:


In order to configure s5cmd Flink’s user would need
to specify path to the s5cmd binary.


Could you please also add the configuration property
for this? An example showing how users would set this
parameter would be helpful.

Would this affect any filesystem connectors that use
FileSystem[1][2] dependencies?

Best,
Muhammet

[1]: 
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/s3/
[2]: 
https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/filesystem/


On 2024-04-30 13:15, Piotr Nowojski wrote:

Hi all!

I would like to put under discussion:

FLIP-444: Native file copy support
https://cwiki.apache.org/confluence/x/rAn9EQ

This proposal aims to speed up Flink recovery times, by speeding up 
state

download times. However in the future, the same mechanism could be also
used to speed up state uploading (checkpointing/savepointing).

I'm curious to hear your thoughts.

Best,
Piotrek


Re: [DISCUSS] FLIP-453: Promote Unified Sink API V2 to Public and Deprecate SinkFunction

2024-05-02 Thread Muhammet Orazov

Got it, thanks!

On 2024-05-02 06:53, Martijn Visser wrote:

Hi Muhammet,

Thanks for joining the discussion! The changes in this FLIP would be
targeted for Flink 1.19, since it's only a matter of changing the
annotation.

Best regards,

Martijn

On Thu, May 2, 2024 at 7:26 AM Muhammet Orazov 
wrote:


Hello Martijn,

Thanks for the FLIP and detailed history of changes, +1.

Would FLIP changes target for 2.0? I think it would be good
to have clear APIs on 2.0 release.

Best,
Muhammet

On 2024-05-01 15:30, Martijn Visser wrote:
> Hi everyone,
>
> I would like to start a discussion on FLIP-453: Promote Unified Sink
> API V2
> to Public and Deprecate SinkFunction
> https://cwiki.apache.org/confluence/x/rIobEg
>
> This FLIP proposes to promote the Unified Sink API V2 from
> PublicEvolving
> to Public and to mark the SinkFunction as Deprecated.
>
> I'm looking forward to your thoughts.
>
> Best regards,
>
> Martijn



Re: [Discuss] FLIP-452: Allow Skipping Invocation of Function Calls While Constant-folding

2024-05-02 Thread Muhammet Orazov

Hey Alan,

Thanks for the proposal, +1!

The `isDeterministic()`[1] function is mentioned in the documentation,
I would suggest to add maybe a section for `supportsConstantFolding()`,
with short description and examples use cases (similar to the
motivation of the FLIP) where this could be useful in UDFs.

Best,
Muhammet

[1]: 
https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/functions/udfs/#evaluation-methods



On 2024-04-29 22:57, Alan Sheinberg wrote:
I'd like to start a discussion of FLIP-452: Allow Skipping Invocation 
of

Function Calls While Constant-folding [1]

This feature proposes adding a new
method FunctionDefinition.allowConstantFolding() as part of the Flink
Table/SQL API.  This would be used to determine whether an expression
containing this function should have constant-folding logic run on it,
invoking the function at planning time.

The current behavior of always doing constant-folding on function calls 
is

problematic for UDFs which invoke RPCs or have other side effects in
external systems.  In these cases, you either don’t want these actions 
to

occur during planning time, or it may be important to happen on a per
result row basis.

Note that this is a bit different than
FunctionDefinition.isDeterministic(), and can exist along-side it.

Looking forward to your feedback and suggestions.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-452%3A+Allow+Skipping+Invocation+of+Function+Calls+While+Constant-folding


Thanks,
Alan


Re: [DISCUSS] FLIP-453: Promote Unified Sink API V2 to Public and Deprecate SinkFunction

2024-05-02 Thread Martijn Visser
Hi Muhammet,

Thanks for joining the discussion! The changes in this FLIP would be
targeted for Flink 1.19, since it's only a matter of changing the
annotation.

Best regards,

Martijn

On Thu, May 2, 2024 at 7:26 AM Muhammet Orazov 
wrote:

> Hello Martijn,
>
> Thanks for the FLIP and detailed history of changes, +1.
>
> Would FLIP changes target for 2.0? I think it would be good
> to have clear APIs on 2.0 release.
>
> Best,
> Muhammet
>
> On 2024-05-01 15:30, Martijn Visser wrote:
> > Hi everyone,
> >
> > I would like to start a discussion on FLIP-453: Promote Unified Sink
> > API V2
> > to Public and Deprecate SinkFunction
> > https://cwiki.apache.org/confluence/x/rIobEg
> >
> > This FLIP proposes to promote the Unified Sink API V2 from
> > PublicEvolving
> > to Public and to mark the SinkFunction as Deprecated.
> >
> > I'm looking forward to your thoughts.
> >
> > Best regards,
> >
> > Martijn
>


[jira] [Created] (FLINK-35281) FlinkEnvironmentUtils#addJar add each jar only once

2024-05-02 Thread Hongshun Wang (Jira)
Hongshun Wang created FLINK-35281:
-

 Summary: FlinkEnvironmentUtils#addJar add each jar only once
 Key: FLINK-35281
 URL: https://issues.apache.org/jira/browse/FLINK-35281
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Affects Versions: cdc-3.1.0
Reporter: Hongshun Wang
 Fix For: cdc-3.2.0


Current org.apache.flink.cdc.composer.flink.FlinkEnvironmentUtils#addJar will 
be invoked for each source and sink.
{code:java}

public static void addJar(StreamExecutionEnvironment env, URL jarUrl) {
try {
Class envClass = 
StreamExecutionEnvironment.class;
Field field = envClass.getDeclaredField("configuration");
field.setAccessible(true);
Configuration configuration = ((Configuration) field.get(env));
List jars =
configuration.getOptional(PipelineOptions.JARS).orElse(new 
ArrayList<>());
jars.add(jarUrl.toString());
configuration.set(PipelineOptions.JARS, jars);
} catch (Exception e) {
throw new RuntimeException("Failed to add JAR to Flink execution 
environment", e);
} {code}
if multiple source or sink share same jar, the par path will be added repeatly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)