[jira] [Created] (FLINK-35307) Add Compile CI check on jdk17

2024-05-07 Thread Zakelly Lan (Jira)
Zakelly Lan created FLINK-35307:
---

 Summary: Add Compile CI check on jdk17
 Key: FLINK-35307
 URL: https://issues.apache.org/jira/browse/FLINK-35307
 Project: Flink
  Issue Type: Improvement
  Components: Build System / CI
Reporter: Zakelly Lan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35306) Flink cannot compile with jdk17

2024-05-07 Thread Rui Fan (Jira)
Rui Fan created FLINK-35306:
---

 Summary: Flink cannot compile with jdk17
 Key: FLINK-35306
 URL: https://issues.apache.org/jira/browse/FLINK-35306
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.20.0
Reporter: Rui Fan
Assignee: Rui Fan
 Fix For: 1.20.0
 Attachments: image-2024-05-08-11-48-04-161.png

!image-2024-05-08-11-48-04-161.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-453: Promote Unified Sink API V2 to Public and Deprecate SinkFunction

2024-05-07 Thread Hongshun Wang
Hi Martijn, Thanks for the proposal, +1 from me. Some sinks still use
SinkFunction; it's time to take a step forward.

Best,
Hongshun

On Mon, May 6, 2024 at 5:44 PM Leonard Xu  wrote:

> +1 from my side, thanks Martijn for the effort.
>
> Best,
> Leonard
>
> > On May 4, 2024, at 7:41 PM, Ahmed Hamdy wrote:
> >
> > Hi Martijn
> > Thanks for the proposal +1 from me.
> > Should this change take place in 1.20, what are the planned release steps
> > for connectors that only offer a deprecated interface in this case (i.e.
> > RabbitMQ, Cassandra, PubSub, HBase)? Are we going to refrain from
> releases
> > that support 1.20+ till the blockers are implemented?
> > Best Regards
> > Ahmed Hamdy
> >
> >
> > On Fri, 3 May 2024 at 14:32, Péter Váry 
> wrote:
> >
> >>> With regards to FLINK-35149, the fix version indicates a change at
> Flink
> >> CDC; is that indeed correct, or does it require a change in the SinkV2
> >> interface?
> >>
> >> The fix doesn't need a change in SinkV2, so we are good there.
> >> The issue is that the new SinkV2
> SupportsCommitter/SupportsPreWriteTopology
> >> doesn't work with the CDC yet.
> >>
> >> On Fri, May 3, 2024, at 14:06, Martijn Visser wrote:
> >>
> >>> Hi Ferenc,
> >>>
> >>> You're right, 1.20 it is :)
> >>>
> >>> I've assigned the HBase one to you!
> >>>
> >>> Thanks,
> >>>
> >>> Martijn
> >>>
> >>> On Fri, May 3, 2024 at 1:55 PM Ferenc Csaky  >
> >>> wrote:
> >>>
>  Hi Martijn,
> 
>  +1 for the proposal.
> 
> > targeted for Flink 1.19
> 
>  I guess you meant Flink 1.20 here.
> 
>  Also, I volunteer to take updating the HBase sink, feel free to assign
>  that task to me.
> 
>  Best,
>  Ferenc
> 
> 
> 
> 
>  On Friday, May 3rd, 2024 at 10:20, Martijn Visser <
>  martijnvis...@apache.org> wrote:
> 
> >
> >
> > Hi Peter,
> >
> > I'll add it for completeness, thanks!
> > With regards to FLINK-35149, the fix version indicates a change at
> >>> Flink
> > CDC; is that indeed correct, or does it require a change in the
> >> SinkV2
> > interface?
> >
> > Best regards,
> >
> > Martijn
> >
> >
> > On Fri, May 3, 2024 at 7:47 AM Péter Váry
> >> peter.vary.apa...@gmail.com
> >
> > wrote:
> >
> >> Hi Martijn,
> >>
> >> We might want to add FLIP-371 [1] to the list. (Or we aim only for
>  higher
> >> level FLIPs?)
> >>
> >> We are in the process of using the new API in Iceberg connector
> >> [2] -
>  so
> >> far, so good.
> >>
> >> I know of one minor known issue about the sink [3], which should be
>  ready
> >> for the release.
> >>
> >> All-in-all, I think we are in good shape, and we could move forward
>  with
> >> the promotion.
> >>
> >> Thanks,
> >> Peter
> >>
> >> [1] -
> >>
> >>
> 
> >>>
> >>
> https://cwiki.apache.org/confluence/plugins/servlet/mobile?contentId=263430387
> >> [2] - https://github.com/apache/iceberg/pull/10179
> >> [3] - https://issues.apache.org/jira/browse/FLINK-35149
> >>
> >> On Thu, May 2, 2024, 09:47 Muhammet Orazov
>  mor+fl...@morazow.com.invalid
> >> wrote:
> >>
> >>> Got it, thanks!
> >>>
> >>> On 2024-05-02 06:53, Martijn Visser wrote:
> >>>
>  Hi Muhammet,
> 
>  Thanks for joining the discussion! The changes in this FLIP
> >> would
>  be
>  targeted for Flink 1.19, since it's only a matter of changing
> >> the
>  annotation.
> 
>  Best regards,
> 
>  Martijn
> 
>  On Thu, May 2, 2024 at 7:26 AM Muhammet Orazov
>  mor+fl...@morazow.com
>  wrote:
> 
> > Hello Martijn,
> >
> > Thanks for the FLIP and detailed history of changes, +1.
> >
> > Would FLIP changes target for 2.0? I think it would be good
> > to have clear APIs on 2.0 release.
> >
> > Best,
> > Muhammet
> >
> > On 2024-05-01 15:30, Martijn Visser wrote:
> >
> >> Hi everyone,
> >>
> >> I would like to start a discussion on FLIP-453: Promote
>  Unified Sink
> >> API V2
> >> to Public and Deprecate SinkFunction
> >> https://cwiki.apache.org/confluence/x/rIobEg
> >>
> >> This FLIP proposes to promote the Unified Sink API V2 from
> >> PublicEvolving
> >> to Public and to mark the SinkFunction as Deprecated.
> >>
> >> I'm looking forward to your thoughts.
> >>
> >> Best regards,
> >>
> >> Martijn
> 
> >>>
> >>
>
>


[jira] [Created] (FLINK-35305) FLIP-438: Amazon SQS Sink Connector

2024-05-07 Thread Priya Dhingra (Jira)
Priya Dhingra created FLINK-35305:
-

 Summary: FLIP-438: Amazon SQS Sink Connector
 Key: FLINK-35305
 URL: https://issues.apache.org/jira/browse/FLINK-35305
 Project: Flink
  Issue Type: New Feature
  Components: Connectors / AWS
Reporter: Priya Dhingra


This is an umbrella task for FLIP-438. FLIP-438: 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-438%3A+Amazon+SQS+Sink+Connector



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[VOTE] FLIP-452: Allow Skipping Invocation of Function Calls While Constant-folding

2024-05-07 Thread Alan Sheinberg
Hi everyone,

I'd like to start a vote on FLIP-452 [1]. It covers adding a new method
FunctionDefinition.supportsConstantFolding() as part of the Flink Table/SQL
API to allow skipping invocation of functions while constant-folding. It
has been discussed in this thread [2].

The vote will be open for at least 72 hours
unless there is an objection or insufficient votes.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-452%3A+Allow+Skipping+Invocation+of+Function+Calls+While+Constant-folding

[2] https://lists.apache.org/thread/ko5ndv5kr87nm011psll2hzzd0nn3ztz

Thanks,
Alan
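To make the proposal concrete, here is a minimal, self-contained sketch of the idea behind `supportsConstantFolding()` (the interface and planner logic below are my own illustration, not Flink's actual `FunctionDefinition` API: a planner would pre-evaluate a call with constant arguments only when the definition allows it):

```java
import java.util.function.Supplier;

public class ConstantFoldingSketch {
    // Illustrative stand-in for the FLIP's FunctionDefinition method;
    // the real Flink interface is richer. Default: folding is allowed.
    interface FunctionDefinition {
        default boolean supportsConstantFolding() { return true; }
    }

    // A function with side effects (e.g. external calls) opts out of folding.
    static class SideEffectingFunction implements FunctionDefinition {
        @Override
        public boolean supportsConstantFolding() { return false; }
    }

    // Planner-side check: pre-evaluate the call at plan time only if allowed.
    // Returning null means "keep the call in the plan, evaluate at runtime".
    static Object tryFold(FunctionDefinition def, Supplier<Object> constantCall) {
        return def.supportsConstantFolding() ? constantCall.get() : null;
    }

    public static void main(String[] args) {
        System.out.println(tryFold(new FunctionDefinition() {}, () -> 1 + 2)); // 3
        System.out.println(tryFold(new SideEffectingFunction(), () -> 1 + 2)); // null
    }
}
```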


Re: Issue in PrefetchCount

2024-05-07 Thread Talat Uyarer
Hi Ajay,

With a parallelism of 3 you will have 3 independent clients. If you want to
keep the prefetch count at 3, set setRequestedChannelMax to 1 and
setParallelism to 3, so all 3 clients can share one connection.

Talat

On Tue, May 7, 2024 at 5:52 AM ajay pandey  wrote:

> Hi Flink Team,
>
>
> I am currently reading streaming data from RabbitMQ and using the
> RMQConnectionConfig for establishing the connection. Here's how I'm setting
> up the connection:
> and we use flink version 1.16.2 and RabbitMQ version 3.10.7
>
>  RMQConnectionConfig connectionConfig = new RMQConnectionConfig.Builder()
> .setPrefetchCount(smsInput.prefetchCount)
> .setHost(smsInput.HostServer)
> .setPort(smsInput.HostPort)
> .setUserName(smsInput.HostUserName)
> .setPassword(smsInput.HostPassword)
> .setVirtualHost("/")
> .build();
>
> ConnectionFactory rabbitMQConnectionFactory =
> connectionConfig.getConnectionFactory();
> rabbitMQConnectionFactory.setRequestedChannelMax(smsInput.prefetchCount);
> // Set prefetchcount
>
> DataStream stream = executionEnvironment.addSource(new
> RMQSource(connectionConfig,
>  smsInput.QueueName,
>  new SimpleStringSchema()))
>  .setParallelism(1);
>
>
> Additionally, I have configured the prefetch count to read 3 messages at a
> time from RabbitMQ. Here's how I have enabled the checkpointing
> interval.
>
>
> executionEnvironment.enableCheckpointing(smsInput.checkpointIntervalMS,CheckpointingMode.EXACTLY_ONCE,true);
>
> The prefetch count seems to be working fine, but when I run the job with a
> parallelism of 3, the prefetchCount is not working as expected.
>
> We establish a connection to RabbitMQ with a fixed setParallelism of 1.
> However, my other operators retrieve data from RabbitMQ and execute the job
> with a parallelism of 3, as shown in the following command.
>
> bin/flink run -p 3 ../apps/Flink_1.16.2_prefetch.jar
> ../config/app-config.properties -yD
> env.java.home=/usr/lib/jvm/java-11-openjdk-11.0.19.0.7-1.el7_9.x86_64
>
> So kindly provide a solution for configuring the prefetch count with
> parallelism.
>
>
>
> Thanks,
> Ajay Pandey
>
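For what it's worth, Talat's suggestion can be sketched against the configuration already posted in this thread (an untested fragment: `RMQConnectionConfig` and `RMQSource` are the flink-connector-rabbitmq classes used above, and `host`/`port`/etc. stand in for the application's settings; whether the three source subtasks actually end up sharing one connection depends on the client and deployment):

```java
// Untested sketch based on the snippet in this thread: keep the per-channel
// prefetch at 3, cap channels per connection at 1, and run the source with
// parallelism 3.
RMQConnectionConfig connectionConfig = new RMQConnectionConfig.Builder()
        .setPrefetchCount(3)          // prefetch per channel
        .setHost(host)
        .setPort(port)
        .setUserName(user)
        .setPassword(password)
        .setVirtualHost("/")
        .build();

connectionConfig.getConnectionFactory()
        .setRequestedChannelMax(1);   // at most one channel per connection

env.addSource(new RMQSource<String>(connectionConfig, queueName, new SimpleStringSchema()))
        .setParallelism(3);           // three independent RabbitMQ clients
```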


Re: [Vote] FLIP-438: Amazon SQS Sink Connector

2024-05-07 Thread Dhingra, Priya
Hi All,

I'm happy to announce that FLIP-438: Amazon SQS Sink Connector[1] has
been accepted with 9 approving votes (4 binding).

I would like to thank everyone who took part in the discussion and/or voted.

- Ahmed Hamdy (non-binding)
- Samrat(non-binding)
- Danny (binding)
- Aleksandr (non-binding)
- Muhammet (non-binding)
- Hong (binding)
- Jeyhun (non-binding)
- Robert Metzger(binding)
- Leonard Xu (binding)

Regards,
Priya

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-438%3A+Amazon+SQS+Sink+Connector


On 5/7/24, 1:11 AM, "Hong Liang" <h...@apache.org> wrote:


Hi Priya,


Great to see the FLIP has passed the vote.
It would be good to report the final result of the vote at the end of the
thread, listing the binding / non-binding votes in an email. See example
here [1] [2]


Regards,
Hong


[1] https://lists.apache.org/thread/3sj88kk0104vzj4hklfgbn3rpdnjxj8v 

[2]
https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals 



On Mon, May 6, 2024 at 1:13 AM Dhingra, Priya <dhipr...@amazon.com.invalid>
wrote:


> Thank you all!
>
>
> Closing the vote. Will update the Flip with Jira link for tracking
> implementation.
>
>





Re: [DISCUSSION] FLIP-456: CompiledPlan support for Batch Execution Mode

2024-05-07 Thread Alexey Leonov-Vendrovskiy
Hey Paul,

Yes, no interchangeability. Just a wire-through for more uniformity.

Thanks,
Alexey

On Tue, May 7, 2024 at 2:10 AM Paul Lam  wrote:

> Hi Alexey,
>
> Thanks a lot for bringing up the discussion. I’m big +1 for the FLIP.
>
> I suppose the goal doesn’t involve the interchangeability of json plans
> between batch mode and streaming mode, right?
> In other words, a json plan compiled in a batch program can’t be run in
> streaming mode without a migration (which is not yet supported).
>
> Best,
> Paul Lam
>
> > On May 7, 2024, at 14:38, Alexey Leonov-Vendrovskiy wrote:
> >
> > Hi everyone,
> >
> > PTAL at the proposed FLIP-456: CompiledPlan support for Batch Execution
> > Mode. It is pretty self-describing.
> >
> > Any thoughts are welcome!
> >
> > Thanks,
> > Alexey
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-456%3A+CompiledPlan+support+for+Batch+Execution+Mode
> > .
>
>


[RabbitMQ Connector]: missing support for per consumer QoS (needed for quorum queues)

2024-05-07 Thread charlie vuillemez
Hello devs,
It seems the RabbitMQ connector supports only global QoS (prefetch), which is
not compatible with RabbitMQ quorum queues. Cf.
https://www.rabbitmq.com/docs/quorum-queues#feature-matrix
Quorum queues are the only option for HA, because mirrored classic queues are
deprecated since RabbitMQ v3.9. Quorum queues support only per-consumer
prefetch: https://www.rabbitmq.com/docs/consumer-prefetch
Trying to consume a quorum queue with per-channel (global) QoS produces the
following error:
 basic.consume caused a connection exception not_implemented: "queue 'xx'
in vhost 'x' does not support global qos"
Thanks,
C Vuillemez
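For context, in the plain RabbitMQ Java client the distinction is the `global` flag on `Channel.basicQos` (a fragment for illustration, with `channel` being an open `com.rabbitmq.client.Channel`; fixing this in the connector would mean exposing that flag):

```java
// Per-consumer prefetch: global = false. This is the mode quorum queues
// support (see https://www.rabbitmq.com/docs/consumer-prefetch).
channel.basicQos(100, false);

// Per-channel "global" prefetch: global = true. Quorum queues reject this
// with the not_implemented error quoted above.
// channel.basicQos(100, true);
```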



Issue in PrefetchCount

2024-05-07 Thread ajay pandey
Hi Flink Team,


I am currently reading streaming data from RabbitMQ and using the
RMQConnectionConfig for establishing the connection. Here's how I'm setting
up the connection:
and we use flink version 1.16.2 and RabbitMQ version 3.10.7

 RMQConnectionConfig connectionConfig = new RMQConnectionConfig.Builder()
.setPrefetchCount(smsInput.prefetchCount)
.setHost(smsInput.HostServer)
.setPort(smsInput.HostPort)
.setUserName(smsInput.HostUserName)
.setPassword(smsInput.HostPassword)
.setVirtualHost("/")
.build();

ConnectionFactory rabbitMQConnectionFactory =
connectionConfig.getConnectionFactory();
rabbitMQConnectionFactory.setRequestedChannelMax(smsInput.prefetchCount);
// Set prefetchcount

DataStream stream = executionEnvironment.addSource(new
RMQSource(connectionConfig,
 smsInput.QueueName,
 new SimpleStringSchema()))
 .setParallelism(1);


Additionally, I have configured the prefetch count to read 3 messages at a
time from RabbitMQ. Here's how I have enabled the checkpointing
interval.

executionEnvironment.enableCheckpointing(smsInput.checkpointIntervalMS,CheckpointingMode.EXACTLY_ONCE,true);

The prefetch count seems to be working fine, but when I run the job with a
parallelism of 3, the prefetchCount is not working as expected.

We establish a connection to RabbitMQ with a fixed setParallelism of 1.
However, my other operators retrieve data from RabbitMQ and execute the job
with a parallelism of 3, as shown in the following command.

bin/flink run -p 3 ../apps/Flink_1.16.2_prefetch.jar
../config/app-config.properties -yD
env.java.home=/usr/lib/jvm/java-11-openjdk-11.0.19.0.7-1.el7_9.x86_64

So kindly provide a solution for configuring the prefetch count with
parallelism.



Thanks,
Ajay Pandey


Re: [DISCUSS] FLIP-448: Introduce Pluggable Workflow Scheduler Interface for Materialized Table

2024-05-07 Thread Ron Liu
Hi, dev

Following the recent PoC[1], and drawing on the excellent code design
within Flink, I have made the following optimizations to the Public
Interfaces section of the FLIP:

1. I have renamed WorkflowOperation to RefreshWorkflow. This change better
conveys its purpose. RefreshWorkflow is used to provide the necessary
information required for creating, modifying, and deleting workflows. Using
WorkflowOperation could mislead people into thinking it is a command
operation, whereas in fact, it does not represent an operation but merely
provides the essential context information for performing operations on
workflows. The specific operations are completed within WorkflowScheduler.
Additionally, I felt that using WorkflowOperation could potentially
conflict with the Operation[2] interface in the table module.
2. I have refined the signatures of the modifyRefreshWorkflow and
deleteRefreshWorkflow interface methods in WorkflowScheduler. The parameter
T refreshHandler is now provided by ModifyRefreshWorkflow and
deleteRefreshWorkflow, which makes the overall interface design more
symmetrical and clean.

[1] https://github.com/lsyldliu/flink/tree/FLIP-448-PoC
[2]
https://github.com/apache/flink/blob/29736b8c01924b7da03d4bcbfd9c812a8e5a08b4/flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/operations/Operation.java

Best,
Ron
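A rough shape of the renamed interfaces, as I understand the description above (hedged: FLIP-448 is still under discussion, and the generics and method signatures here are my own illustration, not the FLIP's actual definitions):

```java
public class RefreshWorkflowSketch {
    // Carries the context needed for an operation on a refresh workflow;
    // it is data, not a command.
    interface RefreshWorkflow<T> {}

    // Modify/delete variants supply the refresh handler themselves, which is
    // what makes the WorkflowScheduler method signatures symmetric.
    interface ModifyRefreshWorkflow<T> extends RefreshWorkflow<T> {
        T getRefreshHandler();
    }

    interface DeleteRefreshWorkflow<T> extends RefreshWorkflow<T> {
        T getRefreshHandler();
    }

    // The scheduler performs the actual create/modify/delete operations.
    interface WorkflowScheduler<T> {
        void modifyRefreshWorkflow(ModifyRefreshWorkflow<T> workflow);
        void deleteRefreshWorkflow(DeleteRefreshWorkflow<T> workflow);
    }

    public static void main(String[] args) {
        // A hypothetical string-typed refresh handler, e.g. a workflow id.
        ModifyRefreshWorkflow<String> modify = () -> "quartz-job-42";
        System.out.println(modify.getRefreshHandler()); // quartz-job-42
    }
}
```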

On Tue, May 7, 2024 at 2:30 PM, Ron Liu wrote:

> > 4. It appears that in the section on `public interfaces`, within
> `WorkflowOperation`, `CreatePeriodicWorkflowOperation` should be changed to
>
> `CreateWorkflowOperation`, right?
>
> After discussing with Xuyang offline, we need to support periodic workflow
> and one-time workflow, they need different information, for example,
> periodic workflow needs cron expression, one-time workflow needs refresh
> partition, downstream cascade materialized table, etc. Therefore,
> CreateWorkflowOperation correspondingly will have two different
> implementation classes, which will be cleaner for both the implementer and
> the caller.
>
> Best,
> Ron
>
> On Mon, May 6, 2024 at 8:48 PM, Ron Liu wrote:
>
>> Hi, Xuyang
>>
>> Thanks for joining this discussion
>>
>> > 1. In the sequence diagram, it appears that there is a missing step for
>> obtaining the refresh handler from the catalog during the suspend operation.
>>
>> Good catch
>>
>> > 2. The term "cascade refresh" does not seem to be mentioned in
>> FLIP-435. The workflow it creates is marked as a "one-time workflow". This
>> is different
>>
>> from a "periodic workflow," and it appears to be a one-off execution. Is
>> this actually referring to the Refresh command in FLIP-435?
>>
>> The cascade refresh is a future work, we don't propose the corresponding
>> syntax in FLIP-435. However, intuitively, it would be an extension of the
>> Refresh command in FLIP-435.
>>
>> > 3. The workflow-scheduler.type has no default value; should it be set
>> to CRON by default?
>>
>> Firstly, CRON is not a workflow scheduler. Secondly, I believe that
>> configuring the Scheduler should be an action that users are aware of, and
>> default values should not be set.
>>
>> > 4. It appears that in the section on `public interfaces`, within
>> `WorkflowOperation`, `CreatePeriodicWorkflowOperation` should be changed to
>>
>> `CreateWorkflowOperation`, right?
>>
>> Sorry, I don't get your point. Can you give more description?
>>
>> Best,
>> Ron
>>
>> On Mon, May 6, 2024 at 8:26 PM, Xuyang wrote:
>>
>>> Hi, Ron.
>>>
>>> Thanks for driving this. After reading the entire flip, I have the
>>> following questions:
>>>
>>>
>>>
>>>
>>> 1. In the sequence diagram, it appears that there is a missing step for
>>> obtaining the refresh handler from the catalog during the suspend operation.
>>>
>>>
>>>
>>>
>>> 2. The term "cascade refresh" does not seem to be mentioned in FLIP-435.
>>> The workflow it creates is marked as a "one-time workflow". This is
>>> different
>>>
>>> from a "periodic workflow," and it appears to be a one-off execution. Is
>>> this actually referring to the Refresh command in FLIP-435?
>>>
>>>
>>>
>>>
>>> 3. The workflow-scheduler.type has no default value; should it be set to
>>> CRON by default?
>>>
>>>
>>>
>>>
>>> 4. It appears that in the section on `public interfaces`, within
>>> `WorkflowOperation`, `CreatePeriodicWorkflowOperation` should be changed to
>>>
>>> `CreateWorkflowOperation`, right?
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Best!
>>> Xuyang
>>>
>>>
>>>
>>>
>>>
>>> At 2024-04-22 14:41:39, "Ron Liu"  wrote:
>>> >Hi, Dev
>>> >
>>> >I would like to start a discussion about FLIP-448: Introduce Pluggable
>>> >Workflow Scheduler Interface for Materialized Table.
>>> >
>>> >In FLIP-435[1], we proposed Materialized Table, which has two types of
>>> data
>>> >refresh modes: Full Refresh & Continuous Refresh Mode. In Full Refresh
>>> >mode, the Materialized Table relies on a workflow scheduler to perform
>>> >periodic refresh operation to achieve the desired data freshness.
>>> >
>>> >There are numerous open-source workflow schedulers 

Re: [DISCUSSION] FLIP-449: Reorganization of flink-connector-jdbc

2024-05-07 Thread lorenzo . affetti
Thanks João for your replies!

I also saw the latest PR that allows properties to be specified.

Thanks for adding the pain points as well; that clarifies a lot.
On May 7, 2024 at 09:50 +0200, Muhammet Orazov , 
wrote:
> Thanks João for pointing it out. I didn't know about the PR, I am going
> to check it.
>
> Best,
> Muhammet
>
>
> On 2024-05-06 14:45, João Boto wrote:
> > Hi Muhammet,
> >
> > Have you had a chance to review the recently merged pull request [1]?
> > We've introduced a new feature allowing users to include ad hoc
> > configurations in the 'JdbcConnectionOptions' class.
> > ```
> > new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
> > .withUrl(FakeDBUtils.TEST_DB_URL)
> > .withProperty("keyA", "valueA")
> > .build();
> > ```
> >
> > This provides flexibility by enabling users to specify additional
> > configuration parameters dynamically.
> >
> > [1] https://github.com/apache/flink-connector-jdbc/pull/115/files
> >
> > Best
> >
> > On 2024/05/06 07:34:06 Muhammet Orazov wrote:
> > > > Morning João,
> > > >
> > > > Recently we had a case where the JDBC driver's authentication was
> > > > different from username authentication. For it to work, certain
> > > > hacks were required; an interface would have been helpful.
> > > >
> > > > But I agree maybe the interface module separation is not required at
> > > > the
> > > > moment.
> > > >
> > > > Thanks for your efforts!
> > > >
> > > > Best,
> > > > Muhammet
> > > >
> > > >
> > > > On 2024-05-03 12:25, João Boto wrote:
> > > > > > Hi Muhammet,
> > > > > >
> > > > > > While I generally agree, given our current usage, I'm struggling to
> > > > > > discern any clear advantage. We already have abstract 
> > > > > > implementations
> > > > > > that cover all necessary interfaces and offer essential 
> > > > > > functionality,
> > > > > > complemented by a robust set of reusable tests to streamline
> > > > > > implementation.
> > > > > >
> > > > > > With this established infrastructure in place, coupled with the 
> > > > > > added
> > > > > > import overhead of introducing another module, I find it difficult 
> > > > > > to
> > > > > > identify any distinct benefits at this point.
> > > > > >
> > > > > > Best
> > > > > >
> > > > > > On 2024/04/26 02:18:52 Muhammet Orazov wrote:
> > > > > > >> Hey João,
> > > > > > >>
> > > > > > >> Thanks for FLIP proposal!
> > > > > > >>
> > > > > > >> Since proposal is to introduce modules, would it make sense
> > > > > > >> to have another module for APIs (flink-jdbc-connector-api)?
> > > > > > >>
> > > > > > >> For this I would suggest to move all public interfaces (e.g,
> > > > > > >> JdbcRowConverter, JdbcConnectionProvider). And even convert
> > > > > > >> some classes into interface with their default implementations,
> > > > > > >> for example, JdbcSink, JdbcConnectionOptions.
> > > > > > >>
> > > > > > >> This way users would have clear interfaces to build their own
> > > > > > >> JDBC based Flink connectors.
> > > > > > >>
> > > > > > >> Here I am not suggesting to introduce new interfaces, only
> > > > > > >> suggest also to separate the API from the core implementation.
> > > > > > >>
> > > > > > >> What do you think?
> > > > > > >>
> > > > > > >> Best,
> > > > > > >> Muhammet
> > > > > > >>
> > > > > > >>
> > > > > > >> On 2024-04-25 08:54, Joao Boto wrote:
> > > > > > > >> > Hi all,
> > > > > > > >> >
> > > > > > > >> > I'd like to start a discussion on FLIP-449: Reorganization of
> > > > > > > >> > flink-connector-jdbc [1].
> > > > > > > >> > As Flink continues to evolve, we've noticed an increasing 
> > > > > > > >> > level of
> > > > > > > >> > complexity within the JDBC connector.
> > > > > > > >> > The proposed solution is to address this complexity by 
> > > > > > > >> > separating the
> > > > > > > >> > core
> > > > > > > >> > functionality from individual database components, thereby 
> > > > > > > >> > streamlining
> > > > > > > >> > the
> > > > > > > >> > structure into distinct modules.
> > > > > > > >> >
> > > > > > > >> > Looking forward to your feedback and suggestions, thanks.
> > > > > > > >> > Best regards,
> > > > > > > >> > Joao Boto
> > > > > > > >> >
> > > > > > > >> > [1]
> > > > > > > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-449%3A+Reorganization+of+flink-connector-jdbc
> > > > > > >>
> > > >


[jira] [Created] (FLINK-35304) Mongo ITCase fails due to duplicate records after resuming

2024-05-07 Thread yux (Jira)
yux created FLINK-35304:
---

 Summary: Mongo ITCase fails due to duplicate records after resuming
 Key: FLINK-35304
 URL: https://issues.apache.org/jira/browse/FLINK-35304
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: yux


Test case testRemoveAndAddCollectionsOneByOne keeps failing since downstream 
receives duplicate data rows after MongoDB token resume.

2024-05-07T08:57:16.4720998Z [ERROR] Tests run: 20, Failures: 1, Errors: 0, 
Skipped: 0, Time elapsed: 498.203 s <<< FAILURE! - in 
org.apache.flink.cdc.connectors.mongodb.source.NewlyAddedTableITCase
2024-05-07T08:57:16.4723517Z [ERROR] 
org.apache.flink.cdc.connectors.mongodb.source.NewlyAddedTableITCase.testRemoveAndAddCollectionsOneByOne
  Time elapsed: 38.114 s  <<< FAILURE!
2024-05-07T08:57:16.4725419Z java.lang.AssertionError: expected:<33> but 
was:<34>
2024-05-07T08:57:16.4726168Z     at org.junit.Assert.fail(Assert.java:89)
2024-05-07T08:57:16.4726828Z     at 
org.junit.Assert.failNotEquals(Assert.java:835)
2024-05-07T08:57:16.4727540Z     at 
org.junit.Assert.assertEquals(Assert.java:647)
2024-05-07T08:57:16.4728301Z     at 
org.junit.Assert.assertEquals(Assert.java:633)
2024-05-07T08:57:16.4729698Z     at 
org.apache.flink.cdc.connectors.mongodb.utils.MongoDBAssertUtils.assertEqualsInOrder(MongoDBAssertUtils.java:118)
2024-05-07T08:57:16.4731863Z     at 
org.apache.flink.cdc.connectors.mongodb.utils.MongoDBAssertUtils.assertEqualsInAnyOrder(MongoDBAssertUtils.java:111)
2024-05-07T08:57:16.4734296Z     at 
org.apache.flink.cdc.connectors.mongodb.source.NewlyAddedTableITCase.testRemoveAndAddCollectionsOneByOne(NewlyAddedTableITCase.java:501)
2024-05-07T08:57:16.4736882Z     at 
org.apache.flink.cdc.connectors.mongodb.source.NewlyAddedTableITCase.testRemoveAndAddCollectionsOneByOne(NewlyAddedTableITCase.java:330)
2024-05-07T08:57:16.4738847Z     at 
java.lang.reflect.Method.invoke(Method.java:498)
2024-05-07T08:57:16.4739923Z     at 
java.util.concurrent.FutureTask.run(FutureTask.java:266)
2024-05-07T08:57:16.4740790Z     at java.lang.Thread.run(Thread.java:750)
2024-05-07T08:57:16.4741257Z



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSSION] FLIP-456: CompiledPlan support for Batch Execution Mode

2024-05-07 Thread Paul Lam
Hi Alexey,

Thanks a lot for bringing up the discussion. I’m big +1 for the FLIP.

I suppose the goal doesn’t involve the interchangeability of json plans between 
batch mode and streaming mode, right?
In other words, a json plan compiled in a batch program can’t be run in 
streaming mode without a migration (which is not yet supported).

Best,
Paul Lam

> On May 7, 2024, at 14:38, Alexey Leonov-Vendrovskiy wrote:
> 
> Hi everyone,
> 
> PTAL at the proposed FLIP-456: CompiledPlan support for Batch Execution
> Mode. It is pretty self-describing.
> 
> Any thoughts are welcome!
> 
> Thanks,
> Alexey
> 
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-456%3A+CompiledPlan+support+for+Batch+Execution+Mode
> .



[SUMMARY] Flink 1.20 Release Sync 05/07/2024

2024-05-07 Thread Rui Fan
Dear devs,

Today is the third meeting for Flink 1.20 release cycle.
I'd like to share the information synced in the meeting.

- Feature Freeze

  As we discussed before, the feature freeze is
  June 15, 2024, 00:00 CEST (UTC+2). It is worth noting that
  there are only 6 weeks left, so developers need to keep
  this deadline in mind.

  Also, if there are no other changes, 1.20 will be the last version
  before 2.0. Relevant developers also need to check whether
  the Public or Public Evolving API related to Flink 2.0 deprecation
  has been completed.

- Features:
  So far we've had 8 FLIPs/features; two FLIPs may not be
  completed in 1.20, and the status of the other six FLIPs is
  good. Contributors are encouraged to keep the 1.20 wiki
  page[1] continuously updated.

- Blockers:
  - [Closed] FLINK-34997 PyFlink YARN per-job on Docker test failed on azure
  - [Doing] FLINK-35041
IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration failed
- It fails frequently, Rui has pinged @feifan, he will check it
this week with high priority
  - [Doing] FLINK-35040 The performance of serializerHeavyString regresses
since April 3
- Rui needs to generate a flamegraph from the benchmark server.
  - [Doing] FLINK-35215 The performance of serializerKryo and
serializerKryoWithoutRegistration are regressed
- Rui provided an alternative change without the performance regression; we
need someone who is familiar with Kryo serialization to review it

- Sync meeting[2]:
 The next meeting is 05/21/2024 10am (UTC+2) and 4pm (UTC+8), please feel
free to join us.

Lastly, we encourage attendees to fill out the topics to be discussed at
the bottom of 1.20 wiki page[1] a day in advance, to make it easier for
everyone to understand the background of the topics, thanks!

[1] https://cwiki.apache.org/confluence/display/FLINK/1.20+Release
[2] https://meet.google.com/mtj-huez-apu

Best,

Weijie, Ufuk, Robert and Rui


[jira] [Created] (FLINK-35303) Support logical deletion of datax

2024-05-07 Thread melin (Jira)
melin created FLINK-35303:
-

 Summary:  Support logical deletion of datax
 Key: FLINK-35303
 URL: https://issues.apache.org/jira/browse/FLINK-35303
 Project: Flink
  Issue Type: New Feature
  Components: Flink CDC
Reporter: melin


Treat the delete event as a logical deletion: add a field to the table, for
example is_delete, defaulting to false; if the event is a delete event,
is_delete is set to true.
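A small, self-contained sketch of the proposed behavior (the `Op`/row types below are illustrative, not Flink CDC classes): instead of emitting a physical delete, a delete event is rewritten as an update that sets an `is_delete` flag.

```java
import java.util.HashMap;
import java.util.Map;

public class LogicalDeleteSketch {
    enum Op { INSERT, UPDATE, DELETE }

    // Returns the row to write downstream, with an is_delete column added:
    // false by default, true when the incoming event is a delete.
    static Map<String, Object> toLogicalRow(Op op, Map<String, Object> row) {
        Map<String, Object> out = new HashMap<>(row);
        out.put("is_delete", op == Op.DELETE);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("id", 1);
        System.out.println(toLogicalRow(Op.DELETE, row).get("is_delete")); // true
    }
}
```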



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [Vote] FLIP-438: Amazon SQS Sink Connector

2024-05-07 Thread Hong Liang
Hi Priya,

Great to see the FLIP has passed the vote.
It would be good to report the final result of the vote at the end of the
thread, listing the binding / non-binding votes in an email. See example
here [1] [2]

Regards,
Hong

[1] https://lists.apache.org/thread/3sj88kk0104vzj4hklfgbn3rpdnjxj8v
[2]
https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals

On Mon, May 6, 2024 at 1:13 AM Dhingra, Priya 
wrote:

> Thank you all!
>
>
> Closing the vote. Will update the Flip with Jira link for tracking
> implementation.
>
>


Re: [DISCUSSION] FLIP-449: Reorganization of flink-connector-jdbc

2024-05-07 Thread Muhammet Orazov
Thanks João for pointing it out. I didn't know about the PR, I am going 
to check it.


Best,
Muhammet


On 2024-05-06 14:45, João Boto wrote:

Hi Muhammet,

Have you had a chance to review the recently merged pull request [1]?
We've introduced a new feature allowing users to include ad hoc 
configurations in the 'JdbcConnectionOptions' class.

```
 new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
.withUrl(FakeDBUtils.TEST_DB_URL)
.withProperty("keyA", "valueA")
.build();
```

This provides flexibility by enabling users to specify additional 
configuration parameters dynamically.


[1] https://github.com/apache/flink-connector-jdbc/pull/115/files

Best

On 2024/05/06 07:34:06 Muhammet Orazov wrote:

Morning João,

Recently we had a case where the JDBC driver's authentication was
different from username authentication. For it to work, certain
hacks were required; an interface would have been helpful.

But I agree maybe the interface module separation is not required at
the moment.

Thanks for your efforts!

Best,
Muhammet


On 2024-05-03 12:25, João Boto wrote:
> Hi Muhammet,
>
> While I generally agree, given our current usage, I'm struggling to
> discern any clear advantage. We already have abstract implementations
> that cover all necessary interfaces and offer essential functionality,
> complemented by a robust set of reusable tests to streamline
> implementation.
>
> With this established infrastructure in place, coupled with the added
> import overhead of introducing another module, I find it difficult to
> identify any distinct benefits at this point.
>
> Best
>
> On 2024/04/26 02:18:52 Muhammet Orazov wrote:
>> Hey João,
>>
>> Thanks for FLIP proposal!
>>
>> Since proposal is to introduce modules, would it make sense
>> to have another module for APIs (flink-jdbc-connector-api)?
>>
>> For this I would suggest to move all public interfaces (e.g,
>> JdbcRowConverter, JdbcConnectionProvider). And even convert
>> some classes into interface with their default implementations,
>> for example, JdbcSink, JdbcConnectionOptions.
>>
>> This way users would have clear interfaces to build their own
>> JDBC based Flink connectors.
>>
>> Here I am not suggesting to introduce new interfaces, only
>> suggest also to separate the API from the core implementation.
>>
>> What do you think?
>>
>> Best,
>> Muhammet
>>
>>
>> On 2024-04-25 08:54, Joao Boto wrote:
>> > Hi all,
>> >
>> > I'd like to start a discussion on FLIP-449: Reorganization of
>> > flink-connector-jdbc [1].
>> > As Flink continues to evolve, we've noticed an increasing level of
>> > complexity within the JDBC connector.
>> > The proposed solution is to address this complexity by separating the
>> > core
>> > functionality from individual database components, thereby streamlining
>> > the
>> > structure into distinct modules.
>> >
>> > Looking forward to your feedback and suggestions, thanks.
>> > Best regards,
>> > Joao Boto
>> >
>> > [1]
>> > 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-449%3A+Reorganization+of+flink-connector-jdbc
>>



Re: [DISCUSS] FLIP-443: Interruptible watermark processing

2024-05-07 Thread Rui Fan
Hi Piotr,

Overall this FLIP is fine for me. I have a minor concern:
IIUC, the code path will become more complex after this FLIP
due to the addition of shouldInterrupt() checks, right?

If so, it's better to add a benchmark to check whether job
performance regresses when a job has a lot of timers.
If the performance regresses too much, we need to reconsider.
Of course, I hope the performance is fine.
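For readers following the thread, the interruption pattern under discussion could be sketched roughly like this. It is a simplified, hand-written illustration: `Mail`, `shouldInterrupt()`, and the continuation mechanics are guesses at the shape of the design, not Flink's actual classes:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of interruptible timer firing: fire timers until an interruption
// is requested, then re-enqueue the remaining work as a "continuation mail"
// so the mailbox loop can process higher-priority mails (e.g. a checkpoint)
// first. Names are illustrative, not Flink's real API.
final class TimerFiringSketch {
    interface Mail extends Runnable {
        default boolean isInterruptible() { return false; }
    }

    final Deque<Runnable> timers = new ArrayDeque<>();
    final Deque<Mail> mailbox = new ArrayDeque<>();
    volatile boolean interruptRequested;
    int firedCount;

    boolean shouldInterrupt() { return interruptRequested; }

    // Fires pending timers; on interruption, enqueues a continuation mail
    // that resumes firing when the mailbox gets back to it.
    void fireTimers() {
        while (!timers.isEmpty()) {
            if (shouldInterrupt()) {
                interruptRequested = false; // request consumed
                Mail continuation = new Mail() {
                    @Override public boolean isInterruptible() { return true; }
                    @Override public void run() { fireTimers(); }
                };
                mailbox.add(continuation);
                return; // yield control back to the mailbox loop
            }
            timers.poll().run();
            firedCount++;
        }
        // All timers fired: this is where the watermark would be emitted.
    }
}
```

The per-timer `shouldInterrupt()` check is the extra work Rui's benchmark concern refers to: with many timers it runs on every iteration of the hot loop.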

Best,
Rui

On Mon, May 6, 2024 at 6:30 PM Zakelly Lan  wrote:

> Hi Piotr,
>
> I'm saying the scenario where things happen in the following order:
> 1. advance watermark and process timers.
> 2. the cp arrives and interrupts the timer processing, after this the
> continuation mail is in the mailbox queue.
> 3. `snapshotState` is called, where the async state access responses will
> be drained by calling `tryYield()` [1]. —— What if the continuation mail is
> triggered by `tryYield()`?
>
> I'm suggesting skipping the continuation mail during draining of async
> state access.
>
>
> [1]
>
> https://github.com/apache/flink/blob/1904b215e36e4fd48e48ece7ffdf2f1470653130/flink-runtime/src/main/java/org/apache/flink/runtime/asyncprocessing/AsyncExecutionController.java#L305
>
> Best,
> Zakelly
>
>
> On Mon, May 6, 2024 at 6:00 PM Piotr Nowojski 
> wrote:
>
> > Hi Zakelly,
> >
> > Can you elaborate a bit more on what you have in mind? How does marking
> > mails as interruptible help? If an incoming async state access
> > response arrives, it could just request to interrupt any currently ongoing
> > computation, regardless of whether the currently executed mail is
> > interruptible.
> >
> > Best,
> > Piotrek
> >
> > On Mon, May 6, 2024 at 06:33 Zakelly Lan  wrote:
> >
> > > Hi Piotr,
> > >
> > > Thanks for the improvement, overall +1 for this. I'd leave a minor
> > comment:
> > >
> > > 1. I'd suggest also providing `isInterruptable()` in `Mail`, and the
> > > continuation mail will return true. The FLIP-425 will leverage this
> queue
> > > to execute some state requests, and when the cp arrives, the operator
> may
> > > call `yield()` to drain. It may happen that the continuation mail is
> > called
> > > again in `yield()`. By checking `isInterruptable()`, we can skip this
> > mail
> > > and re-enqueue.
> > >
> > >
> > > Best,
> > > Zakelly
> > >
> > > On Wed, May 1, 2024 at 4:35 PM Yanfei Lei  wrote:
> > >
> > > > Thanks for your answers, Piotrek. I got it now.  +1 for this
> > improvement.
> > > >
> > > > Best,
> > > > Yanfei
> > > >
> > > > > Stefan Richter  wrote on Tue, Apr 30, 2024 at
> > > > > 21:30:
> > > > >
> > > > >
> > > > > Thanks for the improvement proposal, I’m +1 for the change!
> > > > >
> > > > > Best,
> > > > > Stefan
> > > > >
> > > > >
> > > > >
> > > > > > On 30. Apr 2024, at 15:23, Roman Khachatryan 
> > > wrote:
> > > > > >
> > > > > > Thanks for the proposal, I definitely see the need for this
> > > > improvement, +1.
> > > > > >
> > > > > > Regards,
> > > > > > Roman
> > > > > >
> > > > > >
> > > > > > On Tue, Apr 30, 2024 at 3:11 PM Piotr Nowojski <
> > pnowoj...@apache.org
> > > > > wrote:
> > > > > >
> > > > > >> Hi Yanfei,
> > > > > >>
> > > > > >> Thanks for the feedback!
> > > > > >>
> > > > > >>> 1. Currently when AbstractStreamOperator or
> > > AbstractStreamOperatorV2
> > > > > >>> processes a watermark, the watermark will be sent downstream; if
> > > > > >>> the `InternalTimerServiceImpl#advanceWatermark` is interrupted,
> > > > > >>> when is the watermark sent downstream?
> > > > > >>
> > > > > >> The watermark would be output by an operator only once all
> > > > > >> relevant timers have fired.
> > > > > >> In other words, if firing of timers is interrupted, a continuation
> > > > > >> mail to continue firing those interrupted timers is created. The
> > > > > >> watermark will be emitted downstream at the end of that
> > > > > >> continuation mail.
> > > > > >>
> > > > > >>> 2. IIUC, a processing timer's firing is also encapsulated into a
> > > > > >>> mail and executed in the mailbox. Is a processing timer allowed
> > > > > >>> to be interrupted?
> > > > > >>
> > > > > >> Yes, firing of both processing and event time timers shares the
> > > > > >> same code, and both will support interruptions in the same way.
> > > > > >> Actually I've renamed the FLIP from
> > > > > >>
> > > > > >>> Interruptible watermarks processing
> > > > > >>
> > > > > >> to:
> > > > > >>
> > > > > >>> Interruptible timers firing
> > > > > >>
> > > > > >> to make this more clear.
> > > > > >>
> > > > > >> Best,
> > > > > >> Piotrek
> > > > > >>
> > > > > >> On Tue, Apr 30, 2024 at 06:08 Yanfei Lei 
> > > > > >> wrote:
> > > > > >>
> > > > > >>> Hi Piotrek,
> > > > > >>>
> > > > > >>> Thanks for this proposal. It looks like it will shorten the
> > > > checkpoint
> > > > > >>> duration, especially in the case of back pressure. +1 for it!
> > I'd
> > > > > >>> like to ask some questions to understand your thoughts more
> > > > 

[DISCUSSION] FLIP-456: CompiledPlan support for Batch Execution Mode

2024-05-07 Thread Alexey Leonov-Vendrovskiy
Hi everyone,

PTAL at the proposed FLIP-456: CompiledPlan support for Batch Execution
Mode [1]. It is pretty self-explanatory.

Any thoughts are welcome!

Thanks,
Alexey

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-456%3A+CompiledPlan+support+for+Batch+Execution+Mode
.


Re: [DISCUSS] FLIP-448: Introduce Pluggable Workflow Scheduler Interface for Materialized Table

2024-05-07 Thread Ron Liu
> 4. It appears that in the section on `public interfaces`, within
`WorkflowOperation`, `CreatePeriodicWorkflowOperation` should be changed to

`CreateWorkflowOperation`, right?

After discussing with Xuyang offline: we need to support both periodic
workflows and one-time workflows, and they need different information. For
example, a periodic workflow needs a cron expression, while a one-time
workflow needs the refresh partition, the downstream cascaded materialized
tables, etc. Therefore, CreateWorkflowOperation will have two different
implementation classes, which will be cleaner for both the implementer and
the caller.
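The split described above could be sketched roughly as follows. This is only an illustration of the shape of the hierarchy; the class and field names (partition spec, cascade list) are guesses based on this thread, not the FLIP's final API:

```java
import java.util.List;
import java.util.Map;

// Sketch of a common operation interface with two variants carrying
// different refresh information. Illustrative names only.
interface CreateWorkflowOperation {
    String materializedTableIdentifier();
}

// Periodic refresh: driven by a cron expression.
record CreatePeriodicWorkflowOperation(
        String materializedTableIdentifier,
        String cronExpression) implements CreateWorkflowOperation {}

// One-time refresh: carries the partition to refresh and the downstream
// materialized tables to which the refresh should cascade.
record CreateOneTimeWorkflowOperation(
        String materializedTableIdentifier,
        Map<String, String> partitionSpec,
        List<String> downstreamCascadeTables) implements CreateWorkflowOperation {}
```

Keeping the two variants as separate classes means each caller only sees the fields that are meaningful for its refresh mode, which is the cleanliness argument made above.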

Best,
Ron

Ron Liu  wrote on Mon, May 6, 2024 at 20:48:

> Hi, Xuyang
>
> Thanks for joining this discussion
>
> > 1. In the sequence diagram, it appears that there is a missing step for
> obtaining the refresh handler from the catalog during the suspend operation.
>
> Good catch
>
> > 2. The term "cascade refresh" does not seem to be mentioned in FLIP-435.
> The workflow it creates is marked as a "one-time workflow". This is
> different
>
> from a "periodic workflow," and it appears to be a one-off execution. Is
> this actually referring to the Refresh command in FLIP-435?
>
> Cascade refresh is future work; we don't propose the corresponding
> syntax in FLIP-435. However, intuitively, it would be an extension of the
> Refresh command in FLIP-435.
>
> > 3. The workflow-scheduler.type has no default value; should it be set to
> CRON by default?
>
> Firstly, CRON is not a workflow scheduler. Secondly, I believe that
> configuring the Scheduler should be an action that users are aware of, and
> default values should not be set.
>
> > 4. It appears that in the section on `public interfaces`, within
> `WorkflowOperation`, `CreatePeriodicWorkflowOperation` should be changed to
>
> `CreateWorkflowOperation`, right?
>
> Sorry, I don't get your point. Could you describe it in more detail?
>
> Best,
> Ron
>
Xuyang  wrote on Mon, May 6, 2024 at 20:26:
>
>> Hi, Ron.
>>
>> Thanks for driving this. After reading the entire flip, I have the
>> following questions:
>>
>>
>>
>>
>> 1. In the sequence diagram, it appears that there is a missing step for
>> obtaining the refresh handler from the catalog during the suspend operation.
>>
>>
>>
>>
>> 2. The term "cascade refresh" does not seem to be mentioned in FLIP-435.
>> The workflow it creates is marked as a "one-time workflow". This is
>> different
>>
>> from a "periodic workflow," and it appears to be a one-off execution. Is
>> this actually referring to the Refresh command in FLIP-435?
>>
>>
>>
>>
>> 3. The workflow-scheduler.type has no default value; should it be set to
>> CRON by default?
>>
>>
>>
>>
>> 4. It appears that in the section on `public interfaces`, within
>> `WorkflowOperation`, `CreatePeriodicWorkflowOperation` should be changed to
>>
>> `CreateWorkflowOperation`, right?
>>
>>
>>
>>
>> --
>>
>> Best!
>> Xuyang
>>
>>
>>
>>
>>
>> At 2024-04-22 14:41:39, "Ron Liu"  wrote:
>> >Hi, Dev
>> >
>> >I would like to start a discussion about FLIP-448: Introduce Pluggable
>> >Workflow Scheduler Interface for Materialized Table.
>> >
>> >In FLIP-435[1], we proposed Materialized Table, which has two types of
>> data
>> >refresh modes: Full Refresh & Continuous Refresh Mode. In Full Refresh
>> >mode, the Materialized Table relies on a workflow scheduler to perform
>> >periodic refresh operation to achieve the desired data freshness.
>> >
>> >There are numerous open-source workflow schedulers available, with
>> popular
>> >ones including Airflow and DolphinScheduler. To enable Materialized Table
>> >to work with different workflow schedulers, we propose a pluggable
>> workflow
>> >scheduler interface for Materialized Table in this FLIP.
>> >
>> >For more details, see FLIP-448 [2]. Looking forward to your feedback.
>> >
>> >[1] https://lists.apache.org/thread/c1gnn3bvbfs8v1trlf975t327s4rsffs
>> >[2]
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-448%3A+Introduce+Pluggable+Workflow+Scheduler+Interface+for+Materialized+Table
>> >
>> >Best,
>> >Ron
>>
>