Re: Re: Iceberg table maintenance

2024-05-03 Thread Péter Váry
Hi Flink Team,

The Iceberg table maintenance proposal is on vote on the Iceberg dev list
[1].

Non-binding votes are important too, so if you are interested, please vote.

Thanks,
Peter

[1] - https://lists.apache.org/thread/qhz17ppdbb57ql356j49qqk3nyk59rvm

On Tue, Apr 2, 2024, 08:35 Péter Váry  wrote:

> Thanks Wenjin for your response.
>
> See my answers below:
>
> On Tue, Apr 2, 2024, 04:08 wenjin  wrote:
>
>> Hi Peter,
>> Thanks a lot for your answers; this detailed explanation has cleared up my
>> confusion and been greatly beneficial to me. If you don't mind, could I
>> discuss two more questions with you?
>>
>> As you mentioned in your proposal and answers, the maintenance tasks may
>> cause resource interference and delay the checkpoint, and in my opinion,
>> they may also backpressure the upstream when performance problems exist.
>> So, would it be better to recommend that users run maintenance tasks in a
>> separate Flink job?
>>
>
> For some users - small tables, manageable amounts of data - architectural
> simplicity is more important than resource usage. Also, in the long term,
> I hope Autoscaling can help with in-place scaling for these jobs.
>
> But I definitely agree that bigger, more resource-constrained jobs need
> to separate out compaction into another job.
>
>
>> As you mentioned in your proposal: "We can serialize different
>> maintenance tasks by chaining them together, but maintenance tasks
>> overlapping from consecutive runs also need to be prevented.”
>> In my understanding, if maintenance tasks are chained together in one
>> vertex, just like
>> "scheduler1->task1->scheduler2->task2->scheduler3->task3->scheduler4->task4",
>> they will be executed serially, and only after task4 finishes will
>> scheduler1 process the next record. How can the overlapping of maintenance
>> tasks happen?
>>
>
> When I talk about chained tasks, they are not chained into a single vertex.
>
> They use the output of the previous task to start the next task, but
> all of them have multiple operators (some of them with different
> parallelism), which prevents them from being merged into a single vertex.
>
> So overlapping could happen if a new input triggers a parallel scheduling.
>
>> On the other hand, ensuring that maintenance tasks do not run concurrently
>> by chaining them together is not guaranteed, for there may be cases that
>> disable the chain. In this case, I think using tags is a better way than
>> lock mechanisms, for simplicity and ease of use for the user.
>>
>> Thanks,
>> Wenjin.
>>
>> On 2024/03/30 13:22:12 Péter Váry wrote:
>> > Hi Wenjin,
>> >
>> > See my answers below:
>> >
>> > On Sat, Mar 30, 2024, 10:54 wenjin  wrote:
>> >
>> > > Hi Peter,
>> > >
>> > > I am interested in your proposal and think making the Iceberg Flink
>> > > Connector support running maintenance tasks is meaningful. If possible,
>> > > could you help me clarify a few questions?
>> > >
>> > > - When the Iceberg table is written by a single Flink job (use cases 1
>> > > and 2), the maintenance tasks will be added to the post-commit
>> > > topology. How do the maintenance tasks execute, synchronously or
>> > > asynchronously? Will the maintenance tasks block the data processing of
>> > > the Flink job?
>> > >
>> >
>> > The scheduling and maintenance tasks are just regular Flink operators.
>> > Also, the scheduler will make sure that the maintenance tasks are not
>> > chained to the Iceberg committer, so I would call this asynchronous.
>> > Flink operators do not block other operators, but the maintenance tasks
>> > are competing for resources with the other data processing tasks. That
>> > is why we provide the possibility to define slot sharing groups for the
>> > maintenance tasks. This allows the users to separate the provided
>> > resources as much as Flink allows.
>> >
>> > I have seen only one exception to this separation, where we emit a high
>> > number of records in the maintenance flow, which would cause delays in
>> > starting the checkpoints. It could be mitigated by enabling unaligned
>> > checkpoints and using AsyncIO. There is one issue with AsyncIO found by
>> > Gyula Fora: https://issues.apache.org/jira/browse/FLINK-34704 which
>> > means that, even with AsyncIO, the checkpoint could be blocked until at
>> > least one compaction group is finished.
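A minimal configuration sketch of the unaligned-checkpoint mitigation mentioned above; the key names below follow recent Flink releases and should be double-checked against the documentation of the version in use:

```yaml
# flink-conf.yaml fragment (sketch): enable unaligned checkpoints so that
# barriers can overtake in-flight records from the maintenance flow.
execution.checkpointing.unaligned: true
execution.checkpointing.interval: 60s
# Optional: stay aligned when alignment completes quickly, and only switch
# to unaligned barriers after this timeout.
execution.checkpointing.aligned-checkpoint-timeout: 30s
```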
>> >
>> > > - When the Iceberg table is written by multiple Flink jobs (use case
>> > > 3), the user needs to create a separate Flink job to run the
>> > > maintenance tasks. In this case, if the user does not create a
>> > > separate job, but enables running the maintenance tasks in existing
>> > > Flink jobs just like in use case 1, what would be the consequences?
>> > > Or is there an automatic mechanism to avoid this issue?
>> > >
>> >
>> > The user needs to create a new job, or choose a single job to run the
>> > maintenance tasks, to avoid running concurrent instances of the
>> > compaction tasks.
>> > Even if concurrent compaction tasks could be handled, they would be a
>> > serious waste of resources 

[jira] [Created] (FLINK-35288) Flink Restart Strategy does not work as documented

2024-05-03 Thread Keshav Kansal (Jira)
Keshav Kansal created FLINK-35288:
-

 Summary: Flink Restart Strategy does not work as documented
 Key: FLINK-35288
 URL: https://issues.apache.org/jira/browse/FLINK-35288
 Project: Flink
  Issue Type: Bug
Reporter: Keshav Kansal


As per the documentation, when using the Fixed Delay Restart Strategy,
restart-strategy.fixed-delay.attempts defines "The number of times that 
Flink retries the execution before the job is declared as failed if 
restart-strategy has been set to fixed-delay". 

However, in reality it is the *maximum-total-task-failures*, i.e. it is possible 
that the job does not even attempt to restart. 
This is as documented in 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-1%3A+Fine+Grained+Recovery+from+Task+Failures

If there is an outage at the sink level, for example an Elasticsearch outage, 
all the independent tasks might fail and the job will immediately fail without 
a restart if restart-strategy.fixed-delay.attempts is set lower than or equal 
to the parallelism of the sink. 
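To make the failure mode concrete, here is a small, self-contained simulation. It is an assumption-labelled model, not Flink source code: it assumes each individual subtask failure consumes one unit of the attempt budget, rather than one unit per whole-job restart.

```python
def restarts_after_outage(attempts: int, failed_subtasks: int) -> bool:
    """Simplified model: with fine-grained recovery, each subtask failure
    consumes one unit of restart-strategy.fixed-delay.attempts. After a
    sink-wide outage fails `failed_subtasks` subtasks at once, the job can
    only restart if some budget remains."""
    remaining = attempts - failed_subtasks
    return remaining > 0

# A sink with parallelism 8 suffers a full outage:
assert not restarts_after_outage(attempts=8, failed_subtasks=8)
assert restarts_after_outage(attempts=9, failed_subtasks=8)
print("attempts <= sink parallelism => job fails without restarting")
```

Under this model, the attempt budget must exceed the number of simultaneous task failures for the job to restart at all, which matches the behavior described above.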




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Question around Flink's AdaptiveBatchScheduler

2024-05-03 Thread Venkatakrishnan Sowrirajan
Jinrui and Xia

Gentle ping for reviews.

On Mon, Apr 29, 2024, 8:28 PM Venkatakrishnan Sowrirajan 
wrote:

> Hi Xia and Jinrui,
>
> Filed https://github.com/apache/flink/pull/24736 to address the above
> described issue. Please take a look whenever you can.
>
> Thanks
> Venkat
>
>
> On Thu, Apr 18, 2024 at 12:16 PM Venkatakrishnan Sowrirajan <
> vsowr...@asu.edu> wrote:
>
>> Filed https://issues.apache.org/jira/browse/FLINK-35165 to address the
>> above described issue. Will share the PR here once it is ready for review.
>>
>> Regards
>> Venkata krishnan
>>
>>
>> On Wed, Apr 17, 2024 at 5:32 AM Junrui Lee  wrote:
>>
>>> Thanks Venkata and Xia for providing further clarification. I think your
>>> example illustrates the significance of this proposal very well. Please
>>> feel free to go ahead and address the concerns.
>>>
>>> Best,
>>> Junrui
>>>
>>> Venkatakrishnan Sowrirajan  于2024年4月16日周二 07:01写道:
>>>
>>> > Thanks for adding your thoughts to this discussion.
>>> >
>>> > If we all agree that the source vertex parallelism shouldn't be bound
>>> by
>>> > the downstream max parallelism
>>> > (jobmanager.adaptive-batch-scheduler.max-parallelism)
>>> > based on the rationale and the issues described above, I can take a
>>> stab at
>>> > addressing the issue.
>>> >
>>> > Let me file a ticket to track this issue. Otherwise, I'm looking
>>> forward to
>>> > hearing more thoughts from others as well, especially Lijie and Junrui
>>> who
>>> > have more context on the AdaptiveBatchScheduler.
>>> >
>>> > Regards
>>> > Venkata krishnan
>>> >
>>> >
>>> > On Mon, Apr 15, 2024 at 12:54 AM Xia Sun  wrote:
>>> >
>>> > > Hi Venkat,
>>> > > I agree that the parallelism of the source vertex should not be upper
>>> > > bounded by the job's global max parallelism. The case you mentioned,
>>> > > "High filter selectivity with huge amounts of data to read",
>>> > > excellently supports this viewpoint. (In fact, in the current
>>> > > implementation, if the source parallelism is pre-specified at job
>>> > > creation stage, rather than relying on the dynamic parallelism
>>> > > inference of the AdaptiveBatchScheduler, the source vertex's
>>> > > parallelism can indeed exceed the job's global max parallelism.)
>>> > >
>>> > > As Lijie and Junrui pointed out, the key issue is "semantic
>>> consistency."
>>> > > Currently, if a vertex has not set maxParallelism, the
>>> > > AdaptiveBatchScheduler will use
>>> > > `execution.batch.adaptive.auto-parallelism.max-parallelism` as the
>>> > vertex's
>>> > > maxParallelism. Since the current implementation does not distinguish
>>> > > between source vertices and downstream vertices, source vertices are
>>> also
>>> > > subject to this limitation.
>>> > >
>>> > > Therefore, I believe that if the issue of "semantic consistency" can
>>> be
>>> > > well explained in the code and configuration documentation, the
>>> > > AdaptiveBatchScheduler should support that the parallelism of source
>>> > > vertices can exceed the job's global max parallelism.
>>> > >
>>> > > Best,
>>> > > Xia
>>> > >
>>> > > Venkatakrishnan Sowrirajan  于2024年4月14日周日 10:31写道:
>>> > >
>>> > > > Let me state why I think
>>> > > > "*jobmanager.adaptive-batch-scheduler.default-source-parallelism*"
>>> > > > should not be bound by
>>> > > > "*jobmanager.adaptive-batch-scheduler.max-parallelism*".
>>> > > >
>>> > > >- Source vertex is unique and does not have any upstream
>>> vertices
>>> > > >- Downstream vertices read shuffled data partitioned by key,
>>> which
>>> > is
>>> > > >not the case for the Source vertex
>>> > > >- Limiting source parallelism by downstream vertices' max
>>> > parallelism
>>> > > is
>>> > > >incorrect
>>> > > >
>>> > > > If we say that for "semantic consistency" the source vertex
>>> > > > parallelism has to be bound by the overall job's max parallelism,
>>> > > > it can lead to the following issues:
>>> > > >
>>> > > >- High filter selectivity with huge amounts of data to read -
>>> > setting
>>> > > >high "*jobmanager.adaptive-batch-scheduler.max-parallelism*" so
>>> that
>>> > > >source parallelism can be set higher can lead to small blocks
>>> and
>>> > > >sub-optimal performance.
>>> > > >- Setting high
>>> > "*jobmanager.adaptive-batch-scheduler.max-parallelism*"
>>> > > >requires careful tuning of network buffer configurations which
>>> is
>>> > > >unnecessary in cases where it is not required just so that the
>>> > source
>>> > > >parallelism can be set high.
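A back-of-the-envelope sketch of the "small blocks" point above, with hypothetical numbers: if the global max-parallelism must be raised to, say, 50,000 just so the source can scan a large input in parallel, the (much smaller) post-filter shuffle gets divided into far more, far smaller blocks.

```python
def shuffle_block_mb(shuffle_mb: int, max_parallelism: int) -> float:
    # Each downstream consumer reads roughly shuffle_mb / max_parallelism.
    return shuffle_mb / max_parallelism

# Hypothetical job: 10 TB scanned, but only ~50 GB survives the filter.
post_filter_mb = 50_000
print(shuffle_block_mb(post_filter_mb, 1_000))    # 50.0 -> healthy block size
print(shuffle_block_mb(post_filter_mb, 50_000))   # 1.0  -> tiny blocks
```

Decoupling the source's parallelism from the global max-parallelism avoids forcing this trade-off onto downstream vertices.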
>>> > > >
>>> > > > Regards
>>> > > > Venkata krishnan
>>> > > >
>>> > > > On Thu, Apr 11, 2024 at 9:30 PM Junrui Lee 
>>> > wrote:
>>> > > >
>>> > > > > Hello Venkata krishnan,
>>> > > > >
>>> > > > > I think the term "semantic inconsistency" defined by
>>> > > > > jobmanager.adaptive-batch-scheduler.max-parallelism refers to
>>> > > > maintaining a
>>> > > > > uniform upper limit on parallelism across all vertices within a
>>> job.
>>> > As
>>> > > > the
>>> > > > > s

Re: [VOTE] Apache Flink CDC Release 3.1.0, release candidate #1

2024-05-03 Thread Jeyhun Karimov
Hi Qinsheng,

Thanks for driving the release.
+1 (non-binding)

- No binaries in source
- Verified Signatures
- Github tag exists
- Build source

Regards,
Jeyhun

On Thu, May 2, 2024 at 10:52 PM Muhammet Orazov
 wrote:

> Hey Qingsheng,
>
> Thanks a lot! +1 (non-binding)
>
> - Checked sha512sum hash
> - Checked GPG signature
> - Reviewed release notes
> - Reviewed GitHub web pr (added minor suggestions)
> - Built the source with JDK 11 & 8
> - Checked that src doesn't contain binary files
>
> Best,
> Muhammet
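For readers new to release verification: the sha512 check mentioned above boils down to comparing a locally computed digest with the published one. A generic sketch follows; the artifact name is a placeholder, and for a real release candidate the expected value would come from the .sha512 files on dist.apache.org.

```python
import hashlib
from pathlib import Path

def sha512_of(path: Path) -> str:
    """Hex sha512 digest of a file, equivalent to `sha512sum <file>`."""
    return hashlib.sha512(path.read_bytes()).hexdigest()

# Demo with a stand-in artifact instead of the real release tarball.
artifact = Path("artifact.tgz")
artifact.write_bytes(b"demo release tarball")
published_hash = sha512_of(artifact)  # stands in for the published value
assert sha512_of(artifact) == published_hash
print("checksum OK")
```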
>
> On 2024-04-30 05:11, Qingsheng Ren wrote:
> > Hi everyone,
> >
> > Please review and vote on the release candidate #1 for the version
> > 3.1.0 of
> > Apache Flink CDC, as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > **Release Overview**
> >
> > As an overview, the release consists of the following:
> > a) Flink CDC source release to be deployed to dist.apache.org
> > b) Maven artifacts to be deployed to the Maven Central Repository
> >
> > **Staging Areas to Review**
> >
> > The staging areas containing the above mentioned artifacts are as
> > follows,
> > for your review:
> > * All artifacts for a) can be found in the corresponding dev repository
> > at
> > dist.apache.org [1], which are signed with the key with fingerprint
> > A1BD477F79D036D2C30CA7DBCA8AEEC2F6EB040B [2]
> > * All artifacts for b) can be found at the Apache Nexus Repository [3]
> >
> > Other links for your review:
> > * JIRA release notes [4]
> > * Source code tag "release-3.1.0-rc1" with commit hash
> > 63b42cb937d481f558209ab3c8547959cf039643 [5]
> > * PR for release announcement blog post of Flink CDC 3.1.0 in flink-web
> > [6]
> >
> > **Vote Duration**
> >
> > The voting time will run for at least 72 hours, adopted by majority
> > approval with at least 3 PMC affirmative votes.
> >
> > Thanks,
> > Qingsheng Ren
> >
> > [1] https://dist.apache.org/repos/dist/dev/flink/flink-cdc-3.1.0-rc1/
> > [2] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [3]
> > https://repository.apache.org/content/repositories/orgapacheflink-1731
> > [4]
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12354387
> > [5] https://github.com/apache/flink-cdc/releases/tag/release-3.1.0-rc1
> > [6] https://github.com/apache/flink-web/pull/739
>


Re: Discussion: Condition field in the CR status

2024-05-03 Thread Jeyhun Karimov
Hi Lajith,

Thanks a lot for driving this FLIP. Please find my comments below:

- I echo Gyula's point that including some examples and further explanations
might ease the reader's work. With the current version, the FLIP is a bit
hard to follow.

- Will the usage of Conditions be enabled by default? Or will there be any
disadvantages for Flink users?

If Conditions with the same type already exist in the Status Conditions
> list, then replace the existing condition with the same type if the
> Condition status and message are different.

 - Do you think we should have clear rules for how these Conditions should be
managed, especially when multiple Conditions of the same type are present?
For example, a resource may have multiple causes for the same condition
(e.g., Error due to network and Error due to I/O). Then, overriding the old
condition with the new one is not the best approach, no?
Please correct me if I misunderstood.
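For context on the Kubernetes convention being discussed, a hypothetical Conditions list in a FlinkDeployment status might look like the fragment below. The field names follow the standard Kubernetes `metav1.Condition` shape; the condition types and values shown are illustrative assumptions, not the FLIP's actual proposal.

```yaml
status:
  conditions:
    - type: Ready
      status: "False"
      reason: ClusterDeploymentException
      message: "Error due to network"
      lastTransitionTime: "2024-05-03T10:15:00Z"
    - type: ReconciliationSucceeded
      status: "True"
      reason: ReconciliationSucceeded
      message: ""
      lastTransitionTime: "2024-05-03T10:10:00Z"
```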

Regards,
Jeyhun

On Fri, May 3, 2024 at 8:53 AM Gyula Fóra  wrote:

> Hi Lajith!
>
> Can you please include some examples in the document to help reviewers?
> Just some examples with the status and the proposed conditions.
>
> Cheers,
> Gyula
>
> On Wed, May 1, 2024 at 9:06 AM Lajith Koova  wrote:
>
> > Hello,
> >
> >
> > Starting discussion thread here to discuss a proposal to add Conditions
> > field in the CR status of Flink Deployment and FlinkSessionJob.
> >
> >
> > Here is the google doc with details. Please provide your thoughts/inputs.
> >
> >
> >
> >
> https://docs.google.com/document/d/12wlJCL_Vq2KZnABzK7OR7gAd1jZMmo0MxgXQXqtWODs/edit?usp=sharing
> >
> >
> > Thanks
> > Lajith
> >
>


Re: [VOTE] FLIP-454: New Apicurio Avro format

2024-05-03 Thread Jeyhun Karimov
+1 (non binding)

Thanks for driving this FLIP David.

Regards,
Jeyhun

On Fri, May 3, 2024 at 2:21 PM Mark Nuttall  wrote:

> +1, I would also like to see first class support for Avro and Apicurio
>
> -- Mark Nuttall, mnutt...@apache.org
> Senior Software Engineer, IBM Event Automation
>
> On 2024/05/02 09:41:09 David Radley wrote:
> > Hi everyone,
> >
> > I'd like to start a vote on the FLIP-454: New Apicurio Avro format
> > [1]. The discussion thread is here [2].
> >
> > The vote will be open for at least 72 hours unless there is an
> > objection
> > or
> > insufficient votes.
> >
> > [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-454%3A+New+Apicurio+Avro+format
> > [2] https://lists.apache.org/thread/wtkl4yn847tdd0wrqm5xgv9wc0cb0kr8
> >
> >
> > Kind regards, David.
> >
> > Unless otherwise stated above:
> >
> > IBM United Kingdom Limited
> > Registered in England and Wales with number 741598
> > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
> >
>


Re: [DISCUSSION] FLIP-449: Reorganization of flink-connector-jdbc

2024-05-03 Thread Jeyhun Karimov
Hi Boto,

Thanks for driving this FLIP. +1 for it.

I would also suggest including a sample usage and the changes for end users
in the FLIP.

flink-connector-jdbc: The current module, which will be transformed to
> shade all other modules and maintain backward compatibility.


Also, in order to ensure the backwards compatibility, do you think at some
point we might need to decouple interface and implementations and put only
interfaces in flink-connector-jdbc module?

Regards,
Jeyhun

On Fri, May 3, 2024 at 2:56 PM João Boto  wrote:

> Hi,
>
> > > You can now update the derby implementation and the core independently
> and decide at your own will when to include the new derby in the core?
> Not really, we are talking about creating modules in the same repository,
> not about externalizing the database modules. That is, whenever there is a
> release, both the core and the DBs will be released at the same time.
>
> > > For clarity of motivation, could you please add some concrete examples
> (just a couple) to the FLIP to clarify when this really comes in handy?
> Added.
>
> Best
>
> On 2024/04/26 07:59:30 lorenzo.affe...@ververica.com.INVALID wrote:
> > Hello Joao,
> > thank you for your proposal, modularity is always welcome :)
> >
> > > To maintain clarity and minimize conflicts, we're currently leaning
> towards maintaining the existing structure, where
> flink-connector-jdbc-${version}.jar remains shaded for simplicity,
> encompassing the core functionality and all database-related features
> within the same JAR.
> >
> > I do agree with this approach, as the use case of reading/writing to
> > different DBs could be quite common.
> >
> > However, I am missing what would be the concrete advantage in this
> change for connector maintainability.
> > I make an example:
> > You can now update the derby implementation and the core independently
> and decide at your own will when to include the new derby in the core?
> >
> > For clarity of motivation, could you please add some concrete examples
> (just a couple) to the FLIP to clarify when this really comes in handy?
> >
> > Thank you!
> > On Apr 26, 2024 at 04:19 +0200, Muhammet Orazov
> , wrote:
> > > Hey João,
> > >
> > > Thanks for FLIP proposal!
> > >
> > > Since proposal is to introduce modules, would it make sense
> > > to have another module for APIs (flink-jdbc-connector-api)?
> > >
> > > For this I would suggest to move all public interfaces (e.g,
> > > JdbcRowConverter, JdbcConnectionProvider). And even convert
> > > some classes into interface with their default implementations,
> > > for example, JdbcSink, JdbcConnectionOptions.
> > >
> > > This way users would have clear interfaces to build their own
> > > JDBC based Flink connectors.
> > >
> > > Here I am not suggesting to introduce new interfaces, only
> > > suggest also to separate the API from the core implementation.
> > >
> > > What do you think?
> > >
> > > Best,
> > > Muhammet
> > >
> > >
> > > On 2024-04-25 08:54, Joao Boto wrote:
> > > > Hi all,
> > > >
> > > > I'd like to start a discussion on FLIP-449: Reorganization of
> > > > flink-connector-jdbc [1].
> > > > As Flink continues to evolve, we've noticed an increasing level of
> > > > complexity within the JDBC connector.
> > > > The proposed solution is to address this complexity by separating the
> > > > core
> > > > functionality from individual database components, thereby
> streamlining
> > > > the
> > > > structure into distinct modules.
> > > >
> > > > Looking forward to your feedback and suggestions, thanks.
> > > > Best regards,
> > > > Joao Boto
> > > >
> > > > [1]
> > > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-449%3A+Reorganization+of+flink-connector-jdbc
> >
>


Re: [DISCUSS] FLIP-453: Promote Unified Sink API V2 to Public and Deprecate SinkFunction

2024-05-03 Thread Péter Váry
> With regards to FLINK-35149, the fix version indicates a change at Flink
CDC; is that indeed correct, or does it require a change in the SinkV2
interface?

The fix doesn't require a change in SinkV2, so we are good there.
The issue is that the new SinkV2 SupportsCommitter/SupportsPreWriteTopology
doesn't work with the CDC yet.

Martijn Visser  ezt írta (időpont: 2024. máj. 3.,
P, 14:06):

> Hi Ferenc,
>
> You're right, 1.20 it is :)
>
> I've assigned the HBase one to you!
>
> Thanks,
>
> Martijn
>
> On Fri, May 3, 2024 at 1:55 PM Ferenc Csaky 
> wrote:
>
> > Hi Martijn,
> >
> > +1 for the proposal.
> >
> > > targeted for Flink 1.19
> >
> > I guess you meant Flink 1.20 here.
> >
> > Also, I volunteer to take updating the HBase sink, feel free to assign
> > that task to me.
> >
> > Best,
> > Ferenc
> >
> >
> >
> >
> > On Friday, May 3rd, 2024 at 10:20, Martijn Visser <
> > martijnvis...@apache.org> wrote:
> >
> > >
> > >
> > > Hi Peter,
> > >
> > > I'll add it for completeness, thanks!
> > > With regards to FLINK-35149, the fix version indicates a change at
> Flink
> > > CDC; is that indeed correct, or does it require a change in the SinkV2
> > > interface?
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > >
> > > On Fri, May 3, 2024 at 7:47 AM Péter Váry peter.vary.apa...@gmail.com
> > >
> > > wrote:
> > >
> > > > Hi Martijn,
> > > >
> > > > We might want to add FLIP-371 [1] to the list. (Or we aim only for
> > higher
> > > > level FLIPs?)
> > > >
> > > > We are in the process of using the new API in Iceberg connector [2] -
> > so
> > > > far, so good.
> > > >
> > > > I know of one minor known issue about the sink [3], which should be
> > ready
> > > > for the release.
> > > >
> > > > All-in-all, I think we are in good shape, and we could move forward
> > with
> > > > the promotion.
> > > >
> > > > Thanks,
> > > > Peter
> > > >
> > > > [1] -
> > > >
> > > >
> >
> https://cwiki.apache.org/confluence/plugins/servlet/mobile?contentId=263430387
> > > > [2] - https://github.com/apache/iceberg/pull/10179
> > > > [3] - https://issues.apache.org/jira/browse/FLINK-35149
> > > >
> > > > On Thu, May 2, 2024, 09:47 Muhammet Orazov
> > mor+fl...@morazow.com.invalid
> > > > wrote:
> > > >
> > > > > Got it, thanks!
> > > > >
> > > > > On 2024-05-02 06:53, Martijn Visser wrote:
> > > > >
> > > > > > Hi Muhammet,
> > > > > >
> > > > > > Thanks for joining the discussion! The changes in this FLIP would
> > be
> > > > > > targeted for Flink 1.19, since it's only a matter of changing the
> > > > > > annotation.
> > > > > >
> > > > > > Best regards,
> > > > > >
> > > > > > Martijn
> > > > > >
> > > > > > On Thu, May 2, 2024 at 7:26 AM Muhammet Orazov
> > mor+fl...@morazow.com
> > > > > > wrote:
> > > > > >
> > > > > > > Hello Martijn,
> > > > > > >
> > > > > > > Thanks for the FLIP and detailed history of changes, +1.
> > > > > > >
> > > > > > > Would FLIP changes target for 2.0? I think it would be good
> > > > > > > to have clear APIs on 2.0 release.
> > > > > > >
> > > > > > > Best,
> > > > > > > Muhammet
> > > > > > >
> > > > > > > On 2024-05-01 15:30, Martijn Visser wrote:
> > > > > > >
> > > > > > > > Hi everyone,
> > > > > > > >
> > > > > > > > I would like to start a discussion on FLIP-453: Promote
> > Unified Sink
> > > > > > > > API V2
> > > > > > > > to Public and Deprecate SinkFunction
> > > > > > > > https://cwiki.apache.org/confluence/x/rIobEg
> > > > > > > >
> > > > > > > > This FLIP proposes to promote the Unified Sink API V2 from
> > > > > > > > PublicEvolving
> > > > > > > > to Public and to mark the SinkFunction as Deprecated.
> > > > > > > >
> > > > > > > > I'm looking forward to your thoughts.
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > >
> > > > > > > > Martijn
> >
>


Re: [DISCUSSION] FLIP-449: Reorganization of flink-connector-jdbc

2024-05-03 Thread João Boto
Hi,

> > You can now update the derby implementation and the core independently and 
> > decide at your own will when to include the new derby in the core?
Not really, we are talking about creating modules in the same repository, not 
about externalizing the database modules. That is, whenever there is a release, 
both the core and the DBs will be released at the same time.

> > For clarity of motivation, could you please add some concrete examples 
> > (just a couple) to the FLIP to clarify when this really comes in handy?
Added.

Best

On 2024/04/26 07:59:30 lorenzo.affe...@ververica.com.INVALID wrote:
> Hello Joao,
> thank you for your proposal, modularity is always welcome :)
> 
> > To maintain clarity and minimize conflicts, we're currently leaning towards 
> > maintaining the existing structure, where 
> > flink-connector-jdbc-${version}.jar remains shaded for simplicity, 
> > encompassing the core functionality and all database-related features 
> > within the same JAR.
> 
> I do agree with this approach, as the use case of reading/writing to 
> different DBs could be quite common.
> 
> However, I am missing what would be the concrete advantage in this change for 
> connector maintainability.
> I make an example:
> You can now update the derby implementation and the core independently and 
> decide at your own will when to include the new derby in the core?
> 
> For clarity of motivation, could you please add some concrete examples (just 
> a couple) to the FLIP to clarify when this really comes in handy?
> 
> Thank you!
> On Apr 26, 2024 at 04:19 +0200, Muhammet Orazov 
> , wrote:
> > Hey João,
> >
> > Thanks for FLIP proposal!
> >
> > Since proposal is to introduce modules, would it make sense
> > to have another module for APIs (flink-jdbc-connector-api)?
> >
> > For this I would suggest to move all public interfaces (e.g,
> > JdbcRowConverter, JdbcConnectionProvider). And even convert
> > some classes into interface with their default implementations,
> > for example, JdbcSink, JdbcConnectionOptions.
> >
> > This way users would have clear interfaces to build their own
> > JDBC based Flink connectors.
> >
> > Here I am not suggesting to introduce new interfaces, only
> > suggest also to separate the API from the core implementation.
> >
> > What do you think?
> >
> > Best,
> > Muhammet
> >
> >
> > On 2024-04-25 08:54, Joao Boto wrote:
> > > Hi all,
> > >
> > > I'd like to start a discussion on FLIP-449: Reorganization of
> > > flink-connector-jdbc [1].
> > > As Flink continues to evolve, we've noticed an increasing level of
> > > complexity within the JDBC connector.
> > > The proposed solution is to address this complexity by separating the
> > > core
> > > functionality from individual database components, thereby streamlining
> > > the
> > > structure into distinct modules.
> > >
> > > Looking forward to your feedback and suggestions, thanks.
> > > Best regards,
> > > Joao Boto
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-449%3A+Reorganization+of+flink-connector-jdbc
> 


Re: [DISCUSSION] FLIP-449: Reorganization of flink-connector-jdbc

2024-05-03 Thread João Boto
Hi Muhammet,

While I generally agree, given our current usage, I'm struggling to discern any 
clear advantage. We already have abstract implementations that cover all 
necessary interfaces and offer essential functionality, complemented by a 
robust set of reusable tests to streamline implementation.

With this established infrastructure in place, coupled with the added import 
overhead of introducing another module, I find it difficult to identify any 
distinct benefits at this point.

Best

On 2024/04/26 02:18:52 Muhammet Orazov wrote:
> Hey João,
> 
> Thanks for FLIP proposal!
> 
> Since proposal is to introduce modules, would it make sense
> to have another module for APIs (flink-jdbc-connector-api)?
> 
> For this I would suggest to move all public interfaces (e.g,
> JdbcRowConverter, JdbcConnectionProvider). And even convert
> some classes into interface with their default implementations,
> for example, JdbcSink, JdbcConnectionOptions.
> 
> This way users would have clear interfaces to build their own
> JDBC based Flink connectors.
> 
> Here I am not suggesting to introduce new interfaces, only
> suggest also to separate the API from the core implementation.
> 
> What do you think?
> 
> Best,
> Muhammet
> 
> 
> On 2024-04-25 08:54, Joao Boto wrote:
> > Hi all,
> > 
> > I'd like to start a discussion on FLIP-449: Reorganization of
> > flink-connector-jdbc [1].
> > As Flink continues to evolve, we've noticed an increasing level of
> > complexity within the JDBC connector.
> > The proposed solution is to address this complexity by separating the 
> > core
> > functionality from individual database components, thereby streamlining 
> > the
> > structure into distinct modules.
> > 
> > Looking forward to your feedback and suggestions, thanks.
> > Best regards,
> > Joao Boto
> > 
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-449%3A+Reorganization+of+flink-connector-jdbc
> 


Re: [VOTE] FLIP-454: New Apicurio Avro format

2024-05-03 Thread Mark Nuttall
+1, I would also like to see first class support for Avro and Apicurio

-- Mark Nuttall, mnutt...@apache.org
Senior Software Engineer, IBM Event Automation

On 2024/05/02 09:41:09 David Radley wrote:
> Hi everyone,
> 
> I'd like to start a vote on the FLIP-454: New Apicurio Avro format
> [1]. The discussion thread is here [2].
> 
> The vote will be open for at least 72 hours unless there is an
> objection
> or
> insufficient votes.
> 
> [1] 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-454%3A+New+Apicurio+Avro+format
> [2] https://lists.apache.org/thread/wtkl4yn847tdd0wrqm5xgv9wc0cb0kr8
> 
> 
> Kind regards, David.
> 
> Unless otherwise stated above:
> 
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
> 


Re: [DISCUSS] FLIP-453: Promote Unified Sink API V2 to Public and Deprecate SinkFunction

2024-05-03 Thread Martijn Visser
Hi Ferenc,

You're right, 1.20 it is :)

I've assigned the HBase one to you!

Thanks,

Martijn

On Fri, May 3, 2024 at 1:55 PM Ferenc Csaky 
wrote:

> Hi Martijn,
>
> +1 for the proposal.
>
> > targeted for Flink 1.19
>
> I guess you meant Flink 1.20 here.
>
> Also, I volunteer to take updating the HBase sink, feel free to assign
> that task to me.
>
> Best,
> Ferenc
>
>
>
>
> On Friday, May 3rd, 2024 at 10:20, Martijn Visser <
> martijnvis...@apache.org> wrote:
>
> >
> >
> > Hi Peter,
> >
> > I'll add it for completeness, thanks!
> > With regards to FLINK-35149, the fix version indicates a change at Flink
> > CDC; is that indeed correct, or does it require a change in the SinkV2
> > interface?
> >
> > Best regards,
> >
> > Martijn
> >
> >
> > On Fri, May 3, 2024 at 7:47 AM Péter Váry peter.vary.apa...@gmail.com
> >
> > wrote:
> >
> > > Hi Martijn,
> > >
> > > We might want to add FLIP-371 [1] to the list. (Or we aim only for
> higher
> > > level FLIPs?)
> > >
> > > We are in the process of using the new API in Iceberg connector [2] -
> so
> > > far, so good.
> > >
> > > I know of one minor known issue about the sink [3], which should be
> ready
> > > for the release.
> > >
> > > All-in-all, I think we are in good shape, and we could move forward
> with
> > > the promotion.
> > >
> > > Thanks,
> > > Peter
> > >
> > > [1] -
> > >
> > >
> https://cwiki.apache.org/confluence/plugins/servlet/mobile?contentId=263430387
> > > [2] - https://github.com/apache/iceberg/pull/10179
> > > [3] - https://issues.apache.org/jira/browse/FLINK-35149
> > >
> > > On Thu, May 2, 2024, 09:47 Muhammet Orazov
> mor+fl...@morazow.com.invalid
> > > wrote:
> > >
> > > > Got it, thanks!
> > > >
> > > > On 2024-05-02 06:53, Martijn Visser wrote:
> > > >
> > > > > Hi Muhammet,
> > > > >
> > > > > Thanks for joining the discussion! The changes in this FLIP would
> be
> > > > > targeted for Flink 1.19, since it's only a matter of changing the
> > > > > annotation.
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Martijn
> > > > >
> > > > > On Thu, May 2, 2024 at 7:26 AM Muhammet Orazov
> mor+fl...@morazow.com
> > > > > wrote:
> > > > >
> > > > > > Hello Martijn,
> > > > > >
> > > > > > Thanks for the FLIP and detailed history of changes, +1.
> > > > > >
> > > > > > Would the FLIP changes target 2.0? I think it would be good
> > > > > > to have clear APIs in the 2.0 release.
> > > > > >
> > > > > > Best,
> > > > > > Muhammet
> > > > > >
> > > > > > On 2024-05-01 15:30, Martijn Visser wrote:
> > > > > >
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > I would like to start a discussion on FLIP-453: Promote
> Unified Sink
> > > > > > > API V2
> > > > > > > to Public and Deprecate SinkFunction
> > > > > > > https://cwiki.apache.org/confluence/x/rIobEg
> > > > > > >
> > > > > > > This FLIP proposes to promote the Unified Sink API V2 from
> > > > > > > PublicEvolving
> > > > > > > to Public and to mark the SinkFunction as Deprecated.
> > > > > > >
> > > > > > > I'm looking forward to your thoughts.
> > > > > > >
> > > > > > > Best regards,
> > > > > > >
> > > > > > > Martijn
>
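
[Editor's note] For context, a minimal sketch of what the proposed promotion and deprecation amount to at the source level. The annotations and interfaces below are simplified stand-ins, not Flink's actual `org.apache.flink` definitions:

```java
// Simplified stand-ins for Flink's stability annotations; the real ones
// live in org.apache.flink.annotation.
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Target(ElementType.TYPE)
@Retention(RetentionPolicy.RUNTIME)
@interface Public {}            // stand-in: stable public API

@Target(ElementType.TYPE)
@Retention(RetentionPolicy.RUNTIME)
@interface PublicEvolving {}    // stand-in: public, but may still change

// After the FLIP, the unified Sink V2 entry point moves from
// @PublicEvolving to @Public (a stable API contract)...
@Public
interface Sink<InputT> {
    SinkWriter<InputT> createWriter();
}

interface SinkWriter<InputT> {
    void write(InputT element);
}

// ...and the legacy SinkFunction-style API is marked deprecated,
// steering users toward Sink V2.
@Deprecated
interface SinkFunction<IN> {
    void invoke(IN value);
}
```

The change is purely an annotation swap; no method signatures are touched, which is why it only requires a minor release.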


Re: [DISCUSS] FLIP-453: Promote Unified Sink API V2 to Public and Deprecate SinkFunction

2024-05-03 Thread Ferenc Csaky
Hi Martijn,

+1 for the proposal.

> targeted for Flink 1.19

I guess you meant Flink 1.20 here.

Also, I volunteer to take on updating the HBase sink; feel free to assign
that task to me.

Best,
Ferenc




On Friday, May 3rd, 2024 at 10:20, Martijn Visser  
wrote:

> 
> 
> Hi Peter,
> 
> I'll add it for completeness, thanks!
> With regards to FLINK-35149, the fix version indicates a change at Flink
> CDC; is that indeed correct, or does it require a change in the SinkV2
> interface?
> 
> Best regards,
> 
> Martijn
> 
> 
> On Fri, May 3, 2024 at 7:47 AM Péter Váry peter.vary.apa...@gmail.com
> 
> wrote:
> 
> > Hi Martijn,
> > 
> > We might want to add FLIP-371 [1] to the list. (Or do we aim only for
> > higher-level FLIPs?)
> > 
> > We are in the process of using the new API in Iceberg connector [2] - so
> > far, so good.
> > 
> > I know of one minor issue with the sink [3], the fix for which should be
> > ready for the release.
> > 
> > All-in-all, I think we are in good shape, and we could move forward with
> > the promotion.
> > 
> > Thanks,
> > Peter
> > 
> > [1] -
> > 
> > https://cwiki.apache.org/confluence/plugins/servlet/mobile?contentId=263430387
> > [2] - https://github.com/apache/iceberg/pull/10179
> > [3] - https://issues.apache.org/jira/browse/FLINK-35149
> > 
> > On Thu, May 2, 2024, 09:47 Muhammet Orazov mor+fl...@morazow.com.invalid
> > wrote:
> > 
> > > Got it, thanks!
> > > 
> > > On 2024-05-02 06:53, Martijn Visser wrote:
> > > 
> > > > Hi Muhammet,
> > > > 
> > > > Thanks for joining the discussion! The changes in this FLIP would be
> > > > targeted for Flink 1.19, since it's only a matter of changing the
> > > > annotation.
> > > > 
> > > > Best regards,
> > > > 
> > > > Martijn
> > > > 
> > > > On Thu, May 2, 2024 at 7:26 AM Muhammet Orazov mor+fl...@morazow.com
> > > > wrote:
> > > > 
> > > > > Hello Martijn,
> > > > > 
> > > > > Thanks for the FLIP and detailed history of changes, +1.
> > > > > 
> > > > > Would the FLIP changes target 2.0? I think it would be good
> > > > > to have clear APIs in the 2.0 release.
> > > > > 
> > > > > Best,
> > > > > Muhammet
> > > > > 
> > > > > On 2024-05-01 15:30, Martijn Visser wrote:
> > > > > 
> > > > > > Hi everyone,
> > > > > > 
> > > > > > I would like to start a discussion on FLIP-453: Promote Unified Sink
> > > > > > API V2
> > > > > > to Public and Deprecate SinkFunction
> > > > > > https://cwiki.apache.org/confluence/x/rIobEg
> > > > > > 
> > > > > > This FLIP proposes to promote the Unified Sink API V2 from
> > > > > > PublicEvolving
> > > > > > to Public and to mark the SinkFunction as Deprecated.
> > > > > > 
> > > > > > I'm looking forward to your thoughts.
> > > > > > 
> > > > > > Best regards,
> > > > > > 
> > > > > > Martijn


Re: [DISCUSS] FLIP-453: Promote Unified Sink API V2 to Public and Deprecate SinkFunction

2024-05-03 Thread Martijn Visser
Hi Peter,

I'll add it for completeness, thanks!
With regards to FLINK-35149, the fix version indicates a change at Flink
CDC; is that indeed correct, or does it require a change in the SinkV2
interface?

Best regards,

Martijn


On Fri, May 3, 2024 at 7:47 AM Péter Váry 
wrote:

> Hi Martijn,
>
> We might want to add FLIP-371 [1] to the list. (Or do we aim only for
> higher-level FLIPs?)
>
> We are in the process of using the new API in Iceberg connector [2] - so
> far, so good.
>
> I know of one minor issue with the sink [3], the fix for which should be
> ready for the release.
>
> All-in-all, I think we are in good shape, and we could move forward with
> the promotion.
>
> Thanks,
> Peter
>
> [1] -
>
> https://cwiki.apache.org/confluence/plugins/servlet/mobile?contentId=263430387
> [2] - https://github.com/apache/iceberg/pull/10179
> [3] - https://issues.apache.org/jira/browse/FLINK-35149
>
>
> On Thu, May 2, 2024, 09:47 Muhammet Orazov 
> wrote:
>
> > Got it, thanks!
> >
> > On 2024-05-02 06:53, Martijn Visser wrote:
> > > Hi Muhammet,
> > >
> > > Thanks for joining the discussion! The changes in this FLIP would be
> > > targeted for Flink 1.19, since it's only a matter of changing the
> > > annotation.
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > On Thu, May 2, 2024 at 7:26 AM Muhammet Orazov 
> > > wrote:
> > >
> > >> Hello Martijn,
> > >>
> > >> Thanks for the FLIP and detailed history of changes, +1.
> > >>
> > >> Would the FLIP changes target 2.0? I think it would be good
> > >> to have clear APIs in the 2.0 release.
> > >>
> > >> Best,
> > >> Muhammet
> > >>
> > >> On 2024-05-01 15:30, Martijn Visser wrote:
> > >> > Hi everyone,
> > >> >
> > >> > I would like to start a discussion on FLIP-453: Promote Unified Sink
> > >> > API V2
> > >> > to Public and Deprecate SinkFunction
> > >> > https://cwiki.apache.org/confluence/x/rIobEg
> > >> >
> > >> > This FLIP proposes to promote the Unified Sink API V2 from
> > >> > PublicEvolving
> > >> > to Public and to mark the SinkFunction as Deprecated.
> > >> >
> > >> > I'm looking forward to your thoughts.
> > >> >
> > >> > Best regards,
> > >> >
> > >> > Martijn
> > >>
> >
>


[jira] [Created] (FLINK-35287) Builder builds NetworkConfig for Elasticsearch connector 8

2024-05-03 Thread Mingliang Liu (Jira)
Mingliang Liu created FLINK-35287:
-

 Summary: Builder builds NetworkConfig for Elasticsearch connector 8
 Key: FLINK-35287
 URL: https://issues.apache.org/jira/browse/FLINK-35287
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / ElasticSearch
Reporter: Mingliang Liu


In FLINK-26088 we added support for Elasticsearch 8.0. It is based on the
Async Sink API and does not use the base module
{{flink-connector-elasticsearch-base}}. Regarding the config options (host,
username, password, headers, SSL, ...), we currently pass every option
individually from the builder to the AsyncSink, and finally to the
AsyncWriter. This is inflexible when adding new options: the constructors
keep growing, and multiple places may validate the same options
unnecessarily. I think it would be better to have the sink builder build the
NetworkConfig once and pass it all the way down to the writer. This is also
how the base module for 6.x / 7.x is implemented. In my recent work adding
new options to the network config, this approach proved simpler.

Let me create a PR to demonstrate the idea. There are no new features or
major refactorings other than having the builder build the NetworkConfig
(the code will be shorter). I also have a few small fixes which I'll include
in the upcoming PR.
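
[Editor's note] A minimal sketch of the shape the ticket proposes. NetworkConfig is the name used in the ticket; all other class names below are hypothetical stand-ins, not the connector's actual API:

```java
// Hypothetical sketch: the builder validates and builds a NetworkConfig
// once; the sink and writer just receive the ready-made object instead of
// an ever-growing list of constructor parameters.
import java.util.List;

final class NetworkConfig {
    final List<String> hosts;
    final String username;
    final String password;

    NetworkConfig(List<String> hosts, String username, String password) {
        // Validation happens in one place, not again in the sink and writer.
        if (hosts == null || hosts.isEmpty()) {
            throw new IllegalArgumentException("At least one host is required");
        }
        this.hosts = hosts;
        this.username = username;
        this.password = password;
    }
}

final class ElasticsearchSink {
    final NetworkConfig networkConfig;

    ElasticsearchSink(NetworkConfig networkConfig) {
        this.networkConfig = networkConfig;
    }

    // The writer receives the same config object; no per-option constructor.
    ElasticsearchWriter createWriter() {
        return new ElasticsearchWriter(networkConfig);
    }
}

final class ElasticsearchWriter {
    final NetworkConfig networkConfig;

    ElasticsearchWriter(NetworkConfig networkConfig) {
        this.networkConfig = networkConfig;
    }
}

final class ElasticsearchSinkBuilder {
    private List<String> hosts;
    private String username;
    private String password;

    ElasticsearchSinkBuilder setHosts(String... hosts) {
        this.hosts = List.of(hosts);
        return this;
    }

    ElasticsearchSinkBuilder setUsername(String username) {
        this.username = username;
        return this;
    }

    ElasticsearchSinkBuilder setPassword(String password) {
        this.password = password;
        return this;
    }

    ElasticsearchSink build() {
        // Adding a new network option later only touches the builder and
        // NetworkConfig; the sink and writer constructors stay unchanged.
        return new ElasticsearchSink(new NetworkConfig(hosts, username, password));
    }
}
```

The design benefit the ticket describes falls out directly: a new option (say, a header list) is one field in the builder and one in NetworkConfig, with no constructor churn downstream.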



--
This message was sent by Atlassian Jira
(v8.20.10#820010)