[jira] [Created] (FLINK-34751) RestClusterClient APIs don't work with a running Flink application on YARN

2024-03-19 Thread Venkata krishnan Sowrirajan (Jira)
Venkata krishnan Sowrirajan created FLINK-34751:
---

 Summary: RestClusterClient APIs don't work with a running Flink 
application on YARN
 Key: FLINK-34751
 URL: https://issues.apache.org/jira/browse/FLINK-34751
 Project: Flink
  Issue Type: Bug
Reporter: Venkata krishnan Sowrirajan


Apache YARN uses a web proxy in the Resource Manager to expose the endpoints 
served by the AM process (in this case, the RestServerEndpoint that runs as 
part of the AM). Note: this is in the context of running a Flink cluster in 
YARN application mode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34750) "legacy-flink-cdc-sources" Page of DB2 for Flink CDC Chinese Documentation.

2024-03-19 Thread LvYanquan (Jira)
LvYanquan created FLINK-34750:
-

 Summary: "legacy-flink-cdc-sources" Page of DB2 for Flink CDC 
Chinese Documentation.
 Key: FLINK-34750
 URL: https://issues.apache.org/jira/browse/FLINK-34750
 Project: Flink
  Issue Type: Sub-task
  Components: chinese-translation, Documentation, Flink CDC
Affects Versions: cdc-3.1.0
Reporter: LvYanquan
 Fix For: cdc-3.1.0


Translate legacy-flink-cdc-sources pages of 
[https://github.com/apache/flink-cdc/blob/master/docs/content/docs/connectors/legacy-flink-cdc-sources/postgres-cdc.md|https://github.com/apache/flink-cdc/blob/master/docs/content/docs/connectors/legacy-flink-cdc-sources/db2-cdc.md] 
into Chinese.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34749) "legacy-flink-cdc-sources" Page of SQLServer for Flink CDC Chinese Documentation.

2024-03-19 Thread LvYanquan (Jira)
LvYanquan created FLINK-34749:
-

 Summary: "legacy-flink-cdc-sources" Page of SQLServer for Flink 
CDC Chinese Documentation.
 Key: FLINK-34749
 URL: https://issues.apache.org/jira/browse/FLINK-34749
 Project: Flink
  Issue Type: Sub-task
  Components: chinese-translation, Documentation, Flink CDC
Affects Versions: cdc-3.1.0
Reporter: LvYanquan
 Fix For: cdc-3.1.0


Translate legacy-flink-cdc-sources pages of 
[https://github.com/apache/flink-cdc/blob/master/docs/content/docs/connectors/legacy-flink-cdc-sources/sqlserver-cdc.md|https://github.com/apache/flink-cdc/blob/master/docs/content/docs/connectors/legacy-flink-cdc-sources/db2-cdc.md] 
into Chinese.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34748) "legacy-flink-cdc-sources" Page of Oracle for Flink CDC Chinese Documentation.

2024-03-19 Thread LvYanquan (Jira)
LvYanquan created FLINK-34748:
-

 Summary: "legacy-flink-cdc-sources" Page of Oracle for Flink CDC 
Chinese Documentation.
 Key: FLINK-34748
 URL: https://issues.apache.org/jira/browse/FLINK-34748
 Project: Flink
  Issue Type: Sub-task
  Components: chinese-translation, Documentation, Flink CDC
Affects Versions: cdc-3.1.0
Reporter: LvYanquan
 Fix For: cdc-3.1.0


Translate legacy-flink-cdc-sources pages of 
[https://github.com/apache/flink-cdc/blob/master/docs/content/docs/connectors/legacy-flink-cdc-sources/oracle-cdc.md|https://github.com/apache/flink-cdc/blob/master/docs/content/docs/connectors/legacy-flink-cdc-sources/db2-cdc.md] 
into Chinese.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Re: [DISCUSS] FLIP-436: Introduce "SHOW CREATE CATALOG" Syntax

2024-03-19 Thread Yubin Li
Hi Hang,

I have updated the FLIP as you suggested, thanks for your valuable feedback!

Best,
Yubin

On Wed, Mar 20, 2024 at 11:15 AM Hang Ruan  wrote:
>
> Hi, Yubin,
>
> I found a little mistake in FLIP.
> `ALTER CATALOG catalog_name RESET (key1=val1, key2=val2, ...)` should be
> changed as `ALTER CATALOG catalog_name RESET (key1, key2, ...)`, right?
>
> Best,
> Hang
>
>
> Lincoln Lee  于2024年3月20日周三 10:04写道:
>
> > Hi Yubin,
> >
> > Sorry, please ignore my last reply (wrong context).
> > I also asked Leonard, your proposal to extend the `CatalogDescriptor`
> > should be okay.
> >
> > Thank you for your update : ) !
> >
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Lincoln Lee  于2024年3月20日周三 09:35写道:
> >
> > > Hi Yubin,
> > >
> > > Thank you for detailed explaination! I overlooked `CatalogBaseTable`, in
> > > fact
> > >  there is already a `String getComment();` interface similar to
> > `database`
> > > and `table`.
> > > Can we continue the work on FLINK-21665 and complete its implementation?
> > > It seems to be very close.
> > >
> > > Best,
> > > Lincoln Lee
> > >
> > >
> > > Yubin Li  于2024年3月20日周三 01:42写道:
> > >
> > >> Hi Lincoln,
> > >>
> > >> Thanks for your detailed comments!
> > >>
> > >> Supporting comments for `Catalog` is a really helpful feature, I agree
> > >> with you to make it introduced in this FLIP, thank you for pointing
> > >> that out :)
> > >>
> > >> Concerning the implementation, I propose to introduce `getComment()`
> > >> method in `CatalogDescriptor`, and the reasons are as follows. WDYT?
> > >> 1. For the sake of design consistency, follow the design of FLIP-295
> > >> [1] which introduced `CatalogStore` component, `CatalogDescriptor`
> > >> includes names and attributes, both of which are used to describe the
> > >> catalog, and `comment` can be added smoothly.
> > >> 2. Extending the existing class rather than add new method to the
> > >> existing interface, Especially, the `Catalog` interface, as a core
> > >> interface, is used by a series of important components such as
> > >> `CatalogFactory`, `CatalogManager` and `FactoryUtil`, and is
> > >> implemented by a large number of connectors such as JDBC, Paimon, and
> > >> Hive. Adding methods to it will greatly increase the implementation
> > >> complexity, and more importantly, increase the cost of iteration,
> > >> maintenance, and verification.
> > >>
> > >> Please see FLIP doc [2] for details.
> > >>
> > >> [1]
> > >>
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-295%3A+Support+lazy+initialization+of+catalogs+and+persistence+of+catalog+configurations
> > >> [2]
> > >>
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-436%3A+Introduce+Catalog-related+Syntax
> > >>
> > >> Best,
> > >> Yubin
> > >>
> > >> On Tue, Mar 19, 2024 at 9:57 PM Lincoln Lee 
> > >> wrote:
> > >> >
> > >> > Hi Yubin,
> > >> >
> > >> > Thanks for your quickly response!
> > >> >
> > >> > It would be better to support comments just like create `database` and
> > >> > `table` with comment.
> > >> > That is, add `String getComment();` to the current `Catalog`
> > interface.
> > >> > WDYT?
> > >> >
> > >> > Best,
> > >> > Lincoln Lee
> > >> >
> > >> >
> > >> > Yubin Li  于2024年3月19日周二 21:44写道:
> > >> >
> > >> > > Hi Lincoln,
> > >> > >
> > >> > > Good catch. Thanks for your suggestions.
> > >> > >
> > >> > > I found that the creation statements of database and table both
> > >> > > support specifying "if not exists". For the sake of syntactic
> > >> > > consistency and user practicality, We could introduce the '[if not
> > >> > > exists]' clause to the 'create catalog' statement.
> > >> > >
> > >> > > As for the introduction of the `catalog comment` feature, it may
> > >> > > involve changes to the Catalog structure, which can be left for
> > future
> > >> > > discussion.
> > >> > >
> > >> > > WDYT? Looking forward to your feedback :)
> > >> > >
> > >> > > Best,
> > >> > > Yubin
> > >> > >
> > >> > > On Tue, Mar 19, 2024 at 9:06 PM Lincoln Lee  > >
> > >> > > wrote:
> > >> > > >
> > >> > > > Hi Yubin,
> > >> > > >
> > >> > > > Big +1 for completing the catalog api!
> > >> > > > There's a minor addition[1] which does not affect the vote could
> > >> also be
> > >> > > > considered.
> > >> > > >
> > >> > > > [1] https://issues.apache.org/jira/browse/FLINK-21665
> > >> > > >
> > >> > > >
> > >> > > > Best,
> > >> > > > Lincoln Lee
> > >> > > >
> > >> > > >
> > >> > > > Yubin Li  于2024年3月18日周一 17:44写道:
> > >> > > >
> > >> > > > > Hi Jark,
> > >> > > > >
> > >> > > > > Thanks for your response, I have updated FLIP-436: Introduce
> > >> > > > > Catalog-related Syntax [1] as you suggested.
> > >> > > > >
> > >> > > > > If there are no more comments within 24 hours, I will start a
> > >> vote for
> > >> > > > > this, thanks :)
> > >> > > > >
> > >> > > > > Best,
> > >> > > > > Yubin
> > >> > > > >
> > >> > > > > [1]
> > >> > > > >
> > >> > >
> > >>
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-436%3A+Introduce+Catalog-related+Syntax
> > >> 

[jira] [Created] (FLINK-34747) "legacy-flink-cdc-sources" Page of DB2 for Flink CDC Chinese Documentation.

2024-03-19 Thread LvYanquan (Jira)
LvYanquan created FLINK-34747:
-

 Summary: "legacy-flink-cdc-sources" Page of DB2 for Flink CDC 
Chinese Documentation.
 Key: FLINK-34747
 URL: https://issues.apache.org/jira/browse/FLINK-34747
 Project: Flink
  Issue Type: Sub-task
  Components: chinese-translation, Documentation, Flink CDC
Affects Versions: cdc-3.1.0
Reporter: LvYanquan
 Fix For: cdc-3.1.0


Translate legacy-flink-cdc-sources pages of 
[https://github.com/apache/flink-cdc/blob/master/docs/content/docs/connectors/legacy-flink-cdc-sources/db2-cdc.md|https://github.com/apache/flink-cdc/blob/master/docs/content/docs/connectors/legacy-flink-cdc-sources/db2-cdc.md] 
into Chinese.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-436: Introduce Catalog-related Syntax

2024-03-19 Thread gongzhongqiang
+1 (non-binding)

Best,
Zhongqiang Gong

Yubin Li  于2024年3月19日周二 18:03写道:

> Hi everyone,
>
> Thanks for all the feedback, I'd like to start a vote on the FLIP-436:
> Introduce Catalog-related Syntax [1]. The discussion thread is here
> [2].
>
> The vote will be open for at least 72 hours unless there is an
> objection or insufficient votes.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-436%3A+Introduce+Catalog-related+Syntax
> [2] https://lists.apache.org/thread/10k1bjb4sngyjwhmfqfky28lyoo7sv0z
>
> Best regards,
> Yubin
>


Re: [DISCUSS] FLIP-427: Disaggregated State Store

2024-03-19 Thread Hangxiang Yu
Hi, Yue.
Thanks for the reply.

> If we use proposal1, we can easily reuse these optimizations. It is even
> possible to discuss and review the solution together in the Rocksdb
> community.

We also saw these useful optimizations, which could be applied to ForSt in
the future.
But IIUC, they are not bound to proposal 1, right? We could also
implement interfaces for temperature and secondary cache to reuse them,
or organize a more complex HybridEnv based on proposal 2.

> My point is whether we should retain the potential of proposal 1 in the
> design.
This is a good suggestion. We chose proposal 2 primarily for its
maintainability and scalability, especially because it can conveniently
leverage all the filesystems Flink supports.
Given proposal 1's clear performance advantage, I think we could also
consider it as an optimization in the future.
For the interface on the DB side, we could also expose more Env variants
in the future.
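
For reference, a minimal RocksJava sketch of the Env hook that proposal 1 builds on. The DB path is a placeholder and only the default local Env is used here; a remote-storage Env such as the HDFS-Env mentioned in this thread would be plugged in instead for the disaggregated case:

import org.rocksdb.Env;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;

public class EnvPluggingSketch {
    public static void main(String[] args) throws Exception {
        RocksDB.loadLibrary();
        try (Options options = new Options().setCreateIfMissing(true)) {
            // Proposal 1's extension point: hand the DB a custom Env so that all
            // file I/O goes through it. Here we only pass the default local Env.
            options.setEnv(Env.getDefault());
            try (RocksDB db = RocksDB.open(options, "/tmp/forst-env-sketch")) {
                db.put("key".getBytes(), "value".getBytes());
                System.out.println(new String(db.get("key".getBytes())));
            }
        }
    }
}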


On Tue, Mar 19, 2024 at 9:14 PM yue ma  wrote:

> Hi Hangxiang,
>
> Thanks for bringing this discussion.
> I have a few questions about the Proposal you mentioned in the FLIP.
>
> The current conclusion is to use proposal 2, which is okay for me. My point
> is whether we should retain the potential of proposal 1 in the design.
> There are the following reasons:
> 1. No JNI overhead, just like the Performance Part mentioned in Flip
> 2. RocksDB currently also provides an interface for Env, and there are also
> some implementations, such as HDFS-ENV, which seem to be easily scalable.
> 3. The RocksDB community continues to support LSM for different storage
> media, such as  Tiered Storage
> <
> https://github.com/facebook/rocksdb/wiki/Tiered-Storage-%28Experimental%29
> >
>   And some optimizations have been made for this scenario, such as Per
> Key Placement Comparison
> .
>  *Secondary cache
> <
> https://github.com/facebook/rocksdb/wiki/SecondaryCache-%28Experimental%29
> >*,
> similar to the Hybrid Block Cache mentioned in Flip-423
>  If we use proposal1, we can easily reuse these optimizations .It is even
> possible to discuss and review the solution together in the Rocksdb
> community.
>  In fact, we have already implemented some production practices using
> Proposal1 internally. We have integrated HybridEnv, Tiered Storage, and
> Secondary Cache on RocksDB and optimized the performance of Checkpoint and
> State Restore. It seems work well for us.
>
> --
> Best,
> Yue
>


-- 
Best,
Hangxiang.


Re: [VOTE] FLIP-436: Introduce Catalog-related Syntax

2024-03-19 Thread Hang Ruan
+1 (non-binding)

Best,
Hang

Jane Chan  于2024年3月19日周二 22:02写道:

> +1 (binding)
>
> Best,
> Jane
>
> On Tue, Mar 19, 2024 at 9:30 PM Leonard Xu  wrote:
>
> > +1(binding)
> >
> >
> > Best,
> > Leonard
> > > 2024年3月19日 下午9:03,Lincoln Lee  写道:
> > >
> > > +1 (binding)
> > >
> > > Best,
> > > Lincoln Lee
> > >
> > >
> > > Feng Jin  于2024年3月19日周二 19:59写道:
> > >
> > >> +1 (non-binding)
> > >>
> > >> Best,
> > >> Feng
> > >>
> > >> On Tue, Mar 19, 2024 at 7:46 PM Ferenc Csaky
>  > >
> > >> wrote:
> > >>
> > >>> +1 (non-binding).
> > >>>
> > >>> Best,
> > >>> Ferenc
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On Tuesday, March 19th, 2024 at 12:39, Jark Wu 
> > wrote:
> > >>>
> > 
> > 
> >  +1 (binding)
> > 
> >  Best,
> >  Jark
> > 
> >  On Tue, 19 Mar 2024 at 19:05, Yuepeng Pan panyuep...@apache.org
> > wrote:
> > 
> > > Hi, Yubin
> > >
> > > Thanks for driving it !
> > >
> > > +1 non-binding.
> > >
> > > Best,
> > > Yuepeng Pan.
> > >
> > > At 2024-03-19 17:56:42, "Yubin Li" lyb5...@gmail.com wrote:
> > >
> > >> Hi everyone,
> > >>
> > >> Thanks for all the feedback, I'd like to start a vote on the
> > >>> FLIP-436:
> > >> Introduce Catalog-related Syntax [1]. The discussion thread is
> here
> > >> [2].
> > >>
> > >> The vote will be open for at least 72 hours unless there is an
> > >> objection or insufficient votes.
> > >>
> > >> [1]
> > >>
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-436%3A+Introduce+Catalog-related+Syntax
> > >> [2]
> > >> https://lists.apache.org/thread/10k1bjb4sngyjwhmfqfky28lyoo7sv0z
> > >>
> > >> Best regards,
> > >> Yubin
> > >>>
> > >>
> >
> >
>


Re: Re: [DISCUSS] FLIP-436: Introduce "SHOW CREATE CATALOG" Syntax

2024-03-19 Thread Hang Ruan
Hi, Yubin,

I found a little mistake in the FLIP.
`ALTER CATALOG catalog_name RESET (key1=val1, key2=val2, ...)` should be
changed to `ALTER CATALOG catalog_name RESET (key1, key2, ...)`, right?

Best,
Hang
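
For clarity, a small sketch of how the two statements would differ once FLIP-436 lands. The catalog name and option key are made up, and the statements use the proposed syntax, which current Flink releases do not yet parse:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class AlterCatalogSyntaxSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // SET takes key/value pairs, while RESET takes only the keys to remove.
        tEnv.executeSql("ALTER CATALOG cat1 SET ('default-database' = 'db1')");
        tEnv.executeSql("ALTER CATALOG cat1 RESET ('default-database')");
    }
}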


Lincoln Lee  于2024年3月20日周三 10:04写道:

> Hi Yubin,
>
> Sorry, please ignore my last reply (wrong context).
> I also asked Leonard, your proposal to extend the `CatalogDescriptor`
> should be okay.
>
> Thank you for your update : ) !
>
>
> Best,
> Lincoln Lee
>
>
> Lincoln Lee  于2024年3月20日周三 09:35写道:
>
> > Hi Yubin,
> >
> > Thank you for detailed explaination! I overlooked `CatalogBaseTable`, in
> > fact
> >  there is already a `String getComment();` interface similar to
> `database`
> > and `table`.
> > Can we continue the work on FLINK-21665 and complete its implementation?
> > It seems to be very close.
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Yubin Li  于2024年3月20日周三 01:42写道:
> >
> >> Hi Lincoln,
> >>
> >> Thanks for your detailed comments!
> >>
> >> Supporting comments for `Catalog` is a really helpful feature, I agree
> >> with you to make it introduced in this FLIP, thank you for pointing
> >> that out :)
> >>
> >> Concerning the implementation, I propose to introduce `getComment()`
> >> method in `CatalogDescriptor`, and the reasons are as follows. WDYT?
> >> 1. For the sake of design consistency, follow the design of FLIP-295
> >> [1] which introduced `CatalogStore` component, `CatalogDescriptor`
> >> includes names and attributes, both of which are used to describe the
> >> catalog, and `comment` can be added smoothly.
> >> 2. Extending the existing class rather than add new method to the
> >> existing interface, Especially, the `Catalog` interface, as a core
> >> interface, is used by a series of important components such as
> >> `CatalogFactory`, `CatalogManager` and `FactoryUtil`, and is
> >> implemented by a large number of connectors such as JDBC, Paimon, and
> >> Hive. Adding methods to it will greatly increase the implementation
> >> complexity, and more importantly, increase the cost of iteration,
> >> maintenance, and verification.
> >>
> >> Please see FLIP doc [2] for details.
> >>
> >> [1]
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-295%3A+Support+lazy+initialization+of+catalogs+and+persistence+of+catalog+configurations
> >> [2]
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-436%3A+Introduce+Catalog-related+Syntax
> >>
> >> Best,
> >> Yubin
> >>
> >> On Tue, Mar 19, 2024 at 9:57 PM Lincoln Lee 
> >> wrote:
> >> >
> >> > Hi Yubin,
> >> >
> >> > Thanks for your quickly response!
> >> >
> >> > It would be better to support comments just like create `database` and
> >> > `table` with comment.
> >> > That is, add `String getComment();` to the current `Catalog`
> interface.
> >> > WDYT?
> >> >
> >> > Best,
> >> > Lincoln Lee
> >> >
> >> >
> >> > Yubin Li  于2024年3月19日周二 21:44写道:
> >> >
> >> > > Hi Lincoln,
> >> > >
> >> > > Good catch. Thanks for your suggestions.
> >> > >
> >> > > I found that the creation statements of database and table both
> >> > > support specifying "if not exists". For the sake of syntactic
> >> > > consistency and user practicality, We could introduce the '[if not
> >> > > exists]' clause to the 'create catalog' statement.
> >> > >
> >> > > As for the introduction of the `catalog comment` feature, it may
> >> > > involve changes to the Catalog structure, which can be left for
> future
> >> > > discussion.
> >> > >
> >> > > WDYT? Looking forward to your feedback :)
> >> > >
> >> > > Best,
> >> > > Yubin
> >> > >
> >> > > On Tue, Mar 19, 2024 at 9:06 PM Lincoln Lee  >
> >> > > wrote:
> >> > > >
> >> > > > Hi Yubin,
> >> > > >
> >> > > > Big +1 for completing the catalog api!
> >> > > > There's a minor addition[1] which does not affect the vote could
> >> also be
> >> > > > considered.
> >> > > >
> >> > > > [1] https://issues.apache.org/jira/browse/FLINK-21665
> >> > > >
> >> > > >
> >> > > > Best,
> >> > > > Lincoln Lee
> >> > > >
> >> > > >
> >> > > > Yubin Li  于2024年3月18日周一 17:44写道:
> >> > > >
> >> > > > > Hi Jark,
> >> > > > >
> >> > > > > Thanks for your response, I have updated FLIP-436: Introduce
> >> > > > > Catalog-related Syntax [1] as you suggested.
> >> > > > >
> >> > > > > If there are no more comments within 24 hours, I will start a
> >> vote for
> >> > > > > this, thanks :)
> >> > > > >
> >> > > > > Best,
> >> > > > > Yubin
> >> > > > >
> >> > > > > [1]
> >> > > > >
> >> > >
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-436%3A+Introduce+Catalog-related+Syntax
> >> > > > >
> >> > > > > On Mon, Mar 18, 2024 at 4:39 PM Jark Wu 
> wrote:
> >> > > > > >
> >> > > > > > Hi Yubin,
> >> > > > > >
> >> > > > > > Thanks for the quick response. The suggestion sounds good to
> me!
> >> > > > > >
> >> > > > > > Best,
> >> > > > > > Jark
> >> > > > > >
> >> > > > > > On Mon, 18 Mar 2024 at 13:06, Yubin Li 
> >> wrote:
> >> > > > > >
> >> > > > > > > Hi Jark,
> >> > > > > > >
> >> > > > > > > Good pointing! Thanks for your 

Re: [SUMMARY] Flink 1.19 last sync summary on 03/19/2024

2024-03-19 Thread Leonard Xu


Thanks for your continuous release sync summaries, Lincoln. They help me a lot.


Best,
Leonard

> 2024年3月19日 下午11:49,Lincoln Lee  写道:
> 
> Hi everyone,
> 
> Flink 1.19.0 has been officially released yesterday[1].
> 
> I'd like to share some highlights of the last release sync of 1.19:
> 
> - Remaining works
> 
> The official docker image is still in progress[2], will be available once
> the related pr been merged[3].
> In addition, some follow-up items are being processed[4], and about end of
> support for lower versions will be discussed in separate mail.
> 
> Thanks to all contributors for your great work on 1.19 and the support for
> the release!
> 
> The new 1.20 release cycle[5] has set off, welcome to continue contributing!
> 
> [1] https://lists.apache.org/thread/sofmxytbh6y20nwot1gywqqc2lqxn4hm
> [2] https://issues.apache.org/jira/browse/FLINK-34701
> [3] https://github.com/docker-library/official-images/pull/16430
> [4] https://issues.apache.org/jira/browse/FLINK-34706
> [5] https://lists.apache.org/thread/80h3nzk08v276xmllswbbbg1z7m3v70t
> 
> 
> Best,
> Yun, Jing, Martijn and Lincoln



[jira] [Created] (FLINK-34746) Switching to the Apache CDN for Dockerfile

2024-03-19 Thread lincoln lee (Jira)
lincoln lee created FLINK-34746:
---

 Summary: Switching to the Apache CDN for Dockerfile
 Key: FLINK-34746
 URL: https://issues.apache.org/jira/browse/FLINK-34746
 Project: Flink
  Issue Type: Improvement
  Components: flink-docker
Reporter: lincoln lee


While publishing the official image, we received some comments suggesting
that we switch to the Apache CDN.

See

https://github.com/docker-library/official-images/pull/16114

https://github.com/docker-library/official-images/pull/16430

Reason for switching: [https://apache.org/history/mirror-history.html] (also 
[https://www.apache.org/dyn/closer.cgi] and [https://www.apache.org/mirrors])



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Re: [DISCUSS] FLIP-436: Introduce "SHOW CREATE CATALOG" Syntax

2024-03-19 Thread Lincoln Lee
Hi Yubin,

Sorry, please ignore my last reply (wrong context).
I also asked Leonard; your proposal to extend the `CatalogDescriptor`
should be okay.

Thank you for your update : ) !


Best,
Lincoln Lee


Lincoln Lee  于2024年3月20日周三 09:35写道:

> Hi Yubin,
>
> Thank you for detailed explaination! I overlooked `CatalogBaseTable`, in
> fact
>  there is already a `String getComment();` interface similar to `database`
> and `table`.
> Can we continue the work on FLINK-21665 and complete its implementation?
> It seems to be very close.
>
> Best,
> Lincoln Lee
>
>
> Yubin Li  于2024年3月20日周三 01:42写道:
>
>> Hi Lincoln,
>>
>> Thanks for your detailed comments!
>>
>> Supporting comments for `Catalog` is a really helpful feature, I agree
>> with you to make it introduced in this FLIP, thank you for pointing
>> that out :)
>>
>> Concerning the implementation, I propose to introduce `getComment()`
>> method in `CatalogDescriptor`, and the reasons are as follows. WDYT?
>> 1. For the sake of design consistency, follow the design of FLIP-295
>> [1] which introduced `CatalogStore` component, `CatalogDescriptor`
>> includes names and attributes, both of which are used to describe the
>> catalog, and `comment` can be added smoothly.
>> 2. Extending the existing class rather than add new method to the
>> existing interface, Especially, the `Catalog` interface, as a core
>> interface, is used by a series of important components such as
>> `CatalogFactory`, `CatalogManager` and `FactoryUtil`, and is
>> implemented by a large number of connectors such as JDBC, Paimon, and
>> Hive. Adding methods to it will greatly increase the implementation
>> complexity, and more importantly, increase the cost of iteration,
>> maintenance, and verification.
>>
>> Please see FLIP doc [2] for details.
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-295%3A+Support+lazy+initialization+of+catalogs+and+persistence+of+catalog+configurations
>> [2]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-436%3A+Introduce+Catalog-related+Syntax
>>
>> Best,
>> Yubin
>>
>> On Tue, Mar 19, 2024 at 9:57 PM Lincoln Lee 
>> wrote:
>> >
>> > Hi Yubin,
>> >
>> > Thanks for your quickly response!
>> >
>> > It would be better to support comments just like create `database` and
>> > `table` with comment.
>> > That is, add `String getComment();` to the current `Catalog` interface.
>> > WDYT?
>> >
>> > Best,
>> > Lincoln Lee
>> >
>> >
>> > Yubin Li  于2024年3月19日周二 21:44写道:
>> >
>> > > Hi Lincoln,
>> > >
>> > > Good catch. Thanks for your suggestions.
>> > >
>> > > I found that the creation statements of database and table both
>> > > support specifying "if not exists". For the sake of syntactic
>> > > consistency and user practicality, We could introduce the '[if not
>> > > exists]' clause to the 'create catalog' statement.
>> > >
>> > > As for the introduction of the `catalog comment` feature, it may
>> > > involve changes to the Catalog structure, which can be left for future
>> > > discussion.
>> > >
>> > > WDYT? Looking forward to your feedback :)
>> > >
>> > > Best,
>> > > Yubin
>> > >
>> > > On Tue, Mar 19, 2024 at 9:06 PM Lincoln Lee 
>> > > wrote:
>> > > >
>> > > > Hi Yubin,
>> > > >
>> > > > Big +1 for completing the catalog api!
>> > > > There's a minor addition[1] which does not affect the vote could
>> also be
>> > > > considered.
>> > > >
>> > > > [1] https://issues.apache.org/jira/browse/FLINK-21665
>> > > >
>> > > >
>> > > > Best,
>> > > > Lincoln Lee
>> > > >
>> > > >
>> > > > Yubin Li  于2024年3月18日周一 17:44写道:
>> > > >
>> > > > > Hi Jark,
>> > > > >
>> > > > > Thanks for your response, I have updated FLIP-436: Introduce
>> > > > > Catalog-related Syntax [1] as you suggested.
>> > > > >
>> > > > > If there are no more comments within 24 hours, I will start a
>> vote for
>> > > > > this, thanks :)
>> > > > >
>> > > > > Best,
>> > > > > Yubin
>> > > > >
>> > > > > [1]
>> > > > >
>> > >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-436%3A+Introduce+Catalog-related+Syntax
>> > > > >
>> > > > > On Mon, Mar 18, 2024 at 4:39 PM Jark Wu  wrote:
>> > > > > >
>> > > > > > Hi Yubin,
>> > > > > >
>> > > > > > Thanks for the quick response. The suggestion sounds good to me!
>> > > > > >
>> > > > > > Best,
>> > > > > > Jark
>> > > > > >
>> > > > > > On Mon, 18 Mar 2024 at 13:06, Yubin Li 
>> wrote:
>> > > > > >
>> > > > > > > Hi Jark,
>> > > > > > >
>> > > > > > > Good pointing! Thanks for your reply, there are some details
>> to
>> > > align
>> > > > > :)
>> > > > > > >
>> > > > > > > 1. I think the purpose of DESCRIBE CATALOG is to display
>> metadata
>> > > > > > > > information including catalog name,
>> > > > > > > > catalog comment (may be introduced in the future), catalog
>> type,
>> > > and
>> > > > > > > > catalog properties (for example [1])
>> > > > > > >
>> > > > > > > Adopting { DESC | DESCRIBE } CATALOG [ EXTENDED ] xx as formal
>> > > syntax,
>> > > > > > > Producing rich and compatible results for future needs is very
>> 

Re: Re: [DISCUSS] FLIP-436: Introduce "SHOW CREATE CATALOG" Syntax

2024-03-19 Thread Lincoln Lee
Hi Yubin,

Thank you for the detailed explanation! I overlooked `CatalogBaseTable`; in
fact, there is already a `String getComment();` method there, similar to
`database` and `table`.
Can we continue the work on FLINK-21665 and complete its implementation? It
seems to be very close.

Best,
Lincoln Lee


Yubin Li  于2024年3月20日周三 01:42写道:

> Hi Lincoln,
>
> Thanks for your detailed comments!
>
> Supporting comments for `Catalog` is a really helpful feature, I agree
> with you to make it introduced in this FLIP, thank you for pointing
> that out :)
>
> Concerning the implementation, I propose to introduce `getComment()`
> method in `CatalogDescriptor`, and the reasons are as follows. WDYT?
> 1. For the sake of design consistency, follow the design of FLIP-295
> [1] which introduced `CatalogStore` component, `CatalogDescriptor`
> includes names and attributes, both of which are used to describe the
> catalog, and `comment` can be added smoothly.
> 2. Extending the existing class rather than add new method to the
> existing interface, Especially, the `Catalog` interface, as a core
> interface, is used by a series of important components such as
> `CatalogFactory`, `CatalogManager` and `FactoryUtil`, and is
> implemented by a large number of connectors such as JDBC, Paimon, and
> Hive. Adding methods to it will greatly increase the implementation
> complexity, and more importantly, increase the cost of iteration,
> maintenance, and verification.
>
> Please see FLIP doc [2] for details.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-295%3A+Support+lazy+initialization+of+catalogs+and+persistence+of+catalog+configurations
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-436%3A+Introduce+Catalog-related+Syntax
>
> Best,
> Yubin
>
> On Tue, Mar 19, 2024 at 9:57 PM Lincoln Lee 
> wrote:
> >
> > Hi Yubin,
> >
> > Thanks for your quickly response!
> >
> > It would be better to support comments just like create `database` and
> > `table` with comment.
> > That is, add `String getComment();` to the current `Catalog` interface.
> > WDYT?
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Yubin Li  于2024年3月19日周二 21:44写道:
> >
> > > Hi Lincoln,
> > >
> > > Good catch. Thanks for your suggestions.
> > >
> > > I found that the creation statements of database and table both
> > > support specifying "if not exists". For the sake of syntactic
> > > consistency and user practicality, We could introduce the '[if not
> > > exists]' clause to the 'create catalog' statement.
> > >
> > > As for the introduction of the `catalog comment` feature, it may
> > > involve changes to the Catalog structure, which can be left for future
> > > discussion.
> > >
> > > WDYT? Looking forward to your feedback :)
> > >
> > > Best,
> > > Yubin
> > >
> > > On Tue, Mar 19, 2024 at 9:06 PM Lincoln Lee 
> > > wrote:
> > > >
> > > > Hi Yubin,
> > > >
> > > > Big +1 for completing the catalog api!
> > > > There's a minor addition[1] which does not affect the vote could
> also be
> > > > considered.
> > > >
> > > > [1] https://issues.apache.org/jira/browse/FLINK-21665
> > > >
> > > >
> > > > Best,
> > > > Lincoln Lee
> > > >
> > > >
> > > > Yubin Li  于2024年3月18日周一 17:44写道:
> > > >
> > > > > Hi Jark,
> > > > >
> > > > > Thanks for your response, I have updated FLIP-436: Introduce
> > > > > Catalog-related Syntax [1] as you suggested.
> > > > >
> > > > > If there are no more comments within 24 hours, I will start a vote
> for
> > > > > this, thanks :)
> > > > >
> > > > > Best,
> > > > > Yubin
> > > > >
> > > > > [1]
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-436%3A+Introduce+Catalog-related+Syntax
> > > > >
> > > > > On Mon, Mar 18, 2024 at 4:39 PM Jark Wu  wrote:
> > > > > >
> > > > > > Hi Yubin,
> > > > > >
> > > > > > Thanks for the quick response. The suggestion sounds good to me!
> > > > > >
> > > > > > Best,
> > > > > > Jark
> > > > > >
> > > > > > On Mon, 18 Mar 2024 at 13:06, Yubin Li 
> wrote:
> > > > > >
> > > > > > > Hi Jark,
> > > > > > >
> > > > > > > Good pointing! Thanks for your reply, there are some details to
> > > align
> > > > > :)
> > > > > > >
> > > > > > > 1. I think the purpose of DESCRIBE CATALOG is to display
> metadata
> > > > > > > > information including catalog name,
> > > > > > > > catalog comment (may be introduced in the future), catalog
> type,
> > > and
> > > > > > > > catalog properties (for example [1])
> > > > > > >
> > > > > > > Adopting { DESC | DESCRIBE } CATALOG [ EXTENDED ] xx as formal
> > > syntax,
> > > > > > > Producing rich and compatible results for future needs is very
> > > > > important.
> > > > > > > When
> > > > > > > specifying "extended" in the syntax, it will output the
> complete
> > > > > > > information including
> > > > > > > properties.The complete output example is as follows:
> > > > > > >
> > > > > > >
> > > > >
> > >
> +-+---+
> > > > > > > | catalog_description_item |  

Re: [VOTE] Apache Flink Kubernetes Operator Release 1.8.0, release candidate #1

2024-03-19 Thread Alexander Fedulov
Hi Max,

+1

- Verified SHA checksums
- Verified GPG signatures
- Verified that the source distributions do not contain binaries
- Verified built-in tests (mvn clean verify)
- Verified build with Java 11 (mvn clean install -DskipTests -T 1C)
- Verified that Helm and operator files contain Apache licenses (rg -L
--files-without-match "http://www.apache.org/licenses/LICENSE-2.0" .).
 I am not sure we need to
include ./examples/flink-beam-example/dependency-reduced-pom.xml
and ./flink-autoscaler-standalone/dependency-reduced-pom.xml though
- Verified that chart and appVersion matches the target release (91d67d9)
- Verified that Helm chart can be installed from the local Helm folder
without overriding any parameters
- Verified that Helm chart can be installed from the RC repo without
overriding any parameters (
https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.8.0-rc1
)
- Verified docker container build

Best,
Alex


On Mon, 18 Mar 2024 at 20:50, Maximilian Michels  wrote:

> @Rui @Gyula Thanks for checking the release!
>
> >A minor correction is that [3] in the email should point to:
> >ghcr.io/apache/flink-kubernetes-operator:91d67d9 . But the helm chart and
> > everything is correct. It's a typo in the vote email.
>
> Good catch. Indeed, for the linked Docker image 8938658 points to
> HEAD^ of the rc branch, 91d67d9 is the HEAD. There are no code changes
> between those two commits, except for updating the version. So the
> votes are not impacted, especially because votes are casted against
> the source release which, as you pointed out, contains the correct
> image ref.
>
>
>
>
>
>
>
>
>
>
> On Mon, Mar 18, 2024 at 9:54 AM Gyula Fóra  wrote:
> >
> > Hi Max!
> >
> > +1 (binding)
> >
> >  - Verified source release, helm chart + checkpoints / signatures
> >  - Helm points to correct image
> >  - Deployed operator, stateful example and executed upgrade + savepoint
> > redeploy
> >  - Verified logs
> >  - Flink web PR looks good +1
> >
> > A minor correction is that [3] in the email should point to:
> > ghcr.io/apache/flink-kubernetes-operator:91d67d9 . But the helm chart
> and
> > everything is correct. It's a typo in the vote email.
> >
> > Thank you for preparing the release!
> >
> > Cheers,
> > Gyula
> >
> > On Mon, Mar 18, 2024 at 8:26 AM Rui Fan <1996fan...@gmail.com> wrote:
> >
> > > Thanks Max for driving this release!
> > >
> > > +1(non-binding)
> > >
> > > - Downloaded artifacts from dist ( svn co
> > >
> > >
> https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.8.0-rc1/
> > > )
> > > - Verified SHA512 checksums : ( for i in *.tgz; do echo $i; sha512sum
> > > --check $i.sha512; done )
> > > - Verified GPG signatures : ( $ for i in *.tgz; do echo $i; gpg
> --verify
> > > $i.asc $i )
> > > - Build the source with java-11 and java-17 ( mvn -T 20 clean install
> > > -DskipTests )
> > > - Verified the license header during build the source
> > > - Verified that chart and appVersion matches the target release (less
> the
> > > index.yaml and Chart.yaml )
> > > - RC repo works as Helm repo( helm repo add
> flink-operator-repo-1.8.0-rc1
> > >
> > >
> https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.8.0-rc1/
> > > )
> > > - Verified Helm chart can be installed  ( helm install
> > > flink-kubernetes-operator
> > > flink-operator-repo-1.8.0-rc1/flink-kubernetes-operator --set
> > > webhook.create=false )
> > > - Submitted the autoscaling demo, the autoscaler works well with
> *memory
> > > tuning *(kubectl apply -f autoscaling.yaml)
> > >- job.autoscaler.memory.tuning.enabled: "true"
> > > - Download Autoscaler standalone: wget
> > >
> > >
> https://repository.apache.org/content/repositories/orgapacheflink-1710/org/apache/flink/flink-autoscaler-standalone/1.8.0/flink-autoscaler-standalone-1.8.0.jar
> > > - Ran Autoscaler standalone locally, it works well with rescale api and
> > > JDBC state store/event handler
> > >
> > > Best,
> > > Rui
> > >
> > > On Fri, Mar 15, 2024 at 1:45 AM Maximilian Michels 
> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > Please review and vote on the release candidate #1 for the version
> > > > 1.8.0 of the Apache Flink Kubernetes Operator, as follows:
> > > >
> > > > [ ] +1, Approve the release
> > > > [ ] -1, Do not approve the release (please provide specific comments)
> > > >
> > > > **Release Overview**
> > > >
> > > > As an overview, the release consists of the following:
> > > > a) Kubernetes Operator canonical source distribution (including the
> > > > Dockerfile), to be deployed to the release repository at
> > > > dist.apache.org
> > > > b) Kubernetes Operator Helm Chart to be deployed to the release
> > > > repository at dist.apache.org
> > > > c) Maven artifacts to be deployed to the Maven Central Repository
> > > > d) Docker image to be pushed to Dockerhub
> > > >
> > > > **Staging Areas to Review**
> > > >
> > > > The staging areas containing the above mentioned artifacts are as
> > > > 

Re: Proposal to modify Flink’s startup sequence for MINIMUM downtime across deployments

2024-03-19 Thread Asimansu Bera
Resending ->

Hello Sergio,
I'm a newbie (not an expert), but I'll give it a try.

If I presume you have only reference state in Flink and no other state, and
Kafka is the source, you may consider the approach below as an option; it
should be tested.

a. Create two applications - Primary and Secondary (blue and green)
b. Create two S3 buckets - Primary and Secondary (S3/A and S3/B)
c. Keep S3/A and S3/B both in sync with the reference data only
d. Refresh the reference data in both the Primary and Secondary buckets
every day - a separate Flink job (refresh reference data) with a separate topic
e. Just shut down the primary cluster/instance. Note the latest offset
value of the Kafka topic(s) of transaction-event data up to which the primary
was able to complete processing
f. Any in-flight processing can be ignored, since processing would resume
from the latest successful offset on the secondary side
g. Start the secondary cluster with the newer deployment, with the Kafka
offset setting at latest, so there is no gap in the Kafka topic offsets
between where the primary stopped and where the secondary starts (see the
sketch below)

Logically it should work with very minimal downtime. The downside is that you
need to maintain two S3 buckets.

I leave it to the Flink committers and experts to comment, as I may have
completely misunderstood the problem and the internal operations.

-A
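
A rough sketch of how the secondary job's Kafka source could be wired for the handoff in steps e/g. The broker, topic, and group id are placeholders, and it assumes the secondary reuses the primary's group id and resumes from its committed offsets (a variation on starting from plain "latest") so nothing is skipped between the two deployments:

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.kafka.clients.consumer.OffsetResetStrategy;

public class SecondaryDeploymentSourceSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("broker:9092")    // placeholder
                .setTopics("transaction-events")       // placeholder
                .setGroupId("blue-green-pipeline")     // same group id as the primary job
                // Resume from the offsets the primary committed before shutdown,
                // falling back to LATEST if no committed offsets exist yet.
                .setStartingOffsets(
                        OffsetsInitializer.committedOffsets(OffsetResetStrategy.LATEST))
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "transactions")
                .print();
        env.execute("secondary-deployment");
    }
}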

On Tue, Mar 19, 2024 at 1:52 PM Asimansu Bera 
wrote:

> Hello Sergio,
> I'm a newbie(not an expert) and give it a try.
>
> If I presume you've only a reference state on Flink and no other states
> and Kafka is the source, you may see below approach as an option , should
> test.
>
> a. Create two applications - Primary and Secondary (blue and green)
> b. Create two S3 buckets - Primary and secondary(S3/A and S3/B)
> c. S3/A and S3/B  -both are in sync with reference data only
> d. Refresh the reference data on both  Primary and Secondary buckets
> everyday - separate Flink job(refresh reference data) with separate topic
> e. Just shut down the primary cluster/instance . Note the latest offset
> value of Kafka topic(s) of transaction events data till which primary able
> to complete processing
> f. intermittently processing could be ignored as processing would start
> from the latest successful offset at secondary side
> f. start the secondary cluster with newer deployment with Kafka - offset
> setting latest. No gap of offset value of Kafka topic between Primary side
> process where it stopped vs secondary where it starts
>
> Logically it should work with very minimum downtime. Down side is, you
> need to maintain two S3 buckets.
>
> I leave it to Flink committers and experts to comments as I may have
> completely misunderstood the problem and internal operations.
>
> -A
>
> On Tue, Mar 19, 2024 at 11:29 AM Sergio Chong Loo
>  wrote:
>
>> Hello Flink Community,
>>
>> We have a particular challenging scenario, which we’d like to run by the
>> rest of the community experts and check
>>
>> 1. If anything can be done with existing functionality that we’re
>> overlooking, or
>> 2. The feasibility of this proposal.
>>
>> I tried to keep it concise in a 1-pager type format.
>>
>> Thanks in advance,
>> Sergio Chong
>>
>> —
>>
>> Goal
>>
>> We have a particular use case with a team evaluating and looking to adopt
>> Flink. Their pipelines have a specially intricate and long bootstrapping
>> sequence.
>>
>> The main objective: to have a minimum downtime (~1 minute as a starting
>> point). The situation that would affect this downtime the most for this
>> team is a (re)deployment. All changes proposed here are the minimum
>> necessary within a Flink job’s lifecycle boundaries only.
>>
>>
>> Non-goal
>>
>> Anything that requires coordination of 2 jobs or more belongs in the
>> orchestration layer and is outside of the scope of this document; no
>> inter-job awareness/communication should be considered (initially).
>> However, any necessary “hooks” should be provided to make that integration
>> smoother.
>>
>>
>> Scenario
>>
>> The (re)deployments are particularly of concern here. Given the stateful
>> nature of Flink we of course need to “snapshot” the state at any point in
>> time so that the next deployment can take over. However, if we consider a
>> typical sequence in EKS these would be the (simplified) list of steps:
>>
>> All the points with an [*] are “long running” steps where potentially no
>> progress/data processing occurs
>> [*] Take a Checkpoint
>> [*] Issue a restart (pod teardown, resource deallocation, etc.)
>> [*] Start deployment (resource allocation, pod setup, etc.)
>> [*] Flink job submission
>> [*] State loading from the Checkpoint
>> [*] Task Managers initialization (callbacks to user code via open)
>> Begin processing data
>>
>> These typical steps will already take more than 1 minute but there’s
>> more:
>>
>> The team has a requirement of statically loading about 120GB of reference
>> data into direct (non-heap) memory. This data is only read by the pipeline
>> and will need to be reloaded/refreshed every day. This 

Re: Proposal to modify Flink’s startup sequence for MINIMUM downtime across deployments

2024-03-19 Thread Asimansu Bera
Hello Sergio,
I'm a newbie (not an expert), but I'll give it a try.

If I presume you have only reference state in Flink and no other state, and
Kafka is the source, you may consider the approach below as an option; it
should be tested.

a. Create two applications - Primary and Secondary (blue and green)
b. Create two S3 buckets - Primary and Secondary (S3/A and S3/B)
c. Keep S3/A and S3/B both in sync with the reference data only
d. Refresh the reference data in both the Primary and Secondary buckets
every day - a separate Flink job (refresh reference data) with a separate topic
e. Just shut down the primary cluster/instance. Note the latest offset
value of the Kafka topic(s) of transaction-event data up to which the primary
was able to complete processing
f. Any in-flight processing can be ignored, since processing would resume
from the latest successful offset on the secondary side
g. Start the secondary cluster with the newer deployment, with the Kafka
offset setting at latest, so there is no gap in the Kafka topic offsets
between where the primary stopped and where the secondary starts

Logically it should work with very minimal downtime. The downside is that you
need to maintain two S3 buckets.

I leave it to the Flink committers and experts to comment, as I may have
completely misunderstood the problem and the internal operations.

-A

On Tue, Mar 19, 2024 at 11:29 AM Sergio Chong Loo
 wrote:

> Hello Flink Community,
>
> We have a particular challenging scenario, which we’d like to run by the
> rest of the community experts and check
>
> 1. If anything can be done with existing functionality that we’re
> overlooking, or
> 2. The feasibility of this proposal.
>
> I tried to keep it concise in a 1-pager type format.
>
> Thanks in advance,
> Sergio Chong
>
> —
>
> Goal
>
> We have a particular use case with a team evaluating and looking to adopt
> Flink. Their pipelines have a specially intricate and long bootstrapping
> sequence.
>
> The main objective: to have a minimum downtime (~1 minute as a starting
> point). The situation that would affect this downtime the most for this
> team is a (re)deployment. All changes proposed here are the minimum
> necessary within a Flink job’s lifecycle boundaries only.
>
>
> Non-goal
>
> Anything that requires coordination of 2 jobs or more belongs in the
> orchestration layer and is outside of the scope of this document; no
> inter-job awareness/communication should be considered (initially).
> However, any necessary “hooks” should be provided to make that integration
> smoother.
>
>
> Scenario
>
> The (re)deployments are particularly of concern here. Given the stateful
> nature of Flink we of course need to “snapshot” the state at any point in
> time so that the next deployment can take over. However, if we consider a
> typical sequence in EKS these would be the (simplified) list of steps:
>
> All the points with an [*] are “long running” steps where potentially no
> progress/data processing occurs
> [*] Take a Checkpoint
> [*] Issue a restart (pod teardown, resource deallocation, etc.)
> [*] Start deployment (resource allocation, pod setup, etc.)
> [*] Flink job submission
> [*] State loading from the Checkpoint
> [*] Task Managers initialization (callbacks to user code via open)
> Begin processing data
>
> These typical steps will already take more than 1 minute but there’s more:
>
> The team has a requirement of statically loading about 120GB of reference
> data into direct (non-heap) memory. This data is only read by the pipeline
> and will need to be reloaded/refreshed every day. This process of fetching
> the data from storage (S3) and loading it into direct memory takes around
> 4.5 minutes.
>
> NOTES:
> Reloading this data on the fly (online) is not the first choice since the
> pipeline would need twice the memory capacity to guarantee a live copy at
> all times.
> The (re)deployment process becomes our common denominator since we’ll
> still have to deal with this during normal code changes/updates. Hence...
>
>
> Proposal
>
> The idea is to piggy back on the concept of blue/green deployments but
> With a slightly modified Flink start up sequence and...
> Still relying on a clean cutover between deployments (A and B) via
> incremental checkpoints.
> Why? To have deployment B bootstrap as much as possible and only cutover
> deployment A to B when the latter is fully ready to proceed. This means B
> can “take its time” and prepare what it needs first, then “signal” A and
> potentially “wait” for A to deliver the desired checkpoint
>
> The “signaling” mechanism does not need to be fancy. An idea is to define
> a “future” checkpoint as the cutover point (since checkpoints are
> sequential and we know their S3 paths). Deployment B would just implement a
> monitor on said checkpoint and wait until it’s ready, when it does it’ll
> load it and proceed. Deployment A should just terminate right after it
> produced the checkpoint.
>
> NOTES:
> Rely on the existence of the ...chk-XX/_metadata file. This file gets
> created when the CP is 

Proposal to modify Flink’s startup sequence for MINIMUM downtime across deployments

2024-03-19 Thread Sergio Chong Loo
Hello Flink Community,

We have a particular challenging scenario, which we’d like to run by the rest 
of the community experts and check

1. If anything can be done with existing functionality that we’re overlooking, 
or
2. The feasibility of this proposal.

I tried to keep it concise in a 1-pager type format.

Thanks in advance,
Sergio Chong

—

Goal

We have a particular use case with a team evaluating and looking to adopt 
Flink. Their pipelines have a specially intricate and long bootstrapping 
sequence.

The main objective: to have a minimum downtime (~1 minute as a starting point). 
The situation that would affect this downtime the most for this team is a 
(re)deployment. All changes proposed here are the minimum necessary within a 
Flink job’s lifecycle boundaries only.


Non-goal

Anything that requires coordination of 2 jobs or more belongs in the 
orchestration layer and is outside of the scope of this document; no inter-job 
awareness/communication should be considered (initially). However, any 
necessary “hooks” should be provided to make that integration smoother.


Scenario

The (re)deployments are particularly of concern here. Given the stateful nature 
of Flink we of course need to “snapshot” the state at any point in time so that 
the next deployment can take over. However, if we consider a typical sequence 
in EKS these would be the (simplified) list of steps:

All the points with an [*] are “long running” steps where potentially no 
progress/data processing occurs
[*] Take a Checkpoint
[*] Issue a restart (pod teardown, resource deallocation, etc.)
[*] Start deployment (resource allocation, pod setup, etc.)
[*] Flink job submission
[*] State loading from the Checkpoint
[*] Task Managers initialization (callbacks to user code via open)
Begin processing data

These typical steps will already take more than 1 minute but there’s more: 

The team has a requirement of statically loading about 120GB of reference data 
into direct (non-heap) memory. This data is only read by the pipeline and will 
need to be reloaded/refreshed every day. This process of fetching the data from 
storage (S3) and loading it into direct memory takes around 4.5 minutes.

NOTES:
Reloading this data on the fly (online) is not the first choice since the 
pipeline would need twice the memory capacity to guarantee a live copy at all 
times.
The (re)deployment process becomes our common denominator since we’ll still 
have to deal with this during normal code changes/updates. Hence...


Proposal

The idea is to piggyback on the concept of blue/green deployments, but
with a slightly modified Flink startup sequence, and...
still relying on a clean cutover between deployments (A and B) via incremental 
checkpoints.
Why? To have deployment B bootstrap as much as possible and only cut over 
from deployment A to B when the latter is fully ready to proceed. This means B can 
“take its time” and prepare what it needs first, then “signal” A and 
potentially “wait” for A to deliver the desired checkpoint.

The “signaling” mechanism does not need to be fancy. An idea is to define a 
“future” checkpoint as the cutover point (since checkpoints are sequential and 
we know their S3 paths). Deployment B would just implement a monitor on said 
checkpoint and wait until it’s ready, when it does it’ll load it and proceed. 
Deployment A should just terminate right after it produced the checkpoint.

NOTES:
Rely on the existence of the ...chk-XX/_metadata file. This file gets created 
when the CP is complete. Might need confirmation that the status is actually 
COMPLETE (successful).
A checkpoint can fail, if that's the case we should “increment” the counter and 
expect the next one to succeed.
TBD: what alternatives do we have to query the status of a checkpoint? e.g. 
IN_PROGRESS, COMPLETED, FAILED, etc.
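
As a rough illustration only (the path and polling interval are made up, and the COMPLETED-vs-FAILED distinction from the TBD above is deliberately left out), the monitor in deployment B could look like this, assuming the S3 filesystem plugin is available:

import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.Path;

public class CutoverCheckpointMonitorSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder location of the agreed "future" cutover checkpoint (chk-117).
        Path metadata = new Path("s3://my-bucket/checkpoints/job-a/chk-117/_metadata");
        FileSystem fs = metadata.getFileSystem();

        // Poll until the _metadata file appears, i.e. the checkpoint finished writing.
        while (!fs.exists(metadata)) {
            Thread.sleep(5_000L);
        }
        System.out.println("Cutover checkpoint available: " + metadata);
        // ...restore from chk-117 and leave the WAITING_FOR_START_SIGNAL state...
    }
}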

The proposed sequence of steps would look like this:
[A] ... already running...
[B] Start deployment with a “future” checkpoint of chk-117 for example 
(resource allocation, pod setup, etc. happens here)
[B] Flink job submission
[B] Time consuming user reference data loading
[B] Job is ready and will wait until our example checkpoint chk-117 
exists/becomes ready. Transitions to a new state e.g. WAITING_FOR_START_SIGNAL
[A] As an example the current checkpoint could be chk-112, the job continues 
taking cps normally chk-113, -114... until it reaches chk-117
[*] [A] Job is terminated after chk-117 checkpoint is complete
[*] [B] Detects chk-117 is ready so it continues with the state loading from it
[B] Task Managers initialization
[B] Transitions to RUNNING. Begin processing data

NOTES:
Only steps 7-8 ([*]) should potentially “block” the progress/data processing.
There should be metrics emitted indicating if a checkpoint was ready before the 
job was. It is expected the new job to wait for the CP, not the other way 
around.
There are no guarantees that steps 7-8 will take less than a minute but:
That’s the best this solution can do
The actual 

Re: Re: [DISCUSS] FLIP-436: Introduce "SHOW CREATE CATALOG" Syntax

2024-03-19 Thread Yubin Li
Hi Lincoln,

Thanks for your detailed comments!

Supporting comments for `Catalog` is a really helpful feature. I agree
with you that it should be introduced in this FLIP; thank you for pointing
that out :)

Concerning the implementation, I propose to introduce a `getComment()`
method in `CatalogDescriptor` (sketched below), and the reasons are as follows. WDYT?
1. For the sake of design consistency, it follows the design of FLIP-295
[1], which introduced the `CatalogStore` component. `CatalogDescriptor`
already includes the name and attributes that describe the catalog, so a
`comment` field can be added smoothly.
2. It extends an existing class rather than adding a new method to an
existing interface. In particular, the `Catalog` interface, as a core
interface, is used by a series of important components such as
`CatalogFactory`, `CatalogManager`, and `FactoryUtil`, and is
implemented by a large number of connectors such as JDBC, Paimon, and
Hive. Adding methods to it would greatly increase the implementation
complexity and, more importantly, the cost of iteration, maintenance,
and verification.
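
To make the shape of the proposal concrete, here is a trimmed-down stand-in for `CatalogDescriptor`; the names and structure are assumptions for illustration, not the FLIP's final API:

import org.apache.flink.configuration.Configuration;

/** Illustrative stand-in for CatalogDescriptor, showing where a comment could live. */
public final class CatalogDescriptorSketch {
    private final String catalogName;
    private final Configuration configuration;
    private final String comment; // proposed addition, may be null

    private CatalogDescriptorSketch(String catalogName, Configuration configuration, String comment) {
        this.catalogName = catalogName;
        this.configuration = configuration;
        this.comment = comment;
    }

    public static CatalogDescriptorSketch of(String catalogName, Configuration configuration, String comment) {
        return new CatalogDescriptorSketch(catalogName, configuration, comment);
    }

    public String getCatalogName() {
        return catalogName;
    }

    public Configuration getConfiguration() {
        return configuration;
    }

    /** The getter discussed in this thread. */
    public String getComment() {
        return comment;
    }
}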

Please see FLIP doc [2] for details.

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-295%3A+Support+lazy+initialization+of+catalogs+and+persistence+of+catalog+configurations
[2] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-436%3A+Introduce+Catalog-related+Syntax

Best,
Yubin

On Tue, Mar 19, 2024 at 9:57 PM Lincoln Lee  wrote:
>
> Hi Yubin,
>
> Thanks for your quickly response!
>
> It would be better to support comments just like create `database` and
> `table` with comment.
> That is, add `String getComment();` to the current `Catalog` interface.
> WDYT?
>
> Best,
> Lincoln Lee
>
>
> Yubin Li  于2024年3月19日周二 21:44写道:
>
> > Hi Lincoln,
> >
> > Good catch. Thanks for your suggestions.
> >
> > I found that the creation statements of database and table both
> > support specifying "if not exists". For the sake of syntactic
> > consistency and user practicality, We could introduce the '[if not
> > exists]' clause to the 'create catalog' statement.
> >
> > As for the introduction of the `catalog comment` feature, it may
> > involve changes to the Catalog structure, which can be left for future
> > discussion.
> >
> > WDYT? Looking forward to your feedback :)
> >
> > Best,
> > Yubin
> >
> > On Tue, Mar 19, 2024 at 9:06 PM Lincoln Lee 
> > wrote:
> > >
> > > Hi Yubin,
> > >
> > > Big +1 for completing the catalog api!
> > > There's a minor addition[1] which does not affect the vote could also be
> > > considered.
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-21665
> > >
> > >
> > > Best,
> > > Lincoln Lee
> > >
> > >
> > > Yubin Li  于2024年3月18日周一 17:44写道:
> > >
> > > > Hi Jark,
> > > >
> > > > Thanks for your response, I have updated FLIP-436: Introduce
> > > > Catalog-related Syntax [1] as you suggested.
> > > >
> > > > If there are no more comments within 24 hours, I will start a vote for
> > > > this, thanks :)
> > > >
> > > > Best,
> > > > Yubin
> > > >
> > > > [1]
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-436%3A+Introduce+Catalog-related+Syntax
> > > >
> > > > On Mon, Mar 18, 2024 at 4:39 PM Jark Wu  wrote:
> > > > >
> > > > > Hi Yubin,
> > > > >
> > > > > Thanks for the quick response. The suggestion sounds good to me!
> > > > >
> > > > > Best,
> > > > > Jark
> > > > >
> > > > > On Mon, 18 Mar 2024 at 13:06, Yubin Li  wrote:
> > > > >
> > > > > > Hi Jark,
> > > > > >
> > > > > > Good pointing! Thanks for your reply, there are some details to
> > align
> > > > :)
> > > > > >
> > > > > > 1. I think the purpose of DESCRIBE CATALOG is to display metadata
> > > > > > > information including catalog name,
> > > > > > > catalog comment (may be introduced in the future), catalog type,
> > and
> > > > > > > catalog properties (for example [1])
> > > > > >
> > > > > > Adopting { DESC | DESCRIBE } CATALOG [ EXTENDED ] xx as formal
> > syntax,
> > > > > > Producing rich and compatible results for future needs is very
> > > > important.
> > > > > > When
> > > > > > specifying "extended" in the syntax, it will output the complete
> > > > > > information including
> > > > > > properties.The complete output example is as follows:
> > > > > >
> > > > > >
> > > >
> > +-+---+
> > > > > > | catalog_description_item | catalog_description_value
> > > >|
> > > > > >
> > > > > >
> > > >
> > +-+---+
> > > > > > |   Name | cat1
> > > > > >   |
> > > > > > |   Type   |
> >  generic_in_memory
> > > > > >|
> > > > > > |   Comment   |
> > > > > >   |
> > > > > > |   Properties  |((k1,v1),
> > (k2,v2))
> > > > > > |
> > > > > >
> > > > > >
> > > >
> > 

[SUMMARY] Flink 1.19 last sync summary on 03/19/2024

2024-03-19 Thread Lincoln Lee
Hi everyone,

Flink 1.19.0 has been officially released yesterday[1].

I'd like to share some highlights of the last release sync of 1.19:

- Remaining works

The official Docker image is still in progress[2] and will be available once
the related PR has been merged[3].
In addition, some follow-up items are being processed[4], and the end of
support for older versions will be discussed in a separate mail.

Thanks to all contributors for your great work on 1.19 and the support for
the release!

The new 1.20 release cycle[5] has kicked off, welcome to continue contributing!

[1] https://lists.apache.org/thread/sofmxytbh6y20nwot1gywqqc2lqxn4hm
[2] https://issues.apache.org/jira/browse/FLINK-34701
[3] https://github.com/docker-library/official-images/pull/16430
[4] https://issues.apache.org/jira/browse/FLINK-34706
[5] https://lists.apache.org/thread/80h3nzk08v276xmllswbbbg1z7m3v70t


Best,
Yun, Jing, Martijn and Lincoln


[jira] [Created] (FLINK-34745) Parsing temporal table join throws cryptic exceptions

2024-03-19 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-34745:


 Summary: Parsing temporal table join throws cryptic exceptions
 Key: FLINK-34745
 URL: https://issues.apache.org/jira/browse/FLINK-34745
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.19.0
Reporter: Dawid Wysakowicz
Assignee: Dawid Wysakowicz
 Fix For: 1.20.0


1. Wrong expression type in `AS OF`:
{code}
SELECT *
FROM Orders AS o JOIN
  RatesHistoryWithPK FOR SYSTEM_TIME AS OF 'o.rowtime' AS r
  ON o.currency = r.currency
{code}

throws: 

{code}
java.lang.AssertionError: cannot convert CHAR literal to class 
org.apache.calcite.util.TimestampString
{code}

2. Not a simple table reference:
{code}
SELECT *
FROM Orders AS o JOIN
  RatesHistoryWithPK FOR SYSTEM_TIME AS OF o.rowtime + INTERVAL '1' SECOND AS r
  ON o.currency = r.currency
{code}

throws:
{code}
java.lang.AssertionError: no unique expression found for {id: o.rowtime, 
prefix: 1}; count is 0
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-436: Introduce Catalog-related Syntax

2024-03-19 Thread Jane Chan
+1 (binding)

Best,
Jane

On Tue, Mar 19, 2024 at 9:30 PM Leonard Xu  wrote:

> +1(binding)
>
>
> Best,
> Leonard
> > 2024年3月19日 下午9:03,Lincoln Lee  写道:
> >
> > +1 (binding)
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Feng Jin  于2024年3月19日周二 19:59写道:
> >
> >> +1 (non-binding)
> >>
> >> Best,
> >> Feng
> >>
> >> On Tue, Mar 19, 2024 at 7:46 PM Ferenc Csaky  >
> >> wrote:
> >>
> >>> +1 (non-binding).
> >>>
> >>> Best,
> >>> Ferenc
> >>>
> >>>
> >>>
> >>>
> >>> On Tuesday, March 19th, 2024 at 12:39, Jark Wu 
> wrote:
> >>>
> 
> 
>  +1 (binding)
> 
>  Best,
>  Jark
> 
>  On Tue, 19 Mar 2024 at 19:05, Yuepeng Pan panyuep...@apache.org
> wrote:
> 
> > Hi, Yubin
> >
> > Thanks for driving it !
> >
> > +1 non-binding.
> >
> > Best,
> > Yuepeng Pan.
> >
> > At 2024-03-19 17:56:42, "Yubin Li" lyb5...@gmail.com wrote:
> >
> >> Hi everyone,
> >>
> >> Thanks for all the feedback, I'd like to start a vote on the
> >>> FLIP-436:
> >> Introduce Catalog-related Syntax [1]. The discussion thread is here
> >> [2].
> >>
> >> The vote will be open for at least 72 hours unless there is an
> >> objection or insufficient votes.
> >>
> >> [1]
> >>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-436%3A+Introduce+Catalog-related+Syntax
> >> [2]
> >> https://lists.apache.org/thread/10k1bjb4sngyjwhmfqfky28lyoo7sv0z
> >>
> >> Best regards,
> >> Yubin
> >>>
> >>
>
>


Re: Re: [DISCUSS] FLIP-436: Introduce "SHOW CREATE CATALOG" Syntax

2024-03-19 Thread Lincoln Lee
Hi Yubin,

Thanks for your quick response!

It would be better to also support comments, just like `CREATE DATABASE` and
`CREATE TABLE` do.
That is, add `String getComment();` to the current `Catalog` interface.
WDYT?
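
For illustration only, a minimal sketch of what such an addition could look
like. The default implementation is an assumption here, purely to keep
existing catalog implementations source-compatible; the final shape is up to
the FLIP:

/** Sketch only -- not the actual FLIP-436 change, just an illustration of the idea. */
public interface Catalog {

    /**
     * Returns the comment of this catalog. A default is assumed so that
     * existing Catalog implementations are not forced to implement it.
     */
    default String getComment() {
        return "";
    }

    // ... existing methods such as open(), close(), listDatabases(), etc.
}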

Best,
Lincoln Lee


Yubin Li  于2024年3月19日周二 21:44写道:

> Hi Lincoln,
>
> Good catch. Thanks for your suggestions.
>
> I found that the creation statements of database and table both
> support specifying "if not exists". For the sake of syntactic
> consistency and user practicality, We could introduce the '[if not
> exists]' clause to the 'create catalog' statement.
>
> As for the introduction of the `catalog comment` feature, it may
> involve changes to the Catalog structure, which can be left for future
> discussion.
>
> WDYT? Looking forward to your feedback :)
>
> Best,
> Yubin
>
> On Tue, Mar 19, 2024 at 9:06 PM Lincoln Lee 
> wrote:
> >
> > Hi Yubin,
> >
> > Big +1 for completing the catalog api!
> > There's a minor addition[1] which does not affect the vote could also be
> > considered.
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-21665
> >
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Yubin Li  于2024年3月18日周一 17:44写道:
> >
> > > Hi Jark,
> > >
> > > Thanks for your response, I have updated FLIP-436: Introduce
> > > Catalog-related Syntax [1] as you suggested.
> > >
> > > If there are no more comments within 24 hours, I will start a vote for
> > > this, thanks :)
> > >
> > > Best,
> > > Yubin
> > >
> > > [1]
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-436%3A+Introduce+Catalog-related+Syntax
> > >
> > > On Mon, Mar 18, 2024 at 4:39 PM Jark Wu  wrote:
> > > >
> > > > Hi Yubin,
> > > >
> > > > Thanks for the quick response. The suggestion sounds good to me!
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > On Mon, 18 Mar 2024 at 13:06, Yubin Li  wrote:
> > > >
> > > > > Hi Jark,
> > > > >
> > > > > Good pointing! Thanks for your reply, there are some details to
> align
> > > :)
> > > > >
> > > > > 1. I think the purpose of DESCRIBE CATALOG is to display metadata
> > > > > > information including catalog name,
> > > > > > catalog comment (may be introduced in the future), catalog type,
> and
> > > > > > catalog properties (for example [1])
> > > > >
> > > > > Adopting { DESC | DESCRIBE } CATALOG [ EXTENDED ] xx as formal
> syntax,
> > > > > Producing rich and compatible results for future needs is very
> > > important.
> > > > > When
> > > > > specifying "extended" in the syntax, it will output the complete
> > > > > information including
> > > > > properties.The complete output example is as follows:
> > > > >
> > > > >
> > > > > +----------------------------+----------------------------+
> > > > > | catalog_description_item   | catalog_description_value  |
> > > > > +----------------------------+----------------------------+
> > > > > | Name                       | cat1                       |
> > > > > | Type                       | generic_in_memory          |
> > > > > | Comment                    |                            |
> > > > > | Properties                 | ((k1,v1), (k2,v2))         |
> > > > > +----------------------------+----------------------------+
> > > > >
> > > > > 2. Could you add support for ALTER CATALOG xxx UNSET ('mykey')?
> This is
> > > > > > also very useful in ALTER TABLE.
> > > > >
> > > > > I found that there is already an ALTER TABLE xxx RESET ('mykey')
> > > syntax [1]
> > > > > now,
> > > > > which will reset the myKey attribute of a certain table to the
> default
> > > > > value. For catalogs,
> > > > > it might be better to use ALTER CATALOG xxx RESET ('mykey') for the
> > > sake of
> > > > > design
> > > > > consistency.
> > > > >
> > > > > WDYT? Looking forward to your suggestions.
> > > > >
> > > > > Best,
> > > > > Yubin
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/alter/#reset
> > > > >
> > > > >
> > > > > On Mon, Mar 18, 2024 at 11:49 AM Jark Wu  wrote:
> > > > >
> > > > > > Hi Yubin,
> > > > > >
> > > > > > Thanks for updating the FLIP. The updated version looks good in
> > > general.
> > > > > > I only have 2 minor comments.
> > > > > >
> > > > > > 1. I think the purpose of DESCRIBE CATALOG is to display metadata
> > > > > > information including catalog name,
> > > > > > catalog comment (may be introduced in the future), catalog type,
> and
> > > > > > catalog properties (for example [1]).
> > > > > > Expanding all properties may limit this syntax to include more
> > > metadata
> > > > > > information in the future.
> > > > > >
> > > > > > 2. Could you add support for ALTER CATALOG xxx UNSET ('mykey')?
> This
> > > is
> > > > > > also very useful in ALTER TABLE.
> > > > > >
> > > > > > Best,
> > > > > > 

Re: [DISCUSS] FLIP-424: Asynchronous State APIs

2024-03-19 Thread Jane Chan
Hi Zakelly,

Thanks for your clarification. I'm +1 for using `onNext`.

Best,
Jane

On Tue, Mar 19, 2024 at 6:38 PM Zakelly Lan  wrote:

> Hi Jane,
>
> Thanks for your comments!
>
> I guess there is no problem with the word 'on' in this scenario, since it
> is an event-driven-like execution model. I think the word 'hasNext' may be
> misleading since there is a 'hasNext' in a typical iterator which returns a
> boolean for the existence of a next element. I think the GPT may also be
> misled by this word, and misunderstood the meaning of this interface and
> therefore giving the wrong suggestions :) . Actually this interface is
> introduced to iterating elements (like next()) instead of checking the
> existence. I think the name `onNext()` is more suitable, WDYT?  Or to be
> more explicit, we can add 'Compose' or 'Apply' to interfaces
> (onNextCompose, onNextAccept) matching the functions of corresponding APIs
> from 'StateFuture'. WDYT? But I'd prefer the former since it is more
> simple.
>
>
> Best,
> Zakelly
>
> On Tue, Mar 19, 2024 at 6:09 PM Jane Chan  wrote:
>
> > Hi Zakelly,
> >
> > Thanks for bringing this discussion.
> >
> > I'm +1 for the overall API design, except for one minor comment about the
> > name of StateIterator#onHasNext since I feel it is a little bit
> > unintuitive. Meanwhile, I asked the opinion from GPT, here's what it said
> >
> > The prefix "on" is commonly used in event-driven programming to denote an
> > > event handler, not to check a condition. For instance, in JavaScript,
> you
> > > might have onClick to handle click events. Therefore, using "on" may be
> > > misleading if the method is being used to check for the existence of a
> > next
> > > element.
> >
> > For an async iterator, you'd want a name that clearly conveys that the
> > > method will check for the next item asynchronously and return a promise
> > or
> > > some form of future result. In JavaScript, which supports async
> > iteration,
> > > the standard method for this is next(), which when used with async
> > > iterators, returns a promise that resolves to an object with properties
> > > value and done.
> >
> > Here are a couple of better alternatives:
> >
> > hasNextAsync: This name clearly states that the function is an
> asynchronous
> > > version of the typical hasNext method found in synchronous iterators.
> > > nextExists: This name suggests the method checks for the existence of a
> > > next item, without the potential confusion of event handler naming
> > > conventions.
> > >
> >
> > WDYT?
> >
> > Best,
> > Jane
> >
> > On Tue, Mar 19, 2024 at 5:47 PM Zakelly Lan 
> wrote:
> >
> > > Hi everyone,
> > >
> > > Thanks for your valuable feedback!
> > >
> > > The discussions were vibrant and have led to significant enhancements
> to
> > > this FLIP. With this progress, I'm looking to initiate the voting in 72
> > > hours.
> > >
> > > Please let me know if you have any concerns, thanks!
> > >
> > >
> > > Best,
> > > Zakelly
> > >
> > > On Tue, Mar 19, 2024 at 5:35 PM Zakelly Lan 
> > wrote:
> > >
> > > > Hi Yue,
> > > >
> > > > Thanks for your comments!
> > > >
> > > > 1. Is it possible for all `FutureUtils` in Flink to reuse the same
> util
> > > >> class?
> > > >
> > > > Actually, the `FutureUtils` here is a new util class that will share
> > the
> > > > same package path with the `StateFuture`. Or I'd be fine renaming it
> > > > 'StateFutureUtils'.
> > > >
> > > > 2. It seems that there is no concept of retry, timeout, or delay in
> > your
> > > >> async state api design . Do we need to provide such capabilities
> like
> > > >> `orTimeout` 、`completeDelayed`?
> > > >>
> > > > For ease of use, we do not provide such APIs allowing users to
> > customize
> > > > the behavior on timeout or retry. We may introduce a retry mechanism
> in
> > > the
> > > > framework enabled by configuration. And we will hide the 'complete'
> and
> > > > related APIs of StateFuture from users, since the completion of these
> > > > futures is totally managed by the execution framework.
> > > >
> > > >
> > > > Best,
> > > > Zakelly
> > > >
> > > >
> > > >
> > > > On Tue, Mar 19, 2024 at 5:20 PM yue ma  wrote:
> > > >
> > > >> Hi Zakelly,
> > > >>
> > > >> Thanks for your proposal. The FLIP looks good  to me +1! I'd like to
> > ask
> > > >> some minor questions
> > > >> I found that there is also a definition of class `FutureUtils` under
> > > `org.
> > > >> apache. flink. util. concurrent` which seems to offer more
> interfaces.
> > > My
> > > >> question is:
> > > >> 1. Is it possible for all `FutureUtils` in Flink to reuse the same
> > util
> > > >> class?
> > > >> 2. It seems that there is no concept of retry, timeout, or delay in
> > your
> > > >> async state api design . Do we need to provide such capabilities
> like
> > > >> `orTimeout` 、`completeDelayed`?
> > > >>
> > > >> Jing Ge  于2024年3月13日周三 20:00写道:
> > > >>
> > > >> > indeed! I missed that part. Thanks for the hint!
> > > >> >
> > > >> > Best regards,
> > > >> > Jing
> > 

[jira] [Created] (FLINK-34744) autoscaling-dynamic cannot run

2024-03-19 Thread Rui Fan (Jira)
Rui Fan created FLINK-34744:
---

 Summary: autoscaling-dynamic cannot run
 Key: FLINK-34744
 URL: https://issues.apache.org/jira/browse/FLINK-34744
 Project: Flink
  Issue Type: Bug
  Components: Autoscaler
Reporter: Rui Fan
Assignee: Rui Fan
 Fix For: kubernetes-operator-1.9.0
 Attachments: image-2024-03-19-21-46-15-530.png

autoscaling-dynamic cannot run on my Mac

 !image-2024-03-19-21-46-15-530.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Re: [DISCUSS] FLIP-436: Introduce "SHOW CREATE CATALOG" Syntax

2024-03-19 Thread Yubin Li
Hi Lincoln,

Good catch. Thanks for your suggestions.

I found that the creation statements for databases and tables both
support specifying "if not exists". For the sake of syntactic
consistency and user practicality, we could introduce an '[if not
exists]' clause to the 'create catalog' statement as well.
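
For illustration, a small sketch of how the proposed clause could be
exercised through the Table API once the FLIP lands. Note that the IF NOT
EXISTS part is the proposed extension and is not available in released
Flink versions yet:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CreateCatalogIfNotExistsSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Proposed syntax (sketch): IF NOT EXISTS makes the statement a no-op
        // when the catalog already exists, mirroring CREATE DATABASE/TABLE.
        tEnv.executeSql(
                "CREATE CATALOG IF NOT EXISTS cat1 WITH ('type' = 'generic_in_memory')");

        // Running the same statement again would then succeed silently
        // instead of failing with a "catalog already exists" error.
        tEnv.executeSql(
                "CREATE CATALOG IF NOT EXISTS cat1 WITH ('type' = 'generic_in_memory')");
    }
}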

As for the introduction of the `catalog comment` feature, it may
involve changes to the Catalog structure, which can be left for future
discussion.

WDYT? Looking forward to your feedback :)

Best,
Yubin

On Tue, Mar 19, 2024 at 9:06 PM Lincoln Lee  wrote:
>
> Hi Yubin,
>
> Big +1 for completing the catalog api!
> There's a minor addition[1] which does not affect the vote could also be
> considered.
>
> [1] https://issues.apache.org/jira/browse/FLINK-21665
>
>
> Best,
> Lincoln Lee
>
>
> Yubin Li  于2024年3月18日周一 17:44写道:
>
> > Hi Jark,
> >
> > Thanks for your response, I have updated FLIP-436: Introduce
> > Catalog-related Syntax [1] as you suggested.
> >
> > If there are no more comments within 24 hours, I will start a vote for
> > this, thanks :)
> >
> > Best,
> > Yubin
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-436%3A+Introduce+Catalog-related+Syntax
> >
> > On Mon, Mar 18, 2024 at 4:39 PM Jark Wu  wrote:
> > >
> > > Hi Yubin,
> > >
> > > Thanks for the quick response. The suggestion sounds good to me!
> > >
> > > Best,
> > > Jark
> > >
> > > On Mon, 18 Mar 2024 at 13:06, Yubin Li  wrote:
> > >
> > > > Hi Jark,
> > > >
> > > > Good pointing! Thanks for your reply, there are some details to align
> > :)
> > > >
> > > > 1. I think the purpose of DESCRIBE CATALOG is to display metadata
> > > > > information including catalog name,
> > > > > catalog comment (may be introduced in the future), catalog type, and
> > > > > catalog properties (for example [1])
> > > >
> > > > Adopting { DESC | DESCRIBE } CATALOG [ EXTENDED ] xx as formal syntax,
> > > > Producing rich and compatible results for future needs is very
> > important.
> > > > When
> > > > specifying "extended" in the syntax, it will output the complete
> > > > information including
> > > > properties.The complete output example is as follows:
> > > >
> > > >
> > > > +----------------------------+----------------------------+
> > > > | catalog_description_item   | catalog_description_value  |
> > > > +----------------------------+----------------------------+
> > > > | Name                       | cat1                       |
> > > > | Type                       | generic_in_memory          |
> > > > | Comment                    |                            |
> > > > | Properties                 | ((k1,v1), (k2,v2))         |
> > > > +----------------------------+----------------------------+
> > > >
> > > > 2. Could you add support for ALTER CATALOG xxx UNSET ('mykey')? This is
> > > > > also very useful in ALTER TABLE.
> > > >
> > > > I found that there is already an ALTER TABLE xxx RESET ('mykey')
> > syntax [1]
> > > > now,
> > > > which will reset the myKey attribute of a certain table to the default
> > > > value. For catalogs,
> > > > it might be better to use ALTER CATALOG xxx RESET ('mykey') for the
> > sake of
> > > > design
> > > > consistency.
> > > >
> > > > WDYT? Looking forward to your suggestions.
> > > >
> > > > Best,
> > > > Yubin
> > > >
> > > > [1]
> > > >
> > > >
> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/alter/#reset
> > > >
> > > >
> > > > On Mon, Mar 18, 2024 at 11:49 AM Jark Wu  wrote:
> > > >
> > > > > Hi Yubin,
> > > > >
> > > > > Thanks for updating the FLIP. The updated version looks good in
> > general.
> > > > > I only have 2 minor comments.
> > > > >
> > > > > 1. I think the purpose of DESCRIBE CATALOG is to display metadata
> > > > > information including catalog name,
> > > > > catalog comment (may be introduced in the future), catalog type, and
> > > > > catalog properties (for example [1]).
> > > > > Expanding all properties may limit this syntax to include more
> > metadata
> > > > > information in the future.
> > > > >
> > > > > 2. Could you add support for ALTER CATALOG xxx UNSET ('mykey')? This
> > is
> > > > > also very useful in ALTER TABLE.
> > > > >
> > > > > Best,
> > > > > Jark
> > > > >
> > > > > [1]:
> > > > >
> > > > >
> > > >
> > https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-aux-describe-schema.html
> > > > >
> > > > >
> > > > >
> > > > > On Fri, 15 Mar 2024 at 12:06, Yubin Li  wrote:
> > > > >
> > > > > > Hi Xuyang,
> > > > > >
> > > > > > Thank you for pointing this out, The parser part of `describe
> > catalog`
> > > > > > syntax
> > > > > > has indeed been implemented in FLIP-69, and it is not actually
> > > > available.
> > > > > > we can complete the syntax in this FLIP [1].  I have updated the
> > doc :)

Re: [DISCUSS] FLIP-435: Introduce a New Dynamic Table for Simplifying Data Pipelines

2024-03-19 Thread Lincoln Lee
Hi Yun,

Thank you very much for your valuable input!

Incremental mode is indeed an attractive idea; we have also discussed it,
but in the current design we first provide two refresh modes: CONTINUOUS
and FULL. Incremental mode can be introduced once the execution layer has
the capability.

My answers to the two questions:

1.
Yes, cascading is a good question. The current proposal provides a
freshness that defines a dynamic table's lag relative to its base tables.
If users need to consider the end-to-end freshness of multiple cascaded
dynamic tables, they can manually split it across the layers for now (see
the sketch below). Of course, letting multiple cascaded or dependent
dynamic tables complete the freshness definition in a simpler way is
something that can be extended in the future.

2.
Cascading refresh is also a part we focused on in the discussion. In this
FLIP we hope to concentrate on the core features as much as possible (it
already involves a lot of things), so we did not directly introduce related
syntax. However, based on the current design, combined with the catalog and
lineage, users can theoretically also accomplish a cascading refresh.
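
To make the "manually split" idea from answer 1 concrete, here is a purely
illustrative sketch. The FRESHNESS clause below follows the shape discussed
for FLIP-435 but is an assumption; the final DDL keywords may differ. Each
layer declares its own freshness, and the user budgets the end-to-end lag
(roughly 5 minutes here) across the two layers:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CascadedDynamicTablesSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Assumed DDL shape: ODS -> DWD layer refreshed within ~2 minutes.
        tEnv.executeSql(
                "CREATE DYNAMIC TABLE dwd_orders "
                        + "FRESHNESS = INTERVAL '2' MINUTE "
                        + "AS SELECT * FROM ods_orders WHERE is_valid = true");

        // Assumed DDL shape: DWD -> APP layer refreshed within ~3 minutes.
        tEnv.executeSql(
                "CREATE DYNAMIC TABLE app_order_stats "
                        + "FRESHNESS = INTERVAL '3' MINUTE "
                        + "AS SELECT currency, SUM(amount) AS total "
                        + "FROM dwd_orders GROUP BY currency");
    }
}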


Best,
Lincoln Lee


Yun Tang  于2024年3月19日周二 13:45写道:

> Hi Lincoln,
>
> Thanks for driving this discussion, and I am so excited to see this topic
> being discussed in the Flink community!
>
> From my point of view, instead of the work of unifying streaming and batch
> in DataStream API [1], this FLIP actually could make users benefit from one
> engine to rule batch & streaming.
>
> If we treat this FLIP as an open-source implementation of Snowflake's
> dynamic tables [2], we still lack an incremental refresh mode to make the
> ETL near real-time with a much cheaper computation cost. However, I think
> this could be done under the current design by introducing another refresh
> mode in the future. Although the extra work of incremental view maintenance
> would be much larger.
>
> For the FLIP itself, I have several questions below:
>
> 1. It seems this FLIP does not consider the lag of refreshes across ETL
> layers from ODS ---> DWD ---> APP [3]. We currently only consider the
> scheduler interval, which means we cannot use lag to automatically schedule
> the upfront micro-batch jobs to do the work.
> 2. To support the automagical refreshes, we should consider the lineage in
> the catalog or somewhere else.
>
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-134%3A+Batch+execution+for+the+DataStream+API
> [2] https://docs.snowflake.com/en/user-guide/dynamic-tables-about
> [3] https://docs.snowflake.com/en/user-guide/dynamic-tables-refresh
>
> Best
> Yun Tang
>
>
> 
> From: Lincoln Lee 
> Sent: Thursday, March 14, 2024 14:35
> To: dev@flink.apache.org 
> Subject: Re: [DISCUSS] FLIP-435: Introduce a New Dynamic Table for
> Simplifying Data Pipelines
>
> Hi Jing,
>
> Thanks for your attention to this flip! I'll try to answer the following
> questions.
>
> > 1. How to define query of dynamic table?
> > Use flink sql or introducing new syntax?
> > If use flink sql, how to handle the difference in SQL between streaming
> and
> > batch processing?
> > For example, a query including window aggregate based on processing time?
> > or a query including global order by?
>
> Similar to `CREATE TABLE AS query`, here the `query` also uses Flink sql
> and
>
> doesn't introduce a totally new syntax.
> We will not change the status respect to
>
> the difference in functionality of flink sql itself on streaming and
> batch, for example,
>
> the proctime window agg on streaming and global sort on batch that you
> mentioned,
>
> in fact, do not work properly in the
> other mode, so when the user modifies the
>
> refresh mode of a dynamic table that is not supported, we will throw an
> exception.
>
> > 2. Whether modify the query of dynamic table is allowed?
> > Or we could only refresh a dynamic table based on the initial query?
>
> Yes, in the current design, the query definition of the
> dynamic table is not allowed
>
>  to be modified, and you can only refresh the data based on the
> initial definition.
>
> > 3. How to use dynamic table?
> > The dynamic table seems to be similar to the materialized view.  Will we
> do
> > something like materialized view rewriting during the optimization?
>
> It's true that dynamic table and materialized view
> are similar in some ways, but as Ron
>
> explains
> there are differences. In terms of optimization, automated
> materialization discovery
>
> similar to that supported by calcite is also a potential possibility,
> perhaps with the
>
> addition of automated rewriting in the future.
>
>
>
> Best,
> Lincoln Lee
>
>
> Ron liu  于2024年3月14日周四 14:01写道:
>
> > Hi, Timo
> >
> > Sorry for later response,  thanks for your feedback.
> > Regarding your questions:
> >
> > > Flink has introduced the concept of Dynamic Tables many years ago. How
> >
> > does the term "Dynamic Table" fit into Flink's regular tables and also
> >
> > how does it relate to 

Re: [VOTE] FLIP-436: Introduce Catalog-related Syntax

2024-03-19 Thread Leonard Xu
+1(binding)


Best,
Leonard
> 2024年3月19日 下午9:03,Lincoln Lee  写道:
> 
> +1 (binding)
> 
> Best,
> Lincoln Lee
> 
> 
> Feng Jin  于2024年3月19日周二 19:59写道:
> 
>> +1 (non-binding)
>> 
>> Best,
>> Feng
>> 
>> On Tue, Mar 19, 2024 at 7:46 PM Ferenc Csaky 
>> wrote:
>> 
>>> +1 (non-binding).
>>> 
>>> Best,
>>> Ferenc
>>> 
>>> 
>>> 
>>> 
>>> On Tuesday, March 19th, 2024 at 12:39, Jark Wu  wrote:
>>> 
 
 
 +1 (binding)
 
 Best,
 Jark
 
 On Tue, 19 Mar 2024 at 19:05, Yuepeng Pan panyuep...@apache.org wrote:
 
> Hi, Yubin
> 
> Thanks for driving it !
> 
> +1 non-binding.
> 
> Best,
> Yuepeng Pan.
> 
> At 2024-03-19 17:56:42, "Yubin Li" lyb5...@gmail.com wrote:
> 
>> Hi everyone,
>> 
>> Thanks for all the feedback, I'd like to start a vote on the
>>> FLIP-436:
>> Introduce Catalog-related Syntax [1]. The discussion thread is here
>> [2].
>> 
>> The vote will be open for at least 72 hours unless there is an
>> objection or insufficient votes.
>> 
>> [1]
>> 
>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-436%3A+Introduce+Catalog-related+Syntax
>> [2]
>> https://lists.apache.org/thread/10k1bjb4sngyjwhmfqfky28lyoo7sv0z
>> 
>> Best regards,
>> Yubin
>>> 
>> 



[jira] [Created] (FLINK-34743) Memory tuning takes effect even if the parallelism isn't changed

2024-03-19 Thread Rui Fan (Jira)
Rui Fan created FLINK-34743:
---

 Summary: Memory tuning takes effect even if the parallelism isn't 
changed
 Key: FLINK-34743
 URL: https://issues.apache.org/jira/browse/FLINK-34743
 Project: Flink
  Issue Type: Improvement
  Components: Autoscaler
Reporter: Rui Fan
Assignee: Rui Fan
 Fix For: kubernetes-operator-1.9.0


Currently, the memory-tuning logic is only called when the parallelism is
changed.

See ScalingExecutor#scaleResource for more details.

It would be better to let memory tuning take effect even if the parallelism
isn't changed. For example, a Flink job may already run with the desired
parallelisms but still waste memory.





--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-426: Grouping Remote State Access

2024-03-19 Thread Jinzhong Li
Hi everyone,

Thanks for your valuable discussion and feedback!

Our discussions have been going on for a while. If there are no more
concerns, I would like to start the vote thread after 72 hours. Thanks again!

Please let me know if you have any concerns, thanks!

Best,
Jinzhong


On Tue, Mar 19, 2024 at 8:56 PM Jinzhong Li 
wrote:

> Hi Yue,
>
> Thanks for your feedback!
>
> > 1. Does Grouping Remote State Access only support asynchronous
> interfaces?
> >--If it is: IIUC, MultiGet can also greatly improve performance for
> > synchronous access modes. Do we need to support it ?
>
> Yes. If we want to support MultiGet on existing synchronous access mode,
> we have to introduce a grouping component akin to the AEC described in
> FLIP-425[1].
> I think such a change would introduce additional complexity to the current
> synchronous model, and the extent of performance gains remains uncertain.
> Therefore, I recommend only asynchronous interfaces support "Grouping
> Remote State Access", which is designed to efficiently minimize latency in
> accessing remote state storage.
>
> > 2. Can a simple example be added to FLip on how to use Batch to access
> > states and obtain the results of states on the API?
>
> Sure. I have added a code example in the Flip[2]. Note that the multiget
> in this Flip is an internal interface, not a user-facing interface.
>
> > 3. I also agree with XiaoRui's viewpoint. Is there a corresponding Config
> > to control the  state access batch strategy?
>
> Yes, we would offer some configurable options that allow users to adjust
> the behavior of batching and grouping state access (eg. batching size,
> etc.).
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-425%3A+Asynchronous+Execution+Model
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-426%3A+Grouping+Remote+State+Access#FLIP426:GroupingRemoteStateAccess-CodeExampleonHowtoAccessStateUsingBatch
>
> Best,
> Jinzhong Li
>
>
> On Tue, Mar 19, 2024 at 5:52 PM yue ma  wrote:
>
>> Hi Jinzhong,
>>
>> Thanks for the FLIP.  I have the following questions:
>>
>> 1. Does Grouping Remote State Access only support asynchronous interfaces?
>> --If it is: IIUC, MultiGet can also greatly improve performance for
>> synchronous access modes. Do we need to support it ?
>> --If not, how can we distinguish between using Grouping State Access
>> in
>> asynchronous and synchronous modes?
>> 2.  Can a simple example be added to FLip on how to use Batch to access
>> states and obtain the results of states on the API?
>> 3. I also agree with XiaoRui's viewpoint. Is there a corresponding Config
>> to control the  state access batch strategy?
>>
>> --
>> Best,
>> Yue
>>
>


Re: [DISCUSS] FLIP-427: Disaggregated State Store

2024-03-19 Thread yue ma
Hi Hangxiang,

Thanks for bringing this discussion.
I have a few questions about the Proposal you mentioned in the FLIP.

The current conclusion is to use proposal 2, which is okay for me. My point
is whether we should retain the potential of proposal 1 in the design.
There are the following reasons:
1. No JNI overhead, as noted in the Performance part of the FLIP.
2. RocksDB already provides an Env interface, and there are existing
implementations, such as the HDFS Env, so it seems easy to extend.
3. The RocksDB community continues to support LSM trees on different storage
media, such as Tiered Storage, and has made optimizations for this scenario,
such as Per-Key Placement Compaction and the Secondary Cache, which is
similar to the Hybrid Block Cache mentioned in FLIP-423.

If we use proposal 1, we can easily reuse these optimizations. It is even
possible to discuss and review the solution together with the RocksDB
community. In fact, we have already implemented some production practices
based on proposal 1 internally: we have integrated HybridEnv, Tiered Storage,
and the Secondary Cache into RocksDB and optimized the performance of
checkpointing and state restore. It seems to work well for us.
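
For reference, a minimal sketch of the Env extension point in the RocksDB
Java binding that point 2 refers to. It uses the default (local) Env; an
HDFS- or object-storage-backed Env would be wired in the same way, which is
the hook proposal 1 relies on. The path below is an arbitrary example:

import org.rocksdb.Env;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class RocksDbEnvSketch {
    public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();

        try (Options options = new Options().setCreateIfMissing(true)) {
            // The Env abstracts RocksDB's file-system access. A custom Env
            // (e.g. backed by remote storage) could be set here instead of
            // the default one, without any JNI changes on the caller side.
            options.setEnv(Env.getDefault());

            try (RocksDB db = RocksDB.open(options, "/tmp/rocksdb-env-sketch")) {
                db.put("key".getBytes(), "value".getBytes());
            }
        }
    }
}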

-- 
Best,
Yue


Re: [DISCUSS] FLIP-428: Fault Tolerance/Rescale Integration for Disaggregated State

2024-03-19 Thread Jinzhong Li
Hi everyone,

This discussion has been open for a while and there have been no new comments
for several days. As a sub-FLIP of FLIP-423, which is nearing a consensus, I
would like to start a vote after 72 hours.

Please let me know if you have any concerns, thanks!

Best,
Jinzhong


On Thu, Mar 7, 2024 at 4:55 PM Jinzhong Li  wrote:

> Hi devs,
>
> I'd like to start a discussion on a sub-FLIP of FLIP-423: Disaggregated
> State Storage and Management[1], which is a joint work of Yuan Mei, Zakelly
> Lan, Jinzhong Li, Hangxiang Yu, Yanfei Lei and Feng Wang:
>
> - FLIP-428: Fault Tolerance/Rescale Integration for Disaggregated State
> 
>  [2]
>
> This FLIP integrates checkpointing mechanisms with the disaggregated
> state store for fault tolerance and fast rescaling.
>
> Please make sure you have read the FLIP-423[1] to know the whole story,
> and we'll discuss the details of FLIP-424[2] under this mail. For the
> discussion of overall architecture or topics related with multiple
> sub-FLIPs, please post in the previous mail[3].
>
> Looking forward to hearing from you!
>
> [1] https://cwiki.apache.org/confluence/x/R4p3EQ
>
> [2] https://cwiki.apache.org/confluence/x/UYp3EQ
>
> [3] https://lists.apache.org/thread/ct8smn6g9y0b8730z7rp9zfpnwmj8vf0
>
> Best,
>
> Jinzhong Li
>
>
>
>


Re: [VOTE] FLIP-436: Introduce Catalog-related Syntax

2024-03-19 Thread Lincoln Lee
+1 (binding)

Best,
Lincoln Lee


Feng Jin  于2024年3月19日周二 19:59写道:

> +1 (non-binding)
>
> Best,
> Feng
>
> On Tue, Mar 19, 2024 at 7:46 PM Ferenc Csaky 
> wrote:
>
> > +1 (non-binding).
> >
> > Best,
> > Ferenc
> >
> >
> >
> >
> > On Tuesday, March 19th, 2024 at 12:39, Jark Wu  wrote:
> >
> > >
> > >
> > > +1 (binding)
> > >
> > > Best,
> > > Jark
> > >
> > > On Tue, 19 Mar 2024 at 19:05, Yuepeng Pan panyuep...@apache.org wrote:
> > >
> > > > Hi, Yubin
> > > >
> > > > Thanks for driving it !
> > > >
> > > > +1 non-binding.
> > > >
> > > > Best,
> > > > Yuepeng Pan.
> > > >
> > > > At 2024-03-19 17:56:42, "Yubin Li" lyb5...@gmail.com wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > Thanks for all the feedback, I'd like to start a vote on the
> > FLIP-436:
> > > > > Introduce Catalog-related Syntax [1]. The discussion thread is here
> > > > > [2].
> > > > >
> > > > > The vote will be open for at least 72 hours unless there is an
> > > > > objection or insufficient votes.
> > > > >
> > > > > [1]
> > > > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-436%3A+Introduce+Catalog-related+Syntax
> > > > > [2]
> https://lists.apache.org/thread/10k1bjb4sngyjwhmfqfky28lyoo7sv0z
> > > > >
> > > > > Best regards,
> > > > > Yubin
> >
>


Re: Re: [DISCUSS] FLIP-436: Introduce "SHOW CREATE CATALOG" Syntax

2024-03-19 Thread Lincoln Lee
Hi Yubin,

Big +1 for completing the catalog api!
There's a minor addition[1] which does not affect the vote could also be
considered.

[1] https://issues.apache.org/jira/browse/FLINK-21665


Best,
Lincoln Lee


Yubin Li  于2024年3月18日周一 17:44写道:

> Hi Jark,
>
> Thanks for your response, I have updated FLIP-436: Introduce
> Catalog-related Syntax [1] as you suggested.
>
> If there are no more comments within 24 hours, I will start a vote for
> this, thanks :)
>
> Best,
> Yubin
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-436%3A+Introduce+Catalog-related+Syntax
>
> On Mon, Mar 18, 2024 at 4:39 PM Jark Wu  wrote:
> >
> > Hi Yubin,
> >
> > Thanks for the quick response. The suggestion sounds good to me!
> >
> > Best,
> > Jark
> >
> > On Mon, 18 Mar 2024 at 13:06, Yubin Li  wrote:
> >
> > > Hi Jark,
> > >
> > > Good pointing! Thanks for your reply, there are some details to align
> :)
> > >
> > > 1. I think the purpose of DESCRIBE CATALOG is to display metadata
> > > > information including catalog name,
> > > > catalog comment (may be introduced in the future), catalog type, and
> > > > catalog properties (for example [1])
> > >
> > > Adopting { DESC | DESCRIBE } CATALOG [ EXTENDED ] xx as formal syntax,
> > > Producing rich and compatible results for future needs is very
> important.
> > > When
> > > specifying "extended" in the syntax, it will output the complete
> > > information including
> > > properties.The complete output example is as follows:
> > >
> > >
> > > +----------------------------+----------------------------+
> > > | catalog_description_item   | catalog_description_value  |
> > > +----------------------------+----------------------------+
> > > | Name                       | cat1                       |
> > > | Type                       | generic_in_memory          |
> > > | Comment                    |                            |
> > > | Properties                 | ((k1,v1), (k2,v2))         |
> > > +----------------------------+----------------------------+
> > >
> > > 2. Could you add support for ALTER CATALOG xxx UNSET ('mykey')? This is
> > > > also very useful in ALTER TABLE.
> > >
> > > I found that there is already an ALTER TABLE xxx RESET ('mykey')
> syntax [1]
> > > now,
> > > which will reset the myKey attribute of a certain table to the default
> > > value. For catalogs,
> > > it might be better to use ALTER CATALOG xxx RESET ('mykey') for the
> sake of
> > > design
> > > consistency.
> > >
> > > WDYT? Looking forward to your suggestions.
> > >
> > > Best,
> > > Yubin
> > >
> > > [1]
> > >
> > >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/alter/#reset
> > >
> > >
> > > On Mon, Mar 18, 2024 at 11:49 AM Jark Wu  wrote:
> > >
> > > > Hi Yubin,
> > > >
> > > > Thanks for updating the FLIP. The updated version looks good in
> general.
> > > > I only have 2 minor comments.
> > > >
> > > > 1. I think the purpose of DESCRIBE CATALOG is to display metadata
> > > > information including catalog name,
> > > > catalog comment (may be introduced in the future), catalog type, and
> > > > catalog properties (for example [1]).
> > > > Expanding all properties may limit this syntax to include more
> metadata
> > > > information in the future.
> > > >
> > > > 2. Could you add support for ALTER CATALOG xxx UNSET ('mykey')? This
> is
> > > > also very useful in ALTER TABLE.
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > [1]:
> > > >
> > > >
> > >
> https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-aux-describe-schema.html
> > > >
> > > >
> > > >
> > > > On Fri, 15 Mar 2024 at 12:06, Yubin Li  wrote:
> > > >
> > > > > Hi Xuyang,
> > > > >
> > > > > Thank you for pointing this out, The parser part of `describe
> catalog`
> > > > > syntax
> > > > > has indeed been implemented in FLIP-69, and it is not actually
> > > available.
> > > > > we can complete the syntax in this FLIP [1].  I have updated the
> doc :)
> > > > >
> > > > > Best,
> > > > > Yubin
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-436%3A+Introduce+Catalog-related+Syntax
> > > > >
> > > > > On Fri, Mar 15, 2024 at 10:12 AM Xuyang 
> wrote:
> > > > >
> > > > > > Hi, Yubin. Big +1 for this Flip. I just left one minor comment
> > > > following.
> > > > > >
> > > > > >
> > > > > > I found that although flink has not supported syntax 'DESCRIBE
> > > CATALOG
> > > > > > catalog_name' currently, it was already
> > > > > > discussed in flip-69[1], do we need to restart discussing it?
> > > > > > I don't have a particular preference regarding the restart
> > > discussion.
> > > > It
> > > > > > seems that there is no difference on this syntax
> > > > > > in FLIP-436, so maybe it would be best to refer back 

Re: [DISCUSS] FLIP-426: Grouping Remote State Access

2024-03-19 Thread Jinzhong Li
Hi Yue,

Thanks for your feedback!

> 1. Does Grouping Remote State Access only support asynchronous interfaces?
>--If it is: IIUC, MultiGet can also greatly improve performance for
> synchronous access modes. Do we need to support it ?

Yes. If we want to support MultiGet in the existing synchronous access mode,
we would have to introduce a grouping component akin to the AEC described in
FLIP-425[1].
I think such a change would add extra complexity to the current synchronous
model, and the extent of the performance gain remains uncertain. Therefore,
I recommend that only the asynchronous interfaces support "Grouping Remote
State Access", which is designed to minimize the latency of accessing remote
state storage.
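
To illustrate the grouping idea only (this is a conceptual sketch with
invented types, not the FLIP's internal MultiGet interface): reads issued by
the asynchronous API are buffered and flushed to the remote store as one
batched lookup.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class GroupingSketch {

    /** Invented stand-in for a single asynchronous state read. */
    record GetRequest(byte[] key, CompletableFuture<byte[]> result) {}

    /** Invented stand-in for a remote store that supports batched lookups. */
    interface RemoteStore {
        List<byte[]> multiGet(List<byte[]> keys);
    }

    private final List<GetRequest> buffer = new ArrayList<>();
    private final RemoteStore store;
    private final int batchSize;

    GroupingSketch(RemoteStore store, int batchSize) {
        this.store = store;
        this.batchSize = batchSize;
    }

    /** Buffers a read; the batch is flushed once it reaches the configured size. */
    CompletableFuture<byte[]> get(byte[] key) {
        CompletableFuture<byte[]> result = new CompletableFuture<>();
        buffer.add(new GetRequest(key, result));
        if (buffer.size() >= batchSize) {
            flush();
        }
        return result;
    }

    /** Issues one batched lookup for all buffered reads and completes their futures. */
    void flush() {
        List<byte[]> keys = buffer.stream().map(GetRequest::key).toList();
        List<byte[]> values = store.multiGet(keys);
        for (int i = 0; i < buffer.size(); i++) {
            buffer.get(i).result().complete(values.get(i));
        }
        buffer.clear();
    }
}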

> 2. Can a simple example be added to FLip on how to use Batch to access
> states and obtain the results of states on the API?

Sure. I have added a code example to the FLIP[2]. Note that the MultiGet in
this FLIP is an internal interface, not a user-facing one.

> 3. I also agree with XiaoRui's viewpoint. Is there a corresponding Config
> to control the  state access batch strategy?

Yes, we would offer some configurable options that allow users to adjust
the behavior of batching and grouping state access (e.g., the batching
size).

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-425%3A+Asynchronous+Execution+Model
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-426%3A+Grouping+Remote+State+Access#FLIP426:GroupingRemoteStateAccess-CodeExampleonHowtoAccessStateUsingBatch

Best,
Jinzhong Li


On Tue, Mar 19, 2024 at 5:52 PM yue ma  wrote:

> Hi Jinzhong,
>
> Thanks for the FLIP.  I have the following questions:
>
> 1. Does Grouping Remote State Access only support asynchronous interfaces?
> --If it is: IIUC, MultiGet can also greatly improve performance for
> synchronous access modes. Do we need to support it ?
> --If not, how can we distinguish between using Grouping State Access in
> asynchronous and synchronous modes?
> 2.  Can a simple example be added to FLip on how to use Batch to access
> states and obtain the results of states on the API?
> 3. I also agree with XiaoRui's viewpoint. Is there a corresponding Config
> to control the  state access batch strategy?
>
> --
> Best,
> Yue
>


Re: [VOTE] FLIP-436: Introduce Catalog-related Syntax

2024-03-19 Thread Feng Jin
+1 (non-binding)

Best,
Feng

On Tue, Mar 19, 2024 at 7:46 PM Ferenc Csaky 
wrote:

> +1 (non-binding).
>
> Best,
> Ferenc
>
>
>
>
> On Tuesday, March 19th, 2024 at 12:39, Jark Wu  wrote:
>
> >
> >
> > +1 (binding)
> >
> > Best,
> > Jark
> >
> > On Tue, 19 Mar 2024 at 19:05, Yuepeng Pan panyuep...@apache.org wrote:
> >
> > > Hi, Yubin
> > >
> > > Thanks for driving it !
> > >
> > > +1 non-binding.
> > >
> > > Best,
> > > Yuepeng Pan.
> > >
> > > At 2024-03-19 17:56:42, "Yubin Li" lyb5...@gmail.com wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > Thanks for all the feedback, I'd like to start a vote on the
> FLIP-436:
> > > > Introduce Catalog-related Syntax [1]. The discussion thread is here
> > > > [2].
> > > >
> > > > The vote will be open for at least 72 hours unless there is an
> > > > objection or insufficient votes.
> > > >
> > > > [1]
> > > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-436%3A+Introduce+Catalog-related+Syntax
> > > > [2] https://lists.apache.org/thread/10k1bjb4sngyjwhmfqfky28lyoo7sv0z
> > > >
> > > > Best regards,
> > > > Yubin
>


Re: [VOTE] FLIP-436: Introduce Catalog-related Syntax

2024-03-19 Thread Ferenc Csaky
+1 (non-binding).

Best,
Ferenc




On Tuesday, March 19th, 2024 at 12:39, Jark Wu  wrote:

> 
> 
> +1 (binding)
> 
> Best,
> Jark
> 
> On Tue, 19 Mar 2024 at 19:05, Yuepeng Pan panyuep...@apache.org wrote:
> 
> > Hi, Yubin
> > 
> > Thanks for driving it !
> > 
> > +1 non-binding.
> > 
> > Best,
> > Yuepeng Pan.
> > 
> > At 2024-03-19 17:56:42, "Yubin Li" lyb5...@gmail.com wrote:
> > 
> > > Hi everyone,
> > > 
> > > Thanks for all the feedback, I'd like to start a vote on the FLIP-436:
> > > Introduce Catalog-related Syntax [1]. The discussion thread is here
> > > [2].
> > > 
> > > The vote will be open for at least 72 hours unless there is an
> > > objection or insufficient votes.
> > > 
> > > [1]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-436%3A+Introduce+Catalog-related+Syntax
> > > [2] https://lists.apache.org/thread/10k1bjb4sngyjwhmfqfky28lyoo7sv0z
> > > 
> > > Best regards,
> > > Yubin


Re: [VOTE] FLIP-436: Introduce Catalog-related Syntax

2024-03-19 Thread Jark Wu
+1 (binding)

Best,
Jark

On Tue, 19 Mar 2024 at 19:05, Yuepeng Pan  wrote:

> Hi, Yubin
>
>
> Thanks for driving it !
>
> +1 non-binding.
>
>
>
>
>
>
>
> Best,
> Yuepeng Pan.
>
>
>
>
>
>
>
>
> At 2024-03-19 17:56:42, "Yubin Li"  wrote:
> >Hi everyone,
> >
> >Thanks for all the feedback, I'd like to start a vote on the FLIP-436:
> >Introduce Catalog-related Syntax [1]. The discussion thread is here
> >[2].
> >
> >The vote will be open for at least 72 hours unless there is an
> >objection or insufficient votes.
> >
> >[1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-436%3A+Introduce+Catalog-related+Syntax
> >[2] https://lists.apache.org/thread/10k1bjb4sngyjwhmfqfky28lyoo7sv0z
> >
> >Best regards,
> >Yubin
>


Re:[VOTE] FLIP-436: Introduce Catalog-related Syntax

2024-03-19 Thread Yuepeng Pan
Hi, Yubin


Thanks for driving it !

+1 non-binding.







Best,
Yuepeng Pan.








At 2024-03-19 17:56:42, "Yubin Li"  wrote:
>Hi everyone,
>
>Thanks for all the feedback, I'd like to start a vote on the FLIP-436:
>Introduce Catalog-related Syntax [1]. The discussion thread is here
>[2].
>
>The vote will be open for at least 72 hours unless there is an
>objection or insufficient votes.
>
>[1] 
>https://cwiki.apache.org/confluence/display/FLINK/FLIP-436%3A+Introduce+Catalog-related+Syntax
>[2] https://lists.apache.org/thread/10k1bjb4sngyjwhmfqfky28lyoo7sv0z
>
>Best regards,
>Yubin


Re: [DISCUSS] FLIP Suggestion: Externalize Kudu Connector from Bahir

2024-03-19 Thread Ferenc Csaky
Hi,

Since there have been no new comments for a while, I will start a vote thread
if none arrive within another day.

Thanks,
Ferenc


On Thursday, March 14th, 2024 at 11:20, Ferenc Csaky 
 wrote:

> 
> 
> Hi,
> 
> Gentle ping to see if there are any other concerns or things that seems 
> missing from the FLIP.
> 
> Best,
> Ferenc
> 
> 
> 
> 
> On Monday, March 11th, 2024 at 11:11, Ferenc Csaky ferenc.cs...@pm.me.INVALID 
> wrote:
> 
> > Hi Jing,
> > 
> > Thank you for your comments! Updated the FLIP with reasoning on the 
> > proposed release versions and included them in the headline "Release" field.
> > 
> > Best,
> > Ferenc
> > 
> > On Sunday, March 10th, 2024 at 16:59, Jing Ge j...@ververica.com.INVALID 
> > wrote:
> > 
> > > Hi Ferenc,
> > > 
> > > Thanks for the proposal! +1 for it!
> > > 
> > > Similar to what Leonard mentioned. I would suggest:
> > > 1. Use the "release" to define the release version of the Kudu connector
> > > itself.
> > > 2. Optionally, add one more row underneath to describe which Flink 
> > > versions
> > > this release will be compatible with, e.g. 1.17, 1.18. I think it makes
> > > sense to support at least two last Flink releases. An example could be
> > > found at [1]
> > > 
> > > Best regards,
> > > Jing
> > > 
> > > [1] https://lists.apache.org/thread/jcjfy3fgpg5cdnb9noslq2c77h0gtcwp
> > > 
> > > On Sun, Mar 10, 2024 at 3:46 PM Yanquan Lv decq12y...@gmail.com wrote:
> > > 
> > > > Hi Ferenc, +1 for this FLIP.
> > > > 
> > > > Ferenc Csaky ferenc.cs...@pm.me.invalid 于2024年3月9日周六 01:49写道:
> > > > 
> > > > > Thank you Jeyhun, Leonard, and Hang for your comments! Let me
> > > > > address them from earliest to latest.
> > > > > 
> > > > > > How do you plan the review process in this case (e.g. incremental
> > > > > > over existing codebase or cumulative all at once) ?
> > > > > 
> > > > > I think incremental would be less time consuming and complex for
> > > > > reviewers so I would leaning towards that direction. I would
> > > > > imagine multiple subtasks for migrating the existing code, and
> > > > > updating the deprecated interfaces, so those should be separate PRs 
> > > > > and
> > > > > the release can be initiated when everything is merged.
> > > > > 
> > > > > > (1) About the release version, could you specify kudu connector 
> > > > > > version
> > > > > > instead of flink version 1.18 as external connector version is 
> > > > > > different
> > > > > > with flink?
> > > > > > (2) About the connector config options, could you enumerate these
> > > > > > options so that we can review they’re reasonable or not?
> > > > > 
> > > > > I added these to the FLIP, copied the current configs options as is,
> > > > > PTAL.
> > > > > 
> > > > > > (3) Metrics is also key part of connector, could you add the 
> > > > > > supported
> > > > > > connector metrics to public interface as well?
> > > > > 
> > > > > The current Bahir conenctor code does not include any metrics and I 
> > > > > did
> > > > > not plan to include them into the scope of this FLIP.
> > > > > 
> > > > > > I think that how to state this code originally lived in Bahir may 
> > > > > > be in
> > > > > > the
> > > > > > FLIP.
> > > > > 
> > > > > I might miss your point, but the FLIP contains this: "Migrating the
> > > > > current code keeping the history and noting it explicitly it was 
> > > > > forked
> > > > > from the Bahir repository [2]." Pls. share if you meant something 
> > > > > else.
> > > > > 
> > > > > Best,
> > > > > Ferenc
> > > > > 
> > > > > On Friday, March 8th, 2024 at 10:42, Hang Ruan ruanhang1...@gmail.com
> > > > > wrote:
> > > > > 
> > > > > > Hi, Ferenc.
> > > > > > 
> > > > > > Thanks for the FLIP discussion. +1 for the proposal.
> > > > > > I think that how to state this code originally lived in Bahir may 
> > > > > > be in
> > > > > > the
> > > > > > FLIP.
> > > > > > 
> > > > > > Best,
> > > > > > Hang
> > > > > > 
> > > > > > Leonard Xu xbjt...@gmail.com 于2024年3月7日周四 14:14写道:
> > > > > > 
> > > > > > > Thanks Ferenc for kicking off this discussion, I left some 
> > > > > > > comments
> > > > > > > here:
> > > > > > > 
> > > > > > > (1) About the release version, could you specify kudu connector
> > > > > > > version
> > > > > > > instead of flink version 1.18 as external connector version is
> > > > > > > different
> > > > > > > with flink ?
> > > > > > > 
> > > > > > > (2) About the connector config options, could you enumerate these
> > > > > > > options
> > > > > > > so that we can review they’re reasonable or not?
> > > > > > > 
> > > > > > > (3) Metrics is also key part of connector, could you add the
> > > > > > > supported
> > > > > > > connector metrics to public interface as well?
> > > > > > > 
> > > > > > > Best,
> > > > > > > Leonard
> > > > > > > 
> > > > > > > > 2024年3月6日 下午11:23,Ferenc Csaky ferenc.cs...@pm.me.INVALID 写道:
> > > > > > > > 
> > > > > > > > Hello devs,
> > > > > > > > 
> > > > > > > > Opening this thread to discuss a FLIP [1] about 

Re: [DISCUSS] FLIP-424: Asynchronous State APIs

2024-03-19 Thread Zakelly Lan
Hi Jane,

Thanks for your comments!

I guess there is no problem with the word 'on' in this scenario, since it
is an event-driven-like execution model. I think the word 'hasNext' may be
misleading, since 'hasNext' in a typical iterator returns a boolean for the
existence of a next element. GPT may also have been misled by this word,
misunderstood the meaning of this interface, and therefore given the wrong
suggestions :) . Actually, this interface is introduced for iterating over
elements (like next()) rather than for checking their existence. I think the
name `onNext()` is more suitable, WDYT? Or, to be more explicit, we could add
'Compose' or 'Apply' to the interfaces (onNextCompose, onNextAccept) to match
the corresponding APIs of 'StateFuture'. WDYT? But I'd prefer the former
since it is simpler.
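
To make the naming discussion concrete, a purely illustrative sketch of the
callback-style iteration; the types and signatures below are assumptions for
illustration only, not the FLIP's final API:

import java.util.function.Consumer;

public class OnNextNamingSketch {

    /** Stand-in for the FLIP's StateFuture; the real type has richer combinators. */
    interface StateFuture<T> {}

    /** Assumed shape of an async state iterator, used only to illustrate the naming. */
    interface StateIterator<T> {
        // Event-driven style: the callback is invoked for each element as it becomes
        // available; the returned future completes once the iteration is exhausted.
        StateFuture<Void> onNext(Consumer<T> action);
    }

    static void printAll(StateIterator<String> it) {
        // Reads as "on (each) next element, do ...": nothing is being checked here,
        // which is why 'onNext' fits this event-driven model better than a
        // boolean-sounding name like 'hasNext' would.
        it.onNext(value -> System.out.println("got " + value));
    }
}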


Best,
Zakelly

On Tue, Mar 19, 2024 at 6:09 PM Jane Chan  wrote:

> Hi Zakelly,
>
> Thanks for bringing this discussion.
>
> I'm +1 for the overall API design, except for one minor comment about the
> name of StateIterator#onHasNext since I feel it is a little bit
> unintuitive. Meanwhile, I asked the opinion from GPT, here's what it said
>
> The prefix "on" is commonly used in event-driven programming to denote an
> > event handler, not to check a condition. For instance, in JavaScript, you
> > might have onClick to handle click events. Therefore, using "on" may be
> > misleading if the method is being used to check for the existence of a
> next
> > element.
>
> For an async iterator, you'd want a name that clearly conveys that the
> > method will check for the next item asynchronously and return a promise
> or
> > some form of future result. In JavaScript, which supports async
> iteration,
> > the standard method for this is next(), which when used with async
> > iterators, returns a promise that resolves to an object with properties
> > value and done.
>
> Here are a couple of better alternatives:
>
> hasNextAsync: This name clearly states that the function is an asynchronous
> > version of the typical hasNext method found in synchronous iterators.
> > nextExists: This name suggests the method checks for the existence of a
> > next item, without the potential confusion of event handler naming
> > conventions.
> >
>
> WDYT?
>
> Best,
> Jane
>
> On Tue, Mar 19, 2024 at 5:47 PM Zakelly Lan  wrote:
>
> > Hi everyone,
> >
> > Thanks for your valuable feedback!
> >
> > The discussions were vibrant and have led to significant enhancements to
> > this FLIP. With this progress, I'm looking to initiate the voting in 72
> > hours.
> >
> > Please let me know if you have any concerns, thanks!
> >
> >
> > Best,
> > Zakelly
> >
> > On Tue, Mar 19, 2024 at 5:35 PM Zakelly Lan 
> wrote:
> >
> > > Hi Yue,
> > >
> > > Thanks for your comments!
> > >
> > > 1. Is it possible for all `FutureUtils` in Flink to reuse the same util
> > >> class?
> > >
> > > Actually, the `FutureUtils` here is a new util class that will share
> the
> > > same package path with the `StateFuture`. Or I'd be fine renaming it
> > > 'StateFutureUtils'.
> > >
> > > 2. It seems that there is no concept of retry, timeout, or delay in
> your
> > >> async state api design . Do we need to provide such capabilities like
> > >> `orTimeout` 、`completeDelayed`?
> > >>
> > > For ease of use, we do not provide such APIs allowing users to
> customize
> > > the behavior on timeout or retry. We may introduce a retry mechanism in
> > the
> > > framework enabled by configuration. And we will hide the 'complete' and
> > > related APIs of StateFuture from users, since the completion of these
> > > futures is totally managed by the execution framework.
> > >
> > >
> > > Best,
> > > Zakelly
> > >
> > >
> > >
> > > On Tue, Mar 19, 2024 at 5:20 PM yue ma  wrote:
> > >
> > >> Hi Zakelly,
> > >>
> > >> Thanks for your proposal. The FLIP looks good  to me +1! I'd like to
> ask
> > >> some minor questions
> > >> I found that there is also a definition of class `FutureUtils` under
> > `org.
> > >> apache. flink. util. concurrent` which seems to offer more interfaces.
> > My
> > >> question is:
> > >> 1. Is it possible for all `FutureUtils` in Flink to reuse the same
> util
> > >> class?
> > >> 2. It seems that there is no concept of retry, timeout, or delay in
> your
> > >> async state api design . Do we need to provide such capabilities like
> > >> `orTimeout` 、`completeDelayed`?
> > >>
> > >> Jing Ge  于2024年3月13日周三 20:00写道:
> > >>
> > >> > indeed! I missed that part. Thanks for the hint!
> > >> >
> > >> > Best regards,
> > >> > Jing
> > >> >
> > >> > On Wed, Mar 13, 2024 at 6:02 AM Zakelly Lan 
> > >> wrote:
> > >> >
> > >> > > Hi Jing,
> > >> > >
> > >> > > The deprecation and removal of original APIs is beyond the scope
> of
> > >> > > current FLIP, but I do add/highlight such information under
> > >> > "Compatibility,
> > >> > > Deprecation, and Migration Plan" section.
> > >> > >
> > >> > >
> > >> > > Best,
> > >> > > Zakelly
> > >> > >
> > >> > > 

Re: [DISCUSS] FLIP-424: Asynchronous State APIs

2024-03-19 Thread Jane Chan
Hi Zakelly,

Thanks for bringing this discussion.

I'm +1 for the overall API design, except for one minor comment about the
name of StateIterator#onHasNext since I feel it is a little bit
unintuitive. Meanwhile, I asked the opinion from GPT, here's what it said

The prefix "on" is commonly used in event-driven programming to denote an
> event handler, not to check a condition. For instance, in JavaScript, you
> might have onClick to handle click events. Therefore, using "on" may be
> misleading if the method is being used to check for the existence of a next
> element.

For an async iterator, you'd want a name that clearly conveys that the
> method will check for the next item asynchronously and return a promise or
> some form of future result. In JavaScript, which supports async iteration,
> the standard method for this is next(), which when used with async
> iterators, returns a promise that resolves to an object with properties
> value and done.

Here are a couple of better alternatives:

hasNextAsync: This name clearly states that the function is an asynchronous
> version of the typical hasNext method found in synchronous iterators.
> nextExists: This name suggests the method checks for the existence of a
> next item, without the potential confusion of event handler naming
> conventions.
>

WDYT?

Best,
Jane

On Tue, Mar 19, 2024 at 5:47 PM Zakelly Lan  wrote:

> Hi everyone,
>
> Thanks for your valuable feedback!
>
> The discussions were vibrant and have led to significant enhancements to
> this FLIP. With this progress, I'm looking to initiate the voting in 72
> hours.
>
> Please let me know if you have any concerns, thanks!
>
>
> Best,
> Zakelly
>
> On Tue, Mar 19, 2024 at 5:35 PM Zakelly Lan  wrote:
>
> > Hi Yue,
> >
> > Thanks for your comments!
> >
> > 1. Is it possible for all `FutureUtils` in Flink to reuse the same util
> >> class?
> >
> > Actually, the `FutureUtils` here is a new util class that will share the
> > same package path with the `StateFuture`. Or I'd be fine renaming it
> > 'StateFutureUtils'.
> >
> > 2. It seems that there is no concept of retry, timeout, or delay in your
> >> async state api design . Do we need to provide such capabilities like
> >> `orTimeout` 、`completeDelayed`?
> >>
> > For ease of use, we do not provide such APIs allowing users to customize
> > the behavior on timeout or retry. We may introduce a retry mechanism in
> the
> > framework enabled by configuration. And we will hide the 'complete' and
> > related APIs of StateFuture from users, since the completion of these
> > futures is totally managed by the execution framework.
> >
> >
> > Best,
> > Zakelly
> >
> >
> >
> > On Tue, Mar 19, 2024 at 5:20 PM yue ma  wrote:
> >
> >> Hi Zakelly,
> >>
> >> Thanks for your proposal. The FLIP looks good  to me +1! I'd like to ask
> >> some minor questions
> >> I found that there is also a definition of class `FutureUtils` under
> `org.
> >> apache. flink. util. concurrent` which seems to offer more interfaces.
> My
> >> question is:
> >> 1. Is it possible for all `FutureUtils` in Flink to reuse the same util
> >> class?
> >> 2. It seems that there is no concept of retry, timeout, or delay in your
> >> async state api design . Do we need to provide such capabilities like
> >> `orTimeout` 、`completeDelayed`?
> >>
> >> Jing Ge  于2024年3月13日周三 20:00写道:
> >>
> >> > indeed! I missed that part. Thanks for the hint!
> >> >
> >> > Best regards,
> >> > Jing
> >> >
> >> > On Wed, Mar 13, 2024 at 6:02 AM Zakelly Lan 
> >> wrote:
> >> >
> >> > > Hi Jing,
> >> > >
> >> > > The deprecation and removal of original APIs is beyond the scope of
> >> > > current FLIP, but I do add/highlight such information under
> >> > "Compatibility,
> >> > > Deprecation, and Migration Plan" section.
> >> > >
> >> > >
> >> > > Best,
> >> > > Zakelly
> >> > >
> >> > > On Wed, Mar 13, 2024 at 9:18 AM Yunfeng Zhou <
> >> > flink.zhouyunf...@gmail.com>
> >> > > wrote:
> >> > >
> >> > >> Hi Zakelly,
> >> > >>
> >> > >> Thanks for your responses. I agree with it that we can keep the
> >> design
> >> > >> as it is for now and see if others have any better ideas for these
> >> > >> questions.
> >> > >>
> >> > >> Best,
> >> > >> Yunfeng
> >> > >>
> >> > >> On Tue, Mar 12, 2024 at 5:23 PM Zakelly Lan  >
> >> > >> wrote:
> >> > >> >
> >> > >> > Hi Xuannan,
> >> > >> >
> >> > >> > Thanks for your comments, I modified the FLIP accordingly.
> >> > >> >
> >> > >> > Hi Yunfeng,
> >> > >> >
> >> > >> > Thanks for sharing your opinions!
> >> > >> >
> >> > >> >> Could you provide some hint on use cases where users need to mix
> >> sync
> >> > >> >> and async state operations in spite of the performance
> regression?
> >> > >> >> This information might help address our concerns on design. If
> the
> >> > >> >> mixed usage is simply something not recommended, I would prefer
> to
> >> > >> >> prohibit such usage from API.
> >> > >> >
> >> > >> > In fact, there is no scenario where users MUST use the sync APIs,
> >> but
> >> 

Re: [DISCUSS] FLIP-433: State Access on DataStream API V2

2024-03-19 Thread weijie guo
Hi everyone,

Thanks for your discussion and feedback!

Our discussions have been going on for a while and there have been no new
concerns for several days. So I would like to start the vote soon.

Best regards,

Weijie


Zakelly Lan  wrote on Tue, Mar 12, 2024 at 17:40:

> Hi Weijie,
>
> Thanks for your reply!
>
> Overall I'd be fine with the builder pattern, but it is a little bit long
> when carrying explicit 'build()' and declaring the builder. Keeping the
> StateDeclaration immutable is OK, but it is a little bit inconvenient for
> overriding the undefined options by job configuration at runtime. I'd
> suggest providing some methods responsible for rebuilding a new
> StateDeclaration with new configurable options, just like the
> ConfigOptions#defaultValue does. Well, this is just a suggestion, I'm not
> going to insist on it.
>
>
> Best,
> Zakelly
>
> On Tue, Mar 12, 2024 at 2:07 PM weijie guo 
> wrote:
>
> > Hi Zakelly,
> >
> > > But still, from a user's point of view,  state can be characterized
> along
> > two relatively independent dimensions, how states redistribute and the
> data
> > structure. Thus I still suggest a chained-like configuration API that
> > configures one aspect on each call.
> >
> >
> > I think the chained-like style is a good suggestion. But I'm not going to
> > introduce any mutable-like API to StateDeclaration (even though we can
> > achieve immutability by returning a new object). For this reason, I
> decided
> > to use the builder pattern, which also has the benefit of chaining calls
> > and allows us to support further configurations such as setTTL in the
> > future. For ease of use, we'll also provide some shortcuts to avoid
> having
> > to go through a long build chain each time. Of course, I have updated
> > the FLIP about this part.
> >
> >
> >
> > Best regards,
> >
> > Weijie
> >
> >
> > weijie guo  wrote on Tue, Mar 12, 2024 at 14:00:
> >
> > > Hi Hangxiang,
> > >
> > > > So these operators only define all states they may use which could be
> > > explained by the caller, right ?
> > >
> > > Yes, you're right.
> > >
> > >
> > > Best regards,
> > >
> > > Weijie
> > >
> > >
> > > weijie guo  wrote on Tue, Mar 12, 2024 at 13:59:
> > >
> > >> Hi Max,
> > >>
> > >> > In this thread it looks like the plan is to remove the old state
> > >> declaration API. I think we should consider keeping the old APIs to
> > >> avoid breaking too many jobs.
> > >>
> > >> We're not planning to remove any old APIs, which means that changes made
> > >> in V2 won't affect any V1 DataStream jobs. But V2 is limited to the new
> > >> state declaration API, and users who want to migrate to DataStream V2
> > >> will need to rewrite their jobs anyway.
> > >>
> > >> Best regards,
> > >>
> > >> Weijie
> > >>
> > >>
> > >> Hangxiang Yu  wrote on Tue, Mar 12, 2024 at 10:26:
> > >>
> > >>> Hi, Weijie.
> > >>> Thanks for your answer!
> > >>>
> > >>> > No, Introducing and declaring new state
> > >>> > at runtime is something we want to explicitly disallow.
> > >>>
> > >>> I just thought about how some operators define their useState() when
> > >>> the states they actually use may change at runtime, e.g. different
> > >>> state types for different state sizes.
> > >>> So these operators only define all the states they may use, which could
> > >>> be explained by the caller, right?
> > >>>
> > >>> On Mon, Mar 11, 2024 at 10:57 PM Maximilian Michels 
> > >>> wrote:
> > >>>
> > >>> > The FLIP mentions: "The contents described in this FLIP are all new
> > >>> > APIs and do not involve compatibility issues."
> > >>> >
> > >>> > In this thread it looks like the plan is to remove the old state
> > >>> > declaration API. I think we should consider keeping the old APIs to
> > >>> > avoid breaking too many jobs. The new APIs will still be beneficial
> > >>> > for new jobs, e.g. for SQL jobs.
> > >>> >
> > >>> > -Max
> > >>> >
> > >>> > On Fri, Mar 8, 2024 at 4:39 AM Zakelly Lan 
> > >>> wrote:
> > >>> > >
> > >>> > > Hi Weijie,
> > >>> > >
> > >>> > > Thanks for your answer! Well I get your point. Since partitions
> are
> > >>> > > first-class citizens, and redistribution means how states migrate
> > >>> when
> > >>> > > partitions change, I'd be fine with deemphasizing the concept of
> > >>> > > keyed/operator state if we highlight the definition of partition
> in
> > >>> the
> > >>> > > document. Keeping `RedistributionMode` under `StateDeclaration`
> is
> > >>> also
> > >>> > > fine with me, as I guess it is only for internal usage.
> > >>> > > But still, from a user's point of view,  state can be
> characterized
> > >>> along
> > >>> > > two relatively independent dimensions, how states redistribute
> and
> > >>> the
> > >>> > data
> > >>> > > structure. Thus I still suggest a chained-like configuration API
> > that
> > >>> > > configures one aspect on each call, such as:
> > >>> > > ```
> > >>> > > # Keyed stream, no redistribution mode specified, the state will
> go
> > >>> with
> > >>> > > partition (no redistribution).  Keyed state
> 

[jira] [Created] (FLINK-34742) Translate "FAQ" Page for Flink CDC Chinese Documentation

2024-03-19 Thread LvYanquan (Jira)
LvYanquan created FLINK-34742:
-

 Summary: Translate "FAQ" Page for Flink CDC Chinese Documentation
 Key: FLINK-34742
 URL: https://issues.apache.org/jira/browse/FLINK-34742
 Project: Flink
  Issue Type: Sub-task
  Components: chinese-translation, Documentation, Flink CDC
Affects Versions: cdc-3.1.0
Reporter: LvYanquan
 Fix For: cdc-3.1.0


Translate 
[https://github.com/apache/flink-cdc/blob/master/docs/content/docs/faq/faq.md] 
page into Chinese.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[VOTE] FLIP-436: Introduce Catalog-related Syntax

2024-03-19 Thread Yubin Li
Hi everyone,

Thanks for all the feedback, I'd like to start a vote on the FLIP-436:
Introduce Catalog-related Syntax [1]. The discussion thread is here
[2].

The vote will be open for at least 72 hours unless there is an
objection or insufficient votes.

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-436%3A+Introduce+Catalog-related+Syntax
[2] https://lists.apache.org/thread/10k1bjb4sngyjwhmfqfky28lyoo7sv0z

Best regards,
Yubin


Re: [DISCUSS] FLIP-427: Disaggregated State Store

2024-03-19 Thread Hangxiang Yu
Hi everyone,

Thanks for your valuable feedback!

Our discussions have been going on for a while.
As a sub-FLIP of FLIP-423 which is nearing a consensus, I would like to
start a vote after 72 hours.

Please let me know if you have any concerns, thanks!

On Mon, Mar 11, 2024 at 11:48 AM Hangxiang Yu  wrote:

> Hi, Jeyhun.
>
> Thanks for the reply.
>
> Is this argument true for all workloads? Or does this argument also hold
> for workloads with many small files, which is quite a common case [1] ?
>
> Yes, I think so. The overhead should still be considered negligible,
> particularly in comparison to remote I/O, and other benefits of this
> proposal may be more significant than this one.
>
> Additionally, there is JNI overhead when Flink calls RocksDB methods
> currently. The frequency of these calls could surpass that of actual file
> system interface calls, given that not all state requests require accessing
> the file system.
>
> BTW, the small-files issue can also impact the performance of the DB on a
> local file system at runtime, so we usually resolve it first in production
> environments.
>
> the engine spawns huge amount of scan range requests to the
> file system to retrieve different parts of a file.
>
> Indeed, frequent requests to the remote file system can significantly
> affect performance. To address this, other FLIPs have introduced various
> strategies:
>
> 1. Local disk cache to minimize remote requests as described in FLIP-423
> which we will introduce in FLIP-429 as you mentioned. With effective cache
> utilization, the performance will not be inferior to the local strategy
> when cache hits.
>
> 2. Grouping remote access to decrease the number of remote I/O requests,
> as proposed in "FLIP-426: Grouping Remote State Access."
>
> 3. Parallel I/O to maximize network bandwidth usage, outlined in
> "FLIP-425: Asynchronous Execution Model."
>
> The PoC implements a simple file cache and asynchronous execution which
> improves the performance a lot. You could also refer to the PoC results in
> FLIP-423.
>
> On Mon, Mar 11, 2024 at 3:11 AM Jeyhun Karimov 
> wrote:
>
>> Hi Hangxiang,
>>
>> Thanks for the proposal. +1 for it.
>> I have a few comments.
>>
>> Proposal 2 has additional JNI overhead, but the overhead is relatively
>> > negligible when weighed against the latency of remote I/O.
>>
>> - Is this argument true for all workloads? Or does this argument also hold
>> for workloads with many small files, which is quite a common case [1] ?
>>
>> - Also, in many workloads the engine does not need the whole file, either
>> because the query forces it, or because the file type supports efficient
>> filtering (e.g. ORC, Parquet, Arrow files), or simply because one file is
>> "divided" among multiple workers.
>> In these cases, the engine spawns a huge amount of scan-range requests to
>> the file system to retrieve different parts of a file.
>> How the proposed solution would work with these workloads?
>>
>> - A similar question to the above also applies to caching (I know caching
>> is the subject of FLIP-429; I'm asking here because of the related section
>> in this FLIP).
>>
>> Regards,
>> Jeyhun
>>
>> [1] https://blog.min.io/challenge-big-data-small-files/
>>
>>
>>
>> On Thu, Mar 7, 2024 at 10:09 AM Hangxiang Yu  wrote:
>>
>> > Hi devs,
>> >
>> >
>> > I'd like to start a discussion on a sub-FLIP of FLIP-423: Disaggregated
>> > State Storage and Management[1], which is a joint work of Yuan Mei,
>> Zakelly
>> > Lan, Jinzhong Li, Hangxiang Yu, Yanfei Lei and Feng Wang:
>> >
>> > - FLIP-427: Disaggregated State Store
>> >
>> > This FLIP introduces the initial version of the ForSt disaggregated
>> state
>> > store.
>> >
>> > Please make sure you have read the FLIP-423[1] to know the whole story,
>> and
>> > we'll discuss the details of FLIP-427[2] under this mail. For the
>> > discussion of overall architecture or topics related with multiple
>> > sub-FLIPs, please post in the previous mail[3].
>> >
>> > Looking forward to hearing from you!
>> >
>> > [1] https://cwiki.apache.org/confluence/x/R4p3EQ
>> >
>> > [2] https://cwiki.apache.org/confluence/x/T4p3EQ
>> >
>> > [3] https://lists.apache.org/thread/ct8smn6g9y0b8730z7rp9zfpnwmj8vf0
>> >
>> >
>> > Best,
>> >
>> > Hangxiang.
>> >
>>
>
>
> --
> Best,
> Hangxiang.
>


-- 
Best,
Hangxiang.


[jira] [Created] (FLINK-34741) "get-started" Page for Flink CDC Chinese Documentation

2024-03-19 Thread LvYanquan (Jira)
LvYanquan created FLINK-34741:
-

 Summary: "get-started" Page for Flink CDC Chinese Documentation
 Key: FLINK-34741
 URL: https://issues.apache.org/jira/browse/FLINK-34741
 Project: Flink
  Issue Type: Sub-task
  Components: chinese-translation, Documentation, Flink CDC
Affects Versions: cdc-3.1.0
Reporter: LvYanquan
 Fix For: cdc-3.1.0


Translate 
[https://github.com/apache/flink-cdc/tree/master/docs/content/docs/get-started] 
pages into Chinese.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34740) "legacy-flink-cdc-sources" Pages for Flink CDC Chinese Documentation

2024-03-19 Thread LvYanquan (Jira)
LvYanquan created FLINK-34740:
-

 Summary: "legacy-flink-cdc-sources" Pages for Flink CDC Chinese 
Documentation
 Key: FLINK-34740
 URL: https://issues.apache.org/jira/browse/FLINK-34740
 Project: Flink
  Issue Type: Sub-task
  Components: chinese-translation, Documentation, Flink CDC
Affects Versions: cdc-3.1.0
Reporter: LvYanquan
 Fix For: cdc-3.1.0


Translate legacy-flink-cdc-sources pages of 
[https://github.com/apache/flink-cdc/tree/master/docs/content/docs/connectors/legacy-flink-cdc-sources]
 into Chinese.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34739) "Connectors" Page for Flink CDC Chinese Documentation

2024-03-19 Thread LvYanquan (Jira)
LvYanquan created FLINK-34739:
-

 Summary: "Connectors" Page for Flink CDC Chinese Documentation
 Key: FLINK-34739
 URL: https://issues.apache.org/jira/browse/FLINK-34739
 Project: Flink
  Issue Type: Sub-task
  Components: chinese-translation, Documentation, Flink CDC
Affects Versions: cdc-3.1.0
Reporter: LvYanquan
 Fix For: cdc-3.1.0


Translate pipeline connector pages 
[https://github.com/apache/flink-cdc/tree/master/docs/content/docs/connectors] 
into Chinese.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-426: Grouping Remote State Access

2024-03-19 Thread yue ma
Hi Jinzhong,

Thanks for the FLIP. I have the following questions:

1. Does Grouping Remote State Access only support asynchronous interfaces?
--If so: IIUC, MultiGet can also greatly improve performance for
synchronous access modes. Do we need to support it?
--If not, how can we distinguish between using Grouping State Access in
asynchronous and synchronous modes?
2. Could a simple example be added to the FLIP showing how to use batching
to access states and obtain the state results through the API?
3. I also agree with XiaoRui's viewpoint. Is there a corresponding config
option to control the state access batching strategy?

-- 
Best,
Yue


Re: [DISCUSS] FLIP-425: Asynchronous Execution Model

2024-03-19 Thread Yanfei Lei
Hi everyone,

Thanks for your valuable discussion and feedback!

Our discussions have been going on for a while and there have been no
new comments for several days. So I would like to start a vote after
72 hours.

Please let me know if you have any concerns, thanks!

Yanfei Lei  wrote on Wed, Mar 13, 2024 at 12:54:

>
> Hi Jing,
> Thanks for the reply and follow up.
>
> > What is the benefit for users to build a chain of mails instead of just one 
> > mail(it is still async)?
>
> Just to make sure we're on the same page, I try to paraphrase your question:
> A `then()` call will be encapsulated as a callback mail. Your question
> is whether we can call then() as little as possible to reduce the
> overhead of encapsulating it into a mail.
>
> In general, whether to call `then()` depends on the user's data
> dependencies. The operations in a chain of `then()` are strictly
> ordered.
>
>
>
> The following is an example without data dependencies, if written in
> the form of a `then` chain:
> stateA.update(1).then(stateB.update(2).then(stateC.update(3)));
>
> The execution order is:
> ```
> stateA update 1 -> stateB update 2-> stateC update 3
> ```
>
> If written in the form without `then()` call, they will be placed in a
> "mail/mailboxDefaultAction", and each state request will still be
> executed asynchronously:
> ```
> stateA.update(1);
> stateB.update(2);
> stateC.update(3);
> ```
>
> The order in which they are executed is undefined and may be:
> ```
> - stateA update 1 -> stateB update 2-> stateC update 3
> - stateB update 2 -> stateC update 3-> stateA update 1
> - stateC update 3 -> stateA update 1-> stateB update 2
> ...
> ```
> And the final results are "stateA = 1, stateB = 2, stateC = 3". In
> this case, the two ways of writing are equivalent.
>
>
>
> If there are data dependencies, for example:
> ```
> stateA.update(1).then(stateA.update(2))
> ```
>
> Then the execution order is:
> ```
> stateA update 1 -> stateA update 2
> ```
>
> If written in the form without `then()` call:
> ```
> stateA.update(1);
> stateA.update(2);
> ```
>
> The order in which they are executed is undefined and may be:
> ```
> - stateA update 1 -> stateA update 2
> - stateA update 2-> stateA update 1
> ```
> The final result may be "stateA = 1" *OR* "stateA = 2". In this case,
> the way without `then()` chain to limit the execution order, and the
> results may be wrong.
>
> In summary, how many mails are encapsulated depends on how the user
> writes the code, and how the user writes the code depends on their
> data dependencies. [1][2] may be helpful for asynchronous programming
> practice.
>
>
> > I was wondering if exceptions in the mail chain would have an impact on the 
> > reference counting?
>
> We will catch exceptions that can be handled, they don't have impacts
> on the reference counting.
> For exceptions that cannot be handled, we will directly fail the job.
>
> > Maybe a UT to cover all kinds of cases, i.e. happy paths and unhappy paths, 
> > would make sense.
>
> Nice suggestions, we will add a UT to cover those cases.
>
>
> [1] 
> https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CompletableFuture.html
> [2] https://www.codingjunkie.net/completable-futures-part1/
>
> Jing Ge  wrote on Wed, Mar 13, 2024 at 07:05:
> >
> > Hi Yanfei,
> >
> > Thanks for your clarification! Now I got a much clear picture and I am
> > still trying to understand your thoughts for some of those questions:
> >
> >
> > > How many mails are encapsulated depends on how the user writes the
> > > code. The statements in a `then()` will be wrapped into a mail.
> > > StateFuture is a restricted version of CompletableFuture, their basic
> > > semantics are consistent.
> > >
> >
> > Conceptually, users can write a chain of many async calls, i.e. many then()
> > calls. And all these calls for Record A must be executed in order, while
> > Record B should stay at the Blocking buffer. What is the benefit for users
> > to build a chain of mails instead of just one mail(it is still async)? Is
> > there any best practices or guidelines to teach/tell users when and how
> > many async calls in a chain could/should be built?
> >
> > > The challenge arises in determining when all the processing logic
> > associated with Record A is fully executed. To address this, we have
> > adopted a reference counting mechanism that tracks ongoing operations
> > (either processing input or executing a callback) related to a single
> > record.
> >
> > > We describe this in the "Error handling"[2] section. This FLIP also
> > > adopts the design from FLIP-368, ensuring that all state interfaces
> > > throw unchecked exceptions and, consequently, do not declare any
> > > exceptions in their signatures. In cases where an exception occurs
> > > while accessing the state, the job should fail.
> > >
> >
> > My question was not about how exceptions will be defined. I am not sure how
> > unchecked exceptions handling will be implemented. I was wondering if
> > exceptions in the mail chain 

Re: [DISCUSS] FLIP-424: Asynchronous State APIs

2024-03-19 Thread Zakelly Lan
Hi everyone,

Thanks for your valuable feedback!

The discussions were vibrant and have led to significant enhancements to
this FLIP. With this progress, I'm looking to initiate the voting in 72
hours.

Please let me know if you have any concerns, thanks!


Best,
Zakelly

On Tue, Mar 19, 2024 at 5:35 PM Zakelly Lan  wrote:

> Hi Yue,
>
> Thanks for your comments!
>
> 1. Is it possible for all `FutureUtils` in Flink to reuse the same util
>> class?
>
> Actually, the `FutureUtils` here is a new util class that will share the
> same package path with the `StateFuture`. Or I'd be fine renaming it
> 'StateFutureUtils'.
>
> 2. It seems that there is no concept of retry, timeout, or delay in your
>> async state api design . Do we need to provide such capabilities like
>> `orTimeout` 、`completeDelayed`?
>>
> For ease of use, we do not provide such APIs allowing users to customize
> the behavior on timeout or retry. We may introduce a retry mechanism in the
> framework enabled by configuration. And we will hide the 'complete' and
> related APIs of StateFuture from users, since the completion of these
> futures is totally managed by the execution framework.
>
>
> Best,
> Zakelly
>
>
>
> On Tue, Mar 19, 2024 at 5:20 PM yue ma  wrote:
>
>> Hi Zakelly,
>>
>> Thanks for your proposal. The FLIP looks good  to me +1! I'd like to ask
>> some minor questions
>> I found that there is also a definition of class `FutureUtils` under `org.
>> apache. flink. util. concurrent` which seems to offer more interfaces. My
>> question is:
>> 1. Is it possible for all `FutureUtils` in Flink to reuse the same util
>> class?
>> 2. It seems that there is no concept of retry, timeout, or delay in your
>> async state api design . Do we need to provide such capabilities like
>> `orTimeout` 、`completeDelayed`?
>>
>> Jing Ge  wrote on Wed, Mar 13, 2024 at 20:00:
>>
>> > indeed! I missed that part. Thanks for the hint!
>> >
>> > Best regards,
>> > Jing
>> >
>> > On Wed, Mar 13, 2024 at 6:02 AM Zakelly Lan 
>> wrote:
>> >
>> > > Hi Jing,
>> > >
>> > > The deprecation and removal of original APIs is beyond the scope of
>> > > current FLIP, but I do add/highlight such information under
>> > "Compatibility,
>> > > Deprecation, and Migration Plan" section.
>> > >
>> > >
>> > > Best,
>> > > Zakelly
>> > >
>> > > On Wed, Mar 13, 2024 at 9:18 AM Yunfeng Zhou <
>> > flink.zhouyunf...@gmail.com>
>> > > wrote:
>> > >
>> > >> Hi Zakelly,
>> > >>
>> > >> Thanks for your responses. I agree with it that we can keep the
>> design
>> > >> as it is for now and see if others have any better ideas for these
>> > >> questions.
>> > >>
>> > >> Best,
>> > >> Yunfeng
>> > >>
>> > >> On Tue, Mar 12, 2024 at 5:23 PM Zakelly Lan 
>> > >> wrote:
>> > >> >
>> > >> > Hi Xuannan,
>> > >> >
>> > >> > Thanks for your comments, I modified the FLIP accordingly.
>> > >> >
>> > >> > Hi Yunfeng,
>> > >> >
>> > >> > Thanks for sharing your opinions!
>> > >> >
>> > >> >> Could you provide some hint on use cases where users need to mix
>> sync
>> > >> >> and async state operations in spite of the performance regression?
>> > >> >> This information might help address our concerns on design. If the
>> > >> >> mixed usage is simply something not recommended, I would prefer to
>> > >> >> prohibit such usage from API.
>> > >> >
>> > >> > In fact, there is no scenario where users MUST use the sync APIs,
>> but
>> > >> it is much easier to use for those who are not familiar with
>> > asynchronous
>> > >> programming. If they want to migrate their job from Flink 1.x to 2.0
>> > >> leveraging some benefits from asynchronous APIs, they may try the
>> mixed
>> > >> usage. It is not user-friendly to directly throw exceptions at
>> runtime,
>> > I
>> > >> think our better approach is to warn users and recommend avoiding
>> this.
>> > I
>> > >> added an example in this FLIP.
>> > >> >
>> > >> > Well, I do not insist on allowing mixed usage of APIs if others
>> reach
>> > >> an agreement that we won't support that . I think the most important
>> is
>> > to
>> > >> keep the API easy to use and understand, thus I propose a unified
>> state
>> > >> declaration and explicit meaning in method name. WDYT?
>> > >> >
>> > >> >> Sorry I missed the new sink API. I do still think that it would be
>> > >> >> better to make the package name more informative, and ".v2." does
>> not
>> > >> >> contain information for new Flink users who did not know the v1 of
>> > >> >> state API. Unlike internal implementation and performance
>> > >> >> optimization, API will hardly be compromised for now and updated
>> in
>> > >> >> future, so I still suggest we improve the package name now if
>> > >> >> possible. But given the existing practice of sink v2 and
>> > >> >> AbstractStreamOperatorV2, the current package name would be
>> > acceptable
>> > >> >> to me if other reviewers of this FLIP agrees on it.
>> > >> >
>> > >> > Actually, I don't like 'v2' either. So if there is another good
>> name,
>> > >> I'd be happy to apply. This is a 

[jira] [Created] (FLINK-34738) "Deployment - YARN" Page for Flink CDC Chinese Documentation

2024-03-19 Thread LvYanquan (Jira)
LvYanquan created FLINK-34738:
-

 Summary: "Deployment - YARN" Page for Flink CDC Chinese 
Documentation
 Key: FLINK-34738
 URL: https://issues.apache.org/jira/browse/FLINK-34738
 Project: Flink
  Issue Type: Sub-task
  Components: chinese-translation, Documentation, Flink CDC
Affects Versions: 3.1.0
Reporter: LvYanquan
 Fix For: 3.1.0


Translate 
[https://github.com/apache/flink-cdc/blob/master/docs/content/docs/deployment/yarn.md]
 into Chinese.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-423 ~FLIP-428: Introduce Disaggregated State Storage and Management in Flink 2.0

2024-03-19 Thread Zakelly Lan
Hi everyone,

Thanks for your valuable feedback!

Our discussions have been going on for a while and are nearing a
consensus. So I would like to start a vote after 72 hours.

Please let me know if you have any concerns, thanks!


Best,
Zakelly

On Tue, Mar 19, 2024 at 3:37 PM Zakelly Lan  wrote:

> Hi Yunfeng,
>
> Thanks for the suggestion!
>
> I will reorganize the FLIP-425 accordingly.
>
>
> Best,
> Zakelly
>
> On Tue, Mar 19, 2024 at 3:20 PM Yunfeng Zhou 
> wrote:
>
>> Hi Xintong and Zakelly,
>>
>> > 2. Regarding Strictly-ordered and Out-of-order of Watermarks
>> I agree with it that watermarks can use only out-of-order mode for
>> now, because there is still not a concrete example showing the
>> correctness risk about it. However, the strictly-ordered mode should
>> still be supported as the default option for non-record event types
>> other than watermark, at least for checkpoint barriers.
>>
>> I noticed that this information has already been documented in "For
>> other non-record events, such as RecordAttributes ...", but it's at
>> the bottom of the "Watermark" section, which might not be very
>> obvious. Thus it might be better to reorganize the FLIP to better
>> claim that the two order modes are designed for all non-record events,
>> and which mode this FLIP would choose for each type of event.
>>
>> Best,
>> Yunfeng
>>
>> On Tue, Mar 19, 2024 at 1:09 PM Xintong Song 
>> wrote:
>> >
>> > Thanks for the quick response. Sounds good to me.
>> >
>> > Best,
>> >
>> > Xintong
>> >
>> >
>> >
>> > On Tue, Mar 19, 2024 at 1:03 PM Zakelly Lan 
>> wrote:
>> >
>> > > Hi Xintong,
>> > >
>> > > Thanks for sharing your thoughts!
>> > >
>> > > 1. Regarding Record-ordered and State-ordered of processElement.
>> > > >
>> > > > I understand that while State-ordered likely provides better
>> performance,
>> > > > Record-ordered is sometimes required for correctness. The question
>> is how
>> > > > should a user choose between these two modes? My concern is that
>> such a
>> > > > decision may require users to have in-depth knowledge about the
>> Flink
>> > > > internals, and may lead to correctness issues if State-ordered is
>> chosen
>> > > > improperly.
>> > > >
>> > > > I'd suggest not to expose such a knob, at least in the first
>> version.
>> > > That
>> > > > means always use Record-ordered for custom operators / UDFs, and
>> keep
>> > > > State-ordered for internal usages (built-in operators) only.
>> > > >
>> > >
>> > > Indeed, users may not be able to choose the mode properly. I agree to
>> keep
>> > > such options for internal use.
>> > >
>> > >
>> > > 2. Regarding Strictly-ordered and Out-of-order of Watermarks.
>> > > >
>> > > > I'm not entirely sure about Strictly-ordered being the default, or
>> even
>> > > > being supported. From my understanding, a Watermark(T) only
>> suggests that
>> > > > all records with event time before T has arrived, and it has
>> nothing to
>> > > do
>> > > > with whether records with event time after T has arrived or not.
>> From
>> > > that
>> > > > perspective, preventing certain records from arriving before a
>> Watermark
>> > > is
>> > > > never supported. I also cannot come up with any use case where
>> > > > Strictly-ordered is necessary. This implies the same issue as 1):
>> how
>> > > does
>> > > > the user choose between the two modes?
>> > > >
>> > > > I'd suggest not expose the knob to users and only support
>> Out-of-order,
>> > > > until we see a concrete use case that Strictly-ordered is needed.
>> > > >
>> > >
>> > > The semantics of watermarks do not define the sequence between a
>> watermark
>> > > and subsequent records. For the most part, this is inconsequential,
>> except
>> > > it may affect some current users who have previously relied on the
>> implicit
>> > > assumption of an ordered execution. I'd be fine with initially
>> supporting
>> > > only out-of-order processing. We may consider exposing the
>> > > 'Strictly-ordered' mode once we encounter a concrete use case that
>> > > necessitates it.
>> > >
>> > >
>> > > My philosophies behind not exposing the two config options are:
>> > > > - There are already too many options in Flink that barely know how
>> to use
>> > > > them. I think Flink should try as much as possible to decide its own
>> > > > behavior, rather than throwing all the decisions to the users.
>> > > > - It's much harder to take back knobs than to introduce them.
>> Therefore,
>> > > > options should be introduced only if concrete use cases are
>> identified.
>> > > >
>> > >
>> > > I agree to keep minimal configurable items especially for the MVP.
>> Given
>> > > that we have the opportunity to refine the functionality before the
>> > > framework transitions from @Experimental to @PublicEvolving, it makes
>> sense
>> > > to refrain from presenting user-facing options until we have ensured
>> > > their necessity.
>> > >
>> > >
>> > > Best,
>> > > Zakelly
>> > >
>> > > On Tue, Mar 19, 2024 at 12:06 PM Xintong Song 
>> > > wrote:
>> > 

[jira] [Created] (FLINK-34737) "Deployment - Kubernetes" Page for Flink CDC Chinese Documentation

2024-03-19 Thread LvYanquan (Jira)
LvYanquan created FLINK-34737:
-

 Summary: "Deployment - Kubernetes" Page for Flink CDC Chinese 
Documentation
 Key: FLINK-34737
 URL: https://issues.apache.org/jira/browse/FLINK-34737
 Project: Flink
  Issue Type: Sub-task
  Components: chinese-translation, Documentation, Flink CDC
Affects Versions: 3.1.0
Reporter: LvYanquan
 Fix For: 3.1.0


Translate 
[https://github.com/apache/flink-cdc/blob/master/docs/content/docs/deployment/kubernetes.md]
 into Chinese.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34736) "Deployment - Standalone" Page for Flink CDC Chinese Documentation

2024-03-19 Thread LvYanquan (Jira)
LvYanquan created FLINK-34736:
-

 Summary: "Deployment - Standalone" Page for Flink CDC Chinese 
Documentation
 Key: FLINK-34736
 URL: https://issues.apache.org/jira/browse/FLINK-34736
 Project: Flink
  Issue Type: Sub-task
  Components: chinese-translation, Documentation, Flink CDC
Affects Versions: 3.1.0
Reporter: LvYanquan
 Fix For: 3.1.0


Translate 
[https://github.com/apache/flink-cdc/blob/master/docs/content/docs/deployment/standalone.md]
 into Chinese.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34735) "Developer Guide - Understanding Flink CDC API" Page for Flink CDC Chinese Documentation

2024-03-19 Thread LvYanquan (Jira)
LvYanquan created FLINK-34735:
-

 Summary: "Developer Guide - Understanding Flink CDC API" Page for 
Flink CDC Chinese Documentation
 Key: FLINK-34735
 URL: https://issues.apache.org/jira/browse/FLINK-34735
 Project: Flink
  Issue Type: Sub-task
  Components: chinese-translation, Documentation, Flink CDC
Affects Versions: 3.1.0
Reporter: LvYanquan
 Fix For: 3.1.0


Translate 
[https://github.com/apache/flink-cdc/blob/master/docs/content/docs/developer-guide/understand-flink-cdc-api.md]
 into Chinese.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Planning Flink 1.20

2024-03-19 Thread Rui Fan
Hi Weijie,

Thanks for kicking off 1.20! I'd like to join you and participate in the
1.20 release.

Best,
Rui

On Tue, Mar 19, 2024 at 5:30 PM weijie guo 
wrote:

> Hi everyone,
>
> With the release announcement of Flink 1.19, it's a good time to kick off
> discussion of the next release 1.20.
>
>
> - Release managers
>
>
> I'd like to volunteer as one of the release managers this time. It has been
> good practice to have a team of release managers from different
> backgrounds, so please raise you hand if you'd like to volunteer and get
> involved.
>
>
>
> - Timeline
>
>
> Flink 1.19 has been released. With a target release cycle of 4 months,
> we propose a feature freeze date of *June 15, 2024*.
>
>
>
> - Collecting features
>
>
> As usual, we've created a wiki page[1] for collecting new features in 1.20.
>
>
> In addition, we already have a number of FLIPs that have been voted or are
> in the process, including pre-works for version 2.0.
>
>
> In the meantime, the release management team will be finalized in the next
> few days, and we'll continue to create Jira Boards and Sync meetings
> to make it easy
> for everyone to get an overview and track progress.
>
>
>
> Best regards,
>
> Weijie
>
>
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/1.20+Release
>


Re: [DISCUSS] FLIP-424: Asynchronous State APIs

2024-03-19 Thread Zakelly Lan
Hi Yue,

Thanks for your comments!

1. Is it possible for all `FutureUtils` in Flink to reuse the same util
> class?

Actually, the `FutureUtils` here is a new util class that will share the
same package path with the `StateFuture`. Or I'd be fine renaming it
'StateFutureUtils'.

2. It seems that there is no concept of retry, timeout, or delay in your
> async state api design . Do we need to provide such capabilities like
> `orTimeout` 、`completeDelayed`?
>
For ease of use, we do not provide such APIs allowing users to customize
the behavior on timeout or retry. We may introduce a retry mechanism in the
framework enabled by configuration. And we will hide the 'complete' and
related APIs of StateFuture from users, since the completion of these
futures is totally managed by the execution framework.
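
For illustration, a rough sketch of how composing state futures might look
from the user's side (class and method names such as StateFutureUtils,
combineAll and asyncValue, as well as the states and the collector used here,
are assumptions based on this thread, not the final API):

```
// Rough sketch only -- StateFutureUtils.combineAll and asyncValue are assumed
// names from this discussion; countState, sumState and out are placeholders.
StateFuture<Long> countFuture = countState.asyncValue();
StateFuture<Long> sumFuture = sumState.asyncValue();

StateFutureUtils.combineAll(Arrays.asList(countFuture, sumFuture))
        .thenAccept(values -> {
            long count = values.get(0);
            long sum = values.get(1);
            out.collect(count == 0 ? 0.0 : (double) sum / count);
        });
```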


Best,
Zakelly



On Tue, Mar 19, 2024 at 5:20 PM yue ma  wrote:

> Hi Zakelly,
>
> Thanks for your proposal. The FLIP looks good  to me +1! I'd like to ask
> some minor questions
> I found that there is also a definition of class `FutureUtils` under `org.
> apache. flink. util. concurrent` which seems to offer more interfaces. My
> question is:
> 1. Is it possible for all `FutureUtils` in Flink to reuse the same util
> class?
> 2. It seems that there is no concept of retry, timeout, or delay in your
> async state api design . Do we need to provide such capabilities like
> `orTimeout` 、`completeDelayed`?
>
> Jing Ge  wrote on Wed, Mar 13, 2024 at 20:00:
>
> > indeed! I missed that part. Thanks for the hint!
> >
> > Best regards,
> > Jing
> >
> > On Wed, Mar 13, 2024 at 6:02 AM Zakelly Lan 
> wrote:
> >
> > > Hi Jing,
> > >
> > > The deprecation and removal of original APIs is beyond the scope of
> > > current FLIP, but I do add/highlight such information under
> > "Compatibility,
> > > Deprecation, and Migration Plan" section.
> > >
> > >
> > > Best,
> > > Zakelly
> > >
> > > On Wed, Mar 13, 2024 at 9:18 AM Yunfeng Zhou <
> > flink.zhouyunf...@gmail.com>
> > > wrote:
> > >
> > >> Hi Zakelly,
> > >>
> > >> Thanks for your responses. I agree with it that we can keep the design
> > >> as it is for now and see if others have any better ideas for these
> > >> questions.
> > >>
> > >> Best,
> > >> Yunfeng
> > >>
> > >> On Tue, Mar 12, 2024 at 5:23 PM Zakelly Lan 
> > >> wrote:
> > >> >
> > >> > Hi Xuannan,
> > >> >
> > >> > Thanks for your comments, I modified the FLIP accordingly.
> > >> >
> > >> > Hi Yunfeng,
> > >> >
> > >> > Thanks for sharing your opinions!
> > >> >
> > >> >> Could you provide some hint on use cases where users need to mix
> sync
> > >> >> and async state operations in spite of the performance regression?
> > >> >> This information might help address our concerns on design. If the
> > >> >> mixed usage is simply something not recommended, I would prefer to
> > >> >> prohibit such usage from API.
> > >> >
> > >> > In fact, there is no scenario where users MUST use the sync APIs,
> but
> > >> it is much easier to use for those who are not familiar with
> > asynchronous
> > >> programming. If they want to migrate their job from Flink 1.x to 2.0
> > >> leveraging some benefits from asynchronous APIs, they may try the
> mixed
> > >> usage. It is not user-friendly to directly throw exceptions at
> runtime,
> > I
> > >> think our better approach is to warn users and recommend avoiding
> this.
> > I
> > >> added an example in this FLIP.
> > >> >
> > >> > Well, I do not insist on allowing mixed usage of APIs if others
> reach
> > >> an agreement that we won't support that . I think the most important
> is
> > to
> > >> keep the API easy to use and understand, thus I propose a unified
> state
> > >> declaration and explicit meaning in method name. WDYT?
> > >> >
> > >> >> Sorry I missed the new sink API. I do still think that it would be
> > >> >> better to make the package name more informative, and ".v2." does
> not
> > >> >> contain information for new Flink users who did not know the v1 of
> > >> >> state API. Unlike internal implementation and performance
> > >> >> optimization, API will hardly be compromised for now and updated in
> > >> >> future, so I still suggest we improve the package name now if
> > >> >> possible. But given the existing practice of sink v2 and
> > >> >> AbstractStreamOperatorV2, the current package name would be
> > acceptable
> > >> >> to me if other reviewers of this FLIP agrees on it.
> > >> >
> > >> > Actually, I don't like 'v2' either. So if there is another good
> name,
> > >> I'd be happy to apply. This is a compromise to the current situation.
> > Maybe
> > >> we could refine this after the retirement of original state APIs.
> > >> >
> > >> >
> > >> > Thanks & Best,
> > >> > Zakelly
> > >> >
> > >> >
> > >> > On Tue, Mar 12, 2024 at 1:42 PM Yunfeng Zhou <
> > >> flink.zhouyunf...@gmail.com> wrote:
> > >> >>
> > >> >> Hi Zakelly,
> > >> >>
> > >> >> Thanks for the quick response!
> > >> >>
> > >> >> > Actually splitting APIs into two sets ... warn them in runtime.
> > >> >>
> > >> >> 

[DISCUSS] Planning Flink 1.20

2024-03-19 Thread weijie guo
Hi everyone,

With the release announcement of Flink 1.19, it's a good time to kick off
discussion of the next release 1.20.


- Release managers


I'd like to volunteer as one of the release managers this time. It has been
good practice to have a team of release managers from different
backgrounds, so please raise your hand if you'd like to volunteer and get
involved.



- Timeline


Flink 1.19 has been released. With a target release cycle of 4 months,
we propose a feature freeze date of *June 15, 2024*.



- Collecting features


As usual, we've created a wiki page[1] for collecting new features in 1.20.


In addition, we already have a number of FLIPs that have been voted or are
in the process, including pre-works for version 2.0.


In the meantime, the release management team will be finalized in the next
few days, and we'll continue to create Jira Boards and Sync meetings
to make it easy
for everyone to get an overview and track progress.



Best regards,

Weijie



[1] https://cwiki.apache.org/confluence/display/FLINK/1.20+Release


Re: [DISCUSS] FLIP-424: Asynchronous State APIs

2024-03-19 Thread yue ma
Hi Zakelly,

Thanks for your proposal. The FLIP looks good to me, +1! I'd like to ask
some minor questions.
I found that there is also a definition of a `FutureUtils` class under
`org.apache.flink.util.concurrent` which seems to offer more interfaces. My
questions are:
1. Is it possible for all `FutureUtils` in Flink to reuse the same util
class?
2. It seems that there is no concept of retry, timeout, or delay in your
async state API design. Do we need to provide such capabilities like
`orTimeout` and `completeDelayed`?

Jing Ge  wrote on Wed, Mar 13, 2024 at 20:00:

> indeed! I missed that part. Thanks for the hint!
>
> Best regards,
> Jing
>
> On Wed, Mar 13, 2024 at 6:02 AM Zakelly Lan  wrote:
>
> > Hi Jing,
> >
> > The deprecation and removal of original APIs is beyond the scope of
> > current FLIP, but I do add/highlight such information under
> "Compatibility,
> > Deprecation, and Migration Plan" section.
> >
> >
> > Best,
> > Zakelly
> >
> > On Wed, Mar 13, 2024 at 9:18 AM Yunfeng Zhou <
> flink.zhouyunf...@gmail.com>
> > wrote:
> >
> >> Hi Zakelly,
> >>
> >> Thanks for your responses. I agree with it that we can keep the design
> >> as it is for now and see if others have any better ideas for these
> >> questions.
> >>
> >> Best,
> >> Yunfeng
> >>
> >> On Tue, Mar 12, 2024 at 5:23 PM Zakelly Lan 
> >> wrote:
> >> >
> >> > Hi Xuannan,
> >> >
> >> > Thanks for your comments, I modified the FLIP accordingly.
> >> >
> >> > Hi Yunfeng,
> >> >
> >> > Thanks for sharing your opinions!
> >> >
> >> >> Could you provide some hint on use cases where users need to mix sync
> >> >> and async state operations in spite of the performance regression?
> >> >> This information might help address our concerns on design. If the
> >> >> mixed usage is simply something not recommended, I would prefer to
> >> >> prohibit such usage from API.
> >> >
> >> > In fact, there is no scenario where users MUST use the sync APIs, but
> >> it is much easier to use for those who are not familiar with
> asynchronous
> >> programming. If they want to migrate their job from Flink 1.x to 2.0
> >> leveraging some benefits from asynchronous APIs, they may try the mixed
> >> usage. It is not user-friendly to directly throw exceptions at runtime,
> I
> >> think our better approach is to warn users and recommend avoiding this.
> I
> >> added an example in this FLIP.
> >> >
> >> > Well, I do not insist on allowing mixed usage of APIs if others reach
> >> an agreement that we won't support that . I think the most important is
> to
> >> keep the API easy to use and understand, thus I propose a unified state
> >> declaration and explicit meaning in method name. WDYT?
> >> >
> >> >> Sorry I missed the new sink API. I do still think that it would be
> >> >> better to make the package name more informative, and ".v2." does not
> >> >> contain information for new Flink users who did not know the v1 of
> >> >> state API. Unlike internal implementation and performance
> >> >> optimization, API will hardly be compromised for now and updated in
> >> >> future, so I still suggest we improve the package name now if
> >> >> possible. But given the existing practice of sink v2 and
> >> >> AbstractStreamOperatorV2, the current package name would be
> acceptable
> >> >> to me if other reviewers of this FLIP agrees on it.
> >> >
> >> > Actually, I don't like 'v2' either. So if there is another good name,
> >> I'd be happy to apply. This is a compromise to the current situation.
> Maybe
> >> we could refine this after the retirement of original state APIs.
> >> >
> >> >
> >> > Thanks & Best,
> >> > Zakelly
> >> >
> >> >
> >> > On Tue, Mar 12, 2024 at 1:42 PM Yunfeng Zhou <
> >> flink.zhouyunf...@gmail.com> wrote:
> >> >>
> >> >> Hi Zakelly,
> >> >>
> >> >> Thanks for the quick response!
> >> >>
> >> >> > Actually splitting APIs into two sets ... warn them in runtime.
> >> >>
> >> >> Could you provide some hint on use cases where users need to mix sync
> >> >> and async state operations in spite of the performance regression?
> >> >> This information might help address our concerns on design. If the
> >> >> mixed usage is simply something not recommended, I would prefer to
> >> >> prohibit such usage from API.
> >> >>
> >> >> > In fact ... .sink2`.
> >> >>
> >> >> Sorry I missed the new sink API. I do still think that it would be
> >> >> better to make the package name more informative, and ".v2." does not
> >> >> contain information for new Flink users who did not know the v1 of
> >> >> state API. Unlike internal implementation and performance
> >> >> optimization, API will hardly be compromised for now and updated in
> >> >> future, so I still suggest we improve the package name now if
> >> >> possible. But given the existing practice of sink v2 and
> >> >> AbstractStreamOperatorV2, the current package name would be
> acceptable
> >> >> to me if other reviewers of this FLIP agrees on it.
> >> >>
> >> >> Best,
> >> >> Yunfeng
> >> >>
> >> >> On Mon, Mar 11, 2024 at 5:27 PM 

[jira] [Created] (FLINK-34734) Update the titles for Chinese Documents.

2024-03-19 Thread LvYanquan (Jira)
LvYanquan created FLINK-34734:
-

 Summary: Update the titles for Chinese Documents.
 Key: FLINK-34734
 URL: https://issues.apache.org/jira/browse/FLINK-34734
 Project: Flink
  Issue Type: Sub-task
  Components: chinese-translation, Documentation, Flink CDC
Affects Versions: 3.1.0
Reporter: LvYanquan
 Fix For: 3.1.0


The titles are used to build directory and document names.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34733) OSS Filesystem throws ClassNotFoundException

2024-03-19 Thread zhuoshaojian (Jira)
zhuoshaojian created FLINK-34733:


 Summary: OSS Filesystem throws ClassNotFoundException
 Key: FLINK-34733
 URL: https://issues.apache.org/jira/browse/FLINK-34733
 Project: Flink
  Issue Type: Bug
  Components: FileSystems
Affects Versions: 1.17.2
 Environment: Flink >= v1.17
Reporter: zhuoshaojian


The ClassNotFoundException was caused by commit 
[https://github.com/apache/flink/commit/52a2b98bb5af842633df0c051b5da95d437a6b2f], 
which removed the relocation configuration from pom.xml (see 
[FLINK-31612|https://issues.apache.org/jira/browse/FLINK-31612]).

However, the flink-oss-fs-hadoop plugin still hardcodes the shaded prefix: 
[https://github.com/apache/flink/blob/c0027e5777f9d77970fdb99bcc158d65ea48d514/flink-filesystems/flink-oss-fs-hadoop/src/main/java/org/apache/flink/fs/osshadoop/OSSFileSystemFactory.java#L50]

This results in the following exception:

```

Caused by: java.lang.ClassNotFoundException: 
org.apache.flink.fs.osshadoop.shaded.com.aliyun.oss.common.auth.EnvironmentVariableCredentialsProvider
    at java.net.URLClassLoader.findClass(Unknown Source) ~[?:?]
    at java.lang.ClassLoader.loadClass(Unknown Source) ~[?:?]
    at 
org.apache.flink.core.classloading.ComponentClassLoader.loadClassFromComponentOnly(ComponentClassLoader.java:150)
 ~[flink-dist-1.17.2.jar:1.17.2]
    at 
org.apache.flink.core.classloading.ComponentClassLoader.loadClassFromOwnerFirst(ComponentClassLoader.java:172)
 ~[flink-dist-1.17.2.jar:1.17.2]
    at 
org.apache.flink.core.classloading.ComponentClassLoader.loadClass(ComponentClassLoader.java:107)
 ~[flink-dist-1.17.2.jar:1.17.2]
    at java.lang.ClassLoader.loadClass(Unknown Source) ~[?:?]
    at java.lang.Class.forName0(Native Method) ~[?:?]
    at java.lang.Class.forName(Unknown Source) ~[?:?]
    at 
org.apache.hadoop.fs.aliyun.oss.AliyunOSSUtils.getCredentialsProvider(AliyunOSSUtils.java:118)
 ~[?:?]
    at 
org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystemStore.initialize(AliyunOSSFileSystemStore.java:155)
 ~[?:?]
    at 
org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem.initialize(AliyunOSSFileSystem.java:349)
 ~[?:?]
    at 
org.apache.flink.fs.osshadoop.OSSFileSystemFactory.create(OSSFileSystemFactory.java:103)
 ~[?:?]
    at 
org.apache.flink.core.fs.PluginFileSystemFactory.create(PluginFileSystemFactory.java:62)
 ~[flink-dist-1.17.2.jar:1.17.2]
    at 
org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:508) 
~[flink-dist-1.17.2.jar:1.17.2]
    at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:409) 
~[flink-dist-1.17.2.jar:1.17.2]
    at org.apache.flink.core.fs.Path.getFileSystem(Path.java:274) 
~[flink-dist-1.17.2.jar:1.17.2]
    at 
org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:99)
 ~[flink-dist-1.17.2.jar:1.17.2]
    at 
org.apache.flink.runtime.blob.BlobUtils.createBlobStoreFromConfig(BlobUtils.java:86)
 ~[flink-dist-1.17.2.jar:1.17.2]
    at 
org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory.createHAServices(KubernetesHaServicesFactory.java:41)
 ~[flink-dist-1.17.2.jar:1.17.2]
    at 
org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createCustomHAServices(HighAvailabilityServicesUtils.java:296)
 ~[flink-dist-1.17.2.jar:1.17.2]
    at 
org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:139)
 ~[flink-dist-1.17.2.jar:1.17.2]
    at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createHaServices(ClusterEntrypoint.java:442)
 ~[flink-dist-1.17.2.jar:1.17.2]
    at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:391)
 ~[flink-dist-1.17.2.jar:1.17.2]
    at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:282)
 ~[flink-dist-1.17.2.jar:1.17.2]
    at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:232)
 ~[flink-dist-1.17.2.jar:1.17.2]
    at 
org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
 ~[flink-dist-1.17.2.jar:1.17.2]
    at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:229)
 ~[flink-dist-1.17.2.jar:1.17.2]
    ... 2 more

```
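
For illustration, a minimal standalone sketch of the failure pattern described
above (class and constant names here are simplified placeholders, not the
actual Flink source):

```
// Minimal sketch: prepending a relocation prefix that the build no longer
// applies makes Class.forName() fail, because no class with the prefixed
// name exists on the classpath.
public class ShadedPrefixSketch {

    // Placeholder mirroring the hardcoded prefix described in this issue.
    private static final String SHADE_PREFIX = "org.apache.flink.fs.osshadoop.shaded.";

    public static void main(String[] args) {
        String provider = "com.aliyun.oss.common.auth.EnvironmentVariableCredentialsProvider";
        try {
            Class.forName(SHADE_PREFIX + provider);
        } catch (ClassNotFoundException e) {
            System.out.println("Lookup with shaded prefix fails: " + e.getMessage());
        }
    }
}
```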



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34732) Add document dead link check for Flink CDC Documentation

2024-03-19 Thread Zhongqiang Gong (Jira)
Zhongqiang Gong created FLINK-34732:
---

 Summary: Add document dead link check for Flink CDC Documentation
 Key: FLINK-34732
 URL: https://issues.apache.org/jira/browse/FLINK-34732
 Project: Flink
  Issue Type: Sub-task
  Components: Flink CDC
Reporter: Zhongqiang Gong


Add CI to check for dead links in the Flink CDC documentation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34731) Remove SpeculativeScheduler and incorporate its features into AdaptiveBatchScheduler

2024-03-19 Thread Junrui Li (Jira)
Junrui Li created FLINK-34731:
-

 Summary: Remove SpeculativeScheduler and incorporate its features 
into AdaptiveBatchScheduler
 Key: FLINK-34731
 URL: https://issues.apache.org/jira/browse/FLINK-34731
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / Coordination
Reporter: Junrui Li
 Fix For: 1.20.0


Presently, speculative execution is exposed to users as a feature of the 
AdaptiveBatchScheduler.

To streamline our codebase and reduce maintenance overhead, this ticket will 
consolidate the SpeculativeScheduler into the AdaptiveBatchScheduler, 
eliminating the need for a separate SpeculativeScheduler class.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34730) "Deployment" Page for Flink CDC Chinese Documentation

2024-03-19 Thread LvYanquan (Jira)
LvYanquan created FLINK-34730:
-

 Summary: "Deployment" Page for Flink CDC Chinese Documentation
 Key: FLINK-34730
 URL: https://issues.apache.org/jira/browse/FLINK-34730
 Project: Flink
  Issue Type: Sub-task
  Components: chinese-translation, Documentation, Flink CDC
Reporter: LvYanquan


Translate "Deployment" Page into Chinese.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34729) "Core Concept" Pages for Flink CDC Chinese Documentation

2024-03-19 Thread LvYanquan (Jira)
LvYanquan created FLINK-34729:
-

 Summary: "Core Concept" Pages for Flink CDC Chinese Documentation
 Key: FLINK-34729
 URL: https://issues.apache.org/jira/browse/FLINK-34729
 Project: Flink
  Issue Type: Sub-task
  Components: chinese-translation, Documentation, Flink CDC
Reporter: LvYanquan


Translate "Core Concept" Pages into Chinese.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34728) operator does not need to upload and download the jar when deploying session job

2024-03-19 Thread Fei Feng (Jira)
Fei Feng created FLINK-34728:


 Summary: operator does not need to upload and download the jar 
when deploying session job
 Key: FLINK-34728
 URL: https://issues.apache.org/jira/browse/FLINK-34728
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / Kubernetes, Kubernetes Operator
Affects Versions: kubernetes-operator-1.7.0, kubernetes-operator-1.6.0, 
kubernetes-operator-1.5.0
Reporter: Fei Feng


Reading the source code of a session job's first reconciliation in the session 
mode of the Flink Kubernetes operator reveals a clear single-point bottleneck. 
When submitting a session job, the operator first has to [download the job jar 
from the 
jarURL|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/service/AbstractFlinkService.java#L827]
 to the local storage of the Kubernetes pod, then [upload the jar to the 
JobManager through the `/jars/upload` REST API 
|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/service/AbstractFlinkService.java#L842],
 and finally call the `/jars/:jarid/run` REST API to launch the job.

In this process the operator must first download the jar and then upload it 
again. When multiple jobs are submitted to the session cluster simultaneously, 
the operator can become a bottleneck, limited by the network traffic and other 
resource constraints of the operator pod.

We can change the job submission process in session mode: the JobManager could 
provide a `/jars/run` REST API that downloads the job jar itself, so the 
operator only needs to send a REST request to submit the job, without 
downloading and uploading the jar. In this way, the submission load is 
distributed across the JobManagers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34727) RestClusterClient.requestJobResult throw ConnectionClosedException when the accumulator data is large

2024-03-19 Thread Wancheng Xiao (Jira)
Wancheng Xiao created FLINK-34727:
-

 Summary: RestClusterClient.requestJobResult throw 
ConnectionClosedException when the accumulator data is large
 Key: FLINK-34727
 URL: https://issues.apache.org/jira/browse/FLINK-34727
 Project: Flink
  Issue Type: Bug
  Components: Runtime / REST
Affects Versions: 1.19.0, 1.16.2
 Environment: I added some logs to trace this bug: !AbstractHandler.png!

!AbstractRestHandler.png!

!MiniDispatcher.png!

!RestServerEndpoint.png!

then i got:

/*then call requestJobResult and shutDownFuture.complete; (close channel when 
request is deregistered)*/
2024-03-17 18:01:34.788 [flink-akka.actor.default-dispatcher-20] INFO  
o.a.flink.runtime.rest.handler.job.JobExecutionResultHandler  - 
JobExecutionResultHandler gateway.requestJobStatus complete. 
[jobStatus=FINISHED]
/*submit sendResponse*/
2024-03-17 18:01:34.821 [flink-akka.actor.default-dispatcher-20] INFO  
o.a.flink.runtime.rest.handler.job.JobExecutionResultHandler  - submit 
HandlerUtils.sendResponse().
/*thenAccept(sendResponse()) is complete and will call inFlightRequestTracker, but 
sendResponse's returned future is not yet completed */
2024-03-17 18:01:34.821 [flink-akka.actor.default-dispatcher-20] INFO  
o.a.flink.runtime.rest.handler.job.JobExecutionResultHandler  - 
requestProcessingFuture complete. 
[requestProcessingFuture=java.util.concurrent.CompletableFuture@1329aca5[Completed
 normally]]
/*sendResponse's write task is still running*/
2024-03-17 18:01:34.822 [flink-rest-server-netty-worker-thread-10] INFO  
o.a.f.s.netty4.io.netty.handler.stream.ChunkedWriteHandler  - write
/*deregister request and then shut down, then channel close*/
2024-03-17 18:01:34.826 [flink-akka.actor.default-dispatcher-20] INFO  
o.a.flink.runtime.rest.handler.job.JobExecutionResultHandler  - call 
inFlightRequestTracker.deregisterRequest() done
2024-03-17 18:01:34.827 [flink-rest-server-netty-worker-thread-10] INFO  
o.a.f.shaded.netty4.io.netty.channel.DefaultChannelPipeline  - pipeline close.
2024-03-17 18:01:34.827 [flink-rest-server-netty-worker-thread-10] INFO  
org.apache.flink.runtime.rest.handler.util.HandlerUtils  - lastContentFuture 
complete. [future=DefaultChannelPromise@621f03ea(failure: 
java.nio.channels.ClosedChannelException)]

 

 

more details in flink_bug_complex.log
Reporter: Wancheng Xiao
 Attachments: AbstractHandler.png, AbstractRestHandler.png, 
MiniDispatcher.png, RestServerEndpoint.png, flink_bug_complex.log, 
flink_bug_simple.log

The job succeeded, but "RestClusterClient.requestJobResult()" encountered a
ConnectionClosedException (channel became inactive).
After debugging, the problem appears to be on the Flink server side in
"AbstractRestHandler.respondToRequest()": the "response.thenAccept(resp ->
HandlerUtils.sendResponse())" call does not propagate the future returned by sendResponse, so
the server shutdown can proceed before the response has been fully sent. I suspect that
"thenAccept()" needs to be replaced with "thenCompose()".

The details are as follows:

 

*Pseudocode:*

1 . AbstractHandler.responseAsLeader(){
2 .   inFlightRequestTracker.registerRequest()
3 .   CompletableFuture requestProcessingFuture = respondToRequest(){
4 .    response = JobExecutionResultHandler.handleRequest() {
5 .      return (MiniDispatcher)gateway.requestJobResult(){
6 .        CompletableFuture jobResultFuture = 
super.requestJobResult(jobId, timeout);
7 .        // wait until deregisterRequest completes, then shut down the server
8 .        jobResultFuture.thenAccept(shutDownFuture.complete(status)); 
9 .        return jobResultFuture;
10.      }
11.    }
12.    // thenAccept causes requestProcessingFuture to complete as soon as HandlerUtils.sendResponse is invoked, instead of when the future returned by sendResponse completes
13.    response.thenAccept(resp -> HandlerUtils.sendResponse(resp))
14.  }
15.  
requestProcessingFuture.whenComplete(inFlightRequestTracker.deregisterRequest())
16.}

 

*Server handling steps:*

netty-thread: got request
flink-dispatcher-thread: exec requestJobResult[6] and complete 
shutDownFuture[7], then call HandlerUtils.sendResponse[8](netty async write)
netty-thread: writes some data to the channel (not finished yet)
flink-dispatcher-thread: call inFlightRequestTracker.deregisterRequest[9]
netty-thread: writing the remaining data to the channel fails, channel not active
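
The following is a minimal, self-contained sketch of the difference described above; it is not
the actual Flink handler code, and sendResponse() below is a stand-in whose returned future
completes only once the asynchronous write has finished:

import java.util.concurrent.CompletableFuture;

public class ThenAcceptVsThenComposeSketch {

    // Stand-in for HandlerUtils.sendResponse(): returns a future that completes
    // only after the response has been fully written out.
    static CompletableFuture<Void> sendResponse(String resp) {
        return CompletableFuture.runAsync(() -> {
            try {
                Thread.sleep(100); // simulate an asynchronous netty write
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            System.out.println("response written: " + resp);
        });
    }

    public static void main(String[] args) {
        CompletableFuture<String> response = CompletableFuture.completedFuture("job result");

        // Pattern from the report: this future completes once the consumer returns,
        // i.e. as soon as sendResponse() has been invoked; the write future it
        // returns is silently dropped.
        CompletableFuture<Void> completesTooEarly =
                response.thenAccept(resp -> sendResponse(resp));

        // Suggested fix: thenCompose() flattens the nested future, so this one
        // completes only after the response has actually been written.
        CompletableFuture<Void> completesAfterWrite =
                response.thenCompose(resp -> sendResponse(resp));

        completesTooEarly.join();   // usually returns before "response written" is printed
        completesAfterWrite.join(); // returns only after the write has finished
        System.out.println("done");
    }
}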

 


Additionally:

While investigating this bug, FutureUtils.retryOperationWithDelay swallowed the first occurrence
of the "Channel became inactive" exception; after several retries the server was shut down, and
the client then threw a "Connection refused" exception, which made the troubleshooting harder.
Could we consider adding some logging here to aid future diagnostics?
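
As a simplified illustration (not the actual FutureUtils.retryOperationWithDelay
implementation), a retry helper could log every swallowed failure so that the first "Channel
became inactive" exception is still visible in the logs:

import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Supplier;

public class RetryWithLoggingSketch {

    static <T> T retryWithDelay(Supplier<CompletableFuture<T>> operation,
                                int maxRetries,
                                long delayMillis) throws Exception {
        Throwable lastError = null;
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                return operation.get().join();
            } catch (CompletionException e) {
                lastError = Optional.ofNullable(e.getCause()).orElse(e);
                // Log the swallowed exception instead of silently retrying.
                System.err.printf("Attempt %d/%d failed: %s%n", attempt, maxRetries, lastError);
                Thread.sleep(delayMillis);
            }
        }
        throw new Exception("All " + maxRetries + " attempts failed", lastError);
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        Supplier<CompletableFuture<String>> flaky = () -> {
            calls[0]++;
            if (calls[0] < 3) {
                return CompletableFuture.failedFuture(new RuntimeException("Channel became inactive"));
            }
            return CompletableFuture.completedFuture("job result");
        };
        // Fails twice (each failure is logged), then succeeds on the third attempt.
        System.out.println("result: " + retryWithDelay(flaky, 5, 100));
    }
}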



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-423 ~FLIP-428: Introduce Disaggregated State Storage and Management in Flink 2.0

2024-03-19 Thread Zakelly Lan
Hi Yunfeng,

Thanks for the suggestion!

I will reorganize the FLIP-425 accordingly.


Best,
Zakelly

On Tue, Mar 19, 2024 at 3:20 PM Yunfeng Zhou 
wrote:

> Hi Xintong and Zakelly,
>
> > 2. Regarding Strictly-ordered and Out-of-order of Watermarks
> I agree with it that watermarks can use only out-of-order mode for
> now, because there is still not a concrete example showing the
> correctness risk about it. However, the strictly-ordered mode should
> still be supported as the default option for non-record event types
> other than watermark, at least for checkpoint barriers.
>
> I noticed that this information has already been documented in "For
> other non-record events, such as RecordAttributes ...", but it's at
> the bottom of the "Watermark" section, which might not be very
> obvious. Thus it might be better to reorganize the FLIP to better
> claim that the two order modes are designed for all non-record events,
> and which mode this FLIP would choose for each type of event.
>
> Best,
> Yunfeng
>
> On Tue, Mar 19, 2024 at 1:09 PM Xintong Song 
> wrote:
> >
> > Thanks for the quick response. Sounds good to me.
> >
> > Best,
> >
> > Xintong
> >
> >
> >
> > On Tue, Mar 19, 2024 at 1:03 PM Zakelly Lan 
> wrote:
> >
> > > Hi Xintong,
> > >
> > > Thanks for sharing your thoughts!
> > >
> > > 1. Regarding Record-ordered and State-ordered of processElement.
> > > >
> > > > I understand that while State-ordered likely provides better
> performance,
> > > > Record-ordered is sometimes required for correctness. The question
> is how
> > > > should a user choose between these two modes? My concern is that
> such a
> > > > decision may require users to have in-depth knowledge about the Flink
> > > > internals, and may lead to correctness issues if State-ordered is
> chosen
> > > > improperly.
> > > >
> > > > I'd suggest not to expose such a knob, at least in the first version.
> > > That
> > > > means always use Record-ordered for custom operators / UDFs, and keep
> > > > State-ordered for internal usages (built-in operators) only.
> > > >
> > >
> > > Indeed, users may not be able to choose the mode properly. I agree to
> keep
> > > such options for internal use.
> > >
> > >
> > > 2. Regarding Strictly-ordered and Out-of-order of Watermarks.
> > > >
> > > > I'm not entirely sure about Strictly-ordered being the default, or
> even
> > > > being supported. From my understanding, a Watermark(T) only suggests
> that
> > > > all records with event time before T has arrived, and it has nothing
> to
> > > do
> > > > with whether records with event time after T has arrived or not. From
> > > that
> > > > perspective, preventing certain records from arriving before a
> Watermark
> > > is
> > > > never supported. I also cannot come up with any use case where
> > > > Strictly-ordered is necessary. This implies the same issue as 1): how
> > > does
> > > > the user choose between the two modes?
> > > >
> > > > I'd suggest not expose the knob to users and only support
> Out-of-order,
> > > > until we see a concrete use case that Strictly-ordered is needed.
> > > >
> > >
> > > The semantics of watermarks do not define the sequence between a
> watermark
> > > and subsequent records. For the most part, this is inconsequential,
> except
> > > it may affect some current users who have previously relied on the
> implicit
> > > assumption of an ordered execution. I'd be fine with initially
> supporting
> > > only out-of-order processing. We may consider exposing the
> > > 'Strictly-ordered' mode once we encounter a concrete use case that
> > > necessitates it.
> > >
> > >
> > > My philosophies behind not exposing the two config options are:
> > > > - There are already too many options in Flink that barely know how
> to use
> > > > them. I think Flink should try as much as possible to decide its own
> > > > behavior, rather than throwing all the decisions to the users.
> > > > - It's much harder to take back knobs than to introduce them.
> Therefore,
> > > > options should be introduced only if concrete use cases are
> identified.
> > > >
> > >
> > > I agree to keep minimal configurable items especially for the MVP.
> Given
> > > that we have the opportunity to refine the functionality before the
> > > framework transitions from @Experimental to @PublicEvolving, it makes
> sense
> > > to refrain from presenting user-facing options until we have ensured
> > > their necessity.
> > >
> > >
> > > Best,
> > > Zakelly
> > >
> > > On Tue, Mar 19, 2024 at 12:06 PM Xintong Song 
> > > wrote:
> > >
> > > > Sorry for joining the discussion late.
> > > >
> > > > I have two questions about FLIP-425.
> > > >
> > > > 1. Regarding Record-ordered and State-ordered of processElement.
> > > >
> > > > I understand that while State-ordered likely provides better
> performance,
> > > > Record-ordered is sometimes required for correctness. The question
> is how
> > > > should a user choose between these two modes? My concern is that
> such a
> > > > decision 

Re: [DISCUSS] FLIP-423 ~FLIP-428: Introduce Disaggregated State Storage and Management in Flink 2.0

2024-03-19 Thread Yunfeng Zhou
Hi Xintong and Zakelly,

> 2. Regarding Strictly-ordered and Out-of-order of Watermarks
I agree with it that watermarks can use only out-of-order mode for
now, because there is still not a concrete example showing the
correctness risk about it. However, the strictly-ordered mode should
still be supported as the default option for non-record event types
other than watermark, at least for checkpoint barriers.

I noticed that this information has already been documented in "For
other non-record events, such as RecordAttributes ...", but it's at
the bottom of the "Watermark" section, which might not be very
obvious. Thus it might be better to reorganize the FLIP to better
claim that the two order modes are designed for all non-record events,
and which mode this FLIP would choose for each type of event.

Best,
Yunfeng

On Tue, Mar 19, 2024 at 1:09 PM Xintong Song  wrote:
>
> Thanks for the quick response. Sounds good to me.
>
> Best,
>
> Xintong
>
>
>
> On Tue, Mar 19, 2024 at 1:03 PM Zakelly Lan  wrote:
>
> > Hi Xintong,
> >
> > Thanks for sharing your thoughts!
> >
> > 1. Regarding Record-ordered and State-ordered of processElement.
> > >
> > > I understand that while State-ordered likely provides better performance,
> > > Record-ordered is sometimes required for correctness. The question is how
> > > should a user choose between these two modes? My concern is that such a
> > > decision may require users to have in-depth knowledge about the Flink
> > > internals, and may lead to correctness issues if State-ordered is chosen
> > > improperly.
> > >
> > > I'd suggest not to expose such a knob, at least in the first version.
> > That
> > > means always use Record-ordered for custom operators / UDFs, and keep
> > > State-ordered for internal usages (built-in operators) only.
> > >
> >
> > Indeed, users may not be able to choose the mode properly. I agree to keep
> > such options for internal use.
> >
> >
> > 2. Regarding Strictly-ordered and Out-of-order of Watermarks.
> > >
> > > I'm not entirely sure about Strictly-ordered being the default, or even
> > > being supported. From my understanding, a Watermark(T) only suggests that
> > > all records with event time before T has arrived, and it has nothing to
> > do
> > > with whether records with event time after T has arrived or not. From
> > that
> > > perspective, preventing certain records from arriving before a Watermark
> > is
> > > never supported. I also cannot come up with any use case where
> > > Strictly-ordered is necessary. This implies the same issue as 1): how
> > does
> > > the user choose between the two modes?
> > >
> > > I'd suggest not expose the knob to users and only support Out-of-order,
> > > until we see a concrete use case that Strictly-ordered is needed.
> > >
> >
> > The semantics of watermarks do not define the sequence between a watermark
> > and subsequent records. For the most part, this is inconsequential, except
> > it may affect some current users who have previously relied on the implicit
> > assumption of an ordered execution. I'd be fine with initially supporting
> > only out-of-order processing. We may consider exposing the
> > 'Strictly-ordered' mode once we encounter a concrete use case that
> > necessitates it.
> >
> >
> > My philosophies behind not exposing the two config options are:
> > > - There are already too many options in Flink that barely know how to use
> > > them. I think Flink should try as much as possible to decide its own
> > > behavior, rather than throwing all the decisions to the users.
> > > - It's much harder to take back knobs than to introduce them. Therefore,
> > > options should be introduced only if concrete use cases are identified.
> > >
> >
> > I agree to keep minimal configurable items especially for the MVP. Given
> > that we have the opportunity to refine the functionality before the
> > framework transitions from @Experimental to @PublicEvolving, it makes sense
> > to refrain from presenting user-facing options until we have ensured
> > their necessity.
> >
> >
> > Best,
> > Zakelly
> >
> > On Tue, Mar 19, 2024 at 12:06 PM Xintong Song 
> > wrote:
> >
> > > Sorry for joining the discussion late.
> > >
> > > I have two questions about FLIP-425.
> > >
> > > 1. Regarding Record-ordered and State-ordered of processElement.
> > >
> > > I understand that while State-ordered likely provides better performance,
> > > Record-ordered is sometimes required for correctness. The question is how
> > > should a user choose between these two modes? My concern is that such a
> > > decision may require users to have in-depth knowledge about the Flink
> > > internals, and may lead to correctness issues if State-ordered is chosen
> > > improperly.
> > >
> > > I'd suggest not to expose such a knob, at least in the first version.
> > That
> > > means always use Record-ordered for custom operators / UDFs, and keep
> > > State-ordered for internal usages (built-in operators) only.
> > >
> > > 2. Regarding 

Re: [ANNOUNCE] Apache Flink 1.19.0 released

2024-03-19 Thread Shawn Huang
Congratulations!

Best,
Shawn Huang


Xuannan Su wrote on Tue, Mar 19, 2024 at 14:40:

> Congratulations! Thanks for all the great work!
>
> Best regards,
> Xuannan
>
> On Tue, Mar 19, 2024 at 1:31 PM Yu Li  wrote:
> >
> > Congrats and thanks all for the efforts!
> >
> > Best Regards,
> > Yu
> >
> > On Tue, 19 Mar 2024 at 11:51, gongzhongqiang 
> wrote:
> > >
> > > Congrats! Thanks to everyone involved!
> > >
> > > Best,
> > > Zhongqiang Gong
> > >
> > > Lincoln Lee wrote on Mon, Mar 18, 2024 at 16:27:
> > >>
> > >> The Apache Flink community is very happy to announce the release of
> Apache
> > >> Flink 1.19.0, which is the first release for the Apache Flink 1.19
> series.
> > >>
> > >> Apache Flink® is an open-source stream processing framework for
> > >> distributed, high-performing, always-available, and accurate data
> streaming
> > >> applications.
> > >>
> > >> The release is available for download at:
> > >> https://flink.apache.org/downloads.html
> > >>
> > >> Please check out the release blog post for an overview of the
> improvements
> > >> for this release:
> > >>
> https://flink.apache.org/2024/03/18/announcing-the-release-of-apache-flink-1.19/
> > >>
> > >> The full release notes are available in Jira:
> > >>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353282
> > >>
> > >> We would like to thank all contributors of the Apache Flink community
> who
> > >> made this release possible!
> > >>
> > >>
> > >> Best,
> > >> Yun, Jing, Martijn and Lincoln
>


Re: [ANNOUNCE] Apache Flink 1.19.0 released

2024-03-19 Thread Xuannan Su
Congratulations! Thanks for all the great work!

Best regards,
Xuannan

On Tue, Mar 19, 2024 at 1:31 PM Yu Li  wrote:
>
> Congrats and thanks all for the efforts!
>
> Best Regards,
> Yu
>
> On Tue, 19 Mar 2024 at 11:51, gongzhongqiang  
> wrote:
> >
> > Congrats! Thanks to everyone involved!
> >
> > Best,
> > Zhongqiang Gong
> >
> > Lincoln Lee wrote on Mon, Mar 18, 2024 at 16:27:
> >>
> >> The Apache Flink community is very happy to announce the release of Apache
> >> Flink 1.19.0, which is the first release for the Apache Flink 1.19 series.
> >>
> >> Apache Flink® is an open-source stream processing framework for
> >> distributed, high-performing, always-available, and accurate data streaming
> >> applications.
> >>
> >> The release is available for download at:
> >> https://flink.apache.org/downloads.html
> >>
> >> Please check out the release blog post for an overview of the improvements
> >> for this release:
> >> https://flink.apache.org/2024/03/18/announcing-the-release-of-apache-flink-1.19/
> >>
> >> The full release notes are available in Jira:
> >> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353282
> >>
> >> We would like to thank all contributors of the Apache Flink community who
> >> made this release possible!
> >>
> >>
> >> Best,
> >> Yun, Jing, Martijn and Lincoln


[jira] [Created] (FLINK-34726) Flink Kubernetes Operator has some room for optimizing performance.

2024-03-19 Thread Fei Feng (Jira)
Fei Feng created FLINK-34726:


 Summary: Flink Kubernetes Operator has some room for optimizing 
performance.
 Key: FLINK-34726
 URL: https://issues.apache.org/jira/browse/FLINK-34726
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.7.0, kubernetes-operator-1.6.0, 
kubernetes-operator-1.5.0
Reporter: Fei Feng
 Attachments: operator_no_submit_no_kill.flamegraph.html

When there is a huge number of FlinkDeployments and FlinkSessionJobs in a Kubernetes cluster,
there is a significant delay between an event being submitted to the reconcile thread pool and
that event actually being processed.

This is our test: we gave the operator plenty of resources (cpu: 10 cores, memory: 20g,
reconcile thread pool size: 200) and first deployed 1 jobs (one FlinkDeployment and one
SessionJob per job), then ran submit/delete job tests. We found that:
1. it took about 2 minutes from creating a new FlinkDeployment and FlinkSessionJob CR in k8s to
the Flink job being submitted to the JobManager.
2. it took about 1 minute from deleting a FlinkDeployment and FlinkSessionJob CR to the Flink
job and session cluster being cleaned up.

 

I used async-profiler to capture a flamegraph while there was a huge number of FlinkDeployments
and FlinkSessionJobs, and found two obvious areas for optimization:

1. For FlinkDeployment: in the observe step we call
AbstractFlinkService.getClusterInfo/listJobs/getTaskManagerInfo, and every one of these calls
creates a RestClusterClient, sends the requests, and closes the client again. I think we should
reuse the RestClusterClient as much as possible to avoid frequently creating objects and to
reduce GC pressure.

2. For FlinkSessionJob (this issue is more obvious): in a single reconcile loop we call
getSecondaryResource 5 times to get the FlinkDeployment resource info. Based on my current
understanding of the Flink Operator, we do not need to call it 5 times per reconcile loop;
calling it once is enough. If so, we could save about 30% CPU usage (every getSecondaryResource
call costs about 6% CPU usage).
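
As a minimal, self-contained sketch of the caching idea in point 2 (the types and method names
below are placeholders rather than the operator's real classes), the secondary resource could be
resolved once per reconcile pass and handed to each step:

import java.util.Optional;
import java.util.function.Supplier;

public class CachedSecondaryResourceSketch {

    record FlinkDeploymentStub(String name) {} // placeholder for the real CR type

    static void reconcile(Supplier<Optional<FlinkDeploymentStub>> expensiveLookup) {
        // One lookup per reconcile loop ...
        Optional<FlinkDeploymentStub> sessionCluster = expensiveLookup.get();

        // ... reused by every step that previously performed its own lookup.
        observe(sessionCluster);
        validate(sessionCluster);
        deploy(sessionCluster);
    }

    static void observe(Optional<FlinkDeploymentStub> d)  { /* ... */ }
    static void validate(Optional<FlinkDeploymentStub> d) { /* ... */ }
    static void deploy(Optional<FlinkDeploymentStub> d)   { /* ... */ }

    public static void main(String[] args) {
        reconcile(() -> {
            System.out.println("secondary resource lookup (should happen once)");
            return Optional.of(new FlinkDeploymentStub("session-cluster"));
        });
    }
}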

[^operator_no_submit_no_kill.flamegraph.html]

I hope we can discuss solutions to address this problem together. I'm very 
willing to optimize and resolve this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)