Re: [ANNOUNCE] New Apache Flink PMC Member - Weijie Guo

2024-06-04 Thread Weihua Hu
Congratulations, Weijie!

Best,
Weihua


On Tue, Jun 4, 2024 at 3:03 PM Yuxin Tan  wrote:

> Congratulations, Weijie!
>
> Best,
> Yuxin
>
>
> Yuepeng Pan  于2024年6月4日周二 14:57写道:
>
> > Congratulations !
> >
> >
> > Best,
> > Yuepeng Pan
> >
> > At 2024-06-04 14:45:45, "Xintong Song"  wrote:
> > >Hi everyone,
> > >
> > >On behalf of the PMC, I'm very happy to announce that Weijie Guo has
> > joined
> > >the Flink PMC!
> > >
> > >Weijie has been an active member of the Apache Flink community for many
> > >years. He has made significant contributions in many components,
> including
> > >runtime, shuffle, sdk, connectors, etc. He has driven / participated in
> > >many FLIPs, authored and reviewed hundreds of PRs, been consistently
> > active
> > >on mailing lists, and also helped with release management of 1.20 and
> > >several other bugfix releases.
> > >
> > >Congratulations and welcome Weijie!
> > >
> > >Best,
> > >
> > >Xintong (on behalf of the Flink PMC)
> >
>


Re: [ANNOUNCE] Apache Flink 1.19.0 released

2024-03-18 Thread Weihua Hu
Congratulations

Best,
Weihua


On Tue, Mar 19, 2024 at 10:56 AM Rodrigo Meneses  wrote:

> Congratulations
>
> On Mon, Mar 18, 2024 at 7:43 PM Yu Chen  wrote:
>
> > Congratulations!
> > Thanks to release managers and everyone involved!
> >
> > Best,
> > Yu Chen
> >
> >
> > > 2024年3月19日 01:01,Jeyhun Karimov  写道:
> > >
> > > Congrats!
> > > Thanks to release managers and everyone involved.
> > >
> > > Regards,
> > > Jeyhun
> > >
> > > On Mon, Mar 18, 2024 at 9:25 AM Lincoln Lee 
> > wrote:
> > >
> > >> The Apache Flink community is very happy to announce the release of
> > Apache
> > >> Flink 1.19.0, which is the fisrt release for the Apache Flink 1.19
> > series.
> > >>
> > >> Apache Flink® is an open-source stream processing framework for
> > >> distributed, high-performing, always-available, and accurate data
> > streaming
> > >> applications.
> > >>
> > >> The release is available for download at:
> > >> https://flink.apache.org/downloads.html
> > >>
> > >> Please check out the release blog post for an overview of the
> > improvements
> > >> for this bugfix release:
> > >>
> > >>
> >
> https://flink.apache.org/2024/03/18/announcing-the-release-of-apache-flink-1.19/
> > >>
> > >> The full release notes are available in Jira:
> > >>
> > >>
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353282
> > >>
> > >> We would like to thank all contributors of the Apache Flink community
> > who
> > >> made this release possible!
> > >>
> > >>
> > >> Best,
> > >> Yun, Jing, Martijn and Lincoln
> > >>
> >
> >
>


Re: [ANNOUNCE] New Apache Flink Committer - Jiabao Sun

2024-02-22 Thread Weihua Hu
Congratulations, Jiabao!

Best,
Weihua


On Thu, Feb 22, 2024 at 10:34 AM Jingsong Li  wrote:

> Congratulations! Well deserved!
>
> On Wed, Feb 21, 2024 at 4:36 PM Yuepeng Pan  wrote:
> >
> > Congratulations~ :)
> >
> > Best,
> > Yuepeng Pan
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > 在 2024-02-21 09:52:17,"Hongshun Wang"  写道:
> > >Congratulations, Jiabao :)
> > >Congratulations Jiabao!
> > >
> > >Best,
> > >Hongshun
> > >Best regards,
> > >
> > >Weijie
> > >
> > >On Tue, Feb 20, 2024 at 2:19 PM Runkang He  wrote:
> > >
> > >> Congratulations Jiabao!
> > >>
> > >> Best,
> > >> Runkang He
> > >>
> > >> Jane Chan  于2024年2月20日周二 14:18写道:
> > >>
> > >> > Congrats, Jiabao!
> > >> >
> > >> > Best,
> > >> > Jane
> > >> >
> > >> > On Tue, Feb 20, 2024 at 10:32 AM Paul Lam 
> wrote:
> > >> >
> > >> > > Congrats, Jiabao!
> > >> > >
> > >> > > Best,
> > >> > > Paul Lam
> > >> > >
> > >> > > > 2024年2月20日 10:29,Zakelly Lan  写道:
> > >> > > >
> > >> > > >> Congrats! Jiabao!
> > >> > >
> > >> > >
> > >> >
> > >>
>


Re: [VOTE] FLIP-407: Improve Flink Client performance in interactive scenarios

2024-01-14 Thread Weihua Hu
+1 binding

Best,
Weihua


On Mon, Jan 15, 2024 at 11:06 AM Yangze Guo  wrote:

> +1 binding
>
> Best,
> Yangze Guo
>
> On Thu, Jan 11, 2024 at 8:36 PM Rui Fan <1996fan...@gmail.com> wrote:
> >
> > +1 binding
> >
> > Best,
> > Rui
> >
> >
> > On Thu, 11 Jan 2024 at 19:45, xiangyu feng  wrote:
> >
> > > Hi all,
> > >
> > > I would like to start the vote for FLIP-407: Improve Flink Client
> > > performance in interactive scenarios[1].
> > > This FLIP was discussed in this thread [2].
> > >
> > > The vote will be open for at least 72 hours unless there is an
> objection or
> > > insufficient votes.
> > >
> > > Regards,
> > > Xiangyu
> > >
> > > [1]
> > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-407%3A+Improve+Flink+Client+performance+in+interactive+scenarios
> > > [2] https://lists.apache.org/thread/ccsv66ygffgqbv956bnknbpllj4t24kj
> > >
>


Re: [Discuss] FLIP-407: Improve Flink Client performance in interactive scenarios

2024-01-08 Thread Weihua Hu
Thanks for proposing this FLIP.

Experiments have shown that it significantly enhances the real-time query
experience.
+1 for this.

Best,
Weihua


On Mon, Jan 8, 2024 at 5:19 PM Rui Fan <1996fan...@gmail.com> wrote:

> Thanks Xiangyu for the quick update!
>
> LGTM
>
> Best,
> Rui
>
> On Mon, Jan 8, 2024 at 4:27 PM xiangyu feng  wrote:
>
> > Hi Rui and Yong,
> >
> > Thx for ur reply.
> >
> > My initial attention here is that for short-lived jobs under high QPS: a
> > fixed delay retry strategy will cause extra resource waste and not
> flexible
> > enough, an exponential-backoff strategy might significantly increase the
> > query latency since the interval time grows too fast. An
> incremental-delay
> > strategy could be balanced between resource consumption and short-query
> > latency.
> >
> > With a second thought,  an exponential-delay retry strategy with a
> > configurable multiplier option can also achieve this goal. By setting the
> > default value of multiplier to 1, we can be consistent with the original
> > behavior and reduce the configuration items at the same time.
> >
> > I've updated this FLIP accordingly, look forward to your feedback.
> >
> > Regards,
> > Xiangyu Feng
> >
> >
> > Rui Fan <1996fan...@gmail.com> 于2024年1月8日周一 15:29写道:
> >
> >> Only one strategy is fine to me.
> >>
> >> When the multiplier is set to 1, the exponential-delay will become
> >> fixed-delay.
> >> So fixed-delay may not be needed.
> >>
> >> Best,
> >> Rui
> >>
> >> On Mon, Jan 8, 2024 at 2:17 PM Yong Fang  wrote:
> >>
> >> > I agree with @Rui that the current configuration for Flink Client is a
> >> > little complex. Can we just provide one strategy with less
> configuration
> >> > items for all scenarios?
> >> >
> >> > Best,
> >> > Fang Yong
> >> >
> >> > On Mon, Jan 8, 2024 at 11:19 AM Rui Fan <1996fan...@gmail.com> wrote:
> >> >
> >> > > Thanks xiangyu for driving this proposal! And sorry for the
> >> > > late reply.
> >> > >
> >> > > Overall looks good to me, I only have some minor questions:
> >> > >
> >> > > 1. Do we need to introduce 3 collect strategies in the first
> version?
> >> > >
> >> > > Large and comprehensive configuration items will bring
> >> > > additional learning costs and usage costs to users. I tend to
> >> > > provide users with out-of-the-box parameters and 2 collect
> >> > > strategies may be enough for users.
> >> > >
> >> > > IIUC, there is no big difference between exponential-delay and
> >> > > incremental-delay, especially the default parameters provided.
> >> > > I wonder could we provide a multiplier for exponential-delay
> strategy
> >> > > and removing the incremental-delay strategy?
> >> > >
> >> > > Of course, if you think multiplier option is not needed based on
> >> > > your production experience, it's totally fine for me. Simple is
> >> better.
> >> > >
> >> > > 2. Which strategy do you think is best in mass production?
> >> > >
> >> > > I'm working on FLIP-364[1], it's related to Flink failover restart
> >> > > strategy. IIUC, when one cluster only has a few flink jobs,
> >> > > fixed-delay is fine. It guarantees minimal latency without too
> >> > > much stress. But if one cluster has too many jobs, fixed-delay
> >> > > may not be stable.
> >> > >
> >> > > Do you think exponential-delay is better than fixed delay in this
> >> > > scenario? And which strategy is used in your production for now?
> >> > > Would you mind sharing it?
> >> > >
> >> > > Looking forwarding to your opinion~
> >> > >
> >> > > Best,
> >> > > Rui
> >> > >
> >> > > On Sat, Jan 6, 2024 at 5:54 PM xiangyu feng 
> >> > wrote:
> >> > >
> >> > > > Hi all,
> >> > > >
> >> > > > Thanks for the comments.
> >> > > >
> >> > > > If there is no further comment, we will open the voting thread
> next
> >> > week.
> >> > > >
> >> > > > Regards,
> >> > > > Xiangyu
> >> > > >
> >> > > > Zhanghao Chen  于2024年1月3日周三 16:46写道:
> >> > > >
> >> > > > > Thanks for driving this effort on improving the interactive use
> >> > > > experience
> >> > > > > of Flink. The proposal overall looks good to me.
> >> > > > >
> >> > > > > Best,
> >> > > > > Zhanghao Chen
> >> > > > > 
> >> > > > > From: xiangyu feng 
> >> > > > > Sent: Tuesday, December 26, 2023 16:51
> >> > > > > To: dev@flink.apache.org 
> >> > > > > Subject: [Discuss] FLIP-407: Improve Flink Client performance in
> >> > > > > interactive scenarios
> >> > > > >
> >> > > > > Hi devs,
> >> > > > >
> >> > > > > I'm opening this thread to discuss FLIP-407: Improve Flink
> Client
> >> > > > > performance in interactive scenarios. The POC test results and
> >> design
> >> > > doc
> >> > > > > can be found at: FLIP-407
> >> > > > > <
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-407%3A+Improve+Flink+Client+performance+when+interacting+with+dedicated+Flink+Session+Clusters
> >> > > > > >
> >> > > > > .
> >> > > > >
> >> > > > > Currently, Flink Client is mainly designed for one time
> >> interacti

Re: [DISCUSS] FLIP-395: Deprecate Global Aggregator Manager

2023-11-26 Thread Weihua Hu
Thanks Zhanghao for driving this FLIP.

+1 for this.

Best,
Weihua


On Mon, Nov 20, 2023 at 5:49 PM Zhanghao Chen 
wrote:

> Hi all,
>
> I'd like to start a discussion of FLIP-395: Deprecate Global Aggregator
> Manager [1].
>
> Global Aggregate Manager was introduced in [2] to support event time
> synchronization across sources and more generally, coordination of parallel
> tasks. AFAIK, this was only used in the Kinesis source for an early version
> of watermark alignment. Operator Coordinator, introduced in FLIP-27,
> provides a more powerful and elegant solution for that need and is part of
> the new source API standard. FLIP-217 further provides a complete solution
> for watermark alignment of source splits on top of the Operator Coordinator
> mechanism. Furthermore, Global Aggregate Manager manages state in JobMaster
> object, causing problems for adaptive parallelism changes [3].
>
> Therefore, I propose to deprecate the use of Global Aggregate Manager,
> which can improve the maintainability of the Flink codebase without
> compromising its functionality.
>
> Looking forward to your feedbacks, thanks.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-395%3A+Deprecate+Global+Aggregator+Manager
> [2] https://issues.apache.org/jira/browse/FLINK-10886
> [3] https://issues.apache.org/jira/browse/FLINK-31245
>
> Best,
> Zhanghao Chen
>


Re: [ANNOUNCE] New Apache Flink Committer - Ron Liu

2023-10-15 Thread Weihua Hu
Congrats, Ron!

Best,
Weihua


On Mon, Oct 16, 2023 at 11:50 AM Yangze Guo  wrote:

> Congrats, Ron!
>
> Best,
> Yangze Guo
>
> On Mon, Oct 16, 2023 at 11:48 AM Matt Wang  wrote:
> >
> > Congratulations, Ron!
> >
> >
> > --
> >
> > Best,
> > Matt Wang
> >
> >
> >  Replied Message 
> > | From | Feng Jin |
> > | Date | 10/16/2023 11:29 |
> > | To |  |
> > | Subject | Re: [ANNOUNCE] New Apache Flink Committer - Ron Liu |
> > Congratulations, Ron!
> >
> > Best,
> > Feng
> >
> > On Mon, Oct 16, 2023 at 11:22 AM yh z  wrote:
> >
> > Congratulations, Ron!
> >
> > Best,
> > Yunhong (SwuferHong)
> >
> > Yuxin Tan  于2023年10月16日周一 11:12写道:
> >
> > Congratulations, Ron!
> >
> > Best,
> > Yuxin
> >
> >
> > Junrui Lee  于2023年10月16日周一 10:24写道:
> >
> > Congratulations Ron !
> >
> > Best,
> > Junrui
> >
> > Yun Tang  于2023年10月16日周一 10:22写道:
> >
> > Congratulations, Ron!
> >
> > Best
> > Yun Tang
> > 
> > From: yu zelin 
> > Sent: Monday, October 16, 2023 10:16
> > To: dev@flink.apache.org 
> > Cc: ron9@gmail.com 
> > Subject: Re: [ANNOUNCE] New Apache Flink Committer - Ron Liu
> >
> > Congratulations!
> >
> > Best,
> > Yu Zelin
> >
> > 2023年10月16日 09:56,Jark Wu  写道:
> >
> > Hi, everyone
> >
> > On behalf of the PMC, I'm very happy to announce Ron Liu as a new
> > Flink
> > Committer.
> >
> > Ron has been continuously contributing to the Flink project for
> > many
> > years,
> > authored and reviewed a lot of codes. He mainly works on Flink SQL
> > parts
> > and drove several important FLIPs, e.g., USING JAR (FLIP-214),
> > Operator
> > Fusion CodeGen (FLIP-315), Runtime Filter (FLIP-324). He has a
> > great
> > knowledge of the Batch SQL and improved a lot of batch performance
> > in
> > the
> > past several releases. He is also quite active in mailing lists,
> > participating in discussions and answering user questions.
> >
> > Please join me in congratulating Ron Liu for becoming a Flink
> > Committer!
> >
> > Best,
> > Jark Wu (on behalf of the Flink PMC)
> >
> >
> >
> >
> >
>


Re: [ANNOUNCE] New Apache Flink Committer - Jane Chan

2023-10-15 Thread Weihua Hu
Congrats, Jane!

Best,
Weihua


On Mon, Oct 16, 2023 at 11:50 AM Yangze Guo  wrote:

> Congrats, Jane!
>
> Best,
> Yangze Guo
>
> On Mon, Oct 16, 2023 at 11:49 AM Matt Wang  wrote:
> >
> > Congratulations Jane!
> >
> >
> > --
> >
> > Best,
> > Matt Wang
> >
> >
> >  Replied Message 
> > | From | Feng Jin |
> > | Date | 10/16/2023 11:29 |
> > | To |  |
> > | Subject | Re: [ANNOUNCE] New Apache Flink Committer - Jane Chan |
> > Congratulations Jane!
> >
> > Best,
> > Feng
> >
> > On Mon, Oct 16, 2023 at 11:23 AM yh z  wrote:
> >
> > Congratulations Jane!
> >
> > Best,
> > Yunhong (swuferHong)
> >
> > Yuxin Tan  于2023年10月16日周一 11:11写道:
> >
> > Congratulations Jane!
> >
> > Best,
> > Yuxin
> >
> >
> > xiangyu feng  于2023年10月16日周一 10:27写道:
> >
> > Congratulations Jane!
> >
> > Best,
> > Xiangyu
> >
> > Xuannan Su  于2023年10月16日周一 10:25写道:
> >
> > Congratulations Jane!
> >
> > Best,
> > Xuannan
> >
> > On Mon, Oct 16, 2023 at 10:21 AM Yun Tang  wrote:
> >
> > Congratulations, Jane!
> >
> > Best
> > Yun Tang
> > 
> > From: Rui Fan <1996fan...@gmail.com>
> > Sent: Monday, October 16, 2023 10:16
> > To: dev@flink.apache.org 
> > Cc: qingyue@gmail.com 
> > Subject: Re: [ANNOUNCE] New Apache Flink Committer - Jane Chan
> >
> > Congratulations Jane!
> >
> > Best,
> > Rui
> >
> > On Mon, Oct 16, 2023 at 10:15 AM yu zelin 
> > wrote:
> >
> > Congratulations!
> >
> > Best,
> > Yu Zelin
> >
> > 2023年10月16日 09:58,Jark Wu  写道:
> >
> > Hi, everyone
> >
> > On behalf of the PMC, I'm very happy to announce Jane Chan as a
> > new
> > Flink
> > Committer.
> >
> > Jane started code contribution in Jan 2021 and has been active
> > in
> > the
> > Flink
> > community since. She authored more than 60 PRs and reviewed
> > more
> > than 40
> > PRs. Her contribution mainly revolves around Flink SQL,
> > including
> > Plan
> > Advice (FLIP-280), operator-level state TTL (FLIP-292), and
> > ALTER
> > TABLE
> > statements (FLINK-21634). Jane participated deeply in
> > development
> > discussions and also helped answer user question emails. Jane
> > was
> > also a
> > core contributor of Flink Table Store (now Paimon) when the
> > project
> > was
> > in
> > the early days.
> >
> > Please join me in congratulating Jane Chan for becoming a Flink
> > Committer!
> >
> > Best,
> > Jark Wu (on behalf of the Flink PMC)
> >
> >
> >
> >
> >
> >
>


Re: [VOTE] FLIP-374: Adding a separate configuration for specifying Java Options of the SQL Gateway

2023-10-10 Thread Weihua Hu
+1(binding)

Best,
Weihua


On Wed, Oct 11, 2023 at 10:56 AM xiangyu feng  wrote:

> +1(non-binding)
>
> Regards,
> Xiangyu
>
> Shammon FY  于2023年10月11日周三 10:30写道:
>
> > +1(binding), good job!
> >
> > Best,
> > Shammon FY
> >
> > On Wed, Oct 11, 2023 at 10:18 AM Benchao Li 
> wrote:
> >
> > > +1 (binding)
> > >
> > > Rui Fan <1996fan...@gmail.com> 于2023年10月11日周三 10:17写道:
> > > >
> > > > +1(binding)
> > > >
> > > > Best,
> > > > Rui
> > > >
> > > > On Wed, Oct 11, 2023 at 10:07 AM Yangze Guo 
> > wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > I'd like to start the vote of FLIP-374 [1]. This FLIP is discussed
> in
> > > > > the thread [2].
> > > > >
> > > > > The vote will be open for at least 72 hours. Unless there is an
> > > > > objection, I'll try to close it by October 16, 2023 if we have
> > > > > received sufficient votes.
> > > > >
> > > > > [1]
> > > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-374%3A+Adding+a+separate+configuration+for+specifying+Java+Options+of+the+SQL+Gateway
> > > > > [2]
> https://lists.apache.org/thread/g4vl8mgnwgl7vjyvjy6zrc8w54b2lthv
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > >
> > >
> > >
> > > --
> > >
> > > Best,
> > > Benchao Li
> > >
> >
>


Re: [VOTE] FLIP-367: Support Setting Parallelism for Table/SQL Sources

2023-10-08 Thread Weihua Hu
+1 (binding)

Best,
Weihua


On Mon, Oct 9, 2023 at 11:47 AM Shammon FY  wrote:

> +1 (binding)
>
>
> On Mon, Oct 9, 2023 at 11:12 AM Benchao Li  wrote:
>
> > +1 (binding)
> >
> > Zhanghao Chen  于2023年10月9日周一 10:20写道:
> > >
> > > Hi All,
> > >
> > > Thanks for all the feedback on FLIP-367: Support Setting Parallelism
> for
> > Table/SQL Sources [1][2].
> > >
> > > I'd like to start a vote for FLIP-367. The vote will be open until Oct
> > 12th 12:00 PM GMT) unless there is an objection or insufficient votes.
> > >
> > > [1]
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263429150
> > > [2] https://lists.apache.org/thread/gtpswl42jzv0c9o3clwqskpllnw8rh87
> > >
> > > Best,
> > > Zhanghao Chen
> >
> >
> >
> > --
> >
> > Best,
> > Benchao Li
> >
>


Re: [VOTE] FLIP-362: Support minimum resource limitation

2023-09-25 Thread Weihua Hu
+1 (binding)

Best,
Weihua


On Tue, Sep 26, 2023 at 10:00 AM Shammon FY  wrote:

> +1(binding), thanks for the proposal.
>
> Best,
> Shammon FY
>
> On Mon, Sep 25, 2023 at 11:20 PM Jing Ge 
> wrote:
>
> > +1(binding) Thanks!
> >
> > Best regards,
> > Jing
> >
> > On Mon, Sep 25, 2023 at 3:48 AM Chen Zhanghao  >
> > wrote:
> >
> > > Thanks for driving this. +1 (non-binding)
> > >
> > > Best,
> > > Zhanghao Chen
> > > 
> > > 发件人: xiangyu feng 
> > > 发送时间: 2023年9月25日 17:38
> > > 收件人: dev@flink.apache.org 
> > > 主题: [VOTE] FLIP-362: Support minimum resource limitation
> > >
> > > Hi all,
> > >
> > > I would like to start the vote for FLIP-362:  Support minimum resource
> > > limitation[1].
> > > This FLIP was discussed in this thread [2].
> > >
> > > The vote will be open for at least 72 hours unless there is an
> objection
> > or
> > > insufficient votes.
> > >
> > > Regards,
> > > Xiangyu
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-362%3A+Support+minimum+resource+limitation
> > > [2] https://lists.apache.org/thread/m2v9n4yynm97v8swhqj2o5k0sqlb5ym4
> > >
> >
>


Re: [VOTE] FLIP-363: Unify the Representation of TaskManager Location in REST API and Web UI

2023-09-19 Thread Weihua Hu
+1(binding)

Best,
Weihua


On Tue, Sep 19, 2023 at 6:16 PM Jing Ge  wrote:

> +1(binding) Thanks!
>
> Best regards,
> Jing
>
> On Tue, Sep 19, 2023 at 9:01 AM Chen Zhanghao 
> wrote:
>
> > Hi Devs,
> >
> > Thanks for all the feedbacks on FLIP-363: Unify the Representation of
> > TaskManager Location in REST API and Web UI [1][2]. Given that the
> > consensus on the naming issue has been reached (using "endpoint" instead
> of
> > "location"),  I'd like to restart the vote for it. The vote will be open
> > for at least 72 hours (until Sep 22th 12:00 PM GMT) unless there is an
> > objection or insufficient votes.
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-363%3A+Unify+the+Representation+of+TaskManager+Location+in+REST+API+and+Web+UI
> > [2] https://lists.apache.org/thread/sls1196mmk25w8nm2qf585254nbjr9hd
> >
> > Best,
> > Zhanghao Chen
> > 
> > 发件人: Chen Zhanghao 
> > 发送时间: 2023年9月18日 19:19
> > 收件人: dev@flink.apache.org ; Jing Ge <
> > j...@ververica.com>
> > 主题: 回复: [VOTE] FLIP-363: Unify the Representation of TaskManager Location
> > in REST API and Web UI
> >
> > Thanks for pointing that out. Let's give it a bit more time for reaching
> > consensus on the naming issue and postpone the voting for now. Sorry for
> > the inconvenience here. Will send another email once the voting restarts.
> >
> > Best,
> > Zhanghao Chen
> > 
> > 发件人: Rui Fan <1996fan...@gmail.com>
> > 发送时间: 2023年9月18日 11:55
> > 收件人: dev@flink.apache.org ; Jing Ge <
> > j...@ververica.com>
> > 主题: Re: [VOTE] FLIP-363: Unify the Representation of TaskManager Location
> > in REST API and Web UI
> >
> > A gentle reminder about the location naming.
> > The naming of location is a little unclear, but
> > I can't think of any other better naming.
> >
> > So I +1(binding) first.
> >
> > Ping @Jing Ge  to help double check the name again.
> >
> > Sorry for mentioning naming in the VOTE thread,
> > I didn't know this VOTE would be so early.
> >
> > Best,
> > Rui
> >
> > On Mon, Sep 18, 2023 at 11:44 AM Yangze Guo  wrote:
> >
> > > +1 (binding)
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Mon, Sep 18, 2023 at 11:37 AM Chen Zhanghao
> > >  wrote:
> > > >
> > > > Hi All,
> > > >
> > > > Thanks for all the feedback on FLIP-363: Unify the Representation of
> > > TaskManager Location in REST API and Web UI [1][2]
> > > >
> > > > I'd like to start a vote for FLIP-363. The vote will be open for at
> > > least 72 hours unless there is an objection or insufficient votes.
> > > >
> > > > [1]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-363%3A+Unify+the+Representation+of+TaskManager+Location+in+REST+API+and+Web+UI
> > > > [2] https://lists.apache.org/thread/sls1196mmk25w8nm2qf585254nbjr9hd
> > > >
> > > > Best,
> > > > Zhanghao Chen
> > >
> >
>


Re: [VOTE] FLIP-361: Improve GC Metrics

2023-09-15 Thread Weihua Hu
+1 (binding)

Best,
Weihua


On Fri, Sep 15, 2023 at 4:28 PM Yangze Guo  wrote:

> +1 (binding)
>
> Best,
> Yangze Guo
>
> On Thu, Sep 14, 2023 at 5:47 PM Maximilian Michels  wrote:
> >
> > +1 (binding)
> >
> > On Thu, Sep 14, 2023 at 4:26 AM Venkatakrishnan Sowrirajan
> >  wrote:
> > >
> > > +1 (non-binding)
> > >
> > > On Wed, Sep 13, 2023, 6:55 PM Matt Wang  wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > >
> > > > Thanks for driving this FLIP
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Best,
> > > > Matt Wang
> > > >
> > > >
> > > >  Replied Message 
> > > > | From | Xintong Song |
> > > > | Date | 09/14/2023 09:54 |
> > > > | To |  |
> > > > | Subject | Re: [VOTE] FLIP-361: Improve GC Metrics |
> > > > +1 (binding)
> > > >
> > > > Best,
> > > >
> > > > Xintong
> > > >
> > > >
> > > >
> > > > On Thu, Sep 14, 2023 at 2:40 AM Samrat Deb 
> wrote:
> > > >
> > > > +1 ( non binding)
> > > >
> > > > These improved GC metrics will be a great addition.
> > > >
> > > > Bests,
> > > > Samrat
> > > >
> > > > On Wed, 13 Sep 2023 at 7:58 PM, ConradJam 
> wrote:
> > > >
> > > > +1 (non-binding)
> > > > gc metrics help with autoscale tuning features
> > > >
> > > > Chen Zhanghao  于2023年9月13日周三 22:16写道:
> > > >
> > > > +1 (unbinding). Looking forward to it
> > > >
> > > > Best,
> > > > Zhanghao Chen
> > > > 
> > > > 发件人: Gyula Fóra 
> > > > 发送时间: 2023年9月13日 21:16
> > > > 收件人: dev 
> > > > 主题: [VOTE] FLIP-361: Improve GC Metrics
> > > >
> > > > Hi All!
> > > >
> > > > Thanks for all the feedback on FLIP-361: Improve GC Metrics [1][2]
> > > >
> > > > I'd like to start a vote for it. The vote will be open for at least
> 72
> > > > hours unless there is an objection or insufficient votes.
> > > >
> > > > Cheers,
> > > > Gyula
> > > >
> > > > [1]
> > > >
> > > >
> > > >
> > > >
> > > >
> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-361*3A*Improve*GC*Metrics__;JSsrKw!!IKRxdwAv5BmarQ!dpHSOqsSHlPJ5gCvZ2yxSGjcR4xA2N-mpGZ1w2jPuKb78aujNpbzENmi1e7B26d6v4UQ8bQZ7IQaUcI$
> > > > [2]
> > > >
> https://urldefense.com/v3/__https://lists.apache.org/thread/qqqv54vyr4gbp63wm2d12q78m8h95xb2__;!!IKRxdwAv5BmarQ!dpHSOqsSHlPJ5gCvZ2yxSGjcR4xA2N-mpGZ1w2jPuKb78aujNpbzENmi1e7B26d6v4UQ8bQZFdEMnAg$
> > > >
> > > >
> > > >
> > > > --
> > > > Best
> > > >
> > > > ConradJam
> > > >
> > > >
> > > >
>


Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager Location in REST API and Web UI

2023-09-10 Thread Weihua Hu
Hi, Zhanghao

Since the meaning of "host" is not aligned, it seems good for me to remove
it in the future release.

Best,
Weihua


On Mon, Sep 11, 2023 at 11:48 AM Chen Zhanghao 
wrote:

> Hi Fan Rui,
>
> Thanks for clarifying the definition of "public interfaces", that helps a
> lot!
>
> Best,
> Zhanghao Chen
> 
> 发件人: Rui Fan <1996fan...@gmail.com>
> 发送时间: 2023年9月11日 11:18
> 收件人: dev@flink.apache.org 
> 主题: Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager
> Location in REST API and Web UI
>
> Thanks Zhanghao driving this FLIP, adding the port in Web UI
> seems good to me.
>
> Hi Shammon and Zhanghao,
>
> I would like to clarify the difference between Public Interfaces
> in FLIP and @Public in code.
>
> As I understand, the `Public Interfaces in FLIP` means these
> changes will be used in user side, such as: @Public class,
> Configuration settings, User-facing scripts/command-line tools,
> and rest api, etc.
>
> You can refer to  "What are the "public interfaces" of the project?"
> part in Flink Improvement Proposals doc[1].
>
> @Public class means the user will use this class directly, and
> these rest classes won't be depended on directly. So I think
> these classes related to rest don't need to be marked @Public.
>
> Please correct me if anything is wrong, thanks~
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
>
> Best,
> Rui
>
> On Mon, Sep 11, 2023 at 11:09 AM Weihua Hu  wrote:
>
> > Hi, Zhanghao
> >
> > Thanks for bringing this proposal.
> >
> > I have a concern:
> >
> > I prefer to keep the "host" field and add a "location" field in future
> > versions.
> > Consider a scenario where a machine (host) with multiple TaskManagers has
> > poor processing performance due to some problems.
> > By using a host field aggregation, I can identify the problems with this
> > machine and take it offline.
> >
> > Best,
> > Weihua
> >
> >
> > On Mon, Sep 11, 2023 at 10:34 AM Chen Zhanghao <
> zhanghao.c...@outlook.com>
> > wrote:
> >
> > > Hi Shammon,
> > >
> > > I think all REST API response messages (e.g.
> > > SubtaskExecutionAttemptDetailsInfo) should be considered as part of the
> > > public APIs and therefore be marked as @Public. It is true though none
> of
> > > them are marked as @public yet. Maybe we should do that. ccing
> > > @chesnay<mailto:ches...@apache.org> for confirmation.
> > >
> > > Best,
> > > Zhanghao Chen
> > > 
> > > 发件人: Shammon FY 
> > > 发送时间: 2023年9月11日 10:22
> > > 收件人: dev@flink.apache.org 
> > > 主题: Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager
> > > Location in REST API and Web UI
> > >
> > > Thanks Zhanghao for initialing this discussion, I have just one
> comment:
> > >
> > > I checked the classes `SubtasksAllAccumulatorsHandler`,
> > > `SubtasksTimesHandler`, `SubtaskCurrentAttemptDetailsHandler`,
> > > `JobVertexTaskManagersHandler` and `JobExceptionsHandler` you mentioned
> > in
> > > `Public Interfaces` and they are not annotated as `Public`. So do you
> > want
> > > to annotate them as `Plublic`? If not, I think you may need to move
> them
> > > from `Public Interfaces` to `Proposed Changes`.
> > >
> > > Best,
> > > Shammon FY
> > >
> > > On Sat, Sep 9, 2023 at 12:11 PM Chen Zhanghao <
> zhanghao.c...@outlook.com
> > >
> > > wrote:
> > >
> > > > Hi Devs,
> > > >
> > > > I would like to start a discussion on FLIP-363: Unify the
> > Representation
> > > > of TaskManager Location in REST API and Web UI [1].
> > > >
> > > > The TaskManager location of subtasks is important for identifying
> > > > TM-related problems. There are a number of places in REST API and Web
> > UI
> > > > where TaskManager location is returned/displayed.
> > > >
> > > > Problems:
> > > >
> > > >   *   Only hostname is provided to represent TaskManager location in
> > some
> > > > places (e.g. SubtaskCurrentAttemptDetailsHandler). However, in a
> > > > containerized era, it is common to have multiple TMs on the same
> host,
> > > and
> > > > port info is crucial to distinguish different TMs.
> > &g

Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager Location in REST API and Web UI

2023-09-10 Thread Weihua Hu
Hi, Zhanghao

Thanks for bringing this proposal.

I have a concern:

I prefer to keep the "host" field and add a "location" field in future
versions.
Consider a scenario where a machine (host) with multiple TaskManagers has
poor processing performance due to some problems.
By using a host field aggregation, I can identify the problems with this
machine and take it offline.

Best,
Weihua


On Mon, Sep 11, 2023 at 10:34 AM Chen Zhanghao 
wrote:

> Hi Shammon,
>
> I think all REST API response messages (e.g.
> SubtaskExecutionAttemptDetailsInfo) should be considered as part of the
> public APIs and therefore be marked as @Public. It is true though none of
> them are marked as @public yet. Maybe we should do that. ccing
> @chesnay for confirmation.
>
> Best,
> Zhanghao Chen
> 
> 发件人: Shammon FY 
> 发送时间: 2023年9月11日 10:22
> 收件人: dev@flink.apache.org 
> 主题: Re: [DISCUSS] FLIP-363: Unify the Representation of TaskManager
> Location in REST API and Web UI
>
> Thanks Zhanghao for initialing this discussion, I have just one comment:
>
> I checked the classes `SubtasksAllAccumulatorsHandler`,
> `SubtasksTimesHandler`, `SubtaskCurrentAttemptDetailsHandler`,
> `JobVertexTaskManagersHandler` and `JobExceptionsHandler` you mentioned in
> `Public Interfaces` and they are not annotated as `Public`. So do you want
> to annotate them as `Plublic`? If not, I think you may need to move them
> from `Public Interfaces` to `Proposed Changes`.
>
> Best,
> Shammon FY
>
> On Sat, Sep 9, 2023 at 12:11 PM Chen Zhanghao 
> wrote:
>
> > Hi Devs,
> >
> > I would like to start a discussion on FLIP-363: Unify the Representation
> > of TaskManager Location in REST API and Web UI [1].
> >
> > The TaskManager location of subtasks is important for identifying
> > TM-related problems. There are a number of places in REST API and Web UI
> > where TaskManager location is returned/displayed.
> >
> > Problems:
> >
> >   *   Only hostname is provided to represent TaskManager location in some
> > places (e.g. SubtaskCurrentAttemptDetailsHandler). However, in a
> > containerized era, it is common to have multiple TMs on the same host,
> and
> > port info is crucial to distinguish different TMs.
> >   *   Inconsistent naming of the field to represent TaskManager location:
> > "host" is used in most places but "location" is also used in
> > JobExceptions-related places.
> >   *   Inconsistent semantics of the "host" field: The semantics of the
> > host field are inconsistent, sometimes it denotes hostname only while in
> > other times it denotes hostname + port (which is also inconsistent with
> the
> > name of "host").
> >
> > We propose to improve the current situation by:
> >
> >   *   Use a field named "location" that represents TaskManager location
> in
> > the form of "${hostname}:${port}" in a consistent manner across REST APIs
> > and the front-end.
> >   *   Rename the column name from "Host" to "Location" on the Web UI to
> > reflect the change that both hostname and port are displayed.
> >   *   Keep the old "host" fields untouched for compatibility. They can be
> > removed in the next major version.
> >
> > Looking forward to your feedback.
> >
> > [1] FLIP-363: Unify the Representation of TaskManager Location in REST
> API
> > and Web UI - Apache Flink - Apache Software Foundation<
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-363%3A+Unify+the+Representation+of+TaskManager+Location+in+REST+API+and+Web+UI
> > >
> >
> > Best,
> > Zhanghao Chen
> >
>


Re: [VOTE] FLIP-323: Support Attached Execution on Flink Application Completion for Batch Jobs

2023-09-10 Thread Weihua Hu
+1 (binding)

Best,
Weihua


On Mon, Sep 11, 2023 at 3:16 AM Jing Ge  wrote:

> +1(binding)
>
> Best Regards,
> Jing
>
> On Sun, Sep 10, 2023 at 10:17 AM Dong Lin  wrote:
>
> > Thanks Allison for proposing the FLIP.
> >
> > +1 (binding)
> >
> > On Fri, Sep 8, 2023 at 4:21 AM Allison Chang
>  > >
> > wrote:
> >
> > > Hi everyone,
> > >
> > > Would like to start the VOTE for FLIP-323<
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-323%3A+Support+Attached+Execution+on+Flink+Application+Completion+for+Batch+Jobs
> > >
> > > which proposes to introduce attached execution for batch jobs. The
> > > discussion thread can be found here<
> > > https://lists.apache.org/thread/d3toldk6qqjh2fnbmqthlfkj9rc6lwgl>:
> > >
> > >
> > > Best,
> > >
> > > Allison Chang
> > >
> > >
> >
>


Re: [DISCUSS] FLIP-323: Support Attached Execution on Flink Application Completion for Batch Jobs

2023-08-22 Thread Weihua Hu
Hi, Jiangjie

Thanks for the clarification.

My key point is the meaning of the "submission" in
"client.attached.after.submission".
At first glance, I thought only job submissions were taken into account.
After your clarification, this option also works for cluster submissions.

It's fine for me.

Best,
Weihua


On Wed, Aug 23, 2023 at 8:35 AM Becket Qin  wrote:

> Hi Weihua,
>
> Thanks for the explanation. From the doc, it looks like the current
> behaviors of "execution.attached=true" between Yarn and K8S session
> cluster are exactly the opposite. For YARN it basically means the cluster
> will shutdown if the client disconnects. For K8S, it means the cluster will
> not shutdown until a client explicitly stops it. This sounds like a bad
> situation to me and needs to be fixed.
>
> My guess is that the YARN behavior here is the original intended behavior,
> while K8S reused the configuration for a different purpose. If we deprecate
> the execution.attached config here. The behavior would be:
>
> For YARN session clusters:
> 1. Current "execution.attached=true" would be equivalent to
> "execution.shutdown-on-attached-exit=true" +
> "client.attached.after.submission=true".
> 2. Current "execution.attached=false" would be equivalent to
> "execution.shutdown-on-attached-exit=false", i.e. the cluster will keep
> running until explicitly stopped.
>
> I am not sure what the current behavior of "execution.attached=true" +
> "execution.shutdown-on-attached-exit=false" is. Supposedly, it should be
> equivalent to "execution.shutdown-on-attached-exit=false", which means
> "execution.attached" only controls the client side behavior, while the
> cluster side behavior is controlled by
> "execution.shutdown-on-attached-exit".
>
> For K8S session clusters:
> 1. Current "execution.attached=true" would be equivalent to
> "execution.shutdown-on-attached-exit=false".
> 2. Current "execution.attached=false" would be equivalent to
> "execution.shutdown-on-attached-exit=true" +
> "client.attached.after.submission=true".
>
> This will make the same config behave the same for YARN and K8S.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Tue, Aug 22, 2023 at 11:04 PM Weihua Hu  wrote:
>
> > Hi, Jiangjie
> >
> > 'execution.attached' can be used to attach an existing cluster and stop
> it
> > [1][2],
> > which is not related to job submission. So does YARN session mode[3].
> > IMO, this behavior should not be controlled by the new option
> > 'client.attached.after.submission'.
> >
> > [1]
> >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#session-mode
> > [2]
> >
> >
> https://github.com/apache/flink/blob/a85ffc491874ecf3410f747df3ed09f61df52ac6/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/cli/KubernetesSessionCli.java#L126
> > [3]
> >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/yarn/#session-mode
> >
> > Best,
> > Weihua
> >
> >
> > On Tue, Aug 22, 2023 at 5:16 PM Becket Qin  wrote:
> >
> > > Hi Weihua,
> > >
> > > Just want to clarify a little bit, what is the impact of
> > > `execution.attached` on a cluster startup before a client submits a job
> > to
> > > that cluster? Does this config only become effective after a job
> > > submission?
> > >
> > > Currently, the cluster behavior has an independent config of
> > > 'execution.shutdown-on-attached-exit'. So if a client submitted a job
> in
> > > attached mode, and this `execution.shutdown-on-attached-exit` is set to
> > > true, the cluster will shutdown if the client detaches from the
> cluster.
> > Is
> > > this sufficient? Or do you mean we need another independent
> > configuration?
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Tue, Aug 22, 2023 at 2:20 PM Weihua Hu 
> > wrote:
> > >
> > > > Hi Jiangjie
> > > >
> > > > Sorry for the late reply, I fully agree with the three user sensible
> > > > behaviors you described.
> > > >
> > > > I would like to bring up a point.
> > > >
> > > > Currently, 'execution.attached' is not only used for submitting jobs,
> > > > But also for starting a new cluster (YARN and Kubernetes). If i

Re: [DISCUSS] FLIP-323: Support Attached Execution on Flink Application Completion for Batch Jobs

2023-08-22 Thread Weihua Hu
Hi, Jiangjie

'execution.attached' can be used to attach an existing cluster and stop it
[1][2],
which is not related to job submission. So does YARN session mode[3].
IMO, this behavior should not be controlled by the new option
'client.attached.after.submission'.

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#session-mode
[2]
https://github.com/apache/flink/blob/a85ffc491874ecf3410f747df3ed09f61df52ac6/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/cli/KubernetesSessionCli.java#L126
[3]
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/yarn/#session-mode

Best,
Weihua


On Tue, Aug 22, 2023 at 5:16 PM Becket Qin  wrote:

> Hi Weihua,
>
> Just want to clarify a little bit, what is the impact of
> `execution.attached` on a cluster startup before a client submits a job to
> that cluster? Does this config only become effective after a job
> submission?
>
> Currently, the cluster behavior has an independent config of
> 'execution.shutdown-on-attached-exit'. So if a client submitted a job in
> attached mode, and this `execution.shutdown-on-attached-exit` is set to
> true, the cluster will shutdown if the client detaches from the cluster. Is
> this sufficient? Or do you mean we need another independent configuration?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Tue, Aug 22, 2023 at 2:20 PM Weihua Hu  wrote:
>
> > Hi Jiangjie
> >
> > Sorry for the late reply, I fully agree with the three user sensible
> > behaviors you described.
> >
> > I would like to bring up a point.
> >
> > Currently, 'execution.attached' is not only used for submitting jobs,
> > But also for starting a new cluster (YARN and Kubernetes). If it's true,
> > the starting cluster script will
> > wait for the user to input the next command (quit or stop).
> >
> > In my opinion, this behavior should have an independent option besides
> > "client.attached.after.submission" for control.
> >
> >
> > Best,
> > Weihua
> >
> >
> > On Thu, Aug 17, 2023 at 10:07 AM liu ron  wrote:
> >
> > > Hi, Jiangjie
> > >
> > > Thanks for your detailed explanation, I got your point. If the
> > > execution.attached is only used for client currently, removing it also
> > make
> > > sense to me.
> > >
> > > Best,
> > > Ron
> > >
> > > Becket Qin  于2023年8月17日周四 07:37写道:
> > >
> > > > Hi Ron,
> > > >
> > > > Isn't the cluster (session or per job) only using the
> > execution.attached
> > > to
> > > > determine whether the client is attached? If so, the client can
> always
> > > > include the information of whether it's an attached client or not in
> > the
> > > > JobSubmissoinRequestBody, right? For a shared session cluster, there
> > > could
> > > > be multiple clients submitting jobs to it. These clients may or may
> not
> > > be
> > > > attached. A static execution.attached configuration for the session
> > > cluster
> > > > does not work in this case, right?
> > > >
> > > > The current problem of execution.attached is that it is not always
> > > honored.
> > > > For example, if a session cluster was started with execution.attached
> > set
> > > > to false. And a client submits a job later to that session cluster
> with
> > > > execution.attached set to true. In this case, the cluster won't (and
> > > > shouldn't) shutdown after the job finishes or the attached client
> loses
> > > > connection. So, in fact, the execution.attached configuration is only
> > > > honored by the client, but not the cluster. Therefore, I think
> removing
> > > it
> > > > makes sense.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > On Thu, Aug 17, 2023 at 12:31 AM liu ron  wrote:
> > > >
> > > > > Hi, Jiangjie
> > > > >
> > > > > Sorry for late reply. Thank you for such a detailed response. As
> you
> > > say,
> > > > > there are three behaviours here for users and I agree with you. The
> > > goal
> > > > of
> > > > > this FLIP is to clarify the behaviour of the client side, which I
> > also
> > > > > agree with. However, as weihua said, the config execution.attached
> is
> &

Re: [DISCUSS] FLIP-323: Support Attached Execution on Flink Application Completion for Batch Jobs

2023-08-21 Thread Weihua Hu
gt; > session clusters, the lifecycle of the Flink cluster should be
> > > independent
> > > > of the jobs running in it.
> > > >
> > > > As we can see, these three behaviors are sort of independent, the
> > current
> > > > configurations fail to support all the combination of wanted
> behaviors.
> > > > Ideally there should be three separate configurations, for example:
> > > > - client.attached.after.submission and client.heartbeat.timeout
> control
> > > the
> > > > behavior on the client side.
> > > > - jobmanager.cancel-on-attached-client-exit controls the behavior of
> > the
> > > > job when an attached client lost connection. The client heartbeat
> > timeout
> > > > and attach-ness will be also passed to the JM upon job submission.
> > > > - cluster.shutdown-on-first-job-finishes *(*or
> > > > jobmanager.shutdown-cluster-after-job-finishes) controls the cluster
> > > > behavior after the job finishes normally / abnormally. This is a
> > cluster
> > > > level setting instead of a job level setting. Therefore it can only
> be
> > > set
> > > > when launching the cluster.
> > > >
> > > > The current code sort of combines config 2 and 3 into
> > > > execution.shutdown-on-attach-exit.
> > > > This assumes the the life cycle of the cluster is the same as the job
> > > when
> > > > the client is attached. This FLIP does not intend to change that. but
> > > using
> > > > the execution.attached config for the client behavior control looks
> > > > misleading. So this FLIP proposes to replace it with a more intuitive
> > > > config of client.attached.after.submission. This makes it clear that
> it
> > > is
> > > > a configuration controlling the client side behavior, instead of the
> > > > execution of the job.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Aug 10, 2023 at 10:34 PM Weihua Hu 
> > > wrote:
> > > >
> > > > > Hi Allison
> > > > >
> > > > > Thanks for driving this FLIP. It's a valuable feature for batch
> jobs.
> > > > > This helps keep "Drop Per-Job Mode [1]" going.
> > > > >
> > > > > +1 for this proposal.
> > > > >
> > > > > However, it seems that the change in this FLIP is not detailed
> > enough.
> > > > > I have a few questions.
> > > > >
> > > > > 1. The config 'execution.attached' is not only used in per-job
> mode,
> > > > > but also in session mode to shutdown the cluster. IMHO, it's better
> > to
> > > > > keep this option name.
> > > > >
> > > > > 2. This FLIP only mentions YARN mode. I believe this feature should
> > > > > work in both YARN and Kubernetes mode.
> > > > >
> > > > > 3. Within the attach mode, we support two features:
> > > > > execution.shutdown-on-attached-exit
> > > > > and client.heartbeat.timeout. These should also be taken into
> > account.
> > > > >
> > > > > 4. The Application Mode will shut down once the job has been
> > completed.
> > > > > So, if we use the flink client to poll job status via REST API for
> > > attach
> > > > > mode,
> > > > > there is a chance that the client will not be able to retrieve the
> > job
> > > > > finish status.
> > > > > Perhaps FLINK-24113[3] will help with this.
> > > > >
> > > > >
> > > > > [1]https://issues.apache.org/jira/browse/FLINK-26000
> > > > > [2]
> > > > >
> > > > >
> > > >
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#session-mode
> > > > > [2]https://issues.apache.org/jira/browse/FLINK-24113
> > > > >
> > > > > Best,
> > > > > Weihua
> > > > >
> > > > >
> > > > > On Thu, Aug 10, 2023 at 10:47 AM liu ron 
> wrote:
> > > > >
> > > > > > Hi, Allison
> > > > > >
> > > > > > Thanks for driving th

[jira] [Created] (FLINK-32882) FineGrainedSlotManager cause NPE when clear pending taskmanager twice

2023-08-16 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-32882:
-

 Summary: FineGrainedSlotManager cause NPE when clear pending 
taskmanager twice
 Key: FLINK-32882
 URL: https://issues.apache.org/jira/browse/FLINK-32882
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.18.0
Reporter: Weihua Hu
Assignee: Weihua Hu
 Attachments: image-2023-08-16-17-34-32-619.png

When job finished we call 
processResourceRequirements(ResourceRequirements.empty) and 
clearResourceRequirements. Both methods trigger 
taskManagerTracker.clearPendingAllocationsOfJob(jobId) to release pending task 
manager early.
 
This causes NPE, we need to add a safety net for 
PendingTaskManager#clearPendingAllocationsOfJob



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-323: Support Attached Execution on Flink Application Completion for Batch Jobs

2023-08-10 Thread Weihua Hu
Hi Allison

Thanks for driving this FLIP. It's a valuable feature for batch jobs.
This helps keep "Drop Per-Job Mode [1]" going.

+1 for this proposal.

However, it seems that the change in this FLIP is not detailed enough.
I have a few questions.

1. The config 'execution.attached' is not only used in per-job mode,
but also in session mode to shutdown the cluster. IMHO, it's better to
keep this option name.

2. This FLIP only mentions YARN mode. I believe this feature should
work in both YARN and Kubernetes mode.

3. Within the attach mode, we support two features:
execution.shutdown-on-attached-exit
and client.heartbeat.timeout. These should also be taken into account.

4. The Application Mode will shut down once the job has been completed.
So, if we use the flink client to poll job status via REST API for attach
mode,
there is a chance that the client will not be able to retrieve the job
finish status.
Perhaps FLINK-24113[3] will help with this.


[1]https://issues.apache.org/jira/browse/FLINK-26000
[2]
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#session-mode
[2]https://issues.apache.org/jira/browse/FLINK-24113

Best,
Weihua


On Thu, Aug 10, 2023 at 10:47 AM liu ron  wrote:

> Hi, Allison
>
> Thanks for driving this proposal, it looks cool for batch jobs under
> application mode. But after reading your FLIP document and [1], I have a
> question. Why do you want to rename the execution.attached configuration to
> client.attached.after.submission and at the same time deprecate
> execution.attached? Based on your design, I understand the role of these
> two options are the same. Introducing a new option would increase the cost
> of understanding and use for the user, so why not follow the idea discussed
> in FLINK-25495 and make Application mode support attached.execution.
>
> [1] https://issues.apache.org/jira/browse/FLINK-25495
>
> Best,
> Ron
>
> Venkatakrishnan Sowrirajan  于2023年8月9日周三 02:07写道:
>
> > This is definitely a useful feature especially for the flink batch
> > execution workloads using flow orchestrators like Airflow, Azkaban, Oozie
> > etc. Thanks for reviving this issue and starting a FLIP.
> >
> > Regards
> > Venkata krishnan
> >
> >
> > On Mon, Aug 7, 2023 at 4:09 PM Allison Chang
>  > >
> > wrote:
> >
> > > Hi all,
> > >
> > > I am opening this thread to discuss this proposal to support attached
> > > execution on Flink Application Completion for Batch Jobs. The link to
> the
> > > FLIP proposal is here:
> > >
> >
> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-323*3A*Support*Attached*Execution*on*Flink*Application*Completion*for*Batch*Jobs__;JSsrKysrKysrKys!!IKRxdwAv5BmarQ!friFO6bJub5FKSLhPIzA6kv-7uffv-zXlv9ZLMKqj_xMcmZl62HhsgvwDXSCS5hfSeyHZgoAVSFg3fk7ChaAFNKi$
> > >
> > > This FLIP proposes adding back attached execution for Application Mode.
> > In
> > > the past attached execution was supported for the per-job mode, which
> > will
> > > be deprecated and we want to include this feature back into Application
> > > mode.
> > >
> > > Please reply to this email thread and share your thoughts/opinions.
> > >
> > > Thank you!
> > >
> > > Allison Chang
> > >
> >
>


Re: [ANNOUNCE] New Apache Flink Committer - Yanfei Lei

2023-08-07 Thread Weihua Hu
Congratulations Yanfei!

Best,
Weihua


On Mon, Aug 7, 2023 at 8:08 PM Feifan Wang  wrote:

> Congratulations Yanfei! :)
>
>
>
> ——
> Name: Feifan Wang
> Email: zoltar9...@163.com
>
>
>  Replied Message 
> | From | Matt Wang |
> | Date | 08/7/2023 19:40 |
> | To | dev@flink.apache.org |
> | Subject | Re: [ANNOUNCE] New Apache Flink Committer - Yanfei Lei |
> Congratulations Yanfei!
>
>
> --
>
> Best,
> Matt Wang
>
>
>  Replied Message 
> | From | Mang Zhang |
> | Date | 08/7/2023 18:56 |
> | To |  |
> | Subject | Re:Re: [ANNOUNCE] New Apache Flink Committer - Yanfei Lei |
> Congratulations--
>
> Best regards,
> Mang Zhang
>
>
>
>
>
> 在 2023-08-07 18:17:58,"Yuxin Tan"  写道:
> Congrats, Yanfei!
>
> Best,
> Yuxin
>
>
> weijie guo  于2023年8月7日周一 17:59写道:
>
> Congrats, Yanfei!
>
> Best regards,
>
> Weijie
>
>
> Biao Geng  于2023年8月7日周一 17:03写道:
>
> Congrats, Yanfei!
> Best,
> Biao Geng
>
> 发送自 Outlook for iOS
> 
> 发件人: Qingsheng Ren 
> 发送时间: Monday, August 7, 2023 4:23:52 PM
> 收件人: dev@flink.apache.org 
> 主题: Re: [ANNOUNCE] New Apache Flink Committer - Yanfei Lei
>
> Congratulations and welcome, Yanfei!
>
> Best,
> Qingsheng
>
> On Mon, Aug 7, 2023 at 4:19 PM Matthias Pohl  .invalid>
> wrote:
>
> Congratulations, Yanfei! :)
>
> On Mon, Aug 7, 2023 at 10:00 AM Junrui Lee 
> wrote:
>
> Congratulations Yanfei!
>
> Best,
> Junrui
>
> Yun Tang  于2023年8月7日周一 15:19写道:
>
> Congratulations, Yanfei!
>
> Best
> Yun Tang
> 
> From: Danny Cranmer 
> Sent: Monday, August 7, 2023 15:10
> To: dev 
> Subject: Re: [ANNOUNCE] New Apache Flink Committer - Yanfei Lei
>
> Congrats Yanfei! Welcome to the team.
>
> Danny
>
> On Mon, 7 Aug 2023, 08:03 Rui Fan, <1996fan...@gmail.com> wrote:
>
> Congratulations Yanfei!
>
> Best,
> Rui
>
> On Mon, Aug 7, 2023 at 2:56 PM Yuan Mei 
> wrote:
>
> On behalf of the PMC, I'm happy to announce Yanfei Lei as a new
> Flink
> Committer.
>
> Yanfei has been active in the Flink community for almost two
> years
> and
> has
> played an important role in developing and maintaining State
> and
> Checkpoint
> related features/components, including RocksDB Rescaling
> Performance
> Improvement and Generic Incremental Checkpoints.
>
> Yanfei also helps improve community infrastructure in many
> ways,
> including
> migrating the Flink Daily performance benchmark to the Apache
> Flink
> slack
> channel. She is the maintainer of the benchmark and has
> improved
> its
> detection stability significantly. She is also one of the major
> maintainers
> of the FrocksDB Repo and released FRocksDB 6.20.3 (part of
> Flink
> 1.17
> release). Yanfei is a very active community member, supporting
> users
> and
> participating
> in tons of discussions on the mailing lists.
>
> Please join me in congratulating Yanfei for becoming a Flink
> Committer!
>
> Thanks,
> Yuan Mei (on behalf of the Flink PMC)
>
>
>
>
>
>
>
>


Re: [ANNOUNCE] New Apache Flink PMC Member - Matthias Pohl

2023-08-03 Thread Weihua Hu
Congratulations,  Matthias!

Best,
Weihua


On Fri, Aug 4, 2023 at 2:49 PM Yuxin Tan  wrote:

> Congratulations, Matthias!
>
> Best,
> Yuxin
>
>
> Sergey Nuyanzin  于2023年8月4日周五 14:21写道:
>
> > Congratulations, Matthias!
> > Well deserved!
> >
> > On Fri, Aug 4, 2023 at 7:59 AM liu ron  wrote:
> >
> > > Congrats, Matthias!
> > >
> > > Best,
> > > Ron
> > >
> > > Shammon FY  于2023年8月4日周五 13:24写道:
> > >
> > > > Congratulations, Matthias!
> > > >
> > > > On Fri, Aug 4, 2023 at 1:13 PM Samrat Deb 
> > wrote:
> > > >
> > > > > Congrats, Matthias!
> > > > >
> > > > >
> > > > > On Fri, 4 Aug 2023 at 10:13 AM, Benchao Li 
> > > wrote:
> > > > >
> > > > > > Congratulations, Matthias!
> > > > > >
> > > > > > Jing Ge  于2023年8月4日周五 12:35写道:
> > > > > >
> > > > > > > Congrats! Matthias!
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Jing
> > > > > > >
> > > > > > > On Fri, Aug 4, 2023 at 12:09 PM Yangze Guo  >
> > > > wrote:
> > > > > > >
> > > > > > > > Congrats, Matthias!
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Yangze Guo
> > > > > > > >
> > > > > > > > On Fri, Aug 4, 2023 at 11:44 AM Qingsheng Ren <
> > re...@apache.org>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Congratulations, Matthias! This is absolutely well
> deserved.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Qingsheng
> > > > > > > > >
> > > > > > > > > On Fri, Aug 4, 2023 at 11:31 AM Rui Fan <
> > 1996fan...@gmail.com>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Congratulations Matthias, well deserved!
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Rui Fan
> > > > > > > > > >
> > > > > > > > > > On Fri, Aug 4, 2023 at 11:30 AM Leonard Xu <
> > > xbjt...@gmail.com>
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Congratulations,  Matthias.
> > > > > > > > > > >
> > > > > > > > > > > Well deserved ^_^
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Leonard
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > On Aug 4, 2023, at 11:18 AM, Xintong Song <
> > > > > > tonysong...@gmail.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > >
> > > > > > > > > > > > On behalf of the PMC, I'm very happy to announce that
> > > > > Matthias
> > > > > > > > Pohl has
> > > > > > > > > > > > joined the Flink PMC!
> > > > > > > > > > > >
> > > > > > > > > > > > Matthias has been consistently contributing to the
> > > project
> > > > > > since
> > > > > > > > Sep
> > > > > > > > > > > 2020,
> > > > > > > > > > > > and became a committer in Dec 2021. He mainly works
> in
> > > > > Flink's
> > > > > > > > > > > distributed
> > > > > > > > > > > > coordination and high availability areas. He has
> worked
> > > on
> > > > > many
> > > > > > > > FLIPs
> > > > > > > > > > > > including FLIP195/270/285. He helped a lot with the
> > > release
> > > > > > > > management,
> > > > > > > > > > > > being one of the Flink 1.17 release managers and also
> > > very
> > > > > > active
> > > > > > > > in
> > > > > > > > > > > Flink
> > > > > > > > > > > > 1.18 / 2.0 efforts. He also contributed a lot to
> > > improving
> > > > > the
> > > > > > > > build
> > > > > > > > > > > > stability.
> > > > > > > > > > > >
> > > > > > > > > > > > Please join me in congratulating Matthias!
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > >
> > > > > > > > > > > > Xintong (on behalf of the Apache Flink PMC)
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Best,
> > > > > > Benchao Li
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > Best regards,
> > Sergey
> >
>


Re: [ANNOUNCE] New Apache Flink Committer - Hong Teoh

2023-08-03 Thread Weihua Hu
Congratulations, Hong!

Best,
Weihua


On Fri, Aug 4, 2023 at 10:49 AM Samrat Deb  wrote:

> Congratulations, Hong Teoh
>
> On Fri, 4 Aug 2023 at 7:52 AM, Benchao Li  wrote:
>
> > Congratulations, Hong!
> >
> > yuxia  于2023年8月4日周五 09:23写道:
> >
> > > Congratulations, Hong Teoh!
> > >
> > > Best regards,
> > > Yuxia
> > >
> > > - 原始邮件 -
> > > 发件人: "Matthias Pohl" 
> > > 收件人: "dev" 
> > > 发送时间: 星期四, 2023年 8 月 03日 下午 11:24:39
> > > 主题: Re: [ANNOUNCE] New Apache Flink Committer - Hong Teoh
> > >
> > > Congratulations, Hong! :)
> > >
> > > On Thu, Aug 3, 2023 at 3:39 PM Leonard Xu  wrote:
> > >
> > > > Congratulations, Hong!
> > > >
> > > >
> > > > Best,
> > > > Leonard
> > > >
> > > > > On Aug 3, 2023, at 8:45 PM, Jiabao Sun  > > .INVALID>
> > > > wrote:
> > > > >
> > > > > Congratulations, Hong Teoh!
> > > > >
> > > > > Best,
> > > > > Jiabao Sun
> > > > >
> > > > >> 2023年8月3日 下午7:32,Danny Cranmer  写道:
> > > > >>
> > > > >> On behalf of the PMC, I'm very happy to announce Hong Teoh as a
> new
> > > > Flink
> > > > >> Committer.
> > > > >>
> > > > >> Hong has been active in the Flink community for over 1 year and
> has
> > > > played
> > > > >> a key role in developing and maintaining AWS integrations, core
> > > > connector
> > > > >> APIs and more recently, improvements to the Flink REST API.
> > > > Additionally,
> > > > >> Hong is a very active community member, supporting users and
> > > > participating
> > > > >> in discussions on the mailing lists, Flink slack channels and
> > speaking
> > > > at
> > > > >> conferences.
> > > > >>
> > > > >> Please join me in congratulating Hong for becoming a Flink
> > Committer!
> > > > >>
> > > > >> Thanks,
> > > > >> Danny Cranmer (on behalf of the Flink PMC)
> > > > >
> > > >
> > > >
> > >
> >
> >
> > --
> >
> > Best,
> > Benchao Li
> >
>


[jira] [Created] (FLINK-32702) streaming examples could not run

2023-07-27 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-32702:
-

 Summary: streaming examples could not run
 Key: FLINK-32702
 URL: https://issues.apache.org/jira/browse/FLINK-32702
 Project: Flink
  Issue Type: Bug
  Components: Examples
Affects Versions: 1.18.0
Reporter: Weihua Hu
 Attachments: image-2023-07-27-16-54-20-070.png

There are some streaming examples that depend on flink-connector-datagen, but 
didn't package the datagen connector to the final jar. As a result, these 
examples couldn't run.

* Iteration.jar
* TopSpeedWindowing.jar
* SessionWindowing.jar

I would like to use the Maven Shade Plugin to include the dependencies in a 
single jar.


 !image-2023-07-27-16-54-20-070.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] New Apache Flink Committer - Yong Fang

2023-07-23 Thread Weihua Hu
Congratulations!

Best,
Weihua


On Mon, Jul 24, 2023 at 11:04 AM Paul Lam  wrote:

> Congrats, Shammon!
>
> Best,
> Paul Lam
>
> > 2023年7月24日 10:56,Jingsong Li  写道:
> >
> > Shammon
>
>


Re: JIRA Permission Opening Application

2023-07-10 Thread Weihua Hu
Hi YongPing

Thank you for your willingness to contribute, no special permissions are
required,
just find the issue you want to contribute to or create a new one and then
you contribute the code,
refer to the how-to-contribute[1]

[1] https://flink.apache.org/how-to-contribute/contribute-code/

Best,
Weihua


On Mon, Jul 10, 2023 at 8:48 PM YongPingLi <928689...@qq.com.invalid> wrote:

> Hi,
>
> I want to contribute to Apache Flink. Would you please give me the
> contributor permission? My JIRA ID is pingcai678.
>
>
> YongPingLi
> 928689...@qq.com
>
>
>
>  


Re: [ANNOUNCE] Apache Flink has won the 2023 SIGMOD Systems Award

2023-07-05 Thread Weihua Hu
Congratulations!

Best,
Weihua


On Wed, Jul 5, 2023 at 5:46 PM Shammon FY  wrote:

> Congratulations!
>
> Best,
> Shammon FY
>
> On Wed, Jul 5, 2023 at 2:38 PM Paul Lam  wrote:
>
> > Congrats and cheers!
> >
> > Best,
> > Paul Lam
> >
> > > 2023年7月4日 18:04,Benchao Li  写道:
> > >
> > > Congratulations!
> > >
> > > Feng Jin  于2023年7月4日周二 16:17写道:
> > >
> > >> Congratulations!
> > >>
> > >> Best,
> > >> Feng Jin
> > >>
> > >>
> > >>
> > >> On Tue, Jul 4, 2023 at 4:13 PM Yuxin Tan 
> > wrote:
> > >>
> > >>> Congratulations!
> > >>>
> > >>> Best,
> > >>> Yuxin
> > >>>
> > >>>
> > >>> Dunn Bangui  于2023年7月4日周二 16:04写道:
> > >>>
> >  Congratulations!
> > 
> >  Best,
> >  Bangui Dunn
> > 
> >  Yangze Guo  于2023年7月4日周二 15:59写道:
> > 
> > > Congrats everyone!
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Tue, Jul 4, 2023 at 3:53 PM Rui Fan <1996fan...@gmail.com>
> wrote:
> > >>
> > >> Congratulations!
> > >>
> > >> Best,
> > >> Rui Fan
> > >>
> > >> On Tue, Jul 4, 2023 at 2:08 PM Zhu Zhu  wrote:
> > >>
> > >>> Congratulations everyone!
> > >>>
> > >>> Thanks,
> > >>> Zhu
> > >>>
> > >>> Hang Ruan  于2023年7月4日周二 14:06写道:
> > 
> >  Congratulations!
> > 
> >  Best,
> >  Hang
> > 
> >  Jingsong Li  于2023年7月4日周二 13:47写道:
> > 
> > > Congratulations!
> > >
> > > Thank you! All of the Flink community!
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Tue, Jul 4, 2023 at 1:24 PM tison 
> >  wrote:
> > >>
> > >> Congrats and with honor :D
> > >>
> > >> Best,
> > >> tison.
> > >>
> > >>
> > >> Mang Zhang  于2023年7月4日周二 11:08写道:
> > >>
> > >>> Congratulations!--
> > >>>
> > >>> Best regards,
> > >>> Mang Zhang
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> 在 2023-07-04 01:53:46,"liu ron"  写道:
> >  Congrats everyone
> > 
> >  Best,
> >  Ron
> > 
> >  Jark Wu  于2023年7月3日周一 22:48写道:
> > 
> > > Congrats everyone!
> > >
> > > Best,
> > > Jark
> > >
> > >> 2023年7月3日 22:37,Yuval Itzchakov 
> > >>> 写道:
> > >>
> > >> Congrats team!
> > >>
> > >> On Mon, Jul 3, 2023, 17:28 Jing Ge via user <
> > > u...@flink.apache.org
> > > > wrote:
> > >>> Congratulations!
> > >>>
> > >>> Best regards,
> > >>> Jing
> > >>>
> > >>>
> > >>> On Mon, Jul 3, 2023 at 3:21 PM yuxia <
> > > luoyu...@alumni.sjtu.edu.cn
> > > > wrote:
> >  Congratulations!
> > 
> >  Best regards,
> >  Yuxia
> > 
> >  发件人: "Pushpa Ramakrishnan" <
> > > pushpa.ramakrish...@icloud.com
> > >  > > pushpa.ramakrish...@icloud.com>>
> >  收件人: "Xintong Song"  > >>  > > tonysong...@gmail.com>>
> >  抄送: "dev"  > >>> dev@flink.apache.org>>,
> > > "User"  >  u...@flink.apache.org
> > >>>
> >  发送时间: 星期一, 2023年 7 月 03日 下午 8:36:30
> >  主题: Re: [ANNOUNCE] Apache Flink has won the 2023
> >  SIGMOD
> > >>> Systems
> > >>> Award
> > 
> >  Congratulations \uD83E\uDD73
> > 
> >  On 03-Jul-2023, at 3:30 PM, Xintong Song <
> > >>> tonysong...@gmail.com
> > > > wrote:
> > 
> >  
> >  Dear Community,
> > 
> >  I'm pleased to share this good news with everyone.
> > >>> As
> > > some
> > >>> of
> > > you
> > >>> may
> > > have already heard, Apache Flink has won the 2023
> > >> SIGMOD
> > > Systems
> > > Award
> > >>> [1].
> > 
> >  "Apache Flink greatly expanded the use of stream
> > > data-processing."
> > >>> --
> > > SIGMOD Awards Committee
> > 
> >  SIGMOD is one of the most influential data
> > >>> management
> > >>> research
> > > conferences in the world. The Systems Award is awarded
> > >>> to
> >  an
> > > individual
> > >>> or
> > > set of individuals to recognize the development of a
> > > software or
> > >>> hardware
> > > system whose technical contributions have had
> > >>> significant
>

[jira] [Created] (FLINK-32387) InputGateDeploymentDescriptor uses cache to avoid deserializing shuffle descriptors multiple times

2023-06-19 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-32387:
-

 Summary: InputGateDeploymentDescriptor uses cache to avoid 
deserializing shuffle descriptors multiple times
 Key: FLINK-32387
 URL: https://issues.apache.org/jira/browse/FLINK-32387
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Weihua Hu


InputGateDeploymentDescriptor uses cache to avoid deserializing shuffle 
descriptors multiple times.

The cache only affects when the shuffle descriptors are offloaded by the blob 
server. This means the shuffle descriptors size is large enough to use caches.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32386) Add ShuffleDescriptorsCache in TaskExecutor to cache ShuffleDescriptors

2023-06-19 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-32386:
-

 Summary: Add ShuffleDescriptorsCache in TaskExecutor to cache 
ShuffleDescriptors
 Key: FLINK-32386
 URL: https://issues.apache.org/jira/browse/FLINK-32386
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Weihua Hu


Introduce a new struct named ShuffleDescriptorsCache to cache 
ShuffleDescriptorAndIndex which are offloaded by the blob server. 

The cache should have the following capabilities:

1. Expired after exceeding the TTL.
2. Limit the size of the cache. Remove the oldest element from the cache when 
its maximum size has been exceeded.
3. Clear elements belong to a job when it disconnects from TaskExecutor.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32385) Introduce a struct SerializedShuffleDescriptorAndIndices to identify a group of ShuffleDescriptorAndIndex

2023-06-19 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-32385:
-

 Summary: Introduce a struct SerializedShuffleDescriptorAndIndices 
to identify a group of ShuffleDescriptorAndIndex
 Key: FLINK-32385
 URL: https://issues.apache.org/jira/browse/FLINK-32385
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Weihua Hu


Introduce a new struct named SerializedShuffleDescriptorAndIndices to identify 
a group of ShuffleDescriptorAndIndex. 

Then we could cache these ShuffleDescriptorAndIndex in TaskExecutor side



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-316: Introduce SQL Driver

2023-06-06 Thread Weihua Hu
for Flink
> >> SQLs,
> >>>>> which allows users to submit Flink SQLs in the same way as DataStream
> >>>>> jobs, or else users need to write their own main class.
> >>>>>
> >>>>>> SQL Driver needs to serialize SessionState which is very challenging
> >>>>>> but not detailed covered in the FLIP.
> >>>>>
> >>>>> With the help of ExecNodeGraph, do we still need the serialized
> >>>>> SessionState? If not, we could make SQL Driver accepts two serialized
> >>>>> formats:
> >>>>>
> >>>>> - SQL files for user-facing public usage
> >>>>> - ExecNodeGraph for internal usage
> >>>>>
> >>>>> It’s kind of similar to the relationship between job jars and
> >> jobgraphs.
> >>>>>
> >>>>>> Regarding "K8S doesn't support shipping multiple jars", is that
> true?
> >>>> Is it
> >>>>>> possible to support it?
> >>>>>
> >>>>> Yes, K8s doesn’t distribute any files. It’s the users’ responsibility
> >> to
> >>>> make
> >>>>> sure the resources are accessible in the containers. The common
> >> solutions
> >>>>> I know is to use distributed file systems or use init containers to
> >>>> localize the
> >>>>> resources.
> >>>>>
> >>>>> Now I lean toward introducing a fs to do the distribution job. WDYT?
> >>>>>
> >>>>> Best,
> >>>>> Paul Lam
> >>>>>
> >>>>>> 2023年6月1日 20:33,Jark Wu mailto:imj...@gmail.com>
> <mailto:imj...@gmail.com <mailto:imj...@gmail.com>>
> >> <mailto:imj...@gmail.com <mailto:imj...@gmail.com>  imj...@gmail.com <mailto:imj...@gmail.com>>>>
> >>>> 写道:
> >>>>>>
> >>>>>> Hi Paul,
> >>>>>>
> >>>>>> Thanks for starting this discussion. I like the proposal! This is a
> >>>>>> frequently requested feature!
> >>>>>>
> >>>>>> I agree with Shengkai that ExecNodeGraph as the submission object
> is a
> >>>>>> better idea than SQL file. To be more specific, it should be
> >>>> JsonPlanGraph
> >>>>>> or CompiledPlan which is the serializable representation.
> CompiledPlan
> >>>> is a
> >>>>>> clear separation between compiling/optimization/validation and
> >>>> execution.
> >>>>>> This can keep the validation and metadata accessing still on the
> >>>> SQLGateway
> >>>>>> side. This allows SQLGateway to leverage some metadata caching and
> UDF
> >>>> JAR
> >>>>>> caching for better compiling performance.
> >>>>>>
> >>>>>> If we decide to submit ExecNodeGraph instead of SQL file, is it
> still
> >>>>>> necessary to support SQL Driver? Regarding non-interactive SQL jobs,
> >>>> users
> >>>>>> can use the Table API program for application mode. SQL Driver needs
> >> to
> >>>>>> serialize SessionState which is very challenging but not detailed
> >>>> covered
> >>>>>> in the FLIP.
> >>>>>>
> >>>>>> Regarding "K8S doesn't support shipping multiple jars", is that
> true?
> >>>> Is it
> >>>>>> possible to support it?
> >>>>>>
> >>>>>> Best,
> >>>>>> Jark
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Thu, 1 Jun 2023 at 16:58, Paul Lam  <mailto:paullin3...@gmail.com>  >> paullin3...@gmail.com <mailto:paullin3...@gmail.com>>  >>>> paullin3...@gmail.com <mailto:paullin3...@gmail.com>  paullin3...@gmail.com <mailto:paullin3...@gmail.com>>>> wrote:
> >>>>>>
> >>>>>>> Hi Weihua,
> >>>>>>>
> >>>>>>> You’re right. Distributing the SQLs to the TMs is one of the
> >>>> challenging
> >>>>>>> parts of this FLIP.
> >>>>>>>
> >>>>>>> Web submission is not enabled in application mode currentl

Re: [DISCUSS] FLIP-316: Introduce SQL Driver

2023-05-31 Thread Weihua Hu
s is sort of like Driver in JDBC which
> >> translates SQLs into
> >>databases specific languages.
> >>
> >> In general, I’m +1 for SQL Driver and +0 for SQL Runner.
> >>
> >>> - Could we run SQL jobs using SQL in strings? Otherwise, we need to
> >> prepare
> >>> a SQL file in an image for Kubernetes application mode, which may be a
> >> bit
> >>> cumbersome.
> >>
> >> Do you mean a pass the SQL string a configuration or a program argument?
> >>
> >> I thought it might be convenient for testing propose, but not
> recommended
> >> for production,
> >> cause Flink SQLs could be complicated and involves lots of characters
> that
> >> need to escape.
> >>
> >> WDYT?
> >>
> >>> - I noticed that we don't specify the SQLDriver jar in the
> >> "run-application"
> >>> command. Does that mean we need to perform automatic detection in
> Flink?
> >>
> >> Yes! It’s like running a PyFlink job with the following command:
> >>
> >> ```
> >> ./bin/flink run \
> >>  --pyModule table.word_count \
> >>  --pyFiles examples/python/table
> >> ```
> >>
> >> The CLI determines if it’s a SQL job, if yes apply the SQL Driver
> >> automatically.
> >>
> >>
> >> [1]
> >>
> https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/client/python/PythonDriver.java
> >>
> >> Best,
> >> Paul Lam
> >>
> >>> 2023年5月30日 21:56,Weihua Hu  写道:
> >>>
> >>> Thanks Paul for the proposal.
> >>>
> >>> +1 for this. It is valuable in improving ease of use.
> >>>
> >>> I have a few questions.
> >>> - Is SQLRunner the better name? We use this to run a SQL Job. (Not
> >> strong,
> >>> the SQLDriver is fine for me)
> >>> - Could we run SQL jobs using SQL in strings? Otherwise, we need to
> >> prepare
> >>> a SQL file in an image for Kubernetes application mode, which may be a
> >> bit
> >>> cumbersome.
> >>> - I noticed that we don't specify the SQLDriver jar in the
> >> "run-application"
> >>> command. Does that mean we need to perform automatic detection in
> Flink?
> >>>
> >>>
> >>> Best,
> >>> Weihua
> >>>
> >>>
> >>> On Mon, May 29, 2023 at 7:24 PM Paul Lam 
> wrote:
> >>>
> >>>> Hi team,
> >>>>
> >>>> I’d like to start a discussion about FLIP-316 [1], which introduces a
> >> SQL
> >>>> driver as the
> >>>> default main class for Flink SQL jobs.
> >>>>
> >>>> Currently, Flink SQL could be executed out of the box either via SQL
> >>>> Client/Gateway
> >>>> or embedded in a Flink Java/Python program.
> >>>>
> >>>> However, each one has its drawback:
> >>>>
> >>>> - SQL Client/Gateway doesn’t support the application deployment mode
> [2]
> >>>> - Flink Java/Python program requires extra work to write a non-SQL
> >> program
> >>>>
> >>>> Therefore, I propose adding a SQL driver to act as the default main
> >> class
> >>>> for SQL jobs.
> >>>> Please see the FLIP docs for details and feel free to comment. Thanks!
> >>>>
> >>>> [1]
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-316%3A+Introduce+SQL+Driver
> >>>> <
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-316:+Introduce+SQL+Driver
> >>>>>
> >>>> [2] https://issues.apache.org/jira/browse/FLINK-26541 <
> >>>> https://issues.apache.org/jira/browse/FLINK-26541>
> >>>>
> >>>> Best,
> >>>> Paul Lam
> >>
> >>
>
>


[jira] [Created] (FLINK-32225) merge task deployment related fields into a new configuration

2023-05-31 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-32225:
-

 Summary: merge task deployment related fields into a new 
configuration 
 Key: FLINK-32225
 URL: https://issues.apache.org/jira/browse/FLINK-32225
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / Coordination
Reporter: Weihua Hu
 Fix For: 1.18.0


As discussed in https://github.com/apache/flink/pull/22674

TaskDeploymentDescriptorFactory#fromExecution needs to retrieve several fields 
from the ExecutionGraphAccessor.  We could introduce a new 
TaskDeploymentDescriptorConfiguration to merge all these fields to make the 
EG/TaskDeploymentDescriptor more readable.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-316: Introduce SQL Driver

2023-05-30 Thread Weihua Hu
Thanks Paul for the proposal.

+1 for this. It is valuable in improving ease of use.

I have a few questions.
- Is SQLRunner the better name? We use this to run a SQL Job. (Not strong,
the SQLDriver is fine for me)
- Could we run SQL jobs using SQL in strings? Otherwise, we need to prepare
a SQL file in an image for Kubernetes application mode, which may be a bit
cumbersome.
- I noticed that we don't specify the SQLDriver jar in the "run-application"
command. Does that mean we need to perform automatic detection in Flink?


Best,
Weihua


On Mon, May 29, 2023 at 7:24 PM Paul Lam  wrote:

> Hi team,
>
> I’d like to start a discussion about FLIP-316 [1], which introduces a SQL
> driver as the
> default main class for Flink SQL jobs.
>
> Currently, Flink SQL could be executed out of the box either via SQL
> Client/Gateway
> or embedded in a Flink Java/Python program.
>
> However, each one has its drawback:
>
> - SQL Client/Gateway doesn’t support the application deployment mode [2]
> - Flink Java/Python program requires extra work to write a non-SQL program
>
> Therefore, I propose adding a SQL driver to act as the default main class
> for SQL jobs.
> Please see the FLIP docs for details and feel free to comment. Thanks!
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-316%3A+Introduce+SQL+Driver
> <
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-316:+Introduce+SQL+Driver
> >
> [2] https://issues.apache.org/jira/browse/FLINK-26541 <
> https://issues.apache.org/jira/browse/FLINK-26541>
>
> Best,
> Paul Lam


[jira] [Created] (FLINK-32205) Flink Rest Client should support connecting to the server using URLs.

2023-05-26 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-32205:
-

 Summary: Flink Rest Client should support connecting to the server 
using URLs.
 Key: FLINK-32205
 URL: https://issues.apache.org/jira/browse/FLINK-32205
 Project: Flink
  Issue Type: Improvement
  Components: Command Line Client
Affects Versions: 1.17.0
Reporter: Weihua Hu


Currently, Flink Client can only connect to the server via the address:port, 
which is configured by the rest.address and rest.port.

But in some other scenarios. Flink Server is run behind a proxy. Such as 
running on Kubernetes and exposing services through ingress. The URL to access 
the Flink server can be: http://{proxy address}/{some prefix path to identify 
flink clusters}/{flink request path} 

In [FLINK-32030|https://issues.apache.org/jira/browse/FLINK-32030], the SQL 
Client gateway accepts URLs by using the '--endpoint'.

IMO, we should introduce an option, such as "rest.endpoint", to make the Flink 
client work with URLs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32201) Enable the distribution of shuffle descriptors via the blob server by connection number

2023-05-25 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-32201:
-

 Summary: Enable the distribution of shuffle descriptors via the 
blob server by connection number
 Key: FLINK-32201
 URL: https://issues.apache.org/jira/browse/FLINK-32201
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Weihua Hu



Flink support distributes shuffle descriptors via the blob server to reduce 
JobManager overhead. But the default threshold to enable it is 1MB, which never 
reaches. Users need to set a proper value for this, but it requires advanced 
knowledge before configuring it.

I would like to enable this feature by the number of connections of a group of 
shuffle descriptors. For examples, a simple streaming job with two operators, 
each with 10,000 parallelism and connected via all-to-all distribution. In this 
job, we only get one set of shuffle descriptors, and this group has 1 * 
1 connections. This means that JobManager needs to send this set of shuffle 
descriptors to 1 tasks.

Since it is also difficult for users to configure, I would like to give it a 
default value. The serialized shuffle descriptors sizes for different 
parallelism are shown below.


|| Producer parallelism || serialized shuffle descriptor size || consumer 
parallelism || total data size that JM needs to send ||
| 5000 | 100KB | 5000 | 500MB |
| 1 | 200KB | 1 | 2GB |
| 2 | 400Kb | 2 | 8GB |

So, I would like to set the default value to 10,000 * 10,000. 

Any suggestions or concerns are appreciated.





--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: How to pass the TLS certs to the latest version of flink-connector-pulsar

2023-05-16 Thread Weihua Hu
Hi,

Did you try set 'serviceurl' starts with "pulsar+ssl://"? [1]

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/connectors/datastream/pulsar/#pulsar-client-serviceurl

Best,
Weihua


On Mon, May 15, 2023 at 7:22 PM Bauddhik Anand  wrote:

> Can someone please help with this?
>
> On Mon, 15 May, 2023, 09:34 Bauddhik Anand,  wrote:
>
> > I am trying to connect my Flink application to a Pulsar topic for
> > ingesting data. The topic is active and i am able to ingest the data via
> a
> > normal Java application.
> >
> > When i try to use the Flink application to ingest the data from the same
> > topic, using the latest version of flink-connector-pulsar i.e
> 4.0.0-1.17, i
> > do not find in the documenation anywhere how to pass to pass the TLS
> certs.
> >
> > I tried with below code:
> >
> >
> > final StreamExecutionEnvironment envn =
> StreamExecutionEnvironment.getExecutionEnvironment();
> >
> > Configuration config = new Configuration();
> >
> > config.setString("pulsar.client.authentication","tls");
> >
>  config.setString("pulsar.client.tlsCertificateFilePath",tlsCert);
> > config.setString("pulsar.client.tlsKeyFilePath",tlsKey);
> >
>  config.setString("pulsar.client.tlsTrustCertsFilePath",tlsTrustCert);
> >
> >  PulsarSource pulsarSource = PulsarSource.builder()
> > .setServiceUrl("serviceurl")
> > .setAdminUrl("adminurl")
> > .setStartCursor(StartCursor.earliest())
> > .setTopics("topicname")
> > .setDeserializationSchema(new SimpleStringSchema())
> > .setSubscriptionName("test-sub")
> > .setConfig(config)
> > .build();
> >
> >
> > pulsarStream.map(new MapFunction() {
> > private static final long serialVersionUID =
> -999736771747691234L;
> >
> > public String map(String value) throws Exception {
> >   return "Receiving from Pulsar : " + value;
> > }
> >   }).print();
> >
> >
> > envn.execute();
> >
> >
> > As per documentation i did not find any inbuilt method in the
> PulsarSource
> > class to pass the TLS certs, i tried using the PulsarClient options as
> > config and pass it to PulsarSource as option.
> >
> > This doesn't seem to work, as when i try to deploy the app, the Flink job
> > is submitted and JobManager throws the below error.
> >
> > Caused by: sun.security.validator.ValidatorException: PKIX path building
> failed: sun.security.provider.certpath.SunCertPathBuilderException: unable
> to find valid certification path to requested target
> > at sun.security.validator.PKIXValidator.doBuild(Unknown Source)
> ~[?:?]
> > at sun.security.validator.PKIXValidator.engineValidate(Unknown
> Source) ~[?:?]
> > at sun.security.validator.Validator.validate(Unknown Source) ~[?:?]
> > at sun.security.ssl.X509TrustManagerImpl.validate(Unknown Source)
> ~[?:?]
> >
> >
> > Caused by: sun.security.provider.certpath.SunCertPathBuilderException:
> unable to find valid certification path to requested target
> > at sun.security.provider.certpath.SunCertPathBuilder.build(Unknown
> Source) ~[?:?]
> > at
> sun.security.provider.certpath.SunCertPathBuilder.engineBuild(Unknown
> Source) ~[?:?]
> > at java.security.cert.CertPathBuilder.build(Unknown Source) ~[?:?]
> > at sun.security.validator.PKIXValidator.doBuild(Unknown Source)
> ~[?:?]
> >
> > I have already verified the certs path and it is correct, also i am using
> > the same path as a volume mount for my other apps and they work fine.
> >
> > My question is :
> >
> > How i can pass the certs to the latest version of the
> > *flink-connector-pulsar* i.e *4.0.0-1.17*
> >
>


[jira] [Created] (FLINK-32045) optimize task deployment performance for large-scale jobs

2023-05-10 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-32045:
-

 Summary: optimize task deployment performance for large-scale jobs
 Key: FLINK-32045
 URL: https://issues.apache.org/jira/browse/FLINK-32045
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Reporter: Weihua Hu


h1. Background

In FLINK-21110, we cache shuffle descriptors on the job manager side and 
support using blob servers to offload these descriptors in order to reduce the 
cost of tasks deployment.

I think there is also some improvement we could do for large-scale jobs.
 # The default min size to enable distribution via blob server is 1MB. But for 
a large wordcount job with 2 parallelism, the size of serialized shuffle 
descriptors is only 300KB. It means users need to lower the 
"blob.offload.minsize", but the value is hard for users to decide.
 # The task executor side still needs to load blob files and deserialize 
shuffle descriptors for each task. Since these operations are running in the 
main thread, it may be pending other RPCs from the job manager.

h1. Propose
 # Enable distribute shuffle descriptors via blob server automatically. This 
could be decided by the edge number of the current shuffle descriptor. The blob 
offload will be enabled when the edge number exceeds an internal threshold.
 # Introduce cache of deserialized shuffle descriptors on the task executor 
side. This could reduce the cost of reading from local blob files and 
deserialization. Of course, the cache should have TTL to avoid occupying too 
much memory. And the cache should have the same switch mechanism as the blob 
server offload.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Preventing Mockito usage for the new code with Checkstyle

2023-04-25 Thread Weihua Hu
Thanks for driving this.

+1 for Mockito and Junit4.

A clarity checkstyle will be of great help to new developers.

Best,
Weihua


On Wed, Apr 26, 2023 at 1:47 PM Jing Ge  wrote:

> This is a great idea, thanks for bringing this up. +1
>
> Also +1 for Junit4. If I am not mistaken, it could only be done after the
> Junit5 migration is done.
>
> @Chesnay thanks for the hint. Do we have any doc about it? If not, it might
> deserve one. WDYT?
>
> Best regards,
> Jing
>
> On Wed, Apr 26, 2023 at 5:13 AM Lijie Wang 
> wrote:
>
> > Thanks for driving this. +1 for the proposal.
> >
> > Can we also prevent Junit4 usage in new code by this way?Because
> currently
> > we are aiming to migrate our codebase to JUnit 5.
> >
> > Best,
> > Lijie
> >
> > Piotr Nowojski  于2023年4月25日周二 23:02写道:
> >
> > > Ok, thanks for the clarification.
> > >
> > > Piotrek
> > >
> > > wt., 25 kwi 2023 o 16:38 Chesnay Schepler 
> > napisał(a):
> > >
> > > > The checkstyle rule would just ban certain imports.
> > > > We'd add exclusions for all existing usages as we did when
> introducing
> > > > other rules.
> > > > So far we usually disabled checkstyle rules for a specific files.
> > > >
> > > > On 25/04/2023 16:34, Piotr Nowojski wrote:
> > > > > +1 to the idea.
> > > > >
> > > > > How would this checkstyle rule work? Are you suggesting to start
> > with a
> > > > > number of exclusions? On what level will those exclusions be? Per
> > file?
> > > > Per
> > > > > line?
> > > > >
> > > > > Best,
> > > > > Piotrek
> > > > >
> > > > > wt., 25 kwi 2023 o 13:18 David Morávek 
> napisał(a):
> > > > >
> > > > >> Hi Everyone,
> > > > >>
> > > > >> A long time ago, the community decided not to use Mockito-based
> > tests
> > > > >> because those are hard to maintain. This is already baked in our
> > Code
> > > > Style
> > > > >> and Quality Guide [1].
> > > > >>
> > > > >> Because we still have Mockito imported into the code base, it's
> very
> > > > easy
> > > > >> for newcomers to unconsciously introduce new tests violating the
> > code
> > > > style
> > > > >> because they're unaware of the decision.
> > > > >>
> > > > >> I propose to prevent Mockito usage with a Checkstyle rule for a
> new
> > > > code,
> > > > >> which would eventually allow us to eliminate it. This could also
> > > prevent
> > > > >> some wasted work and unnecessary feedback cycles during reviews.
> > > > >>
> > > > >> WDYT?
> > > > >>
> > > > >> [1]
> > > > >>
> > > > >>
> > > >
> > >
> >
> https://flink.apache.org/how-to-contribute/code-style-and-quality-common/#avoid-mockito---use-reusable-test-implementations
> > > > >>
> > > > >> Best,
> > > > >> D.
> > > > >>
> > > >
> > > >
> > >
> >
>


Re: [DISCUSS] FLINK-31873: Add setMaxParallelism to the DataStreamSink Class

2023-04-23 Thread Weihua Hu
Hi, Eric

Thanks for bringing this discussion.
I think it's reasonable to add ''setMaxParallelism" for DataStreamSink.

+1

Best,
Weihua


On Sat, Apr 22, 2023 at 3:20 AM eric xiao  wrote:

> Hi there devs,
>
> I would like to start a discussion thread for FLINK-31873[1].
>
> We are in the processing of enabling Flink reactive mode as the default
> scheduling mode. While reading configuration docs [2] (I believe it was
> also mentioned during one of the training sessions during Flink Forward
> 2022), one can/should replace all setParallelism calls with
> setMaxParallelism when migrating to reactive mode.
>
> This currently isn't possible on a sink in a Flink pipeline as we do not
> expose a setMaxParallelism on the DataStreamSink class [3]. The underlying
> Transformation class does have both a setMaxParallelism and setParallelism
> function defined [4], but only setParallelism is offered in the
> DataStreamSink class.
>
> I believe adding setMaxParallelism would be beneficial for not just flink
> reactive mode, both modes of running of a flink pipeline (non reactive
> mode, flink auto scaling).
>
> Best,
>
> Eric Xiao
>
> [1] https://issues.apache.org/jira/browse/FLINK-31873
> [2]
>
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/elastic_scaling/#configuration
> [3]
>
> https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/DataStreamSink.java
> [4]
>
> https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/dag/Transformation.java#L248-L285
>


Re: [ANNOUNCE] New Apache Flink PMC Member - Qingsheng Ren

2023-04-23 Thread Weihua Hu
Congratulations, Qingsheng!

Best,
Weihua


On Sun, Apr 23, 2023 at 3:53 PM Yun Tang  wrote:

> Congratulations, Qingsheng!
>
> Best
> Yun Tang
> 
> From: weijie guo 
> Sent: Sunday, April 23, 2023 14:50
> To: dev@flink.apache.org 
> Subject: Re: [ANNOUNCE] New Apache Flink PMC Member - Qingsheng Ren
>
> Congratulations, Qingsheng!
>
> Best regards,
>
> Weijie
>
>
> Geng Biao  于2023年4月23日周日 14:29写道:
>
> > Congrats, Qingsheng!
> > Best,
> > Biao Geng
> >
> > 获取 Outlook for iOS
> > 
> > 发件人: Wencong Liu 
> > 发送时间: Sunday, April 23, 2023 11:06:39 AM
> > 收件人: dev@flink.apache.org 
> > 主题: Re:[ANNOUNCE] New Apache Flink PMC Member - Qingsheng Ren
> >
> > Congratulations, Qingsheng!
> >
> > Best,
> > Wencong LIu
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > At 2023-04-21 19:47:52, "Jark Wu"  wrote:
> > >Hi everyone,
> > >
> > >We are thrilled to announce that Leonard Xu has joined the Flink PMC!
> > >
> > >Leonard has been an active member of the Apache Flink community for many
> > >years and became a committer in Nov 2021. He has been involved in
> various
> > >areas of the project, from code contributions to community building. His
> > >contributions are mainly focused on Flink SQL and connectors, especially
> > >leading the flink-cdc-connectors project to receive 3.8+K GitHub stars.
> He
> > >authored 150+ PRs, and reviewed 250+ PRs, and drove several FLIPs (e.g.,
> > >FLIP-132, FLIP-162). He has participated in plenty of discussions in the
> > >dev mailing list, answering questions about 500+ threads in the
> > >user/user-zh mailing list. Besides that, he is community minded, such as
> > >being the release manager of 1.17, verifying releases, managing release
> > >syncs, etc.
> > >
> > >Congratulations and welcome Leonard!
> > >
> > >Best,
> > >Jark (on behalf of the Flink PMC)
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > At 2023-04-21 19:50:02, "Jark Wu"  wrote:
> > >Hi everyone,
> > >
> > >We are thrilled to announce that Qingsheng Ren has joined the Flink PMC!
> > >
> > >Qingsheng has been contributing to Apache Flink for a long time. He is
> the
> > >core contributor and maintainer of the Kafka connector and
> > >flink-cdc-connectors, bringing users stability and ease of use in both
> > >projects. He drove discussions and implementations in FLIP-221,
> FLIP-288,
> > >and the connector testing framework. He is continuously helping with the
> > >expansion of the Flink community and has given several talks about Flink
> > >connectors at many conferences, such as Flink Forward Global and Flink
> > >Forward Asia. Besides that, he is willing to help a lot in the community
> > >work, such as being the release manager for both 1.17 and 1.18,
> verifying
> > >releases, and answering questions on the mailing list.
> > >
> > >Congratulations and welcome Qingsheng!
> > >
> > >Best,
> > >Jark (on behalf of the Flink PMC)
> >
>


Re: [ANNOUNCE] New Apache Flink PMC Member - Leonard Xu

2023-04-23 Thread Weihua Hu
Congratulations, Leonard!

Best,
Weihua


On Sun, Apr 23, 2023 at 3:53 PM Yun Tang  wrote:

> Congratulations, Leonard!
>
> Best,
> Yun Tang
> 
> From: weijie guo 
> Sent: Sunday, April 23, 2023 14:50
> To: dev@flink.apache.org 
> Subject: Re: [ANNOUNCE] New Apache Flink PMC Member - Leonard Xu
>
> Congratulations, Leonard!
>
> Best regards,
>
> Weijie
>
>
> Geng Biao  于2023年4月23日周日 14:30写道:
>
> > Congrats, Lenorad!
> > Best,
> > Biao Geng
> >
> > 获取 Outlook for iOS
> > 
> > 发件人: Wencong Liu 
> > 发送时间: Sunday, April 23, 2023 11:05:41 AM
> > 收件人: dev@flink.apache.org 
> > 主题: Re:[ANNOUNCE] New Apache Flink PMC Member - Leonard Xu
> >
> > Congratulations, Leonard!
> >
> > Best,
> > Wencong LIu
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > At 2023-04-21 19:47:52, "Jark Wu"  wrote:
> > >Hi everyone,
> > >
> > >We are thrilled to announce that Leonard Xu has joined the Flink PMC!
> > >
> > >Leonard has been an active member of the Apache Flink community for many
> > >years and became a committer in Nov 2021. He has been involved in
> various
> > >areas of the project, from code contributions to community building. His
> > >contributions are mainly focused on Flink SQL and connectors, especially
> > >leading the flink-cdc-connectors project to receive 3.8+K GitHub stars.
> He
> > >authored 150+ PRs, and reviewed 250+ PRs, and drove several FLIPs (e.g.,
> > >FLIP-132, FLIP-162). He has participated in plenty of discussions in the
> > >dev mailing list, answering questions about 500+ threads in the
> > >user/user-zh mailing list. Besides that, he is community minded, such as
> > >being the release manager of 1.17, verifying releases, managing release
> > >syncs, etc.
> > >
> > >Congratulations and welcome Leonard!
> > >
> > >Best,
> > >Jark (on behalf of the Flink PMC)
> >
>


Re: [VOTE] FLIP-304: Pluggable Failure Enrichers

2023-04-21 Thread Weihua Hu
+1 (non-binding)

Best,
Weihua


On Fri, Apr 21, 2023 at 4:00 PM weijie guo 
wrote:

> +1 (binding)
>
> Best regards,
>
> Weijie
>
>
> Zhu Zhu  于2023年4月21日周五 11:03写道:
>
> > +1 (binding)
> >
> > Thanks,
> > Zhu
> >
> > Anton Kalashnikov  于2023年4月20日周四 20:03写道:
> > >
> > > +1 (binding)
> > >
> > >
> > > Thanks for this FLIP Panos, LGTM.
> > >
> > > --
> > > Best regards,
> > > Anton Kalashnikov
> > >
> > > On 20.04.23 13:44, Roman Khachatryan wrote:
> > > > +1 (binding)
> > > >
> > > > The FLIP LGTM, thanks Panos!
> > > >
> > > > Regards,
> > > > Roman
> > > >
> > > >
> > > > On Thu, Apr 20, 2023 at 1:33 PM Hong Teoh 
> wrote:
> > > >
> > > >> +1 (non-binding)
> > > >>
> > > >> Thank you for driving this effort, Panagiotis.
> > > >>
> > > >> Regards,
> > > >> Hong
> > > >>
> > > >>
> > > >>> On 20 Apr 2023, at 12:16, David Morávek  wrote:
> > > >>>
> > > >>> Thanks for the update!
> > > >>>
> > > >>> +1 (binding)
> > > >>>
> > > >>> Best,
> > > >>> D.
> > > >>>
> > > >>> On Thu, Apr 20, 2023 at 9:50 AM Piotr Nowojski <
> pnowoj...@apache.org
> > >
> > > >> wrote:
> > >  Hi,
> > > 
> > >  I see that the FLIP has been updated, thanks Panos!
> > > 
> > >  +1 (binding)
> > > 
> > >  Best,
> > >  Piotrek
> > > 
> > >  śr., 19 kwi 2023 o 13:49 Piotr Nowojski  >
> > >  napisał(a):
> > > 
> > > > +1 to what David wrote. I think we need to update the FLIP and
> > extend
> > > >> the
> > > > voting?
> > > >
> > > > Piotrek
> > > >
> > > > śr., 19 kwi 2023 o 09:06 David Morávek 
> > napisał(a):
> > > >
> > > >> Hi Panos,
> > > >>
> > > >> It seems that most recent discussions (e.g. changing the
> > semantics of
> > >  the
> > > >> config option) are not reflected in the FLIP. Can you please
> > >  double-check
> > > >> that this is the correct version?
> > > >>
> > > >> Best,
> > > >> D.
> > > >>
> > > >>
> > > >> On Mon, Apr 17, 2023 at 9:24 AM Panagiotis Garefalakis <
> > >  pga...@apache.org
> > > >> wrote:
> > > >>
> > > >>> Hello everyone,
> > > >>>
> > > >>> I want to start the vote for FLIP-304: Pluggable Failure
> > Enrichers
> > > >> [1]
> > > >> --
> > > >>> discussed as part of [2].
> > > >>>
> > > >>> FLIP-304 introduces a pluggable interface allowing users to add
> > > >> custom
> > > >>> logic and enrich failures with custom metadata labels.
> > > >>>
> > > >>> The vote will last for at least 72 hours (Thursday, 20th of
> April
> > >  2023,
> > > >>> 12:30 PST) unless there is an objection or insufficient votes.
> > > >>>
> > > >>> [1]
> > > >>>
> > > >>>
> > > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-304%3A+Pluggable+Failure+Enrichers
> > > >>> [2]
> > https://lists.apache.org/thread/zs9n9p8d7tyvnq4yyxhc8zvq1k2c1hvs
> > > >>>
> > > >>>
> > > >>> Cheers,
> > > >>> Panagiotis
> > > >>>
> > > >>
> >
>


[jira] [Created] (FLINK-31843) Select slots from SlotPool#freeSlots in bulk

2023-04-18 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-31843:
-

 Summary: Select slots from SlotPool#freeSlots in bulk
 Key: FLINK-31843
 URL: https://issues.apache.org/jira/browse/FLINK-31843
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Weihua Hu


we should also reduce the number of calls of "getFreeSlotInformations". In 
current implementation, the scheduler will batch request slots for tasks in the 
same pipeline region(ExecutionSlotAllocator#allocateSlotsFor), but the slot 
allocator will process these requests one by one, and call 
"getFreeSlotInformations" once for each request.

We can optimize it to call "getFreeSlotInformations" once for a bulk (of slot 
requests), instead of once for each slot request. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31842) calculate task executor's utilization only when used

2023-04-18 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-31842:
-

 Summary: calculate task executor's utilization only when used
 Key: FLINK-31842
 URL: https://issues.apache.org/jira/browse/FLINK-31842
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Weihua Hu


Currently DefaultAllocatedSlotPool#getFreeSlotsInformation always calculates 
the taskExecutorUtilization. This causes task schedules to be too slow when 
there are lots of slots, such as 2 slots total. But only the 
EvenlySpreadOutLocationPreferenceSlotSelectionStrategy uses this utilization.

So I would like to move the calculation of taskExecutorUtilization to usage. 
DefaultAllocatedSlotPool provides a function: getTaskExecutorUtilization, and 
is only used in EvenlySpreadOutLocationPreferenceSlotSelectionStrategy.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31832) Add benchmarks for end to end  restarting tasks

2023-04-17 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-31832:
-

 Summary: Add benchmarks for end to end  restarting tasks
 Key: FLINK-31832
 URL: https://issues.apache.org/jira/browse/FLINK-31832
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Reporter: Weihua Hu


As discussed in https://issues.apache.org/jira/browse/FLINK-31771. 

We need a benchmark for job failover and end to end restarting tasks



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31808) wrong examples of how to set operator name in documents

2023-04-14 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-31808:
-

 Summary: wrong examples of how to set operator name  in documents
 Key: FLINK-31808
 URL: https://issues.apache.org/jira/browse/FLINK-31808
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Reporter: Weihua Hu


[https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/overview/#name-and-description]

 
{code:java}
.setName("filter"){code}
 should be
{code:java}
.name("filter"){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31771) Improve select available slot from SlotPool

2023-04-11 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-31771:
-

 Summary: Improve select available slot from SlotPool
 Key: FLINK-31771
 URL: https://issues.apache.org/jira/browse/FLINK-31771
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Reporter: Weihua Hu


DefaultScheduler will request slots from SlotPool for tasks one by one.
For each task, the PhysicalSlotProviderImpl#tryAllocateFromAvailable will 
retrieve all available slots from 
DefaultAllocatedSlotPool#getFreeSlotsInformation, and then select the best slot 
by SlotSelectionStrategy.

Currently DefaultAllocatedSlotPool#getFreeSlotsInformation always calculates 
the taskExecutorUtilization.  This causes task schedules to be too slow when 
there are lots of slots, such as 2 slots total. But only the 
EvenlySpreadOutLocationPreferenceSlotSelectionStrategy uses this utilization.

So I would like to move the calculation of taskExecutorUtilization to usage. 
DefaultAllocatedSlotPool provides a function: getTaskExecutorUtilization, and 
is only used in EvenlySpreadOutLocationPreferenceSlotSelectionStrategy.

This change could reduce the latency of allocated 2 slots from 72s to 12s 
in my local IDE.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31752) SourceOperatorStreamTask increments numRecordsOut twice

2023-04-07 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-31752:
-

 Summary: SourceOperatorStreamTask increments numRecordsOut twice
 Key: FLINK-31752
 URL: https://issues.apache.org/jira/browse/FLINK-31752
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Metrics
Affects Versions: 1.17.0
Reporter: Weihua Hu
 Attachments: image-2023-04-07-15-51-44-304.png

The counter of numRecordsOut was introduce to ChainingOutput to reduce the 
function call stack depth in 
https://issues.apache.org/jira/browse/FLINK-30536

But SourceOperatorStreamTask.AsyncDataOutputToOutput increments the counter of 
numRecordsOut too. This results in the source operator's numRecordsOut are 
doubled.

We should delete the numRecordsOut.inc in 
SourceOperatorStreamTask.AsyncDataOutputToOutput

 !image-2023-04-07-15-51-44-304.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [jira] [Created] (FLINK-31690) The current key is not set for KeyedCoProcessOperator

2023-04-02 Thread Weihua Hu
Hi,

you need send email to dev-unsubscr...@flink.apache.org with any contents
or subject

[1]
https://flink.apache.org/community.html#how-to-subscribe-to-a-mailing-list

Best,
Weihua


On Mon, Apr 3, 2023 at 10:35 AM 越张  wrote:

> 退订
>
> Dian Fu (Jira)  于2023年4月3日周一 10:33写道:
>
> > Dian Fu created FLINK-31690:
> > ---
> >
> >  Summary: The current key is not set for
> KeyedCoProcessOperator
> >  Key: FLINK-31690
> >  URL: https://issues.apache.org/jira/browse/FLINK-31690
> >  Project: Flink
> >   Issue Type: Bug
> >   Components: API / Python
> > Reporter: Dian Fu
> > Assignee: Dian Fu
> >
> >
> > See
> https://apache-flink.slack.com/archives/C03G7LJTS2G/p1680294701254239
> > for more details.
> >
> >
> >
> > --
> > This message was sent by Atlassian Jira
> > (v8.20.10#820010)
> >
>


[jira] [Created] (FLINK-31679) [Flink][UI] Show data exchange type on web ui

2023-03-31 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-31679:
-

 Summary: [Flink][UI] Show data exchange type on web ui
 Key: FLINK-31679
 URL: https://issues.apache.org/jira/browse/FLINK-31679
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Web Frontend
Affects Versions: 1.18.0
Reporter: Weihua Hu


Flink supports multiple data exchange types (ResultPartitionType), which 
subject to multiple parameters, such as whether job is streaming or batch.

I think display the data exchange types on the UI would be helpful.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Subtask distribution in Flink

2023-03-29 Thread Weihua Hu
For metrics about failover, you can refer to [1]

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#availability

Best,
Weihua


On Tue, Mar 28, 2023 at 1:34 AM santhosh venkat <
santhoshvenkat1...@gmail.com> wrote:

> Hi,
>
> Thank you so much for taking time to answer my questions and pointing me to
> relevant documentation. Really appreciate it.
>
> When the task failover happens, are there internal metrics in Flink at a
> job level to track the new execution attempt?  Is there a way for the
> application owner to figure out how many task failovers have happened in a
> job execution and get the current execution attempt.
>
> Thanks.
>
> On Mon, Mar 27, 2023 at 2:55 AM Weihua Hu  wrote:
>
> > Hi,
> >
> > 1. Does this mean that  each  task slot will contain an entire pipeline
> in
> > > the job?
> >
> > not exactly, each slot will run a subtask of each task. If the job is so
> > simple that
> > there is no keyby logic and we do not enable rebalance shuffle type, each
> > slot
> > could run all the pipeline. But if not we need to shuffle data to other
> > subtasks.
> > You can get some examples from [1].
> >
> > 2. Upon a TM pod failure and after K8s brings back the TM pod, would
> flink
> > > assign the same subtasks back to restarted TM  again? Or will they be
> > > distributed to different TaskManagers?
> >
> > If there is no shuffle data in your job (described in 1), only tasks on
> > failure pods
> >  will be restarted, and they will be assigned to the new TM again.
> > But if not, all the related tasks will be restarted. When these tasks
> > re-scheduled,
> > there are some strategy to assign slots. They will try to assign the task
> > to previous
> > slot to reduce the recovery time, But there is no guarantee.
> > You can read [2] to get more information about failure recovery.
> >
> >
> > [1]
> >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/concepts/flink-architecture/#tasks-and-operator-chains
> > [2]
> >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/task_failure_recovery/
> >
> > Best,
> > Weihua
> >
> >
> > On Mon, Mar 27, 2023 at 3:22 PM santhosh venkat <
> > santhoshvenkat1...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I am trying to understand how subtask distribution works in Flink.
> Let's
> > > assume a setup of a Flink cluster with a fixed number of TaskManagers
> in
> > a
> > > kubernetes cluster.
> > >
> > > Let's say I have a flink job with all the operators having the same
> > > parallelism and with the same Slot sharing group. The operator
> > parallelism
> > > is computed as the number of task managers multiplied by number of task
> > > slots per TM.
> > >
> > > 1. Does this mean that  each  task slot will contain an entire pipeline
> > in
> > > the job?
> > > 2. Upon a TM pod failure and after K8s brings back the TM pod, would
> > flink
> > > assign the same subtasks back to restarted TM  again? Or will they be
> > > distributed to different TaskManagers?
> > >
> > > It would be great if someone can answer this question.
> > >
> > > Thanks.
> > >
> >
>


Re: Subtask distribution in Flink

2023-03-27 Thread Weihua Hu
Hi,

1. Does this mean that  each  task slot will contain an entire pipeline in
> the job?

not exactly, each slot will run a subtask of each task. If the job is so
simple that
there is no keyby logic and we do not enable rebalance shuffle type, each
slot
could run all the pipeline. But if not we need to shuffle data to other
subtasks.
You can get some examples from [1].

2. Upon a TM pod failure and after K8s brings back the TM pod, would flink
> assign the same subtasks back to restarted TM  again? Or will they be
> distributed to different TaskManagers?

If there is no shuffle data in your job (described in 1), only tasks on
failure pods
 will be restarted, and they will be assigned to the new TM again.
But if not, all the related tasks will be restarted. When these tasks
re-scheduled,
there are some strategy to assign slots. They will try to assign the task
to previous
slot to reduce the recovery time, But there is no guarantee.
You can read [2] to get more information about failure recovery.


[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/concepts/flink-architecture/#tasks-and-operator-chains
[2]
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/task_failure_recovery/

Best,
Weihua


On Mon, Mar 27, 2023 at 3:22 PM santhosh venkat <
santhoshvenkat1...@gmail.com> wrote:

> Hi,
>
> I am trying to understand how subtask distribution works in Flink. Let's
> assume a setup of a Flink cluster with a fixed number of TaskManagers in a
> kubernetes cluster.
>
> Let's say I have a flink job with all the operators having the same
> parallelism and with the same Slot sharing group. The operator parallelism
> is computed as the number of task managers multiplied by number of task
> slots per TM.
>
> 1. Does this mean that  each  task slot will contain an entire pipeline in
> the job?
> 2. Upon a TM pod failure and after K8s brings back the TM pod, would flink
> assign the same subtasks back to restarted TM  again? Or will they be
> distributed to different TaskManagers?
>
> It would be great if someone can answer this question.
>
> Thanks.
>


[jira] [Created] (FLINK-31537) Derive a new TaskManagerTrackerConfiguration from the SlotManagerConfiguration.

2023-03-21 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-31537:
-

 Summary: Derive a new TaskManagerTrackerConfiguration from the 
SlotManagerConfiguration.
 Key: FLINK-31537
 URL: https://issues.apache.org/jira/browse/FLINK-31537
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Reporter: Weihua Hu


As discussion in 
https://github.com/apache/flink/pull/22196#discussion_r1141847905
We need derive a new TaskManagerTrackerConfiguration from the 
SlotManagerConfiguration after DeclarativeSlotManager removed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-304: Pluggable failure handling for Apache Flink

2023-03-20 Thread Weihua Hu
Hi Panagiotis,

Thanks for your proposal. It is valuable to analyze the reason for
failure with the user plug-in.

Making the context immutable could make the contract stronger.
Letting the listener return an enriching result may be a better way.

IIUC, listeners could do two things, enrich more information (tags/labels)
to FailureHandlingResult, and push data out of Flink (metrics or something).
 IMO, we could split these two types into Listener and Advisor (maybe
other names). The Listener just pushes the data out and returns nothing to
 Flink, so we can run these async and don't have to wait for Listener's
result.
 The Advisor returns rich information to the FailureHadingResult, and it
should
 have a lighter logic.


Supporting a custom restart strategy is also valuable. In this design, we
use
RestartStrategy to construct a FailureHandingResult, and then pass it to
Listener.
My question is, should we change the restart strategy interface to support
the
custom restart strategy, or keep the current restart strategy and let the
later
Listener enrich the restartable information to FailureHandingResult? The
latter
may cause some confusion when we use a custom restart strategy.
The default flink restart strategy also runs but does not take effect.


Best,
Weihua


On Mon, Mar 20, 2023 at 11:42 PM Lijie Wang 
wrote:

> Hi Panagiotis,
>
> Thanks for driving this.
>
> +1 for supporting custom restart strategy, we did receive such requests
> from the user mailing list [1][2].
>
> Besides, in current design, the plugin will only do some statistical and
> classification work, and will not affect the *FailureHandlingResult*. Just
> listening, no handling, it doesn't quite match the title.
>
> [1] https://lists.apache.org/thread/ch3s4jhh09wnff3tscqnb6btp2zlp2r1
> [2] https://lists.apache.org/thread/lwjfdr7c1ypo77r4rwojdk7kxx2sw4sx
>
> Best,
> Lijie
>
> Zhu Zhu  于2023年3月20日周一 21:39写道:
>
> > Hi Panagiotis,
> >
> > Thanks for creating this proposal! It's good to enable Flink to handle
> > different errors in different ways, through a pluggable way.
> >
> > There are requests for flexible restart strategies from time to time, for
> > different strategies of restart backoff time, or to suppress restarting
> > on certain errors. Therefore, I think it's better that the proposed
> > failure handling plugin can also support custom restart strategies.
> >
> > Maybe we can call it FailureHandlingAdvisor which provides more
> > information (labels) and gives advice (restart backoff time, whether
> > to restart)? I do not have a strong opinion though, any explanatory
> > name would be good.
> >
> > To avoid unexpected mutation, how about to make the context immutable
> > and let the plugin return an immutable result? i.e. remove the setters
> > from the context, and let the plugin method return a result which
> > contains `labels`, `canRestart` and `restartBackoffTime`. Flink should
> > apply the result to the context before invoking the next plugin, so
> > that the next plugin will see the updated context.
> >
> > The plugin should avoid taking too much time to return the result,
> because
> > it will block the RPC and result in instability. However, it can still
> > perform heavy actions in a different thread. The context can provide an
> > `ioExecutor` to the plugins for reuse.
> >
> > Thanks,
> > Zhu
> >
> > Shammon FY  于2023年3月20日周一 20:21写道:
> > >
> > > Hi Panagiotis
> > >
> > > Thank you for your answer. I agree that `FailureListener` could be
> > > stateless, then I have some thoughts as follows
> > >
> > > 1. I see that listeners and tag collections are associated. When
> > JobManager
> > > fails and restarts, how can the new listener be associated with the tag
> > > collection before failover? Is the listener loading order?
> > >
> > > 2. The tag collection may be too large, resulting in the JobManager
> OOM,
> > do
> > > we need to provide a management class that supports some obsolescence
> > > strategies instead of a direct Collection?
> > >
> > > 3. Is it possible to provide a more complex data structure than a
> simple
> > > string collection for tags in listeners, such as key-value?
> > >
> > > Best,
> > > Shammon FY
> > >
> > >
> > > On Mon, Mar 20, 2023 at 7:48 PM Leonard Xu  wrote:
> > >
> > > > Hi,Panagiotis
> > > >
> > > >
> > > > Thank you for kicking off this discussion. Overall, the proposed
> > feature of
> > > > this FLIP makes sense to me. We have also discussed similar
> > requirements
> > > > with our users and developers, and I believe it will help many users.
> > > >
> > > >
> > > > In terms of FLIP content, I have some thoughts:
> > > >
> > > > (1) For the FailureListenerContextget interface, the methods
> > > > FailureListenerContext#addTag and FailureListenerContextgetTags looks
> > very
> > > > inconsistent because they imply specific implementation details, and
> > not
> > > > all FailureListeners need to handle them, we shouldn't put them in
> the
> > > > interface. Minor: The comment "UDF load

[jira] [Created] (FLINK-31529) Let yarn client exit early before JobManager running

2023-03-20 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-31529:
-

 Summary: Let yarn client exit early before JobManager running
 Key: FLINK-31529
 URL: https://issues.apache.org/jira/browse/FLINK-31529
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / YARN
Reporter: Weihua Hu


Currently the YarnClusterDescriptor always wait yarn application status to be 
RUNNING even if we use the detach mode. 

In batch mode, the queue resources is insufficient in most case. So the job 
manager may take a long time to wait resources. And client also keep waiting 
too. 





--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-301: Hybrid Shuffle supports Remote Storage

2023-03-19 Thread Weihua Hu
+1 (non-binding)

Best,
Weihua


On Mon, Mar 20, 2023 at 12:39 PM Wencong Liu  wrote:

> +1 (non-binding)
>
> Best regards,
>
> Wencong Liu
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> At 2023-03-20 12:05:47, "Yuxin Tan"  wrote:
> >Hi, everyone,
> >
> >Thanks for all your feedback for FLIP-301: Hybrid Shuffle
> >supports Remote Storage[1] on the discussion thread[2].
> >
> >I'd like to start a vote for it. The vote will be open for at
> >least 72 hours (03/23, 13:00 UTC+8) unless there is an
> >objection or not enough votes.
> >
> >[1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-301%3A+Hybrid+Shuffle+supports+Remote+Storage
> >[2] https://lists.apache.org/thread/nwrqd5jtqwks89tbxpcrgto6r2bhdhno
> >
> >Best,
> >Yuxin
>


[jira] [Created] (FLINK-31498) DeclartiveSlotManager always request redundant task manager when resource is not enough

2023-03-17 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-31498:
-

 Summary: DeclartiveSlotManager always request redundant task 
manager when resource is not enough
 Key: FLINK-31498
 URL: https://issues.apache.org/jira/browse/FLINK-31498
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Reporter: Weihua Hu
 Attachments: image-2023-03-17-18-05-43-088.png

Currently redundant task manager check in DeclarativeSlotManager only compare 
free slots with required redundant slots. 

when there are no enough resources in YARN/Kubernetes, this mechanism will 
always try to request new task manager. 

there are two way to address this.
1. maintain the state of redundant workers to avoid request twice
2. only try to request redundant workers when there is no pending worker

The first way will make the logic of redundant worker too complicated, I would 
like to choose the second way

Looking forward to any suggestion.

 !image-2023-03-17-18-05-43-088.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31449) Remove DeclarativeSlotManager related logic

2023-03-14 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-31449:
-

 Summary: Remove DeclarativeSlotManager related logic
 Key: FLINK-31449
 URL: https://issues.apache.org/jira/browse/FLINK-31449
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Weihua Hu


The DeclarativeSlotManager and related configs will be completely removed in 
the next release after the default SlotManager change to FineGrainedSlotManager.





--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31447) Aligning unit tests of FineGrainedSlotManager with DeclarativeSlotManager

2023-03-14 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-31447:
-

 Summary: Aligning unit tests of FineGrainedSlotManager with 
DeclarativeSlotManager
 Key: FLINK-31447
 URL: https://issues.apache.org/jira/browse/FLINK-31447
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Weihua Hu


There's the DeclarativeSlotManagerTest that covers some specific issues that 
should be ported to the fine grained slot manager.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31448) Use FineGrainedSlotManager as the default SlotManager

2023-03-14 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-31448:
-

 Summary: Use FineGrainedSlotManager as the default SlotManager
 Key: FLINK-31448
 URL: https://issues.apache.org/jira/browse/FLINK-31448
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Weihua Hu






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31445) Split resource allocate/release related logic from FineGrainedSlotManager to TaskManagerTracker

2023-03-14 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-31445:
-

 Summary: Split resource allocate/release related logic from 
FineGrainedSlotManager to TaskManagerTracker
 Key: FLINK-31445
 URL: https://issues.apache.org/jira/browse/FLINK-31445
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Weihua Hu


Currently the FineGrainedSlotManager is response to slots allocations and 
resources request/release. This makes the logical of FineGrainedSlotManager 
complicated, So we will move task manager related work from 
FineGrainedSlotManager to TaskManagerTracker, which already tracks task 
managers but not including request/release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31444) FineGrainedSlotManager reclaims slots when job is finished

2023-03-14 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-31444:
-

 Summary: FineGrainedSlotManager reclaims slots when job is finished
 Key: FLINK-31444
 URL: https://issues.apache.org/jira/browse/FLINK-31444
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Weihua Hu


implementation of 
[FLINK-21751|https://issues.apache.org/jira/browse/FLINK-21751] in 
FineGrainedSlotManager



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31443) FineGrainedSlotManager maintain some redundant task managers

2023-03-14 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-31443:
-

 Summary: FineGrainedSlotManager maintain some redundant task 
managers
 Key: FLINK-31443
 URL: https://issues.apache.org/jira/browse/FLINK-31443
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Weihua Hu


implementation of 
[FLINK-18625|https://issues.apache.org/jira/browse/FLINK-18625] in 
FineGrainedSlotManager.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31441) FineGrainedSlotManager support select slot evenly

2023-03-14 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-31441:
-

 Summary: FineGrainedSlotManager support select slot evenly
 Key: FLINK-31441
 URL: https://issues.apache.org/jira/browse/FLINK-31441
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Weihua Hu


with [FLINK-12122|https://issues.apache.org/jira/browse/FLINK-12122], we 
support spread out tasks evenly to available task managers. 

FineGrainedSlotManager should support this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31439) FLIP-298: Unifying the Implementation of SlotManager

2023-03-14 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-31439:
-

 Summary: FLIP-298: Unifying the Implementation of SlotManager
 Key: FLINK-31439
 URL: https://issues.apache.org/jira/browse/FLINK-31439
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Reporter: Weihua Hu
 Fix For: 1.18.0


This is an umbrella ticket for 
[FLIP-298|https://cwiki.apache.org/confluence/display/FLINK/FLIP-298%3A+Unifying+the+Implementation+of+SlotManager].






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [RESULT][Vote] FLIP-298: Unifying the Implementation of SlotManager

2023-03-14 Thread Weihua Hu
Sorry for missing the link of FLIP

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-298%3A+Unifying+the+Implementation+of+SlotManager

Best,
Weihua


On Tue, Mar 14, 2023 at 3:05 PM Weihua Hu  wrote:

> I am happy to announce that FLIP-298: Unifying the Implementation of
> SlotManager[1] has been accepted.
>
> There are 6 binding votes and 5 non-binding vote:
> - Xintong Song (binding)
> - weijie guo (binding)
> - David Morávek (binding)
> - Matthias Pohl (binding)
> - Yangze Guo (binding)
> - Maximilian Michels (binding)
> - Shammon FY (non-binding)
> - John Roesler (non-binding)
> - Yuxin Tan (non-binding)
> - Zhanghao Chen (non-binding)
> - Etienne Chauchot (non-binding)
>
> There is no disapproving vote.
>
> Best,
> Weihua
>


[RESULT][Vote] FLIP-298: Unifying the Implementation of SlotManager

2023-03-14 Thread Weihua Hu
I am happy to announce that FLIP-298: Unifying the Implementation of
SlotManager[1] has been accepted.

There are 6 binding votes and 5 non-binding vote:
- Xintong Song (binding)
- weijie guo (binding)
- David Morávek (binding)
- Matthias Pohl (binding)
- Yangze Guo (binding)
- Maximilian Michels (binding)
- Shammon FY (non-binding)
- John Roesler (non-binding)
- Yuxin Tan (non-binding)
- Zhanghao Chen (non-binding)
- Etienne Chauchot (non-binding)

There is no disapproving vote.

Best,
Weihua


Re: [Vote] FLIP-298: Unifying the Implementation of SlotManager

2023-03-14 Thread Weihua Hu
Thanks, everyone, this vote is closing now.

Best,
Weihua


On Mon, Mar 13, 2023 at 7:18 PM Maximilian Michels  wrote:

> +1 (binding)
>
> On Mon, Mar 13, 2023 at 11:33 AM Etienne Chauchot 
> wrote:
> >
> > +1 (not binding)
> >
> > Etienne
> >
> > Le 11/03/2023 à 07:37, Yangze Guo a écrit :
> > > +1 (binding)
> > >
> > > Zhanghao Chen  于 2023年3月10日周五 下午5:07写道:
> > >
> > >> Thanks Weihua. +1 (non-binding)
> > >>
> > >> Best,
> > >> Zhanghao Chen
> > >> 
> > >> From: Weihua Hu 
> > >> Sent: Thursday, March 9, 2023 13:27
> > >> To: dev 
> > >> Subject: [Vote] FLIP-298: Unifying the Implementation of SlotManager
> > >>
> > >> Hi Everyone,
> > >>
> > >> I would like to start the vote on FLIP-298: Unifying the
> Implementation
> > >> of SlotManager [1]. The FLIP was discussed in this thread [2].
> > >>
> > >> This FLIP aims to unify the implementation of SlotManager in
> > >> order to reduce maintenance costs.
> > >>
> > >> The vote will last for at least 72 hours (03/14, 15:00 UTC+8)
> > >> unless there is an objection or insufficient votes. Thank you all.
> > >>
> > >> [1]
> > >>
> > >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-298%3A+Unifying+the+Implementation+of+SlotManager
> > >> [2]https://lists.apache.org/thread/ocssfxglpc8z7cto3k8p44mrjxwr67r9
> > >>
> > >> Best,
> > >> Weihua
> > >>
>


Re: [DISCUSS] FLIP-298: Unifying the Implementation of SlotManager

2023-03-13 Thread Weihua Hu
ights here.
> >>>>
> >>>> 3. For me personally, having a more detailed summary comparing the
> >>>> subcomponents of both SlotManager implementations with where
> >>>> their functionality matches and where they differ might help
> understand
> >>> the
> >>>> consequences of the changes proposed in FLIP-298.
> >>>>
> >>>> Best,
> >>>> Matthias
> >>>>
> >>>> [1]
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-285%3A+Refactoring+LeaderElection+to+make+Flink+support+multi-component+leader+election+out-of-the-box
> >>>> [2] https://issues.apache.org/jira/browse/FLINK-30338
> >>>> [3]
> >>>>
> >>
> https://github.com/apache/flink/blob/f611ea8cb5deddb42429df2c99f0c68d7382e9bd/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/TaskExecutorManager.java#L66-L68
> >>>> On Tue, Feb 28, 2023 at 6:14 AM Matt Wang  wrote:
> >>>>
> >>>>> This is a good proposal for me, it will make the code of the
> >> SlotManager
> >>>>> more clear.
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>>
> >>>>> Best,
> >>>>> Matt Wang
> >>>>>
> >>>>>
> >>>>>  Replied Message 
> >>>>> | From | David Morávek |
> >>>>> | Date | 02/27/2023 22:45 |
> >>>>> | To |  |
> >>>>> | Subject | Re: [DISCUSS] FLIP-298: Unifying the Implementation of
> >>>>> SlotManager |
> >>>>> Hi Weihua, I still need to dig into the details, but the overall
> >>> sentiment
> >>>>> of this change sounds reasonable.
> >>>>>
> >>>>> Best,
> >>>>> D.
> >>>>>
> >>>>> On Mon, Feb 27, 2023 at 2:26 PM Zhanghao Chen <
> >>> zhanghao.c...@outlook.com>
> >>>>> wrote:
> >>>>>
> >>>>> Thanks for driving this topic. I think this FLIP could help clean up
> >> the
> >>>>> codebase to make it easier to maintain. +1 on it.
> >>>>>
> >>>>> Best,
> >>>>> Zhanghao Chen
> >>>>> 
> >>>>> From: Weihua Hu 
> >>>>> Sent: Monday, February 27, 2023 20:40
> >>>>> To: dev 
> >>>>> Subject: [DISCUSS] FLIP-298: Unifying the Implementation of
> >> SlotManager
> >>>>> Hi everyone,
> >>>>>
> >>>>> I would like to begin a discussion on FLIP-298: Unifying the
> >>> Implementation
> >>>>> of SlotManager[1]. There are currently two types of SlotManager in
> >>> Flink:
> >>>>> DeclarativeSlotManager and FineGrainedSlotManager.
> >>> FineGrainedSlotManager
> >>>>> should behave as DeclarativeSlotManager if the user does not
> configure
> >>> the
> >>>>> slot request profile.
> >>>>>
> >>>>> Therefore, this FLIP aims to unify the implementation of SlotManager
> >> in
> >>>>> order to reduce maintenance costs.
> >>>>>
> >>>>> Looking forward to hearing from you.
> >>>>>
> >>>>> [1]
> >>>>>
> >>>>>
> >>>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-298%3A+Unifying+the+Implementation+of+SlotManager
> >>>>> Best,
> >>>>> Weihua
> >>>>>
> >>>>>
>
>


Re: [DISCUSS] FLIP-298: Unifying the Implementation of SlotManager

2023-03-12 Thread Weihua Hu
Hi, Max

Thanks for feedback.

SlotManager only releases task managers with no task running on it after an
idle timeout.
IMO, this behavior could support scale down naturally.

There are two ways to trigger scale down:
1. Using adaptive scheduler and release task managers by resource
provider(k8s/yarn).
Then scheduler will scale down the parallelism to match the less resources.

2. After FLIP-291[1], we could set the parallelism of jobs, this may scale
down jobs. In this case,
We could add some strategy in SlotPool to release slots. Once SlotPool
releases all slots
of a task manager, the SlotManager will release this task manager after the
idle timeout exceeds.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-291%3A+Externalized+Declarative+Resource+Management

Best,
Weihua


On Fri, Mar 10, 2023 at 10:11 PM Maximilian Michels  wrote:

> +1 on the proposal.
>
> I'm wondering about the scale down behavior of the unified
> implementation. Does the new unified implementation prioritize
> releasing entire task managers in favor of evenly spreading out task
> managers? Consider a scale down from parallelism 15 to parallelism 10
> where each task manager has 5 slots. We could either spread out 10
> slots among the 3 task managers, or fit the 10 slots in 2 task
> managers and surrender the task manager (at least on k8s). From a cost
> saving perspective, the latter would be preferable.
>
> -Max
>
> On Thu, Mar 9, 2023 at 4:17 PM Matthias Pohl
>  wrote:
> >
> > Thanks for your clarification. I have nothing else to add to the
> > discussion. +1 from my side to proceed
> >
> > On Wed, Mar 8, 2023 at 4:16 AM Weihua Hu  wrote:
> >
> > > Thanks Yangze for your attention, this would be a great help.
> > >
> > > And thanks Matthias too.
> > >
> > > FLIP-156 [1] mentions some incompatibility between fine-grained
> resource
> > > > management and reactive mode. I assume that this is independent of
> the
> > > > SlotManager and replacing the DSM with the FGSM wouldn't affect
> reactive
> > > > mode?
> > >
> > > Yes. This incompatibility is independent of SlotManager. That means the
> > > AdpativeScheduler will always ignore the resource requirement set by
> > > slotSharingGroup and declare Unknown ResourceProfile to SlotManager.
> > > So, using FGSM as default will not affect reactive mode.
> > >
> > > About the heterogeneous TaskManager: This is a feature that's also not
> > > > supported in the DSM right now, is it? We should state that fact in
> the
> > > > FLIP if we mentioned that we don't want to implement it for the FSGM.
> > >
> > > Yes, both DSM and FGSM do not support request heterogeneous
> > > TaskManager right now. Heterogeneous will make the resource allocation
> > > logic more complicated, such as the resource deadlock if request A
> > > allocated
> > > the bigger slot B and then request B could not allocate the small slot
> A.
> > > We
> > > need to think more before starting to support the heterogeneous task
> > > manager.
> > > So, we don't want to implement heterogeneity in this FLIP.
> > >
> > > Best,
> > > Weihua
> > >
> > >
> > > On Wed, Mar 8, 2023 at 12:44 AM Matthias Pohl
> > >  wrote:
> > >
> > > > Thanks for updating the FLIP and adding more context to it.
> Additionally,
> > > > thanks to Xintong and Yangze for offering your expertise here as
> > > > contributors to the initial FineGrainedSlotManager implementation.
> > > >
> > > > The remark on cutting out functionality was only based on some
> > > superficial
> > > > initial code reading. I cannot come up with a better code structure
> > > myself.
> > > > Therefore, I'm fine with not refactoring the code as part of this
> FLIP.
> > > >
> > > > The strategies that were proposed around making sure that the
> refactoring
> > > > is properly backed by tests sound reasonable. My initial concern was
> > > based
> > > > on the fact that we might have unit test scenarios for the DSM that
> are
> > > not
> > > > covered in the unit tests of the FSGM. In that case, swapping the DSM
> > > with
> > > > the FSGM might not be good enough. Going over the GSM tests to make
> sure
> > > > that we're not accidentally deleting test scenarios sounds good to
> me.
> > > > Thanks, Weihua.
> > > >
> > > > FLIP-156 [1] mentions some incompa

Re: Re: [ANNOUNCE] New Apache Flink Committer - Yuxia Luo

2023-03-12 Thread Weihua Hu
Congratulations, Yuxia!

Best,
Weihua


On Mon, Mar 13, 2023 at 12:11 PM Jingsong Li  wrote:

> Congratulations, Yuxia!
>
> On Mon, Mar 13, 2023 at 11:49 AM Juntao Hu  wrote:
> >
> > Congratulations, Yuxia!
> >
> > Best,
> > Juntao
> >
> >
> > Wencong Liu  于2023年3月13日周一 11:33写道:
> >
> > > Congratulations, Yuxia!
> > >
> > > Best,
> > > Wencong Liu
> > >
> > >
> > > At 2023-03-13 11:20:21, "Qingsheng Ren"  wrote:
> > > >Congratulations, Yuxia!
> > > >
> > > >Best,
> > > >Qingsheng
> > > >
> > > >On Mon, Mar 13, 2023 at 10:27 AM Jark Wu  wrote:
> > > >
> > > >> Hi, everyone
> > > >>
> > > >> On behalf of the PMC, I'm very happy to announce Yuxia Luo as a new
> > > Flink
> > > >> Committer.
> > > >>
> > > >> Yuxia has been continuously contributing to the Flink project for
> almost
> > > >> two
> > > >> years, authored and reviewed hundreds of PRs over this time. He is
> > > >> currently
> > > >> the core maintainer of the Hive component, where he contributed many
> > > >> valuable
> > > >> features, including the Hive dialect with 95% compatibility and
> small
> > > file
> > > >> compaction.
> > > >> In addition, Yuxia driven FLIP-282 (DELETE & UPDATE API) to better
> > > >> integrate
> > > >> Flink with data lakes. He actively participated in dev discussions
> and
> > > >> answered
> > > >> many questions on the user mailing list.
> > > >>
> > > >> Please join me in congratulating Yuxia Luo for becoming a Flink
> > > Committer!
> > > >>
> > > >> Best,
> > > >> Jark Wu (on behalf of the Flink PMC)
> > > >>
> > >
>


Re: [DISCUSS] FLIP-301: Hybrid Shuffle supports Remote Storage

2023-03-09 Thread Weihua Hu
Thanks Yuxin for your explanation.

That sounds reasonable. Looking forward to the new shuffle.


Best,
Weihua


On Fri, Mar 10, 2023 at 11:48 AM Yuxin Tan  wrote:

> Hi, Weihua,
> Thanks for the questions and the ideas.
>
> > 1. How many performance regressions would there be if we only
> used remote storage?
>
> The new architecture can support to use remote storage only, but this
> FLIP target is to improve job stability. And the change in the FLIP has
> been significantly complex and the goal of the first version is to update
> Hybrid Shuffle to the new architecture and support remote storage as
> a supplement. The performance of this version is not the first priority,
> so we haven’t tested the performance of using only remote storage.
> If there are indeed regressions, we will keep optimizing the performance
> of the remote storages and improve it until only remote storage is
> available in the production environment.
>
> > 2. Shall we move the local data to remote storage if the producer is
> finished for a long time?
>
> I agree that it is a good idea, which can release task manager resources
> more timely. But moving data from TM local disk to remote storage needs
> more detailed discussion and design, and it is easier to implement it based
> on the new architecture. Considering the complexity, the target focus, and
> the iteration cycle of the FLIP, we decide that the details are not
> included
> in the first version. We will extend and implement them in the subsequent
> versions.
>
> Best,
> Yuxin
>
>
> Weihua Hu  于2023年3月9日周四 11:22写道:
>
> > Hi, Yuxin
> >
> > Thanks for driving this FLIP.
> >
> > The remote storage shuffle could improve the stability of Batch jobs.
> >
> > In our internal scenario, we use a hybrid cluster to run both
> > Streaming(high priority)
> > and Batch jobs(low priority). When there is not enough resources(such as
> > cpu usage
> > reaches a threshold), the batch containers will be evicted. So this will
> > cause some re-run
> > of batch tasks.
> >
> > It would be a great help if the remote storage could address this. So I
> > have a few questions.
> >
> > 1. How many performance regressions would there be if we only used remote
> > storage?
> >
> > 2. In current design, the shuffle data segment will write to one kind of
> > storage tier.
> > Shall we move the local data to remote storage if the producer is
> finished
> > for a long time?
> > So we can release the idle task manager with no shuffle data on it. This
> > may help to reduce
> > the resource usage when producer parallelism is larger than consume.
> >
> > Best,
> > Weihua
> >
> >
> > On Thu, Mar 9, 2023 at 10:38 AM Yuxin Tan 
> wrote:
> >
> > > Hi, Junrui,
> > > Thanks for the suggestions and ideas.
> > >
> > > > If they are fixed, I suggest that FLIP could provide clearer
> > > explanations.
> > > I have updated the FLIP and described the segment size more clearly.
> > >
> > > > can we provide configuration options for users to manually adjust the
> > > sizes?
> > > The segment size can be configured if necessary. But considering that
> if
> > we
> > > exposed these parameters prematurely, it may be difficult to modify the
> > > implementation later because the user has already used the configs. We
> > > can make these internal configs or fixed values when implementing the
> > first
> > > version, I think we can use either of these two ways, because they are
> > > internal and do not affect the public APIs.
> > >
> > > Best,
> > > Yuxin
> > >
> > >
> > > Junrui Lee  于2023年3月8日周三 00:24写道:
> > >
> > > > Hi Yuxin,
> > > >
> > > > This FLIP looks quite reasonable. Flink can solve the problem of
> Batch
> > > > shuffle by
> > > > combining local and remote storage, and can use fixed local disks for
> > > > better performance
> > > >  in most scenarios, while using remote storage as a supplement when
> > local
> > > > disks are not
> > > >  sufficient, avoiding wasteful costs and poor job stability.
> Moreover,
> > > the
> > > > solution also
> > > > considers the issue of dynamic switching, which can automatically
> > switch
> > > to
> > > > remote
> > > > storage when the local disk is full, saving costs, and automatically
> > > switch
> > > > back when
> > > &g

[Vote] FLIP-298: Unifying the Implementation of SlotManager

2023-03-08 Thread Weihua Hu
Hi Everyone,

I would like to start the vote on FLIP-298: Unifying the Implementation
of SlotManager [1]. The FLIP was discussed in this thread [2].

This FLIP aims to unify the implementation of SlotManager in
order to reduce maintenance costs.

The vote will last for at least 72 hours (03/14, 15:00 UTC+8)
unless there is an objection or insufficient votes. Thank you all.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-298%3A+Unifying+the+Implementation+of+SlotManager
[2]https://lists.apache.org/thread/ocssfxglpc8z7cto3k8p44mrjxwr67r9

Best,
Weihua


Re: [DISCUSS] FLIP-301: Hybrid Shuffle supports Remote Storage

2023-03-08 Thread Weihua Hu
Hi, Yuxin

Thanks for driving this FLIP.

The remote storage shuffle could improve the stability of Batch jobs.

In our internal scenario, we use a hybrid cluster to run both
Streaming(high priority)
and Batch jobs(low priority). When there is not enough resources(such as
cpu usage
reaches a threshold), the batch containers will be evicted. So this will
cause some re-run
of batch tasks.

It would be a great help if the remote storage could address this. So I
have a few questions.

1. How many performance regressions would there be if we only used remote
storage?

2. In current design, the shuffle data segment will write to one kind of
storage tier.
Shall we move the local data to remote storage if the producer is finished
for a long time?
So we can release the idle task manager with no shuffle data on it. This
may help to reduce
the resource usage when producer parallelism is larger than consume.

Best,
Weihua


On Thu, Mar 9, 2023 at 10:38 AM Yuxin Tan  wrote:

> Hi, Junrui,
> Thanks for the suggestions and ideas.
>
> > If they are fixed, I suggest that FLIP could provide clearer
> explanations.
> I have updated the FLIP and described the segment size more clearly.
>
> > can we provide configuration options for users to manually adjust the
> sizes?
> The segment size can be configured if necessary. But considering that if we
> exposed these parameters prematurely, it may be difficult to modify the
> implementation later because the user has already used the configs. We
> can make these internal configs or fixed values when implementing the first
> version, I think we can use either of these two ways, because they are
> internal and do not affect the public APIs.
>
> Best,
> Yuxin
>
>
> Junrui Lee  于2023年3月8日周三 00:24写道:
>
> > Hi Yuxin,
> >
> > This FLIP looks quite reasonable. Flink can solve the problem of Batch
> > shuffle by
> > combining local and remote storage, and can use fixed local disks for
> > better performance
> >  in most scenarios, while using remote storage as a supplement when local
> > disks are not
> >  sufficient, avoiding wasteful costs and poor job stability. Moreover,
> the
> > solution also
> > considers the issue of dynamic switching, which can automatically switch
> to
> > remote
> > storage when the local disk is full, saving costs, and automatically
> switch
> > back when
> > there is available space on the local disk.
> >
> > As Wencong Liu stated, an appropriate segment size is essential, as it
> can
> > significantly
> > affect shuffle performance. I also agree that the first version should
> > focus mainly on the
> > design and implementation. However, I have a small question about FLIP. I
> > did not see
> > any information regarding the segment size of memory, local disk, and
> > remote storage
> > in this FLIP. Are these three values fixed at present? If they are
> fixed, I
> > suggest that FLIP
> > could provide clearer explanations. Moreover, although a dynamic segment
> > size
> > mechanism is not necessary at the moment, can we provide configuration
> > options for users
> >  to manually adjust these sizes? I think it might be useful.
> >
> > Best,
> > Junrui.
> >
> > Yuxin Tan  于2023年3月7日周二 20:14写道:
> >
> > > Thanks for joining the discussion.
> > >
> > > @weijie guo
> > > > 1. How to optimize the broadcast result partition?
> > > For the partitions with multi-consumers, e.g., broadcast result
> > partition,
> > > partition reuse,
> > > speculative, etc, the processing logic is the same as the original
> Hybrid
> > > Shuffle, that is,
> > > using the full spilling strategy. It indeed may reduce the opportunity
> to
> > > consume from
> > > memory, but the PoC shows that it has no effect on the performance
> > > basically.
> > >
> > > > 2. Can the new proposal completely avoid this problem of inaccurate
> > > backlog
> > > calculation?
> > > Yes, this can avoid the problem completely. About the read buffers,
> the N
> > > is to reserve
> > > one exclusive buffer per channel, which is to avoid the deadlock
> because
> > > the buffers
> > > are acquired by some channels and other channels can not request any
> > > buffers. But
> > > the buffers except for the N can be floating (competing to request the
> > > buffers) by all
> > > channels.
> > >
> > > @Wencong Liu
> > > > Deciding the Segment size dynamically will be helpful.
> > > I agree that it may be better if the segment size is dynamically
> decided,
> > > but for simplifying
> > > the implementation of the first version, we want to make this a fixed
> > value
> > > for each tier.
> > > In the future, this can be a good improvement if necessary. In the
> first
> > > version, we will mainly
> > > focus on the more important features, such as the tiered storage
> > > architecture, dynamic
> > > switching tiers, supporting remote storage, memory management, etc.
> > >
> > > Best,
> > > Yuxin
> > >
> > >
> > > Wencong Liu  于2023年3月7日周二 16:48写道:
> > >
> > > > Hello Yuxin,
> > > >
> > > >
> > > > Thanks for your pro

Re: [DISCUSS] FLIP-298: Unifying the Implementation of SlotManager

2023-03-07 Thread Weihua Hu
Thanks Yangze for your attention, this would be a great help.

And thanks Matthias too.

FLIP-156 [1] mentions some incompatibility between fine-grained resource
> management and reactive mode. I assume that this is independent of the
> SlotManager and replacing the DSM with the FGSM wouldn't affect reactive
> mode?

Yes. This incompatibility is independent of SlotManager. That means the
AdpativeScheduler will always ignore the resource requirement set by
slotSharingGroup and declare Unknown ResourceProfile to SlotManager.
So, using FGSM as default will not affect reactive mode.

About the heterogeneous TaskManager: This is a feature that's also not
> supported in the DSM right now, is it? We should state that fact in the
> FLIP if we mentioned that we don't want to implement it for the FSGM.

Yes, both DSM and FGSM do not support request heterogeneous
TaskManager right now. Heterogeneous will make the resource allocation
logic more complicated, such as the resource deadlock if request A
allocated
the bigger slot B and then request B could not allocate the small slot A.
We
need to think more before starting to support the heterogeneous task
manager.
So, we don't want to implement heterogeneity in this FLIP.

Best,
Weihua


On Wed, Mar 8, 2023 at 12:44 AM Matthias Pohl
 wrote:

> Thanks for updating the FLIP and adding more context to it. Additionally,
> thanks to Xintong and Yangze for offering your expertise here as
> contributors to the initial FineGrainedSlotManager implementation.
>
> The remark on cutting out functionality was only based on some superficial
> initial code reading. I cannot come up with a better code structure myself.
> Therefore, I'm fine with not refactoring the code as part of this FLIP.
>
> The strategies that were proposed around making sure that the refactoring
> is properly backed by tests sound reasonable. My initial concern was based
> on the fact that we might have unit test scenarios for the DSM that are not
> covered in the unit tests of the FSGM. In that case, swapping the DSM with
> the FSGM might not be good enough. Going over the GSM tests to make sure
> that we're not accidentally deleting test scenarios sounds good to me.
> Thanks, Weihua.
>
> FLIP-156 [1] mentions some incompatibility between fine-grained resource
> management and reactive mode. I assume that this is independent of the
> SlotManager and replacing the DSM with the FGSM wouldn't affect reactive
> mode?
>
> About the heterogeneous TaskManager: This is a feature that's also not
> supported in the DSM right now, is it? We should state that fact in the
> FLIP if we mentioned that we don't want to implement it for the FSGM.
>
> Best,
> Matthias
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
>
> On Tue, Mar 7, 2023 at 8:58 AM Yangze Guo  wrote:
>
> > Hi Weihua,
> >
> > Thanks for driving this. As Xintong mentioned, this was a technical
> > debt from FLIP-56.
> >
> > The latest version of FLIP sounds good, +1 from my side. As a
> > contributor to this component, I'm willing to assist with the review
> > process. Feel free to reach me if you need help.
> >
> > Best,
> > Yangze Guo
> >
> > On Tue, Mar 7, 2023 at 1:47 PM Weihua Hu  wrote:
> > >
> > > Hi,
> > >
> > > @David @Matthias
> > > There are a few days after hearing your thoughts. I would like to know
> if
> > > there are any other concerns about this FLIP.
> > >
> > >
> > > Best,
> > > Weihua
> > >
> > >
> > > On Mon, Mar 6, 2023 at 7:53 PM Weihua Hu 
> wrote:
> > >
> > > >
> > > > Thanks Shammon,
> > > >
> > > > I've updated FLIP to add this redundant Task Manager limitation.
> > > >
> > > >
> > > > Best,
> > > > Weihua
> > > >
> > > >
> > > > On Mon, Mar 6, 2023 at 5:07 PM Shammon FY  wrote:
> > > >
> > > >> Hi weihua
> > > >>
> > > >> Can you add content related to `heterogeneous resources` to this
> > FLIP? We
> > > >> can record it and consider it in the future. It may be useful for
> some
> > > >> scenarios, such as the combination of streaming and ML.
> > > >>
> > > >> Best,
> > > >> Shammon
> > > >>
> > > >>
> > > >> On Mon, Mar 6, 2023 at 1:39 PM weijie guo <
> guoweijieres...@gmail.com>
> > > >> wrote:
> > > >>
>

Re: [DISCUSS] FLIP-298: Unifying the Implementation of SlotManager

2023-03-06 Thread Weihua Hu
Hi,

@David @Matthias
There are a few days after hearing your thoughts. I would like to know if
there are any other concerns about this FLIP.


Best,
Weihua


On Mon, Mar 6, 2023 at 7:53 PM Weihua Hu  wrote:

>
> Thanks Shammon,
>
> I've updated FLIP to add this redundant Task Manager limitation.
>
>
> Best,
> Weihua
>
>
> On Mon, Mar 6, 2023 at 5:07 PM Shammon FY  wrote:
>
>> Hi weihua
>>
>> Can you add content related to `heterogeneous resources` to this FLIP? We
>> can record it and consider it in the future. It may be useful for some
>> scenarios, such as the combination of streaming and ML.
>>
>> Best,
>> Shammon
>>
>>
>> On Mon, Mar 6, 2023 at 1:39 PM weijie guo 
>> wrote:
>>
>> > Hi Weihua,
>> >
>> > Thanks for your clarification, SGTM.
>> >
>> > Best regards,
>> >
>> > Weijie
>> >
>> >
>> > Weihua Hu  于2023年3月6日周一 11:43写道:
>> >
>> > > Thanks Weijie.
>> > >
>> > > Heterogeneous task managers will not be considered in this FLIP since
>> > > it does not request heterogeneous resources as you said.
>> > >
>> > > My first thought is we can adjust the meaning of redundant
>> configuration
>> > > to redundant number of per resource type. These can be considered in
>> > > detail when we decide to support heterogeneous task managers.
>> > >
>> > > Best,
>> > > Weihua
>> > >
>> > >
>> > > On Sat, Mar 4, 2023 at 1:13 AM weijie guo 
>> > > wrote:
>> > >
>> > > > Thanks Weihua for preparing this FLIP.
>> > > >
>> > > > This FLIP overall looks reasonable to me after updating as
>> suggested by
>> > > > Matthias.
>> > > >
>> > > > I only have one small question about keeping some redundant task
>> > > managers:
>> > > > In the fine-grained resource management, theoretically, it can
>> support
>> > > > heterogeneous taskmanagers. When we complete the missing features
>> for
>> > > FGSM,
>> > > > do we plan to take this into account?
>> > > > Of course, if I remember correctly, FGSM will not request
>> heterogeneous
>> > > > resources at present, so it is also acceptable to me if there is no
>> > > special
>> > > > treatment now.
>> > > >
>> > > > +1 for this changes if we can ensure the test coverage.
>> > > >
>> > > > Best regards,
>> > > >
>> > > > Weijie
>> > > >
>> > > >
>> > > > John Roesler  于2023年3月2日周四 12:53写道:
>> > > >
>> > > > > Thanks for the test plan, Weihua!
>> > > > >
>> > > > > Yes, it addresses my concerns.
>> > > > >
>> > > > > Thanks,
>> > > > > John
>> > > > >
>> > > > > On Wed, Mar 1, 2023, at 22:38, Weihua Hu wrote:
>> > > > > > Hi, everyone,
>> > > > > > Thanks for your suggestions and ideas.
>> > > > > > Thanks Xintong for sharing the detailed backgrounds of
>> SlotManager.
>> > > > > >
>> > > > > > *@Matthias
>> > > > > >
>> > > > > > 1. Did you do a proper test coverage analysis?
>> > > > > >
>> > > > > >
>> > > > > > Just as Xintong said, we already have a CI stage for fine
>> grained
>> > > > > resource
>> > > > > > managers.
>> > > > > > And I will make sure FineGrainedSlotManager as the default
>> > > SlotManager
>> > > > > can
>> > > > > > pass all the tests of CI.
>> > > > > > In addition, I will review all unit tests of
>> > > > DeclarativeSlotManager(DSM)
>> > > > > to
>> > > > > > ensure that there are no gaps in the
>> > > > > > coverage provided by the FineGrainedSlotManager.
>> > > > > > I also added the 'Test Plan' part to the FLIP.
>> > > > > > @Matthias @John @Shammon Does this test plan address your
>> concerns?
>> > > > > >
>> > > > > > 2.  DeclarativeSlotManager and FineGrainedSlotManager feel quite
>> &g

Re: [DISCUSS] FLIP-298: Unifying the Implementation of SlotManager

2023-03-06 Thread Weihua Hu
Thanks Shammon,

I've updated FLIP to add this redundant Task Manager limitation.


Best,
Weihua


On Mon, Mar 6, 2023 at 5:07 PM Shammon FY  wrote:

> Hi weihua
>
> Can you add content related to `heterogeneous resources` to this FLIP? We
> can record it and consider it in the future. It may be useful for some
> scenarios, such as the combination of streaming and ML.
>
> Best,
> Shammon
>
>
> On Mon, Mar 6, 2023 at 1:39 PM weijie guo 
> wrote:
>
> > Hi Weihua,
> >
> > Thanks for your clarification, SGTM.
> >
> > Best regards,
> >
> > Weijie
> >
> >
> > Weihua Hu  于2023年3月6日周一 11:43写道:
> >
> > > Thanks Weijie.
> > >
> > > Heterogeneous task managers will not be considered in this FLIP since
> > > it does not request heterogeneous resources as you said.
> > >
> > > My first thought is we can adjust the meaning of redundant
> configuration
> > > to redundant number of per resource type. These can be considered in
> > > detail when we decide to support heterogeneous task managers.
> > >
> > > Best,
> > > Weihua
> > >
> > >
> > > On Sat, Mar 4, 2023 at 1:13 AM weijie guo 
> > > wrote:
> > >
> > > > Thanks Weihua for preparing this FLIP.
> > > >
> > > > This FLIP overall looks reasonable to me after updating as suggested
> by
> > > > Matthias.
> > > >
> > > > I only have one small question about keeping some redundant task
> > > managers:
> > > > In the fine-grained resource management, theoretically, it can
> support
> > > > heterogeneous taskmanagers. When we complete the missing features for
> > > FGSM,
> > > > do we plan to take this into account?
> > > > Of course, if I remember correctly, FGSM will not request
> heterogeneous
> > > > resources at present, so it is also acceptable to me if there is no
> > > special
> > > > treatment now.
> > > >
> > > > +1 for this changes if we can ensure the test coverage.
> > > >
> > > > Best regards,
> > > >
> > > > Weijie
> > > >
> > > >
> > > > John Roesler  于2023年3月2日周四 12:53写道:
> > > >
> > > > > Thanks for the test plan, Weihua!
> > > > >
> > > > > Yes, it addresses my concerns.
> > > > >
> > > > > Thanks,
> > > > > John
> > > > >
> > > > > On Wed, Mar 1, 2023, at 22:38, Weihua Hu wrote:
> > > > > > Hi, everyone,
> > > > > > Thanks for your suggestions and ideas.
> > > > > > Thanks Xintong for sharing the detailed backgrounds of
> SlotManager.
> > > > > >
> > > > > > *@Matthias
> > > > > >
> > > > > > 1. Did you do a proper test coverage analysis?
> > > > > >
> > > > > >
> > > > > > Just as Xintong said, we already have a CI stage for fine grained
> > > > > resource
> > > > > > managers.
> > > > > > And I will make sure FineGrainedSlotManager as the default
> > > SlotManager
> > > > > can
> > > > > > pass all the tests of CI.
> > > > > > In addition, I will review all unit tests of
> > > > DeclarativeSlotManager(DSM)
> > > > > to
> > > > > > ensure that there are no gaps in the
> > > > > > coverage provided by the FineGrainedSlotManager.
> > > > > > I also added the 'Test Plan' part to the FLIP.
> > > > > > @Matthias @John @Shammon Does this test plan address your
> concerns?
> > > > > >
> > > > > > 2.  DeclarativeSlotManager and FineGrainedSlotManager feel quite
> > big
> > > in
> > > > > >
> > > > > > terms of lines of code
> > > > > >
> > > > > >
> > > > > > IMO, the refactoring of SlotManager does not belong to this FLIP
> > > since
> > > > it
> > > > > > may lead to some unstable risks. For
> > > > > > FineGrainedSlotManager(FGSM), we already split some reasonable
> > > > > components.
> > > > > > They are:
> > > > > > * TaskManagerTracker: Track task managers and their resources.
> > > > > > * ResourceTracke

Re: Re: obtain the broadcast stream information in sink

2023-03-06 Thread Weihua Hu
AFAIK, we can not get the broadcast state in sink.

Maybe you can enrich records with broadcast information, and then get the
information from each record in the Sink function.

Best,
Weihua


On Mon, Mar 6, 2023 at 6:20 PM zhan...@eastcom-sw.com <
zhan...@eastcom-sw.com> wrote:

>
> The sinks needs to get some configuration information while writing. I
> want to get it from the broadcast stream ~
>
>
> From: Weihua Hu
> Date: 2023-03-06 18:06
> To: dev; user
> Subject: Re: obtain the broadcast stream information in sink
> Hi,
>
> Could you describe your usage scenario in detail?
> Why do you need to get the broadcast stream in sink?
> And could you split an operator from the sink to deal with broadcast
> stream?
>
> Best,
> Weihua
>
>
> On Mon, Mar 6, 2023 at 10:57 AM zhan...@eastcom-sw.com <
> zhan...@eastcom-sw.com> wrote:
>
> >
> > hi, all ~
> >
> > how can i obtain the broadcast stream information in sink ?
> >
>


Re: obtain the broadcast stream information in sink

2023-03-06 Thread Weihua Hu
Hi,

Could you describe your usage scenario in detail?
Why do you need to get the broadcast stream in sink?
And could you split an operator from the sink to deal with broadcast stream?

Best,
Weihua


On Mon, Mar 6, 2023 at 10:57 AM zhan...@eastcom-sw.com <
zhan...@eastcom-sw.com> wrote:

>
> hi, all ~
>
> how can i obtain the broadcast stream information in sink ?
>


Re: [DISCUSS] FLIP-298: Unifying the Implementation of SlotManager

2023-03-05 Thread Weihua Hu
Thanks Weijie.

Heterogeneous task managers will not be considered in this FLIP since
it does not request heterogeneous resources as you said.

My first thought is we can adjust the meaning of redundant configuration
to redundant number of per resource type. These can be considered in
detail when we decide to support heterogeneous task managers.

Best,
Weihua


On Sat, Mar 4, 2023 at 1:13 AM weijie guo  wrote:

> Thanks Weihua for preparing this FLIP.
>
> This FLIP overall looks reasonable to me after updating as suggested by
> Matthias.
>
> I only have one small question about keeping some redundant task managers:
> In the fine-grained resource management, theoretically, it can support
> heterogeneous taskmanagers. When we complete the missing features for FGSM,
> do we plan to take this into account?
> Of course, if I remember correctly, FGSM will not request heterogeneous
> resources at present, so it is also acceptable to me if there is no special
> treatment now.
>
> +1 for this changes if we can ensure the test coverage.
>
> Best regards,
>
> Weijie
>
>
> John Roesler  于2023年3月2日周四 12:53写道:
>
> > Thanks for the test plan, Weihua!
> >
> > Yes, it addresses my concerns.
> >
> > Thanks,
> > John
> >
> > On Wed, Mar 1, 2023, at 22:38, Weihua Hu wrote:
> > > Hi, everyone,
> > > Thanks for your suggestions and ideas.
> > > Thanks Xintong for sharing the detailed backgrounds of SlotManager.
> > >
> > > *@Matthias
> > >
> > > 1. Did you do a proper test coverage analysis?
> > >
> > >
> > > Just as Xintong said, we already have a CI stage for fine grained
> > resource
> > > managers.
> > > And I will make sure FineGrainedSlotManager as the default SlotManager
> > can
> > > pass all the tests of CI.
> > > In addition, I will review all unit tests of
> DeclarativeSlotManager(DSM)
> > to
> > > ensure that there are no gaps in the
> > > coverage provided by the FineGrainedSlotManager.
> > > I also added the 'Test Plan' part to the FLIP.
> > > @Matthias @John @Shammon Does this test plan address your concerns?
> > >
> > > 2.  DeclarativeSlotManager and FineGrainedSlotManager feel quite big in
> > >
> > > terms of lines of code
> > >
> > >
> > > IMO, the refactoring of SlotManager does not belong to this FLIP since
> it
> > > may lead to some unstable risks. For
> > > FineGrainedSlotManager(FGSM), we already split some reasonable
> > components.
> > > They are:
> > > * TaskManagerTracker: Track task managers and their resources.
> > > * ResourceTracker: track requirements of jobs
> > > * ResourceAllocationStrategy: Try to fulfill the resource requirements
> > with
> > > available/pending resources.
> > > * SlotStatusSyncer: communicate with TaskManager, for
> allocating/freeing
> > > slot and reconciling the slot status
> > > Maybe we can start a discussion about refactoring SlotManager in
> another
> > > FLIP if there are some good suggestions.
> > > WDYT
> > >
> > > 3. For me personally, having a more detailed summary comparing the
> > >> subcomponents of both SlotManager implementations with where
> > >> their functionality matches and where they differ might help
> understand
> > the
> > >> consequences of the changes proposed in FLIP-298
> > >
> > > Good suggestion, I have updated the comparison in this FLIP. Looking
> > > forward to any suggestions/thoughts
> > > if they are not described clearly.
> > >
> > > *@John
> > >
> > > 4. In addition to changing the default, would it make sense to log a
> > >> deprecation warning on initialization
> > >
> > > if the DeclarativeSlotManager is used?
> > >>
> > > SGTM, We should add Deprecated annotations to DSM for devs. And log a
> > > deprecation warning for users.
> > >
> > > *@Shammon
> > >
> > > 1. For their functional differences, can you give some detailed tests
> to
> > >> verify that the new FineGrainedSlotManager has these capabilities?
> This
> > can
> > >> effectively verify the new functions
> > >>
> > > As just maintained, there is already a CI stage of FGSM, and I will do
> > more
> > > review of unit tests for DSM.
> > >
> > >  2. I'm worried that many functions are not independent and it is
> > difficult
> > >> to migr

Re: [DISCUSS] PRs in flink-ci/flink-mirror

2023-03-02 Thread Weihua Hu
Hi, Matthias

Thanks for bringing this discussion.

When I wanted to trigger a CI pipeline, my first thought was to submit a PR
to flink repo. But considering that the PR was not intended to be merged
in,
it might interfere with others. So I tried to retrieve how to run the CI
pipeline
without PR, Then I found this documentation and flows it's suggestion to
submit a PR to flink-mirror. Sorry for making noise in the slack channel.

+1 for removing this section in the docs.


Best,
Weihua


On Thu, Mar 2, 2023 at 4:17 PM Matthias Pohl 
wrote:

> Hi everyone,
> Weihua Hu [1] notified me of a section in Flink's Azure Pipeline
> documentation [2] where it's suggested to create PRs against
> flink-ci/mirror as a workaround if you're not having a private Azure
> Pipeline account and want to run CI with your code changes. Even though
> it's a viable solution it does generate noise in the Slack channel for
> build failure (#builds). Additionally, I don't see any extra value in
> comparison to using your own fork and creating a PR on your branch against
> the apache/flink repo. CI will be picked up by Flink's CiBot. Or am I
> missing something?
>
> I couldn't find any ML discussion on that matter. I suggest removing this
> section in the docs.
>
> Best,
> Matthias
>
> [1]
> https://github.com/flink-ci/flink-mirror/pull/16#issuecomment-1451183958
> [2]
>
> https://cwiki.apache.org/confluence/display/FLINK/Azure+Pipelines#AzurePipelines-AzurePipelineUsageRestrictions
>
> --
>
> [image: Aiven] <https://www.aiven.io>
>
> *Matthias Pohl*
> Opensource Software Engineer, *Aiven*
> matthias.p...@aiven.io|  +49 170 9869525
> aiven.io <https://www.aiven.io>   |   <https://www.facebook.com/aivencloud
> >
>   <https://www.linkedin.com/company/aiven/>   <
> https://twitter.com/aiven_io>
> *Aiven Deutschland GmbH*
> Alexanderufer 3-7, 10117 Berlin
> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> Amtsgericht Charlottenburg, HRB 209739 B
>


Re: [DISCUSS] FLIP-298: Unifying the Implementation of SlotManager

2023-03-01 Thread Weihua Hu
 might improve the overall codebase and might make reviewing the
> > > refactoring
> > > > easier. I did a first pass over the code and struggled to identify
> code
> > > > blocks that could be moved out of the SlotManager implementation(s).
> > > > Therefore, I might be wrong with this proposal. I haven't worked on
> > this
> > > > codebase in that detail that it would allow me to come up with a
> > > judgement
> > > > call. I wanted to bring it up, anyway, because I'm curious whether
> that
> > > > could be an option. There's a comment created by Chesnay (CC'd) in
> the
> > > > JavaDoc of TaskExecutorManager [3] indicating something similar. I'm
> > > > wondering whether he can add some insights here.
> > > >
> > > > 3. For me personally, having a more detailed summary comparing the
> > > > subcomponents of both SlotManager implementations with where
> > > > their functionality matches and where they differ might help
> understand
> > > the
> > > > consequences of the changes proposed in FLIP-298.
> > > >
> > > > Best,
> > > > Matthias
> > > >
> > > > [1]
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-285%3A+Refactoring+LeaderElection+to+make+Flink+support+multi-component+leader+election+out-of-the-box
> > > > [2] https://issues.apache.org/jira/browse/FLINK-30338
> > > > [3]
> > > >
> > >
> >
> https://github.com/apache/flink/blob/f611ea8cb5deddb42429df2c99f0c68d7382e9bd/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/TaskExecutorManager.java#L66-L68
> > > >
> > > > On Tue, Feb 28, 2023 at 6:14 AM Matt Wang  wrote:
> > > >
> > > >> This is a good proposal for me, it will make the code of the
> > SlotManager
> > > >> more clear.
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >>
> > > >> Best,
> > > >> Matt Wang
> > > >>
> > > >>
> > > >>  Replied Message 
> > > >> | From | David Morávek |
> > > >> | Date | 02/27/2023 22:45 |
> > > >> | To |  |
> > > >> | Subject | Re: [DISCUSS] FLIP-298: Unifying the Implementation of
> > > >> SlotManager |
> > > >> Hi Weihua, I still need to dig into the details, but the overall
> > > sentiment
> > > >> of this change sounds reasonable.
> > > >>
> > > >> Best,
> > > >> D.
> > > >>
> > > >> On Mon, Feb 27, 2023 at 2:26 PM Zhanghao Chen <
> > > zhanghao.c...@outlook.com>
> > > >> wrote:
> > > >>
> > > >> Thanks for driving this topic. I think this FLIP could help clean up
> > the
> > > >> codebase to make it easier to maintain. +1 on it.
> > > >>
> > > >> Best,
> > > >> Zhanghao Chen
> > > >> 
> > > >> From: Weihua Hu 
> > > >> Sent: Monday, February 27, 2023 20:40
> > > >> To: dev 
> > > >> Subject: [DISCUSS] FLIP-298: Unifying the Implementation of
> > SlotManager
> > > >>
> > > >> Hi everyone,
> > > >>
> > > >> I would like to begin a discussion on FLIP-298: Unifying the
> > > Implementation
> > > >> of SlotManager[1]. There are currently two types of SlotManager in
> > > Flink:
> > > >> DeclarativeSlotManager and FineGrainedSlotManager.
> > > FineGrainedSlotManager
> > > >> should behave as DeclarativeSlotManager if the user does not
> configure
> > > the
> > > >> slot request profile.
> > > >>
> > > >> Therefore, this FLIP aims to unify the implementation of SlotManager
> > in
> > > >> order to reduce maintenance costs.
> > > >>
> > > >> Looking forward to hearing from you.
> > > >>
> > > >> [1]
> > > >>
> > > >>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-298%3A+Unifying+the+Implementation+of+SlotManager
> > > >>
> > > >> Best,
> > > >> Weihua
> > > >>
> > > >>
> > >
> >
>


Re: [VOTE] Flink minor version support policy for old releases

2023-02-27 Thread Weihua Hu
Thanks, Danny.

+1 (non-binding)

Best,
Weihua


On Tue, Feb 28, 2023 at 12:38 PM weijie guo 
wrote:

> Thanks Danny for bring this.
>
> +1 (non-binding)
>
> Best regards,
>
> Weijie
>
>
> Jing Ge  于2023年2月27日周一 20:23写道:
>
> > +1 (non-binding)
> >
> > BTW, should we follow the content style [1] to describe the new rule
> using
> > 1.2.x, 1.1.y, 1.1.z?
> >
> > [1] https://flink.apache.org/downloads/#update-policy-for-old-releases
> >
> > Best regards,
> > Jing
> >
> > On Mon, Feb 27, 2023 at 1:06 PM Matthias Pohl
> >  wrote:
> >
> > > Thanks, Danny. Sounds good to me.
> > >
> > > +1 (non-binding)
> > >
> > > On Wed, Feb 22, 2023 at 10:11 AM Danny Cranmer <
> dannycran...@apache.org>
> > > wrote:
> > >
> > > > I am starting a vote to update the "Update Policy for old releases"
> [1]
> > > to
> > > > include additional bugfix support for end of life versions.
> > > >
> > > > As per the discussion thread [2], the change we are voting on is:
> > > > - Support policy: updated to include: "Upon release of a new Flink
> > minor
> > > > version, the community will perform one final bugfix release for
> > resolved
> > > > critical/blocker issues in the Flink minor version losing support."
> > > > - Release process: add a step to start the discussion thread for the
> > > final
> > > > patch version, if there are resolved critical/blocking issues to
> flush.
> > > >
> > > > Voting schema: since our bylaws [3] do not cover this particular
> > > scenario,
> > > > and releases require PMC involvement, we will use a consensus vote
> with
> > > PMC
> > > > binding votes.
> > > >
> > > > Thanks,
> > > > Danny
> > > >
> > > > [1]
> > > https://flink.apache.org/downloads.html#update-policy-for-old-releases
> > > > [2] https://lists.apache.org/thread/szq23kr3rlkm80rw7k9n95js5vqpsnbv
> > > > [3] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws
> > > >
> > >
> >
>


[DISCUSS] FLIP-298: Unifying the Implementation of SlotManager

2023-02-27 Thread Weihua Hu
Hi everyone,

I would like to begin a discussion on FLIP-298: Unifying the Implementation
of SlotManager[1]. There are currently two types of SlotManager in Flink:
DeclarativeSlotManager and FineGrainedSlotManager. FineGrainedSlotManager
should behave as DeclarativeSlotManager if the user does not configure the
slot request profile.

Therefore, this FLIP aims to unify the implementation of SlotManager in
order to reduce maintenance costs.

Looking forward to hearing from you.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-298%3A+Unifying+the+Implementation+of+SlotManager

Best,
Weihua


[jira] [Created] (FLINK-31234) Add an option to redirect stdout/stderr for flink on kubernetes

2023-02-26 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-31234:
-

 Summary: Add an option to redirect stdout/stderr for flink on 
kubernetes
 Key: FLINK-31234
 URL: https://issues.apache.org/jira/browse/FLINK-31234
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / Kubernetes
Affects Versions: 1.17.0
Reporter: Weihua Hu
 Fix For: 1.18.0


Flink on Kubernetes does not support redirecting stdout/stderr to files. This 
is to allow users to get logs via "kubectl logs".

But for our internal scenario, we use a kubernetes user to submit all jobs to 
the k8s cluster and provide a platform for users to submit jobs. Users can't 
access kubernetes directly. so we need to display logs/stdout in flink webui.

Because the web ui retrieves the stdout file by filename, which has the same 
prefix as \{taskmanager}.log. We can't support this with a simple custom image.

IMO, we should add an option for redirecting stdout/stderr to files. When this 
is enabled.
1. flink-console.sh will redirect stdout/err to file.
2. flink-console.sh use log4j.properties as log4j configuration to avoid logs 
both in log file and stdout file.
Of course, this option is false by default.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31233) no error should be logged when retrieving the task manager's stdout if it does not exist

2023-02-26 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-31233:
-

 Summary: no error should be logged  when retrieving the task 
manager's stdout if it does not exist
 Key: FLINK-31233
 URL: https://issues.apache.org/jira/browse/FLINK-31233
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / REST
Affects Versions: 1.17.0
Reporter: Weihua Hu
 Fix For: 1.18.0
 Attachments: image-2023-02-27-13-56-40-718.png, 
image-2023-02-27-13-57-27-190.png

When running Flink on Kubernetes, the stdout logs is not redirected to files so 
it will not shown in WEB UI as expected.

But It returns “500 Internal error” in REST API and produces an error log in 
jobmanager.log. This is confusing and misleading.

 

I think this API should return “404 Not Found” without any error logs, similar 
to how jobmanager/stdout works. 

 

!image-2023-02-27-13-57-27-190.png!

!image-2023-02-27-13-56-40-718.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Release 1.15.4, release candidate #1

2023-02-23 Thread Weihua Hu
Thanks Danny.

+1(non-binding)

Tested the following:
- Download the artifacts and build image
- Ran WordCount on Kubernetes(session mode and application mode)


Best,
Weihua


On Fri, Feb 24, 2023 at 12:29 PM Yanfei Lei  wrote:

> Thanks Danny.
> +1 (non-binding)
>
> - Downloaded artifacts & built Flink from sources
> - Verified GPG signatures of bin and source.
> - Verified version in poms
> - Ran WordCount example in streaming and batch mode(standalone cluster)
> - Went over flink-web PR, looks good except for Sergey's remark.
>
> Danny Cranmer  于2023年2月24日周五 02:08写道:
> >
> > Hi everyone,
> > Please review and vote on the release candidate #1 for the version
> 1.15.4,
> > as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> >
> > The complete staging area is available for your review, which includes:
> > * JIRA release notes [1],
> > * the official Apache source release and binary convenience releases to
> be
> > deployed to dist.apache.org [2], which are signed with the key with
> > fingerprint 125FD8DB [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag "release-1.15.4-rc1" [5],
> > * website pull request listing the new release and adding announcement
> blog
> > post [6].
> >
> > The vote will be open for at least 72 hours (excluding weekends
> 2023-02-28
> > 19:00). It is adopted by majority approval, with at least 3 PMC
> affirmative
> > votes.
> >
> > Thanks,
> > Danny
> >
> > [1]
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352526
> > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.15.4-rc1/
> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [4]
> >
> https://repository.apache.org/content/repositories/orgapacheflink-1588/org/apache/flink/
> > [5] https://github.com/apache/flink/releases/tag/release-1.15.4-rc1
> > [6] https://github.com/apache/flink-web/pull/611
>
>
>
> --
> Best,
> Yanfei
>


Re: [ANNOUNCE] Flink project website is now powered by Hugo

2023-02-23 Thread Weihua Hu
 Congrats!  Thanks for your effort


Best,
Weihua


On Fri, Feb 24, 2023 at 10:47 AM Junrui Lee  wrote:

> Thanks Martijn for your great work.
>
> Best regards,
> Junrui
>
> Jane Chan  于2023年2月24日周五 10:37写道:
>
> > Thanks Martijn for your great work.
> >
> > Best regards,
> > Jane
> >
> > On Fri, Feb 24, 2023 at 10:24 AM weijie guo 
> > wrote:
> >
> > > Thanks Martijn for your great work.
> > >
> > > Best regards,
> > >
> > > Weijie
> > >
> > >
> > > Jingsong Li  于2023年2月24日周五 09:49写道:
> > >
> > > > Thanks Martijn!
> > > >
> > > > On Fri, Feb 24, 2023 at 9:46 AM yuxia 
> > > wrote:
> > > > >
> > > > > Thanks Martijn for your work.
> > > > >
> > > > > Best regards,
> > > > > Yuxia
> > > > >
> > > > > - 原始邮件 -
> > > > > 发件人: "Jing Ge" 
> > > > > 收件人: "dev" 
> > > > > 发送时间: 星期五, 2023年 2 月 24日 上午 5:20:30
> > > > > 主题: Re: [ANNOUNCE] Flink project website is now powered by Hugo
> > > > >
> > > > > Congrats Martijn! You have made great progress. Thanks for your
> > effort!
> > > > >
> > > > > Best regards,
> > > > > Jing
> > > > >
> > > > > On Thu, Feb 23, 2023 at 8:47 PM Konstantin Knauf <
> kna...@apache.org>
> > > > wrote:
> > > > >
> > > > > > Thanks, Martijn. That was a lot of work.
> > > > > >
> > > > > > Am Do., 23. Feb. 2023 um 16:33 Uhr schrieb Maximilian Michels <
> > > > > > m...@apache.org>:
> > > > > >
> > > > > > > Congrats! Great work. This was a long time in the making!
> > > > > > >
> > > > > > > -Max
> > > > > > >
> > > > > > > On Thu, Feb 23, 2023 at 3:28 PM Martijn Visser <
> > > > martijnvis...@apache.org
> > > > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Hi everyone,
> > > > > > > >
> > > > > > > > The project website at https://flink.apache.org is now
> powered
> > > by
> > > > Hugo
> > > > > > > [1]
> > > > > > > > which is the same system as the documentation.
> > > > > > > >
> > > > > > > > The theme is the same as the documentation website, so
> there's
> > no
> > > > > > > redesign
> > > > > > > > involved.
> > > > > > > >
> > > > > > > > If you encounter any issues, please create a Jira ticket and
> > feel
> > > > free
> > > > > > to
> > > > > > > > ping me in it.
> > > > > > > >
> > > > > > > > Thanks to all that have been involved with testing and
> > reviewing!
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > >
> > > > > > > > Martijn
> > > > > > > >
> > > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-22922
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > https://twitter.com/snntrable
> > > > > > https://github.com/knaufk
> > > > > >
> > > >
> > >
> >
>


Re: [ANNOUNCE] New Apache Flink Committer - Anton Kalashnikov

2023-02-20 Thread Weihua Hu
Congratulations, Anton!

Best,
Weihua


On Tue, Feb 21, 2023 at 11:22 AM weijie guo 
wrote:

> Congratulations, Anton!
>
> Best regards,
>
> Weijie
>
>
> Leonard Xu  于2023年2月21日周二 11:02写道:
>
> > Congratulations, Anton!
> >
> > Best,
> > Leonard
> >
> >
> > > On Feb 21, 2023, at 10:02 AM, Rui Fan  wrote:
> > >
> > > Congratulations, Anton!
> > >
> > > Best,
> > > Rui Fan
> > >
> > > On Tue, Feb 21, 2023 at 9:23 AM yuxia 
> > wrote:
> > >
> > >> Congrats Anton!
> > >>
> > >> Best regards,
> > >> Yuxia
> > >>
> > >> - 原始邮件 -
> > >> 发件人: "Matthias Pohl" 
> > >> 收件人: "dev" 
> > >> 发送时间: 星期二, 2023年 2 月 21日 上午 12:52:40
> > >> 主题: Re: [ANNOUNCE] New Apache Flink Committer - Anton Kalashnikov
> > >>
> > >> Congratulations, Anton! :-)
> > >>
> > >> On Mon, Feb 20, 2023 at 5:09 PM Jing Ge 
> > >> wrote:
> > >>
> > >>> Congrats Anton!
> > >>>
> > >>> On Mon, Feb 20, 2023 at 5:02 PM Samrat Deb 
> > >> wrote:
> > >>>
> >  congratulations Anton!
> > 
> >  Bests,
> >  Samrat
> > 
> >  On Mon, 20 Feb 2023 at 9:29 PM, John Roesler 
> > >>> wrote:
> > 
> > > Congratulations, Anton!
> > > -John
> > >
> > > On Mon, Feb 20, 2023, at 08:18, Piotr Nowojski wrote:
> > >> Hi, everyone
> > >>
> > >> On behalf of the PMC, I'm very happy to announce Anton Kalashnikov
> > >>> as a
> > > new
> > >> Flink Committer.
> > >>
> > >> Anton has been very active for almost two years already, authored
> > >> and
> > >> reviewed many PRs over this time. He is active in the Flink's
> > >>> runtime,
> > >> being the main author of improvements like Buffer Debloating
> > >>> (FLIP-183)
> > >> [1], solved many bugs and fixed many test instabilities, generally
> > > speaking
> > >> helping with the maintenance of runtime components.
> > >>
> > >> Please join me in congratulating Anton Kalashnikov for becoming a
> > >>> Flink
> > >> Committer!
> > >>
> > >> Best,
> > >> Piotr Nowojski (on behalf of the Flink PMC)
> > >>
> > >> [1]
> > >>
> > >
> > 
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-183%3A+Dynamic+buffer+size+adjustment
> > >
> > 
> > >>>
> > >>
> >
> >
>


Re: [ANNOUNCE] New Apache Flink Committer - Rui Fan

2023-02-20 Thread Weihua Hu
Congratulations, Rui!

Best,
Weihua


On Tue, Feb 21, 2023 at 11:28 AM Biao Geng  wrote:

> Congrats, Rui!
> Best,
> Biao Geng
>
> weijie guo  于2023年2月21日周二 11:21写道:
>
> > Congrats, Rui!
> >
> > Best regards,
> >
> > Weijie
> >
> >
> > Leonard Xu  于2023年2月21日周二 11:03写道:
> >
> > > Congratulations, Rui!
> > >
> > > Best,
> > > Leonard
> > >
> > > > On Feb 21, 2023, at 9:50 AM, Matt Wang  wrote:
> > > >
> > > > Congrats Rui
> > > >
> > > >
> > > > --
> > > >
> > > > Best,
> > > > Matt Wang
> > > >
> > > >
> > > >  Replied Message 
> > > > | From | yuxia |
> > > > | Date | 02/21/2023 09:22 |
> > > > | To | dev |
> > > > | Subject | Re: [ANNOUNCE] New Apache Flink Committer - Rui Fan |
> > > > Congrats Rui
> > > >
> > > > Best regards,
> > > > Yuxia
> > > >
> > > > - 原始邮件 -
> > > > 发件人: "Samrat Deb" 
> > > > 收件人: "dev" 
> > > > 发送时间: 星期二, 2023年 2 月 21日 上午 1:09:25
> > > > 主题: Re: [ANNOUNCE] New Apache Flink Committer - Rui Fan
> > > >
> > > > Congrats Rui
> > > >
> > > > On Mon, 20 Feb 2023 at 10:28 PM, Anton Kalashnikov <
> kaa@yandex.com
> > >
> > > > wrote:
> > > >
> > > > Congrats Rui!
> > > >
> > > > --
> > > > Best regards,
> > > > Anton Kalashnikov
> > > >
> > > > On 20.02.23 17:53, Matthias Pohl wrote:
> > > > Congratulations, Rui :)
> > > >
> > > > On Mon, Feb 20, 2023 at 5:10 PM Jing Ge 
> > > > wrote:
> > > >
> > > > Congrats Rui!
> > > >
> > > > On Mon, Feb 20, 2023 at 3:19 PM Piotr Nowojski  >
> > > > wrote:
> > > >
> > > > Hi, everyone
> > > >
> > > > On behalf of the PMC, I'm very happy to announce Rui Fan as a new
> Flink
> > > > Committer.
> > > >
> > > > Rui Fan has been active on a small scale since August 2019, and
> ramped
> > > > up
> > > > his contributions in the 2nd half of 2021. He was mostly involved in
> > > > quite
> > > > demanding performance related work around the network stack and
> > > > checkpointing, like re-using TCP connections [1], and many crucial
> > > > improvements to the unaligned checkpoints. Among others: FLIP-227:
> > > > Support
> > > > overdraft buffer [2], Merge small ChannelState file for Unaligned
> > > > Checkpoint [3], Timeout aligned to unaligned checkpoint barrier in
> the
> > > > output buffers [4].
> > > >
> > > > Please join me in congratulating Rui Fan for becoming a Flink
> > > > Committer!
> > > >
> > > > Best,
> > > > Piotr Nowojski (on behalf of the Flink PMC)
> > > >
> > > > [1] https://issues.apache.org/jira/browse/FLINK-22643
> > > > [2]
> > > >
> > > >
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-227%3A+Support+overdraft+buffer
> > > > [3] https://issues.apache.org/jira/browse/FLINK-26803
> > > > [4] https://issues.apache.org/jira/browse/FLINK-27251
> > > >
> > > >
> > >
> > >
> >
>


Re: [DISCUSS] Flink minor version support policy for old releases

2023-02-19 Thread Weihua Hu
+1

Thanks for the proposal, this is valuable for stability.

Best,
Weihua


On Mon, Feb 20, 2023 at 10:52 AM Dong Lin  wrote:

> This makes a lot of sense. Thanks Danny for the proposal!
>
> +1
>
> On Sat, Feb 18, 2023 at 12:52 AM Danny Cranmer 
> wrote:
>
> > Hello all,
> >
> > As proposed by Matthias in a separate thread [1], I would like to start a
> > discussion on changing the policy wording to include the release of bug
> > fixes during their support window. Our current policy [2] is to only
> > support the latest two minor versions: " If 1.17.x is the current
> release,
> > 1.16.y is the previous minor supported release. Both versions will
> receive
> > bugfixes for critical issues.". However there may be bug fixes that have
> > been resolved but not released during their support window. Consider this
> > example:
> > 1. Current Flink versions are 1.15.3 and 1.16.1
> > 2. We fix bugs for 1.15.3
> > 3. 1.17.0 is released
> > 4. The 1.15 bug fixes will now not be released unless we get an exception
> >
> > The current process is subject to race conditions between releases.
> Should
> > we upgrade the policy to allow bugfix releases to support issues that
> were
> > resolved during their support window. I propose we update the policy to
> > include:
> >
> > "Upon release of a new Flink minor version, the community will perform
> one
> > final bugfix release for resolved critical/blocker issues in the Flink
> > version losing support."
> >
> > Let's discuss.
> >
> > Thanks,
> > Danny
> >
> > [1] https://lists.apache.org/thread/019wsqqtjt6h0cb81781ogzldbjq0v48
> > [2]
> https://flink.apache.org/downloads.html#update-policy-for-old-releases
> >
>


Re: [ANNOUNCE] New Apache Flink PMC Member - Dong Lin

2023-02-15 Thread Weihua Hu
Congratulations Dong!

Best,
Weihua


On Thu, Feb 16, 2023 at 3:24 PM yuxia  wrote:

> Congratulations, Dong!
>
> Best regards,
> Yuxia
>
> - 原始邮件 -
> 发件人: "Junrui Lee" 
> 收件人: "dev" 
> 发送时间: 星期四, 2023年 2 月 16日 下午 2:40:39
> 主题: Re: [ANNOUNCE] New Apache Flink PMC Member - Dong Lin
>
> Congratulations, Dong!
>
> Best,
> Junrui
>
> Guowei Ma  于2023年2月16日周四 14:21写道:
>
> > Hi, everyone
> >
> > On behalf of the PMC, I'm very happy to announce Dong Lin as a new
> > Flink PMC.
> >
> > Dong is currently the main driver of Flink ML. He reviewed a large
> > number of Flink ML related PRs and also participated in many Flink ML
> > improvements, such as "FLIP-173","FLIP-174" etc. At the same time, he
> made
> > a lot of evangelism events contributions for the Flink ML ecosystem.
> > In fact, in addition to the Flink machine learning field, Dong has
> also
> > participated in many other improvements in Flink, such as "FLIP-205",
> > "FLIP-266","FLIP-269","FLIP-274" etc.
> > Please join me in congratulating Dong Lin for becoming a Flink PMC!
> >
> > Best,
> > Guowei(on behalf of the Flink PMC)
> >
>


Re: [ANNOUNCE] New Apache Flink Committer - Jing Ge

2023-02-15 Thread Weihua Hu
Congratulations, Jing!

Best,
Weihua


On Thu, Feb 16, 2023 at 2:03 PM Dian Fu  wrote:

> Congratulations Jing!
>
> Regards,
> Dian
>
> On Wed, Feb 15, 2023 at 8:04 PM Yun Tang  wrote:
>
> > Congratulations, Jing!
> >
> > Best
> > Yun Tang
> > 
> > From: Dong Lin 
> > Sent: Wednesday, February 15, 2023 19:40
> > To: dev@flink.apache.org 
> > Subject: Re: [ANNOUNCE] New Apache Flink Committer - Jing Ge
> >
> > Congratulations Jing!
> >
> > On Wed, Feb 15, 2023 at 12:48 PM Shiwei Wang 
> wrote:
> >
> > > Congratulations Jing!
> > >
> > >
> > >
> > > > 在 2023年2月14日,16:00,weijie guo  写道:
> > > >
> > > > Congratulations Jing!
> > >
> >
>


Re: [ANNOUNCE] New Apache Flink Committer - Weijie Guo

2023-02-12 Thread Weihua Hu
Congratulations, Weijie!

Best,
Weihua

> On Feb 13, 2023, at 11:55, Lijie Wang  wrote:
> 
> Congratulations, Weijie!



Re: Supplying jar stored at S3 to flink to run the job in kubernetes

2023-01-16 Thread Weihua Hu
Hi, Rahul

User support and questions should be sent to the user mailing list (
u...@flink.apache.org)

You can resend the issue to the user mailing list with a detailed error log.

Best,
Weihua


On Mon, Jan 16, 2023 at 11:18 PM rahul sahoo  wrote:

> I have been following the examples mentioned here:
> flink-kubernetes-operator_examples
> .
> I'm testing this on the local minikube. I have deployed minio for s3 and
> flink operator.
>
> I have my application jar in s3(using minio for this). I have deployed the
> flink session deployment in minikube and want to submit the job as
> mentioned in basic-session-deployment-and-job.yaml
> <
> https://github.com/apache/flink-kubernetes-operator/blob/main/examples/basic-session-deployment-and-job.yaml
> >
>
> I want to replace the `https://` to `s3a://` in this line
> <
> https://github.com/apache/flink-kubernetes-operator/blob/92034fa912f39f5c8bd57632295c7ca85801f33a/examples/basic-session-deployment-and-job.yaml#L43
> >.
> The final URL should look like
> `s3a://local-bkt/flink-examples-streaming_2.12-1.16.0.jar`. I'm using
> flink_2.12-1.16.0 with s3 plugin in docker image.
>
> Can anyone help me to solve this?
>
> Thank You,
> Rahul Sahoo
>


[jira] [Created] (FLINK-30525) Cannot open jobmanager configuration web page

2022-12-28 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-30525:
-

 Summary: Cannot open jobmanager configuration web page
 Key: FLINK-30525
 URL: https://issues.apache.org/jira/browse/FLINK-30525
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Web Frontend
Affects Versions: 1.17.0
Reporter: Weihua Hu
 Attachments: image-2022-12-28-20-37-00-825.png, 
image-2022-12-28-20-37-05-551.png

we remove the environments in rest api in 
https://issues.apache.org/jira/browse/FLINK-30116.
The jobmanager.configuration web page will throw "TypeError: Cannot read 
properties of undefined (reading 'length')" 

the environment in jobmanager.configuration web page should be delete too.
 !image-2022-12-28-20-37-00-825.png! 
 !image-2022-12-28-20-37-05-551.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30517) Decrease log output interval while waiting for YARN JobManager be allocated

2022-12-28 Thread Weihua Hu (Jira)
Weihua Hu created FLINK-30517:
-

 Summary: Decrease log output interval while waiting for YARN 
JobManager be allocated
 Key: FLINK-30517
 URL: https://issues.apache.org/jira/browse/FLINK-30517
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / YARN
Affects Versions: 1.16.0
Reporter: Weihua Hu
 Attachments: image-2022-12-28-15-56-56-045.png

Flink Client will retrieve the application status every 250ms after submitting 
to YARN. 
If JobManager does not start in 60 seconds, it will log "Deployment took more 
than 60 seconds. Please check if the requested resources are available in the 
YARN cluster" every 250ms. This will lead to too many logs. 

We can keep the check interval at 250ms, but log the message every 1 minute.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-266: Simplify network memory configurations for TaskManager

2022-12-26 Thread Weihua Hu
Hi Yuxin,
Thanks for the proposal.

"Insufficient number of network buffers" exceptions also bother us. It's
too hard for users to figure out
how much network buffer they really need. It relates to partitioner type,
parallelism, slots per taskmanager.

Since streaming jobs are our primary scenario, I have some questions about
streaming jobs.

1. In this FLIP, all read buffers will use floating buffers when the total
buffer is more than
'taskmanager.memory.network.read-required-buffer.max'. Competition in
buffer allocation led to preference regression.
How about reducing ExclusiveBuffersPerChannel to 1 first when the total
buffer is not enough?
Will this reduce performance regression in streaming?

2. Changing taskmanager.memory.network.max will affect user migration from
the lower version.
IMO, network buffer size should not increase with total memory, especially
for streaming jobs with application mode.
For example, some ETL jobs with rescale partitioner only require a few
network buffers.
And we already have 'taskmanager.memory.network.read-required-buffer.max'
to control maximum read network buffer usage.
Do we really need to change the default value of
'taskmanager.memory.network.max'?

Best,
Weihua


On Mon, Dec 26, 2022 at 6:26 PM Yuxin Tan  wrote:

> Hi, all
> Thanks for the reply and feedback for everyone!
>
>
> After combining everyone's comments, the main concerns, and corresponding
> adjustments are as follows.
>
>
> @Guowei Ma, Thanks for your feedback.
> > should we introduce a _new_ non-orthogonal
> option(`taskmanager.memory.network.required-buffer-per-gate.max`). That is
> to say, the option will affect both streaming and batch shuffle behavior at
> the
> same time.
>
> 1. Because the default option can meet most requirements no matter in
> Streaming
> or Batch scenarios. We do not want users to adjust this default config
> option by
> design. This configuration option is added only to preserve the possibility
> of
> modification options for users.
> 2. In a few cases, if you really want to adjust this option, users may not
> expect to
> adjust the option according to Streaming or Batch, for example, according
> to the
> parallelism of the job.
> 3. Regarding the performance of streaming shuffle, the same problem of
> insufficient memory also exists for Streaming jobs. We introduced this
> configuration
> to enable users to decouple memory and parallelism, but it will affect some
> performance. By default, the feature is disabled and does not affect
> performance.
> However, the added configuration enables users to choose to decouple memory
> usage and parallelism for Streaming jobs.
>
> > It's better not to expose more implementation-related concepts to users.
>
> Thanks for you suggestion. I will modify the option name to avoid exposing
> implementation-related concepts. I have changed it to
> `taskmanager.memory.network.read-required-buffer.max` in the FLIP.
>
>
>
> @Dong Lin, Thanks for your reply.
> >  it might be helpful to add a dedicated public interface section to
> describe
> the config key and config semantics.
>
> Thanks for your suggestion. I have added public interface section to
> describe
> the config key and config semantics clearly.
>
> > This FLIP seems to add more configs without removing any config from
> Flink.
>
> This Flip is to reduce the number of options to be adjusted when using
> Flink.
> After the Flip, the default option can meet the requirements in most
> sceneries
> rather than modifying any config
> options(`taskmanager.network.memory.buffers-per-channel`
> and `taskmanager.network.memory.floating-buffers-per-gate`), which is
> helpful
> to improve the out-of-box usability. In the long run, these two parameters
> `taskmanager.network.memory.buffers-per-channel` and
> `taskmanager.network.memory.floating-buffers-per-gate` may indeed be
> deprecated
> to reduce user parameters, but from the perspective of compatibility, we
> need to
> pay attention to users' feedback before deciding to deprecate the options.
>
>
>
> @Yanfei Lei,Thanks for your feedback.
> 1. Through the option is cluster level, the default value is different
> according to the
> job type. In other words, by default, for Batch jobs, the config value is
> enabled, 1000.
> And for Streaming jobs, the config value is not enabled by default.
>
> 2. I think this is a good point. The total floating buffers will not change
> with
> ExclusiveBuffersPerChannel(taskmanager.network.memory.buffers-per-channel)
> because this is the maximum memory threshold. But if the user explicitly
> specified
> the ExclusiveBuffersPerChannel, the calculated result of
> ExclusiveBuffersPerChannel * numChannels will change with it.
>
>
> Thanks again for all feedback!
>
>
> Best,
> Yuxin
>
>
> Zhu Zhu  于2022年12月26日周一 17:18写道:
>
> > Hi Yuxin,
> >
> > Thanks for creating this FLIP.
> >
> > It's good if Flink does not require users to set a very large network
> > memory, or tune the advanced(hard-to-understand) per-channel/

  1   2   >