Re: [DISCUSS] Enhance Support for Multicast Communication Pattern

2019-08-25 Thread Guowei Ma
Thanks Yun for bringing up this discussion, and many thanks for all the deep
thoughts!

For now, I think this discussion covers two scenarios: one is for
iteration library support and the other is for SQL join support. I think
both scenarios are useful, but they seem to have different best-suited
solutions. To make the discussion clearer, I would suggest splitting it
into two threads.

And I agree with Piotr that it is very tricky for a keyed stream to receive
a "broadcast element". So we may add some new interfaces that could
broadcast or process a special "broadcast event". In that way, the
"broadcast event" would not be sent along the normal record-processing path.
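
The idea can be sketched in plain Java. This toy is illustrative only — it is
not Flink's actual ChannelSelector API, and all class and method names here
are assumptions — but it shows the intended routing rule: normal records keep
the keyed path, while a special "broadcast event" fans out to every downstream
channel, bypassing the keyed partitioner.

```java
import java.util.Arrays;

// Hypothetical sketch (not Flink's real API): route normal records by key
// hash, but fan a special "broadcast event" out to every downstream channel.
class BroadcastAwareSelector {

    /** Marker for elements that must reach every parallel subtask. */
    interface BroadcastEvent {}

    private final int numChannels;

    BroadcastAwareSelector(int numChannels) {
        this.numChannels = numChannels;
    }

    int[] selectChannels(Object element, Object key) {
        if (element instanceof BroadcastEvent) {
            // Bypass keying: deliver the event to all downstream channels.
            int[] all = new int[numChannels];
            for (int i = 0; i < numChannels; i++) {
                all[i] = i;
            }
            return all;
        }
        // Normal keyed routing: hash the key to exactly one channel.
        return new int[] {Math.floorMod(key.hashCode(), numChannels)};
    }
}

class BroadcastSelectorDemo {
    public static void main(String[] args) {
        BroadcastAwareSelector selector = new BroadcastAwareSelector(4);
        // A normal record goes to a single key-determined channel.
        System.out.println(Arrays.toString(
                selector.selectChannels("normal-record", "user-42")));
        // A broadcast event reaches all channels: prints [0, 1, 2, 3]
        System.out.println(Arrays.toString(selector.selectChannels(
                new BroadcastAwareSelector.BroadcastEvent() {}, null)));
    }
}
```

In Flink itself this would instead hook into the stream-element /
record-writer layer, so that such events never enter the keyed partitioning
logic at all.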

Best,
Guowei


SHI Xiaogang  于2019年8月26日周一 上午9:27写道:

> Hi all,
>
> I also think that multicasting is a necessity in Flink, but more details
> need to be considered.
>
> Currently, the network is tightly coupled with states in Flink to achieve
> automatic scaling. We can only access keyed states in keyed streams and
> operator states in all streams.
> In the concrete example of theta-joins implemented with multicasting, the
> following questions exist:
>
>- In which type of states will the data be stored? Do we need another
>type of states which is coupled with multicasting streams?
>- How to ensure the consistency between network and states when jobs
>scale out or scale in?
>
> Regards,
> Xiaogang
>
> Xingcan Cui  于2019年8月25日周日 上午10:03写道:
>
> > Hi all,
> >
> > Sorry for joining this thread late. Basically, I think enabling the
> > multicast pattern could be the right direction, but more detailed
> > implementation policies need to be discussed.
> >
> > Two years ago, I filed an issue [1] about the multicast API. However, for
> > various reasons, it was laid aside. After that, when I tried to
> > cherry-pick the change for experimental use, I found the return type of
> > the `selectChannels()` method had changed from `int[]` to `int`, which
> > made the old implementation stop working.
> >
> > From my side, multicast has always been used for theta-joins. As far as
> > I know, it’s an essential requirement for some sophisticated join
> > algorithms. Until now, Flink's non-equi joins can still only be executed
> > single-threaded. If we'd like to make some improvements on this, we
> > should first take some measures to support the multicast pattern.
> >
> > Best,
> > Xingcan
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-6936
> >
> > > On Aug 24, 2019, at 5:54 AM, Zhu Zhu  wrote:
> > >
> > > Hi Piotr,
> > >
> > > Thanks for the explanation.
> > > Agreed that the broadcastEmit(record) is a better choice for
> > > broadcasting for the iterations.
> > > As broadcasting for the iterations is the first motivation, let's
> > > support it first.
> > >
> > > Thanks,
> > > Zhu Zhu
> > >
> > > Yun Gao  于2019年8月23日周五 下午11:56写道:
> > >
> > >> Hi Piotr,
> > >>
> > >>  Many thanks for the suggestions!
> > >>
> > >> Totally agree that we could first focus on the broadcast
> > >> scenarios and expose the broadcastEmit method first, considering the
> > >> semantics and performance.
> > >>
> > >> For the keyed stream, I also agree that broadcasting keyed
> > >> records to all the tasks may be confusing considering the semantics of
> > >> the keyed partitioner. However, in the iteration case, supporting
> > >> broadcast over a keyed partitioner should be required, since users may
> > >> create any subgraph for the iteration body, including operators with
> > >> keys. I think a possible solution to this issue is to introduce another
> > >> data type for 'broadcastEmit'. For example, an operator Operator<T> may
> > >> broadcast-emit another type E instead of T, and the transmitted E will
> > >> bypass the partitioner and the setting of the keyed context. This would
> > >> lead to the design of introducing a customized operator event (option 1
> > >> in the document). The cost of this method is that we need to introduce
> > >> a new type of StreamElement and a new interface for this type, but it
> > >> should be suitable for both keyed and non-keyed partitioners.
> > >>
> > >> Best,
> > >> Yun
> > >>
> > >>
> > >>
> > >> --
> > >> From:Piotr Nowojski 
> > >> Send Time:2019 Aug. 23 (Fri.) 22:29
> > >> To:Zhu Zhu 
> > >> Cc:dev ; Yun Gao 
> > >> Subject:Re: [DISCUSS] Enhance Support for Multicast Communication
> > Pattern
> > >>
> > >> Hi,
> > >>
> > >> If the primary motivation is broadcasting (for the iterations) and we
> > >> have no immediate need for multicast (cross join), I would prefer to
> > >> first expose broadcast via the DataStream API and only later, once we
> > >> finally need it, support multicast. As I wrote, multicast would be more
> > >> challenging to implement, with a more complicated runtime and API. And
> > >> re-using multicast just to support broadcast doesn't make much sense.

Re: [ANNOUNCE] Zili Chen becomes a Flink committer

2019-09-11 Thread Guowei Ma
Congratulations Zili !

Best,
Guowei


Fabian Hueske  于2019年9月11日周三 下午7:02写道:

> Congrats Zili Chen :-)
>
> Cheers, Fabian
>
> Am Mi., 11. Sept. 2019 um 12:48 Uhr schrieb Biao Liu :
>
>> Congrats Zili!
>>
>> Thanks,
>> Biao /'bɪ.aʊ/
>>
>>
>>
>> On Wed, 11 Sep 2019 at 18:43, Oytun Tez  wrote:
>>
>>> Congratulations!
>>>
>>> ---
>>> Oytun Tez
>>>
>>> *M O T A W O R D*
>>> The World's Fastest Human Translation Platform.
>>> oy...@motaword.com — www.motaword.com
>>>
>>>
>>> On Wed, Sep 11, 2019 at 6:36 AM bupt_ljy  wrote:
>>>
 Congratulations!


 Best,

 Jiayi Liao

  Original Message
 *Sender:* Till Rohrmann
 *Recipient:* dev; user
 *Date:* Wednesday, Sep 11, 2019 17:22
 *Subject:* [ANNOUNCE] Zili Chen becomes a Flink committer

 Hi everyone,

 I'm very happy to announce that Zili Chen (some of you might also know
 him as Tison Kun) accepted the offer of the Flink PMC to become a committer
 of the Flink project.

 Zili Chen has been an active community member for almost 16 months
 now. He helped push the FLIP-6 effort over the finish line, ported a lot
 of legacy code tests, removed a good part of the legacy code, contributed
 numerous fixes, is involved in Flink's client API refactoring, drives
 the refactoring of Flink's HighAvailabilityServices and much more. Zili
 Chen also helped the community through PR reviews, reporting Flink issues,
 answering user mails and being very active on the dev mailing list.

 Congratulations Zili Chen!

 Best, Till
 (on behalf of the Flink PMC)

>>>


Re: [ANNOUNCE] Becket Qin joins the Flink PMC

2019-10-31 Thread Guowei Ma
Congratulations, Becket!
Best,
Guowei


Steven Wu  于2019年11月1日周五 上午6:20写道:

> Congratulations, Becket!
>
> On Wed, Oct 30, 2019 at 9:51 PM Shaoxuan Wang  wrote:
>
> > Congratulations, Becket!
> >
> > On Mon, Oct 28, 2019 at 6:08 PM Fabian Hueske  wrote:
> >
> > > Hi everyone,
> > >
> > > I'm happy to announce that Becket Qin has joined the Flink PMC.
> > > Let's congratulate and welcome Becket as a new member of the Flink PMC!
> > >
> > > Cheers,
> > > Fabian
> > >
> >
>


Re: [VOTE] FLIP-309: Support using larger checkpointing interval when source is processing backlog

2023-07-18 Thread Guowei Ma
+1(binding)
Best,
Guowei


On Wed, Jul 19, 2023 at 11:18 AM Hang Ruan  wrote:

> +1 (non-binding)
>
> Thanks for driving.
>
> Best,
> Hang
>
> Leonard Xu  于2023年7月19日周三 10:42写道:
>
> > Thanks Dong for the continuous work.
> >
> > +1(binding)
> >
> > Best,
> > Leonard
> >
> > > On Jul 18, 2023, at 10:16 PM, Jingsong Li 
> > wrote:
> > >
> > > +1 binding
> > >
> > > Thanks Dong for continuous driving.
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Tue, Jul 18, 2023 at 10:04 PM Jark Wu  wrote:
> > >>
> > >> +1 (binding)
> > >>
> > >> Best,
> > >> Jark
> > >>
> > >> On Tue, 18 Jul 2023 at 20:30, Piotr Nowojski 
> > wrote:
> > >>
> > >>> +1 (binding)
> > >>>
> > >>> Piotrek
> > >>>
> > >>> wt., 18 lip 2023 o 08:51 Jing Ge 
> > napisał(a):
> > >>>
> >  +1(binding)
> > 
> >  Best regards,
> >  Jing
> > 
> >  On Tue, Jul 18, 2023 at 8:31 AM Rui Fan <1996fan...@gmail.com>
> wrote:
> > 
> > > +1(binding)
> > >
> > > Best,
> > > Rui Fan
> > >
> > >
> > > On Tue, Jul 18, 2023 at 12:04 PM Dong Lin 
> > wrote:
> > >
> > >> Hi all,
> > >>
> > >> We would like to start the vote for FLIP-309: Support using larger
> > >> checkpointing interval when source is processing backlog [1]. This
> > >>> FLIP
> > > was
> > >> discussed in this thread [2].
> > >>
> > >> The vote will be open until at least July 21st (at least 72
> hours),
> > >> following
> > >> the consensus voting process.
> > >>
> > >> Cheers,
> > >> Yunfeng and Dong
> > >>
> > >> [1]
> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-309%3A+Support+using+larger+checkpointing+interval+when+source+is+processing+backlog
> > >> [2]
> > https://lists.apache.org/thread/l1l7f30h7zldjp6ow97y70dcthx7tl37
> > >>
> > >
> > 
> > >>>
> >
> >
>


Re: [DISCUSS] FLIP-383: Support Job Recovery for Batch Jobs

2023-11-18 Thread Guowei Ma
Hi,

This is a very good proposal; as far as I know, it can address some very
critical production scenarios. I have two minor questions:

1. As far as I know, there are multiple standby job managers in some
scenarios. In this case, is your design still effective? I'm unsure whether
you have conducted any tests. For instance, standby job managers might take
over these failed jobs more quickly.

2. Regarding the operator coordinator, how can you ensure that the
checkpoint mechanism can restore the state of the operator coordinator?
For example, how do you rule out that there might still be some state in
the memory of the original operator coordinator? After all, the
implementation was done under the assumption that the job manager doesn't
fail. Additionally, using NO_CHECKPOINT seems a bit odd. Why not use a
normal checkpoint ID greater than 0 and record it in the event store?

If the issues raised in point 2 cannot be resolved in the short term, would
it be possible to consider not supporting failover with a source job
manager?
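
To make the suggestion in the second question concrete, here is a toy sketch
of an event store that records operator-coordinator snapshots under ordinary
positive checkpoint IDs instead of a NO_CHECKPOINT sentinel. All names here
are made up for illustration; FLIP-383's actual design may differ.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy event store: coordinator snapshots are keyed by normal checkpoint IDs
// (> 0), so recovery can resolve "latest snapshot at or before checkpoint N"
// without needing a special NO_CHECKPOINT marker.
class CoordinatorEventStore {

    private final NavigableMap<Long, byte[]> snapshots = new TreeMap<>();

    void record(long checkpointId, byte[] coordinatorState) {
        if (checkpointId <= 0) {
            throw new IllegalArgumentException("checkpoint id must be > 0");
        }
        snapshots.put(checkpointId, coordinatorState);
    }

    /** Newest snapshot with id <= checkpointId, or null if none exists. */
    byte[] restoreUpTo(long checkpointId) {
        Map.Entry<Long, byte[]> entry = snapshots.floorEntry(checkpointId);
        return entry == null ? null : entry.getValue();
    }
}
```

With a scheme like this, restoring after a JM crash is just a floor lookup on
the last completed checkpoint ID, with no ambiguity about what a sentinel ID
refers to.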

Best,
Guowei


On Thu, Nov 2, 2023 at 6:01 PM Lijie Wang  wrote:

> Hi devs,
>
> Zhu Zhu and I would like to start a discussion about FLIP-383: Support Job
> Recovery for Batch Jobs[1]
>
> Currently, when Flink’s job manager crashes or gets killed, possibly due to
> unexpected errors or planned nodes decommission, it will cause the
> following two situations:
> 1. Failed, if the job does not enable HA.
> 2. Restart, if the job enable HA. If it’s a streaming job, the job will be
> resumed from the last successful checkpoint. If it’s a batch job, it has to
> run from beginning, all previous progress will be lost.
>
> In view of this, we think the JM crash may cause great regression for batch
> jobs, especially long running batch jobs. This FLIP is mainly to solve this
> problem so that batch jobs can recover most job progress after JM crashes.
> In this FLIP, our goal is to let most finished tasks not need to be re-run.
>
> You can find more details in the FLIP-383[1]. Looking forward to your
> feedback.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-383%3A+Support+Job+Recovery+for+Batch+Jobs
>
> Best,
> Lijie
>


Re: [ANNOUNCE] Apache Flink 1.8.0 released

2019-04-10 Thread Guowei Ma
Congratulations!

Thanks Aljoscha and all contributors!

Best,
Guowei


Jark Wu  于2019年4月10日周三 下午5:47写道:

> Cheers!
>
> Thanks Aljoscha and all others who make 1.8.0 possible.
>
> On Wed, 10 Apr 2019 at 17:33, vino yang  wrote:
>
> > Great news!
> >
> > Thanks Aljoscha for being the release manager and thanks to all the
> > contributors!
> >
> > Best,
> > Vino
> >
> > Driesprong, Fokko  于2019年4月10日周三 下午4:54写道:
> >
> >> Great news! Great effort by the community to make this happen. Thanks
> all!
> >>
> >> Cheers, Fokko
> >>
> >> Op wo 10 apr. 2019 om 10:50 schreef Shaoxuan Wang  >:
> >>
> >> > Thanks Aljoscha and all others who made contributions to FLINK 1.8.0.
> >> > Looking forward to FLINK 1.9.0.
> >> >
> >> > Regards,
> >> > Shaoxuan
> >> >
> >> > On Wed, Apr 10, 2019 at 4:31 PM Aljoscha Krettek  >
> >> > wrote:
> >> >
> >> > > The Apache Flink community is very happy to announce the release of
> >> > Apache
> >> > > Flink 1.8.0, which is the next major release.
> >> > >
> >> > > Apache Flink® is an open-source stream processing framework for
> >> > > distributed, high-performing, always-available, and accurate data
> >> > streaming
> >> > > applications.
> >> > >
> >> > > The release is available for download at:
> >> > > https://flink.apache.org/downloads.html
> >> > >
> >> > > Please check out the release blog post for an overview of the
> >> > improvements
> >> > > for this release:
> >> > > https://flink.apache.org/news/2019/04/09/release-1.8.0.html
> >> > >
> >> > > The full release notes are available in Jira:
> >> > >
> >> > >
> >> >
> >>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344274
> >> > >
> >> > > We would like to thank all contributors of the Apache Flink
> community
> >> who
> >> > > made this release possible!
> >> > >
> >> > > Regards,
> >> > > Aljoscha
> >> >
> >>
> >
>


Re: [PROGRESS-UPDATE] Redesign Flink Scheduling, introducing dedicated Scheduler Component

2019-04-14 Thread Guowei Ma
Thanks Gary for sharing the documents with the community.
The idea makes the scheduler more flexible.

Best,
Guowei


Till Rohrmann  于2019年4月13日周六 上午12:21写道:

> Thanks for sharing the current state of the scheduler refactorings with the
> community Gary. The proposed changes look good to me and, hence, +1 for
> proceeding with bringing the interfaces into place.
>
> Cheers,
> Till
>
> On Fri, Apr 12, 2019 at 5:53 PM Gary Yao  wrote:
>
> > Hi all,
> >
> > As you might have seen already, we are currently reworking Flink's
> > scheduling.
> > At the moment, scheduling is a concern that is scattered across different
> > components, such as ExecutionGraph, Execution and SlotPool. Scheduling
> > also happens only at the granularity of individual tasks, which makes holistic
> > scheduling strategies hard to implement. For more details on our
> motivation
> > see [1]. To track the progress, we have created the umbrella issue
> > FLINK-10429
> > [2] (bear in mind that the current sub-tasks are still subject to
> change).
> >
> > We are currently in the process of finalizing the scheduler interfaces.
> Our
> > current state can be found in [3]. Feel free to review and comment on our
> > design proposal.
> >
> > Best,
> > Gary
> >
> > [1]
> >
> >
> https://docs.google.com/document/d/1q7NOqt05HIN-PlKEEPB36JiuU1Iu9fnxxVGJzylhsxU/edit
> > [2] https://issues.apache.org/jira/browse/FLINK-10429
> > [3]
> >
> >
> https://docs.google.com/document/d/1fstkML72YBO1tGD_dmG2rwvd9bklhRVauh4FSsDDwXU/edit?usp=sharing
> >
>


Re: [ANNOUNCE] Jincheng Sun is now part of the Flink PMC

2019-06-25 Thread Guowei Ma
Congratulations, Jincheng !

Best,
Guowei


zhisheng <173855...@qq.com> 于2019年6月25日周二 下午6:32写道:

> Congratulations, Jincheng
>
>
>
> ---Original---
> From: "Fan Liya"
> Date: Tue, Jun 25, 2019 18:23 PM
> To: "dev";
> Subject: Re: [ANNOUNCE] Jincheng Sun is now part of the Flink PMC
>
>
> Congratulations!
>
> Best,
> Liya Fan
>
> On Tue, Jun 25, 2019 at 6:21 PM Shaoxuan Wang  wrote:
>
> > Congratulations, Jincheng!
> >
> > On Tue, Jun 25, 2019 at 5:54 PM Fabian Hueske  wrote:
> >
> > > Congrats Jincheng!
> > >
> > > Am Di., 25. Juni 2019 um 11:48 Uhr schrieb Aljoscha Krettek <
> > > aljos...@apache.org>:
> > >
> > > > Congratulations! :-)
> > > >
> > > > > On 25. Jun 2019, at 11:34, Wei Zhong 
> wrote:
> > > > >
> > > > > Congratulations Jincheng!
> > > > >
> > > > > Best,
> > > > > Wei
> > > > >
> > > > >
> > > > >> 在 2019年6月25日,15:18,JingsongLee 
> > 写道:
> > > > >>
> > > > >> Jincheng, Congratulations!
> > > > >>
> > > > >> Best, JingsongLee
> > > > >>
> > > > >>
> > > > >> --
> > > > >> From:Dawid Wysakowicz 
> > > > >> Send Time:2019年6月25日(星期二) 14:21
> > > > >> To:dev 
> > > > >> Subject:Re: [ANNOUNCE] Jincheng Sun is now part of the Flink PMC
> > > > >>
> > > > >> Congratulations!
> > > > >>
> > > > >> On 25/06/2019 08:06, Xingcan Cui wrote:
> > > > >>> Congratulations Jincheng and thanks for all you’ve done!
> > > > >>>
> > > > >>> Cheers,
> > > > >>> Xingcan
> > > > >>>
> > > >  On Jun 25, 2019, at 1:59 AM, Tzu-Li (Gordon) Tai <
> > > tzuli...@apache.org>
> > > > wrote:
> > > > 
> > > >  Congratulations Jincheng, great to have you on board :)
> > > > 
> > > >  Cheers,
> > > >  Gordon
> > > > 
> > > >  On Tue, Jun 25, 2019, 11:31 AM Terry Wang 
> > > wrote:
> > > > 
> > > > > Congratulations Jincheng!
> > > > >
> > > > >> 在 2019年6月24日,下午11:08,Robert Metzger  写道:
> > > > >>
> > > > >> Hi all,
> > > > >>
> > > > >> On behalf of the Flink PMC, I'm happy to announce that
> Jincheng
> > > Sun
> > > > is
> > > > > now
> > > > >> part of the Apache Flink Project Management Committee (PMC).
> > > > >>
> > > > >> Jincheng has been a committer since July 2017. He has been
> very
> > > > active on
> > > > >> Flink's Table API / SQL component, as well as helping with
> > > releases.
> > > > >>
> > > > >> Congratulations & Welcome Jincheng!
> > > > >>
> > > > >> Best,
> > > > >> Robert
> > > > >
> > > > >>
> > > > >
> > > >
> > > >
> > >
> >


Re: [ANNOUNCE] Jiangjie (Becket) Qin has been added as a committer to the Flink project

2019-07-18 Thread Guowei Ma
Congrats Becket!

Best,
Guowei


Terry Wang  于2019年7月18日周四 下午5:17写道:

> Congratulations Becket!
>
> > 在 2019年7月18日,下午5:09,Dawid Wysakowicz  写道:
> >
> > Congratulations Becket! Good to have you onboard!
> >
> > On 18/07/2019 10:56, Till Rohrmann wrote:
> >> Congrats Becket!
> >>
> >> On Thu, Jul 18, 2019 at 10:52 AM Jeff Zhang  wrote:
> >>
> >>> Congratulations Becket!
> >>>
> >>> Xu Forward  于2019年7月18日周四 下午4:39写道:
> >>>
>  Congratulations Becket! Well deserved.
> 
> 
>  Cheers,
> 
>  forward
> 
>  Kurt Young  于2019年7月18日周四 下午4:20写道:
> 
> > Congrats Becket!
> >
> > Best,
> > Kurt
> >
> >
> > On Thu, Jul 18, 2019 at 4:12 PM JingsongLee  > .invalid>
> > wrote:
> >
> >> Congratulations Becket!
> >>
> >> Best, Jingsong Lee
> >>
> >>
> >> --
> >> From:Congxian Qiu 
> >> Send Time:2019年7月18日(星期四) 16:09
> >> To:dev@flink.apache.org 
> >> Subject:Re: [ANNOUNCE] Jiangjie (Becket) Qin has been added as a
> > committer
> >> to the Flink project
> >>
> >> Congratulations Becket! Well deserved.
> >>
> >> Best,
> >> Congxian
> >>
> >>
> >> Jark Wu  于2019年7月18日周四 下午4:03写道:
> >>
> >>> Congratulations Becket! Well deserved.
> >>>
> >>> Cheers,
> >>> Jark
> >>>
> >>> On Thu, 18 Jul 2019 at 15:56, Paul Lam 
>  wrote:
>  Congrats Becket!
> 
>  Best,
>  Paul Lam
> 
> > 在 2019年7月18日,15:41,Robert Metzger  写道:
> >
> > Hi all,
> >
> > I'm excited to announce that Jiangjie (Becket) Qin just became
> >>> a
> >> Flink
> > committer!
> >
> > Congratulations Becket!
> >
> > Best,
> > Robert (on behalf of the Flink PMC)
> 
> >>>
> >>> --
> >>> Best Regards
> >>>
> >>> Jeff Zhang
> >>>
> >
>
>


Re: [ANNOUNCE] Zhijiang Wang has been added as a committer to the Flink project

2019-07-22 Thread Guowei Ma
Congratulations Zhijiang

发自我的 iPhone

> 在 2019年7月23日,上午12:55,Xuefu Z  写道:
> 
> Congratulations, Zhijiang!
> 
>> On Mon, Jul 22, 2019 at 7:42 AM Bo WANG  wrote:
>> 
>> Congratulations Zhijiang!
>> 
>> 
>> Best,
>> 
>> Bo WANG
>> 
>> 
>> On Mon, Jul 22, 2019 at 10:12 PM Robert Metzger 
>> wrote:
>> 
>>> Hey all,
>>> 
>>> We've added another committer to the Flink project: Zhijiang Wang.
>>> 
>>> Congratulations Zhijiang!
>>> 
>>> Best,
>>> Robert
>>> (on behalf of the Flink PMC)
>>> 
>> 
> 
> 
> -- 
> Xuefu Zhang
> 
> "In Honey We Trust!"


Re: [ANNOUNCE] Kete Young is now part of the Flink PMC

2019-07-23 Thread Guowei Ma
Congratulations Kurt!

发自我的 iPhone

> 在 2019年7月23日,下午7:13,Bo WANG  写道:
> 
> Congratulations Kurt!
> 
> 
> Best,
> 
> Bo WANG
> 
> 
>> On Tue, Jul 23, 2019 at 5:24 PM Robert Metzger  wrote:
>> 
>> Hi all,
>> 
>> On behalf of the Flink PMC, I'm happy to announce that Kete Young is now
>> part of the Apache Flink Project Management Committee (PMC).
>> 
>> Kete has been a committer since February 2017, working a lot on Table API /
>> SQL. He's currently co-managing the 1.9 release! Thanks a lot for your work
>> for Flink!
>> 
>> Congratulations & Welcome Kurt!
>> 
>> Best,
>> Robert
>> 


Re: Fine Grained Recovery / FLIP-1

2019-07-25 Thread Guowei Ma
Hi,
1. Currently, much of the work in FLINK-4256 is about failover improvements
in the bounded dataset scenario.
2. For the streaming scenario, a new shuffle plugin plus a proper failover
strategy could avoid the "stop-the-world" recovery.
3. We have already done much work on the new shuffle in the old Flink
shuffle architecture, because many of our customers share this concern. We
plan to move that work to the new Flink pluggable shuffle architecture.
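
As a sketch of what point 3 means operationally: under the pluggable shuffle
architecture, a different shuffle implementation is selected through a single
configuration key in flink-conf.yaml. The factory class name below is a
hypothetical placeholder, not a real implementation:

```yaml
# flink-conf.yaml sketch: choose a custom shuffle implementation via the
# pluggable shuffle architecture (the default is Flink's netty-based factory).
# The class name here is an assumption for illustration only.
shuffle-service-factory.class: com.example.shuffle.CustomShuffleServiceFactory
```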

Best,
Guowei


Thomas Weise  于2019年7月26日周五 上午8:54写道:

> Hi,
>
> We are using Flink for streaming and find the "stop-the-world" recovery
> behavior of Flink prohibitive for use cases that prioritize availability.
> Partial recovery as outlined in FLIP-1 would probably alleviate these
> concerns.
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures
>
> Looking at the subtasks in
> https://issues.apache.org/jira/browse/FLINK-4256 it
> appears that much of the work was already done but not much recent
> progress? What is missing (for streaming)? How close is version 2 (recovery
> from limited intermediate results)?
>
> Thanks!
> Thomas
>


Re: [VOTE] Apache Flink Release 1.9.0, release candidate #2

2019-08-16 Thread Guowei Ma
Hi,
-1
We have a benchmark job, which includes a two-input operator.
This job shows a big performance regression on 1.9 compared to 1.8.
It's still not very clear why this regression happens.

Best,
Guowei


Yu Li  于2019年8月16日周五 下午3:27写道:

> +1 (non-binding)
>
> - checked release notes: OK
> - checked sums and signatures: OK
> - source release
>  - contains no binaries: OK
>  - contains no 1.9-SNAPSHOT references: OK
>  - build from source: OK (8u102)
>  - mvn clean verify: OK (8u102)
> - binary release
>  - no examples appear to be missing
>  - started a cluster; WebUI reachable, example ran successfully
> - repository appears to contain all expected artifacts
>
> Best Regards,
> Yu
>
>
> On Fri, 16 Aug 2019 at 06:06, Bowen Li  wrote:
>
> > Hi Jark,
> >
> > Thanks for letting me know that it's been like this in previous releases.
> > Though I don't think that's the right behavior, it can be discussed for
> > later release. Thus I retract my -1 for RC2.
> >
> > Bowen
> >
> >
> > On Thu, Aug 15, 2019 at 7:49 PM Jark Wu  wrote:
> >
> > > Hi Bowen,
> > >
> > > Thanks for reporting this.
> > > However, I don't think this is an issue. IMO, it is by design.
> > > The `tEnv.listUserDefinedFunctions()` in Table API and `show
> functions;`
> > in
> > > SQL CLI are intended to return only the registered UDFs, not including
> > > built-in functions.
> > > This is also the behavior in previous versions.
> > >
> > > Best,
> > > Jark
> > >
> > > On Fri, 16 Aug 2019 at 06:52, Bowen Li  wrote:
> > >
> > > > -1 for RC2.
> > > >
> > > > I found a bug https://issues.apache.org/jira/browse/FLINK-13741,
> and I
> > > > think it's a blocker.  The bug means currently if users call
> > > > `tEnv.listUserDefinedFunctions()` in Table API or `show functions;`
> > thru
> > > > SQL would not be able to see Flink's built-in functions.
> > > >
> > > > I'm preparing a fix right now.
> > > >
> > > > Bowen
> > > >
> > > >
> > > > On Thu, Aug 15, 2019 at 8:55 AM Tzu-Li (Gordon) Tai <
> > tzuli...@apache.org
> > > >
> > > > wrote:
> > > >
> > > > > Thanks for all the test efforts, verifications and votes so far.
> > > > >
> > > > > So far, things are looking good, but we still require one more PMC
> > > > binding
> > > > > vote for this RC to be the official release, so I would like to
> > extend
> > > > the
> > > > > vote time for 1 more day, until *Aug. 16th 17:00 CET*.
> > > > >
> > > > > In the meantime, the release notes for 1.9.0 had only just been
> > > finalized
> > > > > [1], and could use a few more eyes before closing the vote.
> > > > > Any help with checking if anything else should be mentioned there
> > > > regarding
> > > > > breaking changes / known shortcomings would be appreciated.
> > > > >
> > > > > Cheers,
> > > > > Gordon
> > > > >
> > > > > [1] https://github.com/apache/flink/pull/9438
> > > > >
> > > > > On Thu, Aug 15, 2019 at 3:58 PM Kurt Young 
> wrote:
> > > > >
> > > > > > Great, then I have no other comments on legal check.
> > > > > >
> > > > > > Best,
> > > > > > Kurt
> > > > > >
> > > > > >
> > > > > > On Thu, Aug 15, 2019 at 9:56 PM Chesnay Schepler <
> > ches...@apache.org
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > The licensing items aren't a problem; we don't care about Flink
> > > > modules
> > > > > > > in NOTICE files, and we don't have to update the source-release
> > > > > > > licensing since we don't have a pre-built version of the WebUI
> in
> > > the
> > > > > > > source.
> > > > > > >
> > > > > > > On 15/08/2019 15:22, Kurt Young wrote:
> > > > > > > > After going through the licenses, I found 2 suspicions but
> not
> > > sure
> > > > > if
> > > > > > > they
> > > > > > > > are
> > > > > > > > valid or not.
> > > > > > > >
> > > > > > > > 1. flink-state-processing-api is packaged in to flink-dist
> jar,
> > > but
> > > > > not
> > > > > > > > included in
> > > > > > > > NOTICE-binary file (the one under the root directory) like
> > other
> > > > > > modules.
> > > > > > > > 2. flink-runtime-web distributed some JavaScript dependencies
> > > > through
> > > > > > > source
> > > > > > > > codes, the licenses and NOTICE file were only updated inside
> > the
> > > > > module
> > > > > > > of
> > > > > > > > flink-runtime-web, but not the NOTICE file and licenses
> > directory
> > > > > which
> > > > > > > > under
> > > > > > > > the  root directory.
> > > > > > > >
> > > > > > > > Another minor issue I just found is:
> > > > > > > > FLINK-13558 tries to include table examples to flink-dist,
> but
> > I
> > > > > cannot
> > > > > > > > find it in
> > > > > > > > the binary distribution of RC2.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Kurt
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Aug 15, 2019 at 6:19 PM Kurt Young  >
> > > > wrote:
> > > > > > > >
> > > > > > > >> Hi Gordon & Timo,
> > > > > > > >>
> > > > > > > >> Thanks for the feedback, and I agree with it. I will
> document
> > > this
> > > > > in
> > > > > > > the
> > > > > > > >> rele

Re: [VOTE] Apache Flink Release 1.9.0, release candidate #2

2019-08-16 Thread Guowei Ma
Hi Till,
I can send the job to you offline.
It is just a DataStream job and does not use TwoInputSelectableStreamTask.
A->B
     \
      C
     /
D->E
Best,
Guowei


Till Rohrmann  于2019年8月16日周五 下午4:34写道:

> Thanks for reporting this issue Guowei. Could you share a bit more detail
> about what the job exactly does and which operators it uses? Does the job
> use the new `TwoInputSelectableStreamTask`, which might cause the performance
> regression?
>
> I think it is important to understand where the problem comes from before
> we proceed with the release.
>
> Cheers,
> Till
>
> On Fri, Aug 16, 2019 at 10:27 AM Guowei Ma  wrote:
>
> > Hi,
> > -1
> > We have a benchmark job, which includes a two-input operator.
> > This job has a big performance regression using 1.9 compared to 1.8.
> > It's still not very clear why this regression happens.
> >
> > Best,
> > Guowei
> >
> >
> > Yu Li  于2019年8月16日周五 下午3:27写道:
> >
> > > +1 (non-binding)
> > >
> > > - checked release notes: OK
> > > - checked sums and signatures: OK
> > > - source release
> > >  - contains no binaries: OK
> > >  - contains no 1.9-SNAPSHOT references: OK
> > >  - build from source: OK (8u102)
> > >  - mvn clean verify: OK (8u102)
> > > - binary release
> > >  - no examples appear to be missing
> > >  - started a cluster; WebUI reachable, example ran successfully
> > > - repository appears to contain all expected artifacts
> > >
> > > Best Regards,
> > > Yu
> > >
> > >
> > > On Fri, 16 Aug 2019 at 06:06, Bowen Li  wrote:
> > >
> > > > Hi Jark,
> > > >
> > > > Thanks for letting me know that it's been like this in previous
> > releases.
> > > > Though I don't think that's the right behavior, it can be discussed
> for
> > > > later release. Thus I retract my -1 for RC2.
> > > >
> > > > Bowen
> > > >
> > > >
> > > > On Thu, Aug 15, 2019 at 7:49 PM Jark Wu  wrote:
> > > >
> > > > > Hi Bowen,
> > > > >
> > > > > Thanks for reporting this.
> > > > > However, I don't think this is an issue. IMO, it is by design.
> > > > > The `tEnv.listUserDefinedFunctions()` in Table API and `show
> > > functions;`
> > > > in
> > > > > SQL CLI are intended to return only the registered UDFs, not
> > including
> > > > > built-in functions.
> > > > > This is also the behavior in previous versions.
> > > > >
> > > > > Best,
> > > > > Jark
> > > > >
> > > > > On Fri, 16 Aug 2019 at 06:52, Bowen Li 
> wrote:
> > > > >
> > > > > > -1 for RC2.
> > > > > >
> > > > > > I found a bug https://issues.apache.org/jira/browse/FLINK-13741,
> > > and I
> > > > > > think it's a blocker.  The bug means currently if users call
> > > > > > `tEnv.listUserDefinedFunctions()` in Table API or `show
> functions;`
> > > > thru
> > > > > > SQL would not be able to see Flink's built-in functions.
> > > > > >
> > > > > > I'm preparing a fix right now.
> > > > > >
> > > > > > Bowen
> > > > > >
> > > > > >
> > > > > > On Thu, Aug 15, 2019 at 8:55 AM Tzu-Li (Gordon) Tai <
> > > > tzuli...@apache.org
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks for all the test efforts, verifications and votes so
> far.
> > > > > > >
> > > > > > > So far, things are looking good, but we still require one more
> > PMC
> > > > > > binding
> > > > > > > vote for this RC to be the official release, so I would like to
> > > > extend
> > > > > > the
> > > > > > > vote time for 1 more day, until *Aug. 16th 17:00 CET*.
> > > > > > >
> > > > > > > In the meantime, the release notes for 1.9.0 had only just been
> > > > > finalized
> > > > > > > [1], and could use a few more eyes before closing the vote.
> > > > > > > Any help with checking if anything else should be mentioned
> there
> > &

Re: Re: [ANNOUNCE] New Apache Flink Committer - Weijie Guo

2023-02-13 Thread Guowei Ma
Congratulations!!! Weijie

Best,
Guowei


On Tue, Feb 14, 2023 at 1:49 PM Dian Fu  wrote:

> Congratulations, Weijie!
>
> Regards,
> Dian
>
> On Mon, Feb 13, 2023 at 10:55 PM Matthias Pohl
>  wrote:
>
> > Congrats, Weijie! :-)
> >
> > On Mon, Feb 13, 2023 at 10:50 AM Sergey Nuyanzin 
> > wrote:
> >
> > > Congratulations, Weijie!
> > >
> > > On Mon, Feb 13, 2023 at 10:32 AM Yun Tang  wrote:
> > >
> > > > Congratulations, Weijie!
> > > >
> > > > Best
> > > > Yun Tang
> > > >
> > > > On 2023/02/13 09:23:16 ramkrishna vasudevan wrote:
> > > > > Congratulations!!! Weijie
> > > > >
> > > > > Regards
> > > > > Ram
> > > > >
> > > > > On Mon, Feb 13, 2023 at 2:17 PM Yuepeng Pan 
> wrote:
> > > > >
> > > > > > Congratulations, Weijie!
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Best, Yuepeng Pan.
> > > > > >
> > > > > > 在 2023-02-13 16:26:32,"Lincoln Lee"  写道:
> > > > > > >Congratulations, Weijie!
> > > > > > >
> > > > > > >Best,
> > > > > > >Lincoln Lee
> > > > > > >
> > > > > > >
> > > > > > >Martijn Visser  于2023年2月13日周一
> 16:17写道:
> > > > > > >
> > > > > > >> Congrats Weijie!
> > > > > > >>
> > > > > > >> On Mon, Feb 13, 2023 at 7:19 AM Weihua Hu <
> > huweihua@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >>
> > > > > > >> > Congratulations, Weijie!
> > > > > > >> >
> > > > > > >> > Best,
> > > > > > >> > Weihua
> > > > > > >> >
> > > > > > >> > > On Feb 13, 2023, at 11:55, Lijie Wang <
> > > wangdachui9...@gmail.com
> > > > >
> > > > > > >> wrote:
> > > > > > >> > >
> > > > > > >> > > Congratulations, Weijie!
> > > > > > >> >
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > Best regards,
> > > Sergey
> > >
> >
>


Re: [ANNOUNCE] New Apache Flink Committer - Jing Ge

2023-02-14 Thread Guowei Ma
Congratulations!
Best,
Guowei


On Tue, Feb 14, 2023 at 5:11 PM Leonard Xu  wrote:

> Congratulations!! Jing
>
> Best,
> Leonard
>
> > On Feb 14, 2023, at 4:33 PM, Hang Ruan  wrote:
> >
> > Congratulations Jing!
> >
> > Shammon FY  于2023年2月14日周二 16:04写道:
> >
> >> Congratulations Jing!
> >>
> >> Best,
> >> Shammon
> >>
> >> On Tue, Feb 14, 2023 at 4:00 PM weijie guo 
> >> wrote:
> >>
> >>> Congratulations Jing!
> >>>
> >>> Best regards,
> >>>
> >>> Weijie
> >>>
> >>>
> >>> Jingsong Li  于2023年2月14日周二 15:54写道:
> >>>
>  Congratulations Jing!
> 
>  Best,
>  Jingsong
> 
>  On Tue, Feb 14, 2023 at 3:50 PM godfrey he 
> >> wrote:
> >
> > Hi everyone,
> >
> > On behalf of the PMC, I'm very happy to announce Jing Ge as a new
> >> Flink
> > committer.
> >
> > Jing has been consistently contributing to the project for over a
> > year. He authored more than 50 PRs and reviewed more than 40 PRs,
> > mainly focusing on the connector, test, and documentation modules.
> > He was very active on the mailing list (more than 90 threads) last
> > year, which included participating in a lot of dev discussions (30+),
> > providing many effective suggestions for FLIPs, and answering many
> > user questions. He was a Flink Forward 2022 keynote speaker, helping
> > to promote Flink, and a trainer in the Flink Forward 2022 training
> > program for Flink troubleshooting and performance tuning.
> >
> > Please join me in congratulating Jing for becoming a Flink committer!
> >
> > Best,
> > Godfrey
> 
> >>>
> >>
>
>


[ANNOUNCE] New Apache Flink PMC Member - Dong Lin

2023-02-15 Thread Guowei Ma
Hi, everyone

On behalf of the PMC, I'm very happy to announce Dong Lin as a new
Flink PMC member.

Dong is currently the main driver of Flink ML. He has reviewed a large
number of Flink ML related PRs and participated in many Flink ML
improvements, such as FLIP-173 and FLIP-174. At the same time, he has
contributed to many evangelism events for the Flink ML ecosystem.
Beyond the machine learning field, Dong has also participated in many
other improvements in Flink, such as FLIP-205, FLIP-266, FLIP-269, and
FLIP-274.
Please join me in congratulating Dong Lin for becoming a Flink PMC
member!

Best,
Guowei (on behalf of the Flink PMC)


Re: [ANNOUNCE] New Apache Flink Committer - Rui Fan

2023-02-20 Thread Guowei Ma
Congratulations, Rui!

Best,
Guowei


On Tue, Feb 21, 2023 at 2:57 PM Wei Zhong  wrote:

> Congratulations, Rui!
>
> Best,
> Wei
>
> > 2023年2月21日 下午1:52,Shammon FY  写道:
> >
> > Congratulations, Rui!
> >
> >
> > Best,
> > Shammon
> >
> > On Tue, Feb 21, 2023 at 1:40 PM Sergey Nuyanzin 
> wrote:
> >
> >> Congratulations, Rui!
> >>
> >> On Tue, Feb 21, 2023 at 4:53 AM Weihua Hu 
> wrote:
> >>
> >>> Congratulations, Rui!
> >>>
> >>> Best,
> >>> Weihua
> >>>
> >>>
> >>> On Tue, Feb 21, 2023 at 11:28 AM Biao Geng 
> wrote:
> >>>
>  Congrats, Rui!
>  Best,
>  Biao Geng
> 
>  weijie guo  于2023年2月21日周二 11:21写道:
> 
> > Congrats, Rui!
> >
> > Best regards,
> >
> > Weijie
> >
> >
> > Leonard Xu  于2023年2月21日周二 11:03写道:
> >
> >> Congratulations, Rui!
> >>
> >> Best,
> >> Leonard
> >>
> >>> On Feb 21, 2023, at 9:50 AM, Matt Wang  wrote:
> >>>
> >>> Congrats Rui
> >>>
> >>>
> >>> --
> >>>
> >>> Best,
> >>> Matt Wang
> >>>
> >>>
> >>>  Replied Message 
> >>> | From | yuxia |
> >>> | Date | 02/21/2023 09:22 |
> >>> | To | dev |
> >>> | Subject | Re: [ANNOUNCE] New Apache Flink Committer - Rui Fan |
> >>> Congrats Rui
> >>>
> >>> Best regards,
> >>> Yuxia
> >>>
> >>> - 原始邮件 -
> >>> 发件人: "Samrat Deb" 
> >>> 收件人: "dev" 
> >>> 发送时间: 星期二, 2023年 2 月 21日 上午 1:09:25
> >>> 主题: Re: [ANNOUNCE] New Apache Flink Committer - Rui Fan
> >>>
> >>> Congrats Rui
> >>>
> >>> On Mon, 20 Feb 2023 at 10:28 PM, Anton Kalashnikov <
>  kaa@yandex.com
> >>
> >>> wrote:
> >>>
> >>> Congrats Rui!
> >>>
> >>> --
> >>> Best regards,
> >>> Anton Kalashnikov
> >>>
> >>> On 20.02.23 17:53, Matthias Pohl wrote:
> >>> Congratulations, Rui :)
> >>>
> >>> On Mon, Feb 20, 2023 at 5:10 PM Jing Ge
> >>  
> >>> wrote:
> >>>
> >>> Congrats Rui!
> >>>
> >>> On Mon, Feb 20, 2023 at 3:19 PM Piotr Nowojski <
> >>> pnowoj...@apache.org
> >
> >>> wrote:
> >>>
> >>> Hi, everyone
> >>>
> >>> On behalf of the PMC, I'm very happy to announce Rui Fan as a new
>  Flink
> >>> Committer.
> >>>
> >>> Rui Fan has been active on a small scale since August 2019, and
> >>> ramped up his contributions in the 2nd half of 2021. He was mostly
> >>> involved in quite demanding performance-related work around the
> >>> network stack and checkpointing, like re-using TCP connections [1],
> >>> and many crucial improvements to unaligned checkpoints. Among
> >>> others: FLIP-227: Support overdraft buffer [2], Merge small
> >>> ChannelState file for Unaligned Checkpoint [3], and Timeout aligned
> >>> to unaligned checkpoint barrier in the output buffers [4].
> >>>
> >>> Please join me in congratulating Rui Fan for becoming a Flink
> >>> Committer!
> >>>
> >>> Best,
> >>> Piotr Nowojski (on behalf of the Flink PMC)
> >>>
> >>> [1] https://issues.apache.org/jira/browse/FLINK-22643
> >>> [2]
> >>>
> >>>
> >>>
> >>>
> >>
> >
> 
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-227%3A+Support+overdraft+buffer
> >>> [3] https://issues.apache.org/jira/browse/FLINK-26803
> >>> [4] https://issues.apache.org/jira/browse/FLINK-27251
> >>>
> >>>
> >>
> >>
> >
> 
> >>>
> >>
> >>
> >> --
> >> Best regards,
> >> Sergey
> >>
>
>


Re: [ANNOUNCE] New Apache Flink Committer - Anton Kalashnikov

2023-02-20 Thread Guowei Ma
Congratulations, Anton!

Best,
Guowei


On Tue, Feb 21, 2023 at 1:52 PM Shammon FY  wrote:

> Congratulations, Anton!
>
> Best,
> Shammon
>
> On Tue, Feb 21, 2023 at 1:41 PM Sergey Nuyanzin 
> wrote:
>
> > Congratulations, Anton!
> >
> > On Tue, Feb 21, 2023 at 4:53 AM Weihua Hu 
> wrote:
> >
> > > Congratulations, Anton!
> > >
> > > Best,
> > > Weihua
> > >
> > >
> > > On Tue, Feb 21, 2023 at 11:22 AM weijie guo  >
> > > wrote:
> > >
> > > > Congratulations, Anton!
> > > >
> > > > Best regards,
> > > >
> > > > Weijie
> > > >
> > > >
> > > > Leonard Xu  于2023年2月21日周二 11:02写道:
> > > >
> > > > > Congratulations, Anton!
> > > > >
> > > > > Best,
> > > > > Leonard
> > > > >
> > > > >
> > > > > > On Feb 21, 2023, at 10:02 AM, Rui Fan  wrote:
> > > > > >
> > > > > > Congratulations, Anton!
> > > > > >
> > > > > > Best,
> > > > > > Rui Fan
> > > > > >
> > > > > > On Tue, Feb 21, 2023 at 9:23 AM yuxia <
> luoyu...@alumni.sjtu.edu.cn
> > >
> > > > > wrote:
> > > > > >
> > > > > >> Congrats Anton!
> > > > > >>
> > > > > >> Best regards,
> > > > > >> Yuxia
> > > > > >>
> > > > > >> - 原始邮件 -
> > > > > >> 发件人: "Matthias Pohl" 
> > > > > >> 收件人: "dev" 
> > > > > >> 发送时间: 星期二, 2023年 2 月 21日 上午 12:52:40
> > > > > >> 主题: Re: [ANNOUNCE] New Apache Flink Committer - Anton
> Kalashnikov
> > > > > >>
> > > > > >> Congratulations, Anton! :-)
> > > > > >>
> > > > > >> On Mon, Feb 20, 2023 at 5:09 PM Jing Ge
> >  > > >
> > > > > >> wrote:
> > > > > >>
> > > > > >>> Congrats Anton!
> > > > > >>>
> > > > > >>> On Mon, Feb 20, 2023 at 5:02 PM Samrat Deb <
> > decordea...@gmail.com>
> > > > > >> wrote:
> > > > > >>>
> > > > >  congratulations Anton!
> > > > > 
> > > > >  Bests,
> > > > >  Samrat
> > > > > 
> > > > >  On Mon, 20 Feb 2023 at 9:29 PM, John Roesler <
> > vvcep...@apache.org
> > > >
> > > > > >>> wrote:
> > > > > 
> > > > > > Congratulations, Anton!
> > > > > > -John
> > > > > >
> > > > > > On Mon, Feb 20, 2023, at 08:18, Piotr Nowojski wrote:
> > > > > >> Hi, everyone
> > > > > >>
> > > > > >> On behalf of the PMC, I'm very happy to announce Anton
> > > Kalashnikov
> > > > > >>> as a
> > > > > > new
> > > > > >> Flink Committer.
> > > > > >>
> > > > > >> Anton has been very active for almost two years already, and
> > > > > >> has authored and reviewed many PRs over this time. He is
> > > > > >> active in Flink's runtime, being the main author of
> > > > > >> improvements like Buffer Debloating (FLIP-183) [1]. He solved
> > > > > >> many bugs, fixed many test instabilities, and generally
> > > > > >> speaking helped with the maintenance of runtime components.
> > > > > >>
> > > > > >> Please join me in congratulating Anton Kalashnikov for
> > becoming
> > > a
> > > > > >>> Flink
> > > > > >> Committer!
> > > > > >>
> > > > > >> Best,
> > > > > >> Piotr Nowojski (on behalf of the Flink PMC)
> > > > > >>
> > > > > >> [1]
> > > > > >>
> > > > > >
> > > > > 
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-183%3A+Dynamic+buffer+size+adjustment
> > > > > >
> > > > > 
> > > > > >>>
> > > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > Best regards,
> > Sergey
> >
>


Re: [ANNOUNCE] New Apache Flink Committer - Yuxia Luo

2023-03-12 Thread Guowei Ma
congratulations Yuxia
Best,
Guowei


On Mon, Mar 13, 2023 at 10:43 AM Junrui Lee  wrote:

> Congratulations, Yuxia!
>
> Best,
> Junrui
>
> Yanfei Lei  于2023年3月13日周一 10:42写道:
>
> > Congratulations, Yuxia!
> >
> > Best,
> > Yanfei
> >
> >
> > Samrat Deb  于2023年3月13日周一 10:41写道:
> > >
> > > congratulations Yuxia
> > >
> > > Bests,
> > > Samrat
> > >
> > > On Mon, 13 Mar 2023 at 8:06 AM, Yuxin Tan 
> > wrote:
> > >
> > > > Congratulations, Yuxia!
> > > >
> > > > Best,
> > > > Yuxin
> > > >
> > > >
> > > > Jark Wu  于2023年3月13日周一 10:26写道:
> > > >
> > > > > Hi, everyone
> > > > >
> > > > > On behalf of the PMC, I'm very happy to announce Yuxia Luo as a new
> > Flink
> > > > > Committer.
> > > > >
> > > > > Yuxia has been continuously contributing to the Flink project
> > > > > for almost two years, and has authored and reviewed hundreds of
> > > > > PRs over this time. He is currently the core maintainer of the
> > > > > Hive component, where he contributed many valuable features,
> > > > > including the Hive dialect with 95% compatibility and small file
> > > > > compaction.
> > > > > In addition, Yuxia drove FLIP-282 (DELETE & UPDATE API) to better
> > > > > integrate Flink with data lakes. He actively participated in dev
> > > > > discussions and answered many questions on the user mailing list.
> > > > >
> > > > > Please join me in congratulating Yuxia Luo for becoming a Flink
> > > > Committer!
> > > > >
> > > > > Best,
> > > > > Jark Wu (on behalf of the Flink PMC)
> > > > >
> > > >
> >
>


Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache Paimon(incubating)

2023-03-28 Thread Guowei Ma
Congratulations!

Best,
Guowei


On Tue, Mar 28, 2023 at 12:02 PM Yuxin Tan  wrote:

> Congratulations!
>
> Best,
> Yuxin
>
>
> Guanghui Zhang  于2023年3月28日周二 11:06写道:
>
>> Congratulations!
>>
>> Best,
>> Zhang Guanghui
>>
>> Hang Ruan  于2023年3月28日周二 10:29写道:
>>
>> > Congratulations!
>> >
>> > Best,
>> > Hang
>> >
>> > yu zelin  于2023年3月28日周二 10:27写道:
>> >
>> >> Congratulations!
>> >>
>> >> Best,
>> >> Yu Zelin
>> >>
>> >> 2023年3月27日 17:23,Yu Li  写道:
>> >>
>> >> Dear Flinkers,
>> >>
>> >>
>> >>
>> >> As you may have noticed, we are pleased to announce that Flink Table
>> Store has joined the Apache Incubator as a separate project called Apache
>> Paimon(incubating) [1] [2] [3]. The new project still aims at building a
>> streaming data lake platform for high-speed data ingestion, change data
>> tracking and efficient real-time analytics, with the vision of supporting a
>> larger ecosystem and establishing a vibrant and neutral open source
>> community.
>> >>
>> >>
>> >>
>> >> We would like to thank everyone for their great support and efforts
>> for the Flink Table Store project, and warmly welcome everyone to join the
>> development and activities of the new project. Apache Flink will continue
>> to be one of the first-class citizens supported by Paimon, and we believe
>> that the Flink and Paimon communities will maintain close cooperation.
>> >>
>> >>
>> >> 亲爱的Flinkers,
>> >>
>> >>
>> >> 正如您可能已经注意到的,我们很高兴地宣布,Flink Table Store 已经正式加入 Apache
>> >> 孵化器独立孵化 [1] [2] [3]。新项目的名字是
>> >> Apache
>> Paimon(incubating),仍致力于打造一个支持高速数据摄入、流式数据订阅和高效实时分析的新一代流式湖仓平台。此外,新项目将支持更加丰富的生态,并建立一个充满活力和中立的开源社区。
>> >>
>> >>
>> >> 在这里我们要感谢大家对 Flink Table Store
>> >> 项目的大力支持和投入,并热烈欢迎大家加入新项目的开发和社区活动。Apache Flink 将继续作为 Paimon
>> 支持的主力计算引擎之一,我们也相信
>> >> Flink 和 Paimon 社区将继续保持密切合作。
>> >>
>> >>
>> >> Best Regards,
>> >> Yu (on behalf of the Apache Flink PMC and Apache Paimon PPMC)
>> >>
>> >> 致礼,
>> >> 李钰(谨代表 Apache Flink PMC 和 Apache Paimon PPMC)
>> >>
>> >> [1] https://paimon.apache.org/
>> >> [2] https://github.com/apache/incubator-paimon
>> >> [3]
>> https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal
>> >>
>> >>
>> >>
>>
>


Re: [VOTE] Apache Flink ML Release 2.2.0, release candidate #2

2023-04-14 Thread Guowei Ma
Hi Dong,

Thanks for driving this release!

+1 (binding)

* Checked JIRA release notes
* Verified signature and checksum for the source
* Downloaded the source code and built it with JDK 8
* Browsed through README.md files.
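For anyone repeating the signature/checksum checks, the round-trip can be sketched on a throwaway file as below; against a real release candidate the same commands run on the staged source tarball, its `.sha512` file, and its `.asc` signature (the `artifact.tgz` name here is a placeholder, not a real artifact name — the real ones are listed in the dev repository linked in the vote email):

```shell
# Self-contained sketch of the checksum round-trip on a dummy file.
printf 'release payload\n' > artifact.tgz
sha512sum artifact.tgz > artifact.tgz.sha512   # published by the release manager
sha512sum -c artifact.tgz.sha512               # re-checked by each voter

# Against a real RC the signature check looks like this (needs network,
# shown as comments only):
#   curl -LO https://dist.apache.org/repos/dist/release/flink/KEYS
#   gpg --import KEYS
#   gpg --verify artifact.tgz.asc artifact.tgz
```

`sha512sum -c` exits non-zero on any mismatch, so the check is easy to script.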

Best,
Guowei


On Fri, Apr 14, 2023 at 2:48 PM Zhipeng Zhang 
wrote:

> Hi Dong,
> Thanks for driving this release!
>
> +1 (non-binding)
>
> Here is what I have checked.
> - Verified that the checksums and GPG files.
> - Verified that the source distributions do not contain any binaries.
> - Built the source distribution and run all unit tests.
> - Verified that all POM files point to the same version.
> - Browsed through JIRA release notes files.
> - Browsed through README.md files.
> - Verified the source code tag.
>
>
> Dong Lin  于2023年4月13日周四 18:28写道:
>
> >
> > Hi everyone,
> >
> > Please review and vote on the release candidate #2 for version 2.2.0 of
> > Apache Flink ML as follows:
> >
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > **Testing Guideline**
> >
> > You can find here [1] a page in the project wiki on instructions for
> > testing.
> >
> > To cast a vote, it is not necessary to perform all listed checks, but
> > please mention which checks you have performed when voting.
> >
> > **Release Overview**
> >
> > As an overview, the release consists of the following:
> > a) Flink ML source release to be deployed to dist.apache.org
> > b) Flink ML Python source distributions to be deployed to PyPI
> > c) Maven artifacts to be deployed to the Maven Central Repository
> >
> > **Staging Areas to Review**
> >
> > The staging areas containing the above-mentioned artifacts are as
> follows, for
> > your review:
> >
> > - All artifacts for a) and b) can be found in the corresponding dev
> repository
> > at dist.apache.org [2], which are signed with the key with fingerprint
> AFAC
> > DB09 E6F0 FF28 C93D 64BC BEED 4F6C B9F7 7D0E [3]
> > - All artifacts for c) can be found at the Apache Nexus Repository [4]
> >
> > **Other links for your review**
> >
> > - JIRA release notes [5]
> > - Source code tag "release-2.2.0-rc2" [6]
> > - PR to update the website Downloads page to include Flink ML links [7]
> >
> > **Vote Duration**
> >
> > The voting time will run for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> >
> > Cheers,
> > Dong
> >
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+ML+Release
> > [2] https://dist.apache.org/repos/dist/dev/flink/flink-ml-2.2.0-rc2/
> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [4]
> https://repository.apache.org/content/repositories/orgapacheflink-1605/
> > [5]
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351884
> > [6] https://github.com/apache/flink-ml/releases/tag/release-2.2.0-rc2
> > [7] https://github.com/apache/flink-web/pull/630
>
>
>
> --
> best,
> Zhipeng
>


Re: [ANNOUNCE] New scalafmt formatter has been merged

2022-04-13 Thread Guowei Ma
Hi, Francesco
Thanks for your work!
Best,
Guowei


On Wed, Apr 13, 2022 at 5:35 PM Dian Fu  wrote:

> Thanks a lot for this great work Francesco!
>
> Regards,
> Dian
>
> On Wed, Apr 13, 2022 at 3:23 PM Marios Trivyzas  wrote:
>
> > Thank you for this Francesco!
> >
> > It will really improve the lives of everyone touching scala code!
> >
> > Best,
> > Marios
> >
> > On Wed, Apr 13, 2022 at 9:55 AM Timo Walther  wrote:
> >
> > > Thanks for the great work Francesco!
> > >
> > > This will improve the contributor productivity a lot and ease reviews.
> > > This change was long overdue.
> > >
> > > Regards,
> > > Timo
> > >
> > > Am 12.04.22 um 17:21 schrieb Francesco Guardiani:
> > > > Hi all,
> > > > The new scalafmt formatter has been merged. From now on, just
> > > > running mvn spotless:apply as usual will format both Java and
> > > > Scala, and IntelliJ will automatically pick up the scalafmt config
> > > > for whoever has the Scala plugin installed. If it doesn't, just go
> > > > to Preferences > Editor > Code Style > Scala and change the
> > > > Formatter to scalafmt. If you use the "actions on save" plugin,
> > > > make sure you have reformat-on-save enabled for Scala.
> > > >
> > > > For more details on integration with IDEs, please refer to
> > > > https://scalameta.org/scalafmt/docs/installation.html
> > > >
> > > > If you have a pending PR with Scala changes, chances are you're going
> > to
> > > > have conflicts with upstream/master now. In order to fix it, here is
> > the
> > > > suggested procedure:
> > > >
> > > > - Do an interactive rebase on commit
> > > > 3ea3fee5ac996f6ae8836c3cba252f974d20bd2e, which is the commit
> > > > before the refactoring of the whole codebase, fixing the
> > > > conflicting changes as usual. This will make sure you won't miss
> > > > the changes between your branch and master *before* the
> > > > reformatting commit.
> > > > - Do a rebase on commit 91d81c427aa6312841ca868d54e8ce6ea721cd60,
> > > > accepting all changes from your local branch. You can easily do
> > > > that via git rebase -Xtheirs
> > > > 91d81c427aa6312841ca868d54e8ce6ea721cd60 (note that during a
> > > > rebase, "theirs" refers to your own branch's changes).
> > > > - Run mvn spotless:apply and commit all the changes.
> > > > - Do an interactive rebase on upstream/master. This will make sure
> > > > you won't miss the changes between your branch and master *after*
> > > > the reformatting commit.
> > > > - Force push your branch to update the PR.
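To try the conflict-resolution step of the procedure above in isolation, here is a self-contained sketch on a throwaway repository (file contents, branch names, and the `rebase-demo` directory are stand-ins, not the real Flink commits). One detail worth calling out: during a rebase, `ours`/`theirs` are swapped compared to a merge, so the strategy option that keeps your own branch's content on conflicting hunks is `-Xtheirs`.

```shell
set -eu
# Throwaway repo: a base commit, a whole-codebase "reformat" commit on
# the main branch, and a feature branch made before the reformatting.
rm -rf rebase-demo && git init -q rebase-demo
git -C rebase-demo config user.email demo@example.com
git -C rebase-demo config user.name demo
printf 'int x=1;\n' > rebase-demo/A.java
git -C rebase-demo add A.java
git -C rebase-demo commit -qm base
MAIN=$(git -C rebase-demo symbolic-ref --short HEAD)
git -C rebase-demo branch feature
printf 'int x = 1;\n' > rebase-demo/A.java   # stands in for the reformatting commit
git -C rebase-demo commit -qam 'reformat whole codebase'
git -C rebase-demo checkout -q feature
printf 'int x=1;\nint y=2;\n' > rebase-demo/A.java
git -C rebase-demo commit -qam 'feature change'
# Rebase over the reformatting commit, keeping this branch's content on
# conflicting hunks ("theirs" = the branch being rebased):
git -C rebase-demo rebase -q -X theirs "$MAIN"
cat rebase-demo/A.java   # the feature content survived; now re-run the formatter
```

After this step, running the formatter and committing corresponds to the `mvn spotless:apply` step in the procedure above.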
> > > >
> > > > Sorry for this noise!
> > > >
> > > > Thank you,
> > > > FG
> > > >
> > >
> > >
> >
> > --
> > Marios
> >
>


Re: [VOTE] Release 1.15.0, release candidate #3

2022-04-18 Thread Guowei Ma
+1(binding)

- Verified the signature and checksum of the release binary
- Ran the SqlClient example
- Ran the WordCount example
- Compiled from source successfully

Best,
Guowei


On Mon, Apr 18, 2022 at 11:13 AM Xintong Song  wrote:

> +1 (binding)
>
> - verified signature and checksum
> - build from source
> - run example jobs in a standalone cluster, everything looks expected
>
>
> Thank you~
>
> Xintong Song
>
>
>
> On Fri, Apr 15, 2022 at 12:56 PM Yun Gao 
> wrote:
>
> > Hi everyone,
> >
> > Please review and vote on the release candidate #3 for the version
> 1.15.0,
> > as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > The complete staging area is available for your review, which includes:
> > * JIRA release notes [1],
> > * the official Apache source release and binary convenience releases
> > to be deployed to dist.apache.org [2], which are signed with the key
> > with fingerprint CBE82BEFD827B08AFA843977EDBF922A7BC84897 [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag "release-1.15.0-rc3" [5],
> > * website pull request listing the new release and adding announcement
> > blog post [6].
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Thanks,
> > Joe, Till and Yun Gao
> > [1]
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350442
> > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.15.0-rc3/
> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [4]
> > https://repository.apache.org/content/repositories/orgapacheflink-1497/
> > [5] https://github.com/apache/flink/releases/tag/release-1.15.0-rc3/
> > [6] https://github.com/apache/flink-web/pull/526
> >
> >
> >
>


Re: Re: [VOTE] Release 1.15.0, release candidate #4

2022-04-25 Thread Guowei Ma
+1 (binding)

1. Built from source
2. Verified the checksums and signatures
3. Started a local standalone cluster
4. Submitted a streaming example job and stopped it from the UI
5. Submitted a job running in batch mode

Best,
Guowei


On Tue, Apr 26, 2022 at 10:56 AM Zhu Zhu  wrote:

> +1 (binding)
> - Checked checksums and signatures
> - Built from source
> - Ran a batch job using AdaptiveBatchScheduler on yarn cluster
> - Ran an example job(parallelism=1000) on yarn cluster
> - Checked the website PR
>
> Thanks,
> Zhu
>
> Yun Gao  于2022年4月26日周二 10:37写道:
> >
> > Very thanks Matthias for tracking and double confirming this issue!
> >
> > Best,
> > Yun Gao
> >
> >
> >
> >  --Original Mail --
> > Sender:Matthias Pohl 
> > Send Date:Mon Apr 25 23:52:30 2022
> > Recipients:Yun Gao 
> > CC:dev , Yun Gao 
> > Subject:Re: [VOTE] Release 1.15.0, release candidate #4
> >
> > +1 (non-binding)
> >
> > The issue around the user mailing list thread [1] has been identified.
> > We couldn't come up with a reason why some change in 1.15 would have
> caused the behavior. The bug itself is/was already present in 1.14. All the
> findings are collected in FLINK-27354 [2]. We're not considering this to be
> a blocker for 1.15.
> >
> > * Checked checksums
> > * Compiled Flink 1.15 from source
> > * Did a test run with ZooKeeper HA and standalone setup to verify the
> repeatable cleanup and JobResultStore working based on rc3; verified that
> the diff between rc3 and rc4 didn't add any significant changes to run
> >
> > [1] https://lists.apache.org/thread/5pm3crntmb1hl17h4txnlhjz34clghrg
> > [2] https://issues.apache.org/jira/browse/FLINK-27354
> > On Mon, Apr 25, 2022 at 4:38 AM Yun Gao  wrote:
> >
> > Very thanks Peter for the help! The vote time
> > would exclude the weekend, thus no worry for
> > that.
> >
> > Best,
> > Yun Gao
> >
> >
> > --
> > From:Peter Schrott 
> > Send Time:2022 Apr. 25 (Mon.) 09:17
> > To:Matthias Pohl 
> > Cc:dev ; Yun Gao 
> > Subject:Re: [VOTE] Release 1.15.0, release candidate #4
> >
> > Hi all,
> >
> > Unfortunately I could not extract the logs in time today - the ordering
> in
> > the csv was wrong and the csv was not parsable due to extra commas in the
> > logs... I discovered the issue in a pre-production cluster.
> >
> > I am gone for the weekend - I don't have access to the systems - but I
> will
> > retry to get the logs Monday morning, CEST.
> >
> > Sorry for the delay. I hope this heads up helps.
> >
> > Best, Peter
> >
> > Matthias Pohl  schrieb am Fr., 22. Apr. 2022,
> 17:44:
> >
> > > An issue was reported on the user ML [1] by Peter (CC) that was
> observed
> > > on rc4. We couldn't find any blocker based on the description so
> > > far, but it doesn't explain the entire problem the user is
> > > describing.
> We're
> > > still waiting for the logs to investigate the cause of the JobMaster
> not
> > > terminating.
> > >
> > > Considering that it's unclear when the logs are provided and that we
> > > haven't found anything similar during the release testing we might
> want to
> > > handle this as a non-blocking issue if Peter cannot provide the logs
> in a
> > > reasonable time frame. But we might want to give him maybe till Monday
> to
> > > provide the logs. WDYT?
> > >
> > > Matthias
> > >
> > > [1] https://lists.apache.org/thread/5pm3crntmb1hl17h4txnlhjz34clghrg
> > >
> > > On Fri, Apr 22, 2022 at 11:03 AM Xingbo Huang 
> wrote:
> > >
> > >> +1 (non-binding)
> > >>
> > >> - Verified checksums and signatures
> > >> - Verified Python wheel package contents
> > >> - Pip install apache-flink-libraries source package and apache-flink
> wheel
> > >> package in Mac/Linux
> > >> - Run the examples from Python Table API Tutorial in Python REPL
> > >> - Test the Python UDF jobs in Thread Mode.
> > >>
> > >> Best,
> > >> Xingbo
> > >>
> > >> Dawid Wysakowicz  于2022年4月21日周四 18:52写道:
> > >>
> > >> > +1 (binding),
> > >> >
> > >> >- checked licenses diff to my previous checks on rc2, this time
> > >> >everything seems ok
> > >> >- checked checksums, signatures, there are no binaries
> > >> >- compiled from sources
> > >> >- run a standalone cluster, clicked through the UI
> > >> >- run StateMachineExample took a savepoint in native format and
> > >> >stopped with a savepoint in native format
> > >> >
> > >> > Best,
> > >> >
> > >> > Dawid
> > >> > On 21/04/2022 05:49, Yun Gao wrote:
> > >> >
> > >> > Hi everyone,
> > >> >
> > >> > Please review and vote on the release candidate #4 for the version
> > >> 1.15.0, as follows:
> > >> > [ ] +1, Approve the release
> > >> > [ ] -1, Do not approve the release (please provide specific
> > >> > comments)
> > >> >
> > >> > The complete staging area is available for your review, which
> includes:
> > >> > * JIRA release notes [1],
> > >> > * the official Apache source release and binary convenience
> > >> > releases to be deployed to dist.apache.org [2],
> > >> >

Re: Re: [DISCUSS] FLIP-168: Speculative execution for Batch Job

2022-04-28 Thread Guowei Ma
Hi, zhu

Many thanks to Zhu Zhu for initiating the FLIP discussion. Overall I think
it's OK; I just have 3 small questions:

1. How to judge whether an Execution Vertex is a slow task.
The current calculation method is: the current timestamp minus the
timestamp of the execution's deployment. If the running time of an
execution exceeds the baseline, it is judged to be a slow task. Normally
this is no problem. But if an execution fails, the measured time may not
be accurate, because the restarted attempt's clock starts from its own
deployment. For example, if the baseline is 59 minutes and a task fails
after 56 minutes of execution, in the worst case it may take an
additional 59 minutes to discover that the task is slow.
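As a back-of-the-envelope check of that worst case (all numbers in minutes, taken from the example above):

```shell
# Worst case for a restarted attempt: the failed run's elapsed time is
# lost, and the new attempt's clock starts from its own deployment.
baseline=59
first_attempt=56   # attempt fails just under the baseline
worst_case=$(( first_attempt + baseline ))
echo "${worst_case} minutes before the vertex is flagged as slow"   # 115
```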

2. The Speculative Scheduler's fault tolerance strategy.
The strategy in the FLIP is: as long as the Execution Vertex can still be
executed, a failed execution does not trigger the fault tolerance
strategy, although currently `ExecutionTimeBasedSlowTaskDetector` can
restart an execution. But isn't this dependency a bit too strong? To some
extent, the fault tolerance strategy and the slow-task detection strategy
are coupled together.


3. The value of the default configuration.
IMHO, speculative execution should only be needed for relatively
large-scale, very time-consuming, long-running jobs. If
`slow-task-detector.execution-time.baseline-lower-bound` is too small,
could the system keep starting additional tasks that have little effect,
forcing the user to reset this default configuration in the end? Is it
possible to consider a larger default value? Of course, it would be best
to hear suggestions from other community users on this part.

Best,
Guowei


On Thu, Apr 28, 2022 at 3:54 PM Jiangang Liu 
wrote:

> +1 for the feature.
>
> Mang Zhang  于2022年4月28日周四 11:36写道:
>
> > Hi zhu:
> >
> >
> > This sounds like a great job! Thanks for your great job.
> > In our company, there are already some jobs using Flink Batch,
> > but everyone knows that the offline cluster has much more load than
> > the online cluster, and the machine failure rate is also much higher.
> > If this work is done, we'd love to use it; it would be simply awesome
> > for our Flink users.
> > thanks again!
> >
> >
> >
> >
> >
> >
> >
> > --
> >
> > Best regards,
> > Mang Zhang
> >
> >
> >
> >
> >
> > At 2022-04-27 10:46:06, "Zhu Zhu"  wrote:
> > >Hi everyone,
> > >
> > >More and more users are running their batch jobs on Flink nowadays.
> > >One major problem they encounter is slow tasks running on hot/bad
> > >nodes, resulting in very long and uncontrollable execution time of
> > >batch jobs. This problem is a pain or even unacceptable in
> > >production. Many users have been asking for a solution for it.
> > >
> > >Therefore, I'd like to revive the discussion of speculative
> > >execution to solve this problem.
> > >
> > >Weijun Wang, Jing Zhang, Lijie Wang and I had some offline
> > >discussions to refine the design[1]. We also implemented a PoC[2]
> > >and verified it using TPC-DS benchmarks and production jobs.
> > >
> > >Looking forward to your feedback!
> > >
> > >Thanks,
> > >Zhu
> > >
> > >[1]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-168%3A+Speculative+execution+for+Batch+Job
> > >[2]
> > https://github.com/zhuzhurk/flink/commits/1.14-speculative-execution-poc
> > >
> > >
> > >刘建刚  于2021年12月13日周一 11:38写道:
> > >
> > >> Any progress on the feature? We have the same requirement in our
> > >> company. Since the software and hardware environments can be
> > >> complex, it is normal to see a slow task that determines the
> > >> execution time of the Flink job.
> > >>
> > >>  于2021年6月20日周日 22:35写道:
> > >>
> > >> > Hi everyone,
> > >> >
> > >> > I would like to kick off a discussion on speculative execution for
> > batch
> > >> > job.
> > >> > I have created FLIP-168 [1] that clarifies our motivation to do this
> > and
> > >> > some improvement proposals for the new design.
> > >> > It would be great to resolve the problem of long tail task in batch
> > job.
> > >> > Please let me know your thoughts. Thanks.
> > >> >   Regards,
> > >> > wangwj
> > >> > [1]
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-168%3A+Speculative+execution+for+Batch+Job
> > >> >
> > >>
> >
>


Re: Failed Unit Test on Master Branch

2022-04-28 Thread Guowei Ma
Hi Haizhou,

I ran the test and there is no problem.
The commit is "d940af688be90c92ce4f8b9ca883f6753c94aa0f".

Best,
Guowei


On Fri, Apr 29, 2022 at 5:39 AM Haizhou Zhao 
wrote:

> Hello Flink Community,
>
> I was encountering some unit test failure in the flink-avro sub-module when
> I tried to pull down the master branch and build.
>
> Here is the command I ran:
>
> mvn clean package -pl flink-formats/flink-avro
>
> Here is the test that fails:
>
> https://github.com/apache/flink/blob/master/flink-formats/flink-avro/src/test/java/org/apache/flink/formats/avro/AvroRowDeSerializationSchemaTest.java#L178
>
> Here is the exception that was thrown:
> [ERROR]
>
> org.apache.flink.formats.avro.AvroRowDeSerializationSchemaTest.testGenericDeserializeSeveralTimes
>  Time elapsed: 0.008 s  <<< ERROR!
> java.io.IOException: Failed to deserialize Avro record.
> ...
>
> Here is the latest commit of the HEAD I pulled:
> commit c5430e2e5d4eeb0aba14ce3ea8401747afe0182d (HEAD -> master,
> oss/master)
>
> Can someone confirm this is indeed a problem on the master branch? If yes,
> any suggestions on fixing it?
>
> Thank you,
> Haizhou Zhao
>


Re: [ANNOUNCE] Apache Flink 1.15.0 released

2022-05-05 Thread Guowei Ma
Hi, Yun

Great job!
Thank you very much for your efforts to release Flink-1.15 during this time.
Thanks also to all the contributors who worked on this release!

Best,
Guowei


On Thu, May 5, 2022 at 3:24 PM Peter Schrott  wrote:

> Great!
>
> Will install it on the cluster asap! :)
>
> One thing I noticed: the linked release notes in the blog announcement
> under "Upgrade Notes" result in a 404
> (
>
> https://nightlies.apache.org/flink/flink-docs-release-1.15/release-notes/flink-1.15/
> )
>
> They are also not linked on the main page:
> https://nightlies.apache.org/flink/flink-docs-release-1.15/
>
> Keep it up!
> Peter
>
>
> On Thu, May 5, 2022 at 8:43 AM Martijn Visser 
> wrote:
>
> > Thank you Yun Gao, Till and Joe for driving this release. Your efforts
> are
> > greatly appreciated!
> >
> > To everyone who has opened Jira tickets, provided PRs, reviewed code,
> > written documentation or anything contributed in any other way, this
> > release was (once again) made possible by you! Thank you.
> >
> > Best regards,
> >
> > Martijn
> >
> > Op do 5 mei 2022 om 08:38 schreef Yun Gao 
> >
> >> The Apache Flink community is very happy to announce the release of
> >> Apache Flink 1.15.0, which is the first release for the Apache Flink
> >> 1.15 series.
> >>
> >> Apache Flink® is an open-source stream processing framework for
> >> distributed, high-performing, always-available, and accurate data
> >> streaming applications.
> >>
> >> The release is available for download at:
> >> https://flink.apache.org/downloads.html
> >>
> >> Please check out the release blog post for an overview of the
> >> improvements for this release:
> >> https://flink.apache.org/news/2022/05/05/1.15-announcement.html
> >>
> >> The full release notes are available in Jira:
> >>
> >>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350442
> >>
> >> We would like to thank all contributors of the Apache Flink community
> >> who made this release possible!
> >>
> >> Regards,
> >> Joe, Till, Yun Gao
> >>
> > --
> >
> > Martijn Visser | Product Manager
> >
> > mart...@ververica.com
> >
> > 
> >
> >
> > Follow us @VervericaData
> >
> > --
> >
> > Join Flink Forward  - The Apache Flink
> > Conference
> >
> > Stream Processing | Event Driven | Real Time
> >
> >
>


Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

2022-05-05 Thread Guowei Ma
Hi,

We know that for bounded input Flink supports the Batch execution mode.
Currently in Batch execution mode, Flink executes the job on a
stage-by-stage basis, so watermark alignment might not gain much there.

So my question is: is watermark alignment the default behavior (for
sources that implement it)? If so, have you considered evaluating the
impact of this behavior on the Batch execution mode, or do you think that
is not necessary?

Correct me if I missed something.

Best,
Guowei


On Thu, May 5, 2022 at 1:01 PM Piotr Nowojski 
wrote:

> Hi Becket and Dawid,
>
> > I feel that no matter which option we choose this can not be solved
> entirely in either of the options, because of the point above and because
> the signature of SplitReader#pauseOrResumeSplits and
> SourceReader#pauseOrResumeSplits are slightly different (one identifies
> splits with splitId the other one passes the splits directly).
>
> Yes, that's a good point in this case and for features that need to be
> implemented in more than one place.
>
> > Is there any reason for pausing reading from a split an optional feature,
> > other than that this was not included in the original interface?
>
> An additional argument in favor of making it optional is to simplify source
> implementation. But on its own I'm not sure if that would be enough to
> justify making this feature optional. Maybe.
>
> > I think it would be way simpler and clearer to just let end users and
> Flink
> > assume all the connectors will implement this feature.
>
> As I wrote above that would be an interesting choice to make (ease of
> implementation for new users, vs system consistency). Regardless of that,
> yes, for me the main argument is the API backward compatibility. But let's
> clear a couple of points:
> - The current proposal adding methods to the base interface with default
> implementations is an OPTIONAL feature. Same as the decorative version
> would be.
> - Decorative version could implement "throw UnsupportedOperationException"
> if user enabled watermark alignment just as well and I agree that's a
> better option compared to logging a warning.
>
> Best,
> Piotrek
>
>
> śr., 4 maj 2022 o 15:40 Becket Qin  napisał(a):
>
> > Thanks for the reply and patient discussion, Piotr and Dawid.
> >
> > Is there any reason for pausing reading from a split an optional feature,
> > other than that this was not included in the original interface?
> >
> > To be honest I am really worried about the complexity of the user story
> > here. Optional features like this have a high overhead. Imagine this
> > feature is optional, now a user enabled watermark alignment and defined a
> > few watermark groups. Would it work? Hmm, that depends on whether the
> > involved Source has implemented this feature. If the Sources are well
> > documented, good luck. Otherwise end users may have to look into the code
> > of the Source to see whether the feature is supported. Which is something
> > they shouldn't have to do.
> >
> > I think it would be way simpler and clearer to just let end users and
> Flink
> > assume all the connectors will implement this feature. After all the
> > watermark group is not optional to the end users. If in some rare cases,
> > the feature cannot be supported, a clear UnsupportedOperationException
> will
> > be thrown to tell users to explicitly remove this Source from the
> watermark
> > group. I don't think we should have a warning message here, as they tend
> to
> > be ignored in many cases. If we do this, we don't even need the
> supportXXX
> > method in the Source for this feature. In fact this is exactly how many
> > interfaces works today. For example, SplitEnumerator#addSplitsBack() is
> not
> > supported by Pravega source because it does not support partial failover.
> > In that case, it simply throws an exception to trigger a global recovery.
> >
> > The reason we add a default implementation in this case would just for
> the
> > sake of backwards compatibility so the old source can still compile.
> Sure,
> > in short term, this feature might not be supported by many existing
> > sources. That is OK, and it is quite visible to the source developers
> that
> > they did not override the default impl which throws an
> > UnsupportedOperationException.
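The backward-compatibility pattern discussed in this thread (adding a new method to an existing interface with a default implementation that throws UnsupportedOperationException) can be sketched as follows. This is a minimal illustration only; the interface and method names are simplified stand-ins, not Flink's actual SplitReader API.

```java
// Minimal sketch of the backward-compatibility pattern under discussion.
// NOT Flink's actual SplitReader API; names are simplified stand-ins.
interface SplitReaderLike {
    String fetch();

    // New optional method with a backward-compatible default: old
    // implementations still compile, but fail fast when the feature is used.
    default void pauseOrResumeSplits(java.util.List<String> splitsToPause,
                                     java.util.List<String> splitsToResume) {
        throw new UnsupportedOperationException(
                "This split reader does not support pausing splits; "
                + "remove this source from the watermark alignment group.");
    }
}

public class PauseResumeSketch {
    public static void main(String[] args) {
        // A "legacy" reader that never overrode the new method.
        SplitReaderLike legacyReader = () -> "record";
        try {
            legacyReader.pauseOrResumeSplits(
                    java.util.List.of("split-0"), java.util.List.of());
        } catch (UnsupportedOperationException e) {
            // The framework can surface a clear error to the user instead of
            // logging a warning that tends to be ignored.
            System.out.println("unsupported: " + (e.getMessage() != null));
        }
    }
}
```

The default body is what makes the feature optional for implementers while keeping old sources source-compatible, which is the trade-off debated above.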
> >
> > @Dawid,
> >
> > the Java doc of the SupportXXX() method in the Source would be the single
> > >> source of truth regarding how to implement this feature.
> > >
> > >
> >
> > I also don't find it entirely true. Half of the classes are theoretically
> > > optional and are utility classes from the point of view how the
> > interfaces
> > > are organized. Theoretically users do not need to use any of
> > > SourceReaderBase & SplitReader. Would be weird to list their methods in
> > the
> > > Source interface.
> >
> > I think the ultimate goal of java docs is to guide users to implement the
> > Source. If SourceReaderBase is the preferred way to implement a
> > SourceReader, it seems worth mentioning 

Re: [ANNOUNCE] New Flink PMC member: Yang Wang

2022-05-05 Thread Guowei Ma
Congratulations!
Best,
Guowei


On Thu, May 5, 2022 at 9:01 PM Jiangang Liu 
wrote:

> Congratulations!
>
> Best
> Liu Jiangang
>
> Marios Trivyzas  于2022年5月5日周四 20:47写道:
>
> > Congrats Yang!
> >
> > On Thu, May 5, 2022, 15:29 Yuan Mei  wrote:
> >
> > > Congrats and well deserved, Yang!
> > >
> > > Best,
> > > Yuan
> > >
> > > On Thu, May 5, 2022 at 8:21 PM Nicholas Jiang <
> nicholasji...@apache.org>
> > > wrote:
> > >
> > > > Congrats Yang!
> > > >
> > > > Best regards,
> > > > Nicholas Jiang
> > > >
> > > > On 2022/05/05 11:18:10 Xintong Song wrote:
> > > > > Hi all,
> > > > >
> > > > > I'm very happy to announce that Yang Wang has joined the Flink PMC!
> > > > >
> > > > > Yang has been consistently contributing to our community, by
> > > contributing
> > > > > codes, participating in discussions, mentoring new contributors,
> > > > answering
> > > > > questions on mailing lists, and giving talks on Flink at
> > > > > various conferences and events. He is one of the main contributors
> > and
> > > > > maintainers in Flink's Native Kubernetes / Yarn integrations and
> the
> > > > Flink
> > > > > Kubernetes Operator.
> > > > >
> > > > > Congratulations and welcome, Yang!
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song (On behalf of the Apache Flink PMC)
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] FLIP-168: Speculative Execution for Batch Job

2022-05-26 Thread Guowei Ma
+1 (binding)
Best,
Guowei


On Fri, May 27, 2022 at 12:41 AM Shqiprim Bunjaku 
wrote:

> +1 (non-binding)
>
> Best Regards
> Shqiprim
>
> On Thu, May 26, 2022 at 6:22 PM rui fan <1996fan...@gmail.com> wrote:
>
> > Hi
> >
> > +1(non-binding), it’s very useful for batch job stability.
> >
> > Best wishes
> > fanrui
> >
> > On Thu, May 26, 2022 at 15:56 Zhu Zhu  wrote:
> >
> > > Hi everyone,
> > >
> > > Thanks for the feedback on FLIP-168: Speculative Execution for Batch
> > > Job [1] on the discussion thread [2].
> > >
> > > I'd like to start a vote for it. The vote will last for at least 72
> hours
> > > unless there is an objection or insufficient votes.
> > >
> > > [1]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-168%3A+Speculative+execution+for+Batch+Job
> > > [2] https://lists.apache.org/thread/ot352tp8t7mclzx9zfv704gcm0fwrq58
> > >
> >
>


Re: [ANNOUNCE] New Apache Flink PMC Member - Jingsong Lee

2022-06-15 Thread Guowei Ma
Congrats, Jingsong!

Best,
Guowei


On Thu, Jun 16, 2022 at 9:49 AM Hangxiang Yu  wrote:

> Congrats, Jingsong!
>
> Best,
> Hangxiang
>
> On Thu, Jun 16, 2022 at 9:46 AM Aitozi  wrote:
>
> > Congrats, Jingsong!
> >
> > Best,
> > Aitozi
> >
> > Zhuoluo Yang  于2022年6月16日周四 09:26写道:
> >
> > > Many congratulations to teacher Lee!
> > >
> > > Thanks,
> > > Zhuoluo
> > >
> > >
> > > Dian Fu  于2022年6月16日周四 08:54写道:
> > >
> > > > Congratulations, Jingsong!
> > > >
> > > > Regards,
> > > > Dian
> > > >
> > > > On Thu, Jun 16, 2022 at 1:08 AM Yu Li  wrote:
> > > >
> > > > > Congrats, Jingsong!
> > > > >
> > > > > Best Regards,
> > > > > Yu
> > > > >
> > > > >
> > > > > On Wed, 15 Jun 2022 at 15:26, Sergey Nuyanzin  >
> > > > wrote:
> > > > >
> > > > > > Congratulations, Jingsong!
> > > > > >
> > > > > > On Wed, Jun 15, 2022 at 8:45 AM Jingsong Li <
> > jingsongl...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks everyone.
> > > > > > >
> > > > > > > It's great to be with you in the Flink community!
> > > > > > >
> > > > > > > Best,
> > > > > > > Jingsong
> > > > > > >
> > > > > > > On Wed, Jun 15, 2022 at 2:11 PM Yun Gao
> > >  > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Congratulations, Jingsong!
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Yun Gao
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > --
> > > > > > > > From:Jing Zhang 
> > > > > > > > Send Time:2022 Jun. 14 (Tue.) 11:05
> > > > > > > > To:dev 
> > > > > > > > Subject:Re: [ANNOUNCE] New Apache Flink PMC Member - Jingsong
> > Lee
> > > > > > > >
> > > > > > > > Congratulations, Jingsong!
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Jing Zhang
> > > > > > > >
> > > > > > > > Leonard Xu  于2022年6月14日周二 10:54写道:
> > > > > > > >
> > > > > > > > > Congratulations, Jingsong!
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Leonard
> > > > > > > > >
> > > > > > > > > > 2022年6月13日 下午6:52,刘首维  写道:
> > > > > > > > > >
> > > > > > > > > > Congratulations and well deserved, Jingsong!
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Best regards,
> > > > > > > > > > Shouwei
> > > > > > > > > > -- 原始邮件 --
> > > > > > > > > > 发件人:
> > > > > > > > > "dev"
> > > > > > > > >   <
> > > > > > > luoyu...@alumni.sjtu.edu.cn
> > > > > > > > > >;
> > > > > > > > > > 发送时间: 2022年6月13日(星期一) 晚上6:09
> > > > > > > > > > 收件人: "dev" > > > > dev@flink.apache.org
> > > > > > > >>;
> > > > > > > > > >
> > > > > > > > > > 主题: Re: [ANNOUNCE] New Apache Flink PMC Member -
> > > Jingsong
> > > > > Lee
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Congratulations, Jingsong!
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Best regards,
> > > > > > > > > > Yuxia
> > > > > > > > > >
> > > > > > > > > > - 原始邮件 -
> > > > > > > > > > 发件人: "Yun Tang"  > > > > > > > > > 收件人: "dev"  > > > > > > > > > 发送时间: 星期一, 2022年 6 月 13日 下午 6:12:24
> > > > > > > > > > 主题: Re: [ANNOUNCE] New Apache Flink PMC Member - Jingsong
> > Lee
> > > > > > > > > >
> > > > > > > > > > Congratulations, Jingsong! Well deserved.
> > > > > > > > > >
> > > > > > > > > > Best
> > > > > > > > > > Yun Tang
> > > > > > > > > > 
> > > > > > > > > > From: Xingbo Huang  > > > > > > > > > Sent: Monday, June 13, 2022 17:39
> > > > > > > > > > To: dev  > > > > > > > > > Subject: Re: [ANNOUNCE] New Apache Flink PMC Member -
> > > Jingsong
> > > > > Lee
> > > > > > > > > >
> > > > > > > > > > Congratulations, Jingsong!
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Xingbo
> > > > > > > > > >
> > > > > > > > > > Jane Chan  > 17:23写道:
> > > > > > > > > >
> > > > > > > > > > > Congratulations, Jingsong!
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Jane Chan
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Jun 13, 2022 at 4:43 PM Shuo Cheng <
> > > > > > njucs...@gmail.com
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Congratulations, Jingsong!
> > > > > > > > > > > >
> > > > > > > > > > > > On 6/13/22, Paul Lam  > >  > > > > > > > > paullin3...@gmail.com>> wrote:
> > > > > > > > > > > > > Congrats, Jingsong! Well deserved!
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Paul Lam
> > > > > > > > > > > > >
> > > > > > > > > > > > >> 2022年6月13日 16:31,Lincoln Lee <
> > > > > > > lincoln.8...@gmail.com
> > > > > > > > > > 写道:
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Congratulations, Jingsong!
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Best,
> > > > > > > > > > > > >> Lincoln Lee
> > > > > > > > > > > > >>
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Jar

Re: [DISCUSS] FLIP-245: Source Supports Speculative Execution For Batch Job

2022-06-29 Thread Guowei Ma
Hi, Jing

Thanks a lot for writing this FLIP, which is very useful to Batch users.
Currently I have only two small questions:

1. First of all, please complete the fault-tolerant processing flow in the
FLIP. (Maybe you've already considered it, but it's better to explicitly
give the specific solution in the FLIP.)
For example, how should the source `Reader` be handled in case of an
error? As far as I know, once a reader becomes unavailable, no new split
can be allocated to it, which may be unacceptable in the case of
speculative execution.

2. Secondly, the FLIP only says that user-defined events are not
supported, but it does not explain how to deal with the existing
ReportedWatermarkEvent/ReaderRegistrationEvent. After all, in the case of
speculative execution, two "same" tasks may be executing at the same time.
Whether these repeated events really have no effect on the execution of
the job still needs a clear evaluation.
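To illustrate the second concern: with speculative execution, a coordinator would have to treat such repeated events idempotently, tracking attempts rather than just subtasks. The following is a hypothetical sketch under assumed names; it does not use Flink's actual SplitEnumerator or source-event classes.

```java
import java.util.*;

// Hypothetical sketch of idempotent handling of duplicated reader
// registration events (assumed names, not Flink's actual coordinator API).
// With speculative execution, two attempts of the same subtask may both
// register, so the coordinator keys registrations by (subtask, attempt).
public class RegistrationSketch {
    // subtaskId -> set of attempt numbers already registered
    private final Map<Integer, Set<Integer>> registrations = new HashMap<>();

    /** Returns true only for the first registration of a given attempt. */
    public boolean handleReaderRegistration(int subtaskId, int attemptNumber) {
        return registrations
                .computeIfAbsent(subtaskId, k -> new HashSet<>())
                .add(attemptNumber);
    }

    public static void main(String[] args) {
        RegistrationSketch coord = new RegistrationSketch();
        System.out.println(coord.handleReaderRegistration(0, 0)); // true
        System.out.println(coord.handleReaderRegistration(0, 1)); // true: speculative attempt
        System.out.println(coord.handleReaderRegistration(0, 1)); // false: duplicate event
    }
}
```

Whether this kind of deduplication is actually sufficient for ReportedWatermarkEvent/ReaderRegistrationEvent is exactly what the FLIP would need to evaluate.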

Best,
Guowei


On Fri, Jun 24, 2022 at 5:41 PM Jing Zhang  wrote:

> Hi all,
> One major problem of Flink batch jobs is slow tasks running on hot/bad
> nodes, resulting in very long execution time.
>
> In order to solve this problem, FLIP-168: Speculative Execution for Batch
> Job[1] is introduced and approved recently.
>
> Here, Zhu Zhu and I propose to support speculative execution of sources as
> one of follow up of FLIP-168. You could find more details in FLIP-245[2].
> Looking forward to your feedback.
>
> Best,
> Jing Zhang
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-168%3A+Speculative+Execution+for+Batch+Job#FLIP168:SpeculativeExecutionforBatchJob-NointegrationwithFlink'swebUI
>
> [2]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-245%3A+Source+Supports+Speculative+Execution+For+Batch+Job
>


Re: [VOTE] FLIP-245: Source Supports Speculative Execution For Batch Job

2022-07-05 Thread Guowei Ma
+1 (binding)
Best,
Guowei


On Tue, Jul 5, 2022 at 12:38 PM Jiangang Liu 
wrote:

> +1 for the feature.
>
> Jing Zhang  于2022年7月5日周二 11:43写道:
>
> > Hi all,
> >
> > I'd like to start a vote for FLIP-245: Source Supports Speculative
> > Execution For Batch Job[1] on the discussion thread [2].
> >
> > The vote will last for at least 72 hours unless there is an objection or
> > insufficient votes.
> >
> > Best,
> > Jing Zhang
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-245%3A+Source+Supports+Speculative+Execution+For+Batch+Job
> > [2] https://lists.apache.org/thread/zvc5no4yxvwkto7xxpw1vo7j1p6h0lso
> >
>


Re: [VOTE] Accept Flink CDC into Apache Flink

2024-01-09 Thread Guowei Ma
+1 (binding)
Best,
Guowei


On Tue, Jan 9, 2024 at 4:49 PM Rui Fan <1996fan...@gmail.com> wrote:

> +1 (non-binding)
>
> Best,
> Rui
>
> On Tue, Jan 9, 2024 at 4:41 PM Hang Ruan  wrote:
>
> > +1 (non-binding)
> >
> > Best,
> > Hang
> >
> > gongzhongqiang  于2024年1月9日周二 16:25写道:
> >
> > > +1 non-binding
> > >
> > > Best,
> > > Zhongqiang
> > >
> > > Leonard Xu  于2024年1月9日周二 15:05写道:
> > >
> > > > Hello all,
> > > >
> > > > This is the official vote on whether to accept the Flink CDC code
> > > > contribution to Apache Flink.
> > > >
> > > > The current Flink CDC code, documentation, and website can be
> > > > found here:
> > > > code: https://github.com/ververica/flink-cdc-connectors <
> > > > https://github.com/ververica/flink-cdc-connectors>
> > > > docs: https://ververica.github.io/flink-cdc-connectors/ <
> > > > https://ververica.github.io/flink-cdc-connectors/>
> > > >
> > > > This vote should capture whether the Apache Flink community is
> > interested
> > > > in accepting, maintaining, and evolving Flink CDC.
> > > >
> > > > Regarding my original proposal[1] in the dev mailing list, I firmly
> > > believe
> > > > that this initiative aligns perfectly with Flink. For the Flink
> > > community,
> > > > it represents an opportunity to bolster Flink's competitive edge in
> > > > streaming
> > > > data integration, fostering the robust growth and prosperity of the
> > > Apache
> > > > Flink
> > > > ecosystem. For the Flink CDC project, becoming a sub-project of
> Apache
> > > > Flink
> > > > means becoming an integral part of a neutral open-source community,
> > > > capable of
> > > > attracting a more diverse pool of contributors.
> > > >
> > > > All Flink CDC maintainers are dedicated to continuously contributing
> to
> > > > achieve
> > > > seamless integration with Flink. Additionally, PMC members like Jark,
> > > > Qingsheng,
> > > > and I are willing to facilitate the expansion of contributors and
> > > > committers to
> > > > effectively maintain this new sub-project.
> > > >
> > > > This is a "Adoption of a new Codebase" vote as per the Flink bylaws
> > [2].
> > > > Only PMC votes are binding. The vote will be open at least 7 days
> > > > (excluding weekends), meaning until Thursday January 18 12:00 UTC, or
> > > > until we
> > > > achieve the 2/3rd majority. We will follow the instructions in the
> > Flink
> > > > Bylaws
> > > > in the case of insufficient active binding voters:
> > > >
> > > > > 1. Wait until the minimum length of the voting passes.
> > > > > 2. Publicly reach out via personal email to the remaining binding
> > > voters
> > > > in the
> > > > voting mail thread for at least 2 attempts with at least 7 days
> between
> > > > two attempts.
> > > > > 3. If the binding voter being contacted still failed to respond
> after
> > > > all the attempts,
> > > > the binding voter will be considered as inactive for the purpose of
> > this
> > > > particular voting.
> > > >
> > > > Welcome voting !
> > > >
> > > > Best,
> > > > Leonard
> > > > [1] https://lists.apache.org/thread/o7klnbsotmmql999bnwmdgo56b6kxx9l
> > > > [2]
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120731026
> > >
> >
>


Re: [DISCUSS] FLIP-408: [Umbrella] Introduce DataStream API V2

2024-02-25 Thread Guowei Ma
Hi,weijie

Thank you very much, Weijie, for proposing this series of improvements,
especially the complete decoupling of the user-facing API from the
implementation. That coupling is actually a very serious problem that
disturbs downstream users in the community, and I hope it can be
completely solved in the future.

However, regarding the API decoupling part, I have a question: do
connectors and SQL currently have similar problems? If so, will similar
methods be used to solve them?

Best,
Guowei


On Tue, Feb 20, 2024 at 3:10 PM weijie guo 
wrote:

> Hi All,
>
> Thanks for all the feedback.
>
> If there are no more comments, I would like to start the vote thread,
> thanks again!
>
> Best regards,
>
> Weijie
>
>
> Xintong Song  于2024年1月30日周二 11:04写道:
>
> > Thanks for working on this, Weijie.
> >
> > The design flaws of the current DataStream API (i.e., V1) have been a
> pain
> > for a long time. It's great to see efforts going on trying to resolve
> them.
> >
> > Significant changes to such an important and comprehensive set of public
> > APIs deserves caution. From that perspective, the ideas of introducing a
> > new set of APIs that gradually replace the current one, splitting the
> > introducing of the new APIs into many separate FLIPs, and making
> > intermediate APIs @Experiemental until all of them are completed make
> > great sense to me.
> >
> > Besides, the ideas of generalized watermark, execution hints sound quite
> > interesting. Looking forward to more detailed discussions in the
> > corresponding sub-FLIPs.
> >
> > +1 for the roadmap.
> >
> > Best,
> >
> > Xintong
> >
> >
> >
> > On Tue, Jan 30, 2024 at 11:00 AM weijie guo 
> > wrote:
> >
> > > Hi Wencong:
> > >
> > > > The Processing TimerService is currently
> > > defined as one of the basic primitives, partly because it's understood
> > that
> > > you have to choose between processing time and event time.
> > > The other part of the reason is that it needs to work based on the
> task's
> > > mailbox thread model to avoid concurrency issues. Could you clarify the
> > > second
> > > part of the reason?
> > >
> > > Since the processing logic of the operators takes place in the mailbox
> > > thread, the processing timer's callback function must also be executed
> in
> > > the mailbox to ensure thread safety.
> > > If we do not define the Processing TimerService as primitive, there is
> no
> > > way for the user to dispatch custom logic to the mailbox thread.
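The mailbox model described above can be sketched roughly as follows. This is a simplified illustration, not Flink's actual Mailbox implementation; all class and method names here are invented for the example.

```java
import java.util.concurrent.*;

// Simplified illustration of the mailbox model described above. NOT Flink's
// actual Mailbox implementation; all names here are invented for the example.
public class MailboxSketch {
    final BlockingQueue<Runnable> mailbox = new LinkedBlockingQueue<>();
    final ScheduledExecutorService timerThread =
            Executors.newSingleThreadScheduledExecutor();
    long count = 0; // "operator state": touched only by the mailbox thread

    // The timer fires on timerThread, but the callback is only *enqueued*;
    // it runs later on the mailbox thread, so no locking is needed.
    void registerTimer(long delayMs, Runnable callback) {
        timerThread.schedule(() -> mailbox.add(callback),
                delayMs, TimeUnit.MILLISECONDS);
    }

    // The single processing loop: records, events and timer callbacks all
    // execute here, one at a time, which is what guarantees thread safety.
    void runMailbox(int steps) {
        for (int i = 0; i < steps; i++) {
            try {
                mailbox.take().run();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args) {
        MailboxSketch task = new MailboxSketch();
        task.registerTimer(10, () -> task.count++);
        task.registerTimer(20, () -> System.out.println("count = " + task.count));
        task.runMailbox(2); // prints "count = 1"
        task.timerThread.shutdown();
    }
}
```

Because the callback mutates operator state only from the mailbox loop, no synchronization is needed, which is why the processing timer service has to be a primitive of the framework rather than something users could build themselves.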
> > >
> > >
> > > Best regards,
> > >
> > > Weijie
> > >
> > >
> > > Xuannan Su  于2024年1月29日周一 17:12写道:
> > >
> > > > Hi Weijie,
> > > >
> > > > Thanks for driving the work! There are indeed many pain points in the
> > > > current DataStream API, which are challenging to resolve with its
> > > > existing design. It is a great opportunity to propose a new
> DataStream
> > > > API that tackles these issues. I like the way we've divided the FLIP
> > > > into multiple sub-FLIPs; the roadmap is clear and comprehensible. +1
> > > > for the umbrella FLIP. I am eager to see the sub-FLIPs!
> > > >
> > > > Best regards,
> > > > Xuannan
> > > >
> > > >
> > > >
> > > >
> > > > On Wed, Jan 24, 2024 at 8:55 PM Wencong Liu 
> > > wrote:
> > > > >
> > > > > Hi Weijie,
> > > > >
> > > > >
> > > > > Thank you for the effort you've put into the DataStream API ! By
> > > > reorganizing and
> > > > > redesigning the DataStream API, as well as addressing some of the
> > > > unreasonable
> > > > > designs within it, we can enhance the efficiency of job development
> > for
> > > > developers.
> > > > > It also allows developers to design more flexible Flink jobs to
> meet
> > > > business requirements.
> > > > >
> > > > >
> > > > > I have conducted a comprehensive review of the DataStream API
> design
> > in
> > > > versions
> > > > > 1.18 and 1.19. I found quite a few functional defects in the
> > DataStream
> > > > API, such as the
> > > > > lack of corresponding APIs in batch processing scenarios. In the
> > > > upcoming 1.20 version,
> > > > > I will further improve the DataStream API in batch computing
> > scenarios.
> > > > >
> > > > >
> > > > > The issues existing in the old DataStream API (which can be
> referred
> > to
> > > > as V1) can be
> > > > > addressed from a design perspective in the initial version of V2. I
> > > hope
> > > > to also have the
> > > > >  opportunity to participate in the development of DataStream V2 and
> > > make
> > > > my contribution.
> > > > >
> > > > >
> > > > > Regarding FLIP-408, I have a question: The Processing TimerService
> is
> > > > currently
> > > > > defined as one of the basic primitives, partly because it's
> > understood
> > > > that
> > > > > you have to choose between processing time and event time.
> > > > > The other part of the reason is that it needs to work based on the
> > > task's
> > > > > mailbox thread model to avoid concurrency issues. Could you clarify
> > the
> > > > second
> > > > > part of the reason?
> > > > >
> > > > > Best,
> > > > > Wencong Liu
>

Re: [DISCUSS] FLIP-408: [Umbrella] Introduce DataStream API V2

2024-02-26 Thread Guowei Ma
Hi,

Thanks for your reply!
I have no other comments!

Best,
Guowei


On Mon, Feb 26, 2024 at 3:43 PM weijie guo 
wrote:

> Hi Guowei,
>
> thanks for your reply!
>
> > Do connectors and SQL currently have similar problems?
>
> - Connectors:
> The APIs for FLIP-27 based source and Sink-V2 are currently in flink-core,
> and we will gradually move them to flink-core-api. Anyway, connector should
> be free of this problem.
>
> - SQL/Table:
>
> For pure SQL job, the user does not need to have any dependencies when
> writing the SQL query string.
>
> But for the Table API, it relies on the original DataStream(i.e. DataStream
> V1), so the implementation is not decoupled from the API.
>
> In the future, we will consider building the Table API on top of DataStream
> API V2, which solves this problem also.
>
>
> Best regards,
>
> Weijie
>
>
> Guowei Ma  于2024年2月26日周一 14:37写道:
>
> > Hi,weijie
> >
> > Thank you very much to Weijie for proposing this series of improvements,
> > especially the complete decoupling of user interface and implementation.
> > This part is actually a very serious problem that disturbs downstream
> users
> > in the community. I hope this problem can be completely solved in the
> > future.
> >
> > However, regarding the API decoupling part, I have a question: Do
> > connectors and SQL currently have similar problems? If so, will similar
> > methods be used to solve them?
> >
> > Best,
> > Guowei
> >
> >
> > On Tue, Feb 20, 2024 at 3:10 PM weijie guo 
> > wrote:
> >
> > > Hi All,
> > >
> > > Thanks for all the feedback.
> > >
> > > If there are no more comments, I would like to start the vote thread,
> > > thanks again!
> > >
> > > Best regards,
> > >
> > > Weijie
> > >
> > >
> > > Xintong Song  于2024年1月30日周二 11:04写道:
> > >
> > > > Thanks for working on this, Weijie.
> > > >
> > > > The design flaws of the current DataStream API (i.e., V1) have been a
> > > pain
> > > > for a long time. It's great to see efforts going on trying to resolve
> > > them.
> > > >
> > > > Significant changes to such an important and comprehensive set of
> > public
> > > > APIs deserves caution. From that perspective, the ideas of
> introducing
> > a
> > > > new set of APIs that gradually replace the current one, splitting the
> > > > introducing of the new APIs into many separate FLIPs, and making
> > > > intermediate APIs @Experiemental until all of them are completed make
> > > > great sense to me.
> > > >
> > > > Besides, the ideas of generalized watermark, execution hints sound
> > quite
> > > > interesting. Looking forward to more detailed discussions in the
> > > > corresponding sub-FLIPs.
> > > >
> > > > +1 for the roadmap.
> > > >
> > > > Best,
> > > >
> > > > Xintong
> > > >
> > > >
> > > >
> > > > On Tue, Jan 30, 2024 at 11:00 AM weijie guo <
> guoweijieres...@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Hi Wencong:
> > > > >
> > > > > > The Processing TimerService is currently
> > > > > defined as one of the basic primitives, partly because it's
> > understood
> > > > that
> > > > > you have to choose between processing time and event time.
> > > > > The other part of the reason is that it needs to work based on the
> > > task's
> > > > > mailbox thread model to avoid concurrency issues. Could you clarify
> > the
> > > > > second
> > > > > part of the reason?
> > > > >
> > > > > Since the processing logic of the operators takes place in the
> > mailbox
> > > > > thread, the processing timer's callback function must also be
> > executed
> > > in
> > > > > the mailbox to ensure thread safety.
> > > > > If we do not define the Processing TimerService as primitive, there
> > is
> > > no
> > > > > way for the user to dispatch custom logic to the mailbox thread.
> > > > >
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Weijie
> > > > >
> > > > >
> > > > > Xuanna

Re: [VOTE] FLIP-408: [Umbrella] Introduce DataStream API V2

2024-02-26 Thread Guowei Ma
+1(binding)
Best,
Guowei


On Tue, Feb 27, 2024 at 10:06 AM Rui Fan <1996fan...@gmail.com> wrote:

> +1(binding)
>
> Best,
> Rui
>
> On Tue, Feb 27, 2024 at 9:43 AM weijie guo 
> wrote:
>
> > +1(binding)
> >
> > Best regards,
> >
> > Weijie
> >
> >
> > Xintong Song  于2024年2月27日周二 09:36写道:
> >
> > > +1 (binding)
> > >
> > > Best,
> > >
> > > Xintong
> > >
> > >
> > >
> > > On Mon, Feb 26, 2024 at 6:08 PM weijie guo 
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > >
> > > > Thanks for all the feedback about the FLIP-408: [Umbrella] Introduce
> > > > DataStream API V2 [1]. The discussion thread is here [2].
> > > >
> > > >
> > > > The vote will be open for at least 72 hours unless there is an
> > > > objection or insufficient votes.
> > > >
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-408%3A+%5BUmbrella%5D+Introduce+DataStream+API+V2
> > > >
> > > > [2] https://lists.apache.org/thread/w8olky9s7fo5h8fl3nj3qbym307zk2l0
> > > >
> > > > Best regards,
> > > >
> > > > Weijie
> > > >
> > >
> >
>


Re: [VOTE] FLIP-410: Config, Context and Processing Timer Service of DataStream API V2

2024-02-26 Thread Guowei Ma
+1 (binding)
Best,
Guowei


On Tue, Feb 27, 2024 at 10:09 AM Xuannan Su  wrote:

> +1 (non-binding)
>
> Best,
> Xuannan
>
> On Tue, Feb 27, 2024 at 9:37 AM Xintong Song 
> wrote:
> >
> > +1 (binding)
> >
> > Best,
> >
> > Xintong
> >
> >
> >
> > On Mon, Feb 26, 2024 at 6:10 PM weijie guo 
> > wrote:
> >
> > > Hi everyone,
> > >
> > >
> > > Thanks for all the feedback about the FLIP-410: Config, Context and
> > > Processing Timer Service of DataStream API V2 [1]. The discussion
> > > thread is here [2].
> > >
> > >
> > > The vote will be open for at least 72 hours unless there is an
> > > objection or insufficient votes.
> > >
> > >
> > > [1]
> > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-410%3A++Config%2C+Context+and+Processing+Timer+Service+of+DataStream+API+V2
> > >
> > > [2] https://lists.apache.org/thread/70gf028c5gsdb9qhsgpht0chzyp9nogc
> > >
> > >
> > > Best regards,
> > >
> > > Weijie
> > >
>


Re: [VOTE] FLIP-409: DataStream V2 Building Blocks: DataStream, Partitioning and ProcessFunction

2024-02-26 Thread Guowei Ma
+1 (binding)
Best,
Guowei


On Tue, Feb 27, 2024 at 10:08 AM Rui Fan <1996fan...@gmail.com> wrote:

> +1(binding)
>
> Best,
> Rui
>
> On Tue, Feb 27, 2024 at 9:44 AM weijie guo 
> wrote:
>
> > +1(binding)
> >
> > Best regards,
> >
> > Weijie
> >
> >
> > Xintong Song  于2024年2月27日周二 09:38写道:
> >
> > > +1 (binding)
> > >
> > > Best,
> > >
> > > Xintong
> > >
> > >
> > >
> > > On Mon, Feb 26, 2024 at 6:09 PM weijie guo 
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > >
> > > > Thanks for all the feedback about the FLIP-409: DataStream V2
> Building
> > > > Blocks: DataStream, Partitioning and ProcessFunction [1]. The
> > > > discussion thread is here [2].
> > > >
> > > >
> > > > The vote will be open for at least 72 hours unless there is an
> > > > objection or insufficient votes.
> > > >
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-409%3A+DataStream+V2+Building+Blocks%3A+DataStream%2C+Partitioning+and+ProcessFunction
> > > >
> > > > [2] https://lists.apache.org/thread/cwds0bwbgy3lfdgnlqbfhm6lfvx2qbrv
> > > >
> > > >
> > > > Best regards,
> > > >
> > > > Weijie
> > > >
> > >
> >
>


Re: Re: [ANNOUNCE] Apache Paimon is graduated to Top Level Project

2024-03-31 Thread Guowei Ma
Congratulations!
Best,
Guowei


On Mon, Apr 1, 2024 at 11:15 AM Feng Jin  wrote:

> Congratulations!
>
> Best,
> Feng Jin
>
> On Mon, Apr 1, 2024 at 10:51 AM weijie guo 
> wrote:
>
>> Congratulations!
>>
>> Best regards,
>>
>> Weijie
>>
>>
>> Hang Ruan  wrote on Mon, Apr 1, 2024 at 09:49:
>>
>> > Congratulations!
>> >
>> > Best,
>> > Hang
>> >
>> > Lincoln Lee  wrote on Sun, Mar 31, 2024 at 00:10:
>> >
>> > > Congratulations!
>> > >
>> > > Best,
>> > > Lincoln Lee
>> > >
>> > >
>> > > Jark Wu  wrote on Sat, Mar 30, 2024 at 22:13:
>> > >
>> > > > Congratulations!
>> > > >
>> > > > Best,
>> > > > Jark
>> > > >
>> > > > On Fri, 29 Mar 2024 at 12:08, Yun Tang  wrote:
>> > > >
>> > > > > Congratulations to all Paimon guys!
>> > > > >
>> > > > > Glad to see a Flink sub-project has been graduated to an Apache
>> > > top-level
>> > > > > project.
>> > > > >
>> > > > > Best
>> > > > > Yun Tang
>> > > > >
>> > > > > 
>> > > > > From: Hangxiang Yu 
>> > > > > Sent: Friday, March 29, 2024 10:32
>> > > > > To: dev@flink.apache.org 
>> > > > > Subject: Re: Re: [ANNOUNCE] Apache Paimon is graduated to Top
>> Level
>> > > > Project
>> > > > >
>> > > > > Congratulations!
>> > > > >
>> > > > > On Fri, Mar 29, 2024 at 10:27 AM Benchao Li > >
>> > > > wrote:
>> > > > >
>> > > > > > Congratulations!
>> > > > > >
>> > > > > > Zakelly Lan  wrote on Fri, Mar 29, 2024 at 10:25:
>> > > > > > >
>> > > > > > > Congratulations!
>> > > > > > >
>> > > > > > >
>> > > > > > > Best,
>> > > > > > > Zakelly
>> > > > > > >
>> > > > > > > On Thu, Mar 28, 2024 at 10:13 PM Jing Ge
>> > > > > > > >
>> > > > > > wrote:
>> > > > > > >
>> > > > > > > > Congrats!
>> > > > > > > >
>> > > > > > > > Best regards,
>> > > > > > > > Jing
>> > > > > > > >
>> > > > > > > > On Thu, Mar 28, 2024 at 1:27 PM Feifan Wang <
>> > zoltar9...@163.com>
>> > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > Congratulations!——
>> > > > > > > > >
>> > > > > > > > > Best regards,
>> > > > > > > > >
>> > > > > > > > > Feifan Wang
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > At 2024-03-28 20:02:43, "Yanfei Lei" > >
>> > > > wrote:
>> > > > > > > > > >Congratulations!
>> > > > > > > > > >
>> > > > > > > > > >Best,
>> > > > > > > > > >Yanfei
>> > > > > > > > > >
>> > > > > > > > > >Zhanghao Chen  wrote on Thu, Mar 28,
>> > > > 2024 at 19:59:
>> > > > > > > > > >>
>> > > > > > > > > >> Congratulations!
>> > > > > > > > > >>
>> > > > > > > > > >> Best,
>> > > > > > > > > >> Zhanghao Chen
>> > > > > > > > > >> 
>> > > > > > > > > >> From: Yu Li 
>> > > > > > > > > >> Sent: Thursday, March 28, 2024 15:55
>> > > > > > > > > >> To: d...@paimon.apache.org 
>> > > > > > > > > >> Cc: dev ; user <
>> > u...@flink.apache.org
>> > > >
>> > > > > > > > > >> Subject: Re: [ANNOUNCE] Apache Paimon is graduated to
>> Top
>> > > > Level
>> > > > > > > > Project
>> > > > > > > > > >>
>> > > > > > > > > >> CC the Flink user and dev mailing list.
>> > > > > > > > > >>
>> > > > > > > > > >> Paimon originated within the Flink community, initially
>> > > known
>> > > > as
>> > > > > > Flink
>> > > > > > > > > >> Table Store, and all our incubating mentors are
>> members of
>> > > the
>> > > > > > Flink
>> > > > > > > > > >> Project Management Committee. I am confident that the
>> > bonds
>> > > of
>> > > > > > > > > >> enduring friendship and close collaboration will
>> continue
>> > to
>> > > > > > unite the
>> > > > > > > > > >> two communities.
>> > > > > > > > > >>
>> > > > > > > > > >> And congratulations all!
>> > > > > > > > > >>
>> > > > > > > > > >> Best Regards,
>> > > > > > > > > >> Yu
>> > > > > > > > > >>
>> > > > > > > > > >> On Wed, 27 Mar 2024 at 20:35, Guojun Li <
>> > > > > gjli.schna...@gmail.com>
>> > > > > > > > > wrote:
>> > > > > > > > > >> >
>> > > > > > > > > >> > Congratulations!
>> > > > > > > > > >> >
>> > > > > > > > > >> > Best,
>> > > > > > > > > >> > Guojun
>> > > > > > > > > >> >
>> > > > > > > > > >> > On Wed, Mar 27, 2024 at 5:24 PM wulin <
>> > > ouyangwu...@163.com>
>> > > > > > wrote:
>> > > > > > > > > >> >
>> > > > > > > > > >> > > Congratulations~
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > > 2024年3月27日 15:54,王刚 > > .INVALID>
>> > > > 写道:
>> > > > > > > > > >> > > >
>> > > > > > > > > >> > > > Congratulations~
>> > > > > > > > > >> > > >
>> > > > > > > > > >> > > >> 2024年3月26日 10:25,Jingsong Li <
>> > jingsongl...@gmail.com
>> > > >
>> > > > > 写道:
>> > > > > > > > > >> > > >>
>> > > > > > > > > >> > > >> Hi Paimon community,
>> > > > > > > > > >> > > >>
>> > > > > > > > > >> > > >> I’m glad to announce that the ASF board has
>> > approved
>> > > a
>> > > > > > > > > resolution to
>> > > > > > > > > >> > > >> graduate Paimon into a full Top Level Project.
>> > Thanks
>> > > > to
>> > > > > > > > > everyone for
>> > > > > > > > > >> > > >> your help to get to this point.
>> > > > > > > > > >> > > >>
>> > > > > > > > > >> > > >> I just created an issue to track the things we
>> need
>> > > to
>> > 

Re: [ANNOUNCE] New Apache Flink Committer - Lijie Wang

2022-08-17 Thread Guowei Ma
Congratulations, Lijie. Welcome on board~!
Best,
Guowei


On Wed, Aug 17, 2022 at 6:25 PM Zhu Zhu  wrote:

> Hi everyone,
>
> On behalf of the PMC, I'm very happy to announce Lijie Wang as
> a new Flink committer.
>
> Lijie has been contributing to Flink project for more than 2 years.
> He mainly works on the runtime/coordination part, doing feature
> development, problem debugging and code reviews. He has also
> driven the work of FLIP-187(Adaptive Batch Scheduler) and
> FLIP-224(Blocklist for Speculative Execution), which are important
> to run batch jobs.
>
> Please join me in congratulating Lijie for becoming a Flink committer!
>
> Cheers,
> Zhu
>


Re: [ANNOUNCE] New Apache Flink Committer - Junhan Yang

2022-08-18 Thread Guowei Ma
Congratulations, Junhan!
Best,
Guowei


On Fri, Aug 19, 2022 at 6:01 AM Jing Ge  wrote:

> Congrats Junhan!
>
> Best regards,
> Jing
>
> On Thu, Aug 18, 2022 at 12:05 PM Jark Wu  wrote:
>
> > Congrats and welcome Junhan!
> >
> > Cheers,
> > Jark
> >
> > > On Aug 18, 2022, at 17:59, Timo Walther  wrote:
> > >
> > > Congratulations and welcome to the committer team :-)
> > >
> > > Regards,
> > > Timo
> > >
> > > On 18.08.22 07:19, Lijie Wang wrote:
> > >> Congratulations, Junhan!
> > >> Best,
> > >> Lijie
> > >> Leonard Xu  wrote on Thu, Aug 18, 2022 at 11:31:
> > >>> Congratulations, Junhan!
> > >>>
> > >>> Best,
> > >>>
> >  On Aug 18, 2022, at 11:27 AM, Zhipeng Zhang  wrote:
> > 
> >  Congratulations, Junhan!
> > 
> >  Xintong Song  wrote on Thu, Aug 18, 2022 at 11:21:
> > >
> > > Hi everyone,
> > >
> > > On behalf of the PMC, I'm very happy to announce Junhan Yang as a
> new
> > >>> Flink
> > > committer.
> > >
> > > Junhan has been contributing to the Flink project for more than 1
> > year.
> > >>> His
> > > contributions are mostly in the web frontend, including
> > > FLIP-241, FLIP-249 and various maintenance efforts of Flink's
> > frontend
> > > frameworks.
> > >
> > > Please join me in congratulating Junhan for becoming a Flink
> > committer!
> > >
> > > Best,
> > > Xintong
> > 
> > 
> > 
> >  --
> >  best,
> >  Zhipeng
> > >>>
> > >>>
> > >
> >
> >
>


Re: [ANNOUNCE] New Apache Flink PMC Member - Danny Cranmer

2022-11-01 Thread Guowei Ma
Congratulations Danny!
Best,
Guowei


On Tue, Nov 1, 2022 at 2:20 PM weijie guo  wrote:

> Congratulations Danny!
>
> Best regards,
>
> Weijie
>
>
> Maximilian Michels  wrote on Thu, Oct 13, 2022 at 21:41:
>
> > Congratulations Danny! Well deserved :)
> >
> > -Max
> >
> > On Thu, Oct 13, 2022 at 2:40 PM Yang Wang  wrote:
> >
> > > Congratulations Danny!
> > >
> > > Best,
> > > Yang
> > >
> > > Hang Ruan  wrote on Thu, Oct 13, 2022 at 10:58:
> > >
> > > > Congratulations Danny!
> > > >
> > > > Best,
> > > > Hang
> > > >
> > > > Yun Gao  wrote on Thu, Oct 13, 2022 at 10:56:
> > > >
> > > > > Congratulations Danny!
> > > > > Best,
> > > > > Yun Gao
> > > > > --
> > > > > From:yuxia 
> > > > > Send Time:2022 Oct. 12 (Wed.) 09:49
> > > > > To:dev 
> > > > > Subject:Re: [ANNOUNCE] New Apache Flink PMC Member - Danny Cranmer
> > > > > Congratulations Danny!
> > > > > Best regards,
> > > > > Yuxia
> > > > > - Original Message -
> > > > > From: "Xingbo Huang" 
> > > > > To: "dev" 
> > > > > Sent: Wednesday, October 12, 2022, 9:44:22 AM
> > > > > Subject: Re: [ANNOUNCE] New Apache Flink PMC Member - Danny Cranmer
> > > > > Congratulations Danny!
> > > > > Best,
> > > > > Xingbo
> > > > > Sergey Nuyanzin  wrote on Wed, Oct 12, 2022 at 01:26:
> > > > > > Congratulations, Danny
> > > > > >
> > > > > > On Tue, Oct 11, 2022, 15:18 Lincoln Lee 
> > > > wrote:
> > > > > >
> > > > > > > Congratulations Danny!
> > > > > > >
> > > > > > > Best,
> > > > > > > Lincoln Lee
> > > > > > >
> > > > > > >
> > > > > > > Congxian Qiu  wrote on Tue, Oct 11, 2022 at 19:42:
> > > > > > >
> > > > > > > > Congratulations Danny!
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Congxian
> > > > > > > >
> > > > > > > >
> > > > > > > > Leonard Xu  wrote on Tue, Oct 11, 2022 at 18:03:
> > > > > > > >
> > > > > > > > > Congratulations Danny!
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Leonard
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] FLIP-266: Simplify network memory configurations for TaskManager

2022-12-24 Thread Guowei Ma
Hi,
Thank you very much for driving this FLIP to improve usability for users.

I understand that a key goal of this FLIP is to adjust the memory
requirements of shuffle to a more reasonable range. Through this adaptive
range adjustment, memory efficiency can be improved while still ensuring
performance, thereby improving the user experience.

I have no problem with this goal, but I have a concern about the means of
implementation: should we introduce a _new_ non-orthogonal option
(`taskmanager.memory.network.required-buffer-per-gate.max`)? That is to
say, this option would affect both streaming and batch shuffle behavior at
the same time.

From the description in the FLIP, we can see that we do not want this value
to be the same in streaming and batch scenarios. But we still let the user
configure this parameter, and once it is configured, the shuffle behavior
of streaming and batch may become the same. In theory, there may be a
configuration that meets the requirements of batch shuffle but hurts the
performance of streaming shuffle (for example, reducing memory overhead in
batch scenarios at the cost of streaming shuffle performance). In other
words, do we really want to add a new option that exposes this risk?

  Personally, I think there might be two ways:
1. Modify the current implementation of streaming shuffle so that its
performance does not regress. In this way, this option will not couple
streaming shuffle and batch shuffle, which also avoids confusion for the
user. But I am not sure how to do it. :-)
2. Introduce a pure batch read option, similar to the one introduced on
the batch write side.
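To make option 2 concrete, here is a rough sketch in plain Java of how two independent read-side knobs could be resolved per shuffle mode. The option keys and the helper are purely hypothetical, for illustration only; they are not actual Flink configuration options:

```java
import java.util.HashMap;
import java.util.Map;

public class Main {
    // Hypothetical keys, mirroring the read/write split suggested above.
    static final String STREAM_READ_MAX = "taskmanager.network.memory.stream-shuffle.read-buffers.max";
    static final String BATCH_READ_MAX  = "taskmanager.network.memory.batch-shuffle.read-buffers.max";

    /** Resolve the effective read-buffer cap for the given shuffle mode. */
    static int effectiveReadBuffers(Map<String, Integer> conf, boolean isBatchShuffle, int defaultValue) {
        String key = isBatchShuffle ? BATCH_READ_MAX : STREAM_READ_MAX;
        return conf.getOrDefault(key, defaultValue);
    }

    public static void main(String[] args) {
        Map<String, Integer> conf = new HashMap<>();
        conf.put(BATCH_READ_MAX, 100); // tighten batch reads only
        System.out.println(effectiveReadBuffers(conf, true, 1000));  // 100
        System.out.println(effectiveReadBuffers(conf, false, 1000)); // 1000, streaming untouched
    }
}
```

With such a split, tightening the batch cap can never change streaming shuffle behavior, which is exactly the coupling concern raised above.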

BTW: It's better not to expose more implementation-related concepts to
users. For example, the "gate" is related to the internal implementation.
Relatively speaking, `shuffle.read/shuffle.client.read` may be more
general. After all, it can also avoid coupling with the topology structure
and scheduling units.

Best,
Guowei


On Fri, Dec 23, 2022 at 2:57 PM Lijie Wang  wrote:

> Hi,
>
> Thanks for driving this FLIP, +1 for the proposed changes.
>
> Limiting the maximum shuffle read memory is very useful when using the
> adaptive batch scheduler. Currently, the adaptive batch scheduler may
> cause a large number of input channels in a certain TM, so we generally
> recommend that users configure
> "taskmanager.network.memory.buffers-per-channel: 0" to decrease the
> possibility of an “Insufficient number of network buffers” error. After
> this FLIP, users no longer need to configure
> "taskmanager.network.memory.buffers-per-channel".
>
> So +1 from my side.
>
> Best,
> Lijie
>
> Xintong Song  wrote on Tue, Dec 20, 2022 at 10:04:
>
> > Thanks for the proposal, Yuxin.
> >
> > +1 for the proposed changes. I think these are indeed helpful usability
> > improvements.
> >
> > Best,
> >
> > Xintong
> >
> >
> >
> > On Mon, Dec 19, 2022 at 3:36 PM Yuxin Tan 
> wrote:
> >
> > > Hi, devs,
> > >
> > > I'd like to start a discussion about FLIP-266: Simplify network memory
> > > configurations for TaskManager[1].
> > >
> > > When using Flink, users may encounter the following issues that affect
> > > usability.
> > > 1. The job may fail with an "Insufficient number of network buffers"
> > > exception.
> > > 2. Flink network memory size adjustment is complex.
> > > When encountering these issues, users can solve some problems by adding
> > or
> > > adjusting parameters. However, multiple memory config options should be
> > > changed. The config option adjustment requires understanding the
> detailed
> > > internal implementation, which is impractical for most users.
> > >
> > > To simplify network memory configurations for TaskManager and improve
> > Flink
> > > usability, this FLIP proposed some optimization solutions for the
> issues.
> > >
> > > Looking forward to your feedback.
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-266%3A+Simplify+network+memory+configurations+for+TaskManager
> > >
> > > Best regards,
> > > Yuxin
> > >
> >
>


Re: [ANNOUNCE] New Apache Flink Committer - Lincoln Lee

2023-01-09 Thread Guowei Ma
Congratulations, Lincoln!

Best,
Guowei


On Tue, Jan 10, 2023 at 2:57 PM Biao Geng  wrote:

> Congrats, Lincoln!
> Best,
> Biao Geng
>
> Get Outlook for iOS
> 
> From: Wencong Liu 
> Sent: Tuesday, January 10, 2023 2:39:47 PM
> To: dev@flink.apache.org 
> Subject: Re: Re: [ANNOUNCE] New Apache Flink Committer - Lincoln Lee
>
> Congratulations, Lincoln!
>
> Best regards,
> Wencong
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> On 2023-01-10 13:25:09, "Yanfei Lei"  wrote:
> >Congratulations, well deserved!
> >
> >Best,
> >Yanfei
> >
> >Yuan Mei  wrote on Tue, Jan 10, 2023 at 13:16:
> >
> >> Congratulations, Lincoln!
> >>
> >> Best,
> >> Yuan
> >>
> >> On Tue, Jan 10, 2023 at 12:23 PM Lijie Wang 
> >> wrote:
> >>
> >> > Congratulations, Lincoln!
> >> >
> >> > Best,
> >> > Lijie
> >> >
> >> > Jingsong Li  wrote on Tue, Jan 10, 2023 at 12:07:
> >> >
> >> > > Congratulations, Lincoln!
> >> > >
> >> > > Best,
> >> > > Jingsong
> >> > >
> >> > > On Tue, Jan 10, 2023 at 11:56 AM Leonard Xu 
> wrote:
> >> > > >
> >> > > > Congratulations, Lincoln!
> >> > > >
> >> > > > Impressive work in streaming semantics, well deserved!
> >> > > >
> >> > > >
> >> > > > Best,
> >> > > > Leonard
> >> > > >
> >> > > >
> >> > > > > On Jan 10, 2023, at 11:52 AM, Jark Wu  wrote:
> >> > > > >
> >> > > > > Hi everyone,
> >> > > > >
> >> > > > > On behalf of the PMC, I'm very happy to announce Lincoln Lee as
> a
> >> new
> >> > > Flink
> >> > > > > committer.
> >> > > > >
> >> > > > > Lincoln Lee has been a long-term Flink contributor since 2017.
> He
> >> > > mainly
> >> > > > > works on Flink
> >> > > > > SQL parts and drives several important FLIPs, e.g., FLIP-232
> (Retry
> >> > > Async
> >> > > > > I/O), FLIP-234 (
> >> > > > > Retryable Lookup Join), FLIP-260 (TableFunction Finish).
> Besides,
> >> He
> >> > > also
> >> > > > > contributed
> >> > > > > much to Streaming Semantics, including the non-determinism
> problem
> >> > and
> >> > > the
> >> > > > > message
> >> > > > > ordering problem.
> >> > > > >
> >> > > > > Please join me in congratulating Lincoln for becoming a Flink
> >> > > committer!
> >> > > > >
> >> > > > > Cheers,
> >> > > > > Jark Wu
> >> > > >
> >> > >
> >> >
> >>
>


Re: [ANNOUNCE] New Apache Flink PMC Member - Weijie Guo

2024-06-04 Thread Guowei Ma
Congratulations!

Best,
Guowei


On Tue, Jun 4, 2024 at 4:55 PM gongzhongqiang 
wrote:

> Congratulations Weijie!
>
> Best,
> Zhongqiang Gong
>
> Xintong Song  wrote on Tue, Jun 4, 2024 at 14:46:
>
> > Hi everyone,
> >
> > On behalf of the PMC, I'm very happy to announce that Weijie Guo has
> joined
> > the Flink PMC!
> >
> > Weijie has been an active member of the Apache Flink community for many
> > years. He has made significant contributions in many components,
> including
> > runtime, shuffle, sdk, connectors, etc. He has driven / participated in
> > many FLIPs, authored and reviewed hundreds of PRs, been consistently
> active
> > on mailing lists, and also helped with release management of 1.20 and
> > several other bugfix releases.
> >
> > Congratulations and welcome Weijie!
> >
> > Best,
> >
> > Xintong (on behalf of the Flink PMC)
> >
>


Re: [ANNOUNCE] New Apache Flink Committer - Xuannan Su

2024-08-18 Thread Guowei Ma
Congratulations!

Best,
Guowei


On Mon, Aug 19, 2024 at 8:24 AM Luke Chen  wrote:

> Congratulations, Xuannan!
>
> Luke
>
> On Sun, Aug 18, 2024 at 6:53 PM Aleksandr Pilipenko 
> wrote:
>
> > Congratulations, Xuannan!
> >
> > Best,
> > Aleksandr
> >
> > On Sun, 18 Aug 2024 at 09:49, clouding.vip 
> > wrote:
> >
> > > Congratulations, Xuannan!
> > >
> > >
> > >
> > >
> > > On Aug 18, 2024, at 16:39, Junrui Lee wrote:
> > >
> > >
> > > Congratulations, Xuannan!
> > >
> > > Best,
> > > Junrui
> > >
> > > Feng Jin  wrote on Sun, Aug 18, 2024 at 16:34:
> > > > Congratulations, Xuannan!
> > > >
> > > > Best,
> > > > Feng
> > > >
> > > > On Sun, Aug 18, 2024 at 3:08 PM Rui Fan <1996fan...@gmail.com> wrote:
> > > > > Congratulations, Xuannan!
> > > > >
> > > > > Best,
> > > > > Rui
> > > > >
> > > > > On Sun, Aug 18, 2024 at 2:20 PM Leonard Xu  wrote:
> > > > > > Congratulations! Xuannan
> > > > > >
> > > > > > Best,
> > > > > > Leonard


Re: [DISCUSS] FLIP-27: Refactor Source Interface

2018-11-04 Thread Guowei Ma
Hi,
Thanks Aljoscha for this FLIP.

1. I agree with Piotr and Becket that the non-blocking source is very
important. But in addition to `Future/poll`, there may be another way to
achieve this. I think it may not be very memory friendly if every advance
call returns a Future.

public interface Listener {
  public void notify();
}

public interface SplitReader {
  /**
   * When there is no element temporarily, this will return false.
   * When elements are available again, the SplitReader can call
   * listener.notify().
   * In addition, the framework would check `advance` periodically.
   * Of course, advance can always return true and ignore the listener
   * argument for simplicity.
   */
  public boolean advance(Listener listener);
}
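To illustrate how a framework loop might drive this listener-based `advance`, here is a self-contained toy sketch. The queue-backed reader and all names are illustrative only, not part of the actual proposal:

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class Main {
    interface Listener { void notifyAvailable(); }

    interface SplitReader<T> {
        /** Returns false when nothing is available; the listener fires later. */
        boolean advance(Listener listener);
        T getCurrent();
    }

    /** A toy reader backed by an in-memory queue, standing in for a real source. */
    static class QueueReader implements SplitReader<Integer> {
        private final Queue<Integer> queue = new ArrayDeque<>();
        private Integer current;
        private Listener pending;

        public boolean advance(Listener listener) {
            current = queue.poll();
            if (current == null) {
                pending = listener; // register interest instead of busy-waiting
                return false;
            }
            return true;
        }

        public Integer getCurrent() { return current; }

        /** Called by the data-producing side; wakes up a parked caller. */
        void offer(int value) {
            queue.add(value);
            if (pending != null) {
                Listener l = pending;
                pending = null;
                l.notifyAvailable();
            }
        }
    }

    public static void main(String[] args) {
        QueueReader reader = new QueueReader();
        boolean[] woken = {false};
        System.out.println(reader.advance(() -> woken[0] = true)); // false: no data yet
        reader.offer(42);                                          // listener fires here
        System.out.println(woken[0]);                              // true
        System.out.println(reader.advance(() -> {}) ? reader.getCurrent() : -1); // 42
    }
}
```

The key point is that when `advance` returns false the caller parks instead of busy-waiting, and the reader wakes it up once data arrives; no Future object is allocated per call.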

2. The FLIP tells us very clearly how to create all Splits and how to
create a SplitReader from a Split. But there is no strategy for the user to
choose how the splits are assigned to the tasks. I think we could add an
enum to let the user choose.
/**
  public enum SplitsAssignmentPolicy {
    Location,
    Workload,
    Random,
    Average
  }
*/

3. If we merge `advance` and `getCurrent` into one method like `getNext`,
then `getNext` would need to return an `ElementWithTimestamp`, because some
sources want to add a timestamp to every element. IMO, this is not so
memory friendly, so I prefer this design.
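The memory argument can be illustrated with a small sketch: a combined `getNext()` must hand back a fresh wrapper object per record, while the split `advance`/accessor form can reuse mutable cursor state. All names here are illustrative, not part of any actual API:

```java
public class Main {
    // Combined form: `getNext()` needs a fresh wrapper object per record.
    static class ElementWithTimestamp {
        final int element;
        final long timestamp;
        ElementWithTimestamp(int element, long timestamp) {
            this.element = element;
            this.timestamp = timestamp;
        }
    }

    // Split form: the reader keeps a mutable cursor, producing no per-record garbage.
    static class Cursor {
        int element;
        long timestamp;

        boolean advance(int[] data, long[] ts, int pos) {
            if (pos >= data.length) return false;
            element = data[pos];
            timestamp = ts[pos];
            return true;
        }
    }

    public static void main(String[] args) {
        int[] data = {10, 20};
        long[] ts = {1L, 2L};
        Cursor c = new Cursor();
        long sum = 0;
        for (int i = 0; c.advance(data, ts, i); i++) {
            sum += c.element + c.timestamp; // read the current record, no allocation
        }
        System.out.println(sum); // 33
    }
}
```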



Thanks

Piotr Nowojski  wrote on Thu, Nov 1, 2018 at 6:08 PM:

> Hi,
>
> Thanks Aljoscha for starting this, it’s blocking quite a lot of other
> possible improvements. I have one proposal. Instead of having a method:
>
> boolean advance() throws IOException;
>
> I would replace it with
>
> /*
>  * Return a future, which when completed means that source has more data
> and getNext() will not block.
>  * If you wish to use benefits of non blocking connectors, please
> implement this method appropriately.
>  */
> default CompletableFuture<?> isBlocked() {
> return CompletableFuture.completedFuture(null);
> }
>
> And rename `getCurrent()` to `getNext()`.
>
> Couple of arguments:
> 1. I don’t understand the division of work between `advance()` and
> `getCurrent()`. What should be done in which, especially for connectors
> that handle records in batches (like Kafka) and when should you call
> `advance` and when `getCurrent()`.
> 2. Replacing `boolean` with `CompletableFuture` will allow us in the
> future to have asynchronous/non blocking connectors and more efficiently
> handle large number of blocked threads, without busy waiting. While at the
> same time it doesn’t add much complexity, since naive connector
> implementations can be always blocking.
> 3. This also would allow us to use a fixed size thread pool of task
> executors, instead of one thread per task.
>
> Piotrek
>
> > On 31 Oct 2018, at 17:22, Aljoscha Krettek  wrote:
> >
> > Hi All,
> >
> > In order to finally get the ball rolling on the new source interface
> that we have discussed for so long I finally created a FLIP:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
> >
> > I cc'ed Thomas and Jamie because of the ongoing work/discussion about
> adding per-partition watermark support to the Kinesis source and because
> this would enable generic implementation of event-time alignment for all
> sources. Maybe we need another FLIP for the event-time alignment part,
> especially the part about information sharing between operations (I'm not
> calling it state sharing because state has a special meaning in Flink).
> >
> > Please discuss away!
> >
> > Aljoscha
> >
> >
>
>


Re: [DISCUSS] FLIP-27: Refactor Source Interface

2018-11-23 Thread Guowei Ma
> >> On Sun, Nov 4, 2018 at 1:43 PM Thomas Weise  wrote:
> >>
> >>> Thanks for getting the ball rolling on this!
> >>>
> >>> Can the number of splits decrease? Yes, splits can be closed and go
> away.
> >>> An example would be a shard merge in Kinesis (2 existing shards will be
> >>> closed and replaced with a new shard).
> >>>
> >>> Regarding advance/poll/take: IMO the least restrictive approach would
> be
> >>> the thread-less IO model (pull based, non-blocking, caller retrieves
> new
> >>> records when available). The current Kinesis API requires the use of
> >>> threads. But that can be internal to the split reader and does not need
> >> to
> >>> be a source API concern. In fact, that's what we are working on right
> now
> >>> as improvement to the existing consumer: Each shard consumer thread
> will
> >>> push to a queue, the consumer main thread will poll the queue(s). It is
> >>> essentially a mapping from threaded IO to non-blocking.
> >>>
> >>> The proposed SplitReader interface would fit the thread-less IO model.
> >>> Similar to an iterator, we find out if there is a new element (hasNext)
> >> and
> >>> if so, move to it (next()). Separate calls deliver the meta information
> >>> (timestamp, watermark). Perhaps advance call could offer a timeout
> >> option,
> >>> so that the caller does not end up in a busy wait. On the other hand, a
> >>> caller processing multiple splits may want to cycle through fast, to
> >>> process elements of other splits as soon as they become available. The
> >> nice
> >>> thing is that this "split merge" logic can now live in Flink and be
> >>> optimized and shared between different sources.
> >>>
> >>> Thanks,
> >>> Thomas
> >>>
> >>>
> >>> On Sun, Nov 4, 2018 at 6:34 AM Guowei Ma  wrote:
> >>>
> >>>> Hi,
> >>>> Thanks Aljoscha for this FLIP.
> >>>>
> >>>> 1. I agree with Piotr and Becket that the non-blocking source is very
> >>>> important. But in addition to `Future/poll`, there may be another way
> to
> >>>> achieve this. I think it may be not very memory friendly if every
> >> advance
> >>>> call return a Future.
> >>>>
> >>>> public interface Listener {
> >>>> public void notify();
> >>>> }
> >>>>
> >>>> public interface SplitReader() {
> >>>> /**
> >>>>  * When there is no element temporarily, this will return false.
> >>>>  * When elements is available again splitReader can call
> >>>> listener.notify()
> >>>>  * In addition the frame would check `advance` periodically .
> >>>>  * Of course advance can always return true and ignore the
> listener
> >>>> argument for simplicity.
> >>>>  */
> >>>> public boolean advance(Listener listener);
> >>>> }
> >>>>
> >>>> 2.  The FLIP tells us very clearly that how to create all Splits and
> how
> >>>> to create a SplitReader from a Split. But there is no strategy for the
> >> user
> >>>> to choose how to assign the splits to the tasks. I think we could add
> a
> >>>> Enum to let user to choose.
> >>>> /**
> >>>>  public Enum SplitsAssignmentPolicy {
> >>>>Location,
> >>>>Workload,
> >>>>Random,
> >>>>Average
> >>>>  }
> >>>> */
> >>>>
> >>>> 3. If merge the `advance` and `getCurrent`  to one method like
> `getNext`
> >>>> the `getNext` would need return a `ElementWithTimestamp` because some
> >>>> sources want to add timestamp to every element. IMO, this is not so
> >> memory
> >>>> friendly so I prefer this design.
> >>>>
> >>>>
> >>>>
> >>>> Thanks
> >>>>
> >>>> Piotr Nowojski  wrote on Thu, Nov 1, 2018 at 6:08 PM:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> Thanks Aljoscha for starting this, it’s blocking quite a lot of other
> >>>>> possible improvements. I have one proposal. Instead of having a
> method:
> >>>

Re: [DISCUSS]Enhancing flink scheduler by implementing blacklist mechanism

2018-11-27 Thread Guowei Ma
Thanks, Yingjie, for sharing this doc; I think this is a very important
feature for production.

As you mentioned in your document, an unhealthy node can cause a TM
startup failure, but the cluster manager may offer the same node again for
some reason. (I have encountered such a scenario in our production
environment.) Per your proposal, the RM can blacklist this unhealthy node
because of the launch failure.

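For illustration, the launch-failure blacklist being discussed could be sketched roughly like this. The class, threshold, and API are hypothetical, just to make the policy concrete:

```java
import java.util.HashMap;
import java.util.Map;

public class Main {
    /** Tracks launch failures per host; hosts over the threshold are rejected. */
    static class HostBlacklist {
        private final int maxFailures;
        private final Map<String, Integer> failures = new HashMap<>();

        HostBlacklist(int maxFailures) { this.maxFailures = maxFailures; }

        void reportLaunchFailure(String host) {
            failures.merge(host, 1, Integer::sum);
        }

        boolean isBlacklisted(String host) {
            return failures.getOrDefault(host, 0) >= maxFailures;
        }
    }

    public static void main(String[] args) {
        HostBlacklist bl = new HostBlacklist(2);
        bl.reportLaunchFailure("node-1");
        System.out.println(bl.isBlacklisted("node-1")); // false: below threshold
        bl.reportLaunchFailure("node-1");
        System.out.println(bl.isBlacklisted("node-1")); // true: stop offering this node
    }
}
```

A real implementation would also need expiry of old failures and, as asked below, a way for each ResourceManager to honor the list when requesting resources.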
I have some questions:
Do you want every
ResourceManager (MesosResourceManager, YarnResourceManager) to implement
this policy?
If not, and you want Flink itself to implement this mechanism, I think the
current RM interface may not be enough.

Thanks.


Yun Gao  wrote on Wed, Nov 28, 2018 at 11:29 AM:

> Hi yingjie,
>   Thanks for proposing the blacklist! I agree that the blacklist is
> important for job maintenance, since some jobs may not be able to failover
> automatically if some tasks are always scheduled to the problematic hosts
> or TMs. This will increase the burden of the operators since they need to
> pay more attention to the status of the jobs.
>
>   I have read the proposal and left some comments. I think a problem
> is how we cooperate with external resource managers (like YARN or Mesos)
> so that they will apply for resource according to our blacklist. If they
> cannot fully obey the blacklist, then we may need to deal with the
> inappropriate resource.
>
>  Looking forward to the future advancement of this feature! Thanks again
> for the exciting proposal.
>
>
> Best,
> Yun Gao
>
>
>
> --
> From:zhijiang 
> Send Time:2018 Nov 27 (Tue) 10:40
> To:dev 
> Subject: Re: [DISCUSS]Enhancing flink scheduler by implementing blacklist
> mechanism
>
> Thanks yingjie for bringing this discussion.
>
> I encountered this issue during failover and also noticed other users
> complaining about related issues in the community before.
> So it is necessary to have this mechanism for enhancing the scheduling process
> first, and then enrich the internal rules step by step.
> Wish this feature working in the next major release. :)
>
> Best,
> Zhijiang
> --
> From: Till Rohrmann 
> Sent: Monday, Nov 5, 2018, 18:43
> To: dev 
> Subject: Re: [DISCUSS]Enhancing flink scheduler by implementing blacklist
> mechanism
>
> Thanks for sharing this design document with the community Yingjie.
>
> I like the design to pass the job specific blacklisted TMs as a scheduling
> constraint. This makes a lot of sense to me.
>
> Cheers,
> Till
>
> On Fri, Nov 2, 2018 at 4:51 PM yingjie  wrote:
>
> > Hi everyone,
> >
> > This post proposes the blacklist mechanism as an enhancement of flink
> > scheduler. The motivation is as follows.
> >
> > In our clusters, jobs occasionally encounter hardware and software
> > environment problems, including missing software libraries, bad
> > hardware, and resource shortages like running out of disk space. These
> > problems will lead to task failures, and the failover strategy will
> > take care of that and redeploy the relevant tasks. But because of
> > reasons like location preference and limited total resources, the
> > failed task may be scheduled to be deployed on the same host, and then
> > the task will fail again and again, many times. The primary cause of
> > this problem is the mismatching of task and resource. Currently, the
> > resource allocation algorithm does not take these into consideration.
> >
> > We introduce the blacklist mechanism to solve this problem. The basic
> idea
> > is that when a task fails too many times on some resource, the Scheduler
> > will not assign the resource to that task. We have implemented this
> feature
> > in our inner version of flink, and currently, it works fine.
> >
> > The following is the design draft, we would really appreciate it if you
> can
> > review and comment.
> >
> >
> https://docs.google.com/document/d/1Qfb_QPd7CLcGT-kJjWSCdO8xFeobSCHF0vNcfiO4Bkw
> >
> > Best,
> > Yingjie
> >
> >
> >
> > --
> > Sent from:
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
> >
>
>
>


Re: [DISCUSS] Unified Core API for Streaming and Batch

2018-12-04 Thread Guowei Ma
Hi, all
Thanks to Haibo for initiating this discussion in the community.

  - Relationship of DataStream, DataSet, and Table API
Table, DataStream, and DataSet do have different aspects. For example,
DataStream can access state and Table cannot. DataStream can be easily
extended by users because they do not need to understand complex
optimization logic. On the other hand, as a developer, I think these APIs
need to cooperate. When I develop a job, I may want to use Table,
DataStream, and DataSet at the same time, but currently DataStream and
DataSet can't be converted to each other.

  - Future of DataSet (subsumed in data stream, or remains independent)
IMO, if DataSet can be converted to DataStream, it makes little difference
whether DataSet stays independent or is included in DataStream.

  - What happens with iterations
What iterations should look like is a very interesting question. There may
be two options here: one based on compilation and one based on interaction.
Both options have benefits: with the compilation-based approach, failover
may be more controllable and scheduling more efficient, while the
interaction-based approach is more flexible but makes fault tolerance a bit
more difficult. Choosing between these two methods may have a certain
impact on the subsequent operator API design. At this point, I want to hear
what the community thinks.




Kurt Young  wrote on Wed, Dec 5, 2018 at 11:09 AM:

> Hi all,
>
> Really excited to see this discussion really happening. I also want to share
> my two cents here.
> Lets first focus on this question: “What Flink API Stack Should be for a
> Unified Engine".
>
> There are multiply ways to judge whether an engine is unified or not. From
> user's perspective, as long as you provides api for
> both stream and batch processing, can be considered unified. But that's
> definitely not the way a developer sees. I think developers
> cares more about the implementations. How many infrastructures are shared
> between different compute modes, network, functions,
> or even operators? Sharing more things is not a free lunch, it can make
> things more complicated, but the potential benefits is also
> huge.
>
> In Flink's current implementation, two things are shared. One is the
> network stack, and the other is job scheduling. If we want to push the
> unification effort a little further, the next thing we should consider is
> tasks and operators (BatchTask and Driver in the current batch
> implementation).
> Here are the benefits I can see if we try to unify them:
> 1. State becomes available to batch processing, giving batch failover
> more possibilities.
> 2. Stream processing can borrow some efficient ideas from batch, such as
> memory management and binary data processing.
> 3. Batch & stream operators can be mixed together to meet more
> complicated computation requirements, such as progressive computation,
> which is neither pure stream processing nor pure batch processing.
> 4. It makes development a joint effort: everyone works on the same
> technology stack, no matter whether they mainly focus on stream, batch,
> or even ML.
>
> And once the operator API is unified, we can next consider having a more
> formal DAG API for the various processing modes. I think we both agree on
> the idea Flink is built upon: "stream is the basic abstraction, batch is
> just a special case of streaming". I think it also makes sense to have
> the DAG API mainly focus on describing a stream, whether it is bounded or
> not. I found that StreamTransformation is a good fit for this
> requirement. It has no semantics; it just tells you the physical
> transformation we applied to the stream. Take
> OneInputStreamTransformation: all we should know is that it takes one
> stream as input, has an operator to process the elements it receives, and
> that the output can be further transformed by another
> OneInputStreamTransformation, or be one input to a
> TwoInputStreamTransformation. It describes how data flows, but has very
> limited information about how the data is processed, e.g. whether it is
> mapped or flat-mapped.
>
> All the user APIs we now have (DataStream, DataSet, Table) carry
> semantics, and some even contain optimizers. Based on these thoughts, I
> will try to answer the questions Stephan has raised:
>
>   - Relationship of DataStream, DataSet, and Table API
> I think these three APIs can be independent for now: DataSet for pure
> batch processing, DataStream for pure stream processing where you want to
> deal with state explicitly, and Table API for relational data processing.
>
>   - Where do we want automatic optimization, and where not
> DataStream, DataSet, and Table API can all have their own optimizations,
> but StreamTransformation does not.
>
>   - Future of DataSet (subsumed in data stream, or remains independent)
> I think it's better to remain independent for now, and be subsumed into
> DataStream in the future.
>
>   - What happens with iterations
> I think the more important question is how to describe iteration on
> stream transformations: what information can be hidden, and what
> information must be exposed to transfo
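Kurt's picture of StreamTransformation as a semantics-free description of the physical stream topology can be illustrated with a small sketch. This is a plain-Java model with hypothetical class names, not the actual Flink classes:

```java
import java.util.ArrayList;
import java.util.List;

// A minimal model of a physical transformation DAG: nodes only describe
// how streams connect, not what the operator semantically does.
abstract class Transformation {
    final String name;
    Transformation(String name) { this.name = name; }
    abstract List<Transformation> inputs();
}

class SourceTransformation extends Transformation {
    SourceTransformation(String name) { super(name); }
    List<Transformation> inputs() { return List.of(); }
}

class OneInputTransformation extends Transformation {
    final Transformation input;
    OneInputTransformation(String name, Transformation input) {
        super(name);
        this.input = input;
    }
    List<Transformation> inputs() { return List.of(input); }
}

class TwoInputTransformation extends Transformation {
    final Transformation left, right;
    TwoInputTransformation(String name, Transformation left, Transformation right) {
        super(name);
        this.left = left;
        this.right = right;
    }
    List<Transformation> inputs() { return List.of(left, right); }
}

public class TopologyDemo {
    // Walk the DAG from a sink transformation back to its sources.
    static List<String> topology(Transformation sink) {
        List<String> names = new ArrayList<>();
        collect(sink, names);
        return names;
    }
    static void collect(Transformation t, List<String> out) {
        for (Transformation in : t.inputs()) collect(in, out);
        out.add(t.name);
    }
    public static void main(String[] args) {
        Transformation src = new SourceTransformation("source");
        Transformation op = new OneInputTransformation("process", src);
        System.out.println(topology(op)); // [source, process]
    }
}
```

The point is that a node in this DAG only records how streams connect; whether the operator maps or flat-maps the elements is invisible at this level.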

Re: [DISCUSS] Start a user...@flink.apache.org mailing list for the Chinese-speaking community?

2019-01-24 Thread Guowei Ma
+1

This not only helps Chinese users but also helps the community to collect more 
feedback and scenarios.


> On Jan 25, 2019, at 2:29 AM, Zhang, Xuefu  wrote:
> 
> +1 on the idea. This will certainly help promote Flink in China industries. 
> On a side note, it would be great if anyone in the list can help source 
> ideas, bug reports, and feature requests to dev@ list and/or JIRAs so as to 
> gain broader attention.
> 
> Thanks,
> Xuefu
> 
> 
> --
> From:Fabian Hueske 
> Sent At:2019 Jan. 24 (Thu.) 05:32
> To:dev 
> Subject:Re: [DISCUSS] Start a user...@flink.apache.org mailing list for the 
> Chinese-speaking community?
> 
> Thanks Robert!
> I think this is a very good idea.
> +1
> 
> Fabian
> 
>> On Thu, Jan 24, 2019 at 2:09 PM, Jeff Zhang wrote:
>> 
>> +1
>> 
>> Piotr Nowojski wrote on Thu, Jan 24, 2019 at 8:38 PM:
>> 
>>> +1, good idea, especially with that many Chinese speaking contributors,
>>> committers & users :)
>>> 
>>> Piotrek
>>> 
 On 24 Jan 2019, at 13:20, Kurt Young  wrote:
 
 Big +1 on this, it will indeed help Chinese speaking users a lot.
 
 fudian.fd wrote on Thu, Jan 24, 2019 at 8:18 PM:
 
> +1. I noticed that many folks from China are requesting the JIRA
> permission in the past year. It reflects that more and more developers
>>> from
> China are using Flink. A Chinese oriented mailing list will definitely
>>> be
> helpful for the growth of Flink in China.
> 
> 
>> On Jan 24, 2019, at 7:42 PM, Stephan Ewen  wrote:
>> 
>> +1, a very nice idea
>> 
>> On Thu, Jan 24, 2019 at 12:41 PM Robert Metzger >> 
> wrote:
>> 
>>> Thanks for your response.
>>> 
>>> You are right, I'm proposing "user...@flink.apache.org" as the
>>> mailing
>>> list's name!
>>> 
>>> On Thu, Jan 24, 2019 at 12:37 PM Tzu-Li (Gordon) Tai <
> tzuli...@apache.org>
>>> wrote:
>>> 
 Hi Robert,
 
 Thanks a lot for starting this discussion!
 
 +1 to a user-zh@flink.a.o mailing list (you mentioned -zh in the
> title,
 but
 -cn in the opening email content.
 I think -zh would be better as we are establishing the tool for
>>> general
 Chinese-speaking users).
 All dev@ discussions / JIRAs should still be in a single English
> mailing
 list.
 
 From what I've seen in the DingTalk Flink user group, there's
>> quite a
> bit
 of activity in forms of user questions and replies.
 It would really be great if the Chinese-speaking user community can
 actually have these discussions happen in the Apache mailing lists,
 so that questions / discussions / replies from developers can be
> indexed
 and searchable.
 Moreover, it'll give the community more insight in how active a
 Chinese-speaking contributor is helping with user requests,
 which in general is a form of contribution that the community
>> always
>>> merits
 a lot.
 
 Cheers,
 Gordon
 
 On Thu, Jan 24, 2019 at 12:15 PM Robert Metzger <
>> rmetz...@apache.org
 
 wrote:
 
> Hey all,
> 
> I would like to create a new user support mailing list called "
> user...@flink.apache.org" to cater the Chinese-speaking Flink
>>> community.
> 
> Why?
> In the last year, 24% of the traffic on flink.apache.org came from the
> US, 22% from China. In the last three months, China is at 30%, the US at
> 20%.
> An additional data point is that there's a Flink DingTalk group with more
> than 5000 members, asking Flink questions.
> I believe that knowledge about Flink should be available in public forums
> (our mailing list), indexable by search engines. If there's a huge demand
> for Chinese-language support, we as a community should provide these
> users the tools they need, to grow our community and to allow them to
> follow the Apache way.
> 
> Is it possible?
> I believe it is, because a number of other Apache projects are running
> non-English user@ mailing lists.
> Apache OpenOffice, Cocoon, OpenMeetings, and CloudStack all have
> non-English lists: http://mail-archives.apache.org/mod_mbox/
> One thing I want to make very clear in this discussion is that all
> project decisions, developer discussions, JIRA tickets etc. need to
> happen in English, as this is the primary language of the Apache
> Foundation and our community.
> We should also clarify this on the page listing the mailing lists.
> 
> How?
> If there is consensus in this discussion thread, I would request
>> the
>>> 

Re: [DISCUSS] Shall we make SpillableSubpartition repeatedly readable to support fine grained recovery

2019-01-24 Thread Guowei Ma
Thanks to Zhijiang for the detailed explanation. I would like to add some
supplements: Blink has indeed solved this particular problem. The problem
can be identified in Blink, and the upstream will be restarted by Blink.
Thanks

zhijiang wrote on Fri, Jan 25, 2019 at 12:04 PM:

> Hi Bo,
>
> The problems you mentioned can be summarized into two issues:
>
> 1. The failover strategy should consider whether the partition produced
> by the upstream is still available when the downstream fails. If the
> produced partition is available, then only the downstream region needs to
> be restarted; otherwise the upstream region should also be restarted to
> re-produce the partition data.
> 2. The lifecycle of a partition: currently, once the partition data has
> been completely transferred via the network, the partition and view are
> released on the producer side, no matter whether the data was actually
> processed by the consumer or not. The TaskManager may even be released
> earlier, while the partition data has not been transferred yet.
>
> Both issues are already considered in my proposed pluggable shuffle
> manager architecture, which would introduce the ShuffleMaster component
> to manage partitions globally on the JobManager side; it is then natural
> to solve the above problems based on this architecture. You can refer to
> the FLIP [1] if interested.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-31%3A+Pluggable+Shuffle+Manager
>
> Best,
> Zhijiang
> --
> From:Stephan Ewen 
> Send Time: Thu, Jan 24, 2019, 22:17
> To:dev ; Kurt Young 
> Subject:Re: [DISCUSS] Shall we make SpillableSubpartition repeatedly
> readable to support fine grained recovery
>
> The SpillableSubpartition can also be used during the execution of bounded
> DataStreams programs. I think this is largely independent from deprecating
> the DataSet API.
>
> I am wondering if this particular issue is one that has been addressed in
> the Blink code already (we are looking to merge much of that
> functionality), because the proposed extension is actually necessary for
> proper batch fault tolerance (independent of the DataSet or Query
> Processor stack).
>
> I am adding Kurt to this thread; maybe he can help us find that out.
>
> On Thu, Jan 24, 2019 at 2:36 PM Piotr Nowojski 
> wrote:
>
> > Hi,
> >
> > I’m not sure how much effort we will be willing to invest in the
> > existing batch stack. We are currently focusing on the support of
> > bounded DataStreams (already done in Blink, to be merged into Flink
> > soon) and on unifying batch & stream under the DataStream API.
> >
> > Piotrek
> >
> > > On 23 Jan 2019, at 04:45, Bo WANG  wrote:
> > >
> > > Hi all,
> > >
> > > When running the batch WordCount example, I configured the job
> > > execution mode as BATCH_FORCED and the failover-strategy as region,
> > > and I manually injected some errors to let the execution fail in
> > > different phases. In some cases, the job could recover from the
> > > failure and succeed, but in other cases, the job retried several
> > > times and failed.
> > >
> > > Example:
> > > - If the failure occurred before task read data, e.g., failed before
> > > invokable.invoke() in Task.java, failover could succeed.
> > > - If the failure occurred after task having read data, failover did not
> > > work.
> > >
> > > Problem diagnosis:
> > > Running the example described before, each ExecutionVertex is
> > > defined as a restart region, and the ResultPartitionType between
> > > executions is BLOCKING. Thus, SpillableSubpartition and
> > > SpillableSubpartitionView are used to write/read shuffle data, and
> > > data blocks are described as BufferConsumers stored in a list called
> > > buffers. When a task requests input data from the
> > > SpillableSubpartitionView, BufferConsumers are REMOVED from buffers.
> > > Thus, when a failure occurs after data has been read, some
> > > BufferConsumers have already been released. Although the tasks
> > > retried, the input data is incomplete.
> > >
> > > Fix Proposal:
> > > - A BufferConsumer should not be removed from buffers until the
> > > consuming ExecutionVertex is terminal.
> > > - The SpillableSubpartition should not be released until the
> > > consuming ExecutionVertex is terminal.
> > > - The SpillableSubpartition could create multiple
> > > SpillableSubpartitionViews, each corresponding to an ExecutionAttempt.
> > >
> > > Best,
> > > Bo
> >
> >
>
>
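Bo's fix proposal above (keep the buffers until the consuming ExecutionVertex is terminal, and create one view per ExecutionAttempt) could be sketched roughly like this. This is a simplified plain-Java model with hypothetical names, not the actual SpillableSubpartition implementation:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// A subpartition that keeps its buffers so they can be read repeatedly:
// each consumer attempt gets its own view that iterates from the beginning.
class RepeatableSubpartition {
    private final List<String> buffers = new ArrayList<>();
    private boolean released = false;

    void add(String buffer) { buffers.add(buffer); }

    // One view per ExecutionAttempt; buffers are NOT removed on read.
    Iterator<String> createView() {
        if (released) throw new IllegalStateException("already released");
        return new ArrayList<>(buffers).iterator();
    }

    // Only called once the consuming ExecutionVertex reached a terminal state.
    void onConsumerTerminal() {
        buffers.clear();
        released = true;
    }
}

public class RepeatableReadDemo {
    public static void main(String[] args) {
        RepeatableSubpartition p = new RepeatableSubpartition();
        p.add("record-1");
        p.add("record-2");

        // first attempt reads everything, then fails; a second attempt
        // can still read the same data because nothing was removed
        Iterator<String> attempt1 = p.createView();
        while (attempt1.hasNext()) attempt1.next();

        Iterator<String> attempt2 = p.createView();
        int count = 0;
        while (attempt2.hasNext()) { attempt2.next(); count++; }
        System.out.println(count); // 2
    }
}
```

Because the buffers are copied rather than removed on read, a retried attempt sees the complete input again.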


[DISCUSS] Enhance Operator API to Support Dynamically Selective Reading and EndOfInput Event

2019-02-01 Thread Guowei Ma
Hi guys,
I propose a design to enhance the Stream Operator API for batch's
requirements. This also aligns with Flink's goal that batch is a special
case of streaming. The proposal mainly contains two changes to the
operator API:

1. Allow a "StreamOperator" to choose which input to read;
2. Notify a "StreamOperator" that an input has ended.


This proposal was discussed offline with Piotr Nowojski, Kostas Kloudas,
and Haibo Sun.
It would be great to hear feedback and suggestions from the community.
Please kindly share your comments and suggestions.

Best
GuoWei Ma.

 Enhance Operator API to Support Dynamically Sel...
<https://docs.google.com/document/d/10k5pQm3SkMiK5Zn1iFDqhQnzjQTLF0Vtcbc8poB4_c8/edit?usp=drive_web>
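The two proposed changes could be sketched roughly as follows. This is only a plain-Java model of the idea; the interface and method names here are hypothetical, not the API proposed in the document:

```java
// A two-input operator that (1) chooses which input to read next and
// (2) is notified when an input has ended.
enum InputSelection { FIRST, SECOND, ANY }

interface TwoInputSelectableOperator<IN1, IN2> {
    InputSelection nextSelection();        // change 1: choose the input to read
    void processElement1(IN1 value);
    void processElement2(IN2 value);
    void endInput(int inputId);            // change 2: input 1 or 2 has ended
}

// Example: fully consume input 1 (e.g. the build side of a join)
// before reading anything from input 2 (the probe side).
public class BuildThenProbe implements TwoInputSelectableOperator<String, String> {
    private boolean firstEnded = false;
    private final StringBuilder log = new StringBuilder();

    public InputSelection nextSelection() {
        return firstEnded ? InputSelection.SECOND : InputSelection.FIRST;
    }
    public void processElement1(String v) { log.append("build:").append(v).append(' '); }
    public void processElement2(String v) { log.append("probe:").append(v).append(' '); }
    public void endInput(int inputId) {
        if (inputId == 1) firstEnded = true;
        log.append("end").append(inputId).append(' ');
    }
    public String log() { return log.toString().trim(); }

    public static void main(String[] args) {
        BuildThenProbe op = new BuildThenProbe();
        // a task would consult nextSelection() before each read; here we
        // simulate the resulting order of calls
        op.processElement1("a");
        op.endInput(1);
        op.processElement2("x");
        op.endInput(2);
        System.out.println(op.log()); // build:a end1 probe:x end2
    }
}
```

This build-then-probe pattern is exactly the kind of batch requirement (e.g. hash joins) that motivates letting the operator, not the runtime, decide the reading order.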


Re: [DISCUSS] Enhance Operator API to Support Dynamically Selective Reading and EndOfInput Event

2019-02-09 Thread Guowei Ma
> I don't understand the problem with timers. Timers are bound to the
> operator, not the input, so they should still work if an input ends.
> There are cases where some state in the operator that is only relevant as
> long as an input still has data (like in symmetric joins) and the timers
> are relevant to that state.
> When the state is dropped, the timers should also be dropped, but that is
> the operator's logic on "endInput()". So there is no inherent issue between
> input and timers.
>
> Best,
> Stephan
>
>
> On Sat, Feb 2, 2019 at 3:55 AM Guowei Ma  wrote:
>
> > Hi guys,
> > I propose a design to enhance the Stream Operator API for batch's
> > requirements. This also aligns with Flink's goal that batch is a
> > special case of streaming. The proposal mainly contains two changes to
> > the operator API:
> >
> > 1. Allow a "StreamOperator" to choose which input to read;
> > 2. Notify a "StreamOperator" that an input has ended.
> >
> >
> > This proposal was discussed offline with Piotr Nowojski, Kostas
> > Kloudas, and Haibo Sun.
> > It would be great to hear feedback and suggestions from the community.
> > Please kindly share your comments and suggestions.
> > Please kindly share your comments and suggestions.
> >
> > Best
> > GuoWei Ma.
> >
> >  Enhance Operator API to Support Dynamically Sel...
> > <
> >
> https://docs.google.com/document/d/10k5pQm3SkMiK5Zn1iFDqhQnzjQTLF0Vtcbc8poB4_c8/edit?usp=drive_web
> > >
> >
>


Re: [DISCUSS] FLIP-33: Terminate/Suspend Job with Savepoint

2019-02-12 Thread Guowei Ma
thanks for starting this discussion. It is a very cool feature.

+1 for the FLIP

Best
Guowei

jincheng sun wrote on Wed, Feb 13, 2019 at 9:35 AM:

> Thank you for starting the discussion about cancel-with-savepoint Kostas.
>
> +1 for the FLIP.
>
> Cheers,
> Jincheng
>
> Fabian Hueske wrote on Wed, Feb 13, 2019 at 4:31 AM:
>
> > Thanks for working on improving cancel-with-savepoint Kostas.
> > Distinguishing the termination modes would be a big step forward, IMO.
> >
> > Btw. there is already another FLIP-33 on the way.
> > This one should be FLIP-34.
> >
> > Cheers,
> > Fabian
> >
> > On Tue, Feb 12, 2019 at 6:49 PM, Stephan Ewen <se...@apache.org> wrote:
> >
> > > Thank you for starting this feature discussion.
> > > This is a feature that has been requested various times, great to see
> it
> > > happening.
> > >
> > > +1 for this FLIP
> > >
> > > On Tue, Feb 12, 2019 at 5:28 PM Kostas Kloudas <
> k.klou...@ververica.com>
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > >  A commonly used functionality offered by Flink is the
> > > > "cancel-with-savepoint" operation. When applied to the current
> > > exactly-once
> > > > sinks, the current implementation of the feature can be problematic,
> as
> > > it
> > > > does not guarantee that side-effects will be committed by Flink to
> the
> > > 3rd
> > > > party storage system.
> > > >
> > > >  This discussion targets fixing this issue and proposes the addition
> of
> > > two
> > > > termination modes, namely:
> > > > 1) SUSPEND, for temporarily stopping the job, e.g. for upgrading
> > > > the Flink version in your cluster
> > > > 2) TERMINATE, for a terminal shutdown, which ends the stream, sends
> > > > a MAX_WATERMARK, and flushes any state associated with (event-time)
> > > > timers
> > > >
> > > > A google doc with the FLIP proposal can be found here:
> > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1EZf6pJMvqh_HeBCaUOnhLUr9JmkhfPgn6Mre_z6tgp8/edit?usp=sharing
> > > >
> > > > And the page for the FLIP is here:
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103090212
> > > >
> > > >  The implementation sketch is far from complete, but it is worth
> > having a
> > > > discussion on the semantics as soon as possible. The implementation
> > > section
> > > > is going to be updated soon.
> > > >
> > > >  Looking forward to the discussion,
> > > >  Kostas
> > > >
> > > > --
> > > >
> > > > Kostas Kloudas | Software Engineer
> > > >
> > > >
> > > > 
> > > >
> > > > Follow us @VervericaData
> > > >
> > > > --
> > > >
> > > > Join Flink Forward  - The Apache Flink
> > > > Conference
> > > >
> > > > Stream Processing | Event Driven | Real Time
> > > >
> > > > --
> > > >
> > > > Data Artisans GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> > > >
> > > > --
> > > > Data Artisans GmbH
> > > > Registered at Amtsgericht Charlottenburg: HRB 158244 B
> > > > Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen
> > > >
> > >
> >
>
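The TERMINATE semantics described above (ending the stream with a MAX_WATERMARK so that all pending event-time timers fire and their state is flushed) can be illustrated with a minimal timer-service sketch. This is a hypothetical model, not the actual Flink implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Minimal event-time timer service: TERMINATE advances the watermark to
// Long.MAX_VALUE, which flushes every registered timer.
class SimpleTimerService {
    private final PriorityQueue<Long> timers = new PriorityQueue<>();
    private final List<Long> fired = new ArrayList<>();
    private long watermark = Long.MIN_VALUE;

    void registerEventTimeTimer(long timestamp) { timers.add(timestamp); }

    void advanceWatermark(long newWatermark) {
        watermark = newWatermark;
        while (!timers.isEmpty() && timers.peek() <= watermark) {
            fired.add(timers.poll()); // "fire" the timer
        }
    }

    // TERMINATE: end the stream by sending MAX_WATERMARK.
    void terminate() { advanceWatermark(Long.MAX_VALUE); }

    List<Long> firedTimers() { return fired; }
}

public class TerminateDemo {
    public static void main(String[] args) {
        SimpleTimerService ts = new SimpleTimerService();
        ts.registerEventTimeTimer(1_000L);
        ts.registerEventTimeTimer(5_000L);
        ts.advanceWatermark(2_000L);          // only the first timer fires
        System.out.println(ts.firedTimers()); // [1000]
        ts.terminate();                       // MAX_WATERMARK flushes the rest
        System.out.println(ts.firedTimers()); // [1000, 5000]
    }
}
```

This is why TERMINATE, unlike a plain cancel, gives windowed and timer-based state a chance to produce its final results before shutdown.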


Re: [DISCUSS] Enhance Operator API to Support Dynamically Selective Reading and EndOfInput Event

2019-02-15 Thread Guowei Ma
I agree with you. In the long run, we should clearly define which parts
should be exposed to users.
At present, AbstractStreamOperator exposes a lot of concrete
implementation details, such as the asynchronous checkpoint internals
("OperatorSnapshotFutures") and how to release resources needed by the
implementation. These parts should be transparent to the user; they also
lead to a dependency on the runtime implementation on the client side.
Ideally, users would not see the AbstractStreamOperator class at all and
should only see the
StreamOperator/OneInputStreamOperator/TwoInputStreamOperator interfaces.

Aljoscha Krettek wrote on Thu, Feb 14, 2019 at 11:49 PM:

> While we’re on operators and tasks, I think it would also make sense in
> the long run to move the logic that is now in
> AbstractStreamOperator.setup()/initializeState()/snapshot()/snapshotState()(and
> the other snapshotState()…)/dispose() outside of the operator itself. This
> logic is the same for every operator but shouldn’t really be in there. We
> currently have a very complicated dance between the StreamTask and
> AbstractStreamOperator for initialising the state backends that doesn’t
> really seem necessary.
>
> > On 14. Feb 2019, at 11:54, Stephan Ewen  wrote:
> >
> > To move this forward, would suggest the following:
> >
> >  - Let's quickly check which other classes need to change. I assume the
> > TwoInputStreamTask and StreamTwoInputProcessor ?
> >  - Can those changes be new classes that are used when the new operator
> is
> > used? The current TwoInputStreamTask and StreamTwoInputProcessor remain
> > until they are fully subsumed and are then removed.
> >
> >  - Do we need any other refactorings before, like some cleanup of the
> > Operator Config or the Operator Chain?
> >
> > Best,
> > Stephan
> >
> >
> > On Sun, Feb 10, 2019 at 7:25 AM Guowei Ma  wrote:
> >
> >> 2019.2.10
> >>
> >>
> >> Hi, Stephan
> >>
> >>
> >> Thank you very much for such detailed and constructive comments.
> >>
> >>
> >> *binary vs. n-ary* and *enum vs. integer*
> >>
> >>
> >> Considering the n-ary case, as you mentioned, using integers may be a
> >> better choice.
> >>
> >>
> >> *generic selectable interface*
> >>
> >>
> >> You are right. This interface can be removed.
> >>
> >>
> >> *end-input*
> >>
> >> It is true that the operator does not need to store the end-input
> >> state; it can be inferred by the system, which can then notify the
> >> operator at the right time. We can consider using this mechanism once
> >> the system can checkpoint a topology with finished tasks.
> >>
> >>
> >> *early-out*
> >>
> >> It is reasonable for me not to consider this situation at present.
> >>
> >>
> >> *distributed stream deadlocks*
> >>
> >>
> >> At present, there is no deadlock for streaming, but I think it might
> >> still be necessary to do some validation (warn or reject) on the
> >> JobGraph, because once Flink introduces this TwoInputSelectable
> >> interface, streaming users could also construct a diamond-style
> >> topology that may deadlock.
> >>
> >>
> >> *empty input / selection timeout*
> >>
> >> It is reasonable for me not to consider this situation at present.
> >>
> >>
> >> *timers*
> >>
> >> When all the inputs are finished, the TimerService will wait until all
> >> timers are triggered, so there should be no problem. Some other folks
> >> and I are confirming the details to see if there are other
> >> considerations.
> >>
> >>
> >> Best
> >>
> >> GuoWei
> >>
> >>> Stephan Ewen wrote on Fri, Feb 8, 2019 at 7:56 PM:
> >>
> >>> Nice design proposal, and +1 to the general idea.
> >>>
> >>> A few thoughts / suggestions:
> >>>
> >>> *binary vs. n-ary*
> >>>
> >>> I would plan ahead for N-ary operators. Not because we necessarily need
> >>> n-ary inputs (one can probably build that purely in the API) but
> because
> >> of
> >>> future side inputs. The proposal should be able to handle that as well.
> >>>
> >>> *enum vs. integer*
> >>>
> >>> The above might be easier to realize when going directly with
> >>> integers and having ANY, FIRST, SECOND, etc. as pre-defined constants.
> >

Re: [ANNOUNCE] Zhu Zhu becomes a Flink committer

2019-12-15 Thread Guowei Ma
Congrats Zhuzhu!
Best,
Guowei


Zhenghua Gao wrote on Mon, Dec 16, 2019 at 10:47 AM:

> Congrats!
>
> *Best Regards,*
> *Zhenghua Gao*
>
>
> On Mon, Dec 16, 2019 at 10:36 AM Biao Liu  wrote:
>
> > Congrats Zhu Zhu!
> >
> > Thanks,
> > Biao /'bɪ.aʊ/
> >
> >
> >
> > On Mon, 16 Dec 2019 at 10:23, Congxian Qiu 
> wrote:
> >
> > > Congrats, Zhu Zhu!
> > >
> > > Best,
> > > Congxian
> > >
> > >
> > > aihua li wrote on Mon, Dec 16, 2019 at 10:16 AM:
> > >
> > > > Congratulations, zhuzhu!
> > > >
> > > > > On Dec 16, 2019, at 10:04 AM, Jingsong Li  wrote:
> > > > >
> > > > > Congratulations Zhu Zhu!
> > > > >
> > > > > Best,
> > > > > Jingsong Lee
> > > > >
> > > > > On Mon, Dec 16, 2019 at 10:01 AM Yang Wang 
> > > > wrote:
> > > > >
> > > > >> Congratulations, Zhu Zhu!
> > > > >>
> > > > >> wenlong.lwl  于2019年12月16日周一 上午9:56写道:
> > > > >>
> > > > >>> Congratulations, Zhu Zhu!
> > > > >>>
> > > > >>> On Mon, 16 Dec 2019 at 09:14, Leonard Xu 
> > wrote:
> > > > >>>
> > > >  Congratulations, Zhu Zhu ! !
> > > > 
> > > >  Best,
> > > >  Leonard Xu
> > > > 
> > > > > On Dec 16, 2019, at 07:53, Becket Qin 
> > > wrote:
> > > > >
> > > > > Congrats, Zhu Zhu!
> > > > >
> > > > > On Sun, Dec 15, 2019 at 10:26 PM Dian Fu <
> dian0511...@gmail.com>
> > > > >>> wrote:
> > > > >
> > > > >> Congrats Zhu Zhu!
> > > > >>
> > > > >>> 在 2019年12月15日,下午6:23,Zhu Zhu  写道:
> > > > >>>
> > > > >>> Thanks everyone for the warm welcome!
> > > > >>> It's my honor and pleasure to improve Flink with all of you
> in
> > > the
> > > > >>> community!
> > > > >>>
> > > > >>> Thanks,
> > > > >>> Zhu Zhu
> > > > >>>
> > > > >>> Benchao Li  于2019年12月15日周日 下午3:54写道:
> > > > >>>
> > > >  Congratulations!:)
> > > > 
> > > >  Hequn Cheng  于2019年12月15日周日
> 上午11:47写道:
> > > > 
> > > > > Congrats, Zhu Zhu!
> > > > >
> > > > > Best, Hequn
> > > > >
> > > > > On Sun, Dec 15, 2019 at 6:11 AM Shuyi Chen <
> > suez1...@gmail.com
> > > >
> > > >  wrote:
> > > > >
> > > > >> Congratulations!
> > > > >>
> > > > >> On Sat, Dec 14, 2019 at 7:59 AM Rong Rong <
> > > walter...@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >>> Congrats Zhu Zhu :-)
> > > > >>>
> > > > >>> --
> > > > >>> Rong
> > > > >>>
> > > > >>> On Sat, Dec 14, 2019 at 4:47 AM tison <
> > wander4...@gmail.com>
> > > >  wrote:
> > > > >>>
> > > >  Congratulations!:)
> > > > 
> > > >  Best,
> > > >  tison.
> > > > 
> > > > 
> > > >  OpenInx  于2019年12月14日周六 下午7:34写道:
> > > > 
> > > > > Congrats Zhu Zhu!
> > > > >
> > > > > On Sat, Dec 14, 2019 at 2:38 PM Jeff Zhang <
> > > zjf...@gmail.com
> > > > >>>
> > > > > wrote:
> > > > >
> > > > >> Congrats, Zhu Zhu!
> > > > >>
> > > > >> Paul Lam  于2019年12月14日周六
> > > 上午10:29写道:
> > > > >>
> > > > >>> Congrats Zhu Zhu!
> > > > >>>
> > > > >>> Best,
> > > > >>> Paul Lam
> > > > >>>
> > > > >>> Kurt Young  于2019年12月14日周六
> > 上午10:22写道:
> > > > >>>
> > > >  Congratulations Zhu Zhu!
> > > > 
> > > >  Best,
> > > >  Kurt
> > > > 
> > > > 
> > > >  On Sat, Dec 14, 2019 at 10:04 AM jincheng sun <
> > > > >> sunjincheng...@gmail.com>
> > > >  wrote:
> > > > 
> > > > > Congrats ZhuZhu and welcome on board!
> > > > >
> > > > > Best,
> > > > > Jincheng
> > > > >
> > > > >
> > > > > Jark Wu  于2019年12月14日周六
> 上午9:55写道:
> > > > >
> > > > >> Congratulations, Zhu Zhu!
> > > > >>
> > > > >> Best,
> > > > >> Jark
> > > > >>
> > > > >> On Sat, 14 Dec 2019 at 08:20, Yangze Guo <
> > > > >> karma...@gmail.com
> > > > 
> > > > >> wrote:
> > > > >>
> > > > >>> Congrats, ZhuZhu!
> > > > >>>
> > > > >>> Bowen Li  于 2019年12月14日周六
> > > > > 上午5:37写道:
> > > > >>>
> > > >  Congrats!
> > > > 
> > > >  On Fri, Dec 13, 2019 at 10:42 AM Xuefu Z <
> > > >  usxu...@gmail.com>
> > > >  wrote:
> > > > 
> > > > > Congratulations, Zhu Zhu!
> > > > >
> > > > > On Fri, Dec 13, 2019 at 10:37 AM Peter Huang <
> > > > >>>

Re: Understanding watermark

2020-01-14 Thread Guowei Ma
Hi Cam,
I think you might want to know why the web page does not show the
watermark of the source.
Currently, the web UI only shows the "input" watermark. A source only
outputs watermarks, so the web UI shows "No Watermark" for it.
Actually, Flink has "output" watermark metrics as well. I think Flink
should also show this information in the web UI. Would you mind opening a
JIRA to track this?


Best,
Guowei


Cam Mach wrote on Wed, Jan 15, 2020 at 4:05 AM:

> Hi Till,
>
> Thanks for your response.
>
> Our sources are S3 and Kinesis. We have run several tests, and we are
> able to take a savepoint/checkpoint, but only when S3 completes reading.
> At that point, our pipeline has watermarks for the other operators, but
> not for the source operator. We are not running `PROCESS_CONTINUOUSLY`,
> so we should have a watermark for the source as well, right?
>
>  Attached is snapshot of our pipeline.
>
> [image: image.png]
>
> Thanks
>
>
>
> On Tue, Jan 14, 2020 at 10:43 AM Till Rohrmann 
> wrote:
>
>> Hi Cam,
>>
>> could you share a bit more details about your job (e.g. which sources are
>> you using, what are your settings, etc.). Ideally you can provide a minimal
>> example in order to better understand the program.
>>
>> From a high level perspective, there might be different problems: First
>> of all, Flink does not support checkpointing/taking a savepoint if some of
>> the job's operator have already terminated iirc. But your description
>> points rather into the direction that your bounded source does not
>> terminate. So maybe you are reading a file via
>> StreamExecutionEnvironment.createFileInput
>> with FileProcessingMode.PROCESS_CONTINUOUSLY. But these things are hard to
>> tell without a better understanding of your job.
>>
>> Cheers,
>> Till
>>
>> On Mon, Jan 13, 2020 at 8:35 PM Cam Mach  wrote:
>>
>>> Hello Flink expert,
>>>
>>> We have a pipeline that reads both bounded and unbounded sources, and
>>> our understanding is that when the bounded sources complete they should
>>> get a watermark of +inf, and then we should be able to take a savepoint
>>> and safely restart the pipeline. However, we have sources that never
>>> get watermarks, and we are confused as to what we are seeing and what
>>> we should expect.
>>>
>>>
>>> Cam Mach
>>> Software Engineer
>>> E-mail: cammac...@gmail.com
>>> Tel: 206 972 2768
>>>
>>>


Re: Understanding watermark

2020-01-19 Thread Guowei Ma
>>What I understand from you, one operator has two watermarks? If so, one
operator's output watermark would be an input watermark of the next
operator? Does it sounds redundant?
An operator does not have two watermarks. What I wanted to say is
"watermark metrics".

>>Or do you mean the Web UI only show the input watermarks of every
operator, but since the source doesn't have input watermark show it show
"No Watermark" ? And we should have output watermark for source?
Yes. But the web UI only shows the task-level watermark metrics, not the
operator-level ones. You can find more detailed information about metrics
in the link [1].

>>And, yes we want to understand when we should expect to see watermarks
for our "combined" sources (bounded and un-bounded) for our pipeline?
Could you try a topology with only the Kinesis source and check whether
the web UI shows the watermark of the source? Actually, I think it might
not be related to the "combined" source.

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.9/monitoring/metrics.html
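The task-level watermark metrics mentioned above follow a simple rule: a task's input watermark is the minimum of the output watermarks of all its upstream tasks, and a source task has no input watermark at all. A minimal sketch (plain Java, hypothetical names, not Flink's actual metrics code):

```java
import java.util.Arrays;

// Sketch: a task's *input* watermark is the minimum of the *output*
// watermarks of all its upstream tasks. A source task has no inputs,
// which is why the UI shows "No Watermark" for it when only input
// watermarks are displayed.
public class WatermarkDemo {
    static long inputWatermark(long... upstreamOutputWatermarks) {
        return Arrays.stream(upstreamOutputWatermarks).min().orElse(Long.MIN_VALUE);
    }
    public static void main(String[] args) {
        // two upstream source tasks emitting different output watermarks
        long downstream = inputWatermark(1_000L, 4_000L);
        System.out.println(downstream); // 1000

        // a source task: no upstream inputs -> no input watermark
        System.out.println(inputWatermark() == Long.MIN_VALUE); // true
    }
}
```

Taking the minimum is what makes the downstream input watermark lag behind the slowest upstream source, which also explains why a single slow or idle source holds back the whole pipeline's watermark.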
Best,
Guowei


Cam Mach wrote on Wed, Jan 15, 2020 at 3:53 PM:

> Hi Guowei,
>
> Thanks for your response.
>
> What I understand from you, one operator has two watermarks? If so, one
> operator's output watermark would be an input watermark of the next
> operator? Does it sounds redundant?
>
> Or do you mean the Web UI only show the input watermarks of every
> operator, but since the source doesn't have input watermark show it show
> "No Watermark" ? And we should have output watermark for source?
>
> And, yes we want to understand when we should expect to see watermarks for
> our "combined" sources (bounded and un-bounded) for our pipeline?
>
> If you can be more directly, it would be very helpful.
>
> Thanks,
>
> On Tue, Jan 14, 2020 at 5:42 PM Guowei Ma  wrote:
>
>> Hi, Cam,
>> I think you might want to know why the web page does not show the
>> watermark of the source.
>> Currently, the web UI only shows the "input" watermark. A source only
>> outputs watermarks, so the web UI shows "No Watermark" for it.
>> Actually, Flink has "output" watermark metrics as well. I think Flink
>> should also show this information in the web UI. Would you mind opening
>> a JIRA to track this?
>>
>>
>> Best,
>> Guowei
>>
>>
>> Cam Mach  于2020年1月15日周三 上午4:05写道:
>>
>>> Hi Till,
>>>
>>> Thanks for your response.
>>>
>>> Our sources are S3 and Kinesis. We have run several tests, and we are
>>> able to take savepoint/checkpoint, but only when S3 complete reading. And
>>> at that point, our pipeline has watermarks for other operators, but not the
>>> source operator. We are not running `PROCESS_CONTINUOUSLY`, so we should
>>> have watermark for the source as well, right?
>>>
>>>  Attached is snapshot of our pipeline.
>>>
>>> [image: image.png]
>>>
>>> Thanks
>>>
>>>
>>>
>>> On Tue, Jan 14, 2020 at 10:43 AM Till Rohrmann 
>>> wrote:
>>>
>>>> Hi Cam,
>>>>
>>>> could you share a bit more details about your job (e.g. which sources
>>>> are you using, what are your settings, etc.). Ideally you can provide a
>>>> minimal example in order to better understand the program.
>>>>
>>>> From a high level perspective, there might be different problems: First
>>>> of all, Flink does not support checkpointing/taking a savepoint if some of
>>>> the job's operator have already terminated iirc. But your description
>>>> points rather into the direction that your bounded source does not
>>>> terminate. So maybe you are reading a file via
>>>> StreamExecutionEnvironment.createFileInput
>>>> with FileProcessingMode.PROCESS_CONTINUOUSLY. But these things are hard to
>>>> tell without a better understanding of your job.
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Mon, Jan 13, 2020 at 8:35 PM Cam Mach  wrote:
>>>>
>>>>> Hello Flink expert,
>>>>>
>>>>> We have a pipeline that read both bounded and unbounded sources and
>>>>> our understanding is that when the bounded sources complete they should 
>>>>> get
>>>>> a watermark of +inf and then we should be able to take a savepoint and
>>>>> safely restart the pipeline. However, we have source that never get
>>>>> watermarks and we are confused as to what we are seeing and what we should
>>>>> expect
>>>>>
>>>>>
>>>>> Cam Mach
>>>>> Software Engineer
>>>>> E-mail: cammac...@gmail.com
>>>>> Tel: 206 972 2768
>>>>>
>>>>>


Re: [ANNOUNCE] Yu Li became a Flink committer

2020-01-24 Thread Guowei Ma
Congratulations
Best,
Guowei


刘建刚 wrote on Fri, Jan 24, 2020 at 5:56 PM:

> Congratulations!
>
> > On Jan 23, 2020, at 4:59 PM, Stephan Ewen  wrote:
> >
> > Hi all!
> >
> > We are announcing that Yu Li has joined the ranks of Flink committers.
> >
> > Yu joined already in late December, but the announcement got lost because
> > of the Christmas and New Years season, so here is a belated proper
> > announcement.
> >
> > Yu is one of the main contributors to the state backend components in the
> > recent year, working on various improvements, for example the RocksDB
> > memory management for 1.10.
> > He has also been one of the release managers for the big 1.10 release.
> >
> > Congrats for joining us, Yu!
> >
> > Best,
> > Stephan
>
>


Re: [VOTE] FLIP-27 - Refactor Source Interface

2020-02-03 Thread Guowei Ma
+1 (non-binding), thanks for driving.

Best,
Guowei


Jingsong Li  于2020年2月4日周二 上午11:20写道:

> +1 (non-binding), thanks for driving.
> FLIP-27 is the basis of a lot of follow-up work.
>
> Best,
> Jingsong Lee
>
> On Tue, Feb 4, 2020 at 10:26 AM Jark Wu  wrote:
>
> > Thanks for driving this Becket!
> >
> > +1 from my side.
> >
> > Cheers,
> > Jark
> >
> > On Mon, 3 Feb 2020 at 18:06, Yu Li  wrote:
> >
> > > +1, thanks for the efforts Becket!
> > >
> > > Best Regards,
> > > Yu
> > >
> > >
> > > On Mon, 3 Feb 2020 at 17:52, Becket Qin  wrote:
> > >
> > > > Bump up the thread.
> > > >
> > > > On Tue, Jan 21, 2020 at 10:43 AM Becket Qin 
> > > wrote:
> > > >
> > > > > Hi Folks,
> > > > >
> > > > > I'd like to resume the voting thread for FlIP-27.
> > > > >
> > > > > Please note that the FLIP wiki has been updated to reflect the
> latest
> > > > > discussions in the discussion thread.
> > > > >
> > > > > To avoid confusion, I'll only count the votes cast after this
> > point.
> > > > >
> > > > > FLIP wiki:
> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-27
> > > > > %3A+Refactor+Source+Interface
> > > > >
> > > > > Discussion thread:
> > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/70484d6aa4b8e7121181ed8d5857a94bfb7d5a76334b9c8fcc59700c%40%3Cdev.flink.apache.org%3E
> > > > >
> > > > > The vote will last for at least 72 hours, following the consensus
> > > voting
> > > > >  process.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > > On Thu, Dec 5, 2019 at 10:31 AM jincheng sun <
> > sunjincheng...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > >> +1 (binding), and looking forward to seeing the new interface in
> the
> > > > >> master.
> > > > >>
> > > > >> Best,
> > > > >> Jincheng
> > > > >>
> > > > >> Becket Qin  于2019年12月5日周四 上午8:05写道:
> > > > >>
> > > > >> > Hi all,
> > > > >> >
> > > > >> > I would like to start the vote for FLIP-27 which proposes to
> > > > introduce a
> > > > >> > new Source connector interface to address a few problems in the
> > > > existing
> > > > >> > source connector. The main goals of the FLIP are the following:
> > > > >> >
> > > > >> > 1. Unify the Source interface in Flink for batch and stream.
> > > > >> > 2. Significantly reduce the work for developers to develop new
> > > source
> > > > >> > connectors.
> > > > >> > 3. Provide a common abstraction for all the sources, as well as
> a
> > > > >> mechanism
> > > > >> > to allow source subtasks to coordinate among themselves.
> > > > >> >
> > > > >> > The vote will last for at least 72 hours, following the
> consensus
> > > > voting
> > > > >> > process.
> > > > >> >
> > > > >> > FLIP wiki:
> > > > >> >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
> > > > >> >
> > > > >> > Discussion thread:
> > > > >> >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://lists.apache.org/thread.html/70484d6aa4b8e7121181ed8d5857a94bfb7d5a76334b9c8fcc59700c@%3Cdev.flink.apache.org%3E
> > > > >> >
> > > > >> > Thanks,
> > > > >> >
> > > > >> > Jiangjie (Becket) Qin
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> >
>
>
> --
> Best, Jingsong Lee
>


Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer

2020-02-26 Thread Guowei Ma
Congratulations !!
Best,
Guowei


Yun Tang  于2020年2月27日周四 上午2:11写道:

> Congratulations and well deserved!
>
>
> Best
> Yun Tang
> 
> From: Canbin Zheng 
> Sent: Monday, February 24, 2020 16:07
> To: dev 
> Subject: Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer
>
> Congratulations !!
>
> Dawid Wysakowicz  于2020年2月24日周一 下午3:55写道:
>
> > Congratulations Jingsong!
> >
> > Best,
> >
> > Dawid
> >
> > On 24/02/2020 08:12, zhenya Sun wrote:
> > > Congratulations!!!
> > > | |
> > > zhenya Sun
> > > |
> > > |
> > > toke...@126.com
> > > |
> > > 签名由网易邮箱大师定制
> > >
> > >
> > > On 02/24/2020 14:35,Yu Li wrote:
> > > Congratulations Jingsong! Well deserved.
> > >
> > > Best Regards,
> > > Yu
> > >
> > >
> > > On Mon, 24 Feb 2020 at 14:10, Congxian Qiu 
> > wrote:
> > >
> > > Congratulations Jingsong!
> > >
> > > Best,
> > > Congxian
> > >
> > >
> > > jincheng sun  于2020年2月24日周一 下午1:38写道:
> > >
> > > Congratulations Jingsong!
> > >
> > > Best,
> > > Jincheng
> > >
> > >
> > > Zhu Zhu  于2020年2月24日周一 上午11:55写道:
> > >
> > > Congratulations Jingsong!
> > >
> > > Thanks,
> > > Zhu Zhu
> > >
> > > Fabian Hueske  于2020年2月22日周六 上午1:30写道:
> > >
> > > Congrats Jingsong!
> > >
> > > Cheers, Fabian
> > >
> > > Am Fr., 21. Feb. 2020 um 17:49 Uhr schrieb Rong Rong <
> > > walter...@gmail.com>:
> > >
> > > Congratulations Jingsong!!
> > >
> > > Cheers,
> > > Rong
> > >
> > > On Fri, Feb 21, 2020 at 8:45 AM Bowen Li  wrote:
> > >
> > > Congrats, Jingsong!
> > >
> > > On Fri, Feb 21, 2020 at 7:28 AM Till Rohrmann  > >
> > > wrote:
> > >
> > > Congratulations Jingsong!
> > >
> > > Cheers,
> > > Till
> > >
> > > On Fri, Feb 21, 2020 at 4:03 PM Yun Gao 
> > > wrote:
> > >
> > > Congratulations Jingsong!
> > >
> > > Best,
> > > Yun
> > >
> > > --
> > > From:Jingsong Li 
> > > Send Time:2020 Feb. 21 (Fri.) 21:42
> > > To:Hequn Cheng 
> > > Cc:Yang Wang ; Zhijiang <
> > > wangzhijiang...@aliyun.com>; Zhenghua Gao ;
> > > godfrey
> > > he ; dev ; user <
> > > u...@flink.apache.org>
> > > Subject:Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer
> > >
> > > Thanks everyone~
> > >
> > > It's my pleasure to be part of the community. I hope I can make a
> > > better
> > > contribution in future.
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > > On Fri, Feb 21, 2020 at 2:48 PM Hequn Cheng 
> > > wrote:
> > > Congratulations Jingsong! Well deserved.
> > >
> > > Best,
> > > Hequn
> > >
> > > On Fri, Feb 21, 2020 at 2:42 PM Yang Wang 
> > > wrote:
> > > Congratulations Jingsong! Well deserved.
> > >
> > >
> > > Best,
> > > Yang
> > >
> > > Zhijiang  于2020年2月21日周五 下午1:18写道:
> > > Congrats Jingsong! Welcome on board!
> > >
> > > Best,
> > > Zhijiang
> > >
> > > --
> > > From:Zhenghua Gao 
> > > Send Time:2020 Feb. 21 (Fri.) 12:49
> > > To:godfrey he 
> > > Cc:dev ; user 
> > > Subject:Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer
> > >
> > > Congrats Jingsong!
> > >
> > >
> > > *Best Regards,*
> > > *Zhenghua Gao*
> > >
> > >
> > > On Fri, Feb 21, 2020 at 11:59 AM godfrey he 
> > > wrote:
> > > Congrats Jingsong! Well deserved.
> > >
> > > Best,
> > > godfrey
> > >
> > > Jeff Zhang  于2020年2月21日周五 上午11:49写道:
> > > Congratulations Jingsong! You deserve it.
> > >
> > > wenlong.lwl  于2020年2月21日周五 上午11:43写道:
> > > Congrats Jingsong!
> > >
> > > On Fri, 21 Feb 2020 at 11:41, Dian Fu 
> > > wrote:
> > >
> > > Congrats Jingsong!
> > >
> > > 在 2020年2月21日,上午11:39,Jark Wu  写道:
> > >
> > > Congratulations Jingsong! Well deserved.
> > >
> > > Best,
> > > Jark
> > >
> > > On Fri, 21 Feb 2020 at 11:32, zoudan  wrote:
> > >
> > > Congratulations! Jingsong
> > >
> > >
> > > Best,
> > > Dan Zou
> > >
> > >
> > >
> > >
> > >
> > > --
> > > Best Regards
> > >
> > > Jeff Zhang
> > >
> > >
> > >
> > > --
> > > Best, Jingsong Lee
> > >
> > >
> > >
> > >
> > >
> > >
> >
> >
>


Re: [DISCUSS] Drop Bucketing Sink

2020-03-13 Thread Guowei Ma
+1 to drop it.

To Jingsong:
we are planning to implement the ORC StreamingFileSink in 1.11.
I think users could also reference the old BucketingSink from the old version.

Best,
Guowei


Jingsong Li  于2020年3月13日周五 上午10:07写道:

> Hi Robert,
>
> +1 to drop it but maybe not 1.11.
>
> ORC has not been supported on StreamingFileSink. I have seen lots of users
> run ORC in the bucketing sink.
>
> Best,
> Jingsong Lee
>
> On Fri, Mar 13, 2020 at 1:11 AM Seth Wiesman  wrote:
>
> > Sorry, I meant FLIP-46.
> >
> > Seth
> >
> > On Thu, Mar 12, 2020 at 11:52 AM Seth Wiesman 
> wrote:
> >
> > > I agree with David, I think FLIP-49 needs to be prioritized for 1.11 if
> > we
> > > want to drop the bucketing sink.
> > >
> > > Seth
> > >
> > > On Thu, Mar 12, 2020 at 10:53 AM David Anderson 
> > > wrote:
> > >
> > >> The BucketingSink is still somewhat widely used, I think in part
> because
> > >> of
> > >> shortcomings in the StreamingFileSink.
> > >>
> > >> I would hope that in tandem with removing the bucketing sink we could
> > also
> > >> address some of these issues. I'm thinking in particular of issues
> that
> > >> are
> > >> waiting on FLIP-46 [1].
> > >>
> > >> Removing the bucketing sink will go down better, in my opinion, if
> it's
> > >> coupled with progress on some of the open StreamingFileSink tickets.
> > >>
> > >> Best,
> > >> David
> > >>
> > >> [1]
> > >>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-46%3A+Graceful+Shutdown+Handling+by+UDFs
> > >>
> > >>
> > >> On Thu, Mar 12, 2020 at 4:27 PM Zhijiang  > >> .invalid>
> > >> wrote:
> > >>
> > >> > Thanks for driving this discussion, Robert!
> > >> >
> > >> > This e2e test really fails frequently. +1 to drop the bucketing sink;
> > >> > it is not worth paying more effort on it since it is deprecated.
> > >> >
> > >> > Best,
> > >> > Zhijiang
> > >> >
> > >> >
> > >> > --
> > >> > From:Jeff Zhang 
> > >> > Send Time:2020 Mar. 12 (Thu.) 23:17
> > >> > To:dev 
> > >> > Subject:Re: [DISCUSS] Drop Bucketing Sink
> > >> >
> > >> > +1, dropping deprecated api is always necessary for a sustainable
> > >> project.
> > >> >
> > >> > Kostas Kloudas  于2020年3月12日周四 下午11:06写道:
> > >> >
> > >> > > Hi Robert,
> > >> > >
> > >> > > +1 for dropping the BucketingSink.
> > >> > > In any case, it has not been maintained for quite some time.
> > >> > >
> > >> > > Cheers,
> > >> > > Kostas
> > >> > >
> > >> > > On Thu, Mar 12, 2020 at 3:41 PM Robert Metzger <
> rmetz...@apache.org
> > >
> > >> > > wrote:
> > >> > > >
> > >> > > > Hi all,
> > >> > > >
> > >> > > > I'm currently investigating a failing end to end test for the
> > >> bucketing
> > >> > > > sink [1].
> > >> > > > The bucketing sink has been deprecated in the 1.9 release [2],
> > >> because
> > >> > we
> > >> > > > have the new StreamingFileSink [3] for quite a while.
> > >> > > > Before putting any effort into fixing the end to end test for
> the
> > >> > sink, I
> > >> > > > wanted to propose dropping the bucketing sink from master for
> the
> > >> > > upcoming
> > >> > > > 1.11 release.
> > >> > > >
> > >> > > > What do you think?
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > [1] https://issues.apache.org/jira/browse/FLINK-16227
> > >> > > > [2] https://issues.apache.org/jira/browse/FLINK-13396
> > >> > > > [3] https://issues.apache.org/jira/browse/FLINK-9749
> > >> > >
> > >> >
> > >> >
> > >> > --
> > >> > Best Regards
> > >> >
> > >> > Jeff Zhang
> > >> >
> > >> >
> > >>
> > >
> >
>
>
> --
> Best, Jingsong Lee
>


Re: [DISCUSS] FLIP-115: Filesystem connector in Table

2020-03-19 Thread Guowei Ma
Hi,


I am very interested in the topic. I would like to join the offline
discussion if possible. I think you guys have already given many inputs and
concerns. I would like to share some of my thoughts. Correct me if I
misunderstand you.

Flink is a unified engine. As such, Flink should provide e2e
exactly-once semantics for the user in both streaming and batch. E2E
exactly-once is not a trivial thing.

StreamingFileSink already does a lot of work on how to support e2e
exactly-once semantics for the “file” output scenario. It provides a
layered architecture

   1. BulkWriter/Encoder deals with the data format in a file.
   2. BucketAssigner/RollingPolicy deals with the lifecycle of the file and
      directory structure.
   3. RecoverableWriter deals with the exactly-once semantics.


All these layers are orthogonal and could be combined with each other. This
could reduce much of the work. (Currently, there are some limitations.)
There are some already known issues such as how to support batch in the
StreamingFileSink, How to support orc and so on. But there are already some
discussions on how to resolve these issues.

Jingsong also gives some new cases that the StreamingFileSink might not
support currently. I am very glad to see that you all agree on improving
the StreamingFileSink architecture for these new cases.

Best,
Guowei
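The three layers above compose orthogonally; a minimal sketch of that composition (plain Python with invented class names, not the actual StreamingFileSink API) could look like:

```python
class SimpleEncoder:
    """Layer 1: the data format inside a file."""
    def encode(self, record):
        return (str(record) + "\n").encode()

class KeyBucketAssigner:
    """Layer 2: which bucket (sub-directory) a record belongs to."""
    def assign(self, record):
        return "key=%s" % record[0]

class SizeRollingPolicy:
    """Layer 2: when the current in-progress part file should be rolled."""
    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
    def should_roll(self, size):
        return size >= self.max_bytes

class ToyFileSink:
    """Composes the layers. Layer 3 (RecoverableWriter) is modeled by
    keeping rolled files 'pending' until a checkpoint completes."""
    def __init__(self, encoder, assigner, policy):
        self.encoder, self.assigner, self.policy = encoder, assigner, policy
        self.in_progress = {}  # bucket -> bytes written so far
        self.pending = []      # rolled but not yet visible
        self.committed = []    # visible after a completed checkpoint

    def write(self, record):
        bucket = self.assigner.assign(record)
        size = self.in_progress.get(bucket, 0) + len(self.encoder.encode(record))
        self.in_progress[bucket] = size
        if self.policy.should_roll(size):
            self.pending.append((bucket, self.in_progress.pop(bucket)))

    def on_checkpoint_complete(self):
        # exactly-once: pending parts only become visible on checkpoint
        self.committed.extend(self.pending)
        self.pending.clear()

sink = ToyFileSink(SimpleEncoder(), KeyBucketAssigner(), SizeRollingPolicy(16))
for record in [("a", 1), ("a", 2), ("b", 3)]:
    sink.write(record)
sink.on_checkpoint_complete()
print(sink.committed)  # [('key=a', 18)]
```

Because each layer can be swapped independently, supporting a new format (ORC, Parquet) or batch execution mostly means changing one layer while leaving the exactly-once machinery untouched.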


Jingsong Li  于2020年3月19日周四 上午12:19写道:

> Hi Stephan & Kostas & Piotrek, thanks for these inputs,
>
> Maybe what I expressed is not clear. For the implementation, I want to know
> what you think, rather than insisting on building another set from scratch. Piotrek
> you are right, implementation is the part of this FLIP too, because we can
> not list all detail things in the FLIP, so the implementation do affect
> user's behaviors. And the maintenance / development costs are also points.
>
> I think you already persuaded me. I am thinking about based on
> StreamingFileSink. And extending StreamingFileSink can solve "partition
> commit" requirement, I have tried in my POC. And it is true, Recoverable
> things and S3 things also important.
> So I listed "What is missing" for StreamingFileSink in the previous mail.
> (It is not a defense for making another set from scratch.)
>
> Hi Stephan,
>
> >> The FLIP is "Filesystem connector in Table", it's about building up
> Flink
> Table's capabilities.
>
> What I mean is this is not just for Hive, this FLIP is for table. So we
> don't need do all things for Hive. But Hive is also a "format" (or
> something else) of Filesystem connector. Its requirements can be
> considered.
>
> I think you are right about the design, and let me take this seriously, a
> unify way make us stronger, less confuse, less surprise, more rigorous
> design. And I am pretty sure table things are good for enhancing DataStream
> api too.
>
> Hi Kostas,
>
> Yes, Parquet and Orc are the main formats. Good to know your support~
>
> I think streaming file sink and file system connector things are important
> to Flink; it is a good time to think about these common requirements, think
> about batch support. It is not just about table, it is for the whole Flink
> too. If there are some requirements that are hard to support or violate the
> existing design, exclude them.
>
> Best,
> Jingsong Lee
>
>
> On Wed, Mar 18, 2020 at 10:31 PM Piotr Nowojski 
> wrote:
>
> > Hi Kurt,
> >
> > +1 for having some offline discussion on this topic.
> >
> > But I think the question about using StreamingFileSink or implementing
> > subset of it’s feature from scratch is quite fundamental design decision,
> > with impact on the behaviour of Public API, so I wouldn’t discard it as
> > technical detail and should be included as part of the FLIP (I know It
> > could be argued in opposite direction).
> >
> > Piotrek
> >
> > > On 18 Mar 2020, at 13:55, Kurt Young  wrote:
> > >
> > > Hi all,
> > >
> > > Thanks for the discuss and feedbacks. I think this FLIP doesn't imply
> the
> > > implementation
> > > of such connector yet, it only describes the functionality and expected
> > > behaviors from user's
> > > perspective. Reusing current StreamingFileSink is definitely one of the
> > > possible ways to
> > > implement it. Since there are lots of details and I would suggest we
> can
> > > have an offline meeting
> > > to discuss the how these could be achieved by extending
> StremingFileSink,
> > > and how much
> > > effort we need to put on it. What do you think?
> > >
> > > Best,
> > > Kurt
> > >
> > >
> > > On Wed, Mar 18, 2020 at 7:21 PM Kostas Kloudas 
> > wrote:
> > >
> > >> Hi all,
> > >>
> > >> I also agree with Stephan on this!
> > >>
> > >> It has been more than a year now that most of our efforts have had the
> > >> "unify" / "unification"/ etc either on their title or in their core
> > >> and this has been the focus of all our resources. By deviating from
> > >> this now, we only put more stress on other teams in the future. When
> > >> the users start using a given API, with high probability, they will
> > >> ask (

Re: [VOTE] FLIP-147: Support Checkpoint After Tasks Finished

2021-01-18 Thread Guowei Ma
+1 non-binding
Best,
Guowei


On Fri, Jan 15, 2021 at 10:56 PM Yun Gao 
wrote:

>
> Hi all,
>
> I would like to start the vote for FLIP-147[1], which propose to support
> checkpoints after
> tasks finished and is discussed in [2].
>
> The vote will last at least 72 hours (Jan 20th due to weekend), following
> the consensus
> voting process.
>
> thanks,
>  Yun
>
>
> [1] https://cwiki.apache.org/confluence/x/mw-ZCQ
> [2]
> https://lists.apache.org/thread.html/r2780b46267af6e98c7427cb98b36de8218f1499ae098044e1f24c527%40%3Cdev.flink.apache.org%3E


Re: [ANNOUNCE] Apache Flink 1.12.1 released

2021-01-19 Thread Guowei Ma
Thanks Xintong's effort!
Best,
Guowei


On Tue, Jan 19, 2021 at 5:37 PM Yangze Guo  wrote:

> Thanks Xintong for the great work!
>
> Best,
> Yangze Guo
>
> On Tue, Jan 19, 2021 at 4:47 PM Till Rohrmann 
> wrote:
> >
> > Thanks a lot for driving this release Xintong. This was indeed a release
> with some obstacles to overcome and you did it very well!
> >
> > Cheers,
> > Till
> >
> > On Tue, Jan 19, 2021 at 5:59 AM Xingbo Huang  wrote:
> >>
> >> Thanks Xintong for the great work!
> >>
> >> Best,
> >> Xingbo
> >>
> >> Peter Huang  于2021年1月19日周二 下午12:51写道:
> >>
> >> > Thanks for the great effort to make this happen. It paves the way for
> >> > us to use 1.12 soon.
> >> >
> >> > Best Regards
> >> > Peter Huang
> >> >
> >> > On Mon, Jan 18, 2021 at 8:16 PM Yang Wang 
> wrote:
> >> >
> >> > > Thanks Xintong for the great work as our release manager!
> >> > >
> >> > >
> >> > > Best,
> >> > > Yang
> >> > >
> >> > > Xintong Song  于2021年1月19日周二 上午11:53写道:
> >> > >
> >> > >> The Apache Flink community is very happy to announce the release of
> >> > >> Apache Flink 1.12.1, which is the first bugfix release for the
> Apache
> >> > Flink
> >> > >> 1.12 series.
> >> > >>
> >> > >> Apache Flink® is an open-source stream processing framework for
> >> > >> distributed, high-performing, always-available, and accurate data
> >> > streaming
> >> > >> applications.
> >> > >>
> >> > >> The release is available for download at:
> >> > >> https://flink.apache.org/downloads.html
> >> > >>
> >> > >> Please check out the release blog post for an overview of the
> >> > >> improvements for this bugfix release:
> >> > >> https://flink.apache.org/news/2021/01/19/release-1.12.1.html
> >> > >>
> >> > >> The full release notes are available in Jira:
> >> > >>
> >> > >>
> >> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12349459
> >> > >>
> >> > >> We would like to thank all contributors of the Apache Flink
> community
> >> > who
> >> > >> made this release possible!
> >> > >>
> >> > >> Regards,
> >> > >> Xintong
> >> > >>
> >> > >
> >> >
>


Re: [DISCUSS] FLINK-21110: Optimize Scheduler Performance for Large-Scale Jobs

2021-02-02 Thread Guowei Ma
Thanks to Zhilong for improving the scheduler's performance.
I think many users would benefit from your work!
Best,
Guowei


On Wed, Feb 3, 2021 at 2:04 AM Till Rohrmann  wrote:

> Thanks for making the community aware of these performance improvements,
> Zhilong. I like them and I am looking forward to a faster Flink :-)
>
> Cheers,
> Till
>
> On Tue, Feb 2, 2021 at 11:00 AM Zhilong Hong 
> wrote:
>
> > Hello, everyone:
> >
> > I would like to start the discussion about FLINK-21110: Optimize
> Scheduler
> > Performance for Large-Scale Jobs [1].
> >
> > According to the result of scheduler benchmarks we implemented in
> > FLINK-20612 [2], the bottleneck of deploying and running a large-scale
> job
> > in Flink is mainly focused on the following procedures:
> >
> > Procedure                                            Time complexity
> > Initializing ExecutionGraph                          O(N^2)
> > Building DefaultExecutionTopology                    O(N^2)
> > Initializing PipelinedRegionSchedulingStrategy       O(N^2)
> > Scheduling downstream tasks when a task finishes     O(N^2)
> > Calculating tasks to restart when a failover occurs  O(N^2)
> > Releasing result partitions                          O(N^2)
> >
> > These procedures are all related to the complexity of the topology in the
> > ExecutionGraph. Between two vertices connected with the all-to-all edges,
> > all the upstream Intermediate ResultPartitions are connected to all
> > downstream ExecutionVertices. The computation complexity of building and
> > traversing all these edges will be O(N^2).
> >
> > As for memory usage, currently we use ExecutionEdges to store the
> > information of connections. For the all-to-all distribution type, there
> are
> > O(N^2) ExecutionEdges. We tested a simple job with only two vertices.
> > Their parallelisms are both 10k, and they are connected with
> > all-to-all edges. It takes 4.175 GiB (estimated via MXBean) to store the
> > 100M ExecutionEdges.
> >
> > In most large-scale jobs, there will be more than two vertices with large
> > parallelisms, and they would cost a lot of time and memory to deploy the
> > job.
> >
> > As we can see, for two JobVertices connected with the all-to-all
> > distribution type, all IntermediateResultPartitions produced by the
> > upstream ExecutionVertices are isomorphic, which means that the
> downstream
> > ExecutionVertices they connected are exactly the same. The downstream
> > ExecutionVertices belonging to the same JobVertex are also isomorphic, as
> > the upstream ResultPartitions they connect are the same, too.
> >
> > Since every JobEdge has exactly one distribution type, we can divide the
> > vertices and result partitions into groups according to the distribution
> > type of the JobEdge.
> >
> > For the all-to-all distribution type, since all downstream vertices are
> > isomorphic, they belong to a single group, and all the upstream result
> > partitions are connected to this group. Vice versa, all the upstream
> result
> > partitions also belong to a single group, and all the downstream vertices
> > are connected to this group. In the past, when we wanted to iterate all
> the
> > downstream vertices, we needed to loop over them n times, which leads to
> > the complexity of O(N^2). Now since all upstream result partitions are
> > connected to one downstream group, we just need to loop over them once,
> > with the complexity of O(N).
> >
> > For the pointwise distribution type, because each result partition is
> > connected to different downstream vertices, they should belong to
> different
> > groups. Vice versa, all the vertices belong to different groups. Since
> one
> > result partition group is connected to one vertex group pointwisely, the
> > computation complexity of looping over them is still O(N).
> >
> > After we group the result partitions and vertices, ExecutionEdge is no
> > longer needed. For the test job we mentioned above, the optimization can
> > effectively reduce the memory usage from 4.175 GiB to 12.076 MiB
> (estimated
> > via MXBean) in our POC. The time cost is reduced from 62.090 seconds to
> > 8.551 seconds (with 10k parallelism).
> >
> > The detailed design doc with illustrations is located at [3]. Please find
> > more details in the links below.
> >
> > Looking forward to your feedback.
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-21110
> > [2] https://issues.apache.org/jira/browse/FLINK-20612
> > [3]
> >
> https://docs.google.com/document/d/1OjGAyJ9Z6KsxcMtBHr6vbbrwP9xye7CdCtrLvf8dFYw/edit?usp=sharing
> >
> >
>
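The grouping idea Zhilong describes, replacing per-connection ExecutionEdges with one shared group per side of an all-to-all JobEdge, can be sketched in plain Python (the group names are illustrative, not the actual class names in the patch):

```python
def build_edges_naive(n_upstream, n_downstream):
    """Pre-optimization: one ExecutionEdge per connection, so an
    all-to-all JobEdge materializes O(N^2) edge objects."""
    return [(p, v) for p in range(n_upstream) for v in range(n_downstream)]

def build_groups(n_upstream, n_downstream):
    """Post-optimization sketch: all downstream vertices form a single
    consumer group and all upstream partitions a single partition group.
    Each element stores one reference to the shared group, O(N) total."""
    consumer_group = list(range(n_downstream))
    partition_group = list(range(n_upstream))
    partitions = {p: consumer_group for p in range(n_upstream)}
    vertices = {v: partition_group for v in range(n_downstream)}
    return partitions, vertices

n = 1000
edges = build_edges_naive(n, n)            # 1,000,000 edge objects
partitions, vertices = build_groups(n, n)  # 2 shared groups + 2n references
print(len(edges))          # 1000000
print(len(partitions[0]))  # 1000: consumers of partition 0, via the group
# every partition shares the same consumer-group object, so iterating all
# consumers of all partitions is one pass over that single group
assert all(group is partitions[0] for group in partitions.values())
```

The memory numbers in the mail (4.175 GiB down to 12.076 MiB) follow directly from this shape change: N^2 edge objects collapse into two shared lists plus O(N) references.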


Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-02-16 Thread Guowei Ma
Thanks Yun for the detailed explanation.
A simple supplementary explanation about the sink case: maybe we could use
`OperatorCoordinator` to avoid sending the element to the downstream
operator.
But I agree we cannot prevent users from emitting records in
`notifyCheckpointComplete`.

Best,
Guowei


On Tue, Feb 16, 2021 at 2:06 PM Yun Gao 
wrote:

> Hi all,
>
> I'd like to first detail the issue with emitting records in
> notifyCheckpointComplete for context. For specific usage,
> an example would be for sink, it might want to write some metadata after
> all the transactions are committed
> (like write a marker file _SUCCESS to the output directory). This case is
> currently supported via the two level
> committers of the new sink API: when endOfInput() is received, the Committer
> waits for another checkpoint to
> commit all the pending transactions and emits the list of files to the
> GlobalCommitter. The GlobalCommitter
> then waits for another checkpoint to also write the metadata with 2pc
> (although sometimes 2pc is not needed
> for writing metadata, it should be only an optimization and still requires
> the Committer to commit before
> notifying the GlobalCommitter. Another note is that GlobalCommitter is
> also added for some other cases
> where sinks want a committer with dop = 1, like IcebergSink).
>
> However, a more general issue to me is that currently we do not prevent
> users from emitting records in
> notifyCheckpointComplete at the API level. The sink case could be viewed
> as a special case, but in addition
> to this one, logically users could also implement their own cases that
> emit records in notifyCheckpointComplete.
>
> Best,
> Yun
>
>  --Original Mail --
> Sender:Arvid Heise 
> Send Date:Fri Feb 12 20:46:04 2021
> Recipients:dev 
> CC:Yun Gao 
> Subject:Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished
> Hi Piotr,
>
>
>
> Thank you for raising your concern. Unfortunately, I do not have a better
>
> idea than doing closing of operators intermittently with checkpoints (=
>
> multiple last checkpoints).
>
>
>
> However, two ideas on how to improve the overall user experience:
>
> 1. If an operator is not relying on notifyCheckpointComplete, we can close
>
> it faster (without waiting for a checkpoint). In general, I'd assume that
>
> almost all non-sinks behave that way.
>
> 2. We may increase the checkpointing frequency for the last checkpoints. We
>
> need to avoid overloading checkpoint storages and task managers, but I
>
> assume the more operators are closed, the lower the checkpointing interval
>
> can be.
>
>
>
> For 1, I'd propose to add (name TBD):
>
> default boolean StreamOperator#requiresFinalCheckpoint() {
>     return true;
> }
>
> This means all operators are conservatively (=slowly) closed. For most
> operators, we can then define their behavior by overriding in
> AbstractUdfStreamOperator:
>
> @Override
> boolean AbstractUdfStreamOperator#requiresFinalCheckpoint() {
>     return userFunction instanceof CheckpointListener;
> }
>
> This idea can be further refined by also adding requiresFinalCheckpoint to
> CheckpointListener to exclude all operators with UDFs that implement
> CheckpointListener but do not need it for 2pc.
>
> @Override
> boolean AbstractUdfStreamOperator#requiresFinalCheckpoint() {
>     return userFunction instanceof CheckpointListener &&
>         ((CheckpointListener) userFunction).requiresFinalCheckpoint();
> }
>
>
>
> That approach would also work for statebackends/snapshot strategies that
>
> require some 2pc.
>
>
>
> If we can contain it to the @PublicEvolving StreamOperator, it would be
>
> better of course.
>
>
>
> Best,
>
>
>
> Arvid
>
>
>
> On Fri, Feb 12, 2021 at 11:36 AM Piotr Nowojski
>
> wrote:
>
>
>
> > Hey,
>
> >
>
> > I would like to raise a concern about implementation of the final
>
> > checkpoints taking into account operators/functions that are implementing
>
> > two phase commit (2pc) protocol for exactly-once processing with some
>
> > external state (kept outside of the Flink). Primarily exactly-once sinks.
>
> >
>
> > First of all, as I understand it, this is not planned in the first
> version
>
> > of this FLIP. I'm fine with that, however I would strongly emphasize this
>
> > in every place we will be mentioning FLIP-147 efforts. This is because
> me,
>
> > as a user, upon hearing "Flink supports checkpointing with bounded
> inputs"
>
> > I would expect 2pc to work properly and to commit the external side
> effects
>
> > upon finishing. As it is now, I (as a user) would be surprised with a
>
> > silent data loss (of not committed trailing data). This is just a remark,
>
> > that we need to attach this warning to every blog post/documentation/user
>
> > mailing list response related to "Support Checkpoints After Tasks
>
> > Finished". Also I would suggest to prioritize the follow up of supporting
>
> > 2pc.
>
> >
>
> > Secondly, I think we are miss
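Arvid's requiresFinalCheckpoint() proposal above boils down to a dispatch at end-of-input; a small Python model of that split (the flag name follows the proposal, nothing here is an actual Flink API) could be:

```python
class Operator:
    """Toy stand-in for a StreamOperator; the flag name is taken from
    the requiresFinalCheckpoint() proposal in the thread above."""
    def __init__(self, name, requires_final_checkpoint):
        self.name = name
        self.requires_final_checkpoint = requires_final_checkpoint

def close_order(operators):
    """Operators that do not rely on notifyCheckpointComplete close
    immediately at end-of-input; the rest (e.g. 2pc sinks) are closed
    only after a final checkpoint completes."""
    fast = [op.name for op in operators if not op.requires_final_checkpoint]
    waiting = [op.name for op in operators if op.requires_final_checkpoint]
    return fast, waiting

pipeline = [
    Operator("map", requires_final_checkpoint=False),
    Operator("window", requires_final_checkpoint=False),
    Operator("two-phase-commit-sink", requires_final_checkpoint=True),
]
fast, waiting = close_order(pipeline)
print(fast)     # ['map', 'window']
print(waiting)  # ['two-phase-commit-sink']
```

Under this split, a pipeline of plain maps and windows would shut down quickly, and only the exactly-once sink would pay for the extra final checkpoint.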

Re: [ANNOUNCE] Welcome Roman Khachatryan a new Apache Flink Committer

2021-02-16 Thread Guowei Ma
Congratulations Roman!
Best,
Guowei


On Thu, Feb 11, 2021 at 3:37 PM Yun Tang  wrote:

> Congratulations, Roman!
>
> Today is also the beginning of Chinese Spring Festival holiday, at which
> we Chinese celebrate across the world for the next lunar new year, and also
> very happy to have you on board!
>
> Best
> Yun Tang
> 
> From: Roman Khachatryan 
> Sent: Thursday, February 11, 2021 4:03
> To: matth...@ververica.com 
> Cc: dev 
> Subject: Re: [ANNOUNCE] Welcome Roman Khachatryan a new Apache Flink
> Committer
>
> Many thanks to all of you!
>
> Regards,
> Roman
>
>
> On Wed, Feb 10, 2021 at 7:12 PM Matthias Pohl 
> wrote:
>
> > Congratulations, Roman! :-)
> >
> > On Wed, Feb 10, 2021 at 3:23 PM Kezhu Wang  wrote:
> >
> >> Congratulations!
> >>
> >> Best,
> >> Kezhu Wang
> >>
> >>
> >> On February 10, 2021 at 21:53:52, Dawid Wysakowicz (
> >> dwysakow...@apache.org)
> >> wrote:
> >>
> >> Congratulations Roman! Glad to have you on board!
> >>
> >> Best,
> >>
> >> Dawid
> >>
> >> On 10/02/2021 14:44, Igal Shilman wrote:
> >> > Welcome Roman!
> >> > Top-notch stuff! :)
> >> >
> >> > All the best,
> >> > Igal.
> >> >
> >> > On Wed, Feb 10, 2021 at 2:15 PM Kostas Kloudas 
> >> wrote:
> >> >
> >> >> Congrats Roman!
> >> >>
> >> >> Kostas
> >> >>
> >> >> On Wed, Feb 10, 2021 at 2:08 PM Arvid Heise 
> wrote:
> >> >>> Congrats! Well deserved.
> >> >>>
> >> >>> On Wed, Feb 10, 2021 at 1:54 PM Yun Gao
>  >> >
> >> >>> wrote:
> >> >>>
> >>  Congratulations Roman!
> >> 
> >>  Best,
> >>  Yun
> >> 
> >> 
> >>  --Original Mail --
> >>  Sender:Till Rohrmann 
> >>  Send Date:Wed Feb 10 20:53:21 2021
> >>  Recipients:dev 
> >>  CC:Khachatryan Roman , Roman
> >> Khachatryan
> >> >> <
> >>  ro...@apache.org>
> >>  Subject:Re: [ANNOUNCE] Welcome Roman Khachatryan a new Apache Flink
> >>  Committer
> >>  Congratulations Roman :-)
> >> 
> >>  Cheers,
> >>  Till
> >> 
> >>  On Wed, Feb 10, 2021 at 1:01 PM Konstantin Knauf <
> kna...@apache.org>
> >>  wrote:
> >> 
> >> > Congratulations Roman!
> >> >
> >> > On Wed, Feb 10, 2021 at 11:29 AM Piotr Nowojski <
> >> >> pnowoj...@apache.org>
> >> > wrote:
> >> >
> >> >> Hi everyone,
> >> >>
> >> >> I'm very happy to announce that Roman Khachatryan has accepted
> the
> >> >> invitation to
> >> >> become a Flink committer.
> >> >>
> >> >> Roman has been recently active in the runtime parts of the Flink.
> >> >> He is
> >> > one
> >> >> of the main developers behind FLIP-76 Unaligned Checkpoints,
> >> >> FLIP-151
> >> >> Incremental Heap/FS State Backend [3] and providing a faster
> >> > checkpointing
> >> >> mechanism in FLIP-158.
> >> >>
> >> >> Please join me in congratulating Roman for becoming a Flink
> >> >> committer!
> >> >> Best,
> >> >> Piotrek
> >> >>
> >> >> [1]
> >> >>
> >> >>
> >> >>
> >>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-76%3A+Unaligned+Checkpoints
> >> >> [2]
> >> >>
> >> >>
> >> >>
> >>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-151%3A+Incremental+snapshots+for+heap-based+state+backend
> >> >> [3]
> >> >>
> >> >>
> >> >>
> >>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-158%3A+Generalized+incremental+checkpoints
> >> >
> >> > --
> >> >
> >> > Konstantin Knauf
> >> >
> >> > https://twitter.com/snntrable
> >> >
> >> > https://github.com/knaufk
> >> >
> >
> >
>


Re: Re: [ANNOUNCE] New Apache Flink Committers - Wei Zhong and Xingbo Huang

2021-02-23 Thread Guowei Ma
Congratulations Wei and Xingbo!

Best,
Guowei


On Wed, Feb 24, 2021 at 9:21 AM Zhu Zhu  wrote:

> Congratulations Wei and Xingbo!
>
> Thanks,
> Zhu
>
> Zhijiang  于2021年2月23日周二 上午10:59写道:
>
> > Congratulations Wei and Xingbo!
> >
> >
> > Best,
> > Zhijiang
> >
> >
> > --
> > From:Yun Tang 
> > Send Time: 2021-02-23 (Tuesday) 10:58
> > To:Roman Khachatryan ; dev 
> > Subject:Re: Re: [ANNOUNCE] New Apache Flink Committers - Wei Zhong and
> > Xingbo Huang
> >
> > Congratulation!
> >
> > Best
> > Yun Tang
> > 
> > From: Yun Gao 
> > Sent: Tuesday, February 23, 2021 10:56
> > To: Roman Khachatryan ; dev 
> > Subject: Re: Re: [ANNOUNCE] New Apache Flink Committers - Wei Zhong and
> > Xingbo Huang
> >
> > Congratulations Wei and Xingbo!
> >
> > Best,
> > Yun
> >
> >
> >  --Original Mail --
> > Sender:Roman Khachatryan 
> > Send Date:Tue Feb 23 00:59:22 2021
> > Recipients:dev 
> > Subject:Re: [ANNOUNCE] New Apache Flink Committers - Wei Zhong and Xingbo
> > Huang
> > Congratulations!
> >
> > Regards,
> > Roman
> >
> >
> > On Mon, Feb 22, 2021 at 12:22 PM Yangze Guo  wrote:
> >
> > > Congrats,  Well deserved!
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Mon, Feb 22, 2021 at 6:47 PM Yang Wang 
> wrote:
> > > >
> > > > Congratulations Wei & Xingbo!
> > > >
> > > > Best,
> > > > Yang
> > > >
> > > > Rui Li wrote on Mon, Feb 22, 2021 at 6:23 PM:
> > > >
> > > > > Congrats Wei & Xingbo!
> > > > >
> > > > > On Mon, Feb 22, 2021 at 4:24 PM Yuan Mei 
> > > wrote:
> > > > >
> > > > > > Congratulations Wei & Xingbo!
> > > > > >
> > > > > > Best,
> > > > > > Yuan
> > > > > >
> > > > > > On Mon, Feb 22, 2021 at 4:04 PM Yu Li  wrote:
> > > > > >
> > > > > > > Congratulations Wei and Xingbo!
> > > > > > >
> > > > > > > Best Regards,
> > > > > > > Yu
> > > > > > >
> > > > > > >
> > > > > > > On Mon, 22 Feb 2021 at 15:56, Till Rohrmann <
> > trohrm...@apache.org>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Congratulations Wei & Xingbo. Great to have you as committers
> > in
> > > the
> > > > > > > > community now.
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Till
> > > > > > > >
> > > > > > > > On Mon, Feb 22, 2021 at 5:08 AM Xintong Song <
> > > tonysong...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Congratulations, Wei & Xingbo~! Welcome aboard.
> > > > > > > > >
> > > > > > > > > Thank you~
> > > > > > > > >
> > > > > > > > > Xintong Song
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Feb 22, 2021 at 11:48 AM Dian Fu <
> dia...@apache.org>
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi all,
> > > > > > > > > >
> > > > > > > > > > On behalf of the PMC, I’m very happy to announce that Wei
> > > Zhong
> > > > > and
> > > > > > > > > Xingbo
> > > > > > > > > > Huang have accepted the invitation to become Flink
> > > committers.
> > > > > > > > > >
> > > > > > > > > > - Wei Zhong mainly works on PyFlink and has driven several
> > > > > > > > > > important features in PyFlink, e.g. Python UDF dependency
> > > > > > > > > > management (FLIP-78), Python UDF support in SQL (FLIP-106,
> > > > > > > > > > FLIP-114), Python UDAF support (FLIP-139), etc. He contributed
> > > > > > > > > > the first PR of PyFlink and has contributed 100+ commits since
> > > > > > > > > > then.
> > > > > > > > > >
> > > > > > > > > > - Xingbo Huang's contributions are also mainly in PyFlink; he
> > > > > > > > > > has driven several important features in PyFlink, e.g.
> > > > > > > > > > performance optimization for Python UDF and Python UDAF
> > > > > > > > > > (FLIP-121, FLINK-16747, FLINK-19236), Pandas UDAF support
> > > > > > > > > > (FLIP-137), Python UDTF support (FLINK-14500), row-based
> > > > > > > > > > operations support in the Python Table API (FLINK-20479), etc.
> > > > > > > > > > He is also actively helping to answer questions on the user
> > > > > > > > > > mailing list, helping with the release checks, and monitoring
> > > > > > > > > > the status of the Azure pipeline.
> > > > > > > > > >
> > > > > > > > > > Please join me in congratulating Wei Zhong and Xingbo
> Huang
> > > for
> > > > > > > > becoming
> > > > > > > > > > Flink committers!
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > > Dian
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best regards!
> > > > > Rui Li
> > > > >
> > >
> >
> >
>


Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-02-24 Thread Guowei Ma
Hi, Till
Thank you very much for your careful consideration

*1. Emit records in `NotifyCheckpointComplete`.*
Sorry for the misunderstanding caused by my wording. I just meant that the
current interface does not prevent users from doing it.
From the perspective of the new sink API, we might not need to depend on
emitting records in `NotifyCheckpointComplete`; we could use
`OperatorCoordinator` instead.


*2. What does FLIP-147 guarantee?*
I think initially this FLIP wanted to achieve two targets:
1. Tasks/Operators exit correctly (as you mentioned, following the lifecycle
of a Task/StreamTask/StreamOperator).
2. Continue to trigger checkpoints after some tasks have finished, for mixed
jobs.

I think the first point is related to the discussion in FLINK-21133. If I
understand correctly, in addition to supporting tasks/operators exiting
correctly, we now also want to unify the savepoint/finish process for tasks
and operators.
I think the second point is orthogonal to FLINK-21133, because there are
topologies that have both bounded and unbounded inputs.

*3. How to unify the operator exit process of FLIP-147 with
stop-with-savepoint?*
I am not yet sure how to do it. But if I understand the discussion in the
JIRA correctly, it needs to introduce some logic into
`CheckpointCoordinator`, which would be responsible for triggering “the
unified operator exit process”. Am I correct?
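
As context for the two-level commit pattern this thread keeps referring to,
here is a minimal sketch (all names are illustrative; this is not Flink's
actual sink API): transactions staged for a checkpoint are committed only
once that checkpoint completes.

```python
class TwoLevelCommitter:
    """Illustrative sketch, not Flink's actual API: transactions staged
    for a checkpoint are committed only when that checkpoint completes."""

    def __init__(self):
        self.pending = {}    # checkpoint_id -> staged transactions
        self.committed = []  # transactions made visible so far

    def snapshot_state(self, checkpoint_id, staged):
        # Called on checkpoint: remember what this checkpoint covers.
        self.pending[checkpoint_id] = list(staged)

    def notify_checkpoint_complete(self, checkpoint_id):
        # Commit everything covered by this or any earlier checkpoint.
        for cid in sorted(c for c in self.pending if c <= checkpoint_id):
            self.committed.extend(self.pending.pop(cid))


committer = TwoLevelCommitter()
committer.snapshot_state(1, ["txn-a", "txn-b"])
committer.notify_checkpoint_complete(1)
print(committer.committed)  # ['txn-a', 'txn-b']
```

Note that nothing in such an interface by itself stops
`notify_checkpoint_complete` from emitting new records downstream, which is
exactly the behavior under discussion.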

Best,
Guowei


On Tue, Feb 23, 2021 at 5:10 PM Till Rohrmann  wrote:

> Thanks for the explanation Yun and Guowei. I have to admit that I do not
> fully understand why this is strictly required but I think that we are
> touching two very important aspects which might have far-reaching
> consequences for how Flink works:
>
> 1) Do we want to allow that multiple checkpoints are required to
> materialize results?
> 2) Do we want to allow to emit records in notifyCheckpointComplete?
>
> For 1) I am not sure whether this has been discussed within the community
> sufficiently. Requiring multiple checkpoints to materialize a result
> because of multi level committers has the consequence that we increase the
> latency from checkpoint interval to #levels * checkpoint interval.
> Moreover, having to drain the pipeline in multiple steps, would break the
> stop-with-savepoint --drain because which savepoint do you report to the
> user?
>
> For 2) allowing to send records after the final notifyCheckpointComplete
> will effectively mean that we need to shut down a topology in multiple
> steps (in the worst case one operator per checkpoint). This would be a
> strong argument for not allowing this to me. The fact that users can send
> records after the notifyCheckpointComplete is more by accident than by
> design. I think we should make this a very deliberate decision and in doubt
> I would be in favour of a more restrictive model unless there is a very
> good reason why this should be supported.
>
> Taking also the discussion in FLINK-21133 [1] into account, it seems to me
> that we haven't really understood what kind of guarantees we want to give
> to our users and how the final checkpoint should exactly work. I understand
> that this is not included in the first scope of FLIP-147 but I think this
> is so important that we should figure this out asap. Also because the exact
> shut down behaviour will have to be aligned with the lifecycle of a
> Task/StreamTask/StreamOperator. And last but not least because other
> features such as the new sink API start building upon a shut down model
> which has not been fully understood/agreed upon.
>
> [1] https://issues.apache.org/jira/browse/FLINK-21133
>
> Cheers,
> Till
>
> On Tue, Feb 16, 2021 at 9:45 AM Guowei Ma  wrote:
>
> > Thanks Yun for the detailed explanation.
> > A simple supplementary explanation about the sink case: Maybe we could
> use
> > `OperatorCoordinator` to avoid sending the element to the downstream
> > operator.
> > But I agree we could not limit the users not to emit records in the
> > `notiyCheckpointComplete`.
> >
> > Best,
> > Guowei
> >
> >
> > On Tue, Feb 16, 2021 at 2:06 PM Yun Gao 
> > wrote:
> >
> > > Hi all,
> > >
> > > I'd like to first detail the issue with emitting records in
> > > notifyCheckpointComplete for context. For specific usage,
> > > an example would be a sink: it might want to write some metadata after
> > > all the transactions are committed
> > > (like writing a marker file _SUCCESS to the output directory). This case
> > > is currently supported via the two-level
> > > committers of the new sink API: when endOfInput() is received, the
> > > Committer waits for another checkpoint to
> > > commit all the pending transactio

[DISCUSSION] Introduce a separated memory pool for the TM merge shuffle

2021-03-04 Thread Guowei Ma
Hi, all


In Flink 1.12 we introduced the TM merge shuffle, but its out-of-the-box
experience is not very good. The main reason is that the default
configuration frequently makes users run into OOM errors [1]. So we propose
to introduce a managed memory pool for the TM merge shuffle to avoid this
problem.
Goals

   1. Don't affect the streaming and pipelined-shuffle-only batch setups.
   2. Don't mix memory with different life cycles in the same pool. E.g.,
   write buffers are needed by running tasks while read buffers are needed
   even after tasks have finished.
   3. Users can use the TM merge shuffle with the default memory
   configurations. (Further tuning may be needed for performance
   optimization, but it should not fail with the default configurations.)

Proposal

   1. Introduce a configuration `taskmanager.memory.network.batch-read` to
   specify the size of this memory pool. The default value is 16m.
   2. Allocate the pool lazily, i.e., the memory pool would only be
   allocated when the TM merge shuffle is used for the first time.
   3. This pool size will not be added to the TM's total memory size;
   instead, it will be considered part of
   `taskmanager.memory.framework.off-heap.size`. If the TM merge shuffle is
   enabled, we need to check that the pool size is not larger than the
   framework off-heap size.


With this default configuration, allocation of the memory pool is almost
guaranteed to succeed. Currently the default framework off-heap memory is
128m, which is mainly used by Netty. But since we introduced zero copy, its
usage has been reduced; you can refer to the detailed data in [2].
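
The check described in point 3 of the proposal can be sketched as follows.
This is only an illustration under the assumptions stated above (a 16m
default pool carved out of a 128m default framework off-heap budget); the
function name is hypothetical, not Flink's internal API.

```python
MB = 1024 * 1024
# Proposed default for taskmanager.memory.network.batch-read.
DEFAULT_BATCH_READ = 16 * MB
# Current default for taskmanager.memory.framework.off-heap.size.
DEFAULT_FRAMEWORK_OFF_HEAP = 128 * MB

def check_batch_read_pool(batch_read=DEFAULT_BATCH_READ,
                          framework_off_heap=DEFAULT_FRAMEWORK_OFF_HEAP):
    """The read pool is considered part of framework off-heap memory
    rather than added to the TM total, so it must fit inside that budget."""
    if batch_read > framework_off_heap:
        raise ValueError(
            "taskmanager.memory.network.batch-read (%d bytes) exceeds "
            "taskmanager.memory.framework.off-heap.size (%d bytes); "
            "please increase the framework off-heap size"
            % (batch_read, framework_off_heap))

check_batch_read_pool()  # the defaults pass: 16m fits into 128m
```

With the defaults the check cannot fail; it only triggers when a user
raises the batch-read size without also raising the off-heap size, which is
the usability limitation discussed below.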
Known Limitation
Usability for increasing the memory pool size

In addition to increasing `taskmanager.memory.network.batch-read`, the user
may also need to adjust `taskmanager.memory.framework.off-heap.size` at the
same time. This means that if the user forgets to do so, the check performed
when allocating the memory pool is likely to fail.


So in the following two situations, we will still prompt the user to
increase the size of `framework.off-heap.size`.

   1. `taskmanager.memory.network.batch-read` is bigger than
   `taskmanager.memory.framework.off-heap.size`
   2. Allocating the pool encounters an OOM error.


An alternative is that when the user adjusts the size of the memory pool,
the system automatically adjusts `framework.off-heap.size` accordingly. But
we are not entirely sure about this, given its implicitness and the risk of
complicating the memory configuration.
Potential memory waste

In the first step, the memory pool will not be released once allocated. This
means that even if there is no subsequent batch job, the pooled memory
cannot be used by other consumers.


We are not releasing the pool in the first step due to the concern that
frequently allocating/deallocating the entire pool may increase the GC
pressure. Investigating how to dynamically release the pool when it is no
longer needed is considered a future follow-up.


Looking forward to your feedback.



[1] https://issues.apache.org/jira/browse/FLINK-20740

[2] https://github.com/apache/flink/pull/7368
Best,
Guowei


Re: Re: [DISCUSSION] Introduce a separated memory pool for the TM merge shuffle

2021-03-09 Thread Guowei Ma
Hi, all

Thanks all for your suggestions and feedback.
I think it is a good idea to determine the default size of the separate
pool through testing. I am fine with adding the suffix (".size") to the
config name, which makes it clearer to the user.
But I am a little worried about adding the prefix ("framework"), because
currently the TM shuffle service is only a shuffle plugin, which is not
part of the framework. So maybe we could add a clear explanation in the
documentation?

Best,
Guowei


On Tue, Mar 9, 2021 at 3:58 PM 曹英杰(北牧)  wrote:

> Thanks for the suggestions. I will do some tests and share the results
> after the implementation is ready. Then we can give a proper default value.
>
> Best,
> Yingjie
>
> --
> > From: Till Rohrmann
> > Date: 2021-03-05 23:03:10
> > To: Stephan Ewen
> > Cc: dev; user; Xintong Song<
> > tonysong...@gmail.com>; 曹英杰(北牧); Guowei Ma<
> > guowei@gmail.com>
> > Subject: Re: [DISCUSSION] Introduce a separated memory pool for the TM merge
> > shuffle
>
> Thanks for this proposal Guowei. +1 for it.
>
> Concerning the default size, maybe we can run some experiments and see how
> the system behaves with different pool sizes.
>
> Cheers,
> Till
>
> On Fri, Mar 5, 2021 at 2:45 PM Stephan Ewen  wrote:
>
>> Thanks Guowei, for the proposal.
>>
>> As discussed offline already, I think this sounds good.
>>
>> One thought is that 16m sounds very small for a default read buffer pool.
>> How risky do you think it is to increase this to 32m or 64m?
>>
>> Best,
>> Stephan
>>
>> On Fri, Mar 5, 2021 at 4:33 AM Guowei Ma  wrote:
>>
>>> Hi, all
>>>
>>>
>>> In the Flink 1.12 we introduce the TM merge shuffle. But the
>>> out-of-the-box experience of using TM merge shuffle is not very good. The
>>> main reason is that the default configuration always makes users encounter
>>> OOM [1]. So we hope to introduce a managed memory pool for TM merge shuffle
>>> to avoid the problem.
>>> Goals
>>>
>>>1. Don't affect the streaming and pipelined-shuffle-only batch
>>>setups.
>>>2. Don't mix memory with different life cycle in the same pool.
>>>E.g., write buffers needed by running tasks and read buffer needed even
>>>after tasks being finished.
>>>3. User can use the TM merge shuffle with default memory
>>>configurations. (May need further tunings for performance optimization, 
>>> but
>>>should not fail with the default configurations.)
>>>
>>> Proposal
>>>
>>>1. Introduce a configuration `taskmanager.memory.network.batch-read`
>>>to specify the size of this memory pool. The default value is 16m.
>>>2. Allocate the pool lazily. It means that the memory pool would be
>>>allocated when the TM merge shuffle is used at the first time.
>>>3. This pool size will not be add up to the TM's total memory size,
>>>but will be considered part of
>>>`taskmanager.memory.framework.off-heap.size`. We need to check that the
>>>pool size is not larger than the framework off-heap size, if TM merge
>>>shuffle is enabled.
>>>
>>>
>>> In this default configuration, the allocation of the memory pool is
>>> almost impossible to fail. Currently the default framework’s off-heap
>>> memory is 128m, which is mainly used by Netty. But after we introduced zero
>>> copy, the usage of it has been reduced, and you can refer to the detailed
>>> data [2].
>>> Known Limitation
>>> Usability for increasing the memory pool size
>>>
>>> In addition to increasing `taskmanager.memory.network.batch-read`, the
>>> user may also need to adjust `taskmanager.memory.framework.off-heap.size`
>>> at the same time. It also means that once the user forgets this, it is
>>> likely to fail the check when allocating the memory pool.
>>>
>>>
>>> So in the following two situations, we will still prompt the user to
>>> increase the size of `framework.off-heap.size`.
>>>
>>>1. `taskmanager.memory.network.batch-read` is bigger than
>>>`taskmanager.memory.framework.off-heap.size`
>>>2. Allocating the pool encounters the OOM.
>>>
>>>
>>> An alternative is that when the user adjusts the size of the memory
>>> pool, the system automatically adjusts it. But we are not entierly sure
>>> about this, given its implicity and complicating the memo

Re: Re: Re: [DISCUSSION] Introduce a separated memory pool for the TM merge shuffle

2021-03-23 Thread Guowei Ma
Hi,
I discussed with Xintong and Yingjie offline, and we agreed that the name
`taskmanager.memory.framework.off-heap.batch-shuffle.size` better reflects
the current memory usage. So we decided to use the name Till suggested.
Thank you all for your valuable feedback.
Best,
Guowei


On Mon, Mar 22, 2021 at 5:21 PM Stephan Ewen  wrote:

> Hi Yingjie!
>
> Thanks for doing those experiments, the results look good. Let's go ahead
> with 32M then.
>
> Regarding the key, I am not strongly opinionated there. There are
> arguments for both keys, (1) making the key part of the network pool config
> as you did here or (2) making it part of the TM config (relative to
> framework off-heap memory). I find (1) quite understandable, but it is
> personal taste, so I can go with either option.
>
> Best,
> Stephan
>
>
> On Mon, Mar 22, 2021 at 9:15 AM 曹英杰(北牧)  wrote:
>
>> Hi all,
>>
>> I have tested the default memory size with both batch (tpcds) and
>> streaming jobs running in one session cluster (more than 100 queries). The
>> result is:
>> 1. All settings (16M, 32M and 64M) can work well without any OOM.
>> 2. For streaming jobs running after batch jobs, there is no performance
>> or stability regression.
>> 3. 32M and 64M are better (over 10%) in terms of performance for the test
>> batch job on HDD.
>>
>> Based on the above results, I think 32M is a good default choice, because
>> the performance is good enough for the test job and compared to 64M, more
>> direct memory can be used by netty and other components. What do you think?
>>
>> BTW, about the configuration key, do we reach a consensus? I am
>> temporarily using taskmanager.memory.network.batch-shuffle-read.size in
>> my PR now. Any suggestions about that?
>>
>> Best,
>> Yingjie (Kevin)
>>
>> --
>> From: Guowei Ma
>> Date: 2021-03-09 17:28:35
>> To: 曹英杰(北牧)
>> Cc: Till Rohrmann; Stephan Ewen;
>> dev; user; Xintong Song<
>> tonysong...@gmail.com>
>> Subject: Re: Re: [DISCUSSION] Introduce a separated memory pool for the TM
>> merge shuffle
>>
>> Hi, all
>>
>> Thanks all for your suggestions and feedback.
>> I think it is a good idea that we increase the default size of the
>> separated pool by testing. I am fine with adding the suffix(".size") to the
>> config name, which makes it more clear to the user.
>> But I am a little worried about adding a prefix("framework") because
>> currently the tm shuffle service is only a shuffle-plugin, which is not a
>> part of the framework. So maybe we could add a clear explanation in the
>> document?
>>
>> Best,
>> Guowei
>>
>>
>> On Tue, Mar 9, 2021 at 3:58 PM 曹英杰(北牧)  wrote:
>>
>>> Thanks for the suggestions. I will do some tests and share the results
>>> after the implementation is ready. Then we can give a proper default value.
>>>
>>> Best,
>>> Yingjie
>>>
>>> --
>>> From: Till Rohrmann
>>> Date: 2021-03-05 23:03:10
>>> To: Stephan Ewen
>>> Cc: dev; user; Xintong
>>> Song; 曹英杰(北牧); Guowei
>>> Ma
>>> Subject: Re: [DISCUSSION] Introduce a separated memory pool for the TM merge
>>> shuffle
>>>
>>> Thanks for this proposal Guowei. +1 for it.
>>>
>>> Concerning the default size, maybe we can run some experiments and see
>>> how the system behaves with different pool sizes.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Fri, Mar 5, 2021 at 2:45 PM Stephan Ewen  wrote:
>>>
>>>> Thanks Guowei, for the proposal.
>>>>
>>>> As discussed offline already, I think this sounds good.
>>>>
>>>> One thought is that 16m sounds very small for a default read buffer
>>>> pool. How risky do you think it is to increase this to 32m or 64m?
>>>>
>>>> Best,
>>>> Stephan
>>>>
>>>> On Fri, Mar 5, 2021 at 4:33 AM Guowei Ma  wrote:
>>>>
>>>>> Hi, all
>>>>>
>>>>>
>>>>> In the Flink 1.12 we introduce the TM merge shuffle. But the
>>>>> out-of-the-box experience of using TM merge shuffle is not very good. The
>>>>> main reason is that the default configuration always makes users encounter
>>>>> OOM [1]. So we hope to introduce a managed memory pool for TM merge 
>>>>> shuffle
>>>&

Re: [DISCUSS] Introducing Backpressure for connected streams

2021-03-24 Thread Guowei Ma
Hi Robin

Thank you for bringing up this discussion. AFAIK there are many similar
requirements. But it might lead to a deadlock if we depend on pausing one
of the two inputs to align the watermarks.
After FLIP-27, Flink will introduce some new mechanisms for aligning the
watermarks of different sources. Maybe @Becket could give some input or
share some plans for this.
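
As a rough illustration only (not Flink code, and setting aside the
deadlock concern for a moment), the best-effort selection rule described in
the quoted proposal, preferring the input whose watermark lags once the
skew exceeds a threshold, could look like this:

```python
def select_input(wm1, wm2, threshold, buffered1, buffered2):
    """Return which input (1 or 2) to read next, or None if neither has
    buffered data. Best-effort alignment: once the watermark skew exceeds
    the threshold, the lagging input is preferred, if it has data."""
    if buffered1 and buffered2:
        if wm1 - wm2 > threshold:
            return 2  # input 1 is too far ahead; drain input 2 first
        if wm2 - wm1 > threshold:
            return 1
        return 1 if wm1 <= wm2 else 2  # otherwise prefer the lower watermark
    if buffered1:
        return 1
    if buffered2:
        return 2
    return None

print(select_input(1000, 100, 500, True, True))  # 2: input 1 is ahead
```

Note that when only the faster input has buffered data, this sketch still
reads from it rather than blocking, which is what keeps the best-effort
variant deadlock-free, at the cost of not guaranteeing alignment.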

Best,
Guowei


On Wed, Mar 24, 2021 at 1:46 PM Robin KC  wrote:

> Hi all,
>
> The issue has been discussed before here -
>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Sharing-state-between-subtasks-td24489.html
>
> Our use case requires event time join of two streams and we use
> ConnectedStreams for the same. Within the CoProcessFunction, we buffer
> records until watermark and perform the join and business logic based on
> watermark. The issue is if one stream is slower than the other, the buffer
> (a rocksdb state) is unnecessarily filled by continuously reading from the
> fast stream.
>
> I took an inspiration from a response on the same thread
> <
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Sharing-state-between-subtasks-tp24489p24564.html
> >
> by Elias Levy -
>
> The idea I was suggesting is not for operators to block an input.  Rather,
> it is that they selectively choose from which input to process the next
> message from based on their timestamp, so long as there are buffered
> messages waiting to be processed.  That is a best-effort alignment
> strategy.  Seems to work relatively well in practice, at least within Kafka
> Streams.
>
> E.g. at the moment StreamTwoInputProcessor uses a UnionInputGate for both
> its inputs.  Instead, it could keep them separate and selectively consume
> from the one that had a buffer available, and if both have buffers
> available, from the buffer with the messages with a lower timestamp.
>
> And attempted a POC implementation of CoBackpressure whenever 2 streams are
> connected. This is committed in a branch in my own fork -
>
> https://github.com/apache/flink/commit/65eef7a5baeec9db588a6ba1f8fe339b3b436fb1
>
> The approach is
>
>1. Provide a new method setCoBackpressureThreshold on ConnectedStream
>2. Pass the user-provided CoBackpressureThreshold through various
>classes till StreamTwoInputProcessorFactory.
>3. Implement a WatermarkHandler in StreamTwoInputProcessorFactory that
>pauses input1 or input2 if the diff between watermarks is greater than
> the
>threshold. In other words, it selectively chooses the next input which
> is
>lagging behind.
>
> Some key points
>
>1. One benefit of this approach is that a user can configure
>CoBackpressureThreshold at any join and it is not a global config.
>2. IntervalJoins internally use ConnectedStreams and therefore this will
>work for intervalJoins as well.
>3. Other window joins do not use ConnectedStreams but use
>UnionedStreams. Will have to find a solution for that.
>    4. I believe MultipleInputStreams will also need similar functionality.
>5. IMP: This approach does not solve the problem of having event-time
>skew within different partitions/shards of the same input source. It
> only
>solves for event time alignment of different sources.
>
> Looking forward to inputs on the same. If this seems like a feasible
> approach, I can take it forward and implement code with fixes for
> identified gaps and appropriate test cases.
>
> Thanks,
> Robin
>


Re: [DISCUSS] Feature freeze date for 1.13

2021-03-31 Thread Guowei Ma
Hi, community:

Friendly reminder that today (3.31) is the last day of feature development.
Under normal circumstances, new features can no longer be submitted from
tomorrow (4.1) onward. Tomorrow we will create 1.13.0-rc0 for testing;
everyone is welcome to help test.
Once testing shows it is relatively stable, we will cut the release-1.13
branch.

Best,
Dawid & Guowei


On Mon, Mar 29, 2021 at 5:17 PM Till Rohrmann  wrote:

> +1 for the 31st of March for the feature freeze.
>
> Cheers,
> Till
>
> On Mon, Mar 29, 2021 at 10:12 AM Robert Metzger 
> wrote:
>
> > +1 for March 31st for the feature freeze.
> >
> >
> >
> > On Fri, Mar 26, 2021 at 3:39 PM Dawid Wysakowicz  >
> > wrote:
> >
> > > Thank you Thomas! I'll definitely check the issue you linked.
> > >
> > > Best,
> > >
> > > Dawid
> > >
> > > On 23/03/2021 20:35, Thomas Weise wrote:
> > > > Hi Dawid,
> > > >
> > > > Thanks for the heads up.
> > > >
> > > > Regarding the "Rebase and merge" button. I find that merge option
> > useful,
> > > > especially for small simple changes and for backports. The following
> > > should
> > > > help to safeguard from the issue encountered previously:
> > > > https://github.com/jazzband/pip-tools/issues/1085
> > > >
> > > > Thanks,
> > > > Thomas
> > > >
> > > >
> > > > On Tue, Mar 23, 2021 at 4:58 AM Dawid Wysakowicz <
> > dwysakow...@apache.org
> > > >
> > > > wrote:
> > > >
> > > >> Hi devs, users!
> > > >>
> > > >> 1. *Feature freeze date*
> > > >>
> > > >> We are approaching the end of March which we agreed would be the
> time
> > > for
> > > >> a Feature Freeze. From the knowledge I've gather so far it still
> seems
> > > to
> > > >> be a viable plan. I think it is a good time to agree on a particular
> > > date,
> > > >> when it should happen. We suggest *(end of day CEST) March 31st*
> > > >> (Wednesday next week) as the feature freeze time.
> > > >>
> > > >> Similarly as last time, we want to create RC0 on the day after the
> > > feature
> > > >> freeze, to make sure the RC creation process is running smoothly,
> and
> > to
> > > >> have a common testing reference point.
> > > >>
> > > >> Having said that let us remind after Robert & Dian from the previous
> > > >> release what it a Feature Freeze means:
> > > >>
> > > >> *B) What does feature freeze mean?*After the feature freeze, no new
> > > >> features are allowed to be merged to master. Only bug fixes and
> > > >> documentation improvements.
> > > >> The release managers will revert new feature commits after the
> feature
> > > >> freeze.
> > > >> Rationale: The goal of the feature freeze phase is to improve the
> > system
> > > >> stability by addressing known bugs. New features tend to introduce
> new
> > > >> instabilities, which would prolong the release process.
> > > >> If you need to merge a new feature after the freeze, please open a
> > > >> discussion on the dev@ list. If there are no objections by a PMC
> > member
> > > >> within 48 (workday) hours, the feature can be merged.
> > > >>
> > > >> 2. *Merge PRs from the command line*
> > > >>
> > > >> In the past releases it was quite frequent around the Feature Freeze
> > > date
> > > >> that we ended up with a broken main branch that either did not
> compile
> > > or
> > > >> there were failing tests. It was often due to concurrent merges to
> the
> > > main
> > > >> branch via the "Rebase and merge" button. To overcome the problem we
> > > would
> > > >> like to suggest only ever merging PRs from a command line. Thank you
> > > >> Stephan for the idea! The suggested workflow would look as follows:
> > > >>
> > > >>1. Pull the change and rebase on the current main branch
> > > >>2. Build the project (e.g. from IDE, which should be faster than
> > > >>building entire project from cmd) -> this should ensure the
> project
> > > compiles
> > > >>3. Run the tests in the module that the change affects -> this
> > should
> > > >>greatly minimize the chances of failing tests
> > > >>4. Push the change to the main branch
> > > >>
> > > >> Let us know what you think!
> > > >>
> > > >> Best,
> > > >>
> > > >> Guowei & Dawid
> > > >>
> > > >>
> > > >>
> > >
> > >
> >
>


Re: [DISCUSS] Feature freeze date for 1.13

2021-04-01 Thread Guowei Ma
Hi, Yuval

Thanks for your contribution. I am not a SQL expert, but the change seems
beneficial to users, the amount of code is small, and only the tests are
left. Therefore, I am open to including this in RC1.
But according to the rules, you still need to wait 48 hours to see whether
any PMC member objects.

Best,
Guowei


On Thu, Apr 1, 2021 at 10:33 PM Yuval Itzchakov  wrote:

> Hi All,
>
> I would really love to merge https://github.com/apache/flink/pull/15307
> prior to 1.13 release cutoff, it just needs some more tests which I can
> hopefully get to today / tomorrow morning.
>
> This is a critical fix as now predicate pushdown won't work for any stream
> which generates a watermark and wants to push down predicates.
>
> On Thu, Apr 1, 2021, 10:56 Kurt Young  wrote:
>
>> Thanks Dawid, I have merged FLINK-20320.
>>
>> Best,
>> Kurt
>>
>>
>> On Thu, Apr 1, 2021 at 2:49 PM Dawid Wysakowicz 
>> wrote:
>>
>>> Hi all,
>>>
>>> @Kurt @Arvid I think it's fine to merge those two, as they are pretty
>>> much finished. We can wait for those two before creating the RC0.
>>>
>>> @Leonard Personally I'd be ok with 3 more days for that single PR. I
>>> find the request reasonable and I second that it's better to have a proper
>>> review rather than rush unfinished feature and try to fix it later.
>>> Moreover it got broader support. Unless somebody else objects, I think we
>>> can merge this PR later and include it in RC1.
>>>
>>> Best,
>>>
>>> Dawid
>>> On 01/04/2021 08:39, Arvid Heise wrote:
>>>
>>> Hi Dawid and Guowei,
>>>
>>> I'd like to merge [FLINK-13550][rest][ui] Vertex Flame Graph [1]. We are
>>> pretty much just waiting for AZP to turn green, it's separate from other
>>> components, and it's a super useful feature for Flink users.
>>>
>>> Best,
>>>
>>> Arvid
>>>
>>> [1] https://github.com/apache/flink/pull/15054
>>>
>>> On Thu, Apr 1, 2021 at 6:21 AM Kurt Young  wrote:
>>>
>>>> Hi Guowei and Dawid,
>>>>
>>>> I want to request the permission to merge this feature [1], it's a
>>>> useful improvement to sql client and won't affect
>>>> other components too much. We had planned to merge it yesterday but hit
>>>> a tricky multi-process issue which
>>>> had a very high probability of hanging the tests. It took us a while to
>>>> find the root cause and fix it.
>>>>
>>>> Since it's not too far away from feature freeze and RC0 also not
>>>> created yet, thus I would like to include this
>>>> in 1.13.
>>>>
>>>> [1] https://issues.apache.org/jira/browse/FLINK-20320
>>>>
>>>> Best,
>>>> Kurt
>>>>
>>>>
>>>> On Wed, Mar 31, 2021 at 5:55 PM Guowei Ma  wrote:
>>>>
>>>>> Hi, community:
>>>>>
>>>>> Friendly reminder that today (3.31) is the last day of feature
>>>>> development. Under normal circumstances, you will not be able to submit 
>>>>> new
>>>>> features from tomorrow (4.1). Tomorrow we will create 1.13.0-rc0 for
>>>>> testing, welcome to help test together.
>>>>> After the test is relatively stable, we will cut the release-1.13
>>>>> branch.
>>>>>
>>>>> Best,
>>>>> Dawid & Guowei
>>>>>
>>>>>
>>>>> On Mon, Mar 29, 2021 at 5:17 PM Till Rohrmann 
>>>>> wrote:
>>>>>
>>>>>> +1 for the 31st of March for the feature freeze.
>>>>>>
>>>>>> Cheers,
>>>>>> Till
>>>>>>
>>>>>> On Mon, Mar 29, 2021 at 10:12 AM Robert Metzger 
>>>>>> wrote:
>>>>>>
>>>>>> > +1 for March 31st for the feature freeze.
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > On Fri, Mar 26, 2021 at 3:39 PM Dawid Wysakowicz <
>>>>>> dwysakow...@apache.org>
>>>>>> > wrote:
>>>>>> >
>>>>>> > > Thank you Thomas! I'll definitely check the issue you linked.
>>>>>> > >
>>>>>> > > Best,
>>>>>> > >
>>>>>> > > Dawid
>>>>>

Re: [DISCUSS] Releasing Flink 1.12.3

2021-04-16 Thread Guowei Ma
+1
Thanks for driving this, Arvid.

Best,
Guowei


On Fri, Apr 16, 2021 at 5:58 PM Till Rohrmann  wrote:

> +1.
>
> Thanks for volunteering Arvid.
>
> Cheers,
> Till
>
> On Fri, Apr 16, 2021 at 9:50 AM Stephan Ewen  wrote:
>
> > +1
> >
> > Thanks for pushing this, Arvid, let's get this fix out asap.
> >
> >
> >
> > On Fri, Apr 16, 2021 at 9:46 AM Arvid Heise  wrote:
> >
> > > Dear devs,
> > >
> > > Since we just fixed a severe bug that causes the dataflow to halt under
> > > specific circumstances [1], we would like to release a bugfix asap.
> > >
> > > I would volunteer as the release manager and kick off the release
> process
> > > on next Monday (April 19th).
> > > What do you think?
> > >
> > > Note that this time around, I would not wait for any specific
> > > fixes/backports. However, you can still merge all fixes that you'd like
> > to
> > > see in 1.12.3 until Monday.
> > >
> > > Btw the fix is already in master and will be directly applied to the
> next
> > > RC of 1.13.0. Flink version 1.11.x and older are not affected.
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-21992
> > >
> >
>


[VOTE] Release 1.13.0, release candidate #1

2021-04-18 Thread Guowei Ma
Hi everyone,

Currently there are still some ongoing efforts for 1.13.0. However, we
think the current state is stable enough for the community to test. The
earlier we test, the more time we have to fix unexpected issues if there
are any.

We have also cut the release-1.13 branch, so any fix needs to be submitted
to both master and release-1.13.

Please review and vote on the release candidate #1 for the version 1.13.0,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:
* the official Apache source release and binary convenience releases to be
deployed to dist.apache.org [1], which are signed with the key with
fingerprint D7C86B9C[2],
* all artifacts to be deployed to the Maven Central Repository [3],
* source code tag "release-1.13.0-rc1" [4],

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Your help testing the release will be greatly appreciated!


[1] https://dist.apache.org/repos/dist/dev/flink/flink-1.13.0-rc1/
[2] https://dist.apache.org/repos/dist/release/flink/KEYS
[3] https://repository.apache.org/content/repositories/orgapacheflink-1418/
[4] https://github.com/apache/flink/tree/release-1.13.0-rc1

Best,
Guowei


Re: Re: Re: [ANNOUNCE] New Apache Flink Committer - Rui Li

2021-04-22 Thread Guowei Ma
Congratulations, Rui!
Best,
Guowei


On Fri, Apr 23, 2021 at 10:38 AM Yun Tang  wrote:

> Congratulations, Rui!
>
> Best,
> Yun Tang
> 
> From: Xuannan Su 
> Sent: Friday, April 23, 2021 10:01
> To: dev@flink.apache.org ; matth...@ververica.com <
> matth...@ververica.com>
> Subject: Re: Re: Re: [ANNOUNCE] New Apache Flink Committer - Rui Li
>
> Congratulations Rui!
>
> Best,
> Xuannan
> On Apr 22, 2021, 6:23 PM +0800, M Shengkai Fang ,
> wrote:
> >
> > Congratulations Rui!
>


Re: [VOTE] Release 1.13.0, release candidate #2

2021-04-28 Thread Guowei Ma
Hi, Matthias

Thank you very much for your careful inspection.
I checked flink-python_2.11-1.13.0.jar and we do not bundle
org.conscrypt:conscrypt-openjdk-uber:2.5.1 into it.
So I think we do not need to add it to the NOTICE file. (BTW, the
dependency's scope is runtime.)
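
As an aside, bundling questions like this can be checked mechanically. Here
is a small hedged sketch (plain JDK zip handling, nothing Flink-specific;
the class and method names are made up for illustration):

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Checks whether a jar contains any entry under a given package prefix,
 *  e.g. "org/conscrypt/" for org.conscrypt classes. Illustrative only. */
public class JarBundleCheck {

    public static boolean containsPrefix(Path jar, String prefix) throws IOException {
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                // A bundled dependency shows up as class entries under its package path.
                if (entries.nextElement().getName().startsWith(prefix)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

If this returns false for "org/conscrypt/", the dependency is not shaded
into the jar and would not need a NOTICE entry for bundling.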

Best,
Guowei


On Thu, Apr 29, 2021 at 2:33 AM Matthias Pohl 
wrote:

> Thanks Dawid and Guowei for managing this release.
>
> - downloaded the sources and binaries and checked the checksums
> - built Flink from the downloaded sources
> - executed example jobs with standalone deployments - I didn't find
> anything suspicious in the logs
> - reviewed release announcement pull request
>
> - I did a pass over dependency updates: git diff release-1.12.2
> release-1.13.0-rc2 */*.xml
> There's one thing someone should double-check whether that's supposed to be
> like that: We added org.conscrypt:conscrypt-openjdk-uber:2.5.1 as a
> dependency but I don't see it being reflected in the NOTICE file of the
> flink-python module. Or is this automatically added later on?
>
> +1 (non-binding; please see remark on dependency above)
>
> Matthias
>
> On Wed, Apr 28, 2021 at 1:52 PM Stephan Ewen  wrote:
>
> > Glad to hear that outcome. And no worries about the false alarm.
> > Thank you for doing thorough testing, this is very helpful!
> >
> > On Wed, Apr 28, 2021 at 1:04 PM Caizhi Weng 
> wrote:
> >
> > > After the investigation we found that this issue is caused by the
> > > implementation of connector, not by the Flink framework.
> > >
> > > Sorry for the false alarm.
> > >
> > > Stephan Ewen  于2021年4月28日周三 下午3:23写道:
> > >
> > > > @Caizhi and @Becket - let me reach out to you to jointly debug this
> > > issue.
> > > >
> > > > I am wondering if there is some incorrect reporting of failed events?
> > > >
> > > > On Wed, Apr 28, 2021 at 8:53 AM Caizhi Weng 
> > > wrote:
> > > >
> > > > > -1
> > > > >
> > > > > We're testing this version on batch jobs with large (600~1000)
> > > > parallelisms
> > > > > and the following exception messages appear with high frequency:
> > > > >
> > > > > 2021-04-27 21:27:26
> > > > > org.apache.flink.util.FlinkException: An OperatorEvent from an
> > > > > OperatorCoordinator to a task was lost. Triggering task failover to
> > > > > ensure consistency. Event: '[NoMoreSplitEvent]', targetTask:  -
> > > > > execution #0
> > > > > at org.apache.flink.runtime.operators.coordination.SubtaskGatewayImpl.lambda$sendEvent$0(SubtaskGatewayImpl.java:81)
> > > > > at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
> > > > > at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
> > > > > at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
> > > > > at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440)
> > > > > at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208)
> > > > > at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77)
> > > > > at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158)
> > > > > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
> > > > > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
> > > > > at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
> > > > > at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
> > > > > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
> > > > > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> > > > > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> > > > > at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
> > > > > at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
> > > > > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
> > > > > at akka.actor.ActorCell.invoke(ActorCell.scala:561)
> > > > > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
> > > > > at akka.dispatch.Mailbox.run(Mailbox.scala:225)
> > > > > at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
> > > > > at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> > > > > at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> > > > > at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> > > > > at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> > > > >
> > > > > Becket Qin is investigating this 

Re: [DISCUSS] Limit size of already processed files in File Source SplitEnumerator

2021-06-08 Thread Guowei Ma
Things would be much simpler if the modification timestamps of newly scanned
files were monotonically increasing.

In that case we only need to record the list of files that share the largest
timestamp. For each scanned file, compare its modification timestamp with
that maximum:
 1. If it is smaller than the maximum timestamp, the file has already been
processed;
 2. If the timestamps are equal, check whether the file is in the recorded
list: if it is, it does not need to be processed; if it is not, it does;
 3. If it is larger than the maximum timestamp, the file has not been
processed yet.

If the maximum timestamp is rebuilt from the recorded file list on every
restart, the state compatibility issue can be ignored.
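
In plain Java, the three cases could be sketched like this (the class and
method names are illustrative only; this is not actual Flink code):

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch: decide whether a scanned file still needs processing,
 *  assuming modification timestamps only ever grow. */
public class ScanDeduper {

    private long maxTimestamp = Long.MIN_VALUE;
    private final Set<String> filesAtMaxTimestamp = new HashSet<>();

    /** Returns true if the file should be processed, and records it if so. */
    public boolean offer(String path, long modificationTime) {
        if (modificationTime < maxTimestamp) {
            return false; // case 1: older than the max -> already processed
        }
        if (modificationTime == maxTimestamp) {
            // case 2: same timestamp -> process only if not yet in the list
            return filesAtMaxTimestamp.add(path);
        }
        // case 3: newer than the max -> not processed yet; reset the list
        maxTimestamp = modificationTime;
        filesAtMaxTimestamp.clear();
        filesAtMaxTimestamp.add(path);
        return true;
    }
}
```

Note the only state that has to be checkpointed is the file list; the
maximum timestamp can be rebuilt from it on restore.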


BTW, I haven't run any tests, but I am a little curious: if a lot of files
have already been processed, isn't the directory scan itself already very
slow? I mean, the bottleneck might be the scan in the first place.



> 在 2021年6月8日,下午5:41,Till Rohrmann  写道:
> 
> Hi Tianxin,
> 
> thanks for starting this discussion. I am pulling in Arvid who works on
> Flink's connectors.
> 
> I think the problem you are describing can happen.
> 
> From what I understand you are proposing to keep track of the watermark of
> processed file input splits and then filter out splits based on their
> modification timestamps and the watermark. What is the benefit of keeping
> for every split the modification timestamp in the map? Could it also work
> if we sort the input splits according to their modification timestamps and
> then remember the last processed split? That way we only remember a single
> value and upon recovery, we only process those splits which have a newer
> modification timestamp.
> 
> Cheers,
> Till
> 
>> On Tue, Jun 8, 2021 at 12:11 AM Tianxin Zhao  wrote:
>> 
>> Hi!
>> 
>> Currently Flink File Source relies on a Set pathsAlreadyProcessed in
>> SplitEnumerator to decide which file has been processed and avoids
>> reprocessing files if a file is already in this set. However, this set can
>> grow without bound and ultimately exceed available memory if new files are
>> continuously added to the input path.
>> 
>> I submitted https://issues.apache.org/jira/browse/FLINK-22792 and would
>> like to be assigned to the ticket.
>> 
>> Current proposed changes are as below; I would like to get agreement on
>> the approach taken.
>>
>>   1. Maintain a fileWatermark in ContinuousFileSplitEnumerator, updated by
>>      the modification times of newly scanned files.
>>   2. Change the Set pathsAlreadyProcessed to a HashMap
>>      pathsAlreadyProcessed, where the key is, as before, the file path of
>>      an already processed file and the value is its modification time;
>>      expose a getModificationTime() method on FileSourceSplit.
>>   3. Add a user-configurable fileExpireTime config; any file older than
>>      fileWatermark - fileExpireTime is ignored.
>>   4. When snapshotting the split enumerator, remove files older than
>>      fileWatermark - fileExpireTime from the pathsAlreadyProcessed map.
>>   5. Add the alreadyProcessedPaths map and fileWatermark to
>>      PendingSplitsCheckpoint, and modify the current
>>      PendingSplitsCheckpointSerializer to add a version-2 serializer that
>>      serializes the alreadyProcessedPaths map including the file
>>      modification times.
>>   6. Subclasses of PendingSplitsCheckpoint such as
>>      ContinuousHivePendingSplitsCheckpoint would not be impacted: they can
>>      initialize an empty alreadyProcessedPaths map and 0 as the initial
>>      watermark.
>> 
>> Thanks!
>> 


Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-24 Thread Guowei Ma
Hi, Klou

Thanks for your proposal. It's a very good idea.
Just a small comment about "Batch vs Streaming Scheduling": in the AUTOMATIC
execution mode we might not be able to pick the BATCH execution mode even if
all sources are bounded. For example, some applications use the
`CheckpointListener`, which is not available in BATCH mode in the current
implementation.
So we may need additional checks in the AUTOMATIC execution mode.
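
To make the point concrete, a hypothetical selection rule with such an extra
check might look like this (plain Java, not the actual Flink scheduler
logic):

```java
/** Hypothetical sketch of runtime-mode selection; not actual Flink code. */
public class ModeSelector {

    public enum Mode { BATCH, STREAMING }

    /**
     * AUTOMATIC: pick BATCH only if every source is bounded AND the job does
     * not rely on features unavailable in BATCH mode, such as
     * CheckpointListener callbacks.
     */
    public static Mode select(boolean allSourcesBounded, boolean usesCheckpointListener) {
        if (allSourcesBounded && !usesCheckpointListener) {
            return Mode.BATCH;
        }
        return Mode.STREAMING;
    }
}
```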

Best,
Guowei


On Thu, Aug 20, 2020 at 10:27 PM Kostas Kloudas  wrote:

> Hi all,
>
> Thanks for the comments!
>
> @Dawid: "execution.mode" can be a nice alternative and from a quick
> look it is not used currently by any configuration option. I will
> update the FLIP accordingly.
>
> @David: Given that having the option to allow timers to fire at the
> end of the job is already in the FLIP, I will leave it as is and I
> will update the default policy to be "ignore processing time timers
> set by the user". This will allow existing dataStream programs to run
> on bounded inputs. This update will affect point 2 in the "Processing
> Time Support in Batch" section.
>
> If these changes cover your proposals, then I would like to start a
> voting thread tomorrow evening if this is ok with you.
>
> Please let me know until then.
>
> Kostas
>
> On Tue, Aug 18, 2020 at 3:54 PM David Anderson 
> wrote:
> >
> > Being able to optionally fire registered processing time timers at the
> end of a job would be interesting, and would help in (at least some of) the
> cases I have in mind. I don't have a better idea.
> >
> > David
> >
> > On Mon, Aug 17, 2020 at 8:24 PM Kostas Kloudas 
> wrote:
> >>
> >> Hi Kurt and David,
> >>
> >> Thanks a lot for the insightful feedback!
> >>
> >> @Kurt: For the topic of checkpointing with Batch Scheduling, I totally
> >> agree with you that it requires a lot more work and careful thinking
> >> on the semantics. This FLIP was written under the assumption that if
> >> the user wants to have checkpoints on bounded input, he/she will have
> >> to go with STREAMING as the scheduling mode. Checkpointing for BATCH
> >> can be handled as a separate topic in the future.
> >>
> >> In the case of MIXED workloads and for this FLIP, the scheduling mode
> >> should be set to STREAMING. That is why the AUTOMATIC option sets
> >> scheduling to BATCH only if all the sources are bounded. I am not sure
> >> what are the plans there at the scheduling level, as one could imagine
> >> in the future that in mixed workloads, we schedule first all the
> >> bounded subgraphs in BATCH mode and we allow only one UNBOUNDED
> >> subgraph per application, which is going to be scheduled after all
> >> Bounded ones have finished. Essentially the bounded subgraphs will be
> >> used to bootstrap the unbounded one. But, I am not aware of any plans
> >> towards that direction.
> >>
> >>
> >> @David: The processing time timer handling is a topic that has also
> >> been discussed in the community in the past, and I do not remember any
> >> final conclusion unfortunately.
> >>
> >> In the current context and for bounded input, we chose to favor
> >> reproducibility of the result, as this is expected in batch processing
> >> where the whole input is available in advance. This is why this
> >> proposal suggests to not allow processing time timers. But I
> >> understand your argument that the user may want to be able to run the
> >> same pipeline on batch and streaming this is why we added the two
> >> options under future work, namely (from the FLIP):
> >>
> >> ```
> >> Future Work: In the future we may consider adding as options the
> capability of:
> >> * firing all the registered processing time timers at the end of a job
> >> (at close()) or,
> >> * ignoring all the registered processing time timers at the end of a
> job.
> >> ```
> >>
> >> Conceptually, we are essentially saying that we assume that batch
> >> execution is assumed to be instantaneous and refers to a single
> >> "point" in time and any processing-time timers for the future may fire
> >> at the end of execution or be ignored (but not throw an exception). I
> >> could also see ignoring the timers in batch as the default, if this
> >> makes more sense.
> >>
> >> By the way, do you have any usecases in mind that will help us better
> >> shape our processing time timer handling?
> >>
> >> Kostas
> >>
> >> On Mon, Aug 17, 2020 at 2:52 PM David Anderson 
> wrote:
> >> >
> >> > Kostas,
> >> >
> >> > I'm pleased to see some concrete details in this FLIP.
> >> >
> >> > I wonder if the current proposal goes far enough in the direction of
> recognizing the need some users may have for "batch" and "bounded
> streaming" to be treated differently. If I've understood it correctly, the
> section on scheduling allows me to choose STREAMING scheduling even if I
> have bounded sources. I like that approach, because it recognizes that even
> though I have bounded inputs, I don't necessarily want batch processing
> semantics. I think it makes sense to exten

Re: [ANNOUNCE] Apache Flink 1.10.2 released

2020-08-25 Thread Guowei Ma
Hi,

Thanks a lot for being the release manager Zhu Zhu!
Thanks everyone contributed to this!

Best,
Guowei


On Wed, Aug 26, 2020 at 11:18 AM Yun Tang  wrote:

> Thanks for Zhu's work to manage this release and everyone who contributed
> to this!
>
> Best,
> Yun Tang
> 
> From: Yangze Guo 
> Sent: Tuesday, August 25, 2020 14:47
> To: Dian Fu 
> Cc: Zhu Zhu ; dev ; user <
> u...@flink.apache.org>; user-zh 
> Subject: Re: [ANNOUNCE] Apache Flink 1.10.2 released
>
> Thanks a lot for being the release manager Zhu Zhu!
> Congrats to all others who have contributed to the release!
>
> Best,
> Yangze Guo
>
> On Tue, Aug 25, 2020 at 2:42 PM Dian Fu  wrote:
> >
> > Thanks ZhuZhu for managing this release and everyone else who
> contributed to this release!
> >
> > Regards,
> > Dian
> >
> > 在 2020年8月25日,下午2:22,Till Rohrmann  写道:
> >
> > Great news. Thanks a lot for being our release manager Zhu Zhu and to
> all others who have contributed to the release!
> >
> > Cheers,
> > Till
> >
> > On Tue, Aug 25, 2020 at 5:37 AM Zhu Zhu  wrote:
> >>
> >> The Apache Flink community is very happy to announce the release of
> Apache Flink 1.10.2, which is the first bugfix release for the Apache Flink
> 1.10 series.
> >>
> >> Apache Flink® is an open-source stream processing framework for
> distributed, high-performing, always-available, and accurate data streaming
> applications.
> >>
> >> The release is available for download at:
> >> https://flink.apache.org/downloads.html
> >>
> >> Please check out the release blog post for an overview of the
> improvements for this bugfix release:
> >> https://flink.apache.org/news/2020/08/25/release-1.10.2.html
> >>
> >> The full release notes are available in Jira:
> >>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12347791
> >>
> >> We would like to thank all contributors of the Apache Flink community
> who made this release possible!
> >>
> >> Thanks,
> >> Zhu
> >
> >
>


Re: [VOTE] FLIP-131: Consolidate the user-facing Dataflow SDKs/APIs (and deprecate the DataSet API)

2020-09-03 Thread Guowei Ma
+1
Looking forward to having a unified datastream api.
Best,
Guowei


On Thu, Sep 3, 2020 at 3:46 PM Dawid Wysakowicz 
wrote:

> +1
>
> I think it gives a clear idea why we should deprecate and eventually
> remove the DataSet API.
>
> Best,
>
> Dawid
>
> On 03/09/2020 09:37, Yun Gao wrote:
> > Very thanks for bring this up!  +1 for deprecating the DataSet API and
> providing a unified streaming/batch programming model to users.
> >
> > Best,
> >  Yun
> >
> >
> > --
> > Sender:Aljoscha Krettek
> > Date:2020/09/02 19:22:51
> > Recipient:Flink Dev
> > Theme:[VOTE] FLIP-131: Consolidate the user-facing Dataflow SDKs/APIs
> (and deprecate the DataSet API)
> >
> > Hi all,
> >
> > After the discussion in [1], I would like to open a voting thread for
> > FLIP-131 (https://s.apache.org/FLIP-131) [2] which discusses the
> > deprecation of the DataSet API and future work on the DataStream API and
> > Table API for bounded (batch) execution.
> >
> > The vote will be open until September 7 (72h + weekend), unless there is
> > an objection or not enough votes.
> >
> > Regards,
> > Aljoscha
> >
> > [1]
> >
> https://lists.apache.org/thread.html/r4f24c4312cef7270a1349c39b89fb1184c84065944b43aedf9cfba6a%40%3Cdev.flink.apache.org%3E
> > [2]
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741
>
>


[DISCUSS] FLIP-143: Unified Sink API

2020-09-10 Thread Guowei Ma
Hi, devs & users

As discussed in FLIP-131[1], Flink will deprecate the DataSet API in favor
of DataStream API and Table API. Users should be able to use DataStream API
to write jobs that support both bounded and unbounded execution modes.
However, Flink does not yet provide a sink API that guarantees exactly-once
semantics in both bounded and unbounded scenarios, which blocks this
unification.

So we want to introduce a new unified sink API which could let the user
develop the sink once and run it everywhere. You could find more details in
FLIP-143[2].

The FLIP contains some open questions that I'd really appreciate inputs
from the community. Some of the open questions include:

   1. We provide two alternative Sink APIs in the FLIP. The only difference
   between the two versions is how the state is exposed to the user. Which
   one do you prefer?
   2. How should the sink API support writing to Hive?
   3. Should the sink be an operator or a topology?

[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-143%3A+Unified+Sink+API
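
For readers who want a feel for the writer/committer split under discussion,
here is a toy plain-Java sketch of the two-phase pattern (illustrative only;
the actual interfaces are defined in FLIP-143 and may differ):

```java
import java.util.ArrayList;
import java.util.List;

/** Toy sketch of a two-phase sink: a writer produces committables, a
 *  committer makes them visible atomically. Illustrative only. */
public class TwoPhaseSinkSketch {

    public interface Writer<IN, CommT> {
        void write(IN element);
        /** Called on a checkpoint or at the end of input. */
        List<CommT> prepareCommit();
    }

    public interface Committer<CommT> {
        void commit(List<CommT> committables);
    }

    /** In-memory example: each committable is one batch of buffered records. */
    public static class BufferingWriter implements Writer<String, List<String>> {
        private final List<String> buffer = new ArrayList<>();

        public void write(String element) { buffer.add(element); }

        public List<List<String>> prepareCommit() {
            List<List<String>> committables = new ArrayList<>();
            if (!buffer.isEmpty()) {
                committables.add(new ArrayList<>(buffer));
                buffer.clear();
            }
            return committables;
        }
    }

    public static class CollectingCommitter implements Committer<List<String>> {
        public final List<String> committed = new ArrayList<>();

        public void commit(List<List<String>> committables) {
            for (List<String> batch : committables) {
                committed.addAll(batch);
            }
        }
    }
}
```

The split matters because prepareCommit can run at a checkpoint barrier in
unbounded mode and at end of input in bounded mode, giving one code path for
both.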

Best,
Guowei


Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-14 Thread Guowei Ma
Hi all,


Many thanks for the discussion and the valuable opinions! There are
currently several ongoing issues, and we would like to share what we are
thinking in the next few mails.

It seems that the biggest issue now is the topology of the sinks. Before
deciding what the sink API should look like, I would like to first
summarize the different topologies that have been mentioned so that we are
on the same page and can gain more insight into this issue. There are four
types of topology I can see. Please correct me if I have misunderstood what
you mean:

   1. Commit individual files (StreamingFileSink):
      FileWriter -> FileCommitter
   2. Commit a directory (HiveSink):
      FileWriter -> FileCommitter -> GlobalCommitter
   3. Commit a bundle of files (Iceberg):
      DataFileWriter -> GlobalCommitter
   4. Commit a directory with merged files (some users want to merge the files
      in a directory before committing the directory to the Hive metastore):
      FileWriter -> SingleFileCommit -> FileMergeWriter -> GlobalCommitter


As can be seen above, the topologies differ according to the requirements.
Moreover, there may be other options for the second and third categories,
e.g.:

An alternative topology for the IcebergSink might be: DataFileWriter ->
Agg -> GlobalCommitter. One advantage of this approach is that the Agg can
take care of the cleanup instead of coupling the cleanup logic to the
committer.


In the long run I think we might give sink developers the ability to build
arbitrary topologies; Flink could provide only a basic commit
transformation and let users build the other parts of the topology. In 1.12
we might first provide different patterns for these different scenarios,
and I think these components could be reused in the future.
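
To make the Agg idea concrete, here is a minimal plain-Java sketch
(DataFile and ManifestFile are stand-ins for the Iceberg concepts mentioned
above; this is not Flink or Iceberg code):

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: an "Agg" stage that merges per-file committables
 *  into one aggregated committable for the global committer. */
public class AggSketch {

    /** Per-checkpoint committable emitted by a writer. */
    public static class DataFile {
        public final String path;
        public DataFile(String path) { this.path = path; }
    }

    /** Aggregated committable handed to the global committer. */
    public static class ManifestFile {
        public final List<String> dataFilePaths;
        public ManifestFile(List<String> dataFilePaths) { this.dataFilePaths = dataFilePaths; }
    }

    /** The "Agg" step: combine all data files of one checkpoint into a single
     *  manifest. Keeping this out of the committer leaves the committer with
     *  a single responsibility, and the Agg can own the manifest's life cycle
     *  (including cleanup on restore). */
    public static ManifestFile aggregate(List<DataFile> dataFiles) {
        List<String> paths = new ArrayList<>();
        for (DataFile f : dataFiles) {
            paths.add(f.path);
        }
        return new ManifestFile(paths);
    }
}
```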

Best,
Guowei


On Mon, Sep 14, 2020 at 11:19 PM Dawid Wysakowicz 
wrote:

> Hi all,
>
> > I would think that we only need flush() and the semantics are that it
> > prepares for a commit, so on a physical level it would be called from
> > "prepareSnapshotPreBarrier". Now that I'm thinking about it more I
> > think flush() should be renamed to something like "prepareCommit()".
>
> Generally speaking it is a good point that emitting the committables
> should happen before emitting the checkpoint barrier downstream.
> However, if I remember offline discussions well, the idea behind
> Writer#flush and Writer#snapshotState was to differentiate commit on
> checkpoint vs final checkpoint at the end of the job. Both of these
> methods could emit committables, but the flush should not leave any in
> progress state (e.g. in case of file sink in STREAM mode, in
> snapshotState it could leave some open files that would be committed in
> a subsequent cycle, however flush should close all files). The
> snapshotState as it is now can not be called in
> prepareSnapshotPreBarrier as it can store some state, which should
> happen in Operator#snapshotState as otherwise it would always be
> synchronous. Therefore I think we would need sth like:
>
> void prepareCommit(boolean flush, WriterOutput output);
>
> ver 1:
>
> List snapshotState();
>
> ver 2:
>
> void snapshotState(); // not sure if we need that method at all in option 2
>
> > The Committer is as described in the FLIP, it's basically a function
> > "void commit(Committable)". The GobalCommitter would be a function "void
> > commit(List)". The former would be used by an S3 sink where
> > we can individually commit files to S3, a committable would be the list
> > of part uploads that will form the final file and the commit operation
> > creates the metadata in S3. The latter would be used by something like
> > Iceberg where the Committer needs a global view of all the commits to be
> > efficient and not overwhelm the system.
> >
> > I don't know yet if sinks would only implement on type of commit
> > function or potentially both at the same time, and maybe Commit can
> > return some CommitResult that gets shipped to the GlobalCommit function.
> I must admit it I did not get the need for Local/Normal + Global
> committer at first. The Iceberg example helped a lot. I think it makes a
> lot of sense.
>
> > For Iceberg, writers don't need any state. But the GlobalCommitter
> > needs to
> > checkpoint StateT. For the committer, CommT is "DataFile". Since a single
> > committer can collect thousands (or more) data files in one checkpoint
> > cycle, as an optimization we checkpoint a single "ManifestFile" (for the
> > collected thousands data files) as StateT. This allows us to absorb
> > extended commit outages without losing written/uploaded data files, as
> > operator state size is as small as one manifest file per checkpoint cycle
> > [2].
> > --
> > StateT snapshotState(SnapshotContext context) throws Exception;
> >
> > That means we also need the restoreCommitter API in the Sink interface
> > ---
> > C

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-14 Thread Guowei Ma
## Concurrent checkpoints
AFAIK the committer would not see the file-1-2 when ck1 happens in the
ExactlyOnce mode.

## Committable bookkeeping and combining

I agree with you that the "CombineGlobalCommitter" would work. But that
puts more optimization logic into the committer, which makes the committer
more and more complicated until it eventually becomes the same as the
Writer. For example, if we introduce these optimizations in the committer,
the committer needs to clean up unused manifest files when restoring from a
failure.

In this case another alternative might be to put this "merging"
optimization into a separate agg operator (maybe just like another
`Writer`?). The agg would produce an aggregated committable for the
committer and could manage the whole life cycle of the manifest files it
creates. This keeps the committer to a single responsibility.

>>The main question is if this pattern is generic to be put into the sink
framework or not.
Maybe I am wrong, but my impression from the current discussion is that
different requirements lead to different topologies.

## Using checkpointId
In batch execution mode there are no regular checkpoints anymore, which is
why we do not introduce the checkpoint id in the API. So it is a good thing
that the sink decouples its implementation from the checkpoint id. :)

Best,
Guowei


On Tue, Sep 15, 2020 at 7:33 AM Steven Wu  wrote:

>
> ## concurrent checkpoints
>
> @Aljoscha Krettek  regarding the concurrent
> checkpoints, let me illustrate with a simple DAG below.
> [image: image.png]
>
> Let's assume each writer emits one file per checkpoint cycle and *writer-2
> is slow*. Now let's look at what the global committer receives
>
> timeline:
> --> Now
> from Writer-1:  file-1-1, ck-1, file-1-2, ck-2
> from Writer-2:
> file-2-1, ck-1
>
> In this case, the committer shouldn't include "file-1-2" into the commit
> for ck-1.
>
> ## Committable bookkeeping and combining
>
> I like David's proposal where the framework takes care of the
> bookkeeping of committables and provides a combiner API (CommT ->
> GlobalCommT) for GlobalCommitter. The only requirement is to tie the
> commit/CommT/GlobalCommT to a checkpoint.
>
> When a commit is successful for checkpoint-N, the framework needs to
> remove the GlobalCommT from the state corresponding to checkpoints <= N. If
> a commit fails, the GlobalCommT accumulates and will be included in the
> next cycle. That is how the Iceberg sink works. I think it is good to
> piggyback retries with Flink's periodic checkpoints for Iceberg sink.
> Otherwise, it can get complicated to implement retry logic that won't
> interfere with Flink checkpoints.
>
> The main question is if this pattern is generic to be put into the sink
> framework or not.
>
> > A alternative topology option for the IcebergSink might be :
> DataFileWriter
> -> Agg -> GlobalCommitter. One pro of this method is that we can let Agg
> take care of the cleanup instead of coupling the cleanup logic to the
> committer.
>
> @Guowei Ma  I would favor David's suggestion of
> "combine" API rather than a separate "Agg" operator.
>
> ## Using checkpointId
>
> > I think this can have some problems, for example when checkpoint ids are
> not strictly sequential, when we wrap around, or when the JobID changes.
> This will happen when doing a stop/start-from-savepoint cycle, for example.
>
> checkpointId can work if it is monotonically increasing, which I believe
> is the case for Flink today. Restoring from checkpoint or savepoint will
> resume the checkpointIds.
>
> We can deal with JobID change by saving it into the state and Iceberg
> snapshot metadata. There is already a PR [1] for that.
>
> ## Nonce
>
> > Flink provide a nonce to the GlobalCommitter where Flink guarantees that
> this nonce is unique
>
> That is actually how we implemented internally. Flink Iceberg sink
> basically hashes the Manifest file location as the nonce. Since the Flink
> generated Manifest file location is unique, it  guarantees the nonce is
> unique.
>
> IMO, checkpointId is also one way of implementing Nonce based on today's
> Flink behavior.
>
> > and will not change for repeated invocations of the GlobalCommitter with
> the same set of committables
>
>  if the same set of committables are combined into one GlobalCommT (like
> ManifestFile in Iceberg), then the Nonce could be part of the GlobalCommT
> interface.
>
> BTW, as David pointed out, the ManifestFile optimization is only in our
> internal implementation [2] right now. For the open source version, t

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-14 Thread Guowei Ma
Hi, aljoscha

>I don't understand why we need the "Drain and Snapshot" section. It
>seems to be some details about stop-with-savepoint and drain, and the
>relation to BATCH execution but I don't know if it is needed to
>understand the rest of the document. I'm happy to be wrong here, though,
>if there's good reasons for the section.

The new unified sink API should provide a way for the sink developer to
handle EOI (drain) in order to guarantee exactly-once semantics. That is
mainly what I want to say in this section. The current streaming-style sink
API does not provide a good way to deal with it, which is why the
`StreamingFileSink` does not commit the last part of data in the bounded
scenario. Our theme is unification. I am afraid users might misunderstand
that adding this requirement to the new sink API is only for bounded
scenarios, so I explained in this paragraph that stop-with-savepoint might
have a similar requirement.

As for the snapshot, I also want to prevent users from misunderstanding that
it is specially prepared for the unbounded scenario. Actually it might also
be possible with the bounded + batch execution mode in the future.

I could reorganize the section if it confuses readers, but I think we need
to keep the drain part at least. WDYT?

>On the question of Alternative 1 and 2, I have a strong preference for
>Alternative 1 because we could avoid strong coupling to other modules.
>With Alternative 2 we would depend on `flink-streaming-java` and even
>`flink-runtime`. For the new source API (FLIP-27) we managed to keep the
>dependencies slim and the code is in flink-core. I'd be very happy if we
>can manage the same for the new sink API.

I am open to alternative 1. Maybe I am missing something, but I do not get
why the second alternative would depend on `flink-runtime` or
`flink-streaming-java`. All of the state APIs are currently in flink-core.
Could you give some further explanation? Thanks :)

Best,
Guowei


On Tue, Sep 15, 2020 at 12:05 PM Guowei Ma  wrote:

> ## Concurrent checkpoints
> AFAIK the committer would not see the file-1-2 when ck1 happens in the
> ExactlyOnce mode.
>
> ## Committable bookkeeping and combining
>
> I agree with you that the "CombineGlobalCommitter" would work. But putting
> more optimization logic in the committer will make the committer more and
> more complicated, until it eventually becomes the same as the Writer. For
> example, the committer would need to clean up some unused manifest files
> when restoring from a failure if we introduced these optimizations into
> the committer.
>
> In this case another alternative might be to put this "merging"
> optimization into a separate agg operator (maybe just like another
> `Writer`?). The agg could produce an aggregated committable for the
> committer, and the agg operator could manage the whole life cycle of the
> manifest files it creates. That would leave the committer with a single
> responsibility.
>
> >>The main question is if this pattern is generic to be put into the sink
> framework or not.
> Maybe I am wrong, but what I gather from the current discussion is that
> different requirements lead to different topological requirements.
>
> ## Using checkpointId
> In the batch execution mode there would be no normal checkpoints any more.
> That is why we do not introduce the checkpoint id in the API. So it is a
> great thing that the sink decouples its implementation from the checkpoint
> id. :)
>
> Best,
> Guowei
>
>
> On Tue, Sep 15, 2020 at 7:33 AM Steven Wu  wrote:
>
>>
>> ## concurrent checkpoints
>>
>> @Aljoscha Krettek  regarding the concurrent
>> checkpoints, let me illustrate with a simple DAG below.
>> [image: image.png]
>>
>> Let's assume each writer emits one file per checkpoint cycle and *writer-2
>> is slow*. Now let's look at what the global committer receives
>>
>> timeline:
>> --> Now
>> from Writer-1:  file-1-1, ck-1, file-1-2, ck-2
>> from Writer-2:
>> file-2-1, ck-1
>>
>> In this case, the committer shouldn't include "file-1-2" into the commit
>> for ck-1.
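The bookkeeping implied by the timeline above could be sketched roughly like this (class and method names are hypothetical, not the FLIP-143 interfaces): committables are grouped by checkpoint id, so a slow writer cannot cause a later file to slip into an earlier commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative sketch: the framework groups committables by checkpoint id,
// so writer-1's file-1-2 (from the ck-2 cycle) is never committed with ck-1
// even if it arrives before slow writer-2's ck-1 file.
public class CommittableBookkeeping {
    private final NavigableMap<Long, List<String>> filesPerCheckpoint = new TreeMap<>();

    public void add(long checkpointId, String file) {
        filesPerCheckpoint.computeIfAbsent(checkpointId, k -> new ArrayList<>()).add(file);
    }

    /** Returns the files that belong to exactly this checkpoint. */
    public List<String> committablesFor(long checkpointId) {
        return filesPerCheckpoint.getOrDefault(checkpointId, List.of());
    }

    /** After checkpoint N commits successfully, drop state for all checkpoints <= N. */
    public void onCommitSuccess(long checkpointId) {
        filesPerCheckpoint.headMap(checkpointId, true).clear();
    }

    public static void main(String[] args) {
        CommittableBookkeeping b = new CommittableBookkeeping();
        b.add(1, "file-1-1"); // writer-1, ck-1
        b.add(2, "file-1-2"); // writer-1 already in its ck-2 cycle
        b.add(1, "file-2-1"); // slow writer-2, ck-1
        // The ck-1 commit must not include file-1-2:
        System.out.println(b.committablesFor(1)); // only the ck-1 files
        b.onCommitSuccess(1);
        System.out.println(b.committablesFor(1)); // state for ck-1 is gone
    }
}
```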
>>
>> ## Committable bookkeeping and combining
>>
>> I like David's proposal where the framework takes care of the
>> bookkeeping of committables and provides a combiner API (CommT ->
>> GlobalCommT) for GlobalCommitter. The only requirement is to tie the
>> commit/CommT/GlobalCommT to a checkpoint.
>>
>> When a commit is successful for checkpoint-N, the framework needs to
>> remove the GlobalCommT from the state corresponding to checkpoints <= N. If
>> a commit fails, the GlobalCommT accumulates a

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-14 Thread Guowei Ma
>> I would think that we only need flush() and the semantics are that it
>> prepares for a commit, so on a physical level it would be called from
>> "prepareSnapshotPreBarrier". Now that I'm thinking about it more I
>> think flush() should be renamed to something like "prepareCommit()".

> Generally speaking it is a good point that emitting the committables
> should happen before emitting the checkpoint barrier downstream.
> However, if I remember offline discussions well, the idea behind
> Writer#flush and Writer#snapshotState was to differentiate commit on
> checkpoint vs final checkpoint at the end of the job. Both of these
> methods could emit committables, but the flush should not leave any in
> progress state (e.g. in case of file sink in STREAM mode, in
> snapshotState it could leave some open files that would be committed in
> a subsequent cycle, however flush should close all files). The
> snapshotState as it is now can not be called in
> prepareSnapshotPreBarrier as it can store some state, which should
> happen in Operator#snapshotState as otherwise it would always be
> synchronous. Therefore I think we would need sth like:

> void prepareCommit(boolean flush, WriterOutput output);

> ver 1:

> List snapshotState();

> ver 2:

> void snapshotState(); // not sure if we need that method at all in option 2

I second Dawid's proposal. This is a valid scenario, and version 2 does not
need the snapshotState() method any more.
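A rough sketch of Dawid's "ver 2" shape follows. The concrete names (`FileWriter`, `WriterOutput#emit`) are illustrative assumptions, not the FLIP's final API: `prepareCommit(flush = true)` closes the in-progress file at drain time, so a separate `snapshotState()` is not needed for emitting committables.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical output channel for committables.
interface WriterOutput<CommT> {
    void emit(CommT committable);
}

// Sketch: prepareCommit(false) emits committables for finished work but may
// keep in-progress files open; prepareCommit(true), at end of input / drain,
// must close everything and emit it, unlike StreamingFileSink which leaves
// the last in-progress part uncommitted in bounded jobs.
class FileWriter {
    private final List<String> closedFiles = new ArrayList<>();
    private String openFile = "part-0.inprogress";

    void prepareCommit(boolean flush, WriterOutput<String> output) {
        if (flush && openFile != null) {
            closedFiles.add(openFile.replace(".inprogress", ""));
            openFile = null;
        }
        closedFiles.forEach(output::emit);
        closedFiles.clear();
    }
}

public class WriterDemo {
    public static List<String> run() {
        FileWriter w = new FileWriter();
        List<String> committed = new ArrayList<>();
        w.prepareCommit(false, committed::add); // normal checkpoint: open file stays open
        w.prepareCommit(true, committed::add);  // drain: open file is closed and emitted
        return committed;
    }

    public static void main(String[] args) {
        System.out.println(run()); // the last part of data is committed at drain
    }
}
```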

>> The Committer is as described in the FLIP, it's basically a function
>> "void commit(Committable)". The GlobalCommitter would be a function "void
>> commit(List)". The former would be used by an S3 sink where
>> we can individually commit files to S3, a committable would be the list
>> of part uploads that will form the final file and the commit operation
>> creates the metadata in S3. The latter would be used by something like
>> Iceberg where the Committer needs a global view of all the commits to be
>> efficient and not overwhelm the system.
>>
>> I don't know yet if sinks would only implement one type of commit
>> function or potentially both at the same time, and maybe Commit can
>> return some CommitResult that gets shipped to the GlobalCommit function.
>> I must admit I did not get the need for Local/Normal + Global
>> committer at first. The Iceberg example helped a lot. I think it makes a
>> lot of sense.
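The two commit shapes described in the quote could be sketched roughly as follows; the generics and method names here are guesses for illustration, not the FLIP's final interfaces.

```java
import java.util.List;

// Per-committable commit, e.g. S3: each committable (the part uploads of one
// file) commits independently.
interface Committer<CommT> {
    void commit(CommT committable) throws Exception;
}

// Global commit, e.g. Iceberg: combine all per-writer committables into one
// global artifact and commit it in a single metadata transaction, avoiding
// one commit per file.
interface GlobalCommitter<CommT, GlobalCommT> {
    GlobalCommT combine(List<CommT> committables);
    void commit(GlobalCommT globalCommittable) throws Exception;
}

public class CommitterShapes {
    public static String demo() {
        GlobalCommitter<String, String> iceberg = new GlobalCommitter<>() {
            public String combine(List<String> files) {
                return "manifest[" + String.join(",", files) + "]";
            }
            public void commit(String manifest) {
                // one Iceberg snapshot commit for the whole manifest
            }
        };
        return iceberg.combine(List.of("file-1-1", "file-2-1"));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```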

@Dawid
What I understand is that HiveSink's implementation might need the local
committer (FileCommitter) because a file rename is needed.
But Iceberg only needs to write the manifest file. Would you like to
enlighten me on why Iceberg needs the local committer?
Thanks

Best,
Guowei


On Mon, Sep 14, 2020 at 11:19 PM Dawid Wysakowicz 
wrote:

> Hi all,
>
> > I would think that we only need flush() and the semantics are that it
> > prepares for a commit, so on a physical level it would be called from
> > "prepareSnapshotPreBarrier". Now that I'm thinking about it more I
> > think flush() should be renamed to something like "prepareCommit()".
>
> Generally speaking it is a good point that emitting the committables
> should happen before emitting the checkpoint barrier downstream.
> However, if I remember offline discussions well, the idea behind
> Writer#flush and Writer#snapshotState was to differentiate commit on
> checkpoint vs final checkpoint at the end of the job. Both of these
> methods could emit committables, but the flush should not leave any in
> progress state (e.g. in case of file sink in STREAM mode, in
> snapshotState it could leave some open files that would be committed in
> a subsequent cycle, however flush should close all files). The
> snapshotState as it is now can not be called in
> prepareSnapshotPreBarrier as it can store some state, which should
> happen in Operator#snapshotState as otherwise it would always be
> synchronous. Therefore I think we would need sth like:
>
> void prepareCommit(boolean flush, WriterOutput output);
>
> ver 1:
>
> List snapshotState();
>
> ver 2:
>
> void snapshotState(); // not sure if we need that method at all in option 2
>
> > The Committer is as described in the FLIP, it's basically a function
> > "void commit(Committable)". The GlobalCommitter would be a function "void
> > commit(List)". The former would be used by an S3 sink where
> > we can individually commit files to S3, a committable would be the list
> > of part uploads that will form the final file and the commit operation
> > creates the metadata in S3. The latter would be used by something like
> > Iceberg where the Committer needs a global view of all the commits to be
> > efficient and not overwhelm the system.
> >
> > I don't know yet if sinks would only implement one type of commit
> > function or potentially both at the same time, and maybe Commit can
> > return some CommitResult that gets shipped to the GlobalCommit function.
> I must admit I did not get the need for Local/Normal + Global
> committer at first. The Iceberg example helped a lot. 

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-15 Thread Guowei Ma
Hi, Dawid

>> I still find the merging case the most confusing. I don't necessarily
>> understand why do you need the "SingleFileCommit" step in this scenario.
>> The way I understand "commit" operation is that it makes some
>> data/artifacts visible to the external system, thus it should be immutable
>> from a point of view of a single process. Having an additional step in the
>> same process that works on committed data contradicts with those
>> assumptions. I might be missing something though. Could you elaborate why
>> can't it be something like FileWriter -> FileMergeWriter -> Committer
>> (either global or non-global)? Again it might be just me not getting the
>> example.

I think you are right. The topology
"FileWriter -> FileMergeWriter -> Committer" could meet the merge
requirement. The topology "FileWriter -> SingleFileCommitter ->
FileMergeWriter -> GlobalCommitter" reuses some code of the
StreamingFileSink (for example the rolling policy), which is why it has the
"SingleFileCommitter" in the topology. In general I wanted to use this case
to show that there are different topologies according to the requirements.

BTW: IIRC, @Jingsong Lee  told me that the
actual topology of the merge-supporting HiveSink is more complicated than
that.


>> I've just briefly skimmed over the proposed interfaces. I would suggest
>> one addition to the Writer interface (as I understand this is the runtime
>> interface in this proposal?): add some availability method, to avoid, if
>> possible, blocking calls on the sink. We already have similar
>> availability methods in the new sources [1] and in various places in the
>> network stack [2].
>> BTW Let's not forget about Piotr's comment. I think we could add the
>> isAvailable or similar method to the Writer interface in the FLIP.

Thanks @Dawid Wysakowicz   for your reminder. There
are too many issues at the same time.

In addition to what Aljoscha said (there is very little system support for
it), another thing I worry about is: does the sink's snapshot return
immediately when the sink's status is unavailable? Maybe we could do it by
deduping some elements in the state, but I think that might be too
complicated. What I want to know is which specific sinks would benefit from
this feature.  @piotr   Please correct me if I
misunderstand you. Thanks.

Best,
Guowei


On Tue, Sep 15, 2020 at 3:55 PM Dawid Wysakowicz 
wrote:

> What I understand is that HiveSink's implementation might need the local
> committer(FileCommitter) because the file rename is needed.
> But the iceberg only needs to write the manifest file.  Would you like to
> enlighten me why the Iceberg needs the local committer?
> Thanks
>
> Sorry if I caused a confusion here. I am not saying the Iceberg sink needs
> a local committer. What I had in mind is that prior to the Iceberg example
> I did not see a need for a "GlobalCommitter" in the streaming case. I
> thought it is always enough to have the "normal" committer in that case.
> Now I understand that this differentiation is not really about logical
> separation. It is not really about the granularity with which we commit,
> i.e. answering the "WHAT" question. It is really about the performance and
> that in the end we will have a single "transaction", so it is about
> answering the question "HOW".
>
>
> - Commit a directory with merged files (some users want to merge the files
>   in a directory before committing the directory to the Hive metastore)
>
>   1. FileWriter -> SingleFileCommit -> FileMergeWriter -> GlobalCommitter
>
> I still find the merging case the most confusing. I don't necessarily
> understand why do you need the "SingleFileCommit" step in this scenario.
> The way I understand "commit" operation is that it makes some
> data/artifacts visible to the external system, thus it should be immutable
> from a point of view of a single process. Having an additional step in the
> same process that works on committed data contradicts with those
> assumptions. I might be missing something though. Could you elaborate why
> can't it be something like FileWriter -> FileMergeWriter -> Committer
> (either global or non-global)? Again it might be just me not getting the
> example.
>
> I've just briefly skimmed over the proposed interfaces. I would suggest one
> addition to the Writer interface (as I understand this is the runtime
> interface in this proposal?): add some availability method, to avoid, if
> possible, blocking calls on the sink. We already have similar
> availability methods in the new sources [1] and in various places in the
> network stack [2].
>
> BTW Let&

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-15 Thread Guowei Ma
Hi, Steven
Thank you for your thoughtful ideas and concerns.

>> I still like the concept of grouping data files per checkpoint for
>> streaming mode. It is cleaner and probably easier to manage and deal with
>> commit failures. Plus, it can reduce dupes for the at least once mode. I
>> understand checkpoint is not an option for batch execution. We don't have
>> to expose the checkpointId in the API, as long as the internal
>> bookkeeping groups data files by checkpoints for streaming mode.

I think this problem (how to dedupe the combined committed data) also
depends on where the agg/combine logic is placed.

1. If the agg/combine takes place in the "commit", maybe we need to figure
out how to give the aggregated committable a unique and auto-incrementing id
in the committer.
2. If the agg/combine takes place in a separate operator, maybe the sink
developer could maintain the id itself by using state.
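Option 2 could look roughly like the following sketch. The names are hypothetical; in a real implementation the counter would live in checkpointed operator state and be restored on failover.

```java
import java.util.List;

// Sketch of option 2: a separate aggregating operator keeps a monotonically
// increasing id in its (checkpointed) state and stamps each aggregated
// committable with it, so the committer can dedupe on restore.
public class AggOperator {
    private long nextAggId; // would live in operator state, restored on failure

    public AggOperator(long restoredId) {
        this.nextAggId = restoredId;
    }

    public String aggregate(List<String> committables) {
        long id = nextAggId++;
        // The id travels with the aggregated committable; a commit seen twice
        // (e.g. after failover) carries the same id and can be skipped.
        return "agg-" + id + ":" + String.join(",", committables);
    }

    public static void main(String[] args) {
        AggOperator op = new AggOperator(0);
        System.out.println(op.aggregate(List.of("f1", "f2")));
        System.out.println(op.aggregate(List.of("f3")));
    }
}
```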

I think this problem is also determined by which topology patterns the sink
API should support. Actually there are already many other topology
requirements. :)

Best,
Guowei


On Wed, Sep 16, 2020 at 7:46 AM Steven Wu  wrote:

> > AFAIK the committer would not see the file-1-2 when ck1 happens in the
> ExactlyOnce mode.
>
> @Guowei Ma  I think you are right for exactly once
> checkpoint semantics. what about "at least once"? I guess we can argue that
> it is fine to commit file-1-2 for at least once mode.
>
> I still like the concept of grouping data files per checkpoint for
> streaming mode. it is cleaner and probably easier to manage and deal with
> commit failures. Plus, it can reduce dupes for the at least once mode.  I
> understand checkpoint is not an option for batch execution. We don't have
> to expose the checkpointId in API, as long as  the internal bookkeeping
> groups data files by checkpoints for streaming mode.
>
>
> On Tue, Sep 15, 2020 at 6:58 AM Steven Wu  wrote:
>
>> > images don't make it through to the mailing lists. You would need to
>> host the file somewhere and send a link.
>>
>> Sorry about that. Here is the sample DAG in google drawings.
>>
>> https://docs.google.com/drawings/d/1-P8F2jF9RG9HHTtAfWEBRuU_2uV9aDTdqEt5dLs2JPk/edit?usp=sharing
>>
>>
>> On Tue, Sep 15, 2020 at 4:58 AM Guowei Ma  wrote:
>>
>>> Hi, Dawid
>>>
>>> >>I still find the merging case the most confusing. I don't necessarily
>>> understand why do you need the "SingleFileCommit" step in this scenario.
>>> The way I
>>> >> understand "commit" operation is that it makes some data/artifacts
>>> visible to the external system, thus it should be immutable from a point
>>> of
>>> view of a single >>process. Having an additional step in the same process
>>> that works on committed data contradicts with those assumptions. I might
>>> be
>>> missing something though. >> Could you elaborate >why can't it be
>>> something
>>> like FileWriter -> FileMergeWriter -> Committer (either global or
>>> non-global)? Again it might be just me not getting the example.
>>>
>>> I think you are right. The topology
>>> "FileWriter->FileMergeWriter->Committer" could meet the merge
>>> requirement.
>>> The topology "FileWriter-> SingleFileCommitter -> FileMergeWriter ->
>>> GlobalCommitter" reuses some code of the StreamingFileSink(For example
>>> rolling policy) so it has the "SingleFileCommitter" in the topology. In
>>> general I want to use the case to show that there are different
>>> topologies
>>> according to the requirements.
>>>
>>> BTW: IIRC, @Jingsong Lee  telled me that the
>>> actual topology of merged supported HiveSink is more complicated than
>>> that.
>>>
>>>
>>> >> I've just briefly skimmed over the proposed interfaces. I would
>>> suggest
>>> one
>>> >> addition to the Writer interface (as I understand this is the runtime
>>> >> interface in this proposal?): add some availability method, to avoid,
>>> if
>>> >> possible, blocking calls on the sink. We already have similar
>>> >> availability methods in the new sources [1] and in various places in
>>> the
>>> >> network stack [2].
>>> >> BTW Let's not forget about Piotr's comment. I think we could add the
>>> isAvailable or similar method to the Writer interface in the FLIP.
>>>
>>> Thanks @Dawid Wysakowicz   for your reminder.
>>> There
>>&g

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-15 Thread Guowei Ma
Hi,all

Thanks for all your valuable opinions and ideas. Currently there are many
topics in this thread. I will try to summarize what is consensus and what is
not. Correct me if I am wrong.

## Consensus

1. The motivation of the unified sink API is to decouple the sink
implementation from the different runtime execution modes.
2. The initial scope of the unified sink API only covers the file system
type, which supports real transactions. The FLIP focuses more on the
semantics the new sink API should support.
3. We prefer the first alternative API, which could give the framework a
greater opportunity to optimize.
4. The `Writer` needs to add a method `prepareCommit`, which would be
called from `prepareSnapshotPreBarrier`, and remove the `flush` method.
5. The FLIP could move the `Snapshot & Drain` section in order to be more
focused.

## Not Consensus

1. What should the “Unified Sink API” support/cover? The API can
“unify” (decouple) the commit operation in terms of supporting exactly-once
semantics. However, even if we narrow the initially supported systems down
to the file system, there would be different topology requirements. These
requirements come from performance optimization (IcebergSink/MergeHiveSink)
or functionality (i.e. whether a bucket is “finished”). Should the unified
sink API support these requirements?
2. The API does not expose the checkpoint id because the batch execution
mode does not have normal checkpoints. But some implementations still
depend on it (IcebergSink uses it for deduplication). I think how to
support this requirement depends on the first open question.
3. Whether the `Writer` supports async functionality or not. Currently I do
not know which sink would benefit from it. Maybe it is just my own problem.

Best,
Guowei


On Wed, Sep 16, 2020 at 12:02 PM Guowei Ma  wrote:

>
> Hi, Steven
> Thanks you for your thoughtful ideas and concerns.
>
> >>I still like the concept of grouping data files per checkpoint for
> streaming mode. it is cleaner and probably easier to manage and deal with
> commit failures. Plus, it >>can reduce dupes for the at least once
> >>mode.  I understand checkpoint is not an option for batch execution. We
> don't have to expose the checkpointId in API, as >>long as  the internal
> bookkeeping groups data files by checkpoints for streaming >>mode.
>
> I think this problem(How to dedupe the combined committed data) also
> depends on where to place the agg/combine logic .
>
> 1. If the agg/combine takes place in the “commit” maybe we need to figure
> out how to give the aggregated committable a unique and auto-increment id
> in the committer.
> 2. If the agg/combine takes place in a separate operator maybe sink
> developer could maintain the id itself by using the state.
>
> I think this problem is also decided by what the topology pattern the sink
> API should support. Actually there are already many other topology
> requirements. :)
>
> Best,
> Guowei
>
>
> On Wed, Sep 16, 2020 at 7:46 AM Steven Wu  wrote:
>
>> > AFAIK the committer would not see the file-1-2 when ck1 happens in the
>> ExactlyOnce mode.
>>
>> @Guowei Ma  I think you are right for exactly once
>> checkpoint semantics. what about "at least once"? I guess we can argue that
>> it is fine to commit file-1-2 for at least once mode.
>>
>> I still like the concept of grouping data files per checkpoint for
>> streaming mode. it is cleaner and probably easier to manage and deal with
>> commit failures. Plus, it can reduce dupes for the at least once mode.  I
>> understand checkpoint is not an option for batch execution. We don't have
>> to expose the checkpointId in API, as long as  the internal bookkeeping
>> groups data files by checkpoints for streaming mode.
>>
>>
>> On Tue, Sep 15, 2020 at 6:58 AM Steven Wu  wrote:
>>
>>> > images don't make it through to the mailing lists. You would need to
>>> host the file somewhere and send a link.
>>>
>>> Sorry about that. Here is the sample DAG in google drawings.
>>>
>>> https://docs.google.com/drawings/d/1-P8F2jF9RG9HHTtAfWEBRuU_2uV9aDTdqEt5dLs2JPk/edit?usp=sharing
>>>
>>>
>>> On Tue, Sep 15, 2020 at 4:58 AM Guowei Ma  wrote:
>>>
>>>> Hi, Dawid
>>>>
>>>> >>I still find the merging case the most confusing. I don't necessarily
>>>> understand why do you need the "SingleFileCommit" step in this scenario.
>>>> The way I
>>>> >> understand "commit" operation is that it makes some data/artifacts
>>>> visible to the external system, thus it should be immutable from a
>>>> poin

Re: [ANNOUNCE] New Apache Flink Committer - Godfrey He

2020-09-15 Thread Guowei Ma
Congratulations :)

Best,
Guowei


On Wed, Sep 16, 2020 at 12:19 PM Jark Wu  wrote:

> Hi everyone,
>
> It's great seeing many new Flink committers recently, and on behalf of the
> PMC,
> I'd like to announce one more new committer: Godfrey He.
>
> Godfrey is a very long time contributor in the Flink community since the
> end of 2016.
> He has been a very active contributor in the Flink SQL component with 153
> PRs and more than 571,414 lines which is quite outstanding.
> Godfrey has paid essential effort with SQL optimization and helped a lot
> during the blink merging.
> Besides that, he is also quite active with community work especially in
> Chinese mailing list.
>
> Please join me in congratulating Godfrey for becoming a Flink committer!
>
> Cheers,
> Jark Wu
>


Re: Re: [ANNOUNCE] New Apache Flink Committer - Igal Shilman

2020-09-15 Thread Guowei Ma
Congratulations :)
Best,
Guowei


On Wed, Sep 16, 2020 at 11:54 AM Zhijiang
 wrote:

> Congratulations and welcome, Igal!
>
>
> --
> From:Yun Gao 
> Send Time:2020年9月16日(星期三) 10:59
> To:Stephan Ewen ; dev 
> Subject:Re: Re: [ANNOUNCE] New Apache Flink Committer - Igal Shilman
>
> Congratulations Igal!
>
> Best,
>  Yun
>
>
>
>
>
>
> --
> Sender:Stephan Ewen
> Date:2020/09/15 22:48:30
> Recipient:dev
> Theme:Re: [ANNOUNCE] New Apache Flink Committer - Igal Shilman
>
> Welcome, Igal!
>
> On Tue, Sep 15, 2020 at 3:18 PM Seth Wiesman  wrote:
>
> > Congrats Igal!
> >
> > On Tue, Sep 15, 2020 at 7:13 AM Benchao Li  wrote:
> >
> > > Congratulations!
> > >
> > > Zhu Zhu  于2020年9月15日周二 下午6:51写道:
> > >
> > > > Congratulations, Igal!
> > > >
> > > > Thanks,
> > > > Zhu
> > > >
> > > > Rafi Aroch  于2020年9月15日周二 下午6:43写道:
> > > >
> > > > > Congratulations Igal! Well deserved!
> > > > >
> > > > > Rafi
> > > > >
> > > > >
> > > > > On Tue, Sep 15, 2020 at 11:14 AM Tzu-Li (Gordon) Tai <
> > > > tzuli...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > It's great seeing many new Flink committers recently, and to add
> to
> > > > that
> > > > > > I'd like to announce one more new committer: Igal Shilman!
> > > > > >
> > > > > > Igal has been a long time member of the community. You may very
> > > likely
> > > > > know
> > > > > > Igal from the Stateful Functions sub-project, as he was the
> > original
> > > > > author
> > > > > > of it before it was contributed to Flink.
> > > > > > Ever since StateFun was contributed to Flink, he has consistently
> > > > > > maintained the project and supported users in the mailing lists.
> > > > > > Before that, he had also helped tremendously in some work on
> > Flink's
> > > > > > serialization stack.
> > > > > >
> > > > > > Please join me in welcoming and congratulating Igal for becoming
> a
> > > > Flink
> > > > > > committer!
> > > > > >
> > > > > > Cheers,
> > > > > > Gordon
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > >
> > > Best,
> > > Benchao Li
> > >
> >
>
>
>


Re: [ANNOUNCE] New Apache Flink Committer - Yun Tang

2020-09-15 Thread Guowei Ma
Congratulations :)

Best,
Guowei


On Wed, Sep 16, 2020 at 11:54 AM Zhijiang
 wrote:

> Congratulations and welcome, Yun!
>
>
> --
> From:Jark Wu 
> Send Time:2020年9月16日(星期三) 11:35
> To:dev 
> Cc:tangyun ; Yun Tang 
> Subject:Re: [ANNOUNCE] New Apache Flink Committer - Yun Tang
>
> Congratulations Yun!
>
> On Wed, 16 Sep 2020 at 10:40, Rui Li  wrote:
>
> > Congratulations Yun!
> >
> > On Wed, Sep 16, 2020 at 10:20 AM Paul Lam  wrote:
> >
> > > Congrats, Yun! Well deserved!
> > >
> > > Best,
> > > Paul Lam
> > >
> > > > 2020年9月15日 19:14,Yang Wang  写道:
> > > >
> > > > Congratulations, Yun!
> > > >
> > > > Best,
> > > > Yang
> > > >
> > > > Leonard Xu  于2020年9月15日周二 下午7:11写道:
> > > >
> > > >> Congrats, Yun!
> > > >>
> > > >> Best,
> > > >> Leonard
> > > >>> 在 2020年9月15日,19:01,Yangze Guo  写道:
> > > >>>
> > > >>> Congrats, Yun!
> > > >>
> > > >>
> > >
> > >
> >
> > --
> > Best regards!
> > Rui Li
> >
>
>


Re: [ANNOUNCE] New Apache Flink Committer - Niels Basjes

2020-09-15 Thread Guowei Ma
Congratulations :)

Best,
Guowei


On Tue, Sep 15, 2020 at 6:14 PM Matthias Pohl 
wrote:

> Congrats!
>
> Best,
> Matthias
>
> On Tue, Sep 15, 2020 at 9:26 AM Dawid Wysakowicz 
> wrote:
>
> > Welcome, Niels!
> >
> > Best,
> >
> > Dawid
> >
> > On 14/09/2020 11:22, Matt Wang wrote:
> > > Congratulations, Niels!
> > >
> > >
> > > --
> > >
> > > Best,
> > > Matt Wang
> > >
> > >
> > > On 09/14/2020 17:02,Konstantin Knauf wrote:
> > > Congratulations!
> > >
> > > On Mon, Sep 14, 2020 at 10:51 AM tison  wrote:
> > >
> > > Congrats!
> > >
> > > Best,
> > > tison.
> > >
> > >
> > > Aljoscha Krettek  于2020年9月14日周一 下午4:38写道:
> > >
> > > Congratulations! 💐
> > >
> > > Aljoscha
> > >
> > > On 14.09.20 10:37, Robert Metzger wrote:
> > > Hi all,
> > >
> > > On behalf of the PMC, I’m very happy to announce Niels Basjes as a new
> > > Flink committer.
> > >
> > > Niels has been an active community member since the early days of
> > > Flink,
> > > with 19 commits dating back until 2015.
> > > Besides his work on the code, he has been driving initiatives on dev@
> > > list,
> > > supporting users and giving talks at conferences.
> > >
> > > Please join me in congratulating Niels for becoming a Flink committer!
> > >
> > > Best,
> > > Robert Metzger
> > >
> > >
> > >
> > >
> > >
> > >
> > > --
> > >
> > > Konstantin Knauf | Head of Product
> > >
> > > +49 160 91394525
> > >
> > >
> > > Follow us @VervericaData Ververica 
> > >
> > >
> > > --
> > >
> > > Join Flink Forward  - The Apache Flink
> > > Conference
> > >
> > > Stream Processing | Event Driven | Real Time
> > >
> > > --
> > >
> > > Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> > >
> > > --
> > > Ververica GmbH
> > > Registered at Amtsgericht Charlottenburg: HRB 158244 B
> > > Managing Directors: Yip Park Tung Jason, Jinwei (Kevin) Zhang, Karl
> Anton
> > > Wehner
> >
>


Re: [ANNOUNCE] New Apache Flink Committer - Arvid Heise

2020-09-15 Thread Guowei Ma
Congratulations :)

Best,
Guowei


On Tue, Sep 15, 2020 at 6:41 PM 刘建刚  wrote:

> Congratulations!
>
> Best
>
> Matthias Pohl  于2020年9月15日周二 下午6:07写道:
>
> > Congratulations! ;-)
> >
> > On Tue, Sep 15, 2020 at 11:47 AM Xingbo Huang 
> wrote:
> >
> > > Congratulations!
> > >
> > > Best,
> > > Xingbo
> > >
> > > Igal Shilman  于2020年9月15日周二 下午5:44写道:
> > >
> > > > Congrats Arvid!
> > > >
> > > > On Tue, Sep 15, 2020 at 11:12 AM David Anderson <
> da...@alpinegizmo.com
> > >
> > > > wrote:
> > > >
> > > > > Congratulations, Arvid! Well deserved.
> > > > >
> > > > > Best,
> > > > > David
> > > > >
> > > > > On Tue, Sep 15, 2020 at 10:23 AM Paul Lam 
> > > wrote:
> > > > >
> > > > > > Congrats, Arvid!
> > > > > >
> > > > > > Best,
> > > > > > Paul Lam
> > > > > >
> > > > > > > 2020年9月15日 15:29,Jingsong Li  写道:
> > > > > > >
> > > > > > > Congratulations Arvid !
> > > > > > >
> > > > > > > Best,
> > > > > > > Jingsong
> > > > > > >
> > > > > > > On Tue, Sep 15, 2020 at 3:27 PM Dawid Wysakowicz <
> > > > > dwysakow...@apache.org
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Congratulations Arvid! Very well deserved!
> > > > > > >>
> > > > > > >> Best,
> > > > > > >>
> > > > > > >> Dawid
> > > > > > >>
> > > > > > >> On 15/09/2020 04:38, Zhijiang wrote:
> > > > > > >>> Hi all,
> > > > > > >>>
> > > > > > >>> On behalf of the PMC, I’m very happy to announce Arvid Heise
> > as a
> > > > new
> > > > > > >> Flink committer.
> > > > > > >>>
> > > > > > >>> Arvid has been an active community member for more than a
> year,
> > > > with
> > > > > > 138
> > > > > > >> contributions including 116 commits, reviewed many PRs with
> good
> > > > > quality
> > > > > > >> comments.
> > > > > > >>> He is mainly working on the runtime scope, involved in
> critical
> > > > > > features
> > > > > > >> like task mailbox model and unaligned checkpoint, etc.
> > > > > > >>> Besides that, he was super active to reply questions in the
> > user
> > > > mail
> > > > > > >> list (34 emails in March, 51 emails in June, etc), also active
> > in
> > > > dev
> > > > > > mail
> > > > > > >> list and Jira issue discussions.
> > > > > > >>>
> > > > > > >>> Please join me in congratulating Arvid for becoming a Flink
> > > > > committer!
> > > > > > >>>
> > > > > > >>> Best,
> > > > > > >>> Zhijiang
> > > > > > >>
> > > > > > >>
> > > > > > >
> > > > > > > --
> > > > > > > Best, Jingsong Lee
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> >
> > Matthias Pohl | Engineer
> >
> > Follow us @VervericaData Ververica 
> >
> > --
> >
> > Join Flink Forward  - The Apache Flink
> > Conference
> >
> > Stream Processing | Event Driven | Real Time
> >
> > --
> >
> > Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> >
> > --
> > Ververica GmbH
> > Registered at Amtsgericht Charlottenburg: HRB 158244 B
> > Managing Directors: Yip Park Tung Jason, Jinwei (Kevin) Zhang, Karl Anton
> > Wehner
> >
>

