[jira] [Created] (FLINK-32868) Document the need to backport FLINK-30213 for using autoscaler with older version Flinks

2023-08-14 Thread Zhanghao Chen (Jira)
Zhanghao Chen created FLINK-32868:
-

 Summary: Document the need to backport FLINK-30213 for using 
autoscaler with older version Flinks
 Key: FLINK-32868
 URL: https://issues.apache.org/jira/browse/FLINK-32868
 Project: Flink
  Issue Type: Improvement
  Components: Autoscaler
Reporter: Zhanghao Chen


The current Autoscaler doc states the following job requirements:

Job requirements:
 * The autoscaler currently only works with the latest [Flink 
1.17|https://hub.docker.com/_/flink] or after backporting the following fixes 
to your 1.15/1.16 Flink image
 ** [Job vertex parallelism 
overrides|https://github.com/apache/flink/commit/23ce2281a0bb4047c64def9af7ddd5f19d88e2a9]
 (must have)
 ** [Support timespan for busyTime 
metrics|https://github.com/apache/flink/commit/a7fdab8b23cddf568fa32ee7eb804d7c3eb23a35]
 (good to have)

However, https://issues.apache.org/jira/browse/FLINK-30213 is also crucial and 
needs to be backported to 1.15/1.16 to enable autoscaling. We should add it to 
the doc as well and mark it as must have.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Update Flink Roadmap

2023-08-14 Thread Xintong Song
Thanks for driving this, Jark.

The current draft looks good to me. I think it is good to open a PR with
it. And if there are other comments, we can discuss them during the PR
review.

I also added a few minor comments in the draft regarding the feature radar.
Those can also be discussed on the PR.

Best,

Xintong



On Tue, Aug 15, 2023 at 11:15 AM Shammon FY  wrote:

> Hi Jark,
>
> Sounds good and I would love to, thanks! I will involve you and Xintong
> on the document after updating.
>
> Best,
> Shammon FY
>
>
> On Mon, Aug 14, 2023 at 10:39 PM Jark Wu  wrote:
>
>> Hi Shammon,
>>
>> Sure, could you help to draft a subsection about this in the google doc?
>>
>> Best,
>> Jark
>>
>> On Aug 14, 2023, at 20:30, Shammon FY wrote:
>>
>> Thanks @Jark for driving the Flink Roadmap.
>>
>> As we discussed OLAP in the thread [1], and following the suggestions
>> from @Xintong Song, could we add a subsection in `Towards Streaming
>> Warehouses` or `Performance` stating that short-lived queries in Flink
>> Session Clusters are one of the future directions for Flink?
>>
>> Best,
>> Shammon FY
>>
>> On Mon, Aug 14, 2023 at 8:03 PM Jark Wu  wrote:
>>
>>> Thank you everyone for helping polish the roadmap [1].
>>>
>>> I think I have addressed all the comments and we have included all
>>> ongoing parts of Flink.
>>> Please feel free to take a last look. I'm going to prepare the pull
>>> request if there are no more concerns.
>>>
>>> Best,
>>> Jark
>>>
>>> [1]:
>>>
>>> https://docs.google.com/document/d/12BDiVKEsY-f7HI3suO_IxwzCmR04QcVqLarXgyJAb7c/edit
>>>
>>> On Sun, 13 Aug 2023 at 13:04, Yuan Mei  wrote:
>>>
>>> > Sorry for taking so long
>>> >
>>> > I've added a section about Flink Disaggregated State Management
>>> > Evolution in the attached doc.
>>> >
>>> > I found that some of the content might overlap with the "large-scale
>>> > streaming jobs" section, so that part might need some changes as well.
>>> >
>>> > Please let me know what you think.
>>> >
>>> > Best
>>> > Yuan
>>> >
>>> > On Mon, Jul 24, 2023 at 12:07 PM Yuan Mei wrote:
>>> >
>>> > > Sorry, I missed this email and am responding a bit late.
>>> > >
>>> > > I will put a draft for the long-term vision for the state as well as
>>> > > large-scale state support into the roadmap.
>>> > >
>>> > > Best
>>> > > Yuan
>>> > >
>>> > > On Mon, Jul 17, 2023 at 10:34 AM Jark Wu  wrote:
>>> > >
>>> > >> Hi Jiabao,
>>> > >>
>>> > >> Thank you for your suggestions. I have added them to the "Going Beyond
>>> > >> a SQL Stream/Batch Processing Engine" and "Large-Scale State Jobs"
>>> > >> sections.
>>> > >>
>>> > >> Best,
>>> > >> Jark
>>> > >>
>>> > >> On Thu, 13 Jul 2023 at 16:06, Jiabao Sun wrote:
>>> > >>
>>> > >> > Thanks Jark and Martijn for driving this.
>>> > >> >
>>> > >> > There are two suggestions about the Table API:
>>> > >> >
>>> > >> > - Add a JSON type to support NoSQL database types.
>>> > >> > - Remove the changelog normalize operator for upsert streams.
>>> > >> >
>>> > >> >
>>> > >> > Best,
>>> > >> > Jiabao
>>> > >> >
>>> > >> >
>>> > >> > > On Jul 13, 2023, at 3:49 PM, Jark Wu wrote:
>>> > >> > >
>>> > >> > > Hi all,
>>> > >> > >
>>> > >> > > Sorry for taking so long to get back here.
>>> > >> > >
>>> > >> > > Martijn and I have drafted the first version of the updated roadmap,
>>> > >> > > including the updated feature radar reflecting the current state of
>>> > >> > > different components.
>>> > >> > >
>>> > >> > > https://docs.google.com/document/d/12BDiVKEsY-f7HI3suO_IxwzCmR04QcVqLarXgyJAb7c/edit
>>> > >> > >
>>> > >> > > Feel free to leave comments in the thread or the document.
>>> > >> > > We may have missed something important, so your help in enriching
>>> > >> > > the content is greatly appreciated.
>>> > >> > >
>>> > >> > > Best,
>>> > >> > > Jark & Martijn
>>> > >> > >
>>> > >> > >
>>> > >> > > On Fri, 2 Jun 2023 at 00:50, Jing Ge wrote:
>>> > >> > >
>>> > >> > >> Hi Jark,
>>> > >> > >>
>>> > >> > >> Fair enough. Let's do it like you suggested. Thanks!
>>> > >> > >>
>>> > >> > >> Best regards,
>>> > >> > >> Jing
>>> > >> > >>
>>> > >> > >> On Thu, Jun 1, 2023 at 6:00 PM Jark Wu wrote:
>>> > >> > >>
>>> > >> > >>> Hi Jing,
>>> > >> > >>>
>>> > >> > >>> This thread is for discussing the roadmap for versions 1.18, 2.0,
>>> > >> > >>> and even more.
>>> > >> > >>> One of the outcomes of this discussion will be an updated version
>>> > >> > >>> of the current roadmap.
>>> > >> > >>> Let's work together on refining the roadmap in this thread.
>>> > >> > >>>
>>> > >> > >>> Best,
>>> > >> > >>> Jark
>>> > >> > >>>
>>> > >> > >>> On Thu, 1 Jun 2023 at 23:25, Jing Ge wrote:
>>> > >> > >>>
>>> > >> >  Hi Jark,
>>> > >> > 
>>> > >> >  Thanks for driving it! For point 2, since we are developing 1.18 now,
>>> > >> >  does it make sense to update the roadmap this time while we are
>>> > >> >  releasing

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-08-14 Thread Xintong Song
Sounds good to me.

It is true that, if we are introducing the generalized watermark, there
will be other watermark-related concepts / configurations that need to be
updated anyway.


Best,

Xintong



On Tue, Aug 15, 2023 at 11:30 AM Xuannan Su  wrote:

> Hi Xintong,
>
> Thank you for your suggestion.
>
> After considering the idea of using a general configuration key, I think
> it may not be a good idea for the reasons below.
>
> While I agree that using a more general configuration key provides us with
> the flexibility to switch to other approaches to calculate the lag in the
> future, the downside is that it may cause confusion for users. We currently
> have fetchEventTimeLag, emitEventTimeLag, and watermarkLag in the source,
> and it is not clear which specific lag we are referring to. With the
> potential introduction of the Generalized Watermark mechanism in the
> future, if I understand correctly, a watermark won't necessarily need to be
> a timestamp. I am concerned that a general configuration key may not be
> enough to cover all the use cases and we will need to introduce a general
> way to determine the backlog status regardless.
>
> For the reasons above, I prefer introducing the configuration as is and
> changing it later through a deprecation or migration process. What do you
> think?
>
> Best,
> Xuannan
> On Aug 14, 2023, 14:09 +0800, Xintong Song , wrote:
> > Thanks for the explanation.
> >
> > I wonder if it makes sense to not expose this detail via the configuration
> > option. To be specific, I suggest not mentioning the "watermark" keyword in
> > the configuration key and description.
> >
> > - From the users' perspective, I think they only need to know that if there's
> > a lag higher than the given threshold, Flink will consider the latency of
> > individual records less important and prioritize throughput over it.
> > They don't really need the details of how the lags are calculated.
> > - For the internal implementation, I also think using watermark lags is
> > a good idea, for the reasons you've already mentioned. However, it's not
> > the only possible option. Hiding this detail from users would give us the
> > flexibility to switch to other approaches if needed in the future.
> > - We are currently working on designing the ProcessFunction API
> > (consider it as a DataStream API V2). There's an idea to introduce a
> > Generalized Watermark mechanism, where basically the watermark can be
> > anything that needs to travel along the data-flow with certain alignment
> > strategies, and event time watermark would be one specific case of it. This
> > is still an idea and has not been discussed and agreed on by the community,
> > and we are preparing a FLIP for it. But if we are going for it, the concept
> > "watermark-lag-threshold" could be ambiguous.
> >
> > I do not intend to block the FLIP on this. I'd also be fine with
> > introducing the configuration as is, and changing it later, if needed, with
> > a regular deprecation and migration process. Just making my suggestions.
> >
> >
> > Best,
> >
> > Xintong
> >
> >
> >
> > On Mon, Aug 14, 2023 at 12:00 PM Xuannan Su wrote:
> >
> > > Hi Xintong,
> > >
> > > Thanks for the reply.
> > >
> > > I have considered using the timestamp in the records to determine the
> > > backlog status, and decided to use watermark at the end. By definition,
> > > watermark is the time progress indication in the data stream. It
> indicates
> > > the stream’s event time has progressed to some specific time. On the
> other
> > > hand, timestamp in the records is usually used to generate the
> watermark.
> > > Therefore, it appears more appropriate and intuitive to calculate the
> event
> > > time lag by watermark and determine the backlog status. And by using
> the
> > > watermark, we can easily deal with the out-of-order and the idleness
> of the
> > > data.
> > >
> > > Please let me know if you have further questions.
> > >
> > > Best,
> > > Xuannan
> > > On Aug 10, 2023, 20:23 +0800, Xintong Song , wrote:
> > > > Thanks for preparing the FLIP, Xuannan.
> > > >
> > > > +1 in general.
> > > >
> > > > A quick question: could you explain why we are relying on the watermark
> > > > for emitting the record attribute? Why not use timestamps in the records?
> > > > I don't see any concern in using watermarks. Just wondering if there are
> > > > any deeper considerations behind this.
> > > >
> > > > Best,
> > > >
> > > > Xintong
> > > >
> > > >
> > > >
> > > > On Thu, Aug 3, 2023 at 3:03 PM Xuannan Su wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I am opening this thread to discuss FLIP-328: Allow source operators to
> > > > > determine isProcessingBacklog based on watermark lag[1]. We had several
> > > > > discussions with Dong Ling about the design; thanks for all the
> > > > > valuable advice.
> > > > >
> > > > > The FLIP targets the use case where users want to run a Flink job to
> > > > > backfill hi

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-08-14 Thread Xuannan Su
Hi Xintong,

Thank you for your suggestion.

After considering the idea of using a general configuration key, I think it may 
not be a good idea for the reasons below.

While I agree that using a more general configuration key provides us with the 
flexibility to switch to other approaches to calculate the lag in the future, 
the downside is that it may cause confusion for users. We currently have 
fetchEventTimeLag, emitEventTimeLag, and watermarkLag in the source, and it is 
not clear which specific lag we are referring to. With the potential 
introduction of the Generalized Watermark mechanism in the future, if I 
understand correctly, a watermark won't necessarily need to be a timestamp. I 
am concerned that a general configuration key may not be enough to cover all 
the use cases and we will need to introduce a general way to determine the 
backlog status regardless.

For the reasons above, I prefer introducing the configuration as is and changing 
it later through a deprecation or migration process. What do you think?

Best,
Xuannan
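
To make the mechanism under discussion concrete, here is a minimal Java sketch of how a source could derive its isProcessingBacklog status from watermark lag (wall-clock time minus the current watermark). It is not FLIP-328's implementation: the class, its methods, and the `lagThreshold` parameter are hypothetical stand-ins for the proposed configuration. The watermark follows the common bounded-out-of-orderness rule (largest timestamp seen, minus the allowed delay, minus 1 ms), and treating a source that has no watermark yet as "not backlog" is an assumption of this sketch.

```java
import java.time.Duration;

/**
 * Hypothetical sketch, not Flink's implementation: derive an
 * isProcessingBacklog flag from watermark lag, as discussed for FLIP-328.
 */
final class WatermarkLagBacklogSketch {
    private final long outOfOrdernessMillis;
    private final long lagThresholdMillis; // hypothetical stand-in for the proposed threshold option
    private long maxTimestamp = Long.MIN_VALUE;

    WatermarkLagBacklogSketch(Duration outOfOrderness, Duration lagThreshold) {
        this.outOfOrdernessMillis = outOfOrderness.toMillis();
        this.lagThresholdMillis = lagThreshold.toMillis();
    }

    /** Tracks the largest event timestamp seen; late records never move it backwards. */
    void onEvent(long eventTimestamp) {
        maxTimestamp = Math.max(maxTimestamp, eventTimestamp);
    }

    boolean hasWatermark() {
        return maxTimestamp != Long.MIN_VALUE;
    }

    /** Bounded-out-of-orderness watermark: max timestamp minus allowed delay minus 1 ms. */
    long currentWatermark() {
        if (!hasWatermark()) {
            throw new IllegalStateException("no watermark yet");
        }
        return maxTimestamp - outOfOrdernessMillis - 1;
    }

    /** Watermark lag: how far the watermark trails the current wall-clock time. */
    long watermarkLag(long nowMillis) {
        return nowMillis - currentWatermark();
    }

    /** Assumption: before the first watermark, this sketch reports "not backlog". */
    boolean isProcessingBacklog(long nowMillis) {
        return hasWatermark() && watermarkLag(nowMillis) > lagThresholdMillis;
    }

    public static void main(String[] args) {
        WatermarkLagBacklogSketch source =
                new WatermarkLagBacklogSketch(Duration.ofSeconds(5), Duration.ofMinutes(1));
        long now = 10_000_000L;
        source.onEvent(now - 3_600_000L); // historical record, one hour old
        System.out.println(source.isProcessingBacklog(now)); // true: lag is about 1 hour
        source.onEvent(now - 1_000L);     // caught up to near real time
        source.onEvent(now - 30_000L);    // late record: the watermark does not regress
        System.out.println(source.isProcessingBacklog(now)); // false: lag is about 6 s
    }
}
```

Because the watermark only tracks the largest timestamp seen, a single late, out-of-order record does not flip the source back into backlog mode, which is in line with the argument in the quoted thread for basing the decision on watermarks rather than raw record timestamps.
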
On Aug 14, 2023, 14:09 +0800, Xintong Song , wrote:
> Thanks for the explanation.
>
> I wonder if it makes sense to not expose this detail via the configuration
> option. To be specific, I suggest not mentioning the "watermark" keyword in
> the configuration key and description.
>
> - From the users' perspective, I think they only need to know that if there's
> a lag higher than the given threshold, Flink will consider the latency of
> individual records less important and prioritize throughput over it.
> They don't really need the details of how the lags are calculated.
> - For the internal implementation, I also think using watermark lags is
> a good idea, for the reasons you've already mentioned. However, it's not
> the only possible option. Hiding this detail from users would give us the
> flexibility to switch to other approaches if needed in the future.
> - We are currently working on designing the ProcessFunction API
> (consider it as a DataStream API V2). There's an idea to introduce a
> Generalized Watermark mechanism, where basically the watermark can be
> anything that needs to travel along the data-flow with certain alignment
> strategies, and event time watermark would be one specific case of it. This
> is still an idea and has not been discussed and agreed on by the community,
> and we are preparing a FLIP for it. But if we are going for it, the concept
> "watermark-lag-threshold" could be ambiguous.
>
> I do not intend to block the FLIP on this. I'd also be fine with
> introducing the configuration as is, and changing it later, if needed, with
> a regular deprecation and migration process. Just making my suggestions.
>
>
> Best,
>
> Xintong
>
>
>
> On Mon, Aug 14, 2023 at 12:00 PM Xuannan Su  wrote:
>
> > Hi Xintong,
> >
> > Thanks for the reply.
> >
> > I have considered using the timestamps in the records to determine the
> > backlog status, and decided to use the watermark in the end. By definition,
> > the watermark indicates time progress in the data stream: it indicates that
> > the stream's event time has progressed to some specific time. On the other
> > hand, the timestamps in the records are usually used to generate the
> > watermark. Therefore, it appears more appropriate and intuitive to calculate
> > the event time lag from the watermark and use it to determine the backlog
> > status. And by using the watermark, we can easily deal with out-of-order
> > and idle data.
> >
> > Please let me know if you have further questions.
> >
> > Best,
> > Xuannan
> > On Aug 10, 2023, 20:23 +0800, Xintong Song , wrote:
> > > Thanks for preparing the FLIP, Xuannan.
> > >
> > > +1 in general.
> > >
> > > A quick question: could you explain why we are relying on the watermark
> > > for emitting the record attribute? Why not use timestamps in the records?
> > > I don't see any concern in using watermarks. Just wondering if there are
> > > any deeper considerations behind this.
> > >
> > > Best,
> > >
> > > Xintong
> > >
> > >
> > >
> > > On Thu, Aug 3, 2023 at 3:03 PM Xuannan Su  wrote:
> > >
> > > > Hi all,
> > > >
> > > > I am opening this thread to discuss FLIP-328: Allow source operators to
> > > > determine isProcessingBacklog based on watermark lag[1]. We had several
> > > > discussions with Dong Ling about the design; thanks for all the
> > > > valuable advice.
> > > >
> > > > The FLIP targets the use case where users want to run a Flink job to
> > > > backfill historical data with high throughput and then continue
> > > > processing real-time data with low latency. Building upon the backlog
> > > > concept introduced in FLIP-309[2], this proposal enables sources to
> > > > report their backlog-processing status based on the watermark lag.
> > > >
> > > > We would greatly appreciate any comments or feedback you may have on
> > > > this proposal.
> > > >
> > > > Best,
> > > > Xuannan
> > > >
> > > >
> > > > [1]
> > > >
> > https://cwiki.

Re: [DISCUSS] Update Flink Roadmap

2023-08-14 Thread Shammon FY
Hi Jark,

Sounds good and I would love to, thanks! I will involve you and Xintong on
the document after updating.

Best,
Shammon FY


On Mon, Aug 14, 2023 at 10:39 PM Jark Wu  wrote:

> Hi Shammon,
>
> Sure, could you help to draft a subsection about this in the google doc?
>
> Best,
> Jark
>
> On Aug 14, 2023, at 20:30, Shammon FY wrote:
>
> Thanks @Jark for driving the Flink Roadmap.
>
> As we discussed OLAP in the thread [1], and following the suggestions
> from @Xintong Song, could we add a subsection in `Towards Streaming
> Warehouses` or `Performance` stating that short-lived queries in Flink
> Session Clusters are one of the future directions for Flink?
>
> Best,
> Shammon FY
>
> On Mon, Aug 14, 2023 at 8:03 PM Jark Wu  wrote:
>
>> Thank you everyone for helping polish the roadmap [1].
>>
>> I think I have addressed all the comments and we have included all ongoing
>> parts of Flink.
>> Please feel free to take a last look. I'm going to prepare the pull
>> request if there are no more concerns.
>>
>> Best,
>> Jark
>>
>> [1]:
>>
>> https://docs.google.com/document/d/12BDiVKEsY-f7HI3suO_IxwzCmR04QcVqLarXgyJAb7c/edit
>>
>> On Sun, 13 Aug 2023 at 13:04, Yuan Mei  wrote:
>>
>> > Sorry for taking so long
>> >
>> > I've added a section about Flink Disaggregated State Management
>> > Evolution in the attached doc.
>> >
>> > I found that some of the content might overlap with the "large-scale
>> > streaming jobs" section, so that part might need some changes as well.
>> >
>> > Please let me know what you think.
>> >
>> > Best
>> > Yuan
>> >
>> > On Mon, Jul 24, 2023 at 12:07 PM Yuan Mei wrote:
>> >
>> > > Sorry, I missed this email and am responding a bit late.
>> > >
>> > > I will put a draft for the long-term vision for the state as well as
>> > > large-scale state support into the roadmap.
>> > >
>> > > Best
>> > > Yuan
>> > >
>> > > On Mon, Jul 17, 2023 at 10:34 AM Jark Wu  wrote:
>> > >
>> > >> Hi Jiabao,
>> > >>
>> > >> Thank you for your suggestions. I have added them to the "Going Beyond
>> > >> a SQL Stream/Batch Processing Engine" and "Large-Scale State Jobs"
>> > >> sections.
>> > >>
>> > >> Best,
>> > >> Jark
>> > >>
>> > >> On Thu, 13 Jul 2023 at 16:06, Jiabao Sun wrote:
>> > >>
>> > >> > Thanks Jark and Martijn for driving this.
>> > >> >
>> > >> > There are two suggestions about the Table API:
>> > >> >
>> > >> > - Add a JSON type to support NoSQL database types.
>> > >> > - Remove the changelog normalize operator for upsert streams.
>> > >> >
>> > >> >
>> > >> > Best,
>> > >> > Jiabao
>> > >> >
>> > >> >
>> > >> > > On Jul 13, 2023, at 3:49 PM, Jark Wu wrote:
>> > >> > >
>> > >> > > Hi all,
>> > >> > >
>> > >> > > Sorry for taking so long to get back here.
>> > >> > >
>> > >> > > Martijn and I have drafted the first version of the updated roadmap,
>> > >> > > including the updated feature radar reflecting the current state of
>> > >> > > different components.
>> > >> > >
>> > >> > > https://docs.google.com/document/d/12BDiVKEsY-f7HI3suO_IxwzCmR04QcVqLarXgyJAb7c/edit
>> > >> > >
>> > >> > > Feel free to leave comments in the thread or the document.
>> > >> > > We may have missed something important, so your help in enriching
>> > >> > > the content is greatly appreciated.
>> > >> > >
>> > >> > > Best,
>> > >> > > Jark & Martijn
>> > >> > >
>> > >> > >
>> > >> > > On Fri, 2 Jun 2023 at 00:50, Jing Ge wrote:
>> > >> > >
>> > >> > >> Hi Jark,
>> > >> > >>
>> > >> > >> Fair enough. Let's do it like you suggested. Thanks!
>> > >> > >>
>> > >> > >> Best regards,
>> > >> > >> Jing
>> > >> > >>
>> > >> > >> On Thu, Jun 1, 2023 at 6:00 PM Jark Wu wrote:
>> > >> > >>
>> > >> > >>> Hi Jing,
>> > >> > >>>
>> > >> > >>> This thread is for discussing the roadmap for versions 1.18, 2.0,
>> > >> > >>> and even more.
>> > >> > >>> One of the outcomes of this discussion will be an updated version
>> > >> > >>> of the current roadmap.
>> > >> > >>> Let's work together on refining the roadmap in this thread.
>> > >> > >>>
>> > >> > >>> Best,
>> > >> > >>> Jark
>> > >> > >>>
>> > >> > >>> On Thu, 1 Jun 2023 at 23:25, Jing Ge wrote:
>> > >> > >>>
>> > >> >  Hi Jark,
>> > >> > 
>> > >> >  Thanks for driving it! For point 2, since we are developing 1.18 now,
>> > >> >  does it make sense to update the roadmap this time while we are
>> > >> >  releasing 1.18? This discussion thread will be focusing on the Flink
>> > >> >  2.0 roadmap, as you mentioned previously. WDYT?
>> > >> > 
>> > >> >  Best regards,
>> > >> >  Jing
>> > >> > 
>> > >> >  On Thu, Jun 1, 2023 at 3:31 PM Jark Wu wrote:
>> > >> > 
>> > >> > > Hi all,
>> > >> > >
>> > >> > > Martijn and I would like to initiate a discussion on the Flink roadmap,
>> > >> > > which should cover the project's long-term roadmap and the regular
>

Re: [ANNOUNCE] New Apache Flink Committer - Hangxiang Yu

2023-08-14 Thread Hang Ruan
Congratulations!

Best,
Hang

Roman Khachatryan wrote on Mon, Aug 14, 2023 at 22:36:

> Congratulations, Hangxiang!
>
> Regards,
> Roman
>
>
> On Wed, Aug 9, 2023 at 12:49 PM Benchao Li  wrote:
>
> > Congrats, Hangxiang!
> >
> > Jing Ge wrote on Tue, Aug 8, 2023 at 17:44:
> >
> > > Congrats, Hangxiang!
> > >
> > > Best regards,
> > > Jing
> > >
> > > On Tue, Aug 8, 2023 at 3:04 PM Yangze Guo  wrote:
> > >
> > > > Congrats, Hangxiang!
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > > On Tue, Aug 8, 2023 at 11:28 AM yh z wrote:
> > > > >
> > > > > Congratulations, Hangxiang !
> > > > >
> > > > >
> > > > > Best,
> > > > > Yunhong Zheng (Swuferhong)
> > > > >
> > > > > yuxia wrote on Tue, Aug 8, 2023 at 09:20:
> > > > >
> > > > > > Congratulations, Hangxiang !
> > > > > >
> > > > > > Best regards,
> > > > > > Yuxia
> > > > > >
> > > > > > - Original Message -
> > > > > > From: "Wencong Liu"
> > > > > > To: "dev"
> > > > > > Sent: Monday, August 7, 2023 11:55:24 PM
> > > > > > Subject: Re:[ANNOUNCE] New Apache Flink Committer - Hangxiang Yu
> > > > > >
> > > > > > Congratulations, Hangxiang !
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > > Wencong
> > > > > >
> > > > > > At 2023-08-07 14:57:49, "Yuan Mei" wrote:
> > > > > > >On behalf of the PMC, I'm happy to announce Hangxiang Yu as a new Flink
> > > > > > >Committer.
> > > > > > >
> > > > > > >Hangxiang has been active in the Flink community for more than 1.5 years
> > > > > > >and has played an important role in developing and maintaining State and
> > > > > > >Checkpoint related features/components, including Generic Incremental
> > > > > > >Checkpoints (taking great efforts to make the feature production-ready).
> > > > > > >Hangxiang is also the main driver of FLIP-263: Resolving schema
> > > > > > >compatibility.
> > > > > > >
> > > > > > >Hangxiang is passionate about the Flink community. Besides the technical
> > > > > > >contributions above, he is also actively promoting Flink, giving talks
> > > > > > >about Generic Incremental Checkpoints at Flink Forward and meetups.
> > > > > > >Hangxiang has also spent a good amount of time supporting users,
> > > > > > >participating in Jira/mailing list discussions, and reviewing code.
> > > > > > >
> > > > > > >Please join me in congratulating Hangxiang on becoming a Flink Committer!
> > > > > > >
> > > > > > >Thanks,
> > > > > > >Yuan Mei (on behalf of the Flink PMC)
> > > > > >
> > > >
> > >
> >
> >
> > --
> >
> > Best,
> > Benchao Li
> >
>


Re: [ANNOUNCE] New Apache Flink Committer - Yanfei Lei

2023-08-14 Thread Hang Ruan
Congratulations!

Best,
Hang

Roman Khachatryan wrote on Mon, Aug 14, 2023 at 22:38:

> Congratulations, Yanfei!
>
> Regards,
> Roman
>
>
> On Wed, Aug 9, 2023 at 12:49 PM Benchao Li  wrote:
>
> > Congrats, YanFei!
> >
> > Jing Ge wrote on Tue, Aug 8, 2023 at 17:41:
> >
> > > Congrats, YanFei!
> > >
> > > Best regards,
> > > Jing
> > >
> > > On Tue, Aug 8, 2023 at 3:04 PM Yangze Guo  wrote:
> > >
> > > > Congrats, Yanfei!
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > > On Tue, Aug 8, 2023 at 9:20 AM yuxia wrote:
> > > > >
> > > > > Congratulations, Yanfei!
> > > > >
> > > > > Best regards,
> > > > > Yuxia
> > > > >
> > > > > - Original Message -
> > > > > From: "ron9 liu"
> > > > > To: "dev"
> > > > > Sent: Monday, August 7, 2023 11:44:23 PM
> > > > > Subject: Re: [ANNOUNCE] New Apache Flink Committer - Yanfei Lei
> > > > >
> > > > > Congratulations Yanfei!
> > > > >
> > > > > Best,
> > > > > Ron
> > > > >
> > > > > Zakelly Lan wrote on Mon, Aug 7, 2023 at 23:15:
> > > > >
> > > > > > Congratulations, Yanfei!
> > > > > >
> > > > > > Best regards,
> > > > > > Zakelly
> > > > > >
> > > > > > On Mon, Aug 7, 2023 at 9:04 PM Lincoln Lee <lincoln.8...@gmail.com> wrote:
> > > > > > >
> > > > > > > Congratulations, Yanfei!
> > > > > > >
> > > > > > > Best,
> > > > > > > Lincoln Lee
> > > > > > >
> > > > > > >
> > > > > > > Weihua Hu wrote on Mon, Aug 7, 2023 at 20:43:
> > > > > > >
> > > > > > > > Congratulations Yanfei!
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Weihua
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Aug 7, 2023 at 8:08 PM Feifan Wang <zoltar9...@163.com> wrote:
> > > > > > > >
> > > > > > > > > Congratulations Yanfei! :)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > ——
> > > > > > > > > Name: Feifan Wang
> > > > > > > > > Email: zoltar9...@163.com
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >  Replied Message 
> > > > > > > > > | From | Matt Wang |
> > > > > > > > > | Date | 08/7/2023 19:40 |
> > > > > > > > > | To | dev@flink.apache.org |
> > > > > > > > > | Subject | Re: [ANNOUNCE] New Apache Flink Committer - Yanfei Lei |
> > > > > > > > > Congratulations Yanfei!
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Matt Wang
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >  Replied Message 
> > > > > > > > > | From | Mang Zhang |
> > > > > > > > > | Date | 08/7/2023 18:56 |
> > > > > > > > > | To |  |
> > > > > > > > > | Subject | Re:Re: [ANNOUNCE] New Apache Flink Committer - Yanfei Lei |
> > > > > > > > > Congratulations--
> > > > > > > > >
> > > > > > > > > Best regards,
> > > > > > > > > Mang Zhang
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On 2023-08-07 18:17:58, "Yuxin Tan" wrote:
> > > > > > > > > Congrats, Yanfei!
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Yuxin
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > weijie guo wrote on Mon, Aug 7, 2023 at 17:59:
> > > > > > > > >
> > > > > > > > > Congrats, Yanfei!
> > > > > > > > >
> > > > > > > > > Best regards,
> > > > > > > > >
> > > > > > > > > Weijie
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Biao Geng wrote on Mon, Aug 7, 2023 at 17:03:
> > > > > > > > >
> > > > > > > > > Congrats, Yanfei!
> > > > > > > > > Best,
> > > > > > > > > Biao Geng
> > > > > > > > >
> > > > > > > > > Sent from Outlook for iOS
> > > > > > > > > 
> > > > > > > > > From: Qingsheng Ren
> > > > > > > > > Sent: Monday, August 7, 2023 4:23:52 PM
> > > > > > > > > To: dev@flink.apache.org
> > > > > > > > > Subject: Re: [ANNOUNCE] New Apache Flink Committer - Yanfei Lei
> > > > > > > > >
> > > > > > > > > Congratulations and welcome, Yanfei!
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Qingsheng
> > > > > > > > >
> > > > > > > > > On Mon, Aug 7, 2023 at 4:19 PM Matthias Pohl <matthias.p...@aiven.io.invalid> wrote:
> > > > > > > > >
> > > > > > > > > Congratulations, Yanfei! :)
> > > > > > > > >
> > > > > > > > > On Mon, Aug 7, 2023 at 10:00 AM Junrui Lee <jrlee@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Congratulations Yanfei!
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Junrui
> > > > > > > > >
> > > > > > > > > Yun Tang wrote on Mon, Aug 7, 2023 at 15:19:
> > > > > > > > >
> > > > > > > > > Congratulations, Yanfei!
> > > > > > > > >
> > > > > > > > > Best
> > > > > > > > > Yun Tang
> > > > > > > > > 
> > > > > > > > > From: Danny Cranmer 
> > > > > > > > > Sent: Monday, August 7, 2023 15:10
> > > > > > > > > To: dev 
> > > > > > > > > Subject: Re: [ANNOUNCE] New Apache Flink Committer - Yanfei Lei
> > > > > > > > >
> > > > > > > > > Congrats Yanfei! Welcome to the team.
> > > > > > > > >
> > > > > > > > > Danny
> > > > > > > > >
> > > > > > > > > On 

Re: request for jira

2023-08-14 Thread lambda ch
Sorry, I may have misunderstood. I need a Jira account, and I have already
submitted a Jira account request. How long will it take to get approved? My
Jira ID is CHLambda. Thank you.

liu ron wrote on Tue, Aug 15, 2023 at 09:31:

> Hi, Lambda
>
> If you're interested in a ticket that doesn't currently have an owner, you
> can comment directly on the ticket and a committer will assign it to you;
> no special permissions are needed.
>
> Best,
> Ron
>
> lambda ch wrote on Mon, Aug 14, 2023 at 22:42:
>
> > Hi,
> > I want to contribute to Apache Flink.
> > Could you please grant me contributor permission?
> > My JIRA ID is CHlambda
> >
>


Re: request for jira

2023-08-14 Thread liu ron
Hi, Lambda

If you're interested in a ticket that doesn't currently have an owner, you
can comment directly on the ticket and a committer will assign it to you;
no special permissions are needed.

Best,
Ron

lambda ch wrote on Mon, Aug 14, 2023 at 22:42:

> Hi,
> I want to contribute to Apache Flink.
> Could you please grant me contributor permission?
> My JIRA ID is CHlambda
>


Re: [DISCUSS] FLIP-330: Support specifying record timestamp requirement

2023-08-14 Thread Becket Qin
Hi Jark,

Thanks for the comments. I agree that at this point SQL is the only API
to which we can apply this optimization transparently to users.

For the other APIs (DataStream or the new Process Function targeted in
2.0), the hope is that in the future they will evolve so that the framework
can derive the necessity of the optimization. BTW, I think conceptually
watermarks should only be generated in the source, and the rest of the
operators should just merge the incoming watermarks and pass on the result.
Therefore, if the sources in a job do not generate watermarks, it seems we
can assume the job won't have watermarks at all, and therefore, the
timestamp field is not needed.
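
As a rough illustration of the "merge and pass on" step, here is a hypothetical Java sketch of the usual min-combining rule: an operator with several inputs tracks the latest watermark per input and forwards the minimum whenever that minimum advances. The class and method names are made up for illustration, and unlike Flink's runtime the sketch ignores idle inputs and watermark alignment.

```java
import java.util.Arrays;

/** Hypothetical sketch: a multi-input operator forwards the minimum of its input watermarks. */
final class MinWatermarkMergeSketch {
    private final long[] inputWatermarks;
    private long combined = Long.MIN_VALUE;

    MinWatermarkMergeSketch(int numInputs) {
        inputWatermarks = new long[numInputs];
        Arrays.fill(inputWatermarks, Long.MIN_VALUE); // no watermark seen on any input yet
    }

    /** Returns the new combined watermark if it advanced, otherwise null (nothing to emit). */
    Long onWatermark(int input, long watermark) {
        // A single input's watermark never moves backwards.
        inputWatermarks[input] = Math.max(inputWatermarks[input], watermark);
        long min = Arrays.stream(inputWatermarks).min().getAsLong();
        if (min > combined) {
            combined = min;
            return min;
        }
        return null;
    }

    public static void main(String[] args) {
        MinWatermarkMergeSketch op = new MinWatermarkMergeSketch(2);
        System.out.println(op.onWatermark(0, 100)); // null: input 1 has no watermark yet
        System.out.println(op.onWatermark(1, 50));  // 50
        System.out.println(op.onWatermark(1, 200)); // 100: now bounded by input 0
        System.out.println(op.onWatermark(0, 90));  // null: input 0 cannot regress
    }
}
```

Under this rule, if no source ever emits a watermark, every combined watermark stays at its initial value, which is the intuition behind inferring that such a job does not need the timestamp field.
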

I also agree with Xintong and Gyula that we need to prove the benefit. We
will probably see that the perf improvement is more obvious in some cases
while negligible in others. The question is whether the use
cases that the optimization helps are important enough. And this judgement
is somewhat subjective.

Thanks,

Jiangjie (Becket) Qin
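
For a sense of the per-record byte counts discussed in this thread, here is a simplified, hypothetical Java model of a tag-prefixed record encoding: one tag byte, then an 8-byte timestamp only when the record carries one, then the payload. It is not Flink's StreamElementSerializer; the tag values are illustrative, and the `writeTag` switch is an assumed stand-in for the FLIP-330 idea of dropping the tag when records never carry timestamps.

```java
import java.nio.ByteBuffer;

/**
 * Simplified, hypothetical model of a tag-prefixed record encoding.
 * Not Flink's StreamElementSerializer; tag values are illustrative.
 */
final class TaggedRecordEncodingSketch {
    static final byte TAG_REC_WITH_TIMESTAMP = 0;
    static final byte TAG_REC_WITHOUT_TIMESTAMP = 1;

    /** Encodes one record; timestamp == null means the record carries no timestamp. */
    static byte[] encode(Long timestamp, byte[] payload, boolean writeTag) {
        if (!writeTag && timestamp != null) {
            // Without a tag, a reader could not tell whether 8 timestamp bytes follow.
            throw new IllegalArgumentException("untagged records must not carry timestamps");
        }
        int size = (writeTag ? 1 : 0) + (timestamp != null ? Long.BYTES : 0) + payload.length;
        ByteBuffer buf = ByteBuffer.allocate(size);
        if (writeTag) {
            buf.put(timestamp != null ? TAG_REC_WITH_TIMESTAMP : TAG_REC_WITHOUT_TIMESTAMP);
        }
        if (timestamp != null) {
            buf.putLong(timestamp); // the 8 bytes skipped when there is no timestamp
        }
        buf.put(payload);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] payload = new byte[16];
        System.out.println(encode(42L, payload, true).length);   // 25 = 1 + 8 + 16
        System.out.println(encode(null, payload, true).length);  // 17 = 1 + 16
        System.out.println(encode(null, payload, false).length); // 16
    }
}
```

With a 16-byte payload this gives 25, 17, and 16 bytes respectively: the timestamp saving already exists when records carry no timestamp, and only the last byte is what the tag removal would add, which is why the thread asks for benchmarks on realistic workloads.
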

On Mon, Aug 14, 2023 at 9:13 PM Jark Wu  wrote:

> Hi Becket,
>
> > I kind of think that we can restrain the scope to just batch mode, and only
> > for StreamRecord class. That means only in batch mode, the timestamp in the
> > StreamRecord will be dropped when the config is enabled.
>
> However, IIUC, dropping the timestamp in StreamRecord is already supported.
> This is an existing optimization in StreamElementSerializer: the 8 bytes of
> the timestamp are not serialized if there is no timestamp on the
> StreamRecord.
>
> -
>
> Removing the 1-byte StreamElement tag is a good idea for improving
> performance. But I agree with Xintong and Gyula that we should balance
> complexity against performance. I'm fine with introducing this
> optimization only for pure batch SQL, because that is the only case (not
> batch DataStream, and not even the batch Table API) where it can be enabled
> by default. But I have concerns about the other options.
>
> My biggest concern is that it exposes a configuration to users that is
> hard to understand, that users would be afraid to enable, and that may not
> be worth enabling. If users rarely enable this configuration, it becomes a
> maintenance burden for the community without any benefit.
>
> Besides, I doubt whether we can remove "pipeline.force-timestamp-support"
> in the future. As I understand it, it is pretty hard for the framework to
> detect that a job has no watermark strategy, because watermarks may be
> emitted by any operator via Output#emitWatermark.
>
> Best,
> Jark
>
>
> On Sat, 12 Aug 2023 at 13:23, Gyula Fóra  wrote:
>
> > Hey Devs,
> >
> > It would be great to see some other benchmarks, not only the dummy
> > WordCount example.
> >
> > I would love to see a few SQL queries documented, and whether there is
> > any measurable benefit at all.
> >
> > Production pipelines usually have some I/O component, etc., which adds
> > enough overhead to make this even less noticeable. I agree that even small
> > improvements are worthwhile, but they should be observable/significant on
> > real workloads. Otherwise, complicating the runtime layer, types, and
> > configs is not worth it in my opinion.
> >
> > Cheers
> > Gyula
> >
> > On Sat, 12 Aug 2023 at 04:39, Becket Qin  wrote:
> >
> > > Thanks for the FLIP, Yunfeng.
> > >
> > > I had a brief offline discussion with Dong, and here are my two cents:
> > >
> > > ## The benefit
> > > The FLIP is related to one of the perf benchmarks we saw at LinkedIn,
> > > which is pretty much a word count, except that the words are country
> > > codes, so they are typically just two bytes, e.g. CN, US, UK. What I see
> > > is that the amount of data going through the shuffle is much higher in
> > > Flink DataStream batch mode than with the Flink DataSet API. In this
> > > case, because the actual key is just 2 bytes, the overhead is relatively
> > > high. In batch processing, it is not rare for people to first tokenize
> > > the data before processing to save cost. For example, imagine that in
> > > word count the words are coded as 4-byte Integers instead of Strings.
> > > Then the 1-byte overhead still adds 25 percent on top of each key.
> > > Therefore, I think the optimization in the FLIP can still benefit a
> > > bunch of batch processing cases. For streaming, the benefit still
> > > applies, although it is smaller than for batch.
> > >
> > > ## The complexity and long term solution
> > > In terms of the complexity of the FLIP, I think we can restrict the
> > > scope to just batch mode, and only to the StreamRecord class. That means
> > > the timestamp in the StreamRecord will be dropped only in batch mode, and
> > > only when the config is enabled. This should give most of the benefit
> > > while significantly reducing the complexity of the FLIP.
> > > In practice, I think people rarely use StreamRecord timestamps in batch
> > > jobs. But because t
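For readers weighing the byte counts in this thread, here is a minimal Python model of the record layout Jark and Becket discuss: a 1-byte tag, then an 8-byte timestamp only when the record carries one, then the payload. The tag values, function names, and the `with_tag=False` switch (standing in for the FLIP-330 idea of dropping the tag) are assumptions for illustration; this is not Flink's StreamElementSerializer. As in Becket's example, overhead is measured against the key bytes alone.

```python
import struct
from typing import Optional

# Hypothetical tag values for illustration; not Flink's actual constants.
TAG_REC_WITH_TIMESTAMP = 0
TAG_REC_WITHOUT_TIMESTAMP = 1


def encode_record(payload: bytes, timestamp: Optional[int], with_tag: bool = True) -> bytes:
    """Model layout: [1-byte tag][8-byte timestamp, only if present][payload]."""
    if not with_tag:
        # Stand-in for the FLIP-330 idea: neither tag nor timestamp is written.
        return payload
    if timestamp is None:
        return struct.pack(">b", TAG_REC_WITHOUT_TIMESTAMP) + payload
    return struct.pack(">bq", TAG_REC_WITH_TIMESTAMP, timestamp) + payload


def overhead_percent(payload: bytes, timestamp: Optional[int], with_tag: bool = True) -> float:
    """Extra bytes beyond the payload, as a percentage of the payload size."""
    extra = len(encode_record(payload, timestamp, with_tag)) - len(payload)
    return 100.0 * extra / len(payload)


country_code = b"US"              # 2-byte key, as in the LinkedIn benchmark
int_key = struct.pack(">i", 42)   # word coded as a 4-byte integer

print(overhead_percent(int_key, 1_700_000_000_000))     # 225.0 (tag + timestamp)
print(overhead_percent(int_key, None))                  # 25.0  (tag only)
print(overhead_percent(country_code, None))             # 50.0  (tag only)
print(overhead_percent(int_key, None, with_tag=False))  # 0.0
```

Real shuffled records also carry a value (e.g. the count in word count), so the per-record relative overhead would be lower than these key-only figures.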

Re: [DISCUSS] Status of Statefun Project

2023-08-14 Thread Galen Warren
I created a pull request for this: "[FLINK-31619] Upgrade Stateful Functions
to Flink 1.16.1" (apache/flink-statefun Pull Request #331 by galenwarren,
on GitHub).

The JIRA ticket is FLINK-31619, "Upgrade Stateful Functions to Flink
1.16.1", in the ASF JIRA.

Despite the title, the PR makes Statefun reference Flink 1.16.2; that
version was released after the issue was created.

I figured out how to run all the playground tests locally, but it took
some reworking of the playground's Docker setup. Specifically, the Docker
contexts used to build the example functions needed to be broadened (i.e.
moved up the tree) so that, if needed, local artifacts/code can be accessed
from within the containers at build time. Then I made the Docker compose.yml
configurable through environment variables, so that the examples can run
either in the original manner (i.e. pulling artifacts from public repos) or
in a "local" mode, where artifacts are pulled from local builds.

This process is cleaner if the playground is a subfolder of the
flink-statefun project rather than its own repository
(flink-statefun-playground), because then all the relative paths between
the playground files and the build artifacts are fixed. So, I'd like to
propose moving the playground files, modified as described above, to
flink-statefun/playground and retiring flink-statefun-playground. I can
submit separate PRs for those changes if everyone is on board.

Also, should I plan to do the same upgrade to handle Flink 1.17.x? It
should be easy to do, especially while the 1.16.x upgrade is fresh on my
mind.

Thanks.


On Fri, Aug 11, 2023 at 6:40 PM Galen Warren 
wrote:

> I'm done with the code to make Statefun compatible with Flink 1.16, and
> all the tests (including e2e) succeed. The required changes were pretty
> minimal.
>
> I'm running into a bit of a chicken-and-egg problem executing the tests in
> flink-statefun-playground, though. That project seems to assume that all
> the various Statefun artifacts are already built and deployed to the
> various public repositories. I've looked into redirecting references to
> local artifacts; however, that's also tricky since all the building occurs
> in Docker containers.
>
> Gordon, is there a trick to running the sample code in
> flink-statefun-playground against yet-unreleased code that I'm missing?
>
> Thanks.
>
> On Sat, Jun 24, 2023 at 12:40 PM Galen Warren 
> wrote:
>
>> Great -- thanks!
>>
>> I'm going to be out of town for about a week but I'll take a look at this
>> when I'm back.
>>
>> On Tue, Jun 20, 2023 at 8:46 AM Martijn Visser 
>> wrote:
>>
>>> Hi Galen,
>>>
>>> Yes, I'll be more than happy to help with Statefun releases.
>>>
>>> Best regards,
>>>
>>> Martijn
>>>
>>> On Tue, Jun 20, 2023 at 2:21 PM Galen Warren 
>>> wrote:
>>>
 Thanks.

 Martijn, to answer your question, I'd need to do a small amount of work
 to get a PR ready, but not much. Happy to do it if we're deciding to
 restart Statefun releases -- are we?

 -- Galen

 On Sat, Jun 17, 2023 at 9:47 AM Tzu-Li (Gordon) Tai <
 tzuli...@apache.org> wrote:

> > Perhaps he could weigh in on whether the combination of automated
> tests plus those smoke tests should be sufficient for testing with new
> Flink versions
>
> What we usually did at the bare minimum for new StateFun releases was
> the following:
>
>1. Running the build tests (including the smoke tests in the e2e module,
>   which cover important tests like exactly-once verification)
>2. Updating the flink-statefun-playground repo and manually
>   running all language examples there.
>
> If upgrading Flink versions was the only change in the release, I'd
> probably say that this is sufficient.
>
> Best,
> Gordon
>
> On Thu, Jun 15, 2023 at 5:25 AM Martijn Visser <
> martijnvis...@apache.org> wrote:
>
>> Let me know if you have a PR for a Flink update :)
>>
>> On Thu, Jun 8, 2023 at 5:52 PM Galen Warren via user <
>> u...@flink.apache.org> wrote:
>>
>>> Thanks Martijn.
>>>
>>> Personally, I'm already using a local fork of Statefun that is
>>> compatible with Flink 1.16.x, so I wouldn't have any need for a released
>>> version compatible with 1.15.x. I'd be happy to do the PRs to modify
>>> Statefun to work with new versions of Flink as they come along.
>>>
>>> As for testing, Statefun does have unit tests, and Gordon also sent
>>> me instructions a while back for how to do some additional smoke tests,
>>> which are pretty straightforward. Perhaps he could weigh in on whether
>>> the combination of automated tests plus those smoke tests should be
>>> sufficient for testing with new Flink versions (I beli

Re: [VOTE] Apache Flink Kubernetes Operator Release 1.6.0, release candidate #2

2023-08-14 Thread Maximilian Michels
+1 (binding)

1. Downloaded the archives, checksums, and signatures
2. Verified the signatures and checksums
3. Extracted and inspected the source code for binaries
4. Compiled and tested the source code via mvn verify
5. Verified license files / headers
6. Deployed helm chart to test cluster
7. Ran example job
8. Tested autoscaling without rescaling API
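Step 2 above (checking checksums) can be scripted. The sketch below assumes the published `.sha512` file uses the `sha512sum` format, `<hex digest>  <file name>`; the artifact path in the commented usage is hypothetical. It covers only the checksum half: signatures are verified separately, e.g. with `gpg --verify <file>.asc <file>` after importing the KEYS file with `gpg --import KEYS`.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Iterator


def read_chunks(path: Path, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield the file in 1 MiB chunks so a large archive is never fully in memory."""
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk


def sha512_hex(chunks: Iterable[bytes]) -> str:
    """Hex SHA-512 digest of the concatenated chunks."""
    digest = hashlib.sha512()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(chunks: Iterable[bytes], checksum_text: str) -> bool:
    """checksum_text is assumed to be '<hex digest>  <file name>' (sha512sum style)."""
    expected = checksum_text.split()[0].lower()
    return sha512_hex(chunks) == expected


# Hypothetical usage against a downloaded release artifact:
# archive = Path("artifact.tgz")
# ok = matches_checksum(read_chunks(archive), Path("artifact.tgz.sha512").read_text())
```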

-Max

On Mon, Aug 14, 2023 at 3:44 PM Márton Balassi  wrote:
>
> Thank you, team.
>
> +1 (binding)
>
> - Verified Helm repo works as expected, points to correct image tag, build,
>   version
> - Verified basic examples and checked operator logs; everything looks as
>   expected
> - Verified hashes, signatures and source release contains no binaries
> - Ran built-in tests, built jars + docker image from source successfully
>
> Best,
> Marton
>
> On Mon, Aug 14, 2023 at 1:24 PM Rui Fan <1996fan...@gmail.com> wrote:
>
> > Thanks Gyula for the release!
> >
> > +1 (non-binding)
> >
> > - Compiled and tested the source code via mvn verify
> > - Verified the signatures
> > - Downloaded the image
> > - Deployed helm chart to test cluster
> > - Ran example job
> >
> > Best,
> > Rui
> >
> > On Mon, Aug 14, 2023 at 3:58 PM Gyula Fóra  wrote:
> >
> > > +1 (binding)
> > >
> > > Verified:
> > >  - Hashes, signatures, source files contain no binaries
> > >  - Maven repo contents look good
> > >  - Verified helm chart, image, deployed stateful and autoscaling
> > >    examples. Operator logs look good
> > >
> > > Cheers,
> > > Gyula
> > >
> > > On Thu, Aug 10, 2023 at 3:03 PM Gyula Fóra  wrote:
> > >
> > > > Hi Everyone,
> > > >
> > > > Please review and vote on the release candidate #2 for the
> > > > version 1.6.0 of Apache Flink Kubernetes Operator,
> > > > as follows:
> > > > [ ] +1, Approve the release
> > > > [ ] -1, Do not approve the release (please provide specific comments)
> > > >
> > > > **Release Overview**
> > > >
> > > > As an overview, the release consists of the following:
> > > > a) Kubernetes Operator canonical source distribution (including the
> > > > Dockerfile), to be deployed to the release repository at
> > dist.apache.org
> > > > b) Kubernetes Operator Helm Chart to be deployed to the release
> > > repository
> > > > at dist.apache.org
> > > > c) Maven artifacts to be deployed to the Maven Central Repository
> > > > d) Docker image to be pushed to dockerhub
> > > >
> > > > **Staging Areas to Review**
> > > >
> > > > The staging areas containing the above-mentioned artifacts are as
> > > > follows, for your review:
> > > > * All artifacts for a) and b) can be found in the corresponding dev
> > > > repository at dist.apache.org [1]
> > > > * All artifacts for c) can be found at the Apache Nexus Repository [2]
> > > > * The docker image for d) is staged on github [3]
> > > >
> > > > All artifacts are signed with the key 21F06303B87DAFF1 [4]
> > > >
> > > > Other links for your review:
> > > > * JIRA release notes [5]
> > > > * source code tag "release-1.6.0-rc2" [6]
> > > > * PR to update the website Downloads page to
> > > > include Kubernetes Operator links [7]
> > > >
> > > > **Vote Duration**
> > > >
> > > > The voting time will run for at least 72 hours.
> > > > It is adopted by majority approval, with at least 3 PMC affirmative
> > > > votes.
> > > >
> > > >
> > > > **Note on Verification**
> > > >
> > > > You can follow the basic verification guide here[8].
> > > > Note that you don't need to verify everything yourself, but please make
> > > > note of what you have tested together with your +- vote.
> > > >
> > > > Cheers!
> > > > Gyula Fora
> > > >
> > > > [1] https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.6.0-rc2/
> > > > [2] https://repository.apache.org/content/repositories/orgapacheflink-1649/
> > > > [3] ghcr.io/apache/flink-kubernetes-operator:ebb1fed
> > > > [4] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > > [5] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353230
> > > > [6] https://github.com/apache/flink-kubernetes-operator/tree/release-1.6.0-rc2
> > > > [7] https://github.com/apache/flink-web/pull/666
> > > > [8] https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Kubernetes+Operator+Release
> > > >
> > >
> >


[jira] [Created] (FLINK-32867) Suspending job and triggering savepoint when only updating restartNonce

2023-08-14 Thread haiqingchen (Jira)
haiqingchen created FLINK-32867:
---

 Summary: Suspending job and triggering savepoint when only 
updating restartNonce
 Key: FLINK-32867
 URL: https://issues.apache.org/jira/browse/FLINK-32867
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: haiqingchen


When I tried to restart a FlinkDeployment without any configuration change,
I updated restartNonce to the current time in milliseconds.

However, the Kubernetes operator suspended the FlinkDeployment and triggered
a savepoint, even though we didn't update the savepointTriggerNonce field.

This is not what we expected: we thought that if we didn't update
savepointTriggerNonce, no savepoint would ever be triggered.

Is it possible to add a precheck (whether savepointTriggerNonce was updated)
when cancelling the job?
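The precheck proposed above can be stated as a small predicate. This is a hypothetical Python sketch of the reporter's expectation only: the class, field, and function names are invented for illustration and do not reflect the operator's actual reconciliation logic, which, per this report, does take a savepoint when only restartNonce changes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NonceSpec:
    """Hypothetical stand-in for the two spec fields named in the ticket."""
    restart_nonce: Optional[int] = None
    savepoint_trigger_nonce: Optional[int] = None


def restart_requested(old: NonceSpec, new: NonceSpec) -> bool:
    # A changed restartNonce asks for a restart.
    return new.restart_nonce != old.restart_nonce


def savepoint_on_cancel(old: NonceSpec, new: NonceSpec) -> bool:
    # Proposed precheck: snapshot state on cancel only if savepointTriggerNonce changed.
    return new.savepoint_trigger_nonce != old.savepoint_trigger_nonce


# Scenario from the report: only restartNonce is bumped (to a millisecond timestamp).
old = NonceSpec(restart_nonce=None, savepoint_trigger_nonce=5)
new = NonceSpec(restart_nonce=1692000000000, savepoint_trigger_nonce=5)
print(restart_requested(old, new), savepoint_on_cancel(old, new))  # True False
```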

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32866) Clean up the `@ExtendWith(TestLoggerExtension.class)` for modules that added the `TestLoggerExtension` to the `org.junit.jupiter.api.extension.Extension resource` file

2023-08-14 Thread Rui Fan (Jira)
Rui Fan created FLINK-32866:
---

 Summary: Clean up the `@ExtendWith(TestLoggerExtension.class)` for 
modules that added the `TestLoggerExtension` to the 
`org.junit.jupiter.api.extension.Extension resource` file
 Key: FLINK-32866
 URL: https://issues.apache.org/jira/browse/FLINK-32866
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Reporter: Rui Fan
Assignee: Rui Fan


Some modules have added the `TestLoggerExtension` to the
`org.junit.jupiter.api.extension.Extension` resource file. Test classes in
these modules don't need to add `@ExtendWith(TestLoggerExtension.class)` at
the class level.


This JIRA proposes cleaning up the `@ExtendWith(TestLoggerExtension.class)`
annotations in modules that have added the `TestLoggerExtension` to the
`org.junit.jupiter.api.extension.Extension` resource file.
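The cleanup is mechanical, so a script can do most of it. Below is a hypothetical Python helper that strips the class-level `@ExtendWith(TestLoggerExtension.class)` line from a Java source string and then drops imports that became unused. It assumes the extension is imported as `org.apache.flink.util.TestLoggerExtension`, and it only handles the single-extension annotation form, not `@ExtendWith({A.class, B.class})`. Note that JUnit 5 only loads extensions from the `META-INF/services/org.junit.jupiter.api.extension.Extension` file when automatic extension detection is enabled (`junit.jupiter.extensions.autodetection.enabled=true`), so that setting is worth checking per module before removing annotations.

```python
import re

# Assumed spellings of the annotation and imports; adjust if a module differs.
ANNOTATION = re.compile(r"^[ \t]*@ExtendWith\(TestLoggerExtension\.class\)[ \t]*\n", re.MULTILINE)
IMPORT_TEST_LOGGER = re.compile(
    r"^import org\.apache\.flink\.util\.TestLoggerExtension;[ \t]*\n", re.MULTILINE)
IMPORT_EXTEND_WITH = re.compile(
    r"^import org\.junit\.jupiter\.api\.extension\.ExtendWith;[ \t]*\n", re.MULTILINE)


def strip_redundant_test_logger(source: str) -> str:
    """Remove the class-level annotation, then any imports it leaves unused."""
    source = ANNOTATION.sub("", source)
    # Drop the TestLoggerExtension import only if nothing else still references it.
    if "TestLoggerExtension" not in IMPORT_TEST_LOGGER.sub("", source):
        source = IMPORT_TEST_LOGGER.sub("", source)
    # Drop the ExtendWith import only if no other @ExtendWith annotation remains.
    if "@ExtendWith" not in source:
        source = IMPORT_EXTEND_WITH.sub("", source)
    return source


before = (
    "import org.apache.flink.util.TestLoggerExtension;\n"
    "import org.junit.jupiter.api.extension.ExtendWith;\n"
    "\n"
    "@ExtendWith(TestLoggerExtension.class)\n"
    "class FooTest {}\n"
)
print(strip_redundant_test_logger(before))
```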



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Plans for Schema Evolution in Table API

2023-08-14 Thread Ashish Khatkar
Bumping the thread.

On Fri, Aug 4, 2023 at 12:51 PM Ashish Khatkar  wrote:

> Hi all,
>
> We are using the Flink 1.17.0 Table API with RocksDB as the state backend
> to provide a service that lets our users run SQL queries. The tables are
> created from Avro schemas, and when a schema is changed in a compatible
> manner, i.e. adding a field with a default, we are unable to recover the
> job from the savepoint. This limitation is also mentioned in the Flink docs
> on evolution [1].
>
> Are there any plans to support schema evolution in the Table API? Our
> current approach involves rebuilding the entire state by discarding the
> output and then using that state in the actual job. Schema evolution is
> already supported for Table Store [2].
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/table/concepts/overview/#stateful-upgrades-and-evolution
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-226%3A+Introduce+Schema+Evolution+on+Table+Store
>
>
>
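For context on why adding a field with a default counts as compatible: under Avro's schema-resolution rules, a reader using the new schema fills a field the writer did not write from that field's default, and ignores writer fields it does not know. The Python below is a simplified model of just that rule (no type promotion, unions, or nested records), using a hypothetical `Order` schema; it does not touch Flink, whose Table API state, per the thread, does not currently evolve this way.

```python
from typing import Any, Dict

# Hypothetical schemas, shaped like Avro record schemas.
WRITER_SCHEMA = {"type": "record", "name": "Order",
                 "fields": [{"name": "id", "type": "long"}]}
READER_SCHEMA = {"type": "record", "name": "Order",
                 "fields": [{"name": "id", "type": "long"},
                            {"name": "currency", "type": "string", "default": "USD"}]}


def resolve(record: Dict[str, Any], reader_schema: Dict[str, Any]) -> Dict[str, Any]:
    """Simplified Avro record resolution against the reader's schema."""
    resolved: Dict[str, Any] = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]        # written by the old writer
        elif "default" in field:
            resolved[name] = field["default"]    # new field: fall back to its default
        else:
            raise ValueError(f"field {name!r} has no value and no default")
    # Writer fields absent from the reader schema are simply not copied.
    return resolved


print(resolve({"id": 7}, READER_SCHEMA))  # {'id': 7, 'currency': 'USD'}
```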


request for jira

2023-08-14 Thread lambda ch
Hi,
I want to contribute to Apache Flink.
Would you please give me the contributor permission?
My JIRA ID is CHlambda


Re: [DISCUSS] Update Flink Roadmap

2023-08-14 Thread Jark Wu
Hi Shammon,

Sure, could you help draft a subsection about this in the Google doc?

Best,
Jark

> On Aug 14, 2023, at 20:30, Shammon FY wrote:
> 
> Thanks @Jark for driving the Flink Roadmap.
> 
> As we discussed OLAP in the thread [1], and following the suggestions from
> @Xintong Song, could we add a subsection to `Towards Streaming Warehouses`
> or `Performance` stating that short-lived queries in Flink Session Clusters
> are one of the future directions for Flink?
> 
> Best,
> Shammon FY
> 
> On Mon, Aug 14, 2023 at 8:03 PM Jark Wu wrote:
>> Thank you everyone for helping polish the roadmap [1].
>> 
>> I think I have addressed all the comments and we have included all ongoing
>> parts of Flink.
>> Please feel free to take a last look. I'm going to prepare the pull request
>> if there are no more concerns.
>> 
>> Best,
>> Jark
>> 
>> [1]:
>> https://docs.google.com/document/d/12BDiVKEsY-f7HI3suO_IxwzCmR04QcVqLarXgyJAb7c/edit
>> 
>> On Sun, 13 Aug 2023 at 13:04, Yuan Mei wrote:
>> 
>> > Sorry for taking so long.
>> >
>> > I've added a section about Flink Disaggregated State Management Evolution
>> > in the attached doc.
>> >
>> > I found that some of the content might overlap with the "large-scale
>> > streaming jobs" section, so that part might need some changes as well.
>> >
>> > Please let me know what you think.
>> >
>> > Best
>> > Yuan
>> >
>> > On Mon, Jul 24, 2023 at 12:07 PM Yuan Mei wrote:
>> >
>> > > Sorry, I missed this email and am responding a bit late.
>> > >
>> > > I will put a draft of the long-term vision for state, as well as
>> > > large-scale state support, into the roadmap.
>> > >
>> > > Best
>> > > Yuan
>> > >
>> > > On Mon, Jul 17, 2023 at 10:34 AM Jark Wu wrote:
>> > >
>> > >> Hi Jiabao,
>> > >>
>> > >> Thank you for your suggestions. I have added them to the "Going Beyond a
>> > >> SQL Stream/Batch Processing Engine" and "Large-Scale State Jobs"
>> > >> sections.
>> > >>
>> > >> Best,
>> > >> Jark
>> > >>
>> > >> On Thu, 13 Jul 2023 at 16:06, Jiabao Sun wrote:
>> > >>
>> > >> > Thanks Jark and Martijn for driving this.
>> > >> >
>> > >> > I have two suggestions about the Table API:
>> > >> >
>> > >> > - Add a JSON type to support NoSQL database types.
>> > >> > - Remove the changelog normalize operator for upsert streams.
>> > >> >
>> > >> >
>> > >> > Best,
>> > >> > Jiabao
>> > >> >
>> > >> >
>> > >> > > On Jul 13, 2023, at 3:49 PM, Jark Wu wrote:
>> > >> > >
>> > >> > > Hi all,
>> > >> > >
>> > >> > > Sorry for taking so long to get back here.
>> > >> > >
>> > >> > > Martijn and I have drafted the first version of the updated roadmap,
>> > >> > > including the updated feature radar reflecting the current state of
>> > >> > > different components.
>> > >> > >
>> > >> > > https://docs.google.com/document/d/12BDiVKEsY-f7HI3suO_IxwzCmR04QcVqLarXgyJAb7c/edit
>> > >> > >
>> > >> > > Feel free to leave comments in the thread or the document.
>> > >> > > We may have missed something important, so your help in enriching
>> > >> > > the content is greatly appreciated.
>> > >> > >
>> > >> > > Best,
>> > >> > > Jark & Martijn
>> > >> > >
>> > >> > >
>> > >> > > On Fri, 2 Jun 2023 at 00:50, Jing Ge wrote:
>> > >> > >
>> > >> > >> Hi Jark,
>> > >> > >>
>> > >> > >> Fair enough. Let's do it like you suggested. Thanks!
>> > >> > >>
>> > >> > >> Best regards,
>> > >> > >> Jing
>> > >> > >>
>> > >> > >> On Thu, Jun 1, 2023 at 6:00 PM Jark Wu wrote:
>> > >> > >>
>> > >> > >>> Hi Jing,
>> > >> > >>>
>> > >> > >>> This thread is for discussing the roadmap for versions 1.18, 2.0,
>> > >> > >>> and beyond.
>> > >> > >>> One of the outcomes of this discussion will be an updated version
>> > >> > >>> of the current roadmap.
>> > >> > >>> Let's work together on refining the roadmap in this thread.
>> > >> > >>>
>> > >> > >>> Best,
>> > >> > >>> Jark
>> > >> > >>>
>> > >> > >>> On Thu, 1 Jun 2023 at 23:25, Jing Ge wrote:
>> > >> > >>>
>> > >> >  Hi Jark,
>> > >> > 
>> > >> >  Thanks for driving it! For point 2, since we are developing 1.18
>> > >> >  now, does it make sense to update the roadmap this time while we are
>> > >> >  releasing 1.18? This discussion thread will focus on the Flink 2.0
>> > >> >  roadmap, as you mentioned previously. WDYT?
>> > >> > 
>> > >> >  Best regards,
>> > >> >  Jing
>> > >> > 
>> > >> >  On Thu, Jun 1, 2023 at 3:31 PM Jark Wu wrote:
>> > >> > 
>> > >> > > Hi all,
>> > >> > >
>> > >> > > Martijn and I would like to initiate a discussion on the Flink
>> > >> > > roadmap, which should cover the p

Re: [ANNOUNCE] New Apache Flink Committer - Yanfei Lei

2023-08-14 Thread Roman Khachatryan
Congratulations, Yanfei!

Regards,
Roman


On Wed, Aug 9, 2023 at 12:49 PM Benchao Li  wrote:

> Congrats, YanFei!
>
> On Tue, Aug 8, 2023 at 5:41 PM Jing Ge wrote:
>
> > Congrats, YanFei!
> >
> > Best regards,
> > Jing
> >
> > On Tue, Aug 8, 2023 at 3:04 PM Yangze Guo  wrote:
> >
> > > Congrats, Yanfei!
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Tue, Aug 8, 2023 at 9:20 AM yuxia wrote:
> > > >
> > > > Congratulations, Yanfei!
> > > >
> > > > Best regards,
> > > > Yuxia
> > > >
> > > > - Original Message -
> > > > From: "ron9 liu"
> > > > To: "dev"
> > > > Sent: Monday, August 7, 2023 11:44:23 PM
> > > > Subject: Re: [ANNOUNCE] New Apache Flink Committer - Yanfei Lei
> > > >
> > > > Congratulations Yanfei!
> > > >
> > > > Best,
> > > > Ron
> > > >
> > > > On Mon, Aug 7, 2023 at 11:15 PM Zakelly Lan wrote:
> > > >
> > > > > Congratulations, Yanfei!
> > > > >
> > > > > Best regards,
> > > > > Zakelly
> > > > >
> > > > > On Mon, Aug 7, 2023 at 9:04 PM Lincoln Lee wrote:
> > > > > >
> > > > > > Congratulations, Yanfei!
> > > > > >
> > > > > > Best,
> > > > > > Lincoln Lee
> > > > > >
> > > > > >
> > > > > > On Mon, Aug 7, 2023 at 8:43 PM Weihua Hu wrote:
> > > > > >
> > > > > > > Congratulations Yanfei!
> > > > > > >
> > > > > > > Best,
> > > > > > > Weihua
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Aug 7, 2023 at 8:08 PM Feifan Wang wrote:
> > > > > > >
> > > > > > > > Congratulations Yanfei! :)
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > ——
> > > > > > > > Name: Feifan Wang
> > > > > > > > Email: zoltar9...@163.com
> > > > > > > >
> > > > > > > >
> > > > > > > >  Replied Message 
> > > > > > > > | From | Matt Wang |
> > > > > > > > | Date | 08/7/2023 19:40 |
> > > > > > > > | To | dev@flink.apache.org |
> > > > > > > > | Subject | Re: [ANNOUNCE] New Apache Flink Committer - Yanfei Lei |
> > > > > > > > Congratulations Yanfei!
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Matt Wang
> > > > > > > >
> > > > > > > >
> > > > > > > >  Replied Message 
> > > > > > > > | From | Mang Zhang |
> > > > > > > > | Date | 08/7/2023 18:56 |
> > > > > > > > | To |  |
> > > > > > > > | Subject | Re:Re: [ANNOUNCE] New Apache Flink Committer - Yanfei Lei |
> > > > > > > > Congratulations!
> > > > > > > > --
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > > Mang Zhang
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On 2023-08-07 18:17:58, "Yuxin Tan" wrote:
> > > > > > > > Congrats, Yanfei!
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Yuxin
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Aug 7, 2023 at 5:59 PM weijie guo wrote:
> > > > > > > >
> > > > > > > > Congrats, Yanfei!
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > >
> > > > > > > > Weijie
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Aug 7, 2023 at 5:03 PM Biao Geng wrote:
> > > > > > > >
> > > > > > > > Congrats, Yanfei!
> > > > > > > > Best,
> > > > > > > > Biao Geng
> > > > > > > >
> > > > > > > > Sent from Outlook for iOS
> > > > > > > > 
> > > > > > > > From: Qingsheng Ren
> > > > > > > > Sent: Monday, August 7, 2023 4:23:52 PM
> > > > > > > > To: dev@flink.apache.org
> > > > > > > > Subject: Re: [ANNOUNCE] New Apache Flink Committer - Yanfei Lei
> > > > > > > >
> > > > > > > > Congratulations and welcome, Yanfei!
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Qingsheng
> > > > > > > >
> > > > > > > > On Mon, Aug 7, 2023 at 4:19 PM Matthias Pohl <matthias.p...@aiven.io.invalid> wrote:
> > > > > > > >
> > > > > > > > Congratulations, Yanfei! :)
> > > > > > > >
> > > > > > > > On Mon, Aug 7, 2023 at 10:00 AM Junrui Lee <jrlee@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Congratulations Yanfei!
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Junrui
> > > > > > > >
> > > > > > > > On Mon, Aug 7, 2023 at 3:19 PM Yun Tang wrote:
> > > > > > > >
> > > > > > > > Congratulations, Yanfei!
> > > > > > > >
> > > > > > > > Best
> > > > > > > > Yun Tang
> > > > > > > > 
> > > > > > > > From: Danny Cranmer 
> > > > > > > > Sent: Monday, August 7, 2023 15:10
> > > > > > > > To: dev 
> > > > > > > > Subject: Re: [ANNOUNCE] New Apache Flink Committer - Yanfei Lei
> > > > > > > >
> > > > > > > > Congrats Yanfei! Welcome to the team.
> > > > > > > >
> > > > > > > > Danny
> > > > > > > >
> > > > > > > > On Mon, 7 Aug 2023, 08:03 Rui Fan, <1996fan...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Congratulations Yanfei!
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Rui
> > > > > > > >
> > > > > > > > On Mon, Aug 7, 2023 at 2:56 PM Yuan Mei <yuanmei.w...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On behalf of the PMC, I'm happy to announce Yanfei Lei as a new
> > > > > > > > Flink Committer.
> > > > > > > >
> > > > > > > > Yan

Re: [ANNOUNCE] New Apache Flink Committer - Hangxiang Yu

2023-08-14 Thread Roman Khachatryan
Congratulations, Hangxiang!

Regards,
Roman


On Wed, Aug 9, 2023 at 12:49 PM Benchao Li  wrote:

> Congrats, Hangxiang!
>
> On Tue, Aug 8, 2023 at 5:44 PM Jing Ge wrote:
>
> > Congrats, Hangxiang!
> >
> > Best regards,
> > Jing
> >
> > On Tue, Aug 8, 2023 at 3:04 PM Yangze Guo  wrote:
> >
> > > Congrats, Hangxiang!
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Tue, Aug 8, 2023 at 11:28 AM yh z  wrote:
> > > >
> > > > Congratulations, Hangxiang !
> > > >
> > > >
> > > > Best,
> > > > Yunhong Zheng (Swuferhong)
> > > >
> > > > On Tue, Aug 8, 2023 at 9:20 AM yuxia wrote:
> > > >
> > > > > Congratulations, Hangxiang !
> > > > >
> > > > > Best regards,
> > > > > Yuxia
> > > > >
> > > > > - Original Message -
> > > > > From: "Wencong Liu"
> > > > > To: "dev"
> > > > > Sent: Monday, August 7, 2023 11:55:24 PM
> > > > > Subject: Re:[ANNOUNCE] New Apache Flink Committer - Hangxiang Yu
> > > > >
> > > > > Congratulations, Hangxiang !
> > > > >
> > > > >
> > > > > Best,
> > > > > Wencong
> > > > >
> > > > > At 2023-08-07 14:57:49, "Yuan Mei"  wrote:
> > > > > >On behalf of the PMC, I'm happy to announce Hangxiang Yu as a new
> > > > > >Flink Committer.
> > > > > >
> > > > > >Hangxiang has been active in the Flink community for more than 1.5
> > > > > >years and has played an important role in developing and maintaining
> > > > > >State and Checkpoint related features/components, including Generic
> > > > > >Incremental Checkpoints (putting great effort into making the feature
> > > > > >production-ready). Hangxiang is also the main driver of FLIP-263:
> > > > > >Resolving schema compatibility.
> > > > > >
> > > > > >Hangxiang is passionate about the Flink community. Besides the
> > > > > >technical contributions above, he also actively promotes Flink, e.g.
> > > > > >with talks about Generic Incremental Checkpoints at Flink Forward and
> > > > > >meetups. Hangxiang has also spent a good amount of time supporting
> > > > > >users, participating in Jira/mailing list discussions, and reviewing
> > > > > >code.
> > > > > >
> > > > > >Please join me in congratulating Hangxiang on becoming a Flink
> > > > > >Committer!
> > > > > >
> > > > > >Thanks,
> > > > > >Yuan Mei (on behalf of the Flink PMC)
> > > > >
> > >
> >
>
>
> --
>
> Best,
> Benchao Li
>


Re: [DISCUSS] How about adding OLAP to Flink Roadmap?

2023-08-14 Thread Shammon FY
Hi,

Thanks for all the feedback. I'm glad to see that we have reached some
consensus. Let me try to summarize it as follows; please correct me if I'm
wrong or have misunderstood anything.

1) Batch is a special case of streaming, and OLAP is a special case of
batch, so supporting short-lived OLAP jobs does not mean Flink loses its
focus.

2) As a streaming and batch processing engine, it's valuable for Flink to
support OLAP jobs, which will bring big benefits to users and is worth our
effort to promote and achieve.

3) As part of Flink's evolution toward a unified streaming-batch-OLAP
processing engine, we could add a subsection on Flink OLAP to the roadmap
and continue to evolve it.

4) In order to support short-lived queries in Flink, we need a very careful
design of both the architecture and the implementation. These designs must
not affect Flink's streaming and batch capabilities while supporting OLAP.

5) In order to better guide and measure the optimization of Flink OLAP, we
need to add relevant OLAP benchmarks to a Flink project repository such as
flink-benchmarks.


As I replied in the roadmap thread [1], @Jark Wu and @Xintong Song, could
you please help add the Flink OLAP subsection to the doc [2]?
Thanks very much!


[1] https://lists.apache.org/thread/szdr4ngrfcmo7zko4917393zbqhgw0v5
[2]
https://docs.google.com/document/d/12BDiVKEsY-f7HI3suO_IxwzCmR04QcVqLarXgyJAb7c/edit

Best,
Shammon FY

On Mon, Aug 14, 2023 at 4:08 PM Yun Tang  wrote:

> Thanks to the folks from ByteDance for driving this topic, which could be
> another big step in extending Flink's capabilities.
>
> In general, I think this is a great idea. However, before we move forward,
> I think we should first answer the question: what is the target for Flink
> OLAP?
>
> We run Presto/Trino and SparkSQL in the production environment for OLAP
> SQL analysis. Since Presto runs faster than SparkSQL in many cases,
> especially for ad-hoc queries on medium-sized data, we run queries on
> Presto first and switch to SparkSQL for large-scale queries if necessary.
> Presto runs as a service and emphasizes query performance without node
> fault tolerance. Moreover, it uses a pipeline-like data exchange mode
> instead of the classic stage-blocking exchange mode, which is a bit like
> Flink's pipeline mode vs. blocking mode.
>
> Can we say that we hope Flink OLAP will target Presto/Trino for
> medium-sized data queries, and switch to Flink batch SQL for large-scale
> analytical queries? If so, I also think the name "Flink OLAP" looks a bit
> strange, as Flink batch SQL also serves large-scale OLAP analysis.
>
> Best
> Yun Tang
> 
> From: Jing Ge 
> Sent: Thursday, August 10, 2023 13:52
> To: dev@flink.apache.org 
> Subject: Re: [DISCUSS] How about adding OLAP to Flink Roadmap?
>
> Hi Shammon, Hi Xiangyu,
>
> Thanks for bringing this to our attention. I can see this is a great
> proposal born from real business scenarios. +1 for it.
>
> People have been keen to use one platform to cover all their data
> production and consumption requirements. Flink did a great job for the
> production, i.e. streaming and batch processing with all excellent
> ecosystems. This is the big advantage for Flink to go one step further and
> cover the consumption part. It will turn Flink into a unified compute
> platform like what the Ray project(the platform behind ChatGPT, if someone
> is not aware of it)[1] is doing and secure Flink to be one of the most
> interesting open source platforms for the next decade.
>
> Frankly speaking, it will be a big change. As far as I am concerned, the
> following should be considered(just thought about it at the first glance,
> there must be more).
>
> Architecture upgrade - since we will have three capabilities (I wanted to
> use "engines", but it might be too early for such a big word), i.e.
> streaming, batch, and OLAP, it might make sense to upgrade the architecture
> while we are building OLAP into Flink. A unified foundation or abstraction
> for distributed computation should be designed and implemented underneath
> those capabilities. In the future, new capabilities could then leverage
> that foundation and be developed at a very fast pace.
>
> MPP architecture - a Flink session cluster is not an MPP architecture.
> Generally speaking, SNA (shared-nothing architecture) is the key to
> implementing MPP. Flink has everything needed to offer SNA, which is why we
> can consider building OLAP into or on top of Flink. And speaking of MPP,
> there will be a lot of things to do, e.g. the Retrieval Architecture[2],
> multi-level task split, dynamic retry or even split, etc. I will not
> expand on all those topics at this early stage.
>
> OLAP query syntax - at least some common syntax and statements need to be
> implemented, e.g. cube, grouping sets, over partition by, you name it.
>
> Last but not least, there will be a big effort to upgrade the runtime
> fe

Re: [VOTE] Apache Flink Kubernetes Operator Release 1.6.0, release candidate #2

2023-08-14 Thread Márton Balassi
Thank you, team.

+1 (binding)

- Verified Helm repo works as expected, points to correct image tag, build,
version
- Verified basic examples and checked operator logs; everything looks as
expected
- Verified hashes, signatures and source release contains no binaries
- Ran built-in tests, built jars + docker image from source successfully
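The hash and "no binaries" checks listed above can be scripted. Below is a minimal Python sketch, assuming each artifact ships with a `.sha512` file whose first whitespace-separated token is the hex digest, and using a NUL-byte heuristic to flag binary files in an unpacked source tree. The function names are hypothetical, and the sketch does not check GPG signatures against the KEYS file, which is typically done separately with `gpg --verify`.

```python
import hashlib
import tempfile
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Stream the file in chunks so large tarballs need not fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha512(artifact, checksum_file):
    """Compare against a checksum file assumed to look like '<hexdigest>  <filename>'."""
    expected = Path(checksum_file).read_text().split()[0].lower()
    return sha512_of(artifact) == expected


def looks_binary(path, probe_bytes=8192):
    """Heuristic only: treat a file as binary if its first bytes contain a NUL byte."""
    with open(path, "rb") as f:
        return b"\x00" in f.read(probe_bytes)


def find_binaries(source_root):
    """List files under an unpacked source release that look binary, relative to the root."""
    root = Path(source_root)
    return sorted(
        str(p.relative_to(root))
        for p in root.rglob("*")
        if p.is_file() and looks_binary(p)
    )


if __name__ == "__main__":
    # Self-contained demo on a throwaway directory instead of a real release.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "README.md"
        src.write_text("hello\n")
        (Path(tmp) / "README.md.sha512").write_text(sha512_of(src) + "  README.md\n")
        (Path(tmp) / "blob.bin").write_bytes(b"\x00\x01\x02")
        print(verify_sha512(src, Path(tmp) / "README.md.sha512"))  # True
        print(find_binaries(tmp))  # ['blob.bin']
```

The NUL-byte test is a cheap heuristic, so a manual look at anything it flags (or misses) is still worthwhile.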

Best,
Marton

On Mon, Aug 14, 2023 at 1:24 PM Rui Fan <1996fan...@gmail.com> wrote:

> Thanks Gyula for the release!
>
> +1 (non-binding)
>
> - Compiled and tested the source code via mvn verify
> - Verified the signatures
> - Downloaded the image
> - Deployed helm chart to test cluster
> - Ran example job
>
> Best,
> Rui
>
> On Mon, Aug 14, 2023 at 3:58 PM Gyula Fóra  wrote:
>
> > +1 (binding)
> >
> > Verified:
> >  - Hashes, signatures, source files contain no binaries
> >  - Maven repo contents look good
> >  - Verified helm chart, image, deployed stateful and autoscaling examples.
> > Operator logs look good
> >
> > Cheers,
> > Gyula
> >
> > On Thu, Aug 10, 2023 at 3:03 PM Gyula Fóra  wrote:
> >
> > > Hi Everyone,
> > >
> > > Please review and vote on the release candidate #2 for the
> > > version 1.6.0 of Apache Flink Kubernetes Operator,
> > > as follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > > **Release Overview**
> > >
> > > As an overview, the release consists of the following:
> > > a) Kubernetes Operator canonical source distribution (including the
> > > Dockerfile), to be deployed to the release repository at dist.apache.org
> > > b) Kubernetes Operator Helm Chart to be deployed to the release
> > > repository at dist.apache.org
> > > c) Maven artifacts to be deployed to the Maven Central Repository
> > > d) Docker image to be pushed to dockerhub
> > >
> > > **Staging Areas to Review**
> > >
> > > The staging areas containing the above-mentioned artifacts are as
> > > follows, for your review:
> > > * All artifacts for a,b) can be found in the corresponding dev
> > > repository at dist.apache.org [1]
> > > * All artifacts for c) can be found at the Apache Nexus Repository [2]
> > > * The docker image for d) is staged on github [3]
> > >
> > > All artifacts are signed with the key 21F06303B87DAFF1 [4]
> > >
> > > Other links for your review:
> > > * JIRA release notes [5]
> > > * source code tag "release-1.6.0-rc2" [6]
> > > * PR to update the website Downloads page to
> > > include Kubernetes Operator links [7]
> > >
> > > **Vote Duration**
> > >
> > > The voting time will run for at least 72 hours.
> > > It is adopted by majority approval, with at least 3 PMC affirmative
> > > votes.
> > >
> > >
> > > **Note on Verification**
> > >
> > > You can follow the basic verification guide here[8].
> > > Note that you don't need to verify everything yourself, but please make
> > > note of what you have tested together with your +- vote.
> > >
> > > Cheers!
> > > Gyula Fora
> > >
> > > [1] https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.6.0-rc2/
> > > [2] https://repository.apache.org/content/repositories/orgapacheflink-1649/
> > > [3] ghcr.io/apache/flink-kubernetes-operator:ebb1fed
> > > [4] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > [5] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353230
> > > [6] https://github.com/apache/flink-kubernetes-operator/tree/release-1.6.0-rc2
> > > [7] https://github.com/apache/flink-web/pull/666
> > > [8] https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Kubernetes+Operator+Release
> > >
> >
>


Re: [DISCUSS] FLIP-330: Support specifying record timestamp requirement

2023-08-14 Thread Jark Wu
Hi Becket,

> I kind of think that we can
> restrain the scope to just batch mode, and only for StreamRecord class.
> That means only in batch mode, the timestamp in the StreamRecord will be
> dropped when the config is enabled.

However, IIUC, dropping the timestamp in StreamRecord is already supported.
This is an existing optimization in StreamElementSerializer: the 8 bytes of
the timestamp are not serialized if there is no timestamp on the
StreamRecord.
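To make the byte counts in this thread concrete, here is a rough Python model of per-record size, assuming (as described in this thread) that a serialized StreamRecord consists of a 1-byte tag, an 8-byte timestamp only when one is present, and then the payload. The helper names are hypothetical and this is not Flink's StreamElementSerializer; payload sizes are taken as given, although real serialized keys (e.g. Strings) usually carry extra bytes such as a length prefix.

```python
# Rough model of per-record shuffle bytes, assuming the layout described in this
# thread: [1-byte tag][8-byte timestamp, only if present][payload].
TAG_BYTES = 1
TIMESTAMP_BYTES = 8


def record_size(payload_bytes, has_timestamp, with_tag=True):
    """Bytes written for one record under the assumed layout."""
    size = payload_bytes
    if with_tag:
        size += TAG_BYTES
    if has_timestamp:
        size += TIMESTAMP_BYTES
    return size


def tag_overhead(payload_bytes):
    """Fraction of extra bytes the tag adds on top of a timestamp-free payload."""
    return (record_size(payload_bytes, False) - payload_bytes) / payload_bytes


if __name__ == "__main__":
    # A 4-byte integer key, as in the tokenized word-count example quoted below.
    print(record_size(4, has_timestamp=True))                   # 13: tag + timestamp + payload
    print(record_size(4, has_timestamp=False))                  # 5: timestamp already skipped
    print(record_size(4, has_timestamp=False, with_tag=False))  # 4: if the tag were dropped too
    print(f"{tag_overhead(4):.0%}")  # 25%
    print(f"{tag_overhead(2):.0%}")  # 50% for a 2-byte payload such as "US"
```

Under this model the timestamp is already free when absent, so the remaining question in the FLIP is only whether saving the single tag byte justifies the extra complexity.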

-

Saving the 1-byte StreamElement tag is a good idea for improving
performance. But I agree with Xintong and Gyula that we should strike a
balance between complexity and performance. I'm fine with introducing this
optimization if it is only for pure batch SQL, because that is the only case
(not even batch DataStream or batch Table API) where it can be enabled by
default. But I have concerns about the other options.

My biggest concern is that it exposes a configuration to users that is hard
to understand, that users may be afraid to enable, and that is not worth
enabling. If users rarely enable this configuration, it would become a
maintenance overhead for the community without any benefit.

Besides, I doubt whether we can remove "pipeline.force-timestamp-support"
in the future. From my understanding, it is pretty hard for the framework to
detect whether a job has no watermark strategy, because watermarks may be
emitted by any operator via Output#emitWatermark.

Best,
Jark


On Sat, 12 Aug 2023 at 13:23, Gyula Fóra  wrote:

> Hey Devs,
>
> It would be great to see some other benchmarks, not only the dummy
> WordCount example.
>
> I would love to see a few SQL queries documented and whether there is any
> measurable benefit at all.
>
> Prod pipelines usually have some IO components etc., which will add enough
> overhead to make this even less noticeable. I agree that even small
> improvements are worthwhile, but they should be observable/significant on
> real workloads. Otherwise, complicating the runtime layer, types and configs
> is not worth it in my opinion.
>
> Cheers
> Gyula
>
> On Sat, 12 Aug 2023 at 04:39, Becket Qin  wrote:
>
> > Thanks for the FLIP, Yunfeng.
> >
> > I had a brief offline discussion with Dong, and here are my two cents:
> >
> > ## The benefit
> > The FLIP is related to one of the perf benchmarks we saw at LinkedIn,
> > which is pretty much doing a word count, except that the words are country
> > codes, so they are typically just two bytes, e.g. CN, US, UK. What I see is
> > that the amount of data going through shuffle is much higher in Flink
> > DataStream batch mode compared with the Flink DataSet API. And in this
> > case, because the actual key is just 2 bytes, the overhead is relatively
> > high. In batch processing, it is not rare that people first tokenize the
> > data before processing to save cost. For example, imagine that in word
> > count the words are encoded as 4-byte Integers instead of Strings. The
> > 1-byte overhead would still add 25 percent. Therefore, I think the
> > optimization in the FLIP can still benefit a bunch of batch processing
> > cases. For streaming, the benefit still applies, although it is smaller
> > than for batch.
> >
> > ## The complexity and long term solution
> > In terms of the complexity of the FLIP, I kind of think that we can
> > restrain the scope to just batch mode, and only for StreamRecord class.
> > That means only in batch mode, the timestamp in the StreamRecord will be
> > dropped when the config is enabled. This should give most of the benefit
> > while significantly reducing the complexity of the FLIP.
> > In practice, I think people rarely use StreamRecord timestamps in batch
> > jobs. But because this is not an explicit API contract for users, from what
> > I understand, the configuration is introduced to make it 100% safe for the
> > users. In other words, we won't need this configuration if our contract
> > with users does not support timestamps in batch mode. In order to make the
> > contract clear, maybe we can print a warning if the timestamp field in
> > StreamRecord is accessed in batch mode, starting from the next release, so
> > that we can drop the configuration completely in 2.0. By then, Flink should
> > have enough information to determine whether timestamps in StreamRecords
> > should be supported for a job/operator or not, e.g. batch mode,
> > processing-time-only jobs, etc.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> > On Fri, Aug 11, 2023 at 9:46 PM Dong Lin  wrote:
> >
> > > Hi Xintong,
> > >
> > > Thanks for the quick reply. I also agree that we should hear from others
> > > about whether this optimization is worthwhile.
> > >
> > > Please see my comments inline.
> > >
> > > On Fri, Aug 11, 2023 at 5:54 PM Xintong Song 
> > > wrote:
> > >
> > > > Thanks for the quick replies.
> > > >
> > > > Overall, it seems that the main concern with this FLIP is that the 2%
> > > > > throughput saving might not

Re: [DISCUSS] FLIP-323: Support Attached Execution on Flink Application Completion for Batch Jobs

2023-08-14 Thread Becket Qin
Hi Ron and Weihua,

Thanks for the feedback.

There seem to be three user-visible behaviors that we are talking about:

1. The behavior on the client side, i.e. whether it blocks until the job
finishes or not.

2. The behavior of the submitted job, i.e. whether to stop the job execution
if the client is detached from the Flink cluster, or in other words, whether
to bind the lifecycle of the job to the connection status of the attached
client. For example, one might want to keep a batch job running until it
finishes even after the client connection is lost. But it makes sense to stop
the job upon losing the client connection if it is a streaming job that
invokes collect().

3. The behavior of the Flink cluster (JM and TMs), i.e. whether to shut down
the Flink cluster if the client is detached from it, or in other words,
whether to bind the cluster lifecycle to the job lifecycle. For dedicated
clusters (application clusters or dedicated session clusters), the lifecycle
of the cluster should be bound to the job lifecycle. But for shared session
clusters, the lifecycle of the Flink cluster should be independent of the
jobs running in it.

As we can see, these three behaviors are more or less independent, and the
current configurations fail to support all the desired combinations.
Ideally there should be three separate configurations, for example:
- client.attached.after.submission and client.heartbeat.timeout control the
behavior on the client side.
- jobmanager.cancel-on-attached-client-exit controls the behavior of the
job when an attached client loses its connection. The client heartbeat
timeout and attached-ness will also be passed to the JM upon job submission.
- cluster.shutdown-on-first-job-finishes (or
jobmanager.shutdown-cluster-after-job-finishes) controls the cluster
behavior after the job finishes normally / abnormally. This is a
cluster-level setting instead of a job-level setting. Therefore it can only
be set when launching the cluster.
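One way to see why a single combined flag is not enough is to model the three behaviors as independent booleans and count which combinations are reachable. This is a hypothetical Python sketch, not a Flink API: the field names only paraphrase the three behaviors, and the "combined flag" function encodes the statement that the current code ties behaviors 2 and 3 to one setting.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class LifecyclePolicy:
    """Hypothetical model of the three behaviors; not a Flink API."""
    client_blocks_until_job_finishes: bool  # 1. client side
    cancel_job_on_client_exit: bool         # 2. job bound to the attached client
    shutdown_cluster_after_job: bool        # 3. cluster bound to the job


def reachable_with_separate_flags():
    """Three independent configs: every combination can be expressed."""
    return {LifecyclePolicy(*bits) for bits in product([False, True], repeat=3)}


def reachable_with_combined_flag():
    """Behaviors 2 and 3 driven by one flag, so they can only change together."""
    return {
        LifecyclePolicy(attached, combined, combined)
        for attached, combined in product([False, True], repeat=2)
    }


if __name__ == "__main__":
    # Batch job on a dedicated cluster: keep running if the client disconnects,
    # but tear the cluster down once the job is done.
    wanted = LifecyclePolicy(True, False, True)
    print(len(reachable_with_separate_flags()))      # 8
    print(len(reachable_with_combined_flag()))       # 4
    print(wanted in reachable_with_combined_flag())  # False
```

The `wanted` case combines the batch example from point 2 with the dedicated-cluster rule from point 3; it only becomes expressible once behaviors 2 and 3 have their own settings.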

The current code sort of combines configs 2 and 3 into
execution.shutdown-on-attached-exit.
This assumes that the lifecycle of the cluster is the same as that of the job
when the client is attached. This FLIP does not intend to change that. But
using the execution.attached config to control the client behavior looks
misleading, so this FLIP proposes to replace it with the more intuitive
config client.attached.after.submission. This makes it clear that it is
a configuration controlling the client-side behavior, instead of the
execution of the job.

Thanks,

Jiangjie (Becket) Qin





On Thu, Aug 10, 2023 at 10:34 PM Weihua Hu  wrote:

> Hi Allison
>
> Thanks for driving this FLIP. It's a valuable feature for batch jobs.
> This helps keep "Drop Per-Job Mode [1]" going.
>
> +1 for this proposal.
>
> However, it seems that the change in this FLIP is not detailed enough.
> I have a few questions.
>
> 1. The config 'execution.attached' is not only used in per-job mode,
> but also in session mode to shut down the cluster. IMHO, it's better to
> keep this option name.
>
> 2. This FLIP only mentions YARN mode. I believe this feature should
> work in both YARN and Kubernetes mode.
>
> 3. In attached mode, we support two features:
> execution.shutdown-on-attached-exit
> and client.heartbeat.timeout. These should also be taken into account.
>
> 4. Application Mode shuts down once the job has completed. So, if the
> Flink client polls the job status via the REST API in attached mode, there
> is a chance that the client will not be able to retrieve the final job
> status. Perhaps FLINK-24113[3] will help with this.
>
>
> [1] https://issues.apache.org/jira/browse/FLINK-26000
> [2] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#session-mode
> [3] https://issues.apache.org/jira/browse/FLINK-24113
>
> Best,
> Weihua
>
>
> On Thu, Aug 10, 2023 at 10:47 AM liu ron  wrote:
>
> > Hi, Allison
> >
> > Thanks for driving this proposal; it looks cool for batch jobs in
> > application mode. But after reading your FLIP document and [1], I have a
> > question. Why do you want to rename the execution.attached configuration
> > to client.attached.after.submission and at the same time deprecate
> > execution.attached? Based on your design, I understand that the roles of
> > these two options are the same. Introducing a new option would increase
> > the cost of understanding and usage for users, so why not follow the
> > idea discussed in FLINK-25495 and make Application mode support
> > execution.attached?
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-25495
> >
> > Best,
> > Ron
> >
> On Wed, Aug 9, 2023 at 02:07 Venkatakrishnan Sowrirajan  wrote:
> >
> > > This is definitely a useful feature, especially for Flink batch
> > > execution workloads using flow orchestrators like Airflow, Azkaban,
> > > Oozie, etc. Thanks for reviving this issue and starting a FLIP.
> > >
> > > Regards
> > > Venkata krishnan
> > >
> > >
> > > On Mon, Aug 7, 2023 at 4:09 PM Allison Chang
> 

Re: [DISCUSS] Update Flink Roadmap

2023-08-14 Thread Shammon FY
Thanks @Jark for driving the Flink Roadmap.

As we discussed OLAP in the thread [1], and according to the suggestions
from @Xintong Song, could we add a subsection in `Towards Streaming
Warehouses` or `Performance` stating that short-lived queries in Flink
session clusters are one of the future directions for Flink?

Best,
Shammon FY

On Mon, Aug 14, 2023 at 8:03 PM Jark Wu  wrote:

> Thank you everyone for helping polish the roadmap [1].
>
> I think I have addressed all the comments and we have included all ongoing
> parts of Flink.
> Please feel free to take a last look. I'm going to prepare the pull request
> if there are no more concerns.
>
> Best,
> Jark
>
> [1]:
>
> https://docs.google.com/document/d/12BDiVKEsY-f7HI3suO_IxwzCmR04QcVqLarXgyJAb7c/edit
>
> On Sun, 13 Aug 2023 at 13:04, Yuan Mei  wrote:
>
> > Sorry for taking so long
> >
> > I've added a section about Flink Disaggregated State Management Evolution
> > in the attached doc.
> >
> > I found that some of the content might overlap with the "large-scale
> > streaming jobs" section, so that part might need some changes as well.
> >
> > Please let me know what you think.
> >
> > Best
> > Yuan
> >
> > On Mon, Jul 24, 2023 at 12:07 PM Yuan Mei  wrote:
> >
> > > Sorry, I missed this email and am responding a bit late.
> > >
> > > I will add a draft of the long-term vision for state, as well as
> > > large-scale state support, to the roadmap.
> > >
> > > Best
> > > Yuan
> > >
> > > On Mon, Jul 17, 2023 at 10:34 AM Jark Wu  wrote:
> > >
> > >> Hi Jiabao,
> > >>
> > >> Thank you for your suggestions. I have added them to the "Going Beyond
> > >> a SQL Stream/Batch Processing Engine" and "Large-Scale State Jobs"
> > >> sections.
> > >>
> > >> Best,
> > >> Jark
> > >>
> > >> On Thu, 13 Jul 2023 at 16:06, Jiabao Sun  wrote:
> > >>
> > >> > Thanks Jark and Martijn for driving this.
> > >> >
> > >> > There are two suggestions about the Table API:
> > >> >
> > >> > - Add a JSON type to support NoSQL database types.
> > >> > - Remove the changelog normalize operator for upsert streams.
> > >> >
> > >> >
> > >> > Best,
> > >> > Jiabao
> > >> >
> > >> >
> > >> > > On Jul 13, 2023, at 3:49 PM, Jark Wu  wrote:
> > >> > >
> > >> > > Hi all,
> > >> > >
> > >> > > Sorry for taking so long to get back here.
> > >> > >
> > >> > > Martijn and I have drafted the first version of the updated roadmap,
> > >> > > including the updated feature radar reflecting the current state of
> > >> > > different components.
> > >> > >
> > >> > > https://docs.google.com/document/d/12BDiVKEsY-f7HI3suO_IxwzCmR04QcVqLarXgyJAb7c/edit
> > >> > >
> > >> > > Feel free to leave comments in the thread or the document.
> > >> > > We may have missed something important, so your help in enriching
> > >> > > the content is greatly appreciated.
> > >> > >
> > >> > > Best,
> > >> > > Jark & Martijn
> > >> > >
> > >> > >
> > >> > > On Fri, 2 Jun 2023 at 00:50, Jing Ge  wrote:
> > >> > >
> > >> > >> Hi Jark,
> > >> > >>
> > >> > >> Fair enough. Let's do it like you suggested. Thanks!
> > >> > >>
> > >> > >> Best regards,
> > >> > >> Jing
> > >> > >>
> > >> > >> On Thu, Jun 1, 2023 at 6:00 PM Jark Wu  wrote:
> > >> > >>
> > >> > >>> Hi Jing,
> > >> > >>>
> > >> > >>> This thread is for discussing the roadmap for versions 1.18, 2.0,
> > >> > >>> and beyond.
> > >> > >>> One of the outcomes of this discussion will be an updated version
> > >> > >>> of the current roadmap.
> > >> > >>> Let's work together on refining the roadmap in this thread.
> > >> > >>>
> > >> > >>> Best,
> > >> > >>> Jark
> > >> > >>>
> > >> > >>> On Thu, 1 Jun 2023 at 23:25, Jing Ge  wrote:
> > >> > >>>
> > >> >  Hi Jark,
> > >> > 
> > >> >  Thanks for driving it! For point 2, since we are developing 1.18 now,
> > >> >  does it make sense to update the roadmap this time while we are
> > >> >  releasing 1.18? This discussion thread will be focusing on the Flink
> > >> >  2.0 roadmap, as you mentioned previously. WDYT?
> > >> > 
> > >> >  Best regards,
> > >> >  Jing
> > >> > 
> > >> >  On Thu, Jun 1, 2023 at 3:31 PM Jark Wu  wrote:
> > >> > 
> > >> > > Hi all,
> > >> > >
> > >> > > Martijn and I would like to initiate a discussion on the Flink roadmap,
> > >> > > which should cover the project's long-term roadmap and the regular
> > >> > > update mechanism.
> > >> > >
> > >> > > Xintong has already started a discussion about Flink 2.0 planning. One
> > >> > > of the points raised in that discussion is that we should have a
> > >> > > high-level discussion of the roadmap to present where the project is
> > >> > > heading (which doesn't necessarily need to block the Flink 2.0
> > >> > > planning). Moreover, the roadmap on the

Re: [DISCUSS] Update Flink Roadmap

2023-08-14 Thread Jark Wu
Thank you everyone for helping polish the roadmap [1].

I think I have addressed all the comments and we have included all ongoing
parts of Flink.
Please feel free to take a last look. I'm going to prepare the pull request
if there are no more concerns.

Best,
Jark

[1]:
https://docs.google.com/document/d/12BDiVKEsY-f7HI3suO_IxwzCmR04QcVqLarXgyJAb7c/edit

On Sun, 13 Aug 2023 at 13:04, Yuan Mei  wrote:

> Sorry for taking so long
>
> I've added a section about Flink Disaggregated State Management Evolution
> in the attached doc.
>
> I found that some of the content might overlap with the "large-scale
> streaming jobs" section, so that part might need some changes as well.
>
> Please let me know what you think.
>
> Best
> Yuan
>
> On Mon, Jul 24, 2023 at 12:07 PM Yuan Mei  wrote:
>
> > Sorry, I missed this email and am responding a bit late.
> >
> > I will add a draft of the long-term vision for state, as well as
> > large-scale state support, to the roadmap.
> >
> > Best
> > Yuan
> >
> > On Mon, Jul 17, 2023 at 10:34 AM Jark Wu  wrote:
> >
> >> Hi Jiabao,
> >>
> >> Thank you for your suggestions. I have added them to the "Going Beyond a
> >> SQL Stream/Batch Processing Engine" and "Large-Scale State Jobs"
> >> sections.
> >>
> >> Best,
> >> Jark
> >>
> >> On Thu, 13 Jul 2023 at 16:06, Jiabao Sun  wrote:
> >>
> >> > Thanks Jark and Martijn for driving this.
> >> >
> >> > There are two suggestions about the Table API:
> >> >
> >> > - Add a JSON type to support NoSQL database types.
> >> > - Remove the changelog normalize operator for upsert streams.
> >> >
> >> >
> >> > Best,
> >> > Jiabao
> >> >
> >> >
> >> > > On Jul 13, 2023, at 3:49 PM, Jark Wu  wrote:
> >> > >
> >> > > Hi all,
> >> > >
> >> > > Sorry for taking so long to get back here.
> >> > >
> >> > > Martijn and I have drafted the first version of the updated roadmap,
> >> > > including the updated feature radar reflecting the current state of
> >> > > different components.
> >> > >
> >> > > https://docs.google.com/document/d/12BDiVKEsY-f7HI3suO_IxwzCmR04QcVqLarXgyJAb7c/edit
> >> > >
> >> > > Feel free to leave comments in the thread or the document.
> >> > > We may have missed something important, so your help in enriching
> >> > > the content is greatly appreciated.
> >> > >
> >> > > Best,
> >> > > Jark & Martijn
> >> > >
> >> > >
> >> > > On Fri, 2 Jun 2023 at 00:50, Jing Ge  wrote:
> >> > >
> >> > >> Hi Jark,
> >> > >>
> >> > >> Fair enough. Let's do it like you suggested. Thanks!
> >> > >>
> >> > >> Best regards,
> >> > >> Jing
> >> > >>
> >> > >> On Thu, Jun 1, 2023 at 6:00 PM Jark Wu  wrote:
> >> > >>
> >> > >>> Hi Jing,
> >> > >>>
> >> > >>> This thread is for discussing the roadmap for versions 1.18, 2.0,
> >> > >>> and beyond.
> >> > >>> One of the outcomes of this discussion will be an updated version
> >> > >>> of the current roadmap.
> >> > >>> Let's work together on refining the roadmap in this thread.
> >> > >>>
> >> > >>> Best,
> >> > >>> Jark
> >> > >>>
> >> > >>> On Thu, 1 Jun 2023 at 23:25, Jing Ge  wrote:
> >> > >>>
> >> >  Hi Jark,
> >> > 
> >> >  Thanks for driving it! For point 2, since we are developing 1.18 now,
> >> >  does it make sense to update the roadmap this time while we are
> >> >  releasing 1.18? This discussion thread will be focusing on the Flink
> >> >  2.0 roadmap, as you mentioned previously. WDYT?
> >> > 
> >> >  Best regards,
> >> >  Jing
> >> > 
> >> >  On Thu, Jun 1, 2023 at 3:31 PM Jark Wu  wrote:
> >> > 
> >> > > Hi all,
> >> > >
> >> > > Martijn and I would like to initiate a discussion on the Flink roadmap,
> >> > > which should cover the project's long-term roadmap and the regular
> >> > > update mechanism.
> >> > >
> >> > > Xintong has already started a discussion about Flink 2.0 planning. One
> >> > > of the points raised in that discussion is that we should have a
> >> > > high-level discussion of the roadmap to present where the project is
> >> > > heading (which doesn't necessarily need to block the Flink 2.0
> >> > > planning). Moreover, the roadmap on the Flink website [1] hasn't been
> >> > > updated for half a year, and the last update was for the feature radar
> >> > > for the 1.15 release. It has been 2 years since the community discussed
> >> > > Flink's overall roadmap.
> >> > >
> >> > > I would like to raise two topics for discussion:
> >> > >
> >> > > 1. The new roadmap. This should be an updated version of the current
> >> > > roadmap [1].
> >> > > 2. A mechanism to regularly discuss and update the roadmap.
> >> > >
> >> > > To make the first topic discussion more efficient, Martijn and I
> >> > > volunteer to summarize the ongoing big things of diff

Re: [VOTE] Apache Flink Kubernetes Operator Release 1.6.0, release candidate #2

2023-08-14 Thread Rui Fan
Thanks Gyula for the release!

+1 (non-binding)

- Compiled and tested the source code via mvn verify
- Verified the signatures
- Downloaded the image
- Deployed helm chart to test cluster
- Ran example job

Best,
Rui

On Mon, Aug 14, 2023 at 3:58 PM Gyula Fóra  wrote:

> +1 (binding)
>
> Verified:
>  - Hashes, signatures, source files contain no binaries
>  - Maven repo contents look good
>  - Verified helm chart, image, deployed stateful and autoscaling examples.
> Operator logs look good
>
> Cheers,
> Gyula
>
> On Thu, Aug 10, 2023 at 3:03 PM Gyula Fóra  wrote:
>
> > Hi Everyone,
> >
> > Please review and vote on the release candidate #2 for the
> > version 1.6.0 of Apache Flink Kubernetes Operator,
> > as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > **Release Overview**
> >
> > As an overview, the release consists of the following:
> > a) Kubernetes Operator canonical source distribution (including the
> > Dockerfile), to be deployed to the release repository at dist.apache.org
> > b) Kubernetes Operator Helm Chart to be deployed to the release
> > repository at dist.apache.org
> > c) Maven artifacts to be deployed to the Maven Central Repository
> > d) Docker image to be pushed to dockerhub
> >
> > **Staging Areas to Review**
> >
> > The staging areas containing the above-mentioned artifacts are as
> > follows, for your review:
> > * All artifacts for a,b) can be found in the corresponding dev repository
> > at dist.apache.org [1]
> > * All artifacts for c) can be found at the Apache Nexus Repository [2]
> > * The docker image for d) is staged on github [3]
> >
> > All artifacts are signed with the key 21F06303B87DAFF1 [4]
> >
> > Other links for your review:
> > * JIRA release notes [5]
> > * source code tag "release-1.6.0-rc2" [6]
> > * PR to update the website Downloads page to
> > include Kubernetes Operator links [7]
> >
> > **Vote Duration**
> >
> > The voting time will run for at least 72 hours.
> > It is adopted by majority approval, with at least 3 PMC affirmative
> > votes.
> >
> >
> > **Note on Verification**
> >
> > You can follow the basic verification guide here[8].
> > Note that you don't need to verify everything yourself, but please make
> > note of what you have tested together with your +- vote.
> >
> > Cheers!
> > Gyula Fora
> >
> > [1] https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.6.0-rc2/
> > [2] https://repository.apache.org/content/repositories/orgapacheflink-1649/
> > [3] ghcr.io/apache/flink-kubernetes-operator:ebb1fed
> > [4] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [5] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353230
> > [6] https://github.com/apache/flink-kubernetes-operator/tree/release-1.6.0-rc2
> > [7] https://github.com/apache/flink-web/pull/666
> > [8] https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Kubernetes+Operator+Release
> >
>


[jira] [Created] (FLINK-32865) DynamicFilteringDataCollectorOperator can't chain with the upstream operator when the parallelism is inconsistent

2023-08-14 Thread dalongliu (Jira)
dalongliu created FLINK-32865:
-

 Summary: DynamicFilteringDataCollectorOperator can't chain with 
the upstream operator when the parallelism is inconsistent
 Key: FLINK-32865
 URL: https://issues.apache.org/jira/browse/FLINK-32865
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.17.1, 1.16.2, 1.18.0
Reporter: dalongliu
 Fix For: 1.18.0
 Attachments: image-2023-08-14-19-17-22-109.png

!image-2023-08-14-19-17-22-109.png!

 

If the parallelism of DynamicFilteringDataCollectorOperator is not consistent with 
that of the upstream operator, they can't be chained together. This causes the 
DynamicFilteringDataCollectorOperator to execute after the fact table source, so 
DPP (dynamic partition pruning) won't work. Because operator parallelism is decided at 
runtime, we should forcibly add a scheduling dependency in the compile phase.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32864) StreamDependencyTests.test_set_requirements_with_cached_directory fails on AZP

2023-08-14 Thread Sergey Nuyanzin (Jira)
Sergey Nuyanzin created FLINK-32864:
---

 Summary: 
StreamDependencyTests.test_set_requirements_with_cached_directory  fails on AZP
 Key: FLINK-32864
 URL: https://issues.apache.org/jira/browse/FLINK-32864
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.18.0
Reporter: Sergey Nuyanzin


This build 
[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=52209&view=logs&j=9cada3cb-c1d3-5621-16da-0f718fb86602&t=c67e71ed-6451-5d26-8920-5a8cf9651901&l=25208]
fails as below.
The error is very similar to the one in an old issue, FLINK-15929; probably some 
dependencies in the requirements should be fixed...

 {noformat}
Aug 13 01:41:17 === FAILURES ===
Aug 13 01:41:17 __ StreamDependencyTests.test_set_requirements_with_cached_directory ___
Aug 13 01:41:17 
Aug 13 01:41:17 self = 

Aug 13 01:41:17 
Aug 13 01:41:17 def test_set_requirements_with_cached_directory(self):
Aug 13 01:41:17 tmp_dir = self.tempdir
Aug 13 01:41:17 requirements_txt_path = os.path.join(tmp_dir, "requirements_txt_" + str(uuid.uuid4()))
Aug 13 01:41:17 with open(requirements_txt_path, 'w') as f:
Aug 13 01:41:17 f.write("python-package1==0.0.0")
Aug 13 01:41:17 
Aug 13 01:41:17 requirements_dir_path = os.path.join(tmp_dir, "requirements_dir_" + str(uuid.uuid4()))
Aug 13 01:41:17 os.mkdir(requirements_dir_path)
Aug 13 01:41:17 package_file_name = "python-package1-0.0.0.tar.gz"

{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] How about adding OLAP to Flink Roadmap?

2023-08-14 Thread Yun Tang
Thanks to the guys from ByteDance for driving this topic, which could be another 
big story in extending Flink's abilities.

In general, I think this is a great idea. However, before we move forward, I 
think we should first answer the question: what is the target for Flink OLAP?

We run Presto/Trino and SparkSQL in the production environment for OLAP SQL 
analysis. Since Presto runs faster than SparkSQL in many cases, especially for 
ad-hoc queries on medium-sized data, we run queries on Presto first and 
switch to SparkSQL for large-scale queries if necessary.
Presto runs as a service and emphasizes query performance without node fault 
tolerance. Moreover, it uses a pipelined data exchange mode instead of the 
classic blocking stage exchange mode, which is a bit like Flink's pipelined 
mode vs. blocking mode.

Can we say that we hope Flink OLAP could target Presto/Trino for queries on 
medium-sized data, and switch to Flink batch SQL for large-scale analytical queries?
If so, I also think the name "Flink OLAP" looks a bit strange, as Flink 
batch SQL should also serve large-scale OLAP analysis.

Best
Yun Tang

From: Jing Ge 
Sent: Thursday, August 10, 2023 13:52
To: dev@flink.apache.org 
Subject: Re: [DISCUSS] How about adding OLAP to Flink Roadmap?

Hi Shammon, Hi Xiangyu,

Thanks for bringing this to our attention. I can see this is a great
proposal born from real business scenarios. +1 for it.

People have been keen to use one platform to cover all their data
production and consumption requirements. Flink has done a great job on the
production side, i.e. streaming and batch processing, with an excellent
ecosystem. This gives Flink a big advantage to go one step further and
cover the consumption side. It will turn Flink into a unified compute
platform, like what the Ray project (the platform behind ChatGPT, if someone
is not aware of it) [1] is doing, and secure Flink's position as one of the most
interesting open source platforms for the next decade.

Frankly speaking, it will be a big change. As far as I am concerned, the
following should be considered (these are just first-glance thoughts; there
must be more).

Architecture upgrade - since we will have three capabilities (I wanted to
use "engines", but it might be too early for such a big word), i.e.
streaming, batch, and OLAP, it might make sense to upgrade the architecture
while we are building OLAP into Flink. A unified foundation or
abstraction for distributed computation should be designed and implemented
underneath those capabilities. In the future, new capabilities could leverage
this foundation and be developed at a very fast pace.

MPP architecture - a Flink session cluster is not an MPP architecture.
Generally speaking, SNA (shared-nothing architecture) is the key to
implementing MPP. Flink has everything needed to offer SNA. That is why we
can consider building OLAP into or on top of Flink. And speaking of
MPP, there will be a lot of things to do, e.g. the Retrieval
Architecture [2], multi-level task splitting, dynamic retry or even splitting,
etc. I will not expand on all those topics at this early stage.
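To illustrate the shared-nothing idea in the simplest terms, a toy Python sketch (workers are simulated sequentially in one process; `partition`, `local_sum`, and `mpp_sum` are hypothetical names, not Flink APIs). Records are hash-partitioned so each worker owns a disjoint slice, each worker aggregates only its own slice without touching shared state, and because every key lands on exactly one worker the partial results merge by a plain union.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, int]


def partition(records: Iterable[Record], workers: int) -> List[List[Record]]:
    """Hash-partition records so each worker owns a disjoint slice of the data."""
    parts: List[List[Record]] = [[] for _ in range(workers)]
    for key, value in records:
        parts[hash(key) % workers].append((key, value))
    return parts


def local_sum(part: List[Record]) -> Dict[str, int]:
    """Each worker aggregates only its own partition (no shared state)."""
    acc: Dict[str, int] = {}
    for key, value in part:
        acc[key] = acc.get(key, 0) + value
    return acc


def mpp_sum(records: Iterable[Record], workers: int = 4) -> Dict[str, int]:
    """Partition, aggregate locally, then merge the disjoint partial results."""
    merged: Dict[str, int] = {}
    for part in partition(records, workers):
        merged.update(local_sum(part))  # keys never repeat across partitions
    return merged


if __name__ == "__main__":
    print(mpp_sum([("a", 1), ("b", 2), ("a", 3)], workers=3))
```

A real MPP engine would run the local steps on separate nodes and ship partitions over the network. Python's string `hash()` is randomized per process, so partition assignment changes between runs while the merged result stays the same.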

OLAP query syntax - at least some common syntax and statements need to be
implemented, e.g. CUBE, GROUPING SETS, OVER (PARTITION BY ...), you name it.
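For readers less familiar with these constructs, a minimal Python sketch of what GROUPING SETS and CUBE compute (plain-Python semantics only, not Flink's planner or SQL API; `grouping_sets` and `cube` are hypothetical helpers). Each grouping set is aggregated separately, columns outside the current set come back as NULL (`None` here), and CUBE expands to every subset of its columns.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, object]
Key = Tuple[Optional[object], ...]


def grouping_sets(rows: Sequence[Row], columns: Sequence[str],
                  sets: Sequence[Tuple[str, ...]], measure: str) -> List[Tuple[Key, float]]:
    """Emulate SELECT columns, SUM(measure) ... GROUP BY GROUPING SETS (sets).

    Columns that are not in the current grouping set are reported as None.
    """
    result: List[Tuple[Key, float]] = []
    for gset in sets:
        sums: Dict[Key, float] = defaultdict(float)
        for row in rows:
            key = tuple(row[c] if c in gset else None for c in columns)
            sums[key] += float(row[measure])
        result.extend(sums.items())
    return result


def cube(columns: Sequence[str]) -> List[Tuple[str, ...]]:
    """CUBE (a, b) expands to the grouping sets (a, b), (a), (b), ()."""
    return [combo for r in range(len(columns), -1, -1)
            for combo in combinations(columns, r)]


if __name__ == "__main__":
    sales = [
        {"region": "EU", "product": "A", "amount": 10},
        {"region": "EU", "product": "B", "amount": 5},
        {"region": "US", "product": "A", "amount": 7},
    ]
    cols = ("region", "product")
    for key, total in grouping_sets(sales, cols, cube(cols), "amount"):
        print(key, total)
```

The last row printed, `(None, None)`, is the grand total that the empty grouping set `()` contributes.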

Last but not least, there will be a big effort to upgrade the runtime
features to support OLAP with respect to performance and latency.

Best regards,
Jing


[1] https://www.ray.io/
[2] https://www.tutorialsbook.com/teradata/teradata-architecture

On Thu, Aug 10, 2023 at 11:39 AM Dan Zou  wrote:

> Thanks for bringing up this discussion, Shammon. I would like to share
> some of my observations and experiences.
>
> Flink has almost become the de facto standard for streaming computing, and
> Flink batch has been successfully applied in some companies. If Flink can
> support OLAP scenarios well, a unified engine supporting streaming, batch,
> and OLAP will become a reality, which is very exciting.
>
> Based on the status quo, Flink can be used as a primary OLAP engine,
> although there is still a lot of room for optimization. This means that we
> do not need a large-scale overhaul at the beginning; instead, we can
> enhance it gradually and continuously without affecting streaming.
>
> Flink OLAP can largely reuse the capabilities of Flink Batch SQL, and
> optimizations in OLAP can also benefit Flink Batch. If we reduce job
> startup overhead and increase cross-job resource reuse (plan reuse,
> generated class reuse, connection reuse, etc.) on this basis, Flink will
> become a good OLAP engine.
>
> So, I am a big +1 for adding OLAP to the Flink Roadmap, and I am willing to
> contribute to it.
>
>
> > On Aug 9, 2023, at 15:35, xiangyu feng  wrote:
> >
> > Thank you Shammon for initiating this discussion. As one of the Flink OLAP
> > developers at ByteDance, I would also like to share a real case from our
> > users.
> >
> > About two years ago we found our first OLAP user internally by
> inte

Re: [VOTE] Apache Flink Kubernetes Operator Release 1.6.0, release candidate #2

2023-08-14 Thread Gyula Fóra
+1 (binding)

Verified:
 - Hashes, signatures, source files contain no binaries
 - Maven repo contents look good
 - Verified helm chart and image; deployed the stateful and autoscaling examples.
   Operator logs look good
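For anyone repeating the hash check listed above, a minimal Python sketch (a hypothetical helper, not part of the release tooling). It assumes each artifact has a sibling `<artifact>.sha512` file whose first whitespace-separated token is the hex digest, as produced by tools such as `sha512sum`. Signature checking is a separate step (e.g. `gpg --verify` against the KEYS file).

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large release tarballs need not fit in memory."""
    digest = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha512(artifact: Path) -> bool:
    """Compare an artifact against the digest stored in '<artifact>.sha512'."""
    checksum_file = artifact.with_name(artifact.name + ".sha512")
    expected = checksum_file.read_text().split()[0].lower()
    return sha512_of(artifact) == expected


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, "OK" if verify_sha512(Path(name)) else "MISMATCH")
```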

Cheers,
Gyula

On Thu, Aug 10, 2023 at 3:03 PM Gyula Fóra  wrote:

> Hi Everyone,
>
> Please review and vote on the release candidate #2 for the
> version 1.6.0 of Apache Flink Kubernetes Operator,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> **Release Overview**
>
> As an overview, the release consists of the following:
> a) Kubernetes Operator canonical source distribution (including the
> Dockerfile), to be deployed to the release repository at dist.apache.org
> b) Kubernetes Operator Helm Chart to be deployed to the release repository
> at dist.apache.org
> c) Maven artifacts to be deployed to the Maven Central Repository
> d) Docker image to be pushed to dockerhub
>
> **Staging Areas to Review**
>
> The staging areas containing the above-mentioned artifacts are as follows,
> for your review:
> * All artifacts for a) and b) can be found in the corresponding dev repository
> at dist.apache.org [1]
> * All artifacts for c) can be found at the Apache Nexus Repository [2]
> * The docker image for d) is staged on github [3]
>
> All artifacts are signed with the key 21F06303B87DAFF1 [4]
>
> Other links for your review:
> * JIRA release notes [5]
> * source code tag "release-1.6.0-rc2" [6]
> * PR to update the website Downloads page to
> include Kubernetes Operator links [7]
>
> **Vote Duration**
>
> The voting time will run for at least 72 hours.
> It is adopted by majority approval, with at least 3 PMC affirmative votes.
>
>
> **Note on Verification**
>
> You can follow the basic verification guide here [8].
> Note that you don't need to verify everything yourself, but please make
> note of what you have tested together with your +/- vote.
>
> Cheers!
> Gyula Fora
>
> [1]
> https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.6.0-rc2/
> [2]
> https://repository.apache.org/content/repositories/orgapacheflink-1649/
> [3] ghcr.io/apache/flink-kubernetes-operator:ebb1fed
> [4] https://dist.apache.org/repos/dist/release/flink/KEYS
> [5]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353230
> [6]
> https://github.com/apache/flink-kubernetes-operator/tree/release-1.6.0-rc2
> [7] https://github.com/apache/flink-web/pull/666
> [8]
> https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Kubernetes+Operator+Release
>


[jira] [Created] (FLINK-32863) Improve Flink UI's time precision from second level to milliseconds level

2023-08-14 Thread Runkang He (Jira)
Runkang He created FLINK-32863:
--

 Summary: Improve Flink UI's time precision from second level to 
milliseconds level
 Key: FLINK-32863
 URL: https://issues.apache.org/jira/browse/FLINK-32863
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Web Frontend
Affects Versions: 1.17.1
Reporter: Runkang He


This is a UI improvement for OLAP jobs.

OLAP queries are generally small queries that finish within seconds or even 
milliseconds, but the time currently displayed only has second-level precision, 
which is not enough for OLAP queries. The millisecond part of the time is very 
important for users and developers to see accurate timings for performance 
measurement and optimization. The affected displayed times include job 
duration, task duration, task start time, end time, and so on.

It would be nice to improve this for better OLAP user experience.
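As a sketch of the requested behavior (Python for illustration only, not the web frontend's actual code; `format_duration` and `format_timestamp` are hypothetical helpers), durations and start/end times can be rendered without discarding the millisecond part. Timestamps are shown in UTC here for determinism; a UI would more likely use the browser's local time zone.

```python
from datetime import datetime, timedelta, timezone


def format_duration(ms: int) -> str:
    """Render a duration with millisecond precision, e.g. 61234 -> '1m 1s 234ms'."""
    if ms < 0:
        raise ValueError("duration must be non-negative")
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if minutes:
        parts.append(f"{minutes}m")
    if seconds:
        parts.append(f"{seconds}s")
    if millis or not parts:  # always show something, even for 0 ms
        parts.append(f"{millis}ms")
    return " ".join(parts)


def format_timestamp(epoch_ms: int) -> str:
    """Render an epoch-millisecond start/end time, keeping the milliseconds.

    Integer arithmetic avoids float rounding of epoch_ms / 1000.
    """
    seconds, millis = divmod(epoch_ms, 1000)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)
    return dt.isoformat(timespec="milliseconds")


if __name__ == "__main__":
    print(format_duration(850), format_duration(61234))
    print(format_timestamp(1234))
```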


