Re: [ANNOUNCE] Apache Flink has won the 2023 SIGMOD Systems Award

2023-07-06 Thread Yuan Mei
Congrats everyone :-)

Best
Yuan

On Fri, Jul 7, 2023 at 11:29 AM Hang Ruan  wrote:

> Hi, Leonard.
>
> I would like to help to add this page. Please assign this issue to me.
> Thanks.
>
> Best,
> Hang
>
> Leonard Xu  wrote on Fri, Jul 7, 2023 at 11:26:
>
>> Congrats to all !
>>
> > It will be helpful to promote Apache Flink if we can add a page to our
> > website like other projects do[2]. I’ve created an issue[1] for this.
>>
>>
>> Best,
>> Leonard
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-32555
>> [2] https://spark.apache.org/news/sigmod-system-award.html
>>
>


Re: [ANNOUNCE] Apache Flink has won the 2023 SIGMOD Systems Award

2023-07-06 Thread Hang Ruan
Hi, Leonard.

I would like to help to add this page. Please assign this issue to me.
Thanks.

Best,
Hang

Leonard Xu  wrote on Fri, Jul 7, 2023 at 11:26:

> Congrats to all !
>
> It will be helpful to promote Apache Flink if we can add a page to our
> website like other projects do[2]. I’ve created an issue[1] for this.
>
>
> Best,
> Leonard
>
> [1] https://issues.apache.org/jira/browse/FLINK-32555
> [2] https://spark.apache.org/news/sigmod-system-award.html
>


Re: [ANNOUNCE] Apache Flink has won the 2023 SIGMOD Systems Award

2023-07-06 Thread Leonard Xu
Congrats to all!

It will be helpful to promote Apache Flink if we can add a page to our website
like other projects do[2]. I’ve created an issue[1] for this.


Best,
Leonard

[1] https://issues.apache.org/jira/browse/FLINK-32555 
[2] https://spark.apache.org/news/sigmod-system-award.html

[jira] [Created] (FLINK-32555) Add a page to show SIGMOD System Awards for Apache Flink

2023-07-06 Thread Leonard Xu (Jira)
Leonard Xu created FLINK-32555:
--

 Summary: Add a page to show SIGMOD System Awards for Apache Flink
 Key: FLINK-32555
 URL: https://issues.apache.org/jira/browse/FLINK-32555
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Leonard Xu
 Fix For: 1.18.0


As you all know, Apache Flink has won the 2023 SIGMOD Systems Award [1].

It will be helpful to promote Apache Flink if we can add a page to our
community documentation to let our users know.


[1] https://sigmod.org/2023-sigmod-systems-award/

[2] https://spark.apache.org/news/sigmod-system-award.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] Flink 1.18 Feature Freeze Extended until July 24th, 2023

2023-07-06 Thread Shammon FY
Thanks for driving the release and sharing the update, looking forward to
1.18

Best,
Shammon FY

On Fri, Jul 7, 2023 at 10:56 AM Yun Tang  wrote:

> Thanks for driving this release and sharing the update on the feature
> freeze extension.
>
>
> Best
> Yun Tang
> 
> From: Jing Ge 
> Sent: Thursday, July 6, 2023 17:13
> To: yuxia 
> Cc: dev ; re...@apache.org ;
> snuyan...@gmail.com ; Konstantin Knauf <
> kna...@apache.org>
> Subject: Re: [ANNOUNCE] Flink 1.18 Feature Freeze Extended until July
> 24th, 2023
>
> Thanks for driving it and sharing the update!
>
> Best regards,
> Jing
>
> On Thu, Jul 6, 2023 at 9:21 AM yuxia  wrote:
>
> > Thanks for the update and thanks for your efforts.
> >
> > Best regards,
> > Yuxia
> >
> > - Original Message -
> > From: "Rui Fan" <1996fan...@gmail.com>
> > To: "dev" , re...@apache.org
> > Cc: "Jing Ge" , snuyan...@gmail.com, "Konstantin
> > Knauf" 
> > Sent: Thursday, July 6, 2023, 3:06:28 PM
> > Subject: Re: [ANNOUNCE] Flink 1.18 Feature Freeze Extended until July 24th,
> 2023
> >
> > Thanks for the update, and thank you for your efforts for the 1.18
> release!
> >
> > Best,
> > Rui Fan
> >
> > On Thu, Jul 6, 2023 at 2:40 PM Qingsheng Ren  wrote:
> >
> > > Hi devs,
> > >
> > > Recently we collected some feedback from developers, and in order to give
> > > more time for polishing some important features in 1.18, we decided to
> > > extend the feature freeze date to:
> > >
> > > July 24th, 2023, at 00:00 CEST (UTC+2)
> > >
> > > which gives us ~2 weeks for development from now. There will be no
> > > extension after Jul 24, so please arrange new features in the next
> > release
> > > if they cannot be finished before the closing date.
> > >
> > > Thanks everyone for your work in 1.18!
> > >
> > > Best regards,
> > > Qingsheng, Jing, Konstantin and Sergey
> > >
> >
>


Re: [ANNOUNCE] Flink 1.18 Feature Freeze Extended until July 24th, 2023

2023-07-06 Thread Yun Tang
Thanks for driving this release and sharing the update on the feature freeze 
extension.


Best
Yun Tang

From: Jing Ge 
Sent: Thursday, July 6, 2023 17:13
To: yuxia 
Cc: dev ; re...@apache.org ; 
snuyan...@gmail.com ; Konstantin Knauf 
Subject: Re: [ANNOUNCE] Flink 1.18 Feature Freeze Extended until July 24th, 2023

Thanks for driving it and sharing the update!

Best regards,
Jing

On Thu, Jul 6, 2023 at 9:21 AM yuxia  wrote:

> Thanks for the update and thanks for your efforts.
>
> Best regards,
> Yuxia
>
> - Original Message -
> From: "Rui Fan" <1996fan...@gmail.com>
> To: "dev" , re...@apache.org
> Cc: "Jing Ge" , snuyan...@gmail.com, "Konstantin
> Knauf" 
> Sent: Thursday, July 6, 2023, 3:06:28 PM
> Subject: Re: [ANNOUNCE] Flink 1.18 Feature Freeze Extended until July 24th, 2023
>
> Thanks for the update, and thank you for your efforts for the 1.18 release!
>
> Best,
> Rui Fan
>
> On Thu, Jul 6, 2023 at 2:40 PM Qingsheng Ren  wrote:
>
> > Hi devs,
> >
> > Recently we collected some feedback from developers, and in order to give
> > more time for polishing some important features in 1.18, we decided to
> > extend the feature freeze date to:
> >
> > July 24th, 2023, at 00:00 CEST (UTC+2)
> >
> > which gives us ~2 weeks for development from now. There will be no
> > extension after Jul 24, so please arrange new features in the next
> release
> > if they cannot be finished before the closing date.
> >
> > Thanks everyone for your work in 1.18!
> >
> > Best regards,
> > Qingsheng, Jing, Konstantin and Sergey
> >
>


Re: [DISCUSS] FLIP-314: Support Customized Job Lineage Listener

2023-07-06 Thread Shammon FY
Thanks Jing, sounds good to me.

I have updated the FLIP and renamed the lineage-related classes to
`LineageGraph`, `LineageVertex`, and `LineageEdge` to keep them consistent
with the job definition in Flink.

Best,
Shammon FY
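For readers skimming the thread, here is a minimal sketch of how the renamed classes could fit together. The shapes below are illustrative assumptions made for this discussion, not the FLIP's final interfaces.

```java
import java.util.List;

// Illustrative sketch of the renamed lineage classes: a job's lineage is a
// graph of vertices (sources/sinks) connected by edges.
public class LineageSketch {
    // A vertex represents a source or sink of a job (name is a placeholder).
    record LineageVertex(String name) {}

    // An edge connects one source vertex to one sink vertex.
    record LineageEdge(LineageVertex source, LineageVertex sink) {}

    // The graph aggregates all vertices and edges of a job.
    record LineageGraph(List<LineageVertex> vertices, List<LineageEdge> edges) {}

    public static void main(String[] args) {
        LineageVertex kafka = new LineageVertex("kafka_source");
        LineageVertex mysql = new LineageVertex("mysql_sink");
        LineageEdge edge = new LineageEdge(kafka, mysql);
        LineageGraph graph = new LineageGraph(List.of(kafka, mysql), List.of(edge));
        System.out.println(graph.edges().size()); // 1
    }
}
```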

On Thu, Jul 6, 2023 at 8:25 PM Jing Ge  wrote:

> Hi Shammon,
>
> Thanks for the clarification. Atlas might have its historical reasons going
> back to the Hadoop era, or maybe even back to Hibernate, where Entity and
> Relation were commonly used. Flink already uses Vertex and Edge to describe
> the DAG. Some popular tools like dbt also use this convention[1] and,
> afaik, most graph frameworks use vertex and edge too. It will be easier for
> Flink devs and users to have a consistent naming convention for the same
> concept, i.e. in this case, the DAG.
>
> Best regards,
> Jing
>
> [1]
>
> https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-use-cases-and-examples#discovery
>
> On Wed, Jul 5, 2023 at 11:28 AM Shammon FY  wrote:
>
> > Hi Jing,
> >
> > Thanks for your feedback.
> >
> > > 1. TableColumnLineageRelation#sinkColumn() should return
> > TableColumnLineageEntity instead of String, right?
> >
> > The `sinkColumn()` will return `String` which is the column name in the
> > sink connector. I found the name of `TableColumnLineageEntity` may
> > cause ambiguity and I have renamed it to
> `TableColumnSourceLineageEntity`.
> > In my mind the `TableColumnLineageRelation` represents the lineage for
> each
> > sink column, each column may be computed from multiple sources and
> columns.
> > I use `TableColumnSourceLineageEntity` to manage each source and its
> > columns for the sink column, so `TableColumnLineageRelation` has a sink
> > column name and `TableColumnSourceLineageEntity` list.
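The structure described above could look roughly like this. This is an editor's sketch under assumed field shapes, not the FLIP's actual API:

```java
import java.util.List;

// Sketch: lineage of one sink column, computed from columns of one or more sources.
public class ColumnLineageSketch {
    // One source table and the columns of it that feed the sink column.
    record TableColumnSourceLineageEntity(String sourceTable, List<String> columns) {}

    // The relation: a sink column name plus all contributing sources/columns.
    record TableColumnLineageRelation(
            String sinkColumn, List<TableColumnSourceLineageEntity> sources) {}

    public static void main(String[] args) {
        // A sink column derived from two columns of one source table.
        TableColumnLineageRelation r = new TableColumnLineageRelation(
                "total_price",
                List.of(new TableColumnSourceLineageEntity(
                        "orders", List.of("price", "quantity"))));
        System.out.println(r.sinkColumn()); // total_price
    }
}
```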
> >
> > > 2. Since LineageRelation already contains all information to build the
> > lineage between sources and sink, do we still need to set the
> LineageEntity
> > in the source?
> >
> > The lineage interface of `DataStream` is very flexible. We have added
> > `setLineageEntity` to the source to limit and verify user behavior,
> > ensuring that users have not added non-existent sources as lineage.
> >
> > > 3. About the "Entity" and "Relation" naming, I was confused too, like
> > Qingsheng mentioned. How about LineageVertex, LineageEdge, and
> LineageEdges
> > which contains multiple LineageEdge?
> >
> > We referred to `Atlas` for the lineage naming: it uses `Entity` and
> > `Relation` to represent the lineage relationship, and another metadata
> > service, `Datahub`, uses `DataSet` to represent the entity. I think
> > `Entity` and `Relation` are nicer for lineage; what do you think?
> >
> > Best,
> > Shammon FY
> >
> >
> > On Thu, Jun 29, 2023 at 4:21 AM Jing Ge 
> > wrote:
> >
> > > Hi Shammon,
> > >
> > > Thanks for your proposal. After reading the FLIP, I'd like to ask
> > > some questions to make sure we are on the same page. Thanks!
> > >
> > > 1. TableColumnLineageRelation#sinkColumn() should return
> > > TableColumnLineageEntity instead of String, right?
> > >
> > > 2. Since LineageRelation already contains all information to build the
> > > lineage between sources and sink, do we still need to set the
> > LineageEntity
> > > in the source?
> > >
> > > 3. About the "Entity" and "Relation" naming, I was confused too, like
> > > Qingsheng mentioned. How about LineageVertex, LineageEdge, and
> > LineageEdges
> > > which contains multiple LineageEdge? E.g. multiple sources join into
> one
> > > sink, or, edges of columns from one or different tables, etc.
> > >
> > > Best regards,
> > > Jing
> > >
> > > On Sun, Jun 25, 2023 at 2:06 PM Shammon FY  wrote:
> > >
> > > > Hi yuxia and Yun,
> > > >
> > > > Thanks for your input.
> > > >
> > > > For yuxia:
> > > > > 1: What kinds of JobStatus will the `JobExecutionStatusEvent` include?
> > > >
> > > > At present, we only need to notify the listener when a job goes to
> > > > termination, but I think it makes sense to add generic `oldStatus`
> and
> > > > `newStatus` in the listener and users can update the job state in
> their
> > > > service as needed.
> > > >
> > > > > 2: I'm really confused about the `config()` included in
> > > `LineageEntity`,
> > > > where is it from and what is it for ?
> > > >
> > > > The `config` in `LineageEntity` is used for users to get options for
> > > source
> > > > and sink connectors. As the examples in the FLIP, users can add
> > > > server/group/topic information in the config for kafka and create
> > lineage
> > > > entities for `DataStream` jobs, then the listeners can get this
> > > information
> > > > to identify the same connector in different jobs. Otherwise, the
> > `config`
> > > > in `TableLineageEntity` will be the same as `getOptions` in
> > > > `CatalogBaseTable`.
> > > >
> > > > > 3: Regardless whether `inputChangelogMode` in
> > `TableSinkLineageEntity`
> > > is
> > > > needed or not, since `TableSinkLineageEntity` contains
> > > > `inputChangelogMode`, why `TableSourceLineageEntity` don't 

Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-07-06 Thread Aitozi
Hi guys,
Since there are no further comments, kindly pinging the vote thread [1] :D

Thanks,
Aitozi.

[1]: https://lists.apache.org/thread/7g5n2vshosom2dj9bp7x4n01okrnx4xx
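For readers following the thread, the asynchronous lookup pattern that a user-defined async table function targets (e.g. enriching rows via RPC calls) can be sketched in plain Java. The class and method names below are illustrative assumptions, not the FLIP's proposed API:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Minimal sketch: an "async table function" maps one input row to a future
// of zero or more output rows, so slow RPC lookups can overlap in flight.
public class AsyncUdtfSketch {
    // Hypothetical async lookup: returns the enrichment rows for one key.
    static CompletableFuture<List<String>> eval(String key) {
        return CompletableFuture.supplyAsync(() -> List.of(key + "-enriched"));
    }

    public static void main(String[] args) {
        // Fire several lookups concurrently instead of one blocking call at a time.
        List<CompletableFuture<List<String>>> futures =
                List.of(eval("a"), eval("b"), eval("c"));
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        futures.forEach(f -> System.out.println(f.join()));
        // prints [a-enriched], [b-enriched], [c-enriched]
    }
}
```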

Aitozi  wrote on Mon, Jun 26, 2023 at 10:31:

> Hi Lincoln,
> Thanks for your confirmation. I have updated the consensus to the FLIP
> doc.
> If there are no other comments, I'd like to restart the vote process in
> [1] today.
>
> https://lists.apache.org/thread/7g5n2vshosom2dj9bp7x4n01okrnx4xx
>
> Thanks,
> Aitozi.
>
> Lincoln Lee  wrote on Wed, Jun 21, 2023 at 22:29:
>
>> Hi Aitozi,
>>
>> Thanks for your updates!
>>
>> By the design of hints, the hints after select clause belong to the query
>> hints category, and this new hint is also a kind of join hints[1].
>> Join table function is one of the join type defined by flink sql joins[2],
>> all existing join hints[1] omit the 'join' keyword,
>> so I would prefer the 'ASYNC_TABLE_FUNC' (which is actually the one for
>> 'ASYNC_TABLE_FUNC_JOIN').
>>
>> Since a short Chinese holiday is coming, I suggest waiting for other
>> people's responses before continuing to vote (next monday?)
>>
>> Btw, I discussed with @fudian offline about pyflink support, there should
>> be no known issues, so you can create a subtask with pyflink support after
>> the vote passed.
>>
>> [1]
>>
>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#join-hints
>> [2]
>>
>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/
>>
>> Best,
>> Lincoln Lee
>>
>>
>> Aitozi  wrote on Sun, Jun 18, 2023 at 21:18:
>>
>> > Hi all,
>> > Sorry for the late reply. I had a discussion with Lincoln offline,
>> > mainly about
>> > the naming of the hints option. Thanks Lincoln for the valuable
>> > suggestions.
>> >
>> > Let me answer the last email inline.
>> >
>> > >For `JavaAsyncTableFunc0` in flip, can you use a scenario like RPC
>> call as
>> > an example?
>> >
>> > Sure, will give an example when adding the doc of async udtf and will
>> > update the FLIP simultaneously
>> >
>> > >For the name of this query hint, 'LATERAL' (include its internal
>> options)
>> > don't show any relevance to async, but I haven't thought of a suitable
>> name
>> > at the moment,
>> >
>> > After some discussion with Lincoln, we prefer to choose between
>> > `ASYNC_TABLE_FUNC` and `ASYNC_LATERAL`.
>> > Besides, in my opinion the keyword `lateral` has a wider use scope than
>> > the table function join, but in this case we only want to configure
>> > the async table function, so I lean a bit more toward `ASYNC_TABLE_FUNC`.
>> > Looking forward to some inputs if you guys have
>> > some better suggestion on the naming.
>> >
>> > For the usage of the hints config option, I have updated the section
>> > of ConfigOption, you can refer to the FLIP
>> > for more details.
>> >
>> > >Also, the terms 'correlate join' and 'lateral join' are not the same
>> as in
>> > the current joins page[1], so maybe it would be better if we unified
>> them
>> > into  'join table function'
>> >
>> > Yes, we should unify them to 'join table function'; updated.
>> >
>> > Best,
>> > Aitozi
>> >
>> > Lincoln Lee  wrote on Thu, Jun 15, 2023 at 09:15:
>> >
>> > > Hi Aitozi,
>> > >
>> > > Thanks for your reply! Giving SQL users more flexibility to get
>> > > asynchronous processing capabilities via lateral join table function:
>> > > +1 for this.
>> > >
>> > > For `JavaAsyncTableFunc0` in flip, can you use a scenario like RPC
>> call
>> > as
>> > > an example?
>> > >
>> > > For the name of this query hint, 'LATERAL' (include its internal
>> options)
>> > > don't show any relevance to async, but I haven't thought of a suitable
>> > name
>> > > at the moment,
>> > > maybe we need to highlight the async keyword directly, we can also
>> see if
>> > > others have better candidates
>> > >
>> > > For the hint option "timeout = '180s'" should be "'timeout' = '180s'",
>> > > seems a typo in the flip. And use upper case for all keywords in sql
>> > > examples.
>> > > Also, the terms 'correlate join' and 'lateral join' are not the same
>> as
>> > in
>> > > the current joins page[1], so maybe it would be better if we unified
>> them
>> > > into  'join table function'
>> > >
>> > > [1]
>> > >
>> > >
>> >
>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/#table-function
>> > >
>> > > Best,
>> > > Lincoln Lee
>> > >
>> > >
>> > > Aitozi  wrote on Wed, Jun 14, 2023 at 16:11:
>> > >
>> > > > Hi Lincoln
>> > > >
>> > > > Very thanks for your valuable question. I will try to answer
>> your
>> > > > questions inline.
>> > > >
>> > > > >Does the async udtf bring any additional benefits besides a
>> > > > lighter implementation?
>> > > >
>> > > > IMO, async udtf is more than a lighter implementation. It can act
>> as a
>> > > > general way for sql users to use the async operator. And they don't
>> > have
>> > > to
>> > > > bind the async function with a table (a LookupTable), and they are
>> not
>> > > > forced to join on an 

Re: Re: [DISCUSS] FLIP-313 Add support of User Defined AsyncTableFunction

2023-07-06 Thread Aitozi
Hi Awake,
Thanks for your good point; updated.

Best,
Aitozi.

Yuhang Li  wrote on Wed, Jul 5, 2023 at 11:29:

> Hi Aitozi,
>
> I think it is necessary to add the following description in FLIP to
> express the difference between user-defined asynchronous table function and
> AsyncTableFunction:
>
> User-defined asynchronous table functions allow complex parameters (e.g.,
> Row type) to be passed to the function, which is important in RPC scenarios,
> rather than using ‘join … on ...'.
>
> Thanks,
> Awake.
>
>
> On 2023/06/26 02:31:59 Aitozi wrote:
> > Hi Lincoln,
> > Thanks for your confirmation. I have updated the consensus to the
> FLIP
> > doc.
> > If there are no other comments, I'd like to restart the vote process in
> [1]
> > today.
> >
> > https://lists.apache.org/thread/7g5n2vshosom2dj9bp7x4n01okrnx4xx
> >
> > Thanks,
> > Aitozi.
> >
> > Lincoln Lee  wrote on Wed, Jun 21, 2023 at 22:29:
> >
> > > Hi Aitozi,
> > >
> > > Thanks for your updates!
> > >
> > > By the design of hints, the hints after select clause belong to the
> query
> > > hints category, and this new hint is also a kind of join hints[1].
> > > Join table function is one of the join type defined by flink sql
> joins[2],
> > > all existing join hints[1] omit the 'join' keyword,
> > > so I would prefer the 'ASYNC_TABLE_FUNC' (which is actually the one for
> > > 'ASYNC_TABLE_FUNC_JOIN').
> > >
> > > Since a short Chinese holiday is coming, I suggest waiting for other
> > > people's responses before continuing to vote (next monday?)
> > >
> > > Btw, I discussed with @fudian offline about pyflink support, there
> should
> > > be no known issues, so you can create a subtask with pyflink support
> after
> > > the vote passed.
> > >
> > > [1]
> > >
> > >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#join-hints
> > > [2]
> > >
> > >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/
> > >
> > > Best,
> > > Lincoln Lee
> > >
> > >
> > > Aitozi  wrote on Sun, Jun 18, 2023 at 21:18:
> > >
> > > > Hi all,
> > > > Sorry for the late reply. I had a discussion with Lincoln offline,
> > > > mainly about
> > > > the naming of the hints option. Thanks Lincoln for the valuable
> > > > suggestions.
> > > >
> > > > Let me answer the last email inline.
> > > >
> > > > >For `JavaAsyncTableFunc0` in flip, can you use a scenario like RPC
> call
> > > as
> > > > an example?
> > > >
> > > > Sure, will give an example when adding the doc of async udtf and will
> > > > update the FLIP simultaneously
> > > >
> > > > >For the name of this query hint, 'LATERAL' (include its internal
> > > options)
> > > > don't show any relevance to async, but I haven't thought of a
> suitable
> > > name
> > > > at the moment,
> > > >
> > > > After some discussion with Lincoln, we prefer to choose between
> > > > `ASYNC_TABLE_FUNC` and `ASYNC_LATERAL`.
> > > > Besides, in my opinion the keyword `lateral` has a wider use scope than
> > > > the table function join, but in this case we only want to configure
> > > > the async table function, so I lean a bit more toward `ASYNC_TABLE_FUNC`.
> > > > Looking forward to some inputs if you guys have
> > > > some better suggestion on the naming.
> > > >
> > > > For the usage of the hints config option, I have updated the section
> > > > of ConfigOption, you can refer to the FLIP
> > > > for more details.
> > > >
> > > > >Also, the terms 'correlate join' and 'lateral join' are not the
> same as
> > > in
> > > > the current joins page[1], so maybe it would be better if we unified
> them
> > > > into  'join table function'
> > > >
> > > > Yes, we should unify them to 'join table function'; updated.
> > > >
> > > > Best,
> > > > Aitozi
> > > >
> > > > Lincoln Lee  wrote on Thu, Jun 15, 2023 at 09:15:
> > > >
> > > > > Hi Aitozi,
> > > > >
> > > > > Thanks for your reply! Giving SQL users more flexibility to get
> > > > > asynchronous processing capabilities via lateral join table function:
> > > > > +1 for this.
> > > > >
> > > > > For `JavaAsyncTableFunc0` in flip, can you use a scenario like RPC
> call
> > > > as
> > > > > an example?
> > > > >
> > > > > For the name of this query hint, 'LATERAL' (include its internal
> > > options)
> > > > > don't show any relevance to async, but I haven't thought of a
> suitable
> > > > name
> > > > > at the moment,
> > > > > maybe we need to highlight the async keyword directly, we can also
> see
> > > if
> > > > > others have better candidates
> > > > >
> > > > > For the hint option "timeout = '180s'" should be "'timeout' =
> '180s'",
> > > > > seems a typo in the flip. And use upper case for all keywords in
> sql
> > > > > examples.
> > > > > Also, the terms 'correlate join' and 'lateral join' are not the
> same as
> > > > in
> > > > > the current joins page[1], so maybe it would be better if we
> unified
> > > them
> > > > > into  'join table function'
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> 

Re: [DISCUSS] FLIP-329: Add operator attribute to specify support for object-reuse

2023-07-06 Thread Dong Lin
Hi Jing,

Thank you for the suggestion. Yes, we can extend it to support null if in
the future we find any use-case for this flexibility.

Best,
Dong

On Thu, Jul 6, 2023 at 7:55 PM Jing Ge  wrote:

> Hi Dong,
>
> one scenario I could imagine is that users could enable global object
> reuse features but force deep copy for some user defined specific functions
> because of any limitations. But that is only my gut feeling. And agree, we
> could keep the solution simple for now as FLIP described and upgrade to 3VL
> once there are such real requirements that are rising.
>
> Best regards,
> Jing
>
> On Thu, Jul 6, 2023 at 12:30 PM Dong Lin  wrote:
>
>> Hi Jing,
>>
>> Thank you for the detailed explanation. Please see my reply inline.
>>
>> On Thu, Jul 6, 2023 at 3:17 AM Jing Ge  wrote:
>>
>>> Hi Xuannan, Hi Dong,
>>>
>>> Thanks for your clarification.
>>>
>>> @Xuannan
>>>
>>> A Jira ticket has been created for the doc update:
>>> https://issues.apache.org/jira/browse/FLINK-32546
>>>
>>> @Dong
>>>
>>> I don't have a concrete example. I just thought about it from a
>>> conceptual or pattern's perspective. Since we have 1. coarse-grained global
>>> switch (CGS as abbreviation), i.e. pipeline.object-reuse, and 2. the
>>> fine-grained local switch (FGS as abbreviation), i.e. the
>>> objectReuseCompliant variable for specific operators/functions, there will
>>> be the following patterns with appropriate combinations:
>>>
>>> pattern 1: coarse-grained switch only. Local object reuse will be
>>> controlled by the coarse-grained switch:
>>> 1.1 cgs == true -> local object reused enabled
>>> 1.2 cgs == true  -> local object reused enabled
>>> 1.3 cgs == false -> local object reused disabled, i.e. deep copy enabled
>>> 1.4 cgs == false -> local object reused disabled, i.e. deep copy enabled
>>>
>>> afaiu, this is the starting point. I wrote 4 on purpose to make the
>>> regression check easier. We can consider it as the combinations with
>>> cgs(true/false) and fgs(true/false) while fgs is ignored.
>>>
>>> Now we introduce fine-grained switch. There will be two patterns:
>>>
>>> pattern 2: fine-grained switch over coarse-grained switch.
>>> Coarse-grained switch will be ignored when the local fine-grained switch
>>> has different value:
>>> 2.1 cgs == true and fgs == true -> local object reused enabled
>>> 2.2 cgs == true and fgs == false -> local object reused disabled, i.e.
>>> deep copy enabled
>>> 2.3 cgs == false and fgs == true -> local object reused enabled
>>> 2.4 cgs == false and fgs == false -> local object reused disabled, i.e.
>>> deep copy enabled
>>>
>>> cgs is actually ignored.
>>>
>>> Current FLIP is using a slightly different pattern:
>>>
>>> pattern 3: fine-grained switch over coarse-grained switch only when
>>> coarse-grained switch is off, i..e cgs OR fgs:
>>> 3.1 cgs == true and fgs == true -> local object reused enabled
>>> 3.2 cgs == true and fgs == false -> local object reused enabled
>>> 3.3 cgs == false and fgs == true -> local object reused enabled
>>> 3.4 cgs == false and fgs == false -> local object reused disabled, i.e.
>>> deep copy enabled
>>>
>>> All of those patterns are rational and each has different focus. It
>>> depends on the real requirement to choose one of them.
>>>
>>> As we can see, if fgs is using 2VL, there is a regression between
>>> pattern 1 and pattern 2. You are absolutely right in this case. That's why
>>> I suggested 3VL, i.e. fgs will have triple values: true, false,
>>> unknown(e.g. null)
>>>
>>> pattern 4: 3VL fgs with the null as init value (again, there are just
>>> two combination, I made it 4 on purpose):
>>> 4.1 cgs == true and fgs == null -> local object reused enabled
>>> 4.2 cgs == true and fgs == null -> local object reused enabled
>>> 4.3 cgs == false and fgs == null -> local object reused disabled, i.e.
>>> deep copy enabled
>>> 4.4 cgs == false and fgs == null -> local object reused disabled, i.e.
>>> deep copy enabled
>>>
>>> Since the default value of fgs is null, pattern 4 is backward compatible
>>> with pattern 1, which means no regression.
>>>
>>> Now we will set value to fgs and follow the pattern 2:
>>> 4.5 cgs == true and fgs == true -> local object reused enabled
>>> 4.6 cgs == true and fgs == false -> local object reused disabled, i.e.
>>> deep copy enabled
>>> 4.7 cgs == false and fgs == true -> local object reused enabled
>>> 4.8 cgs == false and fgs == false -> local object reused disabled, i.e.
>>> deep copy enabled
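The two combination rules enumerated above (pattern 3, the FLIP's "cgs OR fgs", versus pattern 4, the suggested three-valued fgs) can be written down compactly. This is an editor's sketch for comparison only; the names are assumptions, not the FLIP's actual API:

```java
// Sketch of the two switch-combination patterns discussed above.
// Names are illustrative assumptions, not the FLIP's actual API.
public class ObjectReusePatterns {

    // Pattern 3 (the FLIP's current proposal): object reuse is enabled when
    // either the coarse-grained global switch (cgs) or the fine-grained
    // per-operator switch (fgs) is on, i.e. cgs OR fgs.
    static boolean pattern3(boolean cgs, boolean fgs) {
        return cgs || fgs;
    }

    // Pattern 4 (the suggested 3VL variant): fgs is three-valued
    // (true / false / null). When unset (null), the global switch decides;
    // when set, it overrides the global switch in both directions.
    static boolean pattern4(boolean cgs, Boolean fgs) {
        return fgs != null ? fgs : cgs;
    }

    public static void main(String[] args) {
        // Pattern 3 cannot force a deep copy once the global switch is on.
        System.out.println(pattern3(true, false));  // true
        // Pattern 4 can: fgs == false disables reuse even when cgs == true.
        System.out.println(pattern4(true, false));  // false
        // With fgs unset, pattern 4 is backward compatible with the
        // coarse-grained-only behavior (pattern 1).
        System.out.println(pattern4(false, null));  // false
    }
}
```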
>>>
>>> Pattern 4 contains pattern 3 with the following combinations(force
>>> enabling local object reuse):
>>> 4.5 cgs == true and fgs == true -> local object reused enabled
>>> 4.2 cgs == true and fgs == null -> local object reused enabled
>>> 4.7 cgs == false and fgs == true -> local object reused enabled
>>> 4.4 cgs == false and fgs == null -> local object reused disabled, i.e.
>>> deep copy enabled
>>>
>>> Comparing pattern 4 to pattern 3, user will have one additional
>>> flexibility to control(force disabling) the local 

Re: [DISCUSS] FLIP-325: Support configuring end-to-end allowed latency

2023-07-06 Thread Dong Lin
Hi Jing,

Thanks for the comments. Please see my reply inline.

On Fri, Jul 7, 2023 at 5:40 AM Jing Ge  wrote:

> Hi,
>
> Thank you all for the inspired discussion. Really appreciate it!
>
> @Dong I'd like to ask some (stupid) questions to make sure I understand
> your thoughts correctly.
>
> 1. It will make no sense to send the same type of RecordAttributes right?
> e.g.  if one RecordAttributes(isBacklog=true) has been sent, a new
> RecordAttributes will be only sent when isBacklog is changed to be false,
> and vice versa. In this way, the number of RecordAttributes will be very
> limited.
>

Yes, you are right. Actually, this is what we plan to do when we update
operators to emit RecordAttributes via `Output#emitRecordAttributes()`.

Note that the FLIP does not specify the frequency of how operators should
invoke `Output#emitRecordAttributes()`. It is up to the operator
to decide when to emit RecordAttributes.


> 2. Since source readers can invoke Output#emitRecordAttributes to emit
> RecordAttributes(isBacklog=true/false), it might be weird to send
> RecordAttributes with different isBacklog back and forth too often. Devs
> and users should pay attention to it. Something is wrong when such a thing
> happens(metrics for monitoring?). Is this correct?
>

Actually, I think it could make sense to toggle isBacklog between true and
false while the job is running.

Suppose the job is reading from user-action data from Kafka and there is a
traffic spike for 2 hours. If the job keeps running in pure stream mode,
the watermark lag might keep increasing during this period because the
job's processing capability cannot keep up with the Kafka input
throughput. In this case, it can be beneficial to dynamically switch
isBacklog to true when watermarkLag exceeds a given threshold (e.g. 5
minutes), and switch isBacklog to false again when the watermarkLag is low
enough (30 seconds).
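The toggling just described is essentially a hysteresis check on the watermark lag: enter backlog mode above a high threshold and leave it only below a lower one, so the flag does not flap. A minimal sketch, assuming illustrative names and thresholds (this is not Flink API):

```java
// Sketch of the dynamic isBacklog toggling described above: enter backlog
// mode when watermark lag exceeds a high threshold (e.g. 5 minutes) and
// leave it only when the lag drops below a lower one (e.g. 30 seconds).
public class BacklogDetector {
    private final long enterBacklogLagMs;
    private final long exitBacklogLagMs;
    private boolean isBacklog = false;

    public BacklogDetector(long enterBacklogLagMs, long exitBacklogLagMs) {
        this.enterBacklogLagMs = enterBacklogLagMs;
        this.exitBacklogLagMs = exitBacklogLagMs;
    }

    /** Returns true iff the flag flipped, i.e. a new RecordAttributes
     *  event would be emitted downstream. */
    public boolean update(long watermarkLagMs) {
        boolean next = isBacklog
                ? watermarkLagMs > exitBacklogLagMs   // leave only when lag is low enough
                : watermarkLagMs > enterBacklogLagMs; // enter only when lag is high enough
        boolean changed = next != isBacklog;
        isBacklog = next;
        return changed;
    }

    public static void main(String[] args) {
        BacklogDetector d = new BacklogDetector(5 * 60_000, 30_000);
        System.out.println(d.update(10 * 60_000)); // true  (lag 10 min: switch to backlog)
        System.out.println(d.update(60_000));      // false (lag 1 min: still backlog, no flip)
        System.out.println(d.update(10_000));      // true  (lag 10 s: back to real-time mode)
    }
}
```

Only returning `true` on an actual flip matches the point made earlier in the thread that a new RecordAttributes need only be sent when isBacklog changes.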


> 3. Is there any relationship between end-to-end-latency and checkpoint
> interval that users should pay attention to? In the example described in
> the FLIP, both have the same value, 2 min. What about end-to-end-latency is
> configured bigger than checkpoint interval? Could checkpoint between
> end-to-end-latency be skipped?
>

This FLIP would not enforce any relationship between end-to-end latency and
checkpoint interval. Users are free to configure end-to-end latency to be
bigger than checkpoint interval.

I don't think there exists any use-case which requires end-to-end latency
to be higher than the checkpoint interval. Note that introducing a
relationship between these two configs would increase code complexity and
also make the documentation of these configs a bit more complex for users
to understand.

Since there is no correctness issue when a user sets end-to-end latency to be
bigger than the checkpointing interval, I think it is simpler to just let
the user decide how to configure them.


> 4. Afaiu, one major discussion point is that isBacklog can be derived from
> back pressure and there will be no need of RecordAttributes. In case a
> Flink job has rich resources that there is no back pressure (it will be
> difficult to perfectly have just enough resources that everything is fine
> but will have back pressure only for backlog) but we want to improve the
> throughput. We then need some other ways to derive isBacklog. That is the
> reason why RecordAttributes has been introduced. Did I understand it
> correctly?
>

I think there can be multiple ways to derive isBacklog, including:
1) Based on the source operator's state. For example, when MySQL CDC source
is reading snapshot, it can claim isBacklog=true.
2) Based on the watermarkLag in the source. For example, when system_time -
watermark > user_specified_threshold, then isBacklog=true.
3) Based on metrics. For example, when busyTimeMsPerSecond (or
backPressuredTimeMsPerSecond) > user_specified_threshold, then
isBacklog=true.

Note that there are pros/cons between these choices and none of them can
best fit all use-cases. For example, since option-1 does not require any
extra user-specified threshold, it can be the best choice when we want to
improve user's existing job (e.g. the one in the motivation section)
without extra user configuration.

For use-cases which want to increase throughput when reading backlog data
from Kafka, option-2 can be the best choice because a threshold based on
the watermark lag is easier to understand and configure than configuring
threshold based on the backPressuredTimeMsPerSecond.

option-3 might be the only choice when option-1 and option-2 are not
available for the given use-cases. But it might be harder to configure a
threshold against backPressuredTimeMsPerSecond. This is because the choice
of the percentage (or ms per second) threshold will mostly be empirical and
approximate. For example, should the user configure this to be 100%, 99%,
or 90%? I would prefer not to have user worry about this if option-1 or
option-2 can be used.


Re: [VOTE] FLIP-321: introduce an API deprecation process

2023-07-06 Thread John Roesler
+1 (non-binding)

Thanks for the FLIP!
-John

On Mon, Jul 3, 2023, at 22:21, Jingsong Li wrote:
> +1 binding
>
> On Tue, Jul 4, 2023 at 10:40 AM Zhu Zhu  wrote:
>>
>> +1 (binding)
>>
>> Thanks,
>> Zhu
>>
>> > ConradJam  wrote on Mon, Jul 3, 2023 at 22:39:
>> >
>> > +1 (no-binding)
>> >
>> > Matthias Pohl  于2023年7月3日周一 22:33写道:
>> >
>> > > Thanks, Becket
>> > >
>> > > +1 (binding)
>> > >
>> > > On Mon, Jul 3, 2023 at 10:44 AM Jing Ge 
>> > > wrote:
>> > >
>> > > > +1(binding)
>> > > >
>> > > > On Mon, Jul 3, 2023 at 10:19 AM Stefan Richter
>> > > >  wrote:
>> > > >
>> > > > > +1 (binding)
>> > > > >
>> > > > >
>> > > > > > On 3. Jul 2023, at 10:08, Martijn Visser 
>> > > > > wrote:
>> > > > > >
>> > > > > > +1 (binding)
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > On Mon, Jul 3, 2023 at 10:03 AM Xintong Song  wrote:
>> > > > > >
>> > > > > >> +1 (binding)
>> > > > > >>
>> > > > > >> Best,
>> > > > > >>
>> > > > > >> Xintong
>> > > > > >>
>> > > > > >>
>> > > > > >>
>> > > > > >> On Sat, Jul 1, 2023 at 11:26 PM Dong Lin 
>> > > wrote:
>> > > > > >>
>> > > > > >>> Thanks for the FLIP.
>> > > > > >>>
>> > > > > >>> +1 (binding)
>> > > > > >>>
>> > > > > >>> On Fri, Jun 30, 2023 at 5:39 PM Becket Qin 
>> > > > > wrote:
>> > > > > >>>
>> > > > >  Hi folks,
>> > > > > 
>> > > > >  I'd like to start the VOTE for FLIP-321[1] which proposes to
>> > > > introduce
>> > > > > >> an
>> > > > >  API deprecation process to Flink. The discussion thread for the
>> > > FLIP
>> > > > > >> can
>> > > > > >>> be
>> > > > >  found here[2].
>> > > > > 
>> > > > >  The vote will be open until at least July 4, following the
>> > > consensus
>> > > > > >>> voting
>> > > > >  process.
>> > > > > 
>> > > > >  Thanks,
>> > > > > 
>> > > > >  Jiangjie (Becket) Qin
>> > > > > 
>> > > > >  [1]
>> > > > > 
>> > > > > 
>> > > > > >>>
>> > > > > >>
>> > > > >
>> > > >
>> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-321%3A+Introduce+an+API+deprecation+process
>> > > > >  [2]
>> > > > >
>> > > >
>> > > https://lists.apache.org/thread/vmhzv8fcw2b33pqxp43486owrxbkd5x9
>> > > > >
>> > > > >
>> > > >
>> > >


Re: [DISCUSS] FLIP-325: Support configuring end-to-end allowed latency

2023-07-06 Thread Jing Ge
Hi,

Thank you all for the inspired discussion. Really appreciate it!

@Dong I'd like to ask some (stupid) questions to make sure I understand
your thoughts correctly.

1. It makes no sense to send the same RecordAttributes twice in a row, right?
E.g. once RecordAttributes(isBacklog=true) has been sent, a new
RecordAttributes will only be sent when isBacklog changes to false,
and vice versa. In this way, the number of RecordAttributes will be very
limited.

2. Since source readers can invoke Output#emitRecordAttributes to emit
RecordAttributes(isBacklog=true/false), it might be weird to send
RecordAttributes with different isBacklog values back and forth too often.
Devs and users should pay attention to it; something is wrong when that
happens (metrics for monitoring?). Is this correct?

3. Is there any relationship between the end-to-end latency and the
checkpoint interval that users should pay attention to? In the example
described in the FLIP, both have the same value, 2 min. What if the
end-to-end latency is configured to be bigger than the checkpoint interval?
Could checkpoints within the end-to-end latency window be skipped?

4. Afaiu, one major discussion point is that isBacklog can be derived from
back pressure, so there would be no need for RecordAttributes. But consider
a Flink job with resources so ample that there is no back pressure (it
would be difficult to provision resources so precisely that back pressure
occurs only while processing backlog), yet we still want to improve the
throughput. We then need some other way to derive isBacklog. That is the
reason why RecordAttributes has been introduced. Did I understand it
correctly?

5. NIT: Just like we talked about in another thread, JavaBean naming
convention is recommended, i.e. isBacklog() & setBacklog() instead of
getIsBacklog() and setIsBacklog().

Best regards,
Jing

On Thu, Jul 6, 2023 at 2:38 PM Dong Lin  wrote:

> Hi Shammon,
>
> Thanks for your comments. Please see my reply inline.
>
>
> On Thu, Jul 6, 2023 at 12:47 PM Shammon FY  wrote:
>
> > Hi,
> >
> > Thanks for your reply @Dong. I really agree with Piotr's points and I
> > would like to share some thoughts from my side.
> >
> > About the latency for mini-batch mechanism in Flink SQL, I still think
> the
> > description in the FLIP is not right. If there are N operators and the
> > whole process time for data in the job is `t`, then the latency in
> > mini-batch will be `table.exec.mini-batch.allow-latency`+`t`, not `
> > table.exec.mini-batch.allow-latency`*N. I think this is one of the
> > foundations of this FLIP, and you may need to confirm it again.
> >
>
> Given that we agree to have a mechanism to support end-to-end latency for
> DataStream programs, I think the exact semantics of
> table.exec.mini-batch.allow-latency will not affect the motivation or API
> design of this FLIP. I have updated the FLIP to remove any mention of
> table.exec.mini-batch.allow-latency.
>
>
> >
> > I think supporting similar mechanisms in the runtime and balancing latency
> > and throughput dynamically for all Flink jobs is a very good idea, and I
> > have some questions for that.
> >
> > 1. We encounter a situation where the workload is high when processing
> > snapshot data and we need mini-batch in sql for performance reason. But
> the
> > workload is low when processing delta data, we need to automatically
> adjust
> > the mini-batch SQL for them, or even cancel the mini-batch during delta
> > processing. I think this FLIP meets our needs, but I think we need a
> > general solution which covers all source types in flink, and the
> > `isBacklog` in the FLIP is only one strategy.
> >
>
> The focus of this FLIP is to allow Flink runtime to adjust the behavior of
> operators (e.g. the buffer time) based on the IsBacklog status of sources
> and the user-specified execution.end-to-end-latency (effective only when
> there is no backlog). The FLIP assumes there is already a strategy for
> sources to determine the IsProcessingBacklog status without adding more
> strategies.
>
> I agree it is useful to introduce more strategies to determine the
> IsProcessingBacklog status for sources. We can determine the
> IsProcessingBacklog status based on the backpressure metrics, the
> event-time watermark lag, or anything we find reasonable. I would like to
> work on this in follow-up FLIPs so that we don't work on too many things
> in the same FLIP.
>
> Would this be OK with you?
>
>
> > From the FLIP I think there should be two parts: dynamic trigger flush
> > event in JM and dynamic trigger flush operations in Operator. We need to
> > introduce much more general interfaces for them, such as
> > `DynamicFlushStrategy` in JM and `DynamicFlushOperation` in TM? As Piotr
> > mentioned above, we can collect many information from TM locally such as
> > backpressure, queue size and `Operator` can decide whether to buffer data
> > or process it immediately.  JM is also the same, it can decide to send
> > flush events on a regular basis or 

[jira] [Created] (FLINK-32554) Facilitate slot isolation and resource management for global committer

2023-07-06 Thread Allen Wang (Jira)
Allen Wang created FLINK-32554:
--

 Summary: Facilitate slot isolation and resource management for 
global committer
 Key: FLINK-32554
 URL: https://issues.apache.org/jira/browse/FLINK-32554
 Project: Flink
  Issue Type: Improvement
Affects Versions: 1.16.2
Reporter: Allen Wang


Flink's global committer executes a unique workload compared to the source and 
sink operators. In some use cases, it may require a much larger amount of 
resources (CPU, memory) than other operators. However, according to this 
[source 
code|https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/connector/sink2/StandardSinkTopologies.java],
 currently it is not possible to isolate the global committer to a dedicated 
task manager or task slot, or to assign more resources to it via fine-grained 
resource management. Flink always makes the global committer task share a task 
slot with another task. (In one test, we tried to have one more task slot than 
required by the source/sink parallelism, but Flink still assigned the global 
committer to share a slot with another task.)

As a result, we often see CPU utilization spikes on the task manager that runs 
the global committer compared with other task managers, and it becomes the 
bottleneck for the job. Due to slot sharing and inadequate resources on the 
global committer, the job takes a long time to initialize upon restarting and 
checkpoints take a long time to complete. Our job consumes from Kafka, and this 
bottleneck causes a significant increase in consumer lag. The lag in turn causes 
the Kafka source operator to replay backlogs, causing more CPU consumption on 
the source operator and making things worse for the global committer running in 
the same task slot.

At minimum, we want the capability to configure the global committer to run in 
its own task slot, and to make that work under reactive scaling. It would also 
be great to make fine-grained resource management work for the global committer.

 

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32552) Mixed up Flink session job deployments

2023-07-06 Thread Fabio Wanner (Jira)
Fabio Wanner created FLINK-32552:


 Summary: Mixed up Flink session job deployments
 Key: FLINK-32552
 URL: https://issues.apache.org/jira/browse/FLINK-32552
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Reporter: Fabio Wanner


*Context*

In the scope of end-to-end tests we deploy all the Flink session jobs we have 
regularly in a staging environment. Some of the jobs are bundled together in 
one helm chart and therefore deployed at the same time. There are around 40 
individual Flink jobs (running on the same Flink session cluster). The session 
cluster is individual for each e2e test run. The problems described below 
happen rarely (maybe 1 in ~50 runs).

*Problem*

Rarely, the operator seems to "mix up" the deployments. This can be seen in the 
Flink cluster logs, where multiple {{Received JobGraph submission '' 
()}} entries are created from jobs with the same job_id. This results in 
errors such as:

{{DuplicateJobSubmissionException}} or {{ClassNotFoundException}}.

It's also visible in the FlinkSessionJob resource: status.jobStatus.jobName does 
not match the expected job name of the job being deployed (the job name is 
passed to the application via an argument).

So far we were unable to reliably reproduce the error.

*Details*

The following lines show the status of 3 jobs from the viewpoint of the Flink 
cluster dashboard and the FlinkSessionJob resource:

*aletsch_smc_e5730831db8092adb12f5189c4c895ef3a268615*

Apache Flink Dashboard:
 * State: Restarting
 * ID: a7d36f3881f943a2
 * Exceptions: Cannot load user class: aelps.pipelines.aletsch.smc.SMCUrlMapper

FlinkSessionJob Resource:
 * State: RUNNING
 * jobId: a1221c743367497b0002
 * uid: a1221c74-3367-497b-ad2f-8793ab23919d

*aletsch_mat_e5730831db8092adb12f5189c4c895ef3a268615*

Apache Flink Dashboard:
 * State: -
 * ID: -

FlinkSessionJob Resource:
 * State: UPGRADING
 * jobId: -
 * uid: a7d36f38-81f9-43a0-898f-19b950430e9d

Flink K8s Operator:
 * Exceptions: DuplicateJobSubmissionException: Job has already been submitted.

*aletsch_wp_wafer_e5730831db8092adb12f5189c4c895ef3a268615*

Apache Flink Dashboard:
 * State: Running
 * ID: e692c2dfaa18441c0002
 * Exceptions: -

FlinkSessionJob Resource:
 * State: RUNNING
 * jobId: e692c2dfaa18441c0002
 * uid: e692c2df-aa18-441c-a352-88aefa9a3017

As we can see, the *aletsch_smc* job is presumably running according to the 
FlinkSessionJob resource, but crash-looping in the cluster, and it has the jobID 
matching the uid of the {*}aletsch_mat{*} resource, while *aletsch_mat* is 
not even running. The following logs also show some suspicious entries: there 
are several {{Received JobGraph submission}} entries from different jobs with 
the same jobID.

*Logs*

The logs are filtered by the 3 jobIds from above.

 

JobID: a7d36f3881f943a2
{code:bash}
Flink Cluster
...
2023-07-06 10:23:50,552 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - Job 
aletsch_smc_e5730831db8092adb12f5189c4c895ef3a268615 
(a7d36f3881f943a2) switched from state RUNNING to RESTARTING.
2023-07-06 10:23:50 file: 
'/tmp/tm_10.0.11.159:6122-e9fadc/blobStorage/job_a7d36f3881f943a2/blob_p-40c7a30adef8868254191d2cf2dbc4cb7ab46f0d-8a02a0583d91c5e8e6c94f378aa444c2'
 (valid JAR)
2023-07-06 10:23:50,522 INFO  
org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager [] 
- Received resource requirements from job a7d36f3881f943a2: 
[ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, 
numberOfRequiredSlots=4}]
2023-07-06 10:23:50,522 INFO  
org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager [] 
- Received resource requirements from job a7d36f3881f943a2: 
[ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, 
numberOfRequiredSlots=3}]
2023-07-06 10:23:50,522 INFO  
org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager [] 
- Received resource requirements from job a7d36f3881f943a2: 
[ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, 
numberOfRequiredSlots=2}]
2023-07-06 10:23:50,522 INFO  
org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager [] 
- Received resource requirements from job a7d36f3881f943a2: 
[ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, 
numberOfRequiredSlots=1}]
2023-07-06 10:23:50,512 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - Job 
aletsch_smc_e5730831db8092adb12f5189c4c895ef3a268615 
(a7d36f3881f943a2) switched from state RESTARTING to RUNNING.
2023-07-06 10:23:48,979 INFO  
org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager [] 
- Clearing resource requirements of job 

[jira] [Created] (FLINK-32553) ClusterEntrypointTest.testCloseAsyncShouldNotDeregisterApp failed on AZP

2023-07-06 Thread Sergey Nuyanzin (Jira)
Sergey Nuyanzin created FLINK-32553:
---

 Summary: 
ClusterEntrypointTest.testCloseAsyncShouldNotDeregisterApp failed on AZP
 Key: FLINK-32553
 URL: https://issues.apache.org/jira/browse/FLINK-32553
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.17.2
Reporter: Sergey Nuyanzin



https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51013&view=logs&j=77a9d8e1-d610-59b3-fc2a-4766541e0e33&t=125e07e7-8de0-5c6c-a541-a567415af3ef&l=7961
{noformat}
Jul 06 05:38:37 [ERROR] Tests run: 9, Failures: 0, Errors: 1, Skipped: 0, Time 
elapsed: 71.304 s <<< FAILURE! - in 
org.apache.flink.runtime.entrypoint.ClusterEntrypointTest
Jul 06 05:38:37 [ERROR] 
org.apache.flink.runtime.entrypoint.ClusterEntrypointTest.testCloseAsyncShouldNotDeregisterApp
  Time elapsed: 22.51 s  <<< ERROR!
Jul 06 05:38:37 org.apache.flink.runtime.entrypoint.ClusterEntrypointException: 
Failed to initialize the cluster entrypoint TestingEntryPoint.
Jul 06 05:38:37 at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:255)
Jul 06 05:38:37 at 
org.apache.flink.runtime.entrypoint.ClusterEntrypointTest.startClusterEntrypoint(ClusterEntrypointTest.java:347)
Jul 06 05:38:37 at 
org.apache.flink.runtime.entrypoint.ClusterEntrypointTest.testCloseAsyncShouldNotDeregisterApp(ClusterEntrypointTest.java:175)
Jul 06 05:38:37 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
Jul 06 05:38:37 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Jul 06 05:38:37 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Jul 06 05:38:37 at java.lang.reflect.Method.invoke(Method.java:498)
Jul 06 05:38:37 at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
Jul 06 05:38:37 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
Jul 06 05:38:37 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)

{noformat}





Re: [DISCUSS] FLIP-325: Support configuring end-to-end allowed latency

2023-07-06 Thread Dong Lin
Hi Shammon,

Thanks for your comments. Please see my reply inline.


On Thu, Jul 6, 2023 at 12:47 PM Shammon FY  wrote:

> Hi,
>
> Thanks for your reply @Dong. I really agree with Piotr's points and I
> would like to share some thoughts from my side.
>
> About the latency for mini-batch mechanism in Flink SQL, I still think the
> description in the FLIP is not right. If there are N operators and the
> whole process time for data in the job is `t`, then the latency in
> mini-batch will be `table.exec.mini-batch.allow-latency`+`t`, not `
> table.exec.mini-batch.allow-latency`*N. I think this is one of the
> foundations of this FLIP, and you may need to confirm it again.
>

Given that we agree to have a mechanism to support end-to-end latency for
DataStream programs, I think the exact semantics of
table.exec.mini-batch.allow-latency will not affect the motivation or API
design of this FLIP. I have updated the FLIP to remove any mention of
table.exec.mini-batch.allow-latency.


>
> I think supporting similar mechanisms in the runtime and balancing latency
> and throughput dynamically for all Flink jobs is a very good idea, and I
> have some questions for that.
>
> 1. We encounter a situation where the workload is high when processing
> snapshot data and we need mini-batch in sql for performance reason. But the
> workload is low when processing delta data, we need to automatically adjust
> the mini-batch SQL for them, or even cancel the mini-batch during delta
> processing. I think this FLIP meets our needs, but I think we need a
> general solution which covers all source types in flink, and the
> `isBacklog` in the FLIP is only one strategy.
>

The focus of this FLIP is to allow Flink runtime to adjust the behavior of
operators (e.g. the buffer time) based on the IsBacklog status of sources
and the user-specified execution.end-to-end-latency (effective only when
there is no backlog). The FLIP assumes there is already a strategy for
sources to determine the IsProcessingBacklog status without adding more
strategies.

I agree it is useful to introduce more strategies to determine the
IsProcessingBacklog status for sources. We can determine the
IsProcessingBacklog status based on the backpressure metrics, the
event-time watermark lag, or anything we find reasonable. I would like to
work on this in follow-up FLIPs so that we don't work on too many things
in the same FLIP.

Would this be OK with you?


> From the FLIP I think there should be two parts: dynamic trigger flush
> event in JM and dynamic trigger flush operations in Operator. We need to
> introduce much more general interfaces for them, such as
> `DynamicFlushStrategy` in JM and `DynamicFlushOperation` in TM? As Piotr
> mentioned above, we can collect many information from TM locally such as
> backpressure, queue size and `Operator` can decide whether to buffer data
> or process it immediately.  JM is also the same, it can decide to send
> flush events on a regular basis or send them based on the collected metrics
> information and other information, such as the isBacklog in the FLIP.
>
> 2. I really don't get enough benefits for `RecordAttribute` in the FLIP and
> as Piotr mentioned above too, it will generate a large number of messages,
>

If there is any sentence in the FLIP that suggests we will emit a lot of
RecordAttributes, sorry for that, and I will fix it.

Currently, the FLIP provides the `Output#emitRecordAttributes()` for
operators (e.g. source reader) to emit RecordAttributes. The FLIP leaves
the operator to decide the frequency and value of the emitted
RecordAttributes.

Our plan is to let SourceReader emit RecordAttributes only when its value
(e.g. isBacklog) differs from the value of the RecordAttributes it has
emitted earlier. It should avoid resending RecordAttributes with the same
value, similar to how Flink currently avoids resending
Watermark/WatermarkStatus with the same value.

Would it address your concern?
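
As a minimal illustration of that deduplication (names invented here, not
Flink's actual implementation):

```java
// Illustrative sketch only: forward a RecordAttributes(isBacklog=...) event
// downstream only when its value differs from the last emitted one, similar
// to how Flink avoids resending an unchanged Watermark/WatermarkStatus.
class RecordAttributesDeduplicator {
    private Boolean lastEmittedIsBacklog; // null until the first emission

    // Returns true if the event should be emitted downstream.
    boolean shouldEmit(boolean isBacklog) {
        if (lastEmittedIsBacklog != null && lastEmittedIsBacklog == isBacklog) {
            return false; // value unchanged -> suppress the duplicate event
        }
        lastEmittedIsBacklog = isBacklog;
        return true;
    }
}
```

With this, the number of RecordAttributes events is bounded by the number of
isBacklog transitions rather than the number of records.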


> affecting performance. FLIP mentions that it will be applied to Operator
> and Sink, I try to understand its role and please correct me if I'm wrong.
> a) It tells the Operator and Sink that current most of data they are
> processing are from snapshot and are "insert" data? For the out of order in
> flink, the Operator and Sink may receive "upsert" data from other sources.
>

The RecordAttributes currently proposed in the FLIP only provides the
IsBacklog information, which tells the operator (including sink operator)
whether the records received after this RecordAttributes event are
"backlog". Note that a snapshot (e.g. a MySQL CDC snapshot) is one particular
case which can be classified as backlog. But we might introduce more
strategies to classify records as backlog in the future.

Currently, RecordAttributes does not specify whether the following records
are insert-only or upsert. We might introduce such an attribute if there
is a good use-case for having it.


> b) Do Operators and Sink perform any very special 

Re: [DISCUSS] FLIP-314: Support Customized Job Lineage Listener

2023-07-06 Thread Jing Ge
Hi Shammon,

Thanks for the clarification. Atlas might have its historical reasons going
back to the Hadoop era, or maybe even back to Hibernate, where Entity and
Relation were commonly used. Flink already uses Vertex and Edge to describe
the DAG. Some popular tools like dbt also use this convention[1] and,
afaik, most graph frameworks use vertex and edge too. It will be easier for
Flink devs and users to have a consistent naming convention for the same
concept, i.e. in this case, the DAG.
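
To illustrate the suggestion (purely hypothetical classes, not the FLIP's
proposed API):

```java
// Hypothetical naming sketch only: modeling lineage with the graph
// vocabulary (vertex/edge) that Flink already uses for its DAG.
class LineageVertex {
    final String name;
    LineageVertex(String name) { this.name = name; }
}

class LineageEdge {
    final LineageVertex source;
    final LineageVertex sink;
    LineageEdge(LineageVertex source, LineageVertex sink) {
        this.source = source;
        this.sink = sink;
    }
}
```

A job's lineage graph would then simply be a list of LineageEdge instances.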

Best regards,
Jing

[1]
https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-use-cases-and-examples#discovery

On Wed, Jul 5, 2023 at 11:28 AM Shammon FY  wrote:

> Hi Jing,
>
> Thanks for your feedback.
>
> > 1. TableColumnLineageRelation#sinkColumn() should return
> TableColumnLineageEntity instead of String, right?
>
> The `sinkColumn()` will return `String` which is the column name in the
> sink connector. I found the name of `TableColumnLineageEntity` may
> cause ambiguity and I have renamed it to `TableColumnSourceLineageEntity`.
> In my mind the `TableColumnLineageRelation` represents the lineage for each
> sink column, each column may be computed from multiple sources and columns.
> I use `TableColumnSourceLineageEntity` to manage each source and its
> columns for the sink column, so `TableColumnLineageRelation` has a sink
> column name and `TableColumnSourceLineageEntity` list.
>
> > 2. Since LineageRelation already contains all information to build the
> lineage between sources and sink, do we still need to set the LineageEntity
> in the source?
>
> The lineage interface of `DataStream` is very flexible. We have added
> `setLineageEntity` to the source to limit and verify user behavior,
> ensuring that users have not added non-existent sources as lineage.
>
> > 3. About the "Entity" and "Relation" naming, I was confused too, like
> Qingsheng mentioned. How about LineageVertex, LineageEdge, and LineageEdges
> which contains multiple LineageEdge?
>
> We referred to `Atlas` for the lineage naming; it uses `Entity` and
> `Relation` to represent the lineage relationship, and another metadata
> service, `Datahub`, uses `DataSet` to represent the entity. I think `Entity`
> and `Relation` are nicer for lineage; what do you think?
>
> Best,
> Shammon FY
>
>
> On Thu, Jun 29, 2023 at 4:21 AM Jing Ge 
> wrote:
>
> > Hi Shammon,
> >
> > Thanks for your proposal. After reading the FLIP, I'd like to ask
> > some questions to make sure we are on the same page. Thanks!
> >
> > 1. TableColumnLineageRelation#sinkColumn() should return
> > TableColumnLineageEntity instead of String, right?
> >
> > 2. Since LineageRelation already contains all information to build the
> > lineage between sources and sink, do we still need to set the
> LineageEntity
> > in the source?
> >
> > 3. About the "Entity" and "Relation" naming, I was confused too, like
> > Qingsheng mentioned. How about LineageVertex, LineageEdge, and
> LineageEdges
> > which contains multiple LineageEdge? E.g. multiple sources join into one
> > sink, or, edges of columns from one or different tables, etc.
> >
> > Best regards,
> > Jing
> >
> > On Sun, Jun 25, 2023 at 2:06 PM Shammon FY  wrote:
> >
> > > Hi yuxia and Yun,
> > >
> > > Thanks for your input.
> > >
> > > For yuxia:
> > > > 1: What kinds of JobStatus will the `JobExecutionStatusEven`
> including?
> > >
> > > At present, we only need to notify the listener when a job goes to
> > > termination, but I think it makes sense to add generic `oldStatus` and
> > > `newStatus` in the listener and users can update the job state in their
> > > service as needed.
> > >
> > > > 2: I'm really confused about the `config()` included in
> > `LineageEntity`,
> > > where is it from and what is it for ?
> > >
> > > The `config` in `LineageEntity` is used for users to get options for
> > source
> > > and sink connectors. As the examples in the FLIP, users can add
> > > server/group/topic information in the config for kafka and create
> lineage
> > > entities for `DataStream` jobs, then the listeners can get this
> > information
> > > to identify the same connector in different jobs. Otherwise, the
> `config`
> > > in `TableLineageEntity` will be the same as `getOptions` in
> > > `CatalogBaseTable`.
> > >
> > > > 3: Regardless whether `inputChangelogMode` in
> `TableSinkLineageEntity`
> > is
> > > needed or not, since `TableSinkLineageEntity` contains
> > > `inputChangelogMode`, why `TableSourceLineageEntity` don't contain
> > > changelogmode?
> > >
> > > At present, we do not actually use the changelog mode. It can be
> deleted,
> > > and I have updated FLIP.
> > >
> > > > Btw, since there're a lot interfaces proposed, I think it'll be
> better
> > to
> > > give an example about how to implement a listener in this FLIP to make
> us
> > > know better about the interfaces.
> > >
> > > I have added the example in the FLIP and the related interfaces and
> > > examples are in branch [1].
> > >
> > > For Yun:
> > > > I have one more 

Re: [DISCUSS] FLIP-329: Add operator attribute to specify support for object-reuse

2023-07-06 Thread Jing Ge
Hi Dong,

one scenario I could imagine is that users enable the global object-reuse
feature but force deep copies for some specific user-defined functions
because of certain limitations. But that is only my gut feeling. And agreed,
we could keep the solution simple for now, as the FLIP describes, and upgrade
to 3VL once such real requirements actually arise.
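
The 3VL decision I have in mind could be sketched like this (names invented
for the example; not the FLIP's proposed API):

```java
// Hedged sketch of the 3VL idea: a per-operator Boolean flag that defaults
// to null falls back to the global pipeline.object-reuse switch, while an
// explicit true/false overrides the global switch in either direction.
class ObjectReusePolicy {
    static boolean reuseEnabled(boolean globalSwitch, Boolean operatorFlag) {
        // null means "unspecified" -> backward compatible with the global switch
        return operatorFlag != null ? operatorFlag : globalSwitch;
    }
}
```

This reproduces pattern 4: combinations 4.1-4.4 when operatorFlag is null,
and 4.5-4.8 when it is set explicitly.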

Best regards,
Jing

On Thu, Jul 6, 2023 at 12:30 PM Dong Lin  wrote:

> Hi Jing,
>
> Thank you for the detailed explanation. Please see my reply inline.
>
> On Thu, Jul 6, 2023 at 3:17 AM Jing Ge  wrote:
>
>> Hi Xuannan, Hi Dong,
>>
>> Thanks for your clarification.
>>
>> @Xuannan
>>
>> A Jira ticket has been created for the doc update:
>> https://issues.apache.org/jira/browse/FLINK-32546
>>
>> @Dong
>>
>> I don't have a concrete example. I just thought about it from a
>> conceptual or pattern's perspective. Since we have 1. coarse-grained global
>> switch(CGS as abbreviation), i.e. the pipeline.object-reuse and 2.
>> fine-grained local switch(FGS as abbreviation), i.e. the
>> objectReuseCompliant variable for specific operators/functions, there will
>> be the following patterns with appropriate combinations:
>>
>> pattern 1: coarse-grained switch only. Local object reuse will be
>> controlled by the coarse-grained switch:
>> 1.1 cgs == true -> local object reused enabled
>> 1.2 cgs == true  -> local object reused enabled
>> 1.3 cgs == false -> local object reused disabled, i.e. deep copy enabled
>> 1.4 cgs == false -> local object reused disabled, i.e. deep copy enabled
>>
>> afaiu, this is the starting point. I wrote 4 on purpose to make the
>> regression check easier. We can consider it as the combinations with
>> cgs(true/false) and fgs(true/false) while fgs is ignored.
>>
>> Now we introduce fine-grained switch. There will be two patterns:
>>
>> pattern 2: fine-grained switch over coarse-grained switch. Coarse-grained
>> switch will be ignored when the local fine-grained switch has different
>> value:
>> 2.1 cgs == true and fgs == true -> local object reused enabled
>> 2.2 cgs == true and fgs == false -> local object reused disabled, i.e.
>> deep copy enabled
>> 2.3 cgs == false and fgs == true -> local object reused enabled
>> 2.4 cgs == false and fgs == false -> local object reused disabled, i.e.
>> deep copy enabled
>>
>> cgs is actually ignored.
>>
>> Current FLIP is using a slightly different pattern:
>>
>> pattern 3: fine-grained switch over coarse-grained switch only when
>> coarse-grained switch is off, i..e cgs OR fgs:
>> 3.1 cgs == true and fgs == true -> local object reused enabled
>> 3.2 cgs == true and fgs == false -> local object reused enabled
>> 3.3 cgs == false and fgs == true -> local object reused enabled
>> 3.4 cgs == false and fgs == false -> local object reused disabled, i.e.
>> deep copy enabled
>>
>> All of those patterns are rational and each has different focus. It
>> depends on the real requirement to choose one of them.
>>
>> As we can see, if fgs is using 2VL, there is a regression between pattern
>> 1 and pattern 2. You are absolutely right in this case. That's why I
>> suggested 3VL, i.e. fgs will have triple values: true, false, unknown(e.g.
>> null)
>>
>> pattern 4: 3VL fgs with the null as init value (again, there are just two
>> combination, I made it 4 on purpose):
>> 4.1 cgs == true and fgs == null -> local object reused enabled
>> 4.2 cgs == true and fgs == null -> local object reused enabled
>> 4.3 cgs == false and fgs == null -> local object reused disabled, i.e.
>> deep copy enabled
>> 4.4 cgs == false and fgs == null -> local object reused disabled, i.e.
>> deep copy enabled
>>
>> Since the default value of fgs is null, pattern 4 is backward compatible
>> with pattern 1, which means no regression.
>>
>> Now we will set value to fgs and follow the pattern 2:
>> 4.5 cgs == true and fgs == true -> local object reused enabled
>> 4.6 cgs == true and fgs == false -> local object reused disabled, i.e.
>> deep copy enabled
>> 4.7 cgs == false and fgs == true -> local object reused enabled
>> 4.8 cgs == false and fgs == false -> local object reused disabled, i.e.
>> deep copy enabled
>>
>> Pattern 4 contains pattern 3 with the following combinations(force
>> enabling local object reuse):
>> 4.5 cgs == true and fgs == true -> local object reused enabled
>> 4.2 cgs == true and fgs == null -> local object reused enabled
>> 4.7 cgs == false and fgs == true -> local object reused enabled
>> 4.4 cgs == false and fgs == null -> local object reused disabled, i.e.
>> deep copy enabled
>>
>> Comparing pattern 4 to pattern 3, user will have one additional
>> flexibility to control(force disabling) the local object reuse capability
>> because of 3VL, i.e. 4.2+4.6 vs. 3.2.
>>
>
> I think you are suggesting to allow the user setting fgs to null so that
> "user will have one additional flexibility to control(force disabling) the
> local object reuse capability".
>
> In general, an API that only allows false/true is a bit more 

Re: [ANNOUNCE] Apache Flink has won the 2023 SIGMOD Systems Award

2023-07-06 Thread Piotr Nowojski
Congratulations to everyone :)

Piotrek

On Thu, Jul 6, 2023 at 12:23, Jane Chan  wrote:

> Congratulations!
>
> Best,
> Jane
>
> On Thu, Jul 6, 2023 at 2:15 PM Jiadong Lu  wrote:
>
> > Congratulations!
> >
> > Best regards,
> > Jiadong Lu
> >
> > On 2023/7/6 13:26, Weihua Hu wrote:
> > > Congratulations!
> > >
> > > Best,
> > > Weihua
> > >
> > >
> > > On Wed, Jul 5, 2023 at 5:46 PM Shammon FY  wrote:
> > >
> > >> Congratulations!
> > >>
> > >> Best,
> > >> Shammon FY
> > >>
> > >> On Wed, Jul 5, 2023 at 2:38 PM Paul Lam 
> wrote:
> > >>
> > >>> Congrats and cheers!
> > >>>
> > >>> Best,
> > >>> Paul Lam
> > >>>
> >  2023年7月4日 18:04,Benchao Li  写道:
> > 
> >  Congratulations!
> > 
> >  Feng Jin  于2023年7月4日周二 16:17写道:
> > 
> > > Congratulations!
> > >
> > > Best,
> > > Feng Jin
> > >
> > >
> > >
> > > On Tue, Jul 4, 2023 at 4:13 PM Yuxin Tan 
> > >>> wrote:
> > >
> > >> Congratulations!
> > >>
> > >> Best,
> > >> Yuxin
> > >>
> > >>
> > >> Dunn Bangui  于2023年7月4日周二 16:04写道:
> > >>
> > >>> Congratulations!
> > >>>
> > >>> Best,
> > >>> Bangui Dunn
> > >>>
> > >>> Yangze Guo  于2023年7月4日周二 15:59写道:
> > >>>
> >  Congrats everyone!
> > 
> >  Best,
> >  Yangze Guo
> > 
> >  On Tue, Jul 4, 2023 at 3:53 PM Rui Fan <1996fan...@gmail.com>
> > >> wrote:
> > >
> > > Congratulations!
> > >
> > > Best,
> > > Rui Fan
> > >
> > > On Tue, Jul 4, 2023 at 2:08 PM Zhu Zhu 
> > wrote:
> > >
> > >> Congratulations everyone!
> > >>
> > >> Thanks,
> > >> Zhu
> > >>
> > >> Hang Ruan  于2023年7月4日周二 14:06写道:
> > >>>
> > >>> Congratulations!
> > >>>
> > >>> Best,
> > >>> Hang
> > >>>
> > >>> Jingsong Li  于2023年7月4日周二 13:47写道:
> > >>>
> >  Congratulations!
> > 
> >  Thank you! All of the Flink community!
> > 
> >  Best,
> >  Jingsong
> > 
> >  On Tue, Jul 4, 2023 at 1:24 PM tison 
> > >>> wrote:
> > >
> > > Congrats and with honor :D
> > >
> > > Best,
> > > tison.
> > >
> > >
> > > Mang Zhang  于2023年7月4日周二 11:08写道:
> > >
> > >> Congratulations!--
> > >>
> > >> Best regards,
> > >> Mang Zhang
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> 在 2023-07-04 01:53:46,"liu ron"  写道:
> > >>> Congrats everyone
> > >>>
> > >>> Best,
> > >>> Ron
> > >>>
> > >>> Jark Wu  于2023年7月3日周一 22:48写道:
> > >>>
> >  Congrats everyone!
> > 
> >  Best,
> >  Jark
> > 
> > > 2023年7月3日 22:37,Yuval Itzchakov 
> > >> 写道:
> > >
> > > Congrats team!
> > >
> > > On Mon, Jul 3, 2023, 17:28 Jing Ge via user <
> >  u...@flink.apache.org
> >  > wrote:
> > >> Congratulations!
> > >>
> > >> Best regards,
> > >> Jing
> > >>
> > >>
> > >> On Mon, Jul 3, 2023 at 3:21 PM yuxia <
> >  luoyu...@alumni.sjtu.edu.cn
> >  > wrote:
> > >>> Congratulations!
> > >>>
> > >>> Best regards,
> > >>> Yuxia
> > >>>
> > >>> 发件人: "Pushpa Ramakrishnan" <
> >  pushpa.ramakrish...@icloud.com
> >   >  pushpa.ramakrish...@icloud.com>>
> > >>> 收件人: "Xintong Song"  > >  >  tonysong...@gmail.com>>
> > >>> 抄送: "dev"  > >> dev@flink.apache.org>>,
> >  "User"  > >>> u...@flink.apache.org
> > >>
> > >>> 发送时间: 星期一, 2023年 7 月 03日 下午 8:36:30
> > >>> 主题: Re: [ANNOUNCE] Apache Flink has won the 2023
> > >>> SIGMOD
> > >> Systems
> > >> Award
> > >>>
> > >>> Congratulations 🥳
> > >>>
> > >>> On 03-Jul-2023, at 3:30 PM, Xintong Song <
> > >> tonysong...@gmail.com
> >  > wrote:
> > >>>
> > >>> 
> > >>> Dear Community,
> > >>>
> > >>> I'm pleased to share this good news with everyone.
> > >> As
> >  

Re: [DISCUSS] FLIP-327: Support stream-batch unified operator to improve job throughput when processing backlog data

2023-07-06 Thread Dong Lin
Hi Piotr,

Thanks for your comments! Please see my reply inline.

On Wed, Jul 5, 2023 at 11:44 PM Piotr Nowojski 
wrote:

> Hi Dong,
>
> I have a couple of questions.
>
> Could you explain why those properties
>
> @Nullable private Boolean isOutputOnEOF = null;
> @Nullable private Boolean isOutputOnCheckpoint = null;
> @Nullable private Boolean isInternalSorterSupported = null;
>
> must be `@Nullable`, instead of having the default value set to `false`?
>

By initializing these private variables in OperatorAttributesBuilder as
null, we can implement `OperatorAttributesBuilder#build()` in such a way
that it logs at DEBUG level, e.g. "isOutputOnCheckpoint is not
explicitly set". This can help users/SREs debug performance issues (or the
lack of an expected optimization) caused by operators not explicitly setting
the right operator attribute.

For example, we might want a job to always use the longer checkpointing
interval (i.e. execution.checkpointing.interval-during-backlog) if all
running operators have isOutputOnCheckpoint==false, and use the short
checkpointing interval otherwise. If a user has explicitly configured the
execution.checkpointing.interval-during-backlog but the two-phase commit
sink library has not been upgraded to set isOutputOnCheckpoint=true, then
the job will end up using the long checkpointing interval, and it will be
useful to figure out what is going wrong in this case by checking the log.
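
The interval selection described in this example can be sketched as follows; the class and method names here are assumptions for illustration, not Flink's actual configuration plumbing:

```java
import java.util.List;

// Illustrative sketch: the long interval
// (execution.checkpointing.interval-during-backlog) is used only when every
// running operator has isOutputOnCheckpoint == false.
public class CheckpointIntervalSelector {
    static long selectInterval(List<Boolean> isOutputOnCheckpoint,
                               long shortIntervalMs, long longIntervalMs) {
        // If any operator emits output on checkpoint, keep the short interval.
        boolean anyOutputsOnCheckpoint = isOutputOnCheckpoint.stream().anyMatch(b -> b);
        return anyOutputsOnCheckpoint ? shortIntervalMs : longIntervalMs;
    }
}
```

A two-phase commit sink that has not been upgraded to set isOutputOnCheckpoint=true would contribute `false` here, which is exactly the debugging scenario described above.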

Note that the default value of these fields of the OperatorAttributes
instance built by OperatorAttributesBuilder will still be false. The
following is mentioned in the Java doc of
`OperatorAttributesBuilder#build()`:

 /**
  * If any operator attribute is null, we will log it at DEBUG level and
use the following
  * default values.
  * - isOutputOnEOF defaults to false
  * - isOutputOnCheckpoint defaults to false
  * - isInternalSorterSupported defaults to false
  */
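
A hypothetical sketch of this null-handling, with simplified names; this is not the actual Flink implementation:

```java
// Hypothetical sketch of the nullable-attribute resolution described above.
public class OperatorAttributesBuilder {
    // null means "not explicitly set by the operator"
    private Boolean isOutputOnEOF = null;
    private Boolean isOutputOnCheckpoint = null;
    private Boolean isInternalSorterSupported = null;

    public OperatorAttributesBuilder setOutputOnCheckpoint(boolean value) {
        this.isOutputOnCheckpoint = value;
        return this;
    }

    /** Resolves a nullable attribute, logging at DEBUG level when it was never set. */
    static boolean resolve(String name, Boolean value) {
        if (value == null) {
            System.out.println("DEBUG: " + name + " is not explicitly set, defaulting to false");
            return false; // default value per the Java doc quoted above
        }
        return value;
    }

    public boolean buildIsOutputOnCheckpoint() {
        return resolve("isOutputOnCheckpoint", isOutputOnCheckpoint);
    }
}
```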


>
> Second question, have you thought about cases where someone is
> either bootstrapping from a streaming source like Kafka
> or simply trying to catch up after a long period of downtime in a purely
> streaming job? Generally speaking a cases where
> user doesn't care about latency in the catch up phase, regardless if the
> source is bounded or unbounded, but wants to process
> the data as fast as possible, and then switch dynamically to real time
> processing?
>

Yes, I have thought about this. We should allow this job to effectively run
in batch mode when the job is in the catch-up phase. FLIP-327 is actually
an important step toward addressing this use-case.

In order to address the above use-case, all we need is a way for source
operator (e.g. Kafka) to tell Flink runtime (via IsProcessingBacklog)
whether it is in the catch-up phase.

Since every Kafka message has event-timestamp, we can allow users to
specify a job-level config such as backlog-watermark-lag-threshold, and
consider a Kafka Source to have IsProcessingBacklog=true if system_time -
watermark > backlog-watermark-lag-threshold. This effectively allows us to
determine whether Kafka is in the catch-up phase.
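
As a sketch, the watermark-lag check could look like the following; the config idea (`backlog-watermark-lag-threshold`) comes from the text above, while the class and method names are assumptions for illustration:

```java
// Illustrative sketch of the watermark-lag backlog check; not Flink API.
public class BacklogDetector {
    private final long lagThresholdMs;

    public BacklogDetector(long lagThresholdMs) {
        this.lagThresholdMs = lagThresholdMs;
    }

    /** The source is considered to be processing backlog when its watermark
     *  lags behind the current system time by more than the threshold. */
    public boolean isProcessingBacklog(long systemTimeMs, long watermarkMs) {
        return systemTimeMs - watermarkMs > lagThresholdMs;
    }
}
```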

Once we have this capability (I plan to work on this in FLIP-328), we can
directly use the features proposed in FLIP-325 and FLIP-327 to optimize the
above use-case.

What do you think?

Best,
Dong


>
> Best,
> Piotrek
>
> On Sun, Jul 2, 2023 at 16:15, Dong Lin  wrote:
>
> > Hi all,
> >
> > I am opening this thread to discuss FLIP-327: Support stream-batch
> unified
> > operator to improve job throughput when processing backlog data. The
> design
> > doc can be found at
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-327%3A+Support+stream-batch+unified+operator+to+improve+job+throughput+when+processing+backlog+data
> > .
> >
> > This FLIP enables a Flink job to initially operate in batch mode,
> achieving
> > high throughput while processing records that do not require low
> processing
> > latency. Subsequently, the job can seamlessly transition to stream mode
> for
> > processing real-time records with low latency. Importantly, the same
> state
> > can be utilized before and after this mode switch, making it particularly
> > valuable when users wish to bootstrap the job's state using historical
> > data.
> >
> > We would greatly appreciate any comments or feedback you may have on this
> > proposal.
> >
> > Cheers,
> > Dong
> >
>


[VOTE] FLIP-322 Cooldown period for adaptive scheduler. Second vote.

2023-07-06 Thread Etienne Chauchot

Hi all,

Thanks for your feedback about the FLIP-322: Cooldown period for 
adaptive scheduler [1].


This FLIP was discussed in [2].

I'd like to start a vote for it. The vote will be open for at least 72
hours (until July 9th 15:00 GMT) unless there is an objection or
insufficient votes.

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-322+Cooldown+period+for+adaptive+scheduler

[2] https://lists.apache.org/thread/qvgxzhbp9rhlsqrybxdy51h05zwxfns6

Best,

Etienne


Re: [DISCUSS] FLIP-322 Cooldown period for adaptive scheduler

2023-07-06 Thread Etienne Chauchot

Hi,

I think we have reached a consensus here. I have updated the FLIP to 
reflect recent suggestions. I will start a new vote.


Best

Etienne

On 05/07/2023 at 14:42, Etienne Chauchot wrote:


Hi all,

Thanks David for your suggestions. Comments inline.

On 04/07/2023 at 13:35, David Morávek wrote:

waiting 2 min between 2 requirements push seems ok to me

This depends on the workload. Would you care if the cost of rescaling were
close to zero (which is for most out-of-the-box workloads)? In that case,
it would be desirable to rescale more frequently, for example, if TMs join
incrementally.

Creating a value that covers everything is impossible unless it's
self-tuning, so I'd prefer having a smooth experience for people trying
things out (just imagine doing a demo at the conference) and having them
opt-in for longer cooldowns.

The users still have the ability to lower the cooldown period for high 
workloads but we could definitely set a default value to a lower 
number. I agree to favor lower numbers (for smooth rescale experience) 
and consider higher numbers (for high workloads) as exceptions. But we 
still need to agree on a suitable default for most cases: 30s?

One idea to keep the timeouts lower while getting more balance would be
restarting the cooldown period when new resources or requirements are
received. This would also bring the cooldown's behavior closer to the
resource-stabilization timeout. Would that make sense?



you mean: if slots are received during the cooldown period, do behavior 
(B) instead of the proposed behavior (A)?


A. schedule a rescale at lastRescale + cooldown point in time

B. schedule a rescale at ** now ** + cooldown point in time

It looks fine to me. It is even better because it avoids having 2 
rescales scheduled at the same time if 2 slots change arrive during 
the same cooldown period.
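
A tiny sketch of the difference between the two options, with assumed method names:

```java
// Sketch of the two scheduling behaviors discussed above; not Flink code.
public class RescaleCooldown {
    /** Behavior A: schedule the rescale relative to the last rescale. */
    static long scheduleA(long lastRescaleMs, long cooldownMs) {
        return lastRescaleMs + cooldownMs;
    }

    /** Behavior B: restart the cooldown whenever new resources/requirements
     *  arrive, so two slot changes within one cooldown period collapse into
     *  a single scheduled rescale. */
    static long scheduleB(long nowMs, long cooldownMs) {
        return nowMs + cooldownMs;
    }
}
```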



Etienne



Depends on how you implement it. If you ignore all of shouldRescale, yes,

but you shouldn't do that in the first place.



I agree, this is not what I planned to implement.



That sounds great; let's go ahead and outline this in the FLIP.

Best,
D.


On Tue, Jul 4, 2023 at 12:30 PM Etienne Chauchot
wrote:


Hi all,

Thanks David for your feedback. My comments are inline

On 04/07/2023 at 09:16, David Morávek wrote:

They will struggle if they add new resources and nothing happens for 5

minutes.

The same applies if they start playing with FLIP-291 APIs. I'm wondering

if

the cooldown makes sense there since it was the user's deliberate choice

to

push new requirements. 樂

Sure, but remember that the initial rescale is always done immediately.
Only the time between 2 rescales is controlled by the cooldown period. I
don't see a user adding resources every 10s (your proposed default
value), and even with, let's say, 2 min, waiting 2 min between 2
requirement pushes seems OK to me.



Best,
D.

On Tue, Jul 4, 2023 at 9:11 AM David Morávek   wrote:


The FLIP reads sane to me. I'm unsure about the default values, though;

5

minutes of wait time between rescales feels rather strict, and we should
rethink it to provide a better out-of-the-box experience.

I'd focus on newcomers trying AS / Reactive Mode out. They will struggle
if they add new resources and nothing happens for 5 minutes. I'd suggest
defaulting to
*jobmanager.adaptive-scheduler.resource-stabilization-timeout* (which
defaults to 10s).

If users add resources, the re-scale will happen right away. It is only
for next additions that they will have to wait for the coolDown period
to end.

But anyway, we could lower the default value, I just took what Robert
suggested in the ticket.



I'm still struggling to grasp max internal (force rescale). Ignoring

`AdaptiveScheduler#shouldRescale()`

condition seems rather dangerous. Wouldn't a simple case where you add a
new TM and remove it before the max interval is reached (so there is
nothing to do) result in an unnecessary job restart?

With current behavior (on master) : adding the TM will result in
restarting if the number of slots added leads to job parallelism
increase of more than 2. Then removing it can have 2 consequences:
either it is removed before the resource-stabilisation timeout and there
will be no restart. Or it is removed after this timeout (the job is in
Running state) and it will entail another restart and parallelism decrease.

With the proposed behavior: what the scaling-interval.max will change is
only on the resource addition part: when the TM is added, if the time
since last rescale > scaling-interval.max, then a restart and
parallelism increase will be done even if it leads to a parallelism
increase < 2. The rest regarding TM removal does not change.

=> So, the real difference with the current behavior is ** if the slots
addition was too little ** : in the current behavior nothing happens. In
the new behavior nothing happens unless the addition arrives after

Re: [DISCUSS] FLIP-329: Add operator attribute to specify support for object-reuse

2023-07-06 Thread Dong Lin
Hi Jing,

Thank you for the detailed explanation. Please see my reply inline.

On Thu, Jul 6, 2023 at 3:17 AM Jing Ge  wrote:

> Hi Xuannan, Hi Dong,
>
> Thanks for your clarification.
>
> @Xuannan
>
> A Jira ticket has been created for the doc update:
> https://issues.apache.org/jira/browse/FLINK-32546
>
> @Dong
>
> I don't have a concrete example. I just thought about it from a conceptual
> or pattern's perspective. Since we have 1. coarse-grained global switch(CGS
> as abbreviation), i.e. the pipeline.object-reuse and 2. fine-grained local
> switch(FGS as abbreviation), i.e. the objectReuseCompliant variable for
> specific operators/functions, there will be the following patterns with
> appropriate combinations:
>
> pattern 1: coarse-grained switch only. Local object reuse will be
> controlled by the coarse-grained switch:
> 1.1 cgs == true -> local object reused enabled
> 1.2 cgs == true  -> local object reused enabled
> 1.3 cgs == false -> local object reused disabled, i.e. deep copy enabled
> 1.4 cgs == false -> local object reused disabled, i.e. deep copy enabled
>
> afaiu, this is the starting point. I wrote 4 on purpose to make the
> regression check easier. We can consider it as the combinations with
> cgs(true/false) and fgs(true/false) while fgs is ignored.
>
> Now we introduce fine-grained switch. There will be two patterns:
>
> pattern 2: fine-grained switch over coarse-grained switch. Coarse-grained
> switch will be ignored when the local fine-grained switch has different
> value:
> 2.1 cgs == true and fgs == true -> local object reused enabled
> 2.2 cgs == true and fgs == false -> local object reused disabled, i.e.
> deep copy enabled
> 2.3 cgs == false and fgs == true -> local object reused enabled
> 2.4 cgs == false and fgs == false -> local object reused disabled, i.e.
> deep copy enabled
>
> cgs is actually ignored.
>
> Current FLIP is using a slightly different pattern:
>
> pattern 3: fine-grained switch over coarse-grained switch only when
> coarse-grained switch is off, i..e cgs OR fgs:
> 3.1 cgs == true and fgs == true -> local object reused enabled
> 3.2 cgs == true and fgs == false -> local object reused enabled
> 3.3 cgs == false and fgs == true -> local object reused enabled
> 3.4 cgs == false and fgs == false -> local object reused disabled, i.e.
> deep copy enabled
>
> All of those patterns are rational and each has different focus. It
> depends on the real requirement to choose one of them.
>
> As we can see, if fgs is using 2VL, there is a regression between pattern
> 1 and pattern 2. You are absolutely right in this case. That's why I
> suggested 3VL, i.e. fgs will have triple values: true, false, unknown(e.g.
> null)
>
> pattern 4: 3VL fgs with the null as init value (again, there are just two
> combination, I made it 4 on purpose):
> 4.1 cgs == true and fgs == null -> local object reused enabled
> 4.2 cgs == true and fgs == null -> local object reused enabled
> 4.3 cgs == false and fgs == null -> local object reused disabled, i.e.
> deep copy enabled
> 4.4 cgs == false and fgs == null -> local object reused disabled, i.e.
> deep copy enabled
>
> Since the default value of fgs is null, pattern 4 is backward compatible
> with pattern 1, which means no regression.
>
> Now we will set value to fgs and follow the pattern 2:
> 4.5 cgs == true and fgs == true -> local object reused enabled
> 4.6 cgs == true and fgs == false -> local object reused disabled, i.e.
> deep copy enabled
> 4.7 cgs == false and fgs == true -> local object reused enabled
> 4.8 cgs == false and fgs == false -> local object reused disabled, i.e.
> deep copy enabled
>
> Pattern 4 contains pattern 3 with the following combinations(force
> enabling local object reuse):
> 4.5 cgs == true and fgs == true -> local object reused enabled
> 4.2 cgs == true and fgs == null -> local object reused enabled
> 4.7 cgs == false and fgs == true -> local object reused enabled
> 4.4 cgs == false and fgs == null -> local object reused disabled, i.e.
> deep copy enabled
>
> Comparing pattern 4 to pattern 3, user will have one additional
> flexibility to control(force disabling) the local object reuse capability
> because of 3VL, i.e. 4.2+4.6 vs. 3.2.
>

I think you are suggesting to allow the user setting fgs to null so that
"user will have one additional flexibility to control(force disabling) the
local object reuse capability".

In general, an API that allows false/true/null is a bit more complex to
implement than an API that only allows false/true. All things being equal,
I believe it is preferred to have a simpler public API.
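
For concreteness, the three-valued fallback semantics under discussion (pattern 4 in the quoted text) can be sketched as follows; the names are illustrative, not Flink API:

```java
// Minimal sketch of 3VL resolution: a per-operator flag (fgs) that may be
// true, false, or null (unset), with null falling back to the global
// pipeline.object-reuse switch (cgs).
public class ObjectReusePolicy {
    /** Returns true when local object reuse is enabled. */
    static boolean objectReuseEnabled(boolean cgs, Boolean fgs) {
        // Unset fine-grained switch -> defer to the coarse-grained switch.
        return fgs != null ? fgs : cgs;
    }
}
```

Combinations 4.1-4.8 above all follow from this single fallback rule.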

I understand you are coming from a conceptual perspective and trying to
make it similar to hierarchical RBAC. However, after thinking through this,
I still could not find a use-case where we actually need this flexibility.
In particular, for cases where a user has explicitly configured
pipeline.object-reuse to true, that means the user already knows (or takes
the responsibility of ensuring) that 

Re: [ANNOUNCE] Apache Flink has won the 2023 SIGMOD Systems Award

2023-07-06 Thread Jane Chan
Congratulations!

Best,
Jane

On Thu, Jul 6, 2023 at 2:15 PM Jiadong Lu  wrote:

> Congratulations!
>
> Best regards,
> Jiadong Lu
>
> On 2023/7/6 13:26, Weihua Hu wrote:
> > Congratulations!
> >
> > Best,
> > Weihua
> >
> >
> > On Wed, Jul 5, 2023 at 5:46 PM Shammon FY  wrote:
> >
> >> Congratulations!
> >>
> >> Best,
> >> Shammon FY
> >>
> >> On Wed, Jul 5, 2023 at 2:38 PM Paul Lam  wrote:
> >>
> >>> Congrats and cheers!
> >>>
> >>> Best,
> >>> Paul Lam
> >>>
>  2023年7月4日 18:04,Benchao Li  写道:
> 
>  Congratulations!
> 
>  Feng Jin  于2023年7月4日周二 16:17写道:
> 
> > Congratulations!
> >
> > Best,
> > Feng Jin
> >
> >
> >
> > On Tue, Jul 4, 2023 at 4:13 PM Yuxin Tan 
> >>> wrote:
> >
> >> Congratulations!
> >>
> >> Best,
> >> Yuxin
> >>
> >>
> >> Dunn Bangui  于2023年7月4日周二 16:04写道:
> >>
> >>> Congratulations!
> >>>
> >>> Best,
> >>> Bangui Dunn
> >>>
> >>> Yangze Guo  于2023年7月4日周二 15:59写道:
> >>>
>  Congrats everyone!
> 
>  Best,
>  Yangze Guo
> 
>  On Tue, Jul 4, 2023 at 3:53 PM Rui Fan <1996fan...@gmail.com>
> >> wrote:
> >
> > Congratulations!
> >
> > Best,
> > Rui Fan
> >
> > On Tue, Jul 4, 2023 at 2:08 PM Zhu Zhu 
> wrote:
> >
> >> Congratulations everyone!
> >>
> >> Thanks,
> >> Zhu
> >>
> >> Hang Ruan  于2023年7月4日周二 14:06写道:
> >>>
> >>> Congratulations!
> >>>
> >>> Best,
> >>> Hang
> >>>
> >>> Jingsong Li  于2023年7月4日周二 13:47写道:
> >>>
>  Congratulations!
> 
>  Thank you! All of the Flink community!
> 
>  Best,
>  Jingsong
> 
>  On Tue, Jul 4, 2023 at 1:24 PM tison 
> >>> wrote:
> >
> > Congrats and with honor :D
> >
> > Best,
> > tison.
> >
> >
> > Mang Zhang  于2023年7月4日周二 11:08写道:
> >
> >> Congratulations!--
> >>
> >> Best regards,
> >> Mang Zhang
> >>
> >>
> >>
> >>
> >>
> >> 在 2023-07-04 01:53:46,"liu ron"  写道:
> >>> Congrats everyone
> >>>
> >>> Best,
> >>> Ron
> >>>
> >>> Jark Wu  于2023年7月3日周一 22:48写道:
> >>>
>  Congrats everyone!
> 
>  Best,
>  Jark
> 
> > 2023年7月3日 22:37,Yuval Itzchakov 
> >> 写道:
> >
> > Congrats team!
> >
> > On Mon, Jul 3, 2023, 17:28 Jing Ge via user <
>  u...@flink.apache.org
>  > wrote:
> >> Congratulations!
> >>
> >> Best regards,
> >> Jing
> >>
> >>
> >> On Mon, Jul 3, 2023 at 3:21 PM yuxia <
>  luoyu...@alumni.sjtu.edu.cn
>  > wrote:
> >>> Congratulations!
> >>>
> >>> Best regards,
> >>> Yuxia
> >>>
> >>> 发件人: "Pushpa Ramakrishnan" <
>  pushpa.ramakrish...@icloud.com
>    pushpa.ramakrish...@icloud.com>>
> >>> 收件人: "Xintong Song"  >   tonysong...@gmail.com>>
> >>> 抄送: "dev"  >> dev@flink.apache.org>>,
>  "User"  >>> u...@flink.apache.org
> >>
> >>> 发送时间: 星期一, 2023年 7 月 03日 下午 8:36:30
> >>> 主题: Re: [ANNOUNCE] Apache Flink has won the 2023
> >>> SIGMOD
> >> Systems
> >> Award
> >>>
> >>> Congratulations 🥳
> >>>
> >>> On 03-Jul-2023, at 3:30 PM, Xintong Song <
> >> tonysong...@gmail.com
>  > wrote:
> >>>
> >>> 
> >>> Dear Community,
> >>>
> >>> I'm pleased to share this good news with everyone.
> >> As
>  some
> >> of
>  you
> >> may
>  have already heard, Apache Flink has won the 2023
> > SIGMOD
>  Systems
>  Award
> >> [1].
> >>>
> >>> "Apache Flink greatly expanded the use of stream
>  data-processing."
> >> --
>  SIGMOD Awards Committee
> >>>
> >>> SIGMOD is one of the 

[jira] [Created] (FLINK-32551) Provide the possibility to have a savepoint taken by the operator when deleting a flinkdeployment

2023-07-06 Thread Nicolas Fraison (Jira)
Nicolas Fraison created FLINK-32551:
---

 Summary: Provide the possibility to have a savepoint taken by the 
operator when deleting a flinkdeployment
 Key: FLINK-32551
 URL: https://issues.apache.org/jira/browse/FLINK-32551
 Project: Flink
  Issue Type: Improvement
Reporter: Nicolas Fraison


Currently, if a FlinkDeployment is deleted, all the HA metadata is removed and no 
checkpoint is taken.

It would be great (for example, in case of a fat-finger deletion) to be able to configure 
the deployment to take a savepoint when the deployment is deleted.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] Flink 1.18 Feature Freeze Extended until July 24th, 2023

2023-07-06 Thread Jing Ge
Thanks for driving it and sharing the update!

Best regards,
Jing

On Thu, Jul 6, 2023 at 9:21 AM yuxia  wrote:

> Thanks for the update and thanks for your efforts.
>
> Best regards,
> Yuxia
>
> ----- Original Message -----
> From: "Rui Fan" <1996fan...@gmail.com>
> To: "dev" , re...@apache.org
> Cc: "Jing Ge" , snuyan...@gmail.com, "Konstantin
> Knauf" 
> Sent: Thursday, July 6, 2023, 3:06:28 PM
> Subject: Re: [ANNOUNCE] Flink 1.18 Feature Freeze Extended until July 24th, 2023
>
> Thanks for the update, and thank you for your efforts for the 1.18 release!
>
> Best,
> Rui Fan
>
> On Thu, Jul 6, 2023 at 2:40 PM Qingsheng Ren  wrote:
>
> > Hi devs,
> >
> > Recently we collected some feedback from developers, and in order to give
> > more time for polishing some important features in 1.18, we decide to
> > extend the feature freezing date to:
> >
> > July 24th, 2023, at 00:00 CEST(UTC+2)
> >
> > which gives us ~2 weeks for development from now. There will be no
> > extension after Jul 24, so please arrange new features in the next
> release
> > if they cannot be finished before the closing date.
> >
> > Thanks everyone for your work in 1.18!
> >
> > Best regards,
> > Qingsheng, Jing, Konstantin and Sergey
> >
>


Re: [DISCUSS] FLIP-325: Support configuring end-to-end allowed latency

2023-07-06 Thread Dong Lin
Hi Piotr,

Thanks for your comments. Please see my reply inline.


On Thu, Jul 6, 2023 at 1:54 AM Piotr Nowojski 
wrote:

> Hi,
>
> Thanks for this proposal, this is a very much needed thing that should be
> addressed in Flink.
>
> I think there is one thing that hasn't been discussed neither here nor in
> FLIP-309. Given that we have
> three dimensions:
> - e2e latency/checkpointing interval
> - enabling some kind of batching/buffering on the operator level
> - how much resources we want to allocate to the job
>
> How do we want Flink to adjust itself between those three? For example:
> a) Should we assume that given Job has a fixed amount of assigned resources
> and make it paramount that
>   Flink doesn't exceed those available resources? So in case of
> backpressure, we
>   should extend checkpointing intervals, emit records less frequently and
> in batches.


Yes, this FLIP works under the assumption (also the current behavior of
Flink in stream mode) that the given job has a fixed amount of the assigned
resources.

When any source reader has IsProcessingBacklog=true, it can
invoke Output#emitRecordAttributes to
emit RecordAttributes(isBacklog=true). This event tells the downstream
operators that they can optionally buffer the records received after this
RecordAttributes event until isBacklog=false. With the corresponding support
in the operator implementation, the job can achieve the goal of emitting
records less frequently and in batches.
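
A minimal, self-contained sketch of that buffering behavior, with assumed class and method names (not the FLIP-325 API):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: an operator buffers incoming records while
// isBacklog is true and flushes them in a batch once the backlog ends.
public class BufferingOperator<T> {
    private final List<T> buffer = new ArrayList<>();
    private final List<T> emitted = new ArrayList<>();
    private boolean isBacklog = false;

    /** Called when a RecordAttributes event arrives from upstream. */
    public void onRecordAttributes(boolean backlog) {
        this.isBacklog = backlog;
        if (!backlog) {
            emitted.addAll(buffer); // flush buffered records in one batch
            buffer.clear();
        }
    }

    public void processRecord(T record) {
        if (isBacklog) {
            buffer.add(record); // defer emission for higher throughput
        } else {
            emitted.add(record); // low-latency path
        }
    }

    public List<T> emittedRecords() {
        return emitted;
    }
}
```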

And yes, we can also extend the checkpointing interval when any source
reader is processing a backlog. By using the APIs proposed in FLIP-309, the
source reader can tell its isProcessingBacklog status to the checkpoint
coordinator, which can use the longer checkpointing interval (i.e.
execution.checkpointing.interval-during-backlog) as specified by the user.

Currently, FLIP-309 only supports adjusting the IsProcessingBacklog when
the source reader explicitly invokes an API (e.g. setProcessingBacklog). In
the future, we can add a plugin to support adjusting the IsProcessingBacklog
status based on e.g. backpressure metrics or any other reasonable source of
information in TM/JM, so that the checkpoint interval and the operator
buffer time can be adjusted accordingly (e.g. when backpressure is high).



> b) Or should we assume that the amount of resources is flexible (up to a
> point?), and the desired e2e latency
>   is the paramount aspect? So in case of backpressure, we should still
> adhere to the configured e2e latency,
>   and wait for the user or autoscaler to scale up the job?
>

I would say it will be an awesome feature to let Flink automatically adjust
its resource usage to meet the user's throughput/latency requirement with
minimal resources.

The following challenges would need to be addressed in order to have this
feature:
- Find the best way to allocate resources (e.g. CPU, memory, parallelism)
across operators.
- Predict the impact on the throughput/latency as we adjust the resource
usages, the buffering time, checkpointing interval etc.
- Balance between the overhead of adjusting/restarting operators and the
benefits of making resource adjustments.

I don't have a good idea of how to address the above issues nicely.
Supporting flexible resource usage is therefore currently out of the scope
of this FLIP.


> In case of a), I think the concept of "isProcessingBacklog" is not needed,
> we could steer the behaviour only
> using the backpressure information.
>

I guess the point is that the downstream operators (and also the checkpoint
coordinator) need a boolean value telling them whether they can start to
buffer records (and use the long checkpointing interval). We can also
rename this value as isUnderBackPressure, or any other name that we find
reasonable. I don't have a strong preference between the choices of this
name.

I think what matters is how to derive the boolean value of
IsProcessingBacklog (I will use this name for now). It seems that there are
a couple of valuable ways to derive this value, and no single one of them
can cover all scenarios with the best performance:

- Allow source reader to determine the IsProcessingBacklog based on the
watermark lag (only when event-time is available).
- Allow source reader to determine the IsProcessingBacklog based on the
backpressure (with the potential downside of introducing backlog when there
is a short interval of traffic spikes on just one reader)
- Allow source reader to determine the IsProcessingBacklog based on its
internal state/event (e.g. MySQL CDC switches from the snapshot phase to
the binlog phase).


> On the other hand, in case of b), "isProcessingBacklog" information might
> be helpful, to let Flink know that
> we can safely decrease the e2e latency/checkpoint interval even if there is
> no backpressure, to use fewer
> resources (and let the autoscaler scale down the job).
>
> Do we want to have both, or only one of those? Do a) and b) complement one
> another? If job is backpressured,
> we should 

[jira] [Created] (FLINK-32550) Single-parallel task consumes multi-partition data exception in pulsar

2023-07-06 Thread jiangchunyang (Jira)
jiangchunyang created FLINK-32550:
-

 Summary: Single-parallel task consumes multi-partition data 
exception in pulsar
 Key: FLINK-32550
 URL: https://issues.apache.org/jira/browse/FLINK-32550
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Pulsar
Affects Versions: 1.16.0
 Environment: flink 1.16.0

pulsar 2.11.1
Reporter: jiangchunyang


Data is sent to Pulsar every second by a program. When there is a 
partitioned topic with 3 partitions in Pulsar, consumed in the Exclusive 
subscription mode (verified: it is not a data problem, because normal tasks 
show no abnormal data), a task with parallelism 3 consumes this topic and 
everything is normal. But when I use a single-parallelism task to consume 
this topic, an exception occurs when some data is deserialized; the data is 
in Avro format, so the error reported is "Length is negative: -52". When I 
create a partitioned topic with a single partition and consume it with a 
single-parallelism task, there is no such problem: all the data is normal 
and the task does not fail.
I checked the logs and found that the consumers and topic partitions are 
allocated as expected.
Summary:
3 topic-partitions and 1 consumer: exception (3 partitions are 
allocated to a single consumer)
3 topic-partitions and 3 consumers: works normally
1 topic-partition and 1 consumer: works normally



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32549) Tiered storage memory manager supports ownership transfer for buffers

2023-07-06 Thread Yuxin Tan (Jira)
Yuxin Tan created FLINK-32549:
-

 Summary: Tiered storage memory manager supports ownership transfer 
for buffers
 Key: FLINK-32549
 URL: https://issues.apache.org/jira/browse/FLINK-32549
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Network
Affects Versions: 1.18.0
Reporter: Yuxin Tan
Assignee: Yuxin Tan


Currently, the accumulator is responsible for requesting all buffers, leading 
to an inaccurate number of requested buffers for each tier. 
To address this issue, buffer ownership must be transferred from the 
accumulator to the tiers when writing them, which will enable the memory 
manager to maintain a correct number of requested buffers for different owners.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] Flink 1.18 Feature Freeze Extended until July 24th, 2023

2023-07-06 Thread yuxia
Thanks for the update and thanks for your efforts.

Best regards,
Yuxia

----- Original Message -----
From: "Rui Fan" <1996fan...@gmail.com>
To: "dev" , re...@apache.org
Cc: "Jing Ge" , snuyan...@gmail.com, "Konstantin Knauf" 

Sent: Thursday, July 6, 2023, 3:06:28 PM
Subject: Re: [ANNOUNCE] Flink 1.18 Feature Freeze Extended until July 24th, 2023

Thanks for the update, and thank you for your efforts for the 1.18 release!

Best,
Rui Fan

On Thu, Jul 6, 2023 at 2:40 PM Qingsheng Ren  wrote:

> Hi devs,
>
> Recently we collected some feedback from developers, and in order to give
> more time for polishing some important features in 1.18, we decide to
> extend the feature freezing date to:
>
> July 24th, 2023, at 00:00 CEST(UTC+2)
>
> which gives us ~2 weeks for development from now. There will be no
> extension after Jul 24, so please arrange new features in the next release
> if they cannot be finished before the closing date.
>
> Thanks everyone for your work in 1.18!
>
> Best regards,
> Qingsheng, Jing, Konstantin and Sergey
>


Re: [ANNOUNCE] Flink 1.18 Feature Freeze Extended until July 24th, 2023

2023-07-06 Thread Rui Fan
Thanks for the update, and thank you for your efforts for the 1.18 release!

Best,
Rui Fan

On Thu, Jul 6, 2023 at 2:40 PM Qingsheng Ren  wrote:

> Hi devs,
>
> Recently we collected some feedback from developers, and in order to give
> more time for polishing some important features in 1.18, we decide to
> extend the feature freezing date to:
>
> July 24th, 2023, at 00:00 CEST(UTC+2)
>
> which gives us ~2 weeks for development from now. There will be no
> extension after Jul 24, so please arrange new features in the next release
> if they cannot be finished before the closing date.
>
> Thanks everyone for your work in 1.18!
>
> Best regards,
> Qingsheng, Jing, Konstantin and Sergey
>


[jira] [Created] (FLINK-32548) Make watermark alignment ready for production use

2023-07-06 Thread Rui Fan (Jira)
Rui Fan created FLINK-32548:
---

 Summary: Make watermark alignment ready for production use
 Key: FLINK-32548
 URL: https://issues.apache.org/jira/browse/FLINK-32548
 Project: Flink
  Issue Type: Improvement
Affects Versions: 1.17.1, 1.16.2
Reporter: Rui Fan
Assignee: Rui Fan


We found a series of watermark alignment bugs and performance issues, and we hope to make the feature production-ready in 1.18.0.

We also aim to fix all known bugs in 1.16.3 and 1.17.2.
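For context, the core idea behind watermark alignment can be sketched in a few lines. This is a hypothetical, self-contained illustration of the concept only (it is not Flink's implementation, and the class and method names are made up): sources in an alignment group are paused when their local watermark runs ahead of the group's minimum watermark by more than the allowed drift.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the watermark-alignment idea: track per-source
// watermarks and pause any source that runs too far ahead of the slowest
// source in its alignment group.
class AlignmentGroupSketch {
    private final long maxAllowedDriftMs;
    private final Map<String, Long> watermarks = new HashMap<>();

    AlignmentGroupSketch(long maxAllowedDriftMs) {
        this.maxAllowedDriftMs = maxAllowedDriftMs;
    }

    void reportWatermark(String sourceId, long watermarkMs) {
        watermarks.put(sourceId, watermarkMs);
    }

    // A source should pause if its watermark exceeds the group's minimum
    // watermark by more than the allowed drift.
    boolean shouldPause(String sourceId) {
        long groupMin = watermarks.values().stream()
                .min(Long::compare).orElse(Long.MIN_VALUE);
        return watermarks.getOrDefault(sourceId, Long.MIN_VALUE)
                > groupMin + maxAllowedDriftMs;
    }
}

public class WatermarkAlignmentDemo {
    public static void main(String[] args) {
        AlignmentGroupSketch group = new AlignmentGroupSketch(5_000);
        group.reportWatermark("fastSource", 20_000);
        group.reportWatermark("slowSource", 10_000);
        System.out.println(group.shouldPause("fastSource")); // true: 20s > 10s + 5s
        System.out.println(group.shouldPause("slowSource")); // false
    }
}
```

Getting this throttling correct under restarts, idle sources, and source coordination is where the bugs and performance issues mentioned above tend to appear.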



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[ANNOUNCE] Flink 1.18 Feature Freeze Extended until July 24th, 2023

2023-07-06 Thread Qingsheng Ren
Hi devs,

Recently we collected some feedback from developers, and in order to give
more time for polishing some important features in 1.18, we decide to
extend the feature freezing date to:

July 24th, 2023, at 00:00 CEST(UTC+2)

which gives us ~2 weeks for development from now. There will be no
extension after Jul 24, so please arrange new features in the next release
if they cannot be finished before the closing date.

Thanks everyone for your work in 1.18!

Best regards,
Qingsheng, Jing, Konstantin and Sergey


Re: [ANNOUNCE] Apache Flink has won the 2023 SIGMOD Systems Award

2023-07-06 Thread Jiadong Lu

Congratulations!

Best regards,
Jiadong Lu

On 2023/7/6 13:26, Weihua Hu wrote:

Congratulations!

Best,
Weihua


On Wed, Jul 5, 2023 at 5:46 PM Shammon FY  wrote:


Congratulations!

Best,
Shammon FY

On Wed, Jul 5, 2023 at 2:38 PM Paul Lam  wrote:


Congrats and cheers!

Best,
Paul Lam


On Jul 4, 2023, at 18:04, Benchao Li wrote:

Congratulations!

On Tue, Jul 4, 2023 at 16:17, Feng Jin wrote:


Congratulations!

Best,
Feng Jin



On Tue, Jul 4, 2023 at 4:13 PM Yuxin Tan wrote:



Congratulations!

Best,
Yuxin


On Tue, Jul 4, 2023 at 16:04, Dunn Bangui wrote:


Congratulations!

Best,
Bangui Dunn

On Tue, Jul 4, 2023 at 15:59, Yangze Guo wrote:


Congrats everyone!

Best,
Yangze Guo

On Tue, Jul 4, 2023 at 3:53 PM Rui Fan <1996fan...@gmail.com> wrote:


Congratulations!

Best,
Rui Fan

On Tue, Jul 4, 2023 at 2:08 PM Zhu Zhu  wrote:


Congratulations everyone!

Thanks,
Zhu

On Tue, Jul 4, 2023 at 14:06, Hang Ruan wrote:


Congratulations!

Best,
Hang

On Tue, Jul 4, 2023 at 13:47, Jingsong Li wrote:


Congratulations!

Thank you! All of the Flink community!

Best,
Jingsong

On Tue, Jul 4, 2023 at 1:24 PM tison wrote:


Congrats and with honor :D

Best,
tison.


On Tue, Jul 4, 2023 at 11:08, Mang Zhang wrote:


Congratulations!

Best regards,
Mang Zhang





On 2023-07-04 at 01:53:46, "liu ron" wrote:

Congrats everyone

Best,
Ron

On Mon, Jul 3, 2023 at 22:48, Jark Wu wrote:


Congrats everyone!

Best,
Jark


On Jul 3, 2023, at 22:37, Yuval Itzchakov wrote:


Congrats team!

On Mon, Jul 3, 2023, 17:28 Jing Ge via user <u...@flink.apache.org> wrote:

Congratulations!

Best regards,
Jing


On Mon, Jul 3, 2023 at 3:21 PM yuxia <luoyu...@alumni.sjtu.edu.cn> wrote:

Congratulations!

Best regards,
Yuxia

From: "Pushpa Ramakrishnan" <pushpa.ramakrish...@icloud.com>
To: "Xintong Song" <tonysong...@gmail.com>
Cc: "dev" <dev@flink.apache.org>, "User" <u...@flink.apache.org>
Sent: Monday, July 3, 2023 8:36:30 PM
Subject: Re: [ANNOUNCE] Apache Flink has won the 2023 SIGMOD Systems Award

Congratulations 🥳

On 03-Jul-2023, at 3:30 PM, Xintong Song <tonysong...@gmail.com> wrote:

Dear Community,

I'm pleased to share this good news with everyone. As some of you may have already heard, Apache Flink has won the 2023 SIGMOD Systems Award [1].

"Apache Flink greatly expanded the use of stream data-processing." -- SIGMOD Awards Committee

SIGMOD is one of the most influential data management research conferences in the world. The Systems Award is awarded to an individual or set of individuals to recognize the development of a software or hardware system whose technical contributions have had significant impact on the theory or practice of large-scale data management systems. Winning of the award indicates the high recognition of Flink's technological advancement and industry influence from academia.

As an open-source project, Flink wouldn't have come this far without the wide, active and supportive community behind it. Kudos to all of us who helped make this happen, including the over 1,400 contributors and many others who contributed in ways beyond code.

Best,
Xintong (on behalf of the Flink PMC)

[1] https://sigmod.org/2023-sigmod-systems-award/





















--

Best,
Benchao Li