[jira] [Created] (FLINK-33853) Migrate Junit5 for DefaultDeclarativeSlotPool test classes

2023-12-14 Thread RocMarshal (Jira)
RocMarshal created FLINK-33853:
--

 Summary: Migrate Junit5 for DefaultDeclarativeSlotPool test classes
 Key: FLINK-33853
 URL: https://issues.apache.org/jira/browse/FLINK-33853
 Project: Flink
  Issue Type: Improvement
  Components: Tests
Reporter: RocMarshal








[jira] [Created] (FLINK-33852) FLIP-403 High Availability Services for OLAP Scenarios

2023-12-14 Thread Yangze Guo (Jira)
Yangze Guo created FLINK-33852:
--

 Summary: FLIP-403 High Availability Services for OLAP Scenarios
 Key: FLINK-33852
 URL: https://issues.apache.org/jira/browse/FLINK-33852
 Project: Flink
  Issue Type: New Feature
  Components: Runtime / Coordination
Reporter: Yangze Guo
Assignee: Yangze Guo








Re: [PROPOSAL] Contribute Flink CDC Connectors project to Apache Flink

2023-12-14 Thread MengHui Yu
Good News. +1


Re: [VOTE] FLIP-383: Support Job Recovery for Batch Jobs

2023-12-14 Thread Rui Fan
+1 (binding)

Best,
Rui

On Fri, Dec 15, 2023 at 12:05 PM weijie guo 
wrote:

> +1 (binding)
>
> Best regards,
>
> Weijie
>
>
> Zhu Zhu wrote on Fri, Dec 15, 2023 at 10:49:
>
> > +1 (binding)
> >
> > Thanks,
> > Zhu
> >
> > Xintong Song wrote on Thu, Dec 14, 2023 at 15:36:
> >
> > > +1 (binding)
> > >
> > > Best,
> > >
> > > Xintong
> > >
> > >
> > >
> > > On Thu, Dec 14, 2023 at 3:15 PM Lijie Wang 
> > > wrote:
> > >
> > > > Hi devs, Thanks for all the feedback on FLIP-383: Support Job Recovery
> > > > for Batch Jobs [1]. This FLIP was discussed in [2].
> > > >
> > > > I'd like to start a vote for it. The vote will be open for at least 72
> > > > hours (until December 19th 12:00 GMT) unless there is an objection or
> > > > insufficient votes.
> > > >
> > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-383%3A+Support+Job+Recovery+for+Batch+Jobs
> > > > [2] https://lists.apache.org/thread/074z237c07vtj74685nxo6bttkq3kshz
> > > >
> > > > Best, Lijie
> > > >
> > >
> >
>


Re: [VOTE] FLIP-383: Support Job Recovery for Batch Jobs

2023-12-14 Thread weijie guo
+1 (binding)

Best regards,

Weijie


Zhu Zhu wrote on Fri, Dec 15, 2023 at 10:49:

> +1 (binding)
>
> Thanks,
> Zhu
>
> Xintong Song wrote on Thu, Dec 14, 2023 at 15:36:
>
> > +1 (binding)
> >
> > Best,
> >
> > Xintong
> >
> >
> >
> > On Thu, Dec 14, 2023 at 3:15 PM Lijie Wang 
> > wrote:
> >
> > > Hi devs, Thanks for all the feedback on FLIP-383: Support Job Recovery
> > > for Batch Jobs [1]. This FLIP was discussed in [2].
> > >
> > > I'd like to start a vote for it. The vote will be open for at least 72
> > > hours (until December 19th 12:00 GMT) unless there is an objection or
> > > insufficient votes.
> > >
> > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-383%3A+Support+Job+Recovery+for+Batch+Jobs
> > > [2] https://lists.apache.org/thread/074z237c07vtj74685nxo6bttkq3kshz
> > >
> > > Best, Lijie
> > >
> >
>


Re:Re: [DISCUSS] FLIP-400: AsyncScalarFunction for asynchronous scalar function support

2023-12-14 Thread Xuyang
Hi, Alan. Thanks for driving this.


Using async calls to improve throughput has already been done for lookup join,
and the improvement is significant, so I think it makes sense to support async
scalar functions as well. Big +1 for this FLIP.
I have some questions below.


1. Overriding `getRequirements` in `AsyncScalarFunction`


I'm just curious why you don't use a global conf and query hints (for an
individual async udx) to mark the output mode as 'ordered' or 'unordered',
like async lookup join [1] and the async udtf [2] do, but chose to introduce
a new enum in AsyncScalarFunction.


2. In some scenarios, for semantic correctness, async operators must be
executed in sync mode.


What about throwing an exception to make it clear to users that using async
scalar functions in such a situation is problematic, instead of silently
executing in sync mode? Otherwise users may be confused about the final
actual job graph.


[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
[2] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-313%3A+Add+support+of+User+Defined+AsyncTableFunction









--

Best!
Xuyang





On 2023-12-15 11:20:24, "Aitozi" wrote:
>Hi Alan,
>Nice FLIP. I also explored leveraging the async table function [1] to
>improve throughput before.
>
>About the configs, what do you think using hints as mentioned in [1].
>
>[1]:
>https://cwiki.apache.org/confluence/display/FLINK/FLIP-313%3A+Add+support+of+User+Defined+AsyncTableFunction
>
>Thanks,
>Aitozi.
>
>Timo Walther wrote on Thu, Dec 14, 2023 at 17:29:
>
>> Hi Alan,
>>
>> thanks for proposing this FLIP. It's a great addition to Flink and has
>> been requested multiple times. It will be in particular interesting for
>> accessing REST endpoints and other remote services.
>>
>> Great that we can generalize and reuse parts of the Python planner rules
>> and code for this.
>>
>> I have some feedback regarding the API:
>>
>> 1) Configuration
>>
>> Configuration keys like
>>
>> `table.exec.async-scalar.catalog.db.func-name.buffer-capacity`
>>
>> are currently not supported in the configuration stack. The key space
>> should remain constant. Only a constant key space enables the use of the
>> ConfigOption class which is required in the layered configuration. For
>> now I would suggest to only allow a global setting for buffer capacity,
>> timeout, and retry-strategy. We can later work on a per-function
>> configuration (potentially also needed for other use cases).
>>
>> 2) Semantical declaration
>>
>> Regarding
>>
>> `table.exec.async-scalar.catalog.db.func-name.output-mode`
>>
>> this is a semantical property of a function and should be defined
>> per-function. It impacts the query result and potentially the behavior
>> of planner rules.
>>
>> I see two options for this either: (a) an additional method in
>> AsyncScalarFunction or (b) adding this to the function's requirements. I
>> vote for (b), because a FunctionDefinition should be fully self
>> contained and sufficient for planning.
>>
>> Thus, for `FunctionDefinition.getRequirements(): Set<FunctionRequirement>`
>> we can add a new requirement `ORDERED` which should also be the default for
>> AsyncScalarFunction. `getRequirements()` can be overwritten and return a set
>> without this requirement if the user intends to do this.
>>
>>
>> Thanks,
>> Timo
>>
>>
>>
>>
>> On 11.12.23 18:43, Piotr Nowojski wrote:
>> > +1 to the idea, I don't have any comments.
>> >
>> > Best,
>> > Piotrek
>> >
>> > czw., 7 gru 2023 o 07:15 Alan Sheinberg > .invalid>
>> > napisał(a):
>> >
>> >>>
>> >>> Nicely written and makes sense.  The only feedback I have is around the
>> >>> naming of the generalization, e.g. "Specifically,
>> PythonCalcSplitRuleBase
>> >>> will be generalized into RemoteCalcSplitRuleBase."  This naming seems
>> to
>> >>> imply/suggest that all Async functions are remote.  I wonder if we can
>> >> find
>> >>> another name which doesn't carry that connotation; maybe
>> >>> AsyncCalcSplitRuleBase.  (An AsyncCalcSplitRuleBase which handles
>> Python
>> >>> and Async functions seems reasonable.)
>> >>>
>> >> Thanks.  That's fair.  I agree that "Remote" isn't always accurate.  I
>> >> believe that the python calls are also done asynchronously, so that
>> might
>> >> be a reasonable name, so long as there's no confusion between the base
>> and
>> >> async child class.
>> >>
>> >> On Wed, Dec 6, 2023 at 3:48 PM Jim Hughes > >
>> >> wrote:
>> >>
>> >>> Hi Alan,
>> >>>
>> >>> Nicely written and makes sense.  The only feedback I have is around the
>> >>> naming of the generalization, e.g. "Specifically,
>> PythonCalcSplitRuleBase
>> >>> will be generalized into RemoteCalcSplitRuleBase."  This naming seems
>> to
>> >>> imply/suggest that all Async functions are remote.  I wonder if we can
>> >> find
>> >>> another name which doesn't carry that connotation; maybe
>> >>> AsyncCalcSplitRuleBase.  (An AsyncCalcSplitRuleBase which handles Python
>> >>> and Async functions seems reasonable.)

Re: [DISCUSS] Should Configuration support getting value based on String key?

2023-12-14 Thread Zhu Zhu
I think it's not clear whether forcing the use of ConfigOption would hurt
the user experience.

Maybe it does at the beginning, because using string keys to access
Flink configuration can be simpler for new components/jobs.
However, problems may happen later if the configuration usages become
more complex, like key renaming, using types other than strings, and
other problems that ConfigOption was invented to address.
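
(For illustration, a minimal self-contained sketch of the two access styles
being debated, using Flink's Configuration / ConfigOptions API; the option key
and values here are made up:)

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;
import org.apache.flink.configuration.Configuration;

public class ConfigAccessSketch {
    // Predefined, typed option: gives you defaults, typing, a description,
    // and a place to handle key evolution without touching call sites.
    static final ConfigOption<String> MY_OPTION =
            ConfigOptions.key("my.custom.key")
                    .stringType()
                    .defaultValue("fallback")
                    .withDescription("An example custom option.");

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set(MY_OPTION, "value");

        // Preferred: typed access through the ConfigOption.
        String viaOption = conf.get(MY_OPTION);

        // The string-key style this thread debates keeping:
        String viaStringKey = conf.getString("my.custom.key", "fallback");

        System.out.println(viaOption + " / " + viaStringKey);
    }
}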

Personally I prefer to encourage the usage of ConfigOption.
Jobs should use GlobalJobParameters for custom config, which is different
from the Configuration interface. Therefore, Configuration is mostly
used in other components/plugins, where long-term maintainability
can be important.

However, since it is not a right or wrong choice, I'd also be fine
to keep the `getString()` method if more devs/users are in favor of it.

Thanks,
Zhu

Timo Walther wrote on Thu, Dec 14, 2023 at 17:41:

> The configuration in Flink is complicated and I fear we won't have
> enough capacity to substantially fix it. The introduction of
> ReadableConfig, WritableConfig, and typed ConfigOptions was a right step
> into making the code more maintainable. From the Flink side, every read
> access should go through ConfigOption.
>
> However, I also understand Gyula pragmatism here because (practically
> speaking) users get access `getString()` via `toMap().get()`. So I'm
> fine with removing the deprecation for functionality that is available
> anyways. We should, however, add a message to `getString()` that this
> method is discouraged and `get(ConfigOption)` should be the preferred
> way of accessing Configuration.
>
> In any case we should remove the getInt and related methods.
>
> Cheers,
> Timo
>
>
> On 14.12.23 09:56, Gyula Fóra wrote:
> > I see a strong value for user facing configs to use ConfigOption and this
> > should definitely be an enforced convention.
> >
> > However with the Flink project growing and many other components and even
> > users using the Configuration object I really don’t think that we should
> > “force” this on the users/developers.
> >
> > If we make fromMap / toMap free with basically no overhead, that is fine
> > but otherwise I think it would hurt the user experience to remove the
> > simple getters / setters. Temporary configoptions to access strings from
> > what is practically string->string map is exactly the kind of unnecessary
> > boilerplate that every dev and user wants to avoid.
> >
> > Gyula
> >
> > There are many cases where the features of the ConfigOption are really
> > not needed.
> >
> > On Thu, 14 Dec 2023 at 09:38, Xintong Song 
> wrote:
> >
> >> Hi Gyula,
> >>
> >> First of all, even if we remove the `getXXX(String key, XXX
> defaultValue)`
> >> methods, there are still several ways to access the configuration with
> >> string-keys.
> >>
> >> - If one wants to access a specific option, as Rui mentioned,
> >> `ConfigOptions.key("xxx").stringType().noDefaultValue()` can be
> used.
> >> TBH,
> >> I can't think of a use case where a temporarily created ConfigOption
> is
> >> preferred over a predefined one. Do you have any examples for that?
> >> - If one wants to access the whole configuration set, then `toMap`
> or
> >> `iterator` might be helpful.
> >>
> >> It is true that these ways are less convenient than `getXXX(String key,
> XXX
> >> defaultValue)`, and that's exactly my purpose, to make the key-string
> less
> >> convenient so that developers choose ConfigOption over it whenever is
> >> possible.
> >>
> >> there will always be cases where a more flexible
> >>> dynamic handling is necessary without the added overhead of the toMap
> >> logic
> >>>
> >>
> >> I'm not sure about this. I agree there are cases where flexible and
> dynamic
> >> handling is needed, but maybe "without the added overhead of the toMap
> >> logic" is not that necessary?
> >>
> >> I'd think of this as "encouraging developers to use ConfigOption as
> much as
> >> possible" vs. "a bit less convenient in 5% of the cases". I guess
> there's
> >> no right and wrong, just different engineer opinions. While I personally
> >> stand with removing the string-key access methods, I'd also be fine with
> >> the other way if there are more people in favor of it.
> >>
> >> Best,
> >>
> >> Xintong
> >>
> >>
> >>
> >> On Thu, Dec 14, 2023 at 3:45 PM Gyula Fóra 
> wrote:
> >>
> >>> Hi Xintong,
> >>>
> >>> I don't really see the actual practical benefit from removing the
> >>> getString and setString low-level methods.
> >>>
> >>> I understand that ConfigOptions are nicer for 95% of the cases but
> from a
> >>> technical point of view there will always be cases where a more
> flexible
> >>> dynamic handling is necessary without the added overhead of the toMap
> >>> logic.
> >>>
> >>> I think it’s the most natural thing for any config abstraction to
> expose
> >>> basic get set methods with a simple key.
> >>>
> >>> What do you think?
> >>>
> >>> Cheers
> >>> Gyula
> >>>
> >>> On Thu, 

Re: [DISCUSS] FLIP-400: AsyncScalarFunction for asynchronous scalar function support

2023-12-14 Thread Aitozi
Hi Alan,
Nice FLIP. I also explored leveraging the async table function [1] to
improve throughput before.

About the configs, what do you think about using hints, as mentioned in [1]?

[1]:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-313%3A+Add+support+of+User+Defined+AsyncTableFunction

Thanks,
Aitozi.

Timo Walther wrote on Thu, Dec 14, 2023 at 17:29:

> Hi Alan,
>
> thanks for proposing this FLIP. It's a great addition to Flink and has
> been requested multiple times. It will be in particular interesting for
> accessing REST endpoints and other remote services.
>
> Great that we can generalize and reuse parts of the Python planner rules
> and code for this.
>
> I have some feedback regarding the API:
>
> 1) Configuration
>
> Configuration keys like
>
> `table.exec.async-scalar.catalog.db.func-name.buffer-capacity`
>
> are currently not supported in the configuration stack. The key space
> should remain constant. Only a constant key space enables the use of the
> ConfigOption class which is required in the layered configuration. For
> now I would suggest to only allow a global setting for buffer capacity,
> timeout, and retry-strategy. We can later work on a per-function
> configuration (potentially also needed for other use cases).
>
> 2) Semantical declaration
>
> Regarding
>
> `table.exec.async-scalar.catalog.db.func-name.output-mode`
>
> this is a semantical property of a function and should be defined
> per-function. It impacts the query result and potentially the behavior
> of planner rules.
>
> I see two options for this either: (a) an additional method in
> AsyncScalarFunction or (b) adding this to the function's requirements. I
> vote for (b), because a FunctionDefinition should be fully self
> contained and sufficient for planning.
>
> Thus, for `FunctionDefinition.getRequirements(): Set<FunctionRequirement>`
> we can add a new requirement `ORDERED` which should also be the default for
> AsyncScalarFunction. `getRequirements()` can be overwritten and return a set
> without this requirement if the user intends to do this.
>
>
> Thanks,
> Timo
>
>
>
>
> On 11.12.23 18:43, Piotr Nowojski wrote:
> > +1 to the idea, I don't have any comments.
> >
> > Best,
> > Piotrek
> >
> > czw., 7 gru 2023 o 07:15 Alan Sheinberg  .invalid>
> > napisał(a):
> >
> >>>
> >>> Nicely written and makes sense.  The only feedback I have is around the
> >>> naming of the generalization, e.g. "Specifically,
> PythonCalcSplitRuleBase
> >>> will be generalized into RemoteCalcSplitRuleBase."  This naming seems
> to
> >>> imply/suggest that all Async functions are remote.  I wonder if we can
> >> find
> >>> another name which doesn't carry that connotation; maybe
> >>> AsyncCalcSplitRuleBase.  (An AsyncCalcSplitRuleBase which handles
> Python
> >>> and Async functions seems reasonable.)
> >>>
> >> Thanks.  That's fair.  I agree that "Remote" isn't always accurate.  I
> >> believe that the python calls are also done asynchronously, so that
> might
> >> be a reasonable name, so long as there's no confusion between the base
> and
> >> async child class.
> >>
> >> On Wed, Dec 6, 2023 at 3:48 PM Jim Hughes  >
> >> wrote:
> >>
> >>> Hi Alan,
> >>>
> >>> Nicely written and makes sense.  The only feedback I have is around the
> >>> naming of the generalization, e.g. "Specifically,
> PythonCalcSplitRuleBase
> >>> will be generalized into RemoteCalcSplitRuleBase."  This naming seems
> to
> >>> imply/suggest that all Async functions are remote.  I wonder if we can
> >> find
> >>> another name which doesn't carry that connotation; maybe
> >>> AsyncCalcSplitRuleBase.  (An AsyncCalcSplitRuleBase which handles
> Python
> >>> and Async functions seems reasonable.)
> >>>
> >>> Cheers,
> >>>
> >>> Jim
> >>>
> >>> On Wed, Dec 6, 2023 at 5:45 PM Alan Sheinberg
> >>>  wrote:
> >>>
>  I'd like to start a discussion of FLIP-400: AsyncScalarFunction for
>  asynchronous scalar function support [1]
> 
>  This feature proposes adding a new UDF type AsyncScalarFunction which
> >> is
>  invoked just like a normal ScalarFunction, but is implemented with an
>  asynchronous eval method.  I had brought this up including the
> >> motivation
>  in a previous discussion thread [2].
> 
>  The purpose is to achieve high throughput scalar function UDFs while
>  allowing that an individual call may have high latency.  It allows
> >>> scaling
>  up the parallelism of just these calls without having to increase the
>  parallelism of the whole query (which could be rather resource
>  inefficient).
> 
>  In practice, it should enable SQL integration with external services
> >> and
>  systems, which Flink has limited support for at the moment. It should
> >>> also
>  allow easier integration with existing libraries which use
> asynchronous
>  APIs.
> 
>  Looking forward to your feedback and suggestions.
> 
>  [1]
> 
> 
> >>>
> >>
> 

Re: [VOTE] FLIP-383: Support Job Recovery for Batch Jobs

2023-12-14 Thread Zhu Zhu
+1 (binding)

Thanks,
Zhu

Xintong Song wrote on Thu, Dec 14, 2023 at 15:36:

> +1 (binding)
>
> Best,
>
> Xintong
>
>
>
> On Thu, Dec 14, 2023 at 3:15 PM Lijie Wang 
> wrote:
>
> > Hi devs, Thanks for all the feedback on FLIP-383: Support Job Recovery
> > for Batch Jobs [1]. This FLIP was discussed in [2].
> >
> > I'd like to start a vote for it. The vote will be open for at least 72
> > hours (until December 19th 12:00 GMT) unless there is an objection or
> > insufficient votes.
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-383%3A+Support+Job+Recovery+for+Batch+Jobs
> > [2] https://lists.apache.org/thread/074z237c07vtj74685nxo6bttkq3kshz
> >
> > Best, Lijie
> >
>


Re:Re: [DISCUSS] FLIP-392: Deprecate the Legacy Group Window Aggregation

2023-12-14 Thread Xuyang
Hi, Timo, thanks for your advice.


I am considering splitting the existing flip into two new ones (while keeping
the existing flip, or not).
One of them would cover completing CDC support in the window tvf operators
(there are several small work items, such as window agg, window rank, window
join, etc.; due to time constraints, window agg takes priority for 1.19). The
other would cover letting the HOP window tvf support a size that is a
non-integer multiple of the step. Once these two flips are basically completed
in 1.19, we can consider officially deprecating the old group window agg syntax
in the release notes.


WDYT?




--

Best!
Xuyang





At 2023-12-14 17:51:01, "Timo Walther"  wrote:
>Hi Xuyang,
>
> > [Another reason] I'm not splitting this flip is that all of these subtasks
> > like session window tvf and cdc support do not change the public interface
> > and the public syntax
>
>Given the length of this mailing list discussion and the number of involved
>people, I would strongly suggest simplifying the FLIP and giving it a
>better title to make quicker progress. In general, we all seem to be on
>the same page in what we want. And both session TVF support and the
>deprecation of the legacy group windows have already been voted on and
>discussed thoroughly. The FLIP can purely focus on the CDC topic.
>
>Cheers,
>Timo
>
>
>On 14.12.23 08:35, Xuyang wrote:
>> Hi, Timo, Sergey and Lincoln Lee. Thanks for your feedback.
>> 
>> 
>>> In my opinion the FLIP touches too many
>>> topics at the same time and should be split into multiple FLIPs. We
>>> should stay focused on what is needed for Flink 2.0.
>> The main goal and topic of this Flip is to align the abilities between the 
>> legacy group window agg syntax and the new window tvf syntax,
>> and then we can say that the legacy window syntax will be deprecated. IMO, 
>> although there are many misalignments about these two
>> syntaxes, such as session window tvf, cdc support and so on,  they are all 
>> the subtasks we need to do in this flip. Another reason I'm not
>> splitting this flip is that all of these subtasks, like session window tvf and
>> cdc support, do not change the public interface and the public
>> syntax; their implementations will only be in the table-planner and
>> table-runtime modules.
>> 
>> 
>>> Can we postpone this discussion? Currently we should focus on users
>>> switching to Window TVFs before Flink 2.0. Early fire, late fire and
>>> allow lateness have not been exposed through public configuration. They can
>>> be introduced after Flink 2.0 is released.
>> 
>> 
>> Agree with you. This flip will not and should not expose these experimental
>> flink conf to users. I listed them in this flip just to show the
>> misalignments between these two window syntaxes.
>> 
>> 
>> Look for your thought.
>> 
>> 
>> 
>> 
>> --
>> 
>>  Best!
>>  Xuyang
>> 
>> 
>> 
>> 
>> 
>> At 2023-12-13 15:40:16, "Lincoln Lee"  wrote:
>>> Thanks Xuyang driving this work! It's great that everyone agrees with the
>>> work itself in this flip[1]!
>>>
>>> Regarding whether to split the flip or adjust the scope of this flip, I'd
>>> like to share some thoughts:
>>>
>>> 1. About the title of this flip, what I want to say is that flip-145[2] had
>>> marked the legacy group window deprecated in the documentation but the
>>> functionality of the new syntax is not aligned with the legacy one.
>>> This is not a user-friendly deprecation, so the initiation of this flip, as
>>> I understand it, is for the formal deprecation of the legacy window, which
>>> requires us to complete the functionality alignment.
>>>
>>> 2. Agree with Timo that we can process the late-fire/early-fire features
>>> separately. These experimental parameters have not been officially opened
>>> to users.
>>> Considering the workload, we can focus more on this version.
>>>
>>> 3. I have no objection to splitting this flip if everyone feels that the
>>> work included is too much.
>>> Regarding the support of session tvf, it seems that the main problem is
>>> that this part of the description occupies a large part of the flip,
>>> causing some misunderstandings.
>>> This is indeed a predetermined task in FLIP-145, just adding more
>>> explanation about semantics. In addition, I saw the discussion history in
>>> FLINK-24024[3], thanks Sergey for being willing to help driving this work
>>> together.
>>>
>>> [1]
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-392%3A+Deprecate+the+Legacy+Group+Window+Aggregation
>>> [2]
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function
>>> [3] https://issues.apache.org/jira/browse/FLINK-24024
>>>
>>> Best,
>>> Lincoln Lee
>>>
>>>
Sergey Nuyanzin wrote on Wed, Dec 13, 2023 at 08:02:
>>>
 thanks for summarising Timo

 +1 for splitting it in different FLIPs
 and agree about having "SESSION Window TVF Aggregation" under FLIP-145
 Moreover the task is already there, so no need 

Re: [VOTE] FLIP-382: Unify the Provision of Diverse Metadata for Context-like APIs

2023-12-14 Thread weijie guo
+1(binding)

Best regards,

Weijie


Wencong Liu wrote on Fri, Dec 15, 2023 at 09:13:

> Hi dev,
>
> I'd like to start a vote on FLIP-382.
>
> Discussion thread:
> https://lists.apache.org/thread/3mgsc31odtpmzzl32s4oqbhlhxd0mn6b
> FLIP:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-382%3A+Unify+the+Provision+of+Diverse+Metadata+for+Context-like+APIs
>
> Best regards,
> Wencong Liu


[VOTE] FLIP-380: Support Full Partition Processing On Non-keyed DataStream

2023-12-14 Thread Wencong Liu
Hi dev,

I'd like to start a vote on FLIP-380.

Discussion thread: 
https://lists.apache.org/thread/nn7myj7vsvytbkdrnbvj5h0homsjrn1h
FLIP: 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-380%3A+Support+Full+Partition+Processing+On+Non-keyed+DataStream

Best regards,
Wencong Liu

[VOTE] FLIP-382: Unify the Provision of Diverse Metadata for Context-like APIs

2023-12-14 Thread Wencong Liu
Hi dev,

I'd like to start a vote on FLIP-382.

Discussion thread: 
https://lists.apache.org/thread/3mgsc31odtpmzzl32s4oqbhlhxd0mn6b
FLIP: 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-382%3A+Unify+the+Provision+of+Diverse+Metadata+for+Context-like+APIs

Best regards,
Wencong Liu

Re: [DISCUSS] FLIP-400: AsyncScalarFunction for asynchronous scalar function support

2023-12-14 Thread Alan Sheinberg
Thanks Piotr and Timo for your responses.

To address your comments Timo:

1) Configuration

Configuration keys like

`table.exec.async-scalar.catalog.db.func-name.buffer-capacity`

are currently not supported in the configuration stack. The key space
> should remain constant. Only a constant key space enables the use of the
> ConfigOption class which is required in the layered configuration. For
> now I would suggest to only allow a global setting for buffer capacity,
> timeout, and retry-strategy. We can later work on a per-function
> configuration (potentially also needed for other use cases).


Yeah, I was trying to find similar examples and couldn't find too many,
because as you say, they aren't supported.

There are things like the metric reporters,
where you can make up a name (e.g. my_jmx_reporter) but then must list it
in metrics.reporters so that the configs can all be iterated over.  Doing a
similar thing here would be a bit inelegant for this case.  I'm happy to
have a global setting, and a future solution could override the global
setting once we figure that out.
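
(For illustration, a sketch of what such global settings could look like as
typed ConfigOptions; the key names and defaults below are illustrative only,
not final option names from the FLIP:)

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

import java.time.Duration;

public class AsyncScalarOptionsSketch {
    // Hypothetical global option: max number of in-flight async calls.
    static final ConfigOption<Integer> BUFFER_CAPACITY =
            ConfigOptions.key("table.exec.async-scalar.buffer-capacity")
                    .intType()
                    .defaultValue(10)
                    .withDescription("Max number of in-flight async calls per operator.");

    // Hypothetical global option: timeout for a single async invocation.
    static final ConfigOption<Duration> TIMEOUT =
            ConfigOptions.key("table.exec.async-scalar.timeout")
                    .durationType()
                    .defaultValue(Duration.ofMinutes(3))
                    .withDescription("Timeout for a single async invocation.");
}

Because the key space stays constant, options like these fit the layered
configuration stack that Timo describes.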

2) Semantical declaration

Regarding

`table.exec.async-scalar.catalog.db.func-name.output-mode`

this is a semantical property of a function and should be defined
> per-function. It impacts the query result and potentially the behavior
> of planner rules.

I see two options for this either: (a) an additional method in
> AsyncScalarFunction or (b) adding this to the function's requirements. I
> vote for (b), because a FunctionDefinition should be fully self
> contained and sufficient for planning.

Thus, for `FunctionDefinition.getRequirements(): Set<FunctionRequirement>`
> we can add a new requirement `ORDERED` which should also be the default for
> AsyncScalarFunction. `getRequirements()` can be overwritten and return a set
> without this requirement if the user intends to do this.


That's a good point. Maybe if we had per-function configs this could make
sense, but it doesn't make as much sense when everything is global. The default
of each definition should be to get a correct result, but allowing a manual
override to say that performance is ultimately what they care about over
certain SQL order semantics is also useful.  If the user overrides
`getRequirements()` to omit the `ORDERED` requirement, do we allow the
operator to return out-of-order results, or should it fall back on
`AsyncOutputMode.ALLOW_UNORDERED` type behavior (where we allow
out-of-order only if it's deemed correct)? Having a manual override mean
out-of-order seems like a decent starting point, and we could always add
`FunctionRequirement.ALLOW_UNORDERED` in the future to allow the more
sophisticated behavior.
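
(For illustration, a self-contained sketch of the override being discussed.
AsyncScalarFunction and the ORDERED requirement are proposals from this
thread, not existing Flink API, so stand-in types are declared locally:)

import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

public class RequirementsSketch {

    // Stand-in for the proposed new requirement.
    enum FunctionRequirement { ORDERED }

    // Stand-in for the proposed base class: ordered by default.
    static class AsyncScalarFunction {
        public Set<FunctionRequirement> getRequirements() {
            return EnumSet.of(FunctionRequirement.ORDERED);
        }
    }

    // A user function that opts out of ordering for throughput; the planner
    // would then be free to emit results out of order.
    static class UnorderedFunction extends AsyncScalarFunction {
        @Override
        public Set<FunctionRequirement> getRequirements() {
            return Collections.emptySet();
        }
    }

    public static void main(String[] args) {
        System.out.println(new UnorderedFunction().getRequirements()); // []
    }
}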

Thanks,
Alan

On Thu, Dec 14, 2023 at 1:29 AM Timo Walther  wrote:

> Hi Alan,
>
> thanks for proposing this FLIP. It's a great addition to Flink and has
> been requested multiple times. It will be in particular interesting for
> accessing REST endpoints and other remote services.
>
> Great that we can generalize and reuse parts of the Python planner rules
> and code for this.
>
> I have some feedback regarding the API:
>
> 1) Configuration
>
> Configuration keys like
>
> `table.exec.async-scalar.catalog.db.func-name.buffer-capacity`
>
> are currently not supported in the configuration stack. The key space
> should remain constant. Only a constant key space enables the use of the
> ConfigOption class which is required in the layered configuration. For
> now I would suggest to only allow a global setting for buffer capacity,
> timeout, and retry-strategy. We can later work on a per-function
> configuration (potentially also needed for other use cases).
>
> 2) Semantical declaration
>
> Regarding
>
> `table.exec.async-scalar.catalog.db.func-name.output-mode`
>
> this is a semantical property of a function and should be defined
> per-function. It impacts the query result and potentially the behavior
> of planner rules.
>
> I see two options for this either: (a) an additional method in
> AsyncScalarFunction or (b) adding this to the function's requirements. I
> vote for (b), because a FunctionDefinition should be fully self
> contained and sufficient for planning.
>
> Thus, for `FunctionDefinition.getRequirements(): Set<FunctionRequirement>`
> we can add a new requirement `ORDERED` which should also be the default for
> AsyncScalarFunction. `getRequirements()` can be overwritten and return a set
> without this requirement if the user intends to do this.
>
>
> Thanks,
> Timo
>
>
>
>
> On 11.12.23 18:43, Piotr Nowojski wrote:
> > +1 to the idea, I don't have any comments.
> >
> > Best,
> > Piotrek
> >
> > czw., 7 gru 2023 o 07:15 Alan Sheinberg  .invalid>
> > napisał(a):
> >
> >>>
> >>> Nicely written and makes sense.  The only feedback I have is around the
> >>> naming of the generalization, e.g. "Specifically,
> PythonCalcSplitRuleBase
> >>> will be generalized into RemoteCalcSplitRuleBase."  This naming seems
> to
> 

Re: [DISCUSS] FLIP-398: Improve Serialization Configuration And Usage In Flink

2023-12-14 Thread Ken Krugler
Hi Yong,

Looks good, thanks for creating this.

One comment - related to my recent email about Fury, I would love to see the v2 
serialization decoupled from Kryo.

As part of that, instead of using xxxKryo in methods, call them xxxGeneric.

A more extreme change would be to totally rely on Fury (so no more POJO 
serializer). Fury is faster than the POJO serializer in my tests, but this 
would be a much bigger change.

Though it could dramatically simplify the Flink serialization support.

— Ken

PS - a separate issue is how to migrate state from Kryo to something like Fury, 
which supports schema evolution. I think this might be possible, by having a 
smarter deserializer that identifies state as being created by Kryo, and using 
(shaded) Kryo to deserialize, while still writing as Fury.
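
(For illustration, a minimal self-contained sketch of the dual-format idea in
the PS: tag newly written records with a format byte, read legacy records via
the old path, and always write via the new one. All names here are
hypothetical stand-ins, not Flink or Fury API; a real migration would hook
into Flink's serializer-snapshot/compatibility machinery rather than a byte
tag:)

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;

public class DualFormatSerializerSketch {
    private static final byte FORMAT_KRYO = 0;
    private static final byte FORMAT_FURY = 1;

    byte[] write(Object value) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(FORMAT_FURY);              // new state is always written in the new format
        out.write(serializeWithFury(value)); // hypothetical helper
        return out.toByteArray();
    }

    Object read(byte[] bytes) {
        byte format = bytes[0];
        byte[] payload = Arrays.copyOfRange(bytes, 1, bytes.length);
        // records tagged as legacy are read with (shaded) Kryo,
        // everything else with the new serializer
        return format == FORMAT_KRYO
                ? deserializeWithKryo(payload)
                : deserializeWithFury(payload);
    }

    // Stubs standing in for the real Kryo/Fury calls.
    private byte[] serializeWithFury(Object v) { throw new UnsupportedOperationException(); }
    private Object deserializeWithFury(byte[] b) { throw new UnsupportedOperationException(); }
    private Object deserializeWithKryo(byte[] b) { throw new UnsupportedOperationException(); }
}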

> On Dec 6, 2023, at 6:35 PM, Yong Fang  wrote:
> 
> Hi devs,
> 
> I'd like to start a discussion about FLIP-398: Improve Serialization
> Configuration And Usage In Flink [1].
> 
> Currently, users can register custom data types and serializers in Flink
> jobs through various methods, including registration in code,
> configuration, and annotations. These lead to difficulties in upgrading
> Flink jobs and priority issues.
> 
> In flink-2.0 we would like to manage job data types and serializers through
> configurations. This FLIP will introduce a unified option for data type and
> serializer and users can configure all custom data types and
> pojo/kryo/custom serializers. In addition, this FLIP will add more built-in
> serializers for complex data types such as List and Map, and optimize the
> management of Avro Serializers.
> 
> Looking forward to hearing from you, thanks!
> 
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-398%3A+Improve+Serialization+Configuration+And+Usage+In+Flink
> 
> Best,
> Fang Yong

--
Ken Krugler
http://www.scaleunlimited.com
Custom big data solutions
Flink & Pinot





[jira] [Created] (FLINK-33851) CLONE - Start End of Life discussion thread for now outdated Flink minor version

2023-12-14 Thread Jing Ge (Jira)
Jing Ge created FLINK-33851:
---

 Summary: CLONE - Start End of Life discussion thread for now 
outdated Flink minor version
 Key: FLINK-33851
 URL: https://issues.apache.org/jira/browse/FLINK-33851
 Project: Flink
  Issue Type: Sub-task
Reporter: Jing Ge


The idea is to discuss in the community whether we should do a final release 
for the now-unsupported minor version. Such a minor release shouldn't be 
covered by the current minor version release managers. Their only 
responsibility is to trigger the discussion.

The intention of a final patch release for the now unsupported Flink minor 
version is to flush out all the fixes that didn't end up in the previous 
release.





[jira] [Created] (FLINK-33848) CLONE - Other announcements

2023-12-14 Thread Jing Ge (Jira)
Jing Ge created FLINK-33848:
---

 Summary: CLONE - Other announcements
 Key: FLINK-33848
 URL: https://issues.apache.org/jira/browse/FLINK-33848
 Project: Flink
  Issue Type: Sub-task
Reporter: Jing Ge


h3. Recordkeeping

Use [reporter.apache.org|https://reporter.apache.org/addrelease.html?flink] to 
seed the information about the release into future project reports.

(Note: Only PMC members have access to report releases. If you do not have access, 
ask on the mailing list for assistance.)
h3. Flink blog

Major or otherwise important releases should have a blog post. Write one if 
needed for this particular release. Minor releases that don’t introduce new 
major functionality don’t necessarily need to be blogged (see [flink-web PR 
#581 for Flink 1.15.3|https://github.com/apache/flink-web/pull/581] as an 
example for a minor release blog post).

Please make sure that the release notes of the documentation (see section 
"Review and update documentation") are linked from the blog post of a major 
release.
We usually include the names of all contributors in the announcement blog post. 
Use the following command to get the list of contributors:
{code}
# first line is required to make sort first with uppercase and then lower
export LC_ALL=C
export FLINK_PREVIOUS_RELEASE_BRANCH=
export FLINK_CURRENT_RELEASE_BRANCH=
# e.g.
# export FLINK_PREVIOUS_RELEASE_BRANCH=release-1.17
# export FLINK_CURRENT_RELEASE_BRANCH=release-1.18
git log $(git merge-base master $FLINK_PREVIOUS_RELEASE_BRANCH)..$(git show-ref --hash ${FLINK_CURRENT_RELEASE_BRANCH}) --pretty=format:"%an%n%cn" | sort -u | paste -sd, | sed "s/\,/\, /g"
{code}
h3. Social media

Tweet, post on Facebook, LinkedIn, and other platforms. Ask other contributors 
to do the same.
h3. Flink Release Wiki page

Add a summary of things that went well or did not go so well during the 
release process. This can include feedback from contributors but also more 
generic things, like the release having taken longer than initially anticipated 
(and why), to give a bit of context to the release process.





[jira] [Created] (FLINK-33845) CLONE - Merge website pull request

2023-12-14 Thread Jing Ge (Jira)
Jing Ge created FLINK-33845:
---

 Summary: CLONE - Merge website pull request
 Key: FLINK-33845
 URL: https://issues.apache.org/jira/browse/FLINK-33845
 Project: Flink
  Issue Type: Sub-task
Reporter: Jing Ge


Merge the website pull request to [list the 
release|http://flink.apache.org/downloads.html]. Make sure to regenerate the 
website as well, as it isn't built automatically.





[jira] [Created] (FLINK-33844) CLONE - Update japicmp configuration

2023-12-14 Thread Jing Ge (Jira)
Jing Ge created FLINK-33844:
---

 Summary: CLONE - Update japicmp configuration
 Key: FLINK-33844
 URL: https://issues.apache.org/jira/browse/FLINK-33844
 Project: Flink
  Issue Type: Sub-task
Reporter: Jing Ge
Assignee: Sergey Nuyanzin
 Fix For: 1.19.0, 1.18.1


Update the japicmp reference version and wipe exclusions / enable API 
compatibility checks for {{@PublicEvolving}} APIs on the corresponding SNAPSHOT 
branch with the {{update_japicmp_configuration.sh}} script (see below).

For a new major release (x.y.0), run the same command also on the master branch 
for updating the japicmp reference version and removing out-dated exclusions in 
the japicmp configuration.

Make sure that all Maven artifacts are already pushed to Maven Central. 
Otherwise, there's a risk that CI fails due to missing reference artifacts.
{code:bash}
tools $ NEW_VERSION=$RELEASE_VERSION releasing/update_japicmp_configuration.sh
tools $ cd ..
$ git add *
$ git commit -m "Update japicmp configuration for $RELEASE_VERSION"
{code}





[jira] [Created] (FLINK-33847) CLONE - Apache mailing lists announcements

2023-12-14 Thread Jing Ge (Jira)
Jing Ge created FLINK-33847:
---

 Summary: CLONE - Apache mailing lists announcements
 Key: FLINK-33847
 URL: https://issues.apache.org/jira/browse/FLINK-33847
 Project: Flink
  Issue Type: Sub-task
Reporter: Jing Ge


Announce on the {{dev@}} mailing list that the release has been finished.

Announce the release on the {{user@}} mailing list, listing major 
improvements and contributions.

Announce the release on the [annou...@apache.org|mailto:annou...@apache.org] 
mailing list.
{panel}
{panel}
|{{From: Release Manager}}
{{To: dev@flink.apache.org, u...@flink.apache.org, user...@flink.apache.org, 
annou...@apache.org}}
{{Subject: [ANNOUNCE] Apache Flink 1.2.3 released}}
 
{{The Apache Flink community is very happy to announce the release of Apache 
Flink 1.2.3, which is the third bugfix release for the Apache Flink 1.2 
series.}}
 
{{Apache Flink® is an open-source stream processing framework for distributed, 
high-performing, always-available, and accurate data streaming applications.}}
 
{{The release is available for download at:}}
{{[https://flink.apache.org/downloads.html]}}
 
{{Please check out the release blog post for an overview of the improvements 
for this bugfix release:}}
{{}}
 
{{The full release notes are available in Jira:}}
{{}}
 
{{We would like to thank all contributors of the Apache Flink community who 
made this release possible!}}
 
{{Feel free to reach out to the release managers (or respond to this thread) 
with feedback on the release process. Our goal is to constantly improve the 
release process. Feedback on what could be improved or things that didn't go so 
well are appreciated.}}
 
{{Regards,}}
{{Release Manager}}|





[jira] [Created] (FLINK-33850) CLONE - Updates the docs stable version

2023-12-14 Thread Jing Ge (Jira)
Jing Ge created FLINK-33850:
---

 Summary: CLONE - Updates the docs stable version
 Key: FLINK-33850
 URL: https://issues.apache.org/jira/browse/FLINK-33850
 Project: Flink
  Issue Type: Sub-task
Reporter: Jing Ge


Update docs to "stable" in {{docs/config.toml}} in the branch of the 
_just-released_ version:
 * Change {{Version}} from {{x.y-SNAPSHOT}} to {{x.y.z}}, i.e. {{1.6-SNAPSHOT}} to {{1.6.0}}
 * Change {{VersionTitle}} from {{x.y-SNAPSHOT}} to {{x.y}}, i.e. {{1.6-SNAPSHOT}} to {{1.6}}
 * Change Branch from {{master}} to {{release-x.y}}, i.e. {{master}} to {{release-1.6}}
 * Change {{baseURL}} from {{//[ci.apache.org/projects/flink/flink-docs-master|http://ci.apache.org/projects/flink/flink-docs-master]}} to {{//[ci.apache.org/projects/flink/flink-docs-release-x.y|http://ci.apache.org/projects/flink/flink-docs-release-x.y]}}
 * Change {{javadocs_baseurl}} from {{//[ci.apache.org/projects/flink/flink-docs-master|http://ci.apache.org/projects/flink/flink-docs-master]}} to {{//[ci.apache.org/projects/flink/flink-docs-release-x.y|http://ci.apache.org/projects/flink/flink-docs-release-x.y]}}
 * Change {{IsStable}} to {{true}}





[jira] [Created] (FLINK-33849) CLONE - Update reference data for Migration Tests

2023-12-14 Thread Jing Ge (Jira)
Jing Ge created FLINK-33849:
---

 Summary: CLONE - Update reference data for Migration Tests
 Key: FLINK-33849
 URL: https://issues.apache.org/jira/browse/FLINK-33849
 Project: Flink
  Issue Type: Sub-task
Reporter: Jing Ge
Assignee: Sergey Nuyanzin
 Fix For: 1.19.0, 1.18.1


Update migration tests in master to cover migration from new version. Since 
1.18, this step could be done automatically with the following steps. For more 
information please refer to [this 
page.|https://github.com/apache/flink/blob/master/flink-test-utils-parent/flink-migration-test-utils/README.md]
 # *On the published release tag (e.g., release-1.16.0)*, run
{code:bash}
$ mvn clean package -Pgenerate-migration-test-data -Dgenerate.version=1.16 -nsu -Dfast -DskipTests
{code}

The version (1.16 in the command above) should be replaced with the target one.

 # Modify the content of the file 
[apache/flink:flink-test-utils-parent/flink-migration-test-utils/src/main/resources/most_recently_published_version|https://github.com/apache/flink/blob/master/flink-test-utils-parent/flink-migration-test-utils/src/main/resources/most_recently_published_version]
 to the latest version (it would be "v1_16" if sticking to the example where 
1.16.0 was released). 
 # Commit the modifications from the two steps above with "{_}[release] Generate 
reference data for state migration tests based on release-1.xx.0{_}" to the 
corresponding release branch (e.g. {{release-1.16}} in our example), replacing 
"xx" with the actual version (in this example "16"). Use the Jira issue ID 
instead of [release] as the commit message's prefix if you have a 
dedicated Jira issue for this task.

 # Cherry-pick the commit to the master branch. 





[jira] [Created] (FLINK-33843) Promote release 1.18.1

2023-12-14 Thread Jing Ge (Jira)
Jing Ge created FLINK-33843:
---

 Summary: Promote release 1.18.1
 Key: FLINK-33843
 URL: https://issues.apache.org/jira/browse/FLINK-33843
 Project: Flink
  Issue Type: New Feature
Affects Versions: 1.18.0
Reporter: Jing Ge
Assignee: Jing Ge


Once the release has been finalized (FLINK-32920), the last step of the process 
is to promote the release within the project and beyond. Please wait for 24h 
after finalizing the release in accordance with the [ASF release 
policy|http://www.apache.org/legal/release-policy.html#release-announcements].

*Final checklist to declare this issue resolved:*
 # Website pull request to [list the 
release|http://flink.apache.org/downloads.html] merged
 # Release announced on the user@ mailing list.
 # Blog post published, if applicable.
 # Release recorded in 
[reporter.apache.org|https://reporter.apache.org/addrelease.html?flink].
 # Release announced on social media.
 # Completion declared on the dev@ mailing list.
 # Update Homebrew: [https://docs.brew.sh/How-To-Open-a-Homebrew-Pull-Request] 
(seems to be done automatically for both minor 
and major releases)
 # Updated the japicmp configuration
 ** corresponding SNAPSHOT branch japicmp reference version set to the just 
released version, and API compatibility checks for {{@PublicEvolving}} were 
enabled
 ** (minor version release only) master branch japicmp reference version set to 
the just released version
 ** (minor version release only) master branch japicmp exclusions have been 
cleared
 # Update the list of previous versions in {{docs/config.toml}} on the master 
branch.
 # Set {{show_outdated_warning: true}} in {{docs/config.toml}} in the branch of 
the _now deprecated_ Flink version (i.e. 1.16 if 1.18.0 is released)
 # Update stable and master alias in 
[https://github.com/apache/flink/blob/master/.github/workflows/docs.yml]
 # Open discussion thread for End of Life for Unsupported version (i.e. 1.16)





[jira] [Created] (FLINK-33846) CLONE - Remove outdated versions

2023-12-14 Thread Jing Ge (Jira)
Jing Ge created FLINK-33846:
---

 Summary: CLONE - Remove outdated versions
 Key: FLINK-33846
 URL: https://issues.apache.org/jira/browse/FLINK-33846
 Project: Flink
  Issue Type: Sub-task
Reporter: Jing Ge


h4. dist.apache.org

For a new major release remove all release files older than 2 versions, e.g., 
when releasing 1.7, remove all releases <= 1.5.

For a new bugfix version remove all release files for previous bugfix releases 
in the same series, e.g., when releasing 1.7.1, remove the 1.7.0 release.
# If you have not already, check out the Flink section of the {{release}} 
repository on {{[dist.apache.org|http://dist.apache.org/]}} via Subversion. In 
a fresh directory:
{code}
svn checkout https://dist.apache.org/repos/dist/release/flink --depth=immediates
cd flink
{code}
# Remove files for outdated releases and commit the changes.
{code}
svn remove flink-
svn commit
{code}
# Verify that files  are 
[removed|https://dist.apache.org/repos/dist/release/flink]
(!) Remember to remove the corresponding download links from the website.

h4. CI

Disable the cron job for the now-unsupported version from 
(tools/azure-pipelines/[build-apache-repo.yml|https://github.com/apache/flink/blob/master/tools/azure-pipelines/build-apache-repo.yml])
 in the respective branch.





[jira] [Created] (FLINK-33840) CLONE - Deploy artifacts to Maven Central Repository

2023-12-14 Thread Jing Ge (Jira)
Jing Ge created FLINK-33840:
---

 Summary: CLONE - Deploy artifacts to Maven Central Repository
 Key: FLINK-33840
 URL: https://issues.apache.org/jira/browse/FLINK-33840
 Project: Flink
  Issue Type: Sub-task
Reporter: Jing Ge
Assignee: Qingsheng Ren


Use the [Apache Nexus repository|https://repository.apache.org/] to release the 
staged binary artifacts to the Maven Central repository. In the Staging 
Repositories section, find the relevant release candidate orgapacheflink-XXX 
entry and click Release. Drop all other release candidates that are not being 
released.
h3. Deploy source and binary releases to dist.apache.org

Copy the source and binary releases from the dev repository to the release 
repository at [dist.apache.org|http://dist.apache.org/] using Subversion.
{code:java}
$ svn move -m "Release Flink ${RELEASE_VERSION}" 
https://dist.apache.org/repos/dist/dev/flink/flink-${RELEASE_VERSION}-rc${RC_NUM}
 https://dist.apache.org/repos/dist/release/flink/flink-${RELEASE_VERSION}
{code}
(Note: Only PMC members have access to the release repository. If you do not 
have access, ask on the mailing list for assistance.)
h3. Remove old release candidates from [dist.apache.org|http://dist.apache.org/]

Remove the old release candidates from 
[https://dist.apache.org/repos/dist/dev/flink] using Subversion.
{code:java}
$ svn checkout https://dist.apache.org/repos/dist/dev/flink --depth=immediates
$ cd flink
$ svn remove flink-${RELEASE_VERSION}-rc*
$ svn commit -m "Remove old release candidates for Apache Flink 
${RELEASE_VERSION}"
{code}
 

h3. Expectations
 * Maven artifacts released and indexed in the [Maven Central 
Repository|https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.flink%22]
 (usually takes about a day to show up)
 * Source & binary distributions available in the release repository of 
[https://dist.apache.org/repos/dist/release/flink/]
 * Dev repository [https://dist.apache.org/repos/dist/dev/flink/] is empty
 * Website contains links to new release binaries and sources in download page
 * (for minor version updates) the front page references the correct new major 
release version and directs to the correct link





[jira] [Created] (FLINK-33841) CLONE - Create Git tag and mark version as released in Jira

2023-12-14 Thread Jing Ge (Jira)
Jing Ge created FLINK-33841:
---

 Summary: CLONE - Create Git tag and mark version as released in 
Jira
 Key: FLINK-33841
 URL: https://issues.apache.org/jira/browse/FLINK-33841
 Project: Flink
  Issue Type: Sub-task
Reporter: Jing Ge
Assignee: Qingsheng Ren


Create and push a new Git tag for the released version by copying the tag for 
the final release candidate, as follows:
{code:java}
$ git tag -s "release-${RELEASE_VERSION}" refs/tags/${TAG}^{} -m "Release Flink ${RELEASE_VERSION}"
$ git push  refs/tags/release-${RELEASE_VERSION}
{code}
In JIRA, inside [version 
management|https://issues.apache.org/jira/plugins/servlet/project-config/FLINK/versions],
 hover over the current release and a settings menu will appear. Click Release, 
and select today’s date.

(Note: Only PMC members have access to the project administration. If you do 
not have access, ask on the mailing list for assistance.)

If PRs have been merged to the release branch after the last release 
candidate was tagged, make sure that the corresponding Jira tickets have the 
correct Fix Version set.

 

h3. Expectations
 * Release tagged in the source code repository
 * Release version finalized in JIRA. (Note: Not all committers have 
administrator access to JIRA. If you end up getting permissions errors ask on 
the mailing list for assistance)





[jira] [Created] (FLINK-33839) CLONE - Deploy Python artifacts to PyPI

2023-12-14 Thread Jing Ge (Jira)
Jing Ge created FLINK-33839:
---

 Summary: CLONE - Deploy Python artifacts to PyPI
 Key: FLINK-33839
 URL: https://issues.apache.org/jira/browse/FLINK-33839
 Project: Flink
  Issue Type: Sub-task
Reporter: Jing Ge
Assignee: Qingsheng Ren


The release manager should create a PyPI account and ask the PMC to add this 
account to the pyflink collaborator list with the Maintainer role (the PyPI 
admin account info can be found here; NOTE: only visible to PMC members) in 
order to deploy the Python artifacts to PyPI. The artifacts can be uploaded 
using twine ([https://pypi.org/project/twine/]). To install twine, just run:
{code:java}
pip install --upgrade twine==1.12.0
{code}
Download the python artifacts from dist.apache.org and upload it to pypi.org:
{code:java}
svn checkout 
https://dist.apache.org/repos/dist/dev/flink/flink-${RELEASE_VERSION}-rc${RC_NUM}
cd flink-${RELEASE_VERSION}-rc${RC_NUM}
 
cd python
 
#uploads wheels
for f in *.whl; do twine upload --repository-url 
https://upload.pypi.org/legacy/ $f $f.asc; done
 
#upload source packages
twine upload --repository-url https://upload.pypi.org/legacy/ 
apache-flink-libraries-${RELEASE_VERSION}.tar.gz 
apache-flink-libraries-${RELEASE_VERSION}.tar.gz.asc
 
twine upload --repository-url https://upload.pypi.org/legacy/ 
apache-flink-${RELEASE_VERSION}.tar.gz 
apache-flink-${RELEASE_VERSION}.tar.gz.asc
{code}
If upload failed or incorrect for some reason (e.g. network transmission 
problem), you need to delete the uploaded release package of the same version 
(if exists) and rename the artifact to 
{{apache-flink-${RELEASE_VERSION}.post0.tar.gz}}, then re-upload.

(!) Note: re-uploading to pypi.org must be avoided as much as possible because 
it will cause some irreparable problems. If that happens, users cannot install 
the apache-flink package by explicitly specifying the package version, i.e. the 
following command "pip install apache-flink==${RELEASE_VERSION}" will fail. 
Instead they have to run "pip install apache-flink" or "pip install 
apache-flink==${RELEASE_VERSION}.post0" to install the apache-flink package.

 

h3. Expectations
 * Python artifacts released and indexed in the 
[PyPI|https://pypi.org/project/apache-flink/] Repository





[jira] [Created] (FLINK-33838) Finalize release 1.18.1

2023-12-14 Thread Jing Ge (Jira)
Jing Ge created FLINK-33838:
---

 Summary: Finalize release 1.18.1
 Key: FLINK-33838
 URL: https://issues.apache.org/jira/browse/FLINK-33838
 Project: Flink
  Issue Type: New Feature
Affects Versions: 1.17.0
Reporter: Jing Ge
Assignee: Qingsheng Ren


Once the release candidate has been reviewed and approved by the community, the 
release should be finalized. This involves the final deployment of the release 
candidate to the release repositories, merging of the website changes, etc.





[jira] [Created] (FLINK-33842) CLONE - Publish the Dockerfiles for the new release

2023-12-14 Thread Jing Ge (Jira)
Jing Ge created FLINK-33842:
---

 Summary: CLONE - Publish the Dockerfiles for the new release
 Key: FLINK-33842
 URL: https://issues.apache.org/jira/browse/FLINK-33842
 Project: Flink
  Issue Type: Sub-task
Reporter: Jing Ge
Assignee: Matthias Pohl


Note: the official Dockerfiles fetch the binary distribution of the target 
Flink version from an Apache mirror. After publishing the binary release 
artifacts, mirrors can take some hours to start serving the new artifacts, so 
you may want to wait to do this step until you are ready to continue with the 
"Promote the release" steps in the follow-up Jira.

Follow the [release instructions in the flink-docker 
repo|https://github.com/apache/flink-docker#release-workflow] to build the new 
Dockerfiles and send an updated manifest to Docker Hub so the new images are 
built and published.

 

h3. Expectations
 * Dockerfiles in [flink-docker|https://github.com/apache/flink-docker] updated 
for the new Flink release and pull request opened on the Docker official-images 
with an updated manifest





[jira] [Created] (FLINK-33837) CLONE - Vote on the release candidate

2023-12-14 Thread Jing Ge (Jira)
Jing Ge created FLINK-33837:
---

 Summary: CLONE - Vote on the release candidate
 Key: FLINK-33837
 URL: https://issues.apache.org/jira/browse/FLINK-33837
 Project: Flink
  Issue Type: Sub-task
Affects Versions: 1.17.0
Reporter: Jing Ge
Assignee: Qingsheng Ren
 Fix For: 1.17.0


Once you have built and individually reviewed the release candidate, please 
share it for the community-wide review. Please review foundation-wide [voting 
guidelines|http://www.apache.org/foundation/voting.html] for more information.

Start the review-and-vote thread on the dev@ mailing list. Here’s an email 
template; please adjust as you see fit.
{quote}From: Release Manager
To: dev@flink.apache.org
Subject: [VOTE] Release 1.2.3, release candidate #3

Hi everyone,
Please review and vote on the release candidate #3 for the version 1.2.3, as 
follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:
 * JIRA release notes [1],
 * the official Apache source release and binary convenience releases to be 
deployed to dist.apache.org [2], which are signed with the key with fingerprint 
 [3],
 * all artifacts to be deployed to the Maven Central Repository [4],
 * source code tag "release-1.2.3-rc3" [5],
 * website pull request listing the new release and adding announcement blog 
post [6].

The vote will be open for at least 72 hours. It is adopted by majority 
approval, with at least 3 PMC affirmative votes.

Thanks,
Release Manager

[1] link
[2] link
[3] [https://dist.apache.org/repos/dist/release/flink/KEYS]
[4] link
[5] link
[6] link
{quote}
*If there are any issues found in the release candidate, reply on the vote 
thread to cancel the vote.* There’s no need to wait 72 hours. Proceed to the 
Fix Issues step below and address the problem. However, some issues don’t 
require cancellation. For example, if an issue is found in the website pull 
request, just correct it on the spot and the vote can continue as-is.

For cancelling a release, the release manager needs to send an email to the 
release candidate thread, stating that the release candidate is officially 
cancelled. Next, all artifacts created specifically for the RC in the previous 
steps need to be removed:
 * Delete the staging repository in Nexus
 * Remove the source / binary RC files from dist.apache.org
 * Delete the source code tag in git

*If there are no issues, reply on the vote thread to close the voting.* Then, 
tally the votes in a separate email. Here’s an email template; please adjust as 
you see fit.
{quote}From: Release Manager
To: dev@flink.apache.org
Subject: [RESULT] [VOTE] Release 1.2.3, release candidate #3

I'm happy to announce that we have unanimously approved this release.

There are XXX approving votes, XXX of which are binding:
 * approver 1
 * approver 2
 * approver 3
 * approver 4

There are no disapproving votes.

Thanks everyone!
{quote}
 

h3. Expectations
 * Community votes to release the proposed candidate, with at least three 
approving PMC votes

Any issues that are raised till the vote is over should be either resolved or 
moved into the next release (if applicable).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33836) CLONE - Propose a pull request for website updates

2023-12-14 Thread Jing Ge (Jira)
Jing Ge created FLINK-33836:
---

 Summary: CLONE - Propose a pull request for website updates
 Key: FLINK-33836
 URL: https://issues.apache.org/jira/browse/FLINK-33836
 Project: Flink
  Issue Type: Sub-task
Affects Versions: 1.17.0
Reporter: Jing Ge
Assignee: Qingsheng Ren
 Fix For: 1.17.0


The final step of building the candidate is to propose a website pull request 
containing the following changes:
 # update 
[apache/flink-web:_config.yml|https://github.com/apache/flink-web/blob/asf-site/_config.yml]
 ## update {{FLINK_VERSION_STABLE}} and {{FLINK_VERSION_STABLE_SHORT}} as 
required
 ## update version references in quickstarts ({{{}q/{}}} directory) as required
 ## (major only) add a new entry to {{flink_releases}} for the release binaries 
and sources
 ## (minor only) update the entry for the previous release in the series in 
{{flink_releases}}
 ### Please pay attention to the ids assigned to the download entries. They 
should be unique and reflect their corresponding version number.
 ## add a new entry to {{release_archive.flink}}
 # add a blog post announcing the release in _posts
 # add an organized release notes page under docs/content/release-notes and 
docs/content.zh/release-notes (like 
[https://nightlies.apache.org/flink/flink-docs-release-1.15/release-notes/flink-1.15/]).
 The page is based on the non-empty release notes collected from the issues; 
only issues that affect existing users should be included (i.e., not new 
functionality). It should be in a separate PR since it would be merged to the 
flink project.

(!) Don’t merge the PRs before finalizing the release.
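A rough shell outline of preparing that pull request (a sketch; it only 
mirrors the steps above, see the repository for the authoritative layout):
{code:bash}
$ git clone https://github.com/apache/flink-web.git && cd flink-web
$ git checkout -b release-${RELEASE_VERSION} asf-site
# 1. bump FLINK_VERSION_STABLE / FLINK_VERSION_STABLE_SHORT in _config.yml
# 2. add the announcement blog post, e.g.:
$ touch _posts/$(date +%Y-%m-%d)-release-${RELEASE_VERSION}.md
$ git add -A && git commit -m "Announce Flink ${RELEASE_VERSION}"
{code}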

 

h3. Expectations
 * Website pull request proposed to list the 
[release|http://flink.apache.org/downloads.html]
 * (major only) Check {{docs/config.toml}} to ensure that
 ** the version constants refer to the new version
 ** the {{baseurl}} does not point to {{flink-docs-master}}  but 
{{flink-docs-release-X.Y}} instead



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33834) CLONE - Build and stage Java and Python artifacts

2023-12-14 Thread Jing Ge (Jira)
Jing Ge created FLINK-33834:
---

 Summary: CLONE - Build and stage Java and Python artifacts
 Key: FLINK-33834
 URL: https://issues.apache.org/jira/browse/FLINK-33834
 Project: Flink
  Issue Type: Sub-task
Reporter: Jing Ge
Assignee: Qingsheng Ren


# Create a local release branch ((!) this step cannot be skipped for minor 
releases):
{code:bash}
$ cd ./tools
tools/ $ OLD_VERSION=$CURRENT_SNAPSHOT_VERSION NEW_VERSION=$RELEASE_VERSION 
RELEASE_CANDIDATE=$RC_NUM releasing/create_release_branch.sh
{code}
 # Tag the release commit:
{code:bash}
$ git tag -s ${TAG} -m "${TAG}"
{code}
 # We now need to do several things:
 ## Create the source release archive
 ## Deploy jar artefacts to the [Apache Nexus 
Repository|https://repository.apache.org/], which is the staging area for 
deploying the jars to Maven Central
 ## Build PyFlink wheel packages
You might want to create a directory on your local machine for collecting the 
various source and binary releases before uploading them. Creating the binary 
releases is a lengthy process but you can do this on another machine (for 
example, in the "cloud"). When doing this, you can skip signing the release 
files on the remote machine, download them to your local machine and sign them 
there.
 # Build the source release:
{code:bash}
tools $ RELEASE_VERSION=$RELEASE_VERSION releasing/create_source_release.sh
{code}
 # Stage the maven artifacts:
{code:bash}
tools $ releasing/deploy_staging_jars.sh
{code}
Review all staged artifacts ([https://repository.apache.org/]). They should 
contain all relevant parts for each module, including pom.xml, jar, test jar, 
source, test source, javadoc, etc. Carefully review any new artifacts.
 # Close the staging repository on Apache Nexus. When prompted for a 
description, enter “Apache Flink, version X, release candidate Y”.
Then, you need to build the PyFlink wheel packages (since 1.11):
 # Set up an azure pipeline in your own Azure account. You can refer to [Azure 
Pipelines|https://cwiki.apache.org/confluence/display/FLINK/Azure+Pipelines#AzurePipelines-Tutorial:SettingupAzurePipelinesforaforkoftheFlinkrepository]
 for more details on how to set up azure pipeline for a fork of the Flink 
repository. Note that a Google Cloud mirror in Europe is used for downloading 
maven artifacts; it is therefore recommended to set your [Azure organization 
region|https://docs.microsoft.com/en-us/azure/devops/organizations/accounts/change-organization-location]
 to Europe to speed up the downloads.
 # Push the release candidate branch to your forked personal Flink repository, 
e.g.
{code:bash}
tools $ git push  
refs/heads/release-${RELEASE_VERSION}-rc${RC_NUM}:release-${RELEASE_VERSION}-rc${RC_NUM}
{code}
 # Trigger the Azure Pipelines manually to build the PyFlink wheel packages
 ## Go to your Azure Pipelines Flink project → Pipelines
 ## Click the "New pipeline" button on the top right
 ## Select "GitHub" → your GitHub Flink repository → "Existing Azure Pipelines 
YAML file"
 ## Select your branch → Set path to "/azure-pipelines.yaml" → click on 
"Continue" → click on "Variables"
 ## Then click "New Variable" button, fill the name with "MODE", and the value 
with "release". Click "OK" to set the variable and the "Save" button to save 
the variables, then back on the "Review your pipeline" screen click "Run" to 
trigger the build.
 ## You should now see a build where only the "CI build (release)" is running
 # Download the PyFlink wheel packages from the build result page after the 
jobs of "build_wheels mac" and "build_wheels linux" have finished.
 ## Download the PyFlink wheel packages
 ### Open the build result page of the pipeline
 ### Go to the {{Artifacts}} page (build_wheels linux -> 1 artifact)
 ### Click {{wheel_Darwin_build_wheels mac}} and {{wheel_Linux_build_wheels 
linux}} separately to download the zip files
 ## Unzip these two zip files
{code:bash}
$ cd /path/to/downloaded_wheel_packages
$ unzip wheel_Linux_build_wheels\ linux.zip
$ unzip wheel_Darwin_build_wheels\ mac.zip{code}
 ## Create directory {{./dist}} under the directory of {{{}flink-python{}}}:
{code:bash}
$ cd 
$ mkdir flink-python/dist{code}
 ## Move the unzipped wheel packages to the directory of 
{{{}flink-python/dist{}}}:
{code:java}
$ mv /path/to/wheel_Darwin_build_wheels\ mac/* flink-python/dist/
$ mv /path/to/wheel_Linux_build_wheels\ linux/* flink-python/dist/
$ cd tools{code}

Finally, we create the binary convenience release files:
{code:bash}
tools $ RELEASE_VERSION=$RELEASE_VERSION releasing/create_binary_release.sh
{code}
If you want to run this step in parallel on a remote machine you have to make 
the release commit available there (for example by pushing to a repository). 
*This is important: the commit inside the binary builds has to match the commit 
of the source builds and the tagged release commit.* 
When building remotely, you can 

[jira] [Created] (FLINK-33835) CLONE - Stage source and binary releases on dist.apache.org

2023-12-14 Thread Jing Ge (Jira)
Jing Ge created FLINK-33835:
---

 Summary: CLONE - Stage source and binary releases on 
dist.apache.org
 Key: FLINK-33835
 URL: https://issues.apache.org/jira/browse/FLINK-33835
 Project: Flink
  Issue Type: Sub-task
Reporter: Jing Ge
Assignee: Qingsheng Ren


Copy the source release to the dev repository of dist.apache.org:
# If you have not already, check out the Flink section of the dev repository on 
dist.apache.org via Subversion. In a fresh directory:
{code:bash}
$ svn checkout https://dist.apache.org/repos/dist/dev/flink --depth=immediates
{code}
# Make a directory for the new release and copy all the artifacts (Flink 
source/binary distributions, hashes, GPG signatures and the python 
subdirectory) into that newly created directory:
{code:bash}
$ mkdir flink/flink-${RELEASE_VERSION}-rc${RC_NUM}
$ mv /tools/releasing/release/* 
flink/flink-${RELEASE_VERSION}-rc${RC_NUM}
{code}
# Add and commit all the files.
{code:bash}
$ cd flink
flink $ svn add flink-${RELEASE_VERSION}-rc${RC_NUM}
flink $ svn commit -m "Add flink-${RELEASE_VERSION}-rc${RC_NUM}"
{code}
# Verify that files are present under 
[https://dist.apache.org/repos/dist/dev/flink|https://dist.apache.org/repos/dist/dev/flink].
# Push the release tag if not done already (the following command assumes to be 
called from within the apache/flink checkout):
{code:bash}
$ git push  refs/tags/release-${RELEASE_VERSION}-rc${RC_NUM}
{code}

 

h3. Expectations
 * Maven artifacts deployed to the staging repository of 
[repository.apache.org|https://repository.apache.org/content/repositories/]
 * Source distribution deployed to the dev repository of 
[dist.apache.org|https://dist.apache.org/repos/dist/dev/flink/]
 * Check hashes (e.g. shasum -c *.sha512)
 * Check signatures (e.g. {{{}gpg --verify 
flink-1.2.3-source-release.tar.gz.asc flink-1.2.3-source-release.tar.gz{}}})
 * {{grep}} for legal headers in each file.
 * If time allows, check the NOTICE files of the modules whose dependencies 
have been changed in this release in advance, since license issues pop up 
during voting from time to time. See the [Verifying a Flink 
Release|https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Release]
 "Checking License" section.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33833) Build Release Candidate: 1.18.1-rc1

2023-12-14 Thread Jing Ge (Jira)
Jing Ge created FLINK-33833:
---

 Summary: Build Release Candidate: 1.18.1-rc1
 Key: FLINK-33833
 URL: https://issues.apache.org/jira/browse/FLINK-33833
 Project: Flink
  Issue Type: New Feature
Affects Versions: 1.17.0
Reporter: Jing Ge
Assignee: Jing Ge
 Fix For: 1.17.0


The core of the release process is the build-vote-fix cycle. Each cycle 
produces one release candidate. The Release Manager repeats this cycle until 
the community approves one release candidate, which is then finalized.

h4. Prerequisites
Set up a few environment variables to simplify Maven commands that follow. This 
identifies the release candidate being built. Start with {{RC_NUM}} equal to 1 
and increment it for each candidate:
{code}
RC_NUM="1"
TAG="release-${RELEASE_VERSION}-rc${RC_NUM}"
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Release flink-connector-pulsar 4.1.0, release candidate #1

2023-12-14 Thread Hang Ruan
+1 (non-binding)

- Validated checksum hash
- Verified signature
- Verified that no binaries exist in the source archive
- Build the source with jdk8
- Verified web PR
- Make sure flink-connector-base has the provided scope

Best,
Hang

tison  于2023年12月14日周四 11:51写道:

> Thanks Leonard for driving this release!
>
> +1 (non-binding)
>
> * Download link valid
> * Maven staging artifacts look good.
> * Checksum and gpg matches
> * LICENSE and NOTICE exist
> * Can build from source.
>
> Best,
> tison.
>
> Rui Fan <1996fan...@gmail.com> 于2023年12月14日周四 09:23写道:
> >
> > Thanks Leonard for driving this release!
> >
> > +1 (non-binding)
> >
> > - Validated checksum hash
> > - Verified signature
> > - Verified that no binaries exist in the source archive
> > - Build the source with Maven and jdk8
> > - Verified licenses
> > - Verified web PRs, left a minor comment
> >
> > Best,
> > Rui
> >
> > On Wed, Dec 13, 2023 at 7:15 PM Leonard Xu  wrote:
> >>
> >> Hey all,
> >>
> >> Please review and vote on the release candidate #1 for the version
> 4.1.0 of the Apache Flink Pulsar Connector as follows:
> >>
> >> [ ] +1, Approve the release
> >> [ ] -1, Do not approve the release (please provide specific comments)
> >>
> >> The complete staging area is available for your review, which includes:
> >> * JIRA release notes [1],
> >> * The official Apache source release to be deployed to dist.apache.org
> [2], which are signed with the key with fingerprint
> >> 5B2F6608732389AEB67331F5B197E1F1108998AD [3],
> >> * All artifacts to be deployed to the Maven Central Repository [4],
> >> * Source code tag v4.1.0-rc1 [5],
> >> * Website pull request listing the new release [6].
> >>
> >> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
> >>
> >>
> >> Best,
> >> Leonard
> >>
> >> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353431
> >> [2]
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-pulsar-4.1.0-rc1/
> >> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> >> [4]
> https://repository.apache.org/content/repositories/orgapacheflink-1688/
> >> [5] https://github.com/apache/flink-connector-pulsar/tree/v4.1.0-rc1
> >> [6] https://github.com/apache/flink-web/pull/703
>


[RESULT][VOTE] FLIP-401: REST API JSON response deserialization unknown field tolerance

2023-12-14 Thread Gabor Somogyi
Hi All,

FLIP-401: REST API JSON response deserialization unknown field tolerance
[1] has been accepted and voted through this thread [2].

The proposal received 4 binding and 2 non-binding approval votes, with no disapprovals:
- Gyula Fora (binding)
- Mátyás Őrhidi (binding)
- Peter Huang (non-binding)
- Maximilian Michels (binding)
- Rodrigo Meneses (non-binding)
- Márton Balassi (binding)

Thanks to all involved!

BR,
G

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-401%3A+REST+API+JSON+response+deserialization+unknown+field+tolerance
[2] https://lists.apache.org/thread/vnksnq5t7gz7owgx4hofnr8c9bl3fgky


[DISCUSS] FLIP-402: Extend ZooKeeper Curator configurations

2023-12-14 Thread Alex Nitavsky
Hi all,

I would like to start a discussion thread for *FLIP-402: Extend ZooKeeper
Curator configurations* [1]

* Problem statement *
Currently Flink is missing several Apache Curator configuration options
that could be useful for Flink deployments with ZooKeeper as the HA provider.

* Proposed solution *
We have inspected all possible options for Apache Curator and proposed
those which could be valuable for Flink users:

- high-availability.zookeeper.client.authorization [2]
- high-availability.zookeeper.client.maxCloseWaitMs [3]
- high-availability.zookeeper.client.simulatedSessionExpirationPercent [4]

The proposal is to expose these Curator properties as Flink configuration
options for the ZooKeeper-based HA services.
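
For illustration, the new options might look like this in flink-conf.yaml
(key names follow the FLIP; value formats are assumptions based on the
underlying Curator setters, so treat this purely as a sketch):

high-availability.zookeeper.client.authorization: digest:admin:secret
high-availability.zookeeper.client.maxCloseWaitMs: 2000
high-availability.zookeeper.client.simulatedSessionExpirationPercent: 100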

Looking forward to your feedback and suggestions.

Kind regards
Oleksandr

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-402%3A+Extend+ZooKeeper+Curator+configurations
[2]
https://curator.apache.org/apidocs/org/apache/curator/framework/CuratorFrameworkFactory.Builder.html#authorization(java.lang.String,byte%5B%5D)
[3]
https://curator.apache.org/apidocs/org/apache/curator/framework/CuratorFrameworkFactory.Builder.html#maxCloseWaitMs(int)
[4]
https://curator.apache.org/apidocs/org/apache/curator/framework/CuratorFrameworkFactory.Builder.html#simulatedSessionExpirationPercent(int)


Re: join flink slack

2023-12-14 Thread Matthias Pohl
I updated the Slack invite link [1]. Feel free to use that one. I created a
PR [2] to reflect the change on the website as well.

Matthias

[1]
https://join.slack.com/t/apache-flink/shared_invite/zt-294plfx41-23lOoovZOdegKwjW9_0q_g
[2] https://github.com/apache/flink-web/pull/704

On Wed, Dec 13, 2023 at 1:35 PM Romit Mahanta 
wrote:

> Second that
>
> On Wed, 13 Dec, 2023, 5:58 pm yd c,  wrote:
>
> > can u invite flink slack
> >
>


[jira] [Created] (FLINK-33829) CLONE - Review Release Notes in JIRA

2023-12-14 Thread Jing Ge (Jira)
Jing Ge created FLINK-33829:
---

 Summary: CLONE - Review Release Notes in JIRA
 Key: FLINK-33829
 URL: https://issues.apache.org/jira/browse/FLINK-33829
 Project: Flink
  Issue Type: Sub-task
Reporter: Jing Ge
Assignee: Qingsheng Ren


JIRA automatically generates Release Notes based on the {{Fix Version}} field 
applied to issues. Release Notes are intended for Flink users (not Flink 
committers/contributors). You should ensure that Release Notes are informative 
and useful.

Open the release notes from the version status page by choosing the release 
underway and clicking Release Notes.

You should verify that the issues listed automatically by JIRA are appropriate 
to appear in the Release Notes. Specifically, issues should:
 * Be appropriately classified as {{{}Bug{}}}, {{{}New Feature{}}}, 
{{{}Improvement{}}}, etc.
 * Represent noteworthy user-facing changes, such as new functionality, 
backward-incompatible API changes, or performance improvements.
 * Have occurred since the previous release; an issue that was introduced and 
fixed between releases should not appear in the Release Notes.
 * Have an issue title that makes sense when read on its own.

Adjust any of the above properties to improve the clarity and presentation of 
the Release Notes.

Ensure that the JIRA release notes are also included in the release notes of 
the documentation (see section "Review and update documentation").
h4. Content of Release Notes field from JIRA tickets 

To get the list of "release notes" fields from JIRA, you can run the following 
script using the JIRA REST API (note that maxResults limits the number of 
entries):
{code:bash}
curl -s "https://issues.apache.org/jira/rest/api/2/search?maxResults=200&jql=project%20%3D%20FLINK%20AND%20%22Release%20Note%22%20is%20not%20EMPTY%20and%20fixVersion%20%3D%20${RELEASE_VERSION}" | jq '.issues[]|.key,.fields.summary,.fields.customfield_12310192' | paste - - -
{code}
{{jq}}  is present in most Linux distributions and on MacOS can be installed 
via brew.

 

h3. Expectations
 * Release Notes in JIRA have been audited and adjusted



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33827) CLONE - Review and update documentation

2023-12-14 Thread Jing Ge (Jira)
Jing Ge created FLINK-33827:
---

 Summary: CLONE - Review and update documentation
 Key: FLINK-33827
 URL: https://issues.apache.org/jira/browse/FLINK-33827
 Project: Flink
  Issue Type: Sub-task
Affects Versions: 1.17.0
Reporter: Jing Ge
Assignee: Qingsheng Ren
 Fix For: 1.17.0


There are a few pages in the documentation that need to be reviewed and updated 
for each release.
 * Ensure that there exists a release notes page for each non-bugfix release 
(e.g., 1.5.0) in {{{}./docs/release-notes/{}}}, that it is up-to-date, and 
linked from the start page of the documentation.
 * Upgrading Applications and Flink Versions: 
[https://ci.apache.org/projects/flink/flink-docs-master/ops/upgrading.html]
 * ...

 

h3. Expectations
 * Update upgrade compatibility table 
([apache-flink:./docs/content/docs/ops/upgrading.md|https://github.com/apache/flink/blob/master/docs/content/docs/ops/upgrading.md#compatibility-table]
 and 
[apache-flink:./docs/content.zh/docs/ops/upgrading.md|https://github.com/apache/flink/blob/master/docs/content.zh/docs/ops/upgrading.md#compatibility-table]).
 * Update [Release Overview in 
Confluence|https://cwiki.apache.org/confluence/display/FLINK/Release+Management+and+Feature+Plan]
 * (minor only) The documentation for the new major release is visible under 
[https://nightlies.apache.org/flink/flink-docs-release-$SHORT_RELEASE_VERSION] 
(after at least one [doc 
build|https://github.com/apache/flink/actions/workflows/docs.yml] succeeded).
 * (minor only) The documentation for the new major release does not contain 
"-SNAPSHOT" in its version title, and all links refer to the corresponding 
version docs instead of {{{}master{}}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33830) CLONE - Select executing Release Manager

2023-12-14 Thread Jing Ge (Jira)
Jing Ge created FLINK-33830:
---

 Summary: CLONE - Select executing Release Manager
 Key: FLINK-33830
 URL: https://issues.apache.org/jira/browse/FLINK-33830
 Project: Flink
  Issue Type: Sub-task
  Components: Release System
Affects Versions: 1.17.0
Reporter: Jing Ge
Assignee: Qingsheng Ren
 Fix For: 1.17.0


h4. GPG Key

You need to have a GPG key to sign the release artifacts. Please be aware of 
the ASF-wide [release signing 
guidelines|https://www.apache.org/dev/release-signing.html]. If you don’t have 
a GPG key associated with your Apache account, please create one according to 
the guidelines.

Determine your Apache GPG Key and Key ID, as follows:
{code:java}
$ gpg --list-keys
{code}
This will list your GPG keys. One of these should reflect your Apache account, 
for example:
{code:java}
--
pub   2048R/845E6689 2016-02-23
uid  Nomen Nescio 
sub   2048R/BA4D50BE 2016-02-23
{code}
In the example above, the key ID is the 8-digit hex string in the {{pub}} line: 
{{{}845E6689{}}}.

Now, add your Apache GPG key to the Flink’s {{KEYS}} file in the [Apache Flink 
release KEYS file|https://dist.apache.org/repos/dist/release/flink/KEYS] 
repository at [dist.apache.org|http://dist.apache.org/]. Follow the 
instructions listed at the top of these files. (Note: Only PMC members have 
write access to the release repository. If you end up getting 403 errors ask on 
the mailing list for assistance.)

Configure {{git}} to use this key when signing code by giving it your key ID, 
as follows:
{code:java}
$ git config --global user.signingkey 845E6689
{code}
You may drop the {{--global}} option if you’d prefer to use this key for the 
current repository only.

You may wish to start {{gpg-agent}} to unlock your GPG key only once using your 
passphrase. Otherwise, you may need to enter this passphrase hundreds of times. 
The setup for {{gpg-agent}} varies based on operating system, but may be 
something like this:
{code:bash}
$ eval $(gpg-agent --daemon --no-grab --write-env-file $HOME/.gpg-agent-info)
$ export GPG_TTY=$(tty)
$ export GPG_AGENT_INFO
{code}
h4. Access to Apache Nexus repository

Configure access to the [Apache Nexus 
repository|https://repository.apache.org/], which enables final deployment of 
releases to the Maven Central Repository.
 # You log in with your Apache account.
 # Confirm you have appropriate access by finding {{org.apache.flink}} under 
{{{}Staging Profiles{}}}.
 # Navigate to your {{Profile}} (top right drop-down menu of the page).
 # Choose {{User Token}} from the dropdown, then click {{{}Access User 
Token{}}}. Copy a snippet of the Maven XML configuration block.
 # Insert this snippet twice into your global Maven {{settings.xml}} file, 
typically {{{}${HOME}/.m2/settings.xml{}}}. The end result should look like 
this, where {{TOKEN_NAME}} and {{TOKEN_PASSWORD}} are your secret tokens:
{code:xml}

   
 
   apache.releases.https
   TOKEN_NAME
   TOKEN_PASSWORD
 
 
   apache.snapshots.https
   TOKEN_NAME
   TOKEN_PASSWORD
 
   
 
{code}

h4. Website development setup

Get ready for updating the Flink website by following the [website development 
instructions|https://flink.apache.org/contributing/improve-website.html].
h4. GNU Tar Setup for Mac (Skip this step if you are not using a Mac)

The default tar application on Mac does not support GNU archive format and 
defaults to Pax. This bloats the archive with unnecessary metadata that can 
result in additional files when decompressing (see [1.15.2-RC2 vote 
thread|https://lists.apache.org/thread/mzbgsb7y9vdp9bs00gsgscsjv2ygy58q]). 
Install gnu-tar and create a symbolic link so it is used in preference to the 
default tar program.
{code:bash}
$ brew install gnu-tar
$ ln -s /usr/local/bin/gtar /usr/local/bin/tar
$ which tar
{code}
 

h3. Expectations
 * Release Manager’s GPG key is published to 
[dist.apache.org|http://dist.apache.org/]
 * Release Manager’s GPG key is configured in git configuration
 * Release Manager's GPG key is configured as the default gpg key.
 * Release Manager has {{org.apache.flink}} listed under Staging Profiles in 
Nexus
 * Release Manager’s Nexus User Token is configured in settings.xml



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33832) CLONE - Verify that no exclusions were erroneously added to the japicmp plugin

2023-12-14 Thread Jing Ge (Jira)
Jing Ge created FLINK-33832:
---

 Summary: CLONE - Verify that no exclusions were erroneously added 
to the japicmp plugin
 Key: FLINK-33832
 URL: https://issues.apache.org/jira/browse/FLINK-33832
 Project: Flink
  Issue Type: Sub-task
Reporter: Jing Ge
Assignee: Matthias Pohl


Verify that no exclusions were erroneously added to the japicmp plugin that 
break compatibility guarantees. Check the exclusions for the 
japicmp-maven-plugin in the root pom (see 
[apache/flink:pom.xml:2175ff|https://github.com/apache/flink/blob/3856c49af77601cf7943a5072d8c932279ce46b4/pom.xml#L2175])
 for exclusions that:
* For minor releases: break source compatibility for {{@Public}} APIs
* For patch releases: break source/binary compatibility for 
{{@Public}}/{{@PublicEvolving}} APIs
Any such exclusion must be properly justified in advance.
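One way to spot newly added exclusions is to diff the plugin configuration in 
the root pom against the previous release branch (a sketch; branch and remote 
names are illustrative):
{code:bash}
$ git fetch origin
$ git diff origin/release-1.17..origin/release-1.18 -- pom.xml | grep -i -C 2 "exclude"
{code}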



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33831) CLONE - Create a release branch

2023-12-14 Thread Jing Ge (Jira)
Jing Ge created FLINK-33831:
---

 Summary: CLONE - Create a release branch
 Key: FLINK-33831
 URL: https://issues.apache.org/jira/browse/FLINK-33831
 Project: Flink
  Issue Type: Sub-task
Affects Versions: 1.17.0
Reporter: Jing Ge
Assignee: Leonard Xu
 Fix For: 1.17.0


If you are doing a new minor release, you need to update the Flink version in the 
following repositories and the [AzureCI project 
configuration|https://dev.azure.com/apache-flink/apache-flink/]:
 * [apache/flink|https://github.com/apache/flink]
 * [apache/flink-docker|https://github.com/apache/flink-docker]
 * [apache/flink-benchmarks|https://github.com/apache/flink-benchmarks]

Patch releases don't require these repositories to be touched. Simply 
check out the already existing branch for that version:
{code:java}
$ git checkout release-$SHORT_RELEASE_VERSION
{code}
h4. Flink repository

Create a branch for the new version that we want to release before updating the 
master branch to the next development version:
{code:bash}
$ cd ./tools
tools $ releasing/create_snapshot_branch.sh
tools $ git checkout master
tools $ OLD_VERSION=$CURRENT_SNAPSHOT_VERSION 
NEW_VERSION=$NEXT_SNAPSHOT_VERSION releasing/update_branch_version.sh
{code}
In the {{master}} branch, add a new value (e.g. {{v1_16("1.16")}}) to 
[apache-flink:flink-annotations/src/main/java/org/apache/flink/FlinkVersion|https://github.com/apache/flink/blob/master/flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java]
 as the last entry:
{code:java}
// ...
v1_12("1.12"),
v1_13("1.13"),
v1_14("1.14"),
v1_15("1.15"),
v1_16("1.16");
{code}
The newly created branch and updated {{master}} branch need to be pushed to the 
official repository.
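For example (the remote name is a placeholder for the official apache remote):
{code:bash}
$ git push <remote> release-${SHORT_RELEASE_VERSION}
$ git push <remote> master
{code}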
h4. Flink Docker Repository

Afterwards fork off from {{dev-master}} a {{dev-x.y}} branch in the 
[apache/flink-docker|https://github.com/apache/flink-docker] repository. Make 
sure that 
[apache/flink-docker:.github/workflows/ci.yml|https://github.com/apache/flink-docker/blob/dev-master/.github/workflows/ci.yml]
 points to the correct snapshot version; for {{dev-x.y}} it should point to 
{{{}x.y-SNAPSHOT{}}}, while for {{dev-master}} it should point to the most 
recent snapshot version ({{$NEXT_SNAPSHOT_VERSION}}).

After pushing the new minor release branch, as the last step you should also 
update the documentation workflow to also build the documentation for the new 
release branch. Check [Managing 
Documentation|https://cwiki.apache.org/confluence/display/FLINK/Managing+Documentation]
 on details on how to do that. You may also want to manually trigger a build to 
make the changes visible as soon as possible.

h4. Flink Benchmark Repository
First of all, check out a {{dev-x.y}} branch from the {{master}} branch in 
[apache/flink-benchmarks|https://github.com/apache/flink-benchmarks], so that 
we have a branch named {{dev-x.y}} which builds on top of 
{{$CURRENT_SNAPSHOT_VERSION}}.

Then, inside the repository you need to manually update the {{flink.version}} 
property inside the parent *pom.xml* file. It should be pointing to the most 
recent snapshot version ($NEXT_SNAPSHOT_VERSION). For example:
{code:xml}
1.18-SNAPSHOT
{code}

h4. AzureCI Project Configuration
The new release branch needs to be configured within AzureCI to make Azure 
aware of it. This can only be handled by Ververica employees since they own 
the AzureCI setup.
 

h3. Expectations (Minor Version only if not stated otherwise)
 * Release branch has been created and pushed
 * Changes on the new release branch are picked up by [Azure 
CI|https://dev.azure.com/apache-flink/apache-flink/_build?definitionId=1&_a=summary]
 * {{master}} branch has the version information updated to the new version 
(check pom.xml files and the 
[apache-flink:flink-annotations/src/main/java/org/apache/flink/FlinkVersion|https://github.com/apache/flink/blob/master/flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java]
 enum)
 * New version is added to the 
[apache-flink:flink-annotations/src/main/java/org/apache/flink/FlinkVersion|https://github.com/apache/flink/blob/master/flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java]
 enum.
 * Make sure [flink-docker|https://github.com/apache/flink-docker/] has 
{{dev-x.y}} branch and docker e2e tests run against this branch in the 
corresponding Apache Flink release branch (see 
[apache/flink:flink-end-to-end-tests/test-scripts/common_docker.sh:51|https://github.com/apache/flink/blob/master/flink-end-to-end-tests/test-scripts/common_docker.sh#L51])
 * 
[apache-flink:docs/config.toml|https://github.com/apache/flink/blob/release-1.17/docs/config.toml]
 has been updated appropriately in the new Apache Flink release branch.
 * The {{flink.version}} property (see 

[jira] [Created] (FLINK-33825) CLONE - Create a new version in JIRA

2023-12-14 Thread Jing Ge (Jira)
Jing Ge created FLINK-33825:
---

 Summary: CLONE - Create a new version in JIRA
 Key: FLINK-33825
 URL: https://issues.apache.org/jira/browse/FLINK-33825
 Project: Flink
  Issue Type: Sub-task
Reporter: Jing Ge
Assignee: Martijn Visser


When contributors resolve an issue in JIRA, they are tagging it with a release 
that will contain their changes. With the release currently underway, new 
issues should be resolved against a subsequent future release. Therefore, you 
should create a release item for this subsequent release, as follows:
 # In JIRA, navigate to the [Flink > Administration > 
Versions|https://issues.apache.org/jira/plugins/servlet/project-config/FLINK/versions].
 # Add a new release: choose the next minor version number compared to the one 
currently underway, select today’s date as the Start Date, and choose Add.
(Note: Only PMC members have access to the project administration. If you do 
not have access, ask on the mailing list for assistance.)

 

h3. Expectations
 * The new version should be listed in the dropdown menu of {{fixVersion}} or 
{{affectedVersion}} under "unreleased versions" when creating a new Jira issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33824) Prepare Flink 1.18.1 Release

2023-12-14 Thread Jing Ge (Jira)
Jing Ge created FLINK-33824:
---

 Summary: Prepare Flink 1.18.1 Release
 Key: FLINK-33824
 URL: https://issues.apache.org/jira/browse/FLINK-33824
 Project: Flink
  Issue Type: New Feature
  Components: Release System
Affects Versions: 1.17.0
Reporter: Jing Ge
Assignee: Leonard Xu
 Fix For: 1.17.0


This umbrella issue is meant as a test balloon for moving the [release 
documentation|https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release]
 into Jira.
h3. Prerequisites
h4. Environment Variables

Commands in the subtasks might expect some of the following environment 
variables to be set according to the version that is about to be released:
{code:bash}
RELEASE_VERSION="1.5.0"
SHORT_RELEASE_VERSION="1.5"
CURRENT_SNAPSHOT_VERSION="$SHORT_RELEASE_VERSION-SNAPSHOT"
NEXT_SNAPSHOT_VERSION="1.6-SNAPSHOT"
SHORT_NEXT_SNAPSHOT_VERSION="1.6"
{code}
h4. Build Tools

All of the following steps require Maven 3.2.5 and Java 8. Modify your 
PATH environment variable accordingly if needed.
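For example (a sketch; installation paths are placeholders for your local setup):
{code:bash}
export JAVA_HOME=/path/to/jdk8
export PATH=/path/to/apache-maven-3.2.5/bin:$JAVA_HOME/bin:$PATH
mvn -version   # should report Maven 3.2.5 running on Java 1.8
{code}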
h4. Flink Source
 * Create a new directory for this release and clone the Flink repository from 
Github to ensure you have a clean workspace (this step is optional).
 * Run {{mvn -Prelease clean install}} to ensure that the build processes that 
are specific to that profile are in good shape (this step is optional).

The rest of these instructions assumes that commands are run from the root (or 
the {{./tools}} directory) of a repository checked out at the release version 
branch, with the above environment variables set.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33828) CLONE - Cross team testing

2023-12-14 Thread Jing Ge (Jira)
Jing Ge created FLINK-33828:
---

 Summary: CLONE - Cross team testing
 Key: FLINK-33828
 URL: https://issues.apache.org/jira/browse/FLINK-33828
 Project: Flink
  Issue Type: Sub-task
Reporter: Jing Ge
Assignee: Qingsheng Ren


For user facing features that go into the release we'd like to ensure they can 
actually _be used_ by Flink users. To achieve this the release managers ensure 
that an issue for cross team testing is created in the Apache Flink Jira. This 
can and should be picked up by other community members to verify the 
functionality and usability of the feature.
The issue should contain some entry points which enables other community 
members to test it. It should not contain documentation on how to use the 
feature as this should be part of the actual documentation. The cross team 
tests are performed after the feature freeze. Documentation should be in place 
before that. Those tests are manual tests, so do not confuse them with 
automated tests.
To sum that up:
 * User facing features should be tested by other contributors
 * The scope is usability and sanity of the feature
 * The feature needs to be already documented
 * The contributor creates an issue containing some pointers on how to get 
started (e.g. link to the documentation, suggested targets of verification)
 * Other community members pick those issues up and provide feedback
 * Cross team testing happens right after the feature freeze

 

h3. Expectations
 * Jira issues for each expected release task according to the release plan are 
created and labeled as {{{}release-testing{}}}.
 * All the created release-testing-related Jira issues are resolved and the 
corresponding blocker issues are fixed.
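To track progress, the open release-testing issues can be listed via the JIRA 
REST API (a sketch; the label matches the one mentioned above):
{code:bash}
curl -s "https://issues.apache.org/jira/rest/api/2/search?jql=project%20%3D%20FLINK%20AND%20labels%20%3D%20release-testing%20AND%20fixVersion%20%3D%20${RELEASE_VERSION}" | jq '.issues[].key'
{code}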



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33826) CLONE - Triage release-blocking issues in JIRA

2023-12-14 Thread Jing Ge (Jira)
Jing Ge created FLINK-33826:
---

 Summary: CLONE - Triage release-blocking issues in JIRA
 Key: FLINK-33826
 URL: https://issues.apache.org/jira/browse/FLINK-33826
 Project: Flink
  Issue Type: Sub-task
Reporter: Jing Ge
Assignee: Qingsheng Ren


There could be outstanding release-blocking issues, which should be triaged 
before proceeding to build a release candidate. We track them by assigning a 
specific Fix Version field even before the issue is resolved.

The list of release-blocking issues is available at the version status page. 
Triage each unresolved issue with one of the following resolutions:
 * If the issue has been resolved and JIRA was not updated, resolve it 
accordingly.
 * If the issue has not been resolved and it is acceptable to defer this until 
the next release, update the Fix Version field to the new version you just 
created. Please consider discussing this with stakeholders and the dev@ mailing 
list, as appropriate.
 ** When using the "Bulk Change" functionality of Jira
 *** First, add the newly created version to Fix Version for all unresolved 
tickets that have the old version among their Fix Versions.
 *** Afterwards, remove the old version from the Fix Version.
 * If the issue has not been resolved and it is not acceptable to release until 
it is fixed, the release cannot proceed. Instead, work with the Flink community 
to resolve the issue.
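For convenience, the unresolved tickets tagged with the release can also be 
pulled via the JIRA REST API, mirroring the query style used elsewhere in this 
release guide (a sketch; adjust the JQL as needed):
{code:bash}
curl -s "https://issues.apache.org/jira/rest/api/2/search?jql=project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%20${RELEASE_VERSION}" | jq '.issues[].key'
{code}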

 

h3. Expectations
 * There are no release blocking JIRA issues



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-387: Support named parameters for functions and call procedures

2023-12-14 Thread Feng Jin
Hi Timo
Thanks for your reply.

>  1) ArgumentNames annotation

I'm sorry for my incorrect expression. argumentNames is a method of
FunctionHints. We should introduce a new arguments method to replace this
method and return Argument[].
I updated the FLIP doc about this part.

>  2) Evolution of FunctionHint

+1 define DataTypeHint as part of ArgumentHint. I'll update the FLIP doc.

> 3)  Semantical correctness

I realized that I forgot to submit the latest modification of the FLIP
document. Xuyang and I had a prior discussion before starting this discuss.
Let's restrict it to supporting only one eval() function, which will
simplify the overall design.

Therefore, I also concur with not permitting overloaded named parameters.


Best,
Feng

On Thu, Dec 14, 2023 at 6:15 PM Timo Walther  wrote:

> Hi Feng,
>
> thank you for proposing this FLIP. This nicely completes FLIP-65 which
> is great for usability.
>
> I read the FLIP and have some feedback:
>
>
> 1) ArgumentNames annotation
>
>  > Deprecate the ArgumentNames annotation as it is not user-friendly for
> specifying argument names with optional configuration.
>
> Which annotation does the FLIP reference here? I cannot find it in the
> Flink code base.
>
> 2) Evolution of FunctionHint
>
> Introducing @ArgumentHint makes a lot of sense to me. However, using it
> within @FunctionHint looks complex, because there is both `input=` and
> `arguments=`. Ideally, the @DataTypeHint can be defined inline as part
> of the @ArgumentHint. It could even be the `value` such that
> `@ArgumentHint(@DataTypeHint("INT"))` is valid on its own.
>
> We could deprecate `input=`. Or let both `input` and `arguments=`
> coexist but never be defined at the same time.
>
> 3) Semantical correctness
>
> As you can see in the `TypeInference` class, named parameters are
> prepared in the stack already. However, we need to watch out between
> helpful explanation (see `InputTypeStrategy#getExpectedSignatures`) and
> named parameters (see `TypeInference.Builder#namedArguments`) that can
> be used in SQL.
>
> If I remember correctly, named parameters can be reordered and don't
> allow overloading of signatures. Thus, only a single eval() should have
> named parameters. Looking at the FLIP it seems you would like to support
> multiple parameter lists. What changes are you planning to TypeInference
> (which is also declared as @PublicEvolving)? This should also be
> documented as the annotations should compile into this class.
>
> In general, I would prefer to keep it simple and don't allow overloading
> named parameters. With the optionality, users can add an arbitrary
> number of parameters to the signature of the same eval method.
>
> Regards,
> Timo
>
> On 14.12.23 10:02, Feng Jin wrote:
> > Hi all,
> >
> >
> > Xuyang and I would like to start a discussion of FLIP-387: Support
> > named parameters for functions and call procedures [1]
> >
> > Currently, when users call a function or call a procedure, they must
> > specify all fields in order. When there are a large number of
> > parameters, it is easy to make mistakes and cannot omit specifying
> > non-mandatory fields.
> >
> > By using named parameters, you can selectively specify the required
> > parameters, reducing the probability of errors and making it more
> > convenient to use.
> >
> > Here is an example of using Named Procedure.
> > ```
> > -- for scalar function
> > SELECT my_scalar_function(param1 => ‘value1’, param2 => ‘value2’) FROM
> []
> >
> > -- for table function
> > SELECT  *  FROM TABLE(my_table_function(param1 => 'value1', param2 =>
> 'value2'))
> >
> > -- for agg function
> > SELECT my_agg_function(param1 => 'value1', param2 => 'value2') FROM []
> >
> > -- for call procedure
> > CALL  procedure_name(param1 => ‘value1’, param2 => ‘value2’)
> > ```
> >
> > For UDX and Call procedure developers, we introduce a new annotation
> > to specify the parameter name, indicate if it is optional, and
> > potentially support specifying default values in the future
> >
> > ```
> > public @interface ArgumentHint {
> >  /**
> >   * The name of the parameter, default is an empty string.
> >   */
> >  String name() default "";
> >
> >  /**
> >   * Whether the parameter is optional, default is true.
> >   */
> >  boolean isOptional() default true;
> > }
> > ```
> >
> > ```
> > // Call Procedure Development
> >
> > public static class NamedArgumentsProcedure implements Procedure {
> >
> > // Example usage: CALL myNamedProcedure(in1 => 'value1', in2 =>
> 'value2')
> >
> > // Example usage: CALL myNamedProcedure(in1 => 'value1', in2 =>
> > 'value2', in3 => 'value3')
> >
> > @ProcedureHint(
> > input = {@DataTypeHint(value = "STRING"),
> > @DataTypeHint(value = "STRING"), @DataTypeHint(value = "STRING")},
> > output = @DataTypeHint("STRING"),
> >  arguments = {
> >  @ArgumentHint(name = "in1", isOptional = false),
> >  

[jira] [Created] (FLINK-33823) Serialize PlannerQueryOperation into SQL

2023-12-14 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-33823:


 Summary: Serialize PlannerQueryOperation into SQL
 Key: FLINK-33823
 URL: https://issues.apache.org/jira/browse/FLINK-33823
 Project: Flink
  Issue Type: Sub-task
Reporter: Dawid Wysakowicz
Assignee: Dawid Wysakowicz
 Fix For: 1.19.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33822) Move Slack Invite URL into config.toml

2023-12-14 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-33822:
-

 Summary: Move Slack Invite URL into config.toml
 Key: FLINK-33822
 URL: https://issues.apache.org/jira/browse/FLINK-33822
 Project: Flink
  Issue Type: Bug
  Components: Project Website
Reporter: Matthias Pohl


Instead of 4 locations, we want to update only one location with the invite 
link. Additionally, we should add documentation on how to update the link.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Edit permissions request

2023-12-14 Thread Matthias Pohl
Sure, I will help you.

On Wed, Dec 13, 2023 at 7:55 PM Alex Nitavsky 
wrote:

> Hello,
>
> I would love to initiate FLIP in order to extend Apache Curator
> configuration: https://issues.apache.org/jira/browse/FLINK-33376.
>
> Would it be possible to get edit permissions for the Apache Flink
> confluence space, please?
>
> Thanks
>


Re: [DISCUSS] Release Flink 1.18.1

2023-12-14 Thread Jing Ge
Hi folks,

What Martijn said makes sense. We should pay more attention to
https://issues.apache.org/jira/browse/FLINK-33793. I have also tried to
contribute and share my thoughts in the PR
https://github.com/apache/flink/pull/23489.

For 1.18.1-rc1, I will wait until tomorrow. If there is no progress with
the PR, I will move forward with the release, unless we upgrade the ticket
to be a Blocker. Look forward to your feedback.

Best regards,
Jing

On Wed, Dec 13, 2023 at 9:57 AM Jing Ge  wrote:

> Hi Martijn,
>
> Thanks for the heads-up! I already started 1.18.1-rc1[1]. But please feel
> free to ping me. I will either redo rc1 or start rc2. Thanks!
>
> Best regards,
> Jing
>
> [1] https://dist.apache.org/repos/dist/dev/flink/flink-1.18.1-rc1/
>
> On Tue, Dec 12, 2023 at 9:49 PM Martijn Visser 
> wrote:
>
>> Hi Jing,
>>
>> The only thing that was brought to my attention today was
>> https://issues.apache.org/jira/browse/FLINK-33793. I've marked it as a
>> Critical (I think we should have working checkpointing with GCS), but
>> could also be considered a Blocker. There is a PR open for it, but I
>> don't think it's the right fix so I'm testing
>>
>> https://github.com/apache/flink/commit/2ebe7f6e090690486c1954099a2b57283c578192
>> at this moment. If you haven't started the 1.18.1 release yet, perhaps
>> we could include it (hopefully we can merge it tomorrow), else it
>> would have to wait for the next release.
>>
>> Thanks,
>>
>> Martijn
>>
>> On Tue, Dec 12, 2023 at 9:30 AM Jing Ge 
>> wrote:
>> >
>> > Hi All,
>> >
>> > Thank you for your feedback!
>> >
>> > Since FLINK-33523[1] is done(thanks Martijn for driving it) and there
>> are
>> > no other concerns or objections. Currently I am not aware of any
>> unresolved
>> > blockers. There is one critical task[2] whose PR still has some
>> checkstyle
>> > issues. I will try to include it into 1.18.1 with best effort, since it
>> > depends on how quickly the author can finish it and cp to 1.18. If
>> anyone
>> > considers FLINK-33588 as a must-have fix in 1.18.1, please let me know,
>> > thanks!
>> >
>> > I will start working on the RC1 release today.
>> >
>> > @benchao: gotcha wrt FLINK-33313
>> >
>> > Best regards,
>> > Jing
>> >
>> > [1] https://issues.apache.org/jira/browse/FLINK-33523
>> > [2] https://issues.apache.org/jira/browse/FLINK-33588
>> >
>> > On Tue, Dec 12, 2023 at 7:03 AM weijie guo 
>> > wrote:
>> >
>> > > Thanks Jing for driving this bug-fix release.
>> > >
>> > > +1 from my side.
>> > >
>> > > Best regards,
>> > >
>> > > Weijie
>> > >
>> > >
>> > > Jark Wu  于2023年12月12日周二 12:17写道:
>> > >
>> > > > Thanks Jing for driving 1.18.1.
>> > > > +1 for this.
>> > > >
>> > > > Best,
>> > > > Jark
>> > > >
>> > > > On Mon, 11 Dec 2023 at 16:59, Hong Liang 
>> wrote:
>> > > >
>> > > > > +1. Thanks Jing for driving this.
>> > > > >
>> > > > > Hong
>> > > > >
>> > > > > On Mon, Dec 11, 2023 at 2:27 AM Yun Tang 
>> wrote:
>> > > > >
>> > > > > > Thanks Jing for driving 1.18.1 release, +1 for this.
>> > > > > >
>> > > > > >
>> > > > > > Best
>> > > > > > Yun Tang
>> > > > > > 
>> > > > > > From: Rui Fan <1996fan...@gmail.com>
>> > > > > > Sent: Saturday, December 9, 2023 21:46
>> > > > > > To: dev@flink.apache.org 
>> > > > > > Subject: Re: [DISCUSS] Release Flink 1.18.1
>> > > > > >
>> > > > > > Thanks Jing for driving this release, +1
>> > > > > >
>> > > > > > Best,
>> > > > > > Rui
>> > > > > >
>> > > > > > On Sat, Dec 9, 2023 at 7:33 AM Leonard Xu 
>> wrote:
>> > > > > >
>> > > > > > > Thanks Jing for driving this release, +1
>> > > > > > >
>> > > > > > > Best,
>> > > > > > > Leonard
>> > > > > > >
>> > > > > > > > 2023年12月9日 上午1:23,Danny Cranmer 
>> 写道:
>> > > > > > > >
>> > > > > > > > +1
>> > > > > > > >
>> > > > > > > > Thanks for driving this
>> > > > > > > >
>> > > > > > > > On Fri, 8 Dec 2023, 12:05 Timo Walther, > >
>> > > > wrote:
>> > > > > > > >
>> > > > > > > >> Thanks for taking care of this Jing.
>> > > > > > > >>
>> > > > > > > >> +1 to release 1.18.1 for this.
>> > > > > > > >>
>> > > > > > > >> Cheers,
>> > > > > > > >> Timo
>> > > > > > > >>
>> > > > > > > >>
>> > > > > > > >> On 08.12.23 10:00, Benchao Li wrote:
>> > > > > > > >>> I've merged FLINK-33313 to release-1.18 branch.
>> > > > > > > >>>
>> > > > > > > >>> Péter Váry  于2023年12月8日周五
>> > > 16:56写道:
>> > > > > > > 
>> > > > > > >  Hi Jing,
>> > > > > > >  Thanks for taking care of this!
>> > > > > > >  +1 (non-binding)
>> > > > > > >  Peter
>> > > > > > > 
>> > > > > > >  Sergey Nuyanzin  ezt írta (időpont:
>> > > 2023.
>> > > > > dec.
>> > > > > > > >> 8., P,
>> > > > > > >  9:36):
>> > > > > > > 
>> > > > > > > > Thanks Jing driving it
>> > > > > > > > +1
>> > > > > > > >
>> > > > > > > > also +1 to include FLINK-33313 mentioned by Benchao Li
>> > > > > > > >
>> > > > > > > > On Fri, Dec 8, 2023 at 9:17 AM Benchao Li <
>> > > > libenc...@apache.org>
>> > > > > > > >> wrote:
>> > > > > 

Re: [DISCUSS] FLIP-387: Support named parameters for functions and call procedures

2023-12-14 Thread Timo Walther

Hi Feng,

thank you for proposing this FLIP. This nicely completes FLIP-65 which 
is great for usability.


I read the FLIP and have some feedback:


1) ArgumentNames annotation

> Deprecate the ArgumentNames annotation as it is not user-friendly for 
specifying argument names with optional configuration.


Which annotation does the FLIP reference here? I cannot find it in the 
Flink code base.


2) Evolution of FunctionHint

Introducing @ArgumentHint makes a lot of sense to me. However, using it 
within @FunctionHint looks complex, because there is both `input=` and 
`arguments=`. Ideally, the @DataTypeHint can be defined inline as part 
of the @ArgumentHint. It could even be the `value` such that 
`@ArgumentHint(@DataTypeHint("INT"))` is valid on its own.


We could deprecate `input=`. Or let both `input` and `arguments=` 
coexist but never be defined at the same time.


3) Semantical correctness

As you can see in the `TypeInference` class, named parameters are 
prepared in the stack already. However, we need to watch out between 
helpful explanation (see `InputTypeStrategy#getExpectedSignatures`) and 
named parameters (see `TypeInference.Builder#namedArguments`) that can 
be used in SQL.


If I remember correctly, named parameters can be reordered and don't 
allow overloading of signatures. Thus, only a single eval() should have 
named parameters. Looking at the FLIP it seems you would like to support 
multiple parameter lists. What changes are you planning to TypeInference 
(which is also declared as @PublicEvolving)? This should also be 
documented as the annotations should compile into this class.


In general, I would prefer to keep it simple and don't allow overloading 
named parameters. With the optionality, users can add an arbitrary 
number of parameters to the signature of the same eval method.


Regards,
Timo

On 14.12.23 10:02, Feng Jin wrote:

Hi all,


Xuyang and I would like to start a discussion of FLIP-387: Support
named parameters for functions and call procedures [1]

Currently, when users call a function or call a procedure, they must
specify all fields in order. When there are a large number of
parameters, it is easy to make mistakes, and non-mandatory fields
cannot be omitted.

By using named parameters, you can selectively specify the required
parameters, reducing the probability of errors and making it more
convenient to use.

Here is an example of using Named Procedure.
```
-- for scalar function
SELECT my_scalar_function(param1 => ‘value1’, param2 => ‘value2’) FROM []

-- for table function
SELECT  *  FROM TABLE(my_table_function(param1 => 'value1', param2 => 'value2'))

-- for agg function
SELECT my_agg_function(param1 => 'value1', param2 => 'value2') FROM []

-- for call procedure
CALL  procedure_name(param1 => ‘value1’, param2 => ‘value2’)
```

For UDX and Call procedure developers, we introduce a new annotation
to specify the parameter name, indicate if it is optional, and
potentially support specifying default values in the future

```
public @interface ArgumentHint {
 /**
  * The name of the parameter, default is an empty string.
  */
 String name() default "";

 /**
  * Whether the parameter is optional, default is true.
  */
 boolean isOptional() default true;
}
```

```
// Call Procedure Development

public static class NamedArgumentsProcedure implements Procedure {

// Example usage: CALL myNamedProcedure(in1 => 'value1', in2 => 'value2')

// Example usage: CALL myNamedProcedure(in1 => 'value1', in2 =>
'value2', in3 => 'value3')

@ProcedureHint(
input = {@DataTypeHint(value = "STRING"),
@DataTypeHint(value = "STRING"), @DataTypeHint(value = "STRING")},
output = @DataTypeHint("STRING"),
   arguments = {
 @ArgumentHint(name = "in1", isOptional = false),
 @ArgumentHint(name = "in2", isOptional = true)
 @ArgumentHint(name = "in3", isOptional = true)})
public String[] call(ProcedureContext procedureContext, String
arg1, String arg2, String arg3) {
return new String[]{arg1 + ", " + arg2 + "," + arg3};
}
}
```


Currently, we offer support for two scenarios when calling a function
or procedure:

1. The corresponding parameters can be specified using the parameter
name, without a specific order.
2. Unnecessary parameters can be omitted.


There are still some limitations when using Named parameters:
1. Named parameters do not support variable arguments.
2. UDX or procedure classes that support named parameters can only
have one eval method.
3. Due to the current limitations of Calcite-947[2], we cannot specify
a default value for omitted parameters, which is Null by default.



Also, thanks very much for the suggestions and help provided by Zelin
and Lincoln.




1. 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-387%3A+Support+named+parameters+for+functions+and+call+procedures.

2. 

Re: [DISCUSS] FLIP-392: Deprecate the Legacy Group Window Aggregation

2023-12-14 Thread Timo Walther

Hi Xuyang,

> I'm not splitting this flip is that all of these subtasks like session 
> window tvf and cdc support do not change the public interface and the 
> public syntax


Given the length of this mailing list discussion and number of involved 
people I would strongly suggest to simplify the FLIP and give it a 
better title to make quicker progress. In general, we all seem to be on 
the same page in what we want. And both session TVF support and the 
deprecation of the legacy group windows have been voted on already and 
discussed thoroughly. The FLIP can purely focus on the CDC topic.


Cheers,
Timo


On 14.12.23 08:35, Xuyang wrote:

Hi, Timo, Sergey and Lincoln Lee. Thanks for your feedback.



> In my opinion the FLIP touches too many topics at the same time and should 
> be split into multiple FLIPs. We should stay focused on what is needed for 
> Flink 2.0.

The main goal of this Flip is to align the abilities of the legacy group 
window agg syntax and the new window tvf syntax, and then we can say that the 
legacy window syntax is deprecated. IMO, although there are many misalignments 
between these two syntaxes, such as session window tvf, cdc support and so on, 
they are all subtasks we need to do in this flip. Another reason I'm not 
splitting this flip is that all of these subtasks like session window tvf and 
cdc support do not change the public interface and the public syntax; their 
implementations will only be in the table-planner and table-runtime modules.



> Can we postpone this discussion? Currently we should focus on user 
> switching to Window TVFs before Flink 2.0. Early fire, late fire and allow 
> lateness have not exposed through public configuration. It can be introduced 
> after Flink 2.0 released.



Agree with you. This flip will not and should not expose these experimental 
flink conf to users. I listed them in this flip just to show the 
misalignments between these two window syntaxes.


Looking forward to your thoughts.




--

 Best!
 Xuyang





At 2023-12-13 15:40:16, "Lincoln Lee"  wrote:

Thanks Xuyang driving this work! It's great that everyone agrees with the
work itself in this flip[1]!

Regarding whether to split the flip or adjust the scope of this flip, I'd
like to share some thoughts:

1. About the title of this flip, what I want to say is that flip-145[2] had
marked the legacy group window deprecated in the documentation but the
functionality of the new syntax is not aligned with the legacy one.
This is not a user-friendly deprecation, so the initiation of this flip, as
I understand it, is for the formal deprecation of the legacy window, which
requires us to complete the functionality alignment.

2. Agree with Timo that we can process the late-fire/early-fire features
separately. These experimental parameters have not been officially opened
to users.
Considering the workload, we can focus more on this version.

3. I have no objection to splitting this flip if everyone feels that the
work included is too much.
Regarding the support of session tvf, it seems that the main problem is
that this part of the description occupies a large part of the flip,
causing some misunderstandings.
This is indeed a predetermined task in FLIP-145, just adding more
explanation about semantics. In addition, I saw the discussion history in
FLINK-24024[3], thanks Sergey for being willing to help driving this work
together.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-392%3A+Deprecate+the+Legacy+Group+Window+Aggregation
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function
[3] https://issues.apache.org/jira/browse/FLINK-24024

Best,
Lincoln Lee


Sergey Nuyanzin  wrote on Wed, Dec 13, 2023, 08:02:


thanks for summarising, Timo

+1 for splitting it into different FLIPs,
and agree about having "SESSION Window TVF Aggregation" under FLIP-145.
Moreover, the task is already there, so no need to move it from one FLIP to
another.


And actually Sergey Nuyanzin wanted to work
on this if I remember correctly. Not sure if this is still the case.


Yes, however it seems there is already an existing PR for that.
Anyway, I'm happy to help here with the review as well and will reserve some
time for it in the coming days.



On Tue, Dec 12, 2023 at 12:18 PM Timo Walther  wrote:


Hi Xuyang,

thanks for proposing this FLIP. In my opinion the FLIP touches too many
topics at the same time and should be split into multiple FLIPs. We
should stay focused on what is needed for Flink 2.0.

Let me summarizing the topics:

1) SESSION Window TVF Aggregation

This has been agreed in FLIP-145 already. We don't need another FLIP but
someone who finally implements this, now that we have performed the Calcite
upgrade a couple of months ago. The Calcite upgrade was important
exactly for SESSION windows. And actually Sergey Nuyanzin wanted to work
on this if I remember correctly. Not sure if this is still the case.

2) CDC support of Window TVFs

This can be a FLIP on its own.

Re: [DISCUSS] Should Configuration support getting value based on String key?

2023-12-14 Thread Timo Walther
The configuration in Flink is complicated and I fear we won't have 
enough capacity to substantially fix it. The introduction of 
ReadableConfig, WritableConfig, and typed ConfigOptions was the right step 
toward making the code more maintainable. From the Flink side, every read 
access should go through ConfigOption.


However, I also understand Gyula's pragmatism here because (practically 
speaking) users get `getString()`-like access via `toMap().get()` anyway. So 
I'm fine with removing the deprecation for functionality that is available 
anyway. We should, however, add a note to `getString()` that this 
method is discouraged and `get(ConfigOption)` should be the preferred 
way of accessing Configuration.
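
For illustration, a minimal sketch of the two access styles (the option key
below is made up for this example, not a real Flink option):

```java
// Minimal sketch of the two access styles; the option key is a made-up
// example, not a real Flink option.
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;
import org.apache.flink.configuration.Configuration;

public class ConfigAccessSketch {

    private static final ConfigOption<String> ENDPOINT =
            ConfigOptions.key("my.connector.endpoint")
                    .stringType()
                    .defaultValue("http://localhost:8080");

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set(ENDPOINT, "http://example.com");

        // Preferred: typed access through a predefined ConfigOption.
        String preferred = conf.get(ENDPOINT);

        // Discouraged but still possible: raw string access via toMap().
        String discouraged = conf.toMap().get("my.connector.endpoint");

        System.out.println(preferred + " / " + discouraged);
    }
}
```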


In any case we should remove the getInt and related methods.

Cheers,
Timo


On 14.12.23 09:56, Gyula Fóra wrote:

I see strong value in user-facing configs using ConfigOption, and this
should definitely be an enforced convention.

However, with the Flink project growing and many other components and even
users using the Configuration object, I really don’t think that we should
“force” this on the users/developers.

If we make fromMap / toMap free with basically no overhead, that is fine,
but otherwise I think it would hurt the user experience to remove the
simple getters / setters. Temporary ConfigOptions to access strings from
what is practically a string->string map are exactly the kind of unnecessary
boilerplate that every dev and user wants to avoid. There are many cases
where the features of the ConfigOption are really not needed.

Gyula

On Thu, 14 Dec 2023 at 09:38, Xintong Song  wrote:


Hi Gyula,

First of all, even if we remove the `getXXX(String key, XXX defaultValue)`
methods, there are still several ways to access the configuration with
string-keys.

- If one wants to access a specific option, as Rui mentioned,
`ConfigOptions.key("xxx").stringType().noDefaultValue()` can be used. TBH,
I can't think of a use case where a temporarily created ConfigOption is
preferred over a predefined one. Do you have any examples for that?
- If one wants to access the whole configuration set, then `toMap` or
`iterator` might be helpful.

It is true that these ways are less convenient than `getXXX(String key, XXX
defaultValue)`, and that's exactly my purpose: to make the string key less
convenient so that developers choose ConfigOption over it whenever
possible.

> there will always be cases where a more flexible dynamic handling is
> necessary without the added overhead of the toMap logic

I'm not sure about this. I agree there are cases where flexible and dynamic
handling is needed, but maybe "without the added overhead of the toMap
logic" is not that necessary?

I'd think of this as "encouraging developers to use ConfigOption as much as
possible" vs. "a bit less convenient in 5% of the cases". I guess there's
no right and wrong, just different engineering opinions. While I personally
stand with removing the string-key access methods, I'd also be fine with
the other way if more people are in favor of it.

Best,

Xintong



On Thu, Dec 14, 2023 at 3:45 PM Gyula Fóra  wrote:


Hi Xintong,

I don’t really see the actual practical benefit from removing the
getString and setString low-level methods.

I understand that ConfigOptions are nicer for 95% of the cases but from a
technical point of view there will always be cases where a more flexible
dynamic handling is necessary without the added overhead of the toMap
logic.

I think it’s the most natural thing for any config abstraction to expose
basic get set methods with a simple key.

What do you think?

Cheers
Gyula

On Thu, 14 Dec 2023 at 08:00, Xintong Song  wrote:




IIUC, you both prefer using ConfigOption instead of string keys for
all use cases, even internal ones. We can even forcefully delete
these @Deprecated methods in Flink 2.0 to guide users or
developers to use ConfigOption.



Yes, at least from my side.


I noticed that Configuration is used in
DistributedCache#writeFileInfoToConfig and readFileInfoFromConfig
to store some cacheFile meta-information. Their keys are
temporary (key names with numbers) and it is not convenient
to predefine ConfigOptions for them.



True, this one requires a bit more effort to migrate from string-key to
ConfigOption, but still should be doable. Looking at how the two mentioned
methods are implemented and used, it seems what is really needed is
serialization and deserialization of `DistributedCacheEntry`-s. And all the
entries are always written / read at once. So I think we can serialize the
whole set of entries into a JSON string (or something similar), and use one
ConfigOption with a deterministic key for it, rather than having one
ConfigOption for each field in each entry. WDYT?
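
For illustration, a rough sketch of that idea (the option key and the JSON
layer below are assumptions, not the actual DistributedCache code):

```java
// Rough sketch of the idea above; the option key and the JSON layer are
// assumptions for illustration, not the actual DistributedCache code.
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;
import org.apache.flink.configuration.Configuration;

public class CacheEntriesSketch {

    // One deterministic key for all entries instead of numbered string keys.
    private static final ConfigOption<String> CACHE_ENTRIES =
            ConfigOptions.key("internal.cache.entries-json")
                    .stringType()
                    .noDefaultValue();

    static void writeFileInfo(Configuration conf, String allEntriesAsJson) {
        // All entries are written at once as a single serialized value.
        conf.set(CACHE_ENTRIES, allEntriesAsJson);
    }

    static String readFileInfo(Configuration conf) {
        // ... and read back at once; JSON (de)serialization happens elsewhere.
        return conf.get(CACHE_ENTRIES);
    }
}
```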


If everyone agrees with this direction, we can start to refactor all
code that uses getXxx(String key, String defaultValue) into
getXxx(ConfigOption<Xxx> configOption), and completely
delete all getXxx(String key, String defaultValue) in 2.0.

[jira] [Created] (FLINK-33821) ArrowSerializer$finishCurrentBatch consumes too much time

2023-12-14 Thread wuwenchi (Jira)
wuwenchi created FLINK-33821:


 Summary: ArrowSerializer$finishCurrentBatch consumes too much time
 Key: FLINK-33821
 URL: https://issues.apache.org/jira/browse/FLINK-33821
 Project: Flink
  Issue Type: Technical Debt
  Components: API / Python
Affects Versions: 1.18.0
Reporter: wuwenchi


We convert data into Arrow format through Flink and send it to Doris.
The data conversion looks like this: RowData --> Arrow --> Doris.
But during testing, we found that the `ArrowSerializer` provided by 
flink-python consumes a lot of time in the `finishCurrentBatch` function.
For a total of 1.4 GB of Parquet files, the overall conversion takes 70 
seconds, of which `finishCurrentBatch` takes 40 seconds (in particular, 
`writeBatch` accounts for 39 seconds inside `finishCurrentBatch`).

For comparison, we ran the same conversion with Spark: InternalRow --> Arrow 
--> Doris.
With the same Parquet files, the overall conversion takes only 35 seconds, 
and `writeBatch` takes only 10 seconds.

 

In Spark, we use `org.apache.spark.sql.execution.arrow.ArrowWriter` to convert 
`InternalRow` into an Arrow vector, and then serialize the Arrow vector into 
binary through `org.apache.arrow.vector.ipc.ArrowStreamWriter#writeBatch`.
Simplified code looks like this:

 
{code:java}
            ArrowWriter arrowWriter = ArrowWriter.create(vectorSchemaRoot);

            // --- phase1: InternalRow to arrowVector
            while (iterator.hasNext()) {
                arrowWriter.write(iterator.next());
            }
            arrowWriter.finish();

            // --- phase2: arrowVector to binary
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ArrowStreamWriter writer = new ArrowStreamWriter(vectorSchemaRoot, 
null, out);
            writer.writeBatch();
            writer.end();

            // --- phase3: get binary
            out.toByteArray(); {code}
 

In Flink, we use 
`org.apache.flink.table.runtime.arrow.serializers.ArrowSerializer`. This class 
is very useful: it not only covers the conversion of RowData to an Arrow 
vector, but also the serialization of the Arrow vector into binary.
Simplified code looks like this:


{code:java}
            arrowSerializer = new ArrowSerializer(rowType, rowType);
            outputStream = new ByteArrayOutputStream();
            arrowSerializer.open(new ByteArrayInputStream(new byte[0]), 
outputStream);

            // --- phase1: RowData to arrowVector
            while (iterator.hasNext()) {
                arrowSerializer.write(iterator.next());
            }

            // --- phase2: arrowVector to binary
            arrowSerializer.finishCurrentBatch();

            // --- phase3: get binary
            outputStream.toByteArray();
            outputStream.reset(); {code}
In phases 1 and 3, Flink and Spark take basically the same time. In 
phase 2, Spark's writeBatch function took 10 seconds, but the writeBatch 
function inside Flink's finishCurrentBatch took 40 seconds.

Is there any Flink-related configuration that I am missing? Or did I use 
something incorrectly somewhere in Flink?

 

Looking forward to your reply! Thanks!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-400: AsyncScalarFunction for asynchronous scalar function support

2023-12-14 Thread Timo Walther

Hi Alan,

thanks for proposing this FLIP. It's a great addition to Flink and has 
been requested multiple times. It will be particularly interesting for 
accessing REST endpoints and other remote services.


Great that we can generalize and reuse parts of the Python planner rules 
and code for this.


I have some feedback regarding the API:

1) Configuration

Configuration keys like

`table.exec.async-scalar.catalog.db.func-name.buffer-capacity`

are currently not supported in the configuration stack. The key space 
should remain constant. Only a constant key space enables the use of the 
ConfigOption class, which is required in the layered configuration. For 
now I would suggest only allowing a global setting for buffer capacity, 
timeout, and retry strategy. We can later work on a per-function 
configuration (potentially also needed for other use cases).


2) Semantical declaration

Regarding

`table.exec.async-scalar.catalog.db.func-name.output-mode`

this is a semantic property of a function and should be defined 
per-function. It impacts the query result and potentially the behavior 
of planner rules.


I see two options for this: either (a) an additional method in 
AsyncScalarFunction or (b) adding this to the function's requirements. I 
vote for (b), because a FunctionDefinition should be fully self-contained 
and sufficient for planning.


Thus, for `FunctionDefinition.getRequirements(): 
Set<FunctionRequirement>` we can add a new requirement `ORDERED`, which 
should also be the default for AsyncScalarFunction. `getRequirements()` 
can be overwritten to return a set without this requirement if the user 
intends to do this.
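
For illustration, a rough sketch of option (b); note that AsyncScalarFunction
and the ORDERED constant are proposal-stage names from this thread, not
existing API:

```java
// Rough sketch of option (b); AsyncScalarFunction and the ORDERED constant
// are proposal-stage names from this thread, not existing Flink API.
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

import org.apache.flink.table.functions.FunctionRequirement;

public class UnorderedAsyncFunction extends AsyncScalarFunction {

    public void eval(CompletableFuture<String> result, String key) {
        // Complete the future asynchronously, e.g. from a remote call.
        result.complete("value-for-" + key);
    }

    @Override
    public Set<FunctionRequirement> getRequirements() {
        // Opt out of the default ORDERED requirement when unordered
        // results are acceptable for this function.
        return Collections.emptySet();
    }
}
```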



Thanks,
Timo




On 11.12.23 18:43, Piotr Nowojski wrote:

+1 to the idea, I don't have any comments.

Best,
Piotrek

czw., 7 gru 2023 o 07:15 Alan Sheinberg 
napisał(a):



Nicely written and makes sense.  The only feedback I have is around the
naming of the generalization, e.g. "Specifically, PythonCalcSplitRuleBase
will be generalized into RemoteCalcSplitRuleBase."  This naming seems to
imply/suggest that all Async functions are remote.  I wonder if we can find
another name which doesn't carry that connotation; maybe
AsyncCalcSplitRuleBase.  (An AsyncCalcSplitRuleBase which handles Python
and Async functions seems reasonable.)


Thanks.  That's fair.  I agree that "Remote" isn't always accurate.  I
believe that the Python calls are also done asynchronously, so that might
be a reasonable name, so long as there's no confusion between the base and
async child class.

On Wed, Dec 6, 2023 at 3:48 PM Jim Hughes 
wrote:


Hi Alan,

Nicely written and makes sense.  The only feedback I have is around the
naming of the generalization, e.g. "Specifically, PythonCalcSplitRuleBase
will be generalized into RemoteCalcSplitRuleBase."  This naming seems to
imply/suggest that all Async functions are remote.  I wonder if we can find
another name which doesn't carry that connotation; maybe
AsyncCalcSplitRuleBase.  (An AsyncCalcSplitRuleBase which handles Python
and Async functions seems reasonable.)

Cheers,

Jim

On Wed, Dec 6, 2023 at 5:45 PM Alan Sheinberg
 wrote:


I'd like to start a discussion of FLIP-400: AsyncScalarFunction for
asynchronous scalar function support [1]

This feature proposes adding a new UDF type AsyncScalarFunction which is
invoked just like a normal ScalarFunction, but is implemented with an
asynchronous eval method.  I had brought this up, including the motivation,
in a previous discussion thread [2].

The purpose is to achieve high-throughput scalar function UDFs while
allowing that an individual call may have high latency.  It allows scaling
up the parallelism of just these calls without having to increase the
parallelism of the whole query (which could be rather resource
inefficient).

In practice, it should enable SQL integration with external services and
systems, which Flink has limited support for at the moment. It should also
allow easier integration with existing libraries which use asynchronous
APIs.

Looking forward to your feedback and suggestions.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-400%3A+AsyncScalarFunction+for+asynchronous+scalar+function+support



[2] https://lists.apache.org/thread/bn153gmcobr41x2nwgodvmltlk810hzs


Thanks,
Alan











[DISCUSS] FLIP-387: Support named parameters for functions and call procedures

2023-12-14 Thread Feng Jin
Hi all,


Xuyang and I would like to start a discussion of FLIP-387: Support
named parameters for functions and call procedures [1]

Currently, when users call a function or a procedure, they must
specify all fields in order. When there are a large number of
parameters, it is easy to make mistakes, and non-mandatory fields
cannot be omitted.

By using named parameters, you can selectively specify the required
parameters, reducing the probability of errors and making it more
convenient to use.

Here is an example of using named parameters.
```
-- for scalar function
SELECT my_scalar_function(param1 => 'value1', param2 => 'value2') FROM []

-- for table function
SELECT  *  FROM TABLE(my_table_function(param1 => 'value1', param2 => 'value2'))

-- for agg function
SELECT my_agg_function(param1 => 'value1', param2 => 'value2') FROM []

-- for call procedure
CALL  procedure_name(param1 => 'value1', param2 => 'value2')
```

For UDX and call procedure developers, we introduce a new annotation
to specify the parameter name, indicate whether it is optional, and
potentially support specifying default values in the future.

```
public @interface ArgumentHint {
/**
 * The name of the parameter, default is an empty string.
 */
String name() default "";

/**
 * Whether the parameter is optional, default is true.
 */
boolean isOptional() default true;
}
```

```
// Call Procedure Development

public static class NamedArgumentsProcedure implements Procedure {

   // Example usage: CALL myNamedProcedure(in1 => 'value1', in2 => 'value2')

   // Example usage: CALL myNamedProcedure(in1 => 'value1', in2 =>
'value2', in3 => 'value3')

   @ProcedureHint(
   input = {@DataTypeHint(value = "STRING"),
@DataTypeHint(value = "STRING"), @DataTypeHint(value = "STRING")},
   output = @DataTypeHint("STRING"),
   arguments = {
@ArgumentHint(name = "in1", isOptional = false),
@ArgumentHint(name = "in2", isOptional = true)
@ArgumentHint(name = "in3", isOptional = true)})
   public String[] call(ProcedureContext procedureContext, String
arg1, String arg2, String arg3) {
   return new String[]{arg1 + ", " + arg2 + "," + arg3};
   }
}
```
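
A sketch of what the same hints could look like on a scalar function
(following this proposal; ArgumentHint and the arguments field on
FunctionHint are proposal-stage, not final API):

```java
// Sketch following the proposal above; ArgumentHint and the arguments
// field on FunctionHint are proposal-stage, not final API.
public static class NamedArgumentsScalarFunction extends ScalarFunction {

    // Example usage: SELECT my_scalar_function(in1 => 'value1')
    @FunctionHint(
        input = {@DataTypeHint("STRING"), @DataTypeHint("STRING")},
        output = @DataTypeHint("STRING"),
        arguments = {
            @ArgumentHint(name = "in1", isOptional = false),
            @ArgumentHint(name = "in2", isOptional = true)})
    public String eval(String in1, String in2) {
        return in1 + ", " + in2;
    }
}
```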


Currently, we offer support for two scenarios when calling a function
or procedure:

1. The corresponding parameters can be specified using the parameter
name, in any order.
2. Non-mandatory parameters can be omitted.


There are still some limitations when using named parameters:
1. Named parameters do not support variable arguments.
2. UDX or procedure classes that support named parameters can only
have one eval method.
3. Due to the current limitation tracked in CALCITE-947 [2], we cannot specify
a default value for omitted parameters; they default to NULL.



Also, thanks very much for the suggestions and help provided by Zelin
and Lincoln.




1. 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-387%3A+Support+named+parameters+for+functions+and+call+procedures

2. https://issues.apache.org/jira/browse/CALCITE-947



Best,

Feng


[jira] [Created] (FLINK-33820) StreamingWithStateTestBase's subclasses might fail during cleanup if sharing the state directory (through a common @TempDir)

2023-12-14 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-33820:
-

 Summary: StreamingWithStateTestBase's subclasses might fail during 
cleanup if sharing the state directory (through a common @TempDir)
 Key: FLINK-33820
 URL: https://issues.apache.org/jira/browse/FLINK-33820
 Project: Flink
  Issue Type: Bug
Reporter: Matthias Pohl


FLINK-33641 revealed an issue where test classes implementing 
StreamingWithStateTestBase failed during cleanup due to some concurrency 
issue that is not fully understood yet (see the [FLINK-33641 
PR|https://github.com/apache/flink/pull/23914] discussion for more details).

FLINK-33641 provided a workaround for now that creates per-class temporary 
folders. This Jira is about investigating why this kind of test failure 
actually occurs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Should Configuration support getting value based on String key?

2023-12-14 Thread Gyula Fóra
I see strong value in user-facing configs using ConfigOption, and this
should definitely be an enforced convention.

However, with the Flink project growing and many other components and even
users using the Configuration object, I really don’t think that we should
“force” this on the users/developers.

If we make fromMap / toMap free with basically no overhead, that is fine,
but otherwise I think it would hurt the user experience to remove the
simple getters / setters. Temporary ConfigOptions to access strings from
what is practically a string->string map are exactly the kind of unnecessary
boilerplate that every dev and user wants to avoid. There are many cases
where the features of the ConfigOption are really not needed.

Gyula

On Thu, 14 Dec 2023 at 09:38, Xintong Song  wrote:

> Hi Gyula,
>
> First of all, even if we remove the `getXXX(String key, XXX defaultValue)`
> methods, there are still several ways to access the configuration with
> string-keys.
>
>- If one wants to access a specific option, as Rui mentioned,
>`ConfigOptions.key("xxx").stringType().noDefaultValue()` can be used. TBH,
>I can't think of a use case where a temporarily created ConfigOption is
>preferred over a predefined one. Do you have any examples for that?
>- If one wants to access the whole configuration set, then `toMap` or
>`iterator` might be helpful.
>
> It is true that these ways are less convenient than `getXXX(String key, XXX
> defaultValue)`, and that's exactly my purpose: to make the string key less
> convenient so that developers choose ConfigOption over it whenever
> possible.
>
> > there will always be cases where a more flexible dynamic handling is
> > necessary without the added overhead of the toMap logic
> >
>
> I'm not sure about this. I agree there are cases where flexible and dynamic
> handling is needed, but maybe "without the added overhead of the toMap
> logic" is not that necessary?
>
> I'd think of this as "encouraging developers to use ConfigOption as much as
> possible" vs. "a bit less convenient in 5% of the cases". I guess there's
> no right and wrong, just different engineering opinions. While I personally
> stand with removing the string-key access methods, I'd also be fine with
> the other way if more people are in favor of it.
>
> Best,
>
> Xintong
>
>
>
> On Thu, Dec 14, 2023 at 3:45 PM Gyula Fóra  wrote:
>
> > Hi Xintong,
> >
> > I don’t really see the actual practical benefit from removing the
> > getString and setString low-level methods.
> >
> > I understand that ConfigOptions are nicer for 95% of the cases but from a
> > technical point of view there will always be cases where a more flexible
> > dynamic handling is necessary without the added overhead of the toMap
> > logic.
> >
> > I think it’s the most natural thing for any config abstraction to expose
> > basic get set methods with a simple key.
> >
> > What do you think?
> >
> > Cheers
> > Gyula
> >
> > On Thu, 14 Dec 2023 at 08:00, Xintong Song 
> wrote:
> >
> > > >
> > > > IIUC, you both prefer using ConfigOption instead of string keys for
> > > > all use cases, even internal ones. We can even forcefully delete
> > > > these @Deprecated methods in Flink 2.0 to guide users or
> > > > developers to use ConfigOption.
> > > >
> > >
> > > Yes, at least from my side.
> > >
> > >
> > > I noticed that Configuration is used in
> > > > DistributedCache#writeFileInfoToConfig and readFileInfoFromConfig
> > > > to store some cacheFile meta-information. Their keys are
> > > > temporary(key name with number) and it is not convenient
> > > > to predefine ConfigOption.
> > > >
> > >
> > > True, this one requires a bit more effort to migrate from string-key to
> > > ConfigOption, but still should be doable. Looking at how the two mentioned
> > > methods are implemented and used, it seems what is really needed is
> > > serialization and deserialization of `DistributedCacheEntry`-s. And all the
> > > entries are always written / read at once. So I think we can serialize the
> > > whole set of entries into a JSON string (or something similar), and use one
> > > ConfigOption with a deterministic key for it, rather than having one
> > > ConfigOption for each field in each entry. WDYT?
> > >
> > >
> > > > If everyone agrees with this direction, we can start to refactor all
> > > > code that uses getXxx(String key, String defaultValue) into
> > > > getXxx(ConfigOption<Xxx> configOption), and completely
> > > > delete all getXxx(String key, String defaultValue) in 2.0.
> > > > And I'm willing to pick it up~
> > > >
> > >
> > > Exactly. Actually, Xuannan and a few other colleagues are working on
> > > reviewing all the existing configuration options. We identified some common
> > > issues that can potentially be improved, and not using string keys is one of
> > > them. We are still summarizing the findings, and will bring them to the
> > > community for discussion once ready. Helping hands are definitely

Re: [DISCUSS] Allow TwoPhaseCommittingSink WithPreCommitTopology to alter the type of the Committable

2023-12-14 Thread Márton Balassi
+1

Thanks, Peter. Based on the consensus in the recent thread on FLIP-371 [1]
I agree that this is the right approach. I made some minor edits to the
FLIP, which looks good to me now.

[1] https://lists.apache.org/thread/h6nkgth838dlh5s90sd95zd6hlsxwx57

On Wed, Dec 13, 2023 at 5:30 PM Gyula Fóra  wrote:

> Thank you Peter Vary for updating the FLIP based on the discussions.
> I really like the improvements introduced by the mixin interfaces, which now
> align much better with the source and table connectors.
>
> While this introduces some breaking changes to the existing connectors,
> this is a technical debt that we need to resolve as soon as possible and
> fully before 2.0.
>
> +1 from my side.
>
> I am cc'ing some folks participating in the other threads, sorry about that
> :)
>
> Cheers,
> Gyula
>
> On Wed, Dec 13, 2023 at 4:14 PM Péter Váry 
> wrote:
>
> > I have updated FLIP-372 [1] based on the comments from multiple
> > sources. I moved to the mixin approach, as this seems to be the consensus
> > based on this thread [2].
> > I also created a draft implementation PR [3], so I can test the changes and
> > default implementations (no new tests yet).
> > Please provide your feedback, so I can address your questions and comments,
> > and then we can move forward to voting.
> >
> > Thanks,
> > Peter
> >
> > [1] -
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-372%3A+Allow+TwoPhaseCommittingSink+WithPreCommitTopology+to+alter+the+type+of+the+Committable
> > [2] - https://lists.apache.org/thread/h6nkgth838dlh5s90sd95zd6hlsxwx57
> > [3] - https://github.com/apache/flink/pull/23912
> >
> > Péter Váry  wrote (on Dec 11, 2023,
> > Mon, 14:28):
> >
> > > We identified another issue with the current Sink API in a thread [1]
> > > related to FLIP-371 [2]. Currently it is not possible to evolve the
> > > Sink.createWriter method with deprecation, because StatefulSink and
> > > TwoPhaseCommittingSink have methods with the same name and parameters, but
> > > narrowed-down return types (StatefulSinkWriter, PrecommittingSinkWriter).
> > >
> > > To make the Sink API evolvable, we minimally have to remove these.
> > >
> > > The participants there also pointed out that the Source API also uses
> > > mixin interfaces (SupportsHandleExecutionAttemptSourceEvent,
> > > SupportsIntermediateNoMoreSplits) in some cases. My observation is that it
> > > has inheritance as well (ReaderOutput, ExternallyInducedSourceReader).
> > >
> > > I have created a draft API along these lines in a branch [3] where only
> > > the last commit is relevant [4]. This implementation would follow the same
> > > patterns as the current Source API.
> > >
> > > I see two different general approaches here, and I would like to hear your
> > > preferences:
> > > - Keep the changes minimal, stick to the current Sink API design. We
> > > introduce the required new combinations of interfaces
> > > (TwoPhaseCommittingSinkWithPreCommitTopology,
> > > WithPostCommitTopologyWithPreCommitTopology), and do not change the API
> > > structure.
> > >  - Pros:
> > >   - Minimal change - smaller rewrite on the connector side
> > >   - Type checks happen at compile time
> > >  - Cons:
> > >   - Harder to evolve
> > >   - The number of interfaces increases with the possible
> > > combinations
> > >   - Inconsistent API patterns wrt Source API - harder for
> > > developers to understand
> > > - Migrate to a model similar to the Source API. We create mixin interfaces
> > > for SupportsCommitter, SupportsWriterState, SupportsPreCommitTopology,
> > > SupportsPostCommitTopology, SupportsPreWriteTopology.
> > >  - Pros:
> > >   - Better evolvability
> > >   - Consistent with the Source API
> > >  - Cons:
> > >   - The connectors need to change their inheritance patterns (after
> > > the deprecation period) if they are using any of the more complicated Sinks.
> > >   - Type checks need to use `instanceof`, which could happen at DAG
> > > generation time. Also, if the developer fails to correctly match the
> > > generic types on the mixin interfaces, the error will only surface during
> > > execution time - when the job tries to process the first record.
> > >
> > > I personally prefer the Mixin approach for easier evolvability and better
> > > consistency, but I would like to hear your thoughts, so I can flesh out
> > > the chosen approach in FLIP-372.
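
For illustration, a self-contained sketch of the mixin pattern under
discussion; all types below are simplified stand-ins, not the actual Flink
Sink V2 API:

```java
// Self-contained sketch of the mixin pattern under discussion; all types
// here are simplified stand-ins, not the actual Flink Sink V2 API.
public class MixinPatternSketch {

    interface Sink<InputT> {
        void write(InputT element);
    }

    // Optional capabilities are expressed as mixin interfaces.
    interface SupportsCommitter<CommT> {
        void commit(CommT committable);
    }

    interface SupportsPreCommitTopology<WriterResultT, CommT> {
        CommT transform(WriterResultT writerResult);
    }

    // A sink opts into exactly the capabilities it needs.
    static class MySink implements Sink<String>,
            SupportsCommitter<Long>, SupportsPreCommitTopology<String, Long> {
        @Override public void write(String element) { /* buffer the element */ }
        @Override public Long transform(String writerResult) { return (long) writerResult.length(); }
        @Override public void commit(Long committable) { /* finalize the commit */ }
    }

    public static void main(String[] args) {
        Sink<String> sink = new MySink();
        // Capability checks happen via instanceof at DAG generation time,
        // as noted in the cons above.
        if (sink instanceof SupportsPreCommitTopology) {
            System.out.println("sink declares a pre-commit topology");
        }
    }
}
```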
> > >
> > > Thanks,
> > > Peter
> > >
> > > [1] - https://lists.apache.org/thread/h6nkgth838dlh5s90sd95zd6hlsxwx57
> > > [2] -
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-371%3A+Provide+initialization+context+for+Committer+creation+in+TwoPhaseCommittingSink
> > > [3] - https://github.com/pvary/flink/tree/mixin_demo
> > > [4] -
> > >
> >
> https://github.com/pvary/flink/commit/acfc09f4c846f983f633bbf0c902aea49aa6b156
> > >
> > >
> > > On Fri, Nov 24, 2023, 11:38 Gyula Fóra  

Re: [DISCUSS] Should Configuration support getting value based on String key?

2023-12-14 Thread Xintong Song
Hi Gyula,

First of all, even if we remove the `getXXX(String key, XXX defaultValue)`
methods, there are still several ways to access the configuration with
string-keys.

   - If one wants to access a specific option, as Rui mentioned,
   `ConfigOptions.key("xxx").stringType().noDefaultValue()` can be used. TBH,
   I can't think of a use case where a temporarily created ConfigOption is
   preferred over a predefined one. Do you have any examples for that?
   - If one wants to access the whole configuration set, then `toMap` or
   `iterator` might be helpful.

It is true that these ways are less convenient than `getXXX(String key, XXX
defaultValue)`, and that's exactly my purpose: to make the string key less
convenient so that developers choose ConfigOption over it whenever
possible.

> there will always be cases where a more flexible
> dynamic handling is necessary without the added overhead of the toMap logic
>

I'm not sure about this. I agree there are cases where flexible and dynamic
handling is needed, but maybe "without the added overhead of the toMap
logic" is not that necessary?

I'd think of this as "encouraging developers to use ConfigOption as much as
possible" vs. "a bit less convenient in 5% of the cases". I guess there's
no right and wrong, just different engineering opinions. While I personally
stand with removing the string-key access methods, I'd also be fine with
the other way if more people are in favor of it.

Best,

Xintong



On Thu, Dec 14, 2023 at 3:45 PM Gyula Fóra  wrote:

> Hi Xintong,
>
> I don’t really see the actual practical benefit from removing the getString
> and setString low-level methods.
>
> I understand that ConfigOptions are nicer for 95% of the cases but from a
> technical point of view there will always be cases where a more flexible
> dynamic handling is necessary without the added overhead of the toMap
> logic.
>
> I think it’s the most natural thing for any config abstraction to expose
> basic get set methods with a simple key.
>
> What do you think?
>
> Cheers
> Gyula
>
> On Thu, 14 Dec 2023 at 08:00, Xintong Song  wrote:
>
> > >
> > > IIUC, you both prefer using ConfigOption instead of string keys for
> > > all use cases, even internal ones. We can even forcefully delete
> > > these @Deprecated methods in Flink 2.0 to guide users or
> > > developers to use ConfigOption.
> > >
> >
> > Yes, at least from my side.
> >
> >
> > I noticed that Configuration is used in
> > > DistributedCache#writeFileInfoToConfig and readFileInfoFromConfig
> > > to store some cacheFile meta-information. Their keys are
> > > temporary (key names with numbers) and it is not convenient
> > > to predefine ConfigOption.
> > >
> >
> > True, this one requires a bit more effort to migrate from string-key to
> > ConfigOption, but still should be doable. Looking at how the two mentioned
> > methods are implemented and used, it seems what is really needed is
> > serialization and deserialization of `DistributedCacheEntry`-s. And all the
> > entries are always written / read at once. So I think we can serialize the
> > whole set of entries into a JSON string (or something similar), and use one
> > ConfigOption with a deterministic key for it, rather than having one
> > ConfigOption for each field in each entry. WDYT?
> >
> >
> > > If everyone agrees with this direction, we can start to refactor all
> > > code that uses getXxx(String key, String defaultValue) into
> > > getXxx(ConfigOption<Xxx> configOption), and completely
> > > delete all getXxx(String key, String defaultValue) in 2.0.
> > > And I'm willing to pick it up~
> > >
> >
> > Exactly. Actually, Xuannan and a few other colleagues are working on
> > reviewing all the existing configuration options. We identified some common
> > issues that can potentially be improved, and not using string keys is one of
> > them. We are still summarizing the findings, and will bring them to the
> > community for discussion once ready. Helping hands are definitely welcomed.
> > :)
> >
> >
> > > Yeah, `toMap` can solve the problem, and I also mentioned it in the
> > > initial mail.
> > >
> >
> > True. Sorry I overlooked it.
> >
> >
> > Best,
> >
> > Xintong
> >
> >
> >
> > On Thu, Dec 14, 2023 at 2:14 PM Rui Fan <1996fan...@gmail.com> wrote:
> >
> > > Thanks Xintong and Xuannan for the feedback!
> > >
> > > IIUC, you both prefer using ConfigOption instead of string keys for
> > > all use cases, even internal ones. We can even forcefully delete
> > > these @Deprecated methods in Flink 2.0 to guide users or
> > > developers to use ConfigOption.
> > >
> > > Using ConfigOption is feasible in all scenarios because ConfigOption
> > > can be easily created via
> > > `ConfigOptions.key("xxx").stringType().noDefaultValue()` even if
> > > there is no predefined ConfigOption.
> > >
> > > I noticed that Configuration is used in
> > > DistributedCache#writeFileInfoToConfig and readFileInfoFromConfig
> > > to store some cacheFile meta-information. Their keys are
>