Re: [DISCUSS] FLIP-400: AsyncScalarFunction for asynchronous scalar function support

2023-12-18 Thread Alan Sheinberg
Thanks for the helpful comments, Xuyang and Timo.

@Timo, @Alan: IIUC, there seems to be something wrong here. Take Kafka as
> the source and MySQL as the sink as an example.
> Although Kafka is an append-only source, one of its fields is used as the pk
> when writing to MySQL. If an async udx is executed
> in unordered mode, there may be problems with the data in MySQL in the
> end. In this case, we actually need to ensure
> that ordering by the sink-based pk is preserved.


@Xuyang: That's a great point.  If some node downstream of my operator
cares about ordering, there's no way for it to reconstruct the original
ordering of the rows as they were input to my operator.  So even if they
want to preserve ordering by key, the order in which they see it may
already be incorrect.  Somehow I thought that maybe the analysis of the
changelog mode at a given operator was aware of downstream operations, but
it seems not.

Clear "no" on this. Changelog semantics make the planner complex and we
> need to be careful. Therefore I would strongly suggest we introduce
> ORDERED and slowly enable UNORDERED whenever we see a good fit for it in
> plans with appropriate planner rules that guard it.


@Timo: The better I understand the complexity, the more I agree with this.
I would be totally fine with the first version only having ORDERED mode.
For a v2, we could attempt the next most conservative thing and only
allow UNORDERED when the whole graph is in *INSERT* changelog mode.  The
next best type of optimization might understand what key is required
downstream, and allow breaking the original order only between unrelated
keys while maintaining it between rows of the same key.  Of course, if the
key used downstream is computed in some manner, that makes it all the
harder to know this beforehand.

So unordering should be fine *within* watermarks. This is also what
> watermarks are good for, a trade-off between strict ordering and making
> progress. The async operator from DataStream API also supports this if I
> remember correctly. However, it assumes a timestamp is present in
> StreamRecord on which it can work. But this is not the case within the
> SQL engine.


*AsyncWaitOperator* and *UnorderedStreamElementQueue* (the implementations
I plan on using) seem to support exactly this behavior.  I don't think they
make assumptions about the record's timestamp, but just preserve whatever
the input order is w.r.t. watermarks.  I'd be curious to understand the
timestamp use in more detail and see if it's required with the mentioned
classes.
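
For reference, a minimal sketch of how the DataStream API exposes this today
(the function body is illustrative only; a real implementation would call a
non-blocking client):

import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.async.AsyncFunction;
import org.apache.flink.streaming.api.functions.async.ResultFuture;

public class UnorderedAsyncExample {

    // The async function completes its ResultFuture from another thread.
    public static class UppercaseAsync implements AsyncFunction<String, String> {
        @Override
        public void asyncInvoke(String value, ResultFuture<String> resultFuture) {
            CompletableFuture
                    .supplyAsync(value::toUpperCase) // stand-in for a remote call
                    .thenAccept(v -> resultFuture.complete(Collections.singleton(v)));
        }
    }

    public static DataStream<String> enrich(DataStream<String> input) {
        // unorderedWait may emit results out of order between two watermarks,
        // but records are never reordered across a watermark boundary.
        return AsyncDataStream.unorderedWait(
                input, new UppercaseAsync(), 1_000, TimeUnit.MILLISECONDS, 100);
    }
}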

TLDR: Let's focus on ORDERED first.


I'm more than happy to start here, and we can consider UNORDERED as a
followup.  Then maybe we can consider only INSERT-mode graphs and ones where
we can solve the watermark constraints.
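
For concreteness, a minimal sketch of a user function under the proposed API
(assuming the eval-with-CompletableFuture signature described in FLIP-400; the
computation is a placeholder for a real asynchronous call):

import java.util.concurrent.CompletableFuture;

import org.apache.flink.table.functions.AsyncScalarFunction;

// The result is delivered via the future rather than a return value, so the
// operator can keep many invocations in flight at once.
public class RemoteLookupFunction extends AsyncScalarFunction {

    public void eval(CompletableFuture<String> result, String key) {
        // Placeholder async work; a real function would use a non-blocking client.
        CompletableFuture
                .supplyAsync(() -> "value-for-" + key)
                .whenComplete((value, error) -> {
                    if (error != null) {
                        result.completeExceptionally(error);
                    } else {
                        result.complete(value);
                    }
                });
    }
}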

Thanks,
Alan


On Mon, Dec 18, 2023 at 2:36 AM Timo Walther  wrote:

> Hi Xuyang and Alan,
>
> thanks for this productive discussion.
>
>  > Would it make a difference if it were exposed by the explain
>
> @Alan: I think this is a great idea. +1 on exposing the sync/async
> behavior through EXPLAIN.
>
>
>  > Is there an easy way to determine if the output of an async function
>  > would be problematic or not?
>
> Clear "no" on this. Changelog semantics make the planner complex and we
> need to be careful. Therefore I would strongly suggest we introduce
> ORDERED and slowly enable UNORDERED whenever we see a good fit for it in
> plans with appropriate planner rules that guard it.
>
>  > If the input to the operator is append-only, it seems fine, because
>  > this implies that each row is effectively independent and ordering is
>  > unimportant.
>
> As @Xuyang pointed out, it's not only the input that decides whether
> append-only is safe. It's also the subsequent operators in the pipeline.
> The example of Xuyang is a good one, when the sink operates in upsert
> mode. Append-only source, append-only operators, and append-only sink
> are safer.
>
> However, even in this combination, a row is not fully "independent";
> there are still watermarks flowing between rows:
>
> R(5), W(4), R(3), R(4), R(2), R(1), W(0)
>
> So unordering should be fine *within* watermarks. This is also what
> watermarks are good for, a trade-off between strict ordering and making
> progress. The async operator from DataStream API also supports this if I
> remember correctly. However, it assumes a timestamp is present in
> StreamRecord on which it can work. But this is not the case within the
> SQL engine.
>
> TLDR: Let's focus on ORDERED first.
>
> If we want to use UNORDERED, I would suggest checking the input operator
> for exactly one time attribute column. If there is exactly one time
> attribute column, we could insert it into the StreamRecord and allow
> UNORDERED mode. If this condition is not met, we go with ORDERED.
>
> Regards,
> Timo
>
>
>
>
> On 18.12.23 07:05, Xuyang wrote:
> > Hi, Alan and Timo. Thanks for your reply.
> >> Would it make a difference if it were exposed by the explain
> >> method (the operator 

Re: Unsubscribe

2023-12-18 Thread Hang Ruan
Please send an email with any content to user-unsubscr...@flink.apache.org if
you want to unsubscribe from the u...@flink.apache.org mailing list; you can
refer to [1][2] for more details on managing your mailing list subscriptions.

Best,
Hang

[1]
https://flink.apache.org/zh/community/#%e9%82%ae%e4%bb%b6%e5%88%97%e8%a1%a8
[2] https://flink.apache.org/community.html#mailing-lists

唐大彪 wrote on Mon, Dec 18, 2023 at 23:44:

> Unsubscribe
>


Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-12-18 Thread Jiabao Sun
Thanks Becket for the suggestions,

Updated.
Please help review it again when you have time.

Best,
Jiabao


> On Dec 19, 2023 at 09:06, Becket Qin wrote:
> 
> Hi Jiabao,
> 
> Thanks for updating the FLIP. It looks better. Some suggestions / questions:
> 
> 1. In the motivation section:
> 
>> *Currently, Flink Table/SQL does not expose fine-grained control for users
>> to control filter pushdown. **However, filter pushdown has some side
>> effects, such as additional computational pressure on external
>> systems. Moreover, improper queries can lead to issues such as full table
>> scans, which in turn can impact the stability of external systems.*
> 
> This statement sounds like the side effects are there for all the systems,
> which is inaccurate. Maybe we can say:
> *Currently, Flink Table/SQL does not expose fine-grained control for users
> to control filter pushdown. **However, filter pushdown may have side
> effects in some cases, **such as additional computational pressure on
> external systems. The JDBC source is a typical example of that. If a filter
> is pushed down to the database, an expensive full table scan may happen if
> the filter involves unindexed columns.*
> 
> 2. Regarding the prefix, usually a prefix is not required for the top level
> connector options. This is because the *connector* option is already there.
> So
>'connector' = 'jdbc',
>  'filter.handling.policy' = 'ALWAYS'
> is sufficient.
> 
> The prefix is needed when the option is for a 2nd+ level. For example,
>'connector' = 'jdbc',
>'format' = 'orc',
>'orc.some.option' = 'some_value'
> In this case, the prefix of "orc" is needed to make it clear this option is
> for the format.
> 
> I am guessing that the reason that currently the connector prefix is there
> is because the values of this configuration may be different depending on
> the connectors. For example, jdbc may have INDEXED_ONLY while MongoDB may
> have something else. Personally speaking, I am fine if we do not have a
> prefix in this case because users have already specified the connector type
> and it is intuitive enough that the option value is for that connector, not
> others.
> 
> 3. Can we clarify the following statement:
> 
>> *Introduce the native configuration [prefix].filter.handling.policy in the
>> connector.*
> 
> What do you mean by "native configuration"? From what I understand, the
> FLIP does the following:
> - introduces a new configuration to the JDBC and MongoDB connector.
> - Suggests a convention option name if other connectors are going to add an
> option for the same purpose.
> 
> Thanks,
> 
> Jiangjie (Becket) Qin
> 
> 
> 
> On Mon, Dec 18, 2023 at 5:45 PM Jiabao Sun 
> wrote:
> 
>> Hi Becket,
>> 
>> The FLIP document[1] has been updated.
>> Could you help take a look again?
>> 
>> Thanks,
>> Jiabao
>> 
>> [1]
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768
>> 
>> 
> On Dec 18, 2023 at 16:53, Becket Qin wrote:
>>> 
>>> Yes, that sounds reasonable to me. We can start with ALWAYS and NEVER,
>> and
>>> add more policies as needed.
>>> 
>>> Thanks,
>>> 
>>> Jiangjie (Becket) Qin
>>> 
>>> On Mon, Dec 18, 2023 at 4:48 PM Jiabao Sun > .invalid>
>>> wrote:
>>> 
 Thanks Becket,
 
 The jdbc.filter.handling.policy is good to me as it provides sufficient
 extensibility for future filter pushdown optimizations.
 However, currently, we don't have an implementation for the AUTO mode,
>> and
 it seems that the AUTO mode can easily be confused with the ALWAYS mode
 because users don't have the opportunity to MANUALLY decide which
>> filters
 to push down.
 
 I suggest that we only introduce the ALWAYS and NEVER modes for now, and
 we can consider introducing more flexible policies in the future,
 such as INDEX_ONLY, NUMERIC_ONLY and so on.
 
 WDYT?
 
 Best,
 Jiabao
 
 
 
> On Dec 18, 2023 at 16:27, Becket Qin wrote:
> 
> Hi Jiabao,
> 
> Please see the reply inline.
> 
> 
>> The MySQL connector is currently in the flink-connector-jdbc
>> repository
>> and is not a standalone connector.
>> Is it too unique to use "mysql" as the configuration option prefix?
> 
> If the intended behavior makes sense to all the supported JDBC drivers,
 we
> can make this a JDBC connector configuration.
> 
> Also, I would like to ask about the difference in behavior between AUTO
 and
>> ALWAYS.
>> It seems that we cannot guarantee the pushing down of all filters to
>> the
>> external system under the ALWAYS
>> mode because not all filters in Flink SQL are supported by the
>> external
>> system.
>> Should we throw an error when encountering a filter that cannot be
 pushed
>> down in the ALWAYS mode?
> 
> The idea of AUTO is to do efficiency-aware pushdowns. The source will
 query
> the external system (MySQL, Oracle, SQL Server, etc) first to retrieve
 the
> 

[jira] [Created] (FLINK-33878) Many keyed operators extend `TableStreamOperator`, which is marked as having no key

2023-12-18 Thread xuyang (Jira)
xuyang created FLINK-33878:
--

 Summary: Many keyed operators extend `TableStreamOperator`, which
is marked as having no key.
 Key: FLINK-33878
 URL: https://issues.apache.org/jira/browse/FLINK-33878
 Project: Flink
  Issue Type: Technical Debt
  Components: Table SQL / Runtime
Reporter: xuyang


Many keyed operators like `WindowJoinOperator` and `SlicingWindowOperator`
extend `TableStreamOperator`, which is marked as having no key.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re:Re:Re: [DISCUSS] FLIP-392: Deprecate the Legacy Group Window Aggregation

2023-12-18 Thread Xuyang
Hi, Timo. Sorry for the noise.
What do you think about splitting the FLIP like this?




--

Best!
Xuyang





At 2023-12-15 10:05:32, "Xuyang"  wrote:
>Hi, Timo, thanks for your advice.
>
>
>I am considering splitting the existing FLIP into two (possibly keeping the
>existing one, possibly not).
>One of them covers completing CDC support in the window TVF operators (there
>are several small work items, such as window agg, window rank, window join,
>etc.; due to time constraints, window agg takes priority for the 1.19
>version). The other covers letting the HOP window TVF
>support a size that is a non-integer multiple of the step. Once these two
>FLIPs are basically completed
>in 1.19, we can consider officially deprecating the old group window agg
>syntax in the release notes.
>
>
>WDYT?
>
>
>
>
>--
>
>Best!
>Xuyang
>
>
>
>
>
>At 2023-12-14 17:51:01, "Timo Walther"  wrote:
>>Hi Xuyang,
>>
>> > I'm not splitting this flip is that all of these subtasks like session 
>>window tvf and cdc support do not change the public interface and the 
>>public syntax
>>
>>Given the length of this mailing list discussion and the number of involved
>>people, I would strongly suggest simplifying the FLIP and giving it a
>>better title to make quicker progress. In general, we all seem to be on
>>the same page in what we want. And both session TVF support and the
>>deprecation of the legacy group windows have been voted on already and
>>discussed thoroughly. The FLIP can purely focus on the CDC topic.
>>
>>Cheers,
>>Timo
>>
>>
>>On 14.12.23 08:35, Xuyang wrote:
>>> Hi, Timo, Sergey and Lincoln Lee. Thanks for your feedback.
>>> 
>>> 
 In my opinion the FLIP touches too many
 topics at the same time and should be split into multiple FLIPs. We
 should stay focused on what is needed for Flink 2.0.
>>> The main goal and topic of this Flip is to align the abilities between the 
>>> legacy group window agg syntax and the new window tvf syntax,
>>> and then we can say that the legacy window syntax will be deprecated. IMO, 
>>> although there are many misalignments about these two
>>> syntaxes, such as session window tvf, cdc support and so on,  they are all 
>>> the subtasks we need to do in this flip. Another reason I'm not
>>> splitting this flip is that all of these subtasks like session window tvf
>>> and cdc support do not change the public interface and the public
>>> syntax; their implementations will only be in the table-planner and
>>> table-runtime modules.
>>> 
>>> 
 Can we postpone this discussion? Currently we should focus on users
 switching to Window TVFs before Flink 2.0. Early fire, late fire and
 allow lateness have not been exposed through public configuration. They can be
 introduced after Flink 2.0 is released.
>>> 
>>> 
>>> Agree with you. This flip will not and should not expose these experimental
>>> Flink configs to users. I listed them in this flip just to show the
>>> misalignments between these two window syntaxes.
>>> 
>>> 
>>> Looking forward to your thoughts.
>>> 
>>> 
>>> 
>>> 
>>> --
>>> 
>>>  Best!
>>>  Xuyang
>>> 
>>> 
>>> 
>>> 
>>> 
>>> At 2023-12-13 15:40:16, "Lincoln Lee"  wrote:
 Thanks Xuyang driving this work! It's great that everyone agrees with the
 work itself in this flip[1]!

 Regarding whether to split the flip or adjust the scope of this flip, I'd
 like to share some thoughts:

 1. About the title of this flip, what I want to say is that flip-145[2] had
 marked the legacy group window deprecated in the documentation but the
 functionality of the new syntax is not aligned with the legacy one.
 This is not a user-friendly deprecation, so the initiation of this flip, as
 I understand it, is for the formal deprecation of the legacy window, which
 requires us to complete the functionality alignment.

 2. Agree with Timo that we can process the late-fire/early-fire features
 separately. These experimental parameters have not been officially opened
 to users.
 Considering the workload, we can focus more on this version.

 3. I have no objection to splitting this flip if everyone feels that the
 work included is too much.
 Regarding the support of session tvf, it seems that the main problem is
 that this part of the description occupies a large part of the flip,
 causing some misunderstandings.
 This is indeed a predetermined task in FLIP-145, just adding more
 explanation about semantics. In addition, I saw the discussion history in
 FLINK-24024[3], thanks Sergey for being willing to help drive this work
 together.

 [1]
 https://cwiki.apache.org/confluence/display/FLINK/FLIP-392%3A+Deprecate+the+Legacy+Group+Window+Aggregation
 [2]
 https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function
 [3] https://issues.apache.org/jira/browse/FLINK-24024

 

[jira] [Created] (FLINK-33877) CollectSinkFunctionTest.testConfiguredPortIsUsed fails due to BindException

2023-12-18 Thread Jiabao Sun (Jira)
Jiabao Sun created FLINK-33877:
--

 Summary: CollectSinkFunctionTest.testConfiguredPortIsUsed fails 
due to BindException
 Key: FLINK-33877
 URL: https://issues.apache.org/jira/browse/FLINK-33877
 Project: Flink
  Issue Type: Bug
  Components: API / DataStream
Affects Versions: 1.19.0
Reporter: Jiabao Sun


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55646=logs=0da23115-68bb-5dcd-192c-bd4c8adebde1=24c3384f-1bcb-57b3-224f-51bf973bbee8=9482


{noformat}
Dec 18 17:49:57 17:49:57.241 [ERROR] 
org.apache.flink.streaming.api.operators.collect.CollectSinkFunctionTest.testConfiguredPortIsUsed
 -- Time elapsed: 0.021 s <<< ERROR!
Dec 18 17:49:57 java.net.BindException: Address already in use (Bind failed)
Dec 18 17:49:57 at java.net.PlainSocketImpl.socketBind(Native Method)
Dec 18 17:49:57 at 
java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
Dec 18 17:49:57 at java.net.ServerSocket.bind(ServerSocket.java:390)
Dec 18 17:49:57 at java.net.ServerSocket.(ServerSocket.java:252)
Dec 18 17:49:57 at 
org.apache.flink.streaming.api.operators.collect.CollectSinkFunction$ServerThread.(CollectSinkFunction.java:375)
Dec 18 17:49:57 at 
org.apache.flink.streaming.api.operators.collect.CollectSinkFunction$ServerThread.(CollectSinkFunction.java:362)
Dec 18 17:49:57 at 
org.apache.flink.streaming.api.operators.collect.CollectSinkFunction.open(CollectSinkFunction.java:252)
Dec 18 17:49:57 at 
org.apache.flink.streaming.api.operators.collect.utils.CollectSinkFunctionTestWrapper.openFunction(CollectSinkFunctionTestWrapper.java:103)
Dec 18 17:49:57 at 
org.apache.flink.streaming.api.operators.collect.CollectSinkFunctionTest.testConfiguredPortIsUsed(CollectSinkFunctionTest.java:138)
Dec 18 17:49:57 at java.lang.reflect.Method.invoke(Method.java:498)
Dec 18 17:49:57 at 
java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
Dec 18 17:49:57 at 
java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
Dec 18 17:49:57 at 
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
Dec 18 17:49:57 at 
java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
Dec 18 17:49:57 at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
{noformat}





--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-12-18 Thread Becket Qin
Hi Jiabao,

Thanks for updating the FLIP. It looks better. Some suggestions / questions:

1. In the motivation section:

> *Currently, Flink Table/SQL does not expose fine-grained control for users
> to control filter pushdown. **However, filter pushdown has some side
> effects, such as additional computational pressure on external
> systems. Moreover, improper queries can lead to issues such as full table
> scans, which in turn can impact the stability of external systems.*

This statement sounds like the side effects are there for all the systems,
which is inaccurate. Maybe we can say:
*Currently, Flink Table/SQL does not expose fine-grained control for users
to control filter pushdown. **However, filter pushdown may have side
effects in some cases, **such as additional computational pressure on
external systems. The JDBC source is a typical example of that. If a filter
is pushed down to the database, an expensive full table scan may happen if
the filter involves unindexed columns.*

2. Regarding the prefix, usually a prefix is not required for the top level
connector options. This is because the *connector* option is already there.
So
'connector' = 'jdbc',
  'filter.handling.policy' = 'ALWAYS'
is sufficient.

The prefix is needed when the option is for a 2nd+ level. For example,
'connector' = 'jdbc',
'format' = 'orc',
'orc.some.option' = 'some_value'
In this case, the prefix of "orc" is needed to make it clear this option is
for the format.

I am guessing that the reason that currently the connector prefix is there
is because the values of this configuration may be different depending on
the connectors. For example, jdbc may have INDEXED_ONLY while MongoDB may
have something else. Personally speaking, I am fine if we do not have a
prefix in this case because users have already specified the connector type
and it is intuitive enough that the option value is for that connector, not
others.
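
For illustration, such a top-level option could be declared roughly like this
in the connector (a sketch only; the option key and the ALWAYS/NEVER values
are the ones under discussion, while the Java names and the default value are
assumptions):

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

public class FilterPushDownOptions {

    // Hypothetical policy enum, starting with the two modes agreed on so far.
    public enum FilterHandlingPolicy {
        ALWAYS, // push every supported filter to the external system
        NEVER   // push nothing; Flink evaluates all filters itself
    }

    public static final ConfigOption<FilterHandlingPolicy> FILTER_HANDLING_POLICY =
            ConfigOptions.key("filter.handling.policy")
                    .enumType(FilterHandlingPolicy.class)
                    .defaultValue(FilterHandlingPolicy.ALWAYS)
                    .withDescription(
                            "Controls whether filters are pushed down to the external system.");
}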

3. Can we clarify the following statement:

> *Introduce the native configuration [prefix].filter.handling.policy in the
> connector.*

What do you mean by "native configuration"? From what I understand, the
FLIP does the following:
- introduces a new configuration to the JDBC and MongoDB connector.
- Suggests a convention option name if other connectors are going to add an
option for the same purpose.

Thanks,

Jiangjie (Becket) Qin



On Mon, Dec 18, 2023 at 5:45 PM Jiabao Sun 
wrote:

> Hi Becket,
>
> The FLIP document[1] has been updated.
> Could you help take a look again?
>
> Thanks,
> Jiabao
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768
>
>
> > On Dec 18, 2023 at 16:53, Becket Qin wrote:
> >
> > Yes, that sounds reasonable to me. We can start with ALWAYS and NEVER,
> and
> > add more policies as needed.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Mon, Dec 18, 2023 at 4:48 PM Jiabao Sun  .invalid>
> > wrote:
> >
> >> Thanks Becket,
> >>
> >> The jdbc.filter.handling.policy is good to me as it provides sufficient
> >> extensibility for future filter pushdown optimizations.
> >> However, currently, we don't have an implementation for the AUTO mode,
> and
> >> it seems that the AUTO mode can easily be confused with the ALWAYS mode
> >> because users don't have the opportunity to MANUALLY decide which
> filters
> >> to push down.
> >>
> >> I suggest that we only introduce the ALWAYS and NEVER modes for now, and
> >> we can consider introducing more flexible policies in the future,
> >> such as INDEX_ONLY, NUMERIC_ONLY and so on.
> >>
> >> WDYT?
> >>
> >> Best,
> >> Jiabao
> >>
> >>
> >>
> >>> On Dec 18, 2023 at 16:27, Becket Qin wrote:
> >>>
> >>> Hi Jiabao,
> >>>
> >>> Please see the reply inline.
> >>>
> >>>
>  The MySQL connector is currently in the flink-connector-jdbc
> repository
>  and is not a standalone connector.
>  Is it too unique to use "mysql" as the configuration option prefix?
> >>>
> >>> If the intended behavior makes sense to all the supported JDBC drivers,
> >> we
> >>> can make this a JDBC connector configuration.
> >>>
> >>> Also, I would like to ask about the difference in behavior between AUTO
> >> and
>  ALWAYS.
>  It seems that we cannot guarantee the pushing down of all filters to
> the
>  external system under the ALWAYS
>  mode because not all filters in Flink SQL are supported by the
> external
>  system.
>  Should we throw an error when encountering a filter that cannot be
> >> pushed
>  down in the ALWAYS mode?
> >>>
> >>> The idea of AUTO is to do efficiency-aware pushdowns. The source will
> >> query
> >>> the external system (MySQL, Oracle, SQL Server, etc) first to retrieve
> >> the
> >>> information of the table. With that information, the source will decide
> >>> whether to further push a filter to the external system based on the
> >>> efficiency. E.g. only push the indexed fields. In contrast, ALWAYS will
> >>> just always push the supported filters to the external system,
> regardless
> 

Re: [DISCUSS] Release Flink 1.18.1

2023-12-18 Thread Jing Ge
Hi folks,

Found another blocker issue for 1.18:
https://github.com/apache/flink/pull/23950. Thanks Sergey for the heads-up.

Best regards,
Jing

On Mon, Dec 18, 2023 at 5:02 PM Jing Ge  wrote:

> Hi,
>
> I am waiting for the CI of
> https://issues.apache.org/jira/browse/FLINK-33872 @Hong. It would be
> great if anyone would like to review the PR. Thanks.
>
> 1.18.1-rc1 release will be canceled because of
> https://issues.apache.org/jira/browse/FLINK-33704, I will start with rc2
> once FLINK-33872 has been merged into 1.18.
>
> Best regards,
> Jing
>
> On Mon, Dec 18, 2023 at 11:52 AM Jing Ge  wrote:
>
>> Hi folks,
>>
>> Thanks Martijn for driving the
>> https://issues.apache.org/jira/browse/FLINK-33704, which will solve the
>> issue of https://issues.apache.org/jira/browse/FLINK-33793. I will move
>> forward with the 1.18.1 release today.
>>
>> Best regards,
>> Jing
>>
>> On Thu, Dec 14, 2023 at 11:17 AM Jing Ge  wrote:
>>
>>> Hi folks,
>>>
>>> What Martijn said makes sense. We should pay more attention to
>>> https://issues.apache.org/jira/browse/FLINK-33793. I have also tried to
>>> contribute and share my thoughts in the PR
>>> https://github.com/apache/flink/pull/23489.
>>>
>>> For 1.18.1-rc1, I will wait until tomorrow. If there is no progress with
>>> the PR, I will move forward with the release, unless we upgrade the ticket
>>> to be a Blocker. Look forward to your feedback.
>>>
>>> Best regards,
>>> Jing
>>>
>>> On Wed, Dec 13, 2023 at 9:57 AM Jing Ge  wrote:
>>>
 Hi Martijn,

 Thanks for the heads-up! I already started 1.18.1-rc1[1]. But please
 feel free to ping me. I will either redo rc1 or start rc2. Thanks!

 Best regards,
 Jing

 [1] https://dist.apache.org/repos/dist/dev/flink/flink-1.18.1-rc1/

 On Tue, Dec 12, 2023 at 9:49 PM Martijn Visser <
 martijnvis...@apache.org> wrote:

> Hi Jing,
>
> The only thing that was brought to my attention today was
> https://issues.apache.org/jira/browse/FLINK-33793. I've marked it as a
> Critical (I think we should have working checkpointing with GCS), but
> could also be considered a Blocker. There is a PR open for it, but I
> don't think it's the right fix so I'm testing
>
> https://github.com/apache/flink/commit/2ebe7f6e090690486c1954099a2b57283c578192
> at this moment. If you haven't started the 1.18.1 release yet, perhaps
> we could include it (hopefully we can merge it tomorrow), else it
> would have to wait for the next release.
>
> Thanks,
>
> Martijn
>
> On Tue, Dec 12, 2023 at 9:30 AM Jing Ge 
> wrote:
> >
> > Hi All,
> >
> > Thank you for your feedback!
> >
> > Since FLINK-33523[1] is done (thanks Martijn for driving it) and
> there are
> > no other concerns or objections. Currently I am not aware of any
> unresolved
> > blockers. There is one critical task[2] whose PR still has some
> checkstyle
> > issues. I will try to include it into 1.18.1 with best effort, since
> it
> > depends on how quickly the author can finish it and cp to 1.18. If
> anyone
> > considers FLINK-33588 as a must-have fix in 1.18.1, please let me
> know,
> > thanks!
> >
> > I will start working on the RC1 release today.
> >
> > @benchao: gotcha wrt FLINK-33313
> >
> > Best regards,
> > Jing
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-33523
> > [2] https://issues.apache.org/jira/browse/FLINK-33588
> >
> > On Tue, Dec 12, 2023 at 7:03 AM weijie guo <
> guoweijieres...@gmail.com>
> > wrote:
> >
> > > Thanks Jing for driving this bug-fix release.
> > >
> > > +1 from my side.
> > >
> > > Best regards,
> > >
> > > Weijie
> > >
> > >
> > > Jark Wu wrote on Tue, Dec 12, 2023 at 12:17:
> > >
> > > > Thanks Jing for driving 1.18.1.
> > > > +1 for this.
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > On Mon, 11 Dec 2023 at 16:59, Hong Liang 
> wrote:
> > > >
> > > > > +1. Thanks Jing for driving this.
> > > > >
> > > > > Hong
> > > > >
> > > > > On Mon, Dec 11, 2023 at 2:27 AM Yun Tang 
> wrote:
> > > > >
> > > > > > Thanks Jing for driving 1.18.1 release, +1 for this.
> > > > > >
> > > > > >
> > > > > > Best
> > > > > > Yun Tang
> > > > > > 
> > > > > > From: Rui Fan <1996fan...@gmail.com>
> > > > > > Sent: Saturday, December 9, 2023 21:46
> > > > > > To: dev@flink.apache.org 
> > > > > > Subject: Re: [DISCUSS] Release Flink 1.18.1
> > > > > >
> > > > > > Thanks Jing for driving this release, +1
> > > > > >
> > > > > > Best,
> > > > > > Rui
> > > > > >
> > > > > > On Sat, Dec 9, 2023 at 7:33 AM Leonard Xu 
> wrote:
> > > > > >
> > > > > > > Thanks Jing for driving this release, +1
> > > > > > >
> > 

Re: [DISCUSS] Release Flink 1.18.1

2023-12-18 Thread Jing Ge
Hi,

I am waiting for the CI of https://issues.apache.org/jira/browse/FLINK-33872
@Hong. It would be great if anyone would like to review the PR. Thanks.

1.18.1-rc1 release will be canceled because of
https://issues.apache.org/jira/browse/FLINK-33704, I will start with rc2
once FLINK-33872 has been merged into 1.18.

Best regards,
Jing

On Mon, Dec 18, 2023 at 11:52 AM Jing Ge  wrote:

> Hi folks,
>
> Thanks Martijn for driving the
> https://issues.apache.org/jira/browse/FLINK-33704, which will solve the
> issue of https://issues.apache.org/jira/browse/FLINK-33793. I will move
> forward with the 1.18.1 release today.
>
> Best regards,
> Jing
>
> On Thu, Dec 14, 2023 at 11:17 AM Jing Ge  wrote:
>
>> Hi folks,
>>
>> What Martijn said makes sense. We should pay more attention to
>> https://issues.apache.org/jira/browse/FLINK-33793. I have also tried to
>> contribute and share my thoughts in the PR
>> https://github.com/apache/flink/pull/23489.
>>
>> For 1.18.1-rc1, I will wait until tomorrow. If there is no progress with
>> the PR, I will move forward with the release, unless we upgrade the ticket
>> to be a Blocker. Look forward to your feedback.
>>
>> Best regards,
>> Jing
>>
>> On Wed, Dec 13, 2023 at 9:57 AM Jing Ge  wrote:
>>
>>> Hi Martijn,
>>>
>>> Thanks for the heads-up! I already started 1.18.1-rc1[1]. But please
>>> feel free to ping me. I will either redo rc1 or start rc2. Thanks!
>>>
>>> Best regards,
>>> Jing
>>>
>>> [1] https://dist.apache.org/repos/dist/dev/flink/flink-1.18.1-rc1/
>>>
>>> On Tue, Dec 12, 2023 at 9:49 PM Martijn Visser 
>>> wrote:
>>>
 Hi Jing,

 The only thing that was brought to my attention today was
 https://issues.apache.org/jira/browse/FLINK-33793. I've marked it as a
 Critical (I think we should have working checkpointing with GCS), but
 could also be considered a Blocker. There is a PR open for it, but I
 don't think it's the right fix so I'm testing

 https://github.com/apache/flink/commit/2ebe7f6e090690486c1954099a2b57283c578192
 at this moment. If you haven't started the 1.18.1 release yet, perhaps
 we could include it (hopefully we can merge it tomorrow), else it
 would have to wait for the next release.

 Thanks,

 Martijn

 On Tue, Dec 12, 2023 at 9:30 AM Jing Ge 
 wrote:
 >
 > Hi All,
 >
 > Thank you for your feedback!
 >
 > Since FLINK-33523[1] is done (thanks Martijn for driving it) and there
 are
 > no other concerns or objections. Currently I am not aware of any
 unresolved
 > blockers. There is one critical task[2] whose PR still has some
 checkstyle
 > issues. I will try to include it into 1.18.1 with best effort, since
 it
 > depends on how quickly the author can finish it and cp to 1.18. If
 anyone
 > considers FLINK-33588 as a must-have fix in 1.18.1, please let me
 know,
 > thanks!
 >
 > I will start working on the RC1 release today.
 >
 > @benchao: gotcha wrt FLINK-33313
 >
 > Best regards,
 > Jing
 >
 > [1] https://issues.apache.org/jira/browse/FLINK-33523
 > [2] https://issues.apache.org/jira/browse/FLINK-33588
 >
 > On Tue, Dec 12, 2023 at 7:03 AM weijie guo >>> >
 > wrote:
 >
 > > Thanks Jing for driving this bug-fix release.
 > >
 > > +1 from my side.
 > >
 > > Best regards,
 > >
 > > Weijie
 > >
 > >
 > > > Jark Wu wrote on Tue, Dec 12, 2023 at 12:17:
 > >
 > > > Thanks Jing for driving 1.18.1.
 > > > +1 for this.
 > > >
 > > > Best,
 > > > Jark
 > > >
 > > > On Mon, 11 Dec 2023 at 16:59, Hong Liang 
 wrote:
 > > >
 > > > > +1. Thanks Jing for driving this.
 > > > >
 > > > > Hong
 > > > >
 > > > > On Mon, Dec 11, 2023 at 2:27 AM Yun Tang 
 wrote:
 > > > >
 > > > > > Thanks Jing for driving 1.18.1 release, +1 for this.
 > > > > >
 > > > > >
 > > > > > Best
 > > > > > Yun Tang
 > > > > > 
 > > > > > From: Rui Fan <1996fan...@gmail.com>
 > > > > > Sent: Saturday, December 9, 2023 21:46
 > > > > > To: dev@flink.apache.org 
 > > > > > Subject: Re: [DISCUSS] Release Flink 1.18.1
 > > > > >
 > > > > > Thanks Jing for driving this release, +1
 > > > > >
 > > > > > Best,
 > > > > > Rui
 > > > > >
 > > > > > On Sat, Dec 9, 2023 at 7:33 AM Leonard Xu 
 wrote:
 > > > > >
 > > > > > > Thanks Jing for driving this release, +1
 > > > > > >
 > > > > > > Best,
 > > > > > > Leonard
 > > > > > >
 > > > > > > > On Dec 9, 2023 at 1:23 AM, Danny Cranmer wrote:
 > > > > > > >
 > > > > > > > +1
 > > > > > > >
 > > > > > > > Thanks for driving this
 > > > > > > >
 > > > > > > > On Fri, 8 Dec 2023, 12:05 Timo Walther, <
 twal...@apache.org>
 > > > wrote:
 > > > > > > >
>> Thanks for taking care of this Jing.

[jira] [Created] (FLINK-33876) [JUnit5 Migration] Introduce testName method in TableTestBase

2023-12-18 Thread Jiabao Sun (Jira)
Jiabao Sun created FLINK-33876:
--

 Summary: [JUnit5 Migration] Introduce testName method in 
TableTestBase
 Key: FLINK-33876
 URL: https://issues.apache.org/jira/browse/FLINK-33876
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner, Tests
Affects Versions: 1.19.0
Reporter: Jiabao Sun


After completing the JUnit 5 migration in the table planner, there is an
incompatibility issue between JUnit's TestName and TestInfo. Therefore, we are
considering introducing a methodName method in TableTestBase. External
connectors' TablePlanTest can override this method when performing the JUnit 5
migration of TableTestBase to avoid compilation issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Unsubscribe

2023-12-18 Thread 唐大彪
Unsubscribe


Re: Request permission for creating a new FLIP page

2023-12-18 Thread Martijn Visser
Hi Jinsui,

You should now have permissions.

Best regards,

Martijn

On Mon, Dec 18, 2023 at 1:56 PM Jinsui Chen  wrote:
>
> Hi all,
>
> I want to add a new connector for Flink (see FLINK-33873). For this level
> of change a new FLIP page needs to be created, but I don't have the
> corresponding permissions.
>
> Following the instructions in the FLIP document, I sent this email. Could
> anyone help me with this?
>
> Thanks,
> Jinsui


[jira] [Created] (FLINK-33875) Support slots wait mechanism at DeclarativeSlotPoolBridge side for Default Scheduler

2023-12-18 Thread RocMarshal (Jira)
RocMarshal created FLINK-33875:
--

 Summary: Support slots wait mechanism at DeclarativeSlotPoolBridge 
side for Default Scheduler
 Key: FLINK-33875
 URL: https://issues.apache.org/jira/browse/FLINK-33875
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Task
Reporter: RocMarshal






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33874) Introduce resource request wait mechanism at DefaultDeclarativeSlotPool side for Default Scheduler

2023-12-18 Thread RocMarshal (Jira)
RocMarshal created FLINK-33874:
--

 Summary: Introduce resource request wait mechanism at 
DefaultDeclarativeSlotPool side for Default Scheduler
 Key: FLINK-33874
 URL: https://issues.apache.org/jira/browse/FLINK-33874
 Project: Flink
  Issue Type: New Feature
  Components: Runtime / Task
Reporter: RocMarshal






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Request permission for creating a new FLIP page

2023-12-18 Thread Jinsui Chen
Hi all,

I want to add a new connector for Flink (see FLINK-33873). For this level
of change a new FLIP page needs to be created, but I don't have the
corresponding permissions.

Following the instructions in the FLIP document, I sent this email. Could
anyone help me with this?

Thanks,
Jinsui


Re: [VOTE] FLIP-372: Allow TwoPhaseCommittingSink WithPreCommitTopology to alter the type of the Committable

2023-12-18 Thread Gyula Fóra
+1 (binding)

Gyula

On Mon, 18 Dec 2023 at 13:04, Márton Balassi 
wrote:

> +1 (binding)
>
> On Mon 18. Dec 2023 at 09:34, Péter Váry 
> wrote:
>
> > Hi everyone,
> >
> > Since there were no further comments on the discussion thread [1], I
> would
> > like to start the vote for FLIP-372 [2].
> >
> > The FLIP started as a small new feature, but in the discussion thread and
> > in a similar parallel thread [3] we opted for a somewhat bigger change in
> > the Sink V2 API.
> >
> > Please read the FLIP and cast your vote.
> >
> > The vote will remain open for at least 72 hours and only concluded if
> there
> > are no objections and enough (i.e. at least 3) binding votes.
> >
> > Thanks,
> > Peter
> >
> > [1] - https://lists.apache.org/thread/344pzbrqbbb4w0sfj67km25msp7hxlyd
> > [2] -
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-372%3A+Allow+TwoPhaseCommittingSink+WithPreCommitTopology+to+alter+the+type+of+the+Committable
> > [3] - https://lists.apache.org/thread/h6nkgth838dlh5s90sd95zd6hlsxwx57
> >
>


Re: [VOTE] FLIP-372: Allow TwoPhaseCommittingSink WithPreCommitTopology to alter the type of the Committable

2023-12-18 Thread Márton Balassi
+1 (binding)

On Mon 18. Dec 2023 at 09:34, Péter Váry 
wrote:

> Hi everyone,
>
> Since there were no further comments on the discussion thread [1], I would
> like to start the vote for FLIP-372 [2].
>
> The FLIP started as a small new feature, but in the discussion thread and
> in a similar parallel thread [3] we opted for a somewhat bigger change in
> the Sink V2 API.
>
> Please read the FLIP and cast your vote.
>
> The vote will remain open for at least 72 hours and only concluded if there
> are no objections and enough (i.e. at least 3) binding votes.
>
> Thanks,
> Peter
>
> [1] - https://lists.apache.org/thread/344pzbrqbbb4w0sfj67km25msp7hxlyd
> [2] -
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-372%3A+Allow+TwoPhaseCommittingSink+WithPreCommitTopology+to+alter+the+type+of+the+Committable
> [3] - https://lists.apache.org/thread/h6nkgth838dlh5s90sd95zd6hlsxwx57
>


[jira] [Created] (FLINK-33873) Create a Redis HyperLogLog Connector for Flink

2023-12-18 Thread Jinsui Chen (Jira)
Jinsui Chen created FLINK-33873:
---

 Summary: Create a Redis HyperLogLog Connector for Flink
 Key: FLINK-33873
 URL: https://issues.apache.org/jira/browse/FLINK-33873
 Project: Flink
  Issue Type: New Feature
  Components: Connectors / Redis Streams
Reporter: Jinsui Chen


Redis HyperLogLog is a probabilistic data structure used for estimating the 
cardinality of a dataset, which is the number of unique elements in a set. I 
think it is possible to create a sink connector for HyperLogLog.

FLINK-15571 is about Redis stream connector.

Since there is no component for the Redis connector as a whole, the issue is 
created under this component.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33872) Checkpoint history does not display for completed jobs

2023-12-18 Thread Hong Liang Teoh (Jira)
Hong Liang Teoh created FLINK-33872:
---

 Summary: Checkpoint history does not display for completed jobs
 Key: FLINK-33872
 URL: https://issues.apache.org/jira/browse/FLINK-33872
 Project: Flink
  Issue Type: Bug
  Components: Runtime / REST
Affects Versions: 1.18.0
Reporter: Hong Liang Teoh
 Fix For: 1.19.0, 1.18.2
 Attachments: image-2023-12-18-11-37-11-914.png, 
image-2023-12-18-11-37-29-596.png

Prior to https://issues.apache.org/jira/browse/FLINK-32469, we could see
checkpoint history for completed jobs (CANCELED, FAILED, FINISHED).

After https://issues.apache.org/jira/browse/FLINK-32469, the checkpoint history 
does not show up for completed jobs. 

*Reproduction steps:*
 # Start a Flink cluster.
 # Submit a job with checkpointing enabled.
 # Wait until at least 1 checkpoint completes.
 # Cancel job.
 # Open the Flink dashboard > Job > Checkpoints > History.

We will see a log line in the JobManager saying "FlinkJobNotFoundException:
Could not find Flink job (  )"

*Snapshot of failure:*

While the job is running, we can see checkpoints.

!image-2023-12-18-11-37-11-914.png!

When the job has been CANCELLED, we no longer see checkpoint data.

!image-2023-12-18-11-37-29-596.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Question on lookup joins

2023-12-18 Thread Benchao Li
I don't see a problem in the result. Since you are using LEFT JOIN,
the NULLs are expected where there is no matching result in the right
table.

Hang Ruan wrote on Mon, Dec 18, 2023 at 09:39:
>
> Hi, David.
>
> The FLIP-377[1] is about this part. You could take a look at it.
>
> Best,
> Hang
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768
>
>
> Hang Ruan wrote on Sun, Dec 17, 2023 at 20:56:
>
> > Hi, David.
> >
> > I think you are right that the value with NULL should not be returned if
> > filter pushdown is disabled.
> >
> > Maybe you should EXPLAIN this SQL to make sure the filter is not pushed
> > down to the lookup source.
> >
> > I see the configuration
> > 'table.optimizer.source.predicate-pushdown-enabled' relies on the class
> > FilterableTableSource, which is deprecated.
> > I am not sure whether this configuration is still useful for the JDBC
> > connector, which uses SupportsFilterPushDown.
> >
> > Maybe the jdbc connector should read this configuration and return an
> > empty 'acceptedFilters' in the method 'applyFilters'.
> >
> > Best,
> > Hang
> >
> > David Radley wrote on Sat, Dec 16, 2023 at 01:47:
> >
> >> Hi,
> >> I am working on FLINK-33365, which relates to JDBC predicate pushdown. I
> >> want to ensure that the same results occur with predicate pushdown as
> >> without, so I am asking this question outside the PR / issue.
> >>
> >> I notice the following behaviour for lookup joins without predicate
> >> pushdown. I was not expecting all the NULLs when there is not a
> >> matching join key.  ’a’ is a table in Paimon and ‘db’ is a relational
> >> database.
> >>
> >>
> >>
> >> Flink SQL> select * from a;
> >>
> >> +----+----------------+-------------------------+
> >> | op |             ip |                proctime |
> >> +----+----------------+-------------------------+
> >> | +I |    10.10.10.10 | 2023-12-15 17:36:10.028 |
> >> | +I |    20.20.20.20 | 2023-12-15 17:36:10.030 |
> >> | +I |    30.30.30.30 | 2023-12-15 17:36:10.031 |
> >> +----+----------------+-------------------------+
> >> ^CQuery terminated, received a total of 3 rows
> >>
> >> Flink SQL> select * from db_catalog.menagerie.e;
> >>
> >> +----+----------------+------+-----+--------+--------+
> >> | op |             ip | type | age | height | weight |
> >> +----+----------------+------+-----+--------+--------+
> >> | +I |    10.10.10.10 |    1 |  30 |    100 |    100 |
> >> | +I |    10.10.10.10 |    2 |  40 |     90 |    110 |
> >> | +I |    10.10.10.10 |    2 |  50 |     80 |    120 |
> >> | +I |    10.10.10.10 |    3 |  50 |     70 |     40 |
> >> | +I |    20.20.20.20 |    3 |  30 |     80 |     90 |
> >> +----+----------------+------+-----+--------+--------+
> >> Received a total of 5 rows
> >>
> >> Flink SQL> set table.optimizer.source.predicate-pushdown-enabled=false;
> >> [INFO] Execute statement succeed.
> >>
> >> Flink SQL> SELECT * FROM a left join mariadb_catalog.menagerie.e FOR
> >> SYSTEM_TIME AS OF a.proctime on e.type = 2 and a.ip = e.ip;
> >>
> >> +----+-------------+-------------------------+-------------+--------+--------+--------+--------+
> >> | op |          ip |                proctime |         ip0 |   type |    age | height | weight |
> >> +----+-------------+-------------------------+-------------+--------+--------+--------+--------+
> >> | +I | 10.10.10.10 | 2023-12-15 17:38:05.169 | 10.10.10.10 |      2 |     40 |     90 |    110 |
> >> | +I | 10.10.10.10 | 2023-12-15 17:38:05.169 | 10.10.10.10 |      2 |     50 |     80 |    120 |
> >> | +I | 20.20.20.20 | 2023-12-15 17:38:05.170 |      <NULL> | <NULL> | <NULL> | <NULL> | <NULL> |
> >> | +I | 30.30.30.30 | 2023-12-15 17:38:05.172 |      <NULL> | <NULL> | <NULL> | <NULL> | <NULL> |
> >> +----+-------------+-------------------------+-------------+--------+--------+--------+--------+
> >>
> >> Unless otherwise stated above:
> >>
> >> IBM United Kingdom Limited
> >> Registered in England and Wales with number 741598
> >> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
> >>
> >



-- 

Best,
Benchao Li


Re: [DISCUSS] Release Flink 1.18.1

2023-12-18 Thread Jing Ge
Hi folks,

Thanks Martijn for driving the
https://issues.apache.org/jira/browse/FLINK-33704, which will solve the
issue of https://issues.apache.org/jira/browse/FLINK-33793. I will move
forward with the 1.18.1 release today.

Best regards,
Jing

On Thu, Dec 14, 2023 at 11:17 AM Jing Ge  wrote:

> Hi folks,
>
> What Martijn said makes sense. We should pay more attention to
> https://issues.apache.org/jira/browse/FLINK-33793. I have also tried to
> contribute and share my thoughts in the PR
> https://github.com/apache/flink/pull/23489.
>
> For 1.18.1-rc1, I will wait until tomorrow. If there is no progress with
> the PR, I will move forward with the release, unless we upgrade the ticket
> to be a Blocker. Look forward to your feedback.
>
> Best regards,
> Jing
>
> On Wed, Dec 13, 2023 at 9:57 AM Jing Ge  wrote:
>
>> Hi Martijn,
>>
>> Thanks for the heads-up! I already started 1.18.1-rc1[1]. But please feel
>> free to ping me. I will either redo rc1 or start rc2. Thanks!
>>
>> Best regards,
>> Jing
>>
>> [1] https://dist.apache.org/repos/dist/dev/flink/flink-1.18.1-rc1/
>>
>> On Tue, Dec 12, 2023 at 9:49 PM Martijn Visser 
>> wrote:
>>
>>> Hi Jing,
>>>
>>> The only thing that was brought to my attention today was
>>> https://issues.apache.org/jira/browse/FLINK-33793. I've marked it as a
>>> Critical (I think we should have working checkpointing with GCS), but
>>> could also be considered a Blocker. There is a PR open for it, but I
>>> don't think it's the right fix so I'm testing
>>>
>>> https://github.com/apache/flink/commit/2ebe7f6e090690486c1954099a2b57283c578192
>>> at this moment. If you haven't started the 1.18.1 release yet, perhaps
>>> we could include it (hopefully we can merge it tomorrow), else it
>>> would have to wait for the next release.
>>>
>>> Thanks,
>>>
>>> Martijn
>>>
>>> On Tue, Dec 12, 2023 at 9:30 AM Jing Ge 
>>> wrote:
>>> >
>>> > Hi All,
>>> >
>>> > Thank you for your feedback!
>>> >
>>> > Since FLINK-33523[1] is done (thanks Martijn for driving it) and there
>>> are
>>> > no other concerns or objections. Currently I am not aware of any
>>> unresolved
>>> > blockers. There is one critical task[2] whose PR still has some
>>> checkstyle
>>> > issues. I will try to include it into 1.18.1 with best effort, since it
>>> > depends on how quickly the author can finish it and cp to 1.18. If
>>> anyone
>>> > considers FLINK-33588 as a must-have fix in 1.18.1, please let me know,
>>> > thanks!
>>> >
>>> > I will start working on the RC1 release today.
>>> >
>>> > @benchao: gotcha wrt FLINK-33313
>>> >
>>> > Best regards,
>>> > Jing
>>> >
>>> > [1] https://issues.apache.org/jira/browse/FLINK-33523
>>> > [2] https://issues.apache.org/jira/browse/FLINK-33588
>>> >
>>> > On Tue, Dec 12, 2023 at 7:03 AM weijie guo 
>>> > wrote:
>>> >
>>> > > Thanks Jing for driving this bug-fix release.
>>> > >
>>> > > +1 from my side.
>>> > >
>>> > > Best regards,
>>> > >
>>> > > Weijie
>>> > >
>>> > >
>>> > > Jark Wu wrote on Tue, Dec 12, 2023 at 12:17:
>>> > >
>>> > > > Thanks Jing for driving 1.18.1.
>>> > > > +1 for this.
>>> > > >
>>> > > > Best,
>>> > > > Jark
>>> > > >
>>> > > > On Mon, 11 Dec 2023 at 16:59, Hong Liang 
>>> wrote:
>>> > > >
>>> > > > > +1. Thanks Jing for driving this.
>>> > > > >
>>> > > > > Hong
>>> > > > >
>>> > > > > On Mon, Dec 11, 2023 at 2:27 AM Yun Tang 
>>> wrote:
>>> > > > >
>>> > > > > > Thanks Jing for driving 1.18.1 release, +1 for this.
>>> > > > > >
>>> > > > > >
>>> > > > > > Best
>>> > > > > > Yun Tang
>>> > > > > > 
>>> > > > > > From: Rui Fan <1996fan...@gmail.com>
>>> > > > > > Sent: Saturday, December 9, 2023 21:46
>>> > > > > > To: dev@flink.apache.org 
>>> > > > > > Subject: Re: [DISCUSS] Release Flink 1.18.1
>>> > > > > >
>>> > > > > > Thanks Jing for driving this release, +1
>>> > > > > >
>>> > > > > > Best,
>>> > > > > > Rui
>>> > > > > >
>>> > > > > > On Sat, Dec 9, 2023 at 7:33 AM Leonard Xu 
>>> wrote:
>>> > > > > >
>>> > > > > > > Thanks Jing for driving this release, +1
>>> > > > > > >
>>> > > > > > > Best,
>>> > > > > > > Leonard
>>> > > > > > >
>>> > > > > > > > On Dec 9, 2023 at 1:23 AM, Danny Cranmer wrote:
>>> > > > > > > >
>>> > > > > > > > +1
>>> > > > > > > >
>>> > > > > > > > Thanks for driving this
>>> > > > > > > >
>>> > > > > > > > On Fri, 8 Dec 2023, 12:05 Timo Walther, <
>>> twal...@apache.org>
>>> > > > wrote:
>>> > > > > > > >
>>> > > > > > > >> Thanks for taking care of this Jing.
>>> > > > > > > >>
>>> > > > > > > >> +1 to release 1.18.1 for this.
>>> > > > > > > >>
>>> > > > > > > >> Cheers,
>>> > > > > > > >> Timo
>>> > > > > > > >>
>>> > > > > > > >>
>>> > > > > > > >> On 08.12.23 10:00, Benchao Li wrote:
>>> > > > > > > >>> I've merged FLINK-33313 to release-1.18 branch.
>>> > > > > > > >>>
>>> > > > > > > >>> Péter Váry wrote on Fri, Dec 8, 2023 at 16:56:
>>> > > > > > > 
>>> > > > > > >  Hi Jing,
>>> > > > > > >  Thanks for taking care of this!
>>> > > > > > >  +1 (non-binding)
>>> > > > > > >  Peter
>>> > > 

Re: [DISCUSS] FLIP-400: AsyncScalarFunction for asynchronous scalar function support

2023-12-18 Thread Timo Walther

Hi Xuyang and Alan,

thanks for this productive discussion.

> Would it make a difference if it were exposed by the explain

@Alan: I think this is a great idea. +1 on exposing the sync/async
behavior through EXPLAIN.



> Is there an easy way to determine if the output of an async function
> would be problematic or not?

Clear "no" on this. Changelog semantics make the planner complex and we 
need to be careful. Therefore I would strongly suggest we introduce 
ORDERED and slowly enable UNORDERED whenever we see a good fit for it in 
plans with appropriate planner rules that guard it.


> If the input to the operator is append-only, it seems fine, because
> this implies that each row is effectively independent and ordering is
> unimportant.

As @Xuyang pointed out, it's not only the input that decides whether 
append-only is safe. It's also the subsequent operators in the pipeline. 
The example of Xuyang is a good one, when the sink operates in upsert 
mode. Append-only source, append-only operators, and append-only sink 
are safer.


However, even in this combination, a row is not fully "independent";
there are still watermarks flowing between rows:


R(5), W(4), R(3), R(4), R(2), R(1), W(0)

So unordering should be fine *within* watermarks. This is also what 
watermarks are good for, a trade-off between strict ordering and making 
progress. The async operator from DataStream API also supports this if I 
remember correctly. However, it assumes a timestamp is present in 
StreamRecord on which it can work. But this is not the case within the 
SQL engine.


TLDR: Let's focus on ORDERED first.

If we want to use UNORDERED, I would suggest checking the input operator
for exactly one time attribute column. If there is exactly one time
attribute column, we could insert it into the StreamRecord and allow 
UNORDERED mode. If this condition is not met, we go with ORDERED.
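
A sketch of that check against the input row type (names are illustrative;
counting only ROWTIME-kind timestamp columns is an assumption about what
qualifies as the time attribute here):

import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.flink.table.types.logical.TimestampKind;
import org.apache.flink.table.types.logical.TimestampType;

public final class TimeAttributeCheck {

    // True if exactly one column of the input is a rowtime attribute, i.e. a
    // timestamp the operator could stamp into the StreamRecord.
    public static boolean hasExactlyOneRowtimeColumn(RowType inputType) {
        return inputType.getChildren().stream()
                .filter(TimeAttributeCheck::isRowtime)
                .count() == 1;
    }

    private static boolean isRowtime(LogicalType t) {
        return t instanceof TimestampType
                && ((TimestampType) t).getKind() == TimestampKind.ROWTIME;
    }
}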


Regards,
Timo




On 18.12.23 07:05, Xuyang wrote:

Hi, Alan and Timo. Thanks for your reply.

Would it make a difference if it were exposed by the explain
method (the operator having "syncMode" vs not)?

@Alan: I think this is a good way to tell the user what mode these async udx 
are currently in.

A regular SQL user doesn't care whether the function is sync or async.

@Timo: I agree that the planner should throw as few exceptions as possible 
rather than confusing users. So I think
it is a good way to expose syncMode through explain syntax.

If the input to the operator is append-only, it seems fine,
because this implies that each row is effectively independent and ordering is 
unimportant.




For example, if the query is an append-only `SELECT FUNC(c) FROM t`,
I don't see a reason why the operator is not allowed to produce unordered
results.



@Timo, @Alan: IIUC, there seems to be something wrong here. Take Kafka as
the source and MySQL as the sink as an example.
Although Kafka is an append-only source, one of its fields is used as the pk
when writing to MySQL. If an async udx is executed
in unordered mode, there may be problems with the data in MySQL in the
end. In this case, we actually need to ensure
that ordering by the sink-based pk is preserved.



--

 Best!
 Xuyang





At 2023-12-16 03:33:55, "Alan Sheinberg"  
wrote:

Thanks for the replies everyone.  My responses are inline:

About the configs, what do you think about using hints as mentioned in [1]?

@Aitozi: I think hints could be a good way to do this, similar to lookup
joins or the proposal in FLIP-313.  One benefit of hints is that they allow
for the highest granularity of configuration because you can decide at
each and every call site just what parameters to use.  The downside of
hints is that there's more syntax to learn and more verbosity.  I'm
somewhat partial to a configuration like this with a class definition level
of granularity (similar to how metrics reporters are defined [1]):

table.exec.async-scalar.myfunc.class: org.apache.flink.MyAsyncScalarFunction
table.exec.async-scalar.myfunc.buffer-capacity: 10
...

As Timo mentioned, the downside to this is that there's not a nice static
way to do this at the moment, unless you extend ConfigOption.  It would be
good ultimately if Lookup joins, async scalar functions, and other future
configurable UDFs shared the same methodology, but maybe a unified approach
is a followup discussion.

I'm just curious why you don't use a conf (global) and query hint (individual
async udx) to mark the output
mode 'order' or 'unorder' like async lookup join [1] and async udtf [2], but
chose to introduce a new enum
in AsyncScalarFunction.



@Xuyang: I'm open to adding hints. I think the important part is that we
have some method for the user to have a class definition level way to
define whether ORDERED or ALLOW_UNORDERED is most appropriate.  I don't
have a strong sense yet for what would be most appropriately exposed as a
FunctionRequirement vs a simple configuration/hint.

What about throwing an exception to make it clear to users that using async


Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-12-18 Thread Jiabao Sun
Hi Becket,

The FLIP document[1] has been updated.
Could you help take a look again?

Thanks,
Jiabao

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768


> On Dec 18, 2023 at 16:53, Becket Qin wrote:
> 
> Yes, that sounds reasonable to me. We can start with ALWAYS and NEVER, and
> add more policies as needed.
> 
> Thanks,
> 
> Jiangjie (Becket) Qin
> 
> On Mon, Dec 18, 2023 at 4:48 PM Jiabao Sun 
> wrote:
> 
>> Thanks Becket,
>> 
>> The jdbc.filter.handling.policy is good to me as it provides sufficient
>> extensibility for future filter pushdown optimizations.
>> However, currently, we don't have an implementation for the AUTO mode, and
>> it seems that the AUTO mode can easily be confused with the ALWAYS mode
>> because users don't have the opportunity to MANUALLY decide which filters
>> to push down.
>> 
>> I suggest that we only introduce the ALWAYS and NEVER modes for now, and
>> we can consider introducing more flexible policies in the future,
>> such as INDEX_ONLY, NUMERIC_ONLY and so on.
>> 
>> WDYT?
>> 
>> Best,
>> Jiabao
>> 
>> 
>> 
>>> On Dec 18, 2023 at 16:27, Becket Qin wrote:
>>> 
>>> Hi Jiabao,
>>> 
>>> Please see the reply inline.
>>> 
>>> 
 The MySQL connector is currently in the flink-connector-jdbc repository
 and is not a standalone connector.
 Is it too unique to use "mysql" as the configuration option prefix?
>>> 
>>> If the intended behavior makes sense to all the supported JDBC drivers,
>> we
>>> can make this a JDBC connector configuration.
>>> 
>>> Also, I would like to ask about the difference in behavior between AUTO
>> and
 ALWAYS.
 It seems that we cannot guarantee the pushing down of all filters to the
 external system under the ALWAYS
 mode because not all filters in Flink SQL are supported by the external
 system.
 Should we throw an error when encountering a filter that cannot be
>> pushed
 down in the ALWAYS mode?
>>> 
>>> The idea of AUTO is to do efficiency-aware pushdowns. The source will
>> query
>>> the external system (MySQL, Oracle, SQL Server, etc) first to retrieve
>> the
>>> information of the table. With that information, the source will decide
>>> whether to further push a filter to the external system based on the
>>> efficiency. E.g. only push the indexed fields. In contrast, ALWAYS will
>>> just always push the supported filters to the external system, regardless
>>> of the efficiency. In case there are filters that are not supported,
>>> according to the current contract of SupportsFilterPushdown, these
>>> unsupported filters should just be returned by the
>>> *SupportsFilterPushdown.applyFilters()* method as remaining filters.
>>> Therefore, there is no need to throw exceptions here. This is likely the
>>> desired behavior for most users, IMO. If there are cases that users
>> really
>>> want to get alerted when a filter cannot be pushed to the external
>> system,
>>> we can add another value like "ENFORCED_ALWAYS", which behaves like
>> ALWAYS,
>>> but throws exceptions when a filter cannot be applied to the external
>>> system. But personally I don't see much value in doing this.
>>> 
>>> Thanks,
>>> 
>>> Jiangjie (Becket) Qin
>>> 
>>> 
>>> 
>>> On Mon, Dec 18, 2023 at 3:54 PM Jiabao Sun
>>> wrote:
>>> 
 Hi Becket,
 
 The MySQL connector is currently in the flink-connector-jdbc repository
 and is not a standalone connector.
 Is it too unique to use "mysql" as the configuration option prefix?
 
 Also, I would like to ask about the difference in behavior between AUTO
 and ALWAYS.
 It seems that we cannot guarantee the pushing down of all filters to the
 external system under the ALWAYS
 mode because not all filters in Flink SQL are supported by the external
 system.
 Should we throw an error when encountering a filter that cannot be
>> pushed
 down in the ALWAYS mode?
 
 Thanks,
 Jiabao
 
> On 2023-12-18 15:34, Becket Qin wrote:
> 
> Hi Jiabao,
> 
> Thanks for updating the FLIP. Maybe I did not explain it clearly
>> enough.
 My
> point is that given there are various good flavors of behaviors
>> handling
> filters pushed down, we should not have a common config of
> "ignore.filter.pushdown", because the behavior is not *common*.
> 
> It looks like the original motivation of this FLIP is just for MySql.
 Let's
> focus on what is the best solution for MySql connector here first.
>> After
> that, if people think the best behavior for MySql happens to be a
>> common
> one, we can then discuss whether that is worth being added to the base
> implementation of source. For MySQL, if we are going to introduce a
 config
> to MySql, why not have something like "mysql.filter.handling.policy"
>> with
> value of AUTO / NEVER / ALWAYS? Isn't that better than
> "ignore.filter.pushdown"?
> 
> Thanks,
> 
> Jiangjie (Becket) Qin
> 
> 
> 

Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-12-18 Thread Becket Qin
Yes, that sounds reasonable to me. We can start with ALWAYS and NEVER, and
add more policies as needed.

Thanks,

Jiangjie (Becket) Qin

On Mon, Dec 18, 2023 at 4:48 PM Jiabao Sun 
wrote:

> Thanks Becket,
>
> The jdbc.filter.handling.policy works for me, as it provides sufficient
> extensibility for future filter pushdown optimizations.
> However, we currently don't have an implementation for the AUTO mode, and
> AUTO can easily be confused with ALWAYS because users don't get the
> opportunity to MANUALLY decide which filters to push down.
>
> I suggest that we only introduce the ALWAYS and NEVER modes for now, and
> consider introducing more flexible policies in the future, such as
> INDEX_ONLY, NUMERIC_ONLY, and so on.
>
> WDYT?
>
> Best,
> Jiabao
>
>
>
> > On 2023-12-18 16:27, Becket Qin wrote:
> >
> > Hi Jiabao,
> >
> > Please see the reply inline.
> >
> >
> >> The MySQL connector is currently in the flink-connector-jdbc repository
> >> and is not a standalone connector.
> >> Is it too unique to use "mysql" as the configuration option prefix?
> >
> > If the intended behavior makes sense to all the supported JDBC drivers,
> we
> > can make this a JDBC connector configuration.
> >
> > Also, I would like to ask about the difference in behavior between AUTO
> and
> >> ALWAYS.
> >> It seems that we cannot guarantee the pushing down of all filters to the
> >> external system under the ALWAYS
> >> mode because not all filters in Flink SQL are supported by the external
> >> system.
> >> Should we throw an error when encountering a filter that cannot be
> pushed
> >> down in the ALWAYS mode?
> >
> > The idea of AUTO is to do efficiency-aware pushdowns. The source will
> query
> > the external system (MySQL, Oracle, SQL Server, etc) first to retrieve
> the
> > information of the table. With that information, the source will decide
> > whether to further push a filter to the external system based on the
> > efficiency. E.g. only push the indexed fields. In contrast, ALWAYS will
> > just always push the supported filters to the external system, regardless
> > of the efficiency. In case there are filters that are not supported,
> > according to the current contract of SupportsFilterPushdown, these
> > unsupported filters should just be returned by the
> > *SupportsFilterPushdown.applyFilters()* method as remaining filters.
> > Therefore, there is no need to throw exceptions here. This is likely the
> > desired behavior for most users, IMO. If there are cases that users
> really
> > want to get alerted when a filter cannot be pushed to the external
> system,
> > we can add another value like "ENFORCED_ALWAYS", which behaves like
> ALWAYS,
> > but throws exceptions when a filter cannot be applied to the external
> > system. But personally I don't see much value in doing this.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> >
> > On Mon, Dec 18, 2023 at 3:54 PM Jiabao Sun
> > wrote:
> >
> >> Hi Becket,
> >>
> >> The MySQL connector is currently in the flink-connector-jdbc repository
> >> and is not a standalone connector.
> >> Is it too unique to use "mysql" as the configuration option prefix?
> >>
> >> Also, I would like to ask about the difference in behavior between AUTO
> >> and ALWAYS.
> >> It seems that we cannot guarantee the pushing down of all filters to the
> >> external system under the ALWAYS
> >> mode because not all filters in Flink SQL are supported by the external
> >> system.
> >> Should we throw an error when encountering a filter that cannot be
> pushed
> >> down in the ALWAYS mode?
> >>
> >> Thanks,
> >> Jiabao
> >>
> >>> On 2023-12-18 15:34, Becket Qin wrote:
> >>>
> >>> Hi Jiabao,
> >>>
> >>> Thanks for updating the FLIP. Maybe I did not explain it clearly
> enough.
> >> My
> >>> point is that given there are various good flavors of behaviors
> handling
> >>> filters pushed down, we should not have a common config of
> >>> "ignore.filter.pushdown", because the behavior is not *common*.
> >>>
> >>> It looks like the original motivation of this FLIP is just for MySql.
> >> Let's
> >>> focus on what is the best solution for MySql connector here first.
> After
> >>> that, if people think the best behavior for MySql happens to be a
> common
> >>> one, we can then discuss whether that is worth being added to the base
> >>> implementation of source. For MySQL, if we are going to introduce a
> >> config
> >>> to MySql, why not have something like "mysql.filter.handling.policy"
> with
> >>> value of AUTO / NEVER / ALWAYS? Isn't that better than
> >>> "ignore.filter.pushdown"?
> >>>
> >>> Thanks,
> >>>
> >>> Jiangjie (Becket) Qin
> >>>
> >>>
> >>>
> >>> On Sun, Dec 17, 2023 at 11:30 PM Jiabao Sun
> >>> wrote:
> >>>
>  Hi Becket,
> 
>  The FLIP document has been updated as well.
>  Please take a look when you have time.
> 
>  Thanks,
>  Jiabao
> 
> 
> > On 2023-12-17 22:54, Jiabao Sun wrote:
> >
> 

Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-12-18 Thread Jiabao Sun
Thanks Becket,

The jdbc.filter.handling.policy works for me, as it provides sufficient
extensibility for future filter pushdown optimizations.
However, we currently don't have an implementation for the AUTO mode, and
AUTO can easily be confused with ALWAYS because users don't get the
opportunity to MANUALLY decide which filters to push down.

I suggest that we only introduce the ALWAYS and NEVER modes for now, and
consider introducing more flexible policies in the future, such as
INDEX_ONLY, NUMERIC_ONLY, and so on.
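
For illustration only (a sketch, not the final design; the key, default
value, and class name below are placeholders), the option could be declared
roughly like this:

```java
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

public final class JdbcFilterPushDownOptions {

    /** Values from this thread's proposal; AUTO is deliberately left out. */
    public enum FilterHandlingPolicy { ALWAYS, NEVER }

    // Hypothetical option key; this thread refers to it as
    // "jdbc.filter.handling.policy". The ALWAYS default is only
    // illustrative, matching today's push-down behavior.
    public static final ConfigOption<FilterHandlingPolicy> FILTER_HANDLING_POLICY =
            ConfigOptions.key("filter.handling.policy")
                    .enumType(FilterHandlingPolicy.class)
                    .defaultValue(FilterHandlingPolicy.ALWAYS)
                    .withDescription(
                            "ALWAYS: push every supported filter to the external"
                                    + " system; NEVER: keep all filtering in Flink.");

    private JdbcFilterPushDownOptions() {}
}
```

A user would then set, e.g., 'filter.handling.policy' = 'NEVER' in the
table's WITH clause.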

WDYT?

Best,
Jiabao



> On 2023-12-18 16:27, Becket Qin wrote:
> 
> Hi Jiabao,
> 
> Please see the reply inline.
> 
> 
>> The MySQL connector is currently in the flink-connector-jdbc repository
>> and is not a standalone connector.
>> Is it too unique to use "mysql" as the configuration option prefix?
> 
> If the intended behavior makes sense to all the supported JDBC drivers, we
> can make this a JDBC connector configuration.
> 
> Also, I would like to ask about the difference in behavior between AUTO and
>> ALWAYS.
>> It seems that we cannot guarantee the pushing down of all filters to the
>> external system under the ALWAYS
>> mode because not all filters in Flink SQL are supported by the external
>> system.
>> Should we throw an error when encountering a filter that cannot be pushed
>> down in the ALWAYS mode?
> 
> The idea of AUTO is to do efficiency-aware pushdowns. The source will query
> the external system (MySQL, Oracle, SQL Server, etc) first to retrieve the
> information of the table. With that information, the source will decide
> whether to further push a filter to the external system based on the
> efficiency. E.g. only push the indexed fields. In contrast, ALWAYS will
> just always push the supported filters to the external system, regardless
> of the efficiency. In case there are filters that are not supported,
> according to the current contract of SupportsFilterPushdown, these
> unsupported filters should just be returned by the
> *SupportsFilterPushdown.applyFilters()* method as remaining filters.
> Therefore, there is no need to throw exceptions here. This is likely the
> desired behavior for most users, IMO. If there are cases that users really
> want to get alerted when a filter cannot be pushed to the external system,
> we can add another value like "ENFORCED_ALWAYS", which behaves like ALWAYS,
> but throws exceptions when a filter cannot be applied to the external
> system. But personally I don't see much value in doing this.
> 
> Thanks,
> 
> Jiangjie (Becket) Qin
> 
> 
> 
> On Mon, Dec 18, 2023 at 3:54 PM Jiabao Sun 
> wrote:
> 
>> Hi Becket,
>> 
>> The MySQL connector is currently in the flink-connector-jdbc repository
>> and is not a standalone connector.
>> Is it too unique to use "mysql" as the configuration option prefix?
>> 
>> Also, I would like to ask about the difference in behavior between AUTO
>> and ALWAYS.
>> It seems that we cannot guarantee the pushing down of all filters to the
>> external system under the ALWAYS
>> mode because not all filters in Flink SQL are supported by the external
>> system.
>> Should we throw an error when encountering a filter that cannot be pushed
>> down in the ALWAYS mode?
>> 
>> Thanks,
>> Jiabao
>> 
>>> On 2023-12-18 15:34, Becket Qin wrote:
>>> 
>>> Hi Jiabao,
>>> 
>>> Thanks for updating the FLIP. Maybe I did not explain it clearly enough.
>> My
>>> point is that given there are various good flavors of behaviors handling
>>> filters pushed down, we should not have a common config of
>>> "ignore.filter.pushdown", because the behavior is not *common*.
>>> 
>>> It looks like the original motivation of this FLIP is just for MySql.
>> Let's
>>> focus on what is the best solution for MySql connector here first. After
>>> that, if people think the best behavior for MySql happens to be a common
>>> one, we can then discuss whether that is worth being added to the base
>>> implementation of source. For MySQL, if we are going to introduce a
>> config
>>> to MySql, why not have something like "mysql.filter.handling.policy" with
>>> value of AUTO / NEVER / ALWAYS? Isn't that better than
>>> "ignore.filter.pushdown"?
>>> 
>>> Thanks,
>>> 
>>> Jiangjie (Becket) Qin
>>> 
>>> 
>>> 
>>> On Sun, Dec 17, 2023 at 11:30 PM Jiabao Sun
>>> wrote:
>>> 
 Hi Becket,
 
 The FLIP document has been updated as well.
 Please take a look when you have time.
 
 Thanks,
 Jiabao
 
 
> On 2023-12-17 22:54, Jiabao Sun wrote:
> 
> Thanks Becket,
> 
> I apologize for not being able to continue with this proposal due to
 being too busy during this period.
> 
> The viewpoints you shared about the design of Flink Source make sense
>> to
 me
> The native configuration ‘ignore.filter.pushdown’ is good to me.
> Having a unified name or naming style can indeed prevent confusion for
 users regarding
> the inconsistent naming of this 

[VOTE] FLIP-372: Allow TwoPhaseCommittingSink WithPreCommitTopology to alter the type of the Committable

2023-12-18 Thread Péter Váry
Hi everyone,

Since there were no further comments on the discussion thread [1], I would
like to start the vote for FLIP-372 [2].

The FLIP started as a small new feature, but in the discussion thread and
in a similar parallel thread [3] we opted for a somewhat bigger change in
the Sink V2 API.

Please read the FLIP and cast your vote.

The vote will remain open for at least 72 hours and will only be concluded
if there are no objections and enough (i.e., at least 3) binding votes.

Thanks,
Peter

[1] - https://lists.apache.org/thread/344pzbrqbbb4w0sfj67km25msp7hxlyd
[2] -
https://cwiki.apache.org/confluence/display/FLINK/FLIP-372%3A+Allow+TwoPhaseCommittingSink+WithPreCommitTopology+to+alter+the+type+of+the+Committable
[3] - https://lists.apache.org/thread/h6nkgth838dlh5s90sd95zd6hlsxwx57


[jira] [Created] (FLINK-33871) Reduce getTable call for hive client and optimize graph generation time

2023-12-18 Thread hehuiyuan (Jira)
hehuiyuan created FLINK-33871:
-

 Summary: Reduce getTable calls to the Hive client and optimize graph generation time
 Key: FLINK-33871
 URL: https://issues.apache.org/jira/browse/FLINK-33871
 Project: Flink
  Issue Type: Improvement
Reporter: hehuiyuan


The HiveCatalog.getHiveTable method wastes a lot of time during graph
generation because the number of calls is relatively high.


I have a SQL task with over 2000 rows; the HiveCatalog.getHiveTable method
is called 4879 times, but only six Hive tables are used.

(screenshot: https://github.com/apache/flink/assets/18002496/d5f0daf3-f80a-4790-ae21-4e75dff9cfd7)

The client.getTable method costs a lot of time.  

(screenshot: https://github.com/apache/flink/assets/18002496/be0d176f-3915-4b92-a177-f1cfaf6d2927)
The screenshot above shows statistics of how often the JobManager interacts with Hive while generating the graph.

If one call takes approximately 50 milliseconds, the total time spent is
4879 * 50 = 243950 ms = 243.95 s, i.e. roughly 4 minutes.

If we cache the results, the client.getTable method is only called six times.
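
A minimal sketch of the caching idea (generic on purpose; the wiring into
HiveCatalog and the loader lambda are illustrative, not the actual Flink
code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public final class CachingTableLookup<T> {

    private final Map<String, T> cache = new ConcurrentHashMap<>();
    private final Function<String, T> loader;

    public CachingTableLookup(Function<String, T> loader) {
        // e.g. tableName -> client.getTable(database, tableName)
        this.loader = loader;
    }

    public T get(String qualifiedName) {
        // computeIfAbsent collapses the 4879 lookups in the reported job into
        // one metastore round trip per distinct table (six in total).
        return cache.computeIfAbsent(qualifiedName, loader);
    }
}
```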





Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-12-18 Thread Becket Qin
Hi Jiabao,

Please see the reply inline.


> The MySQL connector is currently in the flink-connector-jdbc repository
> and is not a standalone connector.
> Is it too unique to use "mysql" as the configuration option prefix?

If the intended behavior makes sense to all the supported JDBC drivers, we
can make this a JDBC connector configuration.

Also, I would like to ask about the difference in behavior between AUTO and
> ALWAYS.
> It seems that we cannot guarantee the pushing down of all filters to the
> external system under the ALWAYS
> mode because not all filters in Flink SQL are supported by the external
> system.
> Should we throw an error when encountering a filter that cannot be pushed
> down in the ALWAYS mode?

The idea of AUTO is to do efficiency-aware pushdowns. The source will query
the external system (MySQL, Oracle, SQL Server, etc) first to retrieve the
information of the table. With that information, the source will decide
whether to further push a filter to the external system based on the
efficiency. E.g. only push the indexed fields. In contrast, ALWAYS will
just always push the supported filters to the external system, regardless
of the efficiency. In case there are filters that are not supported,
according to the current contract of SupportsFilterPushdown, these
unsupported filters should just be returned by the
*SupportsFilterPushdown.applyFilters()* method as remaining filters.
Therefore, there is no need to throw exceptions here. This is likely the
desired behavior for most users, IMO. If there are cases that users really
want to get alerted when a filter cannot be pushed to the external system,
we can add another value like "ENFORCED_ALWAYS", which behaves like ALWAYS,
but throws exceptions when a filter cannot be applied to the external
system. But personally I don't see much value in doing this.
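
To illustrate that contract, here is a hedged sketch. The
FilterHandlingPolicy enum, the class, and the dialect check are hypothetical
placeholders; SupportsFilterPushDown and Result.of(...) are the existing
Flink interfaces:

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.flink.table.connector.source.abilities.SupportsFilterPushDown;
import org.apache.flink.table.expressions.ResolvedExpression;

public abstract class PolicyAwareJdbcSource implements SupportsFilterPushDown {

    /** Hypothetical policy values from this discussion. */
    public enum FilterHandlingPolicy { ALWAYS, NEVER }

    private final FilterHandlingPolicy policy;

    protected PolicyAwareJdbcSource(FilterHandlingPolicy policy) {
        this.policy = policy;
    }

    /** Dialect-specific check: can this filter be rendered as a SQL predicate? */
    protected abstract boolean isSupportedByDialect(ResolvedExpression filter);

    @Override
    public Result applyFilters(List<ResolvedExpression> filters) {
        if (policy == FilterHandlingPolicy.NEVER) {
            // Accept nothing: Flink evaluates every filter after the scan.
            return Result.of(new ArrayList<>(), new ArrayList<>(filters));
        }
        // ALWAYS: push what the dialect supports; by the existing contract,
        // unsupported filters are returned as "remaining" and Flink applies
        // them itself, so no exception is needed.
        List<ResolvedExpression> accepted = new ArrayList<>();
        List<ResolvedExpression> remaining = new ArrayList<>();
        for (ResolvedExpression filter : filters) {
            if (isSupportedByDialect(filter)) {
                accepted.add(filter);
            } else {
                remaining.add(filter);
            }
        }
        return Result.of(accepted, remaining);
    }
}
```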

Thanks,

Jiangjie (Becket) Qin



On Mon, Dec 18, 2023 at 3:54 PM Jiabao Sun 
wrote:

> Hi Becket,
>
> The MySQL connector is currently in the flink-connector-jdbc repository
> and is not a standalone connector.
> Is it too unique to use "mysql" as the configuration option prefix?
>
> Also, I would like to ask about the difference in behavior between AUTO
> and ALWAYS.
> It seems that we cannot guarantee the pushing down of all filters to the
> external system under the ALWAYS
> mode because not all filters in Flink SQL are supported by the external
> system.
> Should we throw an error when encountering a filter that cannot be pushed
> down in the ALWAYS mode?
>
> Thanks,
> Jiabao
>
> > On 2023-12-18 15:34, Becket Qin wrote:
> >
> > Hi Jiabao,
> >
> > Thanks for updating the FLIP. Maybe I did not explain it clearly enough.
> My
> > point is that given there are various good flavors of behaviors handling
> > filters pushed down, we should not have a common config of
> > "ignore.filter.pushdown", because the behavior is not *common*.
> >
> > It looks like the original motivation of this FLIP is just for MySql.
> Let's
> > focus on what is the best solution for MySql connector here first. After
> > that, if people think the best behavior for MySql happens to be a common
> > one, we can then discuss whether that is worth being added to the base
> > implementation of source. For MySQL, if we are going to introduce a
> config
> > to MySql, why not have something like "mysql.filter.handling.policy" with
> > value of AUTO / NEVER / ALWAYS? Isn't that better than
> > "ignore.filter.pushdown"?
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> >
> > On Sun, Dec 17, 2023 at 11:30 PM Jiabao Sun
> > wrote:
> >
> >> Hi Becket,
> >>
> >> The FLIP document has been updated as well.
> >> Please take a look when you have time.
> >>
> >> Thanks,
> >> Jiabao
> >>
> >>
> >>> On 2023-12-17 22:54, Jiabao Sun wrote:
> >>>
> >>> Thanks Becket,
> >>>
> >>> I apologize for not being able to continue with this proposal due to
> >> being too busy during this period.
> >>>
> >>> The viewpoints you shared about the design of Flink Source make sense
> to
> >> me
> >>> The native configuration ‘ignore.filter.pushdown’ is good to me.
> >>> Having a unified name or naming style can indeed prevent confusion for
> >> users regarding
> >>> the inconsistent naming of this configuration across different
> >> connectors.
> >>>
> >>> Currently, there are not many external connectors that support filter
> >> pushdown.
> >>> I propose that we first introduce it in flink-connector-jdbc and
> >> flink-connector-mongodb.
> >>> Do you think this is feasible?
> >>>
> >>> Best,
> >>> Jiabao
> >>>
> >>>
>  On 2023-11-16 17:45, Becket Qin wrote:
> 
>  Hi Jiabao,
> 
>  Arguments like "because Spark has it so Flink should also have it" do
>  not make sense. Different projects have different API flavors and styles.
>  What is really important is the rationale and the design principle behind
>  the API. They should conform to the convention of the project.
> 
>  First of all, Spark