Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-30 Thread Becket Qin
Hi Jiabao,

Please see the replies inline.

Introducing common configurations does not mean that all sources must
> accept these configuration options.
> The configuration options supported by a source are determined by the
> requiredOptions and optionalOptions in the Factory interface.

This is not true. Both required and optional options are SUPPORTED. That
means they are implemented, and if one specifies an optional config it will
still take effect. An optional config is "optional" because the
configuration has a default value. Hence, it is OK that users do not
specify their own value. In other words, it is "optional" for the end
users to set the config, but the implementation and support for that config
are NOT optional. In case a source does not support a common config, an
exception must be thrown when the config is provided by the end users.
However, the config we are talking about in this FLIP is a common config
that is optional to implement, meaning that sometimes the claimed behavior
won't be there even if users specify that config.
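
To make this concrete, here is a minimal sketch with hypothetical option names (none of this is from the FLIP): "optional" only means the option has a default the user may rely on; the connector still has to implement the behavior behind it.

```
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

/** Hypothetical connector options illustrating required vs. optional semantics. */
public class ExampleConnectorOptions {

    // Required: no default value, so users must set it; the factory lists it in requiredOptions().
    public static final ConfigOption<String> HOSTNAME =
            ConfigOptions.key("hostname").stringType().noDefaultValue();

    // Optional: has a default, so users may omit it; the factory lists it in optionalOptions().
    // If a user does set it, the connector must still honor it.
    public static final ConfigOption<Integer> FETCH_SIZE =
            ConfigOptions.key("fetch-size").intType().defaultValue(1024);
}
```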

Similar to sources that do not implement the LookupTableSource interface,
> sources that do not implement the SupportsFilterPushDown interface also do
> not need to accept newly introduced options.

First of all, filter pushdown is a behavior of the query optimizer, not the
behavior of Sources. The Source tells the optimizer that it has the
ability to accept pushed-down filters by implementing the
SupportsFilterPushDown interface. And this is the only contract between the
Source and the optimizer regarding whether filters should be pushed down. As
long as a specific source implements this decorative interface, filter
pushdown should always take place, i.e.
*SupportsFilterPushDown.applyFilters()* will be called. There should be no
other config to disable that call. However, Sources can decide how to
behave based on their own configurations after *applyFilters()* is called.
And these configs are specific to those sources, instead of common configs.
Please see the examples I mentioned in the previous email.
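
As an illustration of that contract, here is a minimal sketch assuming a hypothetical connector (not code from the FLIP): once a source implements SupportsFilterPushDown, the planner calls applyFilters(), and the source only decides, per filter, whether to accept it or hand it back as remaining.

```
import java.util.ArrayList;
import java.util.List;
import org.apache.flink.table.connector.ChangelogMode;
import org.apache.flink.table.connector.source.DynamicTableSource;
import org.apache.flink.table.connector.source.ScanTableSource;
import org.apache.flink.table.connector.source.abilities.SupportsFilterPushDown;
import org.apache.flink.table.expressions.ResolvedExpression;

public class ExampleFilterableSource implements ScanTableSource, SupportsFilterPushDown {

    private List<ResolvedExpression> pushedFilters = new ArrayList<>();

    @Override
    public Result applyFilters(List<ResolvedExpression> filters) {
        List<ResolvedExpression> accepted = new ArrayList<>();
        List<ResolvedExpression> remaining = new ArrayList<>();
        for (ResolvedExpression filter : filters) {
            if (canEvaluateInExternalSystem(filter)) { // connector-specific decision
                accepted.add(filter);
            } else {
                remaining.add(filter);                 // the planner keeps applying these
            }
        }
        this.pushedFilters = accepted;
        return Result.of(accepted, remaining);
    }

    // Placeholder for whatever connector-specific logic (or config) drives the decision.
    private boolean canEvaluateInExternalSystem(ResolvedExpression filter) {
        return false;
    }

    @Override
    public ChangelogMode getChangelogMode() {
        return ChangelogMode.insertOnly();
    }

    @Override
    public ScanRuntimeProvider getScanRuntimeProvider(ScanContext runtimeProviderContext) {
        throw new UnsupportedOperationException("Scan runtime omitted in this sketch");
    }

    @Override
    public DynamicTableSource copy() {
        ExampleFilterableSource copy = new ExampleFilterableSource();
        copy.pushedFilters = new ArrayList<>(this.pushedFilters);
        return copy;
    }

    @Override
    public String asSummaryString() {
        return "ExampleFilterableSource";
    }
}
```

The planner removes only the accepted filters from the plan; everything returned as remaining is still applied by Flink afterwards, which is why this per-filter decision can safely depend on connector-specific configuration.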

Thanks,

Jiangjie (Becket) Qin

On Tue, Oct 31, 2023 at 10:27 AM Jiabao Sun 
wrote:

> Hi Becket,
>
> Sorry, there was a typo in the second point. Let me correct it:
>
> Introducing common configurations does not mean that all sources must
> accept these configuration options.
> The configuration options supported by a source are determined by the
> requiredOptions and optionalOptions in the Factory interface.
>
> Similar to sources that do not implement the LookupTableSource interface,
> sources that do not implement the SupportsFilterPushDown interface also do
> not need to accept newly introduced options.
>
> Best,
> Jiabao
>
>
> > On Oct 31, 2023, at 10:13, Jiabao Sun wrote:
> >
> > Thanks Becket for the feedback.
> >
> > 1. Currently, the SupportsFilterPushDown#applyFilters method returns a
> result that includes acceptedFilters and remainingFilters. The source can
> decide to push down some filters or not accept any of them.
> > 2. Introducing common configuration options does not mean that a source
> that supports the SupportsFilterPushDown capability must accept this
> configuration. Similar to LookupOptions, only sources that implement the
> LookupTableSource interface need to accept these configuration
> options.
> >
> > Best,
> > Jiabao
> >
> >
> >> On Oct 31, 2023, at 07:49, Becket Qin wrote:
> >>
> >> Hi Jiabao and Ruanhang,
> >>
> >> Adding a configuration of source.filter-push-down.enabled as a common
> >> source configuration seems problematic.
> >> 1. The config name is misleading. filter pushdown should only be
> determined
> >> by whether the SupportsFilterPushdown interface is implemented or not.
> >> 2. The behavior of this configuration is only applicable to some source
> >> implementations. Why is it a common configuration?
> >>
> >> Here's my suggestion for design principles:
> >> 1. Only add source impl specific configuration to corresponding sources.
> >> 2. The configuration name should not overrule existing common contracts.
> >>
> >> For example, in the case of MySql source. There are several options:
> >> 1. Have a configuration of `*mysql.avoid.remote.full.table.scan`*. If
> this
> >> configuration is set, and a filter pushdown does not hit an index, the
> >> MySql source impl would not further pushdown the filter to MySql
> servers.
> >> Note that this assumes the MySql source can retrieve the index
> information
> >> from the MySql servers.
> >> 2. If the MySql index information is not available to the MySql source,
> the
> >> configuration could be something like
> *`mysql.pushback.pushed.down.filters`*.
> >> Once set to true, MySql source would just add all the filters to the
> >> RemainingFilters in the Result returned by
> >> *SupportsFilterPushdown.applyFilters().*
> >> 3. An alternative to option 2 is to have a `
> >> *mysql.apply.predicates.after.scan*`. When it is set to true, MySql
> source
> >> will not push the filter down to the MySql servers, 

Re: [DISCUSS] AWS Connectors v4.2.0 release + 1.18 support

2023-10-30 Thread Leonard Xu
+1, thanks Dany for driving this.

One related question: do we have a plan to find some volunteers to release the
rest of the external connectors for 1.18 support?

Best,
Leonard


> On Oct 31, 2023, at 12:17 AM, Tzu-Li (Gordon) Tai wrote:
> 
> +1
> 
> On Mon, Oct 30, 2023 at 9:00 AM Danny Cranmer 
> wrote:
> 
>> Hey,
>> 
>>> Did you mean skip 4.1.1, since 4.1.0 has already been released?
>> 
>> I meant skip "4.1.0-1.18" since we could release this with the existing
>> source. We will additionally skip 4.1.1 and jump to 4.2.0; since this
>> version has features, it should be a minor version rather than a patch [1].
>> 
>>> Does this imply that the 4.1.x series will be reserved for Flink 1.17,
>> and the 4.2.x series will correspond to Flink 1.18?
>> 
>> 4.1.x will receive bug fixes for Flink 1.17.
>> 4.2.x will receive bug fixes and features for Flink 1.17 and 1.18.
>> 
>> Thanks,
>> Danny
>> 
>> [1] https://semver.org/
>> 
>> 
>> On Mon, Oct 30, 2023 at 3:47 PM Samrat Deb  wrote:
>> 
>>> Hi Danny ,
>>> 
>>> Thank you for driving it.
>>> 
>>> +1 (non binding )
>>> 
>>> 
 I am proposing we skip 4.1.0 for Flink 1.18 and go
>>> straight to 4.2.0.
>>> 
>>> Does this imply that the 4.1.x series will be reserved for Flink 1.17,
>> and
>>> the 4.2.x series will correspond to Flink 1.18?
>>> 
>>> Bests,
>>> Samrat
>>> 
>>> 
>>> On Mon, Oct 30, 2023 at 7:32 PM Jing Ge 
>>> wrote:
>>> 
 Hi Danny,
 
 +1 Thanks for driving it. Did you mean skip 4.1.1, since 4.1.0 has
>>> already
 been released?
 
 Best regards,
 Jing
 
 On Mon, Oct 30, 2023 at 11:49 AM Danny Cranmer <
>> dannycran...@apache.org>
 wrote:
 
> Hello all,
> 
> I would like to start the discussion to release Apache Flink AWS
 connectors
> v4.2.0. We released v4.1.0 over six months ago on 2023-04-03. Since
>>> then
 we
> have resolved 23 issues [1]. Additionally now Flink 1.18 is live we
>>> need
 to
> add support for this. I am proposing we skip 4.1.0 for Flink 1.18 and
>>> go
> straight to 4.2.0. The CI is stable [2].
> 
> I volunteer myself as the release manager.
> 
> Thanks,
> Danny
> 
> [1]
> 
> 
 
>>> 
>> https://issues.apache.org/jira/browse/FLINK-33021?jql=statusCategory%20%3D%20done%20AND%20project%20%3D%2012315522%20AND%20fixVersion%20%3D%2012353011%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC
> [2] https://github.com/apache/flink-connector-aws/actions
> 
 
>>> 
>> 



Re: [DISCUSS] FLIP-378: Support Avro timestamp with local timezone

2023-10-30 Thread Leonard Xu
Thanks @Peter for driving this FLIP

+1 from my side, the timestamp semantics mapping looks good to me.

>  In the end, the legacy behavior will be dropped in
> Flink 2.0
> I don't think we can introduce this option in 1.19 and drop it in
> 2.0; API removal requires at least two minor versions.


Best,
Leonard

> On Oct 31, 2023, at 11:18 AM, Peter Huang wrote:
> 
> Hi Devs,
> 
> Currently, the Flink Avro format doesn't support the Avro timestamp
> (millis/micros) with local timezone type.
> Although the Avro timestamp (millis/micros) type is supported and is mapped
> to the Flink timestamp without timezone, this is not compliant with the
> semantics defined in "Consistent timestamp types in Hadoop SQL engines".
> 
> I propose to support Avro timestamps in compliance with the mapping
> semantics [1] by using a configuration flag.
> To keep backward compatibility, the current mapping is kept as the default
> behavior. Users can explicitly turn on the new mapping by setting the flag
> to false. In the end, the legacy behavior will be dropped in Flink 2.0.
> 
> Looking forward to your feedback.
> 
> 
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-378%3A+Support+Avro+timestamp+with+local+timezone
> 
> 
> Best Regards
> 
> Peter Huang



[DISCUSS] FLIP-378: Support Avro timestamp with local timezone

2023-10-30 Thread Peter Huang
Hi Devs,

Currently, the Flink Avro format doesn't support the Avro timestamp
(millis/micros) with local timezone type.
Although the Avro timestamp (millis/micros) type is supported and is mapped
to the Flink timestamp without timezone, this is not compliant with the
semantics defined in "Consistent timestamp types in Hadoop SQL engines".

I propose to support Avro timestamps in compliance with the mapping
semantics [1] by using a configuration flag.
To keep backward compatibility, the current mapping is kept as the default
behavior. Users can explicitly turn on the new mapping by setting the flag
to false. In the end, the legacy behavior will be dropped in Flink 2.0.
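
As a hedged sketch of my reading of the proposed mapping (the method and the boolean flag below are assumptions for illustration; the actual option name and default are defined in the FLIP [1]), using existing Avro logical-type names and Flink DataTypes calls:

```
import org.apache.avro.LogicalType;
import org.apache.avro.Schema;
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.types.DataType;

public class AvroTimestampMappingSketch {

    /** Maps an Avro timestamp logical type to a Flink type under a hypothetical legacy flag. */
    public static DataType mapTimestamp(Schema avroSchema, boolean legacyMapping) {
        LogicalType logicalType = avroSchema.getLogicalType();
        String name = logicalType == null ? "" : logicalType.getName();
        int precision = name.endsWith("micros") ? 6 : 3;

        switch (name) {
            case "timestamp-millis":
            case "timestamp-micros":
                // Legacy behavior keeps the current TIMESTAMP mapping; the compliant
                // behavior maps instant-semantics timestamps to TIMESTAMP_LTZ.
                return legacyMapping
                        ? DataTypes.TIMESTAMP(precision)
                        : DataTypes.TIMESTAMP_LTZ(precision);
            case "local-timestamp-millis":
            case "local-timestamp-micros":
                // Local timestamps carry no time zone and map to plain TIMESTAMP.
                return DataTypes.TIMESTAMP(precision);
            default:
                throw new IllegalArgumentException("Not a timestamp logical type: " + name);
        }
    }
}
```

Whether the flag defaults to the legacy or the new mapping, and its exact name, remain as described in the FLIP document.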

Looking forward to your feedback.


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-378%3A+Support+Avro+timestamp+with+local+timezone


Best Regards

Peter Huang


[jira] [Created] (FLINK-33405) ProcessJoinFunction not found in Pyflink

2023-10-30 Thread Jaehyeon Kim (Jira)
Jaehyeon Kim created FLINK-33405:


 Summary: ProcessJoinFunction not found in Pyflink
 Key: FLINK-33405
 URL: https://issues.apache.org/jira/browse/FLINK-33405
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Jaehyeon Kim


ProcessJoinFunction doesn't exist in Pyflink. Is there a plan to add it?





[jira] [Created] (FLINK-33404) on_timer method is missing in ProcessFunction and CoProcessFunction of Pyflink

2023-10-30 Thread Jaehyeon Kim (Jira)
Jaehyeon Kim created FLINK-33404:


 Summary: on_timer method is missing in ProcessFunction and 
CoProcessFunction of Pyflink
 Key: FLINK-33404
 URL: https://issues.apache.org/jira/browse/FLINK-33404
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Reporter: Jaehyeon Kim


Hello,

I find that the `on_timer` method is not defined in ProcessFunction and
CoProcessFunction of PyFlink, and it causes an error when I register a timer, e.g.:

```
  ...
  File "/home/jaehyeon/personal/flink-demos/venv/lib/python3.8/site-packages/pyflink/fn_execution/datastream/process/input_handler.py", line 101, in process_timer
    yield from _emit_results(
  File "/home/jaehyeon/personal/flink-demos/venv/lib/python3.8/site-packages/pyflink/fn_execution/datastream/process/input_handler.py", line 131, in _emit_results
    for result in results:
  File "/home/jaehyeon/personal/flink-demos/venv/lib/python3.8/site-packages/pyflink/fn_execution/datastream/process/input_handler.py", line 114, in _on_processing_time
    yield from self._on_processing_time_func(timestamp, key, namespace)
  File "/home/jaehyeon/personal/flink-demos/venv/lib/python3.8/site-packages/pyflink/fn_execution/datastream/process/operations.py", line 308, in on_processing_time
    return _on_timer(TimeDomain.PROCESSING_TIME, timestamp, key)
  File "/home/jaehyeon/personal/flink-demos/venv/lib/python3.8/site-packages/pyflink/fn_execution/datastream/process/operations.py", line 317, in _on_timer
    return process_function.on_timer(timestamp, on_timer_ctx)
AttributeError: 'ReadingFilter' object has no attribute 'on_timer'

        at org.apache.beam.runners.fnexecution.control.FnApiControlClient$ResponseStreamObserver.onNext(FnApiControlClient.java:180)
        at org.apache.beam.runners.fnexecution.control.FnApiControlClient$ResponseStreamObserver.onNext(FnApiControlClient.java:160)
        at org.apache.beam.vendor.grpc.v1p48p1.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:262)
        at org.apache.beam.vendor.grpc.v1p48p1.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
        at org.apache.beam.vendor.grpc.v1p48p1.io.grpc.Contexts$ContextualizedServerCallListener.onMessage(Contexts.java:76)
        at org.apache.beam.vendor.grpc.v1p48p1.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailableInternal(ServerCallImpl.java:332)
        at org.apache.beam.vendor.grpc.v1p48p1.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:315)
        at org.apache.beam.vendor.grpc.v1p48p1.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:834)
        at org.apache.beam.vendor.grpc.v1p48p1.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
        at org.apache.beam.vendor.grpc.v1p48p1.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
        ... 3 more
```

I'm working on PyFlink 1.17.1, but it would be applicable to other versions.

Can the method be added to the functions?

Cheers,
Jaehyeon





Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-30 Thread Jiabao Sun
Hi Becket,

Sorry, there was a typo in the second point. Let me correct it: 

Introducing common configurations does not mean that all sources must accept 
these configuration options. 
The configuration options supported by a source are determined by the 
requiredOptions and optionalOptions in the Factory interface. 

Similar to sources that do not implement the LookupTableSource interface, 
sources that do not implement the SupportsFilterPushDown interface also do not 
need to accept newly introduced options.

Best,
Jiabao


> On Oct 31, 2023, at 10:13, Jiabao Sun wrote:
> 
> Thanks Becket for the feedback.
> 
> 1. Currently, the SupportsFilterPushDown#applyFilters method returns a result 
> that includes acceptedFilters and remainingFilters. The source can decide to 
> push down some filters or not accept any of them.
> 2. Introducing common configuration options does not mean that a source that 
> supports the SupportsFilterPushDown capability must accept this 
> configuration. Similar to LookupOptions, only sources that implement the 
> LookupTableSource interface need to accept these configuration 
> options.
> 
> Best,
> Jiabao
> 
> 
>> On Oct 31, 2023, at 07:49, Becket Qin wrote:
>> 
>> Hi Jiabao and Ruanhang,
>> 
>> Adding a configuration of source.filter-push-down.enabled as a common
>> source configuration seems problematic.
>> 1. The config name is misleading. filter pushdown should only be determined
>> by whether the SupportsFilterPushdown interface is implemented or not.
>> 2. The behavior of this configuration is only applicable to some source
>> implementations. Why is it a common configuration?
>> 
>> Here's my suggestion for design principles:
>> 1. Only add source impl specific configuration to corresponding sources.
>> 2. The configuration name should not overrule existing common contracts.
>> 
>> For example, in the case of MySql source. There are several options:
>> 1. Have a configuration of `*mysql.avoid.remote.full.table.scan`*. If this
>> configuration is set, and a filter pushdown does not hit an index, the
>> MySql source impl would not further pushdown the filter to MySql servers.
>> Note that this assumes the MySql source can retrieve the index information
>> from the MySql servers.
>> 2. If the MySql index information is not available to the MySql source, the
>> configuration could be something like *`mysql.pushback.pushed.down.filters`*.
>> Once set to true, MySql source would just add all the filters to the
>> RemainingFilters in the Result returned by
>> *SupportsFilterPushdown.applyFilters().*
>> 3. An alternative to option 2 is to have a `
>> *mysql.apply.predicates.after.scan*`. When it is set to true, MySql source
>> will not push the filter down to the MySql servers, but apply the filters
>> inside the MySql source itself.
>> 
>> As you may see, the above configurations do not disable filter pushdown
>> itself. They just allow various implementations of filter pushdown. And the
>> configuration name does not give any illusion that filter pushdown is
>> disabled.
>> 
>> Thanks,
>> 
>> Jiangjie (Becket) Qin
>> 
>> On Mon, Oct 30, 2023 at 11:58 PM Jiabao Sun 
>> wrote:
>> 
>>> Thanks Hang for the suggestion.
>>> 
>>> 
>>> I think the configuration of TableSource is not closely related to
>>> SourceReader,
>>> so I prefer to introduce an independent configuration class
>>> TableSourceOptions in the flink-table-common module, similar to
>>> LookupOptions.
>>> 
>>> For the second point, I suggest adding Java doc to the SupportsXXXPushDown
>>> interfaces, providing detailed information on these options that need to
>>> be supported.
>>> 
>>> I have made updates in the FLIP document.
>>> Please help check it again.
>>> 
>>> 
>>> Best,
>>> Jiabao
>>> 
>>> 
 On Oct 30, 2023, at 17:23, Hang Ruan wrote:
 
 Thanks for the improvements, Jiabao.
 
 There are some details that I am not sure about.
 1. The new option `source.filter-push-down.enabled` will be added to
>>> which
 class? I think it should be `SourceReaderOptions`.
 2. How are the connector developers able to know and follow the FLIP? Do
>>> we
 need an abstract base class or provide a default method?
 
 Best,
 Hang
 
 Jiabao Sun wrote on Monday, Oct 30, 2023 at 14:45:
 
> Hi, all,
> 
> Thanks for the lively discussion.
> 
> Based on the discussion, I have made some adjustments to the FLIP
>>> document:
> 
> 1. The name of the newly added option has been changed to
> "source.filter-push-down.enabled".
> 2. Considering compatibility with older versions, the newly added
> "source.filter-push-down.enabled" option needs to respect the
>>> optimizer's
> "table.optimizer.source.predicate-pushdown-enabled" option.
>  But there is a consideration to remove the old option in Flink 2.0.
> 3. We can provide more options to disable other source abilities with
>>> side
> effects, such as “source.aggregate.enabled” and
>>> “source.projection.enabled"
>  

Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-30 Thread Jiabao Sun
Thanks Becket for the feedback.

1. Currently, the SupportsFilterPushDown#applyFilters method returns a result 
that includes acceptedFilters and remainingFilters. The source can decide to 
push down some filters or not accept any of them.
2. Introducing common configuration options does not mean that a source that 
supports the SupportsFilterPushDown capability must accept this configuration. 
Similar to LookupOptions, only sources that implement the LookupTableSource 
interface need to accept these configuration options.
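
As a rough sketch (my own illustration, not text from the FLIP) of that mechanism: the proposed option could live in a LookupOptions-style holder class, and only connectors that actually honor it would list it in their factory's optionalOptions().

```
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

/** Illustrative holder class, analogous to LookupOptions; not the FLIP's actual code. */
public class TableSourceOptions {

    /** Connectors that list this option in optionalOptions() promise to honor it. */
    public static final ConfigOption<Boolean> SOURCE_FILTER_PUSH_DOWN_ENABLED =
            ConfigOptions.key("source.filter-push-down.enabled")
                    .booleanType()
                    .defaultValue(true)
                    .withDescription(
                            "When set to false, the source should not accept pushed-down filters "
                                    + "even if it implements SupportsFilterPushDown.");
}
```

A connector that does not list the option would keep rejecting it during factory validation, which matches the behavior described above.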

Best,
Jiabao


> On Oct 31, 2023, at 07:49, Becket Qin wrote:
> 
> Hi Jiabao and Ruanhang,
> 
> Adding a configuration of source.filter-push-down.enabled as a common
> source configuration seems problematic.
> 1. The config name is misleading. filter pushdown should only be determined
> by whether the SupportsFilterPushdown interface is implemented or not.
> 2. The behavior of this configuration is only applicable to some source
> implementations. Why is it a common configuration?
> 
> Here's my suggestion for design principles:
> 1. Only add source impl specific configuration to corresponding sources.
> 2. The configuration name should not overrule existing common contracts.
> 
> For example, in the case of MySql source. There are several options:
> 1. Have a configuration of `*mysql.avoid.remote.full.table.scan`*. If this
> configuration is set, and a filter pushdown does not hit an index, the
> MySql source impl would not further pushdown the filter to MySql servers.
> Note that this assumes the MySql source can retrieve the index information
> from the MySql servers.
> 2. If the MySql index information is not available to the MySql source, the
> configuration could be something like *`mysql.pushback.pushed.down.filters`*.
> Once set to true, MySql source would just add all the filters to the
> RemainingFilters in the Result returned by
> *SupportsFilterPushdown.applyFilters().*
> 3. An alternative to option 2 is to have a `
> *mysql.apply.predicates.after.scan*`. When it is set to true, MySql source
> will not push the filter down to the MySql servers, but apply the filters
> inside the MySql source itself.
> 
> As you may see, the above configurations do not disable filter pushdown
> itself. They just allow various implementations of filter pushdown. And the
> configuration name does not give any illusion that filter pushdown is
> disabled.
> 
> Thanks,
> 
> Jiangjie (Becket) Qin
> 
> On Mon, Oct 30, 2023 at 11:58 PM Jiabao Sun 
> wrote:
> 
>> Thanks Hang for the suggestion.
>> 
>> 
>> I think the configuration of TableSource is not closely related to
>> SourceReader,
>> so I prefer to introduce an independent configuration class
>> TableSourceOptions in the flink-table-common module, similar to
>> LookupOptions.
>> 
>> For the second point, I suggest adding Java doc to the SupportsXXXPushDown
>> interfaces, providing detailed information on these options that need to
>> be supported.
>> 
>> I have made updates in the FLIP document.
>> Please help check it again.
>> 
>> 
>> Best,
>> Jiabao
>> 
>> 
>>> On Oct 30, 2023, at 17:23, Hang Ruan wrote:
>>> 
>>> Thanks for the improvements, Jiabao.
>>> 
>>> There are some details that I am not sure about.
>>> 1. The new option `source.filter-push-down.enabled` will be added to
>> which
>>> class? I think it should be `SourceReaderOptions`.
>>> 2. How are the connector developers able to know and follow the FLIP? Do
>> we
>>> need an abstract base class or provide a default method?
>>> 
>>> Best,
>>> Hang
>>> 
>>> Jiabao Sun wrote on Monday, Oct 30, 2023 at 14:45:
>>> 
 Hi, all,
 
 Thanks for the lively discussion.
 
 Based on the discussion, I have made some adjustments to the FLIP
>> document:
 
 1. The name of the newly added option has been changed to
 "source.filter-push-down.enabled".
 2. Considering compatibility with older versions, the newly added
 "source.filter-push-down.enabled" option needs to respect the
>> optimizer's
 "table.optimizer.source.predicate-pushdown-enabled" option.
   But there is a consideration to remove the old option in Flink 2.0.
 3. We can provide more options to disable other source abilities with
>> side
 effects, such as “source.aggregate.enabled” and
>> “source.projection.enabled"
   This is not urgent and can be continuously introduced.
 
 Looking forward to your feedback again.
 
 Best,
 Jiabao
 
 
> On Oct 29, 2023, at 08:45, Becket Qin wrote:
> 
> Thanks for digging into the git history, Jark. I agree it makes sense
>> to
> deprecate this API in 2.0.
> 
> Cheers,
> 
> Jiangjie (Becket) Qin
> 
> On Fri, Oct 27, 2023 at 5:47 PM Jark Wu  wrote:
> 
>> Hi Becket,
>> 
>> I checked the history of "
>> *table.optimizer.source.predicate-pushdown-enabled*",
>> it seems it was introduced since the legacy FilterableTableSource
>> interface
>> which might be an experiential feature at that 

[jira] [Created] (FLINK-33403) Bump flink version to 1.18.0 for flink-kubernetes-operator

2023-10-30 Thread Rui Fan (Jira)
Rui Fan created FLINK-33403:
---

 Summary: Bump flink version to 1.18.0 for flink-kubernetes-operator
 Key: FLINK-33403
 URL: https://issues.apache.org/jira/browse/FLINK-33403
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Rui Fan
Assignee: Rui Fan








Re: [DISCUSS] Promote SinkV2 to @Public and deprecate SinkFunction

2023-10-30 Thread Yun Gao
Hi Martijn and Gordon,

Very sorry for the very late reply. +1 for closing FLINK-30238 and opening
dedicated issues for the remaining problems.

Best,
Yun

On Mon, Oct 30, 2023 at 6:42 PM Martijn Visser  wrote:
>
> Hi everyone,
>
> I would like to +1 Gordon's proposal to close FLINK-30238, create a
> new follow-up ticket and try to address the specific
> PostCommitTopology in the work that's currently being done by Peter on
> SinkV2. If there's no feedback on this topic, I assume everyone's OK
> with that.
>
> Best regards,
>
> Martijn
>
> On Fri, Sep 29, 2023 at 8:11 AM Tzu-Li (Gordon) Tai  
> wrote:
> >
> > Hi everyone,
> >
> > It’s been a while since this topic was last discussed, but nevertheless, it
> > still remains very desirable to figure out a clear path towards making
> > SinkV2 @Public.
> >
> > There’s a new thread [1] that has a pretty good description on missing
> > features in SinkV2 from the Iceberg connector’s perspective, which prevents
> > them from migrating - anything related to those new requirements, let's
> > discuss there.
> >
> > Nevertheless, I think we should also revive and reuse this thread to reach
> > consensus / closure on all concerns already brought up here.
> >
> > It’s quite a long thread where a lot of various concerns were brought up,
> > but I’d start by addressing two very specific ones: FLIP-287 [2] and
> > FLINK-30238 [3]
> >
> > First of all, FLIP-287 has been approved and merged already, and will be
> > released with 1.18.0. So, connector migrations that were waiting on this
> > should hopefully be unblocked after the release. So this seems to no longer
> > be a concern - let’s see things through with those connectors actually
> > being migrated.
> >
> > FLINK-30238 is sort of a confusing one, and I do believe it is (partially)
> > a false alarm. After looking into this, the problem reported there
> > essentially breaks down to two things:
> > 1) TwoPhaseCommittingSink is unable to take a new savepoint after restoring
> > from a savepoint generated via `stop-with-savepoint --drain`
> > 2) SinkV2 sinks with a PostCommitTopology do not properly have post-commits
> > completed after a stop-with-savepoint operation, since committed
> > committables are not emitted to the post-commit topology after the committer
> > receives the end-of-input signal.
> >
> > My latest comment in [3] explains this in a bit more detail.
> >
> > I believe we can conclude that problem 1) is a non-concern - users should
> > not restore from a job that is drained on stop-with-savepoint and cannot
> > expect the restored job to function normally.
> > Problem 2) remains a real issue though, and to help clear things up I think
> > we should close FLINK-30238 in favor of a new ticket scoped to the specific
> > PostCommitTopology problem.
> >
> > The other open concerns seem to mostly be around graduation criteria and
> > process - I've yet to go through those and will follow up with a separate
> > reply (or perhaps Martijn can help wrap up that part?).
> >
> > Thanks,
> > Gordon
> >
> > [1] https://lists.apache.org/thread/h3kg7jcgjstpvwlhnofq093vk93ylgsn
> > [2]
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240880853
> > [3] https://issues.apache.org/jira/browse/FLINK-30238
> >
> > On Mon, Feb 13, 2023 at 2:50 AM Jing Ge  wrote:
> >
> > > @Martijn
> > >
> > > What I shared previously reflects the current state of the KafkaSink. Following
> > > your suggestion, the KafkaSink should still be marked as @Experimental for
> > > now, which would need even longer to graduate. BTW, KafkaSink does not
> > > depend on any @Internal interfaces. The @Internal is used for methods
> > > coming from @PublicEvolving SinkV2 interfaces, not the interfaces themselves.
> > > Thanks for bringing this topic up. Currently, there is no rule defined to
> > > say that no @Internal is allowed for methods implemented
> > > from @PublicEvolving interfaces. Further (off-this-topic) discussion might
> > > be required to check if it really makes sense to define such a rule, since
> > > some methods defined in interfaces might only be used internally, i.e. no
> > > connector user would be aware of them.
> > >
> > > @Dong
> > >
> > > I agree with everything you said and especially can't agree more to let
> > > developers who will own it make the decision.
> > >
> > > Best regards,
> > > Jing
> > >
> > >
> > > On Sun, Feb 12, 2023 at 2:53 AM Dong Lin  wrote:
> > >
> > > > Hi Martijn,
> > > >
> > > > Thanks for the reply. Please see my comments inline.
> > > >
> > > > Regards,
> > > > Dong
> > > >
> > > > On Sat, Feb 11, 2023 at 4:31 AM Martijn Visser  > > >
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I wanted to get back on a couple of comments and have a proposal at 
> > > > > the
> > > > end
> > > > > for the next steps:
> > > > >
> > > > > @Steven Wu
> > > > > If I look back at the original discussion of FLIP-191 and also the
> > > thread
> > > > > that you're referring to, it appears from the discussion 

Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-30 Thread Becket Qin
Hi Jiabao and Ruanhang,

Adding a configuration of source.filter-push-down.enabled as a common
source configuration seems problematic.
1. The config name is misleading. filter pushdown should only be determined
by whether the SupportsFilterPushdown interface is implemented or not.
2. The behavior of this configuration is only applicable to some source
implementations. Why is it a common configuration?

Here's my suggestion for design principles:
1. Only add source impl specific configuration to corresponding sources.
2. The configuration name should not overrule existing common contracts.

For example, in the case of MySql source. There are several options:
1. Have a configuration of `*mysql.avoid.remote.full.table.scan`*. If this
configuration is set, and a filter pushdown does not hit an index, the
MySql source impl would not further pushdown the filter to MySql servers.
Note that this assumes the MySql source can retrieve the index information
from the MySql servers.
2. If the MySql index information is not available to the MySql source, the
configuration could be something like *`mysql.pushback.pushed.down.filters`*.
Once set to true, MySql source would just add all the filters to the
RemainingFilters in the Result returned by
*SupportsFilterPushdown.applyFilters().*
3. An alternative to option 2 is to have a `
*mysql.apply.predicates.after.scan*`. When it is set to true, MySql source
will not push the filter down to the MySql servers, but apply the filters
inside the MySql source itself.

As you may see, the above configurations do not disable filter pushdown
itself. They just allow various implementations of filter pushdown. And the
configuration name does not give any illusion that filter pushdown is
disabled.
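
As a rough sketch of how option 2 could look inside the connector's applyFilters() logic, assuming a hypothetical connector flag named after the example above (`mysql.pushback.pushed.down.filters` is not an existing option): the pushdown contract itself stays untouched, only the split between accepted and remaining filters changes.

```
import java.util.Collections;
import java.util.List;
import org.apache.flink.table.connector.source.abilities.SupportsFilterPushDown.Result;
import org.apache.flink.table.expressions.ResolvedExpression;

/** Sketch of option 2: the source hands every filter back when its own flag is set. */
public class PushbackFilterBehavior {

    public static Result decide(boolean pushbackPushedDownFilters, List<ResolvedExpression> filters) {
        if (pushbackPushedDownFilters) {
            // Accept nothing: all filters stay in remainingFilters, so the planner
            // keeps evaluating them and nothing is sent to the MySQL servers.
            return Result.of(Collections.emptyList(), filters);
        }
        // Normal path (simplified): in a real connector only the filters the MySQL
        // servers can evaluate would be accepted here.
        return Result.of(filters, Collections.emptyList());
    }
}
```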

Thanks,

Jiangjie (Becket) Qin

On Mon, Oct 30, 2023 at 11:58 PM Jiabao Sun 
wrote:

> Thanks Hang for the suggestion.
>
>
> I think the configuration of TableSource is not closely related to
> SourceReader,
> so I prefer to introduce an independent configuration class
> TableSourceOptions in the flink-table-common module, similar to
> LookupOptions.
>
> For the second point, I suggest adding Java doc to the SupportsXXXPushDown
> interfaces, providing detailed information on these options that need to
> be supported.
>
> I have made updates in the FLIP document.
> Please help check it again.
>
>
> Best,
> Jiabao
>
>
> > On Oct 30, 2023, at 17:23, Hang Ruan wrote:
> >
> > Thanks for the improvements, Jiabao.
> >
> > There are some details that I am not sure about.
> > 1. The new option `source.filter-push-down.enabled` will be added to
> which
> > class? I think it should be `SourceReaderOptions`.
> > 2. How are the connector developers able to know and follow the FLIP? Do
> we
> > need an abstract base class or provide a default method?
> >
> > Best,
> > Hang
> >
> > Jiabao Sun wrote on Monday, Oct 30, 2023 at 14:45:
> >
> >> Hi, all,
> >>
> >> Thanks for the lively discussion.
> >>
> >> Based on the discussion, I have made some adjustments to the FLIP
> document:
> >>
> >> 1. The name of the newly added option has been changed to
> >> "source.filter-push-down.enabled".
> >> 2. Considering compatibility with older versions, the newly added
> >> "source.filter-push-down.enabled" option needs to respect the
> optimizer's
> >> "table.optimizer.source.predicate-pushdown-enabled" option.
> >>But there is a consideration to remove the old option in Flink 2.0.
> >> 3. We can provide more options to disable other source abilities with
> side
> >> effects, such as “source.aggregate.enabled” and
> “source.projection.enabled"
> >>This is not urgent and can be continuously introduced.
> >>
> >> Looking forward to your feedback again.
> >>
> >> Best,
> >> Jiabao
> >>
> >>
> >>> On Oct 29, 2023, at 08:45, Becket Qin wrote:
> >>>
> >>> Thanks for digging into the git history, Jark. I agree it makes sense
> to
> >>> deprecate this API in 2.0.
> >>>
> >>> Cheers,
> >>>
> >>> Jiangjie (Becket) Qin
> >>>
> >>> On Fri, Oct 27, 2023 at 5:47 PM Jark Wu  wrote:
> >>>
>  Hi Becket,
> 
>  I checked the history of "
>  *table.optimizer.source.predicate-pushdown-enabled*",
>  it seems it was introduced since the legacy FilterableTableSource
>  interface
>  which might be an experimental feature at that time. I don't see the
>  necessity
>  of this option at the moment. Maybe we can deprecate this option and
> >> drop
>  it
>  in Flink 2.0[1] if it is not necessary anymore. This may help to
>  simplify this discussion.
> 
> 
>  Best,
>  Jark
> 
>  [1]: https://issues.apache.org/jira/browse/FLINK-32383
> 
> 
> 
>  On Thu, 26 Oct 2023 at 10:14, Becket Qin 
> wrote:
> 
> > Thanks for the proposal, Jiabao. My two cents below:
> >
> > 1. If I understand correctly, the motivation of the FLIP is mainly to
> > make predicate pushdown optional on SOME of the Sources. If so,
> >> intuitively
> > the configuration should be Source specific instead 

[jira] [Created] (FLINK-33402) Hybrid Source Concurrency Race Condition Fixes and Related Bugs

2023-10-30 Thread Varun Narayanan Chakravarthy (Jira)
Varun Narayanan Chakravarthy created FLINK-33402:


 Summary: Hybrid Source Concurrency Race Condition Fixes and 
Related Bugs
 Key: FLINK-33402
 URL: https://issues.apache.org/jira/browse/FLINK-33402
 Project: Flink
  Issue Type: Bug
  Components: Connectors / HybridSource
Affects Versions: 1.16.1
 Environment: Apache Flink 1.16.1
Mac OSX, Linux etc. 
Reporter: Varun Narayanan Chakravarthy
 Attachments: hybridSourceEnumeratorAndReaderFixes.patch

Hello Team,
I noticed that there is data loss when using the Hybrid Source. We are reading
from a series of ~100 concrete File Sources. All these locations are chained
together using the Hybrid Source.
The issue stems from a race condition in the Flink Hybrid Source code: the Hybrid
Source switches to the next source before the current source is complete.
The same applies to the Hybrid Source readers. I have also shared the patch file
that fixes the issue.
From the logs:

*Task Manager logs:* 
2023-10-10 17:46:23.577 [Source: parquet-source (1/2)#0] INFO  
o.apache.flink.connector.base.source.reader.SourceReaderBase  - Adding split(s) 
to reader: [FileSourceSplit: s3://REDACTED/part-1-13189.snappy [0, 94451)  
hosts=[localhost] ID=000229 position=null] 2023-10-10 17:46:23.715 [Source 
Data Fetcher for Source: parquet-source (1/2)#0] INFO  
org.apache.hadoop.fs.s3a.S3AInputStream  - Switching to Random IO seek policy 
2023-10-10 17:46:23.715 [Source Data Fetcher for Source: parquet-source 
(1/2)#0] INFO  org.apache.hadoop.fs.s3a.S3AInputStream  - Switching to Random 
IO seek policy 2023-10-10 17:46:24.012 [Source: parquet-source (1/2)#0] INFO  
o.apache.flink.connector.base.source.reader.SourceReaderBase  - Finished 
reading split(s) [000154] 2023-10-10 17:46:24.012 [Source Data Fetcher for 
Source: parquet-source (1/2)#0] INFO  
o.a.flink.connector.base.source.reader.fetcher.SplitFetcher  - Finished reading 
from splits [000154] 2023-10-10 17:46:24.014 [Source: parquet-source 
(1/2)#0] INFO  o.apache.flink.connector.base.source.reader.SourceReaderBase  - 
Reader received NoMoreSplits event. 2023-10-10 17:46:24.014 [Source: 
parquet-source (1/2)#0] DEBUG 
o.a.flink.connector.base.source.hybrid.HybridSourceReader  - No more splits for 
subtask=0 sourceIndex=11 
currentReader=org.apache.flink.connector.file.src.impl.FileSourceReader@59620ef8
 2023-10-10 17:46:24.116 [Source Data Fetcher for Source: parquet-source 
(1/2)#0] INFO  org.apache.hadoop.fs.s3a.S3AInputStream  - Switching to Random 
IO seek policy 2023-10-10 17:46:24.116 [Source Data Fetcher for Source: 
parquet-source (1/2)#0] INFO  org.apache.hadoop.fs.s3a.S3AInputStream  - 
Switching to Random IO seek policy 2023-10-10 17:46:24.116 [Source: 
parquet-source (1/2)#0] INFO  
o.a.flink.connector.base.source.hybrid.HybridSourceReader  - Switch source 
event: subtask=0 sourceIndex=12 
source=org.apache.flink.connector.kafka.source.KafkaSource@7849da7e 2023-10-10 
17:46:24.116 [Source: parquet-source (1/2)#0] INFO  
o.apache.flink.connector.base.source.reader.SourceReaderBase  - Closing Source 
Reader. 2023-10-10 17:46:24.116 [Source: parquet-source (1/2)#0] INFO  
o.a.flink.connector.base.source.reader.fetcher.SplitFetcher  - Shutting down 
split fetcher 0 2023-10-10 17:46:24.198 [Source Data Fetcher for Source: 
parquet-source (1/2)#0] INFO  
o.a.flink.connector.base.source.reader.fetcher.SplitFetcher  - Split fetcher 0 
exited. 2023-10-10 17:46:24.198 [Source: parquet-source (1/2)#0] DEBUG 
o.a.flink.connector.base.source.hybrid.HybridSourceReader  - Reader closed: 
subtask=0 sourceIndex=11 
currentReader=org.apache.flink.connector.file.src.impl.FileSourceReader@59620ef8
We identified that data from `s3://REDACTED/part-1-13189.snappy` is missing.
This split is assigned to the reader with ID 000229. From the logs, we can see
that this split is added after the no-more-splits event and is NOT read.

*Job Manager logs:*
2023-10-10 17:46:23.576 [SourceCoordinator-Source: parquet-source] INFO  
o.a.f.c.file.src.assigners.LocalityAwareSplitAssigner  - Assigning remote split 
to requesting host '10': Optional[FileSourceSplit: 
s3://REDACTED/part-1-13189.snappy [0, 94451)  hosts=[localhost] ID=000229 
position=null]
2023-10-10 17:46:23.576 [SourceCoordinator-Source: parquet-source] INFO  
o.a.flink.connector.file.src.impl.StaticFileSplitEnumerator  - Assigned split 
to subtask 0 : FileSourceSplit: s3://REDACTED/part-1-13189.snappy [0, 94451)  
hosts=[localhost] ID=000229 position=null
2023-10-10 17:46:23.786 [SourceCoordinator-Source: parquet-source] INFO  
o.apache.flink.runtime.source.coordinator.SourceCoordinator  - Source Source: 
parquet-source received split request from parallel task 1 (#0)
2023-10-10 17:46:23.786 [SourceCoordinator-Source: parquet-source] DEBUG 
o.a.f.c.base.source.hybrid.HybridSourceSplitEnumerator  - handleSplitRequest 
subtask=1 

[RESULT] [VOTE] Apache Flink Kafka connector version 3.0.1, RC1

2023-10-30 Thread Tzu-Li (Gordon) Tai
I'm happy to announce that we have unanimously approved this release.

There are 10 approving votes, 4 of which are binding:
* Qingsheng Ren (binding)
* Martijn Visser (binding)
* Xianxun Ye
* Mystic Lama
* Leonard Xu (binding)
* Ahmed Hamdy
* Samrat Deb
* Tzu-Li (Gordon) Tai (binding)
* Sergey Nuyanzin
* Mason Chen

There are no disapproving votes.

Thanks everyone! I'll now release the artifacts, and separately announce
once everything is ready.

Best,
Gordon

On Mon, Oct 30, 2023 at 3:35 PM Tzu-Li (Gordon) Tai 
wrote:

> Thanks for the catch on the docs and fixing it, Xianxun and Mason!
>
> On Mon, Oct 30, 2023 at 12:36 PM Mason Chen 
> wrote:
>
>> I submitted PR to fix it since I was looking at the Kafka code already:
>> https://github.com/apache/flink-connector-kafka/pull/63
>>
>> On Mon, Oct 30, 2023 at 12:19 PM Mason Chen 
>> wrote:
>>
>> > +1 (non-binding)
>> >
>> > * Verified hashes and signatures
>> > * Verified no binaries
>> > * Verified poms point to 3.0.1
>> > * Reviewed web PR
>> > * Built from source
>> > * Verified git tag
>> >
>> > @Xianxun, good catch. The datastream docs should be automatically
>> updated
>> > via the doc shortcode. However, it seems that the sql connector doc
>> > shortcode doesn't support the new format of
>> > `{connector-release-version}-{flink-version}`.
>> >
>> > Best,
>> > Mason
>> >
>> > On Mon, Oct 30, 2023 at 9:27 AM Sergey Nuyanzin 
>> > wrote:
>> >
>> >> +1 (non-binding)
>> >> * Verified hashes and checksums
>> >> * Built from source
>> >> * Checked release tag
>> >> * Reviewed the web PR
>> >>
>> >> On Mon, Oct 30, 2023 at 5:13 PM Tzu-Li (Gordon) Tai <
>> tzuli...@apache.org>
>> >> wrote:
>> >>
>> >> > +1 (binding)
>> >> >
>> >> > - Hashes and checksums
>> >> > - Build succeeds against 1.18.0: mvn clean install
>> >> -Dflink.version=1.18.0
>> >> > - Verified that memory leak issue is fixed for idle topics. Tested
>> >> against
>> >> > Flink 1.18.0 cluster.
>> >> >
>> >> > Thanks,
>> >> > Gordon
>> >> >
>> >> >
>> >> > On Mon, Oct 30, 2023 at 8:20 AM Samrat Deb 
>> >> wrote:
>> >> >
>> >> > > +1 (non-binding)
>> >> > >
>> >> > > - Verified signatures
>> >> > > - Verified Checksum
>> >> > > - Build with Java 8 /11 - build success
>> >> > > - Started MSK cluster and EMR cluster with flink, successfully ran
>> >> some
>> >> > > examples to read and write data to MSK.
>> >> > > - Checked release tag exists
>> >> > >
>> >> > >
>> >> > > Bests,
>> >> > > Samrat
>> >> > >
>> >> > > On Mon, Oct 30, 2023 at 3:47 PM Ahmed Hamdy 
>> >> > wrote:
>> >> > >
>> >> > > > +1 (non-binding)
>> >> > > > - Verified Signatures
>> >> > > > - Verified Checksum
>> >> > > > - Build source successfully
>> >> > > > - Checked release tag exists
>> >> > > > - Reviewed the web PR
>> >> > > > Best Regards
>> >> > > > Ahmed Hamdy
>> >> > > >
>> >> > > >
>> >> > > > On Sun, 29 Oct 2023 at 08:02, Leonard Xu 
>> wrote:
>> >> > > >
>> >> > > > > +1 (binding)
>> >> > > > >
>> >> > > > > - Verified signatures
>> >> > > > > - Verified hashsums
>> >> > > > > - Checked Github release tag
>> >> > > > > - Built from source code succeeded
>> >> > > > > - Checked release notes
>> >> > > > > - Reviewed the web PR
>> >> > > > >
>> >> > > > > Best,
>> >> > > > > Leonard
>> >> > > > >
>> >> > > > >
>> >> > > > > > On Oct 29, 2023, at 11:34 AM, mystic lama wrote:
>> >> > > > > >
>> >> > > > > > +1 (non-binding)
>> >> > > > > >
>> >> > > > > > - verified signatures
>> >> > > > > > - build with Java 8 and Java 11 - build success
>> >> > > > > >
>> >> > > > > > Minor observation
>> >> > > > > > - RAT check flagged that README.md is missing ASL
>> >> > > > > >
>> >> > > > > > On Fri, 27 Oct 2023 at 23:40, Xianxun Ye <
>> >> yesorno828...@gmail.com>
>> >> > > > > wrote:
>> >> > > > > >
>> >> > > > > >> +1(non-binding)
>> >> > > > > >>
>> >> > > > > >> - Started a local Flink 1.18 cluster, read and wrote with
>> Kafka
>> >> > and
>> >> > > > > Upsert
>> >> > > > > >> Kafka connector successfully to Kafka 2.2 cluster
>> >> > > > > >>
>> >> > > > > >> One minor question: should we update the dependency manual
>> of
>> >> > these
>> >> > > > two
>> >> > > > > >> documentations[1][2]?
>> >> > > > > >>
>> >> > > > > >> [1]
>> >> > > > > >>
>> >> > > > >
>> >> > > >
>> >> > >
>> >> >
>> >>
>> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/kafka/#dependencies
>> >> > > > > >> [2]
>> >> > > > > >>
>> >> > > > >
>> >> > > >
>> >> > >
>> >> >
>> >>
>> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/upsert-kafka/#dependencies
>> >> > > > > >>
>> >> > > > > >> Best regards,
>> >> > > > > >> Xianxun
>> >> > > > > >>
>> >> > > > > >>> On Oct 26, 2023, at 16:12, Martijn Visser wrote:
>> >> > > > > >>>
>> >> > > > > >>> +1 (binding)
>> >> > > > > >>>
>> >> > > > > >>> - Validated hashes
>> >> > > > > >>> - Verified signature
>> >> > > > > >>> - Verified that no binaries exist in the source archive
>> >> > > > > >>> - Build the source with Maven via mvn clean install
>> >> > > > > >>> 

Re: [VOTE] Apache Flink Kafka connector version 3.0.1, RC1

2023-10-30 Thread Tzu-Li (Gordon) Tai
Thanks for the catch on the docs and fixing it, Xianxun and Mason!

On Mon, Oct 30, 2023 at 12:36 PM Mason Chen  wrote:

> I submitted PR to fix it since I was looking at the Kafka code already:
> https://github.com/apache/flink-connector-kafka/pull/63
>
> On Mon, Oct 30, 2023 at 12:19 PM Mason Chen 
> wrote:
>
> > +1 (non-binding)
> >
> > * Verified hashes and signatures
> > * Verified no binaries
> > * Verified poms point to 3.0.1
> > * Reviewed web PR
> > * Built from source
> > * Verified git tag
> >
> > @Xianxun, good catch. The datastream docs should be automatically updated
> > via the doc shortcode. However, it seems that the sql connector doc
> > shortcode doesn't support the new format of
> > `{connector-release-version}-{flink-version}`.
> >
> > Best,
> > Mason
> >
> > On Mon, Oct 30, 2023 at 9:27 AM Sergey Nuyanzin 
> > wrote:
> >
> >> +1 (non-binding)
> >> * Verified hashes and checksums
> >> * Built from source
> >> * Checked release tag
> >> * Reviewed the web PR
> >>
> >> On Mon, Oct 30, 2023 at 5:13 PM Tzu-Li (Gordon) Tai <
> tzuli...@apache.org>
> >> wrote:
> >>
> >> > +1 (binding)
> >> >
> >> > - Hashes and checksums
> >> > - Build succeeds against 1.18.0: mvn clean install
> >> -Dflink.version=1.18.0
> >> > - Verified that memory leak issue is fixed for idle topics. Tested
> >> against
> >> > Flink 1.18.0 cluster.
> >> >
> >> > Thanks,
> >> > Gordon
> >> >
> >> >
> >> > On Mon, Oct 30, 2023 at 8:20 AM Samrat Deb 
> >> wrote:
> >> >
> >> > > +1 (non-binding)
> >> > >
> >> > > - Verified signatures
> >> > > - Verified Checksum
> >> > > - Build with Java 8 /11 - build success
> >> > > - Started MSK cluster and EMR cluster with flink, successfully ran
> >> some
> >> > > examples to read and write data to MSK.
> >> > > - Checked release tag exists
> >> > >
> >> > >
> >> > > Bests,
> >> > > Samrat
> >> > >
> >> > > On Mon, Oct 30, 2023 at 3:47 PM Ahmed Hamdy 
> >> > wrote:
> >> > >
> >> > > > +1 (non-binding)
> >> > > > - Verified Signatures
> >> > > > - Verified Checksum
> >> > > > - Build source successfully
> >> > > > - Checked release tag exists
> >> > > > - Reviewed the web PR
> >> > > > Best Regards
> >> > > > Ahmed Hamdy
> >> > > >
> >> > > >
> >> > > > On Sun, 29 Oct 2023 at 08:02, Leonard Xu 
> wrote:
> >> > > >
> >> > > > > +1 (binding)
> >> > > > >
> >> > > > > - Verified signatures
> >> > > > > - Verified hashsums
> >> > > > > - Checked Github release tag
> >> > > > > - Built from source code succeeded
> >> > > > > - Checked release notes
> >> > > > > - Reviewed the web PR
> >> > > > >
> >> > > > > Best,
> >> > > > > Leonard
> >> > > > >
> >> > > > >
> >> > > > > > On Oct 29, 2023, at 11:34 AM, mystic lama wrote:
> >> > > > > >
> >> > > > > > +1 (non-binding)
> >> > > > > >
> >> > > > > > - verified signatures
> >> > > > > > - build with Java 8 and Java 11 - build success
> >> > > > > >
> >> > > > > > Minor observation
> >> > > > > > - RAT check flagged that README.md is missing ASL
> >> > > > > >
> >> > > > > > On Fri, 27 Oct 2023 at 23:40, Xianxun Ye <
> >> yesorno828...@gmail.com>
> >> > > > > wrote:
> >> > > > > >
> >> > > > > >> +1(non-binding)
> >> > > > > >>
> >> > > > > >> - Started a local Flink 1.18 cluster, read and wrote with
> Kafka
> >> > and
> >> > > > > Upsert
> >> > > > > >> Kafka connector successfully to Kafka 2.2 cluster
> >> > > > > >>
> >> > > > > >> One minor question: should we update the dependency manual of
> >> > these
> >> > > > two
> >> > > > > >> documentations[1][2]?
> >> > > > > >>
> >> > > > > >> [1]
> >> > > > > >>
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/kafka/#dependencies
> >> > > > > >> [2]
> >> > > > > >>
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/upsert-kafka/#dependencies
> >> > > > > >>
> >> > > > > >> Best regards,
> >> > > > > >> Xianxun
> >> > > > > >>
> >> > > > > >>> On Oct 26, 2023, at 16:12, Martijn Visser wrote:
> >> > > > > >>>
> >> > > > > >>> +1 (binding)
> >> > > > > >>>
> >> > > > > >>> - Validated hashes
> >> > > > > >>> - Verified signature
> >> > > > > >>> - Verified that no binaries exist in the source archive
> >> > > > > >>> - Build the source with Maven via mvn clean install
> >> > > > > >>> -Pcheck-convergence -Dflink.version=1.18.0
> >> > > > > >>> - Verified licenses
> >> > > > > >>> - Verified web PR
> >> > > > > >>> - Started a cluster and the Flink SQL client, successfully
> >> read
> >> > and
> >> > > > > >>> wrote with the Kafka connector to Confluent Cloud with AVRO
> >> and
> >> > > > Schema
> >> > > > > >>> Registry enabled
> >> > > > > >>>
> >> > > > > >>> On Thu, Oct 26, 2023 at 5:09 AM Qingsheng Ren <
> >> re...@apache.org>
> >> > > > > wrote:
> >> > > > > 
> >> > > > >  +1 (binding)
> >> > > > > 
> >> > > > >  - Verified signature and checksum
> >> > > > >  - Verified that no binary exists in the source archive
> >> > > > >  - Built from 

Re: [VOTE] Apache Flink Kafka connector version 3.0.1, RC1

2023-10-30 Thread Mason Chen
I submitted PR to fix it since I was looking at the Kafka code already:
https://github.com/apache/flink-connector-kafka/pull/63

On Mon, Oct 30, 2023 at 12:19 PM Mason Chen  wrote:

> +1 (non-binding)
>
> * Verified hashes and signatures
> * Verified no binaries
> * Verified poms point to 3.0.1
> * Reviewed web PR
> * Built from source
> * Verified git tag
>
> @Xianxun, good catch. The datastream docs should be automatically updated
> via the doc shortcode. However, it seems that the sql connector doc
> shortcode doesn't support the new format of
> `{connector-release-version}-{flink-version}`.
>
> Best,
> Mason
>
> On Mon, Oct 30, 2023 at 9:27 AM Sergey Nuyanzin 
> wrote:
>
>> +1 (non-binding)
>> * Verified hashes and checksums
>> * Built from source
>> * Checked release tag
>> * Reviewed the web PR
>>
>> On Mon, Oct 30, 2023 at 5:13 PM Tzu-Li (Gordon) Tai 
>> wrote:
>>
>> > +1 (binding)
>> >
>> > - Hashes and checksums
>> > - Build succeeds against 1.18.0: mvn clean install
>> -Dflink.version=1.18.0
>> > - Verified that memory leak issue is fixed for idle topics. Tested
>> against
>> > Flink 1.18.0 cluster.
>> >
>> > Thanks,
>> > Gordon
>> >
>> >
>> > On Mon, Oct 30, 2023 at 8:20 AM Samrat Deb 
>> wrote:
>> >
>> > > +1 (non-binding)
>> > >
>> > > - Verified signatures
>> > > - Verified Checksum
>> > > - Build with Java 8 /11 - build success
>> > > - Started MSK cluster and EMR cluster with flink, successfully ran
>> some
>> > > examples to read and write data to MSK.
>> > > - Checked release tag exists
>> > >
>> > >
>> > > Bests,
>> > > Samrat
>> > >
>> > > On Mon, Oct 30, 2023 at 3:47 PM Ahmed Hamdy 
>> > wrote:
>> > >
>> > > > +1 (non-binding)
>> > > > - Verified Signatures
>> > > > - Verified Checksum
>> > > > - Build source successfully
>> > > > - Checked release tag exists
>> > > > - Reviewed the web PR
>> > > > Best Regards
>> > > > Ahmed Hamdy
>> > > >
>> > > >
>> > > > On Sun, 29 Oct 2023 at 08:02, Leonard Xu  wrote:
>> > > >
>> > > > > +1 (binding)
>> > > > >
>> > > > > - Verified signatures
>> > > > > - Verified hashsums
>> > > > > - Checked Github release tag
>> > > > > - Built from source code succeeded
>> > > > > - Checked release notes
>> > > > > - Reviewed the web PR
>> > > > >
>> > > > > Best,
>> > > > > Leonard
>> > > > >
>> > > > >
>> > > > > > On Oct 29, 2023, at 11:34 AM, mystic lama wrote:
>> > > > > >
>> > > > > > +1 (non-binding)
>> > > > > >
>> > > > > > - verified signatures
>> > > > > > - build with Java 8 and Java 11 - build success
>> > > > > >
>> > > > > > Minor observation
>> > > > > > - RAT check flagged that README.md is missing ASL
>> > > > > >
>> > > > > > On Fri, 27 Oct 2023 at 23:40, Xianxun Ye <
>> yesorno828...@gmail.com>
>> > > > > wrote:
>> > > > > >
>> > > > > >> +1(non-binding)
>> > > > > >>
>> > > > > >> - Started a local Flink 1.18 cluster, read and wrote with Kafka
>> > and
>> > > > > Upsert
>> > > > > >> Kafka connector successfully to Kafka 2.2 cluster
>> > > > > >>
>> > > > > >> One minor question: should we update the dependency manual of
>> > these
>> > > > two
>> > > > > >> documentations[1][2]?
>> > > > > >>
>> > > > > >> [1]
>> > > > > >>
>> > > > >
>> > > >
>> > >
>> >
>> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/kafka/#dependencies
>> > > > > >> [2]
>> > > > > >>
>> > > > >
>> > > >
>> > >
>> >
>> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/upsert-kafka/#dependencies
>> > > > > >>
>> > > > > >> Best regards,
>> > > > > >> Xianxun
>> > > > > >>
>> > > > > >>> On Oct 26, 2023, at 16:12, Martijn Visser wrote:
>> > > > > >>>
>> > > > > >>> +1 (binding)
>> > > > > >>>
>> > > > > >>> - Validated hashes
>> > > > > >>> - Verified signature
>> > > > > >>> - Verified that no binaries exist in the source archive
>> > > > > >>> - Build the source with Maven via mvn clean install
>> > > > > >>> -Pcheck-convergence -Dflink.version=1.18.0
>> > > > > >>> - Verified licenses
>> > > > > >>> - Verified web PR
>> > > > > >>> - Started a cluster and the Flink SQL client, successfully
>> read
>> > and
>> > > > > >>> wrote with the Kafka connector to Confluent Cloud with AVRO
>> and
>> > > > Schema
>> > > > > >>> Registry enabled
>> > > > > >>>
>> > > > > >>> On Thu, Oct 26, 2023 at 5:09 AM Qingsheng Ren <
>> re...@apache.org>
>> > > > > wrote:
>> > > > > 
>> > > > >  +1 (binding)
>> > > > > 
>> > > > >  - Verified signature and checksum
>> > > > >  - Verified that no binary exists in the source archive
>> > > > >  - Built from source with Java 8 using -Dflink.version=1.18
>> > > > >  - Started a local Flink 1.18 cluster, submitted jobs with SQL
>> > > client
>> > > > >  reading from and writing (with exactly-once) to Kafka 3.2.3
>> > > cluster
>> > > > >  - Nothing suspicious in LICENSE and NOTICE file
>> > > > >  - Reviewed web PR
>> > > > > 
>> > > > >  Thanks for the effort, Gordon!
>> > > > > 
>> > > > >  Best,
>> > > > >  Qingsheng
>> > > > > 
>> > > > >  

[DISCUSS] Kubernetes Operator 1.7.0 release planning

2023-10-30 Thread Gyula Fóra
Hi all!

I would like to kick off the release planning for the operator 1.7.0
release. We have added quite a lot of new functionality over the last few
weeks and I think the operator is in a good state to kick this off.

Based on the original release schedule we had Nov 1 as the proposed feature
freeze date and Nov 7 as the date for the release cut / rc1.

I think this is reasonable as I am not aware of any big features / bug
fixes being worked on right now. Given the size of the changes related to
the autoscaler module refactor we should try to focus the remaining time on
testing.

I am happy to volunteer as a release manager but I am of course open to
working together with someone on this.

What do you think?

Cheers,
Gyula


Re: [VOTE] Apache Flink Kafka connector version 3.0.1, RC1

2023-10-30 Thread Mason Chen
+1 (non-binding)

* Verified hashes and signatures
* Verified no binaries
* Verified poms point to 3.0.1
* Reviewed web PR
* Built from source
* Verified git tag

@Xianxun, good catch. The datastream docs should be automatically updated
via the doc shortcode. However, it seems that the sql connector doc
shortcode doesn't support the new format of
`{connector-release-version}-{flink-version}`.

Best,
Mason

On Mon, Oct 30, 2023 at 9:27 AM Sergey Nuyanzin  wrote:

> +1 (non-binding)
> * Verified hashes and checksums
> * Built from source
> * Checked release tag
> * Reviewed the web PR
>
> On Mon, Oct 30, 2023 at 5:13 PM Tzu-Li (Gordon) Tai 
> wrote:
>
> > +1 (binding)
> >
> > - Hashes and checksums
> > - Build succeeds against 1.18.0: mvn clean install -Dflink.version=1.18.0
> > - Verified that memory leak issue is fixed for idle topics. Tested
> against
> > Flink 1.18.0 cluster.
> >
> > Thanks,
> > Gordon
> >
> >
> > On Mon, Oct 30, 2023 at 8:20 AM Samrat Deb 
> wrote:
> >
> > > +1 (non-binding)
> > >
> > > - Verified signatures
> > > - Verified Checksum
> > > - Build with Java 8 /11 - build success
> > > - Started MSK cluster and EMR cluster with flink, successfully ran some
> > > examples to read and write data to MSK.
> > > - Checked release tag exists
> > >
> > >
> > > Bests,
> > > Samrat
> > >
> > > On Mon, Oct 30, 2023 at 3:47 PM Ahmed Hamdy 
> > wrote:
> > >
> > > > +1 (non-binding)
> > > > - Verified Signatures
> > > > - Verified Checksum
> > > > - Build source successfully
> > > > - Checked release tag exists
> > > > - Reviewed the web PR
> > > > Best Regards
> > > > Ahmed Hamdy
> > > >
> > > >
> > > > On Sun, 29 Oct 2023 at 08:02, Leonard Xu  wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > - Verified signatures
> > > > > - Verified hashsums
> > > > > - Checked Github release tag
> > > > > - Built from source code succeeded
> > > > > - Checked release notes
> > > > > - Reviewed the web PR
> > > > >
> > > > > Best,
> > > > > Leonard
> > > > >
> > > > >
> > > > > > 2023年10月29日 上午11:34,mystic lama  写道:
> > > > > >
> > > > > > +1 (non-binding)
> > > > > >
> > > > > > - verified signatures
> > > > > > - build with Java 8 and Java 11 - build success
> > > > > >
> > > > > > Minor observation
> > > > > > - RAT check flagged that README.md is missing ASL
> > > > > >
> > > > > > On Fri, 27 Oct 2023 at 23:40, Xianxun Ye <
> yesorno828...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > >> +1(non-binding)
> > > > > >>
> > > > > >> - Started a local Flink 1.18 cluster, read and wrote with Kafka
> > and
> > > > > Upsert
> > > > > >> Kafka connector successfully to Kafka 2.2 cluster
> > > > > >>
> > > > > >> One minor question: should we update the dependency manual of
> > these
> > > > two
> > > > > >> documentations[1][2]?
> > > > > >>
> > > > > >> [1]
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/kafka/#dependencies
> > > > > >> [2]
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/upsert-kafka/#dependencies
> > > > > >>
> > > > > >> Best regards,
> > > > > >> Xianxun
> > > > > >>
> > > > > >>> 2023年10月26日 16:12,Martijn Visser 
> 写道:
> > > > > >>>
> > > > > >>> +1 (binding)
> > > > > >>>
> > > > > >>> - Validated hashes
> > > > > >>> - Verified signature
> > > > > >>> - Verified that no binaries exist in the source archive
> > > > > >>> - Build the source with Maven via mvn clean install
> > > > > >>> -Pcheck-convergence -Dflink.version=1.18.0
> > > > > >>> - Verified licenses
> > > > > >>> - Verified web PR
> > > > > >>> - Started a cluster and the Flink SQL client, successfully read
> > and
> > > > > >>> wrote with the Kafka connector to Confluent Cloud with AVRO and
> > > > Schema
> > > > > >>> Registry enabled
> > > > > >>>
> > > > > >>> On Thu, Oct 26, 2023 at 5:09 AM Qingsheng Ren <
> re...@apache.org>
> > > > > wrote:
> > > > > 
> > > > >  +1 (binding)
> > > > > 
> > > > >  - Verified signature and checksum
> > > > >  - Verified that no binary exists in the source archive
> > > > >  - Built from source with Java 8 using -Dflink.version=1.18
> > > > >  - Started a local Flink 1.18 cluster, submitted jobs with SQL
> > > client
> > > > >  reading from and writing (with exactly-once) to Kafka 3.2.3
> > > cluster
> > > > >  - Nothing suspicious in LICENSE and NOTICE file
> > > > >  - Reviewed web PR
> > > > > 
> > > > >  Thanks for the effort, Gordon!
> > > > > 
> > > > >  Best,
> > > > >  Qingsheng
> > > > > 
> > > > >  On Thu, Oct 26, 2023 at 5:13 AM Tzu-Li (Gordon) Tai <
> > > > > >> tzuli...@apache.org>
> > > > >  wrote:
> > > > > 
> > > > > > Hi everyone,
> > > > > >
> > > > > > Please review and vote on release candidate #1 for version
> > 3.0.1
> > > of
> > > > > the
> > > > > > Apache Flink Kafka Connector, as follows:
> > > > > > [ ] +1, Approve the 

Re: [DISCUSS][FLINK-33240] Document deprecated options as well

2023-10-30 Thread Matthias Pohl
Thanks for your proposal, Zhanghao Chen. I think it adds more transparency
to the configuration documentation.

+1 from my side on the proposal

On Wed, Oct 11, 2023 at 2:09 PM Zhanghao Chen 
wrote:

> Hi Flink users and developers,
>
> Currently, Flink won't generate doc for the deprecated options. This might
> confuse users when upgrading from an older version of Flink: they have to
> either carefully read the release notes or check the source code for
> upgrade guidance on deprecated options.
>
> I propose to document deprecated options as well, with a "(deprecated)"
> tag placed at the beginning of the option description to highlight the
> deprecation status [1].
>
> Looking forward to your feedback on it.
>
> [1] https://issues.apache.org/jira/browse/FLINK-33240
>
> Best,
> Zhanghao Chen
>


Call for Presentations now open: Community over Code EU 2024

2023-10-30 Thread Ryan Skraba
(Note: You are receiving this because you are subscribed to the dev@
list for one or more projects of the Apache Software Foundation.)

It's back *and* it's new!

We're excited to announce that the first edition of Community over
Code Europe (formerly known as ApacheCon EU) which will be held at the
Radisson Blu Carlton Hotel in Bratislava, Slovakia from June 03-05,
2024! This eagerly anticipated event will be our first live EU
conference since 2019.

The Call for Presentations (CFP) for Community Over Code EU 2024 is
now open at https://eu.communityovercode.org/blog/cfp-open/,
and will close 2024/01/12 23:59:59 GMT.

We welcome submissions on any topic related to the Apache Software
Foundation, Apache projects, or the communities around those projects.
We are specifically looking for presentations in the following
categories:

* API & Microservices
* Big Data Compute
* Big Data Storage
* Cassandra
* CloudStack
* Community
* Data Engineering
* Fintech
* Groovy
* Incubator
* IoT
* Performance Engineering
* Search
* Tomcat, Httpd and other servers

Additionally, we are thrilled to introduce a new feature this year: a
poster session. This addition will provide an excellent platform for
showcasing high-level projects and incubator initiatives in a visually
engaging manner. We believe this will foster lively discussions and
facilitate networking opportunities among participants.

All my best, and thanks so much for your participation,

Ryan Skraba (on behalf of the program committee)

[Countdown]: https://www.timeanddate.com/countdown/to?iso=20240112T2359=1440


Re: [DISCUSS] Confluent Avro support without Schema Registry access

2023-10-30 Thread Ryan Skraba
Hello!  I took a look at FLINK-33045, which is somewhat related: In
that improvement, the author wants to control who registers schemas.
The Flink job would know the Avro schema to use, and would look up the
ID to use in framing the Avro binary.  It uses but never changes the
schema registry.

Here it sounds like you want nearly the same thing with one more step:
if the Flink job is configured with the schema to use, it could also
be pre-configured with the ID that the schema registry knows.
Technically, it could be configured with a *set* of schemas mapped to
their IDs when the job starts, but I imagine this would be pretty
clunky.
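
For illustration, here is a minimal, hypothetical sketch (plain Avro APIs, no Flink format plumbing) of what Dale describes: skipping the Confluent framing and decoding the remainder with a locally configured reader schema. It assumes the standard Confluent wire format of one magic byte (0) followed by a 4-byte schema id; the class and method names are illustrative only.

// Hypothetical sketch: decode a Confluent-framed Avro record without any
// Schema Registry access, using a reader schema configured in the job.
// Assumes the standard Confluent wire format: magic byte 0x0 + 4-byte schema id.
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;

public final class ConfluentFramedAvroDecoder {

    private static final int HEADER_LENGTH = 5; // 1 magic byte + 4-byte schema id

    private final GenericDatumReader<GenericRecord> datumReader;

    public ConfluentFramedAvroDecoder(Schema readerSchema) {
        this.datumReader = new GenericDatumReader<>(readerSchema);
    }

    public GenericRecord decode(byte[] message) throws IOException {
        ByteBuffer buffer = ByteBuffer.wrap(message);
        if (buffer.get() != 0) {
            throw new IOException("Not a Confluent Schema Registry framed record");
        }
        int schemaId = buffer.getInt(); // could be validated against a pre-configured id
        BinaryDecoder decoder = DecoderFactory.get()
                .binaryDecoder(message, HEADER_LENGTH, message.length - HEADER_LENGTH, null);
        return datumReader.read(null, decoder);
    }
}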

I'm curious if you can share what customer use cases wouldn't want
access to the schema registry!  One of the reasons it exists is to
prevent systems from writing unreadable or corrupted data to a Kafka
topic (or other messaging system) -- which I think is what Martijn is
asking about.  It's unlikely to be a performance gain from hiding it.

Thanks for bringing this up for discussion!  Ryan

[FLINK-33045]: https://issues.apache.org/jira/browse/FLINK-33045
[Single Object Encoding]:
https://avro.apache.org/docs/1.11.1/specification/_print/#single-object-encoding-specification

On Fri, Oct 27, 2023 at 3:13 PM Dale Lane  wrote:
>
> > if you strip the magic byte, and the schema has
> > evolved when you're consuming it from Flink,
> > you can end up with deserialization errors given
> > that a field might have been deleted/added/
> > changed etc.
>
> Aren’t we already fairly dependent on the schema remaining consistent, 
> because otherwise we’d need to update the table schema as well?
>
> > it wouldn't work when you actually want to
> > write avro-confluent, because that requires a
> > check when producing if you're still being compliant.
>
> I’m not sure what you mean here, sorry. Are you thinking about issues if you 
> needed to mix-and-match with both formatters at the same time? (Rather than 
> just using the Avro formatter as I was describing)
>
> Kind regards
>
> Dale
>
>
>
> From: Martijn Visser 
> Date: Friday, 27 October 2023 at 14:03
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] Re: [DISCUSS] Confluent Avro support without Schema 
> Registry access
> Hi Dale,
>
> I'm struggling to understand in what cases you want to read data
> serialized in connection with Confluent Schema Registry, but can't get
> access to the Schema Registry service. It seems like a rather exotic
> situation and it beats the purposes of using a Schema Registry in the
> first place? I also doubt that it's actually really useful: if you
> strip the magic byte, and the schema has evolved when you're consuming
> it from Flink, you can end up with deserialization errors given that a
> field might have been deleted/added/changed etc. Also, it wouldn't
> work when you actually want to write avro-confluent, because that
> requires a check when producing if you're still being compliant.
>
> Best regards,
>
> Martijn
>
> On Fri, Oct 27, 2023 at 2:53 PM Dale Lane  wrote:
> >
> > TLDR:
> > We currently require a connection to a Confluent Schema Registry to be able 
> > to work with Confluent Avro data. With a small modification to the Avro 
> > formatter, I think we could also offer the ability to process this type of 
> > data without requiring access to the schema registry.
> >
> > What would people think of such an enhancement?
> >
> > -
> >
> > When working with Avro data, there are two formats available to us: avro 
> > and avro-confluent.
> >
> > avro
> > Data it supports: Avro records
> > Approach: You specify a table schema and it derives an appropriate Avro 
> > schema from this.
> >
> > avro-confluent
> > Data it supports: Confluent’s variant[1] of the Avro encoding
> > Approach: You provide connection details (URL, credentials, 
> > keystore/truststore, schema lookup strategy, etc.) for retrieving an 
> > appropriate schema from the Confluent Schema Registry.
> >
> > What this means is if you have Confluent Avro data[2] that you want to use 
> > in Flink, you currently have to use the avro-confluent format, and that 
> > means you need to provide Flink with access to your Schema Registry.
> >
> > I think there will be times where you may not want, or may not be able, to 
> > provide Flink with direct access to a Schema Registry. In such cases, it 
> > would be useful to support the same behaviour that the avro format does 
> > (i.e. allow you to explicitly specify a table schema)
> >
> > This could be achieved with a very minor modification to the avro formatter.
> >
> > For reading records, we could add an option to the formatter to highlight 
> > when records will be Confluent Avro. If that option is set, we just need 
> > the formatter to skip the first bytes with the schema ID/version (it can 
> > then use the remaining bytes with a regular Avro decoder as it does today – 
> > the existing implementation would be essentially unchanged).
> >
> > For writing records, something similar would 

Re: [VOTE] Apache Flink Kafka connector version 3.0.1, RC1

2023-10-30 Thread Sergey Nuyanzin
+1 (non-binding)
* Verified hashes and checksums
* Built from source
* Checked release tag
* Reviewed the web PR

On Mon, Oct 30, 2023 at 5:13 PM Tzu-Li (Gordon) Tai 
wrote:

> +1 (binding)
>
> - Hashes and checksums
> - Build succeeds against 1.18.0: mvn clean install -Dflink.version=1.18.0
> - Verified that memory leak issue is fixed for idle topics. Tested against
> Flink 1.18.0 cluster.
>
> Thanks,
> Gordon
>
>
> On Mon, Oct 30, 2023 at 8:20 AM Samrat Deb  wrote:
>
> > +1 (non-binding)
> >
> > - Verified signatures
> > - Verified Checksum
> > - Build with Java 8 /11 - build success
> > - Started MSK cluster and EMR cluster with flink, successfully ran some
> > examples to read and write data to MSK.
> > - Checked release tag exists
> >
> >
> > Bests,
> > Samrat
> >
> > On Mon, Oct 30, 2023 at 3:47 PM Ahmed Hamdy 
> wrote:
> >
> > > +1 (non-binding)
> > > - Verified Signatures
> > > - Verified Checksum
> > > - Build source successfully
> > > - Checked release tag exists
> > > - Reviewed the web PR
> > > Best Regards
> > > Ahmed Hamdy
> > >
> > >
> > > On Sun, 29 Oct 2023 at 08:02, Leonard Xu  wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > - Verified signatures
> > > > - Verified hashsums
> > > > - Checked Github release tag
> > > > - Built from source code succeeded
> > > > - Checked release notes
> > > > - Reviewed the web PR
> > > >
> > > > Best,
> > > > Leonard
> > > >
> > > >
> > > > > 2023年10月29日 上午11:34,mystic lama  写道:
> > > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > - verified signatures
> > > > > - build with Java 8 and Java 11 - build success
> > > > >
> > > > > Minor observation
> > > > > - RAT check flagged that README.md is missing ASL
> > > > >
> > > > > On Fri, 27 Oct 2023 at 23:40, Xianxun Ye 
> > > > wrote:
> > > > >
> > > > >> +1(non-binding)
> > > > >>
> > > > >> - Started a local Flink 1.18 cluster, read and wrote with Kafka
> and
> > > > Upsert
> > > > >> Kafka connector successfully to Kafka 2.2 cluster
> > > > >>
> > > > >> One minor question: should we update the dependency manual of
> these
> > > two
> > > > >> documentations[1][2]?
> > > > >>
> > > > >> [1]
> > > > >>
> > > >
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/kafka/#dependencies
> > > > >> [2]
> > > > >>
> > > >
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/upsert-kafka/#dependencies
> > > > >>
> > > > >> Best regards,
> > > > >> Xianxun
> > > > >>
> > > > >>> 2023年10月26日 16:12,Martijn Visser  写道:
> > > > >>>
> > > > >>> +1 (binding)
> > > > >>>
> > > > >>> - Validated hashes
> > > > >>> - Verified signature
> > > > >>> - Verified that no binaries exist in the source archive
> > > > >>> - Build the source with Maven via mvn clean install
> > > > >>> -Pcheck-convergence -Dflink.version=1.18.0
> > > > >>> - Verified licenses
> > > > >>> - Verified web PR
> > > > >>> - Started a cluster and the Flink SQL client, successfully read
> and
> > > > >>> wrote with the Kafka connector to Confluent Cloud with AVRO and
> > > Schema
> > > > >>> Registry enabled
> > > > >>>
> > > > >>> On Thu, Oct 26, 2023 at 5:09 AM Qingsheng Ren 
> > > > wrote:
> > > > 
> > > >  +1 (binding)
> > > > 
> > > >  - Verified signature and checksum
> > > >  - Verified that no binary exists in the source archive
> > > >  - Built from source with Java 8 using -Dflink.version=1.18
> > > >  - Started a local Flink 1.18 cluster, submitted jobs with SQL
> > client
> > > >  reading from and writing (with exactly-once) to Kafka 3.2.3
> > cluster
> > > >  - Nothing suspicious in LICENSE and NOTICE file
> > > >  - Reviewed web PR
> > > > 
> > > >  Thanks for the effort, Gordon!
> > > > 
> > > >  Best,
> > > >  Qingsheng
> > > > 
> > > >  On Thu, Oct 26, 2023 at 5:13 AM Tzu-Li (Gordon) Tai <
> > > > >> tzuli...@apache.org>
> > > >  wrote:
> > > > 
> > > > > Hi everyone,
> > > > >
> > > > > Please review and vote on release candidate #1 for version
> 3.0.1
> > of
> > > > the
> > > > > Apache Flink Kafka Connector, as follows:
> > > > > [ ] +1, Approve the release
> > > > > [ ] -1, Do not approve the release (please provide specific
> > > comments)
> > > > >
> > > > > This release contains important changes for the following:
> > > > > - Supports Flink 1.18.x series
> > > > > - [FLINK-28303] EOS violation when using LATEST_OFFSETS startup
> > > mode
> > > > > - [FLINK-33231] Memory leak causing OOM when there are no
> offsets
> > > to
> > > > >> commit
> > > > > back to Kafka
> > > > > - [FLINK-28758] FlinkKafkaConsumer fails to stop with savepoint
> > > > >
> > > > > The release candidate contains the source release as well as
> JAR
> > > > >> artifacts
> > > > > to be released to Maven, built against Flink 1.17.1 and 1.18.0.
> > > > >
> > > > > The complete staging area is available for your review, which
> > > > includes:

Re: [DISCUSS] FLIP-376: Add DISTRIBUTED BY clause

2023-10-30 Thread Timo Walther

Hi Jing,

> Have you considered using BUCKET BY directly?

Which vendor uses this syntax? Most vendors that I checked call this 
concept "distribution".


In any case, the "BY" is optional, so certain DDL statements would 
declare it like "BUCKET INTO 6 BUCKETS"? And, following PARTITIONED BY, 
we should use the passive voice.


> Did you mean users can use their own algorithm? How to do it?

"own algorithm" only refers to deciding between a list of partitioning 
strategies (namely hash and range partitioning) if the connector offers 
more than one.
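
As a concrete illustration, a hedged sketch of the two DDL variants under discussion (proposed FLIP-376 syntax, still subject to change and not available in any released Flink version; the connector name is purely illustrative and would need to implement the proposed SupportsBucketing interface):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class DistributedBySketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Default form: only the bucket key (and optionally the bucket count)
        // is declared; the connector chooses the partitioning algorithm.
        tEnv.executeSql(
                "CREATE TABLE Orders (uid BIGINT, amount DOUBLE) "
                        + "DISTRIBUTED BY (uid) INTO 6 BUCKETS "
                        + "WITH ('connector' = 'some-bucketing-connector')");

        // Advanced form: the algorithm is spelled out explicitly (HASH or RANGE).
        tEnv.executeSql(
                "CREATE TABLE OrdersHashed (uid BIGINT, amount DOUBLE) "
                        + "DISTRIBUTED BY HASH(uid) INTO 6 BUCKETS "
                        + "WITH ('connector' = 'some-bucketing-connector')");
    }
}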


Regards,
Timo


On 30.10.23 12:39, Jing Ge wrote:

Hi Timo,

The FLIP looks great! Thanks for bringing it to our attention! In order to
make sure we are on the same page, I would ask some questions:

1. DISTRIBUTED BY reminds me of DISTRIBUTE BY from Hive, like Benchao mentioned,
which is used to distribute rows among reducers, i.e. focusing on the
shuffle during the computation. The FLIP is focusing more on storage, if I
am not mistaken. Have you considered using BUCKET BY directly?

2. According to the FLIP: " CREATE TABLE MyTable (uid BIGINT, name STRING)
DISTRIBUTED BY HASH(uid) INTO 6 BUCKETS

- For advanced users, the algorithm can be defined explicitly.
- Currently, either HASH() or RANGE().

"
Did you mean users can use their own algorithm? How to do it?

Best regards,
Jing

On Mon, Oct 30, 2023 at 11:13 AM Timo Walther  wrote:


Let me reply to the feedback from Yunfan:

  > Distribute by in DML is also supported by Hive

I see DISTRIBUTED BY and DISTRIBUTE BY as two separate discussions. This
discussion is about DDL. For DDL, we have more freedom as every vendor
has custom syntax for CREATE TABLE clauses. Furthermore, this is tightly
coupled to the connector implementation, not the engine. However, for
DML we need to watch out for standard compliance and introduce changes
with high caution.

How a LookupTableSource interprets the DISTRIBUTED BY is
connector-dependent in my opinion. In general this FLIP is a sink
ability, but we could have a follow-up FLIP that helps in distributing the load
of lookup joins.

  > to avoid data skew problem

I understand the use case and that it is important to solve it
eventually. Maybe a solution might be to introduce helper Polymorphic
Table Functions [1] in the future instead of new syntax.

[1]

https://www.ifis.uni-luebeck.de/~moeller/Lectures/WS-19-20/NDBDM/12-Literature/Michels-SQL-2016.pdf


Let me reply to the feedback from Benchao:

  > Do you think it's useful to add some extensibility for the hash
strategy

The hash strategy is fully determined by the connector, not the Flink
SQL engine. We are not using Flink's hash strategy in any way. If the
hash strategy for the regular Flink file system connector should be
changed, this should be expressed via config option. Otherwise we should
offer a dedicated `hive-filesystem` or `spark-filesystem` connector.

Regards,
Timo


On 30.10.23 10:44, Timo Walther wrote:

Hi Jark,

my intention was to avoid overly complex syntax in the first version. In 
the past years, we could enable use cases also without this clause, so 
we should be careful with overloading it with too much functionality in the 
first version. We can still iterate on it later; the interfaces are 
flexible enough to support more in the future.

I agree that maybe an explicit HASH and RANGE doesn't harm. Also making
the bucket number optional.

I updated the FLIP accordingly. Now the SupportsBucketing interface
declares more methods that help in validation and in providing helpful error 
messages to users.

Let me know what you think.

Regards,
Timo


On 27.10.23 10:20, Jark Wu wrote:

Hi Timo,

Thanks for starting this discussion. I really like it!
The FLIP is already in good shape, I only have some minor comments.

1. Could we also support HASH and RANGE distribution kind on the DDL
syntax?
I noticed that HASH and UNKNOWN are introduced in the Java API, but
not in
the syntax.

2. Can we make "INTO n BUCKETS" optional in CREATE TABLE and ALTER

TABLE?

Some storage engines support automatically determining the bucket number
based on
the cluster resources and data size of the table. For example,
StarRocks[1]
and Paimon[2].

Best,
Jark

[1]:


https://docs.starrocks.io/en-us/latest/table_design/Data_distribution#determine-the-number-of-buckets

[2]:


https://paimon.apache.org/docs/0.5/concepts/primary-key-table/#dynamic-bucket


On Thu, 26 Oct 2023 at 18:26, Jingsong Li 

wrote:



Very thanks Timo for starting this discussion.

Big +1 for this.

The design looks good to me!

We can add some documentation for connector developers. For example:
for sinks, if a keyBy is needed, the connector should perform the keyBy
itself. SupportsBucketing is just a marker interface.

Best,
Jingsong

On Thu, Oct 26, 2023 at 5:00 PM Timo Walther 

wrote:


Hi everyone,

I would like to start a discussion on FLIP-376: Add DISTRIBUTED BY
clause [1].

Many SQL vendors expose the concepts of Partitioning, Bucketing, and
Clustering. 

Re: [DISCUSS] AWS Connectors v4.2.0 release + 1.18 support

2023-10-30 Thread Tzu-Li (Gordon) Tai
+1

On Mon, Oct 30, 2023 at 9:00 AM Danny Cranmer 
wrote:

> Hey,
>
> > Did you mean skip 4.1.1, since 4.1.0 has already been released?
>
> I meant skip "4.1.0-1.18" since we could release this with the existing
> source. We will additionally skip 4.1.1 and jump to 4.2.0; since this
> version has new features, it should be a minor version rather than a patch [1].
>
> > Does this imply that the 4.1.x series will be reserved for Flink 1.17,
> and the 4.2.x series will correspond to Flink 1.18?
>
> 4.1.x will receive bug fixes for Flink 1.17.
> 4.2.x will receive bug fixes and features for Flink 1.17 and 1.18.
>
> Thanks,
> Danny
>
> [1] https://semver.org/
>
>
> On Mon, Oct 30, 2023 at 3:47 PM Samrat Deb  wrote:
>
> > Hi Danny ,
> >
> > Thank you for driving it.
> >
> > +1 (non binding )
> >
> >
> > >  I am proposing we skip 4.1.0 for Flink 1.18 and go
> > straight to 4.2.0.
> >
> > Does this imply that the 4.1.x series will be reserved for Flink 1.17,
> and
> > the 4.2.x series will correspond to Flink 1.18?
> >
> > Bests,
> > Samrat
> >
> >
> > On Mon, Oct 30, 2023 at 7:32 PM Jing Ge 
> > wrote:
> >
> > > Hi Danny,
> > >
> > > +1 Thanks for driving it. Did you mean skip 4.1.1, since 4.1.0 has
> > already
> > > been released?
> > >
> > > Best regards,
> > > Jing
> > >
> > > On Mon, Oct 30, 2023 at 11:49 AM Danny Cranmer <
> dannycran...@apache.org>
> > > wrote:
> > >
> > > > Hello all,
> > > >
> > > > I would like to start the discussion to release Apache Flink AWS
> > > connectors
> > > > v4.2.0. We released v4.1.0 over six months ago on 2023-04-03. Since
> > then
> > > we
> > > > have resolved 23 issues [1]. Additionally now Flink 1.18 is live we
> > need
> > > to
> > > > add support for this. I am proposing we skip 4.1.0 for Flink 1.18 and
> > go
> > > > straight to 4.2.0. The CI is stable [2].
> > > >
> > > > I volunteer myself as the release manager.
> > > >
> > > > Thanks,
> > > > Danny
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/FLINK-33021?jql=statusCategory%20%3D%20done%20AND%20project%20%3D%2012315522%20AND%20fixVersion%20%3D%2012353011%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC
> > > > [2] https://github.com/apache/flink-connector-aws/actions
> > > >
> > >
> >
>


Re: [VOTE] Apache Flink Kafka connector version 3.0.1, RC1

2023-10-30 Thread Tzu-Li (Gordon) Tai
+1 (binding)

- Hashes and checksums
- Build succeeds against 1.18.0: mvn clean install -Dflink.version=1.18.0
- Verified that memory leak issue is fixed for idle topics. Tested against
Flink 1.18.0 cluster.

Thanks,
Gordon


On Mon, Oct 30, 2023 at 8:20 AM Samrat Deb  wrote:

> +1 (non-binding)
>
> - Verified signatures
> - Verified Checksum
> - Build with Java 8 /11 - build success
> - Started MSK cluster and EMR cluster with flink, successfully ran some
> examples to read and write data to MSK.
> - Checked release tag exists
>
>
> Bests,
> Samrat
>
> On Mon, Oct 30, 2023 at 3:47 PM Ahmed Hamdy  wrote:
>
> > +1 (non-binding)
> > - Verified Signatures
> > - Verified Checksum
> > - Build source successfully
> > - Checked release tag exists
> > - Reviewed the web PR
> > Best Regards
> > Ahmed Hamdy
> >
> >
> > On Sun, 29 Oct 2023 at 08:02, Leonard Xu  wrote:
> >
> > > +1 (binding)
> > >
> > > - Verified signatures
> > > - Verified hashsums
> > > - Checked Github release tag
> > > - Built from source code succeeded
> > > - Checked release notes
> > > - Reviewed the web PR
> > >
> > > Best,
> > > Leonard
> > >
> > >
> > > > 2023年10月29日 上午11:34,mystic lama  写道:
> > > >
> > > > +1 (non-binding)
> > > >
> > > > - verified signatures
> > > > - build with Java 8 and Java 11 - build success
> > > >
> > > > Minor observation
> > > > - RAT check flagged that README.md is missing ASL
> > > >
> > > > On Fri, 27 Oct 2023 at 23:40, Xianxun Ye 
> > > wrote:
> > > >
> > > >> +1(non-binding)
> > > >>
> > > >> - Started a local Flink 1.18 cluster, read and wrote with Kafka and
> > > Upsert
> > > >> Kafka connector successfully to Kafka 2.2 cluster
> > > >>
> > > >> One minor question: should we update the dependency manual of these
> > two
> > > >> documentations[1][2]?
> > > >>
> > > >> [1]
> > > >>
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/kafka/#dependencies
> > > >> [2]
> > > >>
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/upsert-kafka/#dependencies
> > > >>
> > > >> Best regards,
> > > >> Xianxun
> > > >>
> > > >>> 2023年10月26日 16:12,Martijn Visser  写道:
> > > >>>
> > > >>> +1 (binding)
> > > >>>
> > > >>> - Validated hashes
> > > >>> - Verified signature
> > > >>> - Verified that no binaries exist in the source archive
> > > >>> - Build the source with Maven via mvn clean install
> > > >>> -Pcheck-convergence -Dflink.version=1.18.0
> > > >>> - Verified licenses
> > > >>> - Verified web PR
> > > >>> - Started a cluster and the Flink SQL client, successfully read and
> > > >>> wrote with the Kafka connector to Confluent Cloud with AVRO and
> > Schema
> > > >>> Registry enabled
> > > >>>
> > > >>> On Thu, Oct 26, 2023 at 5:09 AM Qingsheng Ren 
> > > wrote:
> > > 
> > >  +1 (binding)
> > > 
> > >  - Verified signature and checksum
> > >  - Verified that no binary exists in the source archive
> > >  - Built from source with Java 8 using -Dflink.version=1.18
> > >  - Started a local Flink 1.18 cluster, submitted jobs with SQL
> client
> > >  reading from and writing (with exactly-once) to Kafka 3.2.3
> cluster
> > >  - Nothing suspicious in LICENSE and NOTICE file
> > >  - Reviewed web PR
> > > 
> > >  Thanks for the effort, Gordon!
> > > 
> > >  Best,
> > >  Qingsheng
> > > 
> > >  On Thu, Oct 26, 2023 at 5:13 AM Tzu-Li (Gordon) Tai <
> > > >> tzuli...@apache.org>
> > >  wrote:
> > > 
> > > > Hi everyone,
> > > >
> > > > Please review and vote on release candidate #1 for version 3.0.1
> of
> > > the
> > > > Apache Flink Kafka Connector, as follows:
> > > > [ ] +1, Approve the release
> > > > [ ] -1, Do not approve the release (please provide specific
> > comments)
> > > >
> > > > This release contains important changes for the following:
> > > > - Supports Flink 1.18.x series
> > > > - [FLINK-28303] EOS violation when using LATEST_OFFSETS startup
> > mode
> > > > - [FLINK-33231] Memory leak causing OOM when there are no offsets
> > to
> > > >> commit
> > > > back to Kafka
> > > > - [FLINK-28758] FlinkKafkaConsumer fails to stop with savepoint
> > > >
> > > > The release candidate contains the source release as well as JAR
> > > >> artifacts
> > > > to be released to Maven, built against Flink 1.17.1 and 1.18.0.
> > > >
> > > > The complete staging area is available for your review, which
> > > includes:
> > > > * JIRA release notes [1],
> > > > * the official Apache source release to be deployed to
> > > dist.apache.org
> > > > [2],
> > > > which are signed with the key with fingerprint
> > > > 1C1E2394D3194E1944613488F320986D35C33D6A [3],
> > > > * all artifacts to be deployed to the Maven Central Repository
> [4],
> > > > * source code tag v3.0.1-rc1 [5],
> > > > * website pull request listing the new release [6].
> > > >
> > > > The vote will be open 

Re: [DISCUSS] AWS Connectors v4.2.0 release + 1.18 support

2023-10-30 Thread Danny Cranmer
Hey,

> Did you mean skip 4.1.1, since 4.1.0 has already been released?

I meant skip "4.1.0-1.18" since we could release this with the existing
source. We will additionally skip 4.1.1 and jump to 4.2.0; since this
version has new features, it should be a minor version rather than a patch [1].

> Does this imply that the 4.1.x series will be reserved for Flink 1.17,
and the 4.2.x series will correspond to Flink 1.18?

4.1.x will receive bug fixes for Flink 1.17.
4.2.x will receive bug fixes and features for Flink 1.17 and 1.18.

Thanks,
Danny

[1] https://semver.org/


On Mon, Oct 30, 2023 at 3:47 PM Samrat Deb  wrote:

> Hi Danny ,
>
> Thank you for driving it.
>
> +1 (non binding )
>
>
> >  I am proposing we skip 4.1.0 for Flink 1.18 and go
> straight to 4.2.0.
>
> Does this imply that the 4.1.x series will be reserved for Flink 1.17, and
> the 4.2.x series will correspond to Flink 1.18?
>
> Bests,
> Samrat
>
>
> On Mon, Oct 30, 2023 at 7:32 PM Jing Ge 
> wrote:
>
> > Hi Danny,
> >
> > +1 Thanks for driving it. Did you mean skip 4.1.1, since 4.1.0 has
> already
> > been released?
> >
> > Best regards,
> > Jing
> >
> > On Mon, Oct 30, 2023 at 11:49 AM Danny Cranmer 
> > wrote:
> >
> > > Hello all,
> > >
> > > I would like to start the discussion to release Apache Flink AWS
> > connectors
> > > v4.2.0. We released v4.1.0 over six months ago on 2023-04-03. Since
> then
> > we
> > > have resolved 23 issues [1]. Additionally now Flink 1.18 is live we
> need
> > to
> > > add support for this. I am proposing we skip 4.1.0 for Flink 1.18 and
> go
> > > straight to 4.2.0. The CI is stable [2].
> > >
> > > I volunteer myself as the release manager.
> > >
> > > Thanks,
> > > Danny
> > >
> > > [1]
> > >
> > >
> >
> https://issues.apache.org/jira/browse/FLINK-33021?jql=statusCategory%20%3D%20done%20AND%20project%20%3D%2012315522%20AND%20fixVersion%20%3D%2012353011%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC
> > > [2] https://github.com/apache/flink-connector-aws/actions
> > >
> >
>


Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-30 Thread Jiabao Sun
Thanks Hang for the suggestion.


I think the configuration of TableSource is not closely related to 
SourceReader, 
so I prefer to introduce an independent configuration class TableSourceOptions 
in the flink-table-common module, similar to LookupOptions.
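
For readers following the FLIP, a minimal sketch of what such a class could look like (the class name, option key, and default come from the FLIP discussion and may still change):

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

/** Sketch of the options class proposed in FLIP-377 for flink-table-common. */
public class TableSourceOptions {

    public static final ConfigOption<Boolean> SOURCE_FILTER_PUSH_DOWN_ENABLED =
            ConfigOptions.key("source.filter-push-down.enabled")
                    .booleanType()
                    .defaultValue(true)
                    .withDescription(
                            "Whether the source accepts filters pushed down by the planner. "
                                    + "If disabled, the source returns all pushed filters "
                                    + "as remaining filters and evaluation stays in Flink.");

    private TableSourceOptions() {}
}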

For the second point, I suggest adding Java doc to the SupportsXXXPushDown 
interfaces, providing detailed information on the options that need to be 
supported.

I have made updates in the FLIP document.
Please help check it again.


Best,
Jiabao


> 2023年10月30日 17:23,Hang Ruan  写道:
> 
> Thanks for the improvements, Jiabao.
> 
> There are some details that I am not sure about.
> 1. The new option `source.filter-push-down.enabled` will be added to which
> class? I think it should be `SourceReaderOptions`.
> 2. How are the connector developers able to know and follow the FLIP? Do we
> need an abstract base class or provide a default method?
> 
> Best,
> Hang
> 
> Jiabao Sun  于2023年10月30日周一 14:45写道:
> 
>> Hi, all,
>> 
>> Thanks for the lively discussion.
>> 
>> Based on the discussion, I have made some adjustments to the FLIP document:
>> 
>> 1. The name of the newly added option has been changed to
>> "source.filter-push-down.enabled".
>> 2. Considering compatibility with older versions, the newly added
>> "source.filter-push-down.enabled" option needs to respect the optimizer's
>> "table.optimizer.source.predicate-pushdown-enabled" option.
>>But there is a consideration to remove the old option in Flink 2.0.
>> 3. We can provide more options to disable other source abilities with side
>> effects, such as “source.aggregate.enabled” and “source.projection.enabled"
>>This is not urgent and can be continuously introduced.
>> 
>> Looking forward to your feedback again.
>> 
>> Best,
>> Jiabao
>> 
>> 
>>> 2023年10月29日 08:45,Becket Qin  写道:
>>> 
>>> Thanks for digging into the git history, Jark. I agree it makes sense to
>>> deprecate this API in 2.0.
>>> 
>>> Cheers,
>>> 
>>> Jiangjie (Becket) Qin
>>> 
>>> On Fri, Oct 27, 2023 at 5:47 PM Jark Wu  wrote:
>>> 
 Hi Becket,
 
 I checked the history of "
 *table.optimizer.source.predicate-pushdown-enabled*",
 it seems it was introduced since the legacy FilterableTableSource
 interface
 which might be an experimental feature at that time. I don't see the
 necessity
 of this option at the moment. Maybe we can deprecate this option and
>> drop
 it
 in Flink 2.0[1] if it is not necessary anymore. This may help to
 simplify this discussion.
 
 
 Best,
 Jark
 
 [1]: https://issues.apache.org/jira/browse/FLINK-32383
 
 
 
 On Thu, 26 Oct 2023 at 10:14, Becket Qin  wrote:
 
> Thanks for the proposal, Jiabao. My two cents below:
> 
> 1. If I understand correctly, the motivation of the FLIP is mainly to
> make predicate pushdown optional on SOME of the Sources. If so,
>> intuitively
> the configuration should be Source specific instead of general.
>> Otherwise,
> we will end up with general configurations that may not take effect for
> some of the Source implementations. This violates the basic rule of a
> configuration - it does what it says, regardless of the implementation.
> While configuration standardization is usually a good thing, it should
>> not
> break the basic rules.
> If we really want to have this general configuration, for the sources
> this configuration does not apply, they should throw an exception to
>> make
> it clear that this configuration is not supported. However, that seems
>> ugly.
> 
> 2. I think the actual motivation of this FLIP is about "how a source
> should implement predicate pushdown efficiently", not "whether
>> predicate
> pushdown should be applied to the source." For example, if a source
>> wants
> to avoid additional computing load in the external system, it can
>> always
> read the entire record and apply the predicates by itself. However,
>> from
> the Flink perspective, the predicate pushdown is applied, it is just
> implemented differently by the source. So the design principle here is
>> that
> Flink only cares about whether a source supports predicate pushdown or
>> not,
> it does not care about the implementation efficiency / side effect of
>> the
> predicates pushdown. It is the Source implementation's responsibility
>> to
> ensure the predicates pushdown is implemented efficiently and does not
> impose excessive pressure on the external system. And it is OK to have
> additional configurations to achieve this goal. Obviously, such
> configurations will be source specific in this case.
> 
> 3. Regarding the existing configurations of
>> *table.optimizer.source.predicate-pushdown-enabled.
> *I am not sure why we need it. Supposedly, if a source implements a
> SupportsXXXPushDown interface, the optimizer should push the
>> corresponding
> predicates 

Re: [DISCUSS] AWS Connectors v4.2.0 release + 1.18 support

2023-10-30 Thread Samrat Deb
Hi Danny ,

Thank you for driving it.

+1 (non binding )


>  I am proposing we skip 4.1.0 for Flink 1.18 and go
straight to 4.2.0.

Does this imply that the 4.1.x series will be reserved for Flink 1.17, and
the 4.2.x series will correspond to Flink 1.18?

Bests,
Samrat


On Mon, Oct 30, 2023 at 7:32 PM Jing Ge  wrote:

> Hi Danny,
>
> +1 Thanks for driving it. Did you mean skip 4.1.1, since 4.1.0 has already
> been released?
>
> Best regards,
> Jing
>
> On Mon, Oct 30, 2023 at 11:49 AM Danny Cranmer 
> wrote:
>
> > Hello all,
> >
> > I would like to start the discussion to release Apache Flink AWS
> connectors
> > v4.2.0. We released v4.1.0 over six months ago on 2023-04-03. Since then
> we
> > have resolved 23 issues [1]. Additionally now Flink 1.18 is live we need
> to
> > add support for this. I am proposing we skip 4.1.0 for Flink 1.18 and go
> > straight to 4.2.0. The CI is stable [2].
> >
> > I volunteer myself as the release manager.
> >
> > Thanks,
> > Danny
> >
> > [1]
> >
> >
> https://issues.apache.org/jira/browse/FLINK-33021?jql=statusCategory%20%3D%20done%20AND%20project%20%3D%2012315522%20AND%20fixVersion%20%3D%2012353011%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC
> > [2] https://github.com/apache/flink-connector-aws/actions
> >
>


[jira] [Created] (FLINK-33401) Kafka connector has broken version

2023-10-30 Thread Pavel Khokhlov (Jira)
Pavel Khokhlov created FLINK-33401:
--

 Summary: Kafka connector has broken version
 Key: FLINK-33401
 URL: https://issues.apache.org/jira/browse/FLINK-33401
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Reporter: Pavel Khokhlov


Trying to run Flink 1.18 with Kafka Connector

but official documentation has a bug  

[https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/connectors/datastream/kafka/]
{noformat}

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka</artifactId>
    <version>-1.18</version>
</dependency>
{noformat}
Basically version *-1.18* doesn't exist.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Apache Flink Kafka connector version 3.0.1, RC1

2023-10-30 Thread Samrat Deb
+1 (non-binding)

- Verified signatures
- Verified Checksum
- Build with Java 8 /11 - build success
- Started MSK cluster and EMR cluster with flink, successfully ran some
examples to read and write data to MSK.
- Checked release tag exists


Bests,
Samrat

On Mon, Oct 30, 2023 at 3:47 PM Ahmed Hamdy  wrote:

> +1 (non-binding)
> - Verified Signatures
> - Verified Checksum
> - Build source successfully
> - Checked release tag exists
> - Reviewed the web PR
> Best Regards
> Ahmed Hamdy
>
>
> On Sun, 29 Oct 2023 at 08:02, Leonard Xu  wrote:
>
> > +1 (binding)
> >
> > - Verified signatures
> > - Verified hashsums
> > - Checked Github release tag
> > - Built from source code succeeded
> > - Checked release notes
> > - Reviewed the web PR
> >
> > Best,
> > Leonard
> >
> >
> > > 2023年10月29日 上午11:34,mystic lama  写道:
> > >
> > > +1 (non-binding)
> > >
> > > - verified signatures
> > > - build with Java 8 and Java 11 - build success
> > >
> > > Minor observation
> > > - RAT check flagged that README.md is missing ASL
> > >
> > > On Fri, 27 Oct 2023 at 23:40, Xianxun Ye 
> > wrote:
> > >
> > >> +1(non-binding)
> > >>
> > >> - Started a local Flink 1.18 cluster, read and wrote with Kafka and
> > Upsert
> > >> Kafka connector successfully to Kafka 2.2 cluster
> > >>
> > >> One minor question: should we update the dependency manual of these
> two
> > >> documentations[1][2]?
> > >>
> > >> [1]
> > >>
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/kafka/#dependencies
> > >> [2]
> > >>
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/upsert-kafka/#dependencies
> > >>
> > >> Best regards,
> > >> Xianxun
> > >>
> > >>> 2023年10月26日 16:12,Martijn Visser  写道:
> > >>>
> > >>> +1 (binding)
> > >>>
> > >>> - Validated hashes
> > >>> - Verified signature
> > >>> - Verified that no binaries exist in the source archive
> > >>> - Build the source with Maven via mvn clean install
> > >>> -Pcheck-convergence -Dflink.version=1.18.0
> > >>> - Verified licenses
> > >>> - Verified web PR
> > >>> - Started a cluster and the Flink SQL client, successfully read and
> > >>> wrote with the Kafka connector to Confluent Cloud with AVRO and
> Schema
> > >>> Registry enabled
> > >>>
> > >>> On Thu, Oct 26, 2023 at 5:09 AM Qingsheng Ren 
> > wrote:
> > 
> >  +1 (binding)
> > 
> >  - Verified signature and checksum
> >  - Verified that no binary exists in the source archive
> >  - Built from source with Java 8 using -Dflink.version=1.18
> >  - Started a local Flink 1.18 cluster, submitted jobs with SQL client
> >  reading from and writing (with exactly-once) to Kafka 3.2.3 cluster
> >  - Nothing suspicious in LICENSE and NOTICE file
> >  - Reviewed web PR
> > 
> >  Thanks for the effort, Gordon!
> > 
> >  Best,
> >  Qingsheng
> > 
> >  On Thu, Oct 26, 2023 at 5:13 AM Tzu-Li (Gordon) Tai <
> > >> tzuli...@apache.org>
> >  wrote:
> > 
> > > Hi everyone,
> > >
> > > Please review and vote on release candidate #1 for version 3.0.1 of
> > the
> > > Apache Flink Kafka Connector, as follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific
> comments)
> > >
> > > This release contains important changes for the following:
> > > - Supports Flink 1.18.x series
> > > - [FLINK-28303] EOS violation when using LATEST_OFFSETS startup
> mode
> > > - [FLINK-33231] Memory leak causing OOM when there are no offsets
> to
> > >> commit
> > > back to Kafka
> > > - [FLINK-28758] FlinkKafkaConsumer fails to stop with savepoint
> > >
> > > The release candidate contains the source release as well as JAR
> > >> artifacts
> > > to be released to Maven, built against Flink 1.17.1 and 1.18.0.
> > >
> > > The complete staging area is available for your review, which
> > includes:
> > > * JIRA release notes [1],
> > > * the official Apache source release to be deployed to
> > dist.apache.org
> > > [2],
> > > which are signed with the key with fingerprint
> > > 1C1E2394D3194E1944613488F320986D35C33D6A [3],
> > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > * source code tag v3.0.1-rc1 [5],
> > > * website pull request listing the new release [6].
> > >
> > > The vote will be open for at least 72 hours. It is adopted by
> > majority
> > > approval, with at least 3 PMC affirmative votes.
> > >
> > > Thanks,
> > > Gordon
> > >
> > > [1]
> > >
> > >
> > >>
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352910
> > > [2]
> > >
> > >
> > >>
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-kafka-3.0.1-rc1/
> > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > [4]
> > >>
> https://repository.apache.org/content/repositories/orgapacheflink-1664

Re: [DISCUSS] AWS Connectors v4.2.0 release + 1.18 support

2023-10-30 Thread Jing Ge
Hi Danny,

+1 Thanks for driving it. Did you mean skip 4.1.1, since 4.1.0 has already
been released?

Best regards,
Jing

On Mon, Oct 30, 2023 at 11:49 AM Danny Cranmer 
wrote:

> Hello all,
>
> I would like to start the discussion to release Apache Flink AWS connectors
> v4.2.0. We released v4.1.0 over six months ago on 2023-04-03. Since then we
> have resolved 23 issues [1]. Additionally now Flink 1.18 is live we need to
> add support for this. I am proposing we skip 4.1.0 for Flink 1.18 and go
> straight to 4.2.0. The CI is stable [2].
>
> I volunteer myself as the release manager.
>
> Thanks,
> Danny
>
> [1]
>
> https://issues.apache.org/jira/browse/FLINK-33021?jql=statusCategory%20%3D%20done%20AND%20project%20%3D%2012315522%20AND%20fixVersion%20%3D%2012353011%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC
> [2] https://github.com/apache/flink-connector-aws/actions
>


Re: Proposal for Implementing Keyed Watermarks in Apache Flink

2023-10-30 Thread Tawfek Yasser Tawfek
Hi Alexander,

Thank you for your reply.

Yes. As you showed, the keyed-watermarks mechanism is mainly required for the case 
when we need a fine-grained calculation for each partition 
[calculation over data produced by each individual sensor]. Since scalability 
factors require partitioning the calculations, 
the keyed-watermarks mechanism is designed for this type of problem.
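
As a rough sketch, the proposed usage would look like the pipeline below. Per-key watermark semantics are exactly what the proposal adds; in current Flink releases the watermark produced here is still global, so this only illustrates where keyBy() and assignTimestampsAndWatermarks() would sit. The SensorReading type and all values are illustrative only.

import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class KeyedWatermarksSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(
                        new SensorReading("sensor-1", 1_000L, 21.5),
                        new SensorReading("sensor-2", 1_200L, 19.0))
                // Partition first, so that (under the proposal) watermarks are
                // tracked per sensor instead of globally.
                .keyBy(r -> r.sensorId)
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy.<SensorReading>forBoundedOutOfOrderness(
                                        Duration.ofSeconds(5))
                                .withTimestampAssigner((r, ts) -> r.timestamp))
                // Re-key and window; with keyed watermarks, a slow sensor no longer
                // has its events dropped because faster sensors advanced the
                // global watermark.
                .keyBy(r -> r.sensorId)
                .window(TumblingEventTimeWindows.of(Time.minutes(1)))
                .maxBy("temperature")
                .print();

        env.execute("keyed-watermarks-sketch");
    }

    /** Illustrative event type used only for this sketch. */
    public static class SensorReading {
        public String sensorId;
        public long timestamp;
        public double temperature;

        public SensorReading() {}

        public SensorReading(String sensorId, long timestamp, double temperature) {
            this.sensorId = sensorId;
            this.timestamp = timestamp;
            this.temperature = temperature;
        }
    }
}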

Thanks,
Tawfik

From: Alexander Fedulov 
Sent: 30 October 2023 13:37
To: dev@flink.apache.org 
Subject: Re: Proposal for Implementing Keyed Watermarks in Apache Flink


Hi Tawfek,

> The idea is to generate a watermark for each key (sub-stream), in order
to avoid the fast progress of the global watermark which affects low-rate
sources.

Let's consider the sensors example from the paper. Shouldn't it be about
the delay between the time of taking the measurement and its arrival at
Flink, rather than the rate at which the measurements are produced? If a
particular sensor produces no data during a specific time window, it
doesn't make sense to wait for it—there won't be any corresponding
measurement arriving because none was produced. Thus, I believe we should
be talking about situations where data from certain sensors can arrive with
significant delay compared to most other sensors.

From the perspective of data aggregation, there are two main scenarios:
1) Calculation over data produced by multiple sensors
2) Calculation over data produced by an individual sensor

In scenario 1), there are two subcategories:
a) Meaningful results cannot be produced without data from those delayed
sensors; hence, you need to wait longer.
  => Time is propagated by the mix of all sources. You just need to set
a bounded watermark with enough lag to accommodate the delayed results.
This is precisely what event time processing and bounded watermarks are for
(no keyed watermarking is required).
b) You need to produce the results as they are and perhaps patch them later
when the delayed data arrives.
 => Time is propagated by the mix of all sources. You produce the
results as they are but utilize allowedLateness to patch the aggregates if
needed (no keyed watermarking is required).

So, is it correct to say that keyed watermarking is applicable only in
scenario 2)?

Best,
Alexander Fedulov

On Sat, 28 Oct 2023 at 14:33, Tawfek Yasser Tawfek 
wrote:

> Thanks, Alexander for your reply.
>
> Our solution initiated from this inquiry on Stack Overflow:
>
> https://stackoverflow.com/questions/52179898/does-flink-support-keyed-watermarks-if-not-is-there-any-plan-of-implementing-i
>
> The idea is to generate a watermark for each key (sub-stream), in order to
> avoid the fast progress of the global watermark which affects low-rate
> sources.
>
> Instead of using only one watermark (vanilla/global watermark), we changed
> the API to allow moving the keyBy() before the
> assignTimestampsAndWatermarks() so the stream will be partitioned then the
> TimestampsAndWatermarkOperator will handle the generation of each watermark
> for each key (source/sub-stream/partition).
>
> *Let's discuss more if you want I have a presentation at a conference, we
> can meet or whatever is suitable.*
>
> Also, I contacted David Anderson one year ago and he followed me step by
> step and helped me a lot.
>
> I attached some messages with David.
>
>
> *Thanks & BR,*
>
>
> 
>
>
>
>
>  Tawfik Yasser Tawfik
>
> * Teaching Assistant | AI-ITCS-NU*
>
>  Office: UB1-B, Room 229
>
>  26th of July Corridor, Sheikh Zayed City, Giza, Egypt
> --
> *From:* Alexander Fedulov 
> *Sent:* 27 October 2023 20:09
> *To:* dev@flink.apache.org 
> *Subject:* Re: Proposal for Implementing Keyed Watermarks in Apache Flink
>
>
> Hi Tawfek,
>
> Thanks for sharing. I am trying to understand what exact real-life problem
> you are tackling with this approach. My understanding from skimming through
> the paper is that you are concerned about some outlier event producers from
> which the events can be delayed beyond what is expected in the overall
> system.
> Do I get it correctly that the keyed watermarking only targets scenarios of
> calculating keyed windows (which are also keyed by the same producer ids)?
>
> Best,
> Alexander Fedulov
>
> On Fri, 27 Oct 2023 at 19:07, Tawfek Yasser Tawfek 
> wrote:
>
> > Dear Apache Flink Development Team,
> >
> > I hope this email finds you well. I propose an exciting new feature for
> > Apache Flink that has the potential to significantly enhance its
> > capabilities in handling unbounded streams of events, particularly in the
> > context of event-time windowing.
> >
> > As you may be aware, Apache Flink has been at 

[jira] [Created] (FLINK-33400) Pulsar connector doesn't compile for Flink 1.18 due to Archunit update

2023-10-30 Thread Martijn Visser (Jira)
Martijn Visser created FLINK-33400:
--

 Summary: Pulsar connector doesn't compile for Flink 1.18 due to 
Archunit update
 Key: FLINK-33400
 URL: https://issues.apache.org/jira/browse/FLINK-33400
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Pulsar
Affects Versions: pulsar-4.0.1
Reporter: Martijn Visser
Assignee: Martijn Visser






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-376: Add DISTRIBUTED BY clause

2023-10-30 Thread Jing Ge
Hi Timo,

The FLIP looks great! Thanks for bringing it to our attention! In order to
make sure we are on the same page, I would ask some questions:

1. DISTRIBUTED BY reminds me of DISTRIBUTE BY from Hive, like Benchao mentioned,
which is used to distribute rows among reducers, i.e. focusing on the
shuffle during the computation. The FLIP is focusing more on storage, if I
am not mistaken. Have you considered using BUCKET BY directly?

2. According to the FLIP: " CREATE TABLE MyTable (uid BIGINT, name STRING)
DISTRIBUTED BY HASH(uid) INTO 6 BUCKETS

   - For advanced users, the algorithm can be defined explicitly.
   - Currently, either HASH() or RANGE().

"
Did you mean users can use their own algorithm? How to do it?

Best regards,
Jing

On Mon, Oct 30, 2023 at 11:13 AM Timo Walther  wrote:

> Let me reply to the feedback from Yunfan:
>
>  > Distribute by in DML is also supported by Hive
>
> I see DISTRIBUTED BY and DISTRIBUTE BY as two separate discussions. This
> discussion is about DDL. For DDL, we have more freedom as every vendor
> has custom syntax for CREATE TABLE clauses. Furthermore, this is tightly
> coupled to the connector implementation, not the engine. However, for
> DML we need to watch out for standard compliance and introduce changes
> with high caution.
>
> How a LookupTableSource interprets the DISTRIBUTED BY is
> connector-dependent in my opinion. In general this FLIP is a sink
> ability, but we could have a follow-up FLIP that helps in distributing the load
> of lookup joins.
>
>  > to avoid data skew problem
>
> I understand the use case and that it is important to solve it
> eventually. Maybe a solution might be to introduce helper Polymorphic
> Table Functions [1] in the future instead of new syntax.
>
> [1]
>
> https://www.ifis.uni-luebeck.de/~moeller/Lectures/WS-19-20/NDBDM/12-Literature/Michels-SQL-2016.pdf
>
>
> Let me reply to the feedback from Benchao:
>
>  > Do you think it's useful to add some extensibility for the hash
> strategy
>
> The hash strategy is fully determined by the connector, not the Flink
> SQL engine. We are not using Flink's hash strategy in any way. If the
> hash strategy for the regular Flink file system connector should be
> changed, this should be expressed via config option. Otherwise we should
> offer a dedicated `hive-filesystem` or `spark-filesystem` connector.
>
> Regards,
> Timo
>
>
> On 30.10.23 10:44, Timo Walther wrote:
> > Hi Jark,
> >
> > my intention was to avoid overly complex syntax in the first version. In
> > the past years, we could enable use cases also without this clause, so
> > we should be careful with overloading it with too much functionality in the
> > first version. We can still iterate on it later; the interfaces are
> > flexible enough to support more in the future.
> >
> > I agree that maybe an explicit HASH and RANGE doesn't harm. Also making
> > the bucket number optional.
> >
> > I updated the FLIP accordingly. Now the SupportsBucketing interface
> > declares more methods that help in validation and in providing helpful error
> > messages to users.
> >
> > Let me know what you think.
> >
> > Regards,
> > Timo
> >
> >
> > On 27.10.23 10:20, Jark Wu wrote:
> >> Hi Timo,
> >>
> >> Thanks for starting this discussion. I really like it!
> >> The FLIP is already in good shape, I only have some minor comments.
> >>
> >> 1. Could we also support HASH and RANGE distribution kind on the DDL
> >> syntax?
> >> I noticed that HASH and UNKNOWN are introduced in the Java API, but
> >> not in
> >> the syntax.
> >>
> >> 2. Can we make "INTO n BUCKETS" optional in CREATE TABLE and ALTER
> TABLE?
> >> Some storage engines support automatically determining the bucket number
> >> based on
> >> the cluster resources and data size of the table. For example,
> >> StarRocks[1]
> >> and Paimon[2].
> >>
> >> Best,
> >> Jark
> >>
> >> [1]:
> >>
> https://docs.starrocks.io/en-us/latest/table_design/Data_distribution#determine-the-number-of-buckets
> >> [2]:
> >>
> https://paimon.apache.org/docs/0.5/concepts/primary-key-table/#dynamic-bucket
> >>
> >> On Thu, 26 Oct 2023 at 18:26, Jingsong Li 
> wrote:
> >>
> >>> Very thanks Timo for starting this discussion.
> >>>
> >>> Big +1 for this.
> >>>
> >>> The design looks good to me!
> >>>
> >>> We can add some documentation for connector developers. For example:
> >>> for sink, If there needs some keyby, please finish the keyby by the
> >>> connector itself. SupportsBucketing is just a marker interface.
> >>>
> >>> Best,
> >>> Jingsong
> >>>
> >>> On Thu, Oct 26, 2023 at 5:00 PM Timo Walther 
> wrote:
> 
>  Hi everyone,
> 
>  I would like to start a discussion on FLIP-376: Add DISTRIBUTED BY
>  clause [1].
> 
>  Many SQL vendors expose the concepts of Partitioning, Bucketing, and
>  Clustering. This FLIP continues the work of previous FLIPs and would
>  like to introduce the concept of "Bucketing" to Flink.
> 
>  This is a pure connector characteristic and helps both Apache Kafka
> 

Re: Proposal for Implementing Keyed Watermarks in Apache Flink

2023-10-30 Thread Alexander Fedulov
Hi Tawfek,

> The idea is to generate a watermark for each key (sub-stream), in order
to avoid the fast progress of the global watermark which affects low-rate
sources.

Let's consider the sensors example from the paper. Shouldn't it be about
the delay between the time of taking the measurement and its arrival at
Flink, rather than the rate at which the measurements are produced? If a
particular sensor produces no data during a specific time window, it
doesn't make sense to wait for it—there won't be any corresponding
measurement arriving because none was produced. Thus, I believe we should
be talking about situations where data from certain sensors can arrive with
significant delay compared to most other sensors.

From the perspective of data aggregation, there are two main scenarios:
1) Calculation over data produced by multiple sensors
2) Calculation over data produced by an individual sensor

In scenario 1), there are two subcategories:
a) Meaningful results cannot be produced without data from those delayed
sensors; hence, you need to wait longer.
  => Time is propagated by the mix of all sources. You just need to set
a bounded watermark with enough lag to accommodate the delayed results.
This is precisely what event time processing and bounded watermarks are for
(no keyed watermarking is required).
b) You need to produce the results as they are and perhaps patch them later
when the delayed data arrives.
 => Time is propagated by the mix of all sources. You produce the
results as they are but utilize allowedLateness to patch the aggregates if
needed (no keyed watermarking is required).

So, is it correct to say that keyed watermarking is applicable only in
scenario 2)?
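
To make scenario 1) concrete, a minimal sketch with existing Flink APIs and no keyed watermarking: a bounded out-of-orderness watermark accommodates delayed sensors (1a), and allowedLateness patches already-emitted window results when very late data still arrives (1b). The SensorReading type and the delay values are illustrative only.

import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class BoundedWatermarkWithLatenessSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(
                        new SensorReading("sensor-1", 1_000L, 21.5),
                        new SensorReading("sensor-2", 90_000L, 19.0))
                // 1a) a bounded watermark with enough lag to cover delayed sensors
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy.<SensorReading>forBoundedOutOfOrderness(
                                        Duration.ofSeconds(30))
                                .withTimestampAssigner((r, ts) -> r.timestamp))
                .keyBy(r -> r.sensorId)
                .window(TumblingEventTimeWindows.of(Time.minutes(1)))
                // 1b) emit results as they are, patch them if very late data arrives
                .allowedLateness(Time.minutes(5))
                .maxBy("temperature")
                .print();

        env.execute("bounded-watermark-with-lateness-sketch");
    }

    /** Illustrative event type used only for this sketch. */
    public static class SensorReading {
        public String sensorId;
        public long timestamp;
        public double temperature;

        public SensorReading() {}

        public SensorReading(String sensorId, long timestamp, double temperature) {
            this.sensorId = sensorId;
            this.timestamp = timestamp;
            this.temperature = temperature;
        }
    }
}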

Best,
Alexander Fedulov

On Sat, 28 Oct 2023 at 14:33, Tawfek Yasser Tawfek 
wrote:

> Thanks, Alexander for your reply.
>
> Our solution initiated from this inquiry on Stack Overflow:
>
> https://stackoverflow.com/questions/52179898/does-flink-support-keyed-watermarks-if-not-is-there-any-plan-of-implementing-i
>
> The idea is to generate a watermark for each key (sub-stream), in order to
> avoid the fast progress of the global watermark which affects low-rate
> sources.
>
> Instead of using only one watermark (vanilla/global watermark), we changed
> the API to allow moving the keyBy() before the
> assignTimestampsAndWatermarks() so the stream will be partitioned then the
> TimestampsAndWatermarkOperator will handle the generation of each watermark
> for each key (source/sub-stream/partition).
>
> *Let's discuss more if you want I have a presentation at a conference, we
> can meet or whatever is suitable.*
>
> Also, I contacted David Anderson one year ago and he followed me step by
> step and helped me a lot.
>
> I attached some messages with David.
>
>
> *Thanks & BR,*
>
>
> 
>
>
>
>
>  Tawfik Yasser Tawfik
>
> * Teaching Assistant | AI-ITCS-NU*
>
>  Office: UB1-B, Room 229
>
>  26th of July Corridor, Sheikh Zayed City, Giza, Egypt
> --
> *From:* Alexander Fedulov 
> *Sent:* 27 October 2023 20:09
> *To:* dev@flink.apache.org 
> *Subject:* Re: Proposal for Implementing Keyed Watermarks in Apache Flink
>
>
> Hi Tawfek,
>
> Thanks for sharing. I am trying to understand what exact real-life problem
> you are tackling with this approach. My understanding from skimming through
> the paper is that you are concerned about some outlier event producers from
> which the events can be delayed beyond what is expected in the overall
> system.
> Do I get it correctly that the keyed watermarking only targets scenarios of
> calculating keyed windows (which are also keyed by the same producer ids)?
>
> Best,
> Alexander Fedulov
>
> On Fri, 27 Oct 2023 at 19:07, Tawfek Yasser Tawfek 
> wrote:
>
> > Dear Apache Flink Development Team,
> >
> > I hope this email finds you well. I propose an exciting new feature for
> > Apache Flink that has the potential to significantly enhance its
> > capabilities in handling unbounded streams of events, particularly in the
> > context of event-time windowing.
> >
> > As you may be aware, Apache Flink has been at the forefront of Big Data
> > Stream processing engines, leveraging windowing techniques to manage
> > unbounded event streams effectively. The accuracy of the results obtained
> > from these streams relies heavily on the ability to gather all relevant
> > input within a window. At the core of this process are watermarks, which
> > serve as unique timestamps marking the progression of events in time.
> >
> > However, our analysis has revealed a critical issue with the current
> > watermark generation method in Apache Flink. This method, which operates
> at
> > the input stream level, exhibits a bias towards faster sub-streams,
> > resulting in the unfortunate consequence of dropped events from slower
> > 

Re: [ANNOUNCE] Apache Flink Kubernetes Operator 1.6.1 released

2023-10-30 Thread Gyula Fóra
Thank you Rui for taking care of this!


On Mon, Oct 30, 2023 at 11:55 AM Rui Fan <1996fan...@gmail.com> wrote:

> The Apache Flink community is very happy to announce the release of Apache
> Flink Kubernetes Operator 1.6.1.
>
> Please check out the release blog post for an overview of the release:
>
> https://flink.apache.org/2023/10/27/apache-flink-kubernetes-operator-1.6.1-release-announcement/
>
> The release is available for download at:
> https://flink.apache.org/downloads.html
>
> Maven artifacts for Flink Kubernetes Operator can be found at:
>
> https://search.maven.org/artifact/org.apache.flink/flink-kubernetes-operator
>
> Official Docker image for Flink Kubernetes Operator applications can be
> found at:
> https://hub.docker.com/r/apache/flink-kubernetes-operator
>
> The full release notes are available in Jira:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353784
>
> We would like to thank all contributors of the Apache Flink community who
> made this release possible!
>
> Regards,
> Rui Fan
>


[ANNOUNCE] Apache Flink Kubernetes Operator 1.6.1 released

2023-10-30 Thread Rui Fan
The Apache Flink community is very happy to announce the release of Apache
Flink Kubernetes Operator 1.6.1.

Please check out the release blog post for an overview of the release:
https://flink.apache.org/2023/10/27/apache-flink-kubernetes-operator-1.6.1-release-announcement/

The release is available for download at:
https://flink.apache.org/downloads.html

Maven artifacts for Flink Kubernetes Operator can be found at:
https://search.maven.org/artifact/org.apache.flink/flink-kubernetes-operator

Official Docker image for Flink Kubernetes Operator applications can be
found at:
https://hub.docker.com/r/apache/flink-kubernetes-operator

The full release notes are available in Jira:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353784

We would like to thank all contributors of the Apache Flink community who
made this release possible!

Regards,
Rui Fan


[DISCUSS] AWS Connectors v4.2.0 release + 1.18 support

2023-10-30 Thread Danny Cranmer
Hello all,

I would like to start the discussion to release Apache Flink AWS connectors
v4.2.0. We released v4.1.0 over six months ago on 2023-04-03. Since then we
have resolved 23 issues [1]. Additionally, now that Flink 1.18 is live, we need
to add support for it. I am proposing we skip 4.1.0 for Flink 1.18 and go
straight to 4.2.0. The CI is stable [2].

I volunteer myself as the release manager.

Thanks,
Danny

[1]
https://issues.apache.org/jira/browse/FLINK-33021?jql=statusCategory%20%3D%20done%20AND%20project%20%3D%2012315522%20AND%20fixVersion%20%3D%2012353011%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC
[2] https://github.com/apache/flink-connector-aws/actions


Re: [DISCUSS] Promote SinkV2 to @Public and deprecate SinkFunction

2023-10-30 Thread Martijn Visser
Hi everyone,

I would like to +1 Gordon's proposal to close FLINK-30238, create a
new follow-up ticket and try to address the specific
PostCommitTopology in the work that's currently being done by Peter on
SinkV2. If there's no feedback on this topic, I assume everyone's OK
with that.

Best regards,

Martijn

On Fri, Sep 29, 2023 at 8:11 AM Tzu-Li (Gordon) Tai  wrote:
>
> Hi everyone,
>
> It’s been a while since this topic was last discussed, but nevertheless, it
> still remains very desirable to figure out a clear path towards making
> SinkV2 @Public.
>
> There’s a new thread [1] that has a pretty good description on missing
> features in SinkV2 from the Iceberg connector’s perspective, which prevents
> them from migrating - anything related to those new requirements, let's
> discuss there.
>
> Nevertheless, I think we should also revive and reuse this thread to reach
> consensus / closure on all concerns already brought up here.
>
> It’s quite a long thread where a lot of various concerns were brought up,
> but I’d start by addressing two very specific ones: FLIP-287 [2] and
> FLINK-30238 [3]
>
> First of all, FLIP-287 has been approved and merged already, and will be
> released with 1.18.0. So, connector migrations that were waiting on this
> should hopefully be unblocked after the release. So this seems to no longer
> be a concern - let’s see things through with those connectors actually
> being migrated.
>
> FLINK-30238 is sort of a confusing one, and I do believe it is (partially)
> a false alarm. After looking into this, the problem reported there
> essentially breaks down to two things:
> 1) TwoPhaseCommittingSink is unable to take a new savepoint after restoring
> from a savepoint generated via `stop-with-savepoint --drain`
> 2) SinkV2 sinks with a PostCommitTopology do not properly have post-commits
> completed after a stop-with-savepoint operation, since committed
> commitables are not emitted to the post-commit topology after the committer
> receives the end-of-input signal.
>
> My latest comment in [3] explains this in a bit more detail.
>
> I believe we can conclude that problem 1) is a non-concern - users should
> not restore from a job that is drained on stop-with-savepoint and cannot
> expect the restored job to function normally.
> Problem 2) remains a real issue though, and to help clear things up I think
> we should close FLINK-30238 in favor of a new ticket scoped to the specific
> PostCommitTopology problem.
>
> The other open concerns seem to mostly be around graduation criteria and
> process - I've yet to go through those and will follow up with a separate
> reply (or perhaps Martijn can help wrap up that part?).
>
> Thanks,
> Gordon
>
> [1] https://lists.apache.org/thread/h3kg7jcgjstpvwlhnofq093vk93ylgsn
> [2]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240880853
> [3] https://issues.apache.org/jira/browse/FLINK-30238
>
> On Mon, Feb 13, 2023 at 2:50 AM Jing Ge  wrote:
>
> > @Martijn
> >
> > What I shared previously is the fact of the current KafkaSink. Following
> > your suggestion, the KafkaSink should still be marked as @Experimental for
> > now, which will need an even longer time to graduate. BTW, KafkaSink does not
> > depend on any @Internal interfaces. The @Internal is used for methods
> > coming from @PublicEvolving SinkV2 interfaces, not the interfaces themselves.
> > Thanks for bringing this topic up. Currently, there is no rule defined to
> > say that no @Internal is allowed for methods implemented
> > from @PublicEvolving interfaces. Further (off-this-topic) discussion might
> > be required to check if it really makes sense to define such a rule, since
> > some methods defined in interfaces might only be used internally, i.e. no
> > connector user would be aware of them.
> >
> > @Dong
> >
> > I agree with everything you said and especially can't agree more to let
> > developers who will own it make the decision.
> >
> > Best regards,
> > Jing
> >
> >
> > On Sun, Feb 12, 2023 at 2:53 AM Dong Lin  wrote:
> >
> > > Hi Martijn,
> > >
> > > Thanks for the reply. Please see my comments inline.
> > >
> > > Regards,
> > > Dong
> > >
> > > On Sat, Feb 11, 2023 at 4:31 AM Martijn Visser  > >
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I wanted to get back on a couple of comments and have a proposal at the
> > > end
> > > > for the next steps:
> > > >
> > > > @Steven Wu
> > > > If I look back at the original discussion of FLIP-191 and also the
> > thread
> > > > that you're referring to, it appears from the discussion with Yun Gao
> > > that
> > > > the solution was in near-sight, but just not finished. Perhaps it needs
> > > to
> > > > be restarted once more so it can be brought to a closure. Also when I
> > > > looked back at the original introduction of SinkV2, there was
> > FLINK-25726
> > > > [1] which looks like it was built specifically for Iceberg and Delta
> > > Sink?
> > > >
> > > > @Jing
> > > > > All potential unstable methods coming from SinkV2 

[jira] [Created] (FLINK-33399) Support switching from batch to stream mode for KeyedCoProcessOperator and IntervalJoinOperator

2023-10-30 Thread Xuannan Su (Jira)
Xuannan Su created FLINK-33399:
--

 Summary: Support switching from batch to stream mode for 
KeyedCoProcessOperator and IntervalJoinOperator
 Key: FLINK-33399
 URL: https://issues.apache.org/jira/browse/FLINK-33399
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Task
Reporter: Xuannan Su


Support switching from batch to stream mode for KeyedCoProcessOperator and 
IntervalJoinOperator.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33398) Support switching from batch to stream mode for one input stream operator

2023-10-30 Thread Xuannan Su (Jira)
Xuannan Su created FLINK-33398:
--

 Summary: Support switching from batch to stream mode for one input 
stream operator
 Key: FLINK-33398
 URL: https://issues.apache.org/jira/browse/FLINK-33398
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Task
Reporter: Xuannan Su


Introduce the infra to support switching from batch to stream mode for one 
input stream operator.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Apache Flink Kafka connector version 3.0.1, RC1

2023-10-30 Thread Ahmed Hamdy
+1 (non-binding)
- Verified Signatures
- Verified Checksum
- Built source successfully
- Checked release tag exists
- Reviewed the web PR
Best Regards
Ahmed Hamdy


On Sun, 29 Oct 2023 at 08:02, Leonard Xu  wrote:

> +1 (binding)
>
> - Verified signatures
> - Verified hashsums
> - Checked Github release tag
> - Built from source code succeeded
> - Checked release notes
> - Reviewed the web PR
>
> Best,
> Leonard
>
>
> > On Oct 29, 2023 at 11:34 AM, mystic lama  wrote:
> >
> > +1 (non-binding)
> >
> > - verified signatures
> > - build with Java 8 and Java 11 - build success
> >
> > Minor observation
> > - RAT check flagged that README.md is missing ASL
> >
> > On Fri, 27 Oct 2023 at 23:40, Xianxun Ye 
> wrote:
> >
> >> +1(non-binding)
> >>
> >> - Started a local Flink 1.18 cluster, read and wrote with Kafka and
> Upsert
> >> Kafka connector successfully to Kafka 2.2 cluster
> >>
> >> One minor question: should we update the dependency section of these two
> >> documentation pages [1][2]?
> >>
> >> [1]
> >>
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/kafka/#dependencies
> >> [2]
> >>
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/upsert-kafka/#dependencies
> >>
> >> Best regards,
> >> Xianxun
> >>
> >>> On Oct 26, 2023 at 16:12, Martijn Visser  wrote:
> >>>
> >>> +1 (binding)
> >>>
> >>> - Validated hashes
> >>> - Verified signature
> >>> - Verified that no binaries exist in the source archive
> >>> - Build the source with Maven via mvn clean install
> >>> -Pcheck-convergence -Dflink.version=1.18.0
> >>> - Verified licenses
> >>> - Verified web PR
> >>> - Started a cluster and the Flink SQL client, successfully read and
> >>> wrote with the Kafka connector to Confluent Cloud with AVRO and Schema
> >>> Registry enabled
> >>>
> >>> On Thu, Oct 26, 2023 at 5:09 AM Qingsheng Ren 
> wrote:
> 
>  +1 (binding)
> 
>  - Verified signature and checksum
>  - Verified that no binary exists in the source archive
>  - Built from source with Java 8 using -Dflink.version=1.18
>  - Started a local Flink 1.18 cluster, submitted jobs with SQL client
>  reading from and writing (with exactly-once) to Kafka 3.2.3 cluster
>  - Nothing suspicious in LICENSE and NOTICE file
>  - Reviewed web PR
> 
>  Thanks for the effort, Gordon!
> 
>  Best,
>  Qingsheng
> 
>  On Thu, Oct 26, 2023 at 5:13 AM Tzu-Li (Gordon) Tai <
> >> tzuli...@apache.org>
>  wrote:
> 
> > Hi everyone,
> >
> > Please review and vote on release candidate #1 for version 3.0.1 of
> the
> > Apache Flink Kafka Connector, as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > This release contains important changes for the following:
> > - Supports Flink 1.18.x series
> > - [FLINK-28303] EOS violation when using LATEST_OFFSETS startup mode
> > - [FLINK-33231] Memory leak causing OOM when there are no offsets to
> >> commit
> > back to Kafka
> > - [FLINK-28758] FlinkKafkaConsumer fails to stop with savepoint
> >
> > The release candidate contains the source release as well as JAR
> >> artifacts
> > to be released to Maven, built against Flink 1.17.1 and 1.18.0.
> >
> > The complete staging area is available for your review, which
> includes:
> > * JIRA release notes [1],
> > * the official Apache source release to be deployed to
> dist.apache.org
> > [2],
> > which are signed with the key with fingerprint
> > 1C1E2394D3194E1944613488F320986D35C33D6A [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag v3.0.1-rc1 [5],
> > * website pull request listing the new release [6].
> >
> > The vote will be open for at least 72 hours. It is adopted by
> majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Thanks,
> > Gordon
> >
> > [1]
> >
> >
> >>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352910
> > [2]
> >
> >
> >>
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-kafka-3.0.1-rc1/
> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [4]
> >> https://repository.apache.org/content/repositories/orgapacheflink-1664
> > [5]
> https://github.com/apache/flink-connector-kafka/commits/v3.0.1-rc1
> > [6] https://github.com/apache/flink-web/pull/692
> >
> >>
> >>
>
>


Re: [DISCUSS] FLIP-376: Add DISTRIBUTED BY clause

2023-10-30 Thread Timo Walther

Let me reply to the feedback from Yunfan:

> Distribute by in DML is also supported by Hive

I see DISTRIBUTED BY and DISTRIBUTE BY as two separate discussions. This 
discussion is about DDL. For DDL, we have more freedom as every vendor 
has custom syntax for CREATE TABLE clauses. Furthermore, this is tightly 
coupled to the connector implementation, not the engine. However, for 
DML we need to watch out for standard compliance and introduce changes 
with great caution.


How a LookupTableSource interprets the DISTRIBUTED BY is 
connector-dependent in my opinion. In general, this FLIP is a sink 
ability, but we could have a follow-up FLIP that helps in distributing the 
load of lookup joins.


> to avoid data skew problem

I understand the use case and that it is important to solve it 
eventually. A solution might be to introduce helper Polymorphic 
Table Functions [1] in the future instead of new syntax.


[1] 
https://www.ifis.uni-luebeck.de/~moeller/Lectures/WS-19-20/NDBDM/12-Literature/Michels-SQL-2016.pdf



Let me reply to the feedback from Benchao:

> Do you think it's useful to add some extensibility for the hash
strategy

The hash strategy is fully determined by the connector, not the Flink 
SQL engine. We are not using Flink's hash strategy in any way. If the 
hash strategy for the regular Flink file system connector should be 
changed, this should be expressed via a config option. Otherwise we should 
offer a dedicated `hive-filesystem` or `spark-filesystem` connector.
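
If such a switch were ever added, it would presumably be a plain connector
option surfaced through the connector's factory, roughly like the following
sketch (the factory, option name and default are purely illustrative, not an
existing connector):

import java.util.HashSet;
import java.util.Set;

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;
import org.apache.flink.table.connector.sink.DynamicTableSink;
import org.apache.flink.table.factories.DynamicTableSinkFactory;

public class MyBucketedSinkFactory implements DynamicTableSinkFactory {

    // Hypothetical option controlling which hash function assigns rows to buckets,
    // e.g. to stay compatible with data bucketed by Hive or Spark.
    public static final ConfigOption<String> BUCKET_HASH_FUNCTION =
            ConfigOptions.key("sink.bucket-hash-function")
                    .stringType()
                    .defaultValue("default")
                    .withDescription("Hash function used to assign rows to buckets.");

    @Override
    public String factoryIdentifier() {
        return "my-bucketed-sink";
    }

    @Override
    public Set<ConfigOption<?>> requiredOptions() {
        return new HashSet<>();
    }

    @Override
    public Set<ConfigOption<?>> optionalOptions() {
        final Set<ConfigOption<?>> options = new HashSet<>();
        options.add(BUCKET_HASH_FUNCTION);
        return options;
    }

    @Override
    public DynamicTableSink createDynamicTableSink(Context context) {
        // sink creation omitted in this sketch
        throw new UnsupportedOperationException("sketch only");
    }
}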


Regards,
Timo


On 30.10.23 10:44, Timo Walther wrote:

Hi Jark,

my intention was to avoid overly complex syntax in the first version. In 
the past years, we could enable use cases even without this clause, so 
we should be careful about overloading it with too much functionality in the 
first version. We can still iterate on it later; the interfaces are 
flexible enough to support more in the future.


I agree that maybe an explicit HASH and RANGE doesn't harm. Also making 
the bucket number optional.


I updated the FLIP accordingly. Now the SupportsBucketing interface 
declares more methods that help in validation and in providing helpful error 
messages to users.
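
As a quick illustration of what the updated syntax could look like from the API
side (the exact keywords and optionality are my reading of the updated FLIP,
not released syntax; connector options are shortened):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

// explicit HASH algorithm with a fixed bucket count
tEnv.executeSql(
        "CREATE TABLE Orders (uid BIGINT, name STRING) "
                + "DISTRIBUTED BY HASH(uid) INTO 6 BUCKETS "
                + "WITH ('connector' = 'kafka')");

// algorithm and bucket count left to the connector
tEnv.executeSql(
        "CREATE TABLE OrdersAuto (uid BIGINT, name STRING) "
                + "DISTRIBUTED BY (uid) "
                + "WITH ('connector' = 'kafka')");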


Let me know what you think.

Regards,
Timo


On 27.10.23 10:20, Jark Wu wrote:

Hi Timo,

Thanks for starting this discussion. I really like it!
The FLIP is already in good shape, I only have some minor comments.

1. Could we also support HASH and RANGE distribution kind on the DDL
syntax?
I noticed that HASH and UNKNOWN are introduced in the Java API, but not in 
the syntax.

2. Can we make "INTO n BUCKETS" optional in CREATE TABLE and ALTER TABLE?
Some storage engines support automatically determining the bucket number 
based on the cluster resources and data size of the table. For example, 
StarRocks[1] and Paimon[2].

Best,
Jark

[1]:
https://docs.starrocks.io/en-us/latest/table_design/Data_distribution#determine-the-number-of-buckets
[2]:
https://paimon.apache.org/docs/0.5/concepts/primary-key-table/#dynamic-bucket

On Thu, 26 Oct 2023 at 18:26, Jingsong Li  wrote:


Very thanks Timo for starting this discussion.

Big +1 for this.

The design looks good to me!

We can add some documentation for connector developers. For example:
for sinks, if a keyBy is needed, the connector itself should perform the keyBy.
SupportsBucketing is just a marker interface.

Best,
Jingsong

On Thu, Oct 26, 2023 at 5:00 PM Timo Walther  wrote:


Hi everyone,

I would like to start a discussion on FLIP-376: Add DISTRIBUTED BY
clause [1].

Many SQL vendors expose the concepts of Partitioning, Bucketing, and
Clustering. This FLIP continues the work of previous FLIPs and would
like to introduce the concept of "Bucketing" to Flink.

This is a pure connector characteristic and helps both Apache Kafka and
Apache Paimon connectors in avoiding a complex WITH clause by providing
improved syntax.

Here is an example:

CREATE TABLE MyTable
(
  uid BIGINT,
  name STRING
)
DISTRIBUTED BY (uid) INTO 6 BUCKETS
WITH (
  'connector' = 'kafka'
)

The full syntax specification can be found in the document. The clause
should be optional and fully backwards compatible.

Regards,
Timo

[1]


https://cwiki.apache.org/confluence/display/FLINK/FLIP-376%3A+Add+DISTRIBUTED+BY+clause









Re: [DISCUSS] FLIP-376: Add DISTRIBUTED BY clause

2023-10-30 Thread Timo Walther

Hi Yunfan and Benchao,

it seems the FLIP discussion thread got split into two parts. At least 
this is what I see in my mail program. I would kindly ask to answer in 
the other thread [1].


I will also reply there now to maintain the discussion link.

Regards,
Timo

[1] https://lists.apache.org/thread/z1p4f38x9ohv29y4nlq8g1wdxwfjwxd1



On 28.10.23 10:34, Benchao Li wrote:

Thanks Timo for preparing the FLIP.

Regarding "By default, DISTRIBUTED BY assumes a list of columns for an
implicit hash partitioning."
Do you think it's useful to add some extensibility for the hash
strategy? One scenario I can foresee is writing bucketed data into
Hive: if Flink's hash strategy is different from Hive/Spark's,
they could not utilize the bucketed data written by Flink. This
is a case I have already met in production, and there may be more cases
like this that need a customized hash strategy to accommodate
existing systems.

On Fri, Oct 27, 2023 at 19:06, yunfan zhang  wrote:


DISTRIBUTE BY in DML is also supported by Hive,
and it is also useful for Flink.
Users can use this ability to increase the cache hit rate in lookup joins.
And users can use "distribute by key, rand(1, 10)" to avoid data skew problems.
I also think it is another way to solve FLIP-204 [1].
There are already some people who have requested this feature [2].

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-204%3A+Introduce+Hash+Lookup+Join
[2] https://issues.apache.org/jira/browse/FLINK-27541

On 2023/10/27 08:20:25 Jark Wu wrote:

Hi Timo,

Thanks for starting this discussion. I really like it!
The FLIP is already in good shape, I only have some minor comments.

1. Could we also support HASH and RANGE distribution kind on the DDL
syntax?
I noticed that HASH and UNKNOWN are introduced in the Java API, but not in
the syntax.

2. Can we make "INTO n BUCKETS" optional in CREATE TABLE and ALTER TABLE?
Some storage engines support automatically determining the bucket number
based on
the cluster resources and data size of the table. For example, StarRocks[1]
and Paimon[2].

Best,
Jark

[1]:
https://docs.starrocks.io/en-us/latest/table_design/Data_distribution#determine-the-number-of-buckets
[2]:
https://paimon.apache.org/docs/0.5/concepts/primary-key-table/#dynamic-bucket

On Thu, 26 Oct 2023 at 18:26, Jingsong Li  wrote:


Very thanks Timo for starting this discussion.

Big +1 for this.

The design looks good to me!

We can add some documentation for connector developers. For example:
for sinks, if a keyBy is needed, the connector itself should perform the keyBy.
SupportsBucketing is just a marker interface.

Best,
Jingsong

On Thu, Oct 26, 2023 at 5:00 PM Timo Walther  wrote:


Hi everyone,

I would like to start a discussion on FLIP-376: Add DISTRIBUTED BY
clause [1].

Many SQL vendors expose the concepts of Partitioning, Bucketing, and
Clustering. This FLIP continues the work of previous FLIPs and would
like to introduce the concept of "Bucketing" to Flink.

This is a pure connector characteristic and helps both Apache Kafka and
Apache Paimon connectors in avoiding a complex WITH clause by providing
improved syntax.

Here is an example:

CREATE TABLE MyTable
(
  uid BIGINT,
  name STRING
)
DISTRIBUTED BY (uid) INTO 6 BUCKETS
WITH (
  'connector' = 'kafka'
)

The full syntax specification can be found in the document. The clause
should be optional and fully backwards compatible.

Regards,
Timo

[1]


https://cwiki.apache.org/confluence/display/FLINK/FLIP-376%3A+Add+DISTRIBUTED+BY+clause











Re: [DISCUSS] FLIP-376: Add DISTRIBUTED BY clause

2023-10-30 Thread Timo Walther

Hi Jark,

my intention was to avoid overly complex syntax in the first version. In 
the past years, we could enable use cases even without this clause, so 
we should be careful about overloading it with too much functionality in the 
first version. We can still iterate on it later; the interfaces are 
flexible enough to support more in the future.


I agree that maybe an explicit HASH and RANGE doesn't harm. Also making 
the bucket number optional.


I updated the FLIP accordingly. Now the SupportsBucketing interface 
declares more methods that help in validation and in providing helpful error 
messages to users.


Let me know what you think.

Regards,
Timo


On 27.10.23 10:20, Jark Wu wrote:

Hi Timo,

Thanks for starting this discussion. I really like it!
The FLIP is already in good shape, I only have some minor comments.

1. Could we also support HASH and RANGE distribution kind on the DDL
syntax?
I noticed that HASH and UNKNOWN are introduced in the Java API, but not in
the syntax.

2. Can we make "INTO n BUCKETS" optional in CREATE TABLE and ALTER TABLE?
Some storage engines support automatically determining the bucket number
based on
the cluster resources and data size of the table. For example, StarRocks[1]
and Paimon[2].

Best,
Jark

[1]:
https://docs.starrocks.io/en-us/latest/table_design/Data_distribution#determine-the-number-of-buckets
[2]:
https://paimon.apache.org/docs/0.5/concepts/primary-key-table/#dynamic-bucket

On Thu, 26 Oct 2023 at 18:26, Jingsong Li  wrote:


Very thanks Timo for starting this discussion.

Big +1 for this.

The design looks good to me!

We can add some documentation for connector developers. For example:
for sinks, if a keyBy is needed, the connector itself should perform the keyBy.
SupportsBucketing is just a marker interface.

Best,
Jingsong

On Thu, Oct 26, 2023 at 5:00 PM Timo Walther  wrote:


Hi everyone,

I would like to start a discussion on FLIP-376: Add DISTRIBUTED BY
clause [1].

Many SQL vendors expose the concepts of Partitioning, Bucketing, and
Clustering. This FLIP continues the work of previous FLIPs and would
like to introduce the concept of "Bucketing" to Flink.

This is a pure connector characteristic and helps both Apache Kafka and
Apache Paimon connectors in avoiding a complex WITH clause by providing
improved syntax.

Here is an example:

CREATE TABLE MyTable
(
  uid BIGINT,
  name STRING
)
DISTRIBUTED BY (uid) INTO 6 BUCKETS
WITH (
  'connector' = 'kafka'
)

The full syntax specification can be found in the document. The clause
should be optional and fully backwards compatible.

Regards,
Timo

[1]


https://cwiki.apache.org/confluence/display/FLINK/FLIP-376%3A+Add+DISTRIBUTED+BY+clause







[jira] [Created] (FLINK-33397) FLIP-373: Support Configuring Different State TTLs using SQL Hint

2023-10-30 Thread Jane Chan (Jira)
Jane Chan created FLINK-33397:
-

 Summary: FLIP-373: Support Configuring Different State TTLs using 
SQL Hint
 Key: FLINK-33397
 URL: https://issues.apache.org/jira/browse/FLINK-33397
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / API
Affects Versions: 1.19.0
Reporter: Jane Chan
 Fix For: 1.19.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-30 Thread Hang Ruan
Thanks for the improvements, Jiabao.

There are some details that I am not sure about.
1. Which class will the new option `source.filter-push-down.enabled` be added
to? I think it should be `SourceReaderOptions`.
2. How will connector developers be able to know about and follow the FLIP? Do we
need an abstract base class, or should we provide a default method?
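
For 2., a rough sketch of what a connector could do on its own (the option name
comes from the FLIP proposal; the class and the wiring are assumptions, not an
existing API): the source keeps implementing SupportsFilterPushDown but simply
accepts no filters when the option is disabled.

import java.util.Collections;
import java.util.List;

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;
import org.apache.flink.configuration.ReadableConfig;
import org.apache.flink.table.connector.source.abilities.SupportsFilterPushDown;
import org.apache.flink.table.expressions.ResolvedExpression;

// Illustrative only: a real source would also implement ScanTableSource etc.
public class MyFilterableSource implements SupportsFilterPushDown {

    public static final ConfigOption<Boolean> FILTER_PUSH_DOWN_ENABLED =
            ConfigOptions.key("source.filter-push-down.enabled")
                    .booleanType()
                    .defaultValue(true)
                    .withDescription("Whether filters may be pushed into this source.");

    private final boolean filterPushDownEnabled;

    public MyFilterableSource(ReadableConfig options) {
        this.filterPushDownEnabled = options.get(FILTER_PUSH_DOWN_ENABLED);
    }

    @Override
    public Result applyFilters(List<ResolvedExpression> filters) {
        if (!filterPushDownEnabled) {
            // accept nothing; Flink keeps evaluating all filters itself
            return Result.of(Collections.emptyList(), filters);
        }
        // otherwise translate the filters into the external system's query
        return Result.of(filters, Collections.emptyList());
    }
}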

Best,
Hang

On Mon, Oct 30, 2023 at 14:45, Jiabao Sun  wrote:

> Hi, all,
>
> Thanks for the lively discussion.
>
> Based on the discussion, I have made some adjustments to the FLIP document:
>
> 1. The name of the newly added option has been changed to
> "source.filter-push-down.enabled".
> 2. Considering compatibility with older versions, the newly added
> "source.filter-push-down.enabled" option needs to respect the optimizer's
> "table.optimizer.source.predicate-pushdown-enabled" option.
> However, we are considering removing the old option in Flink 2.0.
> 3. We can provide more options to disable other source abilities with side
> effects, such as "source.aggregate.enabled" and "source.projection.enabled".
> This is not urgent and can be introduced incrementally.
>
> Looking forward to your feedback again.
>
> Best,
> Jiabao
>
>
> > On Oct 29, 2023 at 08:45, Becket Qin  wrote:
> >
> > Thanks for digging into the git history, Jark. I agree it makes sense to
> > deprecate this API in 2.0.
> >
> > Cheers,
> >
> > Jiangjie (Becket) Qin
> >
> > On Fri, Oct 27, 2023 at 5:47 PM Jark Wu  wrote:
> >
> >> Hi Becket,
> >>
> >> I checked the history of "
> >> *table.optimizer.source.predicate-pushdown-enabled*",
> >> it seems it was introduced since the legacy FilterableTableSource
> >> interface
> >> which might be an experiential feature at that time. I don't see the
> >> necessity
> >> of this option at the moment. Maybe we can deprecate this option and
> drop
> >> it
> >> in Flink 2.0[1] if it is not necessary anymore. This may help to
> >> simplify this discussion.
> >>
> >>
> >> Best,
> >> Jark
> >>
> >> [1]: https://issues.apache.org/jira/browse/FLINK-32383
> >>
> >>
> >>
> >> On Thu, 26 Oct 2023 at 10:14, Becket Qin  wrote:
> >>
> >>> Thanks for the proposal, Jiabao. My two cents below:
> >>>
> >>> 1. If I understand correctly, the motivation of the FLIP is mainly to
> >>> make predicate pushdown optional on SOME of the Sources. If so,
> intuitively
> >>> the configuration should be Source specific instead of general.
> Otherwise,
> >>> we will end up with general configurations that may not take effect for
> >>> some of the Source implementations. This violates the basic rule of a
> >>> configuration - it does what it says, regardless of the implementation.
> >>> While configuration standardization is usually a good thing, it should
> not
> >>> break the basic rules.
> >>> If we really want to have this general configuration, for the sources
> >>> this configuration does not apply, they should throw an exception to
> make
> >>> it clear that this configuration is not supported. However, that seems
> ugly.
> >>>
> >>> 2. I think the actual motivation of this FLIP is about "how a source
> >>> should implement predicate pushdown efficiently", not "whether
> predicate
> >>> pushdown should be applied to the source." For example, if a source
> wants
> >>> to avoid additional computing load in the external system, it can
> always
> >>> read the entire record and apply the predicates by itself. However,
> from
> >>> the Flink perspective, the predicate pushdown is applied, it is just
> >>> implemented differently by the source. So the design principle here is
> that
> >>> Flink only cares about whether a source supports predicate pushdown or
> not,
> >>> it does not care about the implementation efficiency / side effect of
> the
> >>> predicates pushdown. It is the Source implementation's responsibility
> to
> >>> ensure the predicates pushdown is implemented efficiently and does not
> >>> impose excessive pressure on the external system. And it is OK to have
> >>> additional configurations to achieve this goal. Obviously, such
> >>> configurations will be source specific in this case.
> >>>
> >>> 3. Regarding the existing configurations of
> *table.optimizer.source.predicate-pushdown-enabled.
> >>> *I am not sure why we need it. Supposedly, if a source implements a
> >>> SupportsXXXPushDown interface, the optimizer should push the
> corresponding
> >>> predicates to the Source. I am not sure in which case this
> configuration
> >>> would be used. Any ideas @Jark Wu ?
> >>>
> >>> Thanks,
> >>>
> >>> Jiangjie (Becket) Qin
> >>>
> >>>
> >>> On Wed, Oct 25, 2023 at 11:55 PM Jiabao Sun
> >>>  wrote:
> >>>
>  Thanks Jane for the detailed explanation.
> 
>  I think that for users, we should respect conventions over
>  configurations.
>  Conventions can be default values explicitly specified in
>  configurations, or they can be behaviors that follow previous
> versions.
>  If the same code has different behaviors in different versions, it
> would
>  be a very 

[jira] [Created] (FLINK-33396) The table alias when using join hints should be removed in the final plan

2023-10-30 Thread xuyang (Jira)
xuyang created FLINK-33396:
--

 Summary: The table alias when using join hints should be removed 
in the final plan
 Key: FLINK-33396
 URL: https://issues.apache.org/jira/browse/FLINK-33396
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.17.0, 1.16.0, 1.18.0
Reporter: xuyang


See the existing test 
'NestLoopJoinHintTest#testJoinHintWithJoinHintInCorrelateAndWithAgg', the plan 
is 
{code:java}
HashJoin(joinType=[LeftSemiJoin], where=[=(a1, EXPR$0)], select=[a1, b1], 
build=[right], tryDistinctBuildRow=[true])
:- Exchange(distribution=[hash[a1]])
:  +- TableSourceScan(table=[[default_catalog, default_database, T1]], 
fields=[a1, b1])
+- Exchange(distribution=[hash[EXPR$0]])
   +- LocalHashAggregate(groupBy=[EXPR$0], select=[EXPR$0])
  +- Calc(select=[EXPR$0])
 +- HashAggregate(isMerge=[true], groupBy=[a1], select=[a1, 
Final_COUNT(count$0) AS EXPR$0])
+- Exchange(distribution=[hash[a1]])
   +- LocalHashAggregate(groupBy=[a1], select=[a1, 
Partial_COUNT(a2) AS count$0])
  +- NestedLoopJoin(joinType=[InnerJoin], where=[=(a2, a1)], 
select=[a2, a1], build=[right])
 :- TableSourceScan(table=[[default_catalog, 
default_database, T2, project=[a2], metadata=[]]], fields=[a2], hints=[[[ALIAS 
options:[T2)
 +- Exchange(distribution=[broadcast])
+- TableSourceScan(table=[[default_catalog, 
default_database, T1, project=[a1], metadata=[]]], fields=[a1], hints=[[[ALIAS 
options:[T1)  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33395) The join hint doesn't work when it appears in a subquery

2023-10-30 Thread xuyang (Jira)
xuyang created FLINK-33395:
--

 Summary: The join hint doesn't work when it appears in a subquery
 Key: FLINK-33395
 URL: https://issues.apache.org/jira/browse/FLINK-33395
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.17.0, 1.16.0, 1.18.0
Reporter: xuyang


See the existing test 
'NestLoopJoinHintTest#testJoinHintWithJoinHintInCorrelateAndWithAgg', the test 
plan is 
{code:java}
HashJoin(joinType=[LeftSemiJoin], where=[=(a1, EXPR$0)], select=[a1, b1], 
build=[right], tryDistinctBuildRow=[true])
:- Exchange(distribution=[hash[a1]])
:  +- TableSourceScan(table=[[default_catalog, default_database, T1]], 
fields=[a1, b1])
+- Exchange(distribution=[hash[EXPR$0]])
   +- LocalHashAggregate(groupBy=[EXPR$0], select=[EXPR$0])
  +- Calc(select=[EXPR$0])
 +- HashAggregate(isMerge=[true], groupBy=[a1], select=[a1, 
Final_COUNT(count$0) AS EXPR$0])
+- Exchange(distribution=[hash[a1]])
   +- LocalHashAggregate(groupBy=[a1], select=[a1, 
Partial_COUNT(a2) AS count$0])
  +- NestedLoopJoin(joinType=[InnerJoin], where=[=(a2, a1)], 
select=[a2, a1], build=[right])
 :- TableSourceScan(table=[[default_catalog, 
default_database, T2, project=[a2], metadata=[]]], fields=[a2], hints=[[[ALIAS 
options:[T2)
 +- Exchange(distribution=[broadcast])
+- TableSourceScan(table=[[default_catalog, 
default_database, T1, project=[a1], metadata=[]]], fields=[a1], hints=[[[ALIAS 
options:[T1) {code}
but the NestedLoopJoin should broadcast the left side.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33394) DataGeneratorSourceITCase.testGatedRateLimiter fails on AZP

2023-10-30 Thread Sergey Nuyanzin (Jira)
Sergey Nuyanzin created FLINK-33394:
---

 Summary: DataGeneratorSourceITCase.testGatedRateLimiter fails on 
AZP
 Key: FLINK-33394
 URL: https://issues.apache.org/jira/browse/FLINK-33394
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Common
Affects Versions: 1.17.2
Reporter: Sergey Nuyanzin


This build 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=54054=logs=5cae8624-c7eb-5c51-92d3-4d2dacedd221=5acec1b4-945b-59ca-34f8-168928ce5199=22927
fails on AZP as
{noformat}
Oct 26 07:37:41 [1L, 1L, 1L, 1L, 1L, 1L]
Oct 26 07:37:41 at 
org.apache.flink.connector.datagen.source.DataGeneratorSourceITCase.testGatedRateLimiter(DataGeneratorSourceITCase.java:200)
Oct 26 07:37:41 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
Oct 26 07:37:41 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Oct 26 07:37:41 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Oct 26 07:37:41 at java.lang.reflect.Method.invoke(Method.java:498)

{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33393) flink document description error

2023-10-30 Thread Jira
蔡灿材 created FLINK-33393:
---

 Summary: flink document description error
 Key: FLINK-33393
 URL: https://issues.apache.org/jira/browse/FLINK-33393
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.17.1
Reporter: 蔡灿材
 Fix For: 1.17.1
 Attachments: 捕获.PNG

flink document description error, function part description error



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33392) Add the documentation page for balanced tasks scheduling

2023-10-30 Thread RocMarshal (Jira)
RocMarshal created FLINK-33392:
--

 Summary: Add the documentation page for balanced tasks scheduling
 Key: FLINK-33392
 URL: https://issues.apache.org/jira/browse/FLINK-33392
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Task
Reporter: RocMarshal






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33391) Support tasks balancing at TM level for Adaptive Scheduler

2023-10-30 Thread RocMarshal (Jira)
RocMarshal created FLINK-33391:
--

 Summary: Support tasks balancing at TM level for Adaptive Scheduler
 Key: FLINK-33391
 URL: https://issues.apache.org/jira/browse/FLINK-33391
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Task
Reporter: RocMarshal






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33390) Support slot balancing at TM level for Adaptive Scheduler

2023-10-30 Thread RocMarshal (Jira)
RocMarshal created FLINK-33390:
--

 Summary: Support slot balancing at TM level for Adaptive Scheduler
 Key: FLINK-33390
 URL: https://issues.apache.org/jira/browse/FLINK-33390
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Task
Reporter: RocMarshal






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33389) Introduce the assigner for Adaptive Scheduler to pursue task balancing at the slot level

2023-10-30 Thread RocMarshal (Jira)
RocMarshal created FLINK-33389:
--

 Summary: Introduce the assigner for Adaptive Scheduler to pursue 
task balancing at the slot level
 Key: FLINK-33389
 URL: https://issues.apache.org/jira/browse/FLINK-33389
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Task
Reporter: RocMarshal






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33388) Implement slots to taskmanagers balancing for the Default Scheduler

2023-10-30 Thread RocMarshal (Jira)
RocMarshal created FLINK-33388:
--

 Summary: Implement slots to taskmanagers balancing for the Default 
Scheduler
 Key: FLINK-33388
 URL: https://issues.apache.org/jira/browse/FLINK-33388
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Task
Reporter: RocMarshal






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33387) Introduce the abstraction and the interface about loading

2023-10-30 Thread RocMarshal (Jira)
RocMarshal created FLINK-33387:
--

 Summary: Introduce the abstraction and the interface about loading
 Key: FLINK-33387
 URL: https://issues.apache.org/jira/browse/FLINK-33387
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Task
Reporter: RocMarshal






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33386) Introduce the strategy for Default Scheduler to pursue task balancing at the slot level.

2023-10-30 Thread RocMarshal (Jira)
RocMarshal created FLINK-33386:
--

 Summary: Introduce the strategy for Default Scheduler to pursue 
task balancing at the slot level.
 Key: FLINK-33386
 URL: https://issues.apache.org/jira/browse/FLINK-33386
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Task
Reporter: RocMarshal






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-30 Thread Jiabao Sun
Hi, all,

Thanks for the lively discussion.

Based on the discussion, I have made some adjustments to the FLIP document:

1. The name of the newly added option has been changed to 
"source.filter-push-down.enabled".
2. Considering compatibility with older versions, the newly added 
"source.filter-push-down.enabled" option needs to respect the optimizer's 
"table.optimizer.source.predicate-pushdown-enabled" option. 
However, we are considering removing the old option in Flink 2.0.
3. We can provide more options to disable other source abilities with side 
effects, such as "source.aggregate.enabled" and "source.projection.enabled".
This is not urgent and can be introduced incrementally.
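
For illustration, the interplay could look like this from the user side,
assuming the proposed per-source option name (the 'orders' table and
'some-connector' are placeholders):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

// Existing global optimizer switch (candidate for removal in Flink 2.0);
// leaving it at its default of 'true' keeps pushdown allowed in general.
tEnv.getConfig().getConfiguration()
        .setString("table.optimizer.source.predicate-pushdown-enabled", "true");

// Proposed per-source option: opts only this table out of filter pushdown.
tEnv.executeSql(
        "CREATE TABLE orders (id BIGINT, amount DOUBLE) WITH (\n"
                + "  'connector' = 'some-connector',\n"
                + "  'source.filter-push-down.enabled' = 'false'\n"
                + ")");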

Looking forward to your feedback again.

Best,
Jiabao


> On Oct 29, 2023 at 08:45, Becket Qin  wrote:
> 
> Thanks for digging into the git history, Jark. I agree it makes sense to
> deprecate this API in 2.0.
> 
> Cheers,
> 
> Jiangjie (Becket) Qin
> 
> On Fri, Oct 27, 2023 at 5:47 PM Jark Wu  wrote:
> 
>> Hi Becket,
>> 
>> I checked the history of "
>> *table.optimizer.source.predicate-pushdown-enabled*",
>> it seems it was introduced for the legacy FilterableTableSource interface,
>> which might have been an experimental feature at that time. I don't see the
>> necessity
>> of this option at the moment. Maybe we can deprecate this option and drop
>> it
>> in Flink 2.0[1] if it is not necessary anymore. This may help to
>> simplify this discussion.
>> 
>> 
>> Best,
>> Jark
>> 
>> [1]: https://issues.apache.org/jira/browse/FLINK-32383
>> 
>> 
>> 
>> On Thu, 26 Oct 2023 at 10:14, Becket Qin  wrote:
>> 
>>> Thanks for the proposal, Jiabao. My two cents below:
>>> 
>>> 1. If I understand correctly, the motivation of the FLIP is mainly to
>>> make predicate pushdown optional on SOME of the Sources. If so, intuitively
>>> the configuration should be Source specific instead of general. Otherwise,
>>> we will end up with general configurations that may not take effect for
>>> some of the Source implementations. This violates the basic rule of a
>>> configuration - it does what it says, regardless of the implementation.
>>> While configuration standardization is usually a good thing, it should not
>>> break the basic rules.
>>> If we really want to have this general configuration, for the sources
>>> this configuration does not apply, they should throw an exception to make
>>> it clear that this configuration is not supported. However, that seems ugly.
>>> 
>>> 2. I think the actual motivation of this FLIP is about "how a source
>>> should implement predicate pushdown efficiently", not "whether predicate
>>> pushdown should be applied to the source." For example, if a source wants
>>> to avoid additional computing load in the external system, it can always
>>> read the entire record and apply the predicates by itself. However, from
>>> the Flink perspective, the predicate pushdown is applied, it is just
>>> implemented differently by the source. So the design principle here is that
>>> Flink only cares about whether a source supports predicate pushdown or not,
>>> it does not care about the implementation efficiency / side effect of the
>>> predicates pushdown. It is the Source implementation's responsibility to
>>> ensure the predicates pushdown is implemented efficiently and does not
>>> impose excessive pressure on the external system. And it is OK to have
>>> additional configurations to achieve this goal. Obviously, such
>>> configurations will be source specific in this case.
>>> 
>>> 3. Regarding the existing configurations of 
>>> *table.optimizer.source.predicate-pushdown-enabled.
>>> *I am not sure why we need it. Supposedly, if a source implements a
>>> SupportsXXXPushDown interface, the optimizer should push the corresponding
>>> predicates to the Source. I am not sure in which case this configuration
>>> would be used. Any ideas @Jark Wu ?
>>> 
>>> Thanks,
>>> 
>>> Jiangjie (Becket) Qin
>>> 
>>> 
>>> On Wed, Oct 25, 2023 at 11:55 PM Jiabao Sun
>>>  wrote:
>>> 
 Thanks Jane for the detailed explanation.
 
 I think that for users, we should respect conventions over
 configurations.
 Conventions can be default values explicitly specified in
 configurations, or they can be behaviors that follow previous versions.
 If the same code has different behaviors in different versions, it would
 be a very bad thing.
 
 I agree that for regular users, it is not necessary to understand all
 the configurations related to Flink.
 By following conventions, they can have a good experience.
 
 Let's get back to the practical situation and consider it.
 
 Case 1:
 The user is not familiar with the purpose of the
 table.optimizer.source.predicate-pushdown-enabled configuration but follows
 the convention of allowing predicate pushdown to the source by default.
 Just understanding the source.predicate-pushdown-enabled configuration
 and performing fine-grained