[Discuss] expose TaskIOMetricGroup to custom Partitioner

2024-05-16 Thread Steven Wu
Hi,

I am trying to implement a custom range partitioner in the Flink Iceberg
sink and want to publish some counter metrics for certain scenarios, similar
to the network metrics exposed in `TaskIOMetricGroup`.

This would require adding a new setup method to the custom `Partitioner`
interface. I'd like to get feedback from the community. More details can be
found in the jira issue [1].
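
To illustrate the idea, here is a minimal sketch of the kind of hook I have in
mind. The setup method, the counter name, and the call site are hypothetical,
not the actual FLINK-35384 proposal:

import org.apache.flink.api.common.functions.Partitioner;
import org.apache.flink.metrics.Counter;
import org.apache.flink.metrics.MetricGroup;

// Hypothetical sketch only: a range partitioner that registers a counter once the
// runtime hands it a metric group. The setup() hook does not exist today.
public class RangePartitionerWithMetrics implements Partitioner<Long> {

    private transient Counter keysOutsideRange;

    // Hypothetical hook the runtime would call before the first partition() call.
    public void setup(MetricGroup metricGroup) {
        this.keysOutsideRange = metricGroup.counter("keysOutsideRange");
    }

    @Override
    public int partition(Long key, int numPartitions) {
        long channel = key % numPartitions;
        if (channel < 0) {
            // Count and clamp keys that fall outside the expected range.
            keysOutsideRange.inc();
            channel += numPartitions;
        }
        return (int) channel;
    }
}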

Thanks,
Steven

[1] https://issues.apache.org/jira/browse/FLINK-35384


Re: DataOutputSerializer serializing long UTF Strings

2024-01-22 Thread Steven Wu
I think this is a reasonable extension to `DataOutputSerializer`. Although
64 KB is not small, it is still possible to have long strings over that
limit. There are already precedents of extended APIs in
`DataOutputSerializer`, e.g.

public void setPosition(int position) {
    Preconditions.checkArgument(
            position >= 0 && position <= this.position, "Position out of bounds.");
    this.position = position;
}

public void setPositionUnsafe(int position) {
    this.position = position;
}
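
For illustration only, here is a rough sketch of what a length-unlimited
variant could look like. This is not the proposed API; it assumes a 4-byte
length prefix and standard UTF-8, unlike writeUTF's 2-byte length and modified
UTF-8:

import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.flink.core.memory.DataOutputSerializer;

public final class LongUtfUtil {

    // Illustrative helper: writes a 4-byte length followed by the raw UTF-8 bytes,
    // so strings longer than 64 KB are not rejected.
    public static void writeLongUTF(DataOutputSerializer out, String value) throws IOException {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }
}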


On Fri, Jan 19, 2024 at 2:51 AM Péter Váry 
wrote:

> Hi Team,
>
> During the root cause analysis of an Iceberg serialization issue [1], we
> have found that *DataOutputSerializer.writeUTF* has a hard limit on the
> length of the string (64k). This is inherited from the
> *DataOutput.writeUTF*
> method, where the JDK specifically defines this limit [2].
>
> For our use-case we need to enable the possibility to serialize longer UTF
> strings, so we will need to define a *writeLongUTF* method with a similar
> specification to *writeUTF*, but without the length limit.
>
> My question is:
> - Is it something which would be useful for every Flink user? Shall we add
> this method to *DataOutputSerializer*?
> - Is it very specific for Iceberg, and we should keep it in Iceberg
> connector code?
>
> Thanks,
> Peter
>
> [1] - https://github.com/apache/iceberg/issues/9410
> [2] -
>
> https://docs.oracle.com/javase/8/docs/api/java/io/DataOutput.html#writeUTF-java.lang.String-
>


Re: FW: [ANNOUNCE] New Apache Flink Committer - Alexander Fedulov

2024-01-03 Thread Steven Wu
Congrats, Alex! Well deserved!

On Wed, Jan 3, 2024 at 2:31 AM David Radley  wrote:

> Sorry for my typo.
>
> Many congratulations Alex!
>
> From: David Radley 
> Date: Wednesday, 3 January 2024 at 10:23
> To: David Anderson 
> Cc: dev@flink.apache.org 
> Subject: Re: [EXTERNAL] [ANNOUNCE] New Apache Flink Committer - Alexander
> Fedulov
> Many Congratulations David .
>
> From: Maximilian Michels 
> Date: Tuesday, 2 January 2024 at 12:16
> To: dev 
> Cc: Alexander Fedulov 
> Subject: [EXTERNAL] [ANNOUNCE] New Apache Flink Committer - Alexander
> Fedulov
> Happy New Year everyone,
>
> I'd like to start the year off by announcing Alexander Fedulov as a
> new Flink committer.
>
> Alex has been active in the Flink community since 2019. He has
> contributed more than 100 commits to Flink, its Kubernetes operator,
> and various connectors [1][2].
>
> Especially noteworthy are his contributions on deprecating and
> migrating the old Source API functions and test harnesses, the
> enhancement to flame graphs, the dynamic rescale time computation in
> Flink Autoscaling, as well as all the small enhancements Alex has
> contributed which make a huge difference.
>
> Beyond code contributions, Alex has been an active community member
> with his activity on the mailing lists [3][4], as well as various
> talks and blog posts about Apache Flink [5][6].
>
> Congratulations Alex! The Flink community is proud to have you.
>
> Best,
> The Flink PMC
>
> [1]
> https://github.com/search?type=commits=author%3Aafedulov+org%3Aapache
> [2]
> https://issues.apache.org/jira/browse/FLINK-28229?jql=status%20in%20(Resolved%2C%20Closed)%20AND%20assignee%20in%20(afedulov)%20ORDER%20BY%20resolved%20DESC%2C%20created%20DESC
> [3] https://lists.apache.org/list?dev@flink.apache.org:lte=100M:Fedulov
> [4] https://lists.apache.org/list?u...@flink.apache.org:lte=100M:Fedulov
> [5]
> https://flink.apache.org/2020/01/15/advanced-flink-application-patterns-vol.1-case-study-of-a-fraud-detection-system/
> [6]
> https://www.ververica.com/blog/presenting-our-streaming-concepts-introduction-to-flink-video-series
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>


Re: [DISCUSS] Promote SinkV2 to @Public and deprecate SinkFunction

2023-02-06 Thread Steven Wu
> > > misleading to users to not @Deprecated SinkFunction given that it
> > > clearly will be deprecated.
> > >
> > > Cheers,
> > >
> > > Konstantin
> > >
> > >
> > > On Mon, Feb 6, 2023 at 13:26, Jark Wu  wrote:
> > >
> > > > I agree with Dong Lin.
> > > >
> > > > Oracle explains how to use Deprecate API [1]:
> > > >
> > > > You are strongly recommended to use the Javadoc @deprecated tag with
> > > > > appropriate comments explaining how to use the new API. This
> ensures
> > > > > developers will *have a workable migration path from the old API to
> > the
> > > > > new API*.
> > > >
> > > >
> > > > From a user's perspective, the workable migration path is very
> > important.
> > > > Otherwise, it blurs the semantics of API deprecation. The Flink API's
> > > > compatibility and stability issues in the past left a bad impression
> on
> > > the
> > > > downstream projects. We should be careful when changing and
> deprecating
> > > > APIs, especially when there are known migration gaps. I think it's a
> > good
> > > > idea to migrate Flink-owned connectors before marking old API
> > deprecated.
> > > > This ensures downstream projects can migrate to new APIs smoothly.
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > [1]:
> > > >
> > > >
> > >
> >
> https://docs.oracle.com/javase/8/docs/technotes/guides/javadoc/deprecation/deprecation.html
> > > >
> > > > On Mon, 6 Feb 2023 at 10:01, Steven Wu  wrote:
> > > >
> > > > > Regarding the discussion on global committer [1] for sinks with
> > global
> > > > > transactions, there is no consensus on solving that problem in
> > SinkV2.
> > > > Will
> > > > > it require any breaking change in SinkV2?
> > > > >
> > > > > Also will SinkV1 be deprecated too? or it should happen sometime
> > after
> > > > > SinkFunction deprecation?
> > > > >
> > > > > [1]
> https://lists.apache.org/thread/82bgvlton9olb591bfg2djv0cshj1bxj
> > > > >
> > > > > On Sun, Feb 5, 2023 at 2:14 AM Dong Lin 
> wrote:
> > > > >
> > > > > > Hi Konstantin,
> > > > > >
> > > > > > Thanks for the comment! Please see my comment inline.
> > > > > >
> > > > > > Cheers,
> > > > > > Dong
> > > > > >
> > > > > > On Sat, Feb 4, 2023 at 2:06 AM Konstantin Knauf <
> kna...@apache.org
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > sorry for joining the discussion late.
> > > > > > >
> > > > > > > 1) Is there an option to deprecate SinkFunction in Flink 1.17
> > while
> > > > > > leaving
> > > > > > > SinkV2 @PublicEvolving in Flink 1.17. We then aim to make
> SinkV2
> > > > > @Public
> > > > > > in
> > > > > > > and remove SinkFunction in Flink 1.18. @PublicEvolving are
> > intended
> > > > for
> > > > > > > public use. So, I don't see it as a blocker for deprecating
> > > > > SinkFunction
> > > > > > > that we have to make SinkV2 @PublicEvolving. For reference this
> > is
> > > > the
> > > > > > > description of @PublicEvolving:
> > > > > > >
> > > > > > > /**
> > > > > > >  * Annotation to mark classes and methods for public use, but
> > with
> > > > > > > evolving interfaces.
> > > > > > >  *
> > > > > > >  * Classes and methods with this annotation are intended for
> > > > public
> > > > > > > use and have stable behavior.
> > > > > > >  * However, their interfaces and signatures are not considered
> to
> > > be
> > > > > > > stable and might be changed
> > > > > > >  * across versions.
> > > > > > >  *
> > > > > > >  * This annotation also excludes methods and classes with
> > > 

Re: [DISCUSS] Promote SinkV2 to @Public and deprecate SinkFunction

2023-02-05 Thread Steven Wu
Regarding the discussion on global committer [1] for sinks with global
transactions, there is no consensus on solving that problem in SinkV2. Will
it require any breaking change in SinkV2?

Also, will SinkV1 be deprecated too? Or should that happen sometime after
the SinkFunction deprecation?

[1] https://lists.apache.org/thread/82bgvlton9olb591bfg2djv0cshj1bxj

On Sun, Feb 5, 2023 at 2:14 AM Dong Lin  wrote:

> Hi Konstantin,
>
> Thanks for the comment! Please see my comment inline.
>
> Cheers,
> Dong
>
> On Sat, Feb 4, 2023 at 2:06 AM Konstantin Knauf  wrote:
>
> > Hi everyone,
> >
> > sorry for joining the discussion late.
> >
> > 1) Is there an option to deprecate SinkFunction in Flink 1.17 while
> leaving
> > SinkV2 @PublicEvolving in Flink 1.17. We then aim to make SinkV2 @Public
> in
> > and remove SinkFunction in Flink 1.18. @PublicEvolving are intended for
> > public use. So, I don't see it as a blocker for deprecating SinkFunction
> > that we have to make SinkV2 @PublicEvolving. For reference this is the
> > description of @PublicEvolving:
> >
> > /**
> >  * Annotation to mark classes and methods for public use, but with
> > evolving interfaces.
> >  *
> >  * Classes and methods with this annotation are intended for public
> > use and have stable behavior.
> >  * However, their interfaces and signatures are not considered to be
> > stable and might be changed
> >  * across versions.
> >  *
> >  * This annotation also excludes methods and classes with evolving
> > interfaces / signatures within
> >  * classes annotated with {@link Public}.
> >  */
> >
> >
> > Marking SinkFunction @Deprecated would already signal everyone to move to
> > SinkV2, which we as a community, I believe, have a strong interest in.
> Its
> >
>
> Yes, I also believe we all have this strong interest. I just hope that this
> can be done in the best possible way that does not confuse users.
>
> I probably still have the same concern regarding its impact on users: if we
> mark an API as deprecated, it effectively means the users of this API
> should start to migrate to another API (e.g. SinkV2) and we might remove
> this API in the future. However, given that we know there are known
> problems preventing users from doing so, it seems that we are not ready to
> send this message to users right now.
>
> If I understand correctly, I guess you are suggesting that by marking
> SinkFunction as deprecated, we can put higher pressure on Flink
> contributors to update the existing Flink codebase to improve and use
> SinkV2.
>
> I am not sure this is the right way to use @deprecated, which has a
> particular meaning for its users rather than contributors. And I am also
> not sure we can even pressure contributors of an open-source project into
> developing a feature (e.g. migrate all existing SinkFunction subclasses to
> SinkV2). IMO, the typical way is for the contributor with interest/time to
> work on the feature, or talk to other contributors whether they are willing
> to collaborate/work on this, rather than pressuring other contributors into
> working on this.
>
>
> almost comical how long the transition from SourceFunction/SinkFunction to
> > Source/Sink takes us. At the same time, we leave ourselves the option
> to
> > make small changes to SinkV2 if any problems arise during the migration
> of
> > these connectors.
> >
> > I think, we have a bit of a chicken/egg problem here. The pressure for
> >
>
> Similar to the reason described above, I am not sure we have a chicken/egg
> problem here. The issue here is that SinkV2 is not ready and we have a lot
> of existing SinkFunction that is not migrated by ourselves. We (Flink
> contributors) probably do not need to mark SinkFunction as deprecated in
> order to address these issues in our own codebase.
>
>
> users and contributors is not high enough to move away from SinkFunction as
> long as it's not deprecated, but at the same time we need people to
> migrate
> > their connectors to see if there are any gaps in SinkV2. I believe, the
> > combination proposed above could bridge this problem.
> >
> > 2) I don't understand the argument of waiting until some of the
> > implementations are @Public. How can we make the implementations of the
> > SinkV2 API @Public without making SinkV2 @Public? All public methods of
> > SinkV2 are part of every implementation. So to me it actually seems to be
> > opposite: in order to make any of the implementation @Public we first
> need
> > to make the API @Public.
> >
>
> Yeah I also agree with you.
>
>
> >
> > Cheers,
> >
> > Konstantin
> >
> > On Mon, Jan 30, 2023 at 13:18, Dong Lin  wrote:
> >
> > > Hi Martijn,
> > >
> > > Thanks for driving this effort to clean-up the Flink codebase!
> > >
> > > I like the idea of cleaning up the Flink codebase to avoid having two Sinks. On
> > the
> > > other hand, I also think the concern mentioned by Jing makes sense. In
> > > addition to thinking in terms of the rule proposed in FLIP-197
> > > <
> > >
> >
> 

[DISCUSS] streaming shuffle to improve data clustering and tame small files problem

2023-01-30 Thread Steven Wu
Hi,

We had a proposal to add a streaming shuffling stage in the Flink Iceberg
sink to improve data clustering and tame the small files problem [1].

Here are a couple of common use cases.
* Event time partitioned tables, where we can get a small-files problem due to
a skewed, long-tail distribution on event time hours.
* Improve data clustering on non-partitioned columns (e.g. device_id), where
the table format can leverage min-max value ranges for effective file pruning.

The main idea is to calculate (skewed) traffic distribution statistics and
shuffle records based on the computed statistics. This can achieve good
data clustering on the writer subtasks while largely avoiding small files
and maintaining relatively balanced traffic volume across writer subtasks.
We finished a PoC on event time partitioned tables and saw a 20x reduction
in the number of files.
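
As a rough sketch of the routing side (hypothetical types and statistics
wiring, not the PoC code), a custom partitioner plugged in via
DataStream#partitionCustom could look like this:

import java.util.Map;

import org.apache.flink.api.common.functions.Partitioner;

// Illustrative only: route hot keys (identified by globally aggregated traffic
// statistics) to dedicated channels, and hash the long tail as usual.
public class SkewAwarePartitioner implements Partitioner<String> {

    private final Map<String, Integer> hotKeyToChannel;

    public SkewAwarePartitioner(Map<String, Integer> hotKeyToChannel) {
        this.hotKeyToChannel = hotKeyToChannel;
    }

    @Override
    public int partition(String key, int numPartitions) {
        Integer channel = hotKeyToChannel.get(key);
        return channel != null ? channel : Math.abs(key.hashCode() % numPartitions);
    }
}

// usage sketch (hypothetical record type and key selector):
// stream.partitionCustom(new SkewAwarePartitioner(stats), r -> r.getEventHourKey());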

In another thread, there was a question whether it makes sense to add this
clustering shuffle feature to Flink DataStream [2], as it can potentially
be useful for other sinks (like files, Apache Hudi, Delta Lake). Hence we
would like to gauge the community's initial interest first before writing
up a large FLIP.

Thanks,
Steven

[1]
https://docs.google.com/document/d/13N8cMqPi-ZPSKbkXGOBMPOzbv2Fua59j8bIjjtxLWqo
[2]
https://lists.apache.org/list?dev@flink.apache.org:lte=1M:%22[DISCUSS]%20FLIP-264%20Extract%20BaseCoordinatorContext%22


Re: [DISCUSS] FLIP-264 Extract BaseCoordinatorContext

2023-01-30 Thread Steven Wu
Let me start an initial discussion thread at dev@flink. I'd like to gauge the
interest from the community (including Hudi and Delta Lake) first before
spending time on writing up a big FLIP.

On Fri, Jan 27, 2023 at 10:45 PM Jark Wu  wrote:

> Thank Steven for the explanation.
>
> It sounds good to me to implement the shuffle operator in the Iceberg
> project first.
> We can contribute it to Flink DataStream in the future if other
> projects/connectors also need it.
>
> Best,
> Jark
>
>
> On Wed, 18 Jan 2023 at 02:11, Steven Wu  wrote:
>
>> Jark,
>>
>> We were planning to discard the proposal due to some valid concerns
>> raised in the thread. Also, this proposal itself didn't really save too
>> much code duplication (maybe 100 lines or so).
>>
>> I also thought that the shuffle operator for DataStream can be useful for
>> other connectors too. The shuffling part (based on traffic statistics) can
>> be generic for other connectors. There will be some small integration part
>> unique to Iceberg, which can stay in Iceberg. If we go with this new
>> direction, we would need a new FLIP.
>>
>> Thanks,
>> Steven
>>
>>
>>
>> On Mon, Jan 16, 2023 at 12:30 AM Jark Wu  wrote:
>>
>>> What's the status and conclusion of this discussion?
>>>
>>> I have seen the value of exposing OperatorCoordinator because of the
>>> powerful RPC calls,
>>> some projects are already using it, such as Hudi[1]. But I agree this is
>>> a large topic and
>>> requires another FLIP.
>>>
>>> I am also concerned about extracting a Public base class without
>>> implementations, and
>>> clear usage is easy to break in the future. However, I think the
>>> shuffling operator can be a
>>> generic component used by other connectors and DataStream jobs.
>>>
>>> Have you considered contributing the ShuffleOperator to the Flink main
>>> repository as a
>>> part of DataStream API (e.g., DataStream#dynamicShuffle)? It's easy to
>>> extract the common
>>> part between SourceCoordinatorContext and ShuffleCoordinatorContext in a
>>> single repository
>>>  as an internal implementation.
>>>
>>>
>>> Best,
>>> Jark
>>>
>>> [1]:
>>> https://github.com/apache/hudi/blob/a80bb4f717ad8a89770176a1238c4b08874044e8/hudi-flink-datasource/hudi-flink1.16.x/src/main/java/org/apache/hudi/adapter/OperatorCoordinatorAdapter.java
>>>
>>> On Thu, 3 Nov 2022 at 22:36, Piotr Nowojski 
>>> wrote:
>>>
>>>> Ohhh, I was confused. I thought that the proposal is to make
>>>> `CoordinatorContextBase` part of the public API.
>>>>
>>>> However, I'm also against extracting `CoordinatorContextBase` as an
>>>> `@Internal` class as well.
>>>>
>>>> 1. Connectors shouldn't reuse internal classes. Using `@Internal`
>>>> CoordinatedOperatorFactory would be already quite bad, but at least
>>>> this is
>>>> a relatively stable internal API. Using `@Internal`
>>>> `@CoordinatorContextBase`, and refactoring out this base class just for
>>>> the
>>>> sake of re-using it in a connector is IMO even worse.
>>>> 2. Double so if they are in a separate repository (as the iceberg
>>>> connector
>>>> will be/is, right?). There would be no way to prevent breaking changes
>>>> between repositories.
>>>>
>>>> If that's only intended as the stop-gap solution until we properly
>>>> expose
>>>> coordinators, the lesser evil would be IMO to copy/paste/modify
>>>> SourceCoordinatorContext to the flink-connector-iceberg repository.
>>>>
>>>> Best,
>>>> Piotrek
>>>>
>>>> On Thu, 3 Nov 2022 at 12:51, Maximilian Michels 
>>>> wrote:
>>>>
>>>> > +1 If we wanted to expose the OperatorCoordinator API, we should
>>>> provide
>>>> > an adequate interface. The FLIP partially addresses this by trying to
>>>> > factor out RPC code which other coordinators might make use of, but
>>>> there
>>>> > is additional design necessary to realize a public operator API.
>>>> >
>>>> > Just to be clear, I'm not opposed to any of the changes in the FLIP. I
>>>> > think they make sense in the context of an Iceberg ShuffleCoordinator
>>>> in
>>>> > Flink. If we were to add such a new coordinator, feel free to make t

Re: [DISCUSS] FLIP-274 : Introduce metric group for OperatorCoordinator

2023-01-17 Thread Steven Wu
> Additionally, the configurable variables (operator name/id) are
logically not attached to the coordinator, but operators, so to me it
just doesn't make sense to structure it like this.

Chesnay, maybe we should clarify the terminology. To me, operators (like
FLIP-27 source) can have two parts (coordinator and reader/subtask). I
think it is fine to include operator name/id for coordinator metrics.

On Mon, Jan 16, 2023 at 2:13 AM Chesnay Schepler  wrote:

> Slight correction: Using metrics.scope.jm.job as the default should be
> safe.
>
> On 16/01/2023 10:18, Chesnay Schepler wrote:
> > The proposed ScopeFormat is still problematic for a few reasons.
> >
> > Extending the set of ScopeFormats is problematic because it in
> > practice it breaks the config if users actively rely on it, since
> > there's now another key that they _must_ set for it to be
> > consistent/compatible with their existing setup.
> > Unfortunately due to how powerful scope formats are we can't derive a
> > default value that matches their existing setup.
> > Hence we should try to do this as rarely as possible.
> >
> > This FLIP does not adhere to that since it proposes a dedicated format
> > for coordinators; next time we want to expose operator-specific
> > metrics (e.g., in the scheduler) we'd have to add another one to
> > support it.
> >
> > Additionally, the configurable variables (operator name/id) are
> > logically not attached to the coordinator, but operators, so to me it
> > just doesn't make sense to structure it like this.
> >
> > Another thing I'm concerned about is that, because we don't include
> > tasks in the hierarchy, users wishing to collect all metrics for a
> > particular task (in this case ==vertex) now have to go significantly
> > out of their way to get them, since they can no longer just filter by
> > the task ID but have to be filter for _all_ operators that are part of
> > the task.
> >
> > On 16/01/2023 03:09, Hang Ruan wrote:
> >> Hi, @ches...@apache.org  ,
> >>
> >> Do you have time to help to review this FLIP again after the
> >> modification?
> >> Looking forward to your reply.
> >> This FLIP will add a new configuration for the
> >> OperatorCoordinatorMetricGroup scope format. It provides an internal
> >> implementation and is added as a component to the
> >> JobManagerJobMetricGroup.
> >> If something doesn't make sense, could you provide some advice? It
> >> will be
> >> very helpful. Thanks a lot for your help.
> >>
> >> Best,
> >> Hang
> >>
> >> Martijn Visser  wrote on Wed, Jan 11, 2023 at 16:34:
> >>
> >>
> >>> Hi Hang,
> >>>
> >>> I'm a bit surprised that this has gone to a vote, given that Chesnay
> >>> deliberately mentioned that he would vote against it as-is. I would
> >>> expect
> >>> that before going to a vote, he has had the opportunity to
> >>> participate in
> >>> this discussion.
> >>>
> >>> Best regards,
> >>>
> >>> Martijn
> >>>
> >>> On Tue, Jan 3, 2023 at 12:53 PM Jark Wu  wrote:
> >>>
>  Hi Dong,
> 
>  Regarding “SplitEnumeratorContext#metricGroup”, my only concern is
>  that
>  this is a core interface for the FLIP. It’s hard to tell how
>  sources use
>  metric group without mentioning this interface. Even if this is an
> >>> existing
>  API, I think it’s worth introducing the interface again and declaring
> >>> that
>  we will implement the interface instead of a no-op method in this
>  FLIP.
> 
>  Anyway, this is a minor problem and shouldn’t block this FLIP. I’m
>  +1 to
>  start a vote.
> 
>  Best,
>  Jark
> 
> 
> > On Jan 3, 2023, at 10:03, Hang Ruan  wrote:
> >
> > Hi, Jark and Dong,
> >
> > Thanks for your comments. Sorry for my late reply.
> >
> > For suggestion 1, I plan to implement the
> > SplitEnumeratorMetricGroup in
> > another issue, and it is not contained in this FLIP. I will add some
> > description about this part.
> > For suggestion 2, changes about OperatorCoordinator#metricGroup has
>  already
> > been documented in the proposed change section.
> >
> > Best,
> > Hang
> >
> > Dong Lin  wrote on Sun, Jan 1, 2023 at 09:45:
> >
> >> Let me chime-in and add comments regarding the public interface
> >>> section.
> >> Please see my comments inline.
> >>
> >> On Thu, Dec 29, 2022 at 6:08 PM Jark Wu  wrote:
> >>
> >>> Hi Hang,
> >>>
> >>> Thanks for driving this discussion. I think this is a very useful
>  feature
> >>> for connectors.
> >>>
> >>> The FLIP looks quite good to me, and I just have two suggestions.
> >>>
> >>> 1. In the "Public Interface" section, mention that the
> >>> implementation
> >>> behavior of "SplitEnumeratorContext#metricGroup" is changed from
> >> returning
> >>> null to returning a concrete SplitEnumeratorMetricGroup instance.
> >>> Even
> >>> though the API is already there, the behavior change can also be
> >> considered
> >>> a public 

Re: [DISCUSS] FLIP-264 Extract BaseCoordinatorContext

2023-01-17 Thread Steven Wu
>> >>>> Best,
>> >>>> Qingsheng
>> >>>>
>> >>>> On Tue, Nov 1, 2022 at 8:29 PM Maximilian Michels 
>> >>>> wrote:
>> >>>>
>> >>>>> Thanks Steven! My confusion stemmed from the lack of context in the
>> >>>>> FLIP.
>> >>>>> The first version did not lay out how the refactoring would be used
>> >>>>> down
>> >>>>> the line, e.g. by the ShuffleCoordinator. The OperatorCoordinator
>> API
>> >>>>> is a
>> >>>>> non-public API and before reading the code, I wasn't even aware how
>> >>>>> exactly
>> >>>>> it worked and whether it would be available to regular operators (it
>> >>>>> was
>> >>>>> originally intended for sources only).
>> >>>>>
>> >>>>> I might seem pedantic here but I believe the purpose of a FLIP
>> should
>> >>>>> be to
>> >>>>> describe the *why* behind the changes, not only the changes itself.
>> A
>> >>>>> FLIP
>> >>>>> is not a formality but is a tool to communicate and discuss
>> changes. I
>> >>>>> think we still haven't laid out the exact reasons why we are
>> factoring
>> >>>>> out
>> >>>>> the base. As far as I understand now, we need the base class to deal
>> >>>>> with
>> >>>>> concurrent updates in the custom Coordinator from the runtime
>> >>>>> (sub)tasks.
>> >>>>> Effectively, we are enforcing an actor model for the processing of
>> the
>> >>>>> incoming messages such that the OperatorCoordinator can cleanly
>> update
>> >>>>> its
>> >>>>> state. However, if there are no actual implementations that make use
>> >>>>> of the
>> >>>>> refactoring in Flink itself, I wonder if it would make sense to copy
>> >>>>> this
>> >>>>> code to the downstream implementation, e.g. the ShuffleCoordinator.
>> As
>> >>>>> soon
>> >>>>> as it is part of Flink, we could of course try to consolidate this
>> >>>>> code.
>> >>>>>
>> >>>>> Considering the *how* of this, there appear to be both methods from
>> >>>>> SourceCoordinator (e.g. runInEventLoop) as well as
>> >>>>> SourceCoordinatorContext
>> >>>>> listed in the FLIP, as well as methods which do not appear anywhere
>> in
>> >>>>> Flink code, e.g. subTaskReady / subTaskNotReady /
>> sendEventToOperator.
>> >>>>> It
>> >>>>> appears that some of this has been extracted from a downstream
>> >>>>> implementation. It would be great to adjust this, such that it
>> >>>>> reflects the
>> >>>>> status quo in Flink.
>> >>>>>
>> >>>>> -Max
>> >>>>>
>> >>>>> On Fri, Oct 28, 2022 at 5:53 AM Steven Wu 
>> >>>>> wrote:
>> >>>>>
>> >>>>> > Max,
>> >>>>> >
>> >>>>> > Thanks a lot for the comments. We should clarify that the shuffle
>> >>>>> > operator/coordinator is not really part of the Flink sink
>> >>>>> > function/operator. shuffle operator is a custom operator that can
>> be
>> >>>>> > inserted right before the Iceberg writer operator. Shuffle
>> operator
>> >>>>> > calculates the traffic statistics and performs a custom
>> >>>>> partition/shuffle
>> >>>>> > (DataStream#partitionCustom) to cluster the data right before they
>> >>>>> get to
>> >>>>> > the Iceberg writer operator.
>> >>>>> >
>> >>>>> > We are not proposing to introduce a sink coordinator for the sink
>> >>>>> > interface. Shuffle operator needs the CoordinatorContextBase to
>> >>>>> > facilitate the communication btw shuffle subtasks and coordinator
>> for
>> >>>>> > traffic statistics aggregation. The communication part is already
>> >>>>> > implemented by SourceCoordinatorContext.
>> >

Re: [ANNOUNCE] New Apache Flink Committer - Matyas Orhidi

2022-11-21 Thread Steven Wu
Congrats, Matyas!

On Mon, Nov 21, 2022 at 11:19 PM godfrey he  wrote:

> Congratulations, Matyas!
>
> Matthias Pohl  wrote on Tue, Nov 22, 2022 at 13:40:
> >
> > Congratulations, Matyas :)
> >
> > On Tue, Nov 22, 2022 at 11:44 AM Xingbo Huang 
> wrote:
> >
> > > Congrats Matyas!
> > >
> > > Best,
> > > Xingbo
> > >
> > > Yanfei Lei  wrote on Tue, Nov 22, 2022 at 11:18:
> > >
> > > > Congrats Matyas! 
> > > >
> > > > Zheng Yu Chen  wrote on Tue, Nov 22, 2022 at 11:15:
> > > >
> > > > > Congratulations ~ 
> > > > >
> > > > > Márton Balassi  wrote on Mon, Nov 21, 2022 at 22:18:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > On behalf of the PMC, I'm very happy to announce Matyas Orhidi
> as a
> > > new
> > > > > > Flink
> > > > > > committer.
> > > > > >
> > > > > > Matyas has over a decade of experience in the Big Data ecosystem
> and
> > > > has
> > > > > > been working with Flink full time for the past 3 years. In the
> open
> > > > > source
> > > > > > community he is one of the key driving members of the Kubernetes
> > > > Operator
> > > > > > subproject. He implemented multiple key features in the operator
> > > > > including
> > > > > > the metrics system and the ability to dynamically configure
> watched
> > > > > > namespaces. He enjoys spreading the word about Flink and
> regularly
> > > does
> > > > > so
> > > > > > via authoring blogposts and giving talks or interviews
> representing
> > > the
> > > > > > community.
> > > > > >
> > > > > > Please join me in congratulating Matyas for becoming a Flink
> > > committer!
> > > > > >
> > > > > > Best,
> > > > > > Marton
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best
> > > > >
> > > > > ConradJam
> > > > >
> > > >
> > > >
> > > > --
> > > > Best,
> > > > Yanfei
> > > >
> > >
>


Re: [DISCUSS] FLIP-264 Extract BaseCoordinatorContext

2022-10-27 Thread Steven Wu
ntext will come with a separate proposal, thus we
> try
> >>> to keep it simple in Flip 264 to understand. I can add a little bit
> more
> >>> about how to use the coordinator context in Flip 264 if you think that
> will
> >>> be helpful.
> >>>
> >>> Thanks!
> >>> Gang
> >>>
> >>>
> >>>
> >>> On Wed, Oct 26, 2022 at 7:25 AM Maximilian Michels 
> >>> wrote:
> >>>
> >>>> Thanks for the proposal, Gang! This is indeed somewhat of a bigger
> >>>> change. The coordinator for sources, as part of FLIP-27, was
> specifically
> >>>> added to synchronize the global watermark and to assign splits
> dynamically.
> >>>> However, it practically allows arbitrary RPC calls between the task
> and the
> >>>> job manager. I understand that there is concern that such a powerful
> >>>> mechanism should not be available to all operators. Nevertheless, I
> see the
> >>>> practical use in case of sinks like Iceberg. So I'd suggest limiting
> this
> >>>> feature to sinks (and sources) only.
> >>>>
> >>>> I'm wondering whether extracting the SourceCoordinatorContext is
> >>>> enough to achieve what you want. There will be additional work
> necessary,
> >>>> e.g. create a SinkCoordinator similarly to SourceCoordinator which
> handles
> >>>> the RPC calls and the checkpointing. I think it would be good to
> outline
> >>>> this in the FLIP.
> >>>>
> >>>> -Max
> >>>>
> >>>> On Sun, Oct 16, 2022 at 9:01 AM Steven Wu 
> wrote:
> >>>>
> >>>>> sorry. sent the incomplete reply by mistake.
> >>>>>
> >>>>> If there are any concrete concerns, we can discuss. In the
> FLINK-27405
> >>>>> [1],
> >>>>> Arvid pointed out some implications regarding checkpointing. In this
> >>>>> small
> >>>>> FLIP, we are not exposing/changing any checkpointing logic, we mainly
> >>>>> need
> >>>>> the coordinator context functionality to facilitate the communication
> >>>>> between coordinator and subtasks.
> >>>>>
> >>>>> [1] https://issues.apache.org/jira/browse/FLINK-27405
> >>>>>
> >>>>> On Sun, Oct 16, 2022 at 8:56 AM Steven Wu 
> >>>>> wrote:
> >>>>>
> >>>>> > Hang, appreciate your input. Agree that `CoordinatorContextBase`
> is a
> >>>>> > better name considering Flink code convention.
> >>>>> >
> >>>>> > If there are any concrete concerns, we can discuss. In the jira,
> >>>>> >
> >>>>> >
> >>>>> >
> >>>>> > On Sun, Oct 16, 2022 at 12:12 AM Hang Ruan  >
> >>>>> wrote:
> >>>>> >
> >>>>> >> Hi,
> >>>>> >>
> >>>>> >> IMP, I agree to extract a base class for SourceCoordinatorContext.
> >>>>> >> But I prefer to use the name `OperatorCoordinatorContextBase` or
> >>>>> >> `CoordinatorContextBase` as the format like `SourceReaderBase`.
> >>>>> >> I also agree to what Piotr said. Maybe more problems will occur
> when
> >>>>> >> connectors start to use it.
> >>>>> >>
> >>>>> >> Best,
> >>>>> >> Hang
> >>>>> >>
> >>>>> >> > Steven Wu  wrote on Fri, Oct 14, 2022 at 22:31:
> >>>>> >>
> >>>>> >> > Piotr,
> >>>>> >> >
> >>>>> >> > The proposal is to extract the listed methods from @Iinternal
> >>>>> >> > SourceCoordinatorContext to a @PublicEvolving
> >>>>> BaseCoordinatorContext.
> >>>>> >> >
> >>>>> >> > The motivation is that other operators can leverage the
> >>>>> communication
> >>>>> >> > mechanism btw operator coordinator and operator subtasks. For
> >>>>> example,
> >>>>> >> in
> >>>>> >> > the linked google doc shuffle operator (in Flink Iceberg sink)
> can
> >>>>> >> leverage
> >>>

Re: [VOTE] FLIP 267: Iceberg Connector

2022-10-24 Thread Steven Wu
+1 (non-binding)

On Mon, Oct 24, 2022 at 11:32 AM Martijn Visser 
wrote:

> +1 (binding)
>
> On Thu, Oct 20, 2022 at 12:37 AM  wrote:
>
> > Hi all,
> >
> > Thanks for all the feedback for FLIP 267[1]: Iceberg Connector in the
> > discussion thread [2].
> >
> > I would like to start a vote thread for it. The vote will be open for
> > at least 72 hours.
> >
> >
> > [1] https://lists.apache.org/thread/ycdll6wxj4npz5h7zvoh9khtfw8vy9nr <
> > https://lists.apache.org/thread/ycdll6wxj4npz5h7zvoh9khtfw8vy9nr>
> > [2]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP+267%3A+Iceberg+Connector
> >
> > Thanks
> > Abid
>


Re: [VOTE] Externalized connector release details​

2022-10-20 Thread Steven Wu
Chesnay, thanks for the write-up. Very helpful!

Regarding the parent pom, I am wondering if it can be published to the
`org.apache.flink` group?


<groupId>io.github.zentol.flink</groupId>
<artifactId>flink-connector-parent</artifactId>
<version>1.0</version>


On Mon, Oct 17, 2022 at 5:52 AM Chesnay Schepler  wrote:

>
> https://cwiki.apache.org/confluence/display/FLINK/Externalized+Connector+development
>
> On 17/10/2022 13:13, Chesnay Schepler wrote:
> > The vote has passed unanimously.
> >
> > +1 Votes:
> > - Danny (binding)
> > - Martijn (binding)
> > - Ferenc (non-binding)
> > - Thomas (binding)
> > - Ryan (non-binding)
> > - Jing (non-binding)
> > - Matthias (binding)
> >
> > I will now document this in the wiki and start working on the release
> > scripts.
> >
> > On 12/10/2022 15:12, Chesnay Schepler wrote:
> >> Since the discussion
> >> (https://lists.apache.org/thread/mpzzlpob9ymkjfybm96vz2y2m5fjyvfo)
> >> has stalled a bit but we need a conclusion to move forward I'm
> >> opening a vote.
> >>
> >> Proposal summary:
> >>
> >> 1) Branch model
> >> 1.1) The default branch is called "main" and used for the next major
> >> iteration.
> >> 1.2) Remaining branches are called "vmajor.minor". (e.g., v3.2)
> >> 1.3) Branches are not specific to a Flink version. (i.e., no v3.2-1.15)
> >>
> >> 2) Versioning
> >> 2.1) Source releases: major.minor.patch
> >> 2.2) Jar artifacts: major.minor.patch-flink-major.flink-minor
> >> (This may imply releasing the exact same connector jar multiple times
> >> under different versions)
> >>
> >> 3) Flink compatibility
> >> 3.1) The Flink versions supported by the project (last 2 major Flink
> >> versions) must be supported.
> >> 3.2) How this is achieved is left to the connector, as long as it
> >> conforms to the rest of the proposal.
> >>
> >> 4) Support
> >> 4.1) The last 2 major connector releases are supported with only the
> >> latter receiving additional features, with the following exceptions:
> >> 4.1.a) If the older major connector version does not support any
> >> currently supported Flink version, then it is no longer supported.
> >> 4.1.b) If the last 2 major versions do not cover all supported Flink
> >> versions, then the latest connector version that supports the older
> >> Flink version *additionally* gets patch support.
> >> 4.2) For a given major connector version only the latest minor
> >> version is supported.
> >> (This means if 1.1.x is released there will be no more 1.0.x release)
> >>
> >>
> >> I'd like to clarify that these won't be set in stone for eternity.
> >> We should re-evaluate how well this model works over time and adjust
> >> it accordingly, consistently across all connectors.
> >> I do believe that as is this strikes a good balance between
> >> maintainability for us and clarity to users.
> >>
> >>
> >> Voting schema:
> >>
> >> Consensus, committers have binding votes, open for at least 72 hours.
> >>
> >
>
>


Re: [Discuss]- Donate Iceberg Flink Connector

2022-10-20 Thread Steven Wu
Yuxia, those are valid points. But they are applicable to every connector
(not just Iceberg).

I also had a similar concern expressed in the discussion thread of
"Externalized connector release details". My main concern is the
multiplication factor of two upstream projects (Flink & storage/Iceberg).
If we limit both to two versions, it will be 2x2, which might still be OK,
but if we need to do 3x3, that will probably be too many to manage.

On Thu, Oct 20, 2022 at 5:27 AM yuxia  wrote:

> Hi, abmo, Abid!
> Thank you guys for driving it.
>
> As Iceberg is more and more popular and is an important
> upstream/downstream system to Flink, I believe the Flink community has paid
> much attention to Iceberg and hopes to be closer to the Iceberg community. No
> matter whether it's moved to the Flink umbrella or not, I believe Flink
> experts are glad to give feedback to Iceberg and take part in the development
> of the Iceberg Flink connector.
>
>
> Personally, as a Flink contributor and main maintainer of the Hive Flink
> connector, I'm really glad to take part in the Iceberg community for the
> maintenance and future development of the Iceberg Flink connector. I think I
> can provide some views from the Flink side and bring some feedback from the
> Iceberg community to the Flink community.
>
> But I have some concerns about moving the connector from the Iceberg
> repository to a separate connector repository under the Flink umbrella:
>
> 1: If Iceberg develops new features, the Iceberg Flink connector has to wait
> for Iceberg to be released before starting the development and release to
> make use of the new features. For users, they may need to wait a much longer
> time before enjoying the new features of Iceberg by using Flink.
>
> 2: If we move it to a separate repository, I'm afraid it'll lose attention
> from both the Flink and Iceberg sides, which is definitely a harm to the
> Flink and Iceberg communities. What's more, whenever Flink and Iceberg
> release a version, we need to update the version in the separate repository,
> which I think may be easily forgotten and tedious.
>
> Sorry for raising a different voice in this discussion, but I think it
> deserves further discussion on the dev mailing list; at least it will help to
> get Flink developers' attention to Iceberg.
>
> Best regards,
> Yuxia
>
> ----- Original Message -----
> From: "abmo work" 
> To: "dev" 
> Sent: Thursday, October 20, 2022, 6:33:40 AM
> Subject: Re: [Discuss]- Donate Iceberg Flink Connector
>
> Hi Martijn,
>
> I created a FLIP for this, its FLIP 267: Iceberg Connector  <
> https://cwiki.apache.org/confluence/display/FLINK/FLIP+267:+Iceberg+Connector
> >
> Please let me know if anything else is needed. My email on confluence is
> abmo.w...@icloud.com.
>
> As 1.0 was released today, from Iceberg perspective we need to figure out
> what versions of Flink we will support and the release timeline as to when
> the connector will be built and released off of the new repo vs Iceberg.
>
> Thanks
> Abid
>
> > On Oct 19, 2022, at 12:43 PM, Martijn Visser 
> wrote:
> >
> > Hi Abid,
> >
> > We should have a FLIP as this would be a code contribution. If you
> provide
> > your Confluence user name, we can grant you access to create one.
> >
> > Is there also something from an Iceberg point of view needed to agree
> with
> > the code contribution?
> >
> > Best regards,
> >
> > Martijn
> >
> > On Wed, 19 Oct 2022 at 19:11,  wrote:
> >
> >> Thanks Martijn!
> >>
> >> Thanks for all the support and positive responses. I will start a vote
> >> thread and send it out to the dev list.
> >>
> >> Also, we need help with creation of a new repo for the Iceberg
> Connector.
> >>
> >> Can someone help with the creation of a repo? Please let me know if I
> need
> >> to create an issue or flip for that.
> >> Following similar naming for other connectors, I propose
> >> https://github.com/apache/flink-connector-iceberg (doesn’t exist)
> >>
> >> Thanks
> >> Abid
> >>
> >> On 2022/10/19 08:41:02 Martijn Visser wrote:
> >>> Hi all,
> >>>
> >>> Thanks for the info and also thanks Peter and Steven for offering to
> >>> volunteer. I think that's a great idea and a necessity.
> >>>
> >>> Overall +1 given the current ideas to make this contribution happen.
> >>>
> >>> BTW congrats on reaching Iceberg 1.0, a great accomplishment :)
> >>>
> >>> Thanks,
> >>>
> >>> Martijn
> >>>
> >>> On Tue, Oct 18, 2022 at 12:31 AM Steven Wu  wrote:
> &

Re: Re: [Discuss]- Donate Iceberg Flink Connector

2022-10-17 Thread Steven Wu
I was one of the maintainers for the Flink Iceberg connector in Iceberg
repo. I can volunteer as one of the initial maintainers if we decide to
move forward.

On Mon, Oct 17, 2022 at 3:26 PM  wrote:

> Hi Martijn,
>
> Yes, It is considered a connector in Flink terms.
>
> We wanted to join the Flink connector externalization effort so that we
> can bring the Iceberg connector closer to the Flink community. We are
> hoping any issues with the APIs for Iceberg connector will surface sooner
> and get more attention from the Flink community when the connector is
> within Flink umbrella rather than in Iceberg repo. Also to get better
> feedback from Flink experts when it comes to things related to adding
> things in a connector vs Flink itself.
>
> Thanks everyone for all your responses! Looking forward to the next steps.
>
> Thanks
> Abid
>
> On 2022/10/14 03:37:09 Jark Wu wrote:
> > Thank Abid for the discussion,
> >
> > I'm also fine with maintaining it under the Flink project.
> > But I'm also interested in the response to Martijn's question.
> >
> > Besides, once the code is moved to the Flink project, are there any
> initial
> > maintainers for the connector we can find?
> > In addition, do we still maintain documentation under Iceberg
> > https://iceberg.apache.org/docs/latest/flink/ ?
> >
> > Best,
> > Jark
> >
> >
> > On Thu, 13 Oct 2022 at 17:52, yuxia  wrote:
> >
> > > +1. Thanks for driving it. Hope I can find some chances to take part in
> > > the future development of Iceberg Flink Connector.
> > >
> > > Best regards,
> > > Yuxia
> > >
> > > ----- Original Message -----
> > > From: "Zheng Yu Chen" 
> > > To: "dev" 
> > > Sent: Thursday, October 13, 2022, 11:26:29 AM
> > > Subject: Re: [Discuss]- Donate Iceberg Flink Connector
> > >
> > > +1, thanks to drive it
> > >
> > > Abid Mohammed  wrote on Mon, Oct 10, 2022 at 09:22:
> > >
> > > > Hi,
> > > >
> > > > I would like to start a discussion about contributing Iceberg Flink
> > > > Connector to Flink.
> > > >
> > > > I created a doc <
> > > >
> > >
> https://docs.google.com/document/d/1WC8xkPiVdwtsKL2VSPAUgzm9EjrPs8ZRjEtcwv93ISI/edit?usp=sharing
> > > >
> > > > with all the details following the Flink Connector template as I
> don’t
> > > have
> > > > permissions to create a FLIP yet.
> > > > High level details are captured below:
> > > >
> > > > Motivation:
> > > >
> > > > This FLIP aims to contribute the existing Apache Iceberg Flink
> Connector
> > > > to Flink.
> > > >
> > > > Apache Iceberg is an open table format for huge analytic datasets.
> > > Iceberg
> > > > adds tables to compute engines including Spark, Trino, PrestoDB,
> Flink,
> > > > Hive and Impala using a high-performance table format that works just
> > > like
> > > > a SQL table.
> > > > Iceberg avoids unpleasant surprises. Schema evolution works and won’t
> > > > inadvertently un-delete data. Users don’t need to know about
> partitioning
> > > > to get fast queries. Iceberg was designed to solve correctness
> problems
> > > in
> > > > eventually-consistent cloud object stores.
> > > >
> > > > Iceberg supports both Flink’s DataStream API and Table API. Based on
> the
> > > > guideline of the Flink community, only the latest 2 minor versions
> are
> > > > actively maintained. See the Multi-Engine Support#apache-flink for
> > > further
> > > > details.
> > > >
> > > >
> > > > Iceberg connector supports:
> > > >
> > > > • Source: detailed Source design <
> > > >
> > >
> https://docs.google.com/document/d/1q6xaBxUPFwYsW9aXWxYUh7die6O7rDeAPFQcTAMQ0GM/edit#
> > > >,
> > > > based on FLIP-27
> > > > • Sink: detailed Sink design and interfaces used <
> > > >
> > >
> https://docs.google.com/document/d/1O-dPaFct59wUWQECXEEYIkl9_MOoG3zTbC2V-fZRwrg/edit#
> > > > >
> > > > • Usable in both DataStream and Table API/SQL
> > > > • DataStream read/append/overwrite
> > > > • SQL create/alter/drop table, select, insert into, insert
> > > > overwrite
> > > > • Streaming or batch read in Java API
> > > > • Support for Flink’s Python API
> > > >
> > > > See Iceberg Flink  <
> https://iceberg.apache.org/docs/latest/flink/#flink
> > > >for
> > > > detailed usage instructions.
> > > >
> > > > Looking forward to the discussion!
> > > >
> > > > Thanks
> > > > Abid
> > >
> >


Re: [DISCUSS] FLIP-264 Extract BaseCoordinatorContext

2022-10-16 Thread Steven Wu
sorry. sent the incomplete reply by mistake.

If there are any concrete concerns, we can discuss. In the FLINK-27405 [1],
Arvid pointed out some implications regarding checkpointing. In this small
FLIP, we are not exposing/changing any checkpointing logic, we mainly need
the coordinator context functionality to facilitate the communication
between coordinator and subtasks.

[1] https://issues.apache.org/jira/browse/FLINK-27405

On Sun, Oct 16, 2022 at 8:56 AM Steven Wu  wrote:

> Hang, appreciate your input. Agree that `CoordinatorContextBase` is a
> better name considering Flink code convention.
>
> If there are any concrete concerns, we can discuss. In the jira,
>
>
>
> On Sun, Oct 16, 2022 at 12:12 AM Hang Ruan  wrote:
>
>> Hi,
>>
>> IMP, I agree to extract a base class for SourceCoordinatorContext.
>> But I prefer to use the name `OperatorCoordinatorContextBase` or
>> `CoordinatorContextBase` as the format like `SourceReaderBase`.
>> I also agree to what Piotr said. Maybe more problems will occur when
>> connectors start to use it.
>>
>> Best,
>> Hang
>>
>> Steven Wu  wrote on Fri, Oct 14, 2022 at 22:31:
>>
>> > Piotr,
>> >
>> > The proposal is to extract the listed methods from @Internal
>> > SourceCoordinatorContext to a @PublicEvolving BaseCoordinatorContext.
>> >
>> > The motivation is that other operators can leverage the communication
>> > mechanism btw operator coordinator and operator subtasks. For example,
>> in
>> > the linked google doc shuffle operator (in Flink Iceberg sink) can
>> leverage
>> > it for computing traffic distribution statistics.
>> > * subtasks calculate local statistics and periodically send them to the
>> > coordinator for global aggregation.
>> > * The coordinator can broadcast the globally aggregated statistics to
>> > subtasks, which can be used to guide the shuffling decision (selecting
>> > downstream channels).
>> >
>> > Thanks,
>> > Steven
>> >
>> >
>> > On Fri, Oct 14, 2022 at 2:16 AM Piotr Nowojski 
>> > wrote:
>> >
>> > > Hi,
>> > >
>> > > Could you clarify what's the proposal that you have in mind? From the
>> > > context I would understand that the newly extracted
>> > > `BaseCoordinatorContext` would have to be marked as `@PublicEvolving`
>> or
>> > > `@Experimental`, since otherwise extracting it and keeping `@Internal`
>> > > wouldn't change much? Such `@Internal` base class could have been
>> removed
>> > > at any point of time in the future. Having said that, it sounds to me
>> > like
>> > > your proposal is a bit bigger than it looks at the first glance and
>> you
>> > > actually want to expose the operator coordinator concept to the public
>> > API?
>> > >
>> > > AFAIK there were some discussions about that, and it was a bit of a
>> > > conscious decision to NOT do that. I don't know those reasons however.
>> > Only
>> > > now, I've just heard that there are for example some problems with
>> > > checkpointing of hypothetical non source operator coordinators. Maybe
>> > > someone else could shed some light on this?
>> > >
>> > > Conceptually I would be actually in favour of exposing operator
>> > > coordinators if there is a good reason behind that, but it is a more
>> > > difficult topic and might be a larger effort than it seems at the
>> first
>> > > glance.
>> > >
>> > > Best,
>> > > Piotrek
>> > >
>> > > On Tue, 4 Oct 2022 at 19:41, Steven Wu  wrote:
>> > >
>> > > > Jing, thanks a lot for your reply. The linked google doc is not for
>> > this
>> > > > FLIP, which is fully documented in the wiki page. The linked google
>> doc
>> > > is
>> > > > the design doc to introduce shuffling in Flink Iceberg sink, which
>> > > > motivated this FLIP proposal so that the shuffle coordinator can
>> > leverage
>> > > > the introduced BaseCoordinatorContext to avoid code duplication.
>> > > >
>> > > > On Tue, Oct 4, 2022 at 1:04 AM Jing Ge  wrote:
>> > > >
>> > > > > Thanks for bringing this up. It looks overall good! One small
>> thing,
>> > > you
>> > > > > might want to write all content on the wiki page instead of
>> linking
>> > to
>> > > a
>> > > > > google doc. The reason is that some people might not be able to
>> > access
>> > > > the
>> > > > > google doc.
>> > > > >
>> > > > > Best regards,
>> > > > > Jing
>> > > > >
>> > > > > On Tue, Oct 4, 2022 at 3:57 AM gang ye 
>> wrote:
>> > > > >
>> > > > >> Hi,
>> > > > >>
>> > > > >> We submit the Flip proposal
>> > > > >> <
>> > > > >>
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-264%3A+Extract+BaseCoordinatorContext
>> > > > >> >
>> > > > >> at Confluent to extract BaseCoordinatorContext from
>> > > > >> SourceCoordinatorContext to reuse it for other coordinators E.g.
>> in
>> > > the
>> > > > >> shuffling support of Flink Iceberg sink
>> > > > >> <
>> > > > >>
>> > > >
>> > >
>> >
>> https://docs.google.com/document/d/13N8cMqPi-ZPSKbkXGOBMPOzbv2Fua59j8bIjjtxLWqo
>> > > > >> >
>> > > > >>
>> > > > >> Could you help to take a look?
>> > > > >> Thanks
>> > > > >>
>> > > > >> Gang
>> > > > >>
>> > > > >
>> > > >
>> > >
>> >
>>
>


Re: [DISCUSS] FLIP-264 Extract BaseCoordinatorContext

2022-10-16 Thread Steven Wu
Hang, appreciate your input. Agree that `CoordinatorContextBase` is a
better name considering Flink code convention.

If there are any concrete concerns, we can discuss. In the jira,



On Sun, Oct 16, 2022 at 12:12 AM Hang Ruan  wrote:

> Hi,
>
> IMP, I agree to extract a base class for SourceCoordinatorContext.
> But I prefer to use the name `OperatorCoordinatorContextBase` or
> `CoordinatorContextBase` as the format like `SourceReaderBase`.
> I also agree to what Piotr said. Maybe more problems will occur when
> connectors start to use it.
>
> Best,
> Hang
>
> Steven Wu  wrote on Fri, Oct 14, 2022 at 22:31:
>
> > Piotr,
> >
> > The proposal is to extract the listed methods from @Internal
> > SourceCoordinatorContext to a @PublicEvolving BaseCoordinatorContext.
> >
> > The motivation is that other operators can leverage the communication
> > mechanism btw operator coordinator and operator subtasks. For example, in
> > the linked google doc shuffle operator (in Flink Iceberg sink) can
> leverage
> > it for computing traffic distribution statistics.
> > * subtasks calculate local statistics and periodically send them to the
> > coordinator for global aggregation.
> > * The coordinator can broadcast the globally aggregated statistics to
> > subtasks, which can be used to guide the shuffling decision (selecting
> > downstream channels).
> >
> > Thanks,
> > Steven
> >
> >
> > On Fri, Oct 14, 2022 at 2:16 AM Piotr Nowojski 
> > wrote:
> >
> > > Hi,
> > >
> > > Could you clarify what's the proposal that you have in mind? From the
> > > context I would understand that the newly extracted
> > > `BaseCoordinatorContext` would have to be marked as `@PublicEvolving`
> or
> > > `@Experimental`, since otherwise extracting it and keeping `@Internal`
> > > wouldn't change much? Such `@Internal` base class could have been
> removed
> > > at any point of time in the future. Having said that, it sounds to me
> > like
> > > your proposal is a bit bigger than it looks at the first glance and you
> > > actually want to expose the operator coordinator concept to the public
> > API?
> > >
> > > AFAIK there were some discussions about that, and it was a bit of a
> > > conscious decision to NOT do that. I don't know those reasons however.
> > Only
> > > now, I've just heard that there are for example some problems with
> > > checkpointing of hypothetical non source operator coordinators. Maybe
> > > someone else could shed some light on this?
> > >
> > > Conceptually I would be actually in favour of exposing operator
> > > coordinators if there is a good reason behind that, but it is a more
> > > difficult topic and might be a larger effort than it seems at the first
> > > glance.
> > >
> > > Best,
> > > Piotrek
> > >
> > > On Tue, 4 Oct 2022 at 19:41, Steven Wu  wrote:
> > >
> > > > Jing, thanks a lot for your reply. The linked google doc is not for
> > this
> > > > FLIP, which is fully documented in the wiki page. The linked google
> doc
> > > is
> > > > the design doc to introduce shuffling in Flink Iceberg sink, which
> > > > motivated this FLIP proposal so that the shuffle coordinator can
> > leverage
> > > > the introduced BaseCoordinatorContext to avoid code duplication.
> > > >
> > > > On Tue, Oct 4, 2022 at 1:04 AM Jing Ge  wrote:
> > > >
> > > > > Thanks for bringing this up. It looks overall good! One small
> thing,
> > > you
> > > > > might want to write all content on the wiki page instead of linking
> > to
> > > a
> > > > > google doc. The reason is that some people might not be able to
> > access
> > > > the
> > > > > google doc.
> > > > >
> > > > > Best regards,
> > > > > Jing
> > > > >
> > > > > On Tue, Oct 4, 2022 at 3:57 AM gang ye 
> wrote:
> > > > >
> > > > >> Hi,
> > > > >>
> > > > >> We submit the Flip proposal
> > > > >> <
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-264%3A+Extract+BaseCoordinatorContext
> > > > >> >
> > > > >> at Confluent to extract BaseCoordinatorContext from
> > > > >> SourceCoordinatorContext to reuse it for other coordinators E.g.
> in
> > > the
> > > > >> shuffling support of Flink Iceberg sink
> > > > >> <
> > > > >>
> > > >
> > >
> >
> https://docs.google.com/document/d/13N8cMqPi-ZPSKbkXGOBMPOzbv2Fua59j8bIjjtxLWqo
> > > > >> >
> > > > >>
> > > > >> Could you help to take a look?
> > > > >> Thanks
> > > > >>
> > > > >> Gang
> > > > >>
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] FLIP-264 Extract BaseCoordinatorContext

2022-10-14 Thread Steven Wu
Piotr,

The proposal is to extract the listed methods from @Internal
SourceCoordinatorContext to a @PublicEvolving BaseCoordinatorContext.

The motivation is that other operators can leverage the communication
mechanism btw operator coordinator and operator subtasks. For example, in
the linked google doc shuffle operator (in Flink Iceberg sink) can leverage
it for computing traffic distribution statistics.
* subtasks calculate local statistics and periodically send them to the
coordinator for global aggregation.
* The coordinator can broadcast the globally aggregated statistics to
subtasks, which can be used to guide the shuffling decision (selecting
downstream channels).
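
To make the exchange concrete, here is a rough sketch of the coordinator-side
aggregation (all names are illustrative, not part of the FLIP):

import java.util.HashMap;
import java.util.Map;

// Illustrative only: the kind of aggregation a shuffle coordinator would perform on
// the locally collected statistics reported by its subtasks.
class TrafficStatisticsAggregator {

    private final Map<String, Long> globalKeyCounts = new HashMap<>();

    // Called when a subtask sends its local key counts to the coordinator.
    void mergeLocalStatistics(Map<String, Long> localKeyCounts) {
        localKeyCounts.forEach((key, count) -> globalKeyCounts.merge(key, count, Long::sum));
    }

    // Snapshot the coordinator would broadcast back to subtasks, so they can pick
    // downstream channels that balance the skewed traffic.
    Map<String, Long> snapshot() {
        return new HashMap<>(globalKeyCounts);
    }
}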

Thanks,
Steven


On Fri, Oct 14, 2022 at 2:16 AM Piotr Nowojski  wrote:

> Hi,
>
> Could you clarify what's the proposal that you have in mind? From the
> context I would understand that the newly extracted
> `BaseCoordinatorContext` would have to be marked as `@PublicEvolving` or
> `@Experimental`, since otherwise extracting it and keeping `@Internal`
> wouldn't change much? Such `@Internal` base class could have been removed
> at any point of time in the future. Having said that, it sounds to me like
> your proposal is a bit bigger than it looks at the first glance and you
> actually want to expose the operator coordinator concept to the public API?
>
> AFAIK there were some discussions about that, and it was a bit of a
> conscious decision to NOT do that. I don't know those reasons however. Only
> now, I've just heard that there are for example some problems with
> checkpointing of hypothetical non source operator coordinators. Maybe
> someone else could shed some light on this?
>
> Conceptually I would be actually in favour of exposing operator
> coordinators if there is a good reason behind that, but it is a more
> difficult topic and might be a larger effort than it seems at the first
> glance.
>
> Best,
> Piotrek
>
> On Tue, 4 Oct 2022 at 19:41, Steven Wu  wrote:
>
> > Jing, thanks a lot for your reply. The linked google doc is not for this
> > FLIP, which is fully documented in the wiki page. The linked google doc
> is
> > the design doc to introduce shuffling in Flink Iceberg sink, which
> > motivated this FLIP proposal so that the shuffle coordinator can leverage
> > the introduced BaseCoordinatorContext to avoid code duplication.
> >
> > On Tue, Oct 4, 2022 at 1:04 AM Jing Ge  wrote:
> >
> > > Thanks for bringing this up. It looks overall good! One small thing,
> you
> > > might want to write all content on the wiki page instead of linking to
> a
> > > google doc. The reason is that some people might not be able to access
> > the
> > > google doc.
> > >
> > > Best regards,
> > > Jing
> > >
> > > On Tue, Oct 4, 2022 at 3:57 AM gang ye  wrote:
> > >
> > >> Hi,
> > >>
> > >> We submit the Flip proposal
> > >> <
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-264%3A+Extract+BaseCoordinatorContext
> > >> >
> > >> at Confluent to extract BaseCoordinatorContext from
> > >> SourceCoordinatorContext to reuse it for other coordinators E.g. in
> the
> > >> shuffling support of Flink Iceberg sink
> > >> <
> > >>
> >
> https://docs.google.com/document/d/13N8cMqPi-ZPSKbkXGOBMPOzbv2Fua59j8bIjjtxLWqo
> > >> >
> > >>
> > >> Could you help to take a look?
> > >> Thanks
> > >>
> > >> Gang
> > >>
> > >
> >
>


Re: [DISCUSS] Externalized connector release details

2022-10-12 Thread Steven Wu
With the model of an externalized Flink connector repo (which I fully
support), there is one challenge in supporting the versions of two upstream
projects (similar to what Peter Vary mentioned earlier).

E.g., today the Flink Iceberg connector lives in the Iceberg repo. We have
separate modules 1.13, 1.14, 1.15 for the supported Flink
versions, which is still manageable. With the new model, we may need to
multiply that by the number of Iceberg versions that we are going to
support, e.g. 0.13, 0.14, 1.0. That multiplication factor would be
unmanageable.

From the flink-connector-elasticsearch repo, it is unclear how we define
the supported Flink versions. Only one/latest Flink version?


On Fri, Sep 30, 2022 at 8:36 AM Péter Váry 
wrote:

> +1 having an option storing every version of a connector in one repo
>
> Also, it would be good to have the major(.minor) version of the connected
> system in the name of the connector jar, depending of the compatibility. I
> think this compatibility is mostly system dependent.
>
> Thanks, Peter
>
>
> On Fri, Sep 30, 2022, 09:32 Martijn Visser 
> wrote:
>
> > Hi Peter,
> >
> > I think this also depends on the support SLA that the technology that you
> > connect to provides. For example, with Flink and Elasticsearch, we choose
> > to follow Elasticsearch supported versions. So that means that when
> support
> > for Elasticsearch 8 is introduced, support for Elasticsearch 6 should be
> > dropped (since Elastic only support the last major version and the latest
> > minor version prior to that)
> >
> > I don't see value in having different connectors for Iceberg 0.14 and
> 0.15
> > in separate repositories. I think that will confuse the user. I would
> > expect that with modules you should be able to have support for multiple
> > versions in one repository.
> >
> > Best regards,
> >
> > Martijn
> >
> > On Fri, Sep 30, 2022 at 7:44 AM Péter Váry 
> > wrote:
> >
> > > Thanks for the quick response!
> > >
> > > Would this mean, that we have different connectors for Iceberg 0.14,
> and
> > > Iceberg 0.15. Would these different versions kept in different
> > repository?
> > >
> > > My feeling is that this model is fine for the stable/slow moving
> systems
> > > like Hive/HBase. For other systems, which are evolving faster, this is
> > less
> > > than ideal.
> > >
> > > For those, who have more knowledge about the Flink ecosystem: How do
> you
> > > feel? What is the distribution of the connectors between the slow
> moving
> > > and the fast moving systems?
> > >
> > > Thanks, Peter
> > >
> > >
> > > On Thu, Sep 29, 2022, 16:46 Danny Cranmer 
> > wrote:
> > >
> > > > If you look at ElasticSearch [1] as an example there are different
> > > variants
> > > > of the connector depending on the "connected" system:
> > > > - flink-connector-elasticsearch6
> > > > - flink-connector-elasticsearch7
> > > >
> > > > Looks like Hive and HBase follow a similar pattern in the main Flink
> > > repo/
> > > >
> > > > [1] https://github.com/apache/flink-connector-elasticsearch
> > > >
> > > > On Thu, Sep 29, 2022 at 3:17 PM Péter Váry <
> > peter.vary.apa...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Team,
> > > > >
> > > > > Just joining the conversation for the first time, so pardon me if I
> > > > repeat
> > > > > already answered questions.
> > > > >
> > > > > It might be already discussed, but I think the version for the
> > > > "connected"
> > > > > system could be important as well.
> > > > >
> > > > > There might be some API changes between Iceberg 0.14.2, and 1.0.0,
> > > which
> > > > > would require as to rewrite part of the code for the Flink-Iceberg
> > > > > connector.
> > > > > It would be important for the users:
> > > > > - Which Flink version(s) are this connector working with?
> > > > > - Which Iceberg version(s) are this connector working with?
> > > > > - Which code version we have for this connector?
> > > > >
> > > > > Does this make sense? What is the community's experience with the
> > > > connected
> > > > > systems? Are they stable enough for omitting their version number
> > from
> > > > the
> > > > > naming of the connectors? Would this worth the proliferation of the
> > > > > versions?
> > > > >
> > > > > Thanks,
> > > > > Peter
> > > > >
> > > > > Chesnay Schepler  ezt írta (időpont: 2022.
> > szept.
> > > > 29.,
> > > > > Cs, 14:11):
> > > > >
> > > > > > 2) No; the branch names would not have a Flink version in them;
> > > v1.0.0,
> > > > > > v1.0.1 etc.
> > > > > >
> > > > > > On 29/09/2022 14:03, Martijn Visser wrote:
> > > > > > > If I summarize it correctly, that means that:
> > > > > > >
> > > > > > > 1. The versioning scheme would be <connector version>-<flink version>, where there
> will
> > > > never
> > > > > > be a
> > > > > > > patch release for a minor version if a newer minor version
> > already
> > > > > > exists.
> > > > > > > E.g., 1.0.0-1.15; 1.0.1-1.15; 1.1.0-1.15; 1.2.0-1.15;
> > > > > > >
> > > > > > > 2. The branch naming scheme would be
> > > > > 

Re: [DISCUSS] FLIP-264 Extract BaseCoordinatorContext

2022-10-04 Thread Steven Wu
Jing, thanks a lot for your reply. The linked google doc is not for this
FLIP, which is fully documented in the wiki page. The linked google doc is
the design doc to introduce shuffling in Flink Iceberg sink, which
motivated this FLIP proposal so that the shuffle coordinator can leverage
the introduced BaseCoordinatorContext to avoid code duplication.

On Tue, Oct 4, 2022 at 1:04 AM Jing Ge  wrote:

> Thanks for bringing this up. It looks overall good! One small thing, you
> might want to write all content on the wiki page instead of linking to a
> google doc. The reason is that some people might not be able to access the
> google doc.
>
> Best regards,
> Jing
>
> On Tue, Oct 4, 2022 at 3:57 AM gang ye  wrote:
>
>> Hi,
>>
>> We submit the Flip proposal
>> <
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-264%3A+Extract+BaseCoordinatorContext
>> >
>> at Confluent to extract BaseCoordinatorContext from
>> SourceCoordinatorContext to reuse it for other coordinators E.g. in the
>> shuffling support of Flink Iceberg sink
>> <
>> https://docs.google.com/document/d/13N8cMqPi-ZPSKbkXGOBMPOzbv2Fua59j8bIjjtxLWqo
>> >
>>
>> Could you help to take a look?
>> Thanks
>>
>> Gang
>>
>


Re: Sink V2 interface replacement for GlobalCommitter

2022-09-28 Thread Steven Wu
ver changing parallelism of Committer operator to hardcoded value 1
> was not enough and I had to do two more things:
> 1. add rebalance step (RebalancePartitioner) to graph between writer and
> committer since now they have different parallelism level and default
> partitioner was FORWARD that caused an exception to be thrown - BTW this is
> clear and understood
> 2. modify Flinks CommittableCollectorSerializer [5] and this is I believe
> an important thing.
>
> The modification I had to made was caused by "Duplicate Key" exception
> from deserialize(int version, byte[] serialized) method from line 143 of
> [5] where we process a stream of SubtaskCommittableManager objects and
> collect it into to the Map. The map key is a subtaskId
> from SubtaskCommittableManager object.
>
> After Task Manager recovery it may happen that List of
> SubtaskCommittableManager that is processed in that  deserialize method
> will contain two SubtaskCommittableManager for the same subtask ID. What I
> did is that for such a case I call SubtaskCommittableManager .merge(...)
> method.
>
> With those modifications our Delta test [7] started to pass on Flink 1.15.
>
> I do not know whether setting parallelism level of the Committer to 1 is a
> right thing to do. Like I mentioned, Committer is doing some work in our
> Sink implementation and we might have more usage for it in next features we
> would like to add that would benefit from keeping parallelism level equal
> to writers count.
>
> I still think there is some issue with the V2 architecture for topologies
> with GlobalCommitter and failover scenarios [4] and even that duplicated
> key in [5] described above is another case, maybe we should never have two
> entries for same subtaskId. That I don't know.
>
> P.S.
> Steven, apologies for hijacking the thread a little bit.
>
> Thanks,
> Krzysztof Chmielewski
>
> [1]
> https://github.com/delta-io/connectors/blob/master/flink/src/main/java/io/delta/flink/sink/internal/committer/DeltaCommitter.java
> [2]
> https://github.com/delta-io/connectors/blob/master/flink/src/main/java/io/delta/flink/sink/internal/committer/DeltaGlobalCommitter.java
> [3]
> https://drive.google.com/file/d/1kU0R9nLZneJBDAkgNiaRc90dLGycyTec/view?usp=sharing
> [4] https://lists.apache.org/thread/otscy199g1l9t3llvo8s2slntyn2r1jc
> [5]
> https://github.com/apache/flink/blob/release-1.15/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/operators/sink/committables/CommittableCollectorSerializer.java
> [7]
> https://github.com/kristoffSC/connectors/blob/Flink_1.15/flink/src/test/java/io/delta/flink/sink/DeltaSinkStreamingExecutionITCase.java
>
> śr., 14 wrz 2022 o 05:26 Steven Wu  napisał(a):
> > setting the committer parallelism to 1.
>
> Yun, setting the parallelism to 1 is essentially a global committer. That
> would work. not sure about the implications to other parts of the v2 sink
> interface.
>
> On Tue, Sep 13, 2022 at 2:17 PM Krzysztof Chmielewski <
> krzysiek.chmielew...@gmail.com> wrote:
> Hi  Martijn
> Could you clarify a little bit what do you mean by:
>
> "The important part to remember is that this
> topology is lagging one checkpoint behind in terms of fault-tolerance: it
> only receives data once the committer committed"
>
> What are the implications?
>
> Thanks,
> Krzysztof Chmielewski
>
> wt., 13 wrz 2022 o 09:57 Yun Gao 
> napisał(a):
> Hi,
> Very sorry for the late reply for being in the holiday.
> And also very thanks for the discussion, it also reminds me
> one more background on the change of the GlobalCommitter:
> When we are refactoring the job finish process in FLIP-147 to
> ensures all the records could be committed at the end of bounded
> streaming job, we have to desert the support for the cascade commits,
> which makes the cascade commit of `committer -> global committer` not work
> in all cases.
> For the current issues, one possible alternative option from my side is
> that we
> may support setting the committer parallelism to 1. Could this option
> solves
> the issue in the current scenarios? I'll also have a double check with if
> it could be implemented and the failed tests Krzysztof met.
> Best,
> Yun
> --
> From:Steven Wu 
> Send Time:2022 Sep. 10 (Sat.) 11:31
> To:dev 
> Cc:Yun Gao ; hililiwei 
> Subject:Re: Sink V2 interface replacement for GlobalCommitter
> Martjin, thanks a lot for chiming in!
> Here are my concerns with adding GlobalCommitter in the PostCommitTopology
> 1. when we use TwoPhaseCommittingSink. We would need to create a
> noop/dummy committer. Actual Iceberg/DeltaLake commits happen in the
> PostC

Re: Sink V2 interface replacement for GlobalCommitter

2022-09-14 Thread Steven Wu
Krzysztof, no worries. We are discussing the same topic (how to support
storage with globally transactional commits).

> In Delta Sink connector we actually use both Committer [1] and
GlobalCommitter [2]. The former, since we are using Flink's Parquet file
writers is doing a very simple job of "of renaming the hidden file to make
it visible and removing from the name some 'in-progress file' marker". The
GlobalCommitter is committing data to the Delta Log.

Curious if the writers can write the visible files directly (vs. hidden
files first, then renamed by the committer). Since there is a global committer
to commit the data files when the Flink checkpoint completes, a job failure or
restart shouldn't cause data file duplication or loss. I probably missed some
context here.

On Wed, Sep 14, 2022 at 5:20 AM Krzysztof Chmielewski <
krzysiek.chmielew...@gmail.com> wrote:

> Hi Yun,
> Thanks for your input.
>
> In Delta Sink connector we actually use both Committer [1] and
> GlobalCommitter [2]. The former, since we are using Flink's Parquet file
> writers is doing a very simple job of "of renaming the hidden file to make
> it visible and removing from the name some 'in-progress file' marker". The
> GlobalCommitter is committing data to the Delta Log.
>
> With this design, having many instances of Committers actually has a
> benefit for us. Plus we would see some next features in our connector that
> would benefit from separate Committers with parallelism level higher than 1.
>
> How I understood your suggestion Yun (and maybe It was a wrong
> interpretation) is to use both Committer and GlobalCommitter but to enforce
> parallelism level 1 on the former. The GlobalCommitter created by Flink's
> 1.15 SinkV1Adapter has parallelism 1 as expected and how it was in Flink <
> 1.15.
>
> Anyways, I've play a little bit with the Flink code and I managed to
> achieved this [3]. After some additional changes which I will describe
> below, our test described in [4] passed without any data loss and no
> Exceptions thrown by Flink.
>
> However changing parallelism of Committer operator to hardcoded value 1
> was not enough and I had to do two more things:
> 1. add rebalance step (RebalancePartitioner) to graph between writer and
> committer since now they have different parallelism level and default
> partitioner was FORWARD that caused an exception to be thrown - BTW this is
> clear and understood
> 2. modify Flinks CommittableCollectorSerializer [5] and this is I believe
> an important thing.
>
> The modification I had to made was caused by "Duplicate Key" exception
> from deserialize(int version, byte[] serialized) method from line 143 of
> [5] where we process a stream of SubtaskCommittableManager objects and
> collect it into to the Map. The map key is a subtaskId
> from SubtaskCommittableManager object.
>
> After Task Manager recovery it may happen that List of
> SubtaskCommittableManager that is processed in that  deserialize method
> will contain two SubtaskCommittableManager for the same subtask ID. What I
> did is that for such a case I call SubtaskCommittableManager .merge(...)
> method.
>
> With those modifications our Delta test [7] started to pass on Flink 1.15.
>
> I do not know whether setting parallelism level of the Committer to 1 is a
> right thing to do. Like I mentioned, Committer is doing some work in our
> Sink implementation and we might have more usage for it in next features we
> would like to add that would benefit from keeping parallelism level equal
> to writers count.
>
> I still think there is some issue with the V2 architecture for topologies
> with GlobalCommitter and failover scenarios [4] and even that duplicated
> key in [5] described above is another case, maybe we should never have two
> entries for same subtaskId. That I don't know.
>
> P.S.
> Steven, apologies for hijacking the thread a little bit.
>
> Thanks,
> Krzysztof Chmielewski
>
> [1]
> https://github.com/delta-io/connectors/blob/master/flink/src/main/java/io/delta/flink/sink/internal/committer/DeltaCommitter.java
> [2]
> https://github.com/delta-io/connectors/blob/master/flink/src/main/java/io/delta/flink/sink/internal/committer/DeltaGlobalCommitter.java
> [3]
> https://drive.google.com/file/d/1kU0R9nLZneJBDAkgNiaRc90dLGycyTec/view?usp=sharing
> [4] https://lists.apache.org/thread/otscy199g1l9t3llvo8s2slntyn2r1jc
> [5]
> https://github.com/apache/flink/blob/release-1.15/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/operators/sink/committables/CommittableCollectorSerializer.java
> [7]
> https://github.com/kristoffSC/connectors/blob/Flink_1.15/flink/src/test/java/io/delta/flink/sink/DeltaSinkStreamingExecutionITCase.java
>
> śr., 14 wrz 2022 o 05:26 Steve

Re: Sink V2 interface replacement for GlobalCommitter

2022-09-13 Thread Steven Wu
treaming-java/src/main/java/org/apache/flink/streaming/api/transformations/SinkV1Adapter.java#L359-L370
>> >
>>  >
>>  > Op do 8 sep. 2022 om 20:51 schreef Krzysztof Chmielewski <
>>  > krzysiek.chmielew...@gmail.com <mailto:krzysiek.chmielew...@gmail.com
>> >>:
>>  >
>>  > > Hi,
>>  > > Krzysztof Chmielewski [1] from Delta-Flink connector open source
>>  > community
>>  > > here [2].
>>  > >
>>  > > I'm totally agree with Steven on this. Sink's V1 GlobalCommitter is
>>  > > something exactly what Flink-Delta Sink needs since it is the place
>> where
>>  > > we do an actual commit to the Delta Log which should be done from a
>> one
>>  > > place/instance.
>>  > >
>>  > > Currently I'm evaluating V2 for our connector and having, how Steven
>>  > > described it a "more natural, built-in concept/support of
>> GlobalCommitter
>>  > > in the sink v2 interface" would be greatly appreciated.
>>  > >
>>  > > Cheers,
>>  > > Krzysztof Chmielewski
>>  > >
>>  > > [1] https://github.com/kristoffSC <https://github.com/kristoffSC >
>>  > > [2] https://github.com/delta-io/connectors/tree/master/flink <
>> https://github.com/delta-io/connectors/tree/master/flink >
>>  > >
>>  > > czw., 8 wrz 2022 o 19:51 Steven Wu > stevenz...@gmail.com >> napisał(a):
>>  > >
>>  > > > Hi Yun,
>>  > > >
>>  > > > Thanks a lot for the reply!
>>  > > >
>>  > > > While we can add the global committer in the
>> WithPostCommitTopology,
>>  > the
>>  > > > semantics are weird. The Commit stage actually didn't commit
>> anything
>>  > to
>>  > > > the Iceberg table, and the PostCommit stage is where the Iceberg
>> commit
>>  > > > happens.
>>  > > >
>>  > > > I just took a quick look at DeltaLake Flink sink. It still uses
>> the V1
>>  > > sink
>>  > > > interface [1]. I think it might have the same issue when switching
>> to
>>  > the
>>  > > > V2 sink interface.
>>  > > >
>>  > > > For data lake storages (like Iceberg, DeltaLake) or any storage
>> with
>>  > > global
>>  > > > transactional commit, it would be more natural to have a built-in
>>  > > > concept/support of GlobalCommitter in the sink v2 interface.
>>  > > >
>>  > > > Thanks,
>>  > > > Steven
>>  > > >
>>  > > > [1]
>>  > > >
>>  > > >
>>  > >
>>  >
>> https://github.com/delta-io/connectors/blob/master/flink/src/main/java/io/delta/flink/sink/internal/committer/DeltaGlobalCommitter.java
>> <
>> https://github.com/delta-io/connectors/blob/master/flink/src/main/java/io/delta/flink/sink/internal/committer/DeltaGlobalCommitter.java
>> >
>>  > > >
>>  > > >
>>  > > > On Wed, Sep 7, 2022 at 2:15 AM Yun Gao
>> 
>>  > > > wrote:
>>  > > >
>>  > > > > Hi Steven, Liwei,
>>  > > > > Very sorry for missing this mail and response very late.
>>  > > > > I think the initial thought is indeed to use
>> `WithPostCommitTopology`
>>  > > as
>>  > > > > a replacement of the original GlobalCommitter, and currently the
>>  > > adapter
>>  > > > of
>>  > > > > Sink v1 on top of Sink v2 also maps the GlobalCommitter in Sink
>> V1
>>  > > > > interface
>>  > > > > onto an implementation of `WithPostCommitTopology`.
>>  > > > > Since `WithPostCommitTopology` supports arbitrary subgraph, thus
>> It
>>  > > seems
>>  > > > > to
>>  > > > > me it could support both global committer and small file
>> compaction?
>>  > We
>>  > > > > might
>>  > > > > have an `WithPostCommitTopology` implementation like
>>  > > > > DataStream ds = add global committer;
>>  > > > > if (enable file compaction) {
>>  > > > > build the compaction subgraph from ds
>>  > > > > }
>>  > > > > Best,
>>  > > > > Yun
>>  > > > > [1]
>>  > > > >
>>  > > >
>>  > &

Re: [ANNOUNCE] New Apache Flink PMC Member - Martijn Visser

2022-09-12 Thread Steven Wu
Congrats, Martijn!

On Mon, Sep 12, 2022 at 1:49 PM Alexander Fedulov 
wrote:

> Congrats, Martijn!
>
> On Mon, Sep 12, 2022 at 10:06 AM Jing Ge  wrote:
>
> > Congrats!
> >
> > On Mon, Sep 12, 2022 at 9:38 AM Daisy Tsang  wrote:
> >
> > > Congrats!
> > >
> > > On Mon, Sep 12, 2022 at 9:32 AM Martijn Visser <
> martijnvis...@apache.org
> > >
> > > wrote:
> > >
> > > > Thank you all :)
> > > >
> > > > Op zo 11 sep. 2022 om 13:58 schreef Zheng Yu Chen <
> jam.gz...@gmail.com
> > >:
> > > >
> > > > > Congratulations, Martijn
> > > > >
> > > > >
> > > > >
> > > > > Timo Walther  于2022年9月9日周五 23:08写道:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > I'm very happy to announce that Martijn Visser has joined the
> Flink
> > > > PMC!
> > > > > >
> > > > > > Martijn has helped the community in many different ways over the
> > past
> > > > > > months. Externalizing the connectors from the Flink repo to their
> > own
> > > > > > repository, continously updating dependencies, and performing
> other
> > > > > > project-wide refactorings. He is constantly coordinating
> > > contributions,
> > > > > > connecting stakeholders, finding committers for contributions,
> > > driving
> > > > > > release syncs, and helping in making the ASF a better place (e.g.
> > by
> > > > > > using Matomo an ASF-compliant tracking solution for all
> projects).
> > > > > >
> > > > > > Congratulations and welcome, Martijn!
> > > > > >
> > > > > > Cheers,
> > > > > > Timo Walther
> > > > > > (On behalf of the Apache Flink PMC)
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best
> > > > >
> > > > > ConradJam
> > > > >
> > > >
> > >
> >
>


Re: Sink V2 interface replacement for GlobalCommitter

2022-09-09 Thread Steven Wu
Martijn, thanks a lot for chiming in!

Here are my concerns with adding the GlobalCommitter in the PostCommitTopology:
1. When we use TwoPhaseCommittingSink, we would need to create a noop/dummy
committer (see the sketch below). The actual Iceberg/DeltaLake commits happen
in the PostCommit stage, but the PostCommit stage should be doing some work
after the commit (not for the commit).
2. GlobalCommitter is marked as @deprecated. It will be removed at a
certain point. What then?
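
To make concern 1 concrete, here is a minimal sketch of such a noop committer,
assuming the Flink 1.15 sink v2 Committer interface (the String committable
type and the class name are only illustrative):

import java.util.Collection;
import org.apache.flink.api.connector.sink2.Committer;

// Intentionally does nothing: the real (global) Iceberg/DeltaLake commit
// would only happen later, in the post-commit topology.
class NoOpCommitter implements Committer<String> {
    @Override
    public void commit(Collection<CommitRequest<String>> committables) {
        // no-op
    }

    @Override
    public void close() {
        // nothing to clean up
    }
}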

Thanks,
Steven

On Fri, Sep 9, 2022 at 1:20 PM Krzysztof Chmielewski <
krzysiek.chmielew...@gmail.com> wrote:

> Thanks Martijn,
> I'm actually trying to run our V1 Delta connector on Flink 1.15 using
> SinkV1Adapter with GlobalCommitterOperator.
>
> Having said that, I might have found a potential issue with
> GlobalCommitterOperator, checkpoitining and failover recovery [1].
> For "normal" scenarios it does look good though.
>
> Regards,
> Krzysztof Chmielewski
>
> [1] https://lists.apache.org/thread/otscy199g1l9t3llvo8s2slntyn2r1jc
>
> pt., 9 wrz 2022 o 20:49 Martijn Visser 
> napisał(a):
>
> > Hi all,
> >
> > A couple of bits from when work was being done on the new sink: V1 is
> > completely simulated as V2 [1]. V2 is strictly more expressive.
> >
> > If there's desire to stick to the `GlobalCommitter` interface, have a
> > look at the StandardSinkTopologies. Or you can just add your own more
> > fitting PostCommitTopology. The important part to remember is that this
> > topology is lagging one checkpoint behind in terms of fault-tolerance: it
> > only receives data once the committer committed
> > on notifyCheckpointComplete. Thus, the global committer needs to be
> > idempotent and able to restore the actual state on recovery. That
> > limitation is coming in from Flink's checkpointing behaviour and applies
> to
> > both V1 and V2. GlobalCommitterOperator is abstracting these issues along
> > with handling retries (so commits that happen much later). So it's
> probably
> > a good place to start just with the standard topology.
> >
> > Best regards,
> >
> > Martijn
> >
> > [1]
> >
> >
> https://github.com/apache/flink/blob/955e5ff34082ff8a4a46bb74889612235458eb76/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/transformations/SinkV1Adapter.java#L359-L370
> >
> > Op do 8 sep. 2022 om 20:51 schreef Krzysztof Chmielewski <
> > krzysiek.chmielew...@gmail.com>:
> >
> > > Hi,
> > > Krzysztof Chmielewski [1] from Delta-Flink connector open source
> > community
> > > here [2].
> > >
> > > I'm totally agree with Steven on this. Sink's V1 GlobalCommitter is
> > > something exactly what Flink-Delta Sink needs since it is the place
> where
> > > we do an actual commit to the Delta Log which should be done from a one
> > > place/instance.
> > >
> > > Currently I'm evaluating V2 for our connector and having, how Steven
> > > described it a "more natural, built-in concept/support of
> GlobalCommitter
> > > in the sink v2 interface" would be greatly appreciated.
> > >
> > > Cheers,
> > > Krzysztof Chmielewski
> > >
> > > [1] https://github.com/kristoffSC
> > > [2] https://github.com/delta-io/connectors/tree/master/flink
> > >
> > > czw., 8 wrz 2022 o 19:51 Steven Wu  napisał(a):
> > >
> > > > Hi Yun,
> > > >
> > > > Thanks a lot for the reply!
> > > >
> > > > While we can add the global committer in the WithPostCommitTopology,
> > the
> > > > semantics are weird. The Commit stage actually didn't commit anything
> > to
> > > > the Iceberg table, and the PostCommit stage is where the Iceberg
> commit
> > > > happens.
> > > >
> > > > I just took a quick look at DeltaLake Flink sink. It still uses the
> V1
> > > sink
> > > > interface [1]. I think it might have the same issue when switching to
> > the
> > > > V2 sink interface.
> > > >
> > > > For data lake storages (like Iceberg, DeltaLake) or any storage with
> > > global
> > > > transactional commit, it would be more natural to have a built-in
> > > > concept/support of GlobalCommitter in the sink v2 interface.
> > > >
> > > > Thanks,
> > > > Steven
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://github.com/delta-io/connectors/blob/master/flink/src/main/java/io/delta/flink/sink/internal/committer/DeltaGlobalCom

Re: Sink V2 interface replacement for GlobalCommitter

2022-09-08 Thread Steven Wu
Hi Yun,

Thanks a lot for the reply!

While we can add the global committer in the WithPostCommitTopology, the
semantics are weird. The Commit stage wouldn't actually commit anything to
the Iceberg table, and the PostCommit stage is where the Iceberg commit
would happen.

I just took a quick look at DeltaLake Flink sink. It still uses the V1 sink
interface [1]. I think it might have the same issue when switching to the
V2 sink interface.

For data lake storages (like Iceberg, DeltaLake) or any storage with global
transactional commit, it would be more natural to have a built-in
concept/support of GlobalCommitter in the sink v2 interface.

Thanks,
Steven

[1]
https://github.com/delta-io/connectors/blob/master/flink/src/main/java/io/delta/flink/sink/internal/committer/DeltaGlobalCommitter.java


On Wed, Sep 7, 2022 at 2:15 AM Yun Gao  wrote:

> Hi Steven, Liwei,
> Very sorry for missing this mail and response very late.
> I think the initial thought is indeed to use `WithPostCommitTopology` as
> a replacement of the original GlobalCommitter, and currently the adapter of
> Sink v1 on top of Sink v2 also maps the GlobalCommitter in Sink V1
> interface
> onto an implementation of `WithPostCommitTopology`.
> Since `WithPostCommitTopology` supports arbitrary subgraph, thus It seems
> to
> me it could support both global committer and small file compaction? We
> might
> have an `WithPostCommitTopology` implementation like
> DataStream ds = add global committer;
> if (enable file compaction) {
>  build the compaction subgraph from ds
> }
> Best,
> Yun
> [1]
> https://github.com/apache/flink/blob/a8ca381c57788cd1a1527e4ebdc19bdbcd132fc4/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/transformations/SinkV1Adapter.java#L365
> <
> https://github.com/apache/flink/blob/a8ca381c57788cd1a1527e4ebdc19bdbcd132fc4/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/transformations/SinkV1Adapter.java#L365
> >
> --
> From:Steven Wu 
> Send Time:2022 Aug. 17 (Wed.) 07:30
> To:dev ; hililiwei 
> Subject:Re: Sink V2 interface replacement for GlobalCommitter
> > Plus, it will disable the future capability of small file compaction
> stage post commit.
> I should clarify this comment. if we are using the `WithPostCommitTopology`
> for global committer, we would lose the capability of using the post commit
> stage for small files compaction.
> On Tue, Aug 16, 2022 at 9:53 AM Steven Wu  wrote:
> >
> > In the V1 sink interface, there is a GlobalCommitter for Iceberg. With
> the
> > V2 sink interface, GlobalCommitter has been deprecated by
> > WithPostCommitTopology. I thought the post commit stage is mainly for
> async
> > maintenance (like compaction).
> >
> > Are we supposed to do sth similar to the GlobalCommittingSinkAdapter? It
> > seems like a temporary transition plan for bridging v1 sinks to v2
> > interfaces.
> >
> > private class GlobalCommittingSinkAdapter extends
> TwoPhaseCommittingSinkAdapter
> > implements WithPostCommitTopology {
> > @Override
> > public void addPostCommitTopology(DataStream>
> committables) {
> > StandardSinkTopologies.addGlobalCommitter(
> > committables,
> > GlobalCommitterAdapter::new,
> > () -> sink.getCommittableSerializer().get());
> > }
> > }
> >
> >
> > In the Iceberg PR [1] for adopting the new sink interface, Liwei used the
> > "global" partitioner to force all committables go to a single committer
> > task 0. It will effectively force a global committer disguised in the
> > parallel committers. It is a little weird and also can lead to questions
> > why other committer tasks are not getting any messages. Plus, it will
> > disable the future capability of small file compaction stage post commit.
> > Hence, I am asking what is the right approach to achieve global committer
> > behavior.
> >
> > Thanks,
> > Steven
> >
> > [1] https://github.com/apache/iceberg/pull/4904/files#r946975047 <
> https://github.com/apache/iceberg/pull/4904/files#r946975047 >
> >
>


Re: Sink V2 interface replacement for GlobalCommitter

2022-08-16 Thread Steven Wu
>  Plus, it will disable the future capability of small file compaction
stage post commit.

I should clarify this comment. If we are using the `WithPostCommitTopology`
for the global committer, we would lose the capability of using the post-commit
stage for small-file compaction.

On Tue, Aug 16, 2022 at 9:53 AM Steven Wu  wrote:

>
> In the V1 sink interface, there is a GlobalCommitter for Iceberg. With the
> V2 sink interface,  GlobalCommitter has been deprecated by
> WithPostCommitTopology. I thought the post commit stage is mainly for async
> maintenance (like compaction).
>
> Are we supposed to do sth similar to the GlobalCommittingSinkAdapter? It
> seems like a temporary transition plan for bridging v1 sinks to v2
> interfaces.
>
> private class GlobalCommittingSinkAdapter extends TwoPhaseCommittingSinkAdapter
>         implements WithPostCommitTopology<InputT, CommT> {
>     @Override
>     public void addPostCommitTopology(DataStream<CommittableMessage<CommT>> committables) {
>         StandardSinkTopologies.addGlobalCommitter(
>                 committables,
>                 GlobalCommitterAdapter::new,
>                 () -> sink.getCommittableSerializer().get());
>     }
> }
>
>
> In the Iceberg PR [1] for adopting the new sink interface, Liwei used the
> "global" partitioner to force all committables go to a single committer
> task 0. It will effectively force a global committer disguised in the
> parallel committers. It is a little weird and also can lead to questions
> why other committer tasks are not getting any messages. Plus, it will
> disable the future capability of small file compaction stage post commit.
> Hence, I am asking what is the right approach to achieve global committer
> behavior.
>
> Thanks,
> Steven
>
> [1] https://github.com/apache/iceberg/pull/4904/files#r946975047
>


Sink V2 interface replacement for GlobalCommitter

2022-08-16 Thread Steven Wu
In the V1 sink interface, there is a GlobalCommitter for Iceberg. With the
V2 sink interface,  GlobalCommitter has been deprecated by
WithPostCommitTopology. I thought the post commit stage is mainly for async
maintenance (like compaction).

Are we supposed to do something similar to the GlobalCommittingSinkAdapter? It
seems like a temporary transition plan for bridging v1 sinks to v2
interfaces.

private class GlobalCommittingSinkAdapter extends TwoPhaseCommittingSinkAdapter
        implements WithPostCommitTopology<InputT, CommT> {
    @Override
    public void addPostCommitTopology(DataStream<CommittableMessage<CommT>> committables) {
        StandardSinkTopologies.addGlobalCommitter(
                committables,
                GlobalCommitterAdapter::new,
                () -> sink.getCommittableSerializer().get());
    }
}


In the Iceberg PR [1] for adopting the new sink interface, Liwei used the
"global" partitioner to force all committables to go to a single committer
task 0. It will effectively force a global committer disguised in the
parallel committers. It is a little weird and also can lead to questions
why other committer tasks are not getting any messages. Plus, it will
disable the future capability of small file compaction stage post commit.
Hence, I am asking what is the right approach to achieve global committer
behavior.
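
For context, here is a tiny, self-contained sketch (not the Iceberg code) of
what the "global" partitioner does: every element is routed to subtask 0 of
the downstream operator, regardless of the configured parallelism.

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class GlobalPartitionerSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4);
        env.fromElements(1, 2, 3, 4, 5)
                // everything goes to the first subtask of the next operator,
                // which is how the PR funnels all committables to one task
                .global()
                .map(v -> "handled by a single subtask: " + v)
                .returns(Types.STRING)
                .print();
        env.execute("global-partitioner-sketch");
    }
}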

Thanks,
Steven

[1] https://github.com/apache/iceberg/pull/4904/files#r946975047


Re: [VOTE] FLIP-217: Support watermark alignment of source splits

2022-08-04 Thread Steven Wu
+1 (non-binding)

On Wed, Aug 3, 2022 at 5:47 AM Martijn Visser 
wrote:

> +1 (binding)
>
> Op wo 3 aug. 2022 om 14:33 schreef Piotr Nowojski :
>
> > +1 (binding)
> >
> > śr., 3 sie 2022 o 14:13 Thomas Weise  napisał(a):
> >
> > > +1 (binding)
> > >
> > >
> > > On Sun, Jul 31, 2022 at 10:57 PM Sebastian Mattheis <
> > > sebast...@ververica.com>
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I would like to start the vote for FLIP-217 [1]. Thanks for your
> > feedback
> > > > and the discussion in [2].
> > > >
> > > > FLIP-217 is a follow-up on FLIP-182 [3] and adds support for
> watermark
> > > > alignment of source splits.
> > > >
> > > > The poll will be open until August 4th, 8.00AM GMT (72h) unless there
> > is
> > > a
> > > > binding veto or an insufficient number of votes.
> > > >
> > > > Best regards,
> > > > Sebastian
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
> > > > [2] https://lists.apache.org/thread/4qwkcr3y1hrnlm2h9d69ofb4vo1lprvr
> > > > [3]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources
> > > >
> > >
> >
>


Re: [DISCUSS] FLIP-238: Introduce FLIP-27-based Data Generator Source

2022-06-14 Thread Steven Wu
d your logic on are pushed further away from
> > the
> > >> > > low-level interfaces responsible for handling data and splits [1].
> > At
> > >> the
> > >> > > same time, the SourceCoordinatorProvider is hardwired into the
> > >> internals
> > >> > > of the framework, so I don't think it will be possible to provide
> a
> > >> > > customized implementation for testing purposes.
> > >> > >
> > >> > > The only chance to tie data generation to checkpointing in the new
> > >> Source
> > >> > > API that I see at the moment is via the SplitEnumerator
> serializer (
> > >> > > getEnumeratorCheckpointSerializer() method) [2]. In theory, it
> > should
> > >> be
> > >> > > possible to share a variable visible both to the generator
> function
> > >> and
> > >> > to
> > >> > > the serializer and manipulate it whenever the serialize() method
> > gets
> > >> > > called upon a checkpoint request. That said, you still won't get
> > >> > > notifications of successful checkpoints that you currently use
> (this
> > >> info
> > >> > > is only available to the SourceCoordinator).
> > >> > >
> > >> > > In general, regardless of the generator implementation itself, the
> > new
> > >> > > Source
> > >> > > API does not seem to support the use case of verifying checkpoints
> > >> > > contents in lockstep with produced data, at least I do not see an
> > >> > immediate
> > >> > > solution for this. Can you think of a different way of checking
> the
> > >> > > correctness of the Iceberg Sink implementation that does not rely
> on
> > >> this
> > >> > > approach?
> > >> > >
> > >> > > Best,
> > >> > > Alexander Fedulov
> > >> > >
> > >> > > [1]
> > >> > >
> > >> > >
> > >> >
> > >>
> >
> https://github.com/apache/flink/blob/0f19c2472c54aac97e4067f5398731ab90036d1a/flink-runtime/src/main/java/org/apache/flink/runtime/source/coordinator/SourceCoordinator.java#L337
> > >> > >
> > >> > > [2]
> > >> > >
> > >> > >
> > >> >
> > >>
> >
> https://github.com/apache/flink/blob/e4b000818c15b5b781c4e5262ba83bfc9d65121a/flink-core/src/main/java/org/apache/flink/api/connector/source/Source.java#L97
> > >> > >
> > >> > > On Tue, Jun 7, 2022 at 6:03 PM Steven Wu 
> > >> wrote:
> > >> > >
> > >> > > In Iceberg source, we have a data generator source that can
> control
> > >> the
> > >> > > records per checkpoint cycle. Can we support sth like this in the
> > >> > > DataGeneratorSource?
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> >
> > >>
> >
> https://github.com/apache/iceberg/blob/master/flink/v1.15/flink/src/test/java/org/apache/iceberg/flink/source/BoundedTestSource.java
> > >> > > public BoundedTestSource(List> elementsPerCheckpoint,
> > boolean
> > >> > > checkpointEnabled)
> > >> > >
> > >> > > Thanks,
> > >> > > Steven
> > >> > >
> > >> > > On Tue, Jun 7, 2022 at 8:48 AM Alexander Fedulov <
> > >> > alexan...@ververica.com
> > >> > >
> > >> > >
> > >> > > wrote:
> > >> > >
> > >> > > Hi everyone,
> > >> > >
> > >> > > I would like to open a discussion on FLIP-238: Introduce
> > FLIP-27-based
> > >> > >
> > >> > > Data
> > >> > >
> > >> > > Generator Source [1]. During the discussion about deprecating the
> > >> > > SourceFunction API [2] it became evident that an easy-to-use
> > >> > > FLIP-27-compatible data generator source is needed so that the
> > current
> > >> > > SourceFunction-based data generator implementations could be
> phased
> > >> out
> > >> > >
> > >> > > for
> > >> > >
> > >> > > both Flink demo/PoC applications and for the internal Flink tests.
> > >> This
> > >> > > FLIP proposes to introduce a generic DataGeneratorSource capable
> of
> > >> > > producing events of an arbitrary type based on a user-supplied
> > >> > >
> > >> > > MapFunction.
> > >> > >
> > >> > >
> > >> > > Looking forward to your feedback.
> > >> > >
> > >> > > [1] https://cwiki.apache.org/confluence/x/9Av1D
> > >> > > [2]
> > https://lists.apache.org/thread/d6cwqw9b3105wcpdkwq7rr4s7x4ywqr9
> > >> > >
> > >> > > Best,
> > >> > > Alexander Fedulov
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> >
> > >>
> > >
> >
>


Re: [DISCUSS] FLIP-238: Introduce FLIP-27-based Data Generator Source

2022-06-07 Thread Steven Wu
In the Iceberg source, we have a data generator source that can control the
records emitted per checkpoint cycle. Can we support something like this in
the DataGeneratorSource?

https://github.com/apache/iceberg/blob/master/flink/v1.15/flink/src/test/java/org/apache/iceberg/flink/source/BoundedTestSource.java
public BoundedTestSource(List<List<T>> elementsPerCheckpoint, boolean checkpointEnabled)
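
For reference, a rough usage sketch of that test source (the class is in the
Iceberg test code linked above; the numbers are arbitrary):

import java.util.Arrays;
import java.util.List;
import org.apache.iceberg.flink.source.BoundedTestSource;

class BoundedTestSourceUsageSketch {
    // Each inner list is emitted in its own checkpoint cycle, so a test can
    // assert exactly which records are committed per checkpoint.
    static BoundedTestSource<Integer> threeCheckpointsOfData() {
        List<List<Integer>> elementsPerCheckpoint = Arrays.asList(
                Arrays.asList(1, 2, 3), // before checkpoint 1
                Arrays.asList(4, 5),    // before checkpoint 2
                Arrays.asList(6));      // before checkpoint 3
        return new BoundedTestSource<>(elementsPerCheckpoint, true);
    }
}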

Thanks,
Steven

On Tue, Jun 7, 2022 at 8:48 AM Alexander Fedulov 
wrote:

> Hi everyone,
>
> I would like to open a discussion on FLIP-238: Introduce FLIP-27-based Data
> Generator Source [1]. During the discussion about deprecating the
> SourceFunction API [2] it became evident that an easy-to-use
> FLIP-27-compatible data generator source is needed so that the current
> SourceFunction-based data generator implementations could be phased out for
> both Flink demo/PoC applications and for the internal Flink tests. This
> FLIP proposes to introduce a generic DataGeneratorSource capable of
> producing events of an arbitrary type based on a user-supplied MapFunction.
>
> Looking forward to your feedback.
>
> [1] https://cwiki.apache.org/confluence/x/9Av1D
> [2] https://lists.apache.org/thread/d6cwqw9b3105wcpdkwq7rr4s7x4ywqr9
>
> Best,
> Alexander Fedulov
>


Re: Source alignment for Iceberg

2022-05-06 Thread Steven Wu
might be the same as => might NOT be the same as

On Fri, May 6, 2022 at 8:13 PM Steven Wu  wrote:

> The conclusion of this discussion could be that we don't see much value in
> leveraging FLIP-182 with Iceberg source. That would totally be fine.
>
> For me, one big sticking point is the alignment timestamp for the
> (Iceberg) source might be the same as the Flink application watermark.
>
> On Thu, May 5, 2022 at 9:53 PM Piotr Nowojski 
> wrote:
>
>> Option 1 sounds reasonable but I would be tempted to wait for a second
>> motivational use case before generalizing the framework. However I wouldn’t
>> oppose this extension if others feel it’s useful and good thing to do
>>
>> Piotrek
>>
>> > Wiadomość napisana przez Becket Qin  w dniu
>> 06.05.2022, o godz. 03:50:
>> >
>> > I think the key point here is essentially what information should Flink
>> > expose to the user pluggables. Apparently split / local task watermark
>> is
>> > something many user pluggables would be interested in. Right now it is
>> > calculated by the Flink framework but not exposed to the users space,
>> i.e.
>> > SourceReader / SplitEnumerator. So it looks at least we can offer this
>> > information in some way so users can leverage that information to do
>> > things.
>> >
>> > That said, I am not sure if this would help in the Iceberg alignment
>> case.
>> > Because at this point, FLIP-182 reports source reader watermarks
>> > periodically, which may not align with the RequestSplitEvent. So if we
>> > really want to leverage the FLIP-182 mechanism here, I see a few ways,
>> just
>> > to name two of them:
>> > 1. we can expose the source reader watermark in the
>> SourceReaderContext, so
>> > the source readers can put the local watermark in a custom operator
>> event.
>> > This will effectively bypass the existing RequestSplitEvent. Or we can
>> also
>> > extend the RequestSplitEvent to add an additional info field of byte[]
>> > type, so users can piggy-back additional information there, be it
>> watermark
>> > or other stuff.
>> > 2. Simply piggy-back the local watermark in the RequestSplitEvent and
>> pass
>> > that info to the SplitEnumerator as well.
>> >
>> > If we are going to do this, personally I'd prefer the first way, as it
>> > provides a mechanism to allow future extension. So it would be easier to
>> > expose other framework information to the user space in the future.
>> >
>> > Thanks,
>> >
>> > Jiangjie (Becket) Qin
>> >
>> >
>> >
>> >> On Fri, May 6, 2022 at 6:15 AM Thomas Weise  wrote:
>> >>
>> >>> On Wed, May 4, 2022 at 11:03 AM Steven Wu 
>> wrote:
>> >>> Any opinion on different timestamp for source alignment (vs Flink
>> >> application watermark)? For Iceberg source, we might want to enforce
>> >> alignment on kafka timestamp but Flink application watermark may use
>> event
>> >> time field from payload.
>> >>
>> >> I imagine that more generally the question is alignment based on the
>> >> iceberg partition/file metadata vs. individual rows? I think that
>> >> should work as long as there is a guarantee for out of orderness
>> >> within the split?
>> >>
>> >> Thomas
>> >>
>> >>>
>> >>> Thanks,
>> >>> Steven
>> >>>
>> >>> On Wed, May 4, 2022 at 7:02 AM Becket Qin 
>> wrote:
>> >>>>
>> >>>> Hey Piotr,
>> >>>>
>> >>>> I think the mechanism FLIP-182 provided is a reasonable default one,
>> >> which
>> >>>> ensures the watermarks are only drifted by an upper bound. However,
>> >>>> admittedly there are also other strategies for different purposes.
>> >>>>
>> >>>> In the Iceberg case, I am not sure if a static strictly allowed
>> >> watermark
>> >>>> drift is desired. The source might just want to finish reading the
>> >> assigned
>> >>>> splits as fast as possible. And it is OK to have a drift of "one
>> split",
>> >>>> instead of a fixed time period.
>> >>>>
>> >>>> As another example, if there are some fast readers whose splits are
>> >> always
>> >>>> throttled, while the other slow readers are st

Re: Source alignment for Iceberg

2022-05-06 Thread Steven Wu
The conclusion of this discussion could be that we don't see much value in
leveraging FLIP-182 with Iceberg source. That would totally be fine.

For me, one big sticking point is the alignment timestamp for the (Iceberg)
source might be the same as the Flink application watermark.

On Thu, May 5, 2022 at 9:53 PM Piotr Nowojski 
wrote:

> Option 1 sounds reasonable but I would be tempted to wait for a second
> motivational use case before generalizing the framework. However I wouldn’t
> oppose this extension if others feel it’s useful and good thing to do
>
> Piotrek
>
> > Wiadomość napisana przez Becket Qin  w dniu
> 06.05.2022, o godz. 03:50:
> >
> > I think the key point here is essentially what information should Flink
> > expose to the user pluggables. Apparently split / local task watermark is
> > something many user pluggables would be interested in. Right now it is
> > calculated by the Flink framework but not exposed to the users space,
> i.e.
> > SourceReader / SplitEnumerator. So it looks at least we can offer this
> > information in some way so users can leverage that information to do
> > things.
> >
> > That said, I am not sure if this would help in the Iceberg alignment
> case.
> > Because at this point, FLIP-182 reports source reader watermarks
> > periodically, which may not align with the RequestSplitEvent. So if we
> > really want to leverage the FLIP-182 mechanism here, I see a few ways,
> just
> > to name two of them:
> > 1. we can expose the source reader watermark in the SourceReaderContext,
> so
> > the source readers can put the local watermark in a custom operator
> event.
> > This will effectively bypass the existing RequestSplitEvent. Or we can
> also
> > extend the RequestSplitEvent to add an additional info field of byte[]
> > type, so users can piggy-back additional information there, be it
> watermark
> > or other stuff.
> > 2. Simply piggy-back the local watermark in the RequestSplitEvent and
> pass
> > that info to the SplitEnumerator as well.
> >
> > If we are going to do this, personally I'd prefer the first way, as it
> > provides a mechanism to allow future extension. So it would be easier to
> > expose other framework information to the user space in the future.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> >
> >> On Fri, May 6, 2022 at 6:15 AM Thomas Weise  wrote:
> >>
> >>> On Wed, May 4, 2022 at 11:03 AM Steven Wu 
> wrote:
> >>> Any opinion on different timestamp for source alignment (vs Flink
> >> application watermark)? For Iceberg source, we might want to enforce
> >> alignment on kafka timestamp but Flink application watermark may use
> event
> >> time field from payload.
> >>
> >> I imagine that more generally the question is alignment based on the
> >> iceberg partition/file metadata vs. individual rows? I think that
> >> should work as long as there is a guarantee for out of orderness
> >> within the split?
> >>
> >> Thomas
> >>
> >>>
> >>> Thanks,
> >>> Steven
> >>>
> >>> On Wed, May 4, 2022 at 7:02 AM Becket Qin 
> wrote:
> >>>>
> >>>> Hey Piotr,
> >>>>
> >>>> I think the mechanism FLIP-182 provided is a reasonable default one,
> >> which
> >>>> ensures the watermarks are only drifted by an upper bound. However,
> >>>> admittedly there are also other strategies for different purposes.
> >>>>
> >>>> In the Iceberg case, I am not sure if a static strictly allowed
> >> watermark
> >>>> drift is desired. The source might just want to finish reading the
> >> assigned
> >>>> splits as fast as possible. And it is OK to have a drift of "one
> split",
> >>>> instead of a fixed time period.
> >>>>
> >>>> As another example, if there are some fast readers whose splits are
> >> always
> >>>> throttled, while the other slow readers are struggling to keep up with
> >> the
> >>>> rest of the splits, the split enumerator may decide to reassign the
> slow
> >>>> splits so all the readers have something to read. This would need the
> >>>> SplitEnumerator to be aware of the watermark progress on each reader.
> >> So it
> >>>> seems useful to expose the WatermarkAlignmentEvent information to the
> >>>> SplitEnumerator as well.
> >>>>
> >>

Re: Source alignment for Iceberg

2022-05-05 Thread Steven Wu
Piotr,

With FLIP-27, the Iceberg source has already implemented alignment by tracking
watermarks and holding back split assignment when necessary (a rough sketch
follows below).

The purpose of this discussion is to see whether the Iceberg source can
leverage some of the watermark alignment work from the Flink framework.
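
As a rough illustration of the "hold back split assignment" idea (all names
are made up; this is not the actual Iceberg source code):

import java.util.ArrayDeque;
import java.util.Queue;

class AlignedSplitAssignerSketch {
    /** Minimal stand-in for a file split; only the field needed for alignment. */
    static class Split {
        final long minTimestampMs;

        Split(long minTimestampMs) {
            this.minTimestampMs = minTimestampMs;
        }
    }

    private final Queue<Split> pendingSplits = new ArrayDeque<>();
    private final long maxAllowedDriftMs;

    AlignedSplitAssignerSketch(long maxAllowedDriftMs) {
        this.maxAllowedDriftMs = maxAllowedDriftMs;
    }

    void addSplit(Split split) {
        pendingSplits.add(split);
    }

    /**
     * Returns the next split for a requesting reader, or null to hold the
     * assignment back until the slowest reader's watermark catches up.
     */
    Split getNext(long globalMinWatermarkMs) {
        Split head = pendingSplits.peek();
        if (head == null
                || head.minTimestampMs > globalMinWatermarkMs + maxAllowedDriftMs) {
            return null;
        }
        return pendingSplits.poll();
    }
}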

Thanks,
Steven

On Thu, May 5, 2022 at 1:10 AM Piotr Nowojski  wrote:

> Ok, I see. Thanks to both of you for the explanation.
>
> Do we need changes to Apache Flink for this feature? Can it be implemented
> in the Sources without changes in the framework? I presume source can
> access min/max watermark from the split, so as long as it also knows
> exactly which splits have finished, it would know which splits to hold back.
>
> Best,
> Piotrek
>
> śr., 4 maj 2022 o 20:03 Steven Wu  napisał(a):
>
>> Piotr, thanks a lot for your feedback.
>>
>> > I can see this being an issue if the existence of too many blocked
>> splits is occupying too many resources.
>>
>> This is not desirable. Eagerly assigning many splits to a reader can
>> defeat the benefits of pull based dynamic split assignments. Iceberg
>> readers request one split at a time upon start or completion of a split.
>> Dynamic split assignment is better for work sharing/stealing as Becket
>> mentioned. Limiting number of active splits can be handled by the FLIP-27
>> Iceberg source and is somewhat orthogonal to watermark alignment.
>>
>> > Can not Iceberg just emit all splits and let FLIP-182/FLIP-217 handle
>> the watermark alignment and block the splits that are too much into the
>> future?
>>
>> The enumerator just assigns the next split to the requesting reader
>> instead of holding back the split assignment. Let the reader handle the
>> pause (if the file split requires alignment wait).  This strategy might
>> work and leverage more from the framework.
>>
>> We probably need the following to make this work
>> * extract watermark/timestamp only at the completion of a split (not at
>> record level). Because records in a file aren't probably not sorted by the
>> timestamp field, the pause or watermark advancement is probably better done
>> at file level.
>> * source readers checkpoint the watermark. otherwise, upon restart
>> readers won't be able to determine the local watermark and pause for
>> alignment. We don't want to emit records upon restart due to unknown
>> watermark info.
>>
>> All,
>>
>> Any opinion on different timestamp for source alignment (vs Flink
>> application watermark)? For Iceberg source, we might want to enforce
>> alignment on kafka timestamp but Flink application watermark may use event
>> time field from payload.
>>
>> Thanks,
>> Steven
>>
>> On Wed, May 4, 2022 at 7:02 AM Becket Qin  wrote:
>>
>>> Hey Piotr,
>>>
>>> I think the mechanism FLIP-182 provided is a reasonable default one,
>>> which
>>> ensures the watermarks are only drifted by an upper bound. However,
>>> admittedly there are also other strategies for different purposes.
>>>
>>> In the Iceberg case, I am not sure if a static strictly allowed watermark
>>> drift is desired. The source might just want to finish reading the
>>> assigned
>>> splits as fast as possible. And it is OK to have a drift of "one split",
>>> instead of a fixed time period.
>>>
>>> As another example, if there are some fast readers whose splits are
>>> always
>>> throttled, while the other slow readers are struggling to keep up with
>>> the
>>> rest of the splits, the split enumerator may decide to reassign the slow
>>> splits so all the readers have something to read. This would need the
>>> SplitEnumerator to be aware of the watermark progress on each reader. So
>>> it
>>> seems useful to expose the WatermarkAlignmentEvent information to the
>>> SplitEnumerator as well.
>>>
>>> Thanks,
>>>
>>> Jiangjie (Becket) Qin
>>>
>>>
>>>
>>> On Tue, May 3, 2022 at 7:58 PM Piotr Nowojski 
>>> wrote:
>>>
>>> > Hi Steven,
>>> >
>>> > Isn't this redundant to FLIP-182 and FLIP-217? Can not Iceberg just
>>> emit
>>> > all splits and let FLIP-182/FLIP-217 handle the watermark alignment and
>>> > block the splits that are too much into the future? I can see this
>>> being an
>>> > issue if the existence of too many blocked splits is occupying too many
>>> > resources.
>>> >
>>> > If that's the case, indeed SourceCoord

Re: Source alignment for Iceberg

2022-05-04 Thread Steven Wu
Piotr, thanks a lot for your feedback.

> I can see this being an issue if the existence of too many blocked splits
is occupying too many resources.

This is not desirable. Eagerly assigning many splits to a reader can defeat
the benefits of pull-based dynamic split assignment. Iceberg readers
request one split at a time upon start or completion of a split. Dynamic
split assignment is better for work sharing/stealing, as Becket mentioned.
Limiting the number of active splits can be handled by the FLIP-27 Iceberg
source and is somewhat orthogonal to watermark alignment.

> Can not Iceberg just emit all splits and let FLIP-182/FLIP-217 handle the
watermark alignment and block the splits that are too much into the future?

The enumerator just assigns the next split to the requesting reader instead
of holding back the split assignment, and the reader handles the pause (if
the file split requires waiting for alignment). This strategy might work and
would leverage more from the framework.

We probably need the following to make this work (a small sketch of the
first point follows below):
* Extract the watermark/timestamp only at the completion of a split (not at
the record level). Because records in a file probably aren't sorted by the
timestamp field, the pause or watermark advancement is better done at the
file level.
* Source readers checkpoint the watermark. Otherwise, upon restart, readers
won't be able to determine the local watermark and pause for alignment. We
don't want to emit records upon restart due to unknown watermark info.

All,

Any opinions on using a different timestamp for source alignment (vs. the
Flink application watermark)? For the Iceberg source, we might want to
enforce alignment on the Kafka timestamp, but the Flink application
watermark may use an event-time field from the payload.

Thanks,
Steven

On Wed, May 4, 2022 at 7:02 AM Becket Qin  wrote:

> Hey Piotr,
>
> I think the mechanism FLIP-182 provided is a reasonable default one, which
> ensures the watermarks are only drifted by an upper bound. However,
> admittedly there are also other strategies for different purposes.
>
> In the Iceberg case, I am not sure if a static strictly allowed watermark
> drift is desired. The source might just want to finish reading the assigned
> splits as fast as possible. And it is OK to have a drift of "one split",
> instead of a fixed time period.
>
> As another example, if there are some fast readers whose splits are always
> throttled, while the other slow readers are struggling to keep up with the
> rest of the splits, the split enumerator may decide to reassign the slow
> splits so all the readers have something to read. This would need the
> SplitEnumerator to be aware of the watermark progress on each reader. So it
> seems useful to expose the WatermarkAlignmentEvent information to the
> SplitEnumerator as well.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
>
> On Tue, May 3, 2022 at 7:58 PM Piotr Nowojski 
> wrote:
>
> > Hi Steven,
> >
> > Isn't this redundant to FLIP-182 and FLIP-217? Can not Iceberg just emit
> > all splits and let FLIP-182/FLIP-217 handle the watermark alignment and
> > block the splits that are too much into the future? I can see this being
> an
> > issue if the existence of too many blocked splits is occupying too many
> > resources.
> >
> > If that's the case, indeed SourceCoordinator/SplitEnumerator would have
> to
> > decide on some basis how many and which splits to assign in what order.
> But
> > in that case I'm not sure how much you could use from FLIP-182 and
> > FLIP-217. They seem somehow orthogonal to me, operating on different
> > levels. FLIP-182 and FLIP-217 are working with whatever splits have
> already
> > been generated and assigned. You could leverage FLIP-182 and FLIP-217 and
> > take care of only the problem to limit the number of parallel active
> > splits. And here I'm not sure if it would be worth generalising a
> solution
> > across different connectors.
> >
> > Regarding the global watermark, I made a related comment sometime ago
> > about it [1]. It sounds to me like you also need to solve this problem,
> > otherwise Iceberg users will encounter late records in case of some race
> > conditions between assigning new splits and completions of older.
> >
> > Best,
> > Piotrek
> >
> > [1]
> >
> https://issues.apache.org/jira/browse/FLINK-21871?focusedCommentId=17495545=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17495545
> >
> > pon., 2 maj 2022 o 04:26 Steven Wu  napisał(a):
> >
> >> add dev@ group to the thread as Thomas suggested
> >>
> >> Arvid,
> >>
> >> The scenario 3 (Dynamic assignment + temporary no split) in the FLIP-180
> >> (idleness) can happen to Iceberg

Re: Source alignment for Iceberg

2022-05-01 Thread Steven Wu
n point in time what the min watermark across all
>> source subtasks is.
>>
>> Here is some background:
>> In the context of idleness, we can deterministically advance the
>> watermark. In the pre-FLIP-27 era, we had heuristic approaches in sources
>> to switch to idleness and thus allow watermarks to increase in cases where
>> fewer splits than source tasks are available. However, for sources with
>> dynamic split discovery that actually yields incorrect results. Think of a
>> Kinesis consumer where a shard is split. Then a previously idle source
>> subtask may receive a new split with time t0 as the lowest timestamp. Since
>> the source subtask did not participate in the global watermark generation
>> (because it was idle), the previously emitted watermark may be past t0 and
>> thus results in late records potentially being discarded. A rerun of the
>> same pipeline on historic data would not render the source subtask idle and
>> not result in late records. The solution was to not render source subtasks
>> automatically idle by the framework if there are no splits. That leads to
>> confusion for Kafka users with static topic subscription where #splits <
>> #parallelism stalls pipelines because the watermark is not advancing. Here,
>> your sketched solution can be transferred to KafkaSource to let Flink know
>> that min global watermark on a static assignment is determined by the
>> slowest partition. Hence, all idle readers emit that min global watermark
>> and the user sees progress.
>> This whole idea is related to FLIP-182 watermark alignment but I'd go
>> with another FLIP as the goal is quite different even though the
>> implementation overlaps.
>>
>> Now Iceberg seems to use the same information to actually pause the
>> consumption of files and create some kind of ordering guarantees as far as
>> I understood. This probably can be applied to any source with dynamic split
>> discovery. However, I wouldn't mix up the concepts and hence I appreciate
>> you not chiming into the FLIP-182 and ff. threads. The goal of FLIP-182 is
>> to pause readers while consuming a split, while your approach pauses
>> readers before processing another split. So it feels more closely related
>> to the global min watermark - so it could either be part of that FLIP or a
>> FLIP of its own. Afaik API changes should actually happen only on the
>> enumerator side both for your ideas and for global min watermark.
>>
>> Best,
>>
>> Arvid
>>
>> On Wed, Apr 27, 2022 at 7:31 PM Thomas Weise  wrote:
>>
>>> Hi Steven,
>>>
>>> Would it be better to bring this as a separate thread related to Iceberg
>>> source to the dev@ list? I think this could benefit from broader input?
>>>
>>> Thanks
>>>
>>> On Wed, Apr 27, 2022 at 9:36 AM Steven Wu  wrote:
>>>
>>>> + Becket and Sebastian
>>>>
>>>> It is also related to the split level watermark alignment discussion
>>>> thread. Because it is already very long, I don't want to further complicate
>>>> the ongoing discussion there. But I can move the discussion to that
>>>> existing thread if that is preferred.
>>>>
>>>>
>>>> On Tue, Apr 26, 2022 at 10:03 PM Steven Wu 
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> We are thinking about how to align with the Flink community and
>>>>> leverage the FLIP-182 watermark alignment in the Iceberg source. I put 
>>>>> some
>>>>> context in this google doc. Would love to get hear your thoughts on this.
>>>>>
>>>>>
>>>>> https://docs.google.com/document/d/1zfwF8e5LszazcOzmUAOeOtpM9v8dKEPlY_BRFSmI3us/edit#
>>>>>
>>>>> Thanks,
>>>>> Steven
>>>>>
>>>>


Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

2022-04-21 Thread Steven Wu
> However, a single source operator may read data from multiple
splits/partitions, e.g., multiple Kafka partitions, such that even with
watermark alignment the source operator may need to buffer excessive amount
of data if one split emits data faster than another.

Is this part from the motivation section accurate? Let's assume one source
task consumes from 3 partitions and one of the partitions is significantly
slower. In this situation, the watermark for this source task won't be held
back, as it is reading recent data from the other two Kafka partitions. As a
result, it won't hold back the overall watermark. I thought the problem is
that we may get late data for this slow partition.

I have another question about restarts. Say split alignment is triggered, a
checkpoint completes, and the job fails and is restored from the last
checkpoint. Because the alignment decision is not checkpointed, alignment
initially won't be enforced until we get a cycle of watermark aggregation
and propagation, right? Not saying this corner case is a problem; I just
want to understand it better.



On Thu, Apr 21, 2022 at 8:20 AM Thomas Weise  wrote:

> Thanks for working on this!
>
> I wonder if "supporting" split alignment in SourceReaderBase and then doing
> nothing if the split reader does not implement AlignedSplitReader could be
> misleading? Perhaps WithSplitsAlignment can instead be added to the
> specific source reader (i.e. KafkaSourceReader) to make it explicit that
> the source actually supports it.
>
> Thanks,
> Thomas
>
>
> On Thu, Apr 21, 2022 at 4:57 AM Konstantin Knauf 
> wrote:
>
> > Hi Sebastian, Hi Dawid,
> >
> > As part of this FLIP, the `AlignedSplitReader` interface (aka the stop &
> > resume behavior) will be implemented for Kafka and Pulsar only, correct?
> >
> > +1 in general. I believe it is valuable to complete the watermark aligned
> > story with this FLIP.
> >
> > Cheers,
> >
> > Konstantin
> >
> >
> >
> >
> >
> >
> >
> > On Thu, Apr 21, 2022 at 12:36 PM Dawid Wysakowicz <
> dwysakow...@apache.org>
> > wrote:
> >
> > > To be explicit, having worked on it, I support it ;) I think we can
> > > start a vote thread soonish, as there are no concerns so far.
> > >
> > > Best,
> > >
> > > Dawid
> > >
> > > On 13/04/2022 11:27, Sebastian Mattheis wrote:
> > > > Dear Flink developers,
> > > >
> > > > I would like to open a discussion on FLIP 217 [1] for an extension of
> > > > Watermark Alignment to perform alignment also in SplitReaders. To do
> > so,
> > > > SplitReaders must be able to suspend and resume reading from split
> > > sources
> > > > where the SourceOperator coordinates and controlls suspend and
> resume.
> > To
> > > > gather information about current watermarks of the SplitReaders, we
> > > extend
> > > > the internal WatermarkOutputMulitplexer and report watermarks to the
> > > > SourceOperator.
> > > >
> > > > There is a PoC for this FLIP [2], prototyped by Arvid Heise and
> revised
> > > and
> > > > reworked by Dawid Wysakowicz (He did most of the work.) and me. The
> > > changes
> > > > are backwards compatible in a way that if affected components do not
> > > > support split alignment the behavior is as before.
> > > >
> > > > Best,
> > > > Sebastian
> > > >
> > > > [1]
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
> > > > [2] https://github.com/dawidwys/flink/tree/aligned-splits
> > > >
> > >
> >
> >
> > --
> >
> > Konstantin Knauf
> >
> > https://twitter.com/snntrable
> >
> > https://github.com/knaufk
> >
>


Re: Re: Change of focus

2022-02-28 Thread Steven Wu
Till, thank you for your immense contributions to the project and the
community.

On Mon, Feb 28, 2022 at 9:16 PM Xintong Song  wrote:

> Thanks for everything, Till. It has been a great honor working with you.
> Good luck with your new chapter~!
>
> Thank you~
>
> Xintong Song
>
>
>
> On Tue, Mar 1, 2022 at 12:33 PM Zhilong Hong  wrote:
>
> > Thank you for everything, Till! I've learned a lot from you.
> >
> > Good luck with your new adventure and the next chapter!
> >
> > Best,
> > Zhilong
> >
> > On Tue, Mar 1, 2022 at 12:08 PM Yun Tang  wrote:
> >
> > > Thanks a lot for your efforts and kindness of mentoring contributors in
> > > Apache Flink community, Till!
> > >
> > > Good luck with your new adventure in your new life.
> > >
> > >
> > > Best
> > > Yun Tang
> > >
> > > 
> > > From: Yuan Mei 
> > > Sent: Tuesday, March 1, 2022 11:00
> > > To: dev 
> > > Subject: Re: Re: Change of focus
> > >
> > > Thanks Till for everything you've done for the community!
> > > Good luck with your new adventure and best wishes to your new life!
> > >
> > > Best Regards,
> > > Yuan
> > >
> > > On Tue, Mar 1, 2022 at 10:35 AM Zhu Zhu  wrote:
> > >
> > > > Thank you for all the efforts and good luck for the new adventure,
> > Till!
> > > >
> > > > Thanks,
> > > > Zhu
> > > >
> > > > Terry  于2022年3月1日周二 10:26写道:
> > > >
> > > > > Thanks a lot for your efforts! Good Luck!
> > > > >
> > > > > Jiangang Liu  于2022年3月1日周二 10:18写道:
> > > > >
> > > > > > Thanks for the efforts and help in flink, Till. Good luck!
> > > > > >
> > > > > > Best
> > > > > > Liu Jiangang
> > > > > >
> > > > > > Lijie Wang  于2022年3月1日周二 09:53写道:
> > > > > >
> > > > > > > Thanks for all your efforts Till. Good luck !
> > > > > > >
> > > > > > > Best,
> > > > > > > Lijie
> > > > > > >
> > > > > > > Yun Gao  于2022年3月1日周二 01:15写道:
> > > > > > >
> > > > > > > > Very thanks Till for all the efforts! Good luck for the next
> > > > chapter~
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Yun
> > > > > > > >
> > > > > > > >
> > > --
> > > > > > > > Sender:Piotr Nowojski
> > > > > > > > Date:2022/02/28 22:10:46
> > > > > > > > Recipient:dev
> > > > > > > > Theme:Re: Change of focus
> > > > > > > >
> > > > > > > > Good luck Till and thanks for all of your efforts.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Piotrek
> > > > > > > >
> > > > > > > > pon., 28 lut 2022 o 15:06 Aitozi 
> > > > napisał(a):
> > > > > > > >
> > > > > > > > > Good luck with the next chapter, will miss you :)
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Aitozi
> > > > > > > > >
> > > > > > > > > Jark Wu  于2022年2月28日周一 21:28写道:
> > > > > > > > >
> > > > > > > > > > Thank you Till for every things. It's great to work with
> > you.
> > > > > Good
> > > > > > > > luck!
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Jark
> > > > > > > > > >
> > > > > > > > > > On Mon, 28 Feb 2022 at 21:26, Márton Balassi <
> > > > > > > balassi.mar...@gmail.com
> > > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thank you, Till. Good luck with the next chapter. :-)
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Feb 28, 2022 at 1:49 PM Flavio Pompermaier <
> > > > > > > > > pomperma...@okkam.it
> > > > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Good luck for your new adventure Till!
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Feb 28, 2022 at 12:00 PM Till Rohrmann <
> > > > > > > > trohrm...@apache.org
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I wanted to let you know that I will be less active
> > in
> > > > the
> > > > > > > > > community
> > > > > > > > > > > > > because I’ve decided to start a new chapter in my
> > life.
> > > > > > Hence,
> > > > > > > > > please
> > > > > > > > > > > > don’t
> > > > > > > > > > > > > wonder if I might no longer be very responsive on
> > mails
> > > > and
> > > > > > > JIRA
> > > > > > > > > > > issues.
> > > > > > > > > > > > >
> > > > > > > > > > > > > It is great being part of such a great community
> with
> > > so
> > > > > many
> > > > > > > > > amazing
> > > > > > > > > > > > > people. Over the past 7,5 years, I’ve learned a lot
> > > > thanks
> > > > > to
> > > > > > > you
> > > > > > > > > and
> > > > > > > > > > > > > together we have shaped how people think about
> stream
> > > > > > > processing
> > > > > > > > > > > > nowadays.
> > > > > > > > > > > > > This is something we can be very proud of. I am
> sure
> > > that
> > > > > the
> > > > > > > > > > community
> > > > > > > > > > > > > will continue innovating and setting the pace for
> > what
> > > is
> > > > > > > > possible
> > > > > > > > > > with
> > > > > > > > > > > > > real time processing. I wish you all godspeed!
> > > > > > > > > > > > >
> 

Re: [DISCUSS] FLIP-191: Extend unified Sink interface to support small file compaction

2021-11-08 Thread Steven Wu
>  although I think only using a customizable shuffle won't address the
generation of small files. One assumption is that at least the sink generates
one file per subtask, which can already be too many. Another problem is
that with low checkpointing intervals, the files do not meet the required
size. The latter point is probably addressable by changing the checkpoint
interval, which might be inconvenient for some users.

Agreed. I didn't mean that shuffling can solve all the problems of small
files; I was just trying to use it as an example. You touched on a few other
causes that maybe we can discuss separately.

1. One file per subtask is already too many. Should we reduce the
parallelism of the writer operator?
2. Low checkpoint intervals. How does the proposal address this cause? A
smaller number of compactor tasks read and compact the files? Would that be
the same as lowering the parallelism of the upstream writer operator?

I am not trying to argue against the need for compaction. Just trying to
understand the different scenarios and see how the proposal helps.

> the benefits of having a coordinator in comparison to a global
committer/aggregator operator.

One benefit is the potential of maintaining an embarrassingly parallel DAG
(like source -> sink), where region failover only needs to recover a small
region when one TM node dies. Whether this is a big benefit or not is
certainly up for debate.

> Unfortunately, the downside is that you have to rewrite your data after it
is already available for other downstream consumers. I guess this can lead
to all kinds of visibility problems.

Yes, rewriting data can cause visibility problems for non-transactional
sinks. If we are going to compact files before the commit, why not shuffle
or reduce parallelism in the first place? Would it achieve a similar goal?
Otherwise, we end up writing a bunch of files, planning compaction, reading
all the small files in, and writing all the data out to a smaller number of
files. File read/write/upload is probably more expensive than just a network
shuffle.
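
For reference, the shuffle alternative I have in mind looks roughly like the
method-level sketch below. partitionKeyOf and the input stream are
placeholders for illustration; the actual shuffling proposal in [1] is more
involved than a plain hash on the partition value.

import org.apache.flink.api.common.functions.Partitioner;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.table.data.RowData;

// Route records by their table partition value before the writers, so each writer
// subtask produces fewer, larger files per partition instead of compacting them later.
static DataStream<RowData> shuffleByTablePartition(DataStream<RowData> input) {
    // partitionKeyOf(row) is a hypothetical extractor of the Iceberg partition value.
    KeySelector<RowData, String> partitionKey = row -> partitionKeyOf(row);
    Partitioner<String> byPartition =
            (key, numPartitions) -> Math.floorMod(key.hashCode(), numPartitions);
    return input.partitionCustom(byPartition, partitionKey);
    // The returned stream is then fed into the (parallel) file writers as usual.
}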

For transactional sinks like Iceberg, this is not a concern. It is good to
make data available ASAP. Compaction can happen after the commit (in the
same Flink streaming job or in a separate batch maintenance job).

Thanks,
Steven





On Mon, Nov 8, 2021 at 3:59 AM Fabian Paul  wrote:

> Hi all,
>
> Thanks for the lively discussions. I am really excited to see so many
> people
> participating in this thread. It also underlines the need that many people
> would
> like to see a solution soon.
>
> I have updated the FLIP and removed the parallelism configuration because
> it is
> unnecessary since users can configure a constant exchange key to send all
> committables to only one committable aggregator.
>
>
> 1. Burden for developers w.r.t batch stream unification.
>
> @yun @guowei, from a theoretical point you are right about exposing the
> DataStream
> API in the sink users have the full power to write correct batch and
> streaming
> sinks. I think in reality a lot of users still struggle to build pipelines
> with
> i.e. the operator pipeline which works correct in streaming and batch mode.
> Another problem I see is by exposing more deeper concepts is that we
> cannot do
> any optimization because we cannot reason about how sinks are built in the
> future.
>
> We should also try to steer users towards using only `Functions` to give
> us more
> flexibility to swap the internal operator representation. I agree with
> @yun we
> should try to make the `ProcessFunction` more versatile to work on that
> goal but
> I see this as unrelated to the FLIP.
>
>
> 2. Regarding Commit / Global commit
>
> I envision the global committer to be specific depending on the data lake
> solution you want to write to. However, it is entirely orthogonal to the
> compaction.
> Currently, I do not expect any changes w.r.t the Global commit introduces
> by
> this FLIP.
>
>
> 3. Regarding the case of trans-checkpoints merging
>
> @yun, as user, I would expect that if the committer receives in a
> checkpoint files
> to merge/commit that these are also finished when the checkpoint finishes.
> I think all sinks rely on this principle currently i.e., KafkaSink needs to
> commit all open transactions until the next checkpoint can happen.
>
> Maybe in the future, we can somehow move the Committer#commit call to an
> asynchronous execution, but we should discuss it as a separate thread.
>
> > We probably should first describe the different causes of small files and
> > what problems was this proposal trying to solve. I wrote a data shuffling
> > proposal [1] for Flink Iceberg sink (shared with Iceberg community [2]).
> It
> > can address small files problems due to skewed data distribution across
> > Iceberg table partitions. Streaming shuffling before writers (to files)
> is
> > typically more efficient than post-write file compaction (which involves
> > read-merge-write). It is usually cheaper to prevent a problem (small
> files)
> > than 

Re: [DISCUSS] FLIP-191: Extend unified Sink interface to support small file compaction

2021-11-06 Thread Steven Wu
Fabian, thanks a lot for the proposal and starting the discussion.

We probably should first describe the different causes of small files and
what problems this proposal is trying to solve. I wrote a data shuffling
proposal [1] for the Flink Iceberg sink (shared with the Iceberg community
[2]). It can address small-file problems caused by skewed data distribution
across Iceberg table partitions. Streaming shuffling before the writers (to
files) is typically more efficient than post-write file compaction (which
involves read-merge-write). It is usually cheaper to prevent a problem
(small files) than to fix it.

It would be great if Flink could address the sink coordinator checkpoint
problem (mentioned in option 1). In the spirit of the source
(enumerator-reader) and sink (writer-coordinator) duality, the sink
coordinator checkpoint should happen after the writer operator. This would
be a natural fit for supporting the global committer in FLIP-143. It is
probably an orthogonal matter to this proposal.

Personally, I am usually in favor of keeping streaming ingestion (to the
data lake) relatively simple and stable. Also, compaction and sorting are
sometimes performed together in data-rewrite maintenance jobs to improve
read performance. In that case, the value of compacting (in Flink streaming
ingestion) diminishes.

Currently, it is unclear from the doc and this thread where the compaction
is actually happening. Jingsong's reply described one model:
writer (parallel) -> aggregator (single-parallelism compaction planner) ->
compactor (parallel) -> global committer (single-parallelism)

In the Iceberg community, the following model has been discussed. It is
better for Iceberg because it won't delay data availability:
writer (parallel) -> global committer for append (single parallelism) ->
compactor (parallel) -> global committer for rewrite commit (single
parallelism)
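
A rough DataStream-style sketch of that second topology, just to visualize
the wiring. The operator classes (IcebergWriter, AppendCommitter, Compactor,
RewriteCommitter) are placeholders assumed to implement
OneInputStreamOperator, WriteResult is a stand-in committable type, and the
input stream and parallelism variables are assumed to exist; this is not the
actual Iceberg sink API.

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.table.data.RowData;

// writer (parallel) -> append commit (p=1) -> compactor (parallel) -> rewrite commit (p=1)
// Data files become queryable right after the append commit; compaction rewrites them
// afterwards without delaying availability.
static void wireIcebergSink(DataStream<RowData> input, int writerParallelism, int compactorParallelism) {
    DataStream<WriteResult> written = input
            .transform("IcebergWriter", TypeInformation.of(WriteResult.class), new IcebergWriter())
            .setParallelism(writerParallelism);

    DataStream<WriteResult> appended = written
            .transform("AppendCommitter", TypeInformation.of(WriteResult.class), new AppendCommitter())
            .setParallelism(1);                     // single-parallelism append commit

    DataStream<WriteResult> rewritten = appended
            .transform("Compactor", TypeInformation.of(WriteResult.class), new Compactor())
            .setParallelism(compactorParallelism);

    rewritten
            .transform("RewriteCommitter", TypeInformation.of(WriteResult.class), new RewriteCommitter())
            .setParallelism(1);                     // single-parallelism rewrite commit
}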

Thanks,
Steven

[1]
https://docs.google.com/document/d/13N8cMqPi-ZPSKbkXGOBMPOzbv2Fua59j8bIjjtxLWqo/
[2] https://www.mail-archive.com/dev@iceberg.apache.org/msg02889.html




On Thu, Nov 4, 2021 at 4:46 AM Yun Gao  wrote:

> Hi all,
>
> Very thanks for Fabian drafting the FLIP and the warm discussion!
>
> I'd like to complement some points based on the previous discussion:
>
> 1. Regarding the case of trans-checkpoints merging
>
> I agree with that all the options would not block the current checkpoint
> that producing the files to commit, but I think Guowei is referring to
> another
> issue: suppose the files are created in checkpoint 1 to 10 and we want to
> merge
> the files created in 10 checkpoints, then if with arbitrary
> topology we might merge the files during checkpoint 11 to 20, without
> blocking
> the following checkpoints. But if the compaction happens in
> Committer#commit
> as the option 2, I think perhaps with the current mechanism the commit
> need to
> be finished before checkpoint 11.
>
> 2. Regarding Commit / Global commit
>
> As a whole, I think perhaps whether we have compaction should be
> independent from whether we have the global committer ? The global
> committer is initially used to write the bucket meta after all the files
> of a single bucket is committed. Thus
> a) If there are failure & retry, the global commit should be wait.
> b) I think with all the options, the committable should represents a file
> after
>  compaction. It might be directly a file after compaction or a list of
> small files
> that wait to commit. Also, we would not need to wait for another
> checkpoint if
> we only use it in meta-writing cases. But still, I think the behavior here
> does not
> change whether we have compaction ?
>
> In fact, perhaps a better abstraction from my view is to remove the
> GlobalCommitter directly and only have one-level of committer.  If users
> need
> writing meta, then the action of writing metadata should be viewed as
> "commit". The users could write to the formal files freely, if there are
> failover, he could directly remove the unnecessary ones since all these
> files are invisible yet.
> But this might be a different topic.
>
> 3. Regarding the comparison of the API
>
> I totally agree with that for the specific case of compaction, option 2
> would
> indeed be easy to use since we have considered this case when we designed
> the new API. But as a whole, from another view, I think perhaps writing a
> stream / batch unified program with the DataStream API should not be that
> hard? It does not increase more difficulty compared to writing a normal
> stream / batch unified flink job. For the specific issues we mentioned, I
> think
> based on the previous discussion, we should finally add `finish()` to the
> UDF, and for now I think we could at least first add it to family of
> `ProcessFunction`.
>
> Best,
> Yun
>
>
>
> --
> From:Arvid Heise 
> Send Time:2021 Nov. 4 (Thu.) 16:55
> To:Till Rohrmann 
> Cc:dev ; "David Morávek" 
> Subject:Re: [DISCUSS] FLIP-191: Extend unified Sink 

Re: [VOTE] FLIP-179: Expose Standardized Operator Metrics

2021-07-30 Thread Steven Wu
+1 (non-binding)

On Fri, Jul 30, 2021 at 3:55 AM Arvid Heise  wrote:

> Dear devs,
>
> I'd like to open a vote on FLIP-179: Expose Standardized Operator Metrics
> [1] which was discussed in this thread [2].
> The vote will be open for at least 72 hours unless there is an objection
> or not enough votes.
>
> The proposal excludes the implementation for the currentFetchEventTimeLag
> metric, which caused a bit of discussion without a clear convergence. We
> will implement that metric in a generic way at a later point and encourage
> sources to implement it themselves in the meantime.
>
> Best,
>
> Arvid
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics
> [2]
>
> https://lists.apache.org/thread.html/r856920cbfe6a262b521109c5bdb9e904e00a9b3f1825901759c24d85%40%3Cdev.flink.apache.org%3E
>


Re: [RESULT][VOTE] FLIP-147: Support Checkpoint After Tasks Finished

2021-07-21 Thread Steven Wu
> if a failure happens after sequence of finish() -> snapshotState(), but
before notifyCheckpointComplete(), we will restore such a state and we
might end up sending some more records to such an operator.

I probably missed something here. Isn't this already the case today? Why is
it a concern for the proposed change?

On Wed, Jul 21, 2021 at 4:39 AM Piotr Nowojski  wrote:

> Hi Dawid,
>
> Thanks for writing down those concerns.
>
> I think the first issue boils down what should be the contract of lifecycle
> methods like open(), close(), initializeState() etc and especially the new
> additions like finish() and endInput(). And what should be their relation
> with the operator state (regardless of it's type keyed, non-keyed, union,
> ...). Should those methods be tied to state or not? After thinking about it
> for a while (and discussing it offline with Dawid), I think the answer
> might be no, they shouldn't. I mean maybe we should just openly say that
> all of those methods relate to this single particular instance and
> execution of the operator. And if a job is recovered/rescaled, we would be
> allowed to freely resume consumption, ignoring a fact that maybe some parts
> of the state have previously seen `endInput()`. Why?
>
> 0. Yes, it might be confusing. Especially with `endInput()`. We call
> `endInput()`, we store something in a state and later after recovery
> combined with rescaling that state can see more records? Indeed weird,
> 1. I haven't come up yet with a counterexample that would break and make
> impossible to implement a real life use case. Theoretically yes, the user
> can store `endInput()` on state, and after rescaling this state would be
> inconsistent with what is actually happening with the operator, but I
> haven't found a use case that would break because of that.
> 2. Otherwise, implementation would be very difficult.
> 3. It's difficult to access keyed state from within `endInput()`/`finish()`
> calls, as they do not have key context.
> 4. After all, openly defining `endInput()` and `finish()` to be tied with
> it's operator execution instance lifecycle is not that strange and quite
> simple to explain. Sure, it can lead to a bit of confusion (0.), but that
> doesn't sound that bad in comparison with the alternatives that I'm aware
> of. Also currently methods like `open()` and `close()` are also tied to the
> operator execution instance, not to the state. Operators can be opened and
> closed multiple times, it doesn't mean that the state is lost after closing
> an operator.
>
> For the UnionListState problem I have posted my proposal in the ticket [1],
> so maybe let's move that particular discussion there?
>
> Piotrek
>
> [1] https://issues.apache.org/jira/browse/FLINK-21080
>
> śr., 21 lip 2021 o 12:39 Dawid Wysakowicz 
> napisał(a):
>
> > Hey all,
> >
> > To make the issues that were found transparent to the community, I want
> to
> > post an update:
> >
> > *1. Committing side-effects*
> > We do want to make sure that all side effects are committed before
> > bringing tasks down. Side effects are committed when calling
> > notifyCheckpointComplete. For the final checkpoint we introduced the
> method
> > finish(). This notifies the operator that we have consumed all incoming
> > records and we are preparing to close the Task. In turn we should flush
> any
> > pending buffered records and prepare to commit last transactions. The
> goal
> > is that after a successful sequence of finish() -> snapshotState() ->
> > notifyCheckpointComplete(), the remaining state can be considered
> > empty/finished and may be discarded.
> >
> > *Failure before notifyCheckpointComplete()*
> >
> > The question is what is the contract of the endInput()/finish() methods
> > and how do calling these methods affect the operators keyed, non-keyed
> > state and external state. Is it allowed to restore state snapshot taken
> > after calling endInput()/finish() and process more records? Or do we
> assume
> > that after a restore from such a state taken after finish() we should not
> > call any of the lifecycle methods or at least make sure those methods do
> > not emit records/interact with mailbox etc.
> >
> > Currently it is possible that if a failure happens after sequence of
> > finish() -> snapshotState(), but before notifyCheckpointComplete(), we
> will
> > restore such a state and we might end up sending some more records to
> such
> > an operator. It is possible if we rescale and this state is merged with a
> > state of a subtask that has not called finish() yet. It can also happen
> if
> > we rescale the upstream operator and the subtask of interest becomes
> > connected to a newly added non finished subtask.
> >
> > *Snapshotting StreamTasks that finish() has been called*
> >
> >
> > We thought about putting a flag into the snapshot of a subtask produced
> > after the finish() method. This would make it possible to skip execution
> of
> > certain lifecycle methods. Unfortunately this creates 

Re: [DISCUSS] FLIP-179: Expose Standardized Operator Metrics

2021-07-19 Thread Steven Wu
Thanks, Arvid!

+1 for SinkWriterMetricGroup. The sink is a little more tricky, because it
can have a local committer (running on the TM) or a global committer
(running on the JM). In the future, it would be possible to add a
SinkCommitterMetricGroup or SinkGlobalCommitterMetricGroup.

Regarding the "lastFetchTime" latency metric, I found a Gauge to be less
informative, as it only captures the last sampled value in each metric
publish interval (e.g. 60s). A few alternatives (see the sketch after the
references below):
* Can we make it a histogram? Histograms are more expensive, though.
* A Timer [1, 2] is cheaper, as it just tracks min, max, avg, and count, but
there is no such metric type in Flink.
* A Summary metric type [3] (from Prometheus) would be nice too.

[1] https://netflix.github.io/spectator/en/latest/intro/timer/#timer
[2]
https://docs.spring.io/spring-metrics/docs/current/public/prometheus#timers
[3] https://prometheus.io/docs/concepts/metric_types/#summary
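
To make the trade-off concrete, here is a small sketch against the
MetricGroup API. The variable and metric names are made up;
DescriptiveStatisticsHistogram from flink-runtime is just one example
Histogram implementation, and the window size of 500 is arbitrary.

import org.apache.flink.metrics.Gauge;
import org.apache.flink.metrics.Histogram;
import org.apache.flink.metrics.MetricGroup;
import org.apache.flink.runtime.metrics.DescriptiveStatisticsHistogram;

class FetchLagMetrics {
    private volatile long lastFetchTimeMillis = System.currentTimeMillis();
    private Histogram fetchTimeLagHistogram;

    void register(MetricGroup metricGroup) {
        // Option A: gauge - cheap, but each report interval only sees the last sampled value.
        Gauge<Long> lagGauge = () -> System.currentTimeMillis() - lastFetchTimeMillis;
        metricGroup.gauge("lastFetchTimeLag", lagGauge);

        // Option B: histogram - captures the distribution (min/max/percentiles) between
        // reports, at the cost of an update per record or batch.
        fetchTimeLagHistogram =
                metricGroup.histogram("fetchTimeLag", new DescriptiveStatisticsHistogram(500));
    }

    // Called by the reader when it emits a record that was fetched at fetchTimeMillis.
    void onRecordEmitted(long fetchTimeMillis) {
        lastFetchTimeMillis = fetchTimeMillis;
        if (fetchTimeLagHistogram != null) {
            fetchTimeLagHistogram.update(System.currentTimeMillis() - fetchTimeMillis);
        }
    }
}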


On Mon, Jul 19, 2021 at 12:22 AM Arvid Heise  wrote:

> Hi Steven,
>
> I extended the FLIP and its draft PR to have a SourceReaderMetricGroup and
> a SplitEnumeratorMetricGroup. I hope that it makes it clearer.
> I'd like to address FLINK-21000 as part of the implementation but I'd keep
> it out of the FLIP discussion.
>
> Question: should we rename SinkMetricGroup to SinkWriterMetricGroup? I can
> see the same confusion arising on sink side. I have added a commit to the
> draft PR (not updated FLIP yet).
>
> Btw I'd like to start the vote soonish. @Becket Qin 
> are you okay with the setLastFetchTimeGauge explanation or do you have
> alternative ideas?
>
> Best,
>
> Arvid
>
> On Fri, Jul 16, 2021 at 8:13 PM Steven Wu  wrote:
>
> > To avoid confusion, can we either rename "SourceMetricGroup" to "
> > SplitReaderMetricGroup" or add "Reader" to the setter method names?
> >
> > Yes, we should  add the "unassigned/pending splits" enumerator metric. I
> > tried to publish those metrics for IcebergSourceEnumerator and ran into
> an
> > issue [1]. I don't want to distract the discussion with the jira ticket.
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-21000
> >
> > On Thu, Jul 15, 2021 at 1:01 PM Arvid Heise  wrote:
> >
> > > Hi Steven,
> > >
> > > The semantics are unchanged compared to FLIP-33 [1] but I see your
> point.
> > >
> > > In reality, pending records would be mostly for event storage systems
> > > (Kafka, Kinesis, ...). Here, we would report the consumer lag
> > effectively.
> > > If consumer lag is more prominent, we could also rename it.
> > >
> > > For pending bytes, this is mostly related to file source or any kind of
> > > byte streams. At this point, we can only capture the assigned splits on
> > > reader levels. I don't think it makes sense to add the same metric to
> the
> > > enumerator as that might induce too much I/O on the job master. I could
> > > rather envision another metric that captures how many unassigned splits
> > > there are. In general, I think it would be a good idea to add another
> > type
> > > of top-level metric group for SplitEnumerator called
> > > SplitEnumeratorMetricGroup in SplitEnumeratorContext. There we could
> add
> > > unassigned/pending splits metric. WDYT?
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-33%3A+Standardize+Connector+Metrics
> > >
> > > On Wed, Jul 14, 2021 at 9:00 AM Steven Wu 
> wrote:
> > >
> > > > I am trying to understand what those two metrics really capture
> > > >
> > > > > G setPendingBytesGauge(G pendingBytesGauge);
> > > >
> > > >-  use file source as an example, it captures the remaining bytes
> > for
> > > >the current file split that the reader is processing? How would
> > users
> > > >interpret or use this metric? enumerator keeps tracks of the
> > > >pending/unassigned splits, which is an indication of the size of
> the
> > > >backlog. that would be very useful
> > > >
> > > >
> > > > > G setPendingRecordsGauge(G
> > pendingRecordsGauge);
> > > >
> > > >- In the Kafka source case, this is intended to capture the
> consumer
> > > lag
> > > >(log head offset from broker - current record offset)? that could
> be
> > > > used
> > > >to capture the size of the backlog
> > > >
> > > >
> > > >
> > > > On Tue, Jul 13, 2021 at 3:01 PM Arvid Heise 
> wrote:
> >

Re: [DISCUSS] FLIP-179: Expose Standardized Operator Metrics

2021-07-16 Thread Steven Wu
To avoid confusion, can we either rename "SourceMetricGroup" to "
SplitReaderMetricGroup" or add "Reader" to the setter method names?

Yes, we should add the "unassigned/pending splits" enumerator metric. I
tried to publish those metrics for the IcebergSourceEnumerator and ran into
an issue [1]. I don't want to distract the discussion with the JIRA ticket.

[1] https://issues.apache.org/jira/browse/FLINK-21000

On Thu, Jul 15, 2021 at 1:01 PM Arvid Heise  wrote:

> Hi Steven,
>
> The semantics are unchanged compared to FLIP-33 [1] but I see your point.
>
> In reality, pending records would be mostly for event storage systems
> (Kafka, Kinesis, ...). Here, we would report the consumer lag effectively.
> If consumer lag is more prominent, we could also rename it.
>
> For pending bytes, this is mostly related to file source or any kind of
> byte streams. At this point, we can only capture the assigned splits on
> reader levels. I don't think it makes sense to add the same metric to the
> enumerator as that might induce too much I/O on the job master. I could
> rather envision another metric that captures how many unassigned splits
> there are. In general, I think it would be a good idea to add another type
> of top-level metric group for SplitEnumerator called
> SplitEnumeratorMetricGroup in SplitEnumeratorContext. There we could add
> unassigned/pending splits metric. WDYT?
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-33%3A+Standardize+Connector+Metrics
>
> On Wed, Jul 14, 2021 at 9:00 AM Steven Wu  wrote:
>
> > I am trying to understand what those two metrics really capture
> >
> > > G setPendingBytesGauge(G pendingBytesGauge);
> >
> >-  use file source as an example, it captures the remaining bytes for
> >the current file split that the reader is processing? How would users
> >interpret or use this metric? enumerator keeps tracks of the
> >pending/unassigned splits, which is an indication of the size of the
> >backlog. that would be very useful
> >
> >
> > > G setPendingRecordsGauge(G pendingRecordsGauge);
> >
> >- In the Kafka source case, this is intended to capture the consumer
> lag
> >(log head offset from broker - current record offset)? that could be
> > used
> >to capture the size of the backlog
> >
> >
> >
> > On Tue, Jul 13, 2021 at 3:01 PM Arvid Heise  wrote:
> >
> > > Hi Becket,
> > >
> > > I believe 1+2 has been answered by Chesnay already. Just to add to 2:
> I'm
> > > not the biggest fan of reusing task metrics but that's what FLIP-33 and
> > > different folks suggested. I'd probably keep task I/O metrics only for
> > > internal things and add a new metric for external calls. Then, we could
> > > even allow users to track I/O in AsyncIO (which would currently be a
> > mess).
> > > However, with the current abstraction, it would be relatively easy to
> add
> > > separate metrics later.
> > >
> > > 3. As outlined in the JavaDoc and in the draft PR [1], it's up to the
> > user
> > > to implement it in a way that fetch time always corresponds to the
> latest
> > > polled record. For SourceReaderBase, I have added a new
> > > RecordsWithSplitIds#lastFetchTime (with default return value null) that
> > > sets the last fetch time automatically whenever the next batch is
> > selected.
> > > Tbh this metric is a bit more challenging to implement for
> > > non-SourceReaderBase sources but I have not found a better, thread-safe
> > > way. Of course, we could shift the complete calculation into user-land
> > but
> > > I'm not sure that this is easier.
> > > For your scenarios:
> > > - in A, you assume SourceReaderBase. In that case, we could eagerly
> > report
> > > the metric as sketched by you. It depends on the definition of "last
> > > processed record" in FLIP-33, whether this eager reporting is more
> > correct
> > > than the lazy reporting that I have proposed. The former case assumes
> > "last
> > > processed record" = last fetched record, while the latter case assumes
> > > "last processed record" = "last polled record". For the proposed
> > solution,
> > > the user would just need to implement
> RecordsWithSplitIds#lastFetchTime,
> > > which typically corresponds to the creation time of the
> > RecordsWithSplitIds
> > > instance.
> > > - B is not assuming SourceReaderBase.
> > > If it's SourceReaderBase, the same 

Re: [DISCUSS] FLIP-183: Dynamic buffer size adjustment

2021-07-15 Thread Steven Wu
I really like the new idea.

On Thu, Jul 15, 2021 at 11:51 AM Piotr Nowojski 
wrote:

> Hi Till,
>
> >  I assume that buffer sizes are only
> > changed for newly assigned buffers/credits, right? Otherwise, the data
> > could already be on the wire and then it wouldn't fit on the receiver
> side.
> > Or do we have a back channel mechanism to tell the sender that a part of
> a
> > buffer needs to be resent once more capacity is available?
>
> Initially our implementation proposal was intending to implement the first
> option. Buffer size would be attached to a credit message, so first
> received would need to allocate a buffer with the updated size, send the
> credit upstream, and sender would be allowed to only send as much data as
> in the credit. So there would be no way and no problem with changing buffer
> sizes while something is "on the wire".
>
> However Anton suggested an even simpler idea to me today. There is actually
> no problem with receivers supporting all buffer sizes up to the maximum
> allowed size (current configured memory segment size). Thus new buffer size
> can be treated as a recommendation by the sender. We can announce a new
> buffer size, and the sender will start capping the newly requested buffer
> to that size, but we can still send already filled buffers in chunks with
> any size, as long as it's below max memory segment size. In this way we can
> leave any already filled in buffers on the sender side untouched and we do
> not need to partition/slice them before sending them down, making at least
> the initial version even simpler. This way we also do not need to
> differentiate that different credits have different sizes. We just announce
> a single value "recommended/requested buffer size".
>
> Piotrek
>
> czw., 15 lip 2021 o 17:27 Till Rohrmann  napisał(a):
>
> > Hi everyone,
> >
> > Thanks a lot for creating this FLIP Anton and Piotr. I think it looks
> like
> > a very promising solution for speeding up our checkpoints and being able
> to
> > create them more reliably.
> >
> > Following up on Steven's question: I assume that buffer sizes are only
> > changed for newly assigned buffers/credits, right? Otherwise, the data
> > could already be on the wire and then it wouldn't fit on the receiver
> side.
> > Or do we have a back channel mechanism to tell the sender that a part of
> a
> > buffer needs to be resent once more capacity is available?
> >
> > Cheers,
> > Till
> >
> > On Wed, Jul 14, 2021 at 11:16 AM Piotr Nowojski 
> > wrote:
> >
> > > Hi Steven,
> > >
> > > As downstream/upstream nodes are decoupled, if downstream nodes adjust
> > > first it's buffer size first, there will be a lag until this updated
> > buffer
> > > size information reaches the upstream node.. It is a problem, but it
> has
> > a
> > > quite simple solution that we described in the FLIP document:
> > >
> > > > Sending the buffer of the right size.
> > > > It is not enough to know just the number of available buffers
> (credits)
> > > for the downstream because the size of these buffers can be different.
> > > > So we are proposing to resolve this problem in the following way: If
> > the
> > > downstream buffer size is changed then the upstream should send
> > > > the buffer of the size not greater than the new one regardless of how
> > big
> > > the current buffer on the upstream. (pollBuffer should receive
> > > > parameters like bufferSize and return buffer not greater than it)
> > >
> > > So apart from adding buffer size information to the `AddCredit`
> message,
> > we
> > > will need to support a case where upstream subpartition has already
> > > produced a buffer with older size (for example 32KB), while the next
> > credit
> > > arrives with an allowance for a smaller size (16KB). In that case, we
> are
> > > only allowed to send a portion of the data from this buffer that fits
> > into
> > > the new updated buffer size, and keep announcing the remaining part as
> > > available backlog.
> > >
> > > Best,
> > > Piotrek
> > >
> > >
> > > śr., 14 lip 2021 o 08:33 Steven Wu  napisał(a):
> > >
> > > >- The subtask observes the changes in the throughput and changes
> the
> > > >buffer size during the whole life period of the task.
> > > >- The subtask sends buffer size and number of available buffers to
> > the
> > > >upstream to the correspo

Re: [NOTICE] flink-runtime now scala-free

2021-07-15 Thread Steven Wu
This is awesome. Thank you, Chesnay!

On Wed, Jul 14, 2021 at 1:50 AM Yun Tang  wrote:

> Great news, thanks for Chesnay's work!
>
> Best
> Yun Tang
> 
> From: Martijn Visser 
> Sent: Wednesday, July 14, 2021 16:05
> To: dev@flink.apache.org 
> Subject: Re: [NOTICE] flink-runtime now scala-free
>
> This is a great achievement, thank you for driving this!
>
> On Tue, 13 Jul 2021 at 18:20, Chesnay Schepler  wrote:
>
> > Hello everyone,
> >
> > I just merged the last PR for FLINK-14105, with which flink-runtime is
> > now officially scala-free. *fireworks*
> >
> >
> > What does that mean in practice?
> >
> > a) flink-runtime no longer has a scala-suffix, which cascaded into other
> > modules (e.g., our reporter modules). This _may_ cause some hiccups when
> > switching between branches. So far things worked fine, but I wanted to
> > mention the possibility.
> >
> > b) The mechanism with which Akka is now loaded requires that
> > flink-rpc-akka-loader is built through maven, because of a special build
> > step in the 'process-resources' phase. If you have so far build things
> > exclusively via IntelliJ, then you will need to run the
> > 'process-resources' in flink-rpc/flink-rpc-akka-loader at least once,
> > and then whenever you fully rebuilt the project (because it cleans the
> > target/ directory). By-and-large this shouldn't change things
> > significantly, because the 'process-resources' phase is also used for
> > various code-generation build steps.
> >
> >
>


Re: [DISCUSS] FLIP-179: Expose Standardized Operator Metrics

2021-07-14 Thread Steven Wu
I am trying to understand what those two metrics really capture.

> G setPendingBytesGauge(G pendingBytesGauge);

   - Using the file source as an example, does it capture the remaining bytes
   of the current file split that the reader is processing? How would users
   interpret or use this metric? The enumerator keeps track of the
   pending/unassigned splits, which is an indication of the size of the
   backlog; that would be very useful.


> G setPendingRecordsGauge(G pendingRecordsGauge);

   - In the Kafka source case, is this intended to capture the consumer lag
   (log head offset from the broker - current record offset)? That could be
   used to capture the size of the backlog (see the sketch below).
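
If that reading is right, the Kafka case would report something like the
following. This is a back-of-the-envelope sketch only; a real source would
take these numbers from the fetcher's internal state rather than making
extra broker calls per report.

import java.util.Collection;
import java.util.Map;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

final class ConsumerLag {
    private ConsumerLag() {}

    // pendingRecords = sum over assigned partitions of (log end offset - current position)
    static long pendingRecords(KafkaConsumer<?, ?> consumer, Collection<TopicPartition> assigned) {
        Map<TopicPartition, Long> endOffsets = consumer.endOffsets(assigned);
        long lag = 0L;
        for (TopicPartition tp : assigned) {
            // position() is the offset of the next record to fetch; endOffset is the log head
            lag += Math.max(0L, endOffsets.get(tp) - consumer.position(tp));
        }
        return lag;
    }
}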



On Tue, Jul 13, 2021 at 3:01 PM Arvid Heise  wrote:

> Hi Becket,
>
> I believe 1+2 has been answered by Chesnay already. Just to add to 2: I'm
> not the biggest fan of reusing task metrics but that's what FLIP-33 and
> different folks suggested. I'd probably keep task I/O metrics only for
> internal things and add a new metric for external calls. Then, we could
> even allow users to track I/O in AsyncIO (which would currently be a mess).
> However, with the current abstraction, it would be relatively easy to add
> separate metrics later.
>
> 3. As outlined in the JavaDoc and in the draft PR [1], it's up to the user
> to implement it in a way that fetch time always corresponds to the latest
> polled record. For SourceReaderBase, I have added a new
> RecordsWithSplitIds#lastFetchTime (with default return value null) that
> sets the last fetch time automatically whenever the next batch is selected.
> Tbh this metric is a bit more challenging to implement for
> non-SourceReaderBase sources but I have not found a better, thread-safe
> way. Of course, we could shift the complete calculation into user-land but
> I'm not sure that this is easier.
> For your scenarios:
> - in A, you assume SourceReaderBase. In that case, we could eagerly report
> the metric as sketched by you. It depends on the definition of "last
> processed record" in FLIP-33, whether this eager reporting is more correct
> than the lazy reporting that I have proposed. The former case assumes "last
> processed record" = last fetched record, while the latter case assumes
> "last processed record" = "last polled record". For the proposed solution,
> the user would just need to implement RecordsWithSplitIds#lastFetchTime,
> which typically corresponds to the creation time of the RecordsWithSplitIds
> instance.
> - B is not assuming SourceReaderBase.
> If it's SourceReaderBase, the same proposed solution works out of the box:
> SourceOperator intercepts the emitted event time and uses the fetch time of
> the current batch.
> If it's not SourceReaderBase, the user would need to attach the timestamp
> to the handover protocol if multi-threaded and set the lastFetchTimeGauge
> when a value in the handover protocol is selected (typically a batch).
> If it's a single threaded source, the user could directly set the current
> timestamp after fetching the records in a sync fashion.
> The bad case is if the user is fetching individual records (either sync or
> async), then the fetch time would be updated with every record. However,
> I'm assuming that the required system call is dwarfed by involved I/O.
>
> [1] https://github.com/apache/flink/pull/15972
>
> On Tue, Jul 13, 2021 at 12:58 PM Chesnay Schepler 
> wrote:
>
> > Re 1: We don't expose the reuse* methods, because the proposed
> > OperatorIOMetricGroup is a separate interface from the existing
> > implementations (which will be renamed and implement the new interface).
> >
> > Re 2: Currently the plan is to re-use the "new" numByesIn/Out counters
> > for tasks ("new" because all we are doing is exposing already existing
> > metrics). We may however change this in the future if we want to report
> > the byte metrics on an operator level, which is primarily interesting
> > for async IO or other external connectivity outside of sinks/sources.
> >
> > On 13/07/2021 12:38, Becket Qin wrote:
> > > Hi Arvid,
> > >
> > > Thanks for the proposal. I like the idea of exposing concrete metric
> > group
> > > class so that users can access the predefined metrics.
> > >
> > > A few questions are following:
> > >
> > > 1. When exposing the OperatorIOMetrics to the users, we are also
> exposing
> > > the reuseInputMetricsForTask to the users. Should we hide these two
> > methods
> > > because users won't have enough information to decide whether the
> records
> > > IO metrics should be reused by the task or not.
> > >
> > > 2. Similar to question 1, in the OperatorIOMetricGroup, we are adding
> > > numBytesInCounter and numBytesOutCounter. Should these metrics be
> reusing
> > > the task level metrics by default?
> > >
> > > 3. Regarding SourceMetricGroup#setLastFetchTimeGauge(), I am not sure
> how
> > > it works with the FetchLag. Typically there are two cases when
> reporting
> > > the fetch lag.
> > >  A. The EventTime is known at the point 

Re: [DISCUSS] FLIP-183: Dynamic buffer size adjustment

2021-07-14 Thread Steven Wu
   - The subtask observes the changes in the throughput and changes the
   buffer size during the whole life period of the task.
   - The subtask sends buffer size and number of available buffers to the
   upstream to the corresponding subpartition.
   - Upstream changes the buffer size corresponding to the received
   information.
   - Upstream sends the data and number of filled buffers to the downstream


Will the above steps of buffer size adjustment cause problems with
credit-based flow control (mainly for downsizing), since the downstream
adjusts its buffer size down first?

Here is a quote from the blog [1]:
"Credit-based flow control makes sure that whatever is “on the wire” will
have capacity at the receiver to handle. "


[1]
https://flink.apache.org/2019/06/05/flink-network-stack.html#credit-based-flow-control


On Tue, Jul 13, 2021 at 7:34 PM Yingjie Cao  wrote:

> Hi,
>
> Thanks for driving this, I think it is really helpful for jobs suffering
> from backpressure.
>
> Best,
> Yingjie
>
> Anton Kalashnikov  wrote on Fri, Jul 9, 2021 at 10:59 PM:
>
> > Hey!
> >
> > There is a wish to decrease amount of in-flight data which can improve
> > aligned checkpoint time(fewer in-flight data to process before
> > checkpoint can complete) and improve the behaviour and performance of
> > unaligned checkpoints (fewer in-flight data that needs to be persisted
> > in every unaligned checkpoint). The main idea is not to keep as much
> > in-flight data as much memory we have but keeping the amount of data
> > which can be predictably handling for configured amount of time(ex. we
> > keep data which can be processed in 1 sec). It can be achieved by
> > calculation of the effective throughput and following changes the buffer
> > size based on the this throughput. More details about the proposal you
> > can find here [1].
> >
> > What are you thoughts about it?
> >
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-183%3A+Dynamic+buffer+size+adjustment
> >
> >
> > --
> > Best regards,
> > Anton Kalashnikov
> >
> >
> >
>


Re: [ANNOUNCE] New PMC member: Guowei Ma

2021-07-08 Thread Steven Wu
Awesome! Congratulations, Guowei!

On Wed, Jul 7, 2021 at 4:25 AM Jingsong Li  wrote:

> Congratulations, Guowei!
>
> Best,
> Jingsong
>
> On Wed, Jul 7, 2021 at 6:36 PM Arvid Heise  wrote:
>
> > Congratulations!
> >
> > On Wed, Jul 7, 2021 at 11:30 AM Till Rohrmann 
> > wrote:
> >
> > > Congratulations, Guowei!
> > >
> > > Cheers,
> > > Till
> > >
> > > On Wed, Jul 7, 2021 at 9:41 AM Roman Khachatryan 
> > wrote:
> > >
> > > > Congratulations!
> > > >
> > > > Regards,
> > > > Roman
> > > >
> > > > On Wed, Jul 7, 2021 at 8:24 AM Rui Li  wrote:
> > > > >
> > > > > Congratulations Guowei!
> > > > >
> > > > > On Wed, Jul 7, 2021 at 1:01 PM Benchao Li 
> > > wrote:
> > > > >
> > > > > > Congratulations!
> > > > > >
> > > > > > Dian Fu  于2021年7月7日周三 下午12:46写道:
> > > > > >
> > > > > > > Congratulations, Guowei!
> > > > > > >
> > > > > > > Regards,
> > > > > > > Dian
> > > > > > >
> > > > > > > > 2021年7月7日 上午10:37,Yun Gao  写道:
> > > > > > > >
> > > > > > > > Congratulations Guowei!
> > > > > > > >
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Yun
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > --
> > > > > > > > Sender:JING ZHANG
> > > > > > > > Date:2021/07/07 10:33:51
> > > > > > > > Recipient:dev
> > > > > > > > Theme:Re: [ANNOUNCE] New PMC member: Guowei Ma
> > > > > > > >
> > > > > > > > Congratulations,  Guowei Ma!
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > > JING ZHANG
> > > > > > > >
> > > > > > > > Zakelly Lan  于2021年7月7日周三 上午10:30写道:
> > > > > > > >
> > > > > > > >> Congratulations, Guowei!
> > > > > > > >>
> > > > > > > >> Best,
> > > > > > > >> Zakelly
> > > > > > > >>
> > > > > > > >> On Wed, Jul 7, 2021 at 10:24 AM tison  >
> > > > wrote:
> > > > > > > >>
> > > > > > > >>> Congrats! NB.
> > > > > > > >>>
> > > > > > > >>> Best,
> > > > > > > >>> tison.
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>> Jark Wu  于2021年7月7日周三 上午10:20写道:
> > > > > > > >>>
> > > > > > >  Congratulations Guowei!
> > > > > > > 
> > > > > > >  Best,
> > > > > > >  Jark
> > > > > > > 
> > > > > > >  On Wed, 7 Jul 2021 at 09:54, XING JIN <
> > > jinxing.co...@gmail.com>
> > > > > > > wrote:
> > > > > > > 
> > > > > > > > Congratulations, Guowei~ !
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Jin
> > > > > > > >
> > > > > > > > Xintong Song  于2021年7月7日周三
> > 上午9:37写道:
> > > > > > > >
> > > > > > > >> Congratulations, Guowei~!
> > > > > > > >>
> > > > > > > >> Thank you~
> > > > > > > >>
> > > > > > > >> Xintong Song
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> On Wed, Jul 7, 2021 at 9:31 AM Qingsheng Ren <
> > > > renqs...@gmail.com>
> > > > > > >  wrote:
> > > > > > > >>
> > > > > > > >>> Congratulations Guowei!
> > > > > > > >>>
> > > > > > > >>> --
> > > > > > > >>> Best Regards,
> > > > > > > >>>
> > > > > > > >>> Qingsheng Ren
> > > > > > > >>> Email: renqs...@gmail.com
> > > > > > > >>> 2021年7月7日 +0800 09:30 Leonard Xu  >,写道:
> > > > > > >  Congratulations! Guowei Ma
> > > > > > > 
> > > > > > >  Best,
> > > > > > >  Leonard
> > > > > > > 
> > > > > > > > On Jul 6, 2021, at 21:56, Kurt Young  wrote:
> > > > > > > >
> > > > > > > > Hi all!
> > > > > > > >
> > > > > > > > I'm very happy to announce that Guowei Ma has joined
> > the
> > > > > > > >> Flink
> > > > > > >  PMC!
> > > > > > > >
> > > > > > > > Congratulations and welcome Guowei!
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Kurt
> > > > > > > 
> > > > > > > >>>
> > > > > > > >>
> > > > > > > >
> > > > > > > 
> > > > > > > >>>
> > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Best,
> > > > > > Benchao Li
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best regards!
> > > > > Rui Li
> > > >
> > >
> >
>
>
> --
> Best, Jingsong Lee
>


Re: [VOTE] FLIP-150: Introduce Hybrid Source

2021-07-01 Thread Steven Wu
+1 (non-binding)

On Thu, Jul 1, 2021 at 4:59 AM Thomas Weise  wrote:

> +1 (binding)
>
>
> On Thu, Jul 1, 2021 at 8:13 AM Arvid Heise  wrote:
>
> > +1 (binding)
> >
> > Thank you and Thomas for driving this
> >
> > On Thu, Jul 1, 2021 at 7:50 AM 蒋晓峰  wrote:
> >
> > > Hi everyone,
> > >
> > >
> > >
> > >
> > > Thanks for all the feedback to Hybrid Source so far. Based on the
> > > discussion[1] we seem to have consensus, so I would like to start a
> vote
> > on
> > > FLIP-150 for which the FLIP has also been updated[2].
> > >
> > >
> > >
> > >
> > > The vote will last for at least 72 hours (Sun, Jul 4th 12:00 GMT)
> unless
> > > there is an objection or insufficient votes.
> > >
> > >
> > >
> > >
> > > Thanks,
> > >
> > > Nicholas Jiang
> > >
> > >
> > >
> > >
> > > [1]
> > >
> >
> https://lists.apache.org/thread.html/r94057d19f0df2a211695820375502d60cddeeab5ad27057c1ca988d6%40%3Cdev.flink.apache.org%3E
> > >
> > > [2]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-150%3A+Introduce+Hybrid+Source
> >
>


Re: Add control mode for flink

2021-06-08 Thread Steven Wu
> producing control events from JobMaster is similar to triggering a
savepoint.

Paul, here is the difference as I see it: upon job or JobManager recovery,
we don't need to recover and replay the savepoint trigger signal.

On Tue, Jun 8, 2021 at 8:20 PM Paul Lam  wrote:

> +1 for this feature. Setting up a separate control stream is too much for
> many use cases, it would very helpful if users can leverage the built-in
> control flow of Flink.
>
> My 2 cents:
> 1. @Steven IMHO, producing control events from JobMaster is similar to
> triggering a savepoint. The REST api is non-blocking, and users should poll
> the results to confirm the operation is succeeded. If something goes wrong,
> it’s user’s responsibility to retry.
> 2. There are two kinds of existing special elements, special stream
> records (e.g. watermarks) and events (e.g. checkpoint barrier). They all
> flow through the whole DAG, but events needs to be acknowledged by
> downstream and can overtake records, while stream records are not). So I’m
> wondering if we plan to unify the two approaches in the new control flow
> (as Xintong mentioned both in the previous mails)?
>
> Best,
> Paul Lam
>
> 2021年6月8日 14:08,Steven Wu  写道:
>
>
> I can see the benefits of control flow. E.g., it might help the old (and
> inactive) FLIP-17 side input. I would suggest that we add more details of
> some of the potential use cases.
>
> Here is one mismatch with using control flow for dynamic config. Dynamic
> config is typically targeted/loaded by one specific operator. Control flow
> will propagate the dynamic config to all operators. not a problem per se
>
> Regarding using the REST api (to jobmanager) for accepting control
> signals from external system, where are we going to persist/checkpoint the
> signal? jobmanager can die before the control signal is propagated and
> checkpointed. Did we lose the control signal in this case?
>
>
> On Mon, Jun 7, 2021 at 11:05 PM Xintong Song 
> wrote:
>
>> +1 on separating the effort into two steps:
>>
>>1. Introduce a common control flow framework, with flexible
>>interfaces for generating / reacting to control messages for various
>>purposes.
>>2. Features that leverating the control flow can be worked on
>>concurrently
>>
>> Meantime, keeping collecting potential features that may leverage the
>> control flow should be helpful. It provides good inputs for the control
>> flow framework design, to make the framework common enough to cover the
>> potential use cases.
>>
>> My suggestions on the next steps:
>>
>>1. Allow more time for opinions to be heard and potential use cases
>>to be collected
>>2. Draft a FLIP with the scope of common control flow framework
>>3. We probably need a poc implementation to make sure the framework
>>covers at least the following scenarios
>>   1. Produce control events from arbitrary operators
>>   2. Produce control events from JobMaster
>>   3. Consume control events from arbitrary operators downstream
>>   where the events are produced
>>
>>
>> Thank you~
>> Xintong Song
>>
>>
>>
>> On Tue, Jun 8, 2021 at 1:37 PM Yun Gao  wrote:
>>
>>> Very thanks Jiangang for bringing this up and very thanks for the
>>> discussion!
>>>
>>> I also agree with the summarization by Xintong and Jing that control
>>> flow seems to be
>>> a common buidling block for many functionalities and dynamic
>>> configuration framework
>>> is a representative application that frequently required by users.
>>> Regarding the control flow,
>>> currently we are also considering the design of iteration for the
>>> flink-ml, and as Xintong has pointed
>>> out, it also required the control flow in cases like detection global
>>> termination inside the iteration
>>>  (in this case we need to broadcast an event through the iteration body
>>> to detect if there are still
>>> records reside in the iteration body). And regarding  whether to
>>> implement the dynamic configuration
>>> framework, I also agree with Xintong that the consistency guarantee
>>> would be a point to consider, we
>>> might consider if we need to ensure every operator could receive the
>>> dynamic configuration.
>>>
>>> Best,
>>> Yun
>>>
>>>
>>>
>>> --
>>> Sender:kai wang
>>> Date:2021/06/08 11:52:12
>>> Recipient:JING ZHANG
>>> Cc:刘建刚; Xinton

Re: Re: Add control mode for flink

2021-06-08 Thread Steven Wu
Option 2 is probably not feasible, as a checkpoint may take a long time or
may fail.

Option 1 might work, although it complicates job recovery and checkpointing.
After checkpoint completion, we need to clean up the control signals stored
in the HA service.
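
To make the option 1 lifecycle concrete, here is a rough, purely illustrative
sketch (none of these classes or method names are real Flink APIs) of
persisting a control signal, replaying it on failover, and cleaning it up
after checkpoint completion:

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: stand-ins for a HighAvailabilityServices-backed
// store and the option 1 lifecycle of a control signal.
public class ControlSignalLifecycleSketch {

    static final class ControlSignal {
        final long id;
        final String payload;

        ControlSignal(long id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    // Stand-in for a store backed by the HA services (e.g. ZooKeeper).
    static final class HaSignalStore {
        private final List<ControlSignal> pending = new ArrayList<>();

        void persist(ControlSignal signal) {
            pending.add(signal);
        }

        List<ControlSignal> pendingSignals() {
            return new ArrayList<>(pending);
        }

        // Drop signals that a completed checkpoint already reflects.
        void discardUpTo(long signalId) {
            pending.removeIf(s -> s.id <= signalId);
        }
    }

    public static void main(String[] args) {
        HaSignalStore store = new HaSignalStore();

        // 1. REST request arrives: persist the signal before acknowledging it,
        //    then propagate it through the DAG.
        store.persist(new ControlSignal(1L, "filter.threshold=42"));

        // 2. On JobManager failover: replay signals not yet covered by a checkpoint.
        for (ControlSignal signal : store.pendingSignals()) {
            System.out.println("replaying signal " + signal.id + ": " + signal.payload);
        }

        // 3. After a checkpoint that covers signal 1 completes: clean it up.
        store.discardUpTo(1L);
    }
}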

On Tue, Jun 8, 2021 at 1:14 AM 刘建刚  wrote:

> Thanks for the reply. It is a good question. There are multi choices as
> follows:
>
>1. We can persist control signals in HighAvailabilityServices and replay
>them after failover.
>2. Only tell the users that the control signals take effect after they
>are checkpointed.
>
>
> Steven Wu [via Apache Flink User Mailing List archive.] <
> ml+s2336050n44278...@n4.nabble.com> wrote on Tue, Jun 8, 2021 at 2:15 PM:
>
> >
> > I can see the benefits of control flow. E.g., it might help the old (and
> > inactive) FLIP-17 side input. I would suggest that we add more details of
> > some of the potential use cases.
> >
> > Here is one mismatch with using control flow for dynamic config. Dynamic
> > config is typically targeted/loaded by one specific operator. Control
> flow
> > will propagate the dynamic config to all operators. not a problem per se
> >
> > Regarding using the REST api (to jobmanager) for accepting control
> > signals from external system, where are we going to persist/checkpoint
> the
> > signal? jobmanager can die before the control signal is propagated and
> > checkpointed. Did we lose the control signal in this case?
> >
> >
> > On Mon, Jun 7, 2021 at 11:05 PM Xintong Song <[hidden email]
> > <http:///user/SendEmail.jtp?type=node=44278=0>> wrote:
> >
> >> +1 on separating the effort into two steps:
> >>
> >>1. Introduce a common control flow framework, with flexible
> >>interfaces for generating / reacting to control messages for various
> >>purposes.
> >>2. Features that leverating the control flow can be worked on
> >>concurrently
> >>
> >> Meantime, keeping collecting potential features that may leverage the
> >> control flow should be helpful. It provides good inputs for the control
> >> flow framework design, to make the framework common enough to cover the
> >> potential use cases.
> >>
> >> My suggestions on the next steps:
> >>
> >>1. Allow more time for opinions to be heard and potential use cases
> >>to be collected
> >>2. Draft a FLIP with the scope of common control flow framework
> >>3. We probably need a poc implementation to make sure the framework
> >>covers at least the following scenarios
> >>   1. Produce control events from arbitrary operators
> >>   2. Produce control events from JobMaster
> >>   3. Consume control events from arbitrary operators downstream
> >>   where the events are produced
> >>
> >>
> >> Thank you~
> >>
> >> Xintong Song
> >>
> >>
> >>
> >> On Tue, Jun 8, 2021 at 1:37 PM Yun Gao <[hidden email]
> >> <http:///user/SendEmail.jtp?type=node=44278=1>> wrote:
> >>
> >>> Very thanks Jiangang for bringing this up and very thanks for the
> >>> discussion!
> >>>
> >>> I also agree with the summarization by Xintong and Jing that control
> >>> flow seems to be
> >>> a common buidling block for many functionalities and dynamic
> >>> configuration framework
> >>> is a representative application that frequently required by users.
> >>> Regarding the control flow,
> >>> currently we are also considering the design of iteration for the
> >>> flink-ml, and as Xintong has pointed
> >>> out, it also required the control flow in cases like detection global
> >>> termination inside the iteration
> >>>  (in this case we need to broadcast an event through the iteration body
> >>> to detect if there are still
> >>> records reside in the iteration body). And regarding  whether to
> >>> implement the dynamic configuration
> >>> framework, I also agree with Xintong that the consistency guarantee
> >>> would be a point to consider, we
> >>> might consider if we need to ensure every operator could receive the
> >>> dynamic configuration.
> >>>
> >>> Best,
> >>> Yun
> >>>
> >>>
> >>>
> >>> --
> >>> Sender:kai wang<[hidden email]
> >>

Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-06-08 Thread Steven Wu
> hybrid sounds to me more like the source would constantly switch back and
forth

Initially, the focus of the hybrid source is more like a sequenced chain.

But in the future it would be cool if hybrid sources could intelligently
switch back and forth between a historical data source (like Iceberg) and a
live data source (like Kafka). E.g.,
- if the Flink job is lagging behind the Kafka retention, automatically switch
to the Iceberg source
- once the job has caught up, switch back to the Kafka source

That would simplify the operational aspects of switching manually.
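
For reference, here is a minimal sketch of the simple sequenced-chain case
(bounded file backfill followed by Kafka), assuming a builder-style
HybridSource API along the lines of the FLIP-150 PR; the concrete class and
method names, path, topic, and broker address are illustrative and may not
match the final API exactly:

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.source.hybrid.HybridSource;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineFormat;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class HybridSourceSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        long cutoverMillis = System.currentTimeMillis() - 60 * 60 * 1000L; // fixed cutover

        // Bounded historical source (file-based backfill up to the cutover).
        FileSource<String> historical =
                FileSource.forRecordStreamFormat(
                                new TextLineFormat(), new Path("s3://bucket/backfill/"))
                        .build();

        // Unbounded live source starting at the cutover timestamp.
        KafkaSource<String> live =
                KafkaSource.<String>builder()
                        .setBootstrapServers("broker:9092")
                        .setTopics("events")
                        .setValueOnlyDeserializer(new SimpleStringSchema())
                        .setStartingOffsets(OffsetsInitializer.timestamp(cutoverMillis))
                        .build();

        // Read the historical source to completion, then switch to the live source.
        HybridSource<String> source =
                HybridSource.builder(historical)
                        .addSource(live)
                        .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "hybrid-source").print();
        env.execute("hybrid-source-sketch");
    }
}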


On Mon, Jun 7, 2021 at 8:07 AM Arvid Heise  wrote:

> Sorry for joining the party so late, but it's such an interesting FLIP with
> a huge impact that I wanted to add my 2 cents. [1]
> I'm mirroring some basic question from the PR review to this thread because
> it's about the name:
>
> We could rename the thing to ConcatenatedSource(s), SourceSequence, or
> similar.
> Hybrid has the connotation of 2 for me (maybe because I'm a non-native) and
> does not carry the concatentation concept as well (hybrid sounds to me more
> like the source would constantly switch back and forth).
>
> Could we take a few minutes to think if this is the most intuitive name for
> new users? I'm especially hoping that natives might give some ideas (or
> declare that Hybrid is perfect).
>
> [1] https://github.com/apache/flink/pull/15924#pullrequestreview-677376664
>
> On Sun, Jun 6, 2021 at 7:47 PM Steven Wu  wrote:
>
> > > Converter function relies on the specific enumerator capabilities to
> set
> > the new start position (e.g.
> > fileSourceEnumerator.getEndTimestamp() and
> > kafkaSourceEnumerator.setTimestampOffsetsInitializer(..)
> >
> > I guess the premise is that a converter is for a specific tuple of
> > (upstream source, downstream source) . We don't have to define generic
> > EndtStateT and SwitchableEnumerator interfaces. That should work.
> >
> > The benefit of defining EndtStateT and SwitchableEnumerator interfaces is
> > probably promoting uniformity across sources that support
> hybrid/switchable
> > source.
> >
> > On Sun, Jun 6, 2021 at 10:22 AM Thomas Weise  wrote:
> >
> > > Hi Steven,
> > >
> > > Thank you for the thorough review of the PR and for bringing this back
> > > to the mailing list.
> > >
> > > All,
> > >
> > > I updated the FLIP-150 page to highlight aspects in which the PR
> > > deviates from the original proposal [1]. The goal would be to update
> > > the FLIP soon and bring it to a vote, as previously suggested offline
> > > by Nicholas.
> > >
> > > A few minor issues in the PR are outstanding and I'm working on test
> > > coverage for the recovery behavior, which should be completed soon.
> > >
> > > The dynamic position transfer needs to be concluded before we can move
> > > forward however.
> > >
> > > There have been various ideas, including the special
> > > "SwitchableEnumerator" interface, using enumerator checkpoint state or
> > > an enumerator interface extension to extract the end state.
> > >
> > > One goal in the FLIP is to "Reuse the existing Source connectors built
> > > with FLIP-27 without any change." and I think it is important to honor
> > > that goal given that fixed start positions do not require interface
> > > changes.
> > >
> > > Based on the feedback the following might be a good solution for
> > > runtime position transfer:
> > >
> > > * User supplies the optional converter function (not applicable for
> > > fixed positions).
> > > * Instead of relying on the enumerator checkpoint state [2], the
> > > converter function will be supplied with the current and next
> > > enumerator (source.createEnumerator).
> > > * Converter function relies on the specific enumerator capabilities to
> > > set the new start position (e.g.
> > > fileSourceEnumerator.getEndTimestamp() and
> > > kafkaSourceEnumerator.setTimestampOffsetsInitializer(..)
> > > * HybridSourceSplitEnumerator starts new underlying enumerator
> > >
> > > With this approach, there is no need to augment FLIP-27 interfaces and
> > > custom source capabilities are easier to integrate. Removing the
> > > mandate to rely on enumerator checkpoint state also avoids potential
> > > upgrade/compatibility issues.
> > >
> > > Thoughts?
> > >
> > > Thanks,
> > > Thomas
> > >
> > > [1]
> > >
> >
> https://cwiki.apache.org/confluence/display

Re: Re: Add control mode for flink

2021-06-08 Thread Steven Wu
 things in common. A unified control flow model would help
>>>>> deduplicate the common logics, allowing us to focus on the use case
>>>>> specific parts.
>>>>>
>>>>> E.g.,
>>>>> - Watermarks: generated by source operators, handled by window
>>>>> operators.
>>>>> - Checkpoint barrier: generated by the checkpoint coordinator, handled
>>>>> by all tasks
>>>>> - Dynamic controlling: generated by JobMaster (in reaction to the REST
>>>>> command), handled by specific operators/UDFs
>>>>> - Operator defined events: The following features are still in
>>>>> planning, but may potentially benefit from the control flow model. (Please
>>>>> correct me if I'm wrong, @Yun, @Jark)
>>>>>   * Iteration: When a certain condition is met, we might want to
>>>>> signal downstream operators with an event
>>>>>   * Mini-batch assembling: Flink currently uses special watermarks for
>>>>> indicating the end of each mini-batch, which makes it tricky to deal with
>>>>> event time related computations.
>>>>>   * Hive dimension table join: For periodically reloaded hive tables,
>>>>> it would be helpful to have specific events signaling that a reloading is
>>>>> finished.
>>>>>   * Bootstrap dimension table join: This is similar to the previous
>>>>> one. In cases where we want to fully load the dimension table before
>>>>> starting joining the mainstream, it would be helpful to have an event
>>>>> signaling the finishing of the bootstrap.
>>>>>
>>>>> ## Dynamic REST controlling
>>>>> Back to the specific feature that Jiangang proposed, I personally
>>>>> think it's quite convenient. Currently, to dynamically change the behavior
>>>>> of an operator, we need to set up a separate source for the control events
>>>>> and leverage broadcast state. Being able to send the events via REST APIs
>>>>> definitely improves the usability.
>>>>>
>>>>> Leveraging dynamic configuration frameworks is for sure one possible
>>>>> approach. The reason we are in favor of introducing the control flow is
>>>>> that:
>>>>> - It benefits not only this specific dynamic controlling feature, but
>>>>> potentially other future features as well.
>>>>> - AFAICS, it's non-trivial to make a 3rd-party dynamic configuration
>>>>> framework work together with Flink's consistency mechanism.
>>>>>
>>>>> Thank you~
>>>>>
>>>>> Xintong Song
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Jun 7, 2021 at 11:05 AM 刘建刚 <[hidden email]
>>>>> <http:///user/SendEmail.jtp?type=node=44245=0>> wrote:
>>>>>
>>>>>> Thank you for the reply. I have checked the post you mentioned. The
>>>>>> dynamic config may be useful sometimes. But it is hard to keep data
>>>>>> consistent in flink, for example, what if the dynamic config will take
>>>>>> effect when failover. Since dynamic config is a desire for users, maybe
>>>>>> flink can support it in some way.
>>>>>>
>>>>>> For the control mode, dynamic config is just one of the control
>>>>>> modes. In the google doc, I have list some other cases. For example,
>>>>>> control events are generated in operators or external services. Besides
>>>>>> user's dynamic config, flink system can support some common dynamic
>>>>>> configuration, like qps limit, checkpoint control and so on.
>>>>>>
>>>>>> It needs good design to handle the control mode structure. Based on
>>>>>> that, other control features can be added easily later, like changing log
>>>>>> level when job is running. In the end, flink will not just process data,
>>>>>> but also interact with users to receive control events like a service.
>>>>>>
>>>>>> Steven Wu <[hidden email]
>>>>>> <http:///user/SendEmail.jtp?type=node=44245=1>> wrote on Fri, Jun 4, 2021
>>>>>> at 11:11 PM:
>>>>>>
>>>>>>> I am not sure if we should solve this problem in Flink. This is more
>>>>>>> like a dynamic config problem that probably should be solved by s

Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-06-06 Thread Steven Wu
> Converter function relies on the specific enumerator capabilities to set
the new start position (e.g.
fileSourceEnumerator.getEndTimestamp() and
kafkaSourceEnumerator.setTimestampOffsetsInitializer(..)

I guess the premise is that a converter is for a specific tuple of
(upstream source, downstream source). We don't have to define generic
EndStateT and SwitchableEnumerator interfaces. That should work.

The benefit of defining EndStateT and SwitchableEnumerator interfaces is
probably promoting uniformity across sources that support a hybrid/switchable
source.
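
As an illustration of the per-pair converter idea, here is a sketch with
entirely hypothetical stand-in types (none of these are the FLIP's or the
PR's actual interfaces), assuming the upstream enumerator can expose its end
timestamp:

import java.io.Serializable;

// Hypothetical sketch: a converter that is specific to one (upstream, downstream)
// pair and needs no generic EndStateT / SwitchableEnumerator abstraction.
public class PerPairConverterSketch {

    // Stand-in for the finished upstream (file source) enumerator.
    static final class FileEnumerator {
        private final long endTimestampMillis;

        FileEnumerator(long endTimestampMillis) {
            this.endTimestampMillis = endTimestampMillis;
        }

        long getEndTimestamp() {
            return endTimestampMillis;
        }
    }

    // Stand-in for whatever the downstream (Kafka) source needs to start.
    static final class KafkaStartPosition {
        final long startTimestampMillis;

        KafkaStartPosition(long startTimestampMillis) {
            this.startTimestampMillis = startTimestampMillis;
        }
    }

    // A converter tied to one specific (upstream source, downstream source) tuple.
    @FunctionalInterface
    interface SourceConverter<FromEnumT, ToStartT> extends Serializable {
        ToStartT convert(FromEnumT finishedEnumerator);
    }

    public static void main(String[] args) {
        SourceConverter<FileEnumerator, KafkaStartPosition> fileToKafka =
                enumerator -> new KafkaStartPosition(enumerator.getEndTimestamp());

        KafkaStartPosition start = fileToKafka.convert(new FileEnumerator(1_623_000_000_000L));
        System.out.println("Kafka would start at timestamp " + start.startTimestampMillis);
    }
}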

On Sun, Jun 6, 2021 at 10:22 AM Thomas Weise  wrote:

> Hi Steven,
>
> Thank you for the thorough review of the PR and for bringing this back
> to the mailing list.
>
> All,
>
> I updated the FLIP-150 page to highlight aspects in which the PR
> deviates from the original proposal [1]. The goal would be to update
> the FLIP soon and bring it to a vote, as previously suggested offline
> by Nicholas.
>
> A few minor issues in the PR are outstanding and I'm working on test
> coverage for the recovery behavior, which should be completed soon.
>
> The dynamic position transfer needs to be concluded before we can move
> forward however.
>
> There have been various ideas, including the special
> "SwitchableEnumerator" interface, using enumerator checkpoint state or
> an enumerator interface extension to extract the end state.
>
> One goal in the FLIP is to "Reuse the existing Source connectors built
> with FLIP-27 without any change." and I think it is important to honor
> that goal given that fixed start positions do not require interface
> changes.
>
> Based on the feedback the following might be a good solution for
> runtime position transfer:
>
> * User supplies the optional converter function (not applicable for
> fixed positions).
> * Instead of relying on the enumerator checkpoint state [2], the
> converter function will be supplied with the current and next
> enumerator (source.createEnumerator).
> * Converter function relies on the specific enumerator capabilities to
> set the new start position (e.g.
> fileSourceEnumerator.getEndTimestamp() and
> kafkaSourceEnumerator.setTimestampOffsetsInitializer(..)
> * HybridSourceSplitEnumerator starts new underlying enumerator
>
> With this approach, there is no need to augment FLIP-27 interfaces and
> custom source capabilities are easier to integrate. Removing the
> mandate to rely on enumerator checkpoint state also avoids potential
> upgrade/compatibility issues.
>
> Thoughts?
>
> Thanks,
> Thomas
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-150%3A+Introduce+Hybrid+Source#FLIP150:IntroduceHybridSource-Prototypeimplementation
> [2]
> https://github.com/apache/flink/pull/15924/files#diff-e07478b3cad9810925ec784b61ec0026396839cc5b27bd6d337a1dea05e999eaR281
>
>
> On Tue, Jun 1, 2021 at 3:10 PM Steven Wu  wrote:
> >
> > discussed the PR with Thosmas offline. Thomas, please correct me if I
> > missed anything.
> >
> > Right now, the PR differs from the FLIP-150 doc regarding the converter.
> > * Current PR uses the enumerator checkpoint state type as the input for
> the
> > converter
> > * FLIP-150 defines a new EndStateT interface.
> > It seems that the FLIP-150 approach of EndStateT is more flexible, as
> > transition EndStateT doesn't have to be included in the upstream source
> > checkpoint state.
> >
> > Let's look at two use cases:
> > 1) static cutover time at 5 pm. File source reads all data btw 9 am - 5
> pm,
> > then Kafka source starts with initial position of 5 pm. In this case,
> there
> > is no need for converter or EndStateT since the starting time for Kafka
> > source is known and fixed.
> > 2) dynamic cutover time at 1 hour before now. This is useful when the
> > bootstrap of historic data takes a long time (like days or weeks) and we
> > don't know the exact time of cutover when a job is launched. Instead, we
> > are instructing the file source to stop when it gets close to live data.
> In
> > this case, hybrid source construction will specify a relative time (now
> - 1
> > hour), the EndStateT (of file source) will be resolved to an absolute
> time
> > for cutover. We probably don't need to include EndStateT (end timestamp)
> as
> > the file source checkpoint state. Hence, the separate EndStateT is
> probably
> > more desirable.
> >
> > We also discussed the converter for the Kafka source. Kafka source
> supports
> > different OffsetsInitializer impls (including
> TimestampOffsetsInitializer).
> > To support the dynamic cutover time (use case #2 above), we can plug i

Re: Add control mode for flink

2021-06-04 Thread Steven Wu
I am not sure if we should solve this problem in Flink. This is more like a
dynamic config problem that should probably be solved by a configuration
framework. Here is one post from a Google search:
https://medium.com/twodigits/dynamic-app-configuration-inject-configuration-at-run-time-using-spring-boot-and-docker-ffb42631852a

On Fri, Jun 4, 2021 at 7:09 AM 刘建刚  wrote:

> Hi everyone,
>
>   Flink jobs are always long-running. When the job is running, users
> may want to control the job but not stop it. The control reasons can be
> different as following:
>
>1.
>
>Change data processing’ logic, such as filter condition.
>2.
>
>Send trigger events to make the progress forward.
>3.
>
>Define some tools to degrade the job, such as limit input qps,
>sampling data.
>4.
>
>Change log level to debug current problem.
>
>   The common way to do this is to stop the job, do modifications and
> start the job. It may take a long time to recover. In some situations,
> stopping jobs is intolerable, for example, the job is related to money or
> important activities.So we need some technologies to control the running
> job without stopping the job.
>
>
> We propose to add control mode for flink. A control mode based on the
> restful interface is first introduced. It works by these steps:
>
>
>1. The user can predefine some logic which supports config control,
>such as filter condition.
>2. Run the job.
>3. If the user wants to change the job's running logic, just send a
>restful request with the responding config.
>
> Other control modes will also be considered in the future. More
> introduction can refer to the doc
> https://docs.google.com/document/d/1WSU3Tw-pSOcblm3vhKFYApzVkb-UQ3kxso8c8jEzIuA/edit?usp=sharing
> . If the community likes the proposal, more discussion is needed and a more
> detailed design will be given later. Any suggestions and ideas are welcome.
>
>


Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-06-01 Thread Steven Wu
Discussed the PR with Thomas offline. Thomas, please correct me if I
missed anything.

Right now, the PR differs from the FLIP-150 doc regarding the converter.
* Current PR uses the enumerator checkpoint state type as the input for the
converter
* FLIP-150 defines a new EndStateT interface.
It seems that the FLIP-150 approach of EndStateT is more flexible, as
transition EndStateT doesn't have to be included in the upstream source
checkpoint state.

Let's look at two use cases:
1) Static cutover time at 5 pm. The file source reads all data between 9 am
and 5 pm, then the Kafka source starts with an initial position of 5 pm. In
this case, there is no need for a converter or EndStateT since the starting
time for the Kafka source is known and fixed.
2) Dynamic cutover time at 1 hour before now. This is useful when the
bootstrap of historical data takes a long time (like days or weeks) and we
don't know the exact cutover time when the job is launched. Instead, we
instruct the file source to stop when it gets close to live data. In this
case, the hybrid source construction will specify a relative time (now - 1
hour), and the EndStateT (of the file source) will be resolved to an absolute
time for the cutover. We probably don't need to include the EndStateT (end
timestamp) in the file source checkpoint state. Hence, the separate EndStateT
is probably more desirable.

We also discussed the converter for the Kafka source. The Kafka source supports
different OffsetsInitializer impls (including TimestampOffsetsInitializer).
To support the dynamic cutover time (use case #2 above), we can plug in a
SupplierTimestampOffsetInitializer, where the starting timestamp is not set
during source/job construction. Rather, it is a supplier model where the
starting timestamp value is set to the resolved absolute timestamp during the
switch, as sketched below.
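
Here is a rough sketch of that supplier model; the holder class and names are
made up for illustration, and the SupplierTimestampOffsetInitializer would be
the piece that wraps such a supplier into an OffsetsInitializer:

import java.io.Serializable;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical sketch: the Kafka starting timestamp is supplied lazily, at
// switch time, instead of being fixed at job construction time.
public class SupplierStartPositionSketch {

    // Holder that the switching logic would populate once the cutover is resolved.
    static final class ResolvedCutover implements Serializable {
        private final AtomicLong timestampMillis = new AtomicLong(-1L);

        void resolve(long millis) {
            timestampMillis.set(millis);
        }

        long get() {
            long value = timestampMillis.get();
            if (value < 0) {
                throw new IllegalStateException("cutover timestamp not resolved yet");
            }
            return value;
        }
    }

    public static void main(String[] args) {
        ResolvedCutover cutover = new ResolvedCutover();

        // Handed to the Kafka source at construction time (via the supplier-based
        // offsets initializer), but not evaluated yet.
        LongSupplier startingTimestamp = cutover::get;

        // Later, when the file source finishes, the converter resolves the relative
        // cutover ("now - 1 hour") to an absolute timestamp.
        cutover.resolve(System.currentTimeMillis() - 60 * 60 * 1000L);

        System.out.println("Kafka source would start at " + startingTimestamp.getAsLong());
    }
}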

Thanks,
Steven



On Thu, May 20, 2021 at 8:59 PM Thomas Weise  wrote:

> Hi Nicholas,
>
> Thanks for taking a look at the PR!
>
> 1. Regarding switching mechanism:
>
> There has been previous discussion in this thread regarding the pros
> and cons of how the switching can be exposed to the user.
>
> With fixed start positions, no special switching interface to transfer
> information between enumerators is required. Sources are configured as
> they would be when used standalone and just plugged into HybridSource.
> I expect that to be a common use case. You can find an example for
> this in the ITCase:
>
>
> https://github.com/apache/flink/pull/15924/files#diff-fe1407d135d7b7b3a72aeb4471ab53ccd2665a58ff0129a83db1ec19cea06f1bR101
>
> For dynamic start position, the checkpoint state is used to transfer
> information from old to new enumerator. An example for that can be
> found here:
>
>
> https://github.com/apache/flink/pull/15924/files#diff-fe1407d135d7b7b3a72aeb4471ab53ccd2665a58ff0129a83db1ec19cea06f1bR112-R136
>
> That may look verbose, but the code to convert from one state to
> another can be factored out into a utility and the function becomes a
> one-liner.
>
> For common sources like files and Kafka we can potentially (later)
> implement the conversion logic as part of the respective connector's
> checkpoint and split classes.
>
> I hope that with the PR up for review, we can soon reach a conclusion
> on how we want to expose this to the user.
>
> Following is an example for Files -> Files -> Kafka that I'm using for
> e2e testing. It exercises both ways of setting the start position.
>
> https://gist.github.com/tweise/3139d66461e87986f6eddc70ff06ef9a
>
>
> 2. Regarding the events used to implement the actual switch between
> enumerator and readers: I updated the PR with javadoc to clarify the
> intent. Please let me know if that helps or let's continue to discuss
> those details on the PR?
>
>
> Thanks,
> Thomas
>
>
> On Mon, May 17, 2021 at 1:03 AM Nicholas Jiang 
> wrote:
> >
> > Hi Thomas,
> >
> >Sorry for later reply for your POC. I have reviewed the based abstract
> > implementation of your pull request:
> > https://github.com/apache/flink/pull/15924. IMO, for the switching
> > mechanism, this level of abstraction is not concise enough, which doesn't
> > make connector contribution easier. In theory, it is necessary to
> introduce
> > a set of interfaces to support the switching mechanism. The
> SwitchableSource
> > and SwitchableSplitEnumerator interfaces are needed for connector
> > expansibility.
> >In other words, the whole switching process of above mentioned PR is
> > different from that mentioned in FLIP-150. In the above implementation,
> the
> > source reading switching is executed after receving the
> SwitchSourceEvent,
> > which could be before the sending SourceReaderFinishEvent. This timeline
> of
> > source reading switching could be discussed here.
> >@Stephan @Becket, if you are available, please help to review the
> > abstract implementation, and compare with the interfaces mentioned in
> > FLIP-150.
> >
> > Thanks,
> > Nicholas Jiang
> >
> >
> >
> > --
> > Sent from:
> 

Re: [DISCUSS] FLIP-160: Declarative scheduler

2021-01-22 Thread Steven Wu
Till, thanks a lot for the proposal.

Even if the initial phase only supports scale-up, maybe the
"ScaleUpController" interface should be called "RescaleController" so that
scale-down can be added in the future.
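
Just to illustrate the naming suggestion, a hypothetical sketch (the method
name and parameters are made up, not the interface proposed in the FLIP):

// Hypothetical sketch of a direction-agnostic controller, leaving room for
// scale-down decisions later.
public interface RescaleController {

    /**
     * Decides whether the job should change from its current parallelism to the
     * parallelism that the currently available resources would allow, which may
     * be higher (scale-up) or, in the future, lower (scale-down).
     */
    boolean shouldRescale(int currentCumulativeParallelism, int newCumulativeParallelism);
}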

On Fri, Jan 22, 2021 at 7:03 AM Till Rohrmann  wrote:

> Hi everyone,
>
> I would like to start a discussion about adding a new type of scheduler to
> Flink. The declarative scheduler will first declare the required resources
> and wait for them before deciding on the actual parallelism of a job.
> Thereby it can better handle situations where resources cannot be fully
> fulfilled. Moreover, it will act as a building block for the reactive mode
> where Flink should scale to the maximum of the currently available
> resources.
>
> Please find more details in the FLIP wiki document [1]. Looking forward to
> your feedback.
>
> [1] https://cwiki.apache.org/confluence/x/mwtRCg
>
> Cheers,
> Till
>


Re: [DISCUSS] FLIP-159: Reactive Mode

2021-01-22 Thread Steven Wu
Thanks a lot for the proposal, Robert and Till.

> No fixed parallelism for any of the operators

Regarding this limitation, could the scheduler adjust only the default
parallelism? If some operators set their parallelism explicitly (like always 1),
just leave them unchanged.
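
For illustration, a small DataStream sketch of the distinction: operators that
only inherit the environment's default parallelism (the ones a reactive
scheduler could adjust) versus one whose parallelism is pinned explicitly:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ParallelismSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Default parallelism: operators that don't set their own parallelism inherit
        // this value, so these are the ones a reactive scheduler could freely adjust.
        env.setParallelism(4);

        env.fromElements("a", "b", "c")
                .map(value -> value.toUpperCase()) // inherits the default parallelism
                .print()
                .setParallelism(1);                // explicitly pinned, e.g. a single printer

        env.execute("parallelism-sketch");
    }
}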


On Fri, Jan 22, 2021 at 8:42 AM Robert Metzger  wrote:

> Hi all,
>
> Till started a discussion about FLIP-160: Declarative scheduler [1] earlier
> today, the first major feature based on that effort will be FLIP-159:
> Reactive Mode. It allows users to operate Flink in a way that it reactively
> scales the job up or down depending on the provided resources: adding
> TaskManagers will scale the job up, removing them will scale it down again.
>
> Here's the link to the Wiki:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-159%3A+Reactive+Mode
>
> We are very excited to hear your feedback about the proposal!
>
> Best,
> Robert
>
> [1]
>
> https://lists.apache.org/thread.html/r604a01f739639e2a5f093fbe7894c172125530332747ecf6990a6ce4%40%3Cdev.flink.apache.org%3E
>


Re: Re: Re: [ANNOUNCE] Welcome Guowei Ma as a new Apache Flink Committer

2021-01-20 Thread Steven Wu
Congrats, Guowei!

On Wed, Jan 20, 2021 at 10:32 AM Seth Wiesman  wrote:

> Congratulations!
>
> On Wed, Jan 20, 2021 at 3:41 AM hailongwang <18868816...@163.com> wrote:
>
> > Congratulations, Guowei!
> >
> > Best,
> > Hailong
> >
> > On 2021-01-20 15:55:24, "Till Rohrmann"  wrote:
> > >Congrats, Guowei!
> > >
> > >Cheers,
> > >Till
> > >
> > >On Wed, Jan 20, 2021 at 8:32 AM Matthias Pohl 
> > >wrote:
> > >
> > >> Congrats, Guowei!
> > >>
> > >> On Wed, Jan 20, 2021 at 8:22 AM Congxian Qiu 
> > >> wrote:
> > >>
> > >> > Congrats Guowei!
> > >> >
> > >> > Best,
> > >> > Congxian
> > >> >
> > >> >
> > >> > > Danny Chan  wrote on Wed, Jan 20, 2021 at 2:59 PM:
> > >> >
> > >> > > Congratulations Guowei!
> > >> > >
> > >> > > Best,
> > >> > > Danny
> > >> > >
> > >> > > > Jark Wu  wrote on Wed, Jan 20, 2021 at 2:47 PM:
> > >> > >
> > >> > > > Congratulations Guowei!
> > >> > > >
> > >> > > > Cheers,
> > >> > > > Jark
> > >> > > >
> > >> > > > On Wed, 20 Jan 2021 at 14:36, SHI Xiaogang <
> > shixiaoga...@gmail.com>
> > >> > > wrote:
> > >> > > >
> > >> > > > > Congratulations MA!
> > >> > > > >
> > >> > > > > Regards,
> > >> > > > > Xiaogang
> > >> > > > >
> > >> > > > > Yun Tang  wrote on Wed, Jan 20, 2021 at 2:24 PM:
> > >> > > > >
> > >> > > > > > Congratulations Guowei!
> > >> > > > > >
> > >> > > > > > Best
> > >> > > > > > Yun Tang
> > >> > > > > > 
> > >> > > > > > From: Yang Wang 
> > >> > > > > > Sent: Wednesday, January 20, 2021 13:59
> > >> > > > > > To: dev 
> > >> > > > > > Subject: Re: Re: [ANNOUNCE] Welcome Guowei Ma as a new
> Apache
> > >> Flink
> > >> > > > > > Committer
> > >> > > > > >
> > >> > > > > > Congratulations Guowei!
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > Best,
> > >> > > > > > Yang
> > >> > > > > >
> > >> > > > > > Yun Gao  wrote on Wed, Jan 20, 2021
> > at 1:52 PM:
> > >> > > > > >
> > >> > > > > > > Congratulations Guowei!
> > >> > > > > > >
> > >> > > > > > > Best,
> > >> > > > > > >
> > >> > > >
> > Yun--
> > >> > > > > > > Sender:Yangze Guo
> > >> > > > > > > Date:2021/01/20 13:48:52
> > >> > > > > > > Recipient:dev
> > >> > > > > > > Theme:Re: [ANNOUNCE] Welcome Guowei Ma as a new Apache
> Flink
> > >> > > > Committer
> > >> > > > > > >
> > >> > > > > > > Congratulations, Guowei! Well deserved.
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > Best,
> > >> > > > > > > Yangze Guo
> > >> > > > > > >
> > >> > > > > > > On Wed, Jan 20, 2021 at 1:46 PM Xintong Song <
> > >> > > tonysong...@gmail.com>
> > >> > > > > > > wrote:
> > >> > > > > > > >
> > >> > > > > > > > Congratulations, Guowei~!
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > Thank you~
> > >> > > > > > > >
> > >> > > > > > > > Xintong Song
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > On Wed, Jan 20, 2021 at 1:42 PM Yuan Mei <
> > >> > yuanmei.w...@gmail.com
> > >> > > >
> > >> > > > > > wrote:
> > >> > > > > > > >
> > >> > > > > > > > > Congrats Guowei :-)
> > >> > > > > > > > >
> > >> > > > > > > > > Best,
> > >> > > > > > > > > Yuan
> > >> > > > > > > > >
> > >> > > > > > > > > On Wed, Jan 20, 2021 at 1:36 PM tison <
> > >> wander4...@gmail.com>
> > >> > > > > wrote:
> > >> > > > > > > > >
> > >> > > > > > > > > > Congrats Guowei!
> > >> > > > > > > > > >
> > >> > > > > > > > > > Best,
> > >> > > > > > > > > > tison.
> > >> > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > > > Kurt Young  wrote on Wed, Jan 20, 2021
> at 1:34 PM:
> > >> > > > > > > > > >
> > >> > > > > > > > > > > Hi everyone,
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > I'm very happy to announce that Guowei Ma has
> > accepted
> > >> > the
> > >> > > > > > > invitation
> > >> > > > > > > > > to
> > >> > > > > > > > > > > become a Flink committer.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Guowei is a very long term Flink developer, he has
> > been
> > >> > > > > extremely
> > >> > > > > > > > > helpful
> > >> > > > > > > > > > > with
> > >> > > > > > > > > > > some important runtime changes, and also been
> > active
> > >> > with
> > >> > > > > > > answering
> > >> > > > > > > > > user
> > >> > > > > > > > > > > questions as well as discussing designs.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Please join me in congratulating Guowei for
> > becoming a
> > >> > > Flink
> > >> > > > > > > committer!
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Best,
> > >> > > > > > > > > > > Kurt
> > >> > > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >>
> >
>


Re: [DISCUSS] Flink configuration from environment variables

2021-01-19 Thread Steven Wu
get substituted
> > immediately, I
> > > > agree.
> > > >
> > > >
> > > > Regards
> > > > Ingo
> > > >
> > > > On Tue, Jan 19, 2021 at 4:47 AM Yang Wang 
> > wrote:
> > > >
> > > > > Hi Ingo,
> > > > >
> > > > > Thanks for your response.
> > > > >
> > > > > 1. Not distinguishing JM/TM is reasonable, but what about the
> client
> > > > side.
> > > > > For Yarn/K8s deployment,
> > > > > the local flink-conf.yaml will be shipped to JM/TM. So I am just
> > confused
> > > > > about where should the environment
> > > > > variables be replaced? IIUC, it is not an issue for Ververica
> > Platform
> > > > > since it is always done in the JM/TM side.
> > > > >
> > > > > 2. I believe we should support not do the substitution for specific
> > key.
> > > > A
> > > > > typical use case is "env.java.opts". If the
> > > > > value contains environment variables, they are expected to be
> > replaced
> > > > > exactly when the java command is executed,
> > > > > not after the java process is started. Maybe escaping with single
> > quote
> > > > is
> > > > > enough.
> > > > >
> > > > > 3. The substitution only takes effects on the value makes sense to
> > me.
> > > > >
> > > > >
> > > > > Best,
> > > > > Yang
> > > > >
> > > > > Steven Wu  wrote on Tue, Jan 19, 2021 at 12:36 AM:
> > > > >
> > > > > > Variable substitution (proposed here) is definitely useful.
> > > > > >
> > > > > > For us, hierarchical override is more useful.  E.g., we may have
> > the
> > > > > > default value of "state.checkpoints.dir=path1" defined in
> > > > > flink-conf.yaml.
> > > > > > But maybe we want to override it to "state.checkpoints.dir=path2"
> > via
> > > > > > environment variable in some scenarios. Otherwise, we have to
> > define a
> > > > > > corresponding shell variable (like STATE_CHECKPOINTS_DIR) for the
> > Flink
> > > > > > config, which is annoying.
> > > > > >
> > > > > > As Ingo pointed, it is also annoying to handle Java property key
> > naming
> > > > > > convention (dots separated), as dots aren't allowed in shell env
> > var
> > > > > naming
> > > > > > (All caps, separated with underscore). Shell will complain. We
> > have to
> > > > > > bundle all env var overrides (k-v pairs) in a single property
> value
> > > > (JSON
> > > > > > and base64 encode) to avoid it.
> > > > > >
> > > > > > On Mon, Jan 18, 2021 at 8:15 AM Ingo Bürk 
> > wrote:
> > > > > >
> > > > > > > Hi Yang,
> > > > > > >
> > > > > > > thanks for your questions! I'm glad to see this feature is
> being
> > > > > received
> > > > > > > positively.
> > > > > > >
> > > > > > > ad 1) We don't distinguish JM/TM, and I can't think of a good
> > reason
> > > > > why
> > > > > > a
> > > > > > > user would want to do so. I'm not very experienced with Flink,
> > > > however,
> > > > > > so
> > > > > > > please excuse me if I'm overlooking some obvious reason here.
> :-)
> > > > > > > ad 2) Admittedly I don't have a good overview on all the
> > > > configuration
> > > > > > > options that exist, but from those that I do know I can't
> imagine
> > > > > someone
> > > > > > > wanting to pass a value like "${MY_VAR}" verbatim. In Ververica
> > > > > Platform
> > > > > > as
> > > > > > > of now we ignore this problem. If, however, this needs to be
> > > > > addressed, a
> > > > > > > possible solution could be to allow escaping syntax such as
> > > > > "\${MY_VAR}".
> > > > > > >
> > > > > > > Another point to consider here is when exactly the substitution
> > takes
> > > > > > >

Re: [DISCUSS] Flink configuration from environment variables

2021-01-18 Thread Steven Wu
Variable substitution (proposed here) is definitely useful.

For us, hierarchical override is more useful. E.g., we may have the
default value of "state.checkpoints.dir=path1" defined in flink-conf.yaml,
but in some scenarios we want to override it to "state.checkpoints.dir=path2"
via an environment variable. Otherwise, we have to define a corresponding
shell variable (like STATE_CHECKPOINTS_DIR) for the Flink config, which is
annoying.

As Ingo pointed out, it is also annoying to handle the Java property key
naming convention (dot-separated), as dots aren't allowed in shell env var
names (all caps, separated with underscores); the shell will complain. We
have to bundle all env var overrides (k-v pairs) into a single property value
(JSON-encoded and then base64-encoded) to work around it.
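
A rough sketch of what such a hierarchical override could look like when
layered on top of the parsed configuration; the FLINK_CONFIG_ prefix and the
dot-to-underscore mapping are only an illustrative convention, not an existing
Flink feature:

import java.util.Map;
import org.apache.flink.configuration.Configuration;

// Sketch only: layer environment-variable overrides on top of the values parsed
// from flink-conf.yaml. The naming convention is illustrative.
public final class EnvOverridesSketch {

    private static final String PREFIX = "FLINK_CONFIG_";

    // e.g. "state.checkpoints.dir" -> "FLINK_CONFIG_STATE_CHECKPOINTS_DIR"
    static String toEnvName(String configKey) {
        return PREFIX + configKey.replace('.', '_').toUpperCase();
    }

    // Environment variables win over the values already present in the configuration.
    static void applyOverrides(Configuration conf, Map<String, String> env) {
        for (String key : conf.keySet()) {
            String override = env.get(toEnvName(key));
            if (override != null) {
                conf.setString(key, override);
            }
        }
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.setString("state.checkpoints.dir", "s3://bucket/path1"); // from flink-conf.yaml
        applyOverrides(conf, System.getenv());
        System.out.println(conf.getString("state.checkpoints.dir", null));
    }
}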

On Mon, Jan 18, 2021 at 8:15 AM Ingo Bürk  wrote:

> Hi Yang,
>
> thanks for your questions! I'm glad to see this feature is being received
> positively.
>
> ad 1) We don't distinguish JM/TM, and I can't think of a good reason why a
> user would want to do so. I'm not very experienced with Flink, however, so
> please excuse me if I'm overlooking some obvious reason here. :-)
> ad 2) Admittedly I don't have a good overview on all the configuration
> options that exist, but from those that I do know I can't imagine someone
> wanting to pass a value like "${MY_VAR}" verbatim. In Ververica Platform as
> of now we ignore this problem. If, however, this needs to be addressed, a
> possible solution could be to allow escaping syntax such as "\${MY_VAR}".
>
> Another point to consider here is when exactly the substitution takes
> place: on the "raw" file, or on the parsed key / value separately, and if
> so, should it support both key and value? My current thinking is that
> substituting only the value of the parsed entry should be sufficient.
>
>
> Regards
> Ingo
>
> On Mon, Jan 18, 2021 at 3:48 PM Yang Wang  wrote:
>
> > Thanks for kicking off the discussion.
> >
> > I think supporting environment variables rendering in the Flink
> > configuration yaml file is a good idea. Especially for
> > the Kubernetes environment since we are using the secret resource to
> store
> > the authentication information.
> >
> > But I have some questions for how to do it?
> > 1. The environments in Flink configuration yaml will be replaced in
> client,
> > JobManager, TaskManager or all of them?
> > 2. If users do not want some config options to be replaced, how to
> > achieve that?
> >
> > Best,
> > Yang
> >
> > > Khachatryan Roman  wrote on Mon, Jan 18, 2021 at 8:55 PM:
> >
> > > Hi Ingo,
> > >
> > > Thanks a lot for this proposal!
> > >
> > > We had a related discussion recently in the context of FLINK-19520
> > > (randomizing tests configuration) [1].
> > > I believe other scenarios will benefit as well.
> > >
> > > For the end users, I think substitution in configuration files is
> > > preferable over parsing env vars in Flink code.
> > > And for cases without such a file, we could have a default one on the
> > > classpath with all substitutions defined (and then merge everything
> from
> > > the user-supplied file).
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-19520
> > >
> > > Regards,
> > > Roman
> > >
> > >
> > > On Mon, Jan 18, 2021 at 11:11 AM Ingo Bürk  wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > in Ververica Platform we offer a feature to use environment variables
> > in
> > > > the Flink configuration¹, e.g.
> > > >
> > > > ```
> > > > s3.access-key: ${S3_ACCESS_KEY}
> > > > ```
> > > >
> > > > We've been discussing internally whether contributing such a feature
> to
> > > > Flink directly would make sense and wanted to start a discussion on
> > this
> > > > topic.
> > > >
> > > > An alternative way to do so from the above would be parsing those
> > > directly
> > > > based on their name, so instead of having it defined in the Flink
> > > > configuration as above, it would get automatically set if something
> > like
> > > > $FLINK_CONFIG_S3_ACCESS_KEY was set in the environment. This is
> > somewhat
> > > > similar to what e.g. Spring does, and faces similar challenges
> (dealing
> > > > with "."s etc.)
> > > >
> > > > Although I view both of these approaches as mostly orthogonal,
> > supporting
> > > > both very likely wouldn't make sense, of course. So I was wondering
> > what
> > > > your opinion is in terms of whether the project would benefit from
> > > > environment variable support for the Flink configuration, and whether
> > > there
> > > > are tendencies as to which approach to go with.
> > > >
> > > > ¹
> > > >
> > > >
> > >
> >
> https://docs.ververica.com/user_guide/application_operations/deployments/configure_flink.html#environment-variables
> > > >
> > > > Best regards
> > > > Ingo
> > > >
> > >
> >
>


Re: [DISCUSS] Releasing Apache Flink 1.11.3

2020-11-03 Thread Steven Wu
@Stephan Ewen  Yeah, we can do that. Don't worry about
it. Your earlier email had the perfect explanation of why the file source
shouldn't be backported.

On Tue, Nov 3, 2020 at 3:37 AM Stephan Ewen  wrote:

> @Steven would it be possible to initially copy some of the code into the
> iceberg source and later replace it by a dependency on the Flink file
> source?
>
> On Mon, Nov 2, 2020 at 8:33 PM Steven Wu  wrote:
>
> > Stephan, thanks a lot for explaining the file connector. that makes
> sense.
> >
> > I was asking because we were trying to reuse some of the implementations
> in
> > the file source for Iceberg source. Flink Iceberg source lives in the
> > Iceberg repo, which is not possible to code against the master branch of
> > the Flink code.
> >
> > On Mon, Nov 2, 2020 at 3:31 AM Stephan Ewen  wrote:
> >
> > > Hi Steven!
> > >
> > > So far there are no plans to pick back the file system connector code.
> > This
> > > is still evolving and not finalized for 1.12, so I don't feel it is a
> > good
> > > candidate to be backported.
> > > However, with the base connector changes backported, you should be able
> > to
> > > run the file connector code from master against 1.11.3.
> > >
> > > The collect() utils can be picked back, I see no issue with that (it is
> > > isolated utilities).
> > >
> > > Best,
> > > Stephan
> > >
> > >
> > > On Mon, Nov 2, 2020 at 3:02 AM Steven Wu  wrote:
> > >
> > > > Basically, it would be great to get the latest code in the
> > > > flink-connector-files (FLIP-27).
> > > >
> > > > On Sat, Oct 31, 2020 at 9:57 AM Steven Wu 
> > wrote:
> > > >
> > > > > Stephan, it will be great if we can also backport the
> DataStreamUtils
> > > > > related commits that help with collecting output from unbounded
> > > streams.
> > > > > e.g.
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/flink/commit/09a7a66b7313fea64817fe960a8da1265b428efc
> > > > >
> > > > > I tried to copy and paste the code to unblock myself. but it
> quickly
> > > got
> > > > > into the rabbit hole of more and more code.
> > > > >
> > > > > On Fri, Oct 30, 2020 at 11:02 AM Stephan Ewen 
> > > wrote:
> > > > >
> > > > >> I have started with backporting the source API changes. Some minor
> > > > >> conflicts to solve, will need a bit more to finish this.
> > > > >>
> > > > >> On Fri, Oct 30, 2020 at 7:25 AM Tzu-Li (Gordon) Tai <
> > > > tzuli...@apache.org>
> > > > >> wrote:
> > > > >>
> > > > >> > @Stephan Ewen 
> > > > >> > Are there already plans or ongoing efforts for backporting the
> > list
> > > of
> > > > >> > FLIP-27 changes that you posted?
> > > > >> >
> > > > >> > On Thu, Oct 29, 2020 at 7:08 PM Xintong Song <
> > tonysong...@gmail.com
> > > >
> > > > >> > wrote:
> > > > >> >
> > > > >> >> Hi folks,
> > > > >> >>
> > > > >> >> Just to provide some updates concerning the status on the
> > > > >> >> test instabilities.
> > > > >> >>
> > > > >> >> Currently, we have 30 unresolved tickets labeled with `Affects
> > > > Version`
> > > > >> >> 1.11.x.
> > > > >> >>
> > > > >> >>
> > > > >>
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/FLINK-19775?filter=12348580=project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20AND%20affectedVersion%20in%20(1.11.0%2C%201.11.1%2C%201.11.2%2C%201.11.3)%20AND%20labels%20%3D%20test-stability%20ORDER%20BY%20created%20DESC
> > > > >> >>
> > > > >> >> Among the 30 tickets, 11 of them are:
> > > > >> >> - Have occured in the recent 3 months
> > > > >> >> - Not confirmed to be pure testability issues
> > > > >> >> - Not confirmed to be rare condition cases
> > > > >> >>
> > > > >> >> It would be nice if someone familiar with these components can
> > >

Re: [DISCUSS] Releasing Apache Flink 1.11.3

2020-11-02 Thread Steven Wu
Stephan, thanks a lot for explaining the file connector. That makes sense.

I was asking because we were trying to reuse some of the implementations in
the file source for the Iceberg source. The Flink Iceberg source lives in the
Iceberg repo, so it is not possible to code against the master branch of the
Flink code.

On Mon, Nov 2, 2020 at 3:31 AM Stephan Ewen  wrote:

> Hi Steven!
>
> So far there are no plans to pick back the file system connector code. This
> is still evolving and not finalized for 1.12, so I don't feel it is a good
> candidate to be backported.
> However, with the base connector changes backported, you should be able to
> run the file connector code from master against 1.11.3.
>
> The collect() utils can be picked back, I see no issue with that (it is
> isolated utilities).
>
> Best,
> Stephan
>
>
> On Mon, Nov 2, 2020 at 3:02 AM Steven Wu  wrote:
>
> > Basically, it would be great to get the latest code in the
> > flink-connector-files (FLIP-27).
> >
> > On Sat, Oct 31, 2020 at 9:57 AM Steven Wu  wrote:
> >
> > > Stephan, it will be great if we can also backport the DataStreamUtils
> > > related commits that help with collecting output from unbounded
> streams.
> > > e.g.
> > >
> > >
> > >
> >
> https://github.com/apache/flink/commit/09a7a66b7313fea64817fe960a8da1265b428efc
> > >
> > > I tried to copy and paste the code to unblock myself. but it quickly
> got
> > > into the rabbit hole of more and more code.
> > >
> > > On Fri, Oct 30, 2020 at 11:02 AM Stephan Ewen 
> wrote:
> > >
> > >> I have started with backporting the source API changes. Some minor
> > >> conflicts to solve, will need a bit more to finish this.
> > >>
> > >> On Fri, Oct 30, 2020 at 7:25 AM Tzu-Li (Gordon) Tai <
> > tzuli...@apache.org>
> > >> wrote:
> > >>
> > >> > @Stephan Ewen 
> > >> > Are there already plans or ongoing efforts for backporting the list
> of
> > >> > FLIP-27 changes that you posted?
> > >> >
> > >> > On Thu, Oct 29, 2020 at 7:08 PM Xintong Song  >
> > >> > wrote:
> > >> >
> > >> >> Hi folks,
> > >> >>
> > >> >> Just to provide some updates concerning the status on the
> > >> >> test instabilities.
> > >> >>
> > >> >> Currently, we have 30 unresolved tickets labeled with `Affects
> > Version`
> > >> >> 1.11.x.
> > >> >>
> > >> >>
> > >>
> >
> https://issues.apache.org/jira/browse/FLINK-19775?filter=12348580=project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20AND%20affectedVersion%20in%20(1.11.0%2C%201.11.1%2C%201.11.2%2C%201.11.3)%20AND%20labels%20%3D%20test-stability%20ORDER%20BY%20created%20DESC
> > >> >>
> > >> >> Among the 30 tickets, 11 of them are:
> > >> >> - Have occured in the recent 3 months
> > >> >> - Not confirmed to be pure testability issues
> > >> >> - Not confirmed to be rare condition cases
> > >> >>
> > >> >> It would be nice if someone familiar with these components can
> take a
> > >> look
> > >> >> into these issues.
> > >> >>
> > >> >> - https://issues.apache.org/jira/browse/FLINK-17159 (ES6)
> > >> >> - https://issues.apache.org/jira/browse/FLINK-17912 (Kafka)
> > >> >> - https://issues.apache.org/jira/browse/FLINK-17949 (Kafka)
> > >> >> ⁃ https://issues.apache.org/jira/browse/FLINK-18444 (Kafka)
> > >> >> - https://issues.apache.org/jira/browse/FLINK-18634 (Kafka)
> > >> >> - https://issues.apache.org/jira/browse/FLINK-18648 (Kafka)
> > >> >> - https://issues.apache.org/jira/browse/FLINK-18807 (Kafka)
> > >> >> - https://issues.apache.org/jira/browse/FLINK-19369
> (BlobClientTest)
> > >> >> - https://issues.apache.org/jira/browse/FLINK-19436 (TPCDS)
> > >> >> - https://issues.apache.org/jira/browse/FLINK-19690
> (Format/Parquet)
> > >> >> - https://issues.apache.org/jira/browse/FLINK-19775
> > >> >> (SystemProcessingTimeServiceTest)
> > >> >>
> > >> >> Thank you~
> > >> >>
> > >> >> Xintong Song
> > >> >>
> > >> >>
> > >> >>
> > &

Re: [DISCUSS] Releasing Apache Flink 1.11.3

2020-11-01 Thread Steven Wu
Basically, it would be great to get the latest flink-connector-files
(FLIP-27) code.

On Sat, Oct 31, 2020 at 9:57 AM Steven Wu  wrote:

> Stephan, it will be great if we can also backport the DataStreamUtils
> related commits that help with collecting output from unbounded streams.
> e.g.
>
>
> https://github.com/apache/flink/commit/09a7a66b7313fea64817fe960a8da1265b428efc
>
> I tried to copy and paste the code to unblock myself. but it quickly got
> into the rabbit hole of more and more code.
>
> On Fri, Oct 30, 2020 at 11:02 AM Stephan Ewen  wrote:
>
>> I have started with backporting the source API changes. Some minor
>> conflicts to solve, will need a bit more to finish this.
>>
>> On Fri, Oct 30, 2020 at 7:25 AM Tzu-Li (Gordon) Tai 
>> wrote:
>>
>> > @Stephan Ewen 
>> > Are there already plans or ongoing efforts for backporting the list of
>> > FLIP-27 changes that you posted?
>> >
>> > On Thu, Oct 29, 2020 at 7:08 PM Xintong Song 
>> > wrote:
>> >
>> >> Hi folks,
>> >>
>> >> Just to provide some updates concerning the status on the
>> >> test instabilities.
>> >>
>> >> Currently, we have 30 unresolved tickets labeled with `Affects Version`
>> >> 1.11.x.
>> >>
>> >>
>> https://issues.apache.org/jira/browse/FLINK-19775?filter=12348580=project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20AND%20affectedVersion%20in%20(1.11.0%2C%201.11.1%2C%201.11.2%2C%201.11.3)%20AND%20labels%20%3D%20test-stability%20ORDER%20BY%20created%20DESC
>> >>
>> >> Among the 30 tickets, 11 of them are:
>> >> - Have occured in the recent 3 months
>> >> - Not confirmed to be pure testability issues
>> >> - Not confirmed to be rare condition cases
>> >>
>> >> It would be nice if someone familiar with these components can take a
>> look
>> >> into these issues.
>> >>
>> >> - https://issues.apache.org/jira/browse/FLINK-17159 (ES6)
>> >> - https://issues.apache.org/jira/browse/FLINK-17912 (Kafka)
>> >> - https://issues.apache.org/jira/browse/FLINK-17949 (Kafka)
>> >> ⁃ https://issues.apache.org/jira/browse/FLINK-18444 (Kafka)
>> >> - https://issues.apache.org/jira/browse/FLINK-18634 (Kafka)
>> >> - https://issues.apache.org/jira/browse/FLINK-18648 (Kafka)
>> >> - https://issues.apache.org/jira/browse/FLINK-18807 (Kafka)
>> >> - https://issues.apache.org/jira/browse/FLINK-19369 (BlobClientTest)
>> >> - https://issues.apache.org/jira/browse/FLINK-19436 (TPCDS)
>> >> - https://issues.apache.org/jira/browse/FLINK-19690 (Format/Parquet)
>> >> - https://issues.apache.org/jira/browse/FLINK-19775
>> >> (SystemProcessingTimeServiceTest)
>> >>
>> >> Thank you~
>> >>
>> >> Xintong Song
>> >>
>> >>
>> >>
>> >> On Thu, Oct 29, 2020 at 10:21 AM Jingsong Li 
>> >> wrote:
>> >>
>> >> > +1 to backport the FLIP-27 adjustments to 1.11.x.
>> >> >
>> >> > If possible, that would be great. Many people are looking forward to
>> the
>> >> > FLIP-27 interface, but they don't want to take the risk to upgrade to
>> >> 1.12
>> >> > (And wait 1.12). After all, 1.11 is a relatively stable version.
>> >> >
>> >> > Best,
>> >> > Jingsong
>> >> >
>> >> > On Thu, Oct 29, 2020 at 1:24 AM Stephan Ewen 
>> wrote:
>> >> >
>> >> > > Thanks for starting this.
>> >> > >
>> >> > > +1 form my side to backport the FLIP-27 adjustments to 1.11.x.
>> >> > >
>> >> > > There were quite a few changes, and I think we need to cherry-pick
>> >> them
>> >> > all
>> >> > > to not get some inconsistent mix of changes and many merge
>> conflicts.
>> >> > > I made a list below of what we need to add to "release-1.11".
>> >> > >
>> >> > > * Core Source API Changes to backport (in REVERSE order)*
>> >> > >
>> >> > >   (Use: "git log
>> >> > > flink-core/src/main/java/org/apache/flink/api/connector/source")
>> >> > >
>> >> > > commit 162c072e9265a7b6dd9d6f5459eb7974091c4c4e
>> >> > > [FLINK-19492][core] Consolidate Source Events between Source API
>

Re: [DISCUSS] Releasing Apache Flink 1.11.3

2020-10-31 Thread Steven Wu
; >> > > * Connector Base Changes to Backport (in REVERSE order)*
> >> > >
> >> > >   (Use: "git log flink-connectors/flink-connector-base")
> >> > >
> >> > > commit 401f56fe9d6b0271260edf9787cdcbfe4d03874d
> >> > > [FLINK-19427][FLINK-19489][tests] Fix test conditions for
> >> > > 'SplitFetcherTest.testNotifiesWhenGoingIdleConcurrent()'
> >> > >
> >> > > commit 68c5c2ff779d82a1ff81ffaf60d8a1b283797db1
> >> > > [FLINK-19448][connector base] Explicitly check for un-expected
> >> condition
> >> > > that would leave an inconsistent state
> >> > >
> >> > > commit 162c072e9265a7b6dd9d6f5459eb7974091c4c4e
> >> > > [FLINK-19492][core] Consolidate Source Events between Source API and
> >> > Split
> >> > > Reader API
> >> > >
> >> > > commit c1ca7a4c7c21ec8868c14c43c559625b794c
> >> > > [refactor][tests] Move some source test utils from
> >> flink-connector-base
> >> > to
> >> > > flink-core
> >> > >
> >> > > commit ee5c4c211c35c70d28252363bbc8400453609977
> >> > > [FLINK-19251][connectors] Avoid confusing queue handling in
> >> > > "SplitReader.handleSplitsChanges()"
> >> > >
> >> > > commit 5abef56b2bf85bcac786f6b16b6899b6cced7176
> >> > > [FLINK-19250][connectors] Fix error propagation in connector base
> >> > > (SplitFetcherManager).
> >> > >
> >> > > commit 8fcca837c55a9216595ee4c03038b52747098dbb
> >> > > [hotfix][connectors] Improve JavaDocs for SingleThreadFetcherManager
> >> > >
> >> > > commit 4700bb5dde3303cbe98882f6beb7379425717b01
> >> > > [FLINK-19225][connectors] Various small improvements to
> >> SourceReaderBase
> >> > > (part 2)
> >> > >
> >> > > commit 12261c6b7ed6478a9b9f6a69cb58246b83cab9b7
> >> > > [FLINK-17393][connectors] (follow-up) Wakeup the SplitFetchers more
> >> > > elegantly.
> >> > >
> >> > > commit c60aaff0249bfd6b5871b7f82e03efc487a54d6b
> >> > > [hotfix][tests] Extend test coverage for
> FutureCompletingBlockingQueue
> >> > >
> >> > > commit cef8a587d7fd2fe64cc644da5ed095d82e46f631
> >> > > [FLINK-19245][connectors] Set default capacity for
> >> > > FutureCompletingBlockingQueue.
> >> > >
> >> > > commit 4ea95782b4c6a2538153d4d16ad3f4839c7de0fb
> >> > > [FLINK-19223][connectors] Simplify Availability Future Model in Base
> >> > > Connector
> >> > >
> >> > > commit 511857049ba30c8ff0ee56da551fa4a479dc583e
> >> > > [FLINK-18128][connectors] Ensure idle split fetchers lead to
> >> availability
> >> > > notifications.
> >> > >
> >> > > commit a8206467af0830dcb89623ea068b5ca3b3450c92
> >> > > [refactor][core] Eagerly initialize the FetchTask to support proper
> >> unit
> >> > > testing
> >> > >
> >> > > commit 3b2f54bcb437f98e6137c904045cc51072b5c06b
> >> > > [hotfix][tests] Move constants in SplitFetcherTest relevant to only
> >> one
> >> > > test into test method
> >> > >
> >> > > commit d7625760a75a508bf05bcddc380bb4d62ee1743e
> >> > > [FLINK-19225][connectors] Various small improvements to
> >> SourceReaderBase
> >> > >
> >> > > commit a5b0d3297748c1be47ad579a88f24df2255a8df1
> >> > > [FLINK-17393][connectors] Wakeup the SplitFetchers more elegantly.
> >> > >
> >> > > commit f42a3ebc3e81a034b7221a803c153636fef34903
> >> > > [FLINK-18680][connectors] Make connector base RecordsWithSplitIds
> more
> >> > > lightweight.
> >> > >
> >> > > commit e3d273de822b085183d09b275a445879ff94b350
> >> > > [FLINK-19162][connectors] Add 'recycle()' to the RecordsWithSplitIds
> >> to
> >> > > support reuse of heavy objects.
> >> > >
> >> > > commit 8ebc464c2520453a70001cd712abc8dee6ee89e0
> >> > > [hotfix][testing] Add a set of parameterizable testing mocks for the
> >> > Split
> >> > > Reader API
> >> > >
> >> > > commit 930a07438be1185388d7150640f294dfe

Re: [DISCUSS] Releasing Apache Flink 1.11.3

2020-10-28 Thread Steven Wu
I would love to see this FLIP-27 source interface improvement [1] made to
1.11.3.

[1] https://issues.apache.org/jira/browse/FLINK-19698

On Wed, Oct 28, 2020 at 12:32 AM Tzu-Li (Gordon) Tai 
wrote:

> Thanks for the replies so far!
>
> Just to provide a brief update on the status of blockers for 1.11.3 so far:
>
>
> *PR opened, pending reviewer*- [FLINK-19717] SourceReaderBase.pollNext may
> return END_OF_INPUT if SplitReader.fetch throws (
> https://github.com/apache/flink/pull/13776)
>
> *PR opened, reviewed + close to being merged*
> - [FLINK-19741] Timer service should skip restoring from raw keyed stream
> if it isn't the writer (https://github.com/apache/flink/pull/13761)
> - [FLINK-19748] Raw keyed stream key group iterator should be skipping
> unwritten key groups (https://github.com/apache/flink/pull/13772)
>
> *Merged*
> - [FLINK-19154] Application mode deletes HA data in case of suspended
> ZooKeeper connection
> - [FLINK-19569] Upgrade ICU4J to 67.1+
>
> Right now as it seems, progress is mainly blocked on a reviewer for
> FLINK-19717.
> Meanwhile, Xintong is keeping an eye on test instabilities [1] to see if
> there are any fixes that should be applied to `release-1.11`.
>
> This is also a reminder, that if there are other blockers that we need to
> be aware of, or a need to re-establish estimated time for getting fixes in
> and delay the RC for 1.11.3, please do let us know!
>
> Cheers,
> Gordon
>
> [1]
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20affectedVersion%20in%20(1.11.0%2C%201.11.1%2C%201.11.2)%20AND%20labels%20%3D%20test-stability
>
> On Mon, Oct 26, 2020 at 9:43 PM Kostas Kloudas 
> wrote:
>
> > +1 for releasing Flink 1.11.3 as it contains a number of important
> > fixes and thanks Gordon and Xintong for volunteering.
> >
> > Cheers,
> > Kostas
> >
> > On Mon, Oct 26, 2020 at 4:37 AM Yu Li  wrote:
> > >
> > > +1 for releasing Flink 1.11.3, and thanks Gordon and Xintong for
> > > volunteering as our release managers.
> > >
> > > Best Regards,
> > > Yu
> > >
> > >
> > > On Mon, 26 Oct 2020 at 09:45, Xintong Song 
> > wrote:
> > >
> > > > Thanks Gordan for starting this discussion.
> > > > My pleasure to help with the release process.
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Fri, Oct 23, 2020 at 11:29 PM Till Rohrmann  >
> > > > wrote:
> > > >
> > > > > Thanks for starting this discussion Gordon. There are over 100
> issues
> > > > > which are fixed for 1.11.3. Hence +1 for a soonish 1.11.3 release.
> > Thanks
> > > > > for volunteering as our release managers Gordon and Xintong!
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Fri, Oct 23, 2020 at 5:02 PM Tzu-Li (Gordon) Tai <
> > tzuli...@apache.org
> > > > >
> > > > > wrote:
> > > > >
> > > > >> Hi,
> > > > >>
> > > > >> Xintong and I would like to start a discussion for releasing Flink
> > > > 1.11.3
> > > > >> soon.
> > > > >>
> > > > >> It seems like we already have a few pressing issues that needs to
> be
> > > > >> included in a new hotfix release:
> > > > >>
> > > > >>- Heap-based timers’ restore behaviour is causing a critical
> > recovery
> > > > >>issue for StateFun [1] [2] [3].
> > > > >>- There are several robustness issues for the FLIP-27 new
> source
> > API,
> > > > >>such as [4]. We already have some users using the FLIP-27 API
> > with
> > > > >> 1.11.x,
> > > > >>so it would be important to get those fixes in for 1.11.x as
> > well.
> > > > >>
> > > > >> Apart from the issues that are already marked as blocker for
> 1.11.3
> > in
> > > > our
> > > > >> JIRA [5], please let us know in this thread if there is already
> > ongoing
> > > > >> work for other important fixes that we should try to include.
> > > > >>
> > > > >> Xintong and I would like to volunteer for managing this release,
> and
> > > > will
> > > > >> try to communicate the priority of pending blockers over the next
> > few
> > > > >> days.
> > > > >> Since the aforementioned issues are quite critical, we’d like to
> aim
> > > > >> for a *feature
> > > > >> freeze by the end of next week (Oct. 30th)* and start the release
> > voting
> > > > >> process the week after.
> > > > >> If that is too short of a notice and you might need more time,
> > please
> > > > let
> > > > >> us know!
> > > > >>
> > > > >> Cheers,
> > > > >> Gordon
> > > > >>
> > > > >> [1] https://issues.apache.org/jira/browse/FLINK-19692
> > > > >> [2] https://issues.apache.org/jira/browse/FLINK-19741
> > > > >> [3] https://issues.apache.org/jira/browse/FLINK-19748
> > > > >> [4] https://issues.apache.org/jira/browse/FLINK-19717
> > > > >> [5]
> > > > >>
> > > > >>
> > > >
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20priority%20%3D%20Blocker%20AND%20fixVersion%20%3D%201.11.3
> > > > >>
> > > > >
> > > >
> >
>


Re: [VOTE] FLIP-135: Approximate Task-Local Recovery

2020-10-20 Thread Steven Wu
+1 (non-binding).

Some of our users have asked for this tradeoff of consistency over
availability for some cases.

On Mon, Oct 19, 2020 at 8:02 PM Zhijiang 
wrote:

> Thanks for driving this effort, Yuan.
>
> +1 (binding) on my side.
>
> Best,
> Zhijiang
>
>
> --
> From:Piotr Nowojski 
> Send Time:2020年10月19日(星期一) 21:02
> To:dev 
> Subject:Re: [VOTE] FLIP-135: Approximate Task-Local Recovery
>
> Hey,
>
> I carry over my +1 (binding) from the discussion thread.
>
> Best,
> Piotrek
>
> pon., 19 paź 2020 o 14:56 Yuan Mei  napisał(a):
>
> > Hey,
> >
> > I would like to start a voting thread for FLIP-135 [1], for approximate
> > task local recovery. The proposal has been discussed in [2].
> >
> > The vote will be open till Oct. 23rd (72h, excluding weekends) unless
> there
> > is an objection or not enough votes.
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-135+Approximate+Task-Local+Recovery
> > [2]
> >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-135-Approximate-Task-Local-Recovery-tp43930.html
> >
> >
> > Best
> >
> > Yuan
> >
>
>


Re: [VOTE] FLIP-143: Unified Sink API

2020-09-27 Thread Steven Wu
+1 (non-binding)

Although I would love to continue the discussion on tweaking the
CommitResult/GlobalCommitter interface, maybe during the implementation phase.

On Fri, Sep 25, 2020 at 5:35 AM Aljoscha Krettek 
wrote:

> +1 (binding)
>
> Aljoscha
>
> On 25.09.20 14:26, Guowei Ma wrote:
> >  From the discussion[1] we could find that FLIP focuses on providing an
> > unified transactional sink API. So I updated the FLIP's title to "Unified
> > Transactional Sink API". But I found that the old link could not be
> opened
> > again.
> >
> > I would update the link[2] here. Sorry for the inconvenience.
> >
> > [1]
> >
> https://lists.apache.org/thread.html/rf09dfeeaf35da5ee98afe559b5a6e955c9f03ade0262727f6b5c4c1e%40%3Cdev.flink.apache.org%3E
> > [2] https://cwiki.apache.org/confluence/x/KEJ4CQ
> >
> > Best,
> > Guowei
> >
> >
> > On Thu, Sep 24, 2020 at 8:13 PM Guowei Ma  wrote:
> >
> >> Hi, all
> >>
> >> After the discussion in [1], I would like to open a voting thread for
> >> FLIP-143 [2], which proposes a unified sink api.
> >>
> >> The vote will be open until September 29th (72h + weekend), unless there
> >> is an objection or not enough votes.
> >>
> >> [1]
> >>
> https://lists.apache.org/thread.html/rf09dfeeaf35da5ee98afe559b5a6e955c9f03ade0262727f6b5c4c1e%40%3Cdev.flink.apache.org%3E
> >> [2]
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-143%3A+Unified+Sink+API
> >>
> >> Best,
> >> Guowei
> >>
> >
>
>


Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-27 Thread Steven Wu
It is more about extended outages of the metastore. E.g., if we commit every 2
minutes, a 4-hour metastore outage can lead to over 120 GlobalCommT items.
During such outages, it is undesirable for a streaming job to fail and keep
restarting. It is better to keep processing records (avoiding backlog) and
uploading files to DFS (like S3); the commit will succeed whenever the
metastore comes back. This also provides a nice automatic recovery story.
Since a GlobalCommT combines all data files (hundreds or thousands in one
checkpoint cycle) into a single item in state, it is scalable and efficient
to deal with extended metastore outages this way.

The "CommitResult commit(GlobalCommT)" API can work, although it is less
efficient and flexible for some sinks. It is probably better to let sink
implementations decide the best retry behavior: one by one vs. a big
batch/transaction. Hence I would propose APIs like these:
--
interface GlobalCommitter {
  // commit all pending GlobalCommT items accumulated
  CommitResult commit(List<GlobalCommT> globalCommittables);
}

interface CommitResult {
  List<GlobalCommT> getSucceededCommittables();
  List<GlobalCommT> getFailedCommittables();

  // most likely, the framework just needs to check and roll over the
  // retryable list to the next commit try
  List<GlobalCommT> getRetryableCommittables();
}
---
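
To illustrate the intended rollover behavior, a minimal framework-side sketch
could look like the following. This is only a sketch: the class and method
names are hypothetical, the interfaces are made generic for self-containment,
and it is not part of the proposal.

import java.util.ArrayList;
import java.util.List;

class GlobalCommitDriver<GlobalCommT> {

    interface GlobalCommitter<T> {
        CommitResult<T> commit(List<T> globalCommittables);
    }

    interface CommitResult<T> {
        List<T> getSucceededCommittables();
        List<T> getFailedCommittables();
        List<T> getRetryableCommittables();
    }

    private final GlobalCommitter<GlobalCommT> committer;
    private final List<GlobalCommT> pending = new ArrayList<>();

    GlobalCommitDriver(GlobalCommitter<GlobalCommT> committer) {
        this.committer = committer;
    }

    // Called once per completed checkpoint with the newly produced global committable.
    void onCheckpointCompleted(GlobalCommT newCommittable) {
        pending.add(newCommittable);
        CommitResult<GlobalCommT> result = committer.commit(new ArrayList<>(pending));
        pending.clear();
        // Retryables simply roll over to the next commit attempt.
        pending.addAll(result.getRetryableCommittables());
        // Failed committables would need separate error handling (not shown).
    }
}

The point of the sketch is that the framework only has to keep the retryable
items around; the sink decides inside commit() how to group them.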

Anyway, I am going to vote yes on the voting thread, since it is important
to move forward to meet the 1.12 goal. We can also discuss the small tweak
during the implementation phase.

Thanks,
Steven


On Sat, Sep 26, 2020 at 8:46 PM Guowei Ma  wrote:

> Hi Steven
>
> Thank you very much for your detailed explanation.
>
> Now I got your point, I could see that there are benefits from committing a
> collection of `GlobalCommT` as a whole when the external metastore
> environment is unstable at some time.
>
> But I have two little concern about introducing committing the collection
> of `GlobalCommit`:
>
> 1. For Option1: CommitResult commit(List<GlobalCommT>). This option
> implies that users should commit to the collection of `GlobalCommit` as a
> whole.
> But maybe not all the system could do it as a whole, for example changing
> some file names could not do it. If it is the case I think maybe some guy
> would always ask the same question as I asked in the previous mail.
>
> 2. For Option2: List<GlobalCommT> commit(List<GlobalCommT>). This option
> is more clear than the first one. But IMHO this option has only benefits
> when the external metastore is unstable and we want to retry many times and
> not fail the job. Maybe we should not rety so many times and end up with a
> lot of the uncommitted `GlobalCommitT`. If this is the case maybe we should
> make the api more clear/simple for the normal scenario. In addition there
> is only a globalcommit instance so I think the external system could bear
> the pressure.
>
> So personally I would like to say we might keep the API simpler at the
> beginning in 1.12
>
> What do you think?
>
> Best,
> Guowei
>
>
> On Fri, Sep 25, 2020 at 9:30 PM Steven Wu  wrote:
>
> > I should clarify my last email a little more.
> >
> > For the example of commits for checkpoints 1-100 failed, the job is still
> > up (processing records and uploading files). When commit for checkpoint
> 101
> > came, IcebergSink would prefer the framework to pass in all 101
> GlobalCommT
> > (100 old + 1 new) so that it can commit all of them in one transaction.
> it
> > is more efficient than 101 separate transactions.
> >
> > Maybe the GlobalCommitter#commit semantics is to give the sink all
> > uncommitted GlobalCommT items and let sink implementation decide whether
> to
> > retry one by one or in a single transaction. It could mean that we need
> to
> > expand the CommitResult (e.g. a list for each result type, SUCCESS,
> > FAILURE, RETRY) interface. We can also start with the simple enum style
> > result for the whole list for now. If we need to break the experimental
> > API, it is also not a big deal since we only need to update a few sink
> > implementations.
> >
> > Thanks,
> > Steven
> >
> > On Fri, Sep 25, 2020 at 5:56 AM Steven Wu  wrote:
> >
> > > > 1. The frame can not know which `GlobalCommT` to retry if we use the
> > > > List as parameter when the `commit` returns `RETRY`.
> > > > 2. Of course we can let the `commit` return more detailed info but it
> > > might
> > > > be too complicated.
> > >
> > > If commit(List) returns RETRY, it means the whole list
> needs
> > > to be retried. E.g. we have some outage with metadata service, commits
> > for
> > > checkpoints 1-100 failed. We can accumulate 100 GlobalCommT items. we
> >

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-25 Thread Steven Wu
I should clarify my last email a little more.

For the example where commits for checkpoints 1-100 failed, the job is still
up (processing records and uploading files). When the commit for checkpoint 101
comes, IcebergSink would prefer the framework to pass in all 101 GlobalCommT
items (100 old + 1 new) so that it can commit all of them in one transaction.
It is more efficient than 101 separate transactions.

Maybe the GlobalCommitter#commit semantics is to give the sink all
uncommitted GlobalCommT items and let sink implementation decide whether to
retry one by one or in a single transaction. It could mean that we need to
expand the CommitResult (e.g. a list for each result type, SUCCESS,
FAILURE, RETRY) interface. We can also start with the simple enum style
result for the whole list for now. If we need to break the experimental
API, it is also not a big deal since we only need to update a few sink
implementations.

Thanks,
Steven

On Fri, Sep 25, 2020 at 5:56 AM Steven Wu  wrote:

> > 1. The frame can not know which `GlobalCommT` to retry if we use the
> > List<GlobalCommT> as parameter when the `commit` returns `RETRY`.
> > 2. Of course we can let the `commit` return more detailed info but it
> might
> > be too complicated.
>
If commit(List<GlobalCommT>) returns RETRY, it means the whole list needs
to be retried. E.g., if we have some outage with the metadata service and
commits for checkpoints 1-100 failed, we can accumulate 100 GlobalCommT items.
We don't want to commit them one by one. It is faster to commit the whole list
as one batch.
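
As a rough sketch of what committing the accumulated list as one batch could
look like on the sink side: MetastoreClient and ManifestFile below are
hypothetical placeholders, not real Flink or Iceberg APIs.

import java.util.List;

class BatchGlobalCommitter {

    interface ManifestFile {}

    interface MetastoreClient {
        // Atomically registers all manifests in a single transaction.
        void commitTransaction(List<ManifestFile> manifests) throws Exception;
    }

    private final MetastoreClient client;

    BatchGlobalCommitter(MetastoreClient client) {
        this.client = client;
    }

    // Commits all accumulated manifests (possibly piled up across many
    // checkpoints during a metastore outage) in one transaction, instead of
    // one transaction per checkpoint.
    boolean commitAll(List<ManifestFile> accumulated) {
        try {
            client.commitTransaction(accumulated);
            return true; // the caller can now drop them from state
        } catch (Exception e) {
            // Keep the whole list for the next attempt; the job keeps running.
            return false;
        }
    }
}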
>
> > 3. On the other hand, I think only when restoring IcebergSink needs a
> > collection of `GlobalCommT` and giving back another collection of
> > `GlobalCommT` that are not committed
>
> That is when the job restarted due to failure or deployment.
>
>
> On Fri, Sep 25, 2020 at 5:24 AM Guowei Ma  wrote:
>
>> Hi, all
>>
>> From the above discussion we could find that FLIP focuses on providing an
>> unified transactional sink API. So I updated the FLIP's title to "Unified
>> Transactional Sink API". But I found that the old link could not be opened
>> again.
>>
>> I would update the link[1] here. Sorry for the inconvenience.
>>
>> [1]https://cwiki.apache.org/confluence/x/KEJ4CQ
>>
>> Best,
>> Guowei
>>
>>
>> On Fri, Sep 25, 2020 at 3:26 PM Guowei Ma  wrote:
>>
>> > Hi, Steven
>> >
>> > >>I also have a clarifying question regarding the WriterStateT. Since
>> > >>IcebergWriter won't need to checkpoint any state, should we set it to
>> > *Void*
>> > >>type? Since getWriterStateSerializer() returns Optional, that is clear
>> > and
>> > >>we can return Optional.empty().
>> >
>> > Yes I think you could do it. If you return Optional.empty() we would
>> > ignore all the state you return.
>> >
>> > Best,
>> > Guowei
>> >
>> >
>> > On Fri, Sep 25, 2020 at 3:14 PM Guowei Ma  wrote:
>> >
>> >> Hi,Steven
>> >>
>> >> Thank you for reading the FLIP so carefully.
>> >> 1. The frame can not know which `GlobalCommT` to retry if we use the
>> >> List as parameter when the `commit` returns `RETRY`.
>> >> 2. Of course we can let the `commit` return more detailed info but it
>> >> might be too complicated.
>> >> 3. On the other hand, I think only when restoring IcebergSink needs a
>> >> collection of `GlobalCommT` and giving back another collection of
>> >> `GlobalCommT` that are not committed.
>> >>
>> >> Best,
>> >> Guowei
>> >>
>> >>
>> >> On Fri, Sep 25, 2020 at 1:45 AM Steven Wu 
>> wrote:
>> >>
>> >>> Guowei,
>> >>>
>> >>> Thanks a lot for updating the wiki page. It looks great.
>> >>>
>> >>> I noticed one inconsistency in the wiki with your last summary email
>> for
>> >>> GlobalCommitter interface. I think the version in the summary email is
>> >>> the
>> >>> intended one, because rollover from previous failed commits can
>> >>> accumulate
>> >>> a list.
>> >>> CommitResult commit(GlobalCommT globalCommittable); // in the wiki
>> >>> =>
>> >>> CommitResult commit(List globalCommittable);  // in the
>> >>> summary email
>> >>>
>> >>> I also have a clarifying question regarding the WriterStateT. Since
>> >>> IcebergWriter won't need to checkpoint any state, 

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-25 Thread Steven Wu
> 1. The frame can not know which `GlobalCommT` to retry if we use the
> List<GlobalCommT> as parameter when the `commit` returns `RETRY`.
> 2. Of course we can let the `commit` return more detailed info but it
might
> be too complicated.

If commit(List<GlobalCommT>) returns RETRY, it means the whole list needs
to be retried. E.g., if we have some outage with the metadata service and
commits for checkpoints 1-100 failed, we can accumulate 100 GlobalCommT items.
We don't want to commit them one by one. It is faster to commit the whole list
as one batch.

> 3. On the other hand, I think only when restoring IcebergSink needs a
> collection of `GlobalCommT` and giving back another collection of
> `GlobalCommT` that are not committed

That is when the job restarted due to failure or deployment.


On Fri, Sep 25, 2020 at 5:24 AM Guowei Ma  wrote:

> Hi, all
>
> From the above discussion we could find that FLIP focuses on providing an
> unified transactional sink API. So I updated the FLIP's title to "Unified
> Transactional Sink API". But I found that the old link could not be opened
> again.
>
> I would update the link[1] here. Sorry for the inconvenience.
>
> [1]https://cwiki.apache.org/confluence/x/KEJ4CQ
>
> Best,
> Guowei
>
>
> On Fri, Sep 25, 2020 at 3:26 PM Guowei Ma  wrote:
>
> > Hi, Steven
> >
> > >>I also have a clarifying question regarding the WriterStateT. Since
> > >>IcebergWriter won't need to checkpoint any state, should we set it to
> > *Void*
> > >>type? Since getWriterStateSerializer() returns Optional, that is clear
> > and
> > >>we can return Optional.empty().
> >
> > Yes I think you could do it. If you return Optional.empty() we would
> > ignore all the state you return.
> >
> > Best,
> > Guowei
> >
> >
> > On Fri, Sep 25, 2020 at 3:14 PM Guowei Ma  wrote:
> >
> >> Hi,Steven
> >>
> >> Thank you for reading the FLIP so carefully.
> >> 1. The frame can not know which `GlobalCommT` to retry if we use the
> >> List as parameter when the `commit` returns `RETRY`.
> >> 2. Of course we can let the `commit` return more detailed info but it
> >> might be too complicated.
> >> 3. On the other hand, I think only when restoring IcebergSink needs a
> >> collection of `GlobalCommT` and giving back another collection of
> >> `GlobalCommT` that are not committed.
> >>
> >> Best,
> >> Guowei
> >>
> >>
> >> On Fri, Sep 25, 2020 at 1:45 AM Steven Wu  wrote:
> >>
> >>> Guowei,
> >>>
> >>> Thanks a lot for updating the wiki page. It looks great.
> >>>
> >>> I noticed one inconsistency in the wiki with your last summary email
> for
> >>> GlobalCommitter interface. I think the version in the summary email is
> >>> the
> >>> intended one, because rollover from previous failed commits can
> >>> accumulate
> >>> a list.
> >>> CommitResult commit(GlobalCommT globalCommittable); // in the wiki
> >>> =>
> >>> CommitResult commit(List globalCommittable);  // in the
> >>> summary email
> >>>
> >>> I also have a clarifying question regarding the WriterStateT. Since
> >>> IcebergWriter won't need to checkpoint any state, should we set it to
> >>> *Void*
> >>> type? Since getWriterStateSerializer() returns Optional, that is clear
> >>> and
> >>> we can return Optional.empty().
> >>>
> >>> Thanks,
> >>> Steven
> >>>
> >>> On Wed, Sep 23, 2020 at 6:59 PM Guowei Ma 
> wrote:
> >>>
> >>> > Thanks Aljoscha for your suggestion.  I have updated FLIP. Any
> >>> comments are
> >>> > welcome.
> >>> >
> >>> > Best,
> >>> > Guowei
> >>> >
> >>> >
> >>> > On Wed, Sep 23, 2020 at 4:25 PM Aljoscha Krettek <
> aljos...@apache.org>
> >>> > wrote:
> >>> >
> >>> > > Yes, that sounds good! I'll probably have some comments on the FLIP
> >>> > > about the names of generic parameters and the Javadoc but we can
> >>> address
> >>> > > them later or during implementation.
> >>> > >
> >>> > > I also think that we probably need the FAIL,RETRY,SUCCESS result
> for
> >>> > > globalCommit() but we can also do that as a later addition.
> >>> > >
> >>> > > So I think 

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-24 Thread Steven Wu
Guowei,

Thanks a lot for updating the wiki page. It looks great.

I noticed one inconsistency in the wiki with your last summary email for
GlobalCommitter interface. I think the version in the summary email is the
intended one, because rollover from previous failed commits can accumulate
a list.
CommitResult commit(GlobalCommT globalCommittable); // in the wiki
=>
CommitResult commit(List<GlobalCommT> globalCommittable);  // in the summary email

I also have a clarifying question regarding the WriterStateT. Since
IcebergWriter won't need to checkpoint any state, should we set it to *Void*
type? Since getWriterStateSerializer() returns Optional, that is clear and
we can return Optional.empty().
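
For illustration, a writer with nothing to checkpoint could simply use Void as
the state type and return Optional.empty() for the state serializer. The
interfaces below are simplified placeholders, not the exact proposed API.

import java.util.Collections;
import java.util.List;
import java.util.Optional;

class StatelessWriterExample {

    interface Writer<InputT, CommT, WriterStateT> {
        void write(InputT element);
        List<CommT> prepareCommit();
        List<WriterStateT> snapshotState();
    }

    interface StateSerializer<T> {}

    // A writer that produces committables but has no state worth checkpointing.
    static class DataFileWriter<InputT, CommT> implements Writer<InputT, CommT, Void> {
        @Override public void write(InputT element) { /* append to an in-progress data file */ }
        @Override public List<CommT> prepareCommit() { return Collections.emptyList(); }
        @Override public List<Void> snapshotState() { return Collections.emptyList(); }
    }

    // The sink simply reports that there is no writer-state serializer.
    static Optional<StateSerializer<Void>> getWriterStateSerializer() {
        return Optional.empty();
    }
}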

Thanks,
Steven

On Wed, Sep 23, 2020 at 6:59 PM Guowei Ma  wrote:

> Thanks Aljoscha for your suggestion.  I have updated FLIP. Any comments are
> welcome.
>
> Best,
> Guowei
>
>
> On Wed, Sep 23, 2020 at 4:25 PM Aljoscha Krettek 
> wrote:
>
> > Yes, that sounds good! I'll probably have some comments on the FLIP
> > about the names of generic parameters and the Javadoc but we can address
> > them later or during implementation.
> >
> > I also think that we probably need the FAIL,RETRY,SUCCESS result for
> > globalCommit() but we can also do that as a later addition.
> >
> > So I think we're good to go to update the FLIP, do any last minute
> > changes and then vote.
> >
> > Best,
> > Aljoscha
> >
> > On 23.09.20 06:13, Guowei Ma wrote:
> > > Hi, all
> > >
> > > Thank everyone very much for your ideas and suggestions. I would try to
> > > summarize again the consensus :). Correct me if I am wrong or
> > misunderstand
> > > you.
> > >
> > > ## Consensus-1
> > >
> > > 1. The motivation of the unified sink API is to decouple the sink
> > > implementation from the different runtime execution mode.
> > > 2. The initial scope of the unified sink API only covers the file
> system
> > > type, which supports the real transactions. The FLIP focuses more on
> the
> > > semantics the new sink api should support.
> > > 3. We prefer the first alternative API, which could give the framework
> a
> > > greater opportunity to optimize.
> > > 4. The `Writer` needs to add a method `prepareCommit`, which would be
> > > called from `prepareSnapshotPreBarrier`. And remove the `Flush` method.
> > > 5. The FLIP could move the `Snapshot & Drain` section in order to be
> more
> > > focused.
> > >
> > > ## Consensus-2
> > >
> > > 1. What should the “Unified Sink API” support/cover? It includes two
> > > aspects. 1. The same sink implementation would work for both the batch
> > and
> > > stream execution mode. 2. In the long run we should give the sink
> > developer
> > > the ability of building “arbitrary” topologies. But for Flink-1.12 we
> > > should be more focused on only satisfying the S3/HDFS/Iceberg sink.
> > > 2. Because the batch execution mode does not have the normal checkpoint
> > the
> > > sink developer should not depend on it any more if we want a unified
> > sink.
> > > 3. We can benefit by providing an asynchronous `Writer` version. But
> > > because the unified sink is already very complicated, we don’t add this
> > in
> > > the first version.
> > >
> > >
> > > According to these consensus I would propose the first version of the
> new
> > > sink api as follows. What do you think? Any comments are welcome.
> > >
> > > /**
> > >   * This interface lets the sink developer build a simple transactional
> > sink
> > > topology pattern, which satisfies the HDFS/S3/Iceberg sink.
> > >   * This sink topology includes one {@link Writer} + one {@link
> > Committer} +
> > > one {@link GlobalCommitter}.
> > >   * The {@link Writer} is responsible for producing the committable.
> > >   * The {@link Committer} is responsible for committing a single
> > > committables.
> > >   * The {@link GlobalCommitter} is responsible for committing an
> > aggregated
> > > committable, which we called global committables.
> > >   *
> > >   * But both the {@link Committer} and the {@link GlobalCommitter} are
> > > optional.
> > >   */
> > > interface TSink<T, CommT, GlobalCommT, WriterStateT> {
> > >
> > >  Writer<T, CommT, WriterStateT> createWriter(InitContext initContext);
> > >
> > >  Writer<T, CommT, WriterStateT> restoreWriter(InitContext initContext,
> > > List<WriterStateT> states);
> > >
> > >  Optional<Committer<CommT>> createCommitter();
> > >
> > >  Optional<GlobalCommitter<CommT, GlobalCommT>> createGlobalCommitter();
> > >
> > >  SimpleVersionedSerializer<CommT> getCommittableSerializer();
> > >
> > >  Optional<SimpleVersionedSerializer<GlobalCommT>> getGlobalCommittableSerializer();
> > > }
> > >
> > > /**
> > >   * The {@link GlobalCommitter} is responsible for committing an
> > aggregated
> > > committable, which we called global committables.
> > >   */
> > > interface GlobalCommitter<CommT, GlobalCommT> {
> > >
> > >  /**
> > >   * This method is called when restoring from a failover.
> > >   * @param globalCommittables the global committables that are
> > not
> > > committed in the previous session.
> > >   * @return the global committables that should be committed

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-22 Thread Steven Wu
The previously discussed APIs have been trying to do more in the framework. If
we take a different approach with a lighter framework, this set of minimal
APIs is probably good enough. The sink can handle the bookkeeping, merge, and
retry logic.

/**
 * CommT is the DataFile in Iceberg
 * GlobalCommT is the checkpoint data type, like ManifestFile in Iceberg
 */
interface GlobalCommitter<CommT, GlobalCommT> {

  void collect(CommT committable);

  void commit();

  List<GlobalCommT> snapshotState();

  // this is just a callback to the sink so that it can filter and retain the
  // uncommitted GlobalCommT in the internal bookkeeping
  void recoveredCommittables(List<GlobalCommT> globalCommittables);
}
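
A minimal sketch of how a sink could implement this interface with its own
bookkeeping. CommT/GlobalCommT are kept as opaque placeholders and the
Aggregator callback is hypothetical (e.g. "bundle data files into a manifest"
plus a metadata check); none of this is an actual Flink or Iceberg API.

import java.util.ArrayList;
import java.util.List;

class BookkeepingGlobalCommitter<CommT, GlobalCommT> {

    interface Aggregator<C, G> {
        G combine(List<C> committables);           // e.g. bundle data files into a manifest
        boolean isAlreadyCommitted(G committable); // e.g. check table snapshot metadata
        void commit(List<G> committables) throws Exception;
    }

    private final Aggregator<CommT, GlobalCommT> aggregator;
    private final List<CommT> collected = new ArrayList<>();
    private final List<GlobalCommT> uncommitted = new ArrayList<>();

    BookkeepingGlobalCommitter(Aggregator<CommT, GlobalCommT> aggregator) {
        this.aggregator = aggregator;
    }

    void collect(CommT committable) {
        collected.add(committable);
    }

    void commit() {
        if (!collected.isEmpty()) {
            uncommitted.add(aggregator.combine(new ArrayList<>(collected)));
            collected.clear();
        }
        try {
            aggregator.commit(new ArrayList<>(uncommitted));
            uncommitted.clear();
        } catch (Exception e) {
            // keep everything in 'uncommitted' and retry at the next commit() call
        }
    }

    List<GlobalCommT> snapshotState() {
        return new ArrayList<>(uncommitted);
    }

    // callback with the restored state: filter out what the external system already has
    void recoveredCommittables(List<GlobalCommT> restored) {
        for (GlobalCommT c : restored) {
            if (!aggregator.isAlreadyCommitted(c)) {
                uncommitted.add(c);
            }
        }
    }
}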

The most important need from the framework is to run GlobalCommitter in the
jobmanager. It involves the topology creation, checkpoint handling,
serializing the executions of commit() calls etc.

Thanks,
Steven

On Tue, Sep 22, 2020 at 6:39 AM Steven Wu  wrote:

> It is fine to leave the CommitResult/RETRY outside the scope of framework.
> Then the framework might need to provide some hooks in the
> checkpoint/restore logic. because the commit happened in the post
> checkpoint completion step, sink needs to update the internal state when
> the commit is successful so that the next checkpoint won't include the
> committed GlobalCommT.
>
> Maybe GlobalCommitter can have an API like this?
> > List snapshotState();
>
> But then we still need the recover API if we don't let sink directly
> manage the state.
> > List recoverCommittables(List)
>
> Thanks,
> Steven
>
> On Tue, Sep 22, 2020 at 6:33 AM Aljoscha Krettek 
> wrote:
>
>> On 22.09.20 13:26, Guowei Ma wrote:
>> > Actually I am not sure adding `isAvailable` is enough. Maybe it is not.
>> > But for the initial version I hope we could make the sink api sync
>> because
>> > there is already a lot of stuff that has to finish. :--)
>>
>> I agree, for the first version we should stick to a simpler synchronous
>> interface.
>>
>> Aljoscha
>>
>


Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-22 Thread Steven Wu
It is fine to leave the CommitResult/RETRY outside the scope of the framework.
Then the framework might need to provide some hooks in the checkpoint/restore
logic, because the commit happens in the post-checkpoint-completion step: the
sink needs to update its internal state when the commit is successful so that
the next checkpoint won't include the committed GlobalCommT.

Maybe GlobalCommitter can have an API like this?
> List<GlobalCommT> snapshotState();

But then we still need the recover API if we don't let the sink directly manage
the state.
> List<GlobalCommT> recoverCommittables(List<GlobalCommT>)

Thanks,
Steven

On Tue, Sep 22, 2020 at 6:33 AM Aljoscha Krettek 
wrote:

> On 22.09.20 13:26, Guowei Ma wrote:
> > Actually I am not sure adding `isAvailable` is enough. Maybe it is not.
> > But for the initial version I hope we could make the sink api sync
> because
> > there is already a lot of stuff that has to finish. :--)
>
> I agree, for the first version we should stick to a simpler synchronous
> interface.
>
> Aljoscha
>


Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-21 Thread Steven Wu
ieve that we could support such an async sink writer
> >> very easily in the future. What do you think?
> >
> > How would you see the expansion in the future? Do you mean just adding
> > `isAvailable()` method with a default implementation later on?
> >
> > Piotrek
> >
> > pon., 21 wrz 2020 o 02:39 Steven Wu  napisał(a):
> >
> >>> I think Iceberg sink needs to do the dedup in the `commit` call. The
> >> `recoveredGlobalCommittables` is just for restoring the ids.
> >>
> >>
> >> @Guowei Ma   It is undesirable to do the dedup
> check
> >> in the `commit` call, because it happens for each checkpoint cycle. We
> only
> >> need to do the de-dup check one time when restoring GlobalCommT list
> from
> >> the checkpoint.
> >>
> >>
> >> Can you clarify the purpose of `recoveredGlobalCommittables`? If it is
> to
> >> let sink implementations know the recovered GlobalCommT list, it is
> >> probably not a sufficient API. For the Iceberg sink, we can try to
> >> implement the de-dup check  inside the `recoveredGlobalCommittables`
> method
> >> and commit any uncommitted GlobalCommT items. But how do we handle the
> >> commit failed?
> >>
> >>
> >> One alternative is to allow sink implementations to override
> >> "List<GlobalCommT> recoverGlobalCommittables()". Framework handles the
> >> checkpoint/state, and sink implementations can further customize the
> >> restored list with de-dup check and filtering. Recovered uncommitted
> >> GlobalCommT list will be committed in the next cycle. It is the same
> >> rollover strategy for commit failure handling that we have been
> discussing.
> >>
> >>
> >> ## topologies
> >>
> >>
> >> Regarding the topology options, if we agree that there is no one size
> fit
> >> for all, we can let sink implementations choose the best topology. Maybe
> >> the framework can provide 2-3 pre-defined topology implementations to
> help
> >> the sinks.
> >>
> >>
> >>
> >>
> >> On Sun, Sep 20, 2020 at 3:27 AM Guowei Ma  wrote:
> >>
> >>> I would like to summarize the file type sink in the thread and their
> >>> possible topologies.  I also try to give pros and cons of every
> topology
> >>> option. Correct me if I am wrong.
> >>>
> >>> ### FileSink
> >>>
> >>> Topology Option: TmpFileWriter + Committer.
> >>>
> >>> ### IceBerg Sink
> >>>
> >>>  Topology Option1: `DataFileWriter` + `GlobalCommitterV0`.
> >>> Pro:
> >>> 1. Same group has some id.
> >>> Cons:
> >>> 1. May limit users’ optimization space;
> >>> 2. The topology does not meet the Hive’s requirements.
> >>>
> >>>  Topology Option 2: `DataFileWriter` + `GlobalCommitterV1`
> >>> Pro:
> >>> 1. User has the opportunity to optimize the implementation of
> idempotence
> >>> Cons:
> >>> 2. Make the GlobalCommit more complicated.
> >>> 3. The topology does not meets the Hive’s requirements
> >>>
> >>> ### Topology Option3: DataFileWriter + AggWriter + Committer
> >>>
> >>> Pros:
> >>> 1. Use two basic `Writer` & `Commiter` to meet the IceBerge’s
> >> requirements.
> >>> 2. Opportunity to optimize the implementation of idempotence
> >>> 3. The topology meets the Hive’s requirements.(See flowing)
> >>> Con:
> >>> 1. It introduce a relative complex topologies
> >>>
> >>> ## HiveSink
> >>>
> >>> ### Topology Option1: `TmpFileWriter` + `Committer` +
> >> `GlobalCommitterV2`.
> >>> Pro:
> >>> 1. Could skip the cleanup problem at first.
> >>> Con:
> >>> 1. This style topology does not meet the CompactHiveSink requirements.
> >>>
> >>> ### Topology Option2: `TmpFileWriter` + `Committer` + `AggWriter` +
> >>> `Committer`
> >>> Pros
> >>> 1. Could skip the clean up problem at first.
> >>> 2. Decouple the GlobalCommitterV2 to `AggWriter` + `Committer`
> >>> Cons
> >>> 1. This style topology does not meet the CompactHiveSink requirements.
> >>> 2. There are two general `Committers` in the topology. For Hive’s case
> >>> there might be no problem. But there might be a problem in 1.12. For
> >>> example 

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-20 Thread Steven Wu
th the Option1 and
> Option2 need to cleanup)
>
>
> ### Summary
>
> From above we could divide the sink topology into two parts:
> 1. Write topology.
> 2. And One committer
>
> So we could provide a unified sink API looks like the following:
>
> public interface Sink {
> List> getWriters();
> Committer createCommitter()
> }
>
> In the long run maybe we could give the user more powerful ability like
> this (Currently some transformation still belongs to runtime):
> Sink {
> Transformation createWriteTopology();
>  CommitFunction createCommitter();
> }
>
> Best,
> Guowei
>
>
> On Sun, Sep 20, 2020 at 6:09 PM Guowei Ma  wrote:
>
>> Hi, Stevn
>> I want to make a clarification first, the following reply only considers
>> the Iceberge sink, but does not consider other sinks.  Before make decision
>> we should consider all the sink.I would try to summary all the sink
>> requirments in the next mail
>>
>>
>> >>  run global committer in jobmanager (e.g. like sink coordinator)
>>
>> I think it could be.
>>
>>
>> >> You meant GlobalCommit -> GlobalCommT, right?
>>
>> Yes. Thanks :)
>>
>>
>> >> Is this called when restored from checkpoint/savepoint?
>>
>> Yes.
>>
>>
>> >>Iceberg sink needs to do a dup check here on which GlobalCommT were
>> committed and which weren't. Should it return the filtered/de-duped list of
>> GlobalCommT?
>>
>>
>> I think Iceberg sink needs to do the dedup in the `commit` call. The
>> `recoveredGlobalCommittables` is just for restoring the ids.
>>
>>
>> >> Sink implementation can decide if it wants to commit immediately or
>> just leave
>>
>> I think only the frame knows *when* call the commit function.
>>
>>
>> >>should this be "commit(List)"?
>>
>> It could be. thanks.
>>
>>
>> Best,
>> Guowei
>>
>>
>> On Sun, Sep 20, 2020 at 12:11 AM Steven Wu  wrote:
>>
>>> > I prefer to let the developer produce id to dedupe. I think this gives
>>> the developer more opportunity to optimize.
>>>
>>> Thinking about it again, I totally agree with Guowei on this. We don't
>>> really need the framework to generate the unique id for Iceberg sink.
>>> De-dup logic is totally internal to Iceberg sink and should be isolated
>>> inside. My earlier question regarding "commitGlobally(List)
>>> can be concurrent or not" also becomes irrelevant, as long as the framework
>>> handles the GlobalCommT list properly (even with concurrent calls).
>>>
>>> Here are the things where framework can help
>>>
>>>1. run global committer in jobmanager (e.g. like sink coordinator)
>>>2. help with checkpointing, bookkeeping, commit failure handling,
>>>recovery
>>>
>>>
>>> @Guowei Ma  regarding the GlobalCommitter
>>> interface, I have some clarifying questions.
>>>
> >>> > void recoveredGlobalCommittables(List<GlobalCommit> globalCommits)
>>>
>>>1. You meant GlobalCommit -> GlobalCommT, right?
>>>2. Is this called when restored from checkpoint/savepoint?
>>>3.  Iceberg sink needs to do a dup check here on which GlobalCommT
>>>were committed and which weren't. Should it return the filtered/de-duped
>>>list of GlobalCommT?
>>>4. Sink implementation can decide if it wants to commit immediately
>>>or just leave
>>>
>>> > void commit(GlobalCommit globalCommit);
>>>
> >>> should this be "commit(List<GlobalCommT>)"?
>>>
>>> Thanks,
>>> Steven
>>>
>>>
>>> On Sat, Sep 19, 2020 at 1:56 AM Guowei Ma  wrote:
>>>
>>>> Hi, all
>>>>
>>>> >>Just to add to what Aljoscha said regarding the unique id. Iceberg
>>>> sink
>>>> >>checkpoints the unique id into state during snapshot. It also inserts
>>>> the
>>>> >>unique id into the Iceberg snapshot metadata during commit. When a job
>>>> >>restores the state after failure, it needs to know if the restored
>>>> >>transactions/commits were successful or not. It basically iterates
>>>> through
>>>> >>the list of table snapshots from Iceberg and matches the unique ids
>>>> with
>>>> >>what is stored in Iceberg snapshot metadata.
>>>>
>>>> Thanks Steven fo

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-19 Thread Steven Wu
k users could generate more efficient nonce such as an auto-increment
> one. Therefore, it seems to provide more optimization chances if we let
> users to generate the nonce.
>
>
> ### Alternative Option
>
> public interface GlobalCommit {
> // provide some runtime context such as attempt-id,job-id,task-id.
> void open(InitContext context);
>
> // This GlobalCommit would aggregate the committable to a
> GlobalCommit before doing the commit operation.
> GlobalCommT combine(List commitables)
>
> // This method would be called after committing all the
> GlobalCommit producing in the previous session.
> void recoveredGlobalCommittables(List globalCommits)
>
> // developer would guarantee the idempotency by himself
> void commit(GlobalCommit globalCommit);
> }
>
> User could guarantee the idenpointecy himself in a more efficient or
> application specific way. If the user wants the `GlobalCommit` to be
> executed in a distributed way, the user could use the runtime information
> to generate the partial order id himself.(We could ignore the clean up
> first)
>
> Currently the sink might be looks like following:
>
> Sink {
> Writer createWriter();
> Optional> createCommitter();
> Optional> createGlobalCommitter();
> }
>
> ## Hive
>
> The HiveSink needs to compute whether a directory is finished or not. But
> HiveSink can not use the above `combine` method to decide whether a
> directory is finished or not.
>
> For example we assume that whether the directory is finished or not is
> decided by the event time. There might be a topology that the source and
> sink are forward. The event time might be different in different instances
> of the `writer`. So the GlobalCommit’s combine can not produce a
> GlobalCommT when the snapshot happens.
>
> In addition to the above case we should also consider the unaligned
> checkpoint. Because the watermark does not skip. So there might be the same
> problem in the unaligned checkpoint.
>
> ### Option1:
>
> public interface GlobalCommit {
> // provide some runtime context such as attempt-id,job-id,task-id,
> maybe the event time;provide the restore state
> void open(InitContext context, StateT state);
>
> // This is for the HiveSink. When all the writer say that the the
> bucket is finished it would return a GlobalCommitT
> Optional combine(Committable commitables)
>
> // This is for IcebergSink. Producing a GlobalCommitT every
> checkpoint.
> Optional preCommit();
>
> // Maybe we need the shareState? After we decide the directory we
> make more detailed consideration then. The id could be remembered here.
> StateT snapshotState();
>
> // developer would guarantee the idempotency by himself
> void commit(GlobalCommit globalCommit);
> }
>
> ### Option2
>
> Actually the `GlobalCommit` in the option1 mixes the `Writer` and
> `Committer` together. So it is intuitive to decouple the two functions. For
> support the hive we could prove a sink look like following
>
> Sink {
> Writer createWriter();
> Optional> createCommitter(); // we need this to
> change name.
> Optional> createGlobalAgg();
> Optional> createGlobalCommitter();
> }
>
> The pro of this method is that we use two basic concepts: `Committer` and
> `Writer` to build a HiveSink.
>
> ### CompactHiveSink / MergeHiveSink
>
> There are still other complicated cases, which are not satisfied by the
> above option. Users often complain about writing out many small files,
> which will affect file reading efficiency and the performance and stability
> of the distributed file system. CompactHiveSink/MergeHiveSink hopes to
> merge all files generated by this job in a single Checkpoint.
>
> The CompactHiveSink/MergeHiveSink topology can simply describe this
> topology as follows:
>
> CompactSubTopology -> GlobalAgg -> GobalCommitter.
>
> The CompactSubTopology would look like following:
>
> TmpFileWriter -> CompactCoodinator -> CompactorFileWriter
>
> Maybe the topology could be simpler but please keep in mind I just want to
> show that there might be very complicated topology requirements for users.
>
>
> A possible alternative option would be let the user build the topology
> himself. But considering we have two execution modes we could only use
> `Writer` and `Committer` to build the sink topology.
>
> ### Build Topology Option
>
> Sink {
> Sink addWriter(Writer Writer); // Maybe a
> WriterBuidler
> Sink addCommitter(Committer committer); // Maybe
&g

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-18 Thread Steven Wu
Aljoscha,

> Instead the sink would have to check for each set of committables
seperately if they had already been committed. Do you think this is
feasible?

Yes, that is how it works in our internal implementation [1]. We don't use
checkpointId. We generate a manifest file (GlobalCommT) to bundle all the
data files that the committer received in one checkpoint cycle. Then we
generate a unique manifest id by hashing the location of the manifest
file. The manifest ids are stored in Iceberg snapshot metadata. Upon
restore, we check each of the restored manifest files against Iceberg table
snapshot metadata to determine if we should discard or keep the restored
manifest files. If a commit has multiple manifest files (e.g. accumulated
from previous failed commits), we store the comma-separated manifest ids in
Iceberg snapshot metadata.
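
A rough sketch of that restore-time check, using hypothetical placeholder
types; this only illustrates the de-dup idea and is not the code from the
committer linked below [1].

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class ManifestDedup {

    // Placeholder for a restored manifest-file committable (GlobalCommT).
    static class ManifestCommittable {
        final String manifestLocation;
        ManifestCommittable(String manifestLocation) { this.manifestLocation = manifestLocation; }
    }

    // Derive a stable manifest id, e.g. by hashing the manifest file location.
    static String manifestId(ManifestCommittable c) {
        return Integer.toHexString(c.manifestLocation.hashCode());
    }

    // Keep only the restored manifests whose ids are NOT already recorded in
    // the table's snapshot metadata (committedIds would be collected by
    // scanning snapshot summaries).
    static List<ManifestCommittable> filterUncommitted(
            List<ManifestCommittable> restored, Set<String> committedIds) {
        List<ManifestCommittable> uncommitted = new ArrayList<>();
        for (ManifestCommittable c : restored) {
            if (!committedIds.contains(manifestId(c))) {
                uncommitted.add(c);
            }
        }
        return uncommitted;
    }
}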

> During normal operation this set would be very small, it would usually
only be the committables for the last checkpoint. Only when there is an
outage would multiple sets of committables pile up.

You are absolutely right here. Even if there are multiple sets of
committables, it is usually the last a few or dozen of snapshots we need to
check. Even with our current inefficient implementation of traversing all
table snapshots (in the scale of thousands) from oldest to latest, it only
took avg 60 ms and max 800 ms. so it is really not a concern for Iceberg.

> CommitStatus commitGlobally(List<Committable>, Nonce)

Just to clarify the terminology here. I am assuming the Committable here means
the `GlobalCommT` (like ManifestFile in Iceberg) from the previous discussions,
right? `CommT` means the Iceberg DataFile passed from writer to committer.

This can work assuming we *don't have concurrent executions
of commitGlobally* even with concurrent checkpoints. Here is the scenario
regarding failure recovery I want to avoid.

Assume checkpoints 1, 2, 3 all completed, and each checkpoint generated a
manifest file: manifest-1, 2, 3. The timeline of events (oldest to newest) is:
1. commitGlobally(manifest-1, nonce-1) started
2. commitGlobally(manifest-2, nonce-2) started
3. commitGlobally(manifest-2, nonce-2) failed
4. commitGlobally(manifest-2 and manifest-3, nonce-3) started
5. commitGlobally(manifest-1, nonce-1) failed
6. commitGlobally(manifest-2 and manifest-3, nonce-3) succeeded

Now the job failed and was restored from checkpoint 3, which contains
manifest file 1,2,3. We found nonce-3 was committed when checking Iceberg
table snapshot metadata. But in this case we won't be able to correctly
determine which manifest files were committed or not.

If it is possible to have concurrent executions of  commitGlobally, the
alternative is to generate the unique id/nonce per GlobalCommT. Then we can
check each individual GlobalCommT (ManifestFile) with Iceberg snapshot
metadata.

Thanks,
Steven

[1]
https://github.com/Netflix-Skunkworks/nfflink-connector-iceberg/blob/master/nfflink-connector-iceberg/src/main/java/com/netflix/spaas/nfflink/connector/iceberg/sink/IcebergCommitter.java#L569

On Fri, Sep 18, 2020 at 2:44 AM Aljoscha Krettek 
wrote:

> Steven,
>
> we were also wondering if it is a strict requirement that "later"
> updates to Iceberg subsume earlier updates. In the current version, you
> only check whether checkpoint X made it to Iceberg and then discard all
> committable state from Flink state for checkpoints smaller X.
>
> If we go with a (somewhat random) nonce, this would not work. Instead
> the sink would have to check for each set of committables seperately if
> they had already been committed. Do you think this is feasible? During
> normal operation this set would be very small, it would usually only be
> the committables for the last checkpoint. Only when there is an outage
> would multiple sets of committables pile up.
>
> We were thinking to extend the GlobalCommitter interface to allow it to
> report success or failure and then let the framework retry. I think this
> is something that you would need for the Iceberg case. The signature
> could be like this:
>
> CommitStatus commitGlobally(List<Committable>, Nonce)
>
> where CommitStatus could be an enum of SUCCESS, TERMINAL_FAILURE, and
> RETRY.
>
> Best,
> Aljoscha
>


Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-17 Thread Steven Wu
Guowei

Just to add to what Aljoscha said regarding the unique id. Iceberg sink
checkpoints the unique id into state during snapshot. It also inserts the
unique id into the Iceberg snapshot metadata during commit. When a job
restores the state after failure, it needs to know if the restored
transactions/commits were successful or not. It basically iterates through
the list of table snapshots from Iceberg and matches the unique ids with
what is stored in Iceberg snapshot metadata.

Thanks,
Steven


On Thu, Sep 17, 2020 at 7:40 AM Aljoscha Krettek 
wrote:

> Thanks for the summary!
>
> On 16.09.20 06:29, Guowei Ma wrote:
> > ## Consensus
> >
> > 1. The motivation of the unified sink API is to decouple the sink
> > implementation from the different runtime execution mode.
> > 2. The initial scope of the unified sink API only covers the file system
> > type, which supports the real transactions. The FLIP focuses more on the
> > semantics the new sink api should support.
> > 3. We prefer the first alternative API, which could give the framework a
> > greater opportunity to optimize.
> > 4. The `Writer` needs to add a method `prepareCommit`, which would be
> > called from `prepareSnapshotPreBarrier`. And remove the `Flush` method.
> > 5. The FLIP could move the `Snapshot & Drain` section in order to be more
> > focused.
>
> Agreed!
>
> > ## Not Consensus
> >
> > 1. What should the “Unified Sink API” support/cover? The API can
> > “unified”(decoupe) the commit operation in the term of supporting exactly
> > once semantics. However, even if we narrow down the initial supported
> > system to the file system there would be different topology requirements.
> > These requirements come from performance optimization
> > (IceBergSink/MergeHiveSink) or functionality(i.e. whether a bucket is
> > “finished”).  Should the unified sink API support these requirements?
>
> Yes, this is still tricky. What is the current state, would the
> introduction of a "LocalCommit" and a "GlobalCommit" already solve both
> the Iceberg and Hive cases? I believe Hive is the most tricky one here,
> but if we introduce the "combine" method on GlobalCommit, that could
> serve the same purpose as the "aggregation operation" on the individual
> files, and we could even execute that "combine" in a distributed way.
>
> To answer the more general question, I think we will offer a couple of
> different commit strategies and sinks can implement 0 to n of them. What
> is unified about the sink is that the same sink implementation will work
> for both STREAMING and BATCH execution mode.
>
> > 2. The API does not expose the checkpoint-id because the batch execution
> > mode does not have the normal checkpoint. But there still some
> > implementations depend on this.(IceBergSink uses this to do some dedupe).
> > I think how to support this requirement depends on the first open
> question.
>
> I think this can be solved by introducing a nonce, see more thorough
> explanation below.
>
> > 3. Whether the `Writer` supports async functionality or not. Currently I
> do
> > not know which sink could benefit from it. Maybe it is just my own
> problem.
>
> Here, I don't really know. We can introduce an "isAvailable()" method
> and mostly ignore it for now and sinks can just always return true. Or,
> as an alternative, we don't add the method now but can add it later with
> a default implementation. Either way, we will probably not take
> advantage of the "isAvailable()" now because that would require more
> runtime changes.
>
> On 17.09.20 06:28, Guowei Ma wrote:
> > But my understanding is: if the committer function is idempotent, the
> > framework can guarantee exactly once semantics in batch/stream execution
> > mode. But I think maybe the idempotence should be guaranteed by the sink
> > developer, not on the basic API.
>
> I believe the problem here is that some sinks (including Iceberg) can
> only be idempotent with a little help from the framework.
>
> The process would be like this:
>
> 1. collect all committables, generate unique ID (nonce), store
> committables and ID in fault tolerant storage
>
> 2. call commitGlobal(committables, nonce)
>
> 3. Iceberg checks if there is already a commit with the given nonce, if
> not it will append a commit of the committables along with the nonce to
> the log structure/meta store
>
> The problem is that Iceberg cannot decide without some extra data
> whether a set of committables has already been committed because the
> commit basically just appends some information to the end of a log. And
> we just just keep appending the same data if we didn't check the nonce.
>
> We would have this same problem if we wanted to implement a
> write-ahead-log Kafka sink where the "commit" would just take some
> records from a file and append it to Kafka. Without looking at Kafka and
> checking if you already committed the same records you don't know if you
> already committed.
>
>
>
>
>
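
To illustrate the nonce-based idempotency check described in step 3 above, a
minimal sketch could look like this. CommitLog and its method names are
hypothetical placeholders, not real Iceberg classes.

import java.util.List;
import java.util.Set;

class NonceIdempotentCommitter<CommT> {

    interface CommitLog<T> {
        Set<String> committedNonces();                   // e.g. read from snapshot metadata
        void append(List<T> committables, String nonce); // append commit + nonce to the log
    }

    private final CommitLog<CommT> log;

    NonceIdempotentCommitter(CommitLog<CommT> log) {
        this.log = log;
    }

    // Commit is a no-op if this nonce was already committed, e.g. by an
    // attempt that succeeded right before a failover.
    void commitGlobally(List<CommT> committables, String nonce) {
        if (log.committedNonces().contains(nonce)) {
            return; // already committed in a previous attempt
        }
        log.append(committables, nonce);
    }
}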


Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-16 Thread Steven Wu
sensus
>>
>> 1. What should the “Unified Sink API” support/cover? The API can
>> “unified”(decoupe) the commit operation in the term of supporting exactly
>> once semantics. However, even if we narrow down the initial supported
>> system to the file system there would be different topology requirements.
>> These requirements come from performance optimization
>> (IceBergSink/MergeHiveSink) or functionality(i.e. whether a bucket is
>> “finished”).  Should the unified sink API support these requirements?
>> 2. The API does not expose the checkpoint-id because the batch execution
>> mode does not have the normal checkpoint. But there still some
>> implementations depend on this.(IceBergSink uses this to do some dedupe).
>> I think how to support this requirement depends on the first open
>> question.
>> 3. Whether the `Writer` supports async functionality or not. Currently I
>> do
>> not know which sink could benefit from it. Maybe it is just my own
>> problem.
>>
>> Best,
>> Guowei
>>
>>
>> On Wed, Sep 16, 2020 at 12:02 PM Guowei Ma  wrote:
>>
>> >
>> > Hi, Steven
>> > Thanks you for your thoughtful ideas and concerns.
>> >
>> > >>I still like the concept of grouping data files per checkpoint for
>> > streaming mode. it is cleaner and probably easier to manage and deal
>> with
>> > commit failures. Plus, it >>can reduce dupes for the at least once
>> > >>mode.  I understand checkpoint is not an option for batch execution.
>> We
>> > don't have to expose the checkpointId in API, as >>long as  the internal
>> > bookkeeping groups data files by checkpoints for streaming >>mode.
>> >
>> > I think this problem(How to dedupe the combined committed data) also
>> > depends on where to place the agg/combine logic .
>> >
>> > 1. If the agg/combine takes place in the “commit” maybe we need to
>> figure
>> > out how to give the aggregated committable a unique and auto-increment
>> id
>> > in the committer.
>> > 2. If the agg/combine takes place in a separate operator maybe sink
>> > developer could maintain the id itself by using the state.
>> >
>> > I think this problem is also decided by what the topology pattern the
>> sink
>> > API should support. Actually there are already many other topology
>> > requirements. :)
>> >
>> > Best,
>> > Guowei
>> >
>> >
>> > On Wed, Sep 16, 2020 at 7:46 AM Steven Wu  wrote:
>> >
>> >> > AFAIK the committer would not see the file-1-2 when ck1 happens in
>> the
>> >> ExactlyOnce mode.
>> >>
>> >> @Guowei Ma  I think you are right for exactly
>> once
>> >> checkpoint semantics. what about "at least once"? I guess we can argue
>> that
>> >> it is fine to commit file-1-2 for at least once mode.
>> >>
>> >> I still like the concept of grouping data files per checkpoint for
>> >> streaming mode. it is cleaner and probably easier to manage and deal
>> with
>> >> commit failures. Plus, it can reduce dupes for the at least once
>> mode.  I
>> >> understand checkpoint is not an option for batch execution. We don't
>> have
>> >> to expose the checkpointId in API, as long as  the internal bookkeeping
>> >> groups data files by checkpoints for streaming mode.
>> >>
>> >>
>> >> On Tue, Sep 15, 2020 at 6:58 AM Steven Wu 
>> wrote:
>> >>
>> >>> > images don't make it through to the mailing lists. You would need to
>> >>> host the file somewhere and send a link.
>> >>>
>> >>> Sorry about that. Here is the sample DAG in google drawings.
>> >>>
>> >>>
>> https://docs.google.com/drawings/d/1-P8F2jF9RG9HHTtAfWEBRuU_2uV9aDTdqEt5dLs2JPk/edit?usp=sharing
>> >>>
>> >>>
>> >>> On Tue, Sep 15, 2020 at 4:58 AM Guowei Ma 
>> wrote:
>> >>>
>> >>>> Hi, Dawid
>> >>>>
>> >>>> >>I still find the merging case the most confusing. I don't
>> necessarily
>> >>>> understand why do you need the "SingleFileCommit" step in this
>> scenario.
>> >>>> The way I
>> >>>> >> understand "commit" operation is that it makes some data/artifacts
>> >>>> visible to the external system, thus it s

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-15 Thread Steven Wu
> AFAIK the committer would not see the file-1-2 when ck1 happens in the
ExactlyOnce mode.

@Guowei Ma  I think you are right for exactly-once
checkpoint semantics. What about "at least once"? I guess we can argue that
it is fine to commit file-1-2 for at-least-once mode.

I still like the concept of grouping data files per checkpoint for streaming
mode. It is cleaner and probably easier to manage and deal with commit
failures. Plus, it can reduce dupes for the at-least-once mode. I understand
checkpoint is not an option for batch execution. We don't have to expose the
checkpointId in the API, as long as the internal bookkeeping groups data files
by checkpoint for streaming mode.
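
One way such internal bookkeeping could look, as a sketch with placeholder
types: the checkpoint id is used only internally and is never exposed through
the sink API.

import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class PerCheckpointBookkeeping<DataFileT> {

    private final NavigableMap<Long, List<DataFileT>> filesPerCheckpoint = new TreeMap<>();

    void add(long checkpointId, DataFileT file) {
        filesPerCheckpoint.computeIfAbsent(checkpointId, id -> new ArrayList<>()).add(file);
    }

    // Returns all files up to and including the given checkpoint, e.g. to
    // bundle them into one manifest, and drops them from the bookkeeping.
    List<DataFileT> pollUpTo(long checkpointId) {
        List<DataFileT> result = new ArrayList<>();
        NavigableMap<Long, List<DataFileT>> head = filesPerCheckpoint.headMap(checkpointId, true);
        head.values().forEach(result::addAll);
        head.clear();
        return result;
    }
}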


On Tue, Sep 15, 2020 at 6:58 AM Steven Wu  wrote:

> > images don't make it through to the mailing lists. You would need to
> host the file somewhere and send a link.
>
> Sorry about that. Here is the sample DAG in google drawings.
>
> https://docs.google.com/drawings/d/1-P8F2jF9RG9HHTtAfWEBRuU_2uV9aDTdqEt5dLs2JPk/edit?usp=sharing
>
>
> On Tue, Sep 15, 2020 at 4:58 AM Guowei Ma  wrote:
>
>> Hi, Dawid
>>
>> >>I still find the merging case the most confusing. I don't necessarily
>> understand why do you need the "SingleFileCommit" step in this scenario.
>> The way I
>> >> understand "commit" operation is that it makes some data/artifacts
>> visible to the external system, thus it should be immutable from a point
>> of
>> view of a single >>process. Having an additional step in the same process
>> that works on committed data contradicts with those assumptions. I might
>> be
>> missing something though. >> Could you elaborate >why can't it be
>> something
>> like FileWriter -> FileMergeWriter -> Committer (either global or
>> non-global)? Again it might be just me not getting the example.
>>
>> I think you are right. The topology
>> "FileWriter->FileMergeWriter->Committer" could meet the merge requirement.
>> The topology "FileWriter-> SingleFileCommitter -> FileMergeWriter ->
>> GlobalCommitter" reuses some code of the StreamingFileSink(For example
>> rolling policy) so it has the "SingleFileCommitter" in the topology. In
>> general I want to use the case to show that there are different topologies
>> according to the requirements.
>>
>> BTW: IIRC, @Jingsong Lee  telled me that the
>> actual topology of merged supported HiveSink is more complicated than
>> that.
>>
>>
>> >> I've just briefly skimmed over the proposed interfaces. I would suggest
>> one
>> >> addition to the Writer interface (as I understand this is the runtime
>> >> interface in this proposal?): add some availability method, to avoid,
>> if
>> >> possible, blocking calls on the sink. We already have similar
>> >> availability methods in the new sources [1] and in various places in
>> the
>> >> network stack [2].
>> >> BTW Let's not forget about Piotr's comment. I think we could add the
>> isAvailable or similar method to the Writer interface in the FLIP.
>>
>> Thanks @Dawid Wysakowicz   for your reminder.
>> There
>> are two many issues at the same time.
>>
>> In addition to what Ajjoscha said : there is very little system support
>> it.   Another thing I worry about is that: Does the sink's snapshot return
>> immediately when the sink's status is unavailable? Maybe we could do it by
>> dedupe some element in the state but I think it might be too complicated.
>> For me I want to know is what specific sink will benefit from this
>> feature.  @piotr   Please correct me if  I
>> misunderstand you. thanks.
>>
>> Best,
>> Guowei
>>
>>
>> On Tue, Sep 15, 2020 at 3:55 PM Dawid Wysakowicz 
>> wrote:
>>
>> > What I understand is that HiveSink's implementation might need the local
>> > committer(FileCommitter) because the file rename is needed.
>> > But the iceberg only needs to write the manifest file.  Would you like
>> to
>> > enlighten me why the Iceberg needs the local committer?
>> > Thanks
>> >
>> > Sorry if I caused a confusion here. I am not saying the Iceberg sink
>> needs
>> > a local committer. What I had in mind is that prior to the Iceberg
>> example
>> > I did not see a need for a "GlobalCommitter" in the streaming case. I
>> > thought it is always enough to have the "normal" committer in that case.
>> > Now I understand that this differentiation is not really about logical
>> &g

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-15 Thread Steven Wu
s well, the idea behind
> > Writer#flush and Writer#snapshotState was to differentiate commit on
> > checkpoint vs final checkpoint at the end of the job. Both of these
> > methods could emit committables, but the flush should not leave any in
> > progress state (e.g. in case of file sink in STREAM mode, in
> > snapshotState it could leave some open files that would be committed in
> > a subsequent cycle, however flush should close all files). The
> > snapshotState as it is now can not be called in
> > prepareSnapshotPreBarrier as it can store some state, which should
> > happen in Operator#snapshotState as otherwise it would always be
> > synchronous. Therefore I think we would need sth like:
> >
> > void prepareCommit(boolean flush, WriterOutput output);
> >
> > ver 1:
> >
> > List snapshotState();
> >
> > ver 2:
> >
> > void snapshotState(); // not sure if we need that method at all in
> option 2
> >
> >
> > The Committer is as described in the FLIP, it's basically a function
> > "void commit(Committable)". The GobalCommitter would be a function "void
> > commit(List)". The former would be used by an S3 sink where
> > we can individually commit files to S3, a committable would be the list
> > of part uploads that will form the final file and the commit operation
> > creates the metadata in S3. The latter would be used by something like
> > Iceberg where the Committer needs a global view of all the commits to be
> > efficient and not overwhelm the system.
> >
> > I don't know yet if sinks would only implement on type of commit
> > function or potentially both at the same time, and maybe Commit can
> > return some CommitResult that gets shipped to the GlobalCommit function.
> >
> > I must admit it I did not get the need for Local/Normal + Global
> > committer at first. The Iceberg example helped a lot. I think it makes a
> > lot of sense.
> >
> >
> > For Iceberg, writers don't need any state. But the GlobalCommitter
> > needs to
> > checkpoint StateT. For the committer, CommT is "DataFile". Since a single
> > committer can collect thousands (or more) data files in one checkpoint
> > cycle, as an optimization we checkpoint a single "ManifestFile" (for the
> > collected thousands data files) as StateT. This allows us to absorb
> > extended commit outages without losing written/uploaded data files, as
> > operator state size is as small as one manifest file per checkpoint cycle
> > [2].
> > --
> > StateT snapshotState(SnapshotContext context) throws Exception;
> >
> > That means we also need the restoreCommitter API in the Sink interface
> > ---
> > Committer restoreCommitter(InitContext context, StateT
> > state);
> >
> > I think this might be a valid case. Not sure though if I would go with a
> > "state" there. Having a state in a committer would imply we need a
> > collect method as well. So far we needed a single method commit(...) and
> > the bookkeeping of the committables could be handled by the framework. I
> > think something like an optional combiner in the GlobalCommitter would
> > be enough. What do you think?
> >
> > GlobalCommitter {
> >
> > void commit(GlobalCommT globalCommittables);
> >
> > GlobalCommT combine(List committables);
> >
> > }
> >
> > A different problem that I see here is how do we handle commit failures.
> > Should the committables (both normal and global be included in the next
> > cycle, shall we retry it, ...) I think it would be worth laying it out
> > in the FLIP.
> >
> > @Aljoscha I think you can find the code Steven was referring in here:
> >
> https://github.com/Netflix-Skunkworks/nfflink-connector-iceberg/blob/master/nfflink-connector-iceberg/src/main/java/com/netflix/spaas/nfflink/connector/iceberg/sink/IcebergCommitter.java
> >
> > Best,
> >
> > Dawid
> >
> > On 14/09/2020 15:19, Aljoscha Krettek wrote:
> >
> > On 14.09.20 01:23, Steven Wu wrote:
> >
> > ## Writer interface
> >
> > For the Writer interface, should we add "*prepareSnapshot"* before the
> > checkpoint barrier emitted downstream?  IcebergWriter would need it. Or
> > would the framework call "*flush*" before the barrier emitted
> > downstream?
> > that guarantee would achieve the same goal.
> >
> > I would think that we only need flush() and the semantics are that it
> > prepares for a commit, so on a physical level it w

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-14 Thread Steven Wu
hink something like an optional combiner in the GlobalCommitter would
> > be enough. What do you think?
> >
> > GlobalCommitter {
> >
> > void commit(GlobalCommT globalCommittables);
> >
> > GlobalCommT combine(List committables);
> >
> > }
> >
> > A different problem that I see here is how do we handle commit failures.
> > Should the committables (both normal and global be included in the next
> > cycle, shall we retry it, ...) I think it would be worth laying it out
> > in the FLIP.
> >
> > @Aljoscha I think you can find the code Steven was referring in here:
> >
> >
> https://github.com/Netflix-Skunkworks/nfflink-connector-iceberg/blob/master/nfflink-connector-iceberg/src/main/java/com/netflix/spaas/nfflink/connector/iceberg/sink/IcebergCommitter.java
> >
> > Best,
> >
> > Dawid
> >
> > On 14/09/2020 15:19, Aljoscha Krettek wrote:
> > > On 14.09.20 01:23, Steven Wu wrote:
> > >> ## Writer interface
> > >>
> > >> For the Writer interface, should we add "*prepareSnapshot"* before the
> > >> checkpoint barrier emitted downstream?  IcebergWriter would need it.
> Or
> > >> would the framework call "*flush*" before the barrier emitted
> > >> downstream?
> > >> that guarantee would achieve the same goal.
> > >
> > > I would think that we only need flush() and the semantics are that it
> > > prepares for a commit, so on a physical level it would be called from
> > > "prepareSnapshotPreBarrier". Now that I'm thinking about it more I
> > > think flush() should be renamed to something like "prepareCommit()".
> > >
> > > @Guowei, what do you think about this?
> > >
> > >> In [1], we discussed the reason for Writer to emit (checkpointId,
> CommT)
> > >> tuple to the committer. The committer needs checkpointId to separate
> out
> > >> data files for different checkpoints if concurrent checkpoints are
> > >> enabled.
> > >
> > > When can this happen? Even with concurrent checkpoints the snapshot
> > > barriers would still cleanly segregate the input stream of an operator
> > > into tranches that should manifest in only one checkpoint. With
> > > concurrent checkpoints, all that can happen is that we start a
> > > checkpoint before a last one is confirmed completed.
> > >
> > > Unless there is some weirdness in the sources and some sources start
> > > chk1 first and some other ones start chk2 first?
> > >
> > > @Piotrek, do you think this is a problem?
> > >
> > >> For the Committer interface, I am wondering if we should split the
> > >> single
> > >> commit method into separate "*collect"* and "*commit"* methods? This
> > >> way,
> > >> it can handle both single and multiple CommT objects.
> > >
> > > I think we can't do this. If the sink only needs a regular Commiter,
> > > we can perform the commits in parallel, possibly on different
> > > machines. Only when the sink needs a GlobalCommitter would we need to
> > > ship all commits to a single process and perform the commit there. If
> > > both methods were unified in one interface we couldn't make the
> > > decision of were to commit in the framework code.
> > >
> > >> For Iceberg, writers don't need any state. But the GlobalCommitter
> > >> needs to
> > >> checkpoint StateT. For the committer, CommT is "DataFile". Since a
> > >> single
> > >> committer can collect thousands (or more) data files in one checkpoint
> > >> cycle, as an optimization we checkpoint a single "ManifestFile" (for
> the
> > >> collected thousands data files) as StateT. This allows us to absorb
> > >> extended commit outages without losing written/uploaded data files, as
> > >> operator state size is as small as one manifest file per checkpoint
> > >> cycle
> > >
> > > You could have a point here. Is the code for this available in
> > > open-source? I was checking out
> > >
> >
> https://github.com/apache/iceberg/blob/master/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java
> > > and didn't find the ManifestFile optimization there.
> > >
> > > Best,
> > > Aljoscha
> > >
> >
> >
>


Re: [DISCUSS] Deprecate and remove UnionList OperatorState

2020-09-13 Thread Steven Wu
Right now, we use UnionState to store the `nextCheckpointId` in the Iceberg
sink use case, because we can't retrieve the checkpointId from
the FunctionInitializationContext during restore. But we can move
away from it if the restore context provides the checkpointId.
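
For context, below is a minimal sketch of the workaround being described. It is
an illustration only (the class and the state name "next-checkpoint-id" are made
up, not the actual Iceberg sink code): the value is kept in union-redistributed
operator state so that every subtask sees it again on restore.

import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;

// Tracks the next checkpoint id in union list state, because the restore
// context does not expose the restored checkpoint id directly.
class CheckpointIdTracker implements CheckpointedFunction {

    private transient ListState<Long> nextCheckpointIdState; // union-redistributed
    private long nextCheckpointId;

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        nextCheckpointIdState = context.getOperatorStateStore().getUnionListState(
                new ListStateDescriptor<>("next-checkpoint-id", Types.LONG));
        if (context.isRestored()) {
            // every subtask sees the full union on restore; take the max to be safe
            for (Long id : nextCheckpointIdState.get()) {
                nextCheckpointId = Math.max(nextCheckpointId, id);
            }
        }
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        nextCheckpointId = context.getCheckpointId() + 1;
        nextCheckpointIdState.clear();
        nextCheckpointIdState.add(nextCheckpointId);
    }

    long nextCheckpointId() {
        return nextCheckpointId;
    }
}

In an actual sink this logic would be mixed into the sink function itself (e.g.
a class that also extends RichSinkFunction); that part is omitted here.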

On Sat, Sep 12, 2020 at 8:20 AM Alexey Trenikhun  wrote:

> -1
>
> We use union state to generate sequences, each operator generates offset0
> + number-of-tasks -  task-index + task-specific-counter * number-of-tasks
> (e.g. for 2 instances of operator -one instance produce even number,
> another odd). Last generated sequence number is stored union list state, on
> restart from where we should start to avoid collision with already
> generated numbers, to do saw we calculate offset0 as max over union list
> state.
>
> Alexey
>
> --
> *From:* Seth Wiesman 
> *Sent:* Wednesday, September 9, 2020 9:37:03 AM
> *To:* dev 
> *Cc:* Aljoscha Krettek ; user 
> *Subject:* Re: [DISCUSS] Deprecate and remove UnionList OperatorState
>
> Generally +1
>
> The one use case I've seen of union state I've seen in production (outside
> of sources and sinks) is as a "poor mans" broadcast state. This was
> obviously before that feature was added which is now a few years ago so I
> don't know if those pipelines still exist. FWIW, if they do the state
> processor api can provide a migration path as it supports rewriting union
> state as broadcast state.
>
> Seth
>
> On Wed, Sep 9, 2020 at 10:21 AM Arvid Heise  wrote:
>
> +1 to getting rid of non-keyed state as is in general and for union state
> in particular. I had a hard time to wrap my head around the semantics of
> non-keyed state when designing the rescale of unaligned checkpoint.
>
> The only plausible use cases are legacy source and sinks. Both should also
> be reworked in deprecated.
>
> My main question is how to represent state in these two cases. For sources,
> state should probably be bound to splits. In that regard, split (id) may
> act as a key. More generally, there should be probably a concept that
> supersedes keys and includes splits.
>
> For sinks, I can see two cases:
> - Either we are in a keyed context, then state should be bound to the key.
> - Or we are in a non-keyed context, then state might be bound to the split
> (?) in case of a source->sink chaining.
> - Maybe it should also be a new(?) concept like output partition.
>
> It's not clear to me if there are more cases and if we can always find a
> good way to bind state to some sort of key, especially for arbitrary
> communication patterns (which we may need to replace as well potentially).
>
> On Wed, Sep 9, 2020 at 4:09 PM Aljoscha Krettek 
> wrote:
>
> > Hi Devs,
> >
> > @Users: I'm cc'ing the user ML to see if there are any users that are
> > relying on this feature. Please comment here if that is the case.
> >
> > I'd like to discuss the deprecation and eventual removal of UnionList
> > Operator State, aka Operator State with Union Redistribution. If you
> > don't know what I'm talking about you can take a look in the
> > documentation: [1]. It's not documented thoroughly because it started
> > out as mostly an internal feature.
> >
> > The immediate main reason for removing this is also mentioned in the
> > documentation: "Do not use this feature if your list may have high
> > cardinality. Checkpoint metadata will store an offset to each list
> > entry, which could lead to RPC framesize or out-of-memory errors." The
> > insidious part of this limitation is that you will only notice that
> > there is a problem when it is too late. Checkpointing will still work
> > and a program can continue when the state size is too big. The system
> > will only fail when trying to restore from a snapshot that has union
> > state that is too big. This could be fixed by working around that issue
> > but I think there are more long-term issues with this type of state.
> >
> > I think we need to deprecate and remove API for state that is not tied
> > to a key. Keyed state is easy to reason about, the system can
> > re-partition state and also re-partition records and therefore scale the
> > system in and out. Operator state, on the other hand is not tied to a
> > key but an operator. This is a more "physical" concept, if you will,
> > that potentially ties business logic closer to the underlying runtime
> > execution model, which in turns means less degrees of freedom for the
> > framework, that is Flink. This is future work, though, but we should
> > start with deprecating union list state because it is the potentially
> > most dangerous type of state.
> >
> > We currently use this state type internally in at least the
> > StreamingFileSink, FlinkKafkaConsumer, and FlinkKafkaProducer. However,
> > we're in the process of hopefully getting rid of it there with our work
> > on sources and sinks. Before we fully remove it, we should of course
> > signal this to users by deprecating it.
> >
> > What do you think?
> >
> 

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-13 Thread Steven Wu
 you can
> > prepare "committables" in one process and commit those from another
> > process. Those are systems that support "real" transactions as you need
> > them in a two-phase commit protocol. This includes:
> >
> >   - File Sink, including HDFS, S3 via special part-file uploads
> >   - Iceberg
> >   - HDFS
> >
> > The work should include runtime support for both BATCH and STREAMING
> > execution as outlined in https://s.apache.org/FLIP-134.
> >
> > Supporting Kafka already becomes difficult but I'll go into that below.
> >
> > ## Where to run the Committer
> >
> > Guowei hinted at this in the FLIP: the idea is that the Framework
> > decides where to run the committer based on requirements and based on
> > the execution mode (STREAMING or BATCH).
> >
> > Something that is not in the FLIP but which we thought about is that we
> > need to allow different types of committers. I'm currently thinking we
> > need at least a normal "Committer" and a "GlobalCommiter" (name TBD).
> >
> > The Committer is as described in the FLIP, it's basically a function
> > "void commit(Committable)". The GobalCommitter would be a function "void
> > commit(List)". The former would be used by an S3 sink where
> > we can individually commit files to S3, a committable would be the list
> > of part uploads that will form the final file and the commit operation
> > creates the metadata in S3. The latter would be used by something like
> > Iceberg where the Committer needs a global view of all the commits to be
> > efficient and not overwhelm the system.
> >
> > I don't know yet if sinks would only implement on type of commit
> > function or potentially both at the same time, and maybe Commit can
> > return some CommitResult that gets shipped to the GlobalCommit function.
> >
> > An interesting read on this topic is the discussion on
> > https://issues.apache.org/jira/browse/MAPREDUCE-4815. About the Hadoop
> > FileOutputCommitter and the two different available algorithms for
> > committing final job/task results.
> >
> > These interfaces untie the sink implementation from the Runtime and we
> > could, for example, have a runtime like this:
> >
> > ### BATCH
> >
> >   - Collect all committables and store them in a fault tolerant way
> > until the job finishes
> >   - For a normal Commit function, call it on the individual commits. We
> > can potentially distribute this if it becomes a bottleneck
> >   - For GlobalCommit function, call it will all the commits. This cannot
> > be distributed
> >
> > We can collect the committables in an OperatorCoordinator or potentially
> > somehow in a task. Though I prefer an OperatorCoordinator right now. The
> > operator coordinator needs to keep the commits in a fault-tolerant way.
> >
> > ### STREAMING
> >
> >   - For normal Commit, keep the committables in state on the individual
> > tasks, commit them when a checkpoint completes
> >   - For global CommitFunction we have options: collect them in a DOP-1
> > operator in the topology or send them to an OperatorCoordinator to do
> > the commit there. This is where the source/sink duality that Steven
> > mentions becomes visible.
> >
> > ## Kafka
> >
> > Kafka is a problematic case because it doesn't really support
> > transactions as outlined above. Our current Sink implementations works
> > around that with hacks but that only gets us so far.
> >
> > The problem with Kafka is that we need to aggressively clean up pending
> > transactions in case a failure happens. Otherwise stale transactions
> > would block downstream consumers. See here for details:
> > http://kafka.apache.org/documentation/#isolation.level.
> >
> > The way we solve this in the current Kafka sink is by using a fixed pool
> > of transactional IDs and then cancelling all outstanding transactions
> > for the IDs when we restore from a savepoint. In order for this to work
> > we need to recycle the IDs, so there needs to be a back-channel from the
> > Committer to the Writter, or they need to share internal state.
> >
> > I don't get see a satisfying solution for this so I think we should
> > exclude this from the initial version.
> >
> > ## On Write-Ahead-Log Sinks
> >
> > Some sinks, like ES or Cassandra would require that we keep a WAL in
> > Flink and then ship the contents to the external system on checkpoint.
> > The reason is that these systems don't support real trans

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-10 Thread Steven Wu
Guowei,

Thanks a lot for the proposal and starting the discussion thread. Very
excited.

For the big question of "Is the sink an operator or a topology?", I have a
few related sub-questions.
* Where should we run the committers?
* Is the committer parallel or single parallelism?
* Can a single choice satisfy all sinks?

Trying to envision how some sinks can be implemented with this new unified
sink interface.

1. Kafka sink

Kafka supports non-transactional and transactional writes.
* Non-transactional writes don't need a commit action. We can have *parallel
writers and no/no-op committers*. This is probably true for other
non-transactional message queues.
* Transactional writes can be implemented as *parallel writers and parallel
committers*. In this case, I don't know if it makes sense to separate
writers and committers into two separate operators, because they probably
need to share the same KafkaProducer object.

Either way, both writers and committers probably should *run inside task
managers*.

2. ES sink

The ES sink typically buffers the data up to a certain size or time threshold
and then uploads/commits a batch to ES. Writers buffer data and flush when
needed, and the committer does the HTTP bulk upload to commit. To avoid
serialization/deserialization cost, we should run *parallel writers and
parallel committers* and they *should be* *chained or bundled together*
while *running inside task managers*.

It can also be implemented as *parallel writers and no/no-op committers*,
where all the logic (batching and upload) is put inside the writers.
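
To illustrate the second variant (all the logic inside the writer), here is a
minimal sketch with placeholder names (BufferingEsWriter, bulkUpload); it is not
based on the actual Elasticsearch connector.

import java.util.ArrayList;
import java.util.List;

// Buffers records and flushes a bulk request when a size or age threshold is hit.
class BufferingEsWriter<T> {

    private final List<T> buffer = new ArrayList<>();
    private final int maxBufferSize;
    private final long maxBufferAgeMs;
    private long lastFlushTime = System.currentTimeMillis();

    BufferingEsWriter(int maxBufferSize, long maxBufferAgeMs) {
        this.maxBufferSize = maxBufferSize;
        this.maxBufferAgeMs = maxBufferAgeMs;
    }

    void write(T record) {
        buffer.add(record);
        if (buffer.size() >= maxBufferSize
                || System.currentTimeMillis() - lastFlushTime >= maxBufferAgeMs) {
            flush();
        }
    }

    // Also called before the checkpoint barrier is forwarded, so a checkpoint only
    // completes after buffered data has been handed to the external system.
    void flush() {
        if (!buffer.isEmpty()) {
            bulkUpload(new ArrayList<>(buffer));
            buffer.clear();
        }
        lastFlushTime = System.currentTimeMillis();
    }

    private void bulkUpload(List<T> batch) {
        // placeholder: issue one HTTP bulk request for the whole batch
    }
}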

3. Iceberg [1] sink

It is currently implemented as two-stage operators with *parallel writers
and single-parallelism committers*.
* *parallel writers* that write records into data files. Upon checkpoint,
writers flush and upload the files, and send the metadata/location of the
data files to the downstream committer. Writers need to do the flush inside
the "prepareSnapshotPreBarrier" method (NOT "snapshotState" method) before
forwarding the checkpoint barrier to the committer
* single-parallelism committer operator. It collects data files from
upstream writers. During "snapshotState", it saves collected data files (or
an uber metadata file) into state. When the checkpoint is completed, inside
"notifyCheckpointComplete" it commits those data files to Iceberg tables. *The
committer has to be single parallelism*, because we don't want hundreds or
thousands of parallel committers to compete for commit operations with
opportunistic concurrency control. It will be very inefficient and probably
infeasible if the parallelism is high. Too many tiny commits/transactions
can also slow down both the table write and read paths due to too many
manifest files.
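
Below is a minimal sketch of the writer half of this two-stage pattern, only to
show where the flush happens relative to the barrier. DataFileMeta and
FileAppender are placeholders; this is not the actual Iceberg connector code.

import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

// Placeholder types for illustration only.
class DataFileMeta {}

class FileAppender<T> {
    void add(T record) { /* append the record to the open data file */ }
    DataFileMeta closeAndUpload() { return new DataFileMeta(); } // placeholder upload
}

// Writer: flush/upload the open data file and forward its metadata to the
// committer before the checkpoint barrier is emitted downstream.
class IcebergStyleWriter<T> extends AbstractStreamOperator<DataFileMeta>
        implements OneInputStreamOperator<T, DataFileMeta> {

    private transient FileAppender<T> currentFile;

    @Override
    public void open() throws Exception {
        super.open();
        currentFile = new FileAppender<>();
    }

    @Override
    public void processElement(StreamRecord<T> element) {
        currentFile.add(element.getValue());
    }

    @Override
    public void prepareSnapshotPreBarrier(long checkpointId) throws Exception {
        // close/upload the file, then send its metadata downstream so the
        // committer can include it in the commit for this checkpoint
        output.collect(new StreamRecord<>(currentFile.closeAndUpload()));
        currentFile = new FileAppender<>(); // start a fresh file for the next cycle
    }
}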

Right now, both the Iceberg writer and committer operators run inside task
managers. That has one major drawback: with the Iceberg sink, embarrassingly
parallel jobs won't be embarrassingly parallel anymore, which breaks the
benefit of region recovery for an embarrassingly parallel DAG. Conceptually,
the Writer-Committer sink pattern is like a mirror of the FLIP-27
Enumerator-Reader source pattern. It would be better *if the committer could
run inside the job manager* like the SplitEnumerator for the FLIP-27
source.

---
Additional questions regarding the doc/API
* Any example for the writer shared state (Writer#snapshotSharedState)?
* We allow the case where the writer has no state, right? Meaning WriterS
can be Void.

[1] https://iceberg.apache.org/

Thanks,
Steven

On Thu, Sep 10, 2020 at 6:44 AM Guowei Ma  wrote:

> Hi, devs & users
>
> As discussed in FLIP-131[1], Flink will deprecate the DataSet API in favor
> of DataStream API and Table API. Users should be able to use DataStream API
> to write jobs that support both bounded and unbounded execution modes.
> However, Flink does not provide a sink API to guarantee the Exactly-once
> semantics in both bounded and unbounded scenarios, which blocks the
> unification.
>
> So we want to introduce a new unified sink API which could let the user
> develop the sink once and run it everywhere. You could find more details in
> FLIP-143[2].
>
> The FLIP contains some open questions that I'd really appreciate inputs
> from the community. Some of the open questions include:
>
>1. We provide two alternative Sink API in the FLIP. The only
>difference between the two versions is how to expose the state to the user.
>We want to know which one is your preference?
>2. How does the sink API support to write to the Hive?
>3. Is the sink an operator or a topology?
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-143%3A+Unified+Sink+API
>
> Best,
> Guowei
>


Re: [DISCUSS] Planning Flink 1.12

2020-08-14 Thread Steven Wu
What about the work of migrating some Flink sources to the new FLIP-27
source interface? They are not listed in the 1.12 release wiki page.

On Thu, Aug 13, 2020 at 6:51 PM Dian Fu  wrote:

> Hi Rodrigo,
>
> Both FLIP-130 and FLIP-133 will be in the list of 1.12. Besides, there are
> also some other features from PyFlink side in 1.12. More details could be
> found in the wiki page(
> https://cwiki.apache.org/confluence/display/FLINK/1.12+Release <
> https://cwiki.apache.org/confluence/display/FLINK/1.12+Release>).
>
> Regards,
> Dian
>
> > On Aug 14, 2020, at 9:37 AM, rodrigobrochado  wrote:
> >
> > Hi,
> >
> > I hope it's not too late to ask, but would FLIP-130 [1] and FLIP-133 [2]
> be
> > considered? I think that it would be nice to have some details of pyFlink
> > Datastreams API (FLIP-130) on the roadmap, giving us (users) more
> insights
> > into what we can expect from pyFlink in the near future.
> >
> >
> > [1]
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-130-Support-for-Python-DataStream-API-Stateless-Part-td43035.html
> > [2]
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-133-Rework-PyFlink-Documentation-tt43570.html
> >
> >
> > Thanks,
> > Rodrigo
> >
> >
> >
> > --
> > Sent from:
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
>
>


Re: [VOTE] Release 1.11.0, release candidate #4

2020-07-04 Thread Steven Wu
+1 (non-binding)

- rolled out to thousands of router jobs in our test env
- tested with a large-state job. Did simple resilience and
checkpoint/savepoint tests. General performance metrics look on par.
- tested with a high-parallelism stateless transformation job. General
performance metrics look on par.

On Sat, Jul 4, 2020 at 7:39 AM Zhijiang 
wrote:

> Hi Thomas,
>
> Thanks for the further update information.
>
> I guess we can dismiss the network stack changes, since in your case the
> downstream and upstream would probably be deployed in the same slot
> bypassing the network data shuffle.
> Also I guess release-1.11 will not bring general performance regression in
> runtime engine, as we also did the performance testing for all general
> cases by [1] in real cluster before and the testing results should fit the
> expectation. But we indeed did not test the specific source and sink
> connectors yet as I known.
>
> Regarding your performance regression with 40%, I wonder it is probably
> related to specific source/sink changes (e.g. kinesis) or environment
> issues with corner case.
> If possible, it would be helpful to further locate whether the regression
> is caused by kinesis, by replacing the kinesis source & sink and keeping
> the others same.
>
> As you said, it would be efficient to contact with you directly next week
> to further discuss this issue. And we are willing/eager to provide any help
> to resolve this issue soon.
>
> Besides that, I guess this issue should not be the blocker for the
> release, since it is probably a corner case based on the current analysis.
> If we really conclude anything need to be resolved after the final
> release, then we can also make the next minor release-1.11.1 come soon.
>
> [1] https://issues.apache.org/jira/browse/FLINK-18433
>
> Best,
> Zhijiang
>
>
> --
> From:Thomas Weise 
> Send Time: Sat, Jul 4, 2020 12:26
> To:dev ; Zhijiang 
> Cc:Yingjie Cao 
> Subject:Re: [VOTE] Release 1.11.0, release candidate #4
>
> Hi Zhijiang,
>
> It will probably be best if we connect next week and discuss the issue
> directly since this could be quite difficult to reproduce.
>
> Before the testing result on our side comes out for your respective job
> case, I have some other questions to confirm for further analysis:
> -  How much percentage regression you found after switching to 1.11?
>
> ~40% throughput decline
>
> -  Are there any network bottleneck in your cluster? E.g. the network
> bandwidth is full caused by other jobs? If so, it might have more effects
> by above [2]
>
> The test runs on a k8s cluster that is also used for other production jobs.
> There is no reason be believe network is the bottleneck.
>
> -  Did you adjust the default network buffer setting? E.g.
> "taskmanager.network.memory.floating-buffers-per-gate" or
> "taskmanager.network.memory.buffers-per-channel"
>
> The job is using the defaults, i.e we don't configure the settings. If you
> want me to try specific settings in the hope that it will help to isolate
> the issue please let me know.
>
> -  I guess the topology has three vertexes "KinesisConsumer -> Chained
> FlatMap -> KinesisProducer", and the partition mode for "KinesisConsumer ->
> FlatMap" and "FlatMap->KinesisProducer" are both "forward"? If so, the edge
> connection is one-to-one, not all-to-all, then the above [1][2] should no
> effects in theory with default network buffer setting.
>
> There are only 2 vertices and the edge is "forward".
>
> - By slot sharing, I guess these three vertex parallelism task would
> probably be deployed into the same slot, then the data shuffle is by memory
> queue, not network stack. If so, the above [2] should no effect.
>
> Yes, vertices share slots.
>
> - I also saw some Jira changes for kinesis in this release, could you
> confirm that these changes would not effect the performance?
>
> I will need to take a look. 1.10 already had a regression introduced by the
> Kinesis producer update.
>
>
> Thanks,
> Thomas
>
>
> On Thu, Jul 2, 2020 at 11:46 PM Zhijiang  .invalid>
> wrote:
>
> > Hi Thomas,
> >
> > Thanks for your reply with rich information!
> >
> > We are trying to reproduce your case in our cluster to further verify it,
> > and  @Yingjie Cao is working on it now.
> >  As we have not kinesis consumer and producer internally, so we will
> > construct the common source and sink instead in the case of backpressure.
> >
> > Firstly, we can dismiss the rockdb factor in this release, since you also
> > mentioned that "filesystem leads to same symptoms".
> >
> > Secondly, if my understanding is right, you emphasis that the regression
> > only exists for the jobs with low checkpoint interval (10s).
> > Based on that, I have two suspicions with the network related changes in
> > this release:
> > - [1]: Limited the maximum backlog value (default 10) in subpartition
> > queue.
> > - [2]: Delay send the following 

Re: [DISCUSS] (Document) Backwards Compatibility of Savepoints

2020-06-06 Thread Steven Wu
> Why do we want to restore from the savepoint taken the new Flink version
instead of the previous savepoint, is that we want to minimize the source
rewind?

You are exactly right. E.g., a user upgraded to the new version for a few
days and decided to roll back to the old version due to some stability
issue. The previous savepoint for the old version was taken a few days ago,
which is a long time to rewind and reprocess. It can even fall outside the
Kafka retention period.

On Fri, Jun 5, 2020 at 8:13 PM Congxian Qiu  wrote:

> Sorry for jumping in late.
>
> Currently, we only have a forward-compatible guarantee and do not have the
> backward-compatible guarantee. And as this may take a large effort to
> support the backward-compatible guarantee. so I agree that we should write
> this down explicitly.
>
> For the given scenario, I have a little question: Why do we want to restore
> from the savepoint taken the new Flink version instead of the previous
> savepoint, is that we want to minimize the source rewind?
>
> Best,
> Congxian
>
>
On Wed, Jun 3, 2020 at 9:08 AM Steven Wu  wrote:
>
> > Current Flink documentation is actually pretty clear about no guarantees
> > for backward compatibility.
> >
> >
> https://ci.apache.org/projects/flink/flink-docs-stable/ops/upgrading.html#compatibility-table
> >
> > On Tue, Jun 2, 2020 at 3:20 AM Yun Tang  wrote:
> >
> > > Since Flink lacks of such kind of experiments to ensure the backwards
> > > compatibility of savepoints before, especially those built-in operators
> > > with their own operator state.
> > > I am afraid we need huge energy to cover all cases to give the most
> > > correct result.
> > >
> > > I prefer to just point out this in documentation to say explicitly
> Flink
> > > does not guarantee such kind of backwards compatibility.
> > >
> > > Best
> > > Yun Tang
> > > 
> > > From: Ufuk Celebi 
> > > Sent: Wednesday, May 27, 2020 16:42
> > > To: dev@flink.apache.org 
> > > Subject: Re: [DISCUSS] (Document) Backwards Compatibility of Savepoints
> > >
> > > I agree with Konstantin and Steven that it makes sense to point this
> out
> > > explicitly.
> > >
> > > I think that the following would be helpful:
> > >
> > > 1/ Mention breaking compatibility in release notes
> > >
> > > 2/ Update the linked table to reflect compatibilities while pointing
> out
> > > what the community commits to maintain going forward (e.g. "happens to
> > > work" vs. "guaranteed to work")
> > >
> > > In general, the table is quite large. Would it make sense to order the
> > > releases in reverse order (assuming that the table is more relevant for
> > > recent releases)?
> > >
> > > – Ufuk
> > >
> > > On Tue, May 26, 2020 at 8:36 PM Steven Wu 
> wrote:
> > >
> > > > > A use case for this might be when you want to rollback a framework
> > > > upgrade (after some time) due to e.g. a performance
> > > > or stability issue.
> > > >
> > > > Downgrade (that Konstantin called out) is an important and realistic
> > > > scenario. It will be great to support backward compatibility for
> > > savepoint
> > > > or at least document any breaking change.
> > > >
> > > > On Tue, May 26, 2020 at 4:39 AM Piotr Nowojski 
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > It might have been implicit choice, but so far we were not
> supporting
> > > the
> > > > > scenario that you are asking for. It has never been tested and we
> > have
> > > > > lot’s of state migration code sprinkled among our code base (for
> > > example
> > > > > upgrading state fields of the operators like [1]), that only
> supports
> > > > > upgrades, not downgrades.
> > > > >
> > > > > Also we do not have testing infrastructure for checking the
> > downgrades.
> > > > We
> > > > > would need to check if save points taken from master branch, are
> > > readable
> > > > > by previous releases (not release branch!).
> > > > >
> > > > > So all in all, I don’t think it can be easily done. It would
> require
> > > some
> > > > > effort to start maintaining backward compatibility.
> > > > >
> > > > > Piotrek
> > &g

Re: [DISCUSS] (Document) Backwards Compatibility of Savepoints

2020-06-02 Thread Steven Wu
Current Flink documentation is actually pretty clear about no guarantees
for backward compatibility.
https://ci.apache.org/projects/flink/flink-docs-stable/ops/upgrading.html#compatibility-table

On Tue, Jun 2, 2020 at 3:20 AM Yun Tang  wrote:

> Since Flink lacks of such kind of experiments to ensure the backwards
> compatibility of savepoints before, especially those built-in operators
> with their own operator state.
> I am afraid we need huge energy to cover all cases to give the most
> correct result.
>
> I prefer to just point out this in documentation to say explicitly Flink
> does not guarantee such kind of backwards compatibility.
>
> Best
> Yun Tang
> 
> From: Ufuk Celebi 
> Sent: Wednesday, May 27, 2020 16:42
> To: dev@flink.apache.org 
> Subject: Re: [DISCUSS] (Document) Backwards Compatibility of Savepoints
>
> I agree with Konstantin and Steven that it makes sense to point this out
> explicitly.
>
> I think that the following would be helpful:
>
> 1/ Mention breaking compatibility in release notes
>
> 2/ Update the linked table to reflect compatibilities while pointing out
> what the community commits to maintain going forward (e.g. "happens to
> work" vs. "guaranteed to work")
>
> In general, the table is quite large. Would it make sense to order the
> releases in reverse order (assuming that the table is more relevant for
> recent releases)?
>
> – Ufuk
>
> On Tue, May 26, 2020 at 8:36 PM Steven Wu  wrote:
>
> > > A use case for this might be when you want to rollback a framework
> > upgrade (after some time) due to e.g. a performance
> > or stability issue.
> >
> > Downgrade (that Konstantin called out) is an important and realistic
> > scenario. It will be great to support backward compatibility for
> savepoint
> > or at least document any breaking change.
> >
> > On Tue, May 26, 2020 at 4:39 AM Piotr Nowojski 
> > wrote:
> >
> > > Hi,
> > >
> > > It might have been implicit choice, but so far we were not supporting
> the
> > > scenario that you are asking for. It has never been tested and we have
> > > lot’s of state migration code sprinkled among our code base (for
> example
> > > upgrading state fields of the operators like [1]), that only supports
> > > upgrades, not downgrades.
> > >
> > > Also we do not have testing infrastructure for checking the downgrades.
> > We
> > > would need to check if save points taken from master branch, are
> readable
> > > by previous releases (not release branch!).
> > >
> > > So all in all, I don’t think it can be easily done. It would require
> some
> > > effort to start maintaining backward compatibility.
> > >
> > > Piotrek
> > >
> > > [1]
> > >
> >
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011#migrateNextTransactionalIdHindState
> > >
> > > > On 26 May 2020, at 13:18, Konstantin Knauf 
> wrote:
> > > >
> > > > Hi everyone,
> > > >
> > > > I recently stumbled across the fact that Savepoints created with
> Flink
> > > 1.11
> > > > can not be read by Flink 1.10. A use case for this might be when you
> > want
> > > > to rollback a framework upgrade (after some time) due to e.g. a
> > > performance
> > > > or stability issue.
> > > >
> > > > From the documentation [1] it seems as if the Savepoint format is
> > > generally
> > > > only forward-compatible although in many cases it is actually also
> > > > backwards compatible (e.g. Savepoint taken in Flink 1.10, restored
> with
> > > > Flink 1.9).
> > > >
> > > > Was it a deliberate choice not to document any backwards
> compatibility?
> > > If
> > > > not, should we add the missing entries in the compatibility table?
> > > >
> > > > Thanks,
> > > >
> > > > Konstantin
> > > >
> > > > [1]
> > > >
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/ops/upgrading.html#compatibility-table
> > > >
> > > > --
> > > >
> > > > Konstantin Knauf
> > > >
> > > > https://twitter.com/snntrable
> > > >
> > > > https://github.com/knaufk
> > >
> > >
> >
>


Re: [DISCUSS] (Document) Backwards Compatibility of Savepoints

2020-05-26 Thread Steven Wu
> A use case for this might be when you want to rollback a framework
upgrade (after some time) due to e.g. a performance
or stability issue.

Downgrading (which Konstantin called out) is an important and realistic
scenario. It would be great to support backward compatibility for savepoints,
or at least document any breaking change.

On Tue, May 26, 2020 at 4:39 AM Piotr Nowojski  wrote:

> Hi,
>
> It might have been implicit choice, but so far we were not supporting the
> scenario that you are asking for. It has never been tested and we have
> lot’s of state migration code sprinkled among our code base (for example
> upgrading state fields of the operators like [1]), that only supports
> upgrades, not downgrades.
>
> Also we do not have testing infrastructure for checking the downgrades. We
> would need to check if save points taken from master branch, are readable
> by previous releases (not release branch!).
>
> So all in all, I don’t think it can be easily done. It would require some
> effort to start maintaining backward compatibility.
>
> Piotrek
>
> [1]
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011#migrateNextTransactionalIdHindState
>
> > On 26 May 2020, at 13:18, Konstantin Knauf  wrote:
> >
> > Hi everyone,
> >
> > I recently stumbled across the fact that Savepoints created with Flink
> 1.11
> > can not be read by Flink 1.10. A use case for this might be when you want
> > to rollback a framework upgrade (after some time) due to e.g. a
> performance
> > or stability issue.
> >
> > From the documentation [1] it seems as if the Savepoint format is
> generally
> > only forward-compatible although in many cases it is actually also
> > backwards compatible (e.g. Savepoint taken in Flink 1.10, restored with
> > Flink 1.9).
> >
> > Was it a deliberate choice not to document any backwards compatibility?
> If
> > not, should we add the missing entries in the compatibility table?
> >
> > Thanks,
> >
> > Konstantin
> >
> > [1]
> >
> https://ci.apache.org/projects/flink/flink-docs-master/ops/upgrading.html#compatibility-table
> >
> > --
> >
> > Konstantin Knauf
> >
> > https://twitter.com/snntrable
> >
> > https://github.com/knaufk
>
>


Re: [DISCUSS] FLIP-118: Improve Flink’s ID system

2020-03-30 Thread Steven Wu
+1 on allowing a user-defined resourceId for the taskmanager
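
As a minimal illustration of the idea quoted below (the RESOURCE_ID environment
variable is hypothetical, not an existing Flink option): the taskmanager could
prefer a user-supplied value and only fall back to the current random-UUID
behavior.

import java.util.UUID;

final class ResourceIdResolver {

    // e.g. set RESOURCE_ID to the Kubernetes pod name in the pod spec
    static String resolveResourceId() {
        String fromEnv = System.getenv("RESOURCE_ID");
        return (fromEnv != null && !fromEnv.isEmpty()) ? fromEnv : UUID.randomUUID().toString();
    }
}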

On Sun, Mar 29, 2020 at 7:24 PM Yang Wang  wrote:

> Hi Konstantin,
>
> I think it is a good idea. Currently, our users also report a similar issue
> with
> resourceId of standalone cluster. When we start a standalone cluster now,
> the `TaskManagerRunner` always generates a uuid for the resourceId. It will
> be used to register to the jobmanager and not convenient to match with the
> real
> taskmanager, especially in container environment.
>
> I think a probably solution is we could support the user defined
> resourceId.
> We could get it from the environment. For standalone on K8s, we could set
> the "RESOURCE_ID" env to the pod name so that it is easier to match the
> taskmanager with K8s pod.
>
> Moreover, i am afraid we could not set the pod name to the resourceId. I
> think
> you could set the "deployment.meta.name". Since the pod name is generated
> by
> K8s in the pattern {deployment.meta.nane}-{rc.uuid}-{uuid}. On the
> contrary, we
> will set the resourceId to the pod name.
>
>
> Best,
> Yang
>
> On Sun, Mar 29, 2020 at 8:06 PM Konstantin Knauf  wrote:
>
> > Hi Yangze, Hi Till,
> >
> > thanks you for working on this topic. I believe it will make debugging
> > large Apache Flink deployments much more feasible.
> >
> > I was wondering whether it would make sense to allow the user to specify
> > the Resource ID in standalone setups?  For example, many users still
> > implicitly use standalone clusters on Kubernetes (the native support is
> > still experimental) and in these cases it would be interesting to also
> set
> > the PodName as the ResourceID. What do you think?
> >
> > Cheers,
> >
> > Kosntantin
> >
> > On Thu, Mar 26, 2020 at 6:49 PM Till Rohrmann 
> > wrote:
> >
> > > Hi Yangze,
> > >
> > > thanks for creating this FLIP. I think it is a very good improvement
> > > helping our users and ourselves understanding better what's going on in
> > > Flink.
> > >
> > > Creating the ResourceIDs with host information/pod name is a good idea.
> > >
> > > Also deriving ExecutionGraph IDs from their superset ID is a good idea.
> > >
> > > The InstanceID is used for fencing purposes. I would not make it a
> > > composition of the ResourceID + a monotonically increasing number. The
> > > problem is that in case of a RM failure the InstanceIDs would start
> from
> > 0
> > > again and this could lead to collisions.
> > >
> > > Logging more information on how the different runtime IDs are
> correlated
> > is
> > > also a good idea.
> > >
> > > Two other ideas for simplifying the ids are the following:
> > >
> > > * The SlotRequestID was introduced because the SlotPool was a separate
> > > RpcEndpoint a while ago. With this no longer being the case I think we
> > > could remove the SlotRequestID and replace it with the AllocationID.
> > > * Instead of creating new SlotRequestIDs for multi task slots one could
> > > derive them from the SlotRequestID used for requesting the underlying
> > > AllocatedSlot.
> > >
> > > Given that the slot sharing logic will most likely be reworked with the
> > > pipelined region scheduling, we might be able to resolve these two
> points
> > > as part of the pipelined region scheduling effort.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Thu, Mar 26, 2020 at 10:51 AM Yangze Guo 
> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > We would like to start a discussion thread on "FLIP-118: Improve
> > > > Flink’s ID system"[1].
> > > >
> > > > This FLIP mainly discusses the following issues, target to enhance
> the
> > > > readability of IDs in log and help user to debug in case of failures:
> > > >
> > > > - Enhance the readability of the string literals of IDs. Most of them
> > > > are hashcodes, e.g. ExecutionAttemptID, which do not provide much
> > > > meaningful information and are hard to recognize and compare for
> > > > users.
> > > > - Log the ID’s lineage information to make debugging more convenient.
> > > > Currently, the log fails to always show the lineage information
> > > > between IDs. Finding out relationships between entities identified by
> > > > given IDs is a common demand, e.g., slot of which AllocationID is
> > > > assigned to satisfy slot request of with SlotRequestID. Absence of
> > > > such lineage information, it’s impossible to track the end to end
> > > > lifecycle of an Execution or a Task now, which makes debugging
> > > > difficult.
> > > >
> > > > Key changes proposed in the FLIP are as follows:
> > > >
> > > > - Add location information to distributed components
> > > > - Add topology information to graph components
> > > > - Log the ID’s lineage information
> > > > - Expose the identifier of distributing component to user
> > > >
> > > > Please find more details in the FLIP wiki document [1]. Looking
> forward
> > > to
> > > > your feedbacks.
> > > >
> > > > [1]
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=148643521
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> 

Re: [VOTE] Release 1.10.0, release candidate #1

2020-02-02 Thread Steven Wu
I filed a small issue regarding readability for memory configurations. It
is not a blocking issue. I already attached a PR.
https://issues.apache.org/jira/browse/FLINK-15846

On Fri, Jan 31, 2020 at 9:20 PM Thomas Weise  wrote:

> As part of testing the RC, I run into the following issue with a test case
> that runs a job from a packaged jar on a MiniCluster. This test had to be
> modified due to the client-side API changes in 1.10.
>
> The issue is that the jar file that also contains the entry point isn't
> part of the user classpath on the task manager. The entry point is executed
> successfully; when removing all user code from the job graph, the test
> passes.
>
> If the jar isn't shipped automatically to the task manager, what do I need
> to set for it to occur?
>
> Thanks,
> Thomas
>
>
>   @ClassRule
>   public static final MiniClusterResource MINI_CLUSTER_RESOURCE = new
> MiniClusterResource(
>   new MiniClusterResourceConfiguration.Builder()
>   .build());
>
>   @Test(timeout = 3)
>   public void test() throws Exception {
> final URI restAddress =
> MINI_CLUSTER_RESOURCE.getMiniCluster().getRestAddress().get();
> Configuration config = new Configuration();
> config.setString(JobManagerOptions.ADDRESS, restAddress.getHost());
> config.setString(RestOptions.ADDRESS, restAddress.getHost());
> config.setInteger(RestOptions.PORT, restAddress.getPort());
> config.set(CoreOptions.DEFAULT_PARALLELISM, 1);
> config.setString(DeploymentOptions.TARGET, RemoteExecutor.NAME);
>
> String entryPoint = "my.TestFlinkJob";
>
> PackagedProgram.Builder program = PackagedProgram.newBuilder()
> .setJarFile(new File(JAR_PATH))
> .setEntryPointClassName(entryPoint);
>
> ClientUtils.executeProgram(DefaultExecutorServiceLoader.INSTANCE,
> config, program.build());
>   }
>
> The user function deserialization error:
>
> org.apache.flink.streaming.runtime.tasks.StreamTaskException: Cannot
> instantiate user function.
> at
>
> org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperatorFactory(StreamConfig.java:269)
> at
>
> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:115)
> at
>
> org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:433)
> at
>
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:461)
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.StreamCorruptedException: unexpected block data
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1581)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2158)
>
> With 1.9, the driver code would be:
>
> PackagedProgram program = new PackagedProgram(new File(JAR_PATH),
> entryPoint, new String[]{});
> RestClusterClient client = new RestClusterClient(config,
> "RemoteExecutor");
> client.run(program, 1);
>
> On Fri, Jan 31, 2020 at 9:16 PM Jingsong Li 
> wrote:
>
> > Thanks Jincheng,
> >
> > FLINK-15840 [1] should be a blocker, lead to
> > "TableEnvironment.from/scan(string path)" cannot be used for all
> > temporaryTable and catalogTable (not DataStreamTable). Of course, it can
> be
> > bypassed by "TableEnvironment.sqlQuery("select * from t")", but
> "from/scan"
> > are very important api of TableEnvironment and pure TableApi can't be
> used
> > seriously.
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-15840
> >
> > Best,
> > Jingsong Lee
> >
> > On Sat, Feb 1, 2020 at 12:47 PM Benchao Li  wrote:
> >
> > > Hi all,
> > >
> > > I also have a issue[1] which I think it's great to be included in 1.10
> > > release. The pr is already under review.
> > >
> > > [1] https://issues.apache.org/jira/projects/FLINK/issues/FLINK-15494
> > >
> > > On Sat, Feb 1, 2020 at 12:33 PM jincheng sun  wrote:
> > >
> > > > Hi folks,
> > > >
> > > > I found another issue related to Blink planner that
> ClassCastException
> > > > would be thrown when use ConnectorDescriptor to register the Source.
> > > > Not sure if it is a blocker. The issue can be found in [1], anyway,
> > it's
> > > > better to fix this issue in new RC.
> > > >
> > > > Best,
> > > > Jincheng
> > > >
> > > > [1] https://issues.apache.org/jira/browse/FLINK-15840
> > > >
> > > >
> > > >
> > > > Till Rohrmann  于2020年1月31日周五 下午10:29写道:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > -1, because flink-kubernetes does not have the correct NOTICE file.
> > > > >
> > > > > Here is the issue to track the problem [1].
> > > > >
> > > > > [1] https://issues.apache.org/jira/browse/FLINK-15837
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Fri, Jan 31, 2020 at 2:34 PM Xintong Song <
> tonysong...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thanks Gary & Yu,
> 

Re: [DISCUSS] FLIP-27: Refactor Source Interface

2020-01-15 Thread Steven Wu
Becket, is FLIP-27 still on track to be released in 1.10?

On Tue, Jan 7, 2020 at 7:04 PM Becket Qin  wrote:

> Hi folks,
>
> Happy new year!
>
> Stephan and I chatted offline yesterday. After reading the email thread
> again, I found that I have misunderstood Dawid's original proposal
> regarding the behavior of env.source(BoundedSource) and had an incorrect
> impression about the behavior of java covariant return type.
> Anyways, I agree what Dawid originally proposed makes sense, which is the
> following API:
>
> // Return a BoundedDataStream instance if the source is bounded.
> // Return a DataStream instance if the source is unbounded.
> DataStream env.source(Source);
>
> // Throws exception if the source is unbounded.
> // Used when users knows the source is bounded at programming time.
> BoundedDataStream env.boundedSource(Source);
>
> A BoundedDataStream only runs in batch execution mode.
> A DataStream only runs in streaming execution mode.
>
> To run a bounded source in streaming execution mode, one would do the
> following:
>
> // Return a DataStream instance with a source that will stop at some point;
> DataStream env.source(SourceUtils.asUnbounded(myBoundedSource));
>
> I'll update the FLIP wiki and resume the vote if there is no further
> concerns.
>
> Apologies for the misunderstanding and thanks for all the patient
> discussions.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
>
> On Mon, Dec 23, 2019 at 8:00 AM Becket Qin  wrote:
>
> > Hi Steven,
> >
> > I think the current proposal is what you mentioned - a Kafka source that
> > can be constructed in either BOUNDED or UNBOUNDED mode. And Flink can get
> > the boundedness by invoking getBoundedness().
> >
> > So one can create a Kafka source by doing something like the following:
> >
> > new KafkaSource().startOffset(),endOffset(); // A bounded instance.
> > new KafkaSource().startOffset(); // An unbounded instance.
> >
> > If users want to have an UNBOUNDED Kafka source that stops at some point.
> > They can wrap the BOUNDED Kafka source like below:
> >
> > SourceUtils.asUnbounded(new KafkaSource.startOffset().endOffset());
> >
> > The wrapped source would be an unbounded Kafka source that stops at the
> > end offset.
> >
> > Does that make sense?
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Fri, Dec 20, 2019 at 1:31 PM Jark Wu  wrote:
> >
> >> Hi,
> >>
> >> First of all, I think it is not called "UNBOUNDED", according to the
> >> FLIP-27, it is called "CONTINUOUS_UNBOUNDED".
> >> And from the description of the Boundedness in the FLIP-27[1] declares
> >> clearly what Becket and I think.
> >>
> >> public enum Boundedness {
> >>
> >> /**
> >>  * A bounded source processes the data that is currently available
> and
> >> will end after that.
> >>  *
> >>  * When a source produces a bounded stream, the runtime may
> >> activate
> >> additional optimizations
> >>  * that are suitable only for bounded input. Incorrectly producing
> >> unbounded data when the source
> >>  * is set to produce a bounded stream will often result in programs
> >> that do not output any results
> >>  * and may eventually fail due to runtime errors (out of memory or
> >> storage).
> >>  */
> >> BOUNDED,
> >>
> >> /**
> >>  * A continuous unbounded source continuously processes all data as
> it
> >> comes.
> >>  *
> >>  * The source may run forever (until the program is terminated)
> or
> >> might actually end at some point,
> >>  * based on some source-specific conditions. Because that is not
> >> transparent to the runtime,
> >>  * the runtime will use an execution mode for continuous unbounded
> >> streams whenever this mode
> >>  * is chosen.
> >>  */
> >> CONTINUOUS_UNBOUNDED
> >> }
> >>
> >> Best,
> >> Jark
> >>
> >> [1]:
> >>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface#FLIP-27:RefactorSourceInterface-Source
> >>
> >>
> >>
> >> On Fri, 20 Dec 2019 at 12:55, Steven Wu  wrote:
> >>
> >> > Becket,
> >> >
> >> > Regarding "UNBOUNDED source that stops at some point", I found it
> >> difficult
> >> > to gras

Re: [DISCUSS] FLIP-27: Refactor Source Interface

2019-12-19 Thread Steven Wu
Becket,

Regarding "UNBOUNDED source that stops at some point", I found it difficult
to grasp what UNBOUNDED really mean.

If we want to use a Kafka source with an end/stop time, I guess you would call
it an UNBOUNDED Kafka source that stops (aka BOUNDED-streaming). The
terminology is a little confusing to me. Maybe BOUNDED/UNBOUNDED shouldn't
be used to categorize sources. Just call it a Kafka source and let it run in
either BOUNDED or UNBOUNDED mode.
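
To make this concrete, here is a minimal sketch (purely illustrative; the
class and its boundedness logic are made up for this email) of a single
Kafka-style source whose boundedness is derived from its configuration rather
than from its type:

// Purely illustrative: one source type whose boundedness comes from how it
// is configured, not from which class was instantiated.
enum Boundedness { BOUNDED, CONTINUOUS_UNBOUNDED }

class KafkaLikeSource {
    private final long startOffset;
    private final Long endOffset; // null means "read forever"

    KafkaLikeSource(long startOffset, Long endOffset) {
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    Boundedness getBoundedness() {
        return endOffset == null
                ? Boundedness.CONTINUOUS_UNBOUNDED
                : Boundedness.BOUNDED;
    }
}

// Usage:
// new KafkaLikeSource(0L, 100L).getBoundedness()  -> BOUNDED
// new KafkaLikeSource(0L, null).getBoundedness()  -> CONTINUOUS_UNBOUNDED

The framework could then read getBoundedness() instead of asking users to pick
a "bounded" or "unbounded" source class up front.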

Thanks,
Steven

On Thu, Dec 19, 2019 at 7:02 PM Becket Qin  wrote:

> I had an offline chat with Jark, and here are some more thoughts:
>
> 1. From SQL perspective, BOUNDED source leads to the batch execution mode,
> UNBOUNDED source leads to the streaming execution mode.
> 2. The semantic of an UNBOUNDED source is "may or may not stop". The semantic
> of a BOUNDED source is "will stop".
> 3. The semantic of DataStream is "may or may not terminate". The semantic of
> BoundedDataStream is "will terminate".
>
> Given that, option 3 seems a better option because:
> 1. SQL already has strict binding between Boundedness and execution mode.
> Letting DataStream be consistent would be good.
> 2. The semantic of an UNBOUNDED source is exactly the same as DataStream's. So
> we should avoid breaking that semantic, i.e. turning some DataStream from
> "may or may not terminate" to "will terminate".
>
> For cases where users want the BOUNDED-streaming combination, they can simply
> use an UNBOUNDED source that stops at some point. We can even provide a
> simple wrapper to wrap a BOUNDED source as an UNBOUNDED source if that
> helps. But API wise, option 3 seems telling a pretty good whole story.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
>
>
> On Thu, Dec 19, 2019 at 10:30 PM Becket Qin  wrote:
>
> > Hi Timo,
> >
> > Bounded is just a special case of unbounded and every bounded source can
> >> also be treated as an unbounded source. This would unify the API if
> >> people don't need a bounded operation.
> >
> >
> > With option 3 users can still get a unified API with something like
> below:
> >
> > DataStream boundedStream = env.boundedSource(boundedSource);
> > DataStream unboundedStream = env.source(unboundedSource);
> >
> > So in both cases, users can still use a unified DataStream without
> > touching the bounded stream only methods.
> > By "unify the API if people don't need the bounded operation". Do you
> > expect a DataStream with a Bounded source to have the batch operators and
> > scheduler settings as well?
> >
> >
> > If we allow DataStream from BOUNDED source, we will essentially pick
> "*modified
> > option 2*".
> >
> > // The source is either bounded or unbounded, but only unbounded
> >> operations could be performed on the returned DataStream.
> >> DataStream dataStream = env.source(someSource);
> >
> >
> >> // The source must be a bounded source, otherwise exception is thrown.
> >> BoundedDataStream boundedDataStream =
> >> env.boundedSource(boundedSource);
> >
> >
> >
> > // Add the following method to DataStream
> >
> > Boundedness DataStream#getBoundedness();
> >
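> > As a minimal sketch (illustrative only, not actual Flink code; the
> > BoundedDataStream constructor is a placeholder), env.boundedSource() could
> > enforce the "must be bounded" contract like this:
> >
> > // Inside StreamExecutionEnvironment (sketch): reject non-BOUNDED sources.
> > public <T> BoundedDataStream<T> boundedSource(Source<T> source) {
> >     if (source.getBoundedness() != Boundedness.BOUNDED) {
> >         throw new IllegalArgumentException(
> >             "boundedSource() requires a BOUNDED source, but got: "
> >                 + source.getBoundedness());
> >     }
> >     return new BoundedDataStream<>(this, source);
> > }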
> >
> > From a purely logical perspective, Boundedness and runtime settings
> > (Stream/Batch) are two orthogonal dimensions, and are specified in the
> > following way.
> >
> > *Boundedness* - defined by the source: BOUNDED / UNBOUNDED.
> > *Running mode* - defined by the API class: DataStream (Streaming mode) /
> > BoundedDataStream (batch mode).
> >
> > Excluding the UNBOUNDED-batch combination, the "*modified option 2"*
> > covers the other three combinations. Compared with "*modified option 2*",
> > the main benefit of option 3 is its simplicity and clearness, by tying
> > boundedness to running mode and giving up BOUNDED-streaming combination.
> >
> > Just to be clear, I am fine with either option. But I would like to
> > understand a bit more about the bounded-streaming use case and when users
> > would prefer this over bounded-batch case, and whether the added value
> > justifies the additional complexity in the API. Two cases I can think of
> > are:
> > 1. The records in DataStream will be processed in order, while
> > BoundedDataStream processes records without order guarantee.
> > 2. DataStream emits intermediate results when processing a finite
> dataset,
> > while BoundedDataStream only emits the final result. In any case, it could
> > be supported by an UNBOUNDED source stopping at some point.
> >
> > Case 1 is actually misleading because DataStream in general doesn't
> really
> > support in-order processing.
> > Case 2 seems a rare use case because the instantaneous intermediate
> result
> > seems difficult to reason about. In any case, this can be supported by an
> > UNBOUNDED source that stops at some point.
> >
> > Are there other use cases for the bounded-streaming combination I missed? I
> > am a little hesitant to put the testing requirement here because ideally
> I'd
> > avoid having public APIs for testing purposes only. And this could be
> > resolved by having an UNBOUNDED source that stops at some point as well.
> >
> > 

Re: [ANNOUNCE] Progress of Apache Flink 1.10 #2

2019-11-01 Thread Steven Wu
Gary, FLIP-27 seems to have been omitted in the 2nd update. Below is the info
from update #1.

- FLIP-27: Refactor Source Interface [20]
-  FLIP accepted. Implementation is in progress.



On Fri, Nov 1, 2019 at 7:01 AM Gary Yao  wrote:

> Hi community,
>
> Because we have approximately one month of development time left until the
> targeted Flink 1.10 feature freeze, we thought now would be a good time to
> give another progress update. Below we have included a list of the ongoing
> efforts that have made progress since our last release progress update
> [1]. As
> always, if you are working on something that is not included here, feel
> free
> to use this thread to share your progress.
>
> - Support Java 11 [2]
> - Implementation is in progress (18/21 subtasks resolved)
>
> - Table API improvements
> - Full Data Type Support in Planner [3]
> - Implementing (1/8 subtasks resolved)
> - FLIP-66 Support Time Attribute in SQL DDL [4]
> - Implementation is in progress (1/7 subtasks resolved).
> - FLIP-70 Support Computed Column [5]
> - FLIP voting [6]
> - FLIP-63 Rework Table Partition Support [7]
> - Implementation is in progress (3/15 subtasks resolved).
> - FLIP-51 Rework of Expression Design [8]
> - Implementation is in progress (2/12 subtasks resolved).
> - FLIP-64 Support for Temporary Objects in Table Module [9]
> - Implementation is in progress
>
> - Hive compatibility completion (DDL/UDF) to support full Hive integration
> - FLIP-57 Rework FunctionCatalog [10]
> - Implementation is in progress (6/9 subtasks resolved)
> - FLIP-68 Extend Core Table System with Modular Plugins [11]
> - Implementation is in progress (2/8 subtasks resolved)
>
> - Finer grained resource management
> - FLIP-49: Unified Memory Configuration for TaskExecutors [12]
> - Implementation is in progress (6/10 subtasks resolved)
> - FLIP-53: Fine Grained Operator Resource Management [13]
> - Implementation is in progress (1/9 subtasks resolved)
>
> - Finish scheduler re-architecture [14]
> - Integration tests are being enabled for new scheduler
>
> - Executor/Client refactoring [15]
> - FLIP-81: Executor-related new ConfigOptions [16]
> - done
> - FLIP-73: Introducing Executors for job submission [17]
> - Implementation is in progress
>
> - FLIP-36 Support Interactive Programming [18]
> - Is built on top of FLIP-67 [19], which has been accepted
> - Implementation in progress
>
> - FLIP-58: Flink Python User-Defined Stateless Function for Table [20]
> - Implementation is in progress (12/22 subtask resolved)
> - FLIP-50: Spill-able Heap Keyed State Backend [21]
> - Implementation is in progress (2/11 subtasks resolved)
>
> - RocksDB Backend Memory Control [22]
> - FLIP for resource management on state backend will be opened soon
> - Write Buffer Manager will be backported to FRocksDB due to
> performance regression [23] in new RocksDB versions
>
> - Unaligned Checkpoints
> - FLIP-76 [24] was published and received positive feedback
> - Implementation is in progress
>
> - Separate framework and user class loader in per-job mode [25]
> - First PR is almost done. Remaining PRs will be ready next week
>
> - Active Kubernetes Integration [26]
> - Implementation is in progress (6/11 in review, 3/11 in progress,
> 2/11 todo)
>
> - FLIP-39 Flink ML pipeline and ML libs [27]
> - A few abstract ML classes have been merged (FLINK-13339,
> FLINK-13513)
> - Starting review of algorithms
>
> Again, the feature freeze is targeted to be at the end of November. Please
> make sure that all important work threads can be completed until that date.
> Feel free to use this thread to communicate any concerns about features
> that
> might not be finished until then. We will send another announcement later
> in
> the release cycle to make the date of the feature freeze official.
>
> Best,
> Yu & Gary
>
> [1] https://s.apache.org/wc0dc
> [2] https://issues.apache.org/jira/browse/FLINK-10725
> [3] https://issues.apache.org/jira/browse/FLINK-14079
> [4]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-66%3A+Support+time+attribute+in+SQL+DDL
> [5]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-70%3A+Flink+SQL+Computed+Column+Design
> [6]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-70-Flink-SQL-Computed-Column-Design-td34385.html
> [7]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-63%3A+Rework+table+partition+support
> [8]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-51%3A+Rework+of+the+Expression+Design
> [9]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
> [10]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
> [11]
> 

Re: [ANNOUNCE] Becket Qin joins the Flink PMC

2019-10-31 Thread Steven Wu
Congratulations, Becket!

On Wed, Oct 30, 2019 at 9:51 PM Shaoxuan Wang  wrote:

> Congratulations, Becket!
>
> On Mon, Oct 28, 2019 at 6:08 PM Fabian Hueske  wrote:
>
> > Hi everyone,
> >
> > I'm happy to announce that Becket Qin has joined the Flink PMC.
> > Let's congratulate and welcome Becket as a new member of the Flink PMC!
> >
> > Cheers,
> > Fabian
> >
>


Re: [SURVEY] How many people are using customized RestartStrategy(s)

2019-09-25 Thread Steven Wu
Zhu Zhu, that is correct.

On Tue, Sep 24, 2019 at 8:04 PM Zhu Zhu  wrote:

> Hi Steven,
>
> As a conclusion, since we will have a meter metric[1] for restarts,
> customized restart strategy is not needed in your case.
> Is that right?
>
> [1] https://issues.apache.org/jira/browse/FLINK-14164
>
> Thanks,
> Zhu Zhu
>
> On Wed, Sep 25, 2019 at 2:30 AM Steven Wu  wrote:
>
>> Zhu Zhu,
>>
>> Sorry, I was using different terminology. Yes, a Flink meter is what I was
>> talking about regarding "fullRestarts" for threshold-based alerting.
>>
>> On Mon, Sep 23, 2019 at 7:46 PM Zhu Zhu  wrote:
>>
>>> Steven,
>>>
>>> In my mind, Flink counter only stores its accumulated count and reports
>>> that value. Are you using an external counter directly?
>>> Maybe Flink Meter/MeterView is what you need? It stores the count and
>>> calculates the rate. And it will report its "count" as well as "rate" to
>>> external metric services.
>>>
>>> The counter "task_failures" only works if the individual failover
>>> strategy is enabled. However, it is not a public interface and its use is
>>> not recommended, as fine grained recovery (region failover) now
>>> supersedes it.
>>> I've opened a ticket[1] to add a metric to show failovers that respects
>>> fine grained recovery.
>>>
>>> [1] https://issues.apache.org/jira/browse/FLINK-14164
>>>
>>> Thanks,
>>> Zhu Zhu
>>>
>>> On Tue, Sep 24, 2019 at 6:41 AM Steven Wu  wrote:
>>>
>>>>
>>>> When we set up an alert like "fullRestarts > 1" for some rolling window, we
>>>> want to use a counter. If it is a Gauge, "fullRestarts" will never go below
>>>> 1 after the first full restart, so the alert condition will always be true
>>>> after the first job restart. If we can apply a derivative to the Gauge
>>>> value, I guess the alert can probably work. I can explore if that is an
>>>> option or not.
>>>>
>>>> Yeah. Understood that "fullRestart" won't increment when fine grained
>>>> recovery happened. I think "task_failures" counter already exists in Flink.
>>>>
>>>>
>>>>
>>>> On Sun, Sep 22, 2019 at 7:59 PM Zhu Zhu  wrote:
>>>>
>>>>> Steven,
>>>>>
>>>>> Thanks for the information. If we can determine this is a common issue,
>>>>> we can solve it in Flink core.
>>>>> To get to that state, I have two questions which need your help:
>>>>> 1. Why is gauge not good for alerting? The metric "fullRestart" is a
>>>>> Gauge. Does the metric reporter you use report Counter and
>>>>> Gauge to external services in different ways? Or can anything else be
>>>>> different due to the metric type?
>>>>> 2. Is the "number of restarts" what you actually need, rather than
>>>>> the "fullRestart" count? If so, I believe we will have such a counter
>>>>> metric in 1.10, since the previous "fullRestart" metric value is not the
>>>>> number of restarts when fine grained recovery (a feature added in 1.9.0) is enabled.
>>>>> "fullRestart" reveals how many times the entire job graph has been
>>>>> restarted. If fine grained recovery (a feature added in 1.9.0) is enabled, the graph
>>>>> would not be restarted when task failures happen and the "fullRestart"
>>>>> value will not increment in such cases.
>>>>>
>>>>> I'd appreciate if you can help with these questions and we can make
>>>>> better decisions for Flink.
>>>>>
>>>>> Thanks,
>>>>> Zhu Zhu
>>>>>
>>>>> On Sun, Sep 22, 2019 at 3:31 AM Steven Wu  wrote:
>>>>>
>>>>>> Zhu Zhu,
>>>>>>
>>>>>> Flink fullRestart metric is a Gauge, which is not good for alerting
>>>>>> on. We publish an equivalent Counter metric for alerting purpose.
>>>>>>
>>>>>> Thanks,
>>>>>> Steven
>>>>>>
>>>>>> On Thu, Sep 19, 2019 at 7:45 PM Zhu Zhu  wrote:
>>>>>>
>>>>>>> Thanks Steven for the feedback!
>>>>>>> Could you share more information about the metrics you add in your
>>>>>>> customized restart strategy?
>>>>>>>
>>>>>>> Th

Re: [SURVEY] How many people are using customized RestartStrategy(s)

2019-09-24 Thread Steven Wu
Zhu Zhu,

Sorry, I was using different terminology. Yes, a Flink meter is what I was
talking about regarding "fullRestarts" for threshold-based alerting.
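
For reference, a rough sketch of what I have in mind (illustrative only; it
uses a plain user function for simplicity and assumes the Counter-backed
MeterView that Zhu Zhu mentioned):

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;
import org.apache.flink.metrics.Meter;
import org.apache.flink.metrics.MeterView;

// Sketch: register a Counter plus a Counter-backed Meter so the reporter
// sees both an accumulating count and a rate that is usable for alerting.
class RestartAlertingFunction extends RichMapFunction<String, String> {

    private transient Counter restarts;
    private transient Meter restartRate;

    @Override
    public void open(Configuration parameters) {
        restarts = getRuntimeContext().getMetricGroup().counter("myRestarts");
        // MeterView wraps the counter and derives a per-second rate over a
        // sliding time window.
        restartRate = getRuntimeContext().getMetricGroup()
                .meter("myRestartsPerSecond", new MeterView(restarts));
    }

    @Override
    public String map(String value) {
        return value; // pass-through; restarts.inc() would be called where needed
    }
}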

On Mon, Sep 23, 2019 at 7:46 PM Zhu Zhu  wrote:

> Steven,
>
> In my mind, Flink counter only stores its accumulated count and reports
> that value. Are you using an external counter directly?
> Maybe Flink Meter/MeterView is what you need? It stores the count and
> calculates the rate. And it will report its "count" as well as "rate" to
> external metric services.
>
> The counter "task_failures" only works if the individual failover strategy
> is enabled. However, it is not a public interface and its use is not
> recommended, as fine grained recovery (region failover) now supersedes it.
> I've opened a ticket[1] to add a metric to show failovers that respects
> fine grained recovery.
>
> [1] https://issues.apache.org/jira/browse/FLINK-14164
>
> Thanks,
> Zhu Zhu
>
> On Tue, Sep 24, 2019 at 6:41 AM Steven Wu  wrote:
>
>>
>> When we set up an alert like "fullRestarts > 1" for some rolling window, we
>> want to use a counter. If it is a Gauge, "fullRestarts" will never go below
>> 1 after the first full restart, so the alert condition will always be true
>> after the first job restart. If we can apply a derivative to the Gauge
>> value, I guess the alert can probably work. I can explore if that is an
>> option or not.
>>
>> Yeah. Understood that "fullRestart" won't increment when fine grained
>> recovery happened. I think "task_failures" counter already exists in Flink.
>>
>>
>>
>> On Sun, Sep 22, 2019 at 7:59 PM Zhu Zhu  wrote:
>>
>>> Steven,
>>>
>>> Thanks for the information. If we can determine this is a common issue, we
>>> can solve it in Flink core.
>>> To get to that state, I have two questions which need your help:
>>> 1. Why is gauge not good for alerting? The metric "fullRestart" is a
>>> Gauge. Does the metric reporter you use report Counter and
>>> Gauge to external services in different ways? Or can anything else be
>>> different due to the metric type?
>>> 2. Is the "number of restarts" what you actually need, rather than
>>> the "fullRestart" count? If so, I believe we will have such a counter
>>> metric in 1.10, since the previous "fullRestart" metric value is not the
>>> number of restarts when fine grained recovery (a feature added in 1.9.0) is enabled.
>>> "fullRestart" reveals how many times the entire job graph has been
>>> restarted. If fine grained recovery (a feature added in 1.9.0) is enabled, the graph
>>> would not be restarted when task failures happen and the "fullRestart"
>>> value will not increment in such cases.
>>>
>>> I'd appreciate if you can help with these questions and we can make
>>> better decisions for Flink.
>>>
>>> Thanks,
>>> Zhu Zhu
>>>
>>> On Sun, Sep 22, 2019 at 3:31 AM Steven Wu  wrote:
>>>
>>>> Zhu Zhu,
>>>>
>>>> Flink fullRestart metric is a Gauge, which is not good for alerting on.
>>>> We publish an equivalent Counter metric for alerting purpose.
>>>>
>>>> Thanks,
>>>> Steven
>>>>
>>>> On Thu, Sep 19, 2019 at 7:45 PM Zhu Zhu  wrote:
>>>>
>>>>> Thanks Steven for the feedback!
>>>>> Could you share more information about the metrics you add in your
>>>>> customized restart strategy?
>>>>>
>>>>> Thanks,
>>>>> Zhu Zhu
>>>>>
>>>>> On Fri, Sep 20, 2019 at 7:11 AM Steven Wu  wrote:
>>>>>
>>>>>> We do use config like "restart-strategy:
>>>>>> org.foobar.MyRestartStrategyFactoryFactory". Mainly to add additional
>>>>>> metrics beyond the Flink-provided ones.
>>>>>>
>>>>>> On Thu, Sep 19, 2019 at 4:50 AM Zhu Zhu  wrote:
>>>>>>
>>>>>>> Thanks everyone for the input.
>>>>>>>
>>>>>>> The RestartStrategy customization is not recognized as a public
>>>>>>> interface as it is not explicitly documented.
>>>>>>> As it is not used from the feedbacks of this survey, I'll conclude
>>>>>>> that we do not need to support customized RestartStrategy for the new
>>>>>>> scheduler in Flink 1.10
>>>>>>>
>>>>>>> Othe
