[Discuss] expose TaskIOMetricGroup to custom Partitioner

2024-05-16 Thread Steven Wu
Hi, I am trying to implement a custom range partitioner in the Flink Iceberg sink. Want to publish some counter metrics for certain scenarios. This is like the network metrics exposed in `TaskIOMetricGroup`. This requires adding a new setup method to the custom `Partitioner` interface. Like to

Re: DataOutputSerializer serializing long UTF Strings

2024-01-22 Thread Steven Wu
I think this is a reasonable extension to `DataOutputSerializer`. Although 64 KB is not small, it is still possible to have long strings over that limit. There are already precedents of extended APIs `DataOutputSerializer`. E.g. public void setPosition(int position) {

Re: FW: [ANNOUNCE] New Apache Flink Committer - Alexander Fedulov

2024-01-03 Thread Steven Wu
Congra, Alex! Well deserved! On Wed, Jan 3, 2024 at 2:31 AM David Radley wrote: > Sorry for my typo. > > Many congratulations Alex! > > From: David Radley > Date: Wednesday, 3 January 2024 at 10:23 > To: David Anderson > Cc: dev@flink.apache.org > Subject: Re: [EXTERNAL] [ANNOUNCE] New

Re: [DISCUSS] Promote SinkV2 to @Public and deprecate SinkFunction

2023-02-06 Thread Steven Wu
> developers will *have a workable migration path from the old API to > > the > > > > > new API*. > > > > > > > > > > > > From a user's perspective, the workable migration path is very > > important. > > > > Otherwise, it blurs

Re: [DISCUSS] Promote SinkV2 to @Public and deprecate SinkFunction

2023-02-05 Thread Steven Wu
Regarding the discussion on global committer [1] for sinks with global transactions, there is no consensus on solving that problem in SinkV2. Will it require any breaking change in SinkV2? Also will SinkV1 be deprecated too? or it should happen sometime after SinkFunction deprecation? [1]

[DISCUSS] streaming shuffle to improve data clustering and tame small files problem

2023-01-30 Thread Steven Wu
Hi, We had a proposal to add a streaming shuffling stage in the Flink Iceberg sink to to improve data clustering and tame the small files problem [1]. Here are a couple of common use cases. * Event time partitioned table where we can get small files problem due to skewed and long-tail

Re: [DISCUSS] FLIP-264 Extract BaseCoordinatorContext

2023-01-30 Thread Steven Wu
unds good to me to implement the shuffle operator in the Iceberg > project first. > We can contribute it to Flink DataStream in the future if other > projects/connectors also need it. > > Best, > Jark > > > On Wed, 18 Jan 2023 at 02:11, Steven Wu wrote: > >> Jark, &

Re: [DISCUSS] FLIP-274 : Introduce metric group for OperatorCoordinator

2023-01-17 Thread Steven Wu
> Additionally, the configurable variables (operator name/id) are logically not attached to the coordinator, but operators, so to me it just doesn't make sense to structure it like this. Chesnay, maybe we should clarify the terminology. To me, pperators (like FLIP-27 source) can have two parts

Re: [DISCUSS] FLIP-264 Extract BaseCoordinatorContext

2023-01-17 Thread Steven Wu
>>>>> the line, e.g. by the ShuffleCoordinator. The OperatorCoordinator >> API >> >>>>> is a >> >>>>> non-public API and before reading the code, I wasn't even aware how >> >>>>> exactly >> >>>>> it worked and w

Re: [ANNOUNCE] New Apache Flink Committer - Matyas Orhidi

2022-11-21 Thread Steven Wu
Congrats, Matyas! On Mon, Nov 21, 2022 at 11:19 PM godfrey he wrote: > Congratulations, Matyas! > > Matthias Pohl 于2022年11月22日周二 13:40写道: > > > > Congratulations, Matyas :) > > > > On Tue, Nov 22, 2022 at 11:44 AM Xingbo Huang > wrote: > > > > > Congrats Matyas! > > > > > > Best, > > > Xingbo

Re: [DISCUSS] FLIP-264 Extract BaseCoordinatorContext

2022-10-27 Thread Steven Wu
ically > >>>> added to synchronize the global watermark and to assign splits > dynamically. > >>>> However, it practically allows arbitrary RPC calls between the task > and the > >>>> job manager. I understand that there is concern that such a

Re: [VOTE] FLIP 267: Iceberg Connector

2022-10-24 Thread Steven Wu
+1 (non-binding) On Mon, Oct 24, 2022 at 11:32 AM Martijn Visser wrote: > +1 (binding) > > On Thu, Oct 20, 2022 at 12:37 AM wrote: > > > Hi all, > > > > Thanks for all the feedback for FLIP 267[1]: Iceberg Connector in the > > discussion thread [2]. > > > > I would like to start a vote thread

Re: [VOTE] Externalized connector release details​

2022-10-20 Thread Steven Wu
Chesnay, thanks for the write-up. very helpful! Regarding the parent pom, I am wondering if it can be published to the `org.apache.flink` group? io.github.zentol.flink flink-connector-parent 1.0 On Mon, Oct 17, 2022 at 5:52 AM Chesnay Schepler wrote: > >

Re: [Discuss]- Donate Iceberg Flink Connector

2022-10-20 Thread Steven Wu
opose > >> https://github.com/apache/flink-connector-iceberg (doesn’t exist) > >> > >> Thanks > >> Abid > >> > >> On 2022/10/19 08:41:02 Martijn Visser wrote: > >>> Hi all, > >>> > >>> Thanks for the info and also

Re: Re: [Discuss]- Donate Iceberg Flink Connector

2022-10-17 Thread Steven Wu
I was one of the maintainers for the Flink Iceberg connector in Iceberg repo. I can volunteer as one of the initial maintainers if we decide to move forward. On Mon, Oct 17, 2022 at 3:26 PM wrote: > Hi Martijn, > > Yes, It is considered a connector in Flink terms. > > We wanted to join the

Re: [DISCUSS] FLIP-264 Extract BaseCoordinatorContext

2022-10-16 Thread Steven Wu
functionality to facilitate the communication between coordinator and subtasks. [1] https://issues.apache.org/jira/browse/FLINK-27405 On Sun, Oct 16, 2022 at 8:56 AM Steven Wu wrote: > Hang, appreciate your input. Agree that `CoordinatorContextBase` is a > better name considering Flin

Re: [DISCUSS] FLIP-264 Extract BaseCoordinatorContext

2022-10-16 Thread Steven Wu
ase class for SourceCoordinatorContext. > But I prefer to use the name `OperatorCoordinatorContextBase` or > `CoordinatorContextBase` as the format like `SourceReaderBase`. > I also agree to what Piotr said. Maybe more problems will occur when > connectors start to use it. > > Best, > Hang &g

Re: [DISCUSS] FLIP-264 Extract BaseCoordinatorContext

2022-10-14 Thread Steven Wu
erator > coordinators if there is a good reason behind that, but it is a more > difficult topic and might be a larger effort than it seems at the first > glance. > > Best, > Piotrek > > wt., 4 paź 2022 o 19:41 Steven Wu napisał(a): > > > Jing, thanks a lot fo

Re: [DISCUSS] Externalized connector release details

2022-10-12 Thread Steven Wu
With the model of externalized Flink connector repo (which I fully support), there is one challenge of supporting versions of two upstream projects (similar to what Peter Vary mentioned earlier). E.g., today the Flink Iceberg connector lives in Iceberg repo. We have separate modules 1.13, 1.14,

Re: [DISCUSS] FLIP-264 Extract BaseCoordinatorContext

2022-10-04 Thread Steven Wu
Jing, thanks a lot for your reply. The linked google doc is not for this FLIP, which is fully documented in the wiki page. The linked google doc is the design doc to introduce shuffling in Flink Iceberg sink, which motivated this FLIP proposal so that the shuffle coordinator can leverage the

Re: Sink V2 interface replacement for GlobalCommitter

2022-09-28 Thread Steven Wu
ster/flink/src/main/java/io/delta/flink/sink/internal/committer/DeltaGlobalCommitter.java > [3] > https://drive.google.com/file/d/1kU0R9nLZneJBDAkgNiaRc90dLGycyTec/view?usp=sharing > [4] https://lists.apache.org/thread/otscy199g1l9t3llvo8s2slntyn2r1jc > [5] > https://github.com/apach

Re: Sink V2 interface replacement for GlobalCommitter

2022-09-14 Thread Steven Wu
BDAkgNiaRc90dLGycyTec/view?usp=sharing > [4] https://lists.apache.org/thread/otscy199g1l9t3llvo8s2slntyn2r1jc > [5] > https://github.com/apache/flink/blob/release-1.15/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/operators/sink/committables/CommittableCollectorSeriali

Re: Sink V2 interface replacement for GlobalCommitter

2022-09-13 Thread Steven Wu
-Flink connector open source >> > community >> > > here [2]. >> > > >> > > I'm totally agree with Steven on this. Sink's V1 GlobalCommitter is >> > > something exactly what Flink-Delta Sink needs since it is the place >> where >>

Re: [ANNOUNCE] New Apache Flink PMC Member - Martijn Visser

2022-09-12 Thread Steven Wu
Congrats, Martijn! On Mon, Sep 12, 2022 at 1:49 PM Alexander Fedulov wrote: > Congrats, Martijn! > > On Mon, Sep 12, 2022 at 10:06 AM Jing Ge wrote: > > > Congrats! > > > > On Mon, Sep 12, 2022 at 9:38 AM Daisy Tsang wrote: > > > > > Congrats! > > > > > > On Mon, Sep 12, 2022 at 9:32 AM

Re: Sink V2 interface replacement for GlobalCommitter

2022-09-09 Thread Steven Wu
Log which should be done from a one > > > place/instance. > > > > > > Currently I'm evaluating V2 for our connector and having, how Steven > > > described it a "more natural, built-in concept/support of > GlobalCommitter > > > in the sink v2

Re: Sink V2 interface replacement for GlobalCommitter

2022-09-08 Thread Steven Wu
e are using the `WithPostCommitTopology` > for global committer, we would lose the capability of using the post commit > stage for small files compaction. > On Tue, Aug 16, 2022 at 9:53 AM Steven Wu wrote: > > > > In the V1 sink interface, there is a GlobalCommitter for Iceberg. Wit

Re: Sink V2 interface replacement for GlobalCommitter

2022-08-16 Thread Steven Wu
16, 2022 at 9:53 AM Steven Wu wrote: > > In the V1 sink interface, there is a GlobalCommitter for Iceberg. With the > V2 sink interface, GlobalCommitter has been deprecated by > WithPostCommitTopology. I thought the post commit stage is mainly for async > maintenance (like compa

Sink V2 interface replacement for GlobalCommitter

2022-08-16 Thread Steven Wu
In the V1 sink interface, there is a GlobalCommitter for Iceberg. With the V2 sink interface, GlobalCommitter has been deprecated by WithPostCommitTopology. I thought the post commit stage is mainly for async maintenance (like compaction). Are we supposed to do sth similar to the

Re: [VOTE] FLIP-217: Support watermark alignment of source splits

2022-08-04 Thread Steven Wu
+1 (non-binding) On Wed, Aug 3, 2022 at 5:47 AM Martijn Visser wrote: > +1 (binding) > > Op wo 3 aug. 2022 om 14:33 schreef Piotr Nowojski : > > > +1 (binding) > > > > śr., 3 sie 2022 o 14:13 Thomas Weise napisał(a): > > > > > +1 (binding) > > > > > > > > > On Sun, Jul 31, 2022 at 10:57 PM

Re: [DISCUSS] FLIP-238: Introduce FLIP-27-based Data Generator Source

2022-06-14 Thread Steven Wu
new Source API the > > >> checkpointing > > >> > > aspects that you based your logic on are pushed further away from > > the > > >> > > low-level interfaces responsible for handling data and splits [1]. > > At > > >> the &g

Re: [DISCUSS] FLIP-238: Introduce FLIP-27-based Data Generator Source

2022-06-07 Thread Steven Wu
In Iceberg source, we have a data generator source that can control the records per checkpoint cycle. Can we support sth like this in the DataGeneratorSource? https://github.com/apache/iceberg/blob/master/flink/v1.15/flink/src/test/java/org/apache/iceberg/flink/source/BoundedTestSource.java

Re: Source alignment for Iceberg

2022-05-06 Thread Steven Wu
might be the same as => might NOT be the same as On Fri, May 6, 2022 at 8:13 PM Steven Wu wrote: > The conclusion of this discussion could be that we don't see much value in > leveraging FLIP-182 with Iceberg source. That would totally be fine. > > For me, one big

Re: Source alignment for Iceberg

2022-05-06 Thread Steven Wu
work information to the user space in the future. > > > > Thanks, > > > > Jiangjie (Becket) Qin > > > > > > > >> On Fri, May 6, 2022 at 6:15 AM Thomas Weise wrote: > >> > >>> On Wed, May 4, 2022 at 11:03 AM Steven Wu > wr

Re: Source alignment for Iceberg

2022-05-05 Thread Steven Wu
m the split, so as long as it also knows > exactly which splits have finished, it would know which splits to hold back. > > Best, > Piotrek > > śr., 4 maj 2022 o 20:03 Steven Wu napisał(a): > >> Piotr, thanks a lot for your feedback. >> >> > I can see

Re: Source alignment for Iceberg

2022-05-04 Thread Steven Wu
gt; about it [1]. It sounds to me like you also need to solve this problem, > > otherwise Iceberg users will encounter late records in case of some race > > conditions between assigning new splits and completions of older. > > > > Best, > > Piotrek > > > &

Re: Source alignment for Iceberg

2022-05-01 Thread Steven Wu
e FLIP-182 and ff. threads. The goal of FLIP-182 is >> to pause readers while consuming a split, while your approach pauses >> readers before processing another split. So it feels more closely related >> to the global min watermark - so it could either be part of that F

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

2022-04-21 Thread Steven Wu
> However, a single source operator may read data from multiple splits/partitions, e.g., multiple Kafka partitions, such that even with watermark alignment the source operator may need to buffer excessive amount of data if one split emits data faster than another. For this part from the

Re: Re: Change of focus

2022-02-28 Thread Steven Wu
Till, thank you for your immense contributions to the project and the community. On Mon, Feb 28, 2022 at 9:16 PM Xintong Song wrote: > Thanks for everything, Till. It has been a great honor working with you. > Good luck with your new chapter~! > > Thank you~ > > Xintong Song > > > > On Tue, Mar

Re: [DISCUSS] FLIP-191: Extend unified Sink interface to support small file compaction

2021-11-08 Thread Steven Wu
> although I think only using a customizable shuffle won't address the generation of small files. One assumption is that at least the sink generates one file per subtask, which can already be too many. Another problem is that with low checkpointing intervals, the files do not meet the required

Re: [DISCUSS] FLIP-191: Extend unified Sink interface to support small file compaction

2021-11-06 Thread Steven Wu
Fabian, thanks a lot for the proposal and starting the discussion. We probably should first describe the different causes of small files and what problems was this proposal trying to solve. I wrote a data shuffling proposal [1] for Flink Iceberg sink (shared with Iceberg community [2]). It can

Re: [VOTE] FLIP-179: Expose Standardized Operator Metrics

2021-07-30 Thread Steven Wu
+1 (non-binding) On Fri, Jul 30, 2021 at 3:55 AM Arvid Heise wrote: > Dear devs, > > I'd like to open a vote on FLIP-179: Expose Standardized Operator Metrics > [1] which was discussed in this thread [2]. > The vote will be open for at least 72 hours unless there is an objection > or not enough

Re: [RESULT][VOTE] FLIP-147: Support Checkpoint After Tasks Finished

2021-07-21 Thread Steven Wu
> if a failure happens after sequence of finish() -> snapshotState(), but before notifyCheckpointComplete(), we will restore such a state and we might end up sending some more records to such an operator. I probably missed sth here. isn't this the case today already? Why is it a concern for the

Re: [DISCUSS] FLIP-179: Expose Standardized Operator Metrics

2021-07-19 Thread Steven Wu
> are you okay with the setLastFetchTimeGauge explanation or do you have > alternative ideas? > > Best, > > Arvid > > On Fri, Jul 16, 2021 at 8:13 PM Steven Wu wrote: > > > To avoid confusion, can we either rename "SourceMetricGroup" to " >

Re: [DISCUSS] FLIP-179: Expose Standardized Operator Metrics

2021-07-16 Thread Steven Wu
1] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-33%3A+Standardize+Connector+Metrics > > On Wed, Jul 14, 2021 at 9:00 AM Steven Wu wrote: > > > I am trying to understand what those two metrics really capture > > > > > G setPendingBytesGauge

Re: [DISCUSS] FLIP-183: Dynamic buffer size adjustment

2021-07-15 Thread Steven Wu
> > > > > So apart from adding buffer size information to the `AddCredit` > message, > > we > > > will need to support a case where upstream subpartition has already > > > produced a buffer with older size (for example 32KB), while the next > > credit > &

Re: [NOTICE] flink-runtime now scala-free

2021-07-15 Thread Steven Wu
This is awesome. Thank you, Chesney! On Wed, Jul 14, 2021 at 1:50 AM Yun Tang wrote: > Great news, thanks for Chesnay's work! > > Best > Yun Tang > > From: Martijn Visser > Sent: Wednesday, July 14, 2021 16:05 > To: dev@flink.apache.org > Subject: Re: [NOTICE]

Re: [DISCUSS] FLIP-179: Expose Standardized Operator Metrics

2021-07-14 Thread Steven Wu
I am trying to understand what those two metrics really capture > G setPendingBytesGauge(G pendingBytesGauge); - use file source as an example, it captures the remaining bytes for the current file split that the reader is processing? How would users interpret or use this metric?

Re: [DISCUSS] FLIP-183: Dynamic buffer size adjustment

2021-07-14 Thread Steven Wu
- The subtask observes the changes in the throughput and changes the buffer size during the whole life period of the task. - The subtask sends buffer size and number of available buffers to the upstream to the corresponding subpartition. - Upstream changes the buffer size

Re: [ANNOUNCE] New PMC member: Guowei Ma

2021-07-08 Thread Steven Wu
Awesome! Congratulations, Guowei! On Wed, Jul 7, 2021 at 4:25 AM Jingsong Li wrote: > Congratulations, Guowei! > > Best, > Jingsong > > On Wed, Jul 7, 2021 at 6:36 PM Arvid Heise wrote: > > > Congratulations! > > > > On Wed, Jul 7, 2021 at 11:30 AM Till Rohrmann > > wrote: > > > > >

Re: [VOTE] FLIP-150: Introduce Hybrid Source

2021-07-01 Thread Steven Wu
+1 (non-binding) On Thu, Jul 1, 2021 at 4:59 AM Thomas Weise wrote: > +1 (binding) > > > On Thu, Jul 1, 2021 at 8:13 AM Arvid Heise wrote: > > > +1 (binding) > > > > Thank you and Thomas for driving this > > > > On Thu, Jul 1, 2021 at 7:50 AM 蒋晓峰 wrote: > > > > > Hi everyone, > > > > > > > >

Re: Add control mode for flink

2021-06-08 Thread Steven Wu
>>>> * Iteration: When a certain condition is met, we might want to >>>>>> signal downstream operators with an event >>>>>> * Mini-batch assembling: Flink currently uses special watermarks >>>>>> for indicating the end of each mini-bat

Re: Re: Add control mode for flink

2021-06-08 Thread Steven Wu
;are checkpointed. > > > Steven Wu [via Apache Flink User Mailing List archive.] < > ml+s2336050n44278...@n4.nabble.com> 于2021年6月8日周二 下午2:15写道: > > > > > I can see the benefits of control flow. E.g., it might help the old (and > > inactive) FLIP-17 side

Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-06-08 Thread Steven Wu
to think if this is the most intuitive name for > new users? I'm especially hoping that natives might give some ideas (or > declare that Hybrid is perfect). > > [1] https://github.com/apache/flink/pull/15924#pullrequestreview-677376664 > > On Sun, Jun 6, 2021 at 7:47 PM Steven Wu

Re: Re: Add control mode for flink

2021-06-08 Thread Steven Wu
mainstream, it would be helpful to have an event >>>>> signaling the finishing of the bootstrap. >>>>> >>>>> ## Dynamic REST controlling >>>>> Back to the specific feature that Jiangang proposed, I personally >>>>> think it's

Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-06-06 Thread Steven Wu
gt; Thanks, > Thomas > > [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-150%3A+Introduce+Hybrid+Source#FLIP150:IntroduceHybridSource-Prototypeimplementation > [2] > https://github.com/apache/flink/pull/15924/files#diff-e07478b3cad9810925ec784b61ec0026396839cc5b27bd6d337a

Re: Add control mode for flink

2021-06-04 Thread Steven Wu
I am not sure if we should solve this problem in Flink. This is more like a dynamic config problem that probably should be solved by some configuration framework. Here is one post from google search:

Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-06-01 Thread Steven Wu
discussed the PR with Thosmas offline. Thomas, please correct me if I missed anything. Right now, the PR differs from the FLIP-150 doc regarding the converter. * Current PR uses the enumerator checkpoint state type as the input for the converter * FLIP-150 defines a new EndStateT interface. It

Re: [DISCUSS] FLIP-160: Declarative scheduler

2021-01-22 Thread Steven Wu
Till, thanks a lot for the proposal. Even if the initial phase is only to support scale-up, maybe the "ScaleUpController" interface should be called "RescaleController" so that in the future scale-down can be added. On Fri, Jan 22, 2021 at 7:03 AM Till Rohrmann wrote: > Hi everyone, > > I

Re: [DISCUSS] FLIP-159: Reactive Mode

2021-01-22 Thread Steven Wu
Thanks a lot for the proposal, Robert and Till. > No fixed parallelism for any of the operators Regarding this limitation, can the scheduler only adjust the default parallelism? if some operators set parallelism explicitly (like always 1), just leave them unchanged. On Fri, Jan 22, 2021 at

Re: Re: Re: [ANNOUNCE] Welcome Guowei Ma as a new Apache Flink Committer

2021-01-20 Thread Steven Wu
Congrats, Guowei! On Wed, Jan 20, 2021 at 10:32 AM Seth Wiesman wrote: > Congratulations! > > On Wed, Jan 20, 2021 at 3:41 AM hailongwang <18868816...@163.com> wrote: > > > Congratulations, Guowei! > > > > Best, > > Hailong > > > > 在 2021-01-20 15:55:24,"Till Rohrmann" 写道: > > >Congrats,

Re: [DISCUSS] Flink configuration from environment variables

2021-01-19 Thread Steven Wu
nse. > > > > > > > > > > 1. Not distinguishing JM/TM is reasonable, but what about the > client > > > > side. > > > > > For Yarn/K8s deployment, > > > > > the local flink-conf.yaml will be shipped to JM/TM. So I am just > > confuse

Re: [DISCUSS] Flink configuration from environment variables

2021-01-18 Thread Steven Wu
Variable substitution (proposed here) is definitely useful. For us, hierarchical override is more useful. E.g., we may have the default value of "state.checkpoints.dir=path1" defined in flink-conf.yaml. But maybe we want to override it to "state.checkpoints.dir=path2" via environment variable in

Re: [DISCUSS] Releasing Apache Flink 1.11.3

2020-11-03 Thread Steven Wu
source and later replace it by a dependency on the Flink file > source? > > On Mon, Nov 2, 2020 at 8:33 PM Steven Wu wrote: > > > Stephan, thanks a lot for explaining the file connector. that makes > sense. > > > > I was asking because we were trying to reuse some o

Re: [DISCUSS] Releasing Apache Flink 1.11.3

2020-11-02 Thread Steven Wu
ver, with the base connector changes backported, you should be able to > run the file connector code from master against 1.11.3. > > The collect() utils can be picked back, I see no issue with that (it is > isolated utilities). > > Best, > Stephan > > > On Mon, Nov 2, 2

Re: [DISCUSS] Releasing Apache Flink 1.11.3

2020-11-01 Thread Steven Wu
Basically, it would be great to get the latest code in the flink-connector-files (FLIP-27). On Sat, Oct 31, 2020 at 9:57 AM Steven Wu wrote: > Stephan, it will be great if we can also backport the DataStreamUtils > related commits that help with collecting output from unbounded streams.

Re: [DISCUSS] Releasing Apache Flink 1.11.3

2020-10-31 Thread Steven Wu
gt; >> > > > >> > > commit 4ea95782b4c6a2538153d4d16ad3f4839c7de0fb > >> > > [FLINK-19223][connectors] Simplify Availability Future Model in Base > >> > > Connector > >> > > > >> > > commit 511857049ba30c8f

Re: [DISCUSS] Releasing Apache Flink 1.11.3

2020-10-28 Thread Steven Wu
I would love to see this FLIP-27 source interface improvement [1] made to 1.11.3. [1] https://issues.apache.org/jira/browse/FLINK-19698 On Wed, Oct 28, 2020 at 12:32 AM Tzu-Li (Gordon) Tai wrote: > Thanks for the replies so far! > > Just to provide a brief update on the status of blockers for

Re: [VOTE] FLIP-135: Approximate Task-Local Recovery

2020-10-20 Thread Steven Wu
+1 (non-binding). Some of our users have asked for this tradeoff of consistency over availability for some cases. On Mon, Oct 19, 2020 at 8:02 PM Zhijiang wrote: > Thanks for driving this effort, Yuan. > > +1 (binding) on my side. > > Best, > Zhijiang > > >

Re: [VOTE] FLIP-143: Unified Sink API

2020-09-27 Thread Steven Wu
+1 (non-binding) Although I would love to continue the discussion for tweaking the CommitResult/GlobaCommitter interface maybe during the implementation phase. On Fri, Sep 25, 2020 at 5:35 AM Aljoscha Krettek wrote: > +1 (binding) > > Aljoscha > > On 25.09.20 14:26, Guowei Ma wrote: > > From

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-27 Thread Steven Wu
t the > beginning in 1.12 > > What do you think? > > Best, > Guowei > > > On Fri, Sep 25, 2020 at 9:30 PM Steven Wu wrote: > > > I should clarify my last email a little more. > > > > For the example of commits for checkpoints 1-100 failed, the j

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-25 Thread Steven Wu
sink implementations. Thanks, Steven On Fri, Sep 25, 2020 at 5:56 AM Steven Wu wrote: > > 1. The frame can not know which `GlobalCommT` to retry if we use the > > List as parameter when the `commit` returns `RETRY`. > > 2. Of course we can let the `commit` return more detailed i

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-25 Thread Steven Wu
; >> 1. The frame can not know which `GlobalCommT` to retry if we use the > >> List as parameter when the `commit` returns `RETRY`. > >> 2. Of course we can let the `commit` return more detailed info but it > >> might be too complicated. > >> 3. On the other hand, I thin

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-24 Thread Steven Wu
Guowei, Thanks a lot for updating the wiki page. It looks great. I noticed one inconsistency in the wiki with your last summary email for GlobalCommitter interface. I think the version in the summary email is the intended one, because rollover from previous failed commits can accumulate a list.

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-22 Thread Steven Wu
recoveredCommittables(List) ; } The most important need from the framework is to run GlobalCommitter in the jobmanager. It involves the topology creation, checkpoint handling, serializing the executions of commit() calls etc. Thanks, Steven On Tue, Sep 22, 2020 at 6:39 AM Steven Wu wrote: > It is f

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-22 Thread Steven Wu
It is fine to leave the CommitResult/RETRY outside the scope of framework. Then the framework might need to provide some hooks in the checkpoint/restore logic. because the commit happened in the post checkpoint completion step, sink needs to update the internal state when the commit is successful

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-21 Thread Steven Wu
tead of the simple merge() function. > > What do you think? > > Best, > Aljoscha > > On 21.09.20 10:06, Piotr Nowojski wrote: > > Hi Guowei, > > > >> I believe that we could support such an async sink writer > >> very easily in the fut

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-20 Thread Steven Wu
anks :) >> >> >> >> Is this called when restored from checkpoint/savepoint? >> >> Yes. >> >> >> >>Iceberg sink needs to do a dup check here on which GlobalCommT were >> committed and which weren't. Should it return the filtered/de-du

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-19 Thread Steven Wu
t; A possible alternative option would be let the user build the topology > himself. But considering we have two execution modes we could only use > `Writer` and `Committer` to build the sink topology. > > ### Build Topology Option > > Sink { > Sink addWriter(Writer

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-18 Thread Steven Wu
Aljoscha, > Instead the sink would have to check for each set of committables seperately if they had already been committed. Do you think this is feasible? Yes, that is how it works in our internal implementation [1]. We don't use checkpointId. We generate a manifest file (GlobalCommT) to bundle

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-17 Thread Steven Wu
Guowei Just to add to what Aljoscha said regarding the unique id. Iceberg sink checkpoints the unique id into state during snapshot. It also inserts the unique id into the Iceberg snapshot metadata during commit. When a job restores the state after failure, it needs to know if the restored

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-16 Thread Steven Wu
> I think this problem(How to dedupe the combined committed data) also >> > depends on where to place the agg/combine logic . >> > >> > 1. If the agg/combine takes place in the “commit” maybe we need to >> figure >> > out how to give the aggregated committab

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-15 Thread Steven Wu
ointId in API, as long as the internal bookkeeping groups data files by checkpoints for streaming mode. On Tue, Sep 15, 2020 at 6:58 AM Steven Wu wrote: > > images don't make it through to the mailing lists. You would need to > host the file somewhere and send a link. > > Sorry ab

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-15 Thread Steven Wu
; > extended commit outages without losing written/uploaded data files, as > > operator state size is as small as one manifest file per checkpoint cycle > > [2]. > > -- > > StateT snapshotState(SnapshotContext context) throws Exception; > > > > That mean

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-14 Thread Steven Wu
t(...) and > > the bookkeeping of the committables could be handled by the framework. I > > think something like an optional combiner in the GlobalCommitter would > > be enough. What do you think? > > > > GlobalCommitter { > > > > void commit(GlobalCommT global

Re: [DISCUSS] Deprecate and remove UnionList OperatorState

2020-09-13 Thread Steven Wu
Right now, we use UnionState to store the `nextCheckpointId` in the Iceberg sink use case, because we can't retrieve the checkpointId from the FunctionInitializationContext during the restore case. But we can move away from it if the restore context provides the checkpointId. On Sat, Sep 12, 2020

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-13 Thread Steven Wu
all outstanding transactions > > for the IDs when we restore from a savepoint. In order for this to work > > we need to recycle the IDs, so there needs to be a back-channel from the > > Committer to the Writter, or they need to share internal state. > > > > I don

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-10 Thread Steven Wu
Guowei, Thanks a lot for the proposal and starting the discussion thread. Very excited. For the big question of "Is the sink an operator or a topology?", I have a few related sub questions. * Where should we run the committers? * Is the committer parallel or single parallelism? * Can a single

Re: [DISCUSS] Planning Flink 1.12

2020-08-14 Thread Steven Wu
What about the work of migrating some Flink sources to the new FLIP-27 source interface? They are not listed in the 1.12 release wiki page. On Thu, Aug 13, 2020 at 6:51 PM Dian Fu wrote: > Hi Rodrigo, > > Both FLIP-130 and FLIP-133 will be in the list of 1.12. Besides, there are > also some

Re: [VOTE] Release 1.11.0, release candidate #4

2020-07-04 Thread Steven Wu
+1 (non-binding) - rolled out to thousands of router jobs in our test env - tested with a large-state job. Did simple resilience and checkpoint/savepoint tests. General performance metrics look on par. - tested with a high-parallelism stateless transformation job. General performance metrics look

Re: [DISCUSS] (Document) Backwards Compatibility of Savepoints

2020-06-06 Thread Steven Wu
estore > from the savepoint taken the new Flink version instead of the previous > savepoint, is that we want to minimize the source rewind? > > Best, > Congxian > > > Steven Wu 于2020年6月3日周三 上午9:08写道: > > > Current Flink documentation is actually pretty clear about n

Re: [DISCUSS] (Document) Backwards Compatibility of Savepoints

2020-06-02 Thread Steven Wu
le pointing out > what the community commits to maintain going forward (e.g. "happens to > work" vs. "guaranteed to work") > > In general, the table is quite large. Would it make sense to order the > releases in reverse order (assuming that the table is more releva

Re: [DISCUSS] (Document) Backwards Compatibility of Savepoints

2020-05-26 Thread Steven Wu
> A use case for this might be when you want to rollback a framework upgrade (after some time) due to e.g. a performance or stability issue. Downgrade (that Konstantin called out) is an important and realistic scenario. It will be great to support backward compatibility for savepoint or at least

Re: [DISCUSS] FLIP-118: Improve Flink’s ID system

2020-03-30 Thread Steven Wu
+1 on allowing user defined resourceId for taskmanager On Sun, Mar 29, 2020 at 7:24 PM Yang Wang wrote: > Hi Konstantin, > > I think it is a good idea. Currently, our users also report a similar issue > with > resourceId of standalone cluster. When we start a standalone cluster now, > the

Re: [VOTE] Release 1.10.0, release candidate #1

2020-02-02 Thread Steven Wu
I filed a small issue regarding readability for memory configurations. It is not a blocking issue. I already attached a PR. https://issues.apache.org/jira/browse/FLINK-15846 On Fri, Jan 31, 2020 at 9:20 PM Thomas Weise wrote: > As part of testing the RC, I run into the following issue with a

Re: [DISCUSS] FLIP-27: Refactor Source Interface

2020-01-15 Thread Steven Wu
t; it > >> comes. > >> * > >> * The source may run forever (until the program is terminated) > or > >> might actually end at some point, > >> * based on some source-specific conditions. Because that is not > >> transparent to the r

Re: [DISCUSS] FLIP-27: Refactor Source Interface

2019-12-19 Thread Steven Wu
Becket, Regarding "UNBOUNDED source that stops at some point", I found it difficult to grasp what UNBOUNDED really mean. If we want to use Kafka source with an end/stop time, I guess you call it UNBOUNDED kafka source that stops (aka BOUNDED-streaming). The terminology is a little confusing to

Re: [ANNOUNCE] Progress of Apache Flink 1.10 #2

2019-11-01 Thread Steven Wu
Gary, FLIP-27 seems to get omitted in the 2nd update. below is the info from update #1. - FLIP-27: Refactor Source Interface [20] - FLIP accepted. Implementation is in progress. On Fri, Nov 1, 2019 at 7:01 AM Gary Yao wrote: > Hi community, > > Because we have approximately one month

Re: [ANNOUNCE] Becket Qin joins the Flink PMC

2019-10-31 Thread Steven Wu
Congratulations, Becket! On Wed, Oct 30, 2019 at 9:51 PM Shaoxuan Wang wrote: > Congratulations, Becket! > > On Mon, Oct 28, 2019 at 6:08 PM Fabian Hueske wrote: > > > Hi everyone, > > > > I'm happy to announce that Becket Qin has joined the Flink PMC. > > Let's congratulate and welcome Becket

Re: [SURVEY] How many people are using customized RestartStrategy(s)

2019-09-25 Thread Steven Wu
ra/browse/FLINK-14164 > > Thanks, > Zhu Zhu > > Steven Wu 于2019年9月25日周三 上午2:30写道: > >> Zhu Zhu, >> >> Sorry, I was using different terminology. yes, Flink meter is what I was >> talking about regarding "fullRestarts" for threshold based aler

Re: [SURVEY] How many people are using customized RestartStrategy(s)

2019-09-24 Thread Steven Wu
pects > fine grained recovery. > > [1] https://issues.apache.org/jira/browse/FLINK-14164 > > Thanks, > Zhu Zhu > > Steven Wu 于2019年9月24日周二 上午6:41写道: > >> >> When we setup alert like "fullRestarts > 1" for some rolling window, we >> want to use

  1   2   >