[jira] [Created] (FLINK-35780) Support state migration from disabling to enabling ttl in RocksDBState

2024-07-07 Thread xiangyu feng (Jira)
xiangyu feng created FLINK-35780:


 Summary: Support state migration from disabling to enabling ttl in 
RocksDBState
 Key: FLINK-35780
 URL: https://issues.apache.org/jira/browse/FLINK-35780
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / State Backends
Reporter: xiangyu feng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] New Apache Flink PMC Member - Fan Rui

2024-06-05 Thread xiangyu feng
Congratulations, Rui!

Regards,
Xiangyu Feng

Feng Jin  于2024年6月5日周三 20:42写道:

> Congratulations, Rui!
>
>
> Best,
> Feng Jin
>
> On Wed, Jun 5, 2024 at 8:23 PM Yanfei Lei  wrote:
>
> > Congratulations, Rui!
> >
> > Best,
> > Yanfei
> >
> > Luke Chen  于2024年6月5日周三 20:08写道:
> > >
> > > Congrats, Rui!
> > >
> > > Luke
> > >
> > > On Wed, Jun 5, 2024 at 8:02 PM Jiabao Sun 
> wrote:
> > >
> > > > Congrats, Rui. Well-deserved!
> > > >
> > > > Best,
> > > > Jiabao
> > > >
> > > > Zhanghao Chen  于2024年6月5日周三 19:29写道:
> > > >
> > > > > Congrats, Rui!
> > > > >
> > > > > Best,
> > > > > Zhanghao Chen
> > > > > 
> > > > > From: Piotr Nowojski 
> > > > > Sent: Wednesday, June 5, 2024 18:01
> > > > > To: dev ; rui fan <1996fan...@gmail.com>
> > > > > Subject: [ANNOUNCE] New Apache Flink PMC Member - Fan Rui
> > > > >
> > > > > Hi everyone,
> > > > >
> > > > > On behalf of the PMC, I'm very happy to announce another new Apache
> > Flink
> > > > > PMC Member - Fan Rui.
> > > > >
> > > > > Rui has been active in the community since August 2019. During this
> > time
> > > > he
> > > > > has contributed a lot of new features. Among others:
> > > > >   - Decoupling Autoscaler from Kubernetes Operator, and supporting
> > > > > Standalone Autoscaler
> > > > >   - Improvements to checkpointing, flamegraphs, restart strategies,
> > > > > watermark alignment, network shuffles
> > > > >   - Optimizing the memory and CPU usage of large operators, greatly
> > > > > reducing the risk and probability of TaskManager OOM
> > > > >
> > > > > He reviewed a significant amount of PRs and has been active both on
> > the
> > > > > mailing lists and in Jira helping to both maintain and grow Apache
> > > > Flink's
> > > > > community. He is also our current Flink 1.20 release manager.
> > > > >
> > > > > In the last 12 months, Rui has been the most active contributor in
> > the
> > > > > Flink Kubernetes Operator project, while being the 2nd most active
> > Flink
> > > > > contributor at the same time.
> > > > >
> > > > > Please join me in welcoming and congratulating Fan Rui!
> > > > >
> > > > > Best,
> > > > > Piotrek (on behalf of the Flink PMC)
> > > > >
> > > >
> >
>


Re: [ANNOUNCE] New Apache Flink Committer - Zakelly Lan

2024-04-18 Thread xiangyu feng
Congratulations, Zakelly!


Regards,
Xiangyu Feng

yh z  于2024年4月18日周四 14:27写道:

> Congratulations Zakelly!
>
> Best regards,
> Yunhong (swuferhong)
>
> gongzhongqiang  于2024年4月17日周三 21:26写道:
>
> > Congratulations, Zakelly!
> >
> >
> > Best,
> > Zhongqiang Gong
> >
> > Yuan Mei  于2024年4月15日周一 10:51写道:
> >
> > > Hi everyone,
> > >
> > > On behalf of the PMC, I'm happy to let you know that Zakelly Lan has
> > become
> > > a new Flink Committer!
> > >
> > > Zakelly has been continuously contributing to the Flink project since
> > 2020,
> > > with a focus area on Checkpointing, State as well as frocksdb (the
> > default
> > > on-disk state db).
> > >
> > > He leads several FLIPs to improve checkpoints and state APIs, including
> > > File Merging for Checkpoints and configuration/API reorganizations. He
> is
> > > also one of the main contributors to the recent efforts of
> "disaggregated
> > > state management for Flink 2.0" and drives the entire discussion in the
> > > mailing thread, demonstrating outstanding technical depth and breadth
> of
> > > knowledge.
> > >
> > > Beyond his technical contributions, Zakelly is passionate about helping
> > the
> > > community in numerous ways. He spent quite some time setting up the
> Flink
> > > Speed Center and rebuilding the benchmark pipeline after the original
> one
> > > was out of lease. He helps build frocksdb and tests for the upcoming
> > > frocksdb release (bump rocksdb from 6.20.3->8.10).
> > >
> > > Please join me in congratulating Zakelly for becoming an Apache Flink
> > > committer!
> > >
> > > Best,
> > > Yuan (on behalf of the Flink PMC)
> > >
> >
>


Re: [ANNOUNCE] New Apache Flink PMC Member - Lincoln Lee

2024-04-12 Thread xiangyu feng
Congratulations, Lincoln!

Best,
Xiangyu Feng

Feifan Wang  于2024年4月12日周五 17:19写道:

> Congratulations, Lincoln!
>
>
> ——
>
> Best regards,
>
> Feifan Wang
>
>
>
>
> At 2024-04-12 15:59:00, "Jark Wu"  wrote:
> >Hi everyone,
> >
> >On behalf of the PMC, I'm very happy to announce that Lincoln Lee has
> >joined the Flink PMC!
> >
> >Lincoln has been an active member of the Apache Flink community for
> >many years. He mainly works on Flink SQL component and has driven
> >/pushed many FLIPs around SQL, including FLIP-282/373/415/435 in
> >the recent versions. He has a great technical vision of Flink SQL and
> >participated in plenty of discussions in the dev mailing list. Besides
> >that,
> >he is community-minded, such as being the release manager of 1.19,
> >verifying releases, managing release syncs, writing the release
> >announcement etc.
> >
> >Congratulations and welcome Lincoln!
> >
> >Best,
> >Jark (on behalf of the Flink PMC)
>


Re: [DISCUSS] FLIP-414: Support Retry Mechanism in RocksDBStateDataTransfer

2024-02-21 Thread xiangyu feng
Hi Mayue, Hangxiangyu and Piotr,

Sorry for the late reply and thanks a lot for your feedback. After careful
thoughts, I agree with your opinions in following ways:

1, Instead of introducing a specific retry mechanism for RocksDB data
transfer, we should introduce a more fine-grained retry mechanism for
checkpointing when interacting with external storage to improve the overall
success rate;

2, This fine-grained retry mechanism should work with all kinds of state
implementations (OperatorStateBackend、KeyedStateBackend);

3, Currently, there are several Retry Implementations in Flink Code base:

   - `RetryStrategy` and `FutureUtils` under `flink-core` module for common
   usage
   - `AsyncRetryStrategy` and `RetryableResultHandlerDelegator` under
   datastream-api and used for async operators
   - `RetryPolicy` and `RetryingExecutor` under `flink-dstl` module and
   used for ChangelogStateBackend

IMHO, these implementations are almost interchangeable with few
variations. Unifying all the Retry implementations might be another big
topic to discuss, but I do agree that the retry tools used by all
StateBackend should be consistent.

4,The fine-grained retry mechanism should also consider different external
storage exceptions, the retry action should not be performed for a
permanent failure. AFAIK, this can be achieved by a customized retry
predicate.

According to the suggestions mentioned above, I will rework FLIP-414[1] and
continue this discussion after that.

Thx again for your valuable options~

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-414%3A+Support+Retry+Mechanism+in+RocksDBStateDataTransfer

Best Regards,
Xiangyu Feng


Piotr Nowojski  于2024年1月12日周五 01:39写道:

> Hi,
>
> Thanks for the proposal. I second the Hangxiang's suggestions.
>
> I think this might be valuable. Instead of retrying the whole checkpoint,
> it will be more resource efficient
> to retry upload of a single file.
>
> Regarding re-using configuration options, a while back we introduced
> `taskmanager.network.retries`
> config option. It was hoped to eventually encompass things like this.
>
> My own concern is if we should retry regardless of the exception type, or
> should we focus on things like
> connection loss/host unreachable? All in all, it would be better to not
> retry upload if the failure was:
> - `FileSystem` for given schema not found
> - authorisation failed
> - lack of write rights
> - ...
>
> Best,
> Piotrek
>
>
>
>
> czw., 11 sty 2024 o 10:35 Hangxiang Yu  napisał(a):
>
> > Thanks for driving this.
> > Retry mechanism is common when we want to get or put data by network.
> > So I think it will help when checkpoint failure due to temporary network
> > problems, of course it may increase a bit overhead for some other
> reasons.
> >
> > Some comments and suggestions:
> > 1. Since Flink has a checkpoint mechanism to retry failed checkpoint
> > coarsely, I think it looks good to me if this fine-grained retry could be
> > configurable and don't change the current default mechanism.
> > 2. This should work with the checkpoint procedure of all state backends,
> > Could we make this config unrelated to a specific state backend (maybe
> > execution.checkpointing.xxx)?  Then it could be supported by below state
> > backends.
> > 3. We may not need to re-implement it. There are some tools supporting
> the
> > Retry mechanism (see RetryingExecutor and RetryPolicy in changelog dstl
> > module), it's better to make them become more common tools and reuse
> them.
> >
> > On Thu, Jan 11, 2024 at 3:09 PM yue ma  wrote:
> >
> > > Thanks for driving this effort, xiangyu!
> > > The proposal overall LGTM.
> > > I just have a small question. There are other places in Flink that
> > interact
> > > with external storage. Should we consider adding a general retry
> > mechanism
> > > to them?
> > >
> > > xiangyu feng  于2024年1月8日周一 11:31写道:
> > >
> > > > Hi devs,
> > > >
> > > > I'm opening this thread to discuss FLIP-414: Support Retry Mechanism
> in
> > > > RocksDBStateDataTransfer[1].
> > > >
> > > > Currently, there is no retry mechanism for downloading and uploading
> > > > RocksDB state files. Any jittering of remote filesystem might lead
> to a
> > > > checkpoint failure. By supporting retry mechanism in
> > > > `RocksDBStateDataTransfer`, we can significantly reduce the failure
> > rate
> > > of
> > > > checkpoint during asynchronous phrase.
> > > >
> > > > To make this retry mechanism configurable, we have introduced two
> > options
> > >

[jira] [Created] (FLINK-34452) Release Testing: Verify FLINK-15959 Add min number of slots configuration to limit total number of slots

2024-02-16 Thread xiangyu feng (Jira)
xiangyu feng created FLINK-34452:


 Summary: Release Testing: Verify FLINK-15959 Add min number of 
slots configuration to limit total number of slots
 Key: FLINK-34452
 URL: https://issues.apache.org/jira/browse/FLINK-34452
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Affects Versions: 1.19.0
Reporter: xiangyu feng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34226) Add the backoff-multiplier configuration in ExponentialBackoffRetryStrategy

2024-01-24 Thread xiangyu feng (Jira)
xiangyu feng created FLINK-34226:


 Summary: Add the backoff-multiplier configuration in 
ExponentialBackoffRetryStrategy
 Key: FLINK-34226
 URL: https://issues.apache.org/jira/browse/FLINK-34226
 Project: Flink
  Issue Type: Sub-task
Reporter: xiangyu feng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[RESULT][VOTE] FLIP-407: Improve Flink Client performance in interactive scenarios

2024-01-15 Thread xiangyu feng
Hi all,

Thanks for your review and the votes!

I am happy to announce that FLIP-407: Improve Flink Client performance in
interactive scenarios[1] has been accepted.

There are 3 binding votes and 1 non-binding votes [2]:

- Rui Fan (binding)
- Yangze Guo (binding)
- Weihua Hu (binding)
- Chen Zhanghao (non-binding)

There is no disapproving vote.

Regards,
Xiangyu Feng


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-407%3A+Improve+Flink+Client+performance+in+interactive+scenarios
[2] https://lists.apache.org/thread/b1hqnlxkyqnqt1sn1dww7jw3bdrs030o


Re: [VOTE] FLIP-407: Improve Flink Client performance in interactive scenarios

2024-01-15 Thread xiangyu feng
Thank you all for the votes! I will close the voting thread and summarize
the result in a separate email.

Regards,
Xiangyu Feng

Zhanghao Chen  于2024年1月16日周二 10:18写道:

> +1 (non-binding)
>
> Best,
> Zhanghao Chen
> ____
> From: xiangyu feng 
> Sent: Thursday, January 11, 2024 19:44
> To: dev@flink.apache.org 
> Subject: [VOTE] FLIP-407: Improve Flink Client performance in interactive
> scenarios
>
> Hi all,
>
> I would like to start the vote for FLIP-407: Improve Flink Client
> performance in interactive scenarios[1].
> This FLIP was discussed in this thread [2].
>
> The vote will be open for at least 72 hours unless there is an objection or
> insufficient votes.
>
> Regards,
> Xiangyu
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-407%3A+Improve+Flink+Client+performance+in+interactive+scenarios
> [2] https://lists.apache.org/thread/ccsv66ygffgqbv956bnknbpllj4t24kj
>


Re: [DISCUSS] FLIP-416: Deprecate and remove the RestoreMode#LEGACY

2024-01-15 Thread xiangyu feng
Hi Zakelly,

Thanks for driving this. +1 to remove the LEGACY mode. This will make the
semantic of snapshot ownership more clear.

Best regards,
Xiangyu Feng

Yuan Mei  于2024年1月15日周一 23:16写道:

> +1
>
> Thanks for driving this Zakelly!
>
> Best
> Yuan
>
> On Mon, Jan 15, 2024 at 10:47 PM Piotr Nowojski 
> wrote:
>
> > +1 good idea!
> >
> > pon., 15 sty 2024 o 05:11 Jinzhong Li 
> > napisał(a):
> >
> > > Hi Zakelly,
> > >
> > > Thanks for driving the discussion. It makes sense to remove  LEGACY
> mode
> > in
> > > Flink 2.0.
> > >
> > > Best,
> > > Jinzhong Li
> > >
> > > On Mon, Jan 15, 2024 at 10:34 AM Xuannan Su 
> > wrote:
> > >
> > > > Hi Zakelly,
> > > >
> > > > Thanks for driving this. +1 to removing the LEGACY mode.
> > > >
> > > > Best regards,
> > > > Xuannan
> > > >
> > > > On Mon, Jan 15, 2024 at 3:22 AM Danny Cranmer <
> dannycran...@apache.org
> > >
> > > > wrote:
> > > > >
> > > > > +1 to removing LEGACY mode in Flink 2.0. Thanks for driving.
> > > > >
> > > > > Danny,
> > > > >
> > > > > On Sat, 13 Jan 2024, 08:20 Yanfei Lei, 
> wrote:
> > > > >
> > > > > > Thanks Zakelly for starting this discussion.
> > > > > >
> > > > > > Regardless of whether it is for users or developers, deprecating
> > > > > > RestoreMode#LEGACY makes the semantics clearer and lower
> > maintenance
> > > > > > costs, and Flink 2.0 is a good time point to do this.
> > > > > > So +1 for the overall idea.
> > > > > >
> > > > > > Best,
> > > > > > Yanfei
> > > > > >
> > > > > > Zakelly Lan  于2024年1月11日周四 14:57写道:
> > > > > >
> > > > > > >
> > > > > > > Hi devs,
> > > > > > >
> > > > > > > I'd like to start a discussion on FLIP-416: Deprecate and
> remove
> > > the
> > > > > > > RestoreMode#LEGACY[1].
> > > > > > >
> > > > > > > The FLIP-193[2] introduced two modes of state file ownership
> > during
> > > > > > > checkpoint restoration: RestoreMode#CLAIM and
> > RestoreMode#NO_CLAIM.
> > > > The
> > > > > > > LEGACY mode, which was how Flink worked until 1.15, has been
> > > > superseded
> > > > > > by
> > > > > > > NO_CLAIM as the default mode. The main drawback of LEGACY mode
> is
> > > > that
> > > > > > the
> > > > > > > new job relies on artifacts from the old job without cleaning
> > them
> > > > up,
> > > > > > > leaving users uncertain about when it is safe to delete the old
> > > > > > checkpoint
> > > > > > > directories. This leads to the accumulation of unnecessary
> > > checkpoint
> > > > > > files
> > > > > > > that are never cleaned up. Considering cluster availability and
> > job
> > > > > > > maintenance, it is not recommended to use LEGACY mode. Users
> > could
> > > > choose
> > > > > > > the other two modes to get a clear semantic for the state file
> > > > ownership.
> > > > > > >
> > > > > > > This FLIP proposes to deprecate the LEGACY mode and remove it
> > > > completely
> > > > > > in
> > > > > > > the upcoming Flink 2.0. This will make the semantic clear as
> well
> > > as
> > > > > > > eliminate many bugs caused by mode transitions involving LEGACY
> > > mode
> > > > > > (e.g.
> > > > > > > FLINK-27114 [3]) and enhance code maintainability.
> > > > > > >
> > > > > > > Looking forward to hearing from you!
> > > > > > >
> > > > > > > [1] https://cwiki.apache.org/confluence/x/ookkEQ
> > > > > > > [2] https://cwiki.apache.org/confluence/x/bIyqCw
> > > > > > > [3] https://issues.apache.org/jira/browse/FLINK-27114
> > > > > > >
> > > > > > > Best,
> > > > > > > Zakelly
> > > > > >
> > > >
> > >
> >
>


[VOTE] FLIP-407: Improve Flink Client performance in interactive scenarios

2024-01-11 Thread xiangyu feng
Hi all,

I would like to start the vote for FLIP-407: Improve Flink Client
performance in interactive scenarios[1].
This FLIP was discussed in this thread [2].

The vote will be open for at least 72 hours unless there is an objection or
insufficient votes.

Regards,
Xiangyu

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-407%3A+Improve+Flink+Client+performance+in+interactive+scenarios
[2] https://lists.apache.org/thread/ccsv66ygffgqbv956bnknbpllj4t24kj


Re: [Discuss] FLIP-407: Improve Flink Client performance in interactive scenarios

2024-01-11 Thread xiangyu feng
Hi devs,

Thanks for all the feedback. If there are no more comments, I would like to
start a vote for this FLIP, thanks again!

Best,
Xiangyu Feng

Weihua Hu  于2024年1月9日周二 14:45写道:

> Thanks for proposing this FLIP.
>
> Experiments have shown that it significantly enhances the real-time query
> experience.
> +1 for this.
>
> Best,
> Weihua
>
>
> On Mon, Jan 8, 2024 at 5:19 PM Rui Fan <1996fan...@gmail.com> wrote:
>
>> Thanks Xiangyu for the quick update!
>>
>> LGTM
>>
>> Best,
>> Rui
>>
>> On Mon, Jan 8, 2024 at 4:27 PM xiangyu feng  wrote:
>>
>> > Hi Rui and Yong,
>> >
>> > Thx for ur reply.
>> >
>> > My initial attention here is that for short-lived jobs under high QPS: a
>> > fixed delay retry strategy will cause extra resource waste and not
>> flexible
>> > enough, an exponential-backoff strategy might significantly increase the
>> > query latency since the interval time grows too fast. An
>> incremental-delay
>> > strategy could be balanced between resource consumption and short-query
>> > latency.
>> >
>> > With a second thought,  an exponential-delay retry strategy with a
>> > configurable multiplier option can also achieve this goal. By setting
>> the
>> > default value of multiplier to 1, we can be consistent with the original
>> > behavior and reduce the configuration items at the same time.
>> >
>> > I've updated this FLIP accordingly, look forward to your feedback.
>> >
>> > Regards,
>> > Xiangyu Feng
>> >
>> >
>> > Rui Fan <1996fan...@gmail.com> 于2024年1月8日周一 15:29写道:
>> >
>> >> Only one strategy is fine to me.
>> >>
>> >> When the multiplier is set to 1, the exponential-delay will become
>> >> fixed-delay.
>> >> So fixed-delay may not be needed.
>> >>
>> >> Best,
>> >> Rui
>> >>
>> >> On Mon, Jan 8, 2024 at 2:17 PM Yong Fang  wrote:
>> >>
>> >> > I agree with @Rui that the current configuration for Flink Client is
>> a
>> >> > little complex. Can we just provide one strategy with less
>> configuration
>> >> > items for all scenarios?
>> >> >
>> >> > Best,
>> >> > Fang Yong
>> >> >
>> >> > On Mon, Jan 8, 2024 at 11:19 AM Rui Fan <1996fan...@gmail.com>
>> wrote:
>> >> >
>> >> > > Thanks xiangyu for driving this proposal! And sorry for the
>> >> > > late reply.
>> >> > >
>> >> > > Overall looks good to me, I only have some minor questions:
>> >> > >
>> >> > > 1. Do we need to introduce 3 collect strategies in the first
>> version?
>> >> > >
>> >> > > Large and comprehensive configuration items will bring
>> >> > > additional learning costs and usage costs to users. I tend to
>> >> > > provide users with out-of-the-box parameters and 2 collect
>> >> > > strategies may be enough for users.
>> >> > >
>> >> > > IIUC, there is no big difference between exponential-delay and
>> >> > > incremental-delay, especially the default parameters provided.
>> >> > > I wonder could we provide a multiplier for exponential-delay
>> strategy
>> >> > > and removing the incremental-delay strategy?
>> >> > >
>> >> > > Of course, if you think multiplier option is not needed based on
>> >> > > your production experience, it's totally fine for me. Simple is
>> >> better.
>> >> > >
>> >> > > 2. Which strategy do you think is best in mass production?
>> >> > >
>> >> > > I'm working on FLIP-364[1], it's related to Flink failover restart
>> >> > > strategy. IIUC, when one cluster only has a few flink jobs,
>> >> > > fixed-delay is fine. It guarantees minimal latency without too
>> >> > > much stress. But if one cluster has too many jobs, fixed-delay
>> >> > > may not be stable.
>> >> > >
>> >> > > Do you think exponential-delay is better than fixed delay in this
>> >> > > scenario? And which strategy is used in your production for now?
>> >> > > Would you mind sharing it?
>> >> > >
>> >> > > Looking

Re: [VOTE] Accept Flink CDC into Apache Flink

2024-01-09 Thread xiangyu feng
+1 (non-binding)

Regards,
Xiangyu Feng

Danny Cranmer  于2024年1月9日周二 17:50写道:

> +1 (binding)
>
> Thanks,
> Danny
>
> On Tue, Jan 9, 2024 at 9:31 AM Feng Jin  wrote:
>
> > +1 (non-binding)
> >
> > Best,
> > Feng Jin
> >
> > On Tue, Jan 9, 2024 at 5:29 PM Yuxin Tan  wrote:
> >
> > > +1 (non-binding)
> > >
> > > Best,
> > > Yuxin
> > >
> > >
> > > Márton Balassi  于2024年1月9日周二 17:25写道:
> > >
> > > > +1 (binding)
> > > >
> > > > On Tue, Jan 9, 2024 at 10:15 AM Leonard Xu 
> wrote:
> > > >
> > > > > +1(binding)
> > > > >
> > > > > Best,
> > > > > Leonard
> > > > >
> > > > > > 2024年1月9日 下午5:08,Yangze Guo  写道:
> > > > > >
> > > > > > +1 (non-binding)
> > > > > >
> > > > > > Best,
> > > > > > Yangze Guo
> > > > > >
> > > > > > On Tue, Jan 9, 2024 at 5:06 PM Robert Metzger <
> rmetz...@apache.org
> > >
> > > > > wrote:
> > > > > >>
> > > > > >> +1 (binding)
> > > > > >>
> > > > > >>
> > > > > >> On Tue, Jan 9, 2024 at 9:54 AM Guowei Ma 
> > > > wrote:
> > > > > >>
> > > > > >>> +1 (binding)
> > > > > >>> Best,
> > > > > >>> Guowei
> > > > > >>>
> > > > > >>>
> > > > > >>> On Tue, Jan 9, 2024 at 4:49 PM Rui Fan <1996fan...@gmail.com>
> > > wrote:
> > > > > >>>
> > > > > >>>> +1 (non-binding)
> > > > > >>>>
> > > > > >>>> Best,
> > > > > >>>> Rui
> > > > > >>>>
> > > > > >>>> On Tue, Jan 9, 2024 at 4:41 PM Hang Ruan <
> > ruanhang1...@gmail.com>
> > > > > wrote:
> > > > > >>>>
> > > > > >>>>> +1 (non-binding)
> > > > > >>>>>
> > > > > >>>>> Best,
> > > > > >>>>> Hang
> > > > > >>>>>
> > > > > >>>>> gongzhongqiang  于2024年1月9日周二
> > 16:25写道:
> > > > > >>>>>
> > > > > >>>>>> +1 non-binding
> > > > > >>>>>>
> > > > > >>>>>> Best,
> > > > > >>>>>> Zhongqiang
> > > > > >>>>>>
> > > > > >>>>>> Leonard Xu  于2024年1月9日周二 15:05写道:
> > > > > >>>>>>
> > > > > >>>>>>> Hello all,
> > > > > >>>>>>>
> > > > > >>>>>>> This is the official vote whether to accept the Flink CDC
> > code
> > > > > >>>>>> contribution
> > > > > >>>>>>> to Apache Flink.
> > > > > >>>>>>>
> > > > > >>>>>>> The current Flink CDC code, documentation, and website can
> be
> > > > > >>>>>>> found here:
> > > > > >>>>>>> code: https://github.com/ververica/flink-cdc-connectors <
> > > > > >>>>>>> https://github.com/ververica/flink-cdc-connectors>
> > > > > >>>>>>> docs: https://ververica.github.io/flink-cdc-connectors/ <
> > > > > >>>>>>> https://ververica.github.io/flink-cdc-connectors/>
> > > > > >>>>>>>
> > > > > >>>>>>> This vote should capture whether the Apache Flink community
> > is
> > > > > >>>>> interested
> > > > > >>>>>>> in accepting, maintaining, and evolving Flink CDC.
> > > > > >>>>>>>
> > > > > >>>>>>> Regarding my original proposal[1] in the dev mailing list,
> I
> > > > firmly
> > > > > >>>>>> believe
> > > > > >>>>>>> that this initiative aligns perfectly with Flink. For the
> > Flink
> > > > > &

Re: [Discuss] FLIP-407: Improve Flink Client performance in interactive scenarios

2024-01-08 Thread xiangyu feng
Hi Rui and Yong,

Thx for ur reply.

My initial attention here is that for short-lived jobs under high QPS: a
fixed delay retry strategy will cause extra resource waste and not flexible
enough, an exponential-backoff strategy might significantly increase the
query latency since the interval time grows too fast. An incremental-delay
strategy could be balanced between resource consumption and short-query
latency.

With a second thought,  an exponential-delay retry strategy with a
configurable multiplier option can also achieve this goal. By setting the
default value of multiplier to 1, we can be consistent with the original
behavior and reduce the configuration items at the same time.

I've updated this FLIP accordingly, look forward to your feedback.

Regards,
Xiangyu Feng


Rui Fan <1996fan...@gmail.com> 于2024年1月8日周一 15:29写道:

> Only one strategy is fine to me.
>
> When the multiplier is set to 1, the exponential-delay will become
> fixed-delay.
> So fixed-delay may not be needed.
>
> Best,
> Rui
>
> On Mon, Jan 8, 2024 at 2:17 PM Yong Fang  wrote:
>
> > I agree with @Rui that the current configuration for Flink Client is a
> > little complex. Can we just provide one strategy with less configuration
> > items for all scenarios?
> >
> > Best,
> > Fang Yong
> >
> > On Mon, Jan 8, 2024 at 11:19 AM Rui Fan <1996fan...@gmail.com> wrote:
> >
> > > Thanks xiangyu for driving this proposal! And sorry for the
> > > late reply.
> > >
> > > Overall looks good to me, I only have some minor questions:
> > >
> > > 1. Do we need to introduce 3 collect strategies in the first version?
> > >
> > > Large and comprehensive configuration items will bring
> > > additional learning costs and usage costs to users. I tend to
> > > provide users with out-of-the-box parameters and 2 collect
> > > strategies may be enough for users.
> > >
> > > IIUC, there is no big difference between exponential-delay and
> > > incremental-delay, especially the default parameters provided.
> > > I wonder could we provide a multiplier for exponential-delay strategy
> > > and removing the incremental-delay strategy?
> > >
> > > Of course, if you think multiplier option is not needed based on
> > > your production experience, it's totally fine for me. Simple is better.
> > >
> > > 2. Which strategy do you think is best in mass production?
> > >
> > > I'm working on FLIP-364[1], it's related to Flink failover restart
> > > strategy. IIUC, when one cluster only has a few flink jobs,
> > > fixed-delay is fine. It guarantees minimal latency without too
> > > much stress. But if one cluster has too many jobs, fixed-delay
> > > may not be stable.
> > >
> > > Do you think exponential-delay is better than fixed delay in this
> > > scenario? And which strategy is used in your production for now?
> > > Would you mind sharing it?
> > >
> > > Looking forwarding to your opinion~
> > >
> > > Best,
> > > Rui
> > >
> > > On Sat, Jan 6, 2024 at 5:54 PM xiangyu feng 
> > wrote:
> > >
> > > > Hi all,
> > > >
> > > > Thanks for the comments.
> > > >
> > > > If there is no further comment, we will open the voting thread next
> > week.
> > > >
> > > > Regards,
> > > > Xiangyu
> > > >
> > > > Zhanghao Chen  于2024年1月3日周三 16:46写道:
> > > >
> > > > > Thanks for driving this effort on improving the interactive use
> > > > experience
> > > > > of Flink. The proposal overall looks good to me.
> > > > >
> > > > > Best,
> > > > > Zhanghao Chen
> > > > > 
> > > > > From: xiangyu feng 
> > > > > Sent: Tuesday, December 26, 2023 16:51
> > > > > To: dev@flink.apache.org 
> > > > > Subject: [Discuss] FLIP-407: Improve Flink Client performance in
> > > > > interactive scenarios
> > > > >
> > > > > Hi devs,
> > > > >
> > > > > I'm opening this thread to discuss FLIP-407: Improve Flink Client
> > > > > performance in interactive scenarios. The POC test results and
> design
> > > doc
> > > > > can be found at: FLIP-407
> > > > > <
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-407%3A+Improve+Flink+Cli

[DISCUSS] FLIP-414: Support Retry Mechanism in RocksDBStateDataTransfer

2024-01-07 Thread xiangyu feng
Hi devs,

I'm opening this thread to discuss FLIP-414: Support Retry Mechanism in
RocksDBStateDataTransfer[1].

Currently, there is no retry mechanism for downloading and uploading
RocksDB state files. Any jittering of remote filesystem might lead to a
checkpoint failure. By supporting retry mechanism in
`RocksDBStateDataTransfer`, we can significantly reduce the failure rate of
checkpoint during asynchronous phrase.

To make this retry mechanism configurable, we have introduced two options
in this FLIP: `state.backend.rocksdb.checkpoint.transfer.retry.times` and `
state.backend.rocksdb.checkpoint.transfer.retry.interval`. The default
behavior remains to be no retry will be performed in order to be consistent
with the original behavior.

Looking forward to your feedback, thanks.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-414%3A+Support+Retry+Mechanism+in+RocksDBStateDataTransfer

Best regards,
Xiangyu Feng


Re: [Discuss] FLIP-407: Improve Flink Client performance in interactive scenarios

2024-01-06 Thread xiangyu feng
Hi all,

Thanks for the comments.

If there is no further comment, we will open the voting thread next week.

Regards,
Xiangyu

Zhanghao Chen  于2024年1月3日周三 16:46写道:

> Thanks for driving this effort on improving the interactive use experience
> of Flink. The proposal overall looks good to me.
>
> Best,
> Zhanghao Chen
> ________
> From: xiangyu feng 
> Sent: Tuesday, December 26, 2023 16:51
> To: dev@flink.apache.org 
> Subject: [Discuss] FLIP-407: Improve Flink Client performance in
> interactive scenarios
>
> Hi devs,
>
> I'm opening this thread to discuss FLIP-407: Improve Flink Client
> performance in interactive scenarios. The POC test results and design doc
> can be found at: FLIP-407
> <
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-407%3A+Improve+Flink+Client+performance+when+interacting+with+dedicated+Flink+Session+Clusters
> >
> .
>
> Currently, Flink Client is mainly designed for one time interaction with
> the Flink Cluster. All the resources(http connections, threads, ha
> services) and instances(ClusterDescriptor, ClusterClient, RestClient) are
> created and recycled for each interaction. This works well when users do
> not need to interact frequently with Flink Cluster and also saves resource
> usage since resources are recycled immediately after each usage.
>
> However, in OLAP or StreamingWarehouse scenarios, users might submit
> interactive jobs to a dedicated Flink Session Cluster very often. In this
> case, we find that for short queries that can finish in less than 1s in
> Flink Cluster will still have E2E latency greater than 2s. Hence, we
> propose this FLIP to improve the Flink Client performance in this scenario.
> This could also improve the user experience when using session debug mode.
>
> The major change in this FLIP is that there will be a new introduced option
> *'execution.interactive-client'*. When this option is enabled, Flink
> Client will reuse all the necessary resources to improve interactive
> performance, including: HA Services, HTTP connections, threads and all
> kinds of instances related to a long-running Flink Cluster. The default
> value of this option will be false, then Flink Client will behave as
> before.
>
> Also, this FLIP proposed a configurable RetryStrategy when fetching results
> from client-side to Flink Cluster. In interactive scenarios, this can save
> more than 15% of TM CPU usage without performance degradation.
>
> Looking forward to your feedback, thanks.
>
> Best regards,
> Xiangyu
>


Re: [VOTE] FLIP-397: Add config options for administrator JVM options

2024-01-03 Thread xiangyu feng
+1 (non-binding)

Regards,
Xiangyu Feng

Rui Fan <1996fan...@gmail.com> 于2024年1月4日周四 13:03写道:

> +1 (binding)
>
> Best,
> Rui
>
> On Thu, Jan 4, 2024 at 11:45 AM Benchao Li  wrote:
>
> > +1 (binding)
> >
> > Zhanghao Chen  于2024年1月4日周四 10:30写道:
> > >
> > > Hi everyone,
> > >
> > > Thanks for all the feedbacks on FLIP-397 [1], which proposes to add a
> > set of default JVM options for administrator use that prepends the
> user-set
> > extra JVM options for easier platform-wide JVM pre-tuning. It has been
> > discussed in [2].
> > >
> > > I'd like to start a vote. The vote will be open for at least 72 hours
> > (until January 8th 12:00 GMT) unless there is an objection or
> insufficient
> > votes.
> > >
> > > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-397%3A+Add+config+options+for+administrator+JVM+options
> > > [2] https://lists.apache.org/thread/cflonyrfd1ftmyrpztzj3ywckbq41jzg
> > >
> > > Best,
> > > Zhanghao Chen
> >
> >
> >
> > --
> >
> > Best,
> > Benchao Li
> >
>


Re: [DISCUSS] FLIP-397: Add config options for administrator JVM options

2024-01-02 Thread xiangyu feng
Hi Zhanghao,

Thx for reply. LGTM now.

Zhanghao Chen  于2024年1月2日周二 10:29写道:

> Hi Xiangyu,
>
> The proposed new options are targeted on experienced Flink platform
> administrators instead of normal end users, and a one-by-one mapping from
> non-default option to the default option variant might be easier for users
> to understand. Also, although JM and TM tend to use the same set of JVM
> args in most times, there're cases where different set of JVM args are
> preferable. So I am leaning towards the current design, WDYT?
>
> Best,
> Zhanghao Chen
> ________
> From: xiangyu feng 
> Sent: Friday, December 29, 2023 20:20
> To: dev@flink.apache.org 
> Subject: Re: [DISCUSS] FLIP-397: Add config options for administrator JVM
> options
>
> Hi Zhanghao,
>
> Thanks for driving this. +1 for the overall idea.
>
> One minor question, do we need separate administrator JVM options for both
> JobManager and TaskManager? Or just one administrator JVM option for all?
>
> I'm afraid of 6 jvm
>
> options(env.java.opts.all\env.java.default-opts.all\env.java.opts.jobmanager\env.java.default-opts.jobmanager\env.java.opts.taskmanager\env.java.default-opts.taskmanager)
> may confuse users.
>
> Regards,
> Xiangyu
>
>
> Yong Fang  于2023年12月27日周三 15:36写道:
>
> > +1 for this, we have met jobs that need to set GC policies different from
> > the default ones to improve performance. Separating the default and
> > user-set ones can help us better manage them.
> >
> > Best,
> > Fang Yong
> >
> > On Fri, Dec 22, 2023 at 9:18 PM Benchao Li  wrote:
> >
> > > +1 from my side,
> > >
> > > I also met some scenarios that I wanted to set some JVM options by
> > > default for all Flink jobs before, such as
> > > '-XX:-DontCompileHugeMethods', without it, some generated big methods
> > > won't be optimized in JVM C2 compiler, leading to poor performance.
> > >
> > > Zhanghao Chen  于2023年11月27日周一 20:04写道:
> > > >
> > > > Hi devs,
> > > >
> > > > I'd like to start a discussion on FLIP-397: Add config options for
> > > administrator JVM options [1].
> > > >
> > > > In production environments, users typically develop and operate their
> > > Flink jobs through a managed platform. Users may need to add JVM
> options
> > to
> > > their Flink applications (e.g. to tune GC options). They typically use
> > the
> > > env.java.opts.x series of options to do so. Platform administrators
> also
> > > have a set of JVM options to apply by default, e.g. to use JVM 17,
> enable
> > > GC logging, or apply pretuned GC options, etc. Both use cases will need
> > to
> > > set the same series of options and will clobber one another. Similar
> > issues
> > > have been described in SPARK-23472 [2].
> > > >
> > > > Therefore, I propose adding a set of default JVM options for
> > > administrator use that prepends the user-set extra JVM options.
> > > >
> > > > Looking forward to hearing from you.
> > > >
> > > > [1]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-397%3A+Add+config+options+for+administrator+JVM+options
> > > > [2] https://issues.apache.org/jira/browse/SPARK-23472
> > > >
> > > > Best,
> > > > Zhanghao Chen
> > >
> > >
> > >
> > > --
> > >
> > > Best,
> > > Benchao Li
> > >
> >
>


Re: [DISCUSS] FLIP-397: Add config options for administrator JVM options

2023-12-29 Thread xiangyu feng
Hi Zhanghao,

Thanks for driving this. +1 for the overall idea.

One minor question, do we need separate administrator JVM options for both
JobManager and TaskManager? Or just one administrator JVM option for all?

I'm afraid of 6 jvm
options(env.java.opts.all\env.java.default-opts.all\env.java.opts.jobmanager\env.java.default-opts.jobmanager\env.java.opts.taskmanager\env.java.default-opts.taskmanager)
may confuse users.

Regards,
Xiangyu


Yong Fang  于2023年12月27日周三 15:36写道:

> +1 for this, we have met jobs that need to set GC policies different from
> the default ones to improve performance. Separating the default and
> user-set ones can help us better manage them.
>
> Best,
> Fang Yong
>
> On Fri, Dec 22, 2023 at 9:18 PM Benchao Li  wrote:
>
> > +1 from my side,
> >
> > I also met some scenarios that I wanted to set some JVM options by
> > default for all Flink jobs before, such as
> > '-XX:-DontCompileHugeMethods', without it, some generated big methods
> > won't be optimized in JVM C2 compiler, leading to poor performance.
> >
> > Zhanghao Chen  于2023年11月27日周一 20:04写道:
> > >
> > > Hi devs,
> > >
> > > I'd like to start a discussion on FLIP-397: Add config options for
> > administrator JVM options [1].
> > >
> > > In production environments, users typically develop and operate their
> > Flink jobs through a managed platform. Users may need to add JVM options
> to
> > their Flink applications (e.g. to tune GC options). They typically use
> the
> > env.java.opts.x series of options to do so. Platform administrators also
> > have a set of JVM options to apply by default, e.g. to use JVM 17, enable
> > GC logging, or apply pretuned GC options, etc. Both use cases will need
> to
> > set the same series of options and will clobber one another. Similar
> issues
> > have been described in SPARK-23472 [2].
> > >
> > > Therefore, I propose adding a set of default JVM options for
> > administrator use that prepends the user-set extra JVM options.
> > >
> > > Looking forward to hearing from you.
> > >
> > > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-397%3A+Add+config+options+for+administrator+JVM+options
> > > [2] https://issues.apache.org/jira/browse/SPARK-23472
> > >
> > > Best,
> > > Zhanghao Chen
> >
> >
> >
> > --
> >
> > Best,
> > Benchao Li
> >
>


Re: [DISCUSS] FLIP-403: High Availability Services for OLAP Scenarios

2023-12-29 Thread xiangyu feng
Thanks Yangze for restart this discussion.

+1 for the overall idea. By splitting the HighAvailabilityServices into
LeaderServices and PersistenceServices, we may support configuring
different storage behind them in the future.

We did run into real problems in production where too much job metadata was
being stored on ZK, causing system instability.


Yangze Guo  于2023年12月29日周五 10:21写道:

> Thanks for the response, Zhanghao.
>
> PersistenceServices sounds good to me.
>
> Best,
> Yangze Guo
>
> On Wed, Dec 27, 2023 at 11:30 AM Zhanghao Chen
>  wrote:
> >
> > Thanks for driving this effort, Yangze! The proposal overall LGTM. Other
> from the throughput enhancement in the OLAP scenario, the separation of
> leader election/discovery services and the metadata persistence services
> will also make the HA impl clearer and easier to maintain. Just a minor
> comment on naming: would it better to rename PersistentServices to
> PersistenceServices, as usually we put a noun before Services?
> >
> > Best,
> > Zhanghao Chen
> > 
> > From: Yangze Guo 
> > Sent: Tuesday, December 19, 2023 17:33
> > To: dev 
> > Subject: [DISCUSS] FLIP-403: High Availability Services for OLAP
> Scenarios
> >
> > Hi, there,
> >
> > We would like to start a discussion thread on "FLIP-403: High
> > Availability Services for OLAP Scenarios"[1].
> >
> > Currently, Flink's high availability service consists of two
> > mechanisms: leader election/retrieval services for JobManager and
> > persistent services for job metadata. However, these mechanisms are
> > set up in an "all or nothing" manner. In OLAP scenarios, we typically
> > only require leader election/retrieval services for JobManager
> > components since jobs usually do not have a restart strategy.
> > Additionally, the persistence of job states can negatively impact the
> > cluster's throughput, especially for short query jobs.
> >
> > To address these issues, this FLIP proposes splitting the
> > HighAvailabilityServices into LeaderServices and PersistentServices,
> > and enable users to independently configure the high availability
> > strategies specifically related to jobs.
> >
> > Please find more details in the FLIP wiki document [1]. Looking
> > forward to your feedback.
> >
> > [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-403+High+Availability+Services+for+OLAP+Scenarios
> >
> > Best,
> > Yangze Guo
>


[Discuss] FLIP-407: Improve Flink Client performance in interactive scenarios

2023-12-26 Thread xiangyu feng
Hi devs,

I'm opening this thread to discuss FLIP-407: Improve Flink Client
performance in interactive scenarios. The POC test results and design doc
can be found at: FLIP-407

.

Currently, Flink Client is mainly designed for one time interaction with
the Flink Cluster. All the resources(http connections, threads, ha
services) and instances(ClusterDescriptor, ClusterClient, RestClient) are
created and recycled for each interaction. This works well when users do
not need to interact frequently with Flink Cluster and also saves resource
usage since resources are recycled immediately after each usage.

However, in OLAP or StreamingWarehouse scenarios, users might submit
interactive jobs to a dedicated Flink Session Cluster very often. In this
case, we find that for short queries that can finish in less than 1s in
Flink Cluster will still have E2E latency greater than 2s. Hence, we
propose this FLIP to improve the Flink Client performance in this scenario.
This could also improve the user experience when using session debug mode.

The major change in this FLIP is that there will be a new introduced option
*'execution.interactive-client'*. When this option is enabled, Flink
Client will reuse all the necessary resources to improve interactive
performance, including: HA Services, HTTP connections, threads and all
kinds of instances related to a long-running Flink Cluster. The default
value of this option will be false, then Flink Client will behave as before.

Also, this FLIP proposed a configurable RetryStrategy when fetching results
from client-side to Flink Cluster. In interactive scenarios, this can save
more than 15% of TM CPU usage without performance degradation.

Looking forward to your feedback, thanks.

Best regards,
Xiangyu


Re: FLIP-403: High Availability Services for OLAP Scenarios

2023-12-19 Thread xiangyu feng
Hi Yangze,

Thanks for driving this. I like the idea to separate the current
HighAvailabilityServices into LeaderServices and PersistentServices. AFAIK,
there is no need to
bind JobGraphStore/JobResultStore/BlobStore/CheckpointStore modules with
LeaderElection/LeaderRetrieval services. The formers are mainly used to
store the job related metadata while the latter is designed to distinguish
the current working jobmanager and standby jobmanager.

I have one question for this, will this FLIP change the leader path or job
path stored in the HighAvailabilityServices? If user is upgrading a Flink
job from using current HighAvailabilityServices, will there be any
compatibility issues, do we need any data migration?

Thx,
Xiangyu


Re: [ANNOUNCE] Experimental Java 21 support now available on master

2023-12-10 Thread xiangyu feng
Thanks Sergey for the great work!

ZGC in JDK21 is now a multi-generational garbage collector. In jdk17, we
tried using ZGC as the default garbage collector for streaming
computations, but found that while it reduced STW, it did have a negative
impact on job throughput because a single-generation collector was not as
efficient as a generational collector. This issue might be resolved in
JDK21.

Looking forward to this!

Best Regards,
Xiangyu

Sergey Nuyanzin  于2023年12月11日周一 07:06写道:

> thanks for checking and creation the ticket
>
> yes, that probably makes sense
>
> On Thu, Nov 30, 2023 at 1:08 PM Yun Tang  wrote:
>
> > Hi Sergey,
> >
> > I checked the CI [1] which was executed with Java21, and noticed that the
> > StatefulJobSnapshotMigrationITCase-related tests have passed, which
> proves
> > what I guessed before, most checkpoints/savepoints should be restored
> > successfully.
> >
> > I think we shall introduce such snapshot migration tests, which restore
> > snapshots containing scala code. I also create a ticket focused on Java17
> > [2]
> >
> >
> > [1]
> >
> https://dev.azure.com/snuyanzin/flink/_build/results?buildId=2620=logs=0a15d512-44ac-5ba5-97ab-13a5d066c22c=9a028d19-6c4b-5a4e-d378-03fca149d0b1
> > [2] https://issues.apache.org/jira/browse/FLINK-33707
> >
> >
> > Best
> > Yun Tang
> > 
> > From: Sergey Nuyanzin 
> > Sent: Thursday, November 30, 2023 14:41
> > To: dev@flink.apache.org 
> > Subject: Re: [ANNOUNCE] Experimental Java 21 support now available on
> > master
> >
> > Thanks Yun Tang
> >
> > One question to clarify: since the scala version was also bumped for java
> > 17, shouldn't there be a similar task for java 17?
> >
> > On Thu, Nov 30, 2023 at 3:43 AM Yun Tang  wrote:
> >
> > > Hi Sergey,
> > >
> > > You can leverage all tests extending SnapshotMigrationTestBase[1] to
> > > verify the logic. I believe all binary _metadata existing in the
> > resources
> > > folder[2] were built by JDK8.
> > >
> > > I also create a ticket FLINK-33699[3] to track this.
> > >
> > > [1]
> > >
> >
> https://github.com/apache/flink/blob/master/flink-tests/src/test/java/org/apache/flink/test/checkpointing/utils/SnapshotMigrationTestBase.java
> > > [2]
> > >
> >
> https://github.com/apache/flink/tree/master/flink-tests/src/test/resources
> > > [3] https://issues.apache.org/jira/browse/FLINK-33699
> > >
> > > Best
> > > Yun Tang
> > > 
> > > From: Sergey Nuyanzin 
> > > Sent: Wednesday, November 29, 2023 22:56
> > > To: dev@flink.apache.org 
> > > Subject: Re: [ANNOUNCE] Experimental Java 21 support now available on
> > > master
> > >
> > > thanks for the response
> > >
> > >
> > > >I feel doubt about the conclusion that "don't try to load a savepoint
> > from
> > > a Java 8/11/17 build due to bumping to scala-2.12.18", since the
> > > snapshotted state (operator/keyed state-backend),  and most key/value
> > > serializer snapshots are generated by pure-java code.
> > > >The only left part is that the developer uses scala UDF or scala types
> > for
> > > key/value types. However, since all user-facing scala APIs have been
> > > deprecated, I don't think we have so many cases. Maybe we can give
> > > descriptions without such strong suggestions.
> > >
> > > That is the area where I feel I lack the knowledge to answer this
> > > precisely.
> > > My assumption was that statement about Java 21 regarding this should be
> > > similar to Java 17 which is almost same [1]
> > > Sorry for the inaccuracy
> > > Based on your statements I agree that the conclusion could be more
> > relaxed.
> > >
> > > I'm curious whether there are some tests or anything which could
> clarify
> > > this?
> > >
> > > [1] https://lists.apache.org/thread/mz0m6wqjmqy8htob3w4469pjbg9305do
> > >
> > > On Wed, Nov 29, 2023 at 12:25 PM Yun Tang  wrote:
> > >
> > > > Thanks Sergey for the great work.
> > > >
> > > > I feel doubt about the conclusion that "don't try to load a savepoint
> > > from
> > > > a Java 8/11/17 build due to bummping to scala-2.12.18", since the
> > > > snapshotted state (operator/keyed state-backend),  and most key/value
> > > > serializer snapshots are generated by pure-java code. The only left
> > part
> > > is
> > > > that the developer uses scala UDF or scala types for key/value types.
> > > > However, since all user-facing scala APIs have been deprecated [1], I
> > > don't
> > > > think we have so many cases. Maybe we can give descriptions without
> > such
> > > > strong suggestions.
> > > >
> > > > Please correct me if I am wrong.
> > > >
> > > >
> > > > [1] https://issues.apache.org/jira/browse/FLINK-29740
> > > >
> > > > Best
> > > > Yun Tang
> > > >
> > > > 
> > > > From: Rui Fan <1996fan...@gmail.com>
> > > > Sent: Wednesday, November 29, 2023 16:43
> > > > To: dev@flink.apache.org 
> > > > Subject: Re: [ANNOUNCE] Experimental Java 21 support now available on
> > > > master
> > > >
> > > > Thanks Sergey for the 

[jira] [Created] (FLINK-33702) Add IncrementalDelayRetryStrategy in AsyncRetryStrategies

2023-11-30 Thread xiangyu feng (Jira)
xiangyu feng created FLINK-33702:


 Summary: Add IncrementalDelayRetryStrategy in AsyncRetryStrategies 
 Key: FLINK-33702
 URL: https://issues.apache.org/jira/browse/FLINK-33702
 Project: Flink
  Issue Type: Bug
  Components: API / DataStream
Reporter: xiangyu feng


AsyncRetryStrategies now supports NoRetryStrategy, FixedDelayRetryStrategy and 
ExponentialBackoffDelayRetryStrategy.  In certain scenarios, we also need 
IncrementalDelayRetryStrategy to reduce the retry count and perform the action 
more timely. 

 

IncrementalDelayRetryStrategy will increase the retry delay at a fixed rate for 
each attempt.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33698) Fix the backoff time calculation in ExponentialBackoffDelayRetryStrategy

2023-11-29 Thread xiangyu feng (Jira)
xiangyu feng created FLINK-33698:


 Summary: Fix the backoff time calculation in 
ExponentialBackoffDelayRetryStrategy
 Key: FLINK-33698
 URL: https://issues.apache.org/jira/browse/FLINK-33698
 Project: Flink
  Issue Type: Bug
  Components: API / DataStream
Reporter: xiangyu feng


The backoff time calculation in `ExponentialBackoffDelayRetryStrategy` should 
consider currentAttempts. 

 

Current Version:
{code:java}
@Override
public long getBackoffTimeMillis(int currentAttempts) {
if (currentAttempts <= 1) {
// equivalent to initial delay
return lastRetryDelay;
}
long backoff = Math.min((long) (lastRetryDelay * multiplier), 
maxRetryDelay);
this.lastRetryDelay = backoff;
return backoff;
} {code}
Fixed Version:
{code:java}
@Override
public long getBackoffTimeMillis(int currentAttempts) {
if (currentAttempts <= 1) {
// equivalent to initial delay
return initialDelay;
}
long backoff =
Math.min(
(long) (initialDelay * Math.pow(multiplier, currentAttempts 
- 1)),
maxRetryDelay);
return backoff;
} {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33688) Reuse Channels in RestClient to save connection establish time

2023-11-28 Thread xiangyu feng (Jira)
xiangyu feng created FLINK-33688:


 Summary: Reuse Channels in RestClient to save connection establish 
time
 Key: FLINK-33688
 URL: https://issues.apache.org/jira/browse/FLINK-33688
 Project: Flink
  Issue Type: Sub-task
  Components: Client / Job Submission
Reporter: xiangyu feng


RestClient can reuse the connections to Dispatcher when submitting http 
requests to a long running Flink cluster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33687) Reuse RestClusterClient in StandaloneClusterDescriptor to avoid frequent thread create/destroy

2023-11-28 Thread xiangyu feng (Jira)
xiangyu feng created FLINK-33687:


 Summary: Reuse RestClusterClient in StandaloneClusterDescriptor to 
avoid frequent thread create/destroy
 Key: FLINK-33687
 URL: https://issues.apache.org/jira/browse/FLINK-33687
 Project: Flink
  Issue Type: Sub-task
  Components: Client / Job Submission
Reporter: xiangyu feng


`RestClusterClient` can also be reused when submitting programs to a 
long-running Flink Cluster



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33686) Reuse StandaloneClusterDescriptor in RemoteExecutor when executing jobs on the same cluster

2023-11-28 Thread xiangyu feng (Jira)
xiangyu feng created FLINK-33686:


 Summary: Reuse StandaloneClusterDescriptor in RemoteExecutor when 
executing jobs on the same cluster
 Key: FLINK-33686
 URL: https://issues.apache.org/jira/browse/FLINK-33686
 Project: Flink
  Issue Type: Sub-task
  Components: Client / Job Submission
Reporter: xiangyu feng


Multiple `RemoteExecutor` instances can reuse the same 
`StandaloneClusterDescriptor` when executing jobs to a same running cluster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33685) StandaloneClusterId need to distinguish different remote clusters

2023-11-28 Thread xiangyu feng (Jira)
xiangyu feng created FLINK-33685:


 Summary: StandaloneClusterId need to distinguish different remote 
clusters
 Key: FLINK-33685
 URL: https://issues.apache.org/jira/browse/FLINK-33685
 Project: Flink
  Issue Type: Sub-task
  Components: Client / Job Submission
Reporter: xiangyu feng


`StandaloneClusterId` is a singleton, which means `StandaloneClusterDescriptor` 
cannot distinguish different remote running clusters.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33684) Improve the retry strategy in CollectResultFetcher

2023-11-28 Thread xiangyu feng (Jira)
xiangyu feng created FLINK-33684:


 Summary: Improve the retry strategy in CollectResultFetcher
 Key: FLINK-33684
 URL: https://issues.apache.org/jira/browse/FLINK-33684
 Project: Flink
  Issue Type: Sub-task
  Components: API / DataStream
Reporter: xiangyu feng


Currently  CollectResultFetcher use a fixed retry interval.
{code:java}
private void sleepBeforeRetry() {
if (retryMillis <= 0) {
return;
}

try {
// TODO a more proper retry strategy?
Thread.sleep(retryMillis);
} catch (InterruptedException e) {
LOG.warn("Interrupted when sleeping before a retry", e);
}
} {code}
This can be improved with a WaitStrategy.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33683) Improve the performance of submitting jobs and fetching results to a running flink cluster

2023-11-28 Thread xiangyu feng (Jira)
xiangyu feng created FLINK-33683:


 Summary: Improve the performance of submitting jobs and fetching 
results to a running flink cluster
 Key: FLINK-33683
 URL: https://issues.apache.org/jira/browse/FLINK-33683
 Project: Flink
  Issue Type: Improvement
  Components: Client / Job Submission, Table SQL / Client
Reporter: xiangyu feng


There is now a lot of unnecessary overhead involved in submitting jobs and 
fetching results to a long-running flink cluster. This works well for streaming 
and batch job, because in these scenarios users will not frequently submit jobs 
and fetch result to a running cluster.

 

But in OLAP scenario, users will continuously submit lots of short-lived jobs 
to the running cluster. In this situation, these overhead will have a huge 
impact on the E2E performance.  Here are some examples of unnecessary overhead:
 * Each `RemoteExecutor` will create a new `StandaloneClusterDescriptor` when 
executing a job on the same remote cluster
 * `StandaloneClusterDescriptor` will always create a new `RestClusterClient` 
when retrieving an existing Flink Cluster
 * Each `RestClusterClient` will create a new `ClientHighAvailabilityServices` 
which might contains a resource-consuming ha client(ZKClient or KubeClient) and 
a time-consuming leader retrieval operation
 * `RestClient` will create a new connection for every request which costs 
extra connection establishment time

 

Therefore, I suggest creating this ticket and following subtasks to improve 
this performance. This ticket is also relates to  FLINK-25318.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-393: Make QueryOperations SQL serializable

2023-11-21 Thread xiangyu feng
+1 (non-binding)

Thanks for driving this.

Best,
Xiangyu Feng


Ferenc Csaky  于2023年11月21日周二 20:07写道:

> +1 (non-binding)
>
> Lookgin forward to this!
>
> Best,
> Ferenc
>
>
>
>
> On Tuesday, November 21st, 2023 at 12:21, Martijn Visser <
> martijnvis...@apache.org> wrote:
>
>
> >
> >
> > +1 (binding)
> >
> > Thanks for driving this.
> >
> > Best regards,
> >
> > Martijn
> >
> > On Tue, Nov 21, 2023 at 12:18 PM Benchao Li libenc...@apache.org wrote:
> >
> > > +1 (binding)
> > >
> > > Dawid Wysakowicz wysakowicz.da...@gmail.com 于2023年11月21日周二 18:56写道:
> > >
> > > > Hi everyone,
> > > >
> > > > Thank you to everyone for the feedback on FLIP-393: Make
> QueryOperations
> > > > SQL serializable[1]
> > > > which has been discussed in this thread [2].
> > > >
> > > > I would like to start a vote for it. The vote will be open for at
> least 72
> > > > hours unless there is an objection or not enough votes.
> > > >
> > > > [1] https://cwiki.apache.org/confluence/x/vQ2ZE
> > > > [2] https://lists.apache.org/thread/ztyk68brsbmwwo66o1nvk3f6fqqhdxgk
> > >
> > > --
> > >
> > > Best,
> > > Benchao Li
>


Re: [DISCUSS] FLIP-393: Make QueryOperations SQL serializable

2023-11-17 Thread xiangyu feng
>After this FLIP is done, FLINK-25015() can utilize this ability to set
> job name for queries.

+1 for this. Currently, when users submit sql jobs through table api, we
can't see the complete SQL string on flink ui. It would be easy for us to
finish this feature if we can get serialized sql from QueryOperation
directly.

So +1 for the overall proposal.

Regards,
Xiangyu

Benchao Li  于2023年11月17日周五 19:07写道:

> That sounds good to me, I'm looking forward to it!
>
> After this FLIP is done, FLINK-25015 can utilize this ability to set
> job name for queries.
>
> Dawid Wysakowicz  于2023年11月16日周四 21:16写道:
> >
> > Yes, the idea is to convert the QueryOperation tree into a
> > proper/compilable query. To be honest I didn't think it could be done
> > differently, sorry if I wasn't clear enough. Yes, it is very much like
> > SqlNode#toSqlString you mentioned. I'll add an example of a single
> > QueryOperation tree to the FLIP.
> >
> > I tried to focus only on the public contracts, not on the implementation
> > details. I mentioned Expressions, because this requires changing
> > semi-public interfaces in BuiltinFunctionDefinitions.
> >
> > Hope this makes it clearer.
> >
> > Regards,
> > Dawid
> >
> > On Thu, 16 Nov 2023 at 12:12, Benchao Li  wrote:
> >
> > > Sorry that I wasn't expressing it clearly.
> > >
> > > Since the FLIP talks about two things: ResolvedExpression and
> > > QueryOperation, and you have illustrated how to serialize
> > > ResolvedExpression into SQL string. I'm wondering how you'll gonna to
> > > convert QueryOperation into SQL string.
> > >
> > > I was thinking that you proposed to convert the QueryOperation tree
> > > into a "complete runnable SQL statement", e.g.
> > >
> > >
> ProjectQueryOperation(x,y)->FilterQueryOperation(z>10)->TableSourceQueryOperation(T),
> > > we'll get "SELECT x, y FROM T WHERE z > 10".
> > > Maybe I misread it, maybe you just meant to convert each
> > > QueryOperation into a row-level SQL string instead the whole tree into
> > > a complete SQL statement.
> > >
> > > The idea of translating whole QueryOperation tree into SQL statement
> > > may come from my experience of Apache Calcite, there is a
> > > SqlImplementor[1] which convert a RelNode tree into SqlNode, and
> > > further we can use  SqlNode#toSqlString to unparse it into SQL string.
> > > I would assume that most of our QueryOperations are much like the
> > > abstraction of Calcite's RelNode, with some exceptions such as
> > > PlannerQueryOperation.
> > >
> > > [1]
> > >
> https://github.com/apache/calcite/blob/153796f8994831ad015af4b9036aa01ebf78/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L141
> > >
> > > Dawid Wysakowicz  于2023年11月16日周四 16:24写道:
> > > >
> > > > I think the FLIP covers all public contracts that are necessary to be
> > > > discussed at that level.
> > > >
> > > > If you meant you could not find a method that would be called to
> trigger
> > > > the translation then it is already there. It's just not implemented
> yet:
> > > > QueryOperation#asSerializableString[1]. As I mentioned this is
> mostly a
> > > > follow up to previous work.
> > > >
> > > > Regards,
> > > > Dawid
> > > >
> > > > [1]
> > > >
> > >
> https://github.com/apache/flink/blob/d18a4bfe596fc580f8280750fa3bfa22007671d9/flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/operations/QueryOperation.java#L46
> > > >
> > > > On Wed, 15 Nov 2023 at 16:36, Benchao Li 
> wrote:
> > > >
> > > > > +1 for the idea of choosing SQL as the serialization format for
> > > > > QueryOperation, thanks for Dawid for driving this FLIP.
> > > > >
> > > > > Regarding the implementation, I didn't see the proposal for how to
> > > > > translate QueryOperation to SQL yet, am I missing something? Or the
> > > > > FLIP is still in preliminary state, you just want to gather ideas
> > > > > about whether to use SQL or something else as the serialization
> format
> > > > > for QueryOperation?
> > > > >
> > > > > Dawid Wysakowicz  于2023年11月15日周三 19:34写道:
> > > > > >
> > > > > > Hi,
> > > > > > I would like to propose a follow-up improvement to some of the
> work
> > > that
> > > > > > has been done over the years to the Table API. I posted the
> proposed
> > > > > > changes in the FLIP-393. I'd like to get to know what others
> think of
> > > > > > choosing SQL as the serialization format for QueryOperations.
> > > > > > Regards,
> > > > > > Dawid Wysakowicz
> > > > > >
> > > > > > [1] https://cwiki.apache.org/confluence/x/vQ2ZE
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Best,
> > > > > Benchao Li
> > > > >
> > >
> > >
> > >
> > > --
> > >
> > > Best,
> > > Benchao Li
> > >
>
>
>
> --
>
> Best,
> Benchao Li
>


Re: Heartbeat of TaskManager with id container

2023-11-03 Thread xiangyu feng
Hi Nagireddy,

Pls try to
use  '-Dlog4j.configuration=file:/opt/test/flink/log4j.properties'.

Regards,
Xiangyu

Y SREEKARA BHARGAVA REDDY  于2023年11月3日周五 19:45写道:

> Yes, I went through that document.
>
> How can i override log4j.properties with my custom log4j.properties(
> */opt/test/flink/log4j.properties*).
>
> is it possible right?. in flink didn't find it.
>
> Any through on the above.
>
> Regards,
> Nagireddy Y.
>
>
>
> On Fri, Nov 3, 2023 at 12:18 PM xiangyu feng  wrote:
>
>> Hi Nagireddy,
>>
>> You can configure logging for your 1.16 job according to this doc[1].
>>
>> [1]
>> https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/deployment/advanced/logging/
>>
>> Regards
>> Xiangyu,
>>
>>
>>
>> Y SREEKARA BHARGAVA REDDY  于2023年11月3日周五 08:57写道:
>>
>>> Hi Xiangyu,
>>>
>>> I have one issue,
>>>
>>> I am using *Flink** 1.16* version, How can i specify log4j.properties
>>> for the flink run command line along with my job.
>>> every job i need to pass different log file.
>>>
>>> looks like below one is not working: -Dlog4j.configurationFile=
>>> Please help me with correct config for the flink run command.
>>>
>>> Regards,
>>> Nagireddy Y.
>>>
>>>
>>>
>>>
>>> On Thu, Aug 3, 2023 at 7:24 AM xiangyu feng 
>>> wrote:
>>>
>>>> Hi ynagireddy4u,
>>>>
>>>> We have met this exception before. Usually it is caused by following
>>>> reasons:
>>>>
>>>> 1), TaskManager is too busy with other works to send the heartbeat to
>>>> JobMaster or TaskManager process might already exited;
>>>> 2), There might be a network issues between this TaskManager and
>>>> JobMaster;
>>>> 3), In certain cases, JobMaster actor might also being too busy to
>>>> process the RPC requests from TaskManager;
>>>>
>>>> Pls check if your problem fits the above situations.
>>>>
>>>> Best,
>>>> Xiangyu
>>>>
>>>>
>>>> Y SREEKARA BHARGAVA REDDY  于2023年7月31日周一
>>>> 20:49写道:
>>>>
>>>>> Hi Team,
>>>>>
>>>>> Did any one face the below exception.
>>>>> If yes, please share the resolution.
>>>>>
>>>>>
>>>>> 2023-07-28 22:04:16
>>>>> j*ava.util.concurrent.TimeoutException: Heartbeat of TaskManager with
>>>>> id
>>>>> container_e19_1690528962823_0382_01_05 timed out.*
>>>>> at org.apache.flink.runtime.jobmaster.
>>>>> JobMaster$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(JobMaster
>>>>> .java:1147)
>>>>> at org.apache.flink.runtime.heartbeat.HeartbeatMonitorImpl.run(
>>>>> HeartbeatMonitorImpl.java:109)
>>>>> at
>>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:
>>>>> 511)
>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(
>>>>> AkkaRpcActor.java:397)
>>>>> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(
>>>>> AkkaRpcActor.java:190)
>>>>> at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor
>>>>> .handleRpcMessage(FencedAkkaRpcActor.java:74)
>>>>> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(
>>>>> AkkaRpcActor.java:152)
>>>>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
>>>>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
>>>>> at
>>>>> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>>>>> at akka.japi.pf
>>>>> .UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
>>>>> at
>>>>> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
>>>>> at
>>>>> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>>>>> at
>>>>> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>>>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
>>>>> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
>>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
>>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:561)
>>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
>>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:225)
>>>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
>>>>> at
>>>>> akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>> at
>>>>> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool
>>>>> .java:1339)
>>>>> at
>>>>> akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>> at
>>>>> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread
>>>>> .java:107)
>>>>>
>>>>> Any suggestions, please share with me.
>>>>>
>>>>> Regards,
>>>>> Nagireddy Y
>>>>>
>>>>


Re: Heartbeat of TaskManager with id container

2023-11-03 Thread xiangyu feng
Hi Nagireddy,

You can configure logging for your 1.16 job according to this doc[1].

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/deployment/advanced/logging/

Regards
Xiangyu,



Y SREEKARA BHARGAVA REDDY  于2023年11月3日周五 08:57写道:

> Hi Xiangyu,
>
> I have one issue,
>
> I am using *Flink** 1.16* version, How can i specify log4j.properties for
> the flink run command line along with my job.
> every job i need to pass different log file.
>
> looks like below one is not working: -Dlog4j.configurationFile=
> Please help me with correct config for the flink run command.
>
> Regards,
> Nagireddy Y.
>
>
>
>
> On Thu, Aug 3, 2023 at 7:24 AM xiangyu feng  wrote:
>
>> Hi ynagireddy4u,
>>
>> We have met this exception before. Usually it is caused by following
>> reasons:
>>
>> 1), TaskManager is too busy with other works to send the heartbeat to
>> JobMaster or TaskManager process might already exited;
>> 2), There might be a network issues between this TaskManager and
>> JobMaster;
>> 3), In certain cases, JobMaster actor might also being too busy to
>> process the RPC requests from TaskManager;
>>
>> Pls check if your problem fits the above situations.
>>
>> Best,
>> Xiangyu
>>
>>
>> Y SREEKARA BHARGAVA REDDY  于2023年7月31日周一 20:49写道:
>>
>>> Hi Team,
>>>
>>> Did any one face the below exception.
>>> If yes, please share the resolution.
>>>
>>>
>>> 2023-07-28 22:04:16
>>> j*ava.util.concurrent.TimeoutException: Heartbeat of TaskManager with id
>>> container_e19_1690528962823_0382_01_05 timed out.*
>>> at org.apache.flink.runtime.jobmaster.
>>> JobMaster$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(JobMaster
>>> .java:1147)
>>> at org.apache.flink.runtime.heartbeat.HeartbeatMonitorImpl.run(
>>> HeartbeatMonitorImpl.java:109)
>>> at
>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:
>>> 511)
>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(
>>> AkkaRpcActor.java:397)
>>> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(
>>> AkkaRpcActor.java:190)
>>> at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor
>>> .handleRpcMessage(FencedAkkaRpcActor.java:74)
>>> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(
>>> AkkaRpcActor.java:152)
>>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
>>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
>>> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>>> at akka.japi.pf
>>> .UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
>>> at
>>> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
>>> at
>>> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>>> at
>>> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
>>> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
>>> at akka.actor.ActorCell.invoke(ActorCell.scala:561)
>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
>>> at akka.dispatch.Mailbox.run(Mailbox.scala:225)
>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
>>> at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>> at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool
>>> .java:1339)
>>> at
>>> akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>> at
>>> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread
>>> .java:107)
>>>
>>> Any suggestions, please share with me.
>>>
>>> Regards,
>>> Nagireddy Y
>>>
>>


Re: [VOTE] FLIP-370: Support Balanced Tasks Scheduling

2023-10-23 Thread xiangyu feng
Thanks for driving that.
+1 (non-binding)

Regards,
Xiangyu

Yu Chen  于2023年10月23日周一 15:19写道:

> +1 (non-binding)
>
> We deeply need this capability to balance Tasks at the Taskmanager level in
> production, which helps to make a more sufficient usage of Taskmanager
> resources. Thanks for driving that.
>
> Best,
> Yu Chen
>
> Yangze Guo  于2023年10月23日周一 15:08写道:
>
> > +1 (binding)
> >
> > Best,
> > Yangze Guo
> >
> > On Mon, Oct 23, 2023 at 12:00 PM Rui Fan <1996fan...@gmail.com> wrote:
> > >
> > > +1(binding)
> > >
> > > Thanks to Yuepeng and to everyone who participated in the discussion!
> > >
> > > Best,
> > > Rui
> > >
> > > On Mon, Oct 23, 2023 at 11:55 AM Roc Marshal  wrote:
> > >>
> > >> Hi all,
> > >>
> > >> Thanks for all the feedback on FLIP-370[1][2].
> > >> I'd like to start a vote for  FLIP-370. The vote will last for at
> least
> > 72 hours (Oct. 26th at 10:00 A.M. GMT) unless there is an objection or
> > insufficient votes.
> > >>
> > >> Thanks,
> > >> Yuepeng Pan
> > >>
> > >> [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-370%3A+Support+Balanced+Tasks+Scheduling
> > >> [2] https://lists.apache.org/thread/mx3ot0fmk6zr02ccdby0s8oqxqm2pn1x
> >
>


Re: [ANNOUNCE] New Apache Flink Committer - Jane Chan

2023-10-15 Thread xiangyu feng
Congratulations Jane!

Best,
Xiangyu

Xuannan Su  于2023年10月16日周一 10:25写道:

> Congratulations Jane!
>
> Best,
> Xuannan
>
> On Mon, Oct 16, 2023 at 10:21 AM Yun Tang  wrote:
> >
> > Congratulations, Jane!
> >
> > Best
> > Yun Tang
> > 
> > From: Rui Fan <1996fan...@gmail.com>
> > Sent: Monday, October 16, 2023 10:16
> > To: dev@flink.apache.org 
> > Cc: qingyue@gmail.com 
> > Subject: Re: [ANNOUNCE] New Apache Flink Committer - Jane Chan
> >
> > Congratulations Jane!
> >
> > Best,
> > Rui
> >
> > On Mon, Oct 16, 2023 at 10:15 AM yu zelin  wrote:
> >
> > > Congratulations!
> > >
> > > Best,
> > > Yu Zelin
> > >
> > > > 2023年10月16日 09:58,Jark Wu  写道:
> > > >
> > > > Hi, everyone
> > > >
> > > > On behalf of the PMC, I'm very happy to announce Jane Chan as a new
> Flink
> > > > Committer.
> > > >
> > > > Jane started code contribution in Jan 2021 and has been active in the
> > > Flink
> > > > community since. She authored more than 60 PRs and reviewed more
> than 40
> > > > PRs. Her contribution mainly revolves around Flink SQL, including
> Plan
> > > > Advice (FLIP-280), operator-level state TTL (FLIP-292), and ALTER
> TABLE
> > > > statements (FLINK-21634). Jane participated deeply in development
> > > > discussions and also helped answer user question emails. Jane was
> also a
> > > > core contributor of Flink Table Store (now Paimon) when the project
> was
> > > in
> > > > the early days.
> > > >
> > > > Please join me in congratulating Jane Chan for becoming a Flink
> > > Committer!
> > > >
> > > > Best,
> > > > Jark Wu (on behalf of the Flink PMC)
> > >
> > >
>


Re: [DISCUSS] FLIP-370 : Support Balanced Tasks Scheduling

2023-10-11 Thread xiangyu feng
Hi Yuepeng,

Thanks for your feedback. I agree with u, both approaches can achieve the
goal.
As long as we can easily extend the balancing strategy to consider more
than one factors without changing the interface, the solution is OK for me.

Regards,
Xiangyu

Yuepeng Pan  于2023年10月11日周三 17:38写道:

> Hi, xiangyu.
> Thanks for your quick reply.
>
> >interface currently only includes a description of the number of tasks.
> So,
> >IIUC, If there is a need to further expand
> >current interface and its implementations, right?
>
> Yes, that's indeed the case.
>
> >I checked the interface design of LoadingWeight and WeightLoadable, AFAIK
> >currently it only supports comparing the load for one factor. If we want
> to
> >add more loading factors, LoadingWeight might need to add a 'LoadType'
> >field for distinction, WeightLoadable might need to return
> >Set.
>
> Thank you for the clarification, I think I roughly understand your
> description:
> In fact, regarding the specific implementation and extension of this
> LoadingWeight, we can extend it based on this interface and its
> implementation as mentioned above.
> If making frequent changes to the interface and its implementation is
> really tiresome, we can also consider introducing a built-in collapsible
> Map or other type of attribute, like the SlotSharingGroup class in the
> org.apache.flink.api.common.operators package, to describe the specific
> collection of load values and types. This way, these loads are collapsed
> within the LoadingWeight's implementation and can be expanded when needed
> for use. Of course, we can also consider an implementation like the one you
> mentioned, introducing a method in WeightLoadable that returns a collection
> as the return type, so the load values are expanded at the calling site and
> then used. As I understand it, both approaches can achieve the goal.
>
> Of course, I also look forward to hearing others' suggestions. If there
> are any mistakes in my statement, please correct me.
> Looking forward to your reply.
>
> Best regards.
> Yuepeng Pan
>
> On 2023/10/11 08:44:51 xiangyu feng wrote:
> > Hi Yuepeng,
> >
> > Thx for ur reply.
> >
> > > Nice feedback. In fact, as mentioned in the Google Doc, the
> LoadingWeight
> > interface currently only includes a description of the number of tasks.
> So,
> > IIUC, If there is a need to further expand
> > > descriptions of other resource loads, we just extend it based on the
> > current interface and its implementations, right?
> >
> > I checked the interface design of LoadingWeight and WeightLoadable, AFAIK
> > currently it only supports comparing the load for one factor. If we want
> to
> > add more loading factors, LoadingWeight might need to add a 'LoadType'
> > field for distinction, WeightLoadable might need to return
> > Set.
> >
> > I'm not sure I understand this correctly, WDYT?
> >
> > Regards,
> > Xiangyu
> >
> > Yuepeng Pan  于2023年10月11日周三 13:53写道:
> >
> > > Hi, xiangyu,
> > > Thanks for your attention as well.
> > >
> > > >1, About the waiting mechanism: Will the waiting mechanism happen
> only in
> > > >the second level 'assigning slots to TM'? IIUC, the first level
> 'assigning
> > > >Tasks to Slots' needs only the asynchronous slot result from slotpool.
> > >
> > > As described in the latest FLIP, the introduction of the waiting
> mechanism
> > > at the second level is to ensure that, in all deployment modes such as
> > > application, session, etc., we do not fall into a local greedy state
> when
> > > selecting the optimal slot position. This requires obtaining a global
> > > resource view to get the best result.
> > > IIUC, The allocation process from Task to Slot is the generation of a
> > > mapping relationship between two abstract descriptions, and at this
> point,
> > > there is no coupling of information between tasks/slots and Task
> Managers
> > > (TMs).
> > >
> > >
> > > >2, About the slot LoadingWeight: it is reasonable to use the number of
> > > >tasks by default in the beginning, but it would be better if this
> could be
> > > >easily extended in future to distinguish between CPU-intensive and
> > > >IO-intensive workloads. In some cases, TMs may have IO bottlenecks but
> > > >others have CPU bottlenecks.
> > >
> > > Nice feedback. In fact, as mentioned in the Google Doc, the
> LoadingWeight
> > > interface currently only includes a description of the number of
> tasks

Re: [DISCUSS] FLIP-370 : Support Balanced Tasks Scheduling

2023-10-11 Thread xiangyu feng
Hi Yuepeng,

Thx for ur reply.

> Nice feedback. In fact, as mentioned in the Google Doc, the LoadingWeight
interface currently only includes a description of the number of tasks. So,
IIUC, If there is a need to further expand
> descriptions of other resource loads, we just extend it based on the
current interface and its implementations, right?

I checked the interface design of LoadingWeight and WeightLoadable, AFAIK
currently it only supports comparing the load for one factor. If we want to
add more loading factors, LoadingWeight might need to add a 'LoadType'
field for distinction, WeightLoadable might need to return
Set.

I'm not sure I understand this correctly, WDYT?

Regards,
Xiangyu

Yuepeng Pan  于2023年10月11日周三 13:53写道:

> Hi, xiangyu,
> Thanks for your attention as well.
>
> >1, About the waiting mechanism: Will the waiting mechanism happen only in
> >the second level 'assigning slots to TM'? IIUC, the first level 'assigning
> >Tasks to Slots' needs only the asynchronous slot result from slotpool.
>
> As described in the latest FLIP, the introduction of the waiting mechanism
> at the second level is to ensure that, in all deployment modes such as
> application, session, etc., we do not fall into a local greedy state when
> selecting the optimal slot position. This requires obtaining a global
> resource view to get the best result.
> IIUC, The allocation process from Task to Slot is the generation of a
> mapping relationship between two abstract descriptions, and at this point,
> there is no coupling of information between tasks/slots and Task Managers
> (TMs).
>
>
> >2, About the slot LoadingWeight: it is reasonable to use the number of
> >tasks by default in the beginning, but it would be better if this could be
> >easily extended in future to distinguish between CPU-intensive and
> >IO-intensive workloads. In some cases, TMs may have IO bottlenecks but
> >others have CPU bottlenecks.
>
> Nice feedback. In fact, as mentioned in the Google Doc, the LoadingWeight
> interface currently only includes a description of the number of tasks. So,
> IIUC, If there is a need to further expand descriptions of other resource
> loads, we just extend it based on the current interface and its
> implementations, right?
> Please correct me if I have misunderstood. Thanks a lot~
>
> Best,
> Yuepeng.
>
> On 2023/10/06 10:19:21 xiangyu feng wrote:
> > Thanks Yuepeng and Rui for driving this Discussion.
> >
> > Internally when we try to use Flink 1.17.1 in production, we are also
> > suffering from the unbalanced task distribution problem for jobs with
> high
> > qps and complex dag. So +1 for the overall proposal.
> >
> > Some questions about the details:
> >
> > 1, About the waiting mechanism: Will the waiting mechanism happen only in
> > the second level 'assigning slots to TM'?  IIUC, the first level
> 'assigning
> > Tasks to Slots' needs only the asynchronous slot result from slotpool.
> >
> > 2, About the slot LoadingWeight: it is reasonable to use the number of
> > tasks by default in the beginning, but it would be better if this could
> be
> > easily extended in future to distinguish between CPU-intensive and
> > IO-intensive workloads. In some cases, TMs may have IO bottlenecks but
> > others have CPU bottlenecks.
> >
> > Regards,
> > Xiangyu
> >
> >
> > Yuepeng Pan  于2023年10月5日周四 18:34写道:
> >
> > > Hi, Zhu Zhu,
> > >
> > > Thanks for your feedback!
> > >
> > > > I think we can introduce a new config option
> > > > `taskmanager.load-balance.mode`,
> > > > which accepts "None"/"Slots"/"Tasks".
> `cluster.evenly-spread-out-slots`
> > > > can be superseded by the "Slots" mode and get deprecated. In the
> future
> > > > it can support more mode, e.g. "CpuCores", to work better for jobs
> with
> > > > fine-grained resources. The proposed config option
> > > > `slot.request.max-interval`
> > > > then can be renamed to
> > > `taskmanager.load-balance.request-stablizing-timeout`
> > > > to show its relation with the feature. The proposed
> > > `slot.sharing-strategy`
> > > > is not needed, because the configured "Tasks" mode will do the work.
> > >
> > > The new proposed configuration option sounds good to me.
> > >
> > > I have a small question, If we set our configuration value to 'Tasks,'
> it
> > > will initiate two processes: balancing the allocation of task
> quantities at
> > > the slot level and balancing the number of tasks across TaskManagers
> (TMs).
> > > Alternatively, if we configure it as 'Slots,' the system will employ
> the
> > > LocalPreferred allocation policy (which is the default) when assigning
> > > tasks to slots, and it will ensure that the number of slots used
> across TMs
> > > is balanced.
> > > Does  this configuration essentially combine a balanced selection
> strategy
> > > across two dimensions into fixed configuration items, right?
> > >
> > > I would appreciate it if you could correct me if I've made any errors.
> > >
> > > Best,
> > > Yuepeng.
> > >
> >
>


[jira] [Created] (FLINK-33235) Quickstart guide for Flink OLAP should support building from master branch

2023-10-10 Thread xiangyu feng (Jira)
xiangyu feng created FLINK-33235:


 Summary: Quickstart guide for Flink OLAP should support building 
from master branch
 Key: FLINK-33235
 URL: https://issues.apache.org/jira/browse/FLINK-33235
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation
Reporter: xiangyu feng


Many features required by OLAP session cluster are still in master branch or 
in-progress and not released yet, for example: FLIP-295, FLIP-362, FLIP-374. We 
need to address this in the document and show users how to quickly build OLAP 
Session Cluster from master branch.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-374: Adding a separate configuration for specifying Java Options of the SQL Gateway

2023-10-10 Thread xiangyu feng
+1(non-binding)

Regards,
Xiangyu

Shammon FY  于2023年10月11日周三 10:30写道:

> +1(binding), good job!
>
> Best,
> Shammon FY
>
> On Wed, Oct 11, 2023 at 10:18 AM Benchao Li  wrote:
>
> > +1 (binding)
> >
> > Rui Fan <1996fan...@gmail.com> 于2023年10月11日周三 10:17写道:
> > >
> > > +1(binding)
> > >
> > > Best,
> > > Rui
> > >
> > > On Wed, Oct 11, 2023 at 10:07 AM Yangze Guo 
> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I'd like to start the vote of FLIP-374 [1]. This FLIP is discussed in
> > > > the thread [2].
> > > >
> > > > The vote will be open for at least 72 hours. Unless there is an
> > > > objection, I'll try to close it by October 16, 2023 if we have
> > > > received sufficient votes.
> > > >
> > > > [1]
> > > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-374%3A+Adding+a+separate+configuration+for+specifying+Java+Options+of+the+SQL+Gateway
> > > > [2] https://lists.apache.org/thread/g4vl8mgnwgl7vjyvjy6zrc8w54b2lthv
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> >
> >
> >
> > --
> >
> > Best,
> > Benchao Li
> >
>


[jira] [Created] (FLINK-33228) Fix the total current resource calculation in fulfill requirements

2023-10-10 Thread xiangyu feng (Jira)
xiangyu feng created FLINK-33228:


 Summary: Fix the total current resource calculation in fulfill 
requirements
 Key: FLINK-33228
 URL: https://issues.apache.org/jira/browse/FLINK-33228
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Reporter: xiangyu feng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-367: Support Setting Parallelism for Table/SQL Sources

2023-10-09 Thread xiangyu feng
+1 (non-binding)

Feng Jin  于2023年10月9日周一 16:00写道:

> +1 (non-binding)
>
> Best,
> Feng
>
> On Mon, Oct 9, 2023 at 3:12 PM Yangze Guo  wrote:
>
> > +1 (binding)
> >
> > Best,
> > Yangze Guo
> >
> > On Mon, Oct 9, 2023 at 2:46 PM Yun Tang  wrote:
> > >
> > > +1 (binding)
> > >
> > > Best
> > > Yun Tang
> > > 
> > > From: Weihua Hu 
> > > Sent: Monday, October 9, 2023 12:03
> > > To: dev@flink.apache.org 
> > > Subject: Re: [VOTE] FLIP-367: Support Setting Parallelism for Table/SQL
> > Sources
> > >
> > > +1 (binding)
> > >
> > > Best,
> > > Weihua
> > >
> > >
> > > On Mon, Oct 9, 2023 at 11:47 AM Shammon FY  wrote:
> > >
> > > > +1 (binding)
> > > >
> > > >
> > > > On Mon, Oct 9, 2023 at 11:12 AM Benchao Li 
> > wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > Zhanghao Chen  于2023年10月9日周一 10:20写道:
> > > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > Thanks for all the feedback on FLIP-367: Support Setting
> > Parallelism
> > > > for
> > > > > Table/SQL Sources [1][2].
> > > > > >
> > > > > > I'd like to start a vote for FLIP-367. The vote will be open
> until
> > Oct
> > > > > 12th 12:00 PM GMT) unless there is an objection or insufficient
> > votes.
> > > > > >
> > > > > > [1]
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263429150
> > > > > > [2]
> > https://lists.apache.org/thread/gtpswl42jzv0c9o3clwqskpllnw8rh87
> > > > > >
> > > > > > Best,
> > > > > > Zhanghao Chen
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Best,
> > > > > Benchao Li
> > > > >
> > > >
> >
>


[jira] [Created] (FLINK-33209) Translate Flink OLAP quick start guide to Chinese

2023-10-08 Thread xiangyu feng (Jira)
xiangyu feng created FLINK-33209:


 Summary: Translate Flink OLAP quick start guide to Chinese
 Key: FLINK-33209
 URL: https://issues.apache.org/jira/browse/FLINK-33209
 Project: Flink
  Issue Type: Sub-task
  Components: chinese-translation
Reporter: xiangyu feng


Translate Flink OLAP quick start guide to Chinese



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-374: Adding a separate configuration for specifying Java Options of the SQL Gateway

2023-10-07 Thread xiangyu feng
Thanks for initiating this discussion. Within the development towards
Streaming Warehousing, SQL Gateway will become more and more important.
Big +1 to specify Java Options separately for SQL Gateway.

Regards,
Xiangyu

Yangze Guo  于2023年10月7日周六 15:24写道:

> Hi, there,
>
> We'd like to start a discussion thread on "FLIP-374: Adding a separate
> configuration for specifying Java Options of the SQL Gateway"[1],
> where we propose adding a separate configuration option to specify the
> Java options for the SQL Gateway. This would allow users to fine-tune
> the memory settings, garbage collection behavior, and other relevant
> Java parameters specific to the SQL Gateway, ensuring optimal
> performance and stability in production environments.
>
> Looking forward to your feedback.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-374%3A+Adding+a+separate+configuration+for+specifying+Java+Options+of+the+SQL+Gateway
>
> Best,
> Yangze Guo
>


Re: [DISCUSS] FLIP-370 : Support Balanced Tasks Scheduling

2023-10-06 Thread xiangyu feng
Thanks Yuepeng and Rui for driving this Discussion.

Internally when we try to use Flink 1.17.1 in production, we are also
suffering from the unbalanced task distribution problem for jobs with high
qps and complex dag. So +1 for the overall proposal.

Some questions about the details:

1, About the waiting mechanism: Will the waiting mechanism happen only in
the second level 'assigning slots to TM'?  IIUC, the first level 'assigning
Tasks to Slots' needs only the asynchronous slot result from slotpool.

2, About the slot LoadingWeight: it is reasonable to use the number of
tasks by default in the beginning, but it would be better if this could be
easily extended in future to distinguish between CPU-intensive and
IO-intensive workloads. In some cases, TMs may have IO bottlenecks but
others have CPU bottlenecks.

Regards,
Xiangyu


Yuepeng Pan  于2023年10月5日周四 18:34写道:

> Hi, Zhu Zhu,
>
> Thanks for your feedback!
>
> > I think we can introduce a new config option
> > `taskmanager.load-balance.mode`,
> > which accepts "None"/"Slots"/"Tasks". `cluster.evenly-spread-out-slots`
> > can be superseded by the "Slots" mode and get deprecated. In the future
> > it can support more mode, e.g. "CpuCores", to work better for jobs with
> > fine-grained resources. The proposed config option
> > `slot.request.max-interval`
> > then can be renamed to
> `taskmanager.load-balance.request-stablizing-timeout`
> > to show its relation with the feature. The proposed
> `slot.sharing-strategy`
> > is not needed, because the configured "Tasks" mode will do the work.
>
> The new proposed configuration option sounds good to me.
>
> I have a small question, If we set our configuration value to 'Tasks,' it
> will initiate two processes: balancing the allocation of task quantities at
> the slot level and balancing the number of tasks across TaskManagers (TMs).
> Alternatively, if we configure it as 'Slots,' the system will employ the
> LocalPreferred allocation policy (which is the default) when assigning
> tasks to slots, and it will ensure that the number of slots used across TMs
> is balanced.
> Does  this configuration essentially combine a balanced selection strategy
> across two dimensions into fixed configuration items, right?
>
> I would appreciate it if you could correct me if I've made any errors.
>
> Best,
> Yuepeng.
>


Re: [Discuss] FLIP-362: Support minimum resource limitation

2023-10-04 Thread xiangyu feng
Hi David,

Glad to hear you back!

> Agreed; in my mind, this boils down to the ability to quickly allocate new
slots (TMs). This might differ between environments though.

Yes, for interactive queries cold-start is a very tricky issue to dealing
with,
we should consider not only about allocating new resources ASAP but also
warming up the newly added TaskManagers.
Internally, we have done lots of work to address this problem.
Minimum resource limitation is the first step, we will
keep working on this.

Appreciate your feedback again.

Regards,
Xiangyu


David Morávek  于2023年10月4日周三 22:58写道:

> > If not, what is the difference between the spare resources and redundant
> taskmanagers?
>
> I wasn't aware of this one; good catch! The main difference is that you
> don't express the spare resources in terms of slots but in terms of task
> managers. Also, those options serve slightly different purpose, and users
> configuring slot manager might not look for another option somewhere else.
>
> > Secondly, IMHO the difference between min-reserved resource and spare
> resources is that we could configure a rather large min-reserved resource
>
> Agreed; in my mind, this boils down to the ability to quickly allocate new
> slots (TMs). This might differ between environments though. In most cases,
> there should be some time between interactive queries unless they're
> submitted programmatically. I can see the value of having both (min + slots
> to keep around).
>
> All in all, I don't have a strong opinion here, it's a significant
> improvement either way. This was just the first thing that I thought about
> after reading the flip.
>
> Best,
> D.
>
> On Tue, Oct 3, 2023 at 2:10 PM xiangyu feng  wrote:
>
> > Hi David,
> >
> > Thx for your feedback.
> >
> > First of all, for keeping some spare resources around, do you mean
> > 'Redundant TaskManagers'[1]? If not, what is the difference between the
> > spare resources and redundant taskmanagers?
> >
> > Secondly, IMHO the difference between min-reserved resource and spare
> > resources is that we could configure a rather large min-reserved resource
> > for user cases submitting lots of short-lived jobs concurrently, but we
> > don't want to configure a large spare resource since this might double
> the
> > total resource usage and lead to resource waste.
> >
> > Looking forward to hearing from you.
> >
> > Regards,
> > Xiangyu
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-18625
> >
> > David Morávek  于2023年10月3日周二 05:00写道:
> >
> > > H Xiangyui,
> > >
> > > The sentiment of the FLIP makes sense, but I keep wondering whether
> this
> > > is the best way to think about the problem. I assume that "interactive
> > > session cluster" users always want to keep some spare resources around
> > (up
> > > to a configured threshold) to reduce cold start instead of statically
> > > configuring the minimum.
> > >
> > > It's just a tiny change from the original proposal, but it could make
> all
> > > the difference (eliminate overprovisioning, maintain latencies with a
> > > growing # of jobs, ..)
> > >
> > > WDYT?
> > >
> > > Best,
> > > D.
> > >
> > > On Mon, Sep 25, 2023 at 5:11 PM Jing Ge 
> > > wrote:
> > >
> > >> Hi Yangze,
> > >>
> > >> Thanks for the clarification. The example of two batch jobs team up
> with
> > >> one streaming job is interesting.
> > >>
> > >> Best regards,
> > >> Jing
> > >>
> > >> On Wed, Sep 20, 2023 at 7:19 PM Yangze Guo 
> wrote:
> > >>
> > >> > Thanks for the comments, Jing.
> > >> >
> > >> > > Will the minimum resource configuration also take effect for
> > streaming
> > >> > jobs in application mode?
> > >> > > Since it is not recommended to configure
> > >> slotmanager.number-of-slots.max
> > >> > for streaming jobs, does it make sense to disable it for common
> > >> streaming
> > >> > jobs? At least disable the check for avoiding the oscillation?
> > >> >
> > >> > Yes. The minimum resource configuration will only disabled in
> > >> > standalone cluster atm. I agree it make sense to disable it for a
> pure
> > >> > streaming job, however:
> > >> > - By default, the minimum resource is configured to 0. If users do
> not
> > >> > proactively set it, either the o

Re: [Discuss] FLIP-362: Support minimum resource limitation

2023-10-03 Thread xiangyu feng
Hi David,

Thx for your feedback.

First of all, for keeping some spare resources around, do you mean
'Redundant TaskManagers'[1]? If not, what is the difference between the
spare resources and redundant taskmanagers?

Secondly, IMHO the difference between min-reserved resource and spare
resources is that we could configure a rather large min-reserved resource
for user cases submitting lots of short-lived jobs concurrently, but we
don't want to configure a large spare resource since this might double the
total resource usage and lead to resource waste.

Looking forward to hearing from you.

Regards,
Xiangyu

[1] https://issues.apache.org/jira/browse/FLINK-18625

David Morávek  于2023年10月3日周二 05:00写道:

> H Xiangyui,
>
> The sentiment of the FLIP makes sense, but I keep wondering whether this
> is the best way to think about the problem. I assume that "interactive
> session cluster" users always want to keep some spare resources around (up
> to a configured threshold) to reduce cold start instead of statically
> configuring the minimum.
>
> It's just a tiny change from the original proposal, but it could make all
> the difference (eliminate overprovisioning, maintain latencies with a
> growing # of jobs, ..)
>
> WDYT?
>
> Best,
> D.
>
> On Mon, Sep 25, 2023 at 5:11 PM Jing Ge 
> wrote:
>
>> Hi Yangze,
>>
>> Thanks for the clarification. The example of two batch jobs team up with
>> one streaming job is interesting.
>>
>> Best regards,
>> Jing
>>
>> On Wed, Sep 20, 2023 at 7:19 PM Yangze Guo  wrote:
>>
>> > Thanks for the comments, Jing.
>> >
>> > > Will the minimum resource configuration also take effect for streaming
>> > jobs in application mode?
>> > > Since it is not recommended to configure
>> slotmanager.number-of-slots.max
>> > for streaming jobs, does it make sense to disable it for common
>> streaming
>> > jobs? At least disable the check for avoiding the oscillation?
>> >
>> > Yes. The minimum resource configuration will only disabled in
>> > standalone cluster atm. I agree it make sense to disable it for a pure
>> > streaming job, however:
>> > - By default, the minimum resource is configured to 0. If users do not
>> > proactively set it, either the oscillation check or the minimum
>> > restriction can be considered as disabled.
>> > - The minimum resource is a cluster-level configuration rather than a
>> > job-level configuration. If a user has an application with two batch
>> > jobs preceding the streaming job, they may also require this
>> > configuration to accelerate the execution of batch jobs.
>> >
>> > WDYT?
>> >
>> > Best,
>> > Yangze Guo
>> >
>> > On Thu, Sep 21, 2023 at 4:49 AM Jing Ge 
>> > wrote:
>> > >
>> > > Hi Xiangyu,
>> > >
>> > > Thanks for driving it! There is one thing I am not really sure if I
>> > > understand you correctly.
>> > >
>> > > According to the FLIP: "The minimum resource limitation will be
>> > implemented
>> > > in the DefaultResourceAllocationStrategy of FineGrainedSlotManager.
>> > >
>> > > Each time when SlotManager needs to reconcile the cluster resources or
>> > > fulfill job resource requirements, the
>> DefaultResourceAllocationStrategy
>> > > will check if the minimum resource requirement has been fulfilled. If
>> it
>> > is
>> > > not, DefaultResourceAllocationStrategy will request new
>> > PendingTaskManagers
>> > > and FineGrainedSlotManager will allocate new worker resources
>> > accordingly."
>> > >
>> > > "To avoid this oscillation, we need to check the worker number derived
>> > from
>> > > minimum and maximum resource configuration is consistent before
>> starting
>> > > SlotManager."
>> > >
>> > > Will the minimum resource configuration also take effect for streaming
>> > jobs
>> > > in application mode? Since it is not recommended to
>> > > configure slotmanager.number-of-slots.max for streaming jobs, does it
>> > make
>> > > sense to disable it for common streaming jobs? At least disable the
>> check
>> > > for avoiding the oscillation?
>> > >
>> > > Best regards,
>> > > Jing
>> > >
>> > >
>> > > On Tue, Sep 19, 2023 at 4:58 PM Chen Zhanghao <
>> zhanghao.c...@outlook.com
>> > >

[RESULT][VOTE] FLIP-362: Support minimum resource limitation

2023-09-28 Thread xiangyu feng
Hi all,

Thanks for your review and the votes!

I am happy to announce that FLIP-362: Support minimum resource limitation
[1] has been accepted.

There are 4 binding votes and 2 non-binding votes [2]:

- Yangze Guo (binding)
- Ahmed Hamdy (non-binding)
- Chen Zhanghao (non-binding)
- Jing Ge (binding)
- Shammon FY (binding)
- Weihua Hu (binding)

There is no disapproving vote.

Regards,
Xiangyu


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-362%3A+Support+minimum+resource+limitation
[2] https://lists.apache.org/thread/wpbms7rywjcodv7qdgx803277th508og


Re: [VOTE] FLIP-362: Support minimum resource limitation

2023-09-28 Thread xiangyu feng
Thank you all for the votes!I will close the voting thread and summarize
the result in a separate email.

Regards,
Xiangyu


Weihua Hu  于2023年9月26日周二 11:44写道:

> +1 (binding)
>
> Best,
> Weihua
>
>
> On Tue, Sep 26, 2023 at 10:00 AM Shammon FY  wrote:
>
> > +1(binding), thanks for the proposal.
> >
> > Best,
> > Shammon FY
> >
> > On Mon, Sep 25, 2023 at 11:20 PM Jing Ge 
> > wrote:
> >
> > > +1(binding) Thanks!
> > >
> > > Best regards,
> > > Jing
> > >
> > > On Mon, Sep 25, 2023 at 3:48 AM Chen Zhanghao <
> zhanghao.c...@outlook.com
> > >
> > > wrote:
> > >
> > > > Thanks for driving this. +1 (non-binding)
> > > >
> > > > Best,
> > > > Zhanghao Chen
> > > > 
> > > > 发件人: xiangyu feng 
> > > > 发送时间: 2023年9月25日 17:38
> > > > 收件人: dev@flink.apache.org 
> > > > 主题: [VOTE] FLIP-362: Support minimum resource limitation
> > > >
> > > > Hi all,
> > > >
> > > > I would like to start the vote for FLIP-362:  Support minimum
> resource
> > > > limitation[1].
> > > > This FLIP was discussed in this thread [2].
> > > >
> > > > The vote will be open for at least 72 hours unless there is an
> > objection
> > > or
> > > > insufficient votes.
> > > >
> > > > Regards,
> > > > Xiangyu
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-362%3A+Support+minimum+resource+limitation
> > > > [2] https://lists.apache.org/thread/m2v9n4yynm97v8swhqj2o5k0sqlb5ym4
> > > >
> > >
> >
>


[VOTE] FLIP-362: Support minimum resource limitation

2023-09-25 Thread xiangyu feng
Hi all,

I would like to start the vote for FLIP-362:  Support minimum resource
limitation[1].
This FLIP was discussed in this thread [2].

The vote will be open for at least 72 hours unless there is an objection or
insufficient votes.

Regards,
Xiangyu

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-362%3A+Support+minimum+resource+limitation
[2] https://lists.apache.org/thread/m2v9n4yynm97v8swhqj2o5k0sqlb5ym4


Re: [Discuss] FLIP-362: Support minimum resource limitation

2023-09-22 Thread xiangyu feng
Hi all,

Thank you for the comments.

If there is no further comment, we will open the voting thread in 3 days.

Regards,
Xiangyu

Yangze Guo  于2023年9月21日周四 11:19写道:

> Thanks for the reply, Shammon.
>
> As the example described in my last response, an application could
> contain multiple jobs, both batch and streaming. I don't lean to
> disable it in Application mode in case users want to leverage it to
> accelerate the preceding batch jobs in their application.
>
> Best,
> Yangze Guo
>
> On Thu, Sep 21, 2023 at 11:15 AM Shammon FY  wrote:
> >
> > Hi,
> >
> > I agree that `minimum resource limitation` will bring values for flink
> > session clusters, but for `Application Mode`, is it useful for streaming
> > and batch jobs? Is it necessary for us to not support the application
> mode,
> > rather than relying on the default value 0?
> >
> > Best,
> > Shammon FY
> >
> > On Thu, Sep 21, 2023 at 10:18 AM Yangze Guo  wrote:
> >
> > > Thanks for the comments, Jing.
> > >
> > > > Will the minimum resource configuration also take effect for
> streaming
> > > jobs in application mode?
> > > > Since it is not recommended to configure
> slotmanager.number-of-slots.max
> > > for streaming jobs, does it make sense to disable it for common
> streaming
> > > jobs? At least disable the check for avoiding the oscillation?
> > >
> > > Yes. The minimum resource configuration will only disabled in
> > > standalone cluster atm. I agree it make sense to disable it for a pure
> > > streaming job, however:
> > > - By default, the minimum resource is configured to 0. If users do not
> > > proactively set it, either the oscillation check or the minimum
> > > restriction can be considered as disabled.
> > > - The minimum resource is a cluster-level configuration rather than a
> > > job-level configuration. If a user has an application with two batch
> > > jobs preceding the streaming job, they may also require this
> > > configuration to accelerate the execution of batch jobs.
> > >
> > > WDYT?
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Thu, Sep 21, 2023 at 4:49 AM Jing Ge 
> > > wrote:
> > > >
> > > > Hi Xiangyu,
> > > >
> > > > Thanks for driving it! There is one thing I am not really sure if I
> > > > understand you correctly.
> > > >
> > > > According to the FLIP: "The minimum resource limitation will be
> > > implemented
> > > > in the DefaultResourceAllocationStrategy of FineGrainedSlotManager.
> > > >
> > > > Each time when SlotManager needs to reconcile the cluster resources
> or
> > > > fulfill job resource requirements, the
> DefaultResourceAllocationStrategy
> > > > will check if the minimum resource requirement has been fulfilled.
> If it
> > > is
> > > > not, DefaultResourceAllocationStrategy will request new
> > > PendingTaskManagers
> > > > and FineGrainedSlotManager will allocate new worker resources
> > > accordingly."
> > > >
> > > > "To avoid this oscillation, we need to check the worker number
> derived
> > > from
> > > > minimum and maximum resource configuration is consistent before
> starting
> > > > SlotManager."
> > > >
> > > > Will the minimum resource configuration also take effect for
> streaming
> > > jobs
> > > > in application mode? Since it is not recommended to
> > > > configure slotmanager.number-of-slots.max for streaming jobs, does it
> > > make
> > > > sense to disable it for common streaming jobs? At least disable the
> check
> > > > for avoiding the oscillation?
> > > >
> > > > Best regards,
> > > > Jing
> > > >
> > > >
> > > > On Tue, Sep 19, 2023 at 4:58 PM Chen Zhanghao <
> zhanghao.c...@outlook.com
> > > >
> > > > wrote:
> > > >
> > > > > Thanks for driving this, Xiangyu. We use Session clusters for
> quick SQL
> > > > > debugging internally, and found cold-start job submission slow due
> to
> > > lack
> > > > > of the exact minimum resource reservation feature proposed here.
> This
> > > > > should improve the experience a lot for running short lived-jobs in
> > > session
> > > > > clusters.
> > > > >
&

Re: [Discuss] FLIP-362: Support minimum resource limitation

2023-09-20 Thread xiangyu feng
Hi Jing,

Thanks for pointing this out. As described by Yangze, the min resource
option will be set to 0 by default and the oscillation check will be
disabled at then.
In most cases, common streaming jobs won't be affected by this new added
option.

I've updated the FLIP to explain this.

Thx,
Xiangyu

Yangze Guo  于2023年9月21日周四 10:18写道:

> Thanks for the comments, Jing.
>
> > Will the minimum resource configuration also take effect for streaming
> jobs in application mode?
> > Since it is not recommended to configure slotmanager.number-of-slots.max
> for streaming jobs, does it make sense to disable it for common streaming
> jobs? At least disable the check for avoiding the oscillation?
>
> Yes. The minimum resource configuration will only disabled in
> standalone cluster atm. I agree it make sense to disable it for a pure
> streaming job, however:
> - By default, the minimum resource is configured to 0. If users do not
> proactively set it, either the oscillation check or the minimum
> restriction can be considered as disabled.
> - The minimum resource is a cluster-level configuration rather than a
> job-level configuration. If a user has an application with two batch
> jobs preceding the streaming job, they may also require this
> configuration to accelerate the execution of batch jobs.
>
> WDYT?
>
> Best,
> Yangze Guo
>
> On Thu, Sep 21, 2023 at 4:49 AM Jing Ge 
> wrote:
> >
> > Hi Xiangyu,
> >
> > Thanks for driving it! There is one thing I am not really sure if I
> > understand you correctly.
> >
> > According to the FLIP: "The minimum resource limitation will be
> implemented
> > in the DefaultResourceAllocationStrategy of FineGrainedSlotManager.
> >
> > Each time when SlotManager needs to reconcile the cluster resources or
> > fulfill job resource requirements, the DefaultResourceAllocationStrategy
> > will check if the minimum resource requirement has been fulfilled. If it
> is
> > not, DefaultResourceAllocationStrategy will request new
> PendingTaskManagers
> > and FineGrainedSlotManager will allocate new worker resources
> accordingly."
> >
> > "To avoid this oscillation, we need to check the worker number derived
> from
> > minimum and maximum resource configuration is consistent before starting
> > SlotManager."
> >
> > Will the minimum resource configuration also take effect for streaming
> jobs
> > in application mode? Since it is not recommended to
> > configure slotmanager.number-of-slots.max for streaming jobs, does it
> make
> > sense to disable it for common streaming jobs? At least disable the check
> > for avoiding the oscillation?
> >
> > Best regards,
> > Jing
> >
> >
> > On Tue, Sep 19, 2023 at 4:58 PM Chen Zhanghao  >
> > wrote:
> >
> > > Thanks for driving this, Xiangyu. We use Session clusters for quick SQL
> > > debugging internally, and found cold-start job submission slow due to
> lack
> > > of the exact minimum resource reservation feature proposed here. This
> > > should improve the experience a lot for running short lived-jobs in
> session
> > > clusters.
> > >
> > > Best,
> > > Zhanghao Chen
> > > 
> > > 发件人: Yangze Guo 
> > > 发送时间: 2023年9月19日 13:10
> > > 收件人: xiangyu feng 
> > > 抄送: dev@flink.apache.org 
> > > 主题: Re: [Discuss] FLIP-362: Support minimum resource limitation
> > >
> > > Thanks for driving this @Xiangyu. This is a feature that many users
> > > have requested for a long time. +1 for the overall proposal.
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Tue, Sep 19, 2023 at 11:48 AM xiangyu feng 
> > > wrote:
> > > >
> > > > Hi Devs,
> > > >
> > > > I'm opening this thread to discuss FLIP-362: Support minimum resource
> > > limitation. The design doc can be found at:
> > > > FLIP-362: Support minimum resource limitation
> > > >
> > > > Currently, the Flink cluster only requests Task Managers (TMs) when
> > > there is a resource requirement, and idle TMs are released after a
> certain
> > > period of time. However, in certain scenarios, such as running short
> > > lived-jobs in session cluster and scheduling batch jobs stage by
> stage, we
> > > need to improve the efficiency of job execution by maintaining a
> certain
> > > number of available workers in the cluster all the time.
> > > >
> > > > After discussed with Yangze, we introduced this new feature. The new
> > > added public options and proposed changes are described in this FLIP.
> > > >
> > > > Looking forward to your feedback, thanks.
> > > >
> > > > Best regards,
> > > > Xiangyu
> > > >
> > >
>


[Discuss] FLIP-362: Support minimum resource limitation

2023-09-18 Thread xiangyu feng
Hi Devs,

I'm opening this thread to discuss FLIP-362: Support minimum resource
limitation. The design doc can be found at:
FLIP-362: Support minimum resource limitation


Currently, the Flink cluster only requests Task Managers (TMs) when there
is a resource requirement, and idle TMs are released after a certain period
of time. However, in certain scenarios, such as running short lived-jobs in
session cluster and scheduling batch jobs stage by stage, we need to
improve the efficiency of job execution by maintaining a certain number of
available workers in the cluster all the time.

After discussed with Yangze, we introduced this new feature. The new added
public options and proposed changes are described in this FLIP.

Looking forward to your feedback, thanks.

Best regards,
Xiangyu


[REQUEST] Apply edit permissions for FLIP

2023-09-07 Thread xiangyu feng
Hi devs,

I would like to request the edit permission for FLIP. I'm currently working
on FLINK-15959  which
need change the ResourceManagerOptions. My wiki username is xiangyu0xf.

Thx,
Xiangyu


[jira] [Created] (FLINK-33016) Invalid download url for Flink JDBC Driver page

2023-09-03 Thread xiangyu feng (Jira)
xiangyu feng created FLINK-33016:


 Summary: Invalid download url for Flink JDBC Driver page
 Key: FLINK-33016
 URL: https://issues.apache.org/jira/browse/FLINK-33016
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Table SQL / JDBC
Reporter: xiangyu feng
 Attachments: image-2023-09-04-11-47-26-874.png, 
image-2023-09-04-11-48-40-461.png

The download url in flink sql driver page is invalid.
!image-2023-09-04-11-47-26-874.png|width=447,height=414!

 

!image-2023-09-04-11-48-40-461.png|width=483,height=108!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32985) Support setting env.java.opts.sql-gateway to specify the jvm opts for sql gateway

2023-08-28 Thread xiangyu feng (Jira)
xiangyu feng created FLINK-32985:


 Summary: Support setting env.java.opts.sql-gateway to specify the 
jvm opts for sql gateway
 Key: FLINK-32985
 URL: https://issues.apache.org/jira/browse/FLINK-32985
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Gateway
Reporter: xiangyu feng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Heartbeat of TaskManager with id container

2023-08-23 Thread xiangyu feng
Hi Nagireddy,

I'm not sure how you monitoring kafka lag. AFAIK, you can check the
metadata of the topic in your Kafka cluster to see the actual lag by
following command.

./kafka-consumer-groups.sh --bootstrap-server 192.168.0.107:39092
--group  --describe


This tool is provided with Kafka distribution. If there are any gap between
Kafka Connector lag and this tool, u can open a jira to report this
issue[1].

Hope this helps u!

[1]
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=582=FLINK

Regards,
Xiangyu




Y SREEKARA BHARGAVA REDDY  于2023年8月21日周一 18:45写道:

> Thanks Xiangyu,
>
> I have one issue, while running flink with kafka connector. its a working
> fine for couple of days.
>
> But suddenly  kafka lag went to "Negative value"
>
> I am trying to find the root cause for that. Any suggestions?
>
>
> On Sat, Aug 5, 2023 at 5:57 PM xiangyu feng  wrote:
>
>> Hi Nagireddy,
>>
>> I'm not particularly familiar with StreamingFileSink but I checked with
>> the implementation of HadoopFsCommitter. AFAIK, when committing files to
>> HDFS the committer will check if the temp file exist in the first place.
>> [image: image.png]
>>
>> In your case, could u check why the committing temp file not exist on
>> HDFS? Were these files deleted by mistake? I searched some information,
>> this error may be due to the small file merge will merge the file that is
>> being written. You can disable small file merge when writing files.
>>
>> Hope this helps.
>>
>> Regards,
>> Xiangyu
>>
>>
>> Y SREEKARA BHARGAVA REDDY  于2023年8月5日周六 18:22写道:
>>
>>> Hi Xiangyu/Dev,
>>>
>>> Did any one has solution handle below important note in
>>> StreamingFileSink:
>>>
>>> Caused by: java.io.IOException: Cannot clean commit: Staging file does
>>> not exist.
>>> at org.apache.flink.runtime.fs.hdfs.
>>> HadoopRecoverableFsDataOutputStream$HadoopFsCommitter.commit(
>>> HadoopRecoverableFsDataOutputStream.java:250)
>>>
>>> Important Note 3: Flink and the StreamingFileSink never overwrites
>>> committed data. Given this, when trying to restore from an old
>>> checkpoint/savepoint which assumes an in-progress file which was committed
>>> by subsequent successful checkpoints, *Flink will refuse to resume and
>>> it will throw an exception as it cannot locate the in-progress file*.
>>>
>>> Currently i am facing same issue in the PROD code.
>>>
>>>
>>>
>>> Regards,
>>> Nagireddy Y.
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Aug 4, 2023 at 12:11 PM xiangyu feng 
>>> wrote:
>>>
>>>> Hi ynagireddy4u,
>>>>
>>>> From the exception info, I think your application has met a HDFS file
>>>> issue during the commit phase of checkpoint. Can u check why 'Staging file
>>>> does not exist' in the first place?
>>>>
>>>> Regards,
>>>> Xiangyu
>>>>
>>>> Y SREEKARA BHARGAVA REDDY  于2023年8月4日周五
>>>> 12:21写道:
>>>>
>>>>> Hi Xiangyu/Dev Team,
>>>>>
>>>>> Thanks for reply.
>>>>>
>>>>> In  our flink job, we increase the *checkpoint timeout to 30 min.*
>>>>> And the *checkpoint interval is 10 min.*
>>>>>
>>>>> But while running the job we got below exception.
>>>>>
>>>>> java.lang.RuntimeException: Error while confirming checkpoint
>>>>> at org.apache.flink.streaming.runtime.tasks.StreamTask
>>>>> .notifyCheckpointComplete(StreamTask.java:952)
>>>>> at org.apache.flink.streaming.runtime.tasks.StreamTask
>>>>> .lambda$notifyCheckpointCompleteAsync$7(StreamTask.java:924)
>>>>> at org.apache.flink.util.function.FunctionUtils
>>>>> .lambda$asCallable$5(FunctionUtils.java:125)
>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>> at org.apache.flink.streaming.runtime.tasks.
>>>>> StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.run(
>>>>> StreamTaskActionExecutor.java:87)
>>>>> at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail
>>>>> .java:78)
>>>>> at org.apache.flink.streaming.runtime.tasks.mailbox.
>>>>> MailboxProcessor.processMail(MailboxProcessor.java:261)
>>>>> at org.apache.flink.streaming.runtime.tasks.mailbox.
>>>>

[jira] [Created] (FLINK-32898) Improvement of usability for Flink OLAP

2023-08-20 Thread xiangyu feng (Jira)
xiangyu feng created FLINK-32898:


 Summary: Improvement of usability for Flink OLAP
 Key: FLINK-32898
 URL: https://issues.apache.org/jira/browse/FLINK-32898
 Project: Flink
  Issue Type: Improvement
  Components: Client / Job Submission, Runtime / Configuration, Runtime 
/ Coordination, Table SQL / JDBC, Table SQL / Runtime
Reporter: xiangyu feng


By building Flink OLAP services internally using Flink master branch, we 
realized that if we want to involve more community users to use Flink OLAP, 
besides scheduling and execution performance improvement mentioned in 
[FLINK-25318|https://issues.apache.org/jira/browse/FLINK-25318], we also need 
to improve some basic usability issues in OLAP scenarios, including providing 
quick start for users to build Flink OLAP service, allocating minimum resources 
when cluster start, displaying detailed job sql in flink ui, etc.

 

Meanwhile, we believe this can also optimize the user experience of running 
batch jobs in session cluster mode. So after talk with [~zjureel] , we create 
this jira to track the progress of related work.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32880) Redundant taskManager should be replenished in FineGrainedSlotManager

2023-08-16 Thread xiangyu feng (Jira)
xiangyu feng created FLINK-32880:


 Summary: Redundant taskManager should be replenished in 
FineGrainedSlotManager
 Key: FLINK-32880
 URL: https://issues.apache.org/jira/browse/FLINK-32880
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Reporter: xiangyu feng


Currently, if you are using FineGrainedSlotManager, when a redundant 
taskmanager exit abnormally, no extra taskmanager will be replenished.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] How about adding OLAP to Flink Roadmap?

2023-08-09 Thread xiangyu feng
Thank you Shammon for initiating this discussion. As one of the Flink OLAP
developers in ByteDance, I would also like to share a real case of our
users.

About two years ago we found our first OLAP user internally by integrating
Flink OLAP with ByteHTAP. Users are willing to use Flink as an OLAP engine
mainly hoping to use Flink's rich cross-datasource join capability. In the
beginning, we only support simple query patterns with qps less than 2 and
joins less than 5. With the business growing and our system capabilities
evolving, users have moved more scenarios to Flink OLAP, and the query
pattern is getting more and more complicated. Until early this year, the
user's query pattern has changed to peak QPS greater than 20, join table
number greater than 30 and scan data volume exceeding 1 billion rows. Even
with the evolution of our engine over the past two years, computing at this
scale is still very challenging. It is difficult to satisfy the computation
scale, system stability and query latency at the same time.

Through talking to our user, we easily build some intermediate views by
using Flink's streaming and batch engine. In a similar way to materialized
view, we have optimized user's query pattern to single query join less than
10, scan data volume in tens of millions and QPS remains unchanged. In this
way, our OLAP service has not only perfectly met the business requirements
and also we have made this migration process very smooth, thanks to Flink's
powerful streaming and batch computing ecosystem. Finally, we are highly
recognized by our users.

*There are two points I want to make with this case:*

1, Although there are many OLAP engines out there, Flink may not always
provide the best performance. But thanks to Flink's strong ecosystem, we
are confident that we can build an OLAP engine that provides a great user
experience. This is very important for many small and medium sized
companies;

2, From another perspective, I personally believe that building OLAP will
'bring Flink closer to our end-users' and present a wider variety of
computational challenges to Flink. As the case mentioned above, this is a
very common case in data analytics, where the flink is used to precompute
the data and feed it to OLAP services. These precalculations are often
designed to compensate for the capabilities of other OLAP engines, such as
some engines may not have strong join capabilities, some may not have
complete SQL support and some may have weak plan optimization support. In
general, Flink will not face these users directly, and thus cannot build a
comprehensive end-to-end solution to solve this problem once and for all. *By
building Flink OLAP, we can finally fill the last missing block in the
puzzle!*

Of course, as we built Flink OLAP internally, we encountered many
challenging issues, which is why we are putting this discussion out there
and hoping to involve more contributors. Meanwhile, We also hope to
contribute our optimizations back to the community through FLINK-25318.

So for me, it is a big +1 to add OLAP to Flink Roadmap. ^-^

Best,
Xiangyu

Xintong Song  于2023年8月8日周二 18:01写道:

> Thanks for bringing this up, Shammon.
>
> In general, I'd be +1 to improve Flink's ability to serve as an OLAP
> engine.
>
> I see a great value in Flink becoming a unified Large-scale Data Processing
> / Analysis tool. Through my interactions with users (Alibaba internal
> users, external users on Alibaba Cloud, developers from other companies via
> conferences / meetups), it's commonly complained how complicated and costly
> it is to build a data processing platform out of a bunch of different
> tools. That usually means higher learning / developing / operation and
> maintenance cost. In addition, I also see a trend that many projects and
> products are going along the same direction.
>
> I personally would not be concerned about losing focus. Unlike Apache
> Paimon (Flink Table Store) which tries to solve a completely different
> problem other than data processing, OLAP querying is just a special case of
> batch SQL data processing, where typically you have massive concurrent
> short-lived queries. As Shammon mentioned, Flink already has most of the
> essential building blocks: batch SQL processing, session mode, sql-gateway,
> etc. IMHO, the missing piece is mostly about improving the performance in
> the specific OLAP scenarios. That sounds like a reasonable extension to me.
>
> I'd consider improving the OLAP capability as nice-to-have improvements.
> That is to say, it must not come at the price of sacrificing the
> experiences in other streaming / batch scenarios, nor significantly
> complicate the system. I think one of the reasons that FLINK-25318 became
> stale was that some of the proposed solutions are too dedicated for the
> OLAP scenarios and require extra efforts to carefully re-design in order to
> not affect other scenarios. I'd be glad to see such efforts being revived.
>
> Regarding whether to include 

[jira] [Created] (FLINK-32764) SlotManager supports pulling up all TaskManagers at initialization

2023-08-06 Thread xiangyu feng (Jira)
xiangyu feng created FLINK-32764:


 Summary: SlotManager supports pulling up all TaskManagers at 
initialization 
 Key: FLINK-32764
 URL: https://issues.apache.org/jira/browse/FLINK-32764
 Project: Flink
  Issue Type: Sub-task
Reporter: xiangyu feng


For OLAP session clusters, It is better to pull all TM when the cluster starts, 
rather than waiting for the job to come and assign it later.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32762) Support concurrency control when submitting OLAP jobs to Dispatcher

2023-08-05 Thread xiangyu feng (Jira)
xiangyu feng created FLINK-32762:


 Summary: Support concurrency control when submitting OLAP jobs to 
Dispatcher
 Key: FLINK-32762
 URL: https://issues.apache.org/jira/browse/FLINK-32762
 Project: Flink
  Issue Type: Sub-task
  Components: Client / Job Submission
Reporter: xiangyu feng


In the OLAP scenario, we need to support concurrency control of submitted jobs. 
When QPS surges, this could avoid performance degradation due to excessive load 
.

 

Currently, we support multiple concurrency control options including 
max-running-jobs and max-running-tasks-per-taskmanager.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Heartbeat of TaskManager with id container

2023-08-05 Thread xiangyu feng
Hi Nagireddy,

I'm not particularly familiar with StreamingFileSink but I checked with the
implementation of HadoopFsCommitter. AFAIK, when committing files to HDFS
the committer will check if the temp file exist in the first place.
[image: image.png]

In your case, could u check why the committing temp file not exist on
HDFS? Were these files deleted by mistake? I searched some information,
this error may be due to the small file merge will merge the file that is
being written. You can disable small file merge when writing files.

Hope this helps.

Regards,
Xiangyu


Y SREEKARA BHARGAVA REDDY  于2023年8月5日周六 18:22写道:

> Hi Xiangyu/Dev,
>
> Did any one has solution handle below important note in StreamingFileSink:
>
> Caused by: java.io.IOException: Cannot clean commit: Staging file does not
> exist.
> at org.apache.flink.runtime.fs.hdfs.
> HadoopRecoverableFsDataOutputStream$HadoopFsCommitter.commit(
> HadoopRecoverableFsDataOutputStream.java:250)
>
> Important Note 3: Flink and the StreamingFileSink never overwrites
> committed data. Given this, when trying to restore from an old
> checkpoint/savepoint which assumes an in-progress file which was committed
> by subsequent successful checkpoints, *Flink will refuse to resume and it
> will throw an exception as it cannot locate the in-progress file*.
>
> Currently i am facing same issue in the PROD code.
>
>
>
> Regards,
> Nagireddy Y.
>
>
>
>
>
>
> On Fri, Aug 4, 2023 at 12:11 PM xiangyu feng  wrote:
>
>> Hi ynagireddy4u,
>>
>> From the exception info, I think your application has met a HDFS file
>> issue during the commit phase of checkpoint. Can u check why 'Staging file
>> does not exist' in the first place?
>>
>> Regards,
>> Xiangyu
>>
>> Y SREEKARA BHARGAVA REDDY  于2023年8月4日周五 12:21写道:
>>
>>> Hi Xiangyu/Dev Team,
>>>
>>> Thanks for reply.
>>>
>>> In  our flink job, we increase the *checkpoint timeout to 30 min.*
>>> And the *checkpoint interval is 10 min.*
>>>
>>> But while running the job we got below exception.
>>>
>>> java.lang.RuntimeException: Error while confirming checkpoint
>>> at org.apache.flink.streaming.runtime.tasks.StreamTask
>>> .notifyCheckpointComplete(StreamTask.java:952)
>>> at org.apache.flink.streaming.runtime.tasks.StreamTask
>>> .lambda$notifyCheckpointCompleteAsync$7(StreamTask.java:924)
>>> at org.apache.flink.util.function.FunctionUtils.lambda$asCallable$5(
>>> FunctionUtils.java:125)
>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>> at org.apache.flink.streaming.runtime.tasks.
>>> StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.run(
>>> StreamTaskActionExecutor.java:87)
>>> at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail
>>> .java:78)
>>> at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor
>>> .processMail(MailboxProcessor.java:261)
>>> at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor
>>> .runMailboxLoop(MailboxProcessor.java:186)
>>> at org.apache.flink.streaming.runtime.tasks.StreamTask
>>> .runMailboxLoop(StreamTask.java:487)
>>> at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(
>>> StreamTask.java:470)
>>> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)
>>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)
>>> at java.lang.Thread.run(Thread.java:748)
>>> Caused by: java.io.IOException: Cannot clean commit: Staging file does
>>> not exist.
>>> at org.apache.flink.runtime.fs.hdfs.
>>> HadoopRecoverableFsDataOutputStream$HadoopFsCommitter.commit(
>>> HadoopRecoverableFsDataOutputStream.java:250)
>>> at org.apache.flink.streaming.api.functions.sink.filesystem.Bucket
>>> .onSuccessfulCompletionOfCheckpoint(Bucket.java:300)
>>> at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets
>>> .commitUpToCheckpoint(Buckets.java:216)
>>> at org.apache.flink.streaming.api.functions.sink.filesystem.
>>> StreamingFileSink.notifyCheckpointComplete(StreamingFileSink.java:415)
>>> at org.apache.flink.streaming.api.operators.
>>> AbstractUdfStreamOperator.notifyCheckpointComplete(
>>> AbstractUdfStreamOperator.java:130)
>>> at org.apache.flink.streaming.runtime.tasks.StreamTask
>>> .lambda$notifyCheckpointComplete$8(StreamTask.java:936)
>>> at org.apache.flink.streaming.runtime.tasks.
>>> StreamTaskActionExecu

[jira] [Created] (FLINK-32756) Reues ZK CuratorFramework when submitting OLAP jobs to Flink session cluster

2023-08-04 Thread xiangyu feng (Jira)
xiangyu feng created FLINK-32756:


 Summary: Reues ZK CuratorFramework when submitting OLAP jobs to 
Flink session cluster
 Key: FLINK-32756
 URL: https://issues.apache.org/jira/browse/FLINK-32756
 Project: Flink
  Issue Type: Sub-task
Reporter: xiangyu feng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32755) Add quick start guide for Flink OLAP

2023-08-04 Thread xiangyu feng (Jira)
xiangyu feng created FLINK-32755:


 Summary: Add quick start guide for Flink OLAP
 Key: FLINK-32755
 URL: https://issues.apache.org/jira/browse/FLINK-32755
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation
Reporter: xiangyu feng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Heartbeat of TaskManager with id container

2023-08-04 Thread xiangyu feng
Hi ynagireddy4u,

>From the exception info, I think your application has met a HDFS file issue
during the commit phase of checkpoint. Can u check why 'Staging file does
not exist' in the first place?

Regards,
Xiangyu

Y SREEKARA BHARGAVA REDDY  于2023年8月4日周五 12:21写道:

> Hi Xiangyu/Dev Team,
>
> Thanks for reply.
>
> In  our flink job, we increase the *checkpoint timeout to 30 min.*
> And the *checkpoint interval is 10 min.*
>
> But while running the job we got below exception.
>
> java.lang.RuntimeException: Error while confirming checkpoint
> at org.apache.flink.streaming.runtime.tasks.StreamTask
> .notifyCheckpointComplete(StreamTask.java:952)
> at org.apache.flink.streaming.runtime.tasks.StreamTask
> .lambda$notifyCheckpointCompleteAsync$7(StreamTask.java:924)
> at org.apache.flink.util.function.FunctionUtils.lambda$asCallable$5(
> FunctionUtils.java:125)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at org.apache.flink.streaming.runtime.tasks.
> StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.run(
> StreamTaskActionExecutor.java:87)
> at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail
> .java:78)
> at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor
> .processMail(MailboxProcessor.java:261)
> at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor
> .runMailboxLoop(MailboxProcessor.java:186)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(
> StreamTask.java:487)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(
> StreamTask.java:470)
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Cannot clean commit: Staging file does not
> exist.
> at org.apache.flink.runtime.fs.hdfs.
> HadoopRecoverableFsDataOutputStream$HadoopFsCommitter.commit(
> HadoopRecoverableFsDataOutputStream.java:250)
> at org.apache.flink.streaming.api.functions.sink.filesystem.Bucket
> .onSuccessfulCompletionOfCheckpoint(Bucket.java:300)
> at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets
> .commitUpToCheckpoint(Buckets.java:216)
> at org.apache.flink.streaming.api.functions.sink.filesystem.
> StreamingFileSink.notifyCheckpointComplete(StreamingFileSink.java:415)
> at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator
> .notifyCheckpointComplete(AbstractUdfStreamOperator.java:130)
> at org.apache.flink.streaming.runtime.tasks.StreamTask
> .lambda$notifyCheckpointComplete$8(StreamTask.java:936)
> at org.apache.flink.streaming.runtime.tasks.
> StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.call(
> StreamTaskActionExecutor.java:101)
> at org.apache.flink.streaming.runtime.tasks.StreamTask
> .notifyCheckpointComplete(StreamTask.java:930)
> ... 12 more
>
> It would be great, if you have any workaround for that.
>
> Regards,
> Nagireddy Y.
>
>
>
>
>
>
> On Thu, Aug 3, 2023 at 7:24 AM xiangyu feng  wrote:
>
>> Hi ynagireddy4u,
>>
>> We have met this exception before. Usually it is caused by following
>> reasons:
>>
>> 1), TaskManager is too busy with other works to send the heartbeat to
>> JobMaster or TaskManager process might already exited;
>> 2), There might be a network issues between this TaskManager and
>> JobMaster;
>> 3), In certain cases, JobMaster actor might also being too busy to
>> process the RPC requests from TaskManager;
>>
>> Pls check if your problem fits the above situations.
>>
>> Best,
>> Xiangyu
>>
>>
>> Y SREEKARA BHARGAVA REDDY  于2023年7月31日周一 20:49写道:
>>
>>> Hi Team,
>>>
>>> Did any one face the below exception.
>>> If yes, please share the resolution.
>>>
>>>
>>> 2023-07-28 22:04:16
>>> j*ava.util.concurrent.TimeoutException: Heartbeat of TaskManager with id
>>> container_e19_1690528962823_0382_01_05 timed out.*
>>> at org.apache.flink.runtime.jobmaster.
>>> JobMaster$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(JobMaster
>>> .java:1147)
>>> at org.apache.flink.runtime.heartbeat.HeartbeatMonitorImpl.run(
>>> HeartbeatMonitorImpl.java:109)
>>> at
>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:
>>> 511)
>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(
>>> AkkaRpcActor.j

Re: [ANNOUNCE] New Apache Flink Committer - Weihua Hu

2023-08-03 Thread xiangyu feng
Congratulations Weihua!


Best,
Xiangyu

Leonard Xu  于2023年8月4日周五 11:31写道:

> Congratulations Weihua!
>
>
> Best,
> Leonard
>
> > On Aug 4, 2023, at 11:18 AM, Xintong Song  wrote:
> >
> > Hi everyone,
> >
> > On behalf of the PMC, I'm very happy to announce Weihua Hu as a new Flink
> > Committer!
> >
> > Weihua has been consistently contributing to the project since May 2022.
> He
> > mainly works in Flink's distributed coordination areas. He is the main
> > contributor of FLIP-298 and many other improvements in large-scale job
> > scheduling and improvements. He is also quite active in mailing lists,
> > participating discussions and answering user questions.
> >
> > Please join me in congratulating Weihua!
> >
> > Best,
> >
> > Xintong (on behalf of the Apache Flink PMC)
>
>


[jira] [Created] (FLINK-32746) Enable ZGC in JDK17 to solve long time class unloading STW during fullgc

2023-08-03 Thread xiangyu feng (Jira)
xiangyu feng created FLINK-32746:


 Summary: Enable ZGC in JDK17 to solve long time class unloading 
STW during fullgc
 Key: FLINK-32746
 URL: https://issues.apache.org/jira/browse/FLINK-32746
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Runtime
Reporter: xiangyu feng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Heartbeat of TaskManager with id container

2023-08-02 Thread xiangyu feng
Hi ynagireddy4u,

We have met this exception before. Usually it is caused by following
reasons:

1), TaskManager is too busy with other works to send the heartbeat to
JobMaster or TaskManager process might already exited;
2), There might be a network issues between this TaskManager and JobMaster;
3), In certain cases, JobMaster actor might also being too busy to process
the RPC requests from TaskManager;

Pls check if your problem fits the above situations.

Best,
Xiangyu


Y SREEKARA BHARGAVA REDDY  于2023年7月31日周一 20:49写道:

> Hi Team,
>
> Did any one face the below exception.
> If yes, please share the resolution.
>
>
> 2023-07-28 22:04:16
> j*ava.util.concurrent.TimeoutException: Heartbeat of TaskManager with id
> container_e19_1690528962823_0382_01_05 timed out.*
> at org.apache.flink.runtime.jobmaster.
> JobMaster$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(JobMaster
> .java:1147)
> at org.apache.flink.runtime.heartbeat.HeartbeatMonitorImpl.run(
> HeartbeatMonitorImpl.java:109)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:
> 511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(
> AkkaRpcActor.java:397)
> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(
> AkkaRpcActor.java:190)
> at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor
> .handleRpcMessage(FencedAkkaRpcActor.java:74)
> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(
> AkkaRpcActor.java:152)
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
> at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
> at akka.actor.ActorCell.invoke(ActorCell.scala:561)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
> at akka.dispatch.Mailbox.run(Mailbox.scala:225)
> at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
> at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool
> .java:1339)
> at
> akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread
> .java:107)
>
> Any suggestions, please share with me.
>
> Regards,
> Nagireddy Y
>


Re: [ANNOUNCE] New Apache Flink Committer - Yong Fang

2023-07-23 Thread xiangyu feng
Congratulations, Yong!

Best,
Xiangyu

liu ron  于2023年7月24日周一 11:48写道:

> Congratulations,
>
> Best,
> Ron
>
> Qingsheng Ren  于2023年7月24日周一 11:18写道:
>
> > Congratulations and welcome aboard, Yong!
> >
> > Best,
> > Qingsheng
> >
> > On Mon, Jul 24, 2023 at 11:14 AM Chen Zhanghao <
> zhanghao.c...@outlook.com>
> > wrote:
> >
> > > Congrats, Shammon!
> > >
> > > Best,
> > > Zhanghao Chen
> > > 
> > > 发件人: Weihua Hu 
> > > 发送时间: 2023年7月24日 11:11
> > > 收件人: dev@flink.apache.org 
> > > 抄送: Shammon FY 
> > > 主题: Re: [ANNOUNCE] New Apache Flink Committer - Yong Fang
> > >
> > > Congratulations!
> > >
> > > Best,
> > > Weihua
> > >
> > >
> > > On Mon, Jul 24, 2023 at 11:04 AM Paul Lam 
> wrote:
> > >
> > > > Congrats, Shammon!
> > > >
> > > > Best,
> > > > Paul Lam
> > > >
> > > > > 2023年7月24日 10:56,Jingsong Li  写道:
> > > > >
> > > > > Shammon
> > > >
> > > >
> > >
> >
>