+1 (binding)

Thanks for driving this proposal; it will be useful for rescaling.

I'm preparing FLIP-334 [1], which will decouple the autoscaler from
Kubernetes. Ultimately, we hope all kinds of Flink jobs will work well with
rescaling and the autoscaler.

[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263424711

Best
Rui Fan

On Fri, 28 Jul 2023 at 19:21, Martijn Visser <martijnvis...@apache.org>
wrote:

> +1 (binding)
>
> On Fri, Jul 14, 2023 at 11:59 AM Prabhu Joseph <prabhujose.ga...@gmail.com>
> wrote:
>
> > *+1 (non-binding)*
> >
> > Thanks for working on this. We have seen a clear improvement with the
> > cooldown period feature.
> > Below are details on the test results from one of our clusters:
> >
> > During a scale-out operation, 8 new nodes were added one by one, about
> > 30 seconds apart. With the default behaviour there were 8 restarts within
> > 4 minutes, whereas with this feature (cooldown period of 4 minutes) there
> > was only one.
> >
> > During the restart window, the job processed more records with this
> > feature (2909764) than with the default behaviour (1323960). With repeated
> > restarts, the job spends most of its time recovering, and any progress the
> > tasks made after the last successfully completed checkpoint is lost.
> >
> > Metric                Default Adaptive Scheduler                            Adaptive Scheduler with Cooldown Period
> > NumRecordsProcessed   1323960                                               2909764
> > Job Parallelism       13 -> 20 -> 27 -> 34 -> 41 -> 48 -> 55 -> 62 -> 69   13 -> 69
> > NumRestarts           8                                                     1
> >
> > Remarks:
> > 1. The NumRecordsProcessed metric shows the difference the cooldown period
> > makes. When the job restarts repeatedly, the tasks spend most of their
> > time recovering, and the progress they made is lost on each restart.
> > 2. With the cooldown period there was only one restart, which happened
> > when the 8th node was added back.
> >
> > On Wed, Jul 12, 2023 at 8:03 PM Etienne Chauchot <echauc...@apache.org>
> > wrote:
> >
> > > Hi all,
> > >
> > > I'm going on vacation tonight for 3 weeks.
> > >
> > > Even though the vote is not finished, since the implementation is
> > > rather quick and the design discussion has settled, I went ahead and
> > > implemented FLIP-322 [1] so that people can take a look while I'm off.
> > >
> > > [1] https://github.com/apache/flink/pull/22985
> > >
> > > Best
> > >
> > > Etienne
> > >
> > > On 12/07/2023 at 09:56, Etienne Chauchot wrote:
> > > >
> > > > Hi all,
> > > >
> > > > Would you mind casting your vote in this second vote thread (opened
> > > > after new discussions) so that the subject can move forward?
> > > >
> > > > @David, @Chesnay, @Robert: you took part in the discussions; could you
> > > > please send your vote?
> > > >
> > > > Thank you very much
> > > >
> > > > Best
> > > >
> > > > Etienne
> > > >
> > > > On 06/07/2023 at 13:02, Etienne Chauchot wrote:
> > > >>
> > > >> Hi all,
> > > >>
> > > >> Thanks for your feedback on FLIP-322: Cooldown period for adaptive
> > > >> scheduler [1].
> > > >>
> > > >> This FLIP was discussed in [2].
> > > >>
> > > >> I'd like to start a vote for it. The vote will be open for at least 72
> > > >> hours (until July 9th 15:00 GMT) unless there is an objection or
> > > >> insufficient votes.
> > > >>
> > > >> [1]
> > > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-322+Cooldown+period+for+adaptive+scheduler
> > > >> [2] https://lists.apache.org/thread/qvgxzhbp9rhlsqrybxdy51h05zwxfns6
> > > >>
> > > >> Best,
> > > >>
> > > >> Etienne
> >
>
