Hi Chesnay,

For your information, one major goal of the blocklist mechanism is to
support FLIP-168 (speculative execution of batch jobs). When
speculative execution happens, it needs to keep the existing tasks
running and launch speculative tasks on other nodes. We have heard
requests for speculative execution from many users, who find its
absence a blocker for running their production batch jobs on Flink.
Multi-tenant environments are common for batch jobs, and temporary
hotspots are a common problem. This cannot be resolved well by
fine-grained resources (machine load is not controlled by Flink) nor by
killing all tasks on a temporary hotspot (the job may roll back by
hours). Therefore, even considering this goal alone, I think it
adds enough value to users.

Regarding whether we should reject a proposal because it adds
complexity to the core components: my point is that it depends on
whether the feature adds enough value to users. Other good ideas
that add less complexity are also welcome.

If you are still concerned about the value of this feature, I'm fine
with opening a survey on the user mailing list to see what users think
about it.
What do you think?

Thanks,
Zhu

Chesnay Schepler <ches...@apache.org> wrote on Tue, Jun 7, 2022 at 15:13:
>
> I've had some time to think about it and decided to stick with my -1.
>
> While BLOCK_WITH_QUARANTINE is easy to implement (un-register TMs and
> ignore all RPCs (the latter mostly happens automatically)) it doesn't
> add a whole lot of value as it's pretty much equivalent to shutting
> down the TM.
>
> Meanwhile, BLOCK needs an entirely different implementation that
> interacts with the slot management on both JM/RM, and tbh I'm not so
> sure about its purpose. If the node/process is overloaded because of
> the running job, well then resource profiles & fine-grained resource
> management are supposed to address that. If the overloading is externally
> induced then BLOCK only makes sense if the node is overloaded to a
> degree where the existing workload is fine (otherwise
> BLOCK_WITH_QUARANTINE would be a better choice I guess), which seems
> rather unlikely.
>
> I'm against this change because I don't believe it will be useful for
> the general user-base, and because it can't be implemented without
> pushing some complexity into core components.
>
> On 28/05/2022 06:48, Zhu Zhu wrote:
> > Hi Chesnay,
> > Would you share your thoughts in the discussion thread if there are
> > still concerns?
> >
> > Thanks,
> > Zhu
> >
> > Chesnay Schepler <ches...@apache.org> wrote on Fri, May 27, 2022 at 14:54:
> >
> >> -1 to put a lid on things for now, because I'm not quite done yet with
> >> the discussion.
> >>
> >> On 27/05/2022 05:25, Yangze Guo wrote:
> >>> +1 (binding)
> >>>
> >>> Best,
> >>> Yangze Guo
> >>>
> >>> On Thu, May 26, 2022 at 3:54 PM Yun Gao <yungao...@aliyun.com.invalid> wrote:
> >>>> Thanks Lijie and Zhu for driving the FLIP!
> >>>>
> >>>> The blocked list functionality helps reduce the complexity in maintenance
> >>>> and the current design looks good to me, thus +1 from my side (binding).
> >>>>
> >>>>
> >>>> Best,
> >>>> Yun
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> ------------------------------------------------------------------
> >>>> From:Xintong Song <tonysong...@gmail.com>
> >>>> Send Time:2022 May 26 (Thu.) 12:51
> >>>> To:dev <dev@flink.apache.org>
> >>>> Subject:Re: [VOTE] FLIP-224: Blocklist Mechanism
> >>>>
> >>>> Thanks for driving this effort, Lijie.
> >>>>
> >>>> I think a nice addition would be to make this feature accessible directly
> >>>> from webui. However, there's no reason to block this FLIP on it.
> >>>>
> >>>> So +1 (binding) from my side.
> >>>>
> >>>> Best,
> >>>>
> >>>> Xintong
> >>>>
> >>>>
> >>>>
> >>>> On Fri, May 20, 2022 at 12:57 PM Lijie Wang <wangdachui9...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Hi everyone,
> >>>>>
> >>>>> Thanks for the feedback on FLIP-224: Blocklist Mechanism [1] in the
> >>>>> discussion thread [2].
> >>>>>
> >>>>> I'd like to start a vote for it. The vote will last for at least 72 hours
> >>>>> unless there is an objection or insufficient votes.
> >>>>>
> >>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-224%3A+Blocklist+Mechanism
> >>>>> [2] https://lists.apache.org/thread/fngkk52kjbc6b6v9nn0lkfq6hhsbgb1h
> >>>>>
> >>>>> Best,
> >>>>> Lijie
> >>>>>
>
