Re: [DISCUSS] FLIP-119: Pipelined Region Scheduling

Xintong Song Fri, 27 Mar 2020 04:44:10 -0700

Gary & Zhu Zhu,

Thanks for preparing this FLIP, and a BIG +1 from my side. The trade-off
between resource utilization and potential deadlock problems has always
been a pain. Although it does not solve every deadlock case, this FLIP is
definitely a big improvement. IIUC, it already covers all the existing
single-job cases, and all the mentioned uncovered cases are either in
multi-job session clusters or involve diverse slot resources in the future.


I've read through the FLIP, and it looks really good to me. Good job! All
the concerns and limitations that I can think of have already been clearly
stated, with reasonable potential future solutions. From the perspective of
fine-grained resource management, I do not see any serious/irresolvable
conflict at this time.

nit: The in-page links are not working. I guess those were copied directly
from Google Docs?


Thank you~

Xintong Song



On Fri, Mar 27, 2020 at 6:26 PM Zhu Zhu <[email protected]> wrote:

> To Yangze,
>
> >> the blocking edge will not be consumable before the upstream is
> finished.
> Yes. This is how we define a BLOCKING result partition, "Blocking
> partitions represent blocking data exchanges, where the data stream is
> first fully produced and then consumed".
>
> >> I'm also wondering could we execute the upstream and downstream regions
> at the same time if we have enough resources
> It may lead to resource waste since the tasks in downstream regions cannot
> read any data before the upstream region finishes. It saves a bit of time on
> scheduling, but usually that does not make much difference for large jobs,
> since data processing takes much more time. For small jobs, one can make
> all edges PIPELINED so that all the tasks can be scheduled at the same
> time.
>
> >> is it possible to change the data exchange mode of two regions
> dynamically?
> This is not in the scope of the FLIP. But we are moving towards a more
> extensible scheduler (FLINK-10429) and resource-aware scheduling
> (FLINK-10407).
> So I think it's possible that a future scheduler could dynamically
> choose the shuffle type based on the available resources.
>
> Thanks,
> Zhu Zhu
>
> Yangze Guo <[email protected]> wrote on Fri, Mar 27, 2020 at 4:49 PM:
>
> > Thanks for updating!
> >
> > +1 for supporting the pipelined region scheduling. Although we could
> > not prevent resource deadlock in all scenarios, it is really a big
> > step.
> >
> > The design generally LGTM.
> >
> > One minor thing I want to confirm: if I understand correctly, the
> > blocking edge will not be consumable before the upstream is finished.
> > Without that guarantee, a failure in the upstream region could
> > still lead to a resource deadlock. I don't know whether it is
> > an explicit protocol now, but after this FLIP, I think it should not
> > be broken.
> > I'm also wondering could we execute the upstream and downstream
> > regions at the same time if we have enough resources. It can shorten
> > the running time of large jobs. We should not break the protocol of
> > the blocking edge. But is it possible to change the data exchange mode
> > of two regions dynamically?
> >
> > Best,
> > Yangze Guo
> >
> > On Fri, Mar 27, 2020 at 1:15 PM Zhu Zhu <[email protected]> wrote:
> > >
> > > Thanks for reporting this, Yangze.
> > > I have updated the permissions on those images. Everyone is able to view
> > them now.
> > >
> > > Thanks,
> > > Zhu Zhu
> > >
> > > Yangze Guo <[email protected]> wrote on Fri, Mar 27, 2020 at 11:25 AM:
> > >>
> > >> Thanks for driving this discussion, Zhu Zhu & Gary.
> > >>
> > >> I found that the image link in this FLIP is not working. When I
> > >> open that link, Google Docs tells me that I do not have access.
> > >> Could you take a look at that issue?
> > >>
> > >> Best,
> > >> Yangze Guo
> > >>
> > >> On Fri, Mar 27, 2020 at 1:38 AM Gary Yao <[email protected]> wrote:
> > >> >
> > >> > Hi community,
> > >> >
> > >> > In the past releases, we have been working on refactoring Flink's scheduler
> > >> > with the goal of making the scheduler extensible [1]. We have rolled out
> > >> > most of the intended refactoring in Flink 1.10, and we think it is now time
> > >> > to leverage our newly introduced abstractions to implement a new resource
> > >> > optimized scheduling strategy: Pipelined Region Scheduling.
> > >> >
> > >> > This scheduling strategy aims at:
> > >> >
> > >> >     * avoiding resource deadlocks when running batch jobs
> > >> >
> > >> >     * being tunable with respect to resource consumption and throughput
> > >> >
> > >> > More details can be found in the Wiki [2]. We are looking forward to your
> > >> > feedback.
> > >> >
> > >> > Best,
> > >> >
> > >> > Zhu Zhu & Gary
> > >> >
> > >> > [1] https://issues.apache.org/jira/browse/FLINK-10429
> > >> >
> > >> > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-119+Pipelined+Region+Scheduling
> > >> >
>
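For readers following the thread, here is a minimal Python sketch of the
scheduling order discussed above. It is an illustration only, not Flink's
implementation or API: the function names pipelined_regions and
schedule_order, the tuple-based edge format, and the example job are all
hypothetical. It assumes that a region is a set of tasks connected by
PIPELINED edges, and that a region becomes schedulable only once every
region feeding it through a BLOCKING edge has finished.

```python
from collections import defaultdict

PIPELINED, BLOCKING = "PIPELINED", "BLOCKING"


def pipelined_regions(vertices, edges):
    """Group vertices that are connected by PIPELINED edges (union-find)."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for src, dst, kind in edges:
        if kind == PIPELINED:
            parent[find(src)] = find(dst)

    groups = defaultdict(list)
    for v in vertices:
        groups[find(v)].append(v)
    return [frozenset(g) for g in groups.values()]


def schedule_order(vertices, edges):
    """Return regions in waves; a region is scheduled only after all
    regions feeding it through BLOCKING edges have finished."""
    regions = pipelined_regions(vertices, edges)
    region_of = {v: r for r in regions for v in r}

    # For each region, collect the upstream regions it must wait for.
    upstream = {r: set() for r in regions}
    for src, dst, kind in edges:
        if kind == BLOCKING and region_of[src] != region_of[dst]:
            upstream[region_of[dst]].add(region_of[src])

    finished, waves = set(), []
    while len(finished) < len(regions):
        ready = [r for r in regions
                 if r not in finished and upstream[r] <= finished]
        if not ready:
            raise ValueError("cyclic dependency between regions")
        waves.append(ready)
        finished.update(ready)  # assume the whole wave runs to completion
    return waves


# Hypothetical job: source -> map is pipelined, map -> join is blocking,
# join -> sink is pipelined.
waves = schedule_order(
    ["source", "map", "join", "sink"],
    [("source", "map", PIPELINED),
     ("map", "join", BLOCKING),
     ("join", "sink", PIPELINED)],
)
for i, wave in enumerate(waves):
    print(i, [sorted(region) for region in wave])
```

In this sketch the {source, map} region is scheduled first and the
{join, sink} region only after it has finished, so slots are never held by
tasks that cannot read data yet. If every edge is PIPELINED, the whole job
collapses into one region and all tasks are scheduled together, which
matches the suggestion for small jobs above.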
