Re: Please review design doc for task resizing

Niklas Nielsen Wed, 09 Dec 2015 09:56:53 -0800

(Inlined)

On Mon, Dec 7, 2015 at 6:54 AM, Qian Zhang <zhq527...@gmail.com> wrote:


> Thanks Niklas for your comments :-)
>
> Regarding your first comment, you prefer option 2 in the design doc (i.e.,
> add resize as a new offer operation), right? After more thought, I also
> agree that option 2 is the best if we want to support the following 2 use
> cases (especially the second one) and are OK with resizing a task with an
> empty offer list when reducing its resources. For the latter, we may need
> to remove this check:
> https://github.com/apache/mesos/blob/0.25.0/src/master/master.cpp#L2816
> 1. Framework adds / reduces multiple resources for its task at the same
> time.
> 2. Framework reserves the resources and resizes a task to use those
> resources at the same time.
>

Great!
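To make option 2 concrete, here is a minimal Python sketch of both use cases under a simplified model: resources are scalar amounts keyed by (name, role), a RESERVE moves unreserved resources into a role, and a hypothetical RESIZE operation carries the task's new total resources. The `Reserve`, `Resize` and `accept()` names are illustrative only, not the Mesos API. Growth is checked against the pooled offered resources, while a pure shrink also works when the offer list is empty.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# Hypothetical model of option 2 ("resize as an offer operation").
# Illustrative only: these are not the Mesos protobuf messages.
Key = Tuple[str, str]         # (resource name, role); role "*" = unreserved
Resources = Dict[Key, float]  # scalar resources only, for brevity


@dataclass
class Reserve:
    name: str
    amount: float
    role: str


@dataclass
class Resize:
    task_id: str
    desired: Resources  # the task's new *total* resources, not a delta


Operation = Union[Reserve, Resize]


def accept(offered: Resources, tasks: Dict[str, Resources],
           operations: List[Operation]) -> Resources:
    """Apply operations in order and return the resources left in the pool."""
    pool = dict(offered)  # an empty offer list is just an empty pool
    for op in operations:
        if isinstance(op, Reserve):
            # Move unreserved resources into the framework's role.
            unreserved = (op.name, "*")
            if pool.get(unreserved, 0.0) < op.amount:
                raise ValueError(f"cannot reserve {op.amount} {op.name}")
            pool[unreserved] -= op.amount
            reserved = (op.name, op.role)
            pool[reserved] = pool.get(reserved, 0.0) + op.amount
        elif isinstance(op, Resize):
            current = tasks[op.task_id]
            keys = set(current) | set(op.desired)
            deltas = {k: op.desired.get(k, 0.0) - current.get(k, 0.0)
                      for k in keys}
            # Validate all growth first so a failed resize changes nothing.
            for key, delta in deltas.items():
                if delta > 0 and pool.get(key, 0.0) < delta:
                    raise ValueError(f"insufficient {key}: need {delta}")
            for key, delta in deltas.items():
                # Growth consumes from the pool; shrinking returns to it.
                pool[key] = pool.get(key, 0.0) - delta
            tasks[op.task_id] = {k: v for k, v in op.desired.items() if v > 0}
    return pool
```

For example, starting from `tasks = {"t1": {("cpus", "*"): 1.0}}`, calling `accept({("cpus", "*"): 4.0}, tasks, [Reserve("cpus", 2.0, "web"), Resize("t1", {("cpus", "*"): 1.0, ("cpus", "web"): 2.0})])` reserves two CPUs and grows `t1` into them in a single call, leaving two unreserved CPUs. A later `accept({}, tasks, [Resize("t1", {("cpus", "web"): 2.0})])` shrinks the task with no offer at all.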


>
> Regarding your second comment, can you please clarify what you mean by
> "should be dictated by the resource type itself"? I saw your comment in the
> design doc, "The resource type should dictate the sign of R_2 - R_1.", but
> I am not sure what you meant by "R_2 - R_1". There is still some discussion
> about whether frameworks should send the desired resources (e.g., resize my
> task to 8GB) or the resource delta (e.g., add 2GB to my task) to the
> master. Personally I prefer the latter, because the former can cause race
> conditions that we cannot handle; please refer to my comments in the
> design doc for details.
>

I see. However, that operation is not idempotent. Imagine you issue a
resize request and, for some reason, the request takes a long time to carry
out and you have no way to confirm that it was received (for example,
during a master failover). In the meantime, you issue another resize. When
both land, the result may not be what you wanted.
containerizer->update() applies the aggregate size anyway, so with deltas
you would need to keep track of the 'sign' of each resize all the way down
to the slave process.

Maybe I am completely off. Do other folks have any input here?
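The replay problem described above can be sketched in a few lines of Python, assuming a toy model in which a task's resources map a name to a scalar (cpus, mem) or to a set of values (ports); none of these functions exist in Mesos. Applying a delta request twice (for example, an original request plus a retry after a master failover) over-grows the task, whereas applying a desired-state request twice is harmless. The hypothetical `diff()` shows one reading of "the resource type should dictate the sign of R_2 - R_1": the master derives what grows and what shrinks from the old and new shapes, using subtraction for scalars and set difference for ports.

```python
from typing import Dict, Set, Tuple, Union

# Toy model, not Mesos code: a task's resources map a name to a scalar
# (cpus, mem) or to a set of values (ports).
Value = Union[float, Set[int]]
Resources = Dict[str, Value]


def apply_delta(current: Resources, delta: Dict[str, float]) -> Resources:
    """'Add 2GB to my task' (scalars only): every delivery adds again."""
    out = dict(current)
    for name, amount in delta.items():
        out[name] = out.get(name, 0.0) + amount
    return out


def apply_desired(current: Resources, desired: Resources) -> Resources:
    """'Resize my task to 8GB': a redelivered request changes nothing."""
    # `current` is ignored on purpose: the new shape fully replaces it.
    return dict(desired)


def diff(r1: Resources, r2: Resources) -> Dict[str, Tuple[Value, Value]]:
    """R_2 - R_1 per resource as (grown, shrunk); the type decides the math."""
    out: Dict[str, Tuple[Value, Value]] = {}
    for name in set(r1) | set(r2):
        a, b = r1.get(name), r2.get(name)
        if isinstance(a, set) or isinstance(b, set):
            a, b = a or set(), b or set()
            out[name] = (b - a, a - b)  # ports can be added and removed at once
        else:
            d = (b or 0.0) - (a or 0.0)
            out[name] = (max(d, 0.0), max(-d, 0.0))
    return out
```

For example, with `task = {"mem": 6144.0, "ports": {31000}}`, applying the delta `{"mem": 2048.0}` twice yields 10240 MB instead of the intended 8192 MB, while `diff(task, {"mem": 8192.0, "ports": {31001}})` reports 2048 MB of memory growth together with port 31001 added and port 31000 removed.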


>
> I also have 2 more questions that I want to discuss with you:
> 1. David G raised a user story that a framework should be able to resize
> its executor. I think this is a valid use case, but I suggest we focus on
> task resizing in the MVP and handle executor resizing post-MVP. What do
> you think?
> 2. Do you think we need to involve the executor in task resizing? E.g.,
> the slave could send a message (e.g., RunTaskMessage) to the executor so
> that the executor can do the actual resizing. I raise this question
> because in some cases the executor needs to be aware of the resized
> resources. For example, if a framework adds a new port to a task, the
> executor and the task should know about the new port so that the task can
> start using it. And in the Kubernetes on Mesos case, a user may want to
> resize a pod, which is actually created and managed by the k8sm-executor,
> so the executor should be involved in resizing the pod's resources.
>

Maybe we can do that down the line; for the MVP, maybe we can skip it but
have a model that supports it?
Using the task info as a 'desired state', changing the executor info
resources could be used to change its size. However, there are some
details around master failover and slave reregistration (where executor
infos are sent from the slaves) that we need to be careful about.
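One way to read "a model that supports it" is sketched below in Python, assuming the slave keeps the latest executor info and task infos as desired state and sizes the container to their sum, the aggregate that containerizer->update() applies as noted above. `ExecutorState` and `update_container()` are hypothetical names, and the master failover and slave reregistration details are deliberately not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Resources = Dict[str, float]  # scalar resources only, for brevity


@dataclass
class ExecutorState:
    """Hypothetical slave-side view: desired executor and task resources."""
    executor: Resources
    tasks: Dict[str, Resources] = field(default_factory=dict)

    def aggregate(self) -> Resources:
        """Size the container should have: executor plus all of its tasks."""
        total = dict(self.executor)
        for res in self.tasks.values():
            for name, amount in res.items():
                total[name] = total.get(name, 0.0) + amount
        return total


def update_container(state: ExecutorState, log: List[Resources]) -> None:
    # Stand-in for containerizer->update(): it receives the full aggregate,
    # recomputed from desired state rather than from accumulated deltas.
    log.append(state.aggregate())
```

With this model, resizing a task and resizing the executor are the same kind of change: replace the relevant desired resources and recompute the aggregate. For example, an executor with `{"cpus": 0.5, "mem": 32.0}` running a task with `{"cpus": 1.0, "mem": 256.0}` yields an aggregate of 1.5 CPUs and 288 MB, and recomputing it twice gives the same size, so a duplicated update is harmless.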


>
> I do not have a PoC implementation of my proposal yet. Do you recommend
> that we build one now, or wait until the design is close to being
> finalized, or at least until we decide among the 3 options for the
> scheduler API changes in the design doc?
>

It doesn't hurt to experiment and see if there are obvious things that we
missed.
If you haven't done any work yet, I'd maybe defer until we at least have
the placement of the 'resize operation' nailed down.


>
> I'd like to have an online sync-up with you. Can you please let me know
> when you are usually online in IRC? Or do you prefer another way to sync
> up? I will try to catch you :-)
>

Let's do a joint call; how about Friday or Monday?
I am available during business hours PST.


>
>
> Thanks,
> Qian
>
>
> 2015-12-05 7:03 GMT+08:00 Niklas Nielsen <n...@qni.dk>:
>
> > Hi Qian,
> >
> > Thanks for the update, and I apologize for the response time.
> >
> > Do you have a PoC implementation of your proposal?
> >
> > I have trouble understanding the motivation for _not_ adding resizing as
> > a regular offer operation. It seems much cleaner in my mind. Regarding
> > David G's and Alex R's comment: if you want to resize without an offer
> > (when shrinking a task), you could do it with an empty offer list. Giving
> > up the ability to combine task resizing with the other operations (whose
> > number will most likely grow with upcoming features) is a big loss, but
> > maybe I am missing something.
> >
> > Secondly, I think whether the new desired resource shape requires growing
> > or shrinking should be dictated by the resource type itself rather than
> > explicitly set by the framework writer. You have to do that math anyway
> > to figure out whether the framework's request is valid, no?
> >
> > We can do an online sync soon, if you want to give a pitch on the design.
> >
> > Cheers,
> > Niklas
> >
> >
> > On Thu, Nov 19, 2015 at 6:34 AM, Qian Zhang <zhq527...@gmail.com> wrote:
> >
> > > Hi All,
> > >
> > > I am currently working on task resizing (MESOS-938), and have drafted a
> > > design doc (see the link below).
> > >
> > >
> > > https://docs.google.com/document/d/15rVmS2AXLzTDSEugAVDxWuHFUentp82KhL2yzxBCsi8/edit?usp=sharing
> > >
> > >
> > > Please feel free to review it; any comments are welcome. Thanks!
> > >
> > >
> > > Regards,
> > > Qian
> > >
> >
> >
> >
> > --
> > Niklas
> >
>



-- 
Niklas
