[jira] [Commented] (MESOS-2735) Change the interaction between the slave and the resource estimator from polling to pushing

Jie Yu (JIRA) Tue, 19 May 2015 10:35:43 -0700

    [ https://issues.apache.org/jira/browse/MESOS-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550838#comment-14550838 ]


Jie Yu commented on MESOS-2735:
-------------------------------

{quote} One of the advantages that we had discussed in the past was that the 
pull model enables us to move as fast as we possibly can, rather than just 
getting a bunch of messages queued up in the slave that we have to process. 
{quote}

I don't think there is a difference in terms of queueing messages. The pull 
model also queues messages in the slave (e.g., 
'estimator->oversubscribed().then(defer(...))' also queues a message in the 
slave's queue).

{quote} Even if we want to collect more fine-grained resource estimations a 
ResourceEstimator could do this and store this information until future polls. 
{quote}

I think there's no fundamental difference between the pull and the push model. 
There are only two subtle differences between them: 1) the push model makes 
fewer assumptions about the slave's behavior; 2) the push model is safer in the 
face of a badly behaved resource estimator. Let me elaborate on both below:

Regarding (1), let's use an example. Say we want to write a resource estimator 
that sends a constant number of CPUs (say, 2 CPUs) every 10 seconds. With a 
push model, we could just follow the 
[NoopResourceEstimatorProcess|https://github.com/apache/mesos/blob/master/src/slave/resource_estimator.cpp#L52]
implementation in the code. Basically, we spawn a libprocess process and invoke 
the registered callback with 2 CPUs every 10 seconds.

Now, if we use a pull model, we first need to assume that the slave polls the 
resource estimator as fast as it can, without any delay. If there's a delay of, 
say, 1 second, the resource estimator needs to adjust its internal delay to 9 
seconds so that two consecutive estimations are still 10 seconds apart. When 
implementing the `Future<Resources> oversubscribed()` interface, the module 
writer needs to make another assumption about the slave: that it will not 
invoke the interface again while the previous estimation is still pending. This 
is important because otherwise the module writer needs to maintain a list of 
Promises (instead of just one). I just feel that there are too many implicit 
assumptions the module writer needs to make in a pull model.
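For contrast, a hypothetical pull-model version of the same estimator (names 
are illustrative, and std::promise/std::future stand in for libprocess 
Promise/Future). Both assumptions above show up in the code: the internal delay 
has to subtract a guessed slave delay, and only a single pending promise is 
tracked:

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <optional>
#include <stdexcept>

// Hypothetical stand-in for Mesos' Resources: only a CPU count.
struct Resources {
  double cpus = 0.0;
};

// Pull-model sketch: the slave calls oversubscribed() and waits on the future.
class ConstantPullEstimator {
 public:
  ConstantPullEstimator(Resources estimate,
                        std::chrono::milliseconds interval,
                        std::chrono::milliseconds assumedSlaveDelay)
      : estimate_(estimate),
        interval_(interval),
        assumedSlaveDelay_(assumedSlaveDelay) {}

  // Assumption 1: the estimator's own timer must be shortened by a guess of
  // how long the slave takes between getting one estimate and polling again.
  std::chrono::milliseconds internalDelay() const {
    return interval_ > assumedSlaveDelay_ ? interval_ - assumedSlaveDelay_
                                          : std::chrono::milliseconds::zero();
  }

  // Assumption 2: only one promise is kept, so the slave must not call again
  // while the previous estimate is still pending.
  std::future<Resources> oversubscribed() {
    if (pending_) {
      throw std::logic_error("previous estimate is still pending");
    }
    pending_.emplace();
    return pending_->get_future();
  }

  // Invoked by the estimator's own timer once internalDelay() has elapsed.
  void fire() {
    if (!pending_) {
      return;  // nobody is polling right now; this estimate is dropped
    }
    pending_->set_value(estimate_);
    pending_.reset();
  }

 private:
  Resources estimate_;
  std::chrono::milliseconds interval_;
  std::chrono::milliseconds assumedSlaveDelay_;
  std::optional<std::promise<Resources>> pending_;
};
```

With a 10-second interval and an assumed 1-second slave delay, `internalDelay()` 
comes out to 9 seconds. A second `oversubscribed()` call before `fire()` throws 
here; supporting it instead is what would force a list of promises.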

Regarding (2), as I already stated in this ticket, since the slave invokes the 
interface ('oversubscribed()') in its own context, the module writer needs to 
make sure the implementation does not block; otherwise, the slave will hang. An 
alternative is to use 'async' when invoking the interface in the slave. I just 
feel this is unnecessary if we use a push model.
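A sketch of the 'async' workaround, assuming an estimator whose 
`oversubscribed()` blocks (simulated here with a 500 ms sleep). std::async is 
used as a stand-in for the libprocess 'async' mentioned above, and 
`blockingOversubscribed()` / `pollWithoutBlocking()` are hypothetical names:

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical stand-in for Mesos' Resources: only a CPU count.
struct Resources {
  double cpus = 0.0;
};

// A badly behaved pull-model estimator: it does blocking work on whatever
// thread calls it. Called directly from the slave, the slave would hang.
Resources blockingOversubscribed() {
  std::this_thread::sleep_for(std::chrono::milliseconds(500));
  return Resources{2.0};
}

// Hypothetical slave-side wrapper: run the call on a separate thread so the
// caller gets a future back immediately instead of waiting 500 ms.
std::future<Resources> pollWithoutBlocking() {
  return std::async(std::launch::async, blockingOversubscribed);
}
```

The slave gets a future back right away, but every poll now runs the estimator 
on another thread just to protect the slave, an extra layer that the push model 
does not need because the estimator already runs in its own context.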

> Change the interaction between the slave and the resource estimator from 
> polling to pushing 
> --------------------------------------------------------------------------------------------
>
>                 Key: MESOS-2735
>                 URL: https://issues.apache.org/jira/browse/MESOS-2735
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Jie Yu
>            Assignee: Jie Yu
>              Labels: twitter
>
> This will make the semantics clearer. The resource estimator can control the 
> speed of sending resource estimations to the slave.
> To avoid a cyclic dependency, the slave will register a callback with the 
> resource estimator, and the resource estimator will simply invoke that 
> callback when a new estimation is ready. The callback will be a defer to the 
> slave's main event queue.
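The proposed design can be sketched as follows, assuming a simple 
mutex-protected queue as a stand-in for the slave's libprocess event queue 
(`EventQueue`, `Slave`, and `estimatorCallback()` are hypothetical names). The 
callback handed to the estimator only enqueues the update, so slave state is 
changed on the slave's own thread, and the estimator only sees a 
std::function, never the slave type:

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for Mesos' Resources: only a CPU count.
struct Resources {
  double cpus = 0.0;
};

// Stand-in for the slave's main event queue: any thread may post, and only
// the slave's own thread runs the events.
class EventQueue {
 public:
  void post(std::function<void()> event) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      events_.push_back(std::move(event));
    }
    cv_.notify_one();
  }

  // Blocks until an event is available, then runs it on the calling thread.
  void runOne() {
    std::function<void()> event;
    {
      std::unique_lock<std::mutex> lock(mutex_);
      cv_.wait(lock, [this] { return !events_.empty(); });
      event = std::move(events_.front());
      events_.pop_front();
    }
    event();
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> events_;
};

class Slave {
 public:
  // The callback registered with the estimator. It never touches slave state
  // directly; it only enqueues an update (the "defer").
  std::function<void(const Resources&)> estimatorCallback() {
    return [this](const Resources& estimate) {
      queue_.post([this, estimate] {
        latest_ = estimate;
        ++updates_;
      });
    };
  }

  EventQueue& queue() { return queue_; }
  double latestCpus() const { return latest_.cpus; }
  int updates() const { return updates_; }

 private:
  EventQueue queue_;
  Resources latest_;
  int updates_ = 0;
};
```

An estimator thread calling `callback(Resources{2.0})` just posts an event; the 
update takes effect only when the slave's thread calls `queue().runOne()`.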



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
