[ 
https://issues.apache.org/jira/browse/MESOS-7181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15939922#comment-15939922
 ] 

Yan Xu commented on MESOS-7181:
-------------------------------

Yeah so I meant that on the receiving end the process manager doesn't know 
whether an actor is being {{link}} ed or not so it has to send 
{{TargetPIDExited}} in all situations, this is different than the current local 
{{PID}} behavior. Also this message is sent not when the actor dies but when a 
message arrives, so I guess if a frameworks dies when it's suppressed and with 
no pending status updates, the master will not find out about it because it 
doesn't send messages?

Perhaps we can have a {{Link}} message sent to the linkee based on which it can 
send a special {{Exited}} message to the sender when the actor terminates?

> Stale frameworks seen on Mesos, but not known to scheduler
> ----------------------------------------------------------
>
>                 Key: MESOS-7181
>                 URL: https://issues.apache.org/jira/browse/MESOS-7181
>             Project: Mesos
>          Issue Type: Bug
>          Components: general
>            Reporter: Anindya Sinha
>            Assignee: Anindya Sinha
>
> Using a scheduler which launches multiple frameworks using scheduler driver, 
> we observe occasionally that a framework exists on Mesos which is not known 
> to the scheduler. Since there is no entity that acts on the offers, this 
> framework ends up hogging all the offers leading to starvation in the cluster.
> This particular scenario is as follows:
> 1) Scheduler does a driver.start() which results in the 1st SUBSCRIBE sent to 
> master.
> 2) The scheduler driver resends the SUBSCRIBE (since the framework has not 
> yet registered) which is a result of the exponential backoff.
> 3) Framework is registered based on the 1st SUBSCRIBE, but the scheduler 
> issues a driver.stop() immediately which results in a TEARDOWN sent to the 
> master.
> 4) Master processes the TEARDOWN which removes the framework.
> 5) Master now processes the 2nd SUBSCRIBE (after authorization) and tries to 
> add this framework. This succeeds and a new framework id is generated (since 
> the original framework is no longer registered after the TEARDOWN) but the 
> Scheduler driver by now has already terminated once the scheduler issued the 
> driver.stop(). So, master continues to send offers to this 2nd framework and 
> hogs on to offers till offer time out.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to