[ 
https://issues.apache.org/jira/browse/MESOS-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018348#comment-14018348
 ] 

Dominic Hamon commented on MESOS-1456:
--------------------------------------

Moving the metric add/remove calls to initialize/finalize mitigates the issue, 
but there is still a race when a snapshot is enqueued between the finalize and 
the remove. In that case we still end up trying to access a {{Gauge}} that 
defers to a pid that is no longer valid.

When we try to dispatch to an invalid pid, we delete the {{DispatchEvent}}. 
In the case of a deferral, this also deletes the {{Promise}} wrapper, which 
drops the associated {{Future}} without discarding it, so the {{Future}} stays 
pending. See the comment in the {{Promise}} destructor for the reasoning 
behind this.
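
To make the failure mode concrete, here is a minimal standalone sketch. It 
does not use libprocess; {{ToyState}}, {{ToyShared}}, {{ToyPromise}}, 
{{ToyDispatchEvent}} and {{toyDispatch}} are hypothetical stand-ins, and the 
only behaviour modelled is the assumption that destroying the promise leaves 
the shared state untouched.

```cpp
#include <cassert>
#include <memory>

enum class ToyState { PENDING, READY, DISCARDED };

// State shared between a promise and the future handed to the caller.
struct ToyShared {
  ToyState state = ToyState::PENDING;
  double value = 0.0;
};

// Toy promise (hypothetical): it has no destructor logic, so destroying it
// leaves the shared state untouched, i.e. dropped without being discarded.
struct ToyPromise {
  std::shared_ptr<ToyShared> shared = std::make_shared<ToyShared>();

  void set(double v) {
    assert(shared->state == ToyState::PENDING);
    shared->value = v;
    shared->state = ToyState::READY;
  }
};

// Toy event carrying the promise for a deferred call.
struct ToyDispatchEvent {
  ToyPromise promise;
};

// Runs the deferred call if the target is still alive. Otherwise the event,
// and the promise inside it, is destroyed without ever being run.
std::shared_ptr<ToyShared> toyDispatch(bool targetAlive, double gaugeValue) {
  std::unique_ptr<ToyDispatchEvent> event(new ToyDispatchEvent());
  std::shared_ptr<ToyShared> future = event->promise.shared;
  if (targetAlive) {
    event->promise.set(gaugeValue);
  }
  return future;  // The event is deleted here in both cases.
}
```

In this model, a caller holding the result of {{toyDispatch(false, ...)}} sees 
a state that never leaves {{PENDING}}, which is the situation a snapshot 
waiting on the {{Gauge}} would be in.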

One option is to add a 'remove' {{Future}} to the {{MetricsProcess}} for each 
{{Metric}} that is added. Then, when creating the value Futures in 
{{snapshot}}, we can attach an 'onAny' callback to the remove {{Future}} that 
discards the value {{Future}}. A call to {{remove}} could then satisfy the 
remove {{Future}}, which in turn discards the value {{Future}}.
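
A rough standalone sketch of this option follows. It is not libprocess code: 
{{ToyFuture}}, {{ToyMetrics}}, {{FutureState}}, {{demo}} and the metric name 
{{example/gauge}} are all hypothetical, and the toy is single-threaded, so it 
only shows the wiring between the two futures, not the actual synchronization.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

enum class FutureState { PENDING, READY, DISCARDED };

// Toy future (hypothetical, not the libprocess API). Copies share one state.
template <typename T>
class ToyFuture {
public:
  ToyFuture() : data(std::make_shared<Data>()) {}

  FutureState state() const { return data->state; }

  // PENDING -> READY. Returns false if the future was already settled.
  bool set(const T& value) {
    if (data->state != FutureState::PENDING) return false;
    data->value = value;
    data->state = FutureState::READY;
    fire();
    return true;
  }

  // PENDING -> DISCARDED. Returns false if the future was already settled.
  bool discard() {
    if (data->state != FutureState::PENDING) return false;
    data->state = FutureState::DISCARDED;
    fire();
    return true;
  }

  // Runs the callback once the future leaves PENDING (now, if it already has).
  void onAny(std::function<void()> callback) {
    if (data->state != FutureState::PENDING) { callback(); return; }
    data->callbacks.push_back(std::move(callback));
  }

private:
  struct Data {
    FutureState state = FutureState::PENDING;
    T value{};
    std::vector<std::function<void()>> callbacks;
  };

  void fire() {
    std::vector<std::function<void()>> callbacks = std::move(data->callbacks);
    data->callbacks.clear();
    for (auto& callback : callbacks) callback();
  }

  std::shared_ptr<Data> data;
};

// Toy registry (hypothetical) keeping one 'remove' future per metric.
class ToyMetrics {
public:
  void add(const std::string& name) { removed[name] = ToyFuture<bool>(); }

  // Satisfies the remove future, which discards pending value futures.
  void remove(const std::string& name) {
    auto it = removed.find(name);
    if (it == removed.end()) return;
    it->second.set(true);
    removed.erase(it);
  }

  // Creates a value future tied to the metric's remove future.
  ToyFuture<double> snapshot(const std::string& name) {
    ToyFuture<double> value;
    auto it = removed.find(name);
    if (it == removed.end()) { value.discard(); return value; }
    it->second.onAny([value]() mutable { value.discard(); });
    return value;
  }

private:
  std::unordered_map<std::string, ToyFuture<bool>> removed;
};

// Usage: a value future created before remove() ends up discarded.
FutureState demo() {
  ToyMetrics metrics;
  metrics.add("example/gauge");
  ToyFuture<double> value = metrics.snapshot("example/gauge");
  assert(value.state() == FutureState::PENDING);
  metrics.remove("example/gauge");
  return value.state();
}
```

A value {{Future}} that has already been satisfied is left alone, because 
{{discard}} in this sketch only acts on a pending future.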

> Metric lifetime should be tied to process runstate, not lifetime.
> -----------------------------------------------------------------
>
>                 Key: MESOS-1456
>                 URL: https://issues.apache.org/jira/browse/MESOS-1456
>             Project: Mesos
>          Issue Type: Bug
>          Components: statistics
>    Affects Versions: 0.19.0
>            Reporter: Dominic Hamon
>            Assignee: Dominic Hamon
>
> The usual pattern for termination of processes is {{terminate(..); wait(..); 
> delete ..;}} but the {{SchedulerProcess}} is terminated and then deleted some 
> time later.
> If the metrics endpoint is accessed within that period, it never returns as 
> it tries to access a {{Gauge}} that has a reference to a valid PID that is 
> not getting any timeslices (the {{SchedulerProcess}}). A one-off fix can be 
> made to the {{SchedulerProcess}} to move the metrics add/remove calls to 
> {{initialize}} and {{finalize}}, but this should be the general pattern for 
> every process with metrics. 



