[
https://issues.apache.org/jira/browse/MESOS-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018348#comment-14018348
]
Dominic Hamon commented on MESOS-1456:
--------------------------------------
Moving the metric add/remove calls to initialize/finalize narrows the window
but does not close it: there is still a race when a snapshot is enqueued
between the finalize and the remove. In that case we still end up trying to
access a {{Gauge}} that defers to a pid that is no longer valid.

When we try to dispatch to an invalid pid, we delete the {{DispatchEvent}}.
For a deferred call, that also deletes the {{Promise}} wrapper, which drops the
associated {{Future}} without discarding it, so the {{Future}} stays pending
and the snapshot never completes. See the comment in the {{Promise}}
destructor for the logic behind this.

One option is to have the {{MetricsProcess}} keep a 'remove' {{Future}} for
each {{Metric}} that is added. Then, when creating the value Futures in
{{snapshot}}, we can attach an 'onAny' to the remove {{Future}} that discards
the corresponding value {{Future}}. A call to {{remove}} would then satisfy
the remove {{Future}}, which in turn discards the value {{Future}}.
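
To make the proposal concrete, here is a rough sketch of the remove-Future
idea. It is standard C++ only and does not use libprocess: {{ToyFuture}},
{{ToyMetrics}}, {{snapshotThenRemove}} and the metric name are hypothetical
stand-ins, and the real {{Future}}, {{Promise}} and {{MetricsProcess}} differ
in detail (threading, templates, failure states). It only shows the wiring
between the remove Future and the value Future.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy stand-in for a future. This is NOT the libprocess Future API.
enum class State { Pending, Ready, Discarded };

class ToyFuture {
public:
  State state() const { return data_->state; }
  double value() const { return data_->value; }

  // Each transition succeeds only while the future is still pending.
  bool set(double v) const { return finish(State::Ready, v); }
  bool discard() const { return finish(State::Discarded, 0.0); }

  // Run f once the future leaves Pending (immediately if it already has).
  void onAny(std::function<void()> f) const {
    if (data_->state != State::Pending) { f(); return; }
    data_->callbacks.push_back(std::move(f));
  }

private:
  struct Data {
    State state = State::Pending;
    double value = 0.0;
    std::vector<std::function<void()>> callbacks;
  };

  bool finish(State s, double v) const {
    if (data_->state != State::Pending) return false;
    data_->state = s;
    data_->value = v;
    std::vector<std::function<void()>> callbacks;
    callbacks.swap(data_->callbacks);
    for (const auto& f : callbacks) f();
    return true;
  }

  // Copies of a ToyFuture share the same underlying state.
  std::shared_ptr<Data> data_ = std::make_shared<Data>();
};

// Toy stand-in for the metrics process (hypothetical, single-threaded).
class ToyMetrics {
public:
  // Adding a metric creates its 'remove' future.
  void add(const std::string& name) { removed_[name] = ToyFuture(); }

  // Removing a metric satisfies its 'remove' future, which fires the
  // onAny callbacks registered by snapshot().
  void remove(const std::string& name) {
    auto it = removed_.find(name);
    if (it == removed_.end()) return;
    ToyFuture removed = it->second;
    removed_.erase(it);
    removed.set(0.0);
  }

  // 'value' models the future returned by the gauge for this snapshot.
  ToyFuture snapshot(const std::string& name, ToyFuture value) {
    auto it = removed_.find(name);
    if (it == removed_.end()) { value.discard(); return value; }
    // Discard the value future if the metric is removed first. If the
    // value is already ready by then, discard() is a no-op.
    it->second.onAny([value]() { value.discard(); });
    return value;
  }

private:
  std::unordered_map<std::string, ToyFuture> removed_;
};

// The race from this comment: the value future is never satisfied because
// its promise was dropped, and the metric is removed afterwards.
State snapshotThenRemove() {
  ToyMetrics metrics;
  metrics.add("example/gauge");
  ToyFuture value;  // Never set: models the dropped Promise.
  ToyFuture result = metrics.snapshot("example/gauge", value);
  assert(result.state() == State::Pending);
  metrics.remove("example/gauge");
  return result.state();
}
```

With this wiring, a snapshot that raced with the removal sees a discarded
value instead of waiting forever, while a value that was already ready is
left untouched.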
> Metric lifetime should be tied to process runstate, not lifetime.
> -----------------------------------------------------------------
>
> Key: MESOS-1456
> URL: https://issues.apache.org/jira/browse/MESOS-1456
> Project: Mesos
> Issue Type: Bug
> Components: statistics
> Affects Versions: 0.19.0
> Reporter: Dominic Hamon
> Assignee: Dominic Hamon
>
> The usual pattern for termination of processes is {{terminate(..); wait(..);
> delete ..;}} but the {{SchedulerProcess}} is terminated and then deleted some
> time later.
> If the metrics endpoint is accessed within that period, it never returns as
> it tries to access a {{Gauge}} that has a reference to a valid PID that is
> not getting any timeslices (the {{SchedulerProcess}}). A one-off fix can be
> made to the {{SchedulerProcess}} to move the metrics add/remove calls to
> {{initialize}} and {{finalize}}, but this should be the general pattern for
> every process with metrics.