[ https://issues.apache.org/jira/browse/MESOS-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15394584#comment-15394584 ]
Steven Schlansker commented on MESOS-5910:
------------------------------------------

It seems that it actually gives you a current snapshot when you initially subscribe, so perhaps this really is only an issue during master failovers. So this is probably of somewhat lower importance than I thought, although correctly handling master failover without losing events is still desirable.

> Operator SUBSCRIBE api should provide a method to get all events without requiring 100% uptime
> ----------------------------------------------------------------------------------------------
>
>                 Key: MESOS-5910
>                 URL: https://issues.apache.org/jira/browse/MESOS-5910
>             Project: Mesos
>          Issue Type: Improvement
>          Components: HTTP API, json api
>    Affects Versions: 1.0.0
>            Reporter: Steven Schlansker
>
> The v1.0 Operator API adds a new SUBSCRIBE call, which returns a stream of
> events as they occur.  This is going to be extremely useful for monitoring
> and management jobs, as they can now have timely information about Mesos's
> operation without requiring repeated polling or other ugly solutions.
> Unfortunately, the SUBSCRIBE call always returns from the time the call is
> made.  This means that any consumer cannot reliably subscribe to "all
> events"; if the application goes offline (network blip, code upgrade, etc)
> all events during that downtime are lost.
> You could instead have a cluster of applications receiving the events and
> coordinating to deduplicate them to increase reliability, but this pushes a
> lot of complexity into clients, and I suspect most users would not do this
> correctly and would potentially lose events.
> It would be extremely useful for a single client to be able to get a reliable
> event stream without requiring a single HTTP connection to be 100% available.
> One possible solution is to assign every event an ID.  Then, extend the API
> to take a "start position" in the log.
> The API immediately streams out all
> events from the start event up until the tail of the log, and then continues
> emitting new events as they occur.  This provides a reliable way for a
> consumer to get "at least once" semantics on events.  The caveat is that the
> consumer may only be down for as long as the master retains event history,
> but this is a much easier pill to swallow.  This is similar to etcd's "watch"
> API, if you are looking for an actual implementation to reference.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
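
For illustration only, the proposal quoted above (give every event an ID and let a subscriber pass a "start position") could look roughly like the sketch below. Nothing here is Mesos code or an existing Mesos API: `EventLog`, `read_from`, `HistoryExpired`, `Consumer` and the `retention` parameter are hypothetical names, and the in-memory deque merely stands in for whatever history the master might retain. The consumer records the next ID it needs only after handling an event, so a crash in between leads to redelivery rather than loss, which is the "at least once" behaviour described in the issue.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass(frozen=True)
class Event:
    id: int        # monotonically increasing position in the log
    payload: str   # illustrative payload, not a real Mesos event type


class HistoryExpired(Exception):
    """The requested start position is older than the retained history."""


class EventLog:
    """Hypothetical stand-in for the event history kept by the master."""

    def __init__(self, retention: int) -> None:
        # A bounded deque silently drops the oldest event once it is full.
        self._events: Deque[Event] = deque(maxlen=retention)
        self._next_id = 1

    def append(self, payload: str) -> Event:
        event = Event(self._next_id, payload)
        self._next_id += 1
        self._events.append(event)
        return event

    def read_from(self, start_id: int) -> List[Event]:
        """Return every retained event whose id is >= start_id."""
        oldest = self._events[0].id if self._events else self._next_id
        if start_id < oldest:
            raise HistoryExpired(f"oldest retained id is {oldest}")
        return [e for e in self._events if e.id >= start_id]


class Consumer:
    """Hypothetical client that resumes from the last position it handled."""

    def __init__(self) -> None:
        self.next_id = 1
        self.seen: List[str] = []

    def poll(self, log: EventLog) -> None:
        for event in log.read_from(self.next_id):
            self.seen.append(event.payload)  # handle the event first
            self.next_id = event.id + 1      # then advance the position
```

A consumer that was offline while two events were appended picks both up on its next `poll`, because it asks for everything from `next_id` onward. If more events arrive during the outage than the log retains, `read_from` raises `HistoryExpired`, which corresponds to the caveat in the issue that the consumer may only be down for as long as history is kept.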