Re: Detecting slave crashes event

2015-09-23 Thread Benjamin Mahler
I believe some of the contributors from Mesosphere have been thinking about
it, but I'm not sure of the plans. I'll let them reply here.

On Wed, Sep 16, 2015 at 11:11 AM, Paul Bell wrote:

> Thank you, Benjamin.
>
> So, I could periodically request the metrics endpoint, or stream the logs
> (maybe via mesos.cli; or SSH)? What, roughly, does the "agent removed"
> message look like in the logs?
>
> Are there plans to offer a mechanism for event subscription?
>
> Cordially,
>
> Paul
>
>
>
> On Wed, Sep 16, 2015 at 1:30 PM, Benjamin Mahler <
> benjamin.mah...@gmail.com> wrote:
>
>> You can detect when we remove an agent due to health check failures via
>> the metrics endpoint, but these are counters that are better used for
>> alerting / dashboards for visibility. If you need to know which agents, you
>> can also consume the logs as a stop-gap solution, until we offer a
>> mechanism for subscribing to cluster events.
>>
>> On Wed, Sep 16, 2015 at 10:11 AM, Paul Bell wrote:
>>
>>> Hi All,
>>>
>>> I am led to believe that, unlike Marathon, Mesos doesn't (yet?) offer a
>>> subscribable event bus.
>>>
>>> So I am wondering if there's a best practices way of determining if a
>>> slave node has crashed. By "crashed" I mean something like the power plug
>>> got yanked, or anything that would cause Mesos to stop talking to the slave
>>> node.
>>>
>>> I suppose such information would be recorded in /var/log/mesos.
>>>
>>> Interested to learn how best to detect this.
>>>
>>> Thank you.
>>>
>>> -Paul
>>>
>>
>>
>


Re: Detecting slave crashes event

2015-09-16 Thread Paul Bell
Thank you, Benjamin.

So, I could periodically request the metrics endpoint, or stream the logs
(maybe via mesos.cli; or SSH)? What, roughly, does the "agent removed"
message look like in the logs?
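
For the polling option, here is a minimal sketch of what I have in mind. It
assumes the master serves a JSON snapshot at /metrics/snapshot and that the
removal counter is named master/slave_removals; both should be checked
against the Mesos version in use. The helper names (fetch_snapshot,
new_removals, watch) are mine, not part of Mesos. As Benjamin notes, the
counter only says how many agents were removed, not which ones.

```python
import json
import time
import urllib.request

# Assumed counter name; verify it against your master's metrics output.
REMOVALS_KEY = "master/slave_removals"


def fetch_snapshot(master_url, timeout=5.0):
    """Fetch the master's metrics snapshot as a dict (assumed endpoint path)."""
    url = master_url.rstrip("/") + "/metrics/snapshot"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def new_removals(previous, current, key=REMOVALS_KEY):
    """Return how many removals happened between two snapshots."""
    before = previous.get(key, 0)
    after = current.get(key, 0)
    if after < before:
        # The counter went backwards, e.g. after a master restart or
        # failover, so treat everything in the new snapshot as new.
        return int(after)
    return int(after - before)


def watch(fetch, on_removal, interval=30.0, iterations=None, sleep=time.sleep):
    """Poll `fetch` every `interval` seconds and report counter increases.

    `iterations=None` polls forever; `fetch` and `sleep` are injectable so
    the loop can be exercised without a live master.
    """
    previous = fetch()
    done = 0
    while iterations is None or done < iterations:
        sleep(interval)
        current = fetch()
        delta = new_removals(previous, current)
        if delta > 0:
            on_removal(delta)
        previous = current
        done += 1
```

Usage would be something like
watch(lambda: fetch_snapshot("http://master.example:5050"), print), where
the host and port are placeholders for the real master address.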

Are there plans to offer a mechanism for event subscription?

Cordially,

Paul



On Wed, Sep 16, 2015 at 1:30 PM, Benjamin Mahler wrote:

> You can detect when we remove an agent due to health check failures via
> the metrics endpoint, but these are counters that are better used for
> alerting / dashboards for visibility. If you need to know which agents, you
> can also consume the logs as a stop-gap solution, until we offer a
> mechanism for subscribing to cluster events.
>
> On Wed, Sep 16, 2015 at 10:11 AM, Paul Bell wrote:
>
>> Hi All,
>>
>> I am led to believe that, unlike Marathon, Mesos doesn't (yet?) offer a
>> subscribable event bus.
>>
>> So I am wondering if there's a best practices way of determining if a
>> slave node has crashed. By "crashed" I mean something like the power plug
>> got yanked, or anything that would cause Mesos to stop talking to the slave
>> node.
>>
>> I suppose such information would be recorded in /var/log/mesos.
>>
>> Interested to learn how best to detect this.
>>
>> Thank you.
>>
>> -Paul
>>
>
>