Re: Marathon constantly unregisters on particular slaves

2016-08-24 Thread Joseph Wu
>
> Scenario is:
> * Marathon registers on slave,
>
Why is Marathon registering on the agent?  This shouldn't even be possible,
as frameworks must talk to the master.

> Marathon dies on two of them constantly.

How are you starting Marathon?  Via some init service?  And are you
starting Marathon on every node?


Re: Marathon constantly unregisters on particular slaves

2016-08-24 Thread Mateusz Moneta
This was normal behavior. Nothing wanted to run there, so the framework was
unregistered. We had a flapping task that caused the constant
deregistration.

On 2016-08-24 16:04 (+0200), Mateusz Moneta wrote:
> Hello,
>
> we have a Mesos production cluster composed of 14 slave nodes and 3
> masters running Mesos 1.0.0 and Marathon 1.1.1. All nodes are managed by
> Puppet, so they have identical configuration. The OS is Debian Jessie with
> a 4.6.0 kernel.
>
> We have a problem: after a recent restart of the `mesos-slaves` processes
> across our cluster, Marathon dies on two of them constantly.
>
> Scenario is:
> * Marathon registers on slave,
> * Marathon runs for a couple of minutes, tasks are launched and everything
>   seems fine,
> * Marathon unregisters.
>
> I've checked the Mesos slave, Mesos master, and Marathon logs and found
> nothing. The Mesos master only has logs about the framework REVIVE, the
> slave only about 'framework seems to be missing', and Marathon only about
> rescheduling tasks from the slave.
>
> I've tried reboots, removing /var/lib/mesos/meta, and restarting
> slaves/masters/marathon with different configurations. Nothing helps.
>
> Any clues about what could be wrong or how to debug this?
>
> -- 
> BR,
> Mateusz
>