[jira] [Created] (MESOS-10197) One of processes gets incorrect status after stopping and starting mesos-master and mesos-agent simultaneously

Sergei Hanus (Jira) Wed, 04 Nov 2020 04:52:02 -0800

Sergei Hanus created MESOS-10197:
------------------------------------

             Summary: One of processes gets incorrect status after stopping and 
starting mesos-master and mesos-agent simultaneously
                 Key: MESOS-10197
                 URL: https://issues.apache.org/jira/browse/MESOS-10197
             Project: Mesos
          Issue Type: Bug
            Reporter: Sergei Hanus



We are using Mesos 1.8.0 together with Marathon 1.7.50.

We run several child services under Marathon. When we stop and start all 
services (including mesos-master and mesos-agent), or simply reboot the server, 
everything usually returns to a working state.

Sometimes, however, one of the child services is reported as healthy even 
though no such process exists on the server. When we restart mesos-agent 
once more, the child service appears as a process and actually starts working.

 

At the same time, we observe the following message in the agent log:

 
{code:java}
I1103 01:48:08.291822  6542 slave.cpp:5491] Killing un-reregistered executor 
'ia-cloud_nexus.f09fb47b-1d66-11eb-ad1d-12962e9c065b' of framework 
a99f25dd-d176-4ffd-9351-e70a357c1872-0000 at executor(1)@10.100.5.141:36452
I1103 01:48:08.291896  6542 slave.cpp:7848] Finished recovery
{code}
What could be the reason for this behavior, and how can we avoid it? If this 
service's state is stuck somewhere in the agent's internal structures (a metadata 
file on disk, or something like that), how can we clean up this state?
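
To see which executors were affected after a restart, the agent log can be 
scanned for the message quoted above. The following is only a minimal sketch: 
it assumes the message wording shown in this report (other Mesos versions may 
phrase it differently), and the function name and command-line usage are 
illustrative, not part of Mesos.

```python
import re
import sys

# Pattern for the agent log message quoted in this report. The wording is an
# assumption taken from that single sample; adjust it if your log differs.
KILL_RE = re.compile(
    r"Killing un-reregistered executor\s+'(?P<executor>[^']+)'\s+"
    r"of framework\s+(?P<framework>\S+)"
)


def find_killed_executors(log_text):
    """Return (executor_id, framework_id) pairs found in the log text."""
    # \s+ also matches newlines, so messages wrapped across lines still match.
    return [
        (m.group("executor"), m.group("framework"))
        for m in KILL_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    # Usage: python find_killed.py <path-to-agent-log>
    # The log path depends on how the agent was started, so it is passed in.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for executor, framework in find_killed_executors(fh.read()):
            print(f"{executor}\t{framework}")
```

Each printed pair names an executor that the agent killed during recovery, 
which can then be compared with the tasks Marathon still reports as healthy.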

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
