[ 
https://issues.apache.org/jira/browse/MESOS-10188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196148#comment-17196148
 ] 

Andrei Sekretenko commented on MESOS-10188:
-------------------------------------------

[~Jerome Soussens] Are you sure there is no stack trace above that? In my 
experience, crash stacks usually get intermixed with log lines, which are also 
written into stdout.
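
One way to look for a stack trace hidden among log lines is to drop every line that carries the usual glog prefix (severity letter, MMDD date, time, thread id, file:line]) and inspect what remains. The following is only a minimal sketch: the function name `non_log_lines` and the stack-frame line in the sample are made up for illustration and are not taken from the attached logs.

```python
import re

# glog prefix: severity letter, MMDD, HH:MM:SS.uuuuuu, thread id, file:line]
GLOG_PREFIX = re.compile(
    r"^[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d{6}\s+\d+ [^\s:]+:\d+\] "
)


def non_log_lines(lines):
    """Return (line_number, text) for non-empty lines without a glog prefix.

    Such lines are candidates for stack trace frames that were written
    to the same stream as the regular log output.
    """
    found = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if text.strip() and not GLOG_PREFIX.match(text):
            found.append((number, text))
    return found


if __name__ == "__main__":
    sample = [
        "F0911 06:43:25.109040 46181 hierarchical.cpp:232] Check failed: "
        "scalars does not contain 8f1f65e8-c38d-4563-bfba-eaa079271b2b-S732",
        # Hypothetical frame line, shown only to illustrate the filter.
        "    @     0x7f0000000000  (unknown)",
    ]
    for number, text in non_log_lines(sample):
        print(number, text)
```

Running it over the captured stdout/stderr of the master (rather than the per-severity log files) would be the interesting case, since that is where a stack trace would end up.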

At this point, this does not look like something we introduced in 1.10 
(although before 1.10 a crash would have been impossible because this check 
did not exist; most likely Mesos would have ended up mis-accounting resources 
somewhere).
My current suspicion is that there is some kind of race in the master between 
the TEARDOWN call and the terminal status transitions of tasks/executors...



> Master check failure : scalars does not contain agent
> -----------------------------------------------------
>
>                 Key: MESOS-10188
>                 URL: https://issues.apache.org/jira/browse/MESOS-10188
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Jerome Soussens
>            Priority: Critical
>         Attachments: image-2020-09-14-10-07-42-622.png, 
> mesos-master.dev-eu-w-01-sgmm-0-0.root.log.ERROR.20200911-064325.46082-20200912.gz,
>  
> mesos-master.dev-eu-w-01-sgmm-0-0.root.log.FATAL.20200911-064325.46082-20200912.gz,
>  
> mesos-master.dev-eu-w-01-sgmm-0-0.root.log.INFO.20200910-200737.46082-20200911.gz,
>  
> mesos-master.dev-eu-w-01-sgmm-0-0.root.log.WARNING.20200830-004426.46082-20200911.gz
>
>
> Mesos master restarted with the error message:
> {code:java}
> F0911 06:43:25.109040 46181 hierarchical.cpp:232] Check failed: scalars does 
> not contain 8f1f65e8-c38d-4563-bfba-eaa079271b2b-S732{code}
> See attached log files.
> FYI, Agent S732 had a network outage between 06:40 and 06:44:
> !image-2020-09-14-10-07-42-622.png|width=1545,height=435!
>  
> At the end of the outage, Mesos master has the following logs:
> {code:java}
> I0911 06:43:20.392347 46184 master.cpp:6513] Received reregister agent 
> message from agent 8f1f65e8-c38d-4563-bfba-eaa079271b2b-S732 at 
> slave(1)@172.17.50.35:5051 (dev-eu-w-03-sgma-1)
> W0911 06:43:20.421454 46191 master.cpp:10618] Possibly orphaned completed 
> task b92038e7-b42c-4e23-ae55-9be4325a4d32 of framework 
> d65e2494-c7c5-456b-aad6-fc44cadf2f50 that ran on agent 
> 8f1f65e8-c38d-4563-bfba-eaa079271b2b-S732 at slave(1)@172.17.50.35:5051 
> (dev-eu-w-03-sgma-1)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
