[
https://issues.apache.org/jira/browse/MESOS-10188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17195456#comment-17195456
]
Jerome Soussens commented on MESOS-10188:
-----------------------------------------
Hi,
There is no stack trace. Here is what I see in the system logs:
{code:java}
Sep 11 06:43:25 dev-eu-w-01-sgmm-0-0 systemd: mesos-master.service: main process exited, code=killed, status=6/ABRT
Sep 11 06:43:25 dev-eu-w-01-sgmm-0-0 systemd: Unit mesos-master.service entered failed state.
Sep 11 06:43:25 dev-eu-w-01-sgmm-0-0 systemd: mesos-master.service failed.
Sep 11 06:43:45 dev-eu-w-01-sgmm-0-0 systemd: mesos-master.service holdoff time over, scheduling restart.
Sep 11 06:43:45 dev-eu-w-01-sgmm-0-0 systemd: Stopped Mesos Master.
{code}
I'm continuing to investigate.
> Master check failure : scalars does not contain agent
> -----------------------------------------------------
>
> Key: MESOS-10188
> URL: https://issues.apache.org/jira/browse/MESOS-10188
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 1.10.0
> Reporter: Jerome Soussens
> Priority: Critical
> Attachments: image-2020-09-14-10-07-42-622.png,
> mesos-master.dev-eu-w-01-sgmm-0-0.root.log.ERROR.20200911-064325.46082-20200912.gz,
> mesos-master.dev-eu-w-01-sgmm-0-0.root.log.FATAL.20200911-064325.46082-20200912.gz,
> mesos-master.dev-eu-w-01-sgmm-0-0.root.log.INFO.20200910-200737.46082-20200911.gz,
> mesos-master.dev-eu-w-01-sgmm-0-0.root.log.WARNING.20200830-004426.46082-20200911.gz
>
>
> The Mesos master restarted with the following error message:
> {code:java}
> F0911 06:43:25.109040 46181 hierarchical.cpp:232] Check failed: scalars does not contain 8f1f65e8-c38d-4563-bfba-eaa079271b2b-S732
> {code}
> See attached log files.
> FYI, agent S732 had a network outage between 06:40 and 06:44:
> !image-2020-09-14-10-07-42-622.png|width=1545,height=435!
>
> At the end of the outage, the Mesos master logged the following:
> {code:java}
> I0911 06:43:20.392347 46184 master.cpp:6513] Received reregister agent message from agent 8f1f65e8-c38d-4563-bfba-eaa079271b2b-S732 at slave(1)@172.17.50.35:5051 (dev-eu-w-03-sgma-1)
> W0911 06:43:20.421454 46191 master.cpp:10618] Possibly orphaned completed task b92038e7-b42c-4e23-ae55-9be4325a4d32 of framework d65e2494-c7c5-456b-aad6-fc44cadf2f50 that ran on agent 8f1f65e8-c38d-4563-bfba-eaa079271b2b-S732 at slave(1)@172.17.50.35:5051 (dev-eu-w-03-sgma-1)
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)