Kiran J Shetty created MESOS-10198:
--------------------------------------
Summary: Mesos-master service is stuck in activating state
Key: MESOS-10198
URL: https://issues.apache.org/jira/browse/MESOS-10198
Project: Mesos
Issue Type: Task
Affects Versions: 1.9.0
Reporter: Kiran J Shetty
The mesos-master service is showing the "activating" state on all 3 master nodes,
which in turn causes Marathon to restart frequently. In the logs I can see the entries below.
Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a864206a9
mesos::internal::log::ReplicaProcess::ReplicaProcess()
Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a86420854
mesos::internal::log::Replica::Replica()
Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a863b6a65
mesos::internal::log::LogProcess::LogProcess()
Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a863b6e34
mesos::log::Log::Log()
Nov 12 08:36:29 servername mesos-master[19867]: @ 0x561155a3ec72 main
Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a82075555
__libc_start_main
Nov 12 08:36:29 servername mesos-master[19867]: @ 0x561155a40d0a (unknown)
Nov 12 08:36:29 servername systemd[1]: mesos-master.service: main process
exited, code=killed, status=6/ABRT
Nov 12 08:36:29 servername systemd[1]: Unit mesos-master.service entered failed
state.
Nov 12 08:36:29 servername systemd[1]: mesos-master.service failed.
Nov 12 08:36:49 servername systemd[1]: mesos-master.service holdoff time over,
scheduling restart.
Nov 12 08:36:49 servername systemd[1]: Stopped Mesos Master.
Nov 12 08:36:49 servername systemd[1]: Started Mesos Master.
Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.633597 20024
logging.cpp:201] INFO level logging started!
Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634446 20024
main.cpp:243] Build: 2019-10-21 12:10:14 by centos
Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634460 20024
main.cpp:244] Version: 1.9.0
Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634466 20024
main.cpp:247] Git tag: 1.9.0
Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634470 20024
main.cpp:251] Git SHA: 5e79a584e6ec3e9e2f96e8bf418411df9dafac2e
Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.636653 20024
main.cpp:345] Using 'hierarchical' allocator
Nov 12 08:36:49 servername mesos-master[20037]: mesos-master:
./db/skiplist.h:344: void leveldb::SkipList<Key, Comparator>::Insert(const
Key&) [with Key = const char*; Comparator = leveldb::MemTable::KeyComparator]:
Assertion `x == __null || !Equal(key, x->key)' failed.
Nov 12 08:36:49 servername mesos-master[20037]: *** Aborted at 1605150409 (unix
time) try "date -d @1605150409" if you are using GNU date ***
Nov 12 08:36:49 servername mesos-master[20037]: PC: @ 0x7fdee16ed387 __GI_raise
Nov 12 08:36:49 servername mesos-master[20037]: *** SIGABRT (@0x4e38) received
by PID 20024 (TID 0x7fdee720ea00) from PID 20024; stack trace: ***
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee1fb2630 (unknown)
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16ed387 __GI_raise
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16eea78 __GI_abort
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16e61a6
__assert_fail_base
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16e6252
__GI___assert_fail
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5cf3dc2
leveldb::SkipList<>::Insert()
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5cf3735
leveldb::MemTable::Add()
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5d00168
leveldb::WriteBatch::Iterate()
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5d00424
leveldb::WriteBatchInternal::InsertInto()
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5ce8575
leveldb::DBImpl::RecoverLogFile()
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5cec0fc
leveldb::DBImpl::Recover()
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5cec3fa
leveldb::DB::Open()
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5a0f877
mesos::internal::log::LevelDBStorage::restore()
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5a817a2
mesos::internal::log::ReplicaProcess::restore()
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5a846a9
mesos::internal::log::ReplicaProcess::ReplicaProcess()
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5a84854
mesos::internal::log::Replica::Replica()
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5a1aa65
mesos::internal::log::LogProcess::LogProcess()
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5a1ae34
mesos::log::Log::Log()
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x559ab80e0c72 main
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16d9555
__libc_start_main
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x559ab80e2d0a (unknown)
Nov 12 08:36:49 servername systemd[1]: mesos-master.service: main process
exited, code=killed, status=6/ABRT
Nov 12 08:36:49 servername systemd[1]: Unit mesos-master.service entered failed
state.
Nov 12 08:36:49 servername systemd[1]: mesos-master.service failed.
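The assertion fires inside leveldb::DBImpl::RecoverLogFile() while the master is
restoring its replicated log, which points at a corrupted LevelDB store on disk
rather than a fault in the master logic itself. Below is a minimal sketch of an
offline repair attempt using LevelDB's RepairDB(); the path
/var/lib/mesos/replicated_log is an assumption (the store normally lives under
the master's --work_dir), so adjust it to the actual location, stop mesos-master
first, and back the directory up before running anything like this.

// repair_replicated_log.cpp -- hedged sketch, not an official Mesos tool.
// Attempts an offline repair of the master's replicated log LevelDB store,
// then checks that the repaired store can be opened.
#include <iostream>
#include <string>
#include <leveldb/db.h>

int main() {
  // Assumed location of the replicated log; depends on --work_dir.
  const std::string path = "/var/lib/mesos/replicated_log";

  leveldb::Options options;

  // Try to salvage as much of the store as possible; some records may be lost.
  leveldb::Status status = leveldb::RepairDB(path, options);
  if (!status.ok()) {
    std::cerr << "RepairDB failed: " << status.ToString() << std::endl;
    return 1;
  }

  // Verify the store opens cleanly after the repair.
  leveldb::DB* db = nullptr;
  status = leveldb::DB::Open(options, path, &db);
  std::cout << "Open after repair: " << status.ToString() << std::endl;
  delete db;
  return status.ok() ? 0 : 1;
}

On a cluster that still has a healthy quorum, the simpler alternative is usually
to move the corrupted replicated_log directory aside on the affected master and
let the replica re-sync from the other masters when the service restarts.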
Marathon logs:
-- Logs begin at Thu 2020-11-12 07:38:40 IST. --
Nov 12 08:09:44 servername marathon[25752]: *
Actor[akka://marathon/user/reviveOffers#-1362265983]
(mesosphere.marathon.core.leadership.impl.LeadershipCoordinatorActor:marathon-akka.actor.default-dispatcher-19)
Nov 12 08:09:44 servername marathon[25752]: [2020-11-12 08:09:44,103] ERROR
Lost leadership; crashing
(mesosphere.marathon.core.election.ElectionServiceImpl:marathon-akka.actor.default-dispatcher-25)
Nov 12 08:09:44 servername marathon[25752]: [2020-11-12 08:09:44,104] INFO
ExpungeOverdueLostTasksActor has stopped
(mesosphere.marathon.core.task.jobs.impl.ExpungeOverdueLostTasksActor:marathon-akka.actor.default-dispatcher-15)
Nov 12 08:09:44 servername marathon[25752]: [2020-11-12 08:09:44,112] INFO
shutting down with exit code 103
(mesosphere.marathon.core.base.RichRuntime:scala-execution-context-global-101)
Nov 12 08:09:44 servername marathon[25752]: [2020-11-12 08:09:44,117] INFO
Suspending scheduler actor
(mesosphere.marathon.MarathonSchedulerActor:marathon-akka.actor.default-dispatcher-20)
Nov 12 08:09:44 servername marathon[25752]: [2020-11-12 08:09:44,117] ERROR
Unhandled message in suspend: class
mesosphere.marathon.core.launchqueue.impl.RateLimiterActor$Unsubscribe$
(mesosphere.marathon.core.leadership.impl.WhenLeaderActor:marathon-akka.actor.default-dispatcher-21)
Nov 12 08:09:44 servername marathon[25752]: [2020-11-12 08:09:44,121] INFO Now
standing by. Closing existing handles and rejecting new.
(mesosphere.marathon.core.event.impl.stream.HttpEventStreamActor:marathon-akka.actor.default-dispatcher-6)
Nov 12 08:09:44 servername systemd[1]: marathon.service: main process exited,
code=exited, status=103/n/a
Nov 12 08:09:44 servername systemd[1]: Unit marathon.service entered failed
state.
Nov 12 08:09:44 servername systemd[1]: marathon.service failed.