Kiran J Shetty created MESOS-10198:
--------------------------------------

             Summary: Mesos-master service is in activating state
                 Key: MESOS-10198
                 URL: https://issues.apache.org/jira/browse/MESOS-10198
             Project: Mesos
          Issue Type: Task
    Affects Versions: 1.9.0
            Reporter: Kiran J Shetty


The mesos-master service is showing "activating" state on all 3 master nodes, which 
in turn is causing Marathon to restart frequently. In the logs I can see the entries below.


Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a864206a9 
mesos::internal::log::ReplicaProcess::ReplicaProcess()
Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a86420854 
mesos::internal::log::Replica::Replica()
Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a863b6a65 
mesos::internal::log::LogProcess::LogProcess()
Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a863b6e34 
mesos::log::Log::Log()
Nov 12 08:36:29 servername mesos-master[19867]: @ 0x561155a3ec72 main
Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a82075555 
__libc_start_main
Nov 12 08:36:29 servername mesos-master[19867]: @ 0x561155a40d0a (unknown)
Nov 12 08:36:29 servername systemd[1]: mesos-master.service: main process 
exited, code=killed, status=6/ABRT
Nov 12 08:36:29 servername systemd[1]: Unit mesos-master.service entered failed 
state.
Nov 12 08:36:29 servername systemd[1]: mesos-master.service failed.
Nov 12 08:36:49 servername systemd[1]: mesos-master.service holdoff time over, 
scheduling restart.
Nov 12 08:36:49 servername systemd[1]: Stopped Mesos Master.
Nov 12 08:36:49 servername systemd[1]: Started Mesos Master.
Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.633597 20024 
logging.cpp:201] INFO level logging started!
Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634446 20024 
main.cpp:243] Build: 2019-10-21 12:10:14 by centos
Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634460 20024 
main.cpp:244] Version: 1.9.0
Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634466 20024 
main.cpp:247] Git tag: 1.9.0
Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634470 20024 
main.cpp:251] Git SHA: 5e79a584e6ec3e9e2f96e8bf418411df9dafac2e
Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.636653 20024 
main.cpp:345] Using 'hierarchical' allocator
Nov 12 08:36:49 servername mesos-master[20037]: mesos-master: 
./db/skiplist.h:344: void leveldb::SkipList<Key, Comparator>::Insert(const 
Key&) [with Key = const char*; Comparator = leveldb::MemTable::KeyComparator]: 
Assertion `x == __null || !Equal(key, x->key)' failed.
Nov 12 08:36:49 servername mesos-master[20037]: *** Aborted at 1605150409 (unix 
time) try "date -d @1605150409" if you are using GNU date ***
Nov 12 08:36:49 servername mesos-master[20037]: PC: @ 0x7fdee16ed387 __GI_raise
Nov 12 08:36:49 servername mesos-master[20037]: *** SIGABRT (@0x4e38) received 
by PID 20024 (TID 0x7fdee720ea00) from PID 20024; stack trace: ***
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee1fb2630 (unknown)
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16ed387 __GI_raise
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16eea78 __GI_abort
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16e61a6 
__assert_fail_base
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16e6252 
__GI___assert_fail
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5cf3dc2 
leveldb::SkipList<>::Insert()
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5cf3735 
leveldb::MemTable::Add()
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5d00168 
leveldb::WriteBatch::Iterate()
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5d00424 
leveldb::WriteBatchInternal::InsertInto()
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5ce8575 
leveldb::DBImpl::RecoverLogFile()
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5cec0fc 
leveldb::DBImpl::Recover()
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5cec3fa 
leveldb::DB::Open()
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5a0f877 
mesos::internal::log::LevelDBStorage::restore()
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5a817a2 
mesos::internal::log::ReplicaProcess::restore()
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5a846a9 
mesos::internal::log::ReplicaProcess::ReplicaProcess()
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5a84854 
mesos::internal::log::Replica::Replica()
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5a1aa65 
mesos::internal::log::LogProcess::LogProcess()
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5a1ae34 
mesos::log::Log::Log()
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x559ab80e0c72 main
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16d9555 
__libc_start_main
Nov 12 08:36:49 servername mesos-master[20037]: @ 0x559ab80e2d0a (unknown)
Nov 12 08:36:49 servername systemd[1]: mesos-master.service: main process 
exited, code=killed, status=6/ABRT
Nov 12 08:36:49 servername systemd[1]: Unit mesos-master.service entered failed 
state.
Nov 12 08:36:49 servername systemd[1]: mesos-master.service failed.
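
For reference, the "activating" state reported above is the unit flapping through 
systemd auto-restart after each abort. A minimal way to confirm this and to locate 
the data the crashing code path is reading, assuming the stock mesos-master systemd 
unit name and the default work_dir of /var/lib/mesos (both assumptions, adjust for 
your setup):

# Unit alternates between "activating (auto-restart)" and "failed"
systemctl status mesos-master

# Crash/restart history for the unit around the time of the aborts
journalctl -u mesos-master --since "2020-11-12 08:30:00"

# The abort happens inside LevelDB recovery of the replicated log,
# which with default flags lives under the master work_dir
ls -l /var/lib/mesos/replicated_log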

Marathon logs:

-- Logs begin at Thu 2020-11-12 07:38:40 IST. --
Nov 12 08:09:44 servername marathon[25752]: * 
Actor[akka://marathon/user/reviveOffers#-1362265983] 
(mesosphere.marathon.core.leadership.impl.LeadershipCoordinatorActor:marathon-akka.actor.default-dispatcher-19)
Nov 12 08:09:44 servername marathon[25752]: [2020-11-12 08:09:44,103] ERROR 
Lost leadership; crashing 
(mesosphere.marathon.core.election.ElectionServiceImpl:marathon-akka.actor.default-dispatcher-25)
Nov 12 08:09:44 servername marathon[25752]: [2020-11-12 08:09:44,104] INFO 
ExpungeOverdueLostTasksActor has stopped 
(mesosphere.marathon.core.task.jobs.impl.ExpungeOverdueLostTasksActor:marathon-akka.actor.default-dispatcher-15)
Nov 12 08:09:44 servername marathon[25752]: [2020-11-12 08:09:44,112] INFO 
shutting down with exit code 103 
(mesosphere.marathon.core.base.RichRuntime:scala-execution-context-global-101)
Nov 12 08:09:44 servername marathon[25752]: [2020-11-12 08:09:44,117] INFO 
Suspending scheduler actor 
(mesosphere.marathon.MarathonSchedulerActor:marathon-akka.actor.default-dispatcher-20)
Nov 12 08:09:44 servername marathon[25752]: [2020-11-12 08:09:44,117] ERROR 
Unhandled message in suspend: class 
mesosphere.marathon.core.launchqueue.impl.RateLimiterActor$Unsubscribe$ 
(mesosphere.marathon.core.leadership.impl.WhenLeaderActor:marathon-akka.actor.default-dispatcher-21)
Nov 12 08:09:44 servername marathon[25752]: [2020-11-12 08:09:44,121] INFO Now 
standing by. Closing existing handles and rejecting new. 
(mesosphere.marathon.core.event.impl.stream.HttpEventStreamActor:marathon-akka.actor.default-dispatcher-6)
Nov 12 08:09:44 servername systemd[1]: marathon.service: main process exited, 
code=exited, status=103/n/a
Nov 12 08:09:44 servername systemd[1]: Unit marathon.service entered failed 
state.
Nov 12 08:09:44 servername systemd[1]: marathon.service failed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
