[
https://issues.apache.org/jira/browse/MESOS-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420368#comment-17420368
]
Charles Natali commented on MESOS-10198:
----------------------------------------
[~kiranjshetty]
I assume you've since moved on, so unless there is an update to this ticket
soon, I will close.
Cheers,
> Mesos-master service is stuck in activating state
> ----------------------------------------
>
> Key: MESOS-10198
> URL: https://issues.apache.org/jira/browse/MESOS-10198
> Project: Mesos
> Issue Type: Task
> Affects Versions: 1.9.0
> Reporter: Kiran J Shetty
> Priority: Major
>
> The mesos-master service is showing an activating state on all 3 master nodes,
> which in turn makes Marathon restart frequently. In the logs I can see the entries below.
> Mesos-master logs:
> Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a864206a9
> mesos::internal::log::ReplicaProcess::ReplicaProcess()
> Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a86420854
> mesos::internal::log::Replica::Replica()
> Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a863b6a65
> mesos::internal::log::LogProcess::LogProcess()
> Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a863b6e34
> mesos::log::Log::Log()
> Nov 12 08:36:29 servername mesos-master[19867]: @ 0x561155a3ec72 main
> Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a82075555
> __libc_start_main
> Nov 12 08:36:29 servername mesos-master[19867]: @ 0x561155a40d0a (unknown)
> Nov 12 08:36:29 servername systemd[1]: mesos-master.service: main process
> exited, code=killed, status=6/ABRT
> Nov 12 08:36:29 servername systemd[1]: Unit mesos-master.service entered
> failed state.
> Nov 12 08:36:29 servername systemd[1]: mesos-master.service failed.
> Nov 12 08:36:49 servername systemd[1]: mesos-master.service holdoff time
> over, scheduling restart.
> Nov 12 08:36:49 servername systemd[1]: Stopped Mesos Master.
> Nov 12 08:36:49 servername systemd[1]: Started Mesos Master.
> Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.633597 20024
> logging.cpp:201] INFO level logging started!
> Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634446 20024
> main.cpp:243] Build: 2019-10-21 12:10:14 by centos
> Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634460 20024
> main.cpp:244] Version: 1.9.0
> Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634466 20024
> main.cpp:247] Git tag: 1.9.0
> Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634470 20024
> main.cpp:251] Git SHA: 5e79a584e6ec3e9e2f96e8bf418411df9dafac2e
> Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.636653 20024
> main.cpp:345] Using 'hierarchical' allocator
> Nov 12 08:36:49 servername mesos-master[20037]: mesos-master:
> ./db/skiplist.h:344: void leveldb::SkipList<Key, Comparator>::Insert(const
> Key&) [with Key = const char*; Comparator =
> leveldb::MemTable::KeyComparator]: Assertion `x == __null || !Equal(key,
> x->key)' failed.
> Nov 12 08:36:49 servername mesos-master[20037]: *** Aborted at 1605150409
> (unix time) try "date -d @1605150409" if you are using GNU date ***
> Nov 12 08:36:49 servername mesos-master[20037]: PC: @ 0x7fdee16ed387
> __GI_raise
> Nov 12 08:36:49 servername mesos-master[20037]: *** SIGABRT (@0x4e38)
> received by PID 20024 (TID 0x7fdee720ea00) from PID 20024; stack trace: ***
> Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee1fb2630 (unknown)
> Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16ed387 __GI_raise
> Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16eea78 __GI_abort
> Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16e61a6
> __assert_fail_base
> Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16e6252
> __GI___assert_fail
> Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5cf3dc2
> leveldb::SkipList<>::Insert()
> Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5cf3735
> leveldb::MemTable::Add()
> Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5d00168
> leveldb::WriteBatch::Iterate()
> Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5d00424
> leveldb::WriteBatchInternal::InsertInto()
> Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5ce8575
> leveldb::DBImpl::RecoverLogFile()
> Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5cec0fc
> leveldb::DBImpl::Recover()
> Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5cec3fa
> leveldb::DB::Open()
> Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5a0f877
> mesos::internal::log::LevelDBStorage::restore()
> Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5a817a2
> mesos::internal::log::ReplicaProcess::restore()
> Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5a846a9
> mesos::internal::log::ReplicaProcess::ReplicaProcess()
> Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5a84854
> mesos::internal::log::Replica::Replica()
> Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5a1aa65
> mesos::internal::log::LogProcess::LogProcess()
> Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5a1ae34
> mesos::log::Log::Log()
> Nov 12 08:36:49 servername mesos-master[20037]: @ 0x559ab80e0c72 main
> Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16d9555
> __libc_start_main
> Nov 12 08:36:49 servername mesos-master[20037]: @ 0x559ab80e2d0a (unknown)
> Nov 12 08:36:49 servername systemd[1]: mesos-master.service: main process
> exited, code=killed, status=6/ABRT
> Nov 12 08:36:49 servername systemd[1]: Unit mesos-master.service entered
> failed state.
> Nov 12 08:36:49 servername systemd[1]: mesos-master.service failed.
>
>
> Marathon logs:
> Nov 12 08:09:44 servername marathon[25752]: *
> Actor[akka://marathon/user/reviveOffers#-1362265983]
> (mesosphere.marathon.core.leadership.impl.LeadershipCoordinatorActor:marathon-akka.actor.default-dispatcher-19)
> Nov 12 08:09:44 servername marathon[25752]: [2020-11-12 08:09:44,103] ERROR
> Lost leadership; crashing
> (mesosphere.marathon.core.election.ElectionServiceImpl:marathon-akka.actor.default-dispatcher-25)
> Nov 12 08:09:44 servername marathon[25752]: [2020-11-12 08:09:44,104] INFO
> ExpungeOverdueLostTasksActor has stopped
> (mesosphere.marathon.core.task.jobs.impl.ExpungeOverdueLostTasksActor:marathon-akka.actor.default-dispatcher-15)
> Nov 12 08:09:44 servername marathon[25752]: [2020-11-12 08:09:44,112] INFO
> shutting down with exit code 103
> (mesosphere.marathon.core.base.RichRuntime:scala-execution-context-global-101)
> Nov 12 08:09:44 servername marathon[25752]: [2020-11-12 08:09:44,117] INFO
> Suspending scheduler actor
> (mesosphere.marathon.MarathonSchedulerActor:marathon-akka.actor.default-dispatcher-20)
> Nov 12 08:09:44 servername marathon[25752]: [2020-11-12 08:09:44,117] ERROR
> Unhandled message in suspend: class
> mesosphere.marathon.core.launchqueue.impl.RateLimiterActor$Unsubscribe$
> (mesosphere.marathon.core.leadership.impl.WhenLeaderActor:marathon-akka.actor.default-dispatcher-21)
> Nov 12 08:09:44 servername marathon[25752]: [2020-11-12 08:09:44,121] INFO
> Now standing by. Closing existing handles and rejecting new.
> (mesosphere.marathon.core.event.impl.stream.HttpEventStreamActor:marathon-akka.actor.default-dispatcher-6)
> Nov 12 08:09:44 servername systemd[1]: marathon.service: main process
> exited, code=exited, status=103/n/a
> Nov 12 08:09:44 servername systemd[1]: Unit marathon.service entered failed
> state.
> Nov 12 08:09:44 servername systemd[1]: marathon.service failed.
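For anyone hitting the same crash loop: the leveldb assertion above
(`SkipList::Insert` failing on a duplicate key inside `DBImpl::RecoverLogFile`)
fires while the master restores its replicated log, which suggests the
on-disk leveldb store under the master's `--work_dir` is corrupted. A common
recovery is to move the broken replica's log aside and let the master
re-replicate from its peers — this is only safe while a quorum of healthy
masters remains. A minimal sketch, assuming the default work dir
`/var/lib/mesos` (check your master's `--work_dir` flag):

```shell
# Sketch: set aside a corrupted replicated log so the master can
# rebuild it from the other replicas on restart. The work-dir path
# is an assumption; verify it against your mesos-master flags.
backup_replog() {
    # $1: the mesos-master work dir (e.g. /var/lib/mesos)
    replog="$1/replicated_log"
    if [ -d "$replog" ]; then
        # Keep the corrupt copy for later analysis instead of deleting it.
        mv "$replog" "$replog.corrupt.$(date +%s)"
    fi
}

# Usage on the broken master (then restart the service):
#   backup_replog /var/lib/mesos
#   systemctl restart mesos-master
```

After the restart the master joins as an empty replica and catches up from
the surviving quorum; if more than one master's log is corrupt, quorum may
be lost and this approach does not apply.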
--
This message was sent by Atlassian Jira
(v8.3.4#803005)