[ 
https://issues.apache.org/jira/browse/MESOS-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420368#comment-17420368
 ] 

Charles Natali commented on MESOS-10198:
----------------------------------------

[~kiranjshetty]

I assume you've since moved on, so unless this ticket is updated soon, I will 
close it.
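For anyone hitting the same crash: the assertion in 
leveldb::DBImpl::RecoverLogFile in the trace below usually means the master's 
on-disk replicated log is corrupted. A possible remediation sketch (untested 
here; assumes the default work_dir of /var/lib/mesos and a systemd setup -- 
adjust paths to your deployment) is to stop the affected master and move the 
replica log aside so it re-syncs from the remaining quorum:

```shell
#!/bin/sh
# Hedged sketch: recover a master whose leveldb-backed replicated log is
# corrupted. WORK_DIR is an assumption (default Mesos work_dir); override it
# if your --work_dir flag points elsewhere.
WORK_DIR="${WORK_DIR:-/var/lib/mesos}"
REPLICA_LOG="$WORK_DIR/replicated_log"

# 1. Stop the master so the leveldb files are quiescent
#    (skipped gracefully where systemd is not available).
command -v systemctl >/dev/null 2>&1 && systemctl stop mesos-master || true

# 2. Move the suspect log aside rather than deleting it, keeping a backup
#    in case the data is needed for post-mortem analysis.
if [ -d "$REPLICA_LOG" ]; then
    mv "$REPLICA_LOG" "$REPLICA_LOG.corrupt.$(date +%s)"
fi

# 3. Restart; the replica rebuilds its state from the other masters' quorum.
command -v systemctl >/dev/null 2>&1 && systemctl start mesos-master || true
```

Note this only works while a quorum of healthy masters remains; do it one 
master at a time.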

Cheers,


> Mesos-master service is stuck in activating state
> --------------------------------------------------
>
>                 Key: MESOS-10198
>                 URL: https://issues.apache.org/jira/browse/MESOS-10198
>             Project: Mesos
>          Issue Type: Task
>    Affects Versions: 1.9.0
>            Reporter: Kiran J Shetty
>            Priority: Major
>
> The mesos-master service shows the activating state on all 3 master nodes, 
> which in turn causes Marathon to restart frequently. The logs contain the 
> entries below.
>  Mesos-master logs:
> Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a864206a9 
> mesos::internal::log::ReplicaProcess::ReplicaProcess()
>  Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a86420854 
> mesos::internal::log::Replica::Replica()
>  Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a863b6a65 
> mesos::internal::log::LogProcess::LogProcess()
>  Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a863b6e34 
> mesos::log::Log::Log()
>  Nov 12 08:36:29 servername mesos-master[19867]: @ 0x561155a3ec72 main
>  Nov 12 08:36:29 servername mesos-master[19867]: @ 0x7f1a82075555 
> __libc_start_main
>  Nov 12 08:36:29 servername mesos-master[19867]: @ 0x561155a40d0a (unknown)
>  Nov 12 08:36:29 servername systemd[1]: mesos-master.service: main process 
> exited, code=killed, status=6/ABRT
>  Nov 12 08:36:29 servername systemd[1]: Unit mesos-master.service entered 
> failed state.
>  Nov 12 08:36:29 servername systemd[1]: mesos-master.service failed.
>  Nov 12 08:36:49 servername systemd[1]: mesos-master.service holdoff time 
> over, scheduling restart.
>  Nov 12 08:36:49 servername systemd[1]: Stopped Mesos Master.
>  Nov 12 08:36:49 servername systemd[1]: Started Mesos Master.
>  Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.633597 20024 
> logging.cpp:201] INFO level logging started!
>  Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634446 20024 
> main.cpp:243] Build: 2019-10-21 12:10:14 by centos
>  Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634460 20024 
> main.cpp:244] Version: 1.9.0
>  Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634466 20024 
> main.cpp:247] Git tag: 1.9.0
>  Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.634470 20024 
> main.cpp:251] Git SHA: 5e79a584e6ec3e9e2f96e8bf418411df9dafac2e
>  Nov 12 08:36:49 servername mesos-master[20037]: I1112 08:36:49.636653 20024 
> main.cpp:345] Using 'hierarchical' allocator
>  Nov 12 08:36:49 servername mesos-master[20037]: mesos-master: 
> ./db/skiplist.h:344: void leveldb::SkipList<Key, Comparator>::Insert(const 
> Key&) [with Key = const char*; Comparator = 
> leveldb::MemTable::KeyComparator]: Assertion `x == __null || !Equal(key, 
> x->key)' failed.
>  Nov 12 08:36:49 servername mesos-master[20037]: *** Aborted at 1605150409 
> (unix time) try "date -d @1605150409" if you are using GNU date ***
>  Nov 12 08:36:49 servername mesos-master[20037]: PC: @ 0x7fdee16ed387 
> __GI_raise
>  Nov 12 08:36:49 servername mesos-master[20037]: *** SIGABRT (@0x4e38) 
> received by PID 20024 (TID 0x7fdee720ea00) from PID 20024; stack trace: ***
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee1fb2630 (unknown)
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16ed387 __GI_raise
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16eea78 __GI_abort
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16e61a6 
> __assert_fail_base
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16e6252 
> __GI___assert_fail
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5cf3dc2 
> leveldb::SkipList<>::Insert()
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5cf3735 
> leveldb::MemTable::Add()
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5d00168 
> leveldb::WriteBatch::Iterate()
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5d00424 
> leveldb::WriteBatchInternal::InsertInto()
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5ce8575 
> leveldb::DBImpl::RecoverLogFile()
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5cec0fc 
> leveldb::DBImpl::Recover()
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5cec3fa 
> leveldb::DB::Open()
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5a0f877 
> mesos::internal::log::LevelDBStorage::restore()
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5a817a2 
> mesos::internal::log::ReplicaProcess::restore()
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5a846a9 
> mesos::internal::log::ReplicaProcess::ReplicaProcess()
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5a84854 
> mesos::internal::log::Replica::Replica()
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5a1aa65 
> mesos::internal::log::LogProcess::LogProcess()
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee5a1ae34 
> mesos::log::Log::Log()
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x559ab80e0c72 main
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x7fdee16d9555 
> __libc_start_main
>  Nov 12 08:36:49 servername mesos-master[20037]: @ 0x559ab80e2d0a (unknown)
>  Nov 12 08:36:49 servername systemd[1]: mesos-master.service: main process 
> exited, code=killed, status=6/ABRT
>  Nov 12 08:36:49 servername systemd[1]: Unit mesos-master.service entered 
> failed state.
>  Nov 12 08:36:49 servername systemd[1]: mesos-master.service failed.
>  
>  
> Marathon logs:
> Nov 12 08:09:44 servername marathon[25752]: * 
> Actor[akka://marathon/user/reviveOffers#-1362265983|#-1362265983] 
> (mesosphere.marathon.core.leadership.impl.LeadershipCoordinatorActor:marathon-akka.actor.default-dispatcher-19)
>  Nov 12 08:09:44 servername marathon[25752]: [2020-11-12 08:09:44,103] ERROR 
> Lost leadership; crashing 
> (mesosphere.marathon.core.election.ElectionServiceImpl:marathon-akka.actor.default-dispatcher-25)
>  Nov 12 08:09:44 servername marathon[25752]: [2020-11-12 08:09:44,104] INFO 
> ExpungeOverdueLostTasksActor has stopped 
> (mesosphere.marathon.core.task.jobs.impl.ExpungeOverdueLostTasksActor:marathon-akka.actor.default-dispatcher-15)
>  Nov 12 08:09:44 servername marathon[25752]: [2020-11-12 08:09:44,112] INFO 
> shutting down with exit code 103 
> (mesosphere.marathon.core.base.RichRuntime:scala-execution-context-global-101)
>  Nov 12 08:09:44 servername marathon[25752]: [2020-11-12 08:09:44,117] INFO 
> Suspending scheduler actor 
> (mesosphere.marathon.MarathonSchedulerActor:marathon-akka.actor.default-dispatcher-20)
>  Nov 12 08:09:44 servername marathon[25752]: [2020-11-12 08:09:44,117] ERROR 
> Unhandled message in suspend: class 
> mesosphere.marathon.core.launchqueue.impl.RateLimiterActor$Unsubscribe$ 
> (mesosphere.marathon.core.leadership.impl.WhenLeaderActor:marathon-akka.actor.default-dispatcher-21)
>  Nov 12 08:09:44 servername marathon[25752]: [2020-11-12 08:09:44,121] INFO 
> Now standing by. Closing existing handles and rejecting new. 
> (mesosphere.marathon.core.event.impl.stream.HttpEventStreamActor:marathon-akka.actor.default-dispatcher-6)
>  Nov 12 08:09:44 servername systemd[1]: marathon.service: main process 
> exited, code=exited, status=103/n/a
>  Nov 12 08:09:44 servername systemd[1]: Unit marathon.service entered failed 
> state.
>  Nov 12 08:09:44 servername systemd[1]: marathon.service failed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
