[ceph-users] Re: "ceph orch restart mgr" command creates mgr restart loop

2021-07-16 Thread Jim Bartlett
Did you ever discover the reason for this restart loop?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: "ceph orch restart mgr" command creates mgr restart loop

2021-01-27 Thread Jens Hyllegaard (Soft Design A/S)
Hi Chris

Having also recently started exploring Ceph, I too happened upon this problem.
I found that terminating the command with ctrl-c seemed to stop the looping,
which, by the way, also happens on all other mgr instances in the cluster.
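
A quick way to confirm the loop has actually stopped after the ctrl-c (these are
the same standard commands used elsewhere in this thread, shown only as a sanity
check):

$ ceph -s | grep mgr        # the "active, since ..." age should keep growing again
$ ceph log last cephadm     # no new "Active manager daemon ... restarted" entries should appear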

Regards

Jens

-Original Message-
From: Chris Read  
Sent: 11 January 2021 21:54
To: ceph-users@ceph.io
Subject: [ceph-users] "ceph orch restart mgr" command creates mgr restart loop

Greetings all...

I'm busy testing out Ceph and have hit this troublesome bug while following the 
steps outlined here:

https://docs.ceph.com/en/octopus/cephadm/monitoring/#configuring-ssl-tls-for-grafana
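
(For reference, the steps on that page boil down to roughly the following; the
exact config-key names and the final command may differ between Ceph releases,
so double-check against the linked page rather than taking this verbatim:)

$ ceph config-key set mgr/cephadm/grafana_key -i $PWD/key.pem          # private key for Grafana
$ ceph config-key set mgr/cephadm/grafana_crt -i $PWD/certificate.pem  # matching certificate
$ ceph orch reconfig grafana                                           # push the new config out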

When I issue the "ceph orch restart mgr" command, it appears the command is not 
cleared from a message queue somewhere (I'm still very unclear on many Ceph 
specifics), so each time the mgr process comes back up from the restart it picks 
up the message again and restarts itself, forever (so far it has been stuck in 
this state for 45 minutes).
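
A couple of ways to watch the cycle from a second terminal, in case it helps
anyone reproduce this (all standard commands, listed purely for illustration):

$ ceph -W cephadm          # stream cephadm log messages as they arrive
$ ceph orch ps | grep mgr  # the mgr's uptime keeps resetting to a few seconds
$ ceph -s | grep mgr       # "active, since Ns" never climbs past a few seconds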

Watching the logs we see this going on:

root@ceph-poc-000:~# ceph log last cephadm -w
  cluster:
    id:     d23bc326-543a-11eb-bfe0-b324db228b6c
    health: HEALTH_OK

  services:
    mon: 5 daemons, quorum ceph-poc-000,ceph-poc-003,ceph-poc-004,ceph-poc-002,ceph-poc-001 (age 2h)
    mgr: ceph-poc-000.himivo(active, since 4s), standbys: ceph-poc-001.unjulx
    osd: 10 osds: 10 up (since 2h), 10 in (since 2h)

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   10 GiB used, 5.4 TiB / 5.5 TiB avail
    pgs:     1 active+clean


2021-01-11T20:46:32.976606+0000 mon.ceph-poc-000 [INF] Active manager daemon ceph-poc-000.himivo restarted
2021-01-11T20:46:32.980749+0000 mon.ceph-poc-000 [INF] Activating manager daemon ceph-poc-000.himivo
2021-01-11T20:46:33.061519+0000 mon.ceph-poc-000 [INF] Manager daemon ceph-poc-000.himivo is now available
2021-01-11T20:46:39.156420+0000 mon.ceph-poc-000 [INF] Active manager daemon ceph-poc-000.himivo restarted
2021-01-11T20:46:39.160618+0000 mon.ceph-poc-000 [INF] Activating manager daemon ceph-poc-000.himivo
2021-01-11T20:46:39.242603+0000 mon.ceph-poc-000 [INF] Manager daemon ceph-poc-000.himivo is now available
2021-01-11T20:46:45.299953+0000 mon.ceph-poc-000 [INF] Active manager daemon ceph-poc-000.himivo restarted
2021-01-11T20:46:45.304006+0000 mon.ceph-poc-000 [INF] Activating manager daemon ceph-poc-000.himivo
2021-01-11T20:46:45.733495+0000 mon.ceph-poc-000 [INF] Manager daemon ceph-poc-000.himivo is now available
2021-01-11T20:46:51.871903+0000 mon.ceph-poc-000 [INF] Active manager daemon ceph-poc-000.himivo restarted
2021-01-11T20:46:51.877107+0000 mon.ceph-poc-000 [INF] Activating manager daemon ceph-poc-000.himivo
2021-01-11T20:46:51.976190+0000 mon.ceph-poc-000 [INF] Manager daemon ceph-poc-000.himivo is now available
2021-01-11T20:46:58.000720+0000 mon.ceph-poc-000 [INF] Active manager daemon ceph-poc-000.himivo restarted
2021-01-11T20:46:58.006843+0000 mon.ceph-poc-000 [INF] Activating manager daemon ceph-poc-000.himivo
2021-01-11T20:46:58.097163+0000 mon.ceph-poc-000 [INF] Manager daemon ceph-poc-000.himivo is now available
2021-01-11T20:47:04.188630+0000 mon.ceph-poc-000 [INF] Active manager daemon ceph-poc-000.himivo restarted
2021-01-11T20:47:04.193501+0000 mon.ceph-poc-000 [INF] Activating manager daemon ceph-poc-000.himivo
2021-01-11T20:47:04.285509+0000 mon.ceph-poc-000 [INF] Manager daemon ceph-poc-000.himivo is now available
2021-01-11T20:47:10.348099+0000 mon.ceph-poc-000 [INF] Active manager daemon ceph-poc-000.himivo restarted
2021-01-11T20:47:10.352340+0000 mon.ceph-poc-000 [INF] Activating manager daemon ceph-poc-000.himivo
2021-01-11T20:47:10.752243+0000 mon.ceph-poc-000 [INF] Manager daemon ceph-poc-000.himivo is now available

And in the logs for the mgr instance itself we see it keep replaying the 
message over and over:

$ docker logs -f ceph-d23bc326-543a-11eb-bfe0-b324db228b6c-mgr.ceph-poc-000.himivo
debug 2021-01-11T20:47:31.390+0000 7f48b0d0d200  0 set uid:gid to 167:167 (ceph:ceph)
debug 2021-01-11T20:47:31.390+0000 7f48b0d0d200  0 ceph version 15.2.8 (bdf3eebcd22d7d0b3dd4d5501bee5bac354d5b55) octopus (stable), process ceph-mgr, pid 1
debug 2021-01-11T20:47:31.390+0000 7f48b0d0d200  0 pidfile_write: ignore empty --pid-file
debug 2021-01-11T20:47:31.414+0000 7f48b0d0d200  1 mgr[py] Loading python module 'alerts'
debug 2021-01-11T20:47:31.486+0000 7f48b0d0d200  1 mgr[py] Loading python module 'balancer'
debug 2021-01-11T20:47:31.542+0000 7f48b0d0d200  1 mgr[py] Loading python module 'cephadm'
debug 2021-01-11T20:47:31.742+0000 7f48b0d0d200  1 mgr[py] Loading python module 'crash'
debug 2021-01-11T20:47:31.798+0000 7f48b0d0d200  1 mgr[py] Loading python module 'dashboard'
debug 2021-01-11T20:47:32.258+0000 7f48b0d0d200  1 mgr[py] Loading python module 'devicehealth'
debug 2021-01-11T20:47:32.306+0000 7f48b0d0d200  1 mgr[py] Loading python module 'diskprediction_local'
debug