On Thu, Sep 13, 2018 at 7:55 PM Christian Albrecht <c...@albix.de> wrote:
>
> Hi all,
>
> after upgrading from 12.2.7 to 12.2.8 the standby mgr instances in my cluster 
> stopped sending beacons.
> The service starts and everything seems to work fine, but after a while 
> the standby mgr disappears from the list of standbys.
> All three of my mgr daemons are running.
>
>
> [root@ceph01 ~]# ceph mgr dump
> {
>     "epoch": 69,
>     "active_gid": 1734502,
>     "active_name": "ceph01",
>     "active_addr": "192.0.2.1:6832/35134",
>     "available": true,
>     "standbys": [
>         {
>             "gid": 1734790,
>             "name": "ceph02",
>             "available_modules": [
>                 "balancer",
>                 "dashboard",
>                 "influx",
>                 "localpool",
>                 "prometheus",
>                 "restful",
>                 "selftest",
>                 "status",
>                 "zabbix"
>             ]
>         }
>     ],
> }
>
> Some time later I get this:
>
> [root@ceph01 ~]# ceph mgr dump
> {
>     "epoch": 69,
>     "active_gid": 1734502,
>     "active_name": "ceph01",
>     "active_addr": "192.0.2.1:6832/35134",
>     "available": true,
>     "standbys": [],
>     ...
> }
>
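
To track when a standby drops out, the dump can be checked from a script instead of by eye. This is only a minimal sketch: `standby_names` is a hypothetical helper, and it assumes the dump is available as complete JSON text (for example from `ceph mgr dump --format json`), not the abbreviated form quoted above.

```python
import json


def standby_names(mgr_dump_text):
    """Return the names of the standby mgr daemons listed in a mgr dump.

    mgr_dump_text is assumed to be the full JSON document printed by
    `ceph mgr dump`; an absent "standbys" key is treated as an empty list.
    """
    dump = json.loads(mgr_dump_text)
    return [standby["name"] for standby in dump.get("standbys", [])]


# Sample input trimmed down from the first dump quoted in this thread.
sample = (
    '{"epoch": 69, "active_name": "ceph01", '
    '"standbys": [{"gid": 1734790, "name": "ceph02"}]}'
)
print(standby_names(sample))
```

Running this periodically and logging the result with a timestamp would show how long after a restart each standby disappears.
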
> Here is the log. Before the upgrade the mgr posted a standby beacon every 2 
> seconds. After the upgrade the message is shown only once, on startup.
>
> 2018-09-13 10:10:19.191188 7f674b736700  1 mgr send_beacon standby
> 2018-09-13 10:10:21.191565 7f674b736700  1 mgr send_beacon standby
> 2018-09-13 10:10:23.191952 7f674b736700  1 mgr send_beacon standby
> 2018-09-13 10:10:25.192320 7f674b736700  1 mgr send_beacon standby
> 2018-09-13 10:10:27.192695 7f674b736700  1 mgr send_beacon standby
> 2018-09-13 10:10:29.193071 7f674b736700  1 mgr send_beacon standby
> 2018-09-13 10:10:29.434679 7f674af35700 -1 received  signal: Terminated from  PID: 1 task name: /usr/lib/systemd/systemd --system --deserialize 22  UID: 0
> 2018-09-13 10:10:29.434692 7f674af35700 -1 mgr handle_signal *** Got signal Terminated ***
> 2018-09-13 10:13:12.390340 7f7c56fcb7c0  0 set uid:gid to 167:167 (ceph:ceph)
> 2018-09-13 10:13:12.390377 7f7c56fcb7c0  0 ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable), process ceph-mgr, pid 2368
> 2018-09-13 10:13:12.396043 7f7c56fcb7c0  0 pidfile_write: ignore empty --pid-file
> 2018-09-13 10:13:12.448859 7f7c56fcb7c0  1 mgr send_beacon standby
> 2018-09-13 10:13:12.466768 7f7c4e11e700  1 mgr init Loading python module 'balancer'
> 2018-09-13 10:13:12.506115 7f7c4e11e700  1 mgr init Loading python module 'dashboard'
> 2018-09-13 10:13:12.701486 7f7c4e11e700  1 mgr init Loading python module 'prometheus'
> 2018-09-13 10:13:12.772187 7f7c4e11e700  1 mgr init Loading python module 'restful'
> 2018-09-13 10:13:13.282123 7f7c4e11e700  1 mgr init Loading python module 'status'
> 2018-09-13 10:13:13.390284 7f7c4e11e700  1 mgr load Constructed class from module: dashboard
>
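
For a longer log it may be easier to find the point where the beacons stop with a small script. The following is a sketch, not a tested tool: `beacon_gaps` and `parse_stamp` are hypothetical helpers, the 5-second threshold is an arbitrary choice, and the timestamp layout is assumed to match the lines quoted above (unwrapped, without the leading "> ").

```python
from datetime import datetime, timedelta

STAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_stamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.ffffff' stamp, or return None."""
    try:
        return datetime.strptime(line[:26], STAMP_FORMAT)
    except ValueError:
        return None  # wrapped continuation line or other non-log text


def beacon_gaps(log_lines, max_gap=timedelta(seconds=5)):
    """Return (start, end) pairs where no standby beacon was logged for
    longer than max_gap. The last timestamp in the log closes the final
    interval, so a mgr that goes silent after its last beacon is reported."""
    beacons = []
    last_seen = None
    for line in log_lines:
        stamp = parse_stamp(line)
        if stamp is None:
            continue
        last_seen = stamp
        if "mgr send_beacon standby" in line:
            beacons.append(stamp)
    points = list(beacons)
    if beacons and last_seen > beacons[-1]:
        points.append(last_seen)
    return [(a, b) for a, b in zip(points, points[1:]) if b - a > max_gap]


# Sample lines taken from the log quoted in this thread.
sample_log = [
    "2018-09-13 10:10:27.192695 7f674b736700  1 mgr send_beacon standby",
    "2018-09-13 10:10:29.193071 7f674b736700  1 mgr send_beacon standby",
    "2018-09-13 10:13:12.448859 7f7c56fcb7c0  1 mgr send_beacon standby",
    "2018-09-13 10:13:13.390284 7f7c4e11e700  1 mgr load Constructed class from module: dashboard",
]
for start, end in beacon_gaps(sample_log):
    print("no standby beacon between", start, "and", end)
```

On the excerpt above this only flags the restart window, because the quoted log ends about one second after the single post-upgrade beacon. Run against the full log file, the trailing interval should show how long the daemon has been silent.
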
>
> Let me know if I need to provide more information on this.

There was very little change in ceph-mgr between 12.2.7 and 12.2.8, so
this is strange.

You could try:
 - setting "debug mgr = 20" on a daemon that's exhibiting this behaviour
 - if the ceph-mgr process is still running, use gdb's "attach"
   command and then do a "thread apply all bt" to see where all the
   threads are, in case we can see where something is stuck (might need
   to install the debug package to get symbols for meaningful backtraces)
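
If it is more convenient to capture the backtraces non-interactively, the gdb step can be wrapped in a script. This is a rough sketch under assumptions: `gdb_backtrace_command` and `collect_backtraces` are hypothetical helper names, gdb must be installed and run with enough privileges to attach to the process, and the PID has to be looked up separately (2368 below is simply the ceph-mgr pid from the quoted log).

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb command line that attaches to pid, prints a backtrace
    for every thread, and exits (-batch makes gdb quit when it is done)."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def collect_backtraces(pid):
    """Run gdb against pid and return its combined stdout/stderr as text."""
    result = subprocess.run(
        gdb_backtrace_command(pid),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        check=False,  # return whatever gdb printed even if it exits non-zero
    )
    return result.stdout


# Show the command that would be run for the ceph-mgr pid from the log above.
print(" ".join(gdb_backtrace_command(2368)))
```

Calling `collect_backtraces(pid)` on the affected host and attaching the output to a reply would give the same information as the interactive session. Without the debug package the frames will mostly be unresolved addresses.
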

John

>
> Thank you for your help.
>
> Best regards,
> Christian
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com