Hello,

We are running Mimic 13.2.8 on our cluster, and since upgrading to 13.2.8 the Prometheus plugin hangs frequently. It used to respond in under 10s, but now it often does not respond at all. Restarting the mgr processes helps temporarily, but within minutes it gets stuck again.
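For reference, here is a minimal sketch of how the response time can be measured. It assumes the prometheus module is serving on its default port 9283 under /metrics; `time_scrape` is just a hypothetical helper name, and the timeout value is arbitrary.

```python
import http.client
import time
import urllib.request
from typing import Optional


def time_scrape(url: str, timeout: float = 30.0) -> Optional[float]:
    """Return the seconds taken to fetch url, or None on error or timeout."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Read the whole body so a stall mid-response is also counted.
            resp.read()
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return None
    return time.monotonic() - start
```

Calling `time_scrape("http://woodenbox2:9283/metrics")` every few seconds against the active mgr (woodenbox2 here) should show whether scrapes are slow or never return.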
The active mgr doesn't exit on `systemctl stop ceph-mgr.target` and has to be killed with kill -9.

Is there anything I can do to address this issue, or at least get better visibility into it?

We only have a few plugins enabled:

    $ ceph mgr module ls
    {
        "enabled_modules": [
            "balancer",
            "prometheus",
            "zabbix"
        ],

We have 3 mgr processes, but it's a pretty large cluster (nearly 4000 OSDs) and a busy one, with lots of rebalancing. (I don't know whether a busy cluster would seriously affect the mgr's performance, but I'm mentioning it just in case.)

    services:
      mon: 5 daemons, quorum woodenbox0,woodenbox2,woodenbox4,woodenbox3,woodenbox1
      mgr: woodenbox2(active), standbys: woodenbox0, woodenbox1
      mds: cephfs-1/1/1 up {0=woodenbox6=up:active}, 1 up:standby-replay
      osd: 3964 osds: 3928 up, 3928 in; 831 remapped pgs
      rgw: 4 daemons active

Thanks in advance for your help,

-Paul Choi