For now I've found that the mgr daemon works fine when I move it to an OSD node. All nodes run the same OS version, so I can conclude that the problem is limited to the nodes that normally run the mgr. I'm still investigating what's happening, but at least I got the monitoring back.
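For anyone wanting to try the same workaround: with cephadm, moving the mgr daemons is a matter of changing the orchestrator placement and failing over. A minimal sketch (the host names are placeholders for your own nodes, not from the cluster above):

```shell
# Pin the mgr service to specific hosts (here an OSD node); cephadm
# deploys/removes mgr daemons to match the requested placement.
ceph orch apply mgr --placement="ceph-osd01 ceph-osd02"

# Force a failover so a standby mgr on the new host becomes active.
ceph mgr fail
```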
Regards.

On Tue, Jun 4, 2024 at 4:01 PM Dario Graña <dgr...@pic.es> wrote:
> Hi all!
>
> I'm running Ceph Quincy 17.2.7 in a cluster. On Monday I updated the OS
> from AlmaLinux 9.3 to 9.4; since then Grafana shows a "No Data" message in
> all Ceph-related fields, but, for example, the node information is still
> fine (Host Details Dashboard).
> I have redeployed the mgr service with cephadm and disabled and re-enabled
> the mgr prometheus module, but nothing changed. Digging into the problem, I
> accessed the Prometheus interface and found this
> error: [image: Screen Shot 2024-06-04 at 15.22.37.png]
> When I access the node shown as down, it reports:
>
> 503 Service Unavailable
>
> No cached data available yet
>
> Traceback (most recent call last):
>   File "/lib/python3.6/site-packages/cherrypy/_cprequest.py", line 638, in respond
>     self._do_respond(path_info)
>   File "/lib/python3.6/site-packages/cherrypy/_cprequest.py", line 697, in _do_respond
>     response.body = self.handler()
>   File "/lib/python3.6/site-packages/cherrypy/lib/encoding.py", line 219, in __call__
>     self.body = self.oldhandler(*args, **kwargs)
>   File "/lib/python3.6/site-packages/cherrypy/_cpdispatch.py", line 54, in __call__
>     return self.callable(*self.args, **self.kwargs)
>   File "/usr/share/ceph/mgr/prometheus/module.py", line 1751, in metrics
>     return self._metrics(_global_instance)
>   File "/usr/share/ceph/mgr/prometheus/module.py", line 1762, in _metrics
>     raise cherrypy.HTTPError(503, 'No cached data available yet')
> cherrypy._cperror.HTTPError: (503, 'No cached data available yet')
>
> I checked the mgr prometheus address and port:
>
> [ceph: root@ceph-admin01 /]# ceph config get mgr mgr/prometheus/server_addr
> ::
> [ceph: root@ceph-admin01 /]# ceph config get mgr mgr/prometheus/server_port
> 9283
>
> It seems to be OK.
> When I check the active manager node for that port, I find:
>
> [root@ceph-hn01 ~]# netstat -natup | grep 9283
> tcp6  0  0 :::9283             :::*                 LISTEN       2453/ceph-mgr
> tcp6  0  0 192.168.97.51:9283  192.168.97.60:36130  ESTABLISHED  2453/ceph-mgr
>
> I don't understand why it shows as IPv6; the node doesn't have a dual
> stack.
>
> I also tried a newer version of the Prometheus container image, 1.6.0, but
> it kept reporting the same error, so I rolled back to the original one.
>
> Has anyone experienced an issue like this?
> Where can I look for more information about it?
>
> Thanks in advance.
>
> Regards.
> --
> Dario Graña
> PIC (Port d'Informació Científica)
> Campus UAB, Edificio D
> E-08193 Bellaterra, Barcelona
> http://www.pic.es
> Avis - Aviso - Legal Notice: http://legal.ifae.es
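On the tcp6 question: with `mgr/prometheus/server_addr` set to `::`, the mgr binds the IPv6 wildcard address, and on Linux such a socket normally also accepts plain IPv4 clients (they appear as v4-mapped `::ffff:x.x.x.x` addresses), so netstat labeling the listener `tcp6` is expected even without a dual-stack setup and is unlikely to be the cause of the 503s. A minimal sketch illustrating the behavior (local loopback only; the explicit `IPV6_V6ONLY` setting is mine, not something the mgr is confirmed to do):

```python
import socket

# Bind the IPv6 wildcard "::", as mgr/prometheus/server_addr does.
# With IPV6_V6ONLY off (the Linux default), this "tcp6" listener
# still accepts IPv4 clients as v4-mapped addresses.
srv = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
srv.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
srv.bind(("::", 0))  # wildcard address, ephemeral port
srv.listen(1)
port = srv.getsockname()[1]

# Connect over plain IPv4; it reaches the IPv6-wildcard listener.
cli = socket.create_connection(("127.0.0.1", port))
conn, peer = srv.accept()
print(peer[0])  # the IPv4 client shows up as "::ffff:127.0.0.1"
conn.close(); cli.close(); srv.close()
```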
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io