Hi All,

Regarding the monitoring services on a Ceph cluster (i.e. Prometheus, Grafana, Alertmanager, Loki, Node-Exporter, Promtail, etc.), how many instances should (or can) we run for fault-tolerance purposes? I can't recall seeing that advice anywhere in the doco (but of course, I probably missed it).

I'm concerned about HA for those services: will they keep running if the Ceph node they're on fails?

At the moment we're running only one instance of each service in the cluster, although several Ceph nodes are eligible to run each one; e.g. three hosts in the placement but only count: 1.

This is on the latest version of Reef using cephadm (if it makes a huge difference :-) ).

So any advice would be greatly appreciated, including whether we should be running any services not mentioned above (other than Mgr, Mon, OSD, or iSCSI, obviously :-) ).

Cheers

Dulux-Oz
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
