--- Begin Message ---
Hi Nada,
On 27/10/21 at 20:41, nada wrote:
Are all your Ceph nodes time-synced?
Please check chrony and the clock drift:
# ceph time-sync-status
Yes, it seems they are:
# ceph time-sync-status
{
    "time_skew_status": {
        "amaiur": {
            "skew": 0,
            "latency": 0,
            "health": "HEALTH_OK"
        },
        "2": {
            "skew": -0.0052040435139160159,
            "latency": 0.00021316702537780806,
            "health": "HEALTH_OK"
        },
        "3": {
            "skew": -0.0077342363594970704,
            "latency": 0.00020703031996249557,
            "health": "HEALTH_OK"
        }
    },
    "timechecks": {
        "epoch": 4648,
        "round": 1730,
        "round_status": "finished"
    }
}
If this were the issue, I'd expect all OSDs on that node to crash (not
just the one, as happened)?
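For reference, the monitors only flag skew once it exceeds mon_clock_drift_allowed (0.05 s by default), so the values above are comfortably within bounds. As a cross-check of the monitor view, chrony can be queried directly on each node (a sketch, assuming chrony is the NTP daemon as suggested above; run on every monitor/OSD host):

```shell
# Cross-check NTP state directly on each node:
chronyc tracking     # system clock offset, stratum, reference source
chronyc sources -v   # configured NTP sources and their reachability
timedatectl          # sanity check: "System clock synchronized: yes"
```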
If you have a new spare disk and a free slot, try to add a new OSD and
stabilize the Ceph cluster.
If you do not have a free slot (and you are sure the disk failed),
you have to replace it:
https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#replacing-an-osd
Before any changes, identify the disk and the OSD status.
List the disk bays on the server:
# pvs
# ceph osd tree
# ceph device ls
# ceph-volume lvm list > osd-lvm-list-202110XX
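A hedged sketch tying the identification commands above to the replace procedure in the linked docs; here osd.5, /dev/sdX and /dev/sdY are placeholders for the actual OSD id and devices on your cluster:

```shell
# 1) Map the failed OSD to its physical device (osd.5 is a placeholder):
ceph osd tree | grep -w down       # which OSD id is down
ceph device ls                     # serial -> host:device -> daemon mapping
smartctl -a /dev/sdX               # SMART health of the suspect disk

# 2) Replace it while keeping the same OSD id (per the docs link above):
ceph osd out osd.5                            # if not already marked out
systemctl stop ceph-osd@5                     # run on the OSD's host
ceph osd destroy 5 --yes-i-really-mean-it     # frees the id for reuse
ceph-volume lvm zap /dev/sdX --destroy        # wipe the old disk, if reusing the slot
ceph-volume lvm create --osd-id 5 --data /dev/sdY   # provision the new disk
```

Keeping the same OSD id avoids unnecessary CRUSH map churn compared with removing the OSD entirely and adding a fresh one.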
BTW, we started with our Ceph cluster here in June, so sorry, I am a
beginner with Ceph.
Thanks for your comments. Ceph (systemd) automatically restarted the
crashed OSD; it is working and the cluster is healthy, so there are no
pressing worries here. I'm just trying to understand what happened and
see whether there's something we can do so that crash doesn't happen
again :)
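To dig into why the single OSD crashed, Ceph's crash module keeps a report per incident; a short sketch (the crash id and OSD id below are placeholders):

```shell
ceph crash ls                   # list recent daemon crash reports
ceph crash info <crash-id>      # backtrace and metadata for one report
ceph crash archive <crash-id>   # acknowledge the report once reviewed
# The OSD's own log usually contains the assert/backtrace as well:
journalctl -u ceph-osd@<id> --since "2 days ago" | less
```

The backtrace in `ceph crash info` is often enough to match the crash against a known tracker issue for your Ceph release.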
Cheers
Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project
Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/
--- End Message ---
_______________________________________________
pve-user mailing list
[email protected]
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user