Thank you for your time.

Dimitar Boichev <Dimitar.Boichev@...> writes:

> I am sure that I speak for the majority of people reading this, when I say
> that I didn't get anything from your emails.
> Could you provide more debug information?
> Like (but not limited to):
> ceph -s
> ceph health detail
> ceph osd tree

In fact, I asked what I need to provide because, honestly, I do not know.

Here is ceph -s (ceph health detail printed the same summary):

    cluster ac7bc476-3a02-453d-8e5c-606ab6f022ca
     health HEALTH_WARN
            4 pgs incomplete
            4 pgs stuck inactive
            4 pgs stuck unclean
     monmap e8: 3 mons at {0=10.1.0.12:6789/0,1=10.1.0.14:6789/0,2=10.1.0.17:6789/0}
            election epoch 832, quorum 0,1,2 0,1,2
     osdmap e2400: 3 osds: 3 up, 3 in
      pgmap v5883297: 288 pgs, 4 pools, 391 GB data, 100 kobjects
            1090 GB used, 4481 GB / 5571 GB avail
                 284 active+clean
                   4 incomplete

And ceph osd tree:

    ID WEIGHT  TYPE NAME              UP/DOWN REWEIGHT PRIMARY-AFFINITY
    -1 5.42999 root default
    -2 1.81000     host proxmox-quad3
     0 1.81000         osd.0               up  1.00000          1.00000
    -3 1.81000     host proxmox-zotac
     1 1.81000         osd.1               up  1.00000          1.00000
    -4 1.81000     host proxmox-hp
     3 1.81000         osd.3               up  1.00000          1.00000
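If it helps, I can also pull more detail on the four incomplete PGs. From
what I understand of the docs, something like the following should list them
and show why they are stuck (the PG id in the last command is only a
placeholder, not one from my cluster):

    # list the PGs that are stuck
    ceph pg dump_stuck inactive
    ceph pg dump_stuck unclean

    # ask one of the reported PGs why it is incomplete
    # (replace 2.7f with a real PG id from the output above)
    ceph pg 2.7f query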
> I am really having a bad time trying to decode the exact problems.
> First you had network issues, then an osd failed (at the same time or
> after?), then the cluster did not have enough free space to recover, I
> suppose?

It is a three-server test/evaluation system with one OSD per server, running
Ceph under Proxmox PVE. The load is very light and there is a lot of free
space. So:

- I NEVER had network issues. People TOLD me that I must have network
  problems. I changed cables and switches just in case, but nothing improved.
- One disk had bad sectors, so I added another disk/OSD and then removed the
  failing OSD, following the official documentation (see the P.S. below).
  After that the cluster ran fine for two months, so there was enough free
  space and the cluster did recover.
- Then one day I discovered that the Proxmox backup had hung, and I saw that
  it was because Ceph was not responding.

> Regarding the slow SSD disks, what disks are you using ?

I said SSHD, that is, a standard HDD with an SSD cache. It spins at 7200 rpm,
but in benchmarks it performs better than a 10000 rpm disk.

Thanks again,
Mario
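P.S. For reference, the OSD replacement mentioned above followed the standard
documented procedure. Reconstructing it from memory, it was roughly the
following (the tree above shows osd.0, osd.1 and osd.3, so the removed OSD
was presumably osd.2; take this as a sketch, not an exact transcript):

    ceph osd out 2               # mark the OSD out so data migrates off it
    # wait until the cluster is back to active+clean, then stop the OSD daemon
    ceph osd crush remove osd.2  # remove it from the CRUSH map
    ceph auth del osd.2          # delete its authentication key
    ceph osd rm 2                # remove the OSD from the cluster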