I am having some issues with my newly set up cluster. I was able to get all 32 of my OSDs to start after setting up udev rules for my journal partitions, but they keep going down. At first it seemed like half of them would stay up, but when I checked this morning only a quarter of them were up according to "ceph osd tree". The systemd units are running, so that doesn't seem to be the issue. I don't see anything glaring in the log files, though that may just reflect my experience level with Ceph.
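In case it helps, these are roughly the checks I've been running per OSD to confirm the daemons are up and to look for errors (osd.0 on the first host as an example; adjust the id for each OSD):

```shell
# Check whether the systemd unit for a given OSD is active
systemctl status ceph-osd@0

# Look at recent journal output for that unit
journalctl -u ceph-osd@0 --since "1 hour ago"

# Tail the OSD's own log file for anything glaring
tail -n 100 /var/log/ceph/ceph-osd.0.log
```

Nothing in there jumped out at me, but maybe I'm not looking for the right thing.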
I tried to look for errors and knock out any that seemed obvious, but I can't seem to get that done either. The cluster was initially set to 64 PGs and I tried to raise that to 1024, but it hasn't finished creating all of them and seems stuck with 270 stale+creating PGs. This is preventing me from updating pgp_num, since it says it is busy creating PGs. I am thinking that the downed OSDs are probably my problem as far as PG creation is concerned; I just can't seem to find the reason why they are going down. Could someone help shine some light on this for me?

[ceph@matm-cm1 ~]$ ceph status
    cluster 5a463eb9-b918-4d97-b853-7a5ebd3c0ac2
     health HEALTH_ERR
            1006 pgs are stuck inactive for more than 300 seconds
            1 pgs degraded
            140 pgs down
            736 pgs peering
            1024 pgs stale
            1006 pgs stuck inactive
            18 pgs stuck unclean
            1 pgs undersized
            pool rbd pg_num 1024 > pgp_num 64
     monmap e1: 3 mons at {matm-cm1=192.168.41.153:6789/0,matm-cm2=192.168.41.154:6789/0,matm-cm3=192.168.41.155:6789/0}
            election epoch 8, quorum 0,1,2 matm-cm1,matm-cm2,matm-cm3
     osdmap e417: 32 osds: 9 up, 9 in; 496 remapped pgs
            flags sortbitwise
      pgmap v1129: 1024 pgs, 1 pools, 0 bytes data, 0 objects
            413 MB used, 16753 GB / 16754 GB avail
                 564 stale+remapped+peering
                 270 stale+creating
                 125 stale+down+remapped+peering
                  32 stale+peering
                  17 stale+active+remapped
                  15 stale+down+peering
                   1 stale+active+undersized+degraded+remapped

[ceph@matm-cm1 ~]$ ceph osd tree
ID WEIGHT   TYPE NAME          UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 58.17578 root default
-2 14.54395     host matm-cs1
 0  1.81799         osd.0         down        0          1.00000
 1  1.81799         osd.1         down        0          1.00000
 2  1.81799         osd.2         down        0          1.00000
 3  1.81799         osd.3         down        0          1.00000
 4  1.81799         osd.4         down        0          1.00000
 5  1.81799         osd.5         down        0          1.00000
 6  1.81799         osd.6         down        0          1.00000
 7  1.81799         osd.7         down        0          1.00000
-3 14.54395     host matm-cs2
 8  1.81799         osd.8           up  1.00000          1.00000
 9  1.81799         osd.9           up  1.00000          1.00000
10  1.81799         osd.10          up  1.00000          1.00000
11  1.81799         osd.11          up  1.00000          1.00000
12  1.81799         osd.12          up  1.00000          1.00000
13  1.81799         osd.13          up  1.00000          1.00000
14  1.81799         osd.14          up  1.00000          1.00000
15  1.81799         osd.15          up  1.00000          1.00000
-4 14.54395     host matm-cs3
16  1.81799         osd.16        down        0          1.00000
17  1.81799         osd.17        down        0          1.00000
18  1.81799         osd.18        down        0          1.00000
19  1.81799         osd.19        down        0          1.00000
20  1.81799         osd.20        down        0          1.00000
21  1.81799         osd.21        down        0          1.00000
22  1.81799         osd.22        down        0          1.00000
23  1.81799         osd.23        down        0          1.00000
-5 14.54395     host matm-cs4
24  1.81799         osd.24        down        0          1.00000
31  1.81799         osd.31        down        0          1.00000
25  1.81799         osd.25        down        0          1.00000
27  1.81799         osd.27        down        0          1.00000
29  1.81799         osd.29        down        0          1.00000
28  1.81799         osd.28        down        0          1.00000
30  1.81799         osd.30          up  1.00000          1.00000
26  1.81799         osd.26        down        0          1.00000

*Nate Curry*
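P.S. For completeness, the PG count change was done with commands along these lines (pool name "rbd" as shown in the status output above):

```shell
# Raise the pg count on the rbd pool (this is what got stuck at 270 stale+creating)
ceph osd pool set rbd pg_num 1024

# This one is refused while the cluster reports it is still creating pgs
ceph osd pool set rbd pgp_num 1024
```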
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com