Check your firewall rules.
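OSDs flapping like this on a freshly provisioned cluster is very often just blocked ports: the monitors listen on 6789/tcp and the OSD daemons bind in the 6800-7300/tcp range, and every cluster node needs to reach every other node on those ports. A rough sketch of what to check with firewalld (assuming the hosts run firewalld, as on a default CentOS 7 install; the port range is the Ceph default):

    firewall-cmd --list-all                             # see what is currently open
    firewall-cmd --permanent --add-port=6789/tcp        # monitor port (on the mon hosts)
    firewall-cmd --permanent --add-port=6800-7300/tcp   # OSD port range (on the OSD hosts)
    firewall-cmd --reload

If firewalld is not in use, "iptables -L -n" on each host will show whether something else is dropping the traffic between nodes.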
On Fri, Apr 1, 2016 at 10:28 AM, Nate Curry <cu...@mosaicatm.com> wrote:

> I am having some issues with my newly set up cluster. I am able to get all
> of my 32 OSDs to start after setting up udev rules for my journal
> partitions, but they keep going down. It did seem like half of them would
> stay up at first, but when I checked this morning I found only 1/4 of them
> were up when I ran "ceph osd tree". The systemd scripts are running, so it
> doesn't seem like that is the issue. I don't see anything glaring in the
> log files, which may just reflect my experience level with Ceph.
>
> I tried to look for errors and knock out any that seemed obvious, but I
> can't seem to get that done either. The cluster was initially set to 64 PGs
> and I tried to update that to 1024, but it hasn't finished creating all of
> them and it seems stuck with 270 stale+creating pgs. This is preventing me
> from updating the number of pgps as it says it is busy creating pgs.
>
> I am thinking that the downed OSDs are probably my problem as far as the
> pgs getting created are concerned. I just can't seem to find the reason
> why they are going down. Could someone help shine some light on this for me?
>
> [ceph@matm-cm1 ~]$ ceph status
>     cluster 5a463eb9-b918-4d97-b853-7a5ebd3c0ac2
>      health HEALTH_ERR
>             1006 pgs are stuck inactive for more than 300 seconds
>             1 pgs degraded
>             140 pgs down
>             736 pgs peering
>             1024 pgs stale
>             1006 pgs stuck inactive
>             18 pgs stuck unclean
>             1 pgs undersized
>             pool rbd pg_num 1024 > pgp_num 64
>      monmap e1: 3 mons at {matm-cm1=192.168.41.153:6789/0,matm-cm2=192.168.41.154:6789/0,matm-cm3=192.168.41.155:6789/0}
>             election epoch 8, quorum 0,1,2 matm-cm1,matm-cm2,matm-cm3
>      osdmap e417: 32 osds: 9 up, 9 in; 496 remapped pgs
>             flags sortbitwise
>       pgmap v1129: 1024 pgs, 1 pools, 0 bytes data, 0 objects
>             413 MB used, 16753 GB / 16754 GB avail
>                  564 stale+remapped+peering
>                  270 stale+creating
>                  125 stale+down+remapped+peering
>                   32 stale+peering
>                   17 stale+active+remapped
>                   15 stale+down+peering
>                    1 stale+active+undersized+degraded+remapped
>
> [ceph@matm-cm1 ~]$ ceph osd tree
> ID WEIGHT   TYPE NAME          UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 58.17578 root default
> -2 14.54395     host matm-cs1
>  0  1.81799         osd.0         down        0          1.00000
>  1  1.81799         osd.1         down        0          1.00000
>  2  1.81799         osd.2         down        0          1.00000
>  3  1.81799         osd.3         down        0          1.00000
>  4  1.81799         osd.4         down        0          1.00000
>  5  1.81799         osd.5         down        0          1.00000
>  6  1.81799         osd.6         down        0          1.00000
>  7  1.81799         osd.7         down        0          1.00000
> -3 14.54395     host matm-cs2
>  8  1.81799         osd.8           up  1.00000          1.00000
>  9  1.81799         osd.9           up  1.00000          1.00000
> 10  1.81799         osd.10          up  1.00000          1.00000
> 11  1.81799         osd.11          up  1.00000          1.00000
> 12  1.81799         osd.12          up  1.00000          1.00000
> 13  1.81799         osd.13          up  1.00000          1.00000
> 14  1.81799         osd.14          up  1.00000          1.00000
> 15  1.81799         osd.15          up  1.00000          1.00000
> -4 14.54395     host matm-cs3
> 16  1.81799         osd.16        down        0          1.00000
> 17  1.81799         osd.17        down        0          1.00000
> 18  1.81799         osd.18        down        0          1.00000
> 19  1.81799         osd.19        down        0          1.00000
> 20  1.81799         osd.20        down        0          1.00000
> 21  1.81799         osd.21        down        0          1.00000
> 22  1.81799         osd.22        down        0          1.00000
> 23  1.81799         osd.23        down        0          1.00000
> -5 14.54395     host matm-cs4
> 24  1.81799         osd.24        down        0          1.00000
> 31  1.81799         osd.31        down        0          1.00000
> 25  1.81799         osd.25        down        0          1.00000
> 27  1.81799         osd.27        down        0          1.00000
> 29  1.81799         osd.29        down        0          1.00000
> 28  1.81799         osd.28        down        0          1.00000
> 30  1.81799         osd.30          up  1.00000          1.00000
> 26  1.81799         osd.26        down        0          1.00000
>
> Nate Curry
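On the stuck pgp_num change mentioned above: the split cannot finish while most of the OSDs hosting those PGs are down, so the order of operations is to get the OSDs back up first, let the 270 stale+creating PGs finish, and only then bring pgp_num up to match pg_num. A minimal sketch of the commands involved (assuming the pool is the default rbd pool shown in the status output):

    ceph osd pool get rbd pg_num         # should report 1024
    ceph osd pool get rbd pgp_num        # still 64 until the change goes through
    ceph osd pool set rbd pgp_num 1024   # retry once the cluster is no longer busy creating PGs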
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com