Check your firewall rules. Every OSD on matm-cs2 is up while almost everything on matm-cs1, matm-cs3, and matm-cs4 is down, which points at host-level connectivity rather than the OSD daemons themselves: if an OSD can't reach its peers for heartbeats, the monitors mark it down even though the process keeps running, which would explain why systemd shows the services as fine.
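
If it's firewalld, the standard Ceph ports need to be open on every node: the monitors listen on 6789/tcp and the OSDs bind to ports in the 6800-7300/tcp range. On a firewalld-based distro (CentOS 7 etc.) something like this on each host should do it (adjust the zone to whatever yours actually uses):

  sudo firewall-cmd --zone=public --permanent --add-port=6789/tcp
  sudo firewall-cmd --zone=public --permanent --add-port=6800-7300/tcp
  sudo firewall-cmd --reload

To rule the firewall out entirely while testing, you can stop it on all four OSD hosts (systemctl stop firewalld) and watch "ceph osd tree" to see whether the OSDs come back up.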
On Fri, Apr 1, 2016 at 10:28 AM, Nate Curry <[email protected]> wrote:
> I am having some issues with my newly set up cluster. I was able to get
> all 32 of my OSDs to start after setting up udev rules for my journal
> partitions, but they keep going down. About half of them seemed to stay
> up at first, but when I checked this morning only a quarter of them were
> up according to "ceph osd tree". The systemd services are running, so
> that doesn't seem to be the issue. I don't see anything glaring in the
> log files, though that may just reflect my experience level with Ceph.
>
> I tried to look for errors and knock out any that seemed obvious, but I
> can't seem to get that done either. The pool was initially created with
> 64 PGs and I tried to raise that to 1024, but it hasn't finished creating
> all of them and seems stuck with 270 stale+creating PGs. This is
> preventing me from raising pgp_num, as it says it is busy creating PGs.
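
The stuck creates are the same problem: with 23 of your 32 OSDs down, CRUSH can't map the new PGs onto enough OSDs, so they sit in stale+creating and never finish peering. Once the OSDs are back up the creates should complete on their own, and then bumping pgp_num is a single command (using the "rbd" pool from your status output):

  ceph osd pool set rbd pgp_num 1024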
>
> I am thinking that the downed OSDs are probably the reason the PGs aren't
> getting created. I just can't seem to find the reason why they are going
> down. Could someone help shed some light on this for me?
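
To confirm it's connectivity rather than crashing daemons, check the OSD logs on one of the surviving hosts; OSDs that are running but unreachable typically show up in their peers' logs as heartbeat failures, e.g. lines containing "heartbeat_check: no reply from". Assuming the default log location:

  grep heartbeat_check /var/log/ceph/ceph-osd.*.log | tail
  ceph health detail

If the down OSDs' own logs show them starting cleanly and then just sitting there while their peers report no heartbeat replies, that's the firewall.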
>
>
> [ceph@matm-cm1 ~]$ ceph status
> cluster 5a463eb9-b918-4d97-b853-7a5ebd3c0ac2
> health HEALTH_ERR
> 1006 pgs are stuck inactive for more than 300 seconds
> 1 pgs degraded
> 140 pgs down
> 736 pgs peering
> 1024 pgs stale
> 1006 pgs stuck inactive
> 18 pgs stuck unclean
> 1 pgs undersized
> pool rbd pg_num 1024 > pgp_num 64
> monmap e1: 3 mons at {matm-cm1=192.168.41.153:6789/0,matm-cm2=192.168.41.154:6789/0,matm-cm3=192.168.41.155:6789/0}
> election epoch 8, quorum 0,1,2 matm-cm1,matm-cm2,matm-cm3
> osdmap e417: 32 osds: 9 up, 9 in; 496 remapped pgs
> flags sortbitwise
> pgmap v1129: 1024 pgs, 1 pools, 0 bytes data, 0 objects
> 413 MB used, 16753 GB / 16754 GB avail
> 564 stale+remapped+peering
> 270 stale+creating
> 125 stale+down+remapped+peering
> 32 stale+peering
> 17 stale+active+remapped
> 15 stale+down+peering
> 1 stale+active+undersized+degraded+remapped
>
> [ceph@matm-cm1 ~]$ ceph osd tree
> ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 58.17578 root default
> -2 14.54395 host matm-cs1
> 0 1.81799 osd.0 down 0 1.00000
> 1 1.81799 osd.1 down 0 1.00000
> 2 1.81799 osd.2 down 0 1.00000
> 3 1.81799 osd.3 down 0 1.00000
> 4 1.81799 osd.4 down 0 1.00000
> 5 1.81799 osd.5 down 0 1.00000
> 6 1.81799 osd.6 down 0 1.00000
> 7 1.81799 osd.7 down 0 1.00000
> -3 14.54395 host matm-cs2
> 8 1.81799 osd.8 up 1.00000 1.00000
> 9 1.81799 osd.9 up 1.00000 1.00000
> 10 1.81799 osd.10 up 1.00000 1.00000
> 11 1.81799 osd.11 up 1.00000 1.00000
> 12 1.81799 osd.12 up 1.00000 1.00000
> 13 1.81799 osd.13 up 1.00000 1.00000
> 14 1.81799 osd.14 up 1.00000 1.00000
> 15 1.81799 osd.15 up 1.00000 1.00000
> -4 14.54395 host matm-cs3
> 16 1.81799 osd.16 down 0 1.00000
> 17 1.81799 osd.17 down 0 1.00000
> 18 1.81799 osd.18 down 0 1.00000
> 19 1.81799 osd.19 down 0 1.00000
> 20 1.81799 osd.20 down 0 1.00000
> 21 1.81799 osd.21 down 0 1.00000
> 22 1.81799 osd.22 down 0 1.00000
> 23 1.81799 osd.23 down 0 1.00000
> -5 14.54395 host matm-cs4
> 24 1.81799 osd.24 down 0 1.00000
> 31 1.81799 osd.31 down 0 1.00000
> 25 1.81799 osd.25 down 0 1.00000
> 27 1.81799 osd.27 down 0 1.00000
> 29 1.81799 osd.29 down 0 1.00000
> 28 1.81799 osd.28 down 0 1.00000
> 30 1.81799 osd.30 up 1.00000 1.00000
> 26 1.81799 osd.26 down 0 1.00000
>
> *Nate Curry*
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com