Check your firewall rules.
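OSDs flapping like this on a freshly provisioned cluster is very often just blocked ports: the monitors listen on 6789/tcp and the OSD daemons bind in the 6800-7300/tcp range, and every cluster node needs to reach every other node on those ports. A rough sketch of what to check with firewalld (assuming the hosts run firewalld, as on a default CentOS 7 install; the port range is the Ceph default):

    firewall-cmd --list-all                             # see what is currently open
    firewall-cmd --permanent --add-port=6789/tcp        # monitor port (on the mon hosts)
    firewall-cmd --permanent --add-port=6800-7300/tcp   # OSD port range (on the OSD hosts)
    firewall-cmd --reload

If firewalld is not in use, "iptables -L -n" on each host will show whether something else is dropping the traffic between nodes.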
On Fri, Apr 1, 2016 at 10:28 AM, Nate Curry <cu...@mosaicatm.com> wrote:

> I am having some issues with my newly set up cluster. I am able to get all
> of my 32 OSDs to start after setting up udev rules for my journal
> partitions, but they keep going down. It did seem like half of them would
> stay up at first, but when I checked this morning I found only 1/4 of them
> were up when I ran "ceph osd tree". The systemd scripts are running, so it
> doesn't seem like that is the issue. I don't see anything glaring in the
> log files, which may just reflect my experience level with Ceph.
>
> I tried to look for errors and knock out any that seemed obvious, but I
> can't seem to get that done either. The cluster was initially set to 64 PGs
> and I tried to update that to 1024, but it hasn't finished creating all of
> them and it seems stuck with 270 stale+creating pgs. This is preventing me
> from updating the number of pgps as it says it is busy creating pgs.
>
> I am thinking that the downed OSDs are probably my problem as far as the
> pgs getting created are concerned. I just can't seem to find the reason
> why they are going down. Could someone help shine some light on this for me?
>
> [ceph@matm-cm1 ~]$ ceph status
>     cluster 5a463eb9-b918-4d97-b853-7a5ebd3c0ac2
>      health HEALTH_ERR
>             1006 pgs are stuck inactive for more than 300 seconds
>             1 pgs degraded
>             140 pgs down
>             736 pgs peering
>             1024 pgs stale
>             1006 pgs stuck inactive
>             18 pgs stuck unclean
>             1 pgs undersized
>             pool rbd pg_num 1024 > pgp_num 64
>      monmap e1: 3 mons at {matm-cm1=192.168.41.153:6789/0,matm-cm2=192.168.41.154:6789/0,matm-cm3=192.168.41.155:6789/0}
>             election epoch 8, quorum 0,1,2 matm-cm1,matm-cm2,matm-cm3
>      osdmap e417: 32 osds: 9 up, 9 in; 496 remapped pgs
>             flags sortbitwise
>       pgmap v1129: 1024 pgs, 1 pools, 0 bytes data, 0 objects
>             413 MB used, 16753 GB / 16754 GB avail
>                  564 stale+remapped+peering
>                  270 stale+creating
>                  125 stale+down+remapped+peering
>                   32 stale+peering
>                   17 stale+active+remapped
>                   15 stale+down+peering
>                    1 stale+active+undersized+degraded+remapped
>
> [ceph@matm-cm1 ~]$ ceph osd tree
> ID WEIGHT   TYPE NAME          UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 58.17578 root default
> -2 14.54395     host matm-cs1
>  0  1.81799         osd.0         down        0          1.00000
>  1  1.81799         osd.1         down        0          1.00000
>  2  1.81799         osd.2         down        0          1.00000
>  3  1.81799         osd.3         down        0          1.00000
>  4  1.81799         osd.4         down        0          1.00000
>  5  1.81799         osd.5         down        0          1.00000
>  6  1.81799         osd.6         down        0          1.00000
>  7  1.81799         osd.7         down        0          1.00000
> -3 14.54395     host matm-cs2
>  8  1.81799         osd.8           up  1.00000          1.00000
>  9  1.81799         osd.9           up  1.00000          1.00000
> 10  1.81799         osd.10          up  1.00000          1.00000
> 11  1.81799         osd.11          up  1.00000          1.00000
> 12  1.81799         osd.12          up  1.00000          1.00000
> 13  1.81799         osd.13          up  1.00000          1.00000
> 14  1.81799         osd.14          up  1.00000          1.00000
> 15  1.81799         osd.15          up  1.00000          1.00000
> -4 14.54395     host matm-cs3
> 16  1.81799         osd.16        down        0          1.00000
> 17  1.81799         osd.17        down        0          1.00000
> 18  1.81799         osd.18        down        0          1.00000
> 19  1.81799         osd.19        down        0          1.00000
> 20  1.81799         osd.20        down        0          1.00000
> 21  1.81799         osd.21        down        0          1.00000
> 22  1.81799         osd.22        down        0          1.00000
> 23  1.81799         osd.23        down        0          1.00000
> -5 14.54395     host matm-cs4
> 24  1.81799         osd.24        down        0          1.00000
> 31  1.81799         osd.31        down        0          1.00000
> 25  1.81799         osd.25        down        0          1.00000
> 27  1.81799         osd.27        down        0          1.00000
> 29  1.81799         osd.29        down        0          1.00000
> 28  1.81799         osd.28        down        0          1.00000
> 30  1.81799         osd.30          up  1.00000          1.00000
> 26  1.81799         osd.26        down        0          1.00000
>
> Nate Curry
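On the stuck pgp_num change mentioned above: the split cannot finish while most of the OSDs hosting those PGs are down, so the order of operations is to get the OSDs back up first, let the 270 stale+creating PGs finish, and only then bring pgp_num up to match pg_num. A minimal sketch of the commands involved (assuming the pool is the default rbd pool shown in the status output):

    ceph osd pool get rbd pg_num         # should report 1024
    ceph osd pool get rbd pgp_num        # still 64 until the change goes through
    ceph osd pool set rbd pgp_num 1024   # retry once the cluster is no longer busy creating PGs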
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com