I am having some issues with my newly set up cluster. I was able to get all 32 of my OSDs to start after setting up udev rules for my journal partitions, but they keep going down. At first it seemed like half of them would stay up, but when I checked this morning only a quarter of them were up according to "ceph osd tree". The systemd units are running, so that doesn't seem to be the issue. I don't see anything glaring in the log files, though that may just reflect my experience level with Ceph.
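In case it helps, these are roughly the checks I've been running per OSD to confirm the daemons are up and to look for errors (osd.0 on the first host as an example; adjust the id for each OSD):

```shell
# Check whether the systemd unit for a given OSD is active
systemctl status ceph-osd@0

# Look at recent journal output for that unit
journalctl -u ceph-osd@0 --since "1 hour ago"

# Tail the OSD's own log file for anything glaring
tail -n 100 /var/log/ceph/ceph-osd.0.log
```

Nothing in there jumped out at me, but maybe I'm not looking for the right thing.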
I tried to look for errors and knock out any that seemed obvious, but I can't seem to get that done either. The cluster was initially set to 64 PGs and I tried to raise that to 1024, but it hasn't finished creating all of them and seems stuck with 270 stale+creating PGs. This is preventing me from updating pgp_num, since it says it is busy creating PGs. I am thinking that the downed OSDs are probably my problem as far as PG creation is concerned; I just can't seem to find the reason why they are going down. Could someone help shine some light on this for me?

[ceph@matm-cm1 ~]$ ceph status
    cluster 5a463eb9-b918-4d97-b853-7a5ebd3c0ac2
     health HEALTH_ERR
            1006 pgs are stuck inactive for more than 300 seconds
            1 pgs degraded
            140 pgs down
            736 pgs peering
            1024 pgs stale
            1006 pgs stuck inactive
            18 pgs stuck unclean
            1 pgs undersized
            pool rbd pg_num 1024 > pgp_num 64
     monmap e1: 3 mons at {matm-cm1=192.168.41.153:6789/0,matm-cm2=192.168.41.154:6789/0,matm-cm3=192.168.41.155:6789/0}
            election epoch 8, quorum 0,1,2 matm-cm1,matm-cm2,matm-cm3
     osdmap e417: 32 osds: 9 up, 9 in; 496 remapped pgs
            flags sortbitwise
      pgmap v1129: 1024 pgs, 1 pools, 0 bytes data, 0 objects
            413 MB used, 16753 GB / 16754 GB avail
                 564 stale+remapped+peering
                 270 stale+creating
                 125 stale+down+remapped+peering
                  32 stale+peering
                  17 stale+active+remapped
                  15 stale+down+peering
                   1 stale+active+undersized+degraded+remapped

[ceph@matm-cm1 ~]$ ceph osd tree
ID WEIGHT   TYPE NAME          UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 58.17578 root default
-2 14.54395     host matm-cs1
 0  1.81799         osd.0         down        0          1.00000
 1  1.81799         osd.1         down        0          1.00000
 2  1.81799         osd.2         down        0          1.00000
 3  1.81799         osd.3         down        0          1.00000
 4  1.81799         osd.4         down        0          1.00000
 5  1.81799         osd.5         down        0          1.00000
 6  1.81799         osd.6         down        0          1.00000
 7  1.81799         osd.7         down        0          1.00000
-3 14.54395     host matm-cs2
 8  1.81799         osd.8           up  1.00000          1.00000
 9  1.81799         osd.9           up  1.00000          1.00000
10  1.81799         osd.10          up  1.00000          1.00000
11  1.81799         osd.11          up  1.00000          1.00000
12  1.81799         osd.12          up  1.00000          1.00000
13  1.81799         osd.13          up  1.00000          1.00000
14  1.81799         osd.14          up  1.00000          1.00000
15  1.81799         osd.15          up  1.00000          1.00000
-4 14.54395     host matm-cs3
16  1.81799         osd.16        down        0          1.00000
17  1.81799         osd.17        down        0          1.00000
18  1.81799         osd.18        down        0          1.00000
19  1.81799         osd.19        down        0          1.00000
20  1.81799         osd.20        down        0          1.00000
21  1.81799         osd.21        down        0          1.00000
22  1.81799         osd.22        down        0          1.00000
23  1.81799         osd.23        down        0          1.00000
-5 14.54395     host matm-cs4
24  1.81799         osd.24        down        0          1.00000
31  1.81799         osd.31        down        0          1.00000
25  1.81799         osd.25        down        0          1.00000
27  1.81799         osd.27        down        0          1.00000
29  1.81799         osd.29        down        0          1.00000
28  1.81799         osd.28        down        0          1.00000
30  1.81799         osd.30          up  1.00000          1.00000
26  1.81799         osd.26        down        0          1.00000

*Nate Curry*
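P.S. For completeness, the PG count change was done with commands along these lines (pool name "rbd" as shown in the status output above):

```shell
# Raise the pg count on the rbd pool (this is what got stuck at 270 stale+creating)
ceph osd pool set rbd pg_num 1024

# This one is refused while the cluster reports it is still creating pgs
ceph osd pool set rbd pgp_num 1024
```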
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com