Hi,

Could you check for any segmentation faults in /var/log/ceph/ceph-osd.11.log?
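
For example, something like this should surface any crash traces (just a sketch,
assuming the default log location from the path above):

    # Look for crash signatures in the OSD log, with some trailing context
    grep -n -A 20 -e 'Segmentation fault' -e 'Caught signal' /var/log/ceph/ceph-osd.11.log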

Regards,
Josef

> On 15 Nov 2015, at 23:06, Claes Sahlström <cl...@verymetal.com> wrote:
> 
> Sorry to almost double post. I noticed that it seemed like one mon was down, 
> but the mons do actually seem to be OK. However, some of the 11 OSDs that are 
> in fall out again and I am back at 7 healthy OSDs:
>  
> root@black:/var/lib/ceph/mon# ceph -s
>     cluster ee8eae7a-5994-48bc-bd43-aa07639a543b
>      health HEALTH_WARN
>             108 pgs backfill
>             37 pgs backfilling
>             2339 pgs degraded
>             105 pgs down
>             237 pgs peering
>             138 pgs stale
>             765 pgs stuck degraded
>             173 pgs stuck inactive
>             138 pgs stuck stale
>             3327 pgs stuck unclean
>             765 pgs stuck undersized
>             2339 pgs undersized
>             recovery 1612956/6242357 objects degraded (25.839%)
>             recovery 772311/6242357 objects misplaced (12.372%)
>             too many PGs per OSD (561 > max 350)
>             4/11 in osds are down
>      monmap e3: 3 mons at 
> {black=172.16.0.201:6789/0,orange=172.16.0.203:6789/0,purple=172.16.0.202:6789/0}
>             election epoch 456, quorum 0,1,2 black,purple,orange
>      mdsmap e5: 0/0/1 up
>      osdmap e35627: 12 osds: 7 up, 11 in; 1201 remapped pgs
>       pgmap v8215121: 4608 pgs, 3 pools, 11897 GB data, 2996 kobjects
>             17203 GB used, 8865 GB / 26069 GB avail
>             1612956/6242357 objects degraded (25.839%)
>             772311/6242357 objects misplaced (12.372%)
>                 2137 active+undersized+degraded
>                 1052 active+clean
>                  783 active+remapped
>                  137 stale+active+undersized+degraded
>                  104 down+peering
>                  102 active+remapped+wait_backfill
>                   66 remapped+peering
>                   65 peering
>                   33 active+remapped+backfilling
>                   27 activating+undersized+degraded
>                   26 active+undersized+degraded+remapped
>                   25 activating
>                   16 remapped
>                   14 inactive
>                    7 activating+remapped
>                    6 active+undersized+degraded+remapped+wait_backfill
>                    4 active+undersized+degraded+remapped+backfilling
>                    2 activating+undersized+degraded+remapped
>                    1 down+remapped+peering
>                    1 stale+remapped+peering
> recovery io 22108 MB/s, 5581 objects/s
>   client io 1065 MB/s rd, 2317 MB/s wr, 11435 op/s
>  
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> Claes Sahlström
> Sent: 15 November 2015 21:56
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] OSD:s failing out after upgrade to 9.2.0 on Ubuntu 14.04
>  
> Hi,
>  
> I have a problem that I hope can be solved…
>  
> I upgraded to 9.2.0 a couple of days back and I missed this part:
> “If your systems already have a ceph user, upgrading the package will cause 
> problems. We suggest you first remove or rename the existing ‘ceph’ user and 
> ‘ceph’ group before upgrading.”
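> 
> For what it's worth, I assume the rename they mean would look roughly like 
> this (just a sketch on my side; “ceph-old” is a placeholder name of my own):
> 
>     # Rename the pre-existing user and group so the 9.2.0 package
>     # can create its own ceph user and group cleanly
>     usermod  -l ceph-old ceph
>     groupmod -n ceph-old ceph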
>  
> I guess that might be the reason why my OSDs have started to die on me.
>  
> I can get the osd services to start when the file permissions are root:root and 
> I am using:
> setuser match path = /var/lib/ceph/$type/$cluster-$id
>  
> I am really not sure where to look to find out what is wrong.
>  
> When I first upgraded and the OSDs were restarted, I got permission denied on 
> the osd directories, and that was solved by adding the 
> “setuser match” line in ceph.conf.
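> 
> If I read the release notes right, the longer-term alternative to that 
> workaround is to hand ownership of the data and log dirs to the ceph user 
> while the daemons are stopped, roughly:
> 
>     # As I understand the 9.2.0 notes; run on each host with the
>     # ceph daemons stopped
>     chown -R ceph:ceph /var/lib/ceph
>     chown -R ceph:ceph /var/log/ceph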
>  
> With 5 of 12 OSDs down I am starting to worry, and since I only have one 
> replica I might lose some data. As I mentioned, the OSD services start and 
> “ceph osd in” does not give me any error, but the OSD never comes up.
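> 
> Is there anything more I should be looking at than roughly this (osd.11 here 
> just as an example)?
> 
>     # Is the daemon process actually running, and who owns its data dir?
>     ps aux | grep ceph-osd
>     ls -ld /var/lib/ceph/osd/ceph-11
>     # Anything obvious at the tail of its log?
>     tail -n 100 /var/log/ceph/ceph-osd.11.log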
>  
> Any suggestions or helpful tips are most welcome,
>  
> /Claes
> 
> ID WEIGHT   TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 24.00000 root default
> -2  8.00000     host black
> 3  2.00000         osd.3        up  1.00000          1.00000
> 2  2.00000         osd.2        up  1.00000          1.00000
> 0  2.00000         osd.0        up  1.00000          1.00000
> 1  2.00000         osd.1        up  1.00000          1.00000
> -3  8.00000     host purple
> 7  2.00000         osd.7      down        0          1.00000
> 6  2.00000         osd.6        up  1.00000          1.00000
> 4  2.00000         osd.4        up  1.00000          1.00000
> 5  2.00000         osd.5        up  1.00000          1.00000
> -4  8.00000     host orange
> 11  2.00000         osd.11     down        0          1.00000
> 10  2.00000         osd.10     down        0          1.00000
> 8  2.00000         osd.8      down        0          1.00000
> 9  2.00000         osd.9      down        0          1.00000
> 
> root@black:/var/log/ceph# ceph -s
> 2015-11-15 21:55:27.919339 7ffb38446700  0 -- :/1336310814 >> 
> 172.16.0.203:6789/0 pipe(0x7ffb34064550 sd=3 :0 s=1 pgs=0 cs=0 l=1 
> c=0x7ffb3405e000).fault
>     cluster ee8eae7a-5994-48bc-bd43-aa07639a543b
>      health HEALTH_WARN
>             1591 pgs backfill
>             38 pgs backfilling
>             2439 pgs degraded
>             105 pgs down
>             106 pgs peering
>             138 pgs stale
>             2439 pgs stuck degraded
>             106 pgs stuck inactive
>             138 pgs stuck stale
>             2873 pgs stuck unclean
>             2439 pgs stuck undersized
>             2439 pgs undersized
>             recovery 1694156/6668499 objects degraded (25.405%)
>             recovery 2315800/6668499 objects misplaced (34.727%)
>             too many PGs per OSD (1197 > max 350)
>             1 mons down, quorum 0,1 black,purple
>      monmap e3: 3 mons at 
> {black=172.16.0.201:6789/0,orange=172.16.0.203:6789/0,purple=172.16.0.202:6789/0}
>             election epoch 448, quorum 0,1 black,purple
>      mdsmap e5: 0/0/1 up
>      osdmap e34098: 12 osds: 7 up, 7 in; 2024 remapped pgs
>       pgmap v8211622: 4608 pgs, 3 pools, 12027 GB data, 3029 kobjects
>             17141 GB used, 8927 GB / 26069 GB avail
>             1694156/6668499 objects degraded (25.405%)
>             2315800/6668499 objects misplaced (34.727%)
>                 1735 active+clean
>                 1590 active+undersized+degraded+remapped+wait_backfill
>                  637 active+undersized+degraded
>                  326 active+remapped
>                  137 stale+active+undersized+degraded
>                  101 down+peering
>                   38 active+undersized+degraded+remapped+backfilling
>                   37 active+undersized+degraded+remapped
>                    4 down+remapped+peering
>                    1 stale+remapped+peering
>                    1 active
>                    1 active+remapped+wait_backfill
> recovery io 66787 kB/s, 16 objects/s

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
