Hi,

Could you check whether there are any segmentation faults logged in /var/log/ceph/ceph-osd.11.log?
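Ceph daemons normally record a crash as a "Caught signal" line followed by a backtrace, so something along these lines should surface it (a minimal sketch; the exact log wording can vary between releases):

    grep -n -A 20 'Caught signal' /var/log/ceph/ceph-osd.11.log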
Regards,
Josef

> On 15 Nov 2015, at 23:06, Claes Sahlström <cl...@verymetal.com> wrote:
>
> Sorry to almost double post. I noticed that it seems like one mon is down,
> but the mons do actually seem to be OK; the 11 OSDs that are in fall out
> and I am back at 7 healthy OSDs again:
>
> root@black:/var/lib/ceph/mon# ceph -s
>     cluster ee8eae7a-5994-48bc-bd43-aa07639a543b
>      health HEALTH_WARN
>             108 pgs backfill
>             37 pgs backfilling
>             2339 pgs degraded
>             105 pgs down
>             237 pgs peering
>             138 pgs stale
>             765 pgs stuck degraded
>             173 pgs stuck inactive
>             138 pgs stuck stale
>             3327 pgs stuck unclean
>             765 pgs stuck undersized
>             2339 pgs undersized
>             recovery 1612956/6242357 objects degraded (25.839%)
>             recovery 772311/6242357 objects misplaced (12.372%)
>             too many PGs per OSD (561 > max 350)
>             4/11 in osds are down
>      monmap e3: 3 mons at {black=172.16.0.201:6789/0,orange=172.16.0.203:6789/0,purple=172.16.0.202:6789/0}
>             election epoch 456, quorum 0,1,2 black,purple,orange
>      mdsmap e5: 0/0/1 up
>      osdmap e35627: 12 osds: 7 up, 11 in; 1201 remapped pgs
>       pgmap v8215121: 4608 pgs, 3 pools, 11897 GB data, 2996 kobjects
>             17203 GB used, 8865 GB / 26069 GB avail
>             1612956/6242357 objects degraded (25.839%)
>             772311/6242357 objects misplaced (12.372%)
>                 2137 active+undersized+degraded
>                 1052 active+clean
>                  783 active+remapped
>                  137 stale+active+undersized+degraded
>                  104 down+peering
>                  102 active+remapped+wait_backfill
>                   66 remapped+peering
>                   65 peering
>                   33 active+remapped+backfilling
>                   27 activating+undersized+degraded
>                   26 active+undersized+degraded+remapped
>                   25 activating
>                   16 remapped
>                   14 inactive
>                    7 activating+remapped
>                    6 active+undersized+degraded+remapped+wait_backfill
>                    4 active+undersized+degraded+remapped+backfilling
>                    2 activating+undersized+degraded+remapped
>                    1 down+remapped+peering
>                    1 stale+remapped+peering
>   recovery io 22108 MB/s, 5581 objects/s
>   client io 1065 MB/s rd, 2317 MB/s wr, 11435 op/s
>
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Claes Sahlström
> Sent: 15 November 2015 21:56
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] OSD:s failing out after upgrade to 9.2.0 on Ubuntu 14.04
>
> Hi,
>
> I have a problem I hope is possible to solve…
>
> I upgraded to 9.2.0 a couple of days back and I missed this part:
> “If your systems already have a ceph user, upgrading the package will cause
> problems. We suggest you first remove or rename the existing ‘ceph’ user and
> ‘ceph’ group before upgrading.”
>
> I guess that might be the reason why my OSDs have started to die on me.
>
> I can get the osd services to start with the file permissions left as
> root:root by using:
> setuser match path = /var/lib/ceph/$type/$cluster-$id
>
> I am really not sure where to look to find out what is wrong.
>
> Right after the upgrade, when the OSDs were restarted, I got “permission
> denied” on the osd directories; that was solved by adding the “setuser
> match” line in ceph.conf.
>
> With 5 of 12 OSDs down I am starting to worry, and since I only have one
> replica I might lose some data. As I mentioned, the OSD services start and
> “ceph osd in” does not give me any error, but the OSDs never come up.
>
> Any suggestions or helpful tips are most welcome,
>
> /Claes
>
> ID WEIGHT   TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 24.00000 root default
> -2  8.00000     host black
>  3  2.00000         osd.3        up  1.00000          1.00000
>  2  2.00000         osd.2        up  1.00000          1.00000
>  0  2.00000         osd.0        up  1.00000          1.00000
>  1  2.00000         osd.1        up  1.00000          1.00000
> -3  8.00000     host purple
>  7  2.00000         osd.7      down        0          1.00000
>  6  2.00000         osd.6        up  1.00000          1.00000
>  4  2.00000         osd.4        up  1.00000          1.00000
>  5  2.00000         osd.5        up  1.00000          1.00000
> -4  8.00000     host orange
> 11  2.00000         osd.11     down        0          1.00000
> 10  2.00000         osd.10     down        0          1.00000
>  8  2.00000         osd.8      down        0          1.00000
>  9  2.00000         osd.9      down        0          1.00000
>
> root@black:/var/log/ceph# ceph -s
> 2015-11-15 21:55:27.919339 7ffb38446700  0 -- :/1336310814 >> 172.16.0.203:6789/0 pipe(0x7ffb34064550 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7ffb3405e000).fault
>     cluster ee8eae7a-5994-48bc-bd43-aa07639a543b
>      health HEALTH_WARN
>             1591 pgs backfill
>             38 pgs backfilling
>             2439 pgs degraded
>             105 pgs down
>             106 pgs peering
>             138 pgs stale
>             2439 pgs stuck degraded
>             106 pgs stuck inactive
>             138 pgs stuck stale
>             2873 pgs stuck unclean
>             2439 pgs stuck undersized
>             2439 pgs undersized
>             recovery 1694156/6668499 objects degraded (25.405%)
>             recovery 2315800/6668499 objects misplaced (34.727%)
>             too many PGs per OSD (1197 > max 350)
>             1 mons down, quorum 0,1 black,purple
>      monmap e3: 3 mons at {black=172.16.0.201:6789/0,orange=172.16.0.203:6789/0,purple=172.16.0.202:6789/0}
>             election epoch 448, quorum 0,1 black,purple
>      mdsmap e5: 0/0/1 up
>      osdmap e34098: 12 osds: 7 up, 7 in; 2024 remapped pgs
>       pgmap v8211622: 4608 pgs, 3 pools, 12027 GB data, 3029 kobjects
>             17141 GB used, 8927 GB / 26069 GB avail
>             1694156/6668499 objects degraded (25.405%)
>             2315800/6668499 objects misplaced (34.727%)
>                 1735 active+clean
>                 1590 active+undersized+degraded+remapped+wait_backfill
>                  637 active+undersized+degraded
>                  326 active+remapped
>                  137 stale+active+undersized+degraded
>                  101 down+peering
>                   38 active+undersized+degraded+remapped+backfilling
>                   37 active+undersized+degraded+remapped
>                    4 down+remapped+peering
>                    1 stale+remapped+peering
>                    1 active
>                    1 active+remapped+wait_backfill
>   recovery io 66787 kB/s, 16 objects/s
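For what it's worth, the 9.2.0 (Infernalis) release notes describe two ways out of the ownership problem quoted above: keep running the daemons as root via the "setuser match path" option, as Claes did, or stop the daemons and hand the data over to the new ceph user. A minimal sketch of the latter, assuming the default /var/lib/ceph layout and the upstart jobs shipped on Ubuntu 14.04:

    # stop every ceph daemon on this host before touching ownership
    stop ceph-all
    # give the daemon data and logs to the ceph user the package created
    chown -R ceph:ceph /var/lib/ceph /var/log/ceph
    # on restart the daemons drop privileges to ceph:ceph
    start ceph-all

Chowning terabytes of OSD data can take a long time, so it may be worth converting one host at a time.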