cc the list as well

> On 15 Nov 2015, at 23:41, Josef Johansson <jose...@gmail.com> wrote:
>
> Hi,
>
> So it's just frozen at that point?
>
> You should definitely increase the logging and restart the OSD; I believe it's debug osd 20 and debug mon 20 (a sketch of setting these follows the quoted log below).
>
> A quick Google search brings up a case where a UUID problem was keeping an OSD down:
> http://serverfault.com/questions/671372/ceph-osd-always-down-in-ubuntu-14-04-1
>
> /Josef
>
>> On 15 Nov 2015, at 23:29, Claes Sahlström <cl...@verymetal.com> wrote:
>>
>> Hi and thanks for helping.
>>
>> None that I can see when scanning the logfile; it actually looks to me like it starts up just fine when I start the OSD. This is from the last time I restarted it:
>>
>> 2015-11-15 22:58:13.445684 7f6f8f9be940  0 set uid:gid to 0:0
>> 2015-11-15 22:58:13.445854 7f6f8f9be940  0 ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), process ceph-osd, pid 5463
>> 2015-11-15 22:58:13.510385 7f6f8f9be940  0 filestore(/ceph/osd.11) backend xfs (magic 0x58465342)
>> 2015-11-15 22:58:13.511120 7f6f8f9be940  0 genericfilestorebackend(/ceph/osd.11) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
>> 2015-11-15 22:58:13.511129 7f6f8f9be940  0 genericfilestorebackend(/ceph/osd.11) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
>> 2015-11-15 22:58:13.511158 7f6f8f9be940  0 genericfilestorebackend(/ceph/osd.11) detect_features: splice is supported
>> 2015-11-15 22:58:13.515688 7f6f8f9be940  0 genericfilestorebackend(/ceph/osd.11) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
>> 2015-11-15 22:58:13.515934 7f6f8f9be940  0 xfsfilestorebackend(/ceph/osd.11) detect_features: extsize is supported and your kernel >= 3.5
>> 2015-11-15 22:58:13.600801 7f6f8f9be940  0 filestore(/ceph/osd.11) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
>> 2015-11-15 22:58:39.150619 7f6f8f9be940  1 journal _open /dev/orange/journal-osd.11 fd 19: 23622320128 bytes, block size 4096 bytes, directio = 1, aio = 1
>> 2015-11-15 22:58:39.160621 7f6f8f9be940  1 journal _open /dev/orange/journal-osd.11 fd 19: 23622320128 bytes, block size 4096 bytes, directio = 1, aio = 1
>> 2015-11-15 22:58:39.192660 7f6f8f9be940  1 filestore(/ceph/osd.11) upgrade
>> 2015-11-15 22:58:39.200192 7f6f8f9be940  0 <cls> cls/cephfs/cls_cephfs.cc:136: loading cephfs_size_scan
>> 2015-11-15 22:58:39.200457 7f6f8f9be940  0 <cls> cls/hello/cls_hello.cc:305: loading cls_hello
>> 2015-11-15 22:58:39.206906 7f6f8f9be940  0 osd.11 35462 crush map has features 1107558400, adjusting msgr requires for clients
>> 2015-11-15 22:58:39.206983 7f6f8f9be940  0 osd.11 35462 crush map has features 1107558400 was 8705, adjusting msgr requires for mons
>> 2015-11-15 22:58:39.207030 7f6f8f9be940  0 osd.11 35462 crush map has features 1107558400, adjusting msgr requires for osds
>> 2015-11-15 22:58:40.712757 7f6f8f9be940  0 osd.11 35462 load_pgs
>> 2015-11-15 22:59:09.980042 7f6f8f9be940  0 osd.11 35462 load_pgs opened 874 pgs
>> 2015-11-15 22:59:09.981963 7f6f8f9be940 -1 osd.11 35462 log_to_monitors {default=true}
>> 2015-11-15 22:59:09.990204 7f6f71312700  0 osd.11 35462 ignoring osdmap until we have initialized
>> 2015-11-15 22:59:11.194276 7f6f8f9be940  0 osd.11 35462 done with init, starting boot process
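For reference, the debug levels Josef suggests can be set persistently in ceph.conf or injected into running daemons. A minimal sketch, assuming the daemon names from this thread (osd.11, mon "black"); verify option names against the 9.2.0 documentation:

    # In ceph.conf, then restart the daemons:
    [osd]
        debug osd = 20
    [mon]
        debug mon = 20

    # Or inject into the running daemons without a restart:
    ceph tell osd.11 injectargs '--debug-osd 20'
    ceph tell mon.black injectargs '--debug-mon 20'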
>> From: Josef Johansson [mailto:jose...@gmail.com]
>> Sent: 15 November 2015 23:10
>> To: Claes Sahlström <cl...@verymetal.com>
>> Cc: ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] OSD:s failing out after upgrade to 9.2.0 on Ubuntu 14.04
>>
>> Hi,
>>
>> Could you catch any segmentation faults in /var/log/ceph/ceph-osd.11.log?
>>
>> Regards,
>> Josef
>>
>> On 15 Nov 2015, at 23:06, Claes Sahlström <cl...@verymetal.com> wrote:
>>
>> Sorry to almost double post. I noticed that it seems like one mon is down, but the mons do actually seem to be OK; the 11 OSDs that are in fall back out and I am back at 7 healthy OSDs again:
>>
>> root@black:/var/lib/ceph/mon# ceph -s
>>     cluster ee8eae7a-5994-48bc-bd43-aa07639a543b
>>      health HEALTH_WARN
>>             108 pgs backfill
>>             37 pgs backfilling
>>             2339 pgs degraded
>>             105 pgs down
>>             237 pgs peering
>>             138 pgs stale
>>             765 pgs stuck degraded
>>             173 pgs stuck inactive
>>             138 pgs stuck stale
>>             3327 pgs stuck unclean
>>             765 pgs stuck undersized
>>             2339 pgs undersized
>>             recovery 1612956/6242357 objects degraded (25.839%)
>>             recovery 772311/6242357 objects misplaced (12.372%)
>>             too many PGs per OSD (561 > max 350)
>>             4/11 in osds are down
>>      monmap e3: 3 mons at {black=172.16.0.201:6789/0,orange=172.16.0.203:6789/0,purple=172.16.0.202:6789/0}
>>             election epoch 456, quorum 0,1,2 black,purple,orange
>>      mdsmap e5: 0/0/1 up
>>      osdmap e35627: 12 osds: 7 up, 11 in; 1201 remapped pgs
>>       pgmap v8215121: 4608 pgs, 3 pools, 11897 GB data, 2996 kobjects
>>             17203 GB used, 8865 GB / 26069 GB avail
>>             1612956/6242357 objects degraded (25.839%)
>>             772311/6242357 objects misplaced (12.372%)
>>                 2137 active+undersized+degraded
>>                 1052 active+clean
>>                  783 active+remapped
>>                  137 stale+active+undersized+degraded
>>                  104 down+peering
>>                  102 active+remapped+wait_backfill
>>                   66 remapped+peering
>>                   65 peering
>>                   33 active+remapped+backfilling
>>                   27 activating+undersized+degraded
>>                   26 active+undersized+degraded+remapped
>>                   25 activating
>>                   16 remapped
>>                   14 inactive
>>                    7 activating+remapped
>>                    6 active+undersized+degraded+remapped+wait_backfill
>>                    4 active+undersized+degraded+remapped+backfilling
>>                    2 activating+undersized+degraded+remapped
>>                    1 down+remapped+peering
>>                    1 stale+remapped+peering
>>   recovery io 22108 MB/s, 5581 objects/s
>>   client io 1065 MB/s rd, 2317 MB/s wr, 11435 op/s
>>
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Claes Sahlström
>> Sent: 15 November 2015 21:56
>> To: ceph-users@lists.ceph.com
>> Subject: [ceph-users] OSD:s failing out after upgrade to 9.2.0 on Ubuntu 14.04
>>
>> Hi,
>>
>> I have a problem I hope is possible to solve…
>>
>> I upgraded to 9.2.0 a couple of days back and I missed this part:
>> "If your systems already have a ceph user, upgrading the package will cause problems. We suggest you first remove or rename the existing 'ceph' user and 'ceph' group before upgrading."
>>
>> I guess that might be the reason why my OSDs have started to die on me.
>>
>> I can get the OSD services to start when the file permissions are root:root and I use:
>> setuser match path = /var/lib/ceph/$type/$cluster-$id
>>
>> I am really not sure where to look to find out what is wrong.
>>
>> When I had first upgraded and the OSDs were restarted, I got permission denied on the OSD directories; that was solved by adding the "setuser match" line in ceph.conf.
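For reference, the Infernalis (9.2.0) release notes describe two ways around this permission change. A rough sketch of both, assuming the stock paths (stop the daemons before the chown); note that the startup log earlier in the thread shows this OSD's data under /ceph/osd.11, so the paths may need adjusting:

    # Option 1: hand the data directories over to the ceph user the package created
    chown -R ceph:ceph /var/lib/ceph

    # Option 2: keep running as root wherever the data is still owned by root
    # ([osd] section of ceph.conf; $type/$cluster/$id are ceph.conf metavariables)
    setuser match path = /var/lib/ceph/$type/$cluster-$id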
>> With 5 of 12 OSDs down I am starting to worry, and since I only have one replica I might lose some data. As I mentioned, the OSD services start and "ceph osd in" does not give me any error, but the OSD never comes up.
>>
>> Any suggestions or helpful tips are most welcome,
>>
>> /Claes
>>
>> ID WEIGHT   TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
>> -1 24.00000 root default
>> -2  8.00000     host black
>>  3  2.00000         osd.3        up  1.00000          1.00000
>>  2  2.00000         osd.2        up  1.00000          1.00000
>>  0  2.00000         osd.0        up  1.00000          1.00000
>>  1  2.00000         osd.1        up  1.00000          1.00000
>> -3  8.00000     host purple
>>  7  2.00000         osd.7      down        0          1.00000
>>  6  2.00000         osd.6        up  1.00000          1.00000
>>  4  2.00000         osd.4        up  1.00000          1.00000
>>  5  2.00000         osd.5        up  1.00000          1.00000
>> -4  8.00000     host orange
>> 11  2.00000         osd.11     down        0          1.00000
>> 10  2.00000         osd.10     down        0          1.00000
>>  8  2.00000         osd.8      down        0          1.00000
>>  9  2.00000         osd.9      down        0          1.00000
>>
>> root@black:/var/log/ceph# ceph -s
>> 2015-11-15 21:55:27.919339 7ffb38446700  0 -- :/1336310814 >> 172.16.0.203:6789/0 pipe(0x7ffb34064550 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7ffb3405e000).fault
>>     cluster ee8eae7a-5994-48bc-bd43-aa07639a543b
>>      health HEALTH_WARN
>>             1591 pgs backfill
>>             38 pgs backfilling
>>             2439 pgs degraded
>>             105 pgs down
>>             106 pgs peering
>>             138 pgs stale
>>             2439 pgs stuck degraded
>>             106 pgs stuck inactive
>>             138 pgs stuck stale
>>             2873 pgs stuck unclean
>>             2439 pgs stuck undersized
>>             2439 pgs undersized
>>             recovery 1694156/6668499 objects degraded (25.405%)
>>             recovery 2315800/6668499 objects misplaced (34.727%)
>>             too many PGs per OSD (1197 > max 350)
>>             1 mons down, quorum 0,1 black,purple
>>      monmap e3: 3 mons at {black=172.16.0.201:6789/0,orange=172.16.0.203:6789/0,purple=172.16.0.202:6789/0}
>>             election epoch 448, quorum 0,1 black,purple
>>      mdsmap e5: 0/0/1 up
>>      osdmap e34098: 12 osds: 7 up, 7 in; 2024 remapped pgs
>>       pgmap v8211622: 4608 pgs, 3 pools, 12027 GB data, 3029 kobjects
>>             17141 GB used, 8927 GB / 26069 GB avail
>>             1694156/6668499 objects degraded (25.405%)
>>             2315800/6668499 objects misplaced (34.727%)
>>                 1735 active+clean
>>                 1590 active+undersized+degraded+remapped+wait_backfill
>>                  637 active+undersized+degraded
>>                  326 active+remapped
>>                  137 stale+active+undersized+degraded
>>                  101 down+peering
>>                   38 active+undersized+degraded+remapped+backfilling
>>                   37 active+undersized+degraded+remapped
>>                    4 down+remapped+peering
>>                    1 stale+remapped+peering
>>                    1 active
>>                    1 active+remapped+wait_backfill
>>   recovery io 66787 kB/s, 16 objects/s
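Since the daemons start cleanly and are only marked down later, it may also be worth searching the whole OSD log and the kernel log for crash signatures rather than just the startup lines. A minimal sketch, using the log path Josef mentioned above:

    # Look for crash signatures anywhere in the OSD log
    grep -n -i -E 'segv|signal|abort|assert' /var/log/ceph/ceph-osd.11.log | tail -n 20

    # OOM kills and other kernel-side terminations end up in the kernel log
    dmesg | grep -i -E 'ceph-osd|oom'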
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com