cc the list as well

> On 15 Nov 2015, at 23:41, Josef Johansson <jose...@gmail.com> wrote:
>
> Hi,
>
> So it's just frozen at that point?
>
> You should definitely increase the logging and restart the OSD; I believe it's debug osd 20 and debug mon 20 (a sketch of setting these follows the quoted log below).
>
> A quick Google search brings up a case where a UUID problem was keeping an OSD down:
> http://serverfault.com/questions/671372/ceph-osd-always-down-in-ubuntu-14-04-1
>
> /Josef
>
>> On 15 Nov 2015, at 23:29, Claes Sahlström <cl...@verymetal.com> wrote:
>>
>> Hi and thanks for helping.
>>
>> None that I can see when scanning the logfile; it actually looks to me like it starts up just fine when I start the OSD. This is from the last time I restarted it:
>>
>> 2015-11-15 22:58:13.445684 7f6f8f9be940  0 set uid:gid to 0:0
>> 2015-11-15 22:58:13.445854 7f6f8f9be940  0 ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), process ceph-osd, pid 5463
>> 2015-11-15 22:58:13.510385 7f6f8f9be940  0 filestore(/ceph/osd.11) backend xfs (magic 0x58465342)
>> 2015-11-15 22:58:13.511120 7f6f8f9be940  0 genericfilestorebackend(/ceph/osd.11) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
>> 2015-11-15 22:58:13.511129 7f6f8f9be940  0 genericfilestorebackend(/ceph/osd.11) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
>> 2015-11-15 22:58:13.511158 7f6f8f9be940  0 genericfilestorebackend(/ceph/osd.11) detect_features: splice is supported
>> 2015-11-15 22:58:13.515688 7f6f8f9be940  0 genericfilestorebackend(/ceph/osd.11) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
>> 2015-11-15 22:58:13.515934 7f6f8f9be940  0 xfsfilestorebackend(/ceph/osd.11) detect_features: extsize is supported and your kernel >= 3.5
>> 2015-11-15 22:58:13.600801 7f6f8f9be940  0 filestore(/ceph/osd.11) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
>> 2015-11-15 22:58:39.150619 7f6f8f9be940  1 journal _open /dev/orange/journal-osd.11 fd 19: 23622320128 bytes, block size 4096 bytes, directio = 1, aio = 1
>> 2015-11-15 22:58:39.160621 7f6f8f9be940  1 journal _open /dev/orange/journal-osd.11 fd 19: 23622320128 bytes, block size 4096 bytes, directio = 1, aio = 1
>> 2015-11-15 22:58:39.192660 7f6f8f9be940  1 filestore(/ceph/osd.11) upgrade
>> 2015-11-15 22:58:39.200192 7f6f8f9be940  0 <cls> cls/cephfs/cls_cephfs.cc:136: loading cephfs_size_scan
>> 2015-11-15 22:58:39.200457 7f6f8f9be940  0 <cls> cls/hello/cls_hello.cc:305: loading cls_hello
>> 2015-11-15 22:58:39.206906 7f6f8f9be940  0 osd.11 35462 crush map has features 1107558400, adjusting msgr requires for clients
>> 2015-11-15 22:58:39.206983 7f6f8f9be940  0 osd.11 35462 crush map has features 1107558400 was 8705, adjusting msgr requires for mons
>> 2015-11-15 22:58:39.207030 7f6f8f9be940  0 osd.11 35462 crush map has features 1107558400, adjusting msgr requires for osds
>> 2015-11-15 22:58:40.712757 7f6f8f9be940  0 osd.11 35462 load_pgs
>> 2015-11-15 22:59:09.980042 7f6f8f9be940  0 osd.11 35462 load_pgs opened 874 pgs
>> 2015-11-15 22:59:09.981963 7f6f8f9be940 -1 osd.11 35462 log_to_monitors {default=true}
>> 2015-11-15 22:59:09.990204 7f6f71312700  0 osd.11 35462 ignoring osdmap until we have initialized
>> 2015-11-15 22:59:11.194276 7f6f8f9be940  0 osd.11 35462 done with init, starting boot process
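For reference, the debug levels Josef suggests can be set persistently in ceph.conf or injected into running daemons. A minimal sketch, assuming the daemon names from this thread (osd.11, mon "black"); verify option names against the 9.2.0 documentation:

    # In ceph.conf, then restart the daemons:
    [osd]
        debug osd = 20
    [mon]
        debug mon = 20

    # Or inject into the running daemons without a restart:
    ceph tell osd.11 injectargs '--debug-osd 20'
    ceph tell mon.black injectargs '--debug-mon 20'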
>> From: Josef Johansson [mailto:jose...@gmail.com]
>> Sent: 15 November 2015 23:10
>> To: Claes Sahlström <cl...@verymetal.com>
>> Cc: ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] OSD:s failing out after upgrade to 9.2.0 on Ubuntu 14.04
>>
>> Hi,
>>
>> Could you catch any segmentation faults in /var/log/ceph/ceph-osd.11.log?
>>
>> Regards,
>> Josef
>>
>> On 15 Nov 2015, at 23:06, Claes Sahlström <cl...@verymetal.com> wrote:
>>
>> Sorry to almost double post. I noticed that it seems like one mon is down, but the mons do actually seem to be OK; the 11 OSDs that are in fall back out and I am back at 7 healthy OSDs again:
>>
>> root@black:/var/lib/ceph/mon# ceph -s
>>     cluster ee8eae7a-5994-48bc-bd43-aa07639a543b
>>      health HEALTH_WARN
>>             108 pgs backfill
>>             37 pgs backfilling
>>             2339 pgs degraded
>>             105 pgs down
>>             237 pgs peering
>>             138 pgs stale
>>             765 pgs stuck degraded
>>             173 pgs stuck inactive
>>             138 pgs stuck stale
>>             3327 pgs stuck unclean
>>             765 pgs stuck undersized
>>             2339 pgs undersized
>>             recovery 1612956/6242357 objects degraded (25.839%)
>>             recovery 772311/6242357 objects misplaced (12.372%)
>>             too many PGs per OSD (561 > max 350)
>>             4/11 in osds are down
>>      monmap e3: 3 mons at {black=172.16.0.201:6789/0,orange=172.16.0.203:6789/0,purple=172.16.0.202:6789/0}
>>             election epoch 456, quorum 0,1,2 black,purple,orange
>>      mdsmap e5: 0/0/1 up
>>      osdmap e35627: 12 osds: 7 up, 11 in; 1201 remapped pgs
>>       pgmap v8215121: 4608 pgs, 3 pools, 11897 GB data, 2996 kobjects
>>             17203 GB used, 8865 GB / 26069 GB avail
>>             1612956/6242357 objects degraded (25.839%)
>>             772311/6242357 objects misplaced (12.372%)
>>                 2137 active+undersized+degraded
>>                 1052 active+clean
>>                  783 active+remapped
>>                  137 stale+active+undersized+degraded
>>                  104 down+peering
>>                  102 active+remapped+wait_backfill
>>                   66 remapped+peering
>>                   65 peering
>>                   33 active+remapped+backfilling
>>                   27 activating+undersized+degraded
>>                   26 active+undersized+degraded+remapped
>>                   25 activating
>>                   16 remapped
>>                   14 inactive
>>                    7 activating+remapped
>>                    6 active+undersized+degraded+remapped+wait_backfill
>>                    4 active+undersized+degraded+remapped+backfilling
>>                    2 activating+undersized+degraded+remapped
>>                    1 down+remapped+peering
>>                    1 stale+remapped+peering
>>   recovery io 22108 MB/s, 5581 objects/s
>>   client io 1065 MB/s rd, 2317 MB/s wr, 11435 op/s
>>
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Claes Sahlström
>> Sent: 15 November 2015 21:56
>> To: ceph-users@lists.ceph.com
>> Subject: [ceph-users] OSD:s failing out after upgrade to 9.2.0 on Ubuntu 14.04
>>
>> Hi,
>>
>> I have a problem I hope is possible to solve…
>>
>> I upgraded to 9.2.0 a couple of days back and I missed this part:
>> "If your systems already have a ceph user, upgrading the package will cause problems. We suggest you first remove or rename the existing 'ceph' user and 'ceph' group before upgrading."
>>
>> I guess that might be the reason why my OSDs have started to die on me.
>>
>> I can get the OSD services to start when the file permissions are root:root and I use:
>> setuser match path = /var/lib/ceph/$type/$cluster-$id
>>
>> I am really not sure where to look to find out what is wrong.
>>
>> When I had first upgraded and the OSDs were restarted, I got permission denied on the OSD directories; that was solved by adding the "setuser match" line in ceph.conf.
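For reference, the Infernalis (9.2.0) release notes describe two ways around this permission change. A rough sketch of both, assuming the stock paths (stop the daemons before the chown); note that the startup log earlier in the thread shows this OSD's data under /ceph/osd.11, so the paths may need adjusting:

    # Option 1: hand the data directories over to the ceph user the package created
    chown -R ceph:ceph /var/lib/ceph

    # Option 2: keep running as root wherever the data is still owned by root
    # ([osd] section of ceph.conf; $type/$cluster/$id are ceph.conf metavariables)
    setuser match path = /var/lib/ceph/$type/$cluster-$id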
>> With 5 of 12 OSDs down I am starting to worry, and since I only have one replica I might lose some data. As I mentioned, the OSD services start and "ceph osd in" does not give me any error, but the OSD never comes up.
>>
>> Any suggestions or helpful tips are most welcome,
>>
>> /Claes
>>
>> ID WEIGHT   TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
>> -1 24.00000 root default
>> -2  8.00000     host black
>>  3  2.00000         osd.3        up  1.00000          1.00000
>>  2  2.00000         osd.2        up  1.00000          1.00000
>>  0  2.00000         osd.0        up  1.00000          1.00000
>>  1  2.00000         osd.1        up  1.00000          1.00000
>> -3  8.00000     host purple
>>  7  2.00000         osd.7      down        0          1.00000
>>  6  2.00000         osd.6        up  1.00000          1.00000
>>  4  2.00000         osd.4        up  1.00000          1.00000
>>  5  2.00000         osd.5        up  1.00000          1.00000
>> -4  8.00000     host orange
>> 11  2.00000         osd.11     down        0          1.00000
>> 10  2.00000         osd.10     down        0          1.00000
>>  8  2.00000         osd.8      down        0          1.00000
>>  9  2.00000         osd.9      down        0          1.00000
>>
>> root@black:/var/log/ceph# ceph -s
>> 2015-11-15 21:55:27.919339 7ffb38446700  0 -- :/1336310814 >> 172.16.0.203:6789/0 pipe(0x7ffb34064550 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7ffb3405e000).fault
>>     cluster ee8eae7a-5994-48bc-bd43-aa07639a543b
>>      health HEALTH_WARN
>>             1591 pgs backfill
>>             38 pgs backfilling
>>             2439 pgs degraded
>>             105 pgs down
>>             106 pgs peering
>>             138 pgs stale
>>             2439 pgs stuck degraded
>>             106 pgs stuck inactive
>>             138 pgs stuck stale
>>             2873 pgs stuck unclean
>>             2439 pgs stuck undersized
>>             2439 pgs undersized
>>             recovery 1694156/6668499 objects degraded (25.405%)
>>             recovery 2315800/6668499 objects misplaced (34.727%)
>>             too many PGs per OSD (1197 > max 350)
>>             1 mons down, quorum 0,1 black,purple
>>      monmap e3: 3 mons at {black=172.16.0.201:6789/0,orange=172.16.0.203:6789/0,purple=172.16.0.202:6789/0}
>>             election epoch 448, quorum 0,1 black,purple
>>      mdsmap e5: 0/0/1 up
>>      osdmap e34098: 12 osds: 7 up, 7 in; 2024 remapped pgs
>>       pgmap v8211622: 4608 pgs, 3 pools, 12027 GB data, 3029 kobjects
>>             17141 GB used, 8927 GB / 26069 GB avail
>>             1694156/6668499 objects degraded (25.405%)
>>             2315800/6668499 objects misplaced (34.727%)
>>                 1735 active+clean
>>                 1590 active+undersized+degraded+remapped+wait_backfill
>>                  637 active+undersized+degraded
>>                  326 active+remapped
>>                  137 stale+active+undersized+degraded
>>                  101 down+peering
>>                   38 active+undersized+degraded+remapped+backfilling
>>                   37 active+undersized+degraded+remapped
>>                    4 down+remapped+peering
>>                    1 stale+remapped+peering
>>                    1 active
>>                    1 active+remapped+wait_backfill
>>   recovery io 66787 kB/s, 16 objects/s
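Since the daemons start cleanly and are only marked down later, it may also be worth searching the whole OSD log and the kernel log for crash signatures rather than just the startup lines. A minimal sketch, using the log path Josef mentioned above:

    # Look for crash signatures anywhere in the OSD log
    grep -n -i -E 'segv|signal|abort|assert' /var/log/ceph/ceph-osd.11.log | tail -n 20

    # OOM kills and other kernel-side terminations end up in the kernel log
    dmesg | grep -i -E 'ceph-osd|oom'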
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com