Hi, the problem still exists for me. It happens to SSD OSDs only - I recreated all of them running 12.2.8.
This is what I got even on newly created OSDs after some time and crashes:

ceph-bluestore-tool fsck -l /root/fsck-osd.0.log --log-level=20 --path /var/lib/ceph/osd/ceph-0 --deep on

2018-09-05 10:15:42.784873 7f609a311ec0 -1 bluestore(/var/lib/ceph/osd/ceph-137) fsck error: found stray shared blob data for sbid 0x34dbe4
2018-09-05 10:15:42.818239 7f609a311ec0 -1 bluestore(/var/lib/ceph/osd/ceph-137) fsck error: found stray shared blob data for sbid 0x376ccf
2018-09-05 10:15:42.863419 7f609a311ec0 -1 bluestore(/var/lib/ceph/osd/ceph-137) fsck error: found stray shared blob data for sbid 0x3a4e58
2018-09-05 10:15:42.887404 7f609a311ec0 -1 bluestore(/var/lib/ceph/osd/ceph-137) fsck error: found stray shared blob data for sbid 0x3b7f29
2018-09-05 10:15:42.958417 7f609a311ec0 -1 bluestore(/var/lib/ceph/osd/ceph-137) fsck error: found stray shared blob data for sbid 0x3df760
2018-09-05 10:15:42.961275 7f609a311ec0 -1 bluestore(/var/lib/ceph/osd/ceph-137) fsck error: found stray shared blob data for sbid 0x3e076f
2018-09-05 10:15:43.038658 7f609a311ec0 -1 bluestore(/var/lib/ceph/osd/ceph-137) fsck error: found stray shared blob data for sbid 0x3ff156

I don't know if these errors are the cause of the OSD crashes or the result of them. Currently I'm trying to catch some verbose logs; see also Radoslaw's reply below:

> This looks quite similar to #25001 [1]. The corruption *might* be caused by
> the racy SharedBlob::put() [2] that was fixed in 12.2.6. However, more logs
> (debug_bluestore=20, debug_bdev=20) would be useful. Also you might
> want to carefully use fsck -- please take a look at Igor's (CCed) post [3]
> and Troy's response.
>
> Best regards,
> Radoslaw Zarzynski
>
> [1] http://tracker.ceph.com/issues/25001
> [2] http://tracker.ceph.com/issues/24211
> [3] http://tracker.ceph.com/issues/25001#note-6

I'll keep you updated.

br
wolfgang

On 2018-09-06 09:27, Caspar Smit wrote:
> Hi,
>
> These reports are kind of worrying since we have a 12.2.5 cluster too
> waiting to upgrade.
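When triaging output like the above, it can help to reduce the repeated fsck errors to the distinct sbids involved. A small grep/awk sketch over a saved fsck log (the log path matches the `-l` argument in the command above; adjust as needed):

```shell
# Extract the distinct sbids from "stray shared blob" fsck errors,
# write them to a file, and print how many there are.
grep -o 'stray shared blob data for sbid 0x[0-9a-f]*' /root/fsck-osd.0.log \
  | awk '{print $NF}' | sort -u > stray_sbids.txt
wc -l < stray_sbids.txt
```

Comparing the sbid lists between fsck runs (before and after a crash) would show whether the set of stray shared blobs is growing.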
> Did you have any luck with upgrading to 12.2.8, or is it still the same behavior?
> Is there a bug tracker for this issue?
>
> Kind regards,
> Caspar
>
> Op di 4 sep. 2018 om 09:59 schreef Wolfgang Lendl
> <wolfgang.le...@meduniwien.ac.at>:
>
> is downgrading from 12.2.7 to 12.2.5 an option? - I'm still suffering
> from highly frequent osd crashes.
> my hopes are with 12.2.9 - but hope wasn't always my best strategy
>
> br
> wolfgang
>
> On 2018-08-30 19:18, Alfredo Deza wrote:
> > On Thu, Aug 30, 2018 at 5:24 AM, Wolfgang Lendl
> > <wolfgang.le...@meduniwien.ac.at> wrote:
> >> Hi Alfredo,
> >>
> >> caught some logs:
> >> https://pastebin.com/b3URiA7p
> > That looks like there is an issue with bluestore. Maybe Radoslaw or
> > Adam might know a bit more.
> >
> >> br
> >> wolfgang
> >>
> >> On 2018-08-29 15:51, Alfredo Deza wrote:
> >>> On Wed, Aug 29, 2018 at 2:06 AM, Wolfgang Lendl
> >>> <wolfgang.le...@meduniwien.ac.at> wrote:
> >>>> Hi,
> >>>>
> >>>> after upgrading my ceph clusters from 12.2.5 to 12.2.7 I'm
> >>>> experiencing random crashes from SSD OSDs (bluestore) - it seems
> >>>> that HDD OSDs are not affected.
> >>>> I destroyed and recreated some of the SSD OSDs, which seemed to help.
> >>>>
> >>>> this happens on centos 7.5 (different kernels tested)
> >>>>
> >>>> /var/log/messages:
> >>>> Aug 29 10:24:08 ceph-osd: *** Caught signal (Segmentation fault) **
> >>>> Aug 29 10:24:08 ceph-osd: in thread 7f8a8e69e700 thread_name:bstore_kv_final
> >>>> Aug 29 10:24:08 kernel: traps: bstore_kv_final[187470] general protection ip:7f8a997cf42b sp:7f8a8e69abc0 error:0 in libtcmalloc.so.4.4.5[7f8a997a8000+46000]
> >>>> Aug 29 10:24:08 systemd: ceph-osd@2.service: main process exited, code=killed, status=11/SEGV
> >>>> Aug 29 10:24:08 systemd: Unit ceph-osd@2.service entered failed state.
> >>>> Aug 29 10:24:08 systemd: ceph-osd@2.service failed.
> >>>> Aug 29 10:24:28 systemd: ceph-osd@2.service holdoff time over, scheduling restart.
> >>>> Aug 29 10:24:28 systemd: Starting Ceph object storage daemon osd.2...
> >>>> Aug 29 10:24:28 systemd: Started Ceph object storage daemon osd.2.
> >>>> Aug 29 10:24:28 ceph-osd: starting osd.2 at - osd_data /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
> >>>> Aug 29 10:24:35 ceph-osd: *** Caught signal (Segmentation fault) **
> >>>> Aug 29 10:24:35 ceph-osd: in thread 7f5f1e790700 thread_name:tp_osd_tp
> >>>> Aug 29 10:24:35 kernel: traps: tp_osd_tp[186933] general protection ip:7f5f43103e63 sp:7f5f1e78a1c8 error:0 in libtcmalloc.so.4.4.5[7f5f430cd000+46000]
> >>>> Aug 29 10:24:35 systemd: ceph-osd@0.service: main process exited, code=killed, status=11/SEGV
> >>>> Aug 29 10:24:35 systemd: Unit ceph-osd@0.service entered failed state.
> >>>> Aug 29 10:24:35 systemd: ceph-osd@0.service failed
> >>> These systemd messages aren't usually helpful; try poking around
> >>> /var/log/ceph/ for the output of that one OSD.
> >>>
> >>> If those logs aren't useful either, try bumping up the verbosity (see
> >>> http://docs.ceph.com/docs/master/rados/troubleshooting/log-and-debug/#boot-time
> >>> )
> >>>> did I hit a known issue?
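For the verbosity bump Alfredo and Radoslaw suggest (debug_bluestore=20, debug_bdev=20), a ceph.conf sketch for the affected OSD hosts could look like the fragment below. This is only an illustration of the advice in the thread, not a tested configuration - section placement and exact values should be checked against the log-and-debug documentation linked above, and these levels produce very large logs, so they should be reverted once the crash is captured:

```ini
# Sketch: raise BlueStore/bdev logging so a crash is captured with context.
# Restart the OSD (or use "ceph tell osd.N injectargs") for it to take effect.
[osd]
debug bluestore = 20
debug bdev = 20
```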
> >>>> any suggestions are highly appreciated
> >>>>
> >>>> br
> >>>> wolfgang
> >>>>
> >>>> _______________________________________________
> >>>> ceph-users mailing list
> >>>> ceph-users@lists.ceph.com
> >>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Wolfgang Lendl
IT Systems & Communications
Medizinische Universität Wien
Spitalgasse 23 / BT 88 / Ebene 00
A-1090 Wien
Tel: +43 1 40160-21231
Fax: +43 1 40160-921200