[ceph-users] Re: upgraded to cluster to 16.2.6 PACIFIC
> IIRC you get a HEALTH_WARN message that there are OSDs with old metadata
> format. You can suppress that warning, but I guess operators feel like
> they want to deal with the situation and get it fixed rather than ignore it.

Yes, and if suppressing the warning gets forgotten, you run into other issues down the road.
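(For context, a minimal sketch of how that warning could be inspected and temporarily muted instead of acted on right away; the health code shown is the legacy-omap one from Octopus and may differ on Pacific, so treat it as an assumption and use whatever `ceph health detail` actually reports:)

# list the current warnings and their codes
ceph health detail

# mute the legacy-omap warning for four weeks rather than fsck'ing immediately
# (code is an assumption; substitute the one reported above)
ceph health mute BLUESTORE_NO_PER_POOL_OMAP 4w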
[ceph-users] Re: upgraded to cluster to 16.2.6 PACIFIC
On Tue, 9 Nov 2021 at 11:08, Dan van der Ster wrote:
>
> Hi Ansgar,
>
> To clarify the messaging or docs, could you say where you learned that
> you should enable the bluestore_fsck_quick_fix_on_mount setting? Is
> that documented somewhere, or did you have it enabled from previously?
> The default is false so the corruption only occurs when users actively
> choose to fsck.

I have upgraded another cluster in the past with no issues as of today, so I just followed my own instructions for this cluster.

> As to recovery, Igor wrote the low level details here:
> https://www.spinics.net/lists/ceph-users/msg69338.html
> How did you resolve the omap issues in your rgw.index pool? What type
> of issues remain in meta and log?

For the index pool we ran this script: https://paste.openstack.org/show/810861/
It adds an omap key and triggers a repair, but it does not work for the meta pool.

My next best option is to stop the radosgw and create a new pool with the same data, like:

pool=default.rgw.meta

ceph osd pool create $pool.new 64 64
ceph osd pool application enable $pool.new rgw

# copy data
rados -p $pool export /tmp/$pool.img
rados -p $pool.new import /tmp/$pool.img

# swap pools
ceph osd pool rename $pool $pool.old
ceph osd pool rename $pool.new $pool

rm -f /tmp/$pool.img
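(A minimal sanity check one might add between the import and the renames, while the radosgw daemons are still stopped; this is only an illustrative addition, not part of the procedure above:)

# eyeball per-pool object counts before renaming anything
rados df

# or compare the source pool and the copy directly
rados -p $pool ls | wc -l
rados -p $pool.new ls | wc -l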
[ceph-users] Re: upgraded to cluster to 16.2.6 PACIFIC
On Tue, Nov 9, 2021 at 11:29 AM Stefan Kooman wrote:
>
> On 11/9/21 11:07, Dan van der Ster wrote:
> > Hi Ansgar,
> >
> > To clarify the messaging or docs, could you say where you learned that
> > you should enable the bluestore_fsck_quick_fix_on_mount setting? Is
> > that documented somewhere, or did you have it enabled from previously?
> > The default is false so the corruption only occurs when users actively
> > choose to fsck.
>
> IIRC you get a HEALTH_WARN message that there are OSDs with old metadata
> format. You can suppress that warning, but I guess operators feel like
> they want to deal with the situation and get it fixed rather than ignore it.

That's exactly the point I wanted to clarify. I know that this HEALTH_WARN was presented for N->O, but am unsure for Pacific. (It clearly wouldn't be consistent for Ceph to disable quick_fsck by default, but then issue a HEALTH_WARN just after the upgrade.)

Cheers, Dan

> Gr. Stefan
[ceph-users] Re: upgraded to cluster to 16.2.6 PACIFIC
Hi Ansgar,

I've submitted the following PR to recover broken OMAPs: https://github.com/ceph/ceph/pull/43820

One needs a custom build to use it for now, and it might be a bit risky to apply since it hasn't passed all the QA procedures... So far I'm aware of one success story and one failure story from its application. Perhaps it would be a good idea to make a DB volume backup (via either raw disk copy or bluefs export) prior to trying it.

Thanks, Igor

On 11/9/2021 9:35 AM, Ansgar Jazdzewski wrote:
> Hi fellow ceph users,
>
> I did an upgrade from 14.2.23 to 16.2.6, not knowing that the current
> minor version had this nasty bug! [1] [2]
>
> We were able to resolve some of the omap issues in the rgw.index pool,
> but still have 17 PGs to fix in the rgw.meta and rgw.log pools!
>
> I have a couple of questions:
> - has anyone written a script to fix those PGs? We were only able to
>   fix the index with our approach [3]
> - why is the 16.2.6 version still in the public mirror (should it not be moved)?
> - do you have any other workarounds to resolve this?
>
> Thanks for your help!
> Ansgar
>
> 1) https://docs.ceph.com/en/latest/releases/pacific/
> 2) https://tracker.ceph.com/issues/53062
> 3) https://paste.openstack.org/show/810861

--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
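(A rough sketch of what the bluefs-export variant of such a backup could look like for a single OSD, assuming a non-containerized deployment; the OSD id, backup directory, and DB device name are placeholders, and the OSD must be stopped first:)

# stop the OSD so the DB is quiescent (id is a placeholder)
systemctl stop ceph-osd@12

# export the BlueFS contents (the RocksDB files) to a backup directory
ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-12 --out-dir /root/osd12-bluefs-backup

# alternatively, a raw copy of a dedicated DB device, if one exists (device path is a placeholder)
dd if=/dev/vg-db/osd12-db of=/root/osd12-db.raw bs=4M status=progress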
[ceph-users] Re: upgraded to cluster to 16.2.6 PACIFIC
Hi Ansgar,

To clarify the messaging or docs, could you say where you learned that you should enable the bluestore_fsck_quick_fix_on_mount setting? Is that documented somewhere, or did you have it enabled from previously? The default is false, so the corruption only occurs when users actively choose to fsck.

As to recovery, Igor wrote the low-level details here: https://www.spinics.net/lists/ceph-users/msg69338.html

How did you resolve the omap issues in your rgw.index pool? What type of issues remain in meta and log?

Cheers, Dan

On Tue, Nov 9, 2021 at 7:36 AM Ansgar Jazdzewski wrote:
>
> Hi fellow ceph users,
>
> I did an upgrade from 14.2.23 to 16.2.6, not knowing that the current
> minor version had this nasty bug! [1] [2]
>
> We were able to resolve some of the omap issues in the rgw.index pool,
> but still have 17 PGs to fix in the rgw.meta and rgw.log pools!
>
> I have a couple of questions:
> - has anyone written a script to fix those PGs? We were only able to
>   fix the index with our approach [3]
> - why is the 16.2.6 version still in the public mirror (should it not be moved)?
> - do you have any other workarounds to resolve this?
>
> Thanks for your help!
> Ansgar
>
> 1) https://docs.ceph.com/en/latest/releases/pacific/
> 2) https://tracker.ceph.com/issues/53062
> 3) https://paste.openstack.org/show/810861
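(For reference, a minimal sketch of how the current value of that setting could be checked, both in the cluster configuration database and on a running daemon; osd.0 is just an example daemon name:)

# value stored in the cluster configuration database (default is false)
ceph config get osd bluestore_fsck_quick_fix_on_mount

# effective value on a running OSD, which also reflects local ceph.conf overrides
ceph config show osd.0 bluestore_fsck_quick_fix_on_mount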
[ceph-users] Re: upgraded to cluster to 16.2.6 PACIFIC
> > I did an upgrade from 14.2.23 to 16.2.6 not knowing that the current
> > minor version had this nasty bug! [1] [2]
>
> I'm sorry you hit this bug. We tried to warn users through
> documentation. Apparently this is not enough and other ways of informing
> operators about such (rare) incidents might be worthwhile. I have put
> this topic on the agenda for the upcoming user + dev monthly meeting
> [1].

Isn't this to be expected? If you aim to get a broader audience to use Ceph and build tools like cephadm for just clicking next-next-next-done, you cannot also expect that audience to read anything; the two are mutually exclusive. So you are left only with not allowing the update to be done via such tools.