[ceph-users] Re: upgraded to cluster to 16.2.6 PACIFIC

2021-11-09 Thread Ansgar Jazdzewski
> IIRC you get a HEALTH_WARN message that there are OSDs with old metadata
> format. You can suppress that warning, but I guess operators feel like
> they want to deal with the situation and get it fixed rather than ignore it.

Yes, and if the suppressed warning gets forgotten, you run into other
issues down the road.
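
(For reference, muting it is a one-liner, which is exactly why it is easy to
forget -- a rough sketch, assuming the health code is
BLUESTORE_NO_PER_POOL_OMAP; check ceph health detail for the exact code and
the matching bluestore_warn_on_* option on your release:)

# see which code the cluster actually reports
ceph health detail

# mute it temporarily, or disable the check via config
ceph health mute BLUESTORE_NO_PER_POOL_OMAP 4w
ceph config set osd bluestore_warn_on_no_per_pool_omap false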


[ceph-users] Re: upgraded to cluster to 16.2.6 PACIFIC

2021-11-09 Thread Ansgar Jazdzewski
On Tue., 9 Nov 2021 at 11:08, Dan van der Ster wrote:
>
> Hi Ansgar,
>
> To clarify the messaging or docs, could you say where you learned that
> you should enable the bluestore_fsck_quick_fix_on_mount setting? Is
> that documented somewhere, or did you have it enabled from previously?
> The default is false so the corruption only occurs when users actively
> choose to fsck.

I had upgraded another cluster in the past, with no issues as of
today, so I just followed my own instructions for this cluster.

> As to recovery, Igor wrote the low level details here:
> https://www.spinics.net/lists/ceph-users/msg69338.html
> How did you resolve the omap issues in your rgw.index pool? What type
> of issues remain in meta and log?

For the index pool we ran this script:
https://paste.openstack.org/show/810861/ -- it adds an omap key and
triggers a repair, but it does not work for the meta pool.
My next best option is to stop the radosgw and create a new pool with
the same data, like:

pool=default.rgw.meta
ceph osd pool create $pool.new 64 64
ceph osd pool application enable $pool.new rgw

# copy data
rados -p $pool export /tmp/$pool.img
rados -p $pool.new import /tmp/$pool.img

# swap pools
ceph osd pool rename $pool $pool.old
ceph osd pool rename $pool.new $pool

rm -f /tmp/$pool.img
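
(Before swapping the pools I would also do a rough sanity check that the copy
is complete, e.g. by comparing object counts while the radosgw is still
stopped so nothing writes to the old pool:)

rados -p $pool ls | wc -l
rados -p $pool.new ls | wc -l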


[ceph-users] Re: upgraded to cluster to 16.2.6 PACIFIC

2021-11-09 Thread Dan van der Ster
On Tue, Nov 9, 2021 at 11:29 AM Stefan Kooman  wrote:
>
> On 11/9/21 11:07, Dan van der Ster wrote:
> > Hi Ansgar,
> >
> > To clarify the messaging or docs, could you say where you learned that
> > you should enable the bluestore_fsck_quick_fix_on_mount setting? Is
> > that documented somewhere, or did you have it enabled from previously?
> > The default is false so the corruption only occurs when users actively
> > choose to fsck.
>
> IIRC you get a HEALTH_WARN message that there are OSDs with old metadata
> format. You can suppress that warning, but I guess operators feel like
> they want to deal with the situation and get it fixed rather than ignore it.

That's exactly the point I wanted to clarify.
I know that this HEALTH_WARN was presented for the Nautilus->Octopus upgrade, but I am unsure for Pacific.

(It clearly wouldn't be consistent for ceph to disable quick_fsck by
default, but then issue a HEALTH_WARN just after the upgrade).

Cheers, Dan

>
> Gr. Stefan


[ceph-users] Re: upgraded to cluster to 16.2.6 PACIFIC

2021-11-09 Thread Igor Fedotov

Hi Ansgar,

I've submitted the following PR to recover broken OMAPs: 
https://github.com/ceph/ceph/pull/43820


One needs a custom build to use it for now, though. And it might be a
bit risky to apply since it hasn't passed all the QA procedures...

For now I'm aware of one success story and one failure story from its
application. Perhaps it would be a good idea to make a DB volume backup
(via either a raw disk copy or a bluefs export) prior to trying it.
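
A rough sketch of the bluefs export variant (the OSD id and backup target are
placeholders; the OSD must be stopped first):

systemctl stop ceph-osd@<id>
ceph-bluestore-tool bluefs-export \
  --path /var/lib/ceph/osd/ceph-<id> \
  --out-dir /root/osd-<id>-bluefs-backup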


Thanks,

Igor

On 11/9/2021 9:35 AM, Ansgar Jazdzewski wrote:

Hi fellow ceph users,

I did an upgrade from 14.2.23 to 16.2.6 not knowing that the current
minor version had this nasty bug! [1] [2]

We were able to resolve some of the omap issues in the rgw.index pool,
but still have 17 PGs to fix in the rgw.meta and rgw.log pools!

I have a couple of questions:
- Has someone written a script to fix those PGs? We were only able to
fix the index with our approach [3].
- Why is the 16.2.6 version still in the public mirror (should it not be moved)?
- Do you have any other workarounds to resolve this?

thanks for your help!
Ansgar

1) https://docs.ceph.com/en/latest/releases/pacific/
2) https://tracker.ceph.com/issues/53062
3) https://paste.openstack.org/show/810861


--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx



[ceph-users] Re: upgraded to cluster to 16.2.6 PACIFIC

2021-11-09 Thread Dan van der Ster
Hi Ansgar,

To clarify the messaging or docs, could you say where you learned that
you should enable the bluestore_fsck_quick_fix_on_mount setting? Is
that documented somewhere, or did you have it enabled from previously?
The default is false so the corruption only occurs when users actively
choose to fsck.
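
(For anyone else reading along: a quick way to check whether it is set, and to
turn it off before restarting OSDs -- osd.0 is just an example daemon, and a
local ceph.conf override would need to be removed separately:)

ceph config show osd.0 bluestore_fsck_quick_fix_on_mount
ceph config set osd bluestore_fsck_quick_fix_on_mount false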

As to recovery, Igor wrote the low level details here:
https://www.spinics.net/lists/ceph-users/msg69338.html
How did you resolve the omap issues in your rgw.index pool? What type
of issues remain in meta and log?
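
(In case it helps others hitting the same thing: assuming the damage shows up
as scrub errors, the affected PGs and objects can be listed roughly like this,
where the PG id is a placeholder:)

ceph health detail
ceph pg ls inconsistent
rados list-inconsistent-obj <pgid> --format=json-pretty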

Cheers, Dan


On Tue, Nov 9, 2021 at 7:36 AM Ansgar Jazdzewski wrote:
>
> Hi fellow ceph users,
>
> I did an upgrade from 14.2.23 to 16.2.6 not knowing that the current
> minor version had this nasty bug! [1] [2]
>
> We were able to resolve some of the omap issues in the rgw.index pool,
> but still have 17 PGs to fix in the rgw.meta and rgw.log pools!
>
> I have a couple of questions:
> - Has someone written a script to fix those PGs? We were only able to
> fix the index with our approach [3].
> - Why is the 16.2.6 version still in the public mirror (should it not be moved)?
> - Do you have any other workarounds to resolve this?
>
> thanks for your help!
> Ansgar
>
> 1) https://docs.ceph.com/en/latest/releases/pacific/
> 2) https://tracker.ceph.com/issues/53062
> 3) https://paste.openstack.org/show/810861


[ceph-users] Re: upgraded to cluster to 16.2.6 PACIFIC

2021-11-09 Thread Marc
> > I did an upgrade from 14.2.23 to 16.2.6 not knowing that the current
> > minor version had this nasty bug! [1] [2]
> 
> I'm sorry you hit this bug. We tried to warn users through
> documentation. Apparently this is not enough and other ways of informing
> operators about such (rare) incidents might be worthwhile. I have put
> this topic on the agenda for the upcoming user + dev monthly meeting
> [1].

This is to be expected, is it not? If you aim to get a broader audience to use ceph
and start making tools like cephadm for just clicking next-next-next-done, you cannot
then also expect that this audience reads anything. I think these are mutually
exclusive behaviours. So you are left only with not allowing the update to be done
via such tools.



 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io