[ceph-users] Re: reef 18.2.3 QE validation status

2024-04-12 Thread Casey Bodley
On Fri, Apr 12, 2024 at 2:38 PM Yuri Weinstein wrote: > > Details of this release are summarized here: > > https://tracker.ceph.com/issues/65393#note-1 > Release Notes - TBD > LRC upgrade - TBD > > Seeking approvals/reviews for: > > smoke - infra issues, still trying, Laura PTL > > rados - Radek,

[ceph-users] Re: reef 18.2.3 QE validation status

2024-04-12 Thread Ilya Dryomov
On Fri, Apr 12, 2024 at 8:38 PM Yuri Weinstein wrote: > > Details of this release are summarized here: > > https://tracker.ceph.com/issues/65393#note-1 > Release Notes - TBD > LRC upgrade - TBD > > Seeking approvals/reviews for: > > smoke - infra issues, still trying, Laura PTL > > rados - Radek,

[ceph-users] Re: Setting S3 bucket policies with multi-tenants

2024-04-12 Thread Wesley Dillingham
Did you actually get this working? I am trying to replicate your steps but have not been successful with multi-tenant. Respectfully, *Wes Dillingham* LinkedIn w...@wesdillingham.com On Wed, Nov 1, 2023 at 12:52 PM Thomas Bennett wrote:

[ceph-users] reef 18.2.3 QE validation status

2024-04-12 Thread Yuri Weinstein
Details of this release are summarized here:
https://tracker.ceph.com/issues/65393#note-1
Release Notes - TBD
LRC upgrade - TBD

Seeking approvals/reviews for:

smoke - infra issues, still trying, Laura PTL

rados - Radek, Laura approved? Travis? Nizamudeen?

rgw - Casey approved?

fs - Venky

[ceph-users] Re: PG inconsistent

2024-04-12 Thread Anthony D'Atri
If you're using an Icinga active check that just looks for "SMART overall-health self-assessment test result: PASSED", then it's not doing much for you. That bivalue status can be shown for a drive that is decidedly an ex-parrot. Gotta look at specific attributes, which is thorny since they
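
As a minimal sketch of what looking at specific attributes can mean, assuming a SATA drive at /dev/sda (the device name is illustrative):

  # dump the vendor attribute table and pull out the usual failure indicators
  smartctl -A /dev/sda | egrep 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count'

Non-zero raw values for the first three are a far stronger signal of a dying drive than the overall PASSED verdict.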

[ceph-users] Re: PG inconsistent

2024-04-12 Thread Frédéric Nass
- On 12 Apr 24, at 15:17, Albert Shih albert.s...@obspm.fr wrote: > On 12/04/2024 at 12:56:12+0200, Frédéric Nass wrote >> > Hi, > >> >> Have you checked the hardware status of the involved drives other than with >> smartctl? Like with the manufacturer's tools / WebUI (iDrac / perccli

[ceph-users] Re: Impact of large PG splits

2024-04-12 Thread Anthony D'Atri
One can up the ratios temporarily but it's all too easy to forget to reduce them later, or think that it's okay to run all the time with reduced headroom. Until a host blows up and you don't have enough space to recover into. > On Apr 12, 2024, at 05:01, Frédéric Nass > wrote: > > > Oh, and
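
For context, a sketch of what "upping the ratios temporarily" looks like in practice (the values below are illustrative, not recommendations):

  # show the current nearfull / backfillfull / full ratios
  ceph osd dump | grep -i ratio
  # raise them temporarily so recovery can proceed
  ceph osd set-nearfull-ratio 0.88
  ceph osd set-backfillfull-ratio 0.92
  # and remember to set them back once the cluster has headroom again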

[ceph-users] Re: PG inconsistent

2024-04-12 Thread Albert Shih
On 12/04/2024 at 12:56:12+0200, Frédéric Nass wrote > Hi, > > Have you checked the hardware status of the involved drives other than with > smartctl? Like with the manufacturer's tools / WebUI (iDrac / perccli for > DELL hardware for example). Yes, all my disks are «under» periodic check with

[ceph-users] Re: Impact of large PG splits

2024-04-12 Thread Eugen Block
Thanks for chiming in. They are on version 16.2.13 (I was already made aware of the bug you mentioned, thanks!) with wpq. So far I haven't gotten an emergency call, so I assume everything is calm (I hope). New hardware has been ordered but it will take a couple of weeks until it's delivered,

[ceph-users] Re: PG inconsistent

2024-04-12 Thread Wesley Dillingham
Check your ceph.log on the mons for "stat mismatch" and grep for the PG in question for potentially more information. Additionally, "rados list-inconsistent-obj {pgid}" will often show which OSD and objects are implicated in the inconsistency. If the acting set has changed since the scrub (for
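
A sketch of those two steps, assuming a hypothetical pgid of 8.1f and the default cluster log location on a mon host:

  # find the scrub error for this PG in the cluster log
  grep 'stat mismatch' /var/log/ceph/ceph.log | grep 8.1f
  # list the objects and shards flagged as inconsistent, with error details
  rados list-inconsistent-obj 8.1f --format=json-pretty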

[ceph-users] Re: PG inconsistent

2024-04-12 Thread Frédéric Nass
Hello Albert, Have you checked the hardware status of the involved drives other than with smartctl? Like with the manufacturer's tools / WebUI (iDrac / perccli for DELL hardware for example). If these tools don't report any media errors (that is, bad blocks on disks) then you might just be facing
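
As an illustration, assuming a DELL PERC controller enumerated as /c0 (the controller id is an assumption), the per-drive error counters can be pulled with something like:

  # show all physical drives on controller 0 and filter for the error counters
  perccli64 /c0/eall/sall show all | egrep -i 'media error|predictive failure|other error'

If the Media Error and Predictive Failure counts stay at 0, a failing disk becomes less likely as the cause of the inconsistency.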

[ceph-users] PG inconsistent

2024-04-12 Thread Albert Shih
Hi everyone. I got a warning with

root@cthulhu1:/etc/ceph# ceph -s
  cluster:
    id:     9c5bb196-c212-11ee-84f3-c3f2beae892d
    health: HEALTH_ERR
            1 scrub errors
            Possible data damage: 1 pg inconsistent

So I found the PG with the issue and launched a pg repair (still
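
The usual sequence for this situation (pool and PG ids below are placeholders) is roughly:

  # identify the inconsistent PG
  ceph health detail
  rados list-inconsistent-pg <poolname>
  # then ask the primary OSD to repair it
  ceph pg repair <pgid>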

[ceph-users] Re: Impact of large PG splits

2024-04-12 Thread Frédéric Nass
Oh, and yeah, considering "The fullest OSD is already at 85% usage", the best move for now would be to add new hardware/OSDs (to avoid reaching the backfillfull limit) prior to starting the PG splits, before or after enabling the upmap balancer depending on how the PGs got rebalanced (well enough
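
A minimal sketch of enabling the upmap balancer mentioned here, assuming all clients are at least Luminous:

  ceph osd set-require-min-compat-client luminous
  ceph balancer mode upmap
  ceph balancer on
  ceph balancer status   # check progress and remaining misplacement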

[ceph-users] Re: Impact of large PG splits

2024-04-12 Thread Frédéric Nass
Hello Eugen, Is this cluster using the WPQ or mClock scheduler? (cephadm shell ceph daemon osd.0 config show | grep osd_op_queue) If WPQ, you might want to tune osd_recovery_sleep* values as they have a real impact on the recovery/backfilling speed. Just lower osd_max_backfills to 1 before
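
As a sketch of that tuning on a WPQ cluster (values are illustrative; defaults vary by release and device class):

  # confirm the scheduler in use
  cephadm shell ceph daemon osd.0 config show | grep osd_op_queue
  # limit concurrent backfills, then adjust the recovery sleep for HDD OSDs
  # (lower sleep = faster recovery, higher sleep = gentler on client I/O)
  ceph config set osd osd_max_backfills 1
  ceph config set osd osd_recovery_sleep_hdd 0.1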