[ceph-users] Re: squid 19.1.0 RC QE validation status

2024-07-09 Thread Neha Ojha
On Tue, Jul 9, 2024 at 11:17 AM Yuri Weinstein  wrote:

> Neha, and Josh pls do a final review and approval.
>
> Pls confirm that the Gibba/LRC upgrade is out of scope for this
>

Gibba was upgraded successfully, and the LRC upgrade is out of scope for this RC.


>
> We will still need to promote files to the release location and build
> production containers.
>

Let's finish the final step!

Thanks,
Neha


>
> On Wed, Jul 3, 2024 at 12:53 PM Yuri Weinstein 
> wrote:
> >
> > Dev leads concluded that we have reached the approval level needed for
> > the RC0 and will start the release process
> >
> > On Wed, Jul 3, 2024 at 10:11 AM Guillaume ABRIOUX 
> wrote:
> > >
> > > Hi Yuri,
> > >
> > > ceph-volume approved -
> https://pulpito.ceph.com/gabrioux-2024-07-03_14:50:01-orch:cephadm-squid-release-distro-default-smithi/
> > > (the few failures are known issues)
> > >
> > > Thanks!
> > >
> > > --
> > > Guillaume Abrioux
> > > Software Engineer
> > > 
> > > From: Yuri Weinstein 
> > > Sent: 01 July 2024 16:22
> > > To: dev ; ceph-users 
> > > Subject: [EXTERNAL] [ceph-users] squid 19.1.0 RC QE validation status
> > >
> > > Details of this release are summarized here:
> > >
> > > https://tracker.ceph.com/issues/66756#note-1
> > >
> > > Release Notes - TBD
> > > LRC upgrade - TBD
> > >
> > > (Reruns were not done yet.)
> > >
> > > Seeking approvals/reviews for:
> > >
> > > smoke
> > > rados - Radek, Laura
> > > rgw- Casey
> > > fs - Venky
> > > orch - Adam King
> > > rbd, krbd - Ilya
> > > quincy-x, reef-x - Laura, Neha
> > > powercycle - Brad
> > > perf-basic - Yaarit, Laura
> > > crimson-rados - Samuel
> > > ceph-volume - Guillaume
> > >
> > > Pls let me know if any tests were missed from this list.
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > Unless otherwise stated above:
> > >
> > > Compagnie IBM France
> > > Siège Social : 17, avenue de l'Europe, 92275 Bois-Colombes Cedex
> > > RCS Nanterre 552 118 465
> > > Forme Sociale : S.A.S.
> > > Capital Social : 664 069 390,60 €
> > > SIRET : 552 118 465 03644 - Code NAF 6203Z
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: squid 19.1.0 RC QE validation status

2024-07-09 Thread Yuri Weinstein
Neha, and Josh pls do a final review and approval.

Pls confirm that the Gibba/LRC upgrade is out of scope for this

We will still need to promote files to the release location and build
production containers.

On Wed, Jul 3, 2024 at 12:53 PM Yuri Weinstein  wrote:
>
> Dev leads concluded that we have reached the approval level needed for
> the RC0 and will start the release process
>
> On Wed, Jul 3, 2024 at 10:11 AM Guillaume ABRIOUX  wrote:
> >
> > Hi Yuri,
> >
> > ceph-volume approved - 
> > https://pulpito.ceph.com/gabrioux-2024-07-03_14:50:01-orch:cephadm-squid-release-distro-default-smithi/
> > (the few failures are known issues)
> >
> > Thanks!
> >
> > --
> > Guillaume Abrioux
> > Software Engineer
> > 
> > From: Yuri Weinstein 
> > Sent: 01 July 2024 16:22
> > To: dev ; ceph-users 
> > Subject: [EXTERNAL] [ceph-users] squid 19.1.0 RC QE validation status
> >
> > Details of this release are summarized here:
> >
> > https://tracker.ceph.com/issues/66756#note-1
> >
> > Release Notes - TBD
> > LRC upgrade - TBD
> >
> > (Reruns were not done yet.)
> >
> > Seeking approvals/reviews for:
> >
> > smoke
> > rados - Radek, Laura
> > rgw- Casey
> > fs - Venky
> > orch - Adam King
> > rbd, krbd - Ilya
> > quincy-x, reef-x - Laura, Neha
> > powercycle - Brad
> > perf-basic - Yaarit, Laura
> > crimson-rados - Samuel
> > ceph-volume - Guillaume
> >
> > Pls let me know if any tests were missed from this list.
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> > Unless otherwise stated above:
> >
> > Compagnie IBM France
> > Siège Social : 17, avenue de l'Europe, 92275 Bois-Colombes Cedex
> > RCS Nanterre 552 118 465
> > Forme Sociale : S.A.S.
> > Capital Social : 664 069 390,60 €
> > SIRET : 552 118 465 03644 - Code NAF 6203Z
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Large omap in index pool even if properly sharded and not "OVER"

2024-07-09 Thread Szabo, Istvan (Agoda)
Hi Casey,

1.
Regarding versioning, the user doesn't use versioning, if I'm not mistaken:
https://gist.githubusercontent.com/Badb0yBadb0y/d80c1bdb8609088970413969826d2b7d/raw/baee46865178fff454c224040525b55b54e27218/gistfile1.txt

2.
Regarding multiparts, if it had multipart trash, it would be listed
here:
https://gist.githubusercontent.com/Badb0yBadb0y/d80c1bdb8609088970413969826d2b7d/raw/baee46865178fff454c224040525b55b54e27218/gistfile1.txt
as rgw.multimeta under the usage, right?

3.
Regarding the multisite idea, this bucket was a multisite bucket last year,
but we had to reshard (accepting to lose the replica on the 2nd site and just
keep it in the master site), and at that time, as expected, it disappeared
completely from the 2nd site (I guess the 40TB of trash is still there, but I can't
really find how to clean it). Now it is a single-site bucket.
Also, this is the index pool; multisite logs should go to the rgw.log pool,
shouldn't they?
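
A rough way I could try to test both theories directly on one of the big shards (a
sketch only; the index pool name is a placeholder, and I'm assuming in-progress
multipart entries show up in the index keys with a _multipart_ prefix):

# eyeball the key names on the hottest shard
rados -p <index pool> listomapkeys .dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.151 | head -50

# count how many of the ~290k keys look like in-flight multipart entries
rados -p <index pool> listomapkeys .dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.151 | grep -c '^_multipart_'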


Thank you


From: Casey Bodley 
Sent: Tuesday, July 9, 2024 10:39 PM
To: Szabo, Istvan (Agoda) 
Cc: Eugen Block ; ceph-users@ceph.io 
Subject: Re: [ceph-users] Re: Large omap in index pool even if properly sharded 
and not "OVER"

Email received from the internet. If in doubt, don't click any link nor open 
any attachment !


in general, these omap entries should be evenly spread over the
bucket's index shard objects. but there are two features that may
cause entries to clump on a single shard:

1. for versioned buckets, multiple versions of the same object name
map to the same index shard. this can become an issue if an
application is repeatedly overwriting an object without cleaning up
old versions. lifecycle rules can help to manage these noncurrent
versions

2. during a multipart upload, all of the parts are tracked on the same
index shard as the final object name. if applications are leaving a
lot of incomplete multipart uploads behind (especially if they target
the same object name) this can lead to similar clumping. the S3 api
has operations to list and abort incomplete multipart uploads, along
with lifecycle rules to automate their cleanup

separately, multisite clusters use these same index shards to store
replication logs. if sync gets far enough behind, these log entries
can also lead to large omap warnings

On Tue, Jul 9, 2024 at 10:25 AM Szabo, Istvan (Agoda)
 wrote:
>
> It's the same bucket:
> https://gist.github.com/Badb0yBadb0y/d80c1bdb8609088970413969826d2b7d
>
>
> 
> From: Eugen Block 
> Sent: Tuesday, July 9, 2024 8:03 PM
> To: Szabo, Istvan (Agoda) 
> Cc: ceph-users@ceph.io 
> Subject: Re: [ceph-users] Re: Large omap in index pool even if properly 
> sharded and not "OVER"
>
> Email received from the internet. If in doubt, don't click any link nor open 
> any attachment !
> 
>
> Are those three different buckets? Could you share the stats for each of them?
>
> radosgw-admin bucket stats --bucket=
>
> Zitat von "Szabo, Istvan (Agoda)" :
>
> > Hello,
> >
> > Yeah, still:
> >
> > the .dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.151 | wc -l
> > 290005
> >
> > and the
> > .dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.726 | wc -l
> > 289378
> >
> > And just make me happy more I have one more
> > .dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.6 | wc -l
> > 181588
> >
> > This is my crush tree (I'm using host based crush rule)
> > https://gist.githubusercontent.com/Badb0yBadb0y/9bea911701184a51575619bc99cca94d/raw/e5e4a918d327769bb874aaed279a8428fd7150d5/gistfile1.txt
> >
> > I'm thinking could that be the issue that host 2s13-15 has less nvme
> > osd (however size wise same as in the other 12 host where have 8x
> > nvme osd) than the others?
> > But the pgs are located like this:
> >
> > pg26.427
> > osd.261 host8
> > osd.488 host13
> > osd.276 host4
> >
> > pg26.606
> > osd.443 host12
> > osd.197 host8
> > osd.524 host14
> >
> > pg26.78c
> > osd.89 host7
> > osd.406 host11
> > osd.254 host6
> >
> > If pg26.78c wouldn't be here I'd say 100% the nvme osd distribution
> > based on host is the issue, however this pg is not located on any of
> > the 4x nvme osd nodes 
> >
> > Ty
> >
> > 
> > From: Eugen Block 
> > Sent: Tuesday, July 9, 2024 6:02 PM
> > To: ceph-users@ceph.io 
> > Subject: [ceph-users] Re: Large omap in index pool even if properly
> > sharded and not "OVER"
> >
> > Email received from the internet. If in doubt, don't click any link
> > nor open any attachment !
> > 
> >
> > Hi,
> >
> > the number of shards looks fine, maybe this was just a temporary
> > burst? Did you check if the rados objects in the index pool still have
> > more than 200k omap objects? I would try something like
> >
> > rados -p  listomapkeys
> > .dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.151 | wc -l
> >
> >
> > Zitat von "Szabo, 

[ceph-users] Re: Large omap in index pool even if properly sharded and not "OVER"

2024-07-09 Thread Casey Bodley
in general, these omap entries should be evenly spread over the
bucket's index shard objects. but there are two features that may
cause entries to clump on a single shard:

1. for versioned buckets, multiple versions of the same object name
map to the same index shard. this can become an issue if an
application is repeatedly overwriting an object without cleaning up
old versions. lifecycle rules can help to manage these noncurrent
versions

2. during a multipart upload, all of the parts are tracked on the same
index shard as the final object name. if applications are leaving a
lot of incomplete multipart uploads behind (especially if they target
the same object name) this can lead to similar clumping. the S3 api
has operations to list and abort incomplete multipart uploads, along
with lifecycle rules to automate their cleanup
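
for example (the bucket name and endpoint below are just placeholders), the
cleanup can be done by hand:

aws --endpoint-url https://rgw.example.com s3api list-multipart-uploads --bucket mybucket
aws --endpoint-url https://rgw.example.com s3api abort-multipart-upload --bucket mybucket --key somekey --upload-id <UploadId>

or automated with a lifecycle rule along these lines, applied via
put-bucket-lifecycle-configuration:

{
  "Rules": [
    {
      "ID": "abort-stale-multipart",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 3}
    }
  ]
}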

separately, multisite clusters use these same index shards to store
replication logs. if sync gets far enough behind, these log entries
can also lead to large omap warnings
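
a quick way to check whether replication logs are what's filling the shards (a
sketch, the bucket name is a placeholder) is to look at the sync state:

radosgw-admin sync status
radosgw-admin bucket sync status --bucket=mybucket

if those report the bucket as caught up, the large omap entries are unlikely to
be replication logs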

On Tue, Jul 9, 2024 at 10:25 AM Szabo, Istvan (Agoda)
 wrote:
>
> It's the same bucket:
> https://gist.github.com/Badb0yBadb0y/d80c1bdb8609088970413969826d2b7d
>
>
> 
> From: Eugen Block 
> Sent: Tuesday, July 9, 2024 8:03 PM
> To: Szabo, Istvan (Agoda) 
> Cc: ceph-users@ceph.io 
> Subject: Re: [ceph-users] Re: Large omap in index pool even if properly 
> sharded and not "OVER"
>
> Email received from the internet. If in doubt, don't click any link nor open 
> any attachment !
> 
>
> Are those three different buckets? Could you share the stats for each of them?
>
> radosgw-admin bucket stats --bucket=
>
> Zitat von "Szabo, Istvan (Agoda)" :
>
> > Hello,
> >
> > Yeah, still:
> >
> > the .dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.151 | wc -l
> > 290005
> >
> > and the
> > .dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.726 | wc -l
> > 289378
> >
> > And just make me happy more I have one more
> > .dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.6 | wc -l
> > 181588
> >
> > This is my crush tree (I'm using host based crush rule)
> > https://gist.githubusercontent.com/Badb0yBadb0y/9bea911701184a51575619bc99cca94d/raw/e5e4a918d327769bb874aaed279a8428fd7150d5/gistfile1.txt
> >
> > I'm thinking could that be the issue that host 2s13-15 has less nvme
> > osd (however size wise same as in the other 12 host where have 8x
> > nvme osd) than the others?
> > But the pgs are located like this:
> >
> > pg26.427
> > osd.261 host8
> > osd.488 host13
> > osd.276 host4
> >
> > pg26.606
> > osd.443 host12
> > osd.197 host8
> > osd.524 host14
> >
> > pg26.78c
> > osd.89 host7
> > osd.406 host11
> > osd.254 host6
> >
> > If pg26.78c wouldn't be here I'd say 100% the nvme osd distribution
> > based on host is the issue, however this pg is not located on any of
> > the 4x nvme osd nodes 
> >
> > Ty
> >
> > 
> > From: Eugen Block 
> > Sent: Tuesday, July 9, 2024 6:02 PM
> > To: ceph-users@ceph.io 
> > Subject: [ceph-users] Re: Large omap in index pool even if properly
> > sharded and not "OVER"
> >
> > Email received from the internet. If in doubt, don't click any link
> > nor open any attachment !
> > 
> >
> > Hi,
> >
> > the number of shards looks fine, maybe this was just a temporary
> > burst? Did you check if the rados objects in the index pool still have
> > more than 200k omap objects? I would try something like
> >
> > rados -p  listomapkeys
> > .dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.151 | wc -l
> >
> >
> > Zitat von "Szabo, Istvan (Agoda)" :
> >
> >> Hi,
> >>
> >> I have a pretty big bucket which sharded with 1999 shard so in
> >> theory can hold close to 200m objects (199.900.000).
> >> Currently it has 54m objects.
> >>
> >> Bucket limit check looks also good:
> >>  "bucket": ""xyz,
> >>  "tenant": "",
> >>  "num_objects": 53619489,
> >>  "num_shards": 1999,
> >>  "objects_per_shard": 26823,
> >>  "fill_status": "OK"
> >>
> >> This is the bucket id:
> >> "id": "9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1"
> >>
> >> This is the log lines:
> >> 2024-06-27T10:41:05.679870+0700 osd.261 (osd.261) 9643 : cluster
> >> [WRN] Large omap object found. Object:
> >> 26:e433e65c:::.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.151:head
> >>  PG: 26.3a67cc27 (26.427) Key count: 236919 Size
> >> (bytes):
> >> 89969920
> >>
> >> 2024-06-27T10:43:35.557835+0700 osd.89 (osd.89) 9000 : cluster [WRN]
> >> Large omap object found. Object:
> >> 26:31ff4df1:::.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.726:head
> >>  PG: 26.8fb2ff8c (26.78c) Key count: 236495 Size
> >> (bytes):
> >> 95560458
> >>
> >> Tried to deep scrub the affected pgs, tried to deep-scrub the
> >> mentioned osds in the log but didn't help.
> >> Why? What I'm missing?
> >>
> >> Thank you in advance for your help.
> >>
> >> 
> >> This message is confidential 

[ceph-users] Re: CephFS MDS crashing during replay with standby MDSes crashing afterwards

2024-07-09 Thread Tim Holloway
Ivan,

This may be a little off-topic, but if you're still running AlmaLinux
8.9, it's worth noting that CentOS 8 actually went end-of-life about 2
years ago, thanks to CentOS Stream.

Up until this last week, however, I had several AlmaLinux 8 machines
running myself, but apparently somewhere around May IBM Red Hat pulled
all of its CentOS 8 enterprise sites offline, including Storage and
Ceph, which broke my yum updates.

As far as I'm aware, once you've installed cephadm (whether via
yum/dnf or otherwise) there's no further need for the RPM repos, but
losing yum support doesn't help, to say the least.

On the upside, it's possible to upgrade-in-place from AlmaLinux 8.9 to
AlmaLinux 9, although it may require temporarily disabling certain OS
services to appease the upgrade process.
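
(In case it helps anyone else: the in-place route I'm referring to is the
AlmaLinux ELevate/Leapp tooling. Roughly, after installing the leapp-upgrade and
leapp-data-almalinux packages from the ELevate repo, it boils down to:

leapp preupgrade   # review /var/log/leapp/leapp-report.txt and clear any blockers
leapp upgrade      # then reboot into the upgrade environment

I'm quoting the package and report names from memory, so double-check them
against the ELevate documentation.)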

Probably won't solve your problem, but at least you'll be able to move
fairly painlessly to a better-supported platform.

  Best Regards,
 Tim

On Tue, 2024-07-09 at 11:14 +0100, Ivan Clayson wrote:
> Hi Dhairya,
> 
> I would be more than happy to try and give as many details as
> possible 
> but the slack channel is private and requires my email to have an 
> account/ access to it.
> 
> Wouldn't taking the discussion about this error to a private channel 
> also stop other users who experience this error from learning about
> how 
> and why this happened as  well as possibly not be able to view the 
> solution? Would it not be possible to discuss this more publicly for
> the 
> benefit of the other users on the mailing list?
> 
> Kindest regards,
> 
> Ivan
> 
> On 09/07/2024 10:44, Dhairya Parmar wrote:
> > CAUTION: This email originated from outside of the LMB:
> > *.-dpar...@redhat.com-.*
> > Do not click links or open attachments unless you recognize the
> > sender 
> > and know the content is safe.
> > If you think this is a phishing email, please forward it to 
> > phish...@mrc-lmb.cam.ac.uk
> > 
> > 
> > --
> > 
> > Hey Ivan,
> > 
> > This is a relatively new MDS crash, so this would require some 
> > investigation but I was instructed to recommend disaster-recovery 
> > steps [0] (except session reset) to you to get the FS up again.
> > 
> > This crash is being discussed on upstream CephFS slack channel [1] 
> > with @Venky Shankar  and other CephFS 
> > devs. I'd encourage you to join the conversation, we can discuss
> > this 
> > in detail and maybe go through the incident step by step which
> > should 
> > help analyse the crash better.
> > 
> > [0] 
> > https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/#disaster-recovery-experts
> > [1]
> > https://ceph-storage.slack.com/archives/C04LVQMHM9B/p1720443057919519
> > 
> > On Mon, Jul 8, 2024 at 7:37 PM Ivan Clayson
> >  
> > wrote:
> > 
> >     Hi Dhairya,
> > 
> >     Thank you ever so much for having another look at this so
> > quickly.
> >     I don't think I have any logs similar to the ones you
> > referenced
> >     this time as my MDSs don't seem to enter the replay stage when
> >     they crash (or at least don't now after I've thrown the logs
> > away)
> >     but those errors do crop up in the prior logs I shared when the
> >     system first crashed.
> > 
> >     Kindest regards,
> > 
> >     Ivan
> > 
> >     On 08/07/2024 14:08, Dhairya Parmar wrote:
> > >     CAUTION: This email originated from outside of the LMB:
> > >     *.-dpar...@redhat.com-.*
> > >     Do not click links or open attachments unless you recognize
> > > the
> > >     sender and know the content is safe.
> > >     If you think this is a phishing email, please forward it to
> > >     phish...@mrc-lmb.cam.ac.uk
> > > 
> > > 
> > >     --
> > > 
> > >     Ugh, something went horribly wrong. I've downloaded the MDS
> > > logs
> > >     that contain assertion failure and it looks relevant to this
> > > [0].
> > >     Do you have client logs for this?
> > > 
> > >     The other log that you shared is being downloaded right now,
> > > once
> > >     that's done and I'm done going through it, I'll update you.
> > > 
> > >     [0] https://tracker.ceph.com/issues/54546
> > > 
> > >     On Mon, Jul 8, 2024 at 4:49 PM Ivan Clayson
> > >      wrote:
> > > 
> > >     Hi Dhairya,
> > > 
> > >     Sorry to resurrect this thread again, but we still
> > >     unfortunately have an issue with our filesystem after we
> > >     attempted to write new backups to it.
> > > 
> > >     We finished the scrub of the filesystem on Friday and ran
> > > a
> > >     repair scrub on the 1 directory which had metadata
> > > damage.
> > >     After doing so and rebooting, the cluster reported no
> > > issues
> > >     and data was accessible again.
> > > 
> > >     We re-started the backups to run over the weekend and
> > >     unfortunately the filesystem crashed again where the log
> > > of
> > >     the failure is here:
> > >    
> > > 

[ceph-users] Re: Large omap in index pool even if properly sharded and not "OVER"

2024-07-09 Thread Szabo, Istvan (Agoda)
It's the same bucket:
https://gist.github.com/Badb0yBadb0y/d80c1bdb8609088970413969826d2b7d



From: Eugen Block 
Sent: Tuesday, July 9, 2024 8:03 PM
To: Szabo, Istvan (Agoda) 
Cc: ceph-users@ceph.io 
Subject: Re: [ceph-users] Re: Large omap in index pool even if properly sharded 
and not "OVER"

Email received from the internet. If in doubt, don't click any link nor open 
any attachment !


Are those three different buckets? Could you share the stats for each of them?

radosgw-admin bucket stats --bucket=

Zitat von "Szabo, Istvan (Agoda)" :

> Hello,
>
> Yeah, still:
>
> the .dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.151 | wc -l
> 290005
>
> and the
> .dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.726 | wc -l
> 289378
>
> And just make me happy more I have one more
> .dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.6 | wc -l
> 181588
>
> This is my crush tree (I'm using host based crush rule)
> https://gist.githubusercontent.com/Badb0yBadb0y/9bea911701184a51575619bc99cca94d/raw/e5e4a918d327769bb874aaed279a8428fd7150d5/gistfile1.txt
>
> I'm thinking could that be the issue that host 2s13-15 has less nvme
> osd (however size wise same as in the other 12 host where have 8x
> nvme osd) than the others?
> But the pgs are located like this:
>
> pg26.427
> osd.261 host8
> osd.488 host13
> osd.276 host4
>
> pg26.606
> osd.443 host12
> osd.197 host8
> osd.524 host14
>
> pg26.78c
> osd.89 host7
> osd.406 host11
> osd.254 host6
>
> If pg26.78c wouldn't be here I'd say 100% the nvme osd distribution
> based on host is the issue, however this pg is not located on any of
> the 4x nvme osd nodes 
>
> Ty
>
> 
> From: Eugen Block 
> Sent: Tuesday, July 9, 2024 6:02 PM
> To: ceph-users@ceph.io 
> Subject: [ceph-users] Re: Large omap in index pool even if properly
> sharded and not "OVER"
>
> Email received from the internet. If in doubt, don't click any link
> nor open any attachment !
> 
>
> Hi,
>
> the number of shards looks fine, maybe this was just a temporary
> burst? Did you check if the rados objects in the index pool still have
> more than 200k omap objects? I would try someting like
>
> rados -p  listomapkeys
> .dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.151 | wc -l
>
>
> Zitat von "Szabo, Istvan (Agoda)" :
>
>> Hi,
>>
>> I have a pretty big bucket which sharded with 1999 shard so in
>> theory can hold close to 200m objects (199.900.000).
>> Currently it has 54m objects.
>>
>> Bucket limit check looks also good:
>>  "bucket": ""xyz,
>>  "tenant": "",
>>  "num_objects": 53619489,
>>  "num_shards": 1999,
>>  "objects_per_shard": 26823,
>>  "fill_status": "OK"
>>
>> This is the bucket id:
>> "id": "9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1"
>>
>> This is the log lines:
>> 2024-06-27T10:41:05.679870+0700 osd.261 (osd.261) 9643 : cluster
>> [WRN] Large omap object found. Object:
>> 26:e433e65c:::.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.151:head
>>  PG: 26.3a67cc27 (26.427) Key count: 236919 Size
>> (bytes):
>> 89969920
>>
>> 2024-06-27T10:43:35.557835+0700 osd.89 (osd.89) 9000 : cluster [WRN]
>> Large omap object found. Object:
>> 26:31ff4df1:::.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.726:head
>>  PG: 26.8fb2ff8c (26.78c) Key count: 236495 Size
>> (bytes):
>> 95560458
>>
>> Tried to deep scrub the affected pgs, tried to deep-scrub the
>> mentioned osds in the log but didn't help.
>> Why? What I'm missing?
>>
>> Thank you in advance for your help.
>>
>> 
>> This message is confidential and is for the sole use of the intended
>> recipient(s). It may also be privileged or otherwise protected by
>> copyright or other legal rules. If you have received it by mistake
>> please let us know by reply email and delete it from your system. It
>> is prohibited to copy this message or disclose its content to
>> anyone. Any confidentiality or privilege is not waived or lost by
>> any mistaken delivery or unauthorized disclosure of the message. All
>> messages sent to and from Agoda may be monitored to ensure
>> compliance with company policies, to protect the company's interests
>> and to remove potential malware. Electronic messages may be
>> intercepted, amended, lost or deleted, or contain viruses.
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
> 
> This message is confidential and is for the sole use of the intended
> recipient(s). It may also be privileged or otherwise protected by
> copyright or other legal rules. If you have received it by mistake
> please let us know by reply email and delete 

[ceph-users] Re: reef 18.2.3 QE validation status

2024-07-09 Thread Casey Bodley
this was discussed in the ceph leadership team meeting yesterday, and
we've agreed to re-number this release to 18.2.4

On Wed, Jul 3, 2024 at 1:08 PM  wrote:
>
>
> On Jul 3, 2024 5:59 PM, Kaleb Keithley  wrote:
> >
> >
> >
>
> > Replacing the tar file is problematic too, if only because it's a potential 
> > source of confusion for people who aren't paying attention.
>
> It'd be really the worst thing to do.
>
> > I'm not sure I believe that making this next release 18.2.4 really solves 
> > anything
>
> It solves *my* problem that the old version of the file is already in the 
> Debian archive and cannot be replaced there. By all means, please find a 
> better solution for long term. In the mean time, do *not* re-release an 
> already released tarball.
>
> Cheers,
>
> Thomas Goirand (zigo)
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Large omap in index pool even if properly sharded and not "OVER"

2024-07-09 Thread Eugen Block

Are those three different buckets? Could you share the stats for each of them?

radosgw-admin bucket stats --bucket=

Zitat von "Szabo, Istvan (Agoda)" :


Hello,

Yeah, still:

the .dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.151 | wc -l
290005

and the
.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.726 | wc -l
289378

And just make me happy more I have one more
.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.6 | wc -l
181588

This is my crush tree (I'm using host based crush rule)
https://gist.githubusercontent.com/Badb0yBadb0y/9bea911701184a51575619bc99cca94d/raw/e5e4a918d327769bb874aaed279a8428fd7150d5/gistfile1.txt

I'm thinking could that be the issue that host 2s13-15 has less nvme  
osd (however size wise same as in the other 12 host where have 8x  
nvme osd) than the others?

But the pgs are located like this:

pg26.427
osd.261 host8
osd.488 host13
osd.276 host4

pg26.606
osd.443 host12
osd.197 host8
osd.524 host14

pg26.78c
osd.89 host7
osd.406 host11
osd.254 host6

If pg26.78c wouldn't be here I'd say 100% the nvme osd distribution  
based on host is the issue, however this pg is not located on any of  
the 4x nvme osd nodes 


Ty


From: Eugen Block 
Sent: Tuesday, July 9, 2024 6:02 PM
To: ceph-users@ceph.io 
Subject: [ceph-users] Re: Large omap in index pool even if properly  
sharded and not "OVER"


Email received from the internet. If in doubt, don't click any link  
nor open any attachment !



Hi,

the number of shards looks fine, maybe this was just a temporary
burst? Did you check if the rados objects in the index pool still have
more than 200k omap objects? I would try something like

rados -p  listomapkeys
.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.151 | wc -l


Zitat von "Szabo, Istvan (Agoda)" :


Hi,

I have a pretty big bucket which sharded with 1999 shard so in
theory can hold close to 200m objects (199.900.000).
Currently it has 54m objects.

Bucket limit check looks also good:
 "bucket": ""xyz,
 "tenant": "",
 "num_objects": 53619489,
 "num_shards": 1999,
 "objects_per_shard": 26823,
 "fill_status": "OK"

This is the bucket id:
"id": "9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1"

This is the log lines:
2024-06-27T10:41:05.679870+0700 osd.261 (osd.261) 9643 : cluster
[WRN] Large omap object found. Object:
26:e433e65c:::.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.151:head PG: 26.3a67cc27 (26.427) Key count: 236919 Size  
(bytes):

89969920

2024-06-27T10:43:35.557835+0700 osd.89 (osd.89) 9000 : cluster [WRN]
Large omap object found. Object:
26:31ff4df1:::.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.726:head PG: 26.8fb2ff8c (26.78c) Key count: 236495 Size  
(bytes):

95560458

Tried to deep scrub the affected pgs, tried to deep-scrub the
mentioned osds in the log but didn't help.
Why? What I'm missing?

Thank you in advance for your help.


This message is confidential and is for the sole use of the intended
recipient(s). It may also be privileged or otherwise protected by
copyright or other legal rules. If you have received it by mistake
please let us know by reply email and delete it from your system. It
is prohibited to copy this message or disclose its content to
anyone. Any confidentiality or privilege is not waived or lost by
any mistaken delivery or unauthorized disclosure of the message. All
messages sent to and from Agoda may be monitored to ensure
compliance with company policies, to protect the company's interests
and to remove potential malware. Electronic messages may be
intercepted, amended, lost or deleted, or contain viruses.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


This message is confidential and is for the sole use of the intended  
recipient(s). It may also be privileged or otherwise protected by  
copyright or other legal rules. If you have received it by mistake  
please let us know by reply email and delete it from your system. It  
is prohibited to copy this message or disclose its content to  
anyone. Any confidentiality or privilege is not waived or lost by  
any mistaken delivery or unauthorized disclosure of the message. All  
messages sent to and from Agoda may be monitored to ensure  
compliance with company policies, to protect the company's interests  
and to remove potential malware. Electronic messages may be  
intercepted, amended, lost or deleted, or contain viruses.



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Phantom hosts

2024-07-09 Thread Tim Holloway
Hi Eugen,

It's gone now, although similar artefacts seems to linger.

The reason it's gone is that I've been upgrading all my machines to
AlmaLinux 8 from CentOS 7 and AlmaLinux 7, as one is already EOL and
the other is within days of it. Rather than upgrade-in-place, I chose
to nuke/replace the entire system disks and provision from scratch. It
helped me clean up my network and get rid of years of cruft.

Ceph helped a lot there, since I'd do one machine at a time, and since
the provisioning data is on Ceph, it was always available even as
individual machines went up and down.

I lost the phantom host, although for a while one of the newer OSDs
gave me issues. The container would die while starting claiming that
the OSD block (badly-quoted) was "already in user". I believe this
happened right after I moved the _admin node to that machine.

I finally got the failed machine back online by manually stopping the
systemd service, waiting a while, then starting (not restarting) it.
But some other nodes may have been rebooted in the interim, so it's
hard to be certain what actually made it happy. Annoyingly, the
dashboard and OSD tree listed the failed node as "up" and "in" even
thoiug "ceph orch ps" showed it as "error". I couldn't persuade it to
go down and out, or I would have destroyed and re-created it.

I did clear up a major mess, though. My original install/admin machine
was littered with dead and mangled objects. Two long-deleted OSDs left
traces, and there was a mix of pre-cephadm components (including OSDs)
and newer stuff.

I did discover a node still running Octopus which I plan to upgrade
today, but overall things look pretty clean, excepting the ever-
frustrating "too many PGs per OSD". If autotuning was supposed to auto-
fix this, it's not doing so, even though autotuning is switched on.
Manual changes don't seem to take either.
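
(For context, the knobs involved look like this, with the pool name just as an
example:

ceph osd pool autoscale-status
ceph osd pool set mypool pg_autoscale_mode on
ceph osd pool set mypool pg_num 64

As far as I understand, the autoscaler only steps in when pg_num is off from its
target by roughly a factor of three, and it merges PGs slowly, so the warning
can linger even with autoscaling enabled.)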

Going back to the phantom host situation, one thing I have seen is that
on the dashboard, the hosts display lists OSDs that have been deleted
as belonging to that machine. "ceph osd tree" and the OSD view disagree
and show neither the phantom host nor the deleted OSDs.

Just to recap, the original phantom host was a non-ceph node that
accidentally got sucked in when I did a host add with the wrong IP
address. It then claimed to own another host's OSD.

  Thanks,
Tim

On Tue, 2024-07-09 at 06:08 +, Eugen Block wrote:
> Hi Tim,
> 
> is this still an issue? If it is, I recommend to add some more
> details  
> so it's easier to follow your train of thought.
> 
> ceph osd tree
> ceph -s
> ceph health detail
> ceph orch host ls
> 
> And then please point out which host you're trying to get rid of. I  
> would deal with the rgw thing later. Is it possible, that the
> phantom  
> host actually had OSDs on it? Maybe that needs to be cleaned up
> first.  
> I had something similar on a customer cluster recently where we
> hunted  
> failing OSDs but it turned out they were removed quite a while ago,  
> just not properly cleaned up yet on the filesystem.
> 
> Thanks,
> Eugen
> 
> Zitat von Tim Holloway :
> 
> > It's getting worse.
> > 
> > As many may be aware, the venerable CentOS 7 OS is hitting end-of-
> > life in a
> > matter of days.
> > 
> > The easiest way to upgrade my servers has been to simply create an
> > alternate
> > disk with the new OS, turn my provisioning system loose on it, yank
> > the old
> > OS system disk and jack in the new one.
> > 
> > 
> > However, Ceph is another matter. For that part, the simplest thing
> > to do is
> > to destroy the Ceph node(s) on the affected box, do the OS upgrade,
> > then
> > re-create the nodes.
> > 
> > But now I have even MORE strays. The OSD on my box lives on in Ceph
> > in the
> > dashboard host view even though the documented removal procedures
> > were
> > followed and the VM itself was destroyed.
> > 
> > Further, this last node is an RGW node and I cannot remove it from
> > the RGW
> > configuration. It not only shows on the dashboard, it also lists as
> > still
> > active on the command line and as entries in the config database no
> > matter
> > what I do.
> > 
> > 
> > I really need some solution to this, as it's a major chokepoint in
> > the
> > upgrade process
> > 
> > 
> >    Tim
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Fixing BlueFS spillover (pacific 16.2.14)

2024-07-09 Thread Frédéric Nass

One more manual compaction updated the bluefs stats figures accordingly.

So in the end, the procedure is:

1/ ceph orch daemon stop osd.${osd}
2/ cephadm shell --fsid $(ceph fsid) --name osd.${osd} -- ceph-bluestore-tool 
bluefs-bdev-migrate --path /var/lib/ceph/osd/ceph-${osd} --devs-source 
/var/lib/ceph/osd/ceph-${osd}/block --dev-target 
/var/lib/ceph/osd/ceph-${osd}/block.db
3/ ceph orch daemon start osd.${osd}
4/ ceph tell osd.${osd} compact
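
And, as a quick sanity check afterwards (same commands already used in this
thread), slow_used_bytes should read 0 and the SLOW row of the usage matrix
should no longer show data on the main device:

ceph tell osd.${osd} bluefs stats
cephadm enter --name osd.${osd} ceph daemon osd.${osd} perf dump | jq -r '.bluefs.slow_used_bytes'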

Regards,
Frédéric.

- Le 8 Juil 24, à 17:39, Frédéric Nass frederic.n...@univ-lorraine.fr a 
écrit :

> Hello,
> 
> I just wanted to share that the following command also helped us move slow 
> used
> bytes back to the fast device (without using bluefs-bdev-expand), when several
> compactions couldn't:
> 
> $ cephadm shell --fsid $cid --name osd.${osd} -- ceph-bluestore-tool
> bluefs-bdev-migrate --path /var/lib/ceph/osd/ceph-${osd} --devs-source
> /var/lib/ceph/osd/ceph-${osd}/block --dev-target
> /var/lib/ceph/osd/ceph-${osd}/block.db
> 
> slow_used_bytes is now back to 0 on perf dump and BLUEFS_SPILLOVER alert got
> cleared but 'bluefs stats' is not on par:
> 
> $ ceph tell osd.451 bluefs stats
> 1 : device size 0x1effbfe000 : using 0x30960(12 GiB)
> 2 : device size 0x746dfc0 : using 0x3abd77d2000(3.7 TiB)
> RocksDBBlueFSVolumeSelector Usage Matrix:
> DEV/LEV     WAL     DB       SLOW     *       *       REAL      FILES
> LOG         0 B     22 MiB   0 B      0 B     0 B     3.9 MiB   1
> WAL         0 B     33 MiB   0 B      0 B     0 B     32 MiB    2
> DB          0 B     12 GiB   0 B      0 B     0 B     12 GiB    196
> SLOW        0 B     4 MiB    0 B      0 B     0 B     3.8 MiB   1
> TOTAL       0 B     12 GiB   0 B      0 B     0 B     0 B       200
> MAXIMUMS:
> LOG         0 B     22 MiB   0 B      0 B     0 B     17 MiB
> WAL         0 B     33 MiB   0 B      0 B     0 B     32 MiB
> DB          0 B     24 GiB   0 B      0 B     0 B     24 GiB
> SLOW        0 B     4 MiB    0 B      0 B     0 B     3.8 MiB
> TOTAL       0 B     24 GiB   0 B      0 B     0 B     0 B
> >> SIZE <<  0 B     118 GiB  6.9 TiB
> 
> Any idea? Is this something to worry about?
> 
> Regards,
> Frédéric.
> 
> - Le 16 Oct 23, à 14:46, Igor Fedotov igor.fedo...@croit.io a écrit :
> 
>> Hi Chris,
>> 
>> for the first question (osd.76) you might want to try ceph-volume's "lvm
>> migrate --from data --target " command. Looks like some
>> persistent DB remnants are still kept at main device causing the alert.
>> 
>> W.r.t osd.86's question - the line "SLOW    0 B 3.0 GiB
>> 59 GiB" means that RocksDB higher levels  data (usually L3+) are spread
>> over DB and main (aka slow) devices as 3 GB and 59 GB respectively.
>> 
>> In other words SLOW row refers to DB data which is originally supposed
>> to be at SLOW device (due to RocksDB data mapping mechanics). But
>> improved bluefs logic (introduced by
>> https://github.com/ceph/ceph/pull/29687) permitted extra DB disk usage
>> for a part of this data.
>> 
>> Resizing DB volume and following DB compaction should do the trick and
>> move all the data to DB device. Alternatively ceph-volume's lvm migrate
>> command should do the same but the result will be rather temporary
>> without DB volume resizing.
>> 
>> Hope this helps.
>> 
>> 
>> Thanks,
>> 
>> Igor
>> 
>> On 06/10/2023 06:55, Chris Dunlop wrote:
>>> Hi,
>>>
>>> tl;dr why are my osds still spilling?
>>>
>>> I've recently upgraded to 16.2.14 from 16.2.9 and started receiving
>>> bluefs spillover warnings (due to the "fix spillover alert" per the
>>> 16.2.14 release notes). E.g. from 'ceph health detail', the warning on
>>> one of these (there are a few):
>>>
>>> osd.76 spilled over 128 KiB metadata from 'db' device (56 GiB used of
>>> 60 GiB) to slow device
>>>
>>> This is a 15T HDD with only a 60G SSD for the db so it's not
>>> surprising it spilled as it's way below the recommendation for rbd
>>> usage at db size 1-2% of the storage size.
>>>
>>> There was some spare space on the db ssd so I increased the size of
>>> the db LV up over 400G and did an bluefs-bdev-expand.
>>>
>>> However, days later, I'm still getting the spillover warning for that
>>> osd, including after running a manual compact:
>>>
>>> # ceph tell osd.76 compact
>>>
>>> See attached perf-dump-76 for the perf dump output:
>>>
>>> # cephadm enter --name 'osd.76' ceph daemon 'osd.76' perf dump" | jq
>>> -r '.bluefs'
>>>
>>> In particular, if my understanding is correct, that's telling me the
>>> db available size is 487G (i.e. the LV expand worked), of which it's
>>> using 59G, and there's 128K spilled to the slow device:
>>>
>>> "db_total_bytes": 512309059584,  # 487G
>>> "db_used_bytes": 63470305280,    # 59G
>>> "slow_used_bytes": 131072,   # 128K
>>>
>>> A "bluefs stats" also says the db is 

[ceph-users] Re: Large omap in index pool even if properly sharded and not "OVER"

2024-07-09 Thread Szabo, Istvan (Agoda)
Hello,

Yeah, still:

the .dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.151 | wc -l
290005

and the
.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.726 | wc -l
289378

And, to make me even happier, I have one more:
.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.6 | wc -l
181588

This is my crush tree (I'm using host based crush rule)
https://gist.githubusercontent.com/Badb0yBadb0y/9bea911701184a51575619bc99cca94d/raw/e5e4a918d327769bb874aaed279a8428fd7150d5/gistfile1.txt

I'm wondering whether the issue could be that hosts 2s13-15 have fewer nvme osds
(although size-wise they are the same as the other 12 hosts, which have 8x nvme osds) than
the others?
But the pgs are located like this:

pg26.427
osd.261 host8
osd.488 host13
osd.276 host4

pg26.606
osd.443 host12
osd.197 host8
osd.524 host14

pg26.78c
osd.89 host7
osd.406 host11
osd.254 host6

If pg26.78c weren't here, I'd say the per-host nvme osd distribution is 100% the
issue; however, this pg is not located on any of the 4x nvme osd
nodes.

Ty


From: Eugen Block 
Sent: Tuesday, July 9, 2024 6:02 PM
To: ceph-users@ceph.io 
Subject: [ceph-users] Re: Large omap in index pool even if properly sharded and 
not "OVER"

Email received from the internet. If in doubt, don't click any link nor open 
any attachment !


Hi,

the number of shards looks fine, maybe this was just a temporary
burst? Did you check if the rados objects in the index pool still have
more than 200k omap objects? I would try something like

rados -p  listomapkeys
.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.151 | wc -l


Zitat von "Szabo, Istvan (Agoda)" :

> Hi,
>
> I have a pretty big bucket which sharded with 1999 shard so in
> theory can hold close to 200m objects (199.900.000).
> Currently it has 54m objects.
>
> Bucket limit check looks also good:
>  "bucket": ""xyz,
>  "tenant": "",
>  "num_objects": 53619489,
>  "num_shards": 1999,
>  "objects_per_shard": 26823,
>  "fill_status": "OK"
>
> This is the bucket id:
> "id": "9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1"
>
> This is the log lines:
> 2024-06-27T10:41:05.679870+0700 osd.261 (osd.261) 9643 : cluster
> [WRN] Large omap object found. Object:
> 26:e433e65c:::.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.151:head 
> PG: 26.3a67cc27 (26.427) Key count: 236919 Size (bytes):
> 89969920
>
> 2024-06-27T10:43:35.557835+0700 osd.89 (osd.89) 9000 : cluster [WRN]
> Large omap object found. Object:
> 26:31ff4df1:::.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.726:head 
> PG: 26.8fb2ff8c (26.78c) Key count: 236495 Size (bytes):
> 95560458
>
> Tried to deep scrub the affected pgs, tried to deep-scrub the
> mentioned osds in the log but didn't help.
> Why? What I'm missing?
>
> Thank you in advance for your help.
>
> 
> This message is confidential and is for the sole use of the intended
> recipient(s). It may also be privileged or otherwise protected by
> copyright or other legal rules. If you have received it by mistake
> please let us know by reply email and delete it from your system. It
> is prohibited to copy this message or disclose its content to
> anyone. Any confidentiality or privilege is not waived or lost by
> any mistaken delivery or unauthorized disclosure of the message. All
> messages sent to and from Agoda may be monitored to ensure
> compliance with company policies, to protect the company's interests
> and to remove potential malware. Electronic messages may be
> intercepted, amended, lost or deleted, or contain viruses.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


This message is confidential and is for the sole use of the intended 
recipient(s). It may also be privileged or otherwise protected by copyright or 
other legal rules. If you have received it by mistake please let us know by 
reply email and delete it from your system. It is prohibited to copy this 
message or disclose its content to anyone. Any confidentiality or privilege is 
not waived or lost by any mistaken delivery or unauthorized disclosure of the 
message. All messages sent to and from Agoda may be monitored to ensure 
compliance with company policies, to protect the company's interests and to 
remove potential malware. Electronic messages may be intercepted, amended, lost 
or deleted, or contain viruses.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph_fill_inode BAD symlink

2024-07-09 Thread Dietmar Rieder

Hi,

we noticed the following ceph errors in the kernel messages (dmesg -T):

[Tue Jul  9 11:59:24 2024] ceph: ceph_fill_inode 
10003683698.fffe BAD symlink size 0


Is this something that we should be worried about?

We are currently trying to identify the inode/file using:

# find /data -inum 10003683698 -printf "%i %p\n"
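
(Side note: the ceph kernel client logs inode numbers in hex, so if the find
above comes up empty we will retry with the decimal value, e.g.:

# printf '%d\n' 0x10003683698
1099568789144

# find /data -inum 1099568789144 -printf "%i %p\n"

assuming the value in dmesg really is hex.)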

Could deleting the inode fix the issue? If not what can we do?

Thanks
  Dietmar


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Large omap in index pool even if properly sharded and not "OVER"

2024-07-09 Thread Eugen Block

Hi,

the number of shards looks fine, maybe this was just a temporary  
burst? Did you check if the rados objects in the index pool still have  
more than 200k omap objects? I would try something like


rados -p  listomapkeys  
.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.151 | wc -l
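
Or, to get an overview of all index shards of that bucket at once (just a rough
sketch, the index pool name is a placeholder):

for obj in $(rados -p <index pool> ls | grep '^\.dir\.9213182a-14ba-48ad-bde9-289a1c0c0de8\.2479481907\.1\.'); do
  echo -n "$obj "; rados -p <index pool> listomapkeys "$obj" | wc -l
done | sort -k2 -rn | head

That should show whether only a couple of shards are oversized or whether the
whole bucket has grown.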



Zitat von "Szabo, Istvan (Agoda)" :


Hi,

I have a pretty big bucket which sharded with 1999 shard so in  
theory can hold close to 200m objects (199.900.000).

Currently it has 54m objects.

Bucket limit check looks also good:
 "bucket": ""xyz,
 "tenant": "",
 "num_objects": 53619489,
 "num_shards": 1999,
 "objects_per_shard": 26823,
 "fill_status": "OK"

This is the bucket id:
"id": "9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1"

This is the log lines:
2024-06-27T10:41:05.679870+0700 osd.261 (osd.261) 9643 : cluster  
[WRN] Large omap object found. Object:  
26:e433e65c:::.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.151:head PG: 26.3a67cc27 (26.427) Key count: 236919 Size (bytes):  
89969920


2024-06-27T10:43:35.557835+0700 osd.89 (osd.89) 9000 : cluster [WRN]  
Large omap object found. Object:  
26:31ff4df1:::.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.726:head PG: 26.8fb2ff8c (26.78c) Key count: 236495 Size (bytes):  
95560458


Tried to deep scrub the affected pgs, tried to deep-scrub the  
mentioned osds in the log but didn't help.

Why? What I'm missing?

Thank you in advance for your help.


This message is confidential and is for the sole use of the intended  
recipient(s). It may also be privileged or otherwise protected by  
copyright or other legal rules. If you have received it by mistake  
please let us know by reply email and delete it from your system. It  
is prohibited to copy this message or disclose its content to  
anyone. Any confidentiality or privilege is not waived or lost by  
any mistaken delivery or unauthorized disclosure of the message. All  
messages sent to and from Agoda may be monitored to ensure  
compliance with company policies, to protect the company's interests  
and to remove potential malware. Electronic messages may be  
intercepted, amended, lost or deleted, or contain viruses.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS MDS crashing during replay with standby MDSes crashing afterwards

2024-07-09 Thread Dhairya Parmar
On Tue, Jul 9, 2024 at 3:46 PM Ivan Clayson  wrote:

> Hi Dhairya,
>
> I would be more than happy to try and give as many details as possible but
> the slack channel is private and requires my email to have an account/
> access to it.
>
You're right that you're required to have an account on Slack, but it
isn't private at all. The Slack channel is open to everyone (it's the
upstream Slack channel :D); you just need an email address to access it,
and again it's entirely your choice, not mandatory. I'd ask @Venky Shankar
and @Patrick Donnelly to add their
input since they've been working on similar issues and can provide better
insights.

> Wouldn't taking the discussion about this error to a private channel also
> stop other users who experience this error from learning about how and why
> this happened as  well as possibly not be able to view the solution? Would
> it not be possible to discuss this more publicly for the benefit of the
> other users on the mailing list?
>
Kindest regards,
>
> Ivan
> On 09/07/2024 10:44, Dhairya Parmar wrote:
>
> CAUTION: This email originated from outside of the LMB:
> *.-dpar...@redhat.com-.*
> Do not click links or open attachments unless you recognize the sender and
> know the content is safe.
> If you think this is a phishing email, please forward it to
> phish...@mrc-lmb.cam.ac.uk
>
>
> --
> Hey Ivan,
>
> This is a relatively new MDS crash, so this would require some
> investigation but I was instructed to recommend disaster-recovery steps [0]
> (except session reset) to you to get the FS up again.
>
> This crash is being discussed on upstream CephFS slack channel [1] with @Venky
> Shankar  and other CephFS devs. I'd encourage you to
> join the conversation, we can discuss this in detail and maybe go through
> the incident step by step which should help analyse the crash better.
>
> [0]
> https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/#disaster-recovery-experts
> [1] https://ceph-storage.slack.com/archives/C04LVQMHM9B/p1720443057919519
>
> On Mon, Jul 8, 2024 at 7:37 PM Ivan Clayson 
> wrote:
>
>> Hi Dhairya,
>>
>> Thank you ever so much for having another look at this so quickly. I
>> don't think I have any logs similar to the ones you referenced this time as
>> my MDSs don't seem to enter the replay stage when they crash (or at least
>> don't now after I've thrown the logs away) but those errors do crop up in
>> the prior logs I shared when the system first crashed.
>>
>> Kindest regards,
>>
>> Ivan
>> On 08/07/2024 14:08, Dhairya Parmar wrote:
>>
>> CAUTION: This email originated from outside of the LMB:
>> *.-dpar...@redhat.com-.*
>> Do not click links or open attachments unless you recognize the sender
>> and know the content is safe.
>> If you think this is a phishing email, please forward it to
>> phish...@mrc-lmb.cam.ac.uk
>>
>>
>> --
>> Ugh, something went horribly wrong. I've downloaded the MDS logs that
>> contain assertion failure and it looks relevant to this [0]. Do you have
>> client logs for this?
>>
>> The other log that you shared is being downloaded right now, once that's
>> done and I'm done going through it, I'll update you.
>>
>> [0] https://tracker.ceph.com/issues/54546
>>
>> On Mon, Jul 8, 2024 at 4:49 PM Ivan Clayson 
>> wrote:
>>
>>> Hi Dhairya,
>>>
>>> Sorry to resurrect this thread again, but we still unfortunately have an
>>> issue with our filesystem after we attempted to write new backups to it.
>>>
>>> We finished the scrub of the filesystem on Friday and ran a repair scrub
>>> on the 1 directory which had metadata damage. After doing so and rebooting,
>>> the cluster reported no issues and data was accessible again.
>>>
>>> We re-started the backups to run over the weekend and unfortunately the
>>> filesystem crashed again where the log of the failure is here:
>>> https://www.mrc-lmb.cam.ac.uk/scicomp/data/uploads/ceph/ceph-mds.pebbles-s2.log-20240708.gz.
>>> We ran the backups on kernel mounts of the filesystem without the nowsync
>>> option this time to avoid the out-of-sync write problems..
>>>
>>> I've tried resetting the journal again after recovering the dentries but
>>> unfortunately the filesystem is still in a failed state despite setting
>>> joinable to true. The log of this crash is here:
>>> https://www.mrc-lmb.cam.ac.uk/scicomp/data/uploads/ceph/ceph-mds.pebbles-s4.log-20240708
>>> .
>>>
>>> I'm not sure how to proceed as I can't seem to get any MDS to take over
>>> the first rank. I would like to do a scrub of the filesystem and preferably
>>> overwrite the troublesome files with the originals on the live filesystem.
>>> Do you have any advice on how to make the filesystem leave its failed
>>> state? I have a backup of the journal before I reset it so I can roll back
>>> if necessary.
>>>
>>> Here are some details about the filesystem at present:
>>>
>>> root@pebbles-s2 11:49 [~]: ceph -s; ceph fs status
>>>   cluster:
>>> id: e3f7535e-d35f-4a5d-88f0-a1e97abcd631
>>> health: HEALTH_ERR
>>>  

[ceph-users] Re: CephFS MDS crashing during replay with standby MDSes crashing afterwards

2024-07-09 Thread Ivan Clayson

Hi Dhairya,

I would be more than happy to try and give as many details as possible 
but the slack channel is private and requires my email to have an 
account/ access to it.


Wouldn't taking the discussion about this error to a private channel 
also stop other users who experience this error from learning about how 
and why this happened, as well as possibly not being able to view the 
solution? Would it not be possible to discuss this more publicly for the 
benefit of the other users on the mailing list?


Kindest regards,

Ivan

On 09/07/2024 10:44, Dhairya Parmar wrote:

CAUTION: This email originated from outside of the LMB:
*.-dpar...@redhat.com-.*
Do not click links or open attachments unless you recognize the sender 
and know the content is safe.
If you think this is a phishing email, please forward it to 
phish...@mrc-lmb.cam.ac.uk



--

Hey Ivan,

This is a relatively new MDS crash, so this would require some 
investigation but I was instructed to recommend disaster-recovery 
steps [0] (except session reset) to you to get the FS up again.


This crash is being discussed on upstream CephFS slack channel [1] 
with @Venky Shankar  and other CephFS 
devs. I'd encourage you to join the conversation, we can discuss this 
in detail and maybe go through the incident step by step which should 
help analyse the crash better.


[0] 
https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/#disaster-recovery-experts

[1] https://ceph-storage.slack.com/archives/C04LVQMHM9B/p1720443057919519

On Mon, Jul 8, 2024 at 7:37 PM Ivan Clayson  
wrote:


Hi Dhairya,

Thank you ever so much for having another look at this so quickly.
I don't think I have any logs similar to the ones you referenced
this time as my MDSs don't seem to enter the replay stage when
they crash (or at least don't now after I've thrown the logs away)
but those errors do crop up in the prior logs I shared when the
system first crashed.

Kindest regards,

Ivan

On 08/07/2024 14:08, Dhairya Parmar wrote:

CAUTION: This email originated from outside of the LMB:
*.-dpar...@redhat.com-.*
Do not click links or open attachments unless you recognize the
sender and know the content is safe.
If you think this is a phishing email, please forward it to
phish...@mrc-lmb.cam.ac.uk


--

Ugh, something went horribly wrong. I've downloaded the MDS logs
that contain assertion failure and it looks relevant to this [0].
Do you have client logs for this?

The other log that you shared is being downloaded right now, once
that's done and I'm done going through it, I'll update you.

[0] https://tracker.ceph.com/issues/54546

On Mon, Jul 8, 2024 at 4:49 PM Ivan Clayson
 wrote:

Hi Dhairya,

Sorry to resurrect this thread again, but we still
unfortunately have an issue with our filesystem after we
attempted to write new backups to it.

We finished the scrub of the filesystem on Friday and ran a
repair scrub on the 1 directory which had metadata damage.
After doing so and rebooting, the cluster reported no issues
and data was accessible again.

We re-started the backups to run over the weekend and
unfortunately the filesystem crashed again where the log of
the failure is here:

https://www.mrc-lmb.cam.ac.uk/scicomp/data/uploads/ceph/ceph-mds.pebbles-s2.log-20240708.gz.
We ran the backups on kernel mounts of the filesystem without
the nowsync option this time to avoid the out-of-sync write
problems..

I've tried resetting the journal again after recovering the
dentries but unfortunately the filesystem is still in a
failed state despite setting joinable to true. The log of
this crash is here:

https://www.mrc-lmb.cam.ac.uk/scicomp/data/uploads/ceph/ceph-mds.pebbles-s4.log-20240708.

I'm not sure how to proceed as I can't seem to get any MDS to
take over the first rank. I would like to do a scrub of the
filesystem and preferably overwrite the troublesome files
with the originals on the live filesystem. Do you have any
advice on how to make the filesystem leave its failed state?
I have a backup of the journal before I reset it so I can
roll back if necessary.

Here are some details about the filesystem at present:

root@pebbles-s2 11:49 [~]: ceph -s; ceph fs status
  cluster:
    id: e3f7535e-d35f-4a5d-88f0-a1e97abcd631
    health: HEALTH_ERR
    1 filesystem is degraded
    1 large omap objects
    1 filesystem is offline
    1 mds daemon damaged
nobackfill,norebalance,norecover,noscrub,nodeep-scrub,nosnaptrim
flag(s) set
    1750 pgs not 

[ceph-users] Re: CephFS MDS crashing during replay with standby MDSes crashing afterwards

2024-07-09 Thread Dhairya Parmar
Hey Ivan,

This is a relatively new MDS crash, so this would require some
investigation but I was instructed to recommend disaster-recovery steps [0]
(except session reset) to you to get the FS up again.

This crash is being discussed on upstream CephFS slack channel [1] with @Venky
Shankar  and other CephFS devs. I'd encourage you to
join the conversation, we can discuss this in detail and maybe go through
the incident step by step which should help analyse the crash better.

[0]
https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/#disaster-recovery-experts
[1] https://ceph-storage.slack.com/archives/C04LVQMHM9B/p1720443057919519

On Mon, Jul 8, 2024 at 7:37 PM Ivan Clayson  wrote:

> Hi Dhairya,
>
> Thank you ever so much for having another look at this so quickly. I don't
> think I have any logs similar to the ones you referenced this time as my
> MDSs don't seem to enter the replay stage when they crash (or at least
> don't now after I've thrown the logs away) but those errors do crop up in
> the prior logs I shared when the system first crashed.
>
> Kindest regards,
>
> Ivan
> On 08/07/2024 14:08, Dhairya Parmar wrote:
>
> CAUTION: This email originated from outside of the LMB:
> *.-dpar...@redhat.com-.*
> Do not click links or open attachments unless you recognize the sender and
> know the content is safe.
> If you think this is a phishing email, please forward it to
> phish...@mrc-lmb.cam.ac.uk
>
>
> --
> Ugh, something went horribly wrong. I've downloaded the MDS logs that
> contain assertion failure and it looks relevant to this [0]. Do you have
> client logs for this?
>
> The other log that you shared is being downloaded right now, once that's
> done and I'm done going through it, I'll update you.
>
> [0] https://tracker.ceph.com/issues/54546
>
> On Mon, Jul 8, 2024 at 4:49 PM Ivan Clayson 
> wrote:
>
>> Hi Dhairya,
>>
>> Sorry to resurrect this thread again, but we still unfortunately have an
>> issue with our filesystem after we attempted to write new backups to it.
>>
>> We finished the scrub of the filesystem on Friday and ran a repair scrub
>> on the 1 directory which had metadata damage. After doing so and rebooting,
>> the cluster reported no issues and data was accessible again.
>>
>> We re-started the backups to run over the weekend and unfortunately the
>> filesystem crashed again where the log of the failure is here:
>> https://www.mrc-lmb.cam.ac.uk/scicomp/data/uploads/ceph/ceph-mds.pebbles-s2.log-20240708.gz.
>> We ran the backups on kernel mounts of the filesystem without the nowsync
>> option this time to avoid the out-of-sync write problems..
>>
>> I've tried resetting the journal again after recovering the dentries but
>> unfortunately the filesystem is still in a failed state despite setting
>> joinable to true. The log of this crash is here:
>> https://www.mrc-lmb.cam.ac.uk/scicomp/data/uploads/ceph/ceph-mds.pebbles-s4.log-20240708
>> .
>>
>> I'm not sure how to proceed as I can't seem to get any MDS to take over
>> the first rank. I would like to do a scrub of the filesystem and preferably
>> overwrite the troublesome files with the originals on the live filesystem.
>> Do you have any advice on how to make the filesystem leave its failed
>> state? I have a backup of the journal before I reset it so I can roll back
>> if necessary.
>>
>> Here are some details about the filesystem at present:
>>
>> root@pebbles-s2 11:49 [~]: ceph -s; ceph fs status
>>   cluster:
>> id: e3f7535e-d35f-4a5d-88f0-a1e97abcd631
>> health: HEALTH_ERR
>> 1 filesystem is degraded
>> 1 large omap objects
>> 1 filesystem is offline
>> 1 mds daemon damaged
>>
>> nobackfill,norebalance,norecover,noscrub,nodeep-scrub,nosnaptrim flag(s) set
>> 1750 pgs not deep-scrubbed in time
>> 1612 pgs not scrubbed in time
>>
>>   services:
>> mon: 4 daemons, quorum pebbles-s1,pebbles-s2,pebbles-s3,pebbles-s4
>> (age 50m)
>> mgr: pebbles-s2(active, since 77m), standbys: pebbles-s1, pebbles-s3,
>> pebbles-s4
>> mds: 1/2 daemons up, 3 standby
>> osd: 1380 osds: 1380 up (since 76m), 1379 in (since 10d); 10 remapped
>> pgs
>>  flags
>> nobackfill,norebalance,norecover,noscrub,nodeep-scrub,nosnaptrim
>>
>>   data:
>> volumes: 1/2 healthy, 1 recovering; 1 damaged
>> pools:   7 pools, 2177 pgs
>> objects: 3.24G objects, 6.7 PiB
>> usage:   8.6 PiB used, 14 PiB / 23 PiB avail
>> pgs: 11785954/27384310061 objects misplaced (0.043%)
>>  2167 active+clean
>>  6active+remapped+backfilling
>>  4active+remapped+backfill_wait
>>
>> ceph_backup - 0 clients
>> ===
>> RANK  STATE   MDS  ACTIVITY  DNS  INOS  DIRS  CAPS
>>  0failed
>> POOLTYPE USED  AVAIL
>>mds_backup_fs  metadata  1174G  3071G
>> ec82_primary_fs_datadata   0   3071G
>>   ec82pool  data8085T  4738T
>> ceph_archive - 2 

[ceph-users] Re: Phantom hosts

2024-07-09 Thread Eugen Block

Hi Tim,

is this still an issue? If it is, I recommend adding some more details
so it's easier to follow your train of thought.


ceph osd tree
ceph -s
ceph health detail
ceph orch host ls

And then please point out which host you're trying to get rid of. I  
would deal with the rgw thing later. Is it possible that the phantom
host actually had OSDs on it? Maybe that needs to be cleaned up first.
I had something similar on a customer cluster recently where we hunted  
failing OSDs but it turned out they were removed quite a while ago,  
just not properly cleaned up yet on the filesystem.
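
If it does turn out to be leftovers, the usual cleanup (host and OSD ids below
are placeholders) would be something along the lines of:

ceph orch host rm <phantom-host> --offline --force
ceph osd purge <osd-id> --yes-i-really-mean-it      # for OSD entries that only exist in the osdmap/crush map
ceph orch daemon rm osd.<osd-id> --force            # for stray daemons cephadm still tracks

But let's look at the actual output first before removing anything.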


Thanks,
Eugen

Zitat von Tim Holloway :


It's getting worse.

As many may be aware, the venerable CentOS 7 OS is hitting end-of-life in a
matter of days.

The easiest way to upgrade my servers has been to simply create an alternate
disk with the new OS, turn my provisioning system loose on it, yank the old
OS system disk and jack in the new one.


However, Ceph is another matter. For that part, the simplest thing to do is
to destroy the Ceph node(s) on the affected box, do the OS upgrade, then
re-create the nodes.

But now I have even MORE strays. The OSD on my box lives on in Ceph in the
dashboard host view even though the documented removal procedures were
followed and the VM itself was destroyed.

Further, this last node is an RGW node and I cannot remove it from the RGW
configuration. It not only shows on the dashboard, it also lists as still
active on the command line and as entries in the config database no matter
what I do.


I really need some solution to this, as it's a major chokepoint in the
upgrade process


   Tim
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io