Casey,
Thanks.
I picked a few of the buckets in question and they have not been resharded
(num_shards has not changed since creation). However, radosgw-admin lc reshard
fix --bucket BUCKET did restore the lifecycle, and radosgw-admin lc process
--bucket BUCKET did start deleting things as expected on the slave side.
I will check one that did not process by hand to see if it has run by
tomorrow. I think I may still have to tune rgw_lc_max_worker,
rgw_lc_max_wp_worker, and rgw_lifecycle_work_time, but we will see.
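As a quick sketch of how the still-stuck buckets can be spotted from saved
`radosgw-admin lc list` output: the entry shape below mirrors what reef
prints, but the bucket names and marker IDs are invented for illustration.

```shell
# Normally the JSON would come from:  radosgw-admin lc list > /tmp/lc-list.json
# Here we use an inline sample (bucket names and marker IDs are made up).
cat > /tmp/lc-list.json <<'EOF'
[
    {
        "bucket": ":bucket1:c0ffee.4096.1",
        "started": "Thu, 01 Jan 1970 00:00:00 GMT",
        "status": "UNINITIAL"
    },
    {
        "bucket": ":bucket3:c0ffee.4096.2",
        "started": "Mon, 09 Dec 2024 08:00:00 GMT",
        "status": "COMPLETE"
    }
]
EOF

# Pull the bucket name out of each UNINITIAL entry (no jq dependency):
# grep the status line plus the two lines above it, then extract the name
# between the first pair of colons on the "bucket" line.
grep -B2 '"status": "UNINITIAL"' /tmp/lc-list.json \
    | sed -n 's/.*"bucket": ":\([^:]*\):.*/\1/p'
```

Each name it prints is a candidate for lc reshard fix / lc process by hand.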
-Chris
On Monday, December 9, 2024 at 08:17:24 AM MST, Casey Bodley
<[email protected]> wrote:
hi Chris,
https://docs.ceph.com/en/latest/radosgw/dynamicresharding/#lifecycle-fixes
may be relevant here. there's a `radosgw-admin lc reshard fix` command
that you can run on the secondary site to add buckets back to the lc
list. if you omit the --bucket argument, it should scan all buckets
and re-link everything with a lifecycle policy.
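A sketch of the triage: something like the following can diff the two zones'
lc lists to find buckets missing on the secondary (the file names and bucket
names are made up; the real input would be the bucket names pulled from
`radosgw-admin lc list` run in each zone).

```shell
# Compare bucket names extracted from each zone's `radosgw-admin lc list`
# output. The two files below are stand-ins with invented bucket names.
cat > /tmp/lc-master.txt <<'EOF'
bucket1
bucket2
bucket3
EOF
cat > /tmp/lc-secondary.txt <<'EOF'
bucket1
bucket3
EOF

# comm needs sorted input; -23 prints lines unique to the first file,
# i.e. buckets the secondary's lc list is missing.
sort -o /tmp/lc-master.txt /tmp/lc-master.txt
sort -o /tmp/lc-secondary.txt /tmp/lc-secondary.txt
comm -23 /tmp/lc-master.txt /tmp/lc-secondary.txt
```

Anything it prints is a candidate for `lc reshard fix --bucket` on the
secondary.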
On Fri, Dec 6, 2024 at 5:04 PM Christopher Durham <[email protected]> wrote:
>
>
> I have 18.2.4 on Rocky 9 Linux. This system has been updated from octopus ->
> pacific -> quincy (18.2.2) -> (el8 -> el9 reinstall of each server, with the
> ceph OSDs and mons surviving) -> reef (18.2.4) over several years.
>
> It appears that I have two probably related problems with lifecycle
> expiration in a multisite configuration.
> I have two zones, one on each side of a multisite. I recently discovered
> (about a month after the el9 and reef 18.2.4 updates) that lifecycle
> expiration was (mostly) not working on the secondary zone side. I had thought
> initially that there may be replication issues, but while there are
> replication issues on individual buckets that required me to full sync
> individual buckets, the majority of the issues are because lifecycle
> expiration is not working on the secondary side.
> The observation that caused me to think lifecycle is the issue is that, based
> on the lifecycle policy for a given bucket, all objects in that bucket should
> already be deleted. What we are seeing is that all objects have been deleted
> from the bucket on the master zone, but NONE of them have been deleted on the
> slave side. This may vary based on the date the objects were created across
> multiple lifecycle runs on the master side, but objects never get
> deleted/expired on the slave side.
> I tracked this down to one of two causes. Let's say, for a given bucket, bucket1:
>
> 1. radosgw-admin lc list on the master shows that the bucket completes its
> lifecycle processing periodically. But on the slave side, it shows:
> "started": "Thu, 01 Jan 1970 ...",
> "status": "UNINITIAL"
> If I run:
> radosgw-admin lc process --bucket bucket1
> that particular bucket flushes all of its expired objects (it takes a while).
> But as far as I can tell, at this point it never runs lifecycle again on the
> slave side.
>
> Now, let's say I have bucket2.
> 2. radosgw-admin lc list on the slave side does NOT show the bucket in the
> json output, yet the same command on the master side shows it!
>
> Given this, running
> radosgw-admin lc process --bucket bucket2
> causes C++ exceptions and the command crashes on the slave side (which makes
> sense, actually).
>
> Yet in this case if I do:
> aws --profile bucket2_owner s3api get-bucket-lifecycle-configuration --bucket
> bucket2
> it shows the lifecycle configuration for the bucket, regardless of whether I
> point the awscli at the master or the slave zone.
> In this case, if I redeploy the lifecycle with
> put-bucket-lifecycle-configuration on the master side, then the lifecycle
> status shows up in
> radosgw-admin lc list
> on the slave side (as well as on the master) as UNINITIAL, and the issue
> devolves to #1 above.
> Note that lifecycle expiration on the slave side does work for some number of
> buckets, but most remain in the UNINITIAL state, and others do not appear at
> all until I redeploy the lifecycle. The slave side is a lot more active in
> reading and writing.
>
> So, why would the bucket not show up in lc list on the slave side, where it
> had before (I can't say how long ago 'before' was)? How can I get it to
> automatically perform lifecycle on the slave side? Would this perhaps be
> related to
>
> rgw_lc_max_worker, rgw_lc_max_wp_worker, rgw_lifecycle_work_time?
> It appears that lifecycle processing is independent on each side, meaning
> that a lifecycle processing of bucket A on one side runs separately from
> lifecycle processing of bucket A on the other side, and as such an object may
> exist on one side for a time when it has been already deleted on the other
> side.
>
> How does rgw_lifecycle_work_time work? Does it mean that outside of the
> work_time window no new lifecycle processing starts, or that those in
> progress abort/stop?
> Either way, this may explain my observation of too many buckets staying in
> UNINITIAL when those that are processing have a lot of data to delete.
> And why is this last one rgw_lifecycle_work_time and not rgw_lc_work_time?
> Anyway, any help on these issues would be appreciated. Thanks
> -Chris
> _______________________________________________
> ceph-users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]