On Sat, Feb 10, 2024 at 10:05:02AM -0500, Vladimir Sigunov wrote:
> Hello Community!
> I would appreciate any help/suggestions with the massive RGWs outage we are
> facing.
> The cluster's overall status is acceptable (HEALTH_WARN because of some pgs
> not scrubbed in time), and the cluster is operational.
> However, all RGWs fail to start with a core dump.
> The only issue I see at the moment is the RGW GC queue (radosgw-admin gc
> list) that contains 600K records.
> I believe this could be the root cause of the issue. When I pause OSD iops
> (ceph osd pause), all RGWs start with no issues.
> There are no large OMAPs or any other warnings in ceph -s output.

To get you going for the moment, how about disabling the GC threads in
the RGW daemon, and then processing GC asynchronously?

Add "rgw_enable_gc_threads=0" to ceph.conf.
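
As a rough sketch, the ceph.conf fragment could look like this. The
[client] section is an assumption on my part; a per-instance
[client.rgw.<name>] section matching your daemons should work just as
well. The option is read at daemon startup, so it has to be in place
before the RGWs are started again.

```ini
# Assumed section; adjust to the section your RGW daemons actually use.
[client]
# Stop the RGW daemons from running their own garbage-collection threads.
rgw_enable_gc_threads = 0
```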

After that, to find out why you get the core dump, start up a separate
RGW instance with debug logging enabled.
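
The two steps above (processing GC out of band, then a debug instance)
might be sketched like this. Treat it as an untested outline, not a
recipe: "client.rgw.debug" is a made-up instance name that would need
its own keyring, the debug levels are just commonly used values, and
re-enabling GC threads on the debug instance assumes GC really is what
triggers the crash. The script only prints the commands unless RUN=1 is
set, so you can review them first.

```shell
#!/bin/sh
# Dry run by default: each command is printed, not executed.
# Set RUN=1 to actually execute against the cluster.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Step 1: drain the GC queue out of band, while the regular RGWs run
# with rgw_enable_gc_threads=0. --include-all also processes entries
# that have not yet reached their expiration time.
drain_gc() {
    run radosgw-admin gc process --include-all
}

# Step 2: start one extra RGW in the foreground with verbose logging.
# "client.rgw.debug" is a hypothetical name. GC threads are re-enabled
# for this instance only, on the assumption that GC causes the crash.
start_debug_rgw() {
    run radosgw -d -n client.rgw.debug \
        --debug-rgw=20 --debug-ms=1 \
        --rgw_enable_gc_threads=true
}

drain_gc
start_debug_rgw
```

If the debug instance shares a host with a production RGW, it will also
need its own frontend port. Note that "gc process" may hit the same
problem as the daemons if the queue itself is what is broken.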

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation President & Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
