[ceph-users] Re: Lousy recovery for mclock and reef

2024-05-24 Thread Joshua Baergen
> …don't think the change took effect even with updating ceph.conf, a restart, and a direct asok config set. The target memory value is confirmed to be set via asok config get. Nothing has helped. I still cannot break the 21 MiB/s barrier. Does anyone have any more idea…
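
For reference, a minimal sketch of the kind of asok/config check described above. It assumes osd.0 stands in for whichever OSD is being inspected and that "target memory" refers to the osd_memory_target option:

    # query the running value over the admin socket on the OSD host
    ceph daemon osd.0 config get osd_memory_target

    # or query the centralized config from any node with admin access
    ceph config get osd.0 osd_memory_target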

[ceph-users] Re: Lousy recovery for mclock and reef

2024-05-24 Thread Joshua Baergen
It requires an OSD restart, unfortunately. Josh. On Fri, May 24, 2024 at 11:03 AM Mazzystr wrote: > Is that a setting that can be applied at runtime, or does it require an OSD restart? > On Fri, May 24, 2024 at 9:59 AM Joshua Baergen wrote: > > Hey Chris, …
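
A rough sketch of restarting an OSD so a queue change like this takes effect; the right command depends on how the cluster was deployed, and the id 3 is just a placeholder:

    # package/systemd deployments
    systemctl restart ceph-osd@3

    # cephadm/orchestrator deployments
    ceph orch daemon restart osd.3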

[ceph-users] Re: Lousy recovery for mclock and reef

2024-05-24 Thread Joshua Baergen
Hey Chris, A number of users have been reporting issues with recovery on Reef with mClock. Most folks have had success reverting to osd_op_queue=wpq. AIUI 18.2.3 should have some mClock improvements but I haven't looked at the list myself yet. Josh On Fri, May 24, 2024 at 10:55 AM Mazzystr
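
A minimal sketch of the wpq revert being suggested, assuming the centralized config store is in use; as noted elsewhere in this thread, the change only takes effect after the OSDs are restarted:

    # switch the op queue back to wpq for all OSDs
    ceph config set osd osd_op_queue wpq

    # confirm the setting, then restart the OSDs
    ceph config get osd osd_op_queue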

[ceph-users] Re: Slow ops during recovery for RGW index pool only when degraded OSD is primary

2024-04-03 Thread Joshua Baergen
…type and/or media? > On Apr 3, 2024, at 13:38, Joshua Baergen wrote: > > We've had success using osd_async_recovery_min_cost=0 to drastically reduce slow ops during index recovery. > > Josh > > On Wed, Apr 3, 2024 a…

[ceph-users] Re: Slow ops during recovery for RGW index pool only when degraded OSD is primary

2024-04-03 Thread Joshua Baergen
We've had success using osd_async_recovery_min_cost=0 to drastically reduce slow ops during index recovery. Josh. On Wed, Apr 3, 2024 at 11:29 AM Wesley Dillingham wrote: > I am fighting an issue on an 18.2.0 cluster where a restart of an OSD which supports the RGW index pool causes…
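
For illustration, a sketch of applying that setting cluster-wide via the centralized config store; treat the value and scope as an example rather than a universal recommendation:

    # favor async recovery so client ops on degraded index PGs block less
    ceph config set osd osd_async_recovery_min_cost 0

    # verify
    ceph config get osd osd_async_recovery_min_cost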

[ceph-users] Re: S3 Partial Reads from Erasure Pool

2024-04-01 Thread Joshua Baergen
I think it depends on what you mean by RADOS objects and S3 objects here. If you're talking about an object that was uploaded via multipart upload (MPU), and thus may comprise many RADOS objects, I don't think there's a difference in read behavior based on pool type. If you're talking about reading a subset byte range…
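
As an illustration of a subset byte-range read against RGW, a hedged example using the AWS CLI; the endpoint, bucket, key, and range are all hypothetical:

    # fetch only the first 4 MiB of an object via an HTTP Range request
    aws s3api get-object \
        --endpoint-url http://rgw.example.com:8080 \
        --bucket mybucket \
        --key big/object.bin \
        --range bytes=0-4194303 \
        /tmp/object-first-4MiB.bin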

[ceph-users] Re: log_latency slow operation observed for submit_transact, latency = 22.644258499s

2024-03-22 Thread Joshua Baergen
Personally, I don't think the compaction is actually required. Reef has compact-on-iteration enabled, which should take care of this automatically. We see this sort of delay pretty often during PG cleaning, at the end of a PG being cleaned, when the PG has a high count of objects, whether or not
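
If a manual compaction is wanted anyway, a minimal sketch; osd.12 is a placeholder id, and the compact-on-iteration behaviour itself is a RocksDB-level setting whose exact option name varies by release, so it is not shown here:

    # trigger an online RocksDB compaction on a running OSD
    ceph tell osd.12 compact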

[ceph-users] Re: Why a lot of pgs are degraded after host(+osd) restarted?

2024-03-20 Thread Joshua Baergen
Hi Jaemin, It is normal for PGs to become degraded during a host reboot, since a copy of the data was taken offline and needs to be resynchronized after the host comes back. Normally this is quick, as the recovery mechanism only needs to modify those objects that have changed while the host is
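
A common sketch of the maintenance flow implied here, so that the brief degradation does not also trigger rebalancing; when to set and unset the flag is a per-cluster judgment call:

    ceph osd set noout      # keep rebooting OSDs from being marked out
    # ... reboot the host ...
    ceph osd unset noout    # once its OSDs have rejoined
    ceph -s                 # watch degraded objects drain back to zero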

[ceph-users] Re: OSDs not balanced

2024-03-04 Thread Joshua Baergen
The balancer will operate on all pools unless otherwise specified. Josh. On Mon, Mar 4, 2024 at 1:12 PM Cedric wrote: > Does the balancer have pools enabled? ("ceph balancer pool ls") > Actually, I am wondering whether the balancer does anything when no pools are added. > On Mon, Mar 4, 2024,…
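
For reference, a small sketch of inspecting and restricting the balancer's pool scope; "mypool" is a placeholder name:

    ceph balancer status            # overall mode and activity
    ceph balancer pool ls           # per the note above, an empty list means all pools
    ceph balancer pool add mypool   # restrict balancing to specific pools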

[ceph-users] Re: has anyone enabled bdev_enable_discard?

2024-03-02 Thread Joshua Baergen
Periodic discard was actually attempted in the past: https://github.com/ceph/ceph/pull/20723. A proper implementation would probably need appropriate scheduling/throttling that can be tuned so as to balance against client I/O impact. Josh. On Sat, Mar 2, 2024 at 6:20 AM David C. wrote: > Could…
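
For context, a hedged sketch of turning on the existing inline discard path (as opposed to the periodic approach in that PR); related async-discard options and defaults differ across releases, and an OSD restart may be required:

    # enable inline discard at the BlueStore block-device layer; evaluate the client I/O impact first
    ceph config set osd bdev_enable_discard true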