[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-02-20 Thread Boris Behrens
Hi, we've encountered the same issue after upgrading to octopus on one of our rbd clusters, and now it reappears after the autoscaler lowered the PGs from 8k to 2k for the RBD pool. What we've done in the past: - recreate all OSDs after our 2nd incident with slow OPS in a single week after the ceph
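For reference, freezing the autoscaler so a PG merge does not kick off another snaptrim storm is a one-liner; a minimal sketch, assuming the pool is simply named "rbd" (the name is illustrative only):

    # stop the autoscaler from changing pg_num for this pool
    ceph osd pool set rbd pg_autoscale_mode off
    # check the current and target PG counts for the pool
    ceph osd pool get rbd pg_num
    ceph osd pool ls detail | grep rbd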

[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-02-15 Thread Victor Rodriguez
An update on this for the record: To fully solve this I've had to destroy each OSD and create them again, one by one. I could have done it one host at a time but I've preferred to be on the safe side just in case something else went wrong. The values for num_pgmeta_omap (which I don't know
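A rough sketch of that per-OSD recreate cycle, assuming LVM-backed OSDs; the OSD id 12 and device /dev/sdX are placeholders, and the exact ceph-volume invocation depends on the deployment:

    # drain the OSD and wait until it is safe to remove
    ceph osd out 12
    while ! ceph osd safe-to-destroy 12; do sleep 60; done
    # destroy it, wipe the device, and recreate it with the same id
    ceph osd destroy 12 --yes-i-really-mean-it
    ceph-volume lvm zap /dev/sdX --destroy
    ceph-volume lvm create --osd-id 12 --data /dev/sdX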

[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-01-30 Thread Victor Rodriguez
On 1/30/23 15:15, Ana Aviles wrote: Hi, Josh already suggested, but I will one more time. We had similar behaviour upgrading from Nautilus to Pacific. In our case compacting the OSDs did the trick. Thanks for chiming in! Unfortunately, in my case neither an online compaction (ceph tell
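Besides the online compaction mentioned here (ceph tell osd.X compact), an offline compaction can be attempted with the OSD stopped; a sketch for a systemd-managed, non-cephadm deployment, with OSD id 12 as a placeholder:

    # the OSD must be stopped first
    systemctl stop ceph-osd@12
    # compact the OSD's RocksDB directly on disk
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-12 compact
    systemctl start ceph-osd@12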

[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-01-30 Thread Ana Aviles
Victor Rodriguez Sent: 29 January 2023 22:40:46 To: ceph-users@ceph.io Subject: [ceph-users] Re: Very slow snaptrim operations blocking client I/O Looks like this is going to take a few days. I hope to manage the available performance for VMs with osd_snap_trim_sleep_ssd. I'm wondering if afte

[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-01-30 Thread Frank Schilder
relevant, I could copy pieces I have into here. > > Best regards, > = > Frank Schilder > AIT Risø Campus > Bygning 109, rum S14 > > > From: Victor Rodriguez > Sent: 29 January 2023 22:40:46 > To: ceph-users@ceph.io > Subj

[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-01-30 Thread Victor Rodriguez
Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Victor Rodriguez Sent: 29 January 2023 22:40:46 To: ceph-users@ceph.io Subject: [ceph-users] Re: Very slow snaptrim operations blocking client I/O Looks like this is going to take a few days. I hope to

[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-01-30 Thread Frank Schilder
[ceph-users] Re: Very slow snaptrim operations blocking client I/O Looks like this is going to take a few days. I hope to manage the available performance for VMs with osd_snap_trim_sleep_ssd. I'm wondering if, after that long snaptrim process you went through, your cluster was stable again and

[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-01-29 Thread Victor Rodriguez
Looks like this is going to take a few days. I hope to manage the available performance for VMs with osd_snap_trim_sleep_ssd. I'm wondering if, after that long snaptrim process you went through, your cluster was stable again and snapshots/snaptrims did work properly? On 1/29/23 16:01,
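A minimal sketch of throttling snaptrim that way through the config database; the 1-second value is purely illustrative, higher values are gentler on client I/O but make trimming take longer:

    # add a pause between snap trim operations on SSD-backed OSDs
    ceph config set osd osd_snap_trim_sleep_ssd 1
    # confirm the value an OSD is actually running with
    ceph config show osd.0 osd_snap_trim_sleep_ssd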

[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-01-29 Thread Matt Vandermeulen
I should have explicitly stated that during the recovery, it was still quite bumpy for customers. Some snaptrims were very quick, some took what felt like a really long time. This was however a cluster with a very large number of volumes and a long, long history of snapshots. I'm not sure

[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-01-28 Thread Victor Rodriguez
On 1/29/23 00:50, Matt Vandermeulen wrote: I've observed a similar horror when upgrading a cluster from Luminous to Nautilus, which had the same effect of an overwhelming amount of snaptrim making the cluster unusable. In our case, we held its hand by setting all OSDs to have zero max

[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-01-28 Thread Matt Vandermeulen
I've observed a similar horror when upgrading a cluster from Luminous to Nautilus, which had the same effect of an overwhelming amount of snaptrim making the cluster unusable. In our case, we held its hand by setting all OSDs to have zero max trimming PGs, unsetting nosnaptrim, and then
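A minimal sketch of that hand-holding approach, assuming the config-database interface; the values are illustrative and would be raised step by step while watching client latency:

    # stop per-PG trimming work before re-enabling snaptrim cluster-wide
    ceph config set osd osd_max_trimming_pgs 0
    ceph osd unset nosnaptrim
    # then raise the limit gradually (the default is 2)
    ceph config set osd osd_max_trimming_pgs 1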

[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-01-28 Thread Victor Rodriguez
After some investigation this is what I'm seeing: - OSD processes get stuck at 100% CPU or more if I ceph osd unset nosnaptrim. They stay at 100% CPU even if I ceph osd set nosnaptrim. They stayed like that for at least 26 hours. Some quick benchmarks don't show a reduction of the
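One way to tell whether those OSDs are still making snaptrim progress is to watch the per-PG snap trim queues; a sketch, assuming the SNAPTRIMQ_LEN column present in recent releases:

    # list PGs currently in a snaptrim or snaptrim_wait state
    ceph pg dump pgs 2>/dev/null | grep snaptrim
    # the SNAPTRIMQ_LEN column shows how much trimming each PG still has queued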

[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-01-27 Thread Victor Rodriguez
On 1/27/23 17:44, Josh Baergen wrote: This might be due to tombstone accumulation in rocksdb. You can try to issue a compact to all of your OSDs and see if that helps (ceph tell osd.XXX compact). I usually prefer to do this one host at a time just in case it causes issues, though on a

[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-01-27 Thread Victor Rodriguez
FWIW, the snapshot was in pool cephVMs01_comp, which does use compression. How is your pg distribution on your osd devices? Looks like the PGs are not perfectly balanced, but it doesn't seem to be too bad: ceph osd df tree ID  CLASS  WEIGHT    REWEIGHT  SIZE RAW USE  DATA OMAP META   

[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-01-27 Thread Josh Baergen
This might be due to tombstone accumulation in rocksdb. You can try to issue a compact to all of your OSDs and see if that helps (ceph tell osd.XXX compact). I usually prefer to do this one host at a time just in case it causes issues, though on a reasonably fast RBD cluster you can often get away
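A sketch of doing that one host at a time; the host name is a placeholder and the OSD ids are taken from the CRUSH tree:

    # compact every OSD on one host, then move on to the next host
    for id in $(ceph osd ls-tree HOSTNAME); do
        ceph tell osd.$id compact
    done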

[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-01-27 Thread Szabo, Istvan (Agoda)
How is your pg distribution on your osd devices? Do you have enough assigned pgs? Istvan Szabo Staff Infrastructure Engineer --- Agoda Services Co., Ltd. e: istvan.sz...@agoda.com

[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-01-27 Thread Victor Rodriguez
Ah yes, checked that too. Monitors and OSDs report with ceph config show-with-defaults that bluefs_buffered_io is set to true as the default setting (it isn't overridden somewhere). On 1/27/23 17:15, Wesley Dillingham wrote: I hit this issue once on a nautilus cluster and changed the OSD

[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-01-27 Thread Wesley Dillingham
I hit this issue once on a nautilus cluster and changed the OSD parameter bluefs_buffered_io = true (it was set to false). I believe the default of this parameter was switched from false to true in release 14.2.20; however, perhaps you could still check what your osds are configured with in regard to
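For reference, a sketch of checking the effective value and overriding it if needed; osd.0 stands in for any OSD id, and depending on the release an OSD restart may be needed for the change to take effect:

    # what the running OSD is actually using
    ceph config show-with-defaults osd.0 | grep bluefs_buffered_io
    # override it for all OSDs if it is still false
    ceph config set osd bluefs_buffered_io true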