Hi,

On 11/10/21 16:14, Christoph Adomeit wrote:
> But the cluster seemed to slowly "eat" storage space. So yesterday I decided to
> add 3 more NVMes, 1 for each node. The second I added the first NVMe as a Ceph OSD,
> the cluster started crashing. I had high load on all OSDs and the OSDs were dying again and
> again until I set nodown, noout, noscrub, nodeep-scrub and removed the new OSD. Then the
> cluster recovered but had slow IO and lots of snaptrim and snaptrim_wait processes.

You may have hit this issue: https://tracker.ceph.com/issues/52026. AFAIU there could be some untrimmed snapshots (visible in the snaptrimq_len column of `ceph pg dump pgs`) which are only trimmed once the PG is repeered. We experienced that during testing, but the root cause is not fully understood (at least not by me).
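
In case it helps, something along these lines lets you check the queue and repeer a single affected PG (just a sketch; <pgid> is a placeholder, and IIRC `ceph pg repeer` is only available on fairly recent releases):

  ceph pg dump pgs | less -S    # look at the SNAPTRIMQ_LEN column
  ceph pg repeer <pgid>         # force one affected PG to repeer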

Maybe adding your new OSDs made the snaptrim state appear on various PGs, which apparently affected your cluster.

> I made this smoother by setting --osd_snap_trim_sleep=3.0

> Overnight the snaptrim_wait PGs became 0 and I had 15% more free space in the
> Ceph cluster. But during the day the snaptrim_waits kept increasing.

> I then set osd_snap_trim_sleep to 0.0 again and most VMs had extremely high
> iowait or crashed.

> Now I did a ceph osd set nosnaptrim and the cluster is flying again. Iowait is 0
> on all VMs, but the count of snaptrim_wait is slowly increasing.

> How can I get the snaptrims running fast without affecting Ceph IO performance?
> My theory is that until yesterday the snaptrims were, for some reason, not running,
> and therefore the cluster was "eating" storage space. After the crash yesterday
> and the restart, the snaptrims started.

On our test cluster we actually decreased `osd_snap_trim_sleep` to 0.1s instead of the default 2s for hybrid OSDs, because the snaptrim backlog we had would otherwise have lasted a few weeks, IIRC. We didn't notice any slowdowns, HDDs crashing or anything like that (but this cluster doesn't run any real production workloads, so we may have overlooked something).

In your case the default should come from `osd_snap_trim_sleep_ssd`, which is 0, so maybe on SSD/NVMe OSDs the snaptrim does affect performance (with the default settings at least)... Therefore, you may want to set `osd_snap_trim_sleep` to something other than 0. The 0.1s sleep worked smoothly in our tests, but such a low value was only needed because I was stress-testing snapshots and there were many, many objects waiting for snaptrim. You could probably increase it for safety; any value between 0.1s and 3s (which you have already tested!) is probably fine.
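
For example, something like this (just a sketch; the 0.5s value is arbitrary, pick anything in that 0.1s-3s range):

  # persist the setting for all OSDs
  ceph config set osd osd_snap_trim_sleep 0.5

  # or inject it directly into the running daemons
  ceph tell 'osd.*' injectargs '--osd_snap_trim_sleep 0.5'

  # and once you're happy with the impact, let trimming resume
  ceph osd unset nosnaptrim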

Cheers,

--
Arthur Outhenin-Chalandre
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
