[ceph-users] Re: something wrong with my monitor database ?
On 13/06/2022 at 18:37, Stefan Kooman wrote:
> On 6/13/22 18:21, Eric Le Lay wrote:
>> Those objects are deleted but have snapshots, even if the pool itself
>> doesn't have snapshots. What could cause that?
>>
>> root@hpc1a:~# rados -p storage stat rbd_data.5b423b48a4643f.0006a4e5
>> error stat-ing storage/rbd_data.5b423b48a4643f.0006a4e5: (2) No such file or directory
>> root@hpc1a:~# rados -p storage lssnap
>> 0 snaps
>> root@hpc1a:~# rados -p storage listsnaps rbd_data.5b423b48a4643f.0006a4e5
>> rbd_data.5b423b48a4643f.0006a4e5:
>> cloneid  snaps  size     overlap
>> 1160     1160   4194304  [1048576~32768,1097728~16384,1228800~16384,1409024~16384,1441792~16384,1572864~16384,1720320~16384,1900544~16384,2310144~16384]
>> 1364     1364   4194304  []
>
> Do the OSDs still need to trim the snapshots? Does data usage decline
> over time?
>
> Gr. Stefan

Thanks Stefan for your time!

Snaptrims were re-enabled a week ago, but the OSDs only snaptrim newly
deleted snapshots. Restarting or outing an OSD doesn't trigger them
either. Crush-reweighting an OSD to 0 indeed results in more storage
being used!

I'll drop the cluster and start again from scratch.

Best,
Eric
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
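[Editor's note: whether the OSDs still have trimming queued can be read from the per-PG snap-trim queue length in `ceph pg dump`. A total of 0 while clones remain matches the symptom above. A minimal sketch, assuming the JSON field `snaptrimq_len`; the two-PG sample input is hypothetical and stands in for `ceph pg dump pgs --format json` on a live cluster:]

```shell
# Sum the per-PG snapshot-trim queue lengths. A total of 0 while
# leftover clones remain means the OSDs no longer consider those
# snapshots trimmable. Hypothetical sample stands in for:
#   ceph pg dump pgs --format json
pg_dump_sample='{"pg_stats":[
  {"pgid":"2.0","snaptrimq_len":0},
  {"pgid":"2.1","snaptrimq_len":5}
]}'

printf '%s\n' "$pg_dump_sample" |
  grep -o '"snaptrimq_len":[0-9]*' |
  cut -d: -f2 |
  awk '{ total += $1 } END { print total }'
```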
[ceph-users] Re: something wrong with my monitor database ?
On 13/06/2022 at 17:54, Eric Le Lay wrote:
> On 10/06/2022 at 11:58, Stefan Kooman wrote:
>> On 6/10/22 11:41, Eric Le Lay wrote:
>>> Hello list,
>>>
>>> my ceph cluster was upgraded from nautilus to octopus last October,
>>> causing snaptrims to overload OSDs, so I had to disable them
>>> (bluefs_buffered_io=false|true didn't help). Now I've copied the data
>>> elsewhere, removed all clients, and am trying to fix the cluster.
>>> Scrapping it and starting over is possible, but it would be wonderful
>>> if we could figure out what's wrong with it...
>>
>> FYI: osd snap trim sleep <- adding some sleep might help alleviate
>> the impact on the cluster. If HEALTH is OK I would not expect
>> anything wrong with your cluster.
>>
>> Does "ceph osd dump | grep require_osd_release" give you
>> require_osd_release octopus?
>>
>> Gr. Stefan
>
> Hi Stefan, thank you for your answer.
>
> Even osd_snap_trim_sleep=10 was not sustainable with normal cluster load.
>
> Following your email I've tested bluefs_buffered_io=true again and it
> indeed dramatically reduces disk load, but not CPU load nor slow ceph IO.
>
> Yes, require_osd_release=octopus.
>
> What worries me is that the pool is now void of RBD images but still
> holds 14TiB of object data. Here are my pool contents; rbd_directory
> and rbd_trash are empty.
>
> rados -p storage ls | sed 's/\(.*\..*\)\..*/\1/'|sort|uniq -c
>       1 rbd_children
>       6 rbd_data.13fc0d1d63c52b
>    2634 rbd_data.15ab844f62d5
>     258 rbd_data.15f1f2e2398dc7
>     133 rbd_data.17d93e1c5a4855
>     258 rbd_data.1af03e352ec460
>    2987 rbd_data.236cfc2474b020
>  206872 rbd_data.31c55ee49f0abb
>  604593 rbd_data.5b423b48a4643f
>      90 rbd_data.7b06b7abcc9441
>   81576 rbd_data.913b398f28d1
>      18 rbd_data.9662ade11235a
>   16051 rbd_data.e01609a7a07e20
>     278 rbd_data.e6b6f855b5172c
>      90 rbd_data.e85da37e044922
>       1 rbd_directory
>       1 rbd_info
>       1 rbd_trash
>
> Eric

Those objects are deleted but have snapshots, even if the pool itself
doesn't have snapshots. What could cause that?

root@hpc1a:~# rados -p storage stat rbd_data.5b423b48a4643f.0006a4e5
error stat-ing storage/rbd_data.5b423b48a4643f.0006a4e5: (2) No such file or directory
root@hpc1a:~# rados -p storage lssnap
0 snaps
root@hpc1a:~# rados -p storage listsnaps rbd_data.5b423b48a4643f.0006a4e5
rbd_data.5b423b48a4643f.0006a4e5:
cloneid  snaps  size     overlap
1160     1160   4194304  [1048576~32768,1097728~16384,1228800~16384,1409024~16384,1441792~16384,1572864~16384,1720320~16384,1900544~16384,2310144~16384]
1364     1364   4194304  []
[ceph-users] Re: something wrong with my monitor database ?
On 10/06/2022 at 11:58, Stefan Kooman wrote:
> On 6/10/22 11:41, Eric Le Lay wrote:
>> Hello list,
>>
>> my ceph cluster was upgraded from nautilus to octopus last October,
>> causing snaptrims to overload OSDs, so I had to disable them
>> (bluefs_buffered_io=false|true didn't help). Now I've copied the data
>> elsewhere, removed all clients, and am trying to fix the cluster.
>> Scrapping it and starting over is possible, but it would be wonderful
>> if we could figure out what's wrong with it...
>
> FYI: osd snap trim sleep <- adding some sleep might help alleviate the
> impact on the cluster. If HEALTH is OK I would not expect anything
> wrong with your cluster.
>
> Does "ceph osd dump | grep require_osd_release" give you
> require_osd_release octopus?
>
> Gr. Stefan

Hi Stefan, thank you for your answer.

Even osd_snap_trim_sleep=10 was not sustainable with normal cluster load.

Following your email I've tested bluefs_buffered_io=true again and it
indeed dramatically reduces disk load, but not CPU load nor slow ceph IO.

Yes, require_osd_release=octopus.

What worries me is that the pool is now void of RBD images but still
holds 14TiB of object data. Here are my pool contents; rbd_directory
and rbd_trash are empty.

rados -p storage ls | sed 's/\(.*\..*\)\..*/\1/'|sort|uniq -c
      1 rbd_children
      6 rbd_data.13fc0d1d63c52b
   2634 rbd_data.15ab844f62d5
    258 rbd_data.15f1f2e2398dc7
    133 rbd_data.17d93e1c5a4855
    258 rbd_data.1af03e352ec460
   2987 rbd_data.236cfc2474b020
 206872 rbd_data.31c55ee49f0abb
 604593 rbd_data.5b423b48a4643f
     90 rbd_data.7b06b7abcc9441
  81576 rbd_data.913b398f28d1
     18 rbd_data.9662ade11235a
  16051 rbd_data.e01609a7a07e20
    278 rbd_data.e6b6f855b5172c
     90 rbd_data.e85da37e044922
      1 rbd_directory
      1 rbd_info
      1 rbd_trash

Eric
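[Editor's note: the per-prefix object counts above can be turned into a rough upper bound on the space each leftover image pins, since RBD data objects in this pool are 4 MiB (the default object size, consistent with the `size 4194304` in the `listsnaps` output elsewhere in the thread). A sketch extending the sed|sort|uniq pipeline from the message; the short sample input is hypothetical and stands in for `rados -p storage ls`:]

```shell
# Rough upper bound on space pinned per leftover RBD image prefix:
# group object names by prefix (strip the trailing chunk index), then
# multiply each count by the 4 MiB default RBD object size.
# Hypothetical sample stands in for `rados -p storage ls`.
rados_ls_sample='rbd_data.5b423b48a4643f.0006a4e5
rbd_data.5b423b48a4643f.0006a4e6
rbd_data.913b398f28d1.0000000f
rbd_info'

printf '%s\n' "$rados_ls_sample" |
  grep '^rbd_data' |
  sed 's/\(.*\..*\)\..*/\1/' | sort | uniq -c |
  awk '{ printf "%s: %d objects, ~%d MiB\n", $2, $1, $1 * 4 }'
```

This is only an upper bound: clones that share extents with other clones (non-empty overlap lists) occupy less than a full 4 MiB each.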