[ceph-users] Re: OSDs do not respect my memory tune limit
for example, one of my latest osd crashes looks like this in dmesg:

[Dec 2 08:26] bstore_mempool invoked oom-killer: gfp_mask=0x24200ca(GFP_HIGHUSER_MOVABLE), nodemask=0, order=0, oom_score_adj=0
[ +0.06] bstore_mempool cpuset=ed46e6fa52c1e40f13389b349c54e62dcc8c65d76c4c7860e2ff7c39444d14cc mems_allowed=0
[ +0.10] CPU: 3 PID: 3061712 Comm: bstore_mempool Tainted: G W 4.9.312-7 #1
[ +0.02] Hardware name: Hardkernel ODROID-HC4 (DT)
[ +0.01] Call trace:
[ +0.11] [] dump_backtrace+0x0/0x230
[ +0.04] [] show_stack+0x28/0x34
[ +0.05] [] dump_stack+0xb0/0xe8
[ +0.06] [] dump_header+0x70/0x1d8
[ +0.05] [] oom_kill_process+0xec/0x490
[ +0.04] [] out_of_memory+0x124/0x2e0
[ +0.03] [] mem_cgroup_out_of_memory+0x58/0x80
[ +0.03] [] mem_cgroup_oom_synchronize+0x35c/0x3d4
[ +0.03] [] pagefault_out_of_memory+0x1c/0x80
[ +0.04] [] do_page_fault+0x38c/0x3b0
[ +0.03] [] do_translation_fault+0xd0/0xf0
[ +0.03] [] do_mem_abort+0x58/0xb0
[ +0.03] Exception stack(0xffc026637df0 to 0xffc026637f20)
[ +0.02] 7de0: 007f86d50c78 8207
[ +0.04] 7e00: ffc026637ec0 007f86d50c78 ffc08ae3b900 ffc08ae3b900
[ +0.02] 7e20: ffc026637ec0 0055851577c8 6000 000409ff
[ +0.03] 7e40: ff80090837c0 ffc026637e90 ff800908c51c
[ +0.03] 7e60: ffc026637e90 ff800908145c 007f86d50c78 8207
[ +0.03] 7e80: 0008 ffc08ae3b900 ff800908340c
[ +0.02] 7ea0: 0040c4e3f000 0040c4e3f000
[ +0.03] 7ec0: 007f870d8b88 16e4c66f 108890ff61ee90a0
[ +0.03] 7ee0: 0055c5db2828 017f 007f870d6000 219d0e6c
[ +0.03] 7f00: 003b9aca 6389b6e9 16e4c66f
[ +0.03] [] do_el0_ia_bp_hardening+0x90/0xa0
[ +0.01] Exception stack(0xffc026637ea0 to 0xffc026637fd0)
[ +0.03] 7ea0: 0040c4e3f000 0040c4e3f000
[ +0.03] 7ec0: 007f870d8b88 16e4c66f 108890ff61ee90a0
[ +0.03] 7ee0: 0055c5db2828 017f 007f870d6000 219d0e6c
[ +0.02] 7f00: 003b9aca 6389b6e9 16e4c66f
[ +0.03] 7f20: 0018 6389b6e9 0016a9ab002471a6 3b1b6f26b535
[ +0.03] 7f40: 005585ece630 007f86d50c78 005605fb2130
[ +0.03] 7f60: 007f768f9ce8 1003e8f3
[ +0.03] 7f80: 1003df58 1626e380 3b9aca00 112e0be826d694b3
[ +0.02] 7fa0: 0004d7bb2ef84792 007f768f9b40 0055859871ac 007f768f9b40
[ +0.02] 7fc0: 007f86d50c78
[ +0.03] [] el0_ia+0x18/0x1c
[ +0.02] Task in /docker/ed46e6fa52c1e40f13389b349c54e62dcc8c65d76c4c7860e2ff7c39444d14cc killed as a result of limit of /docker/ed46e6fa52c1e40f13389b349c54e62dcc8c65d76c4c7860e2ff7c39444d14cc
[ +0.11] memory: usage 3072000kB, limit 3072000kB, failcnt 4030563
[ +0.02] memory+swap: usage 4059888kB, limit 6144000kB, failcnt 0
[ +0.02] kmem: usage 4596kB, limit 9007199254740988kB, failcnt 0
[ +0.01] Memory cgroup stats for /docker/ed46e6fa52c1e40f13389b349c54e62dcc8c65d76c4c7860e2ff7c39444d14cc: cache:1308KB rss:3066096KB rss_huge:4096KB mapped_file:308KB dirty:0KB writeback:0KB swap:987888KB inactive_anon:613232KB active_anon:2452864KB inactive_file:864KB active_file:348KB unevictable:0KB
[ +0.21] [ pid ]   uid    tgid  total_vm     rss  nr_ptes nr_pmds swapents oom_score_adj name
[ +0.000180] [3061162]   0 3061162      2140       4        3       8        0             0 docker-init
[ +0.04] [3061175]    167 3061175   1294319  764584     2298       9   248700             0 ceph-osd
[ +0.15] Memory cgroup out of memory: Kill process 3061175 (ceph-osd) score 985 or sacrifice child
[ +0.004798] Killed process 3061175 (ceph-osd) total-vm:5177276kB, anon-rss:3058332kB, file-rss:0kB, shmem-rss:0kB
[ +1.042284] oom_reaper: reaped process 3061175 (ceph-osd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

On Fri, Dec 2, 2022 at 09:47, Daniel Brunner wrote:

> Hi,
>
> my OSDs are running odroid-hc4's and they only have about 4GB of memory,
> and every 10 minutes a random OSD crashes due to out of memory. Sadly the
> whole machine gets unresponsive when the memory gets completely full, so no
> ssh access or prometheus output in the meantime.
>
> After the OSD crashes and restarts, and the memory is free again, I can
> look into the machine again.
>
> I've set the memory limit very low on all OSDs:
>
> for i in {0..17} ; do sudo ceph config set osd.$i osd_memory_target 939524096 ; done
>
> which is the absolute minimum, about 0.9GB.
>
> Why are the OSDs not respecting this limit?
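Worth noting: the kill above comes from the Docker memory cgroup (usage 3072000kB, limit 3072000kB), not from osd_memory_target. A sketch for checking what limit the container actually runs under, assuming the default cgroupfs driver and cgroup v1 (as on this 4.9 kernel); the container ID is the one from the dmesg output:

$ CID=ed46e6fa52c1e40f13389b349c54e62dcc8c65d76c4c7860e2ff7c39444d14cc
$ # limit as configured on the container (0 means unlimited)
$ docker inspect -f '{{.HostConfig.Memory}}' $CID
$ # limit as enforced by the kernel
$ cat /sys/fs/cgroup/memory/docker/$CID/memory.limit_in_bytes

If the cgroup limit sits close to osd_memory_target, any overshoot by the non-tunable parts of the OSD will trip the OOM killer, so the container limit needs headroom above the target.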
[ceph-users] Re: OSDs do not respect my memory tune limit
> my OSDs are running odroid-hc4's and they only have about 4GB of memory,
> and every 10 minutes a random OSD crashes due to out of memory. Sadly the
> whole machine gets unresponsive when the memory gets completely full, so no
> ssh access or prometheus output in the meantime.
>
> I've set the memory limit very low on all OSDs:
>
> for i in {0..17} ; do sudo ceph config set osd.$i osd_memory_target 939524096 ; done
>
> which is the absolute minimum, about 0.9GB.
>
> Why are the OSDs not respecting this limit?

The memory limit you set with osd_memory_target applies to the parts
that _can_ scale their memory usage up and down, like read caches and
so forth, but it is not all of the RAM needed to run an OSD with many
PGs/objects. If the box is too small, it is too small.

--
May the most significant bit of your life be positive.
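To see which parts of an OSD's memory are actually tunable, you can compare the configured target with the live mempool accounting; a sketch, assuming osd.0 as an example and that the admin socket is reachable (e.g. from inside the container):

$ # the target the OSD tries to honour for its caches
$ sudo ceph config get osd.0 osd_memory_target
$ # per-mempool breakdown: bluestore cache pools shrink under pressure,
$ # but bookkeeping pools such as osd_pglog and buffer_anon do not
$ sudo ceph daemon osd.0 dump_mempools

If the non-cache pools alone exceed the target, the target cannot be met, which matches the behaviour described above.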
[ceph-users] Re: OSDs do not respect my memory tune limit
Can I get rid of PGs again by decreasing the number on the pool? The
setting does not seem to take:

Doing a backup and nuking the cluster seems like a little too much work
for me :)

$ sudo ceph osd pool get cephfs_data pg_num
pg_num: 128
$ sudo ceph osd pool set cephfs_data pg_num 16
$ sudo ceph osd pool get cephfs_data pg_num
pg_num: 128

On Fri, Dec 2, 2022 at 10:22, Janne Johansson <icepic...@gmail.com> wrote:

> > my OSDs are running odroid-hc4's and they only have about 4GB of memory,
> > and every 10 minutes a random OSD crashes due to out of memory. Sadly the
> > whole machine gets unresponsive when the memory gets completely full, so no
> > ssh access or prometheus output in the meantime.
> >
> > I've set the memory limit very low on all OSDs:
> >
> > for i in {0..17} ; do sudo ceph config set osd.$i osd_memory_target 939524096 ; done
> >
> > which is the absolute minimum, about 0.9GB.
> >
> > Why are the OSDs not respecting this limit?
>
> The memory limit you set with osd_memory_target applies to the parts
> that _can_ scale their memory usage up and down, like read caches and
> so forth, but it is not all of the RAM needed to run an OSD with many
> PGs/objects. If the box is too small, it is too small.
>
> --
> May the most significant bit of your life be positive.
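A possible explanation (an assumption on my part, based on how PG merging works since Nautilus): "ceph osd pool set ... pg_num" only records a target, and the mons merge PGs down gradually, so pg_num keeps reporting the old value for a while. A sketch for checking whether a merge is actually in flight:

$ # shows pg_num alongside pg_num_target for each pool
$ sudo ceph osd pool ls detail | grep cephfs_data
$ # merge and backfill progress also shows up in the cluster status
$ sudo ceph -s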
[ceph-users] Re: OSDs do not respect my memory tune limit
Thanks for the hint, I tried turning that off:

$ sudo ceph osd pool get cephfs_data pg_autoscale_mode
pg_autoscale_mode: on
$ sudo ceph osd pool set cephfs_data pg_autoscale_mode off
set pool 9 pg_autoscale_mode to off
$ sudo ceph osd pool get cephfs_data pg_autoscale_mode
pg_autoscale_mode: off
$ sudo ceph osd pool set cephfs_data pg_num 16
$ sudo ceph osd pool get cephfs_data pg_num
pg_num: 128

On Fri, Dec 2, 2022 at 14:30, Anthony D'Atri <anthony.da...@gmail.com> wrote:

> Could it be that you're fighting with the autoscaler?
>
> > On Dec 2, 2022, at 4:58 AM, Daniel Brunner wrote:
> >
> > Can I get rid of PGs again by decreasing the number on the pool? The
> > setting does not seem to take:
> >
> > Doing a backup and nuking the cluster seems like a little too much work
> > for me :)
> >
> > $ sudo ceph osd pool get cephfs_data pg_num
> > pg_num: 128
> > $ sudo ceph osd pool set cephfs_data pg_num 16
> > $ sudo ceph osd pool get cephfs_data pg_num
> > pg_num: 128
> >
> > On Fri, Dec 2, 2022 at 10:22, Janne Johansson <icepic...@gmail.com> wrote:
> >
> >>> my OSDs are running odroid-hc4's and they only have about 4GB of memory,
> >>> and every 10 minutes a random OSD crashes due to out of memory. Sadly the
> >>> whole machine gets unresponsive when the memory gets completely full, so no
> >>> ssh access or prometheus output in the meantime.
> >>>
> >>> I've set the memory limit very low on all OSDs:
> >>>
> >>> for i in {0..17} ; do sudo ceph config set osd.$i osd_memory_target 939524096 ; done
> >>>
> >>> which is the absolute minimum, about 0.9GB.
> >>>
> >>> Why are the OSDs not respecting this limit?
> >>
> >> The memory limit you set with osd_memory_target applies to the parts
> >> that _can_ scale their memory usage up and down, like read caches and
> >> so forth, but it is not all of the RAM needed to run an OSD with many
> >> PGs/objects. If the box is too small, it is too small.
> >>
> >> --
> >> May the most significant bit of your life be positive.
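To confirm whether the autoscaler, or just a still-pending merge, is what keeps pg_num at 128, a sketch using the pg_autoscaler module's status view (requires the module to be enabled):

$ # per-pool view of current PG counts vs. what the autoscaler wants
$ sudo ceph osd pool autoscale-status
$ # pg_num_target reveals whether the decrease to 16 was at least accepted
$ sudo ceph osd pool ls detail | grep cephfs_data

If pg_num_target shows 16 here, the change was accepted and the PGs will merge down over time even though pg_num still reads 128.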