[ceph-users] Re: OSDs do not respect my memory tune limit

2022-12-02 Thread Daniel Brunner
For example, one of my latest OSD crashes looks like this in dmesg:

[Dec 2 08:26] bstore_mempool invoked oom-killer: gfp_mask=0x24200ca(GFP_HIGHUSER_MOVABLE), nodemask=0, order=0, oom_score_adj=0
[  +0.06] bstore_mempool cpuset=ed46e6fa52c1e40f13389b349c54e62dcc8c65d76c4c7860e2ff7c39444d14cc mems_allowed=0
[  +0.10] CPU: 3 PID: 3061712 Comm: bstore_mempool Tainted: GW 4.9.312-7 #1
[  +0.02] Hardware name: Hardkernel ODROID-HC4 (DT)
[  +0.01] Call trace:
[  +0.11] [] dump_backtrace+0x0/0x230
[  +0.04] [] show_stack+0x28/0x34
[  +0.05] [] dump_stack+0xb0/0xe8
[  +0.06] [] dump_header+0x70/0x1d8
[  +0.05] [] oom_kill_process+0xec/0x490
[  +0.04] [] out_of_memory+0x124/0x2e0
[  +0.03] [] mem_cgroup_out_of_memory+0x58/0x80
[  +0.03] [] mem_cgroup_oom_synchronize+0x35c/0x3d4
[  +0.03] [] pagefault_out_of_memory+0x1c/0x80
[  +0.04] [] do_page_fault+0x38c/0x3b0
[  +0.03] [] do_translation_fault+0xd0/0xf0
[  +0.03] [] do_mem_abort+0x58/0xb0
[  +0.03] Exception stack(0xffc026637df0 to 0xffc026637f20)
[  +0.02] 7de0:   007f86d50c78 8207
[  +0.04] 7e00: ffc026637ec0 007f86d50c78 ffc08ae3b900 ffc08ae3b900
[  +0.02] 7e20: ffc026637ec0 0055851577c8 6000 000409ff
[  +0.03] 7e40:  ff80090837c0 ffc026637e90 ff800908c51c
[  +0.03] 7e60: ffc026637e90 ff800908145c 007f86d50c78 8207
[  +0.03] 7e80: 0008 ffc08ae3b900  ff800908340c
[  +0.02] 7ea0:  0040c4e3f000  0040c4e3f000
[  +0.03] 7ec0:  007f870d8b88 16e4c66f 108890ff61ee90a0
[  +0.03] 7ee0: 0055c5db2828 017f 007f870d6000 219d0e6c
[  +0.03] 7f00:  003b9aca 6389b6e9 16e4c66f
[  +0.03] [] do_el0_ia_bp_hardening+0x90/0xa0
[  +0.01] Exception stack(0xffc026637ea0 to 0xffc026637fd0)
[  +0.03] 7ea0:  0040c4e3f000  0040c4e3f000
[  +0.03] 7ec0:  007f870d8b88 16e4c66f 108890ff61ee90a0
[  +0.03] 7ee0: 0055c5db2828 017f 007f870d6000 219d0e6c
[  +0.03] 7f00:  003b9aca 6389b6e9 16e4c66f
[  +0.02] 7f20: 0018 6389b6e9 0016a9ab002471a6 3b1b6f26b535
[  +0.03] 7f40: 005585ece630 007f86d50c78  005605fb2130
[  +0.03] 7f60: 007f768f9ce8   1003e8f3
[  +0.03] 7f80: 1003df58 1626e380 3b9aca00 112e0be826d694b3
[  +0.02] 7fa0: 0004d7bb2ef84792 007f768f9b40 0055859871ac 007f768f9b40
[  +0.02] 7fc0: 007f86d50c78 
[  +0.03] [] el0_ia+0x18/0x1c
[  +0.02] Task in /docker/ed46e6fa52c1e40f13389b349c54e62dcc8c65d76c4c7860e2ff7c39444d14cc killed as a result of limit of /docker/ed46e6fa52c1e40f13389b349c54e62dcc8c65d76c4c7860e2ff7c39444d14cc
[  +0.11] memory: usage 3072000kB, limit 3072000kB, failcnt 4030563
[  +0.02] memory+swap: usage 4059888kB, limit 6144000kB, failcnt 0
[  +0.02] kmem: usage 4596kB, limit 9007199254740988kB, failcnt 0
[  +0.01] Memory cgroup stats for /docker/ed46e6fa52c1e40f13389b349c54e62dcc8c65d76c4c7860e2ff7c39444d14cc: cache:1308KB rss:3066096KB rss_huge:4096KB mapped_file:308KB dirty:0KB writeback:0KB swap:987888KB inactive_anon:613232KB active_anon:2452864KB inactive_file:864KB active_file:348KB unevictable:0KB
[  +0.21] [ pid ]   uid  tgid total_vm  rss nr_ptes nr_pmds swapents oom_score_adj name
[  +0.000180] [3061162] 0 3061162  2140   4   3    8 0 docker-init
[  +0.04] [3061175]   167 3061175  1294319   7645842298   9 248700 0 ceph-osd
[  +0.15] Memory cgroup out of memory: Kill process 3061175 (ceph-osd) score 985 or sacrifice child
[  +0.004798] Killed process 3061175 (ceph-osd) total-vm:5177276kB, anon-rss:3058332kB, file-rss:0kB, shmem-rss:0kB
[  +1.042284] oom_reaper: reaped process 3061175 (ceph-osd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
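As a sanity check on the numbers in that OOM report (a sketch; the constants below are copied from the dmesg output and from the osd_memory_target value set elsewhere in this thread, nothing is measured live):

```python
# Compare the killed ceph-osd's RSS against its cgroup limit and its
# configured memory target. Constants are taken from the log above.
cgroup_limit_kib = 3072000           # "memory: usage 3072000kB, limit 3072000kB"
osd_anon_rss_kib = 3058332           # "Killed process ... anon-rss:3058332kB"
osd_memory_target_bytes = 939524096  # ceph config set osd.$i osd_memory_target

target_kib = osd_memory_target_bytes // 1024
print(f"target            : {target_kib} KiB (~{osd_memory_target_bytes / 2**30:.3f} GiB)")
print(f"actual anon RSS   : {osd_anon_rss_kib} KiB")
print(f"RSS / target      : {osd_anon_rss_kib / target_kib:.1f}x")
print(f"RSS / cgroup limit: {osd_anon_rss_kib / cgroup_limit_kib:.1%}")
```

So the daemon was sitting at roughly 3.3x its memory target, and essentially at the container's cgroup limit, when the kill fired.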

On Fri, Dec 2, 2022 at 09:47, Daniel Brunner wrote:

> Hi,
>
> my OSDs are running on odroid-hc4's, which only have about 4GB of memory,
> and every 10 minutes a random OSD crashes due to out of memory. Sadly the
> whole machine gets unresponsive when the memory is completely full, so no
> SSH access or Prometheus output in the meantime.
>
> After the OSD crashes and restarts and the memory is free again, I can
> look into the machine again.
>
> I've set the memory limit very low on all OSDs:
>
> for i in {0..17} ; do sudo ceph config set osd.$i osd_memory_target
> 939524096 ; done
>
> which is the absolute minimum, about 0.9GB.
>
> Why are the OSDs not respecting this limit?
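For reference, the 939524096 set above decodes exactly (a quick check; the claim that this is the documented lower bound for osd_memory_target is from memory of the Ceph docs, so treat it as an assumption):

```python
# 939524096 bytes is exactly 896 MiB -- believed to be the minimum value
# Ceph accepts for osd_memory_target (treat that claim as an assumption).
target = 939524096
print(target / 2**20, "MiB")   # 896.0 MiB
print(target / 2**30, "GiB")   # 0.875 GiB, i.e. the "about 0.9GB" above
```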

[ceph-users] Re: OSDs do not respect my memory tune limit

2022-12-02 Thread Janne Johansson
> my OSDs are running odroid-hc4's and they only have about 4GB of memory,
> and every 10 minutes a random OSD crashes due to out of memory. Sadly the
> whole machine gets unresponsive when the memory gets completely full, so no
> ssh access or prometheus output in the meantime.

> I've set the memory limit very low on all OSDs:
>
> for i in {0..17} ; do sudo ceph config set osd.$i osd_memory_target
> 939524096 ; done which is the absolute minimum, about 0.9GB.
>
> Why are the OSDs not respecting this limit?

The memory limit you set with osd_memory_target only covers the parts
that _can_ scale their memory usage up and down, like read caches and so
forth; it is not all of the RAM needed to run an OSD with many
PGs/Objects.  If the box is too small, it is too small.
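A rough way to picture that split (illustrative only: the baseline and per-PG costs below are made-up ballpark figures, not measured Ceph internals):

```python
# Illustrative model: osd_memory_target only steers the *tunable* memory
# (BlueStore caches etc.); baseline process overhead and per-PG bookkeeping
# sit outside it. All constants are assumptions for the sake of argument.
MIB = 2**20

def estimated_osd_rss(target_bytes, pgs, baseline_mib=300, per_pg_mib=2):
    """Very rough RSS estimate: fixed baseline + per-PG cost + tunable caches."""
    fixed = baseline_mib * MIB + pgs * per_pg_mib * MIB
    return fixed + target_bytes  # the caches try to stay near the target

target = 939524096              # 896 MiB, as set in this thread
limit = 3072000 * 1024          # 3 GB docker memory limit from the OOM log

for pgs in (16, 128, 512):
    est = estimated_osd_rss(target, pgs)
    print(f"{pgs:4d} PGs -> ~{est / MIB:.0f} MiB (limit {limit / MIB:.0f} MiB)")
```

The point is only that the fixed terms grow with PG and object count and are not capped by osd_memory_target, so on a 4 GB board the total footprint can blow past the container limit no matter how low the target is set.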

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSDs do not respect my memory tune limit

2022-12-02 Thread Daniel Brunner
Can I get rid of PGs by decreasing the number on the pool again? The lower pg_num doesn't seem to stick.

Doing a backup and nuking the cluster seems a little too much work for me :)


$ sudo ceph osd pool get cephfs_data pg_num
pg_num: 128
$ sudo ceph osd pool set cephfs_data pg_num 16
$ sudo ceph osd pool get cephfs_data pg_num
pg_num: 128

On Fri, Dec 2, 2022 at 10:22, Janne Johansson <icepic...@gmail.com> wrote:

> > my OSDs are running odroid-hc4's and they only have about 4GB of memory,
> > and every 10 minutes a random OSD crashes due to out of memory. Sadly the
> > whole machine gets unresponsive when the memory gets completely full, so
> no
> > ssh access or prometheus output in the meantime.
>
> > I've set the memory limit very low on all OSDs:
> >
> > for i in {0..17} ; do sudo ceph config set osd.$i osd_memory_target
> > 939524096 ; done which is the absolute minimum, about 0.9GB.
> >
> > Why are the OSDs not respecting this limit?
>
> The memory limit you set with osd_memory_target is about the parts
> that _can_ scale up and down memory usage, like read caches and so
> forth, but it is not all of the needed RAM to run an OSD with many
> PGs/Objects.  If the box is too small, it is too small.
>
> --
> May the most significant bit of your life be positive.
>


[ceph-users] Re: OSDs do not respect my memory tune limit

2022-12-02 Thread Daniel Brunner
Thanks for the hint, I tried turning that off:

$ sudo ceph osd pool get cephfs_data pg_autoscale_mode
pg_autoscale_mode: on
$ sudo ceph osd pool set cephfs_data pg_autoscale_mode off
set pool 9 pg_autoscale_mode to off
$ sudo ceph osd pool get cephfs_data pg_autoscale_mode
pg_autoscale_mode: off
$ sudo ceph osd pool set cephfs_data pg_num 16
$ sudo ceph osd pool get cephfs_data pg_num
pg_num: 128

On Fri, Dec 2, 2022 at 14:30, Anthony D'Atri <anthony.da...@gmail.com> wrote:

> Could be that you’re fighting with the autoscaler?
>
> > On Dec 2, 2022, at 4:58 AM, Daniel Brunner  wrote:
> >
> > Can I get rid of PGs after trying to decrease the number on the pool
> again?
> >
> > Doing a backup and nuking the cluster seems a little too much work for
> me :)
> >
> >
> > $ sudo ceph osd pool get cephfs_data pg_num
> > pg_num: 128
> > $ sudo ceph osd pool set cephfs_data pg_num 16
> > $ sudo ceph osd pool get cephfs_data pg_num
> > pg_num: 128
> >
> > On Fri, Dec 2, 2022 at 10:22, Janne Johansson <icepic...@gmail.com> wrote:
> >
> >>> my OSDs are running odroid-hc4's and they only have about 4GB of
> memory,
> >>> and every 10 minutes a random OSD crashes due to out of memory. Sadly
> the
> >>> whole machine gets unresponsive when the memory gets completely full,
> so
> >> no
> >>> ssh access or prometheus output in the meantime.
> >>
> >>> I've set the memory limit very low on all OSDs:
> >>>
> >>> for i in {0..17} ; do sudo ceph config set osd.$i osd_memory_target
> >>> 939524096 ; done which is the absolute minimum, about 0.9GB.
> >>>
> >>> Why are the OSDs not respecting this limit?
> >>
> >> The memory limit you set with osd_memory_target is about the parts
> >> that _can_ scale up and down memory usage, like read caches and so
> >> forth, but it is not all of the needed RAM to run an OSD with many
> >> PGs/Objects.  If the box is too small, it is too small.
> >>
> >> --
> >> May the most significant bit of your life be positive.
> >>
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>