Quoting Michael Ellerman (2018-12-07 06:31:13) > Michael Roth <mdr...@linux.vnet.ibm.com> writes: > > > Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF > > JIT allocations. At compile time it defaults to PAGE_SIZE * 40000, > > and is adjusted again at init time if MODULES_VADDR is defined. > > > > For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with > > But maybe it should be, I don't know why we don't define it. > > > the compile-time default at boot-time, which is 0x9c400000 when > > using 64K page size. This overflows the signed 32-bit bpf_jit_limit > > value: > > > > root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit > > -1673527296 > > > > and can cause various unexpected failures throughout the network > > stack. In one case `strace dhclient eth0` reported: > > > > setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8}, > > 16) = -1 ENOTSUPP (Unknown error 524) > > > > and similar failures can be seen with tools like tcpdump. This doesn't > > always reproduce however, and I'm not sure why. The more consistent > > failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9 host > > would time out on systemd/netplan configuring a virtio-net NIC with no > > noticeable errors in the logs. > > > > Fix this by limiting the compile-time default for bpf_jit_limit to > > INT_MAX. > > INT_MAX is a lot more than (4k * 40000), so I guess I'm not clear on > whether we should be using PAGE_SIZE here at all. I guess each BPF > program uses at least one page is the thinking?
That seems to be the case, at least, the max number of minimum-sized allocations would be less on ppc64 since the allocations are always at least PAGE_SIZE in size. The init-time default also limits to INT_MAX, so it seemed consistent to do that here too. > > Thanks for tracking this down. For some reason none of my ~10 test boxes > have hit this, perhaps I don't have new enough userspace? I'm not too sure, I would've thought things like the dhclient case in the commit log would fail every time, but sometimes I need to reboot the guest before I start seeing the behavior. Maybe there's something special about when JIT allocations are actually done that can affect reproducibility? In my case at least the virtio-net networking timeout was consistent enough for a bisect, but maybe it depends on the specific network configuration (single NIC, basic DHCP through netplan/systemd in my case). > > You don't mention why you needed to add BPF_MIN(), I assume because the > kernel version of min() has gotten too complicated to work here? I wasn't sure if it was safe here or not, so I tried looking at other users and came across: mm/vmalloc.c:777:#define VMAP_MIN(x, y) ((x) < (y) ? (x) : (y)) /* can't use min() */ I'm not sure what the reasoning was (or whether it still applies), but I figured it was safer to do the same here. Maybe Nick still recalls? > > Daniel I assume you'll merge this via your tree? > > cheers > > > Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv > > allocations") > > Cc: linuxppc-...@ozlabs.org > > Cc: Daniel Borkmann <dan...@iogearbox.net> > > Cc: Sandipan Das <sandi...@linux.ibm.com> > > Cc: Alexei Starovoitov <a...@kernel.org> > > Signed-off-by: Michael Roth <mdr...@linux.vnet.ibm.com> > > --- > > kernel/bpf/core.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c > > index b1a3545d0ec8..55de4746cdfd 100644 > > --- a/kernel/bpf/core.c > > +++ b/kernel/bpf/core.c > > @@ -365,7 +365,8 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp) > > } > > > > #ifdef CONFIG_BPF_JIT > > -# define BPF_JIT_LIMIT_DEFAULT (PAGE_SIZE * 40000) > > +# define BPF_MIN(x, y) ((x) < (y) ? (x) : (y)) > > +# define BPF_JIT_LIMIT_DEFAULT BPF_MIN((PAGE_SIZE * 40000), INT_MAX) > > > > /* All BPF JIT sysctl knobs here. */ > > int bpf_jit_enable __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON); > > -- > > 2.17.1 >