Re: [PATCH] bpf: fix overflow of bpf_jit_limit when PAGE_SIZE >= 64K
Daniel Borkmann writes: < snip > > > I would actually just like to get rid of the BPF_JIT_LIMIT_DEFAULT > define also given for 4.21 arm64 will have its own dedicated area for > JIT allocations where neither the above limit nor the MODULES_END/ > MODULES_VADDR one would fit and I don't want to make this even more > ugly with adding further cases into the core. Would the below variant > work for you? I haven't actually hit the bug so I won't ack/tested-by this, but it looks fine to me. cheers > From da9daf462d41ce5506c6b6318a9fa3d6d8a64f6c Mon Sep 17 00:00:00 2001 > From: Daniel Borkmann > Date: Mon, 10 Dec 2018 14:30:27 +0100 > Subject: [PATCH bpf] bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K > > Michael and Sandipan report: > > Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF > JIT allocations. At compile time it defaults to PAGE_SIZE * 4, > and is adjusted again at init time if MODULES_VADDR is defined. > > For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with > the compile-time default at boot-time, which is 0x9c40 when > using 64K page size. This overflows the signed 32-bit bpf_jit_limit > value: > > root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit > -1673527296 > > and can cause various unexpected failures throughout the network > stack. In one case `strace dhclient eth0` reported: > > setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8}, > 16) = -1 ENOTSUPP (Unknown error 524) > > and similar failures can be seen with tools like tcpdump. This doesn't > always reproduce however, and I'm not sure why. The more consistent > failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9 > host would time out on systemd/netplan configuring a virtio-net NIC > with no noticeable errors in the logs. > > Given this and also given that in near future some architectures like > arm64 will have a custom area for BPF JIT image allocations we should > get rid of the BPF_JIT_LIMIT_DEFAULT fallback / default entirely. For > 4.21, we have an overridable bpf_jit_alloc_exec(), bpf_jit_free_exec() > so therefore add another overridable bpf_jit_alloc_exec_limit() helper > function which returns the possible size of the memory area for deriving > the default heuristic in bpf_jit_charge_init(). > > Like bpf_jit_alloc_exec() and bpf_jit_free_exec(), the new > bpf_jit_alloc_exec_limit() assumes that module_alloc() is the default > JIT memory provider, and therefore in case archs implement their custom > module_alloc() we use MODULES_{END,_VADDR} for limits and otherwise for > vmalloc_exec() cases like on ppc64 we use VMALLOC_{END,_START}. > > Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv > allocations") > Reported-by: Sandipan Das > Reported-by: Michael Roth > Signed-off-by: Daniel Borkmann > --- > kernel/bpf/core.c | 19 ++- > 1 file changed, 14 insertions(+), 5 deletions(-) > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c > index b1a3545..6c2332e 100644 > --- a/kernel/bpf/core.c > +++ b/kernel/bpf/core.c > @@ -365,13 +365,11 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp) > } > > #ifdef CONFIG_BPF_JIT > -# define BPF_JIT_LIMIT_DEFAULT (PAGE_SIZE * 4) > - > /* All BPF JIT sysctl knobs here. */ > int bpf_jit_enable __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON); > int bpf_jit_harden __read_mostly; > int bpf_jit_kallsyms __read_mostly; > -int bpf_jit_limit__read_mostly = BPF_JIT_LIMIT_DEFAULT; > +int bpf_jit_limit__read_mostly; > > static __always_inline void > bpf_get_prog_addr_region(const struct bpf_prog *prog, > @@ -580,16 +578,27 @@ int bpf_get_kallsym(unsigned int symnum, unsigned long > *value, char *type, > > static atomic_long_t bpf_jit_current; > > +/* Can be overridden by an arch's JIT compiler if it has a custom, > + * dedicated BPF backend memory area, or if neither of the two > + * below apply. > + */ > +u64 __weak bpf_jit_alloc_exec_limit(void) > +{ > #if defined(MODULES_VADDR) > + return MODULES_END - MODULES_VADDR; > +#else > + return VMALLOC_END - VMALLOC_START; > +#endif > +} > + > static int __init bpf_jit_charge_init(void) > { > /* Only used as heuristic here to derive limit. */ > - bpf_jit_limit = min_t(u64, round_up((MODULES_END - MODULES_VADDR) >> 2, > + bpf_jit_limit = min_t(u64, round_up(bpf_jit_alloc_exec_limit() >> 2, > PAGE_SIZE), INT_MAX); > return 0; > } > pure_initcall(bpf_jit_charge_init); > -#endif > > static int bpf_jit_charge_modmem(u32 pages) > { > -- > 2.9.5
Re: [PATCH] bpf: fix overflow of bpf_jit_limit when PAGE_SIZE >= 64K
On 12/10/2018 06:27 PM, Michael Roth wrote: > Quoting Daniel Borkmann (2018-12-10 08:26:31) >> On 12/07/2018 04:36 PM, Michael Roth wrote: >>> Quoting Michael Ellerman (2018-12-07 06:31:13) Michael Roth writes: > Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF > JIT allocations. At compile time it defaults to PAGE_SIZE * 4, > and is adjusted again at init time if MODULES_VADDR is defined. > > For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with But maybe it should be, I don't know why we don't define it. > the compile-time default at boot-time, which is 0x9c40 when > using 64K page size. This overflows the signed 32-bit bpf_jit_limit > value: > > root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit > -1673527296 > > and can cause various unexpected failures throughout the network > stack. In one case `strace dhclient eth0` reported: > > setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, > filter=0x105dd27f8}, 16) = -1 ENOTSUPP (Unknown error 524) > > and similar failures can be seen with tools like tcpdump. This doesn't > always reproduce however, and I'm not sure why. The more consistent > failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9 host > would time out on systemd/netplan configuring a virtio-net NIC with no > noticeable errors in the logs. > > Fix this by limiting the compile-time default for bpf_jit_limit to > INT_MAX. INT_MAX is a lot more than (4k * 4), so I guess I'm not clear on whether we should be using PAGE_SIZE here at all. I guess each BPF program uses at least one page is the thinking? >>> >>> That seems to be the case, at least, the max number of minimum-sized >>> allocations would be less on ppc64 since the allocations are always at >>> least PAGE_SIZE in size. The init-time default also limits to INT_MAX, >>> so it seemed consistent to do that here too. >>> Thanks for tracking this down. For some reason none of my ~10 test boxes have hit this, perhaps I don't have new enough userspace? >>> >>> I'm not too sure, I would've thought things like the dhclient case in >>> the commit log would fail every time, but sometimes I need to reboot the >>> guest before I start seeing the behavior. Maybe there's something special >>> about when JIT allocations are actually done that can affect >>> reproducibility? >>> >>> In my case at least the virtio-net networking timeout was consistent >>> enough for a bisect, but maybe it depends on the specific network >>> configuration (single NIC, basic DHCP through netplan/systemd in my case). >>> You don't mention why you needed to add BPF_MIN(), I assume because the kernel version of min() has gotten too complicated to work here? >>> >>> I wasn't sure if it was safe here or not, so I tried looking at other >>> users and came across: >>> >>> mm/vmalloc.c:777:#define VMAP_MIN(x, y) ((x) < (y) ? (x) : >>> (y)) /* can't use min() */ >>> >>> I'm not sure what the reasoning was (or whether it still applies), but I >>> figured it was safer to do the same here. Maybe Nick still recalls? >>> Daniel I assume you'll merge this via your tree? cheers > Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv > allocations") > Cc: linuxppc-...@ozlabs.org > Cc: Daniel Borkmann > Cc: Sandipan Das > Cc: Alexei Starovoitov > Signed-off-by: Michael Roth >> >> Thanks for the reports / fixes and sorry for my late reply (bit too >> swamped last week), some more thoughts below. >> > kernel/bpf/core.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c > index b1a3545d0ec8..55de4746cdfd 100644 > --- a/kernel/bpf/core.c > +++ b/kernel/bpf/core.c > @@ -365,7 +365,8 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp) > } > > #ifdef CONFIG_BPF_JIT > -# define BPF_JIT_LIMIT_DEFAULT (PAGE_SIZE * 4) > +# define BPF_MIN(x, y) ((x) < (y) ? (x) : (y)) > +# define BPF_JIT_LIMIT_DEFAULT BPF_MIN((PAGE_SIZE * 4), > INT_MAX) > > /* All BPF JIT sysctl knobs here. */ > int bpf_jit_enable __read_mostly = > IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON); >> >> I would actually just like to get rid of the BPF_JIT_LIMIT_DEFAULT >> define also given for 4.21 arm64 will have its own dedicated area for >> JIT allocations where neither the above limit nor the MODULES_END/ >> MODULES_VADDR one would fit and I don't want to make this even more >> ugly with adding further cases into the core. Would the below variant >> work for you? > > Looks good to me. My one concern (which is probably a separate > issue) is that the INT_MAX limit is a bit more punishing for larger > page sizes since the minimum allocations
Re: [PATCH] bpf: fix overflow of bpf_jit_limit when PAGE_SIZE >= 64K
Quoting Daniel Borkmann (2018-12-10 08:26:31) > On 12/07/2018 04:36 PM, Michael Roth wrote: > > Quoting Michael Ellerman (2018-12-07 06:31:13) > >> Michael Roth writes: > >> > >>> Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF > >>> JIT allocations. At compile time it defaults to PAGE_SIZE * 4, > >>> and is adjusted again at init time if MODULES_VADDR is defined. > >>> > >>> For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with > >> > >> But maybe it should be, I don't know why we don't define it. > >> > >>> the compile-time default at boot-time, which is 0x9c40 when > >>> using 64K page size. This overflows the signed 32-bit bpf_jit_limit > >>> value: > >>> > >>> root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit > >>> -1673527296 > >>> > >>> and can cause various unexpected failures throughout the network > >>> stack. In one case `strace dhclient eth0` reported: > >>> > >>> setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, > >>> filter=0x105dd27f8}, 16) = -1 ENOTSUPP (Unknown error 524) > >>> > >>> and similar failures can be seen with tools like tcpdump. This doesn't > >>> always reproduce however, and I'm not sure why. The more consistent > >>> failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9 host > >>> would time out on systemd/netplan configuring a virtio-net NIC with no > >>> noticeable errors in the logs. > >>> > >>> Fix this by limiting the compile-time default for bpf_jit_limit to > >>> INT_MAX. > >> > >> INT_MAX is a lot more than (4k * 4), so I guess I'm not clear on > >> whether we should be using PAGE_SIZE here at all. I guess each BPF > >> program uses at least one page is the thinking? > > > > That seems to be the case, at least, the max number of minimum-sized > > allocations would be less on ppc64 since the allocations are always at > > least PAGE_SIZE in size. The init-time default also limits to INT_MAX, > > so it seemed consistent to do that here too. > > > >> > >> Thanks for tracking this down. For some reason none of my ~10 test boxes > >> have hit this, perhaps I don't have new enough userspace? > > > > I'm not too sure, I would've thought things like the dhclient case in > > the commit log would fail every time, but sometimes I need to reboot the > > guest before I start seeing the behavior. Maybe there's something special > > about when JIT allocations are actually done that can affect > > reproducibility? > > > > In my case at least the virtio-net networking timeout was consistent > > enough for a bisect, but maybe it depends on the specific network > > configuration (single NIC, basic DHCP through netplan/systemd in my case). > > > >> > >> You don't mention why you needed to add BPF_MIN(), I assume because the > >> kernel version of min() has gotten too complicated to work here? > > > > I wasn't sure if it was safe here or not, so I tried looking at other > > users and came across: > > > > mm/vmalloc.c:777:#define VMAP_MIN(x, y) ((x) < (y) ? (x) : > > (y)) /* can't use min() */ > > > > I'm not sure what the reasoning was (or whether it still applies), but I > > figured it was safer to do the same here. Maybe Nick still recalls? > > > >> > >> Daniel I assume you'll merge this via your tree? > >> > >> cheers > >> > >>> Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv > >>> allocations") > >>> Cc: linuxppc-...@ozlabs.org > >>> Cc: Daniel Borkmann > >>> Cc: Sandipan Das > >>> Cc: Alexei Starovoitov > >>> Signed-off-by: Michael Roth > > Thanks for the reports / fixes and sorry for my late reply (bit too > swamped last week), some more thoughts below. > > >>> kernel/bpf/core.c | 3 ++- > >>> 1 file changed, 2 insertions(+), 1 deletion(-) > >>> > >>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c > >>> index b1a3545d0ec8..55de4746cdfd 100644 > >>> --- a/kernel/bpf/core.c > >>> +++ b/kernel/bpf/core.c > >>> @@ -365,7 +365,8 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp) > >>> } > >>> > >>> #ifdef CONFIG_BPF_JIT > >>> -# define BPF_JIT_LIMIT_DEFAULT (PAGE_SIZE * 4) > >>> +# define BPF_MIN(x, y) ((x) < (y) ? (x) : (y)) > >>> +# define BPF_JIT_LIMIT_DEFAULT BPF_MIN((PAGE_SIZE * 4), > >>> INT_MAX) > >>> > >>> /* All BPF JIT sysctl knobs here. */ > >>> int bpf_jit_enable __read_mostly = > >>> IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON); > > I would actually just like to get rid of the BPF_JIT_LIMIT_DEFAULT > define also given for 4.21 arm64 will have its own dedicated area for > JIT allocations where neither the above limit nor the MODULES_END/ > MODULES_VADDR one would fit and I don't want to make this even more > ugly with adding further cases into the core. Would the below variant > work for you? Looks good to me. My one concern (which is probably a separate issue) is that the INT_MAX limit is a bit more punishing for larger page sizes since the minimum allocations seem to be 1 page. Are there reasonable workloads that
Re: [PATCH] bpf: fix overflow of bpf_jit_limit when PAGE_SIZE >= 64K
On 12/07/2018 04:36 PM, Michael Roth wrote: > Quoting Michael Ellerman (2018-12-07 06:31:13) >> Michael Roth writes: >> >>> Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF >>> JIT allocations. At compile time it defaults to PAGE_SIZE * 4, >>> and is adjusted again at init time if MODULES_VADDR is defined. >>> >>> For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with >> >> But maybe it should be, I don't know why we don't define it. >> >>> the compile-time default at boot-time, which is 0x9c40 when >>> using 64K page size. This overflows the signed 32-bit bpf_jit_limit >>> value: >>> >>> root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit >>> -1673527296 >>> >>> and can cause various unexpected failures throughout the network >>> stack. In one case `strace dhclient eth0` reported: >>> >>> setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8}, >>> 16) = -1 ENOTSUPP (Unknown error 524) >>> >>> and similar failures can be seen with tools like tcpdump. This doesn't >>> always reproduce however, and I'm not sure why. The more consistent >>> failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9 host >>> would time out on systemd/netplan configuring a virtio-net NIC with no >>> noticeable errors in the logs. >>> >>> Fix this by limiting the compile-time default for bpf_jit_limit to >>> INT_MAX. >> >> INT_MAX is a lot more than (4k * 4), so I guess I'm not clear on >> whether we should be using PAGE_SIZE here at all. I guess each BPF >> program uses at least one page is the thinking? > > That seems to be the case, at least, the max number of minimum-sized > allocations would be less on ppc64 since the allocations are always at > least PAGE_SIZE in size. The init-time default also limits to INT_MAX, > so it seemed consistent to do that here too. > >> >> Thanks for tracking this down. For some reason none of my ~10 test boxes >> have hit this, perhaps I don't have new enough userspace? > > I'm not too sure, I would've thought things like the dhclient case in > the commit log would fail every time, but sometimes I need to reboot the > guest before I start seeing the behavior. Maybe there's something special > about when JIT allocations are actually done that can affect > reproducibility? > > In my case at least the virtio-net networking timeout was consistent > enough for a bisect, but maybe it depends on the specific network > configuration (single NIC, basic DHCP through netplan/systemd in my case). > >> >> You don't mention why you needed to add BPF_MIN(), I assume because the >> kernel version of min() has gotten too complicated to work here? > > I wasn't sure if it was safe here or not, so I tried looking at other > users and came across: > > mm/vmalloc.c:777:#define VMAP_MIN(x, y) ((x) < (y) ? (x) : (y)) > /* can't use min() */ > > I'm not sure what the reasoning was (or whether it still applies), but I > figured it was safer to do the same here. Maybe Nick still recalls? > >> >> Daniel I assume you'll merge this via your tree? >> >> cheers >> >>> Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv >>> allocations") >>> Cc: linuxppc-...@ozlabs.org >>> Cc: Daniel Borkmann >>> Cc: Sandipan Das >>> Cc: Alexei Starovoitov >>> Signed-off-by: Michael Roth Thanks for the reports / fixes and sorry for my late reply (bit too swamped last week), some more thoughts below. >>> kernel/bpf/core.c | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c >>> index b1a3545d0ec8..55de4746cdfd 100644 >>> --- a/kernel/bpf/core.c >>> +++ b/kernel/bpf/core.c >>> @@ -365,7 +365,8 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp) >>> } >>> >>> #ifdef CONFIG_BPF_JIT >>> -# define BPF_JIT_LIMIT_DEFAULT (PAGE_SIZE * 4) >>> +# define BPF_MIN(x, y) ((x) < (y) ? (x) : (y)) >>> +# define BPF_JIT_LIMIT_DEFAULT BPF_MIN((PAGE_SIZE * 4), INT_MAX) >>> >>> /* All BPF JIT sysctl knobs here. */ >>> int bpf_jit_enable __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON); I would actually just like to get rid of the BPF_JIT_LIMIT_DEFAULT define also given for 4.21 arm64 will have its own dedicated area for JIT allocations where neither the above limit nor the MODULES_END/ MODULES_VADDR one would fit and I don't want to make this even more ugly with adding further cases into the core. Would the below variant work for you? Thanks, Daniel >From da9daf462d41ce5506c6b6318a9fa3d6d8a64f6c Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Mon, 10 Dec 2018 14:30:27 +0100 Subject: [PATCH bpf] bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K Michael and Sandipan report: Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF JIT allocations. At compile time it defaults to PAGE_SIZE * 4, and is adjusted again at init time if MODULES_VADDR is defined. For ppc64 kernels, MODULES_VADDR isn't defined, so
Re: [PATCH] bpf: fix overflow of bpf_jit_limit when PAGE_SIZE >= 64K
Quoting Michael Ellerman (2018-12-07 06:31:13) > Michael Roth writes: > > > Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF > > JIT allocations. At compile time it defaults to PAGE_SIZE * 4, > > and is adjusted again at init time if MODULES_VADDR is defined. > > > > For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with > > But maybe it should be, I don't know why we don't define it. > > > the compile-time default at boot-time, which is 0x9c40 when > > using 64K page size. This overflows the signed 32-bit bpf_jit_limit > > value: > > > > root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit > > -1673527296 > > > > and can cause various unexpected failures throughout the network > > stack. In one case `strace dhclient eth0` reported: > > > > setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8}, > > 16) = -1 ENOTSUPP (Unknown error 524) > > > > and similar failures can be seen with tools like tcpdump. This doesn't > > always reproduce however, and I'm not sure why. The more consistent > > failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9 host > > would time out on systemd/netplan configuring a virtio-net NIC with no > > noticeable errors in the logs. > > > > Fix this by limiting the compile-time default for bpf_jit_limit to > > INT_MAX. > > INT_MAX is a lot more than (4k * 4), so I guess I'm not clear on > whether we should be using PAGE_SIZE here at all. I guess each BPF > program uses at least one page is the thinking? That seems to be the case, at least, the max number of minimum-sized allocations would be less on ppc64 since the allocations are always at least PAGE_SIZE in size. The init-time default also limits to INT_MAX, so it seemed consistent to do that here too. > > Thanks for tracking this down. For some reason none of my ~10 test boxes > have hit this, perhaps I don't have new enough userspace? I'm not too sure, I would've thought things like the dhclient case in the commit log would fail every time, but sometimes I need to reboot the guest before I start seeing the behavior. Maybe there's something special about when JIT allocations are actually done that can affect reproducibility? In my case at least the virtio-net networking timeout was consistent enough for a bisect, but maybe it depends on the specific network configuration (single NIC, basic DHCP through netplan/systemd in my case). > > You don't mention why you needed to add BPF_MIN(), I assume because the > kernel version of min() has gotten too complicated to work here? I wasn't sure if it was safe here or not, so I tried looking at other users and came across: mm/vmalloc.c:777:#define VMAP_MIN(x, y) ((x) < (y) ? (x) : (y)) /* can't use min() */ I'm not sure what the reasoning was (or whether it still applies), but I figured it was safer to do the same here. Maybe Nick still recalls? > > Daniel I assume you'll merge this via your tree? > > cheers > > > Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv > > allocations") > > Cc: linuxppc-...@ozlabs.org > > Cc: Daniel Borkmann > > Cc: Sandipan Das > > Cc: Alexei Starovoitov > > Signed-off-by: Michael Roth > > --- > > kernel/bpf/core.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c > > index b1a3545d0ec8..55de4746cdfd 100644 > > --- a/kernel/bpf/core.c > > +++ b/kernel/bpf/core.c > > @@ -365,7 +365,8 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp) > > } > > > > #ifdef CONFIG_BPF_JIT > > -# define BPF_JIT_LIMIT_DEFAULT (PAGE_SIZE * 4) > > +# define BPF_MIN(x, y) ((x) < (y) ? (x) : (y)) > > +# define BPF_JIT_LIMIT_DEFAULT BPF_MIN((PAGE_SIZE * 4), INT_MAX) > > > > /* All BPF JIT sysctl knobs here. */ > > int bpf_jit_enable __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON); > > -- > > 2.17.1 >
Re: [PATCH] bpf: fix overflow of bpf_jit_limit when PAGE_SIZE >= 64K
Michael Roth writes: > Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF > JIT allocations. At compile time it defaults to PAGE_SIZE * 4, > and is adjusted again at init time if MODULES_VADDR is defined. > > For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with But maybe it should be, I don't know why we don't define it. > the compile-time default at boot-time, which is 0x9c40 when > using 64K page size. This overflows the signed 32-bit bpf_jit_limit > value: > > root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit > -1673527296 > > and can cause various unexpected failures throughout the network > stack. In one case `strace dhclient eth0` reported: > > setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8}, > 16) = -1 ENOTSUPP (Unknown error 524) > > and similar failures can be seen with tools like tcpdump. This doesn't > always reproduce however, and I'm not sure why. The more consistent > failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9 host > would time out on systemd/netplan configuring a virtio-net NIC with no > noticeable errors in the logs. > > Fix this by limiting the compile-time default for bpf_jit_limit to > INT_MAX. INT_MAX is a lot more than (4k * 4), so I guess I'm not clear on whether we should be using PAGE_SIZE here at all. I guess each BPF program uses at least one page is the thinking? Thanks for tracking this down. For some reason none of my ~10 test boxes have hit this, perhaps I don't have new enough userspace? You don't mention why you needed to add BPF_MIN(), I assume because the kernel version of min() has gotten too complicated to work here? Daniel I assume you'll merge this via your tree? cheers > Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv > allocations") > Cc: linuxppc-...@ozlabs.org > Cc: Daniel Borkmann > Cc: Sandipan Das > Cc: Alexei Starovoitov > Signed-off-by: Michael Roth > --- > kernel/bpf/core.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c > index b1a3545d0ec8..55de4746cdfd 100644 > --- a/kernel/bpf/core.c > +++ b/kernel/bpf/core.c > @@ -365,7 +365,8 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp) > } > > #ifdef CONFIG_BPF_JIT > -# define BPF_JIT_LIMIT_DEFAULT (PAGE_SIZE * 4) > +# define BPF_MIN(x, y) ((x) < (y) ? (x) : (y)) > +# define BPF_JIT_LIMIT_DEFAULT BPF_MIN((PAGE_SIZE * 4), INT_MAX) > > /* All BPF JIT sysctl knobs here. */ > int bpf_jit_enable __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON); > -- > 2.17.1