Re: [PATCH] bpf: fix overflow of bpf_jit_limit when PAGE_SIZE >= 64K

2018-12-11 Thread Michael Ellerman
Daniel Borkmann  writes:
< snip >
>
> I would actually just like to get rid of the BPF_JIT_LIMIT_DEFAULT
> define also given for 4.21 arm64 will have its own dedicated area for
> JIT allocations where neither the above limit nor the MODULES_END/
> MODULES_VADDR one would fit and I don't want to make this even more
> ugly with adding further cases into the core. Would the below variant
> work for you?

I haven't actually hit the bug so I won't ack/tested-by this, but it
looks fine to me.

cheers

> From da9daf462d41ce5506c6b6318a9fa3d6d8a64f6c Mon Sep 17 00:00:00 2001
> From: Daniel Borkmann 
> Date: Mon, 10 Dec 2018 14:30:27 +0100
> Subject: [PATCH bpf] bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K
>
> Michael and Sandipan report:
>
>   Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF
>   JIT allocations. At compile time it defaults to PAGE_SIZE * 4,
>   and is adjusted again at init time if MODULES_VADDR is defined.
>
>   For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with
>   the compile-time default at boot-time, which is 0x9c40 when
>   using 64K page size. This overflows the signed 32-bit bpf_jit_limit
>   value:
>
>   root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit
>   -1673527296
>
>   and can cause various unexpected failures throughout the network
>   stack. In one case `strace dhclient eth0` reported:
>
>   setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8},
>  16) = -1 ENOTSUPP (Unknown error 524)
>
>   and similar failures can be seen with tools like tcpdump. This doesn't
>   always reproduce however, and I'm not sure why. The more consistent
>   failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9
>   host would time out on systemd/netplan configuring a virtio-net NIC
>   with no noticeable errors in the logs.
>
> Given this and also given that in near future some architectures like
> arm64 will have a custom area for BPF JIT image allocations we should
> get rid of the BPF_JIT_LIMIT_DEFAULT fallback / default entirely. For
> 4.21, we have an overridable bpf_jit_alloc_exec(), bpf_jit_free_exec()
> so therefore add another overridable bpf_jit_alloc_exec_limit() helper
> function which returns the possible size of the memory area for deriving
> the default heuristic in bpf_jit_charge_init().
>
> Like bpf_jit_alloc_exec() and bpf_jit_free_exec(), the new
> bpf_jit_alloc_exec_limit() assumes that module_alloc() is the default
> JIT memory provider, and therefore in case archs implement their custom
> module_alloc() we use MODULES_{END,_VADDR} for limits and otherwise for
> vmalloc_exec() cases like on ppc64 we use VMALLOC_{END,_START}.
>
> Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv 
> allocations")
> Reported-by: Sandipan Das 
> Reported-by: Michael Roth 
> Signed-off-by: Daniel Borkmann 
> ---
>  kernel/bpf/core.c | 19 ++-
>  1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index b1a3545..6c2332e 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -365,13 +365,11 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp)
>  }
>
>  #ifdef CONFIG_BPF_JIT
> -# define BPF_JIT_LIMIT_DEFAULT   (PAGE_SIZE * 4)
> -
>  /* All BPF JIT sysctl knobs here. */
>  int bpf_jit_enable   __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON);
>  int bpf_jit_harden   __read_mostly;
>  int bpf_jit_kallsyms __read_mostly;
> -int bpf_jit_limit__read_mostly = BPF_JIT_LIMIT_DEFAULT;
> +int bpf_jit_limit__read_mostly;
>
>  static __always_inline void
>  bpf_get_prog_addr_region(const struct bpf_prog *prog,
> @@ -580,16 +578,27 @@ int bpf_get_kallsym(unsigned int symnum, unsigned long 
> *value, char *type,
>
>  static atomic_long_t bpf_jit_current;
>
> +/* Can be overridden by an arch's JIT compiler if it has a custom,
> + * dedicated BPF backend memory area, or if neither of the two
> + * below apply.
> + */
> +u64 __weak bpf_jit_alloc_exec_limit(void)
> +{
>  #if defined(MODULES_VADDR)
> + return MODULES_END - MODULES_VADDR;
> +#else
> + return VMALLOC_END - VMALLOC_START;
> +#endif
> +}
> +
>  static int __init bpf_jit_charge_init(void)
>  {
>   /* Only used as heuristic here to derive limit. */
> - bpf_jit_limit = min_t(u64, round_up((MODULES_END - MODULES_VADDR) >> 2,
> + bpf_jit_limit = min_t(u64, round_up(bpf_jit_alloc_exec_limit() >> 2,
>   PAGE_SIZE), INT_MAX);
>   return 0;
>  }
>  pure_initcall(bpf_jit_charge_init);
> -#endif
>
>  static int bpf_jit_charge_modmem(u32 pages)
>  {
> -- 
> 2.9.5


Re: [PATCH] bpf: fix overflow of bpf_jit_limit when PAGE_SIZE >= 64K

2018-12-10 Thread Daniel Borkmann
On 12/10/2018 06:27 PM, Michael Roth wrote:
> Quoting Daniel Borkmann (2018-12-10 08:26:31)
>> On 12/07/2018 04:36 PM, Michael Roth wrote:
>>> Quoting Michael Ellerman (2018-12-07 06:31:13)
 Michael Roth  writes:

> Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF
> JIT allocations. At compile time it defaults to PAGE_SIZE * 4,
> and is adjusted again at init time if MODULES_VADDR is defined.
>
> For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with

 But maybe it should be, I don't know why we don't define it.

> the compile-time default at boot-time, which is 0x9c40 when
> using 64K page size. This overflows the signed 32-bit bpf_jit_limit
> value:
>
>   root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit
>   -1673527296
>
> and can cause various unexpected failures throughout the network
> stack. In one case `strace dhclient eth0` reported:
>
>   setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, 
> filter=0x105dd27f8}, 16) = -1 ENOTSUPP (Unknown error 524)
>
> and similar failures can be seen with tools like tcpdump. This doesn't
> always reproduce however, and I'm not sure why. The more consistent
> failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9 host
> would time out on systemd/netplan configuring a virtio-net NIC with no
> noticeable errors in the logs.
>
> Fix this by limiting the compile-time default for bpf_jit_limit to
> INT_MAX.

 INT_MAX is a lot more than (4k * 4), so I guess I'm not clear on
 whether we should be using PAGE_SIZE here at all. I guess each BPF
 program uses at least one page is the thinking?
>>>
>>> That seems to be the case, at least, the max number of minimum-sized
>>> allocations would be less on ppc64 since the allocations are always at
>>> least PAGE_SIZE in size. The init-time default also limits to INT_MAX,
>>> so it seemed consistent to do that here too.
>>>

 Thanks for tracking this down. For some reason none of my ~10 test boxes
 have hit this, perhaps I don't have new enough userspace?
>>>
>>> I'm not too sure, I would've thought things like the dhclient case in
>>> the commit log would fail every time, but sometimes I need to reboot the
>>> guest before I start seeing the behavior. Maybe there's something special
>>> about when JIT allocations are actually done that can affect
>>> reproducibility?
>>>
>>> In my case at least the virtio-net networking timeout was consistent
>>> enough for a bisect, but maybe it depends on the specific network
>>> configuration (single NIC, basic DHCP through netplan/systemd in my case).
>>>

 You don't mention why you needed to add BPF_MIN(), I assume because the
 kernel version of min() has gotten too complicated to work here?
>>>
>>> I wasn't sure if it was safe here or not, so I tried looking at other
>>> users and came across:
>>>
>>> mm/vmalloc.c:777:#define VMAP_MIN(x, y)   ((x) < (y) ? (x) : 
>>> (y)) /* can't use min() */
>>>
>>> I'm not sure what the reasoning was (or whether it still applies), but I
>>> figured it was safer to do the same here. Maybe Nick still recalls?
>>>

 Daniel I assume you'll merge this via your tree?

 cheers

> Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv 
> allocations")
> Cc: linuxppc-...@ozlabs.org
> Cc: Daniel Borkmann 
> Cc: Sandipan Das 
> Cc: Alexei Starovoitov 
> Signed-off-by: Michael Roth 
>>
>> Thanks for the reports / fixes and sorry for my late reply (bit too
>> swamped last week), some more thoughts below.
>>
>  kernel/bpf/core.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index b1a3545d0ec8..55de4746cdfd 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -365,7 +365,8 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp)
>  }
>  
>  #ifdef CONFIG_BPF_JIT
> -# define BPF_JIT_LIMIT_DEFAULT   (PAGE_SIZE * 4)
> +# define BPF_MIN(x, y) ((x) < (y) ? (x) : (y))
> +# define BPF_JIT_LIMIT_DEFAULT   BPF_MIN((PAGE_SIZE * 4), 
> INT_MAX)
>  
>  /* All BPF JIT sysctl knobs here. */
>  int bpf_jit_enable   __read_mostly = 
> IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON);
>>
>> I would actually just like to get rid of the BPF_JIT_LIMIT_DEFAULT
>> define also given for 4.21 arm64 will have its own dedicated area for
>> JIT allocations where neither the above limit nor the MODULES_END/
>> MODULES_VADDR one would fit and I don't want to make this even more
>> ugly with adding further cases into the core. Would the below variant
>> work for you?
> 
> Looks good to me. My one concern (which is probably a separate
> issue) is that the INT_MAX limit is a bit more punishing for larger
> page sizes since the minimum allocations 

Re: [PATCH] bpf: fix overflow of bpf_jit_limit when PAGE_SIZE >= 64K

2018-12-10 Thread Michael Roth
Quoting Daniel Borkmann (2018-12-10 08:26:31)
> On 12/07/2018 04:36 PM, Michael Roth wrote:
> > Quoting Michael Ellerman (2018-12-07 06:31:13)
> >> Michael Roth  writes:
> >>
> >>> Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF
> >>> JIT allocations. At compile time it defaults to PAGE_SIZE * 4,
> >>> and is adjusted again at init time if MODULES_VADDR is defined.
> >>>
> >>> For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with
> >>
> >> But maybe it should be, I don't know why we don't define it.
> >>
> >>> the compile-time default at boot-time, which is 0x9c40 when
> >>> using 64K page size. This overflows the signed 32-bit bpf_jit_limit
> >>> value:
> >>>
> >>>   root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit
> >>>   -1673527296
> >>>
> >>> and can cause various unexpected failures throughout the network
> >>> stack. In one case `strace dhclient eth0` reported:
> >>>
> >>>   setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, 
> >>> filter=0x105dd27f8}, 16) = -1 ENOTSUPP (Unknown error 524)
> >>>
> >>> and similar failures can be seen with tools like tcpdump. This doesn't
> >>> always reproduce however, and I'm not sure why. The more consistent
> >>> failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9 host
> >>> would time out on systemd/netplan configuring a virtio-net NIC with no
> >>> noticeable errors in the logs.
> >>>
> >>> Fix this by limiting the compile-time default for bpf_jit_limit to
> >>> INT_MAX.
> >>
> >> INT_MAX is a lot more than (4k * 4), so I guess I'm not clear on
> >> whether we should be using PAGE_SIZE here at all. I guess each BPF
> >> program uses at least one page is the thinking?
> > 
> > That seems to be the case, at least, the max number of minimum-sized
> > allocations would be less on ppc64 since the allocations are always at
> > least PAGE_SIZE in size. The init-time default also limits to INT_MAX,
> > so it seemed consistent to do that here too.
> > 
> >>
> >> Thanks for tracking this down. For some reason none of my ~10 test boxes
> >> have hit this, perhaps I don't have new enough userspace?
> > 
> > I'm not too sure, I would've thought things like the dhclient case in
> > the commit log would fail every time, but sometimes I need to reboot the
> > guest before I start seeing the behavior. Maybe there's something special
> > about when JIT allocations are actually done that can affect
> > reproducibility?
> > 
> > In my case at least the virtio-net networking timeout was consistent
> > enough for a bisect, but maybe it depends on the specific network
> > configuration (single NIC, basic DHCP through netplan/systemd in my case).
> > 
> >>
> >> You don't mention why you needed to add BPF_MIN(), I assume because the
> >> kernel version of min() has gotten too complicated to work here?
> > 
> > I wasn't sure if it was safe here or not, so I tried looking at other
> > users and came across:
> > 
> > mm/vmalloc.c:777:#define VMAP_MIN(x, y)   ((x) < (y) ? (x) : 
> > (y)) /* can't use min() */
> > 
> > I'm not sure what the reasoning was (or whether it still applies), but I
> > figured it was safer to do the same here. Maybe Nick still recalls?
> > 
> >>
> >> Daniel I assume you'll merge this via your tree?
> >>
> >> cheers
> >>
> >>> Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv 
> >>> allocations")
> >>> Cc: linuxppc-...@ozlabs.org
> >>> Cc: Daniel Borkmann 
> >>> Cc: Sandipan Das 
> >>> Cc: Alexei Starovoitov 
> >>> Signed-off-by: Michael Roth 
> 
> Thanks for the reports / fixes and sorry for my late reply (bit too
> swamped last week), some more thoughts below.
> 
> >>>  kernel/bpf/core.c | 3 ++-
> >>>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> >>> index b1a3545d0ec8..55de4746cdfd 100644
> >>> --- a/kernel/bpf/core.c
> >>> +++ b/kernel/bpf/core.c
> >>> @@ -365,7 +365,8 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp)
> >>>  }
> >>>  
> >>>  #ifdef CONFIG_BPF_JIT
> >>> -# define BPF_JIT_LIMIT_DEFAULT   (PAGE_SIZE * 4)
> >>> +# define BPF_MIN(x, y) ((x) < (y) ? (x) : (y))
> >>> +# define BPF_JIT_LIMIT_DEFAULT   BPF_MIN((PAGE_SIZE * 4), 
> >>> INT_MAX)
> >>>  
> >>>  /* All BPF JIT sysctl knobs here. */
> >>>  int bpf_jit_enable   __read_mostly = 
> >>> IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON);
> 
> I would actually just like to get rid of the BPF_JIT_LIMIT_DEFAULT
> define also given for 4.21 arm64 will have its own dedicated area for
> JIT allocations where neither the above limit nor the MODULES_END/
> MODULES_VADDR one would fit and I don't want to make this even more
> ugly with adding further cases into the core. Would the below variant
> work for you?

Looks good to me. My one concern (which is probably a separate
issue) is that the INT_MAX limit is a bit more punishing for larger
page sizes since the minimum allocations seem to be 1 page. Are there
reasonable workloads that 

Re: [PATCH] bpf: fix overflow of bpf_jit_limit when PAGE_SIZE >= 64K

2018-12-10 Thread Daniel Borkmann
On 12/07/2018 04:36 PM, Michael Roth wrote:
> Quoting Michael Ellerman (2018-12-07 06:31:13)
>> Michael Roth  writes:
>>
>>> Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF
>>> JIT allocations. At compile time it defaults to PAGE_SIZE * 4,
>>> and is adjusted again at init time if MODULES_VADDR is defined.
>>>
>>> For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with
>>
>> But maybe it should be, I don't know why we don't define it.
>>
>>> the compile-time default at boot-time, which is 0x9c40 when
>>> using 64K page size. This overflows the signed 32-bit bpf_jit_limit
>>> value:
>>>
>>>   root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit
>>>   -1673527296
>>>
>>> and can cause various unexpected failures throughout the network
>>> stack. In one case `strace dhclient eth0` reported:
>>>
>>>   setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8}, 
>>> 16) = -1 ENOTSUPP (Unknown error 524)
>>>
>>> and similar failures can be seen with tools like tcpdump. This doesn't
>>> always reproduce however, and I'm not sure why. The more consistent
>>> failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9 host
>>> would time out on systemd/netplan configuring a virtio-net NIC with no
>>> noticeable errors in the logs.
>>>
>>> Fix this by limiting the compile-time default for bpf_jit_limit to
>>> INT_MAX.
>>
>> INT_MAX is a lot more than (4k * 4), so I guess I'm not clear on
>> whether we should be using PAGE_SIZE here at all. I guess each BPF
>> program uses at least one page is the thinking?
> 
> That seems to be the case, at least, the max number of minimum-sized
> allocations would be less on ppc64 since the allocations are always at
> least PAGE_SIZE in size. The init-time default also limits to INT_MAX,
> so it seemed consistent to do that here too.
> 
>>
>> Thanks for tracking this down. For some reason none of my ~10 test boxes
>> have hit this, perhaps I don't have new enough userspace?
> 
> I'm not too sure, I would've thought things like the dhclient case in
> the commit log would fail every time, but sometimes I need to reboot the
> guest before I start seeing the behavior. Maybe there's something special
> about when JIT allocations are actually done that can affect
> reproducibility?
> 
> In my case at least the virtio-net networking timeout was consistent
> enough for a bisect, but maybe it depends on the specific network
> configuration (single NIC, basic DHCP through netplan/systemd in my case).
> 
>>
>> You don't mention why you needed to add BPF_MIN(), I assume because the
>> kernel version of min() has gotten too complicated to work here?
> 
> I wasn't sure if it was safe here or not, so I tried looking at other
> users and came across:
> 
> mm/vmalloc.c:777:#define VMAP_MIN(x, y)   ((x) < (y) ? (x) : (y)) 
> /* can't use min() */
> 
> I'm not sure what the reasoning was (or whether it still applies), but I
> figured it was safer to do the same here. Maybe Nick still recalls?
> 
>>
>> Daniel I assume you'll merge this via your tree?
>>
>> cheers
>>
>>> Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv 
>>> allocations")
>>> Cc: linuxppc-...@ozlabs.org
>>> Cc: Daniel Borkmann 
>>> Cc: Sandipan Das 
>>> Cc: Alexei Starovoitov 
>>> Signed-off-by: Michael Roth 

Thanks for the reports / fixes and sorry for my late reply (bit too
swamped last week), some more thoughts below.

>>>  kernel/bpf/core.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
>>> index b1a3545d0ec8..55de4746cdfd 100644
>>> --- a/kernel/bpf/core.c
>>> +++ b/kernel/bpf/core.c
>>> @@ -365,7 +365,8 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp)
>>>  }
>>>  
>>>  #ifdef CONFIG_BPF_JIT
>>> -# define BPF_JIT_LIMIT_DEFAULT   (PAGE_SIZE * 4)
>>> +# define BPF_MIN(x, y) ((x) < (y) ? (x) : (y))
>>> +# define BPF_JIT_LIMIT_DEFAULT   BPF_MIN((PAGE_SIZE * 4), INT_MAX)
>>>  
>>>  /* All BPF JIT sysctl knobs here. */
>>>  int bpf_jit_enable   __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON);

I would actually just like to get rid of the BPF_JIT_LIMIT_DEFAULT
define also given for 4.21 arm64 will have its own dedicated area for
JIT allocations where neither the above limit nor the MODULES_END/
MODULES_VADDR one would fit and I don't want to make this even more
ugly with adding further cases into the core. Would the below variant
work for you?

Thanks,
Daniel

>From da9daf462d41ce5506c6b6318a9fa3d6d8a64f6c Mon Sep 17 00:00:00 2001
From: Daniel Borkmann 
Date: Mon, 10 Dec 2018 14:30:27 +0100
Subject: [PATCH bpf] bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K

Michael and Sandipan report:

  Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF
  JIT allocations. At compile time it defaults to PAGE_SIZE * 4,
  and is adjusted again at init time if MODULES_VADDR is defined.

  For ppc64 kernels, MODULES_VADDR isn't defined, so 

Re: [PATCH] bpf: fix overflow of bpf_jit_limit when PAGE_SIZE >= 64K

2018-12-07 Thread Michael Roth
Quoting Michael Ellerman (2018-12-07 06:31:13)
> Michael Roth  writes:
> 
> > Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF
> > JIT allocations. At compile time it defaults to PAGE_SIZE * 4,
> > and is adjusted again at init time if MODULES_VADDR is defined.
> >
> > For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with
> 
> But maybe it should be, I don't know why we don't define it.
> 
> > the compile-time default at boot-time, which is 0x9c40 when
> > using 64K page size. This overflows the signed 32-bit bpf_jit_limit
> > value:
> >
> >   root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit
> >   -1673527296
> >
> > and can cause various unexpected failures throughout the network
> > stack. In one case `strace dhclient eth0` reported:
> >
> >   setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8}, 
> > 16) = -1 ENOTSUPP (Unknown error 524)
> >
> > and similar failures can be seen with tools like tcpdump. This doesn't
> > always reproduce however, and I'm not sure why. The more consistent
> > failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9 host
> > would time out on systemd/netplan configuring a virtio-net NIC with no
> > noticeable errors in the logs.
> >
> > Fix this by limiting the compile-time default for bpf_jit_limit to
> > INT_MAX.
> 
> INT_MAX is a lot more than (4k * 4), so I guess I'm not clear on
> whether we should be using PAGE_SIZE here at all. I guess each BPF
> program uses at least one page is the thinking?

That seems to be the case, at least, the max number of minimum-sized
allocations would be less on ppc64 since the allocations are always at
least PAGE_SIZE in size. The init-time default also limits to INT_MAX,
so it seemed consistent to do that here too.

> 
> Thanks for tracking this down. For some reason none of my ~10 test boxes
> have hit this, perhaps I don't have new enough userspace?

I'm not too sure, I would've thought things like the dhclient case in
the commit log would fail every time, but sometimes I need to reboot the
guest before I start seeing the behavior. Maybe there's something special
about when JIT allocations are actually done that can affect
reproducibility?

In my case at least the virtio-net networking timeout was consistent
enough for a bisect, but maybe it depends on the specific network
configuration (single NIC, basic DHCP through netplan/systemd in my case).

> 
> You don't mention why you needed to add BPF_MIN(), I assume because the
> kernel version of min() has gotten too complicated to work here?

I wasn't sure if it was safe here or not, so I tried looking at other
users and came across:

mm/vmalloc.c:777:#define VMAP_MIN(x, y) ((x) < (y) ? (x) : (y)) /* 
can't use min() */

I'm not sure what the reasoning was (or whether it still applies), but I
figured it was safer to do the same here. Maybe Nick still recalls?

> 
> Daniel I assume you'll merge this via your tree?
> 
> cheers
> 
> > Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv 
> > allocations")
> > Cc: linuxppc-...@ozlabs.org
> > Cc: Daniel Borkmann 
> > Cc: Sandipan Das 
> > Cc: Alexei Starovoitov 
> > Signed-off-by: Michael Roth 
> > ---
> >  kernel/bpf/core.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index b1a3545d0ec8..55de4746cdfd 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -365,7 +365,8 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp)
> >  }
> >  
> >  #ifdef CONFIG_BPF_JIT
> > -# define BPF_JIT_LIMIT_DEFAULT   (PAGE_SIZE * 4)
> > +# define BPF_MIN(x, y) ((x) < (y) ? (x) : (y))
> > +# define BPF_JIT_LIMIT_DEFAULT   BPF_MIN((PAGE_SIZE * 4), INT_MAX)
> >  
> >  /* All BPF JIT sysctl knobs here. */
> >  int bpf_jit_enable   __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON);
> > -- 
> > 2.17.1
> 


Re: [PATCH] bpf: fix overflow of bpf_jit_limit when PAGE_SIZE >= 64K

2018-12-07 Thread Michael Ellerman
Michael Roth  writes:

> Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF
> JIT allocations. At compile time it defaults to PAGE_SIZE * 4,
> and is adjusted again at init time if MODULES_VADDR is defined.
>
> For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with

But maybe it should be, I don't know why we don't define it.

> the compile-time default at boot-time, which is 0x9c40 when
> using 64K page size. This overflows the signed 32-bit bpf_jit_limit
> value:
>
>   root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit
>   -1673527296
>
> and can cause various unexpected failures throughout the network
> stack. In one case `strace dhclient eth0` reported:
>
>   setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8}, 
> 16) = -1 ENOTSUPP (Unknown error 524)
>
> and similar failures can be seen with tools like tcpdump. This doesn't
> always reproduce however, and I'm not sure why. The more consistent
> failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9 host
> would time out on systemd/netplan configuring a virtio-net NIC with no
> noticeable errors in the logs.
>
> Fix this by limiting the compile-time default for bpf_jit_limit to
> INT_MAX.

INT_MAX is a lot more than (4k * 4), so I guess I'm not clear on
whether we should be using PAGE_SIZE here at all. I guess each BPF
program uses at least one page is the thinking?

Thanks for tracking this down. For some reason none of my ~10 test boxes
have hit this, perhaps I don't have new enough userspace?

You don't mention why you needed to add BPF_MIN(), I assume because the
kernel version of min() has gotten too complicated to work here?

Daniel I assume you'll merge this via your tree?

cheers

> Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv 
> allocations")
> Cc: linuxppc-...@ozlabs.org
> Cc: Daniel Borkmann 
> Cc: Sandipan Das 
> Cc: Alexei Starovoitov 
> Signed-off-by: Michael Roth 
> ---
>  kernel/bpf/core.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index b1a3545d0ec8..55de4746cdfd 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -365,7 +365,8 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp)
>  }
>  
>  #ifdef CONFIG_BPF_JIT
> -# define BPF_JIT_LIMIT_DEFAULT   (PAGE_SIZE * 4)
> +# define BPF_MIN(x, y) ((x) < (y) ? (x) : (y))
> +# define BPF_JIT_LIMIT_DEFAULT   BPF_MIN((PAGE_SIZE * 4), INT_MAX)
>  
>  /* All BPF JIT sysctl knobs here. */
>  int bpf_jit_enable   __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON);
> -- 
> 2.17.1