Re: WARNING: CPU: 1 PID: 52 at mm/page_alloc.c:4826 __alloc_pages_nodemask (Re: [PATCH 5/5] sysctl: pass kernel pointers to ->proc_handler)

2020-06-08 Thread sdf

On 06/08, Alexei Starovoitov wrote:

On Mon, Jun 8, 2020 at 6:05 AM Christoph Hellwig  wrote:
>
> On Mon, Jun 08, 2020 at 09:45:49AM +0200, Vegard Nossum wrote:
> > Just a test case.
> >
> > Allowing the kernel to allocate an unbounded amount of memory on behalf
> > of userspace is an easy DOS.
> >
> > All the length checks were already in there, e.g.
> >
> >  static int cmm_timeout_handler(struct ctl_table *ctl, int write,
> >   void __user *buffer, size_t *lenp, loff_t *ppos)
> >  {
> > char buf[64], *p;
> > [...]
> > len = min(*lenp, sizeof(buf));
> > if (copy_from_user(buf, buffer, len))
> > return -EFAULT;
>
> Doesn't help if we don't know the exact limit yet.  But we can put
> some arbitrary but reasonable limit like KMALLOC_MAX_SIZE on the
> sysctls and see if this sticks.



adding Stanislav. I think he's looking into this already.

Yeah, I'm looking at it from the get/setsockopt point of view.
I'm currently trying to bypass allocating a buffer if it's greater
than PAGE_SIZE.
I suppose for sysctls we should try to do something similar?


Re: WARNING: CPU: 1 PID: 52 at mm/page_alloc.c:4826 __alloc_pages_nodemask (Re: [PATCH 5/5] sysctl: pass kernel pointers to ->proc_handler)

2020-06-08 Thread Alexei Starovoitov
On Mon, Jun 8, 2020 at 6:05 AM Christoph Hellwig  wrote:
>
> On Mon, Jun 08, 2020 at 09:45:49AM +0200, Vegard Nossum wrote:
> > Just a test case.
> >
> > Allowing the kernel to allocate an unbounded amount of memory on behalf
> > of userspace is an easy DOS.
> >
> > All the length checks were already in there, e.g.
> >
> >  static int cmm_timeout_handler(struct ctl_table *ctl, int write,
> >   void __user *buffer, size_t *lenp, loff_t *ppos)
> >  {
> > char buf[64], *p;
> > [...]
> > len = min(*lenp, sizeof(buf));
> > if (copy_from_user(buf, buffer, len))
> > return -EFAULT;
>
> Doesn't help if we don't know the exact limit yet.  But we can put
> some arbitrary but reasonable limit like KMALLOC_MAX_SIZE on the
> sysctls and see if this sticks.

adding Stanislav. I think he's looking into this already.


Re: WARNING: CPU: 1 PID: 52 at mm/page_alloc.c:4826 __alloc_pages_nodemask (Re: [PATCH 5/5] sysctl: pass kernel pointers to ->proc_handler)

2020-06-08 Thread Christoph Hellwig
On Mon, Jun 08, 2020 at 09:45:49AM +0200, Vegard Nossum wrote:
> Just a test case.
>
> Allowing the kernel to allocate an unbounded amount of memory on behalf
> of userspace is an easy DOS.
>
> All the length checks were already in there, e.g.
>
>  static int cmm_timeout_handler(struct ctl_table *ctl, int write,
>   void __user *buffer, size_t *lenp, loff_t *ppos)
>  {
> char buf[64], *p;
> [...]
> len = min(*lenp, sizeof(buf));
> if (copy_from_user(buf, buffer, len))
> return -EFAULT;

Doesn't help if we don't know the exact limit yet.  But we can put
some arbitrary but reasonable limit like KMALLOC_MAX_SIZE on the
sysctls and see if this sticks.
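
A cap of this kind could look roughly like the following userspace sketch. It is an illustration only, not the kernel change: sketch_cap_count and SKETCH_MAX_SYSCTL_LEN are hypothetical names, and the 4 MiB value merely stands in for KMALLOC_MAX_SIZE, whose real value depends on the kernel configuration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for KMALLOC_MAX_SIZE (assumption: 4 MiB). */
#define SKETCH_MAX_SYSCTL_LEN ((size_t)4 * 1024 * 1024)

/*
 * Clamp a user-supplied length to an upper bound before any buffer is
 * allocated for it, so userspace cannot request an arbitrarily large
 * allocation.
 */
static size_t sketch_cap_count(size_t count, size_t limit)
{
	assert(limit > 0);
	return count < limit ? count : limit;
}
```

With a cap like this, an oversized request is silently truncated; whether truncating or rejecting is the better behaviour is left open in the discussion above.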


Re: WARNING: CPU: 1 PID: 52 at mm/page_alloc.c:4826 __alloc_pages_nodemask (Re: [PATCH 5/5] sysctl: pass kernel pointers to ->proc_handler)

2020-06-08 Thread Vegard Nossum



On 2020-06-08 08:51, Christoph Hellwig wrote:

On Thu, Jun 04, 2020 at 10:22:21PM +0200, Vegard Nossum wrote:

It's easy to reproduce by just doing

 read(open("/proc/sys/vm/swappiness", O_RDONLY), 0, 512UL * 1024 * 1024 * 1024);

or so. Reverting the commit fixes the issue for me.


Yes, doing giant allocations will fail and trace.  We have two options
here that both seem sensible:

  - truncate sysctl calls to some sensible length
  - (optionally) use vmalloc

Is this a real application or just a test case trying to do the
stupidmost possible thing?



Just a test case.

Allowing the kernel to allocate an unbounded amount of memory on behalf
of userspace is an easy DOS.

All the length checks were already in there, e.g.

 static int cmm_timeout_handler(struct ctl_table *ctl, int write,
  void __user *buffer, size_t *lenp, loff_t *ppos)
 {
char buf[64], *p;
[...]
len = min(*lenp, sizeof(buf));
if (copy_from_user(buf, buffer, len))
return -EFAULT;
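
The per-handler length check can be modelled in userspace as a minimal sketch. The names sketch_bounded_copy and SKETCH_BUF_SIZE are hypothetical, memcpy stands in for copy_from_user, and unlike the handler quoted above the sketch reserves one byte so that the result is always NUL-terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BUF_SIZE 64	/* mirrors the char buf[64] in the handler */

/*
 * Copy at most SKETCH_BUF_SIZE - 1 bytes of caller-supplied input into a
 * fixed-size buffer and NUL-terminate it. Returns the number of bytes
 * copied, which never depends on how large a length the caller claimed.
 */
static size_t sketch_bounded_copy(char dst[SKETCH_BUF_SIZE], const char *src,
				  size_t len)
{
	size_t n = len < SKETCH_BUF_SIZE - 1 ? len : SKETCH_BUF_SIZE - 1;

	assert(dst != NULL && src != NULL);
	memcpy(dst, src, n);
	dst[n] = '\0';
	return n;
}
```

The point of the pattern is that the memory used is bounded by the handler's own buffer, not by the length passed in from userspace.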


Vegard


Re: WARNING: CPU: 1 PID: 52 at mm/page_alloc.c:4826 __alloc_pages_nodemask (Re: [PATCH 5/5] sysctl: pass kernel pointers to ->proc_handler)

2020-06-08 Thread Christoph Hellwig
On Thu, Jun 04, 2020 at 10:22:21PM +0200, Vegard Nossum wrote:
> It's easy to reproduce by just doing
>
> read(open("/proc/sys/vm/swappiness", O_RDONLY), 0, 512UL * 1024 * 1024 * 1024);
>
> or so. Reverting the commit fixes the issue for me.

Yes, doing giant allocations will fail and trace.  We have two options
here that both seem sensible:

 - truncate sysctl calls to some sensible length
 - (optionally) use vmalloc

Is this a real application or just a test case trying to do the
stupidmost possible thing?
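
For reference, the quoted one-liner can be wrapped into a small helper, sketched here under the assumption of a Linux system with /proc mounted. sketch_oversized_read is a hypothetical name, and the NULL buffer mirrors the 0 passed in the report, so the read is not expected to return data on any kernel; the interesting part is whether the kernel logs a warning while failing it.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Issue a single read with a caller-chosen (possibly huge) length on the
 * given file. Returns the result of read(), or -1 if the file cannot be
 * opened.
 */
static ssize_t sketch_oversized_read(const char *path, size_t count)
{
	ssize_t ret;
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* NULL destination, as in the report: no data can be returned. */
	ret = read(fd, NULL, count);
	close(fd);
	return ret;
}
```

Calling it with "/proc/sys/vm/swappiness" and a length such as 512UL * 1024 * 1024 * 1024 corresponds to the reproducer; the errno value reported for the failure is not specified in this thread and may differ between kernels.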


WARNING: CPU: 1 PID: 52 at mm/page_alloc.c:4826 __alloc_pages_nodemask (Re: [PATCH 5/5] sysctl: pass kernel pointers to ->proc_handler)

2020-06-04 Thread Vegard Nossum



(Trimmed original Ccs due to outgoing email policy.)

Hi,

On 2020-04-24 08:43, Christoph Hellwig wrote:

Instead of having all the sysctl handlers deal with user pointers, which
is rather hairy in terms of the BPF interaction, copy the input to and
from  userspace in common code.  This also means that the strings are
always NUL-terminated by the common code, making the API a little bit
safer.

As most handlers just pass through the data to one of the common handlers,
a lot of the changes are mechanical.

Signed-off-by: Christoph Hellwig 
Acked-by: Andrey Ignatov 


[snip]

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index b6f5d459b087d..df2143e05c571 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -539,13 +539,13 @@ static struct dentry *proc_sys_lookup(struct inode *dir, struct dentry *dentry,
 	return err;
 }
 
-static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
+static ssize_t proc_sys_call_handler(struct file *filp, void __user *ubuf,
 		size_t count, loff_t *ppos, int write)
 {
 	struct inode *inode = file_inode(filp);
 	struct ctl_table_header *head = grab_header(inode);
 	struct ctl_table *table = PROC_I(inode)->sysctl_entry;
-	void *new_buf = NULL;
+	void *kbuf;
 	ssize_t error;
 
 	if (IS_ERR(head))
@@ -564,27 +564,38 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
 	if (!table->proc_handler)
 		goto out;
 
-	error = BPF_CGROUP_RUN_PROG_SYSCTL(head, table, write, buf, &count,
-					   ppos, &new_buf);
+	if (write) {
+		kbuf = memdup_user_nul(ubuf, count);
+		if (IS_ERR(kbuf)) {
+			error = PTR_ERR(kbuf);
+			goto out;
+		}
+	} else {
+		error = -ENOMEM;
+		kbuf = kzalloc(count, GFP_KERNEL);
+		if (!kbuf)
+			goto out;
+	}
+
+	error = BPF_CGROUP_RUN_PROG_SYSCTL(head, table, write, &kbuf, &count,
+					   ppos);
 	if (error)
-		goto out;
+		goto out_free_buf;
 
 	/* careful: calling conventions are nasty here */
-	if (new_buf) {
-		mm_segment_t old_fs;
-
-		old_fs = get_fs();
-		set_fs(KERNEL_DS);
-		error = table->proc_handler(table, write, (void __user *)new_buf,
-					    &count, ppos);
-		set_fs(old_fs);
-		kfree(new_buf);
-	} else {
-		error = table->proc_handler(table, write, buf, &count, ppos);
+	error = table->proc_handler(table, write, kbuf, &count, ppos);
+	if (error)
+		goto out_free_buf;
+
+	if (!write) {
+		error = -EFAULT;
+		if (copy_to_user(ubuf, kbuf, count))
+			goto out_free_buf;
 	}
 
-	if (!error)
-		error = count;
+	error = count;
+out_free_buf:
+	kfree(kbuf);
 out:
 	sysctl_head_finish(head);


This commit in recent linus/master
(32927393dc1ccd60fb2bdc05b9e8e88753761469) causes a regression for me:

[ cut here ]
WARNING: CPU: 1 PID: 52 at mm/page_alloc.c:4826 __alloc_pages_nodemask+0x1cd/0x2a0
CPU: 1 PID: 52 Comm: init Not tainted 5.7.0+ #218
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
RIP: 0010:__alloc_pages_nodemask+0x1cd/0x2a0
Code: 0f 85 26 ff ff ff 65 48 8b 04 25 00 7d 01 00 48 05 88 07 00 00 41 bd 01 00 00 00 48 89 44 24 08 e9 07 ff ff ff 80 e7 20 75 02 <0f> 0b 45 31 ed eb 98 44 8b 64 24 18 65 8b 05 d0 25 e9 7e 89 c0 48

RSP: 0018:c90e7de0 EFLAGS: 00010246
RAX:  RBX: 000400c0 RCX: 
RDX:  RSI: 0013 RDI: 00040dc0
RBP: 7000 R08: 820276c0 R09: 
R10:  R11:  R12: c90e7f08
R13: 0013 R14: 0013 R15: 81c34ce0
FS:  006cf880() GS:88803ed0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 004a1dab CR3: 3e012002 CR4: 003606e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 kmalloc_order+0x16/0x70
 kmalloc_order_trace+0x18/0xa0
 proc_sys_call_handler+0xf7/0x170
 vfs_read+0x98/0x120
 ksys_read+0x5a/0xd0
 do_syscall_64+0x43/0x140
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x43f910
Code: 01 f0 ff ff 0f 83 e0 57 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d 19 f2 28 00 00 75 14 b8 00 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 b4 57 00 00 c3 48 83 ec 08 e8 4a 39 00 00

RSP: 002b:7fffeaa8 EFLAGS: 0246 ORIG_RAX: 
RAX: ffda RBX: 004002c8 RCX: 

Re: [PATCH 5/5] sysctl: pass kernel pointers to ->proc_handler

2020-05-04 Thread Christoph Hellwig
On Mon, May 04, 2020 at 12:01:11PM -0700, Kees Cook wrote:
> > if (error)
> > -   goto out;
> > +   goto out_free_buf;
> >  
> > /* careful: calling conventions are nasty here */
> 
> Is this comment still valid after doing these cleanups?

The comment is pretty old so I decided to keep it.  That being said
I'm not sure it really is very helpful.


Re: [PATCH 5/5] sysctl: pass kernel pointers to ->proc_handler

2020-05-04 Thread Kees Cook
On Fri, Apr 24, 2020 at 08:43:38AM +0200, Christoph Hellwig wrote:
> Instead of having all the sysctl handlers deal with user pointers, which
> is rather hairy in terms of the BPF interaction, copy the input to and
> from  userspace in common code.  This also means that the strings are
> always NUL-terminated by the common code, making the API a little bit
> safer.
> 
> As most handlers just pass through the data to one of the common handlers,
> a lot of the changes are mechanical.

This is a lovely cleanup; thank you!

Tiny notes below...

> diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
> index b6f5d459b087d..df2143e05c571 100644
> --- a/fs/proc/proc_sysctl.c
> +++ b/fs/proc/proc_sysctl.c
> @@ -539,13 +539,13 @@ static struct dentry *proc_sys_lookup(struct inode *dir, struct dentry *dentry,
>   return err;
>  }
>  
> -static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
> +static ssize_t proc_sys_call_handler(struct file *filp, void __user *ubuf,
>   size_t count, loff_t *ppos, int write)
>  {
>   struct inode *inode = file_inode(filp);
>   struct ctl_table_header *head = grab_header(inode);
>   struct ctl_table *table = PROC_I(inode)->sysctl_entry;
> - void *new_buf = NULL;
> + void *kbuf;
>   ssize_t error;
>  
>   if (IS_ERR(head))
> @@ -564,27 +564,38 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
>   if (!table->proc_handler)
>   goto out;
>  
> - error = BPF_CGROUP_RUN_PROG_SYSCTL(head, table, write, buf, &count,
> - ppos, &new_buf);
> + if (write) {
> + kbuf = memdup_user_nul(ubuf, count);
> + if (IS_ERR(kbuf)) {
> + error = PTR_ERR(kbuf);
> + goto out;
> + }
> + } else {
> + error = -ENOMEM;
> + kbuf = kzalloc(count, GFP_KERNEL);
> + if (!kbuf)
> + goto out;
> + }
> +
> + error = BPF_CGROUP_RUN_PROG_SYSCTL(head, table, write, &kbuf, &count,
> + ppos);
>   if (error)
> - goto out;
> + goto out_free_buf;
>  
>   /* careful: calling conventions are nasty here */

Is this comment still valid after doing these cleanups?

> - if (new_buf) {
> - mm_segment_t old_fs;
> -
> - old_fs = get_fs();
> - set_fs(KERNEL_DS);
> - error = table->proc_handler(table, write, (void __user *)new_buf,
> - &count, ppos);
> - set_fs(old_fs);
> - kfree(new_buf);
> - } else {
> - error = table->proc_handler(table, write, buf, &count, ppos);
> + error = table->proc_handler(table, write, kbuf, &count, ppos);
> + if (error)
> + goto out_free_buf;
> +
> + if (!write) {
> + error = -EFAULT;
> + if (copy_to_user(ubuf, kbuf, count))
> + goto out_free_buf;
>   }

Something I noticed here that existed in the original code, but might be
nice to improve while we're here is to make sure that the "count"
returned from proc_handler() cannot grow _larger_, since then we might
expose heap memory beyond the end of the allocation.

I'll send a patch for this...
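
One possible shape for such a guard, as a userspace sketch only and not the patch referred to above: every name here is hypothetical, the handler signature is simplified, calloc/memcpy stand in for kzalloc/copy_to_user, and rejecting (rather than clamping) an oversized length is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified handler type: fills kbuf and reports the length via *lenp. */
typedef int (*sketch_handler_t)(char *kbuf, size_t *lenp);

/* Well-behaved handler: never reports more than it was given. */
static int sketch_good_handler(char *kbuf, size_t *lenp)
{
	static const char value[] = "60\n";
	size_t n = sizeof(value) - 1;

	if (n > *lenp)
		n = *lenp;
	memcpy(kbuf, value, n);
	*lenp = n;
	return 0;
}

/* Misbehaving handler: claims one byte more than the buffer holds. */
static int sketch_bad_handler(char *kbuf, size_t *lenp)
{
	(void)kbuf;
	*lenp += 1;
	return 0;
}

/*
 * Allocate `count` bytes, run the handler, and copy the result to `out`
 * only if the reported length still fits the allocation. Returns the
 * number of bytes copied, or -1 on any failure.
 */
static long sketch_call_handler(sketch_handler_t handler, char *out,
				size_t count)
{
	size_t len = count;
	long ret = -1;
	char *kbuf;

	assert(handler != NULL && out != NULL);
	kbuf = calloc(1, count ? count : 1);
	if (!kbuf)
		return -1;

	/* Refuse to copy out if the handler grew the length. */
	if (handler(kbuf, &len) == 0 && len <= count) {
		memcpy(out, kbuf, len);
		ret = (long)len;
	}
	free(kbuf);
	return ret;
}
```

The comparison against the original count is the whole guard: without it, a handler that enlarges the length would cause the copy-out to read past the end of the allocation.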

>  
> - if (!error)
> - error = count;
> + error = count;
> +out_free_buf:
> + kfree(kbuf);
>  out:
>   sysctl_head_finish(head);
>  
> [...]
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 511543d238794..e26fe7e8e19d7 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> [...]
> @@ -682,7 +661,6 @@ static int do_proc_douintvec_w(unsigned int *tbl_data,
>   left -= proc_skip_spaces(&p);
>  
>  out_free:
> - kfree(kbuf);
>   if (err)
>   return -EINVAL;

This label name isn't accurate any more... *shrug*

-- 
Kees Cook