Re: kernel BUG at ./include/linux/mm.h:LINE! (3)

2018-01-08 Thread Pete Zaitcev
On Wed, 3 Jan 2018 12:26:04 +0300
"Kirill A. Shutemov"  wrote:

> > > - unsigned long offset, chunk_idx;
> > > + unsigned long offset, chunk_idx, flags;
> > >   struct page *pageptr;
> > >  
> > > + mutex_lock(>fetch_lock);
> > > + spin_lock_irqsave(>b_lock, flags);
> > >   offset = vmf->pgoff << PAGE_SHIFT;

> > I think that grabbing the spinlock is not really necessary in
> > this case. The ->b_lock is designed for things that are accessed
> > from interrupts that Host Controller Driver serves -- mostly
> > various pointers. By defintion it's not covering things that
> > are related to re-allocation. Now, the re-allocation itself
> > grabs it, because it resets indexes into the new buffer, but
> > does not appear to apply here, does it now?  
> 
> Please, double check everything. I remember that the mutex wasn't enough
> to stop bug from triggering. But I didn't spend much time understanding
> the code.

Attached is the test that I used to reproduce the problem, and my
patch with just the ->fetch_lock fixes it. That said, your test
suite may be more comprehensive, or you may have a device that
generates enough USB traffic with associated monitoring callbacks.
But I don't see it.

At this point, I'm going to post the patch with a Signed-Off-By.

-- Pete

(this reproduces very quickly)

/*
 * usbmontest: The crash test for usbmon, crashes kernel 4.15
 *
 * Copyright (c) 2007 Red Hat, Inc.
 * Copyright (c) 2016 Mike Frysinger 
 * Copyright (c) 2018 Pete Zaitcev 
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, write to the Free Software Foundation, Inc.,
 * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
 */
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#define TAG "usbmontest"

#ifdef __GNUC__
#define __unused __attribute__((unused))
#else
#define __unused /**/
#endif

#define MON_IOC_MAGIC 0x92

#define MON_IOCG_STATS _IOR(MON_IOC_MAGIC, 3, struct usbmon_stats)

#define MON_IOCT_RING_SIZE _IO(MON_IOC_MAGIC, 4)

#define MON_IOCQ_RING_SIZE _IO(MON_IOC_MAGIC, 5)

struct usbmon_mfetch_arg {
unsigned int *offvec;   /* Vector of events fetched */
unsigned int nfetch;/* Num. of events to fetch / fetched */
unsigned int nflush;/* Number of events to flush */
};

#define MON_IOCX_MFETCH _IOWR(MON_IOC_MAGIC, 7, struct usbmon_mfetch_arg)

/*
 */
struct params {
int ifnum;  /* USB bus number */
char *devname;  /* /dev/usbmonN */

unsigned int map_size;
};
const unsigned int page_size = 8192;/* 2x the size on x86, cross-platform */
const unsigned int num_pages = 3;

void Usage(void);

void parse_params(struct params *p, char **argv);
void make_device(const struct params *p);
int find_major(void);

struct params par;

int main(int argc __unused, char **argv)
{
int fd;
unsigned char *data_buff;
unsigned int n;
int wstat;
int rc;

parse_params(, argv+1);

/*
 * Two reasons to do this:
 * 1. Reduce weird error messages.
 * 2. If we create device nodes, we want them owned by root.
 */
if (geteuid() != 0) {
fprintf(stderr, TAG ": Must run as root\n");
exit(1);
}

if ((fd = open(par.devname, O_RDWR)) == -1) {
if (errno == ENOENT) {
make_device();
fd = open(par.devname, O_RDWR);
}
if (fd == -1) {
if (errno == ENODEV && par.ifnum == 0) {
fprintf(stderr, TAG
": Can't open pseudo-bus zero at %s"
" (probably not supported by kernel)\n",
par.devname);
} else {
fprintf(stderr, TAG ": Can't open %s: %s\n",
par.devname, strerror(errno));
}
exit(1);
}
}

rc = ioctl(fd, MON_IOCT_RING_SIZE, par.map_size);
if (rc == -1) {
fprintf(stderr, TAG ": Cannot set ring size (%d): %s\n",
par.map_size, strerror(errno));
exit(1);
}

rc = fork();
 

Re: kernel BUG at ./include/linux/mm.h:LINE! (3)

2018-01-07 Thread Pete Zaitcev
On Fri, 29 Dec 2017 16:24:20 +0300
"Kirill A. Shutemov"  wrote:

> Looks like MON_IOCT_RING_SIZE reallocates ring buffer without any
> serialization wrt mon_bin_vma_fault(). By the time of get_page() the page
> may be freed.

As an update: I tried to make a smaller test for this, but was unsuccessful
so far. I'll poke a bit at it later, but it may take me some time.

-- Pete
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at ./include/linux/mm.h:LINE! (3)

2018-01-03 Thread Pete Zaitcev
On Wed, 3 Jan 2018 13:08:12 -0800
Matthew Wilcox  wrote:

> > +   mutex_lock(>fetch_lock);
> > offset = vmf->pgoff << PAGE_SHIFT;
> > if (offset >= rp->b_size)
> > +   mutex_unlock(>fetch_lock);
> > return VM_FAULT_SIGBUS;
> > chunk_idx = offset / CHUNK_SIZE;  
> 
> missing braces ... maybe you'd rather a 'goto sigbus' approach?

Thanks. What a way to return to kernel programming for me. I'm going
to test it anyway, already started to unpacking a PC, but yeah...

-- Pete
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at ./include/linux/mm.h:LINE! (3)

2018-01-03 Thread Matthew Wilcox
On Wed, Jan 03, 2018 at 03:04:19PM -0600, Pete Zaitcev wrote:
> @@ -1231,12 +1233,15 @@ static int mon_bin_vma_fault(struct vm_fault *vmf)
>   unsigned long offset, chunk_idx;
>   struct page *pageptr;
>  
> + mutex_lock(>fetch_lock);
>   offset = vmf->pgoff << PAGE_SHIFT;
>   if (offset >= rp->b_size)
> + mutex_unlock(>fetch_lock);
>   return VM_FAULT_SIGBUS;
>   chunk_idx = offset / CHUNK_SIZE;

missing braces ... maybe you'd rather a 'goto sigbus' approach?

>   pageptr = rp->b_vec[chunk_idx].pg;
>   get_page(pageptr);
> + mutex_unlock(>fetch_lock);
>   vmf->page = pageptr;
>   return 0;
>  }
> 
> -- Pete
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: mailto:"d...@kvack.org;> em...@kvack.org 
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at ./include/linux/mm.h:LINE! (3)

2018-01-03 Thread Pete Zaitcev
On Wed, 3 Jan 2018 12:26:04 +0300
"Kirill A. Shutemov"  wrote:

> > > +++ b/drivers/usb/mon/mon_bin.c
> > > @@ -1228,15 +1228,24 @@ static void mon_bin_vma_close(struct 
> > > vm_area_struct *vma)
> > >  static int mon_bin_vma_fault(struct vm_fault *vmf)
> > >  {
> > >   struct mon_reader_bin *rp = vmf->vma->vm_private_data;
> > > - unsigned long offset, chunk_idx;
> > > + unsigned long offset, chunk_idx, flags;
> > >   struct page *pageptr;
> > >  
> > > + mutex_lock(>fetch_lock);
> > > + spin_lock_irqsave(>b_lock, flags);
> > >   offset = vmf->pgoff << PAGE_SHIFT;
> > > - if (offset >= rp->b_size)
> > > + if (offset >= rp->b_size) {
> > > + spin_unlock_irqrestore(>b_lock, flags);
> > > + mutex_unlock(>fetch_lock);
> > >   return VM_FAULT_SIGBUS;
> > > + }
> > >   chunk_idx = offset / CHUNK_SIZE;
> > > +
> > >   pageptr = rp->b_vec[chunk_idx].pg;
> > >   get_page(pageptr);
> > > + spin_unlock_irqrestore(>b_lock, flags);
> > > + mutex_unlock(>fetch_lock);
> > > +
> > >   vmf->page = pageptr;
> > >   return 0;
> > >  }  
> > 
> > I think that grabbing the spinlock is not really necessary in
> > this case. [...]
> 
> Please, double check everything. I remember that the mutex wasn't enough
> to stop bug from triggering. But I didn't spend much time understanding
> the code.

I just don't understand why. The only two fields that are used
in the fault routine are rp->b_vec and rp->b_size. They are
protected by the mutex rp->fetch_lock. I don't see anything else
can spill into these fields by dirtying adjacent words in memory,
either except this:

case MON_IOCQ_RING_SIZE:
ret = rp->b_size;
break;

In the old days, this was safe, but who knows what CPUs do today.
It needs the same mutex taken around the read-only reference too.
How about this:

diff --git a/drivers/usb/mon/mon_bin.c b/drivers/usb/mon/mon_bin.c
index f6ae753ab99b..cb3612f28804 100644
--- a/drivers/usb/mon/mon_bin.c
+++ b/drivers/usb/mon/mon_bin.c
@@ -1004,7 +1004,9 @@ static long mon_bin_ioctl(struct file *file, unsigned int 
cmd, unsigned long arg
break;
 
case MON_IOCQ_RING_SIZE:
+   mutex_lock(>fetch_lock);
ret = rp->b_size;
+   mutex_unlock(>fetch_lock);
break;
 
case MON_IOCT_RING_SIZE:
@@ -1231,12 +1233,15 @@ static int mon_bin_vma_fault(struct vm_fault *vmf)
unsigned long offset, chunk_idx;
struct page *pageptr;
 
+   mutex_lock(>fetch_lock);
offset = vmf->pgoff << PAGE_SHIFT;
if (offset >= rp->b_size)
+   mutex_unlock(>fetch_lock);
return VM_FAULT_SIGBUS;
chunk_idx = offset / CHUNK_SIZE;
pageptr = rp->b_vec[chunk_idx].pg;
get_page(pageptr);
+   mutex_unlock(>fetch_lock);
vmf->page = pageptr;
return 0;
 }

-- Pete
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at ./include/linux/mm.h:LINE! (3)

2018-01-03 Thread Kirill A. Shutemov
On Wed, Jan 03, 2018 at 01:02:38AM -0600, Pete Zaitcev wrote:
> On Fri, 29 Dec 2017 16:24:20 +0300
> "Kirill A. Shutemov"  wrote:
> 
> > Looks like MON_IOCT_RING_SIZE reallocates ring buffer without any
> > serialization wrt mon_bin_vma_fault(). By the time of get_page() the page
> > may be freed.
> 
> Okay. Who knew that you could fork while holding an open descriptor. :-)

It's threads. But yeah.

> > The patch below seems help the crash to go away, but I *think* more work
> > is required. For instance, after ring buffer reallocation the old pages
> > will stay mapped. Nothing pulls them.
> 
> You know, this bothered me all these years too, but I was assured
> back in the day (as much as I can remember), that doing get_page()
> in the .fault() is just the right thing. In my defense, you can
> see other drivers doing it, such as:
> 
> ./drivers/char/agp/alpha-agp.c
> ./drivers/hsi/clients/cmt_speech.c
> 
> I'd appreciate insight from someone who knows how VM subsystem works.

get_page() is not a problem. It's right thing to do in ->fault.

After more thought, I think it's not a problem at all. As long as
userspace is aware that old mapping is no good after changing size of the
buffer everything would work fine. Even if userspace would use old mapping
nothing bad would happen from kernel POV.  Just userspace may see
old/inconsistent data. But there's no crashes or such.

> Now, about the code:
> 
> > diff --git a/drivers/usb/mon/mon_bin.c b/drivers/usb/mon/mon_bin.c
> > index f6ae753ab99b..ac168fecf04f 100644
> > --- a/drivers/usb/mon/mon_bin.c
> > +++ b/drivers/usb/mon/mon_bin.c
> > @@ -1228,15 +1228,24 @@ static void mon_bin_vma_close(struct vm_area_struct 
> > *vma)
> >  static int mon_bin_vma_fault(struct vm_fault *vmf)
> >  {
> > struct mon_reader_bin *rp = vmf->vma->vm_private_data;
> > -   unsigned long offset, chunk_idx;
> > +   unsigned long offset, chunk_idx, flags;
> > struct page *pageptr;
> >  
> > +   mutex_lock(>fetch_lock);
> > +   spin_lock_irqsave(>b_lock, flags);
> > offset = vmf->pgoff << PAGE_SHIFT;
> > -   if (offset >= rp->b_size)
> > +   if (offset >= rp->b_size) {
> > +   spin_unlock_irqrestore(>b_lock, flags);
> > +   mutex_unlock(>fetch_lock);
> > return VM_FAULT_SIGBUS;
> > +   }
> > chunk_idx = offset / CHUNK_SIZE;
> > +
> > pageptr = rp->b_vec[chunk_idx].pg;
> > get_page(pageptr);
> > +   spin_unlock_irqrestore(>b_lock, flags);
> > +   mutex_unlock(>fetch_lock);
> > +
> > vmf->page = pageptr;
> > return 0;
> >  }
> 
> I think that grabbing the spinlock is not really necessary in
> this case. The ->b_lock is designed for things that are accessed
> from interrupts that Host Controller Driver serves -- mostly
> various pointers. By defintion it's not covering things that
> are related to re-allocation. Now, the re-allocation itself
> grabs it, because it resets indexes into the new buffer, but
> does not appear to apply here, does it now?

Please, double check everything. I remember that the mutex wasn't enough
to stop bug from triggering. But I didn't spend much time understanding
the code.

-- 
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at ./include/linux/mm.h:LINE! (3)

2018-01-02 Thread Pete Zaitcev
On Fri, 29 Dec 2017 16:24:20 +0300
"Kirill A. Shutemov"  wrote:

> Looks like MON_IOCT_RING_SIZE reallocates ring buffer without any
> serialization wrt mon_bin_vma_fault(). By the time of get_page() the page
> may be freed.

Okay. Who knew that you could fork while holding an open descriptor. :-)

> The patch below seems help the crash to go away, but I *think* more work
> is required. For instance, after ring buffer reallocation the old pages
> will stay mapped. Nothing pulls them.

You know, this bothered me all these years too, but I was assured
back in the day (as much as I can remember), that doing get_page()
in the .fault() is just the right thing. In my defense, you can
see other drivers doing it, such as:

./drivers/char/agp/alpha-agp.c
./drivers/hsi/clients/cmt_speech.c

I'd appreciate insight from someone who knows how VM subsystem works.

Now, about the code:

> diff --git a/drivers/usb/mon/mon_bin.c b/drivers/usb/mon/mon_bin.c
> index f6ae753ab99b..ac168fecf04f 100644
> --- a/drivers/usb/mon/mon_bin.c
> +++ b/drivers/usb/mon/mon_bin.c
> @@ -1228,15 +1228,24 @@ static void mon_bin_vma_close(struct vm_area_struct 
> *vma)
>  static int mon_bin_vma_fault(struct vm_fault *vmf)
>  {
>   struct mon_reader_bin *rp = vmf->vma->vm_private_data;
> - unsigned long offset, chunk_idx;
> + unsigned long offset, chunk_idx, flags;
>   struct page *pageptr;
>  
> + mutex_lock(>fetch_lock);
> + spin_lock_irqsave(>b_lock, flags);
>   offset = vmf->pgoff << PAGE_SHIFT;
> - if (offset >= rp->b_size)
> + if (offset >= rp->b_size) {
> + spin_unlock_irqrestore(>b_lock, flags);
> + mutex_unlock(>fetch_lock);
>   return VM_FAULT_SIGBUS;
> + }
>   chunk_idx = offset / CHUNK_SIZE;
> +
>   pageptr = rp->b_vec[chunk_idx].pg;
>   get_page(pageptr);
> + spin_unlock_irqrestore(>b_lock, flags);
> + mutex_unlock(>fetch_lock);
> +
>   vmf->page = pageptr;
>   return 0;
>  }

I think that grabbing the spinlock is not really necessary in
this case. The ->b_lock is designed for things that are accessed
from interrupts that Host Controller Driver serves -- mostly
various pointers. By defintion it's not covering things that
are related to re-allocation. Now, the re-allocation itself
grabs it, because it resets indexes into the new buffer, but
does not appear to apply here, does it now?

-- Pete
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fw: kernel BUG at ./include/linux/mm.h:LINE! (3)

2017-12-29 Thread Kirill A. Shutemov
On Thu, Dec 28, 2017 at 04:03:46PM -0800, Andrew Morton wrote:
> 
> Is anyone able to reproduce this?  The VM_BUG_ON_PAGE() in get_page()
> went bang.

Yep, it triggers for me.

Looks like MON_IOCT_RING_SIZE reallocates ring buffer without any
serialization wrt mon_bin_vma_fault(). By the time of get_page() the page
may be freed.

The patch below seems help the crash to go away, but I *think* more work
is required. For instance, after ring buffer reallocation the old pages
will stay mapped. Nothing pulls them.

8<--
diff --git a/drivers/usb/mon/mon_bin.c b/drivers/usb/mon/mon_bin.c
index f6ae753ab99b..ac168fecf04f 100644
--- a/drivers/usb/mon/mon_bin.c
+++ b/drivers/usb/mon/mon_bin.c
@@ -1228,15 +1228,24 @@ static void mon_bin_vma_close(struct vm_area_struct 
*vma)
 static int mon_bin_vma_fault(struct vm_fault *vmf)
 {
struct mon_reader_bin *rp = vmf->vma->vm_private_data;
-   unsigned long offset, chunk_idx;
+   unsigned long offset, chunk_idx, flags;
struct page *pageptr;
 
+   mutex_lock(>fetch_lock);
+   spin_lock_irqsave(>b_lock, flags);
offset = vmf->pgoff << PAGE_SHIFT;
-   if (offset >= rp->b_size)
+   if (offset >= rp->b_size) {
+   spin_unlock_irqrestore(>b_lock, flags);
+   mutex_unlock(>fetch_lock);
return VM_FAULT_SIGBUS;
+   }
chunk_idx = offset / CHUNK_SIZE;
+
pageptr = rp->b_vec[chunk_idx].pg;
get_page(pageptr);
+   spin_unlock_irqrestore(>b_lock, flags);
+   mutex_unlock(>fetch_lock);
+
vmf->page = pageptr;
return 0;
 }
8<--

> 
> 
> Begin forwarded message:
> 
> Date: Wed, 27 Dec 2017 14:22:01 -0800
> From: syzbot <syzbot+f9831b881b3e84982...@syzkaller.appspotmail.com>
> To: a...@linux-foundation.org, alexander.deuc...@amd.com, 
> ch...@chris-wilson.co.uk, dave.ji...@intel.com, da...@davemloft.net, 
> gre...@linuxfoundation.org, linux-ker...@vger.kernel.org, 
> linux-usb@vger.kernel.org, mche...@kernel.org, mi...@kernel.org, 
> syzkaller-b...@googlegroups.com
> Subject: kernel BUG at ./include/linux/mm.h:LINE! (3)
> 
> 
> Hello,
> 
> syzkaller hit the following crash on  
> 37759fa6d0fa9e4d6036d19ac12f555bfc0aeafd
> git://git.cmpxchg.org/linux-mmots.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> C reproducer is attached
> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> for information about syzkaller reproducers
> 
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: <syzbot+f9831b881b3e84982...@syzkaller.appspotmail.com>
> It will help syzbot understand when the bug is fixed. See footer for  
> details.
> If you forward the report, please keep this part and the footer.
> 
> [ cut here ]
> kernel BUG at ./include/linux/mm.h:844!
> invalid opcode:  [#1] SMP KASAN
> Dumping ftrace buffer:
> (ftrace buffer empty)
> Modules linked in:
> CPU: 1 PID: 3152 Comm: syzkaller757651 Not tainted 4.15.0-rc4-mm1+ #49
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
> Google 01/01/2011
> RIP: 0010:get_page include/linux/mm.h:844 [inline]
> RIP: 0010:mon_bin_vma_fault+0x2f4/0x400 drivers/usb/mon/mon_bin.c:1239
> RSP: 0018:8801cb0af510 EFLAGS: 00010203
> RAX:  RBX: 110039615ea4 RCX: 
> RDX:  RSI: 110039615dee RDI: ed0039615e90
> RBP: 8801cb0af5e8 R08: 110039615db0 R09: 
> R10: dc00 R11:  R12: 110039615ea8
> R13: dc00 R14: 8801cb0af840 R15: ea000723fc00
> FS:  7f49b1386700() GS:8801db30() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 55cfb39ce0b8 CR3: 0001ccb19003 CR4: 001606e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>   __do_fault+0xeb/0x30f mm/memory.c:3206
>   do_read_fault mm/memory.c:3616 [inline]
>   do_fault mm/memory.c:3716 [inline]
>   handle_pte_fault mm/memory.c:3947 [inline]
>   __handle_mm_fault+0x1d8f/0x3ce0 mm/memory.c:4071
>   handle_mm_fault+0x38f/0x930 mm/memory.c:4108
>   faultin_page mm/gup.c:502 [inline]
>   __get_user_pages+0x50c/0x15f0 mm/gup.c:699
>   populate_vma_page_range+0x20e/0x2f0 mm/gup.c:1200
>   __mm_populate+0x23a/0x450 mm/gup.c:1250
>   mm_populate include/linux/mm.h:2233 [inline]
>   vm_mmap_pgoff+0x241/0x280 mm/util.c:338
>   SYSC_mmap_pgoff mm/mmap.c:1