Re: crash on urtwn removal

2015-04-28 Thread Martin Pieuchot
On 28/04/15(Tue) 13:15, Remi Locherer wrote:
 On Tue, Apr 28, 2015 at 10:17:16AM +0200, Martin Pieuchot wrote:
  On 27/04/15(Mon) 22:45, Remi Locherer wrote:
   On Mon, Apr 27, 2015 at 03:13:06PM +0200, Martin Pieuchot wrote:
    This trace tells us that the pipe is no longer valid, which means that
    the device has been removed but an xfer is still referenced by ehci.

    The output of ps could help understand what's going wrong in such
    a case.  If you can, please get it next time :)

    If you think you can reproduce this bug too, here's a diff that would
    get us a useful trace:
   
   It seems to be easier to trigger this bug than the first one.
  
  What did you do to trigger it?
 
 While lynx was loading a website (via ipv6) I unplugged the urtwn device.
 The panic doesn't happen every time I try this. Maybe every 10th time.

What I don't understand is that your trace shows that the process
triggering the panic is doing an ioctl(2).  In your ps output you
have ifconfig running, but since you did ps on CPU3 I can only
guess that this is the program that triggered the panic.

So did you unplug the device while doing anything with ifconfig?
Were you able to use your device before the kernel crashed?  Did you
see anything in the dmesg before?

I understand that you unplugged the device, but the important part
is what you did *before* or *during* 8)



Re: crash on urtwn removal

2015-04-28 Thread Remi Locherer
On Tue, Apr 28, 2015 at 01:41:29PM +0200, Martin Pieuchot wrote:
 On 28/04/15(Tue) 13:15, Remi Locherer wrote:
  On Tue, Apr 28, 2015 at 10:17:16AM +0200, Martin Pieuchot wrote:
   On 27/04/15(Mon) 22:45, Remi Locherer wrote:
    On Mon, Apr 27, 2015 at 03:13:06PM +0200, Martin Pieuchot wrote:
     This trace tells us that the pipe is no longer valid, which means that
     the device has been removed but an xfer is still referenced by ehci.

     The output of ps could help understand what's going wrong in such
     a case.  If you can, please get it next time :)

     If you think you can reproduce this bug too, here's a diff that would
     get us a useful trace:

    It seems to be easier to trigger this bug than the first one.
   
   What did you do to trigger it?
  
  While lynx was loading a website (via ipv6) I unplugged the urtwn device.
  The panic doesn't happen every time I try this. Maybe every 10th time.
 
 What I don't understand is that your trace shows that the process
 triggering the panic is doing an ioctl(2).  In your ps output you
 have ifconfig running, but since you did ps on CPU3 I can only
 guess that this is the program that triggered the panic.

Should I run ps on all CPUs next time? Or would it be better to run it on
the CPU that is active at the beginning of the ddb session?
 
 So did you unplug the device while doing anything with ifconfig?

I'm using Bob's wifinwid script which does this:

while :; do
    (ifconfig $1 | grep 'status: active' > /dev/null)
    if [ $? -ne 0 ]; then
        [...]
        sleep 2
    fi
done

Full script: http://foad2.obtuse.com/beck/wifinwid

 Were you able to use your device before the kernel crashed?  Did you
 see anything in the dmesg before?

Before I unplugged the urtwn I could use it normally. I didn't notice
any messages on the console but didn't check dmesg before I unplugged urtwn.

 I understand that you unplugged the device, but the important part
 is what you did *before* or *during* 8)
 
Besides lynx in the foreground on the console, the wifinwid script and
offlineimap (python) were active in the background.



Re: crash on urtwn removal

2015-04-28 Thread Remi Locherer
On Tue, Apr 28, 2015 at 10:17:16AM +0200, Martin Pieuchot wrote:
 On 27/04/15(Mon) 22:45, Remi Locherer wrote:
  On Mon, Apr 27, 2015 at 03:13:06PM +0200, Martin Pieuchot wrote:
   This trace tells us that the pipe is no longer valid, which means that
   the device has been removed but an xfer is still referenced by ehci.

   The output of ps could help understand what's going wrong in such
   a case.  If you can, please get it next time :)
   
   If you think you can reproduce this bug too, here's a diff that would
   get us a useful trace:
  
  It seems to be easier to trigger this bug than the first one.
 
 What did you do to trigger it?

While lynx was loading a website (via ipv6) I unplugged the urtwn device.
The panic doesn't happen every time I try this. Maybe every 10th time.

  ddb trace and ps output:
  
  https://relo.ch/urtwncrash_trace_part1.jpg
  https://relo.ch/urtwncrash_trace_part2.jpg
  https://relo.ch/urtwncrash_ps_part1.jpg
  https://relo.ch/urtwncrash_ps_part2.jpg
  
  Unfortunately boot reboot in ddb did not work so I had to upload 
  the photos. But at least one line number appeared in the output so now I
  know how to build a kernel with debug symbols ;)
  
  
   
   Index: usbdi.c
   ===
   RCS file: /cvs/src/sys/dev/usb/usbdi.c,v
   retrieving revision 1.81
   diff -u -p -r1.81 usbdi.c
   --- usbdi.c   14 Mar 2015 03:38:50 -  1.81
   +++ usbdi.c   27 Apr 2015 13:08:33 -
   @@ -824,10 +824,8 @@ usb_insert_transfer(struct usbd_xfer *xf
       DPRINTFN(5,("usb_insert_transfer: pipe=%p running=%d timeout=%d\n",
           pipe, pipe->running, xfer->timeout));
    #ifdef DIAGNOSTIC
   -   if (xfer->busy_free != XFER_FREE) {
   -       printf("%s: xfer=%p not free\n", __func__, xfer);
   -       return (USBD_INVAL);
   -   }
   +   if (xfer->busy_free != XFER_FREE)
   +       panic("%s: xfer=%p not free\n", __func__, xfer);
       xfer->busy_free = XFER_ONQU;
    #endif
       s = splusb();
   
  
 



Re: crash on urtwn removal

2015-04-28 Thread Martin Pieuchot
On 27/04/15(Mon) 22:45, Remi Locherer wrote:
 On Mon, Apr 27, 2015 at 03:13:06PM +0200, Martin Pieuchot wrote:
  This trace tells us that the pipe is no longer valid, which means that
  the device has been removed but an xfer is still referenced by ehci.

  The output of ps could help understand what's going wrong in such
  a case.  If you can, please get it next time :)
  
  If you think you can reproduce this bug too, here's a diff that would
  get us a useful trace:
 
 It seems to be easier to trigger this bug than the first one.

What did you do to trigger it?

 ddb trace and ps output:
 
 https://relo.ch/urtwncrash_trace_part1.jpg
 https://relo.ch/urtwncrash_trace_part2.jpg
 https://relo.ch/urtwncrash_ps_part1.jpg
 https://relo.ch/urtwncrash_ps_part2.jpg
 
 Unfortunately boot reboot in ddb did not work so I had to upload 
 the photos. But at least one line number appeared in the output so now I
 know how to build a kernel with debug symbols ;)
 
 
  
  Index: usbdi.c
  ===
  RCS file: /cvs/src/sys/dev/usb/usbdi.c,v
  retrieving revision 1.81
  diff -u -p -r1.81 usbdi.c
  --- usbdi.c 14 Mar 2015 03:38:50 -  1.81
  +++ usbdi.c 27 Apr 2015 13:08:33 -
  @@ -824,10 +824,8 @@ usb_insert_transfer(struct usbd_xfer *xf
      DPRINTFN(5,("usb_insert_transfer: pipe=%p running=%d timeout=%d\n",
          pipe, pipe->running, xfer->timeout));
   #ifdef DIAGNOSTIC
  -   if (xfer->busy_free != XFER_FREE) {
  -       printf("%s: xfer=%p not free\n", __func__, xfer);
  -       return (USBD_INVAL);
  -   }
  +   if (xfer->busy_free != XFER_FREE)
  +       panic("%s: xfer=%p not free\n", __func__, xfer);
      xfer->busy_free = XFER_ONQU;
   #endif
      s = splusb();
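
The effect of the diff above can be sketched in userland C. This is a simplified model, not the kernel code: struct fake_xfer, the enum values and both function names are made up for illustration, and abort(3) stands in for panic(9). The old check reports the problem and returns an error, so execution continues and the interesting call chain is lost. The new check stops at the first sign of a transfer that is not free, which is what yields a usable trace.

```c
#include <stdio.h>
#include <stdlib.h>

/* Illustrative state markers; the real values live in the USB stack. */
enum xfer_state { XFER_FREE, XFER_ONQU };

/* Hypothetical stand-in for struct usbd_xfer. */
struct fake_xfer {
	enum xfer_state busy_free;
};

enum insert_status { INSERT_OK, INSERT_INVAL };

/* Old behaviour: complain and return an error; the caller carries on. */
enum insert_status
insert_transfer_old(struct fake_xfer *xfer)
{
	if (xfer->busy_free != XFER_FREE) {
		fprintf(stderr, "%s: xfer=%p not free\n", __func__,
		    (void *)xfer);
		return INSERT_INVAL;
	}
	xfer->busy_free = XFER_ONQU;
	return INSERT_OK;
}

/* New behaviour: stop right here so the offending caller is visible. */
enum insert_status
insert_transfer_new(struct fake_xfer *xfer)
{
	if (xfer->busy_free != XFER_FREE) {
		fprintf(stderr, "%s: xfer=%p not free\n", __func__,
		    (void *)xfer);
		abort();	/* stands in for panic(9) */
	}
	xfer->busy_free = XFER_ONQU;
	return INSERT_OK;
}
```

Queuing the same fake_xfer twice with insert_transfer_old() returns INSERT_INVAL the second time, while insert_transfer_new() would terminate the program at that point.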