Re: crash on urtwn removal
On 28/04/15(Tue) 13:15, Remi Locherer wrote:
> On Tue, Apr 28, 2015 at 10:17:16AM +0200, Martin Pieuchot wrote:
> > On 27/04/15(Mon) 22:45, Remi Locherer wrote:
> > > On Mon, Apr 27, 2015 at 03:13:06PM +0200, Martin Pieuchot wrote:
> > > > This trace tells us that the pipe is no longer valid, which
> > > > means that the device has been removed but a xfer is still
> > > > referenced by ehci.
> > > >
> > > > The output of ps could help understand what's going wrong in
> > > > such case. If you can, please get it next time :)
> > > >
> > > > If you think you can reproduce this bug too, here's a diff that
> > > > would get us a useful trace:
> > >
> > > It seems to be easier to trigger this bug than the first one.
> >
> > What did you do to trigger it?
>
> While lynx was loading a website (via ipv6) I unplugged the urtwn
> device. The panic doesn't happen every time I try this. Maybe every
> 10th time.

What I don't understand is that your trace shows that the process
triggering the panic is doing an ioctl(2). In your ps output you have
ifconfig running, but since you did ps on CPU3 I can only guess that
that's the program that triggered the panic.

So did you unplug the device while doing anything with ifconfig?

Were you able to use your device before the kernel crashed? Did you
see anything in the dmesg before?

I understand that you unplugged the device, but the important part is
what you did *before* or *during* 8)
Re: crash on urtwn removal
On Tue, Apr 28, 2015 at 01:41:29PM +0200, Martin Pieuchot wrote:
> On 28/04/15(Tue) 13:15, Remi Locherer wrote:
> > On Tue, Apr 28, 2015 at 10:17:16AM +0200, Martin Pieuchot wrote:
> > > On 27/04/15(Mon) 22:45, Remi Locherer wrote:
> > > > On Mon, Apr 27, 2015 at 03:13:06PM +0200, Martin Pieuchot wrote:
> > > > > This trace tells us that the pipe is no longer valid, which
> > > > > means that the device has been removed but a xfer is still
> > > > > referenced by ehci.
> > > > >
> > > > > The output of ps could help understand what's going wrong in
> > > > > such case. If you can, please get it next time :)
> > > > >
> > > > > If you think you can reproduce this bug too, here's a diff
> > > > > that would get us a useful trace:
> > > >
> > > > It seems to be easier to trigger this bug than the first one.
> > >
> > > What did you do to trigger it?
> >
> > While lynx was loading a website (via ipv6) I unplugged the urtwn
> > device. The panic doesn't happen every time I try this. Maybe every
> > 10th time.
>
> What I don't understand is that your trace shows that the process
> triggering the panic is doing an ioctl(2). In your ps output you have
> ifconfig running, but since you did ps on CPU3 I can only guess that
> that's the program that triggered the panic.

Should I run ps on all CPUs next time? Or better on the CPU that is
active at the beginning of the ddb session?

> So did you unplug the device while doing anything with ifconfig?

I'm using Bob's wifinwid script which does this:

while :; do
	(ifconfig $1 | grep 'status: active' >/dev/null)
	if [ $? -ne 0 ]; then
		[...]
		sleep 2
	fi
done

Full script: http://foad2.obtuse.com/beck/wifinwid

> Were you able to use your device before the kernel crashed? Did you
> see anything in the dmesg before?

Before I unplugged the urtwn I could use it normally. I didn't notice
any messages on the console, but I didn't check dmesg before I
unplugged the urtwn.

> I understand that you unplugged the device, but the important part is
> what you did *before* or *during* 8)

Besides lynx in the foreground on the console, the wifinwid script and
offlineimap (python) were active in the background.
Re: crash on urtwn removal
On Tue, Apr 28, 2015 at 10:17:16AM +0200, Martin Pieuchot wrote:
> On 27/04/15(Mon) 22:45, Remi Locherer wrote:
> > On Mon, Apr 27, 2015 at 03:13:06PM +0200, Martin Pieuchot wrote:
> > > This trace tells us that the pipe is no longer valid, which means
> > > that the device has been removed but a xfer is still referenced
> > > by ehci.
> > >
> > > The output of ps could help understand what's going wrong in such
> > > case. If you can, please get it next time :)
> > >
> > > If you think you can reproduce this bug too, here's a diff that
> > > would get us a useful trace:
> >
> > It seems to be easier to trigger this bug than the first one.
>
> What did you do to trigger it?

While lynx was loading a website (via ipv6) I unplugged the urtwn
device. The panic doesn't happen every time I try this. Maybe every
10th time.

> > ddb trace and ps output:
> > https://relo.ch/urtwncrash_trace_part1.jpg
> > https://relo.ch/urtwncrash_trace_part2.jpg
> > https://relo.ch/urtwncrash_ps_part1.jpg
> > https://relo.ch/urtwncrash_ps_part2.jpg
> >
> > Unfortunately boot reboot in ddb did not work so I had to upload
> > the photos. But at least one line number appeared in the output so
> > now I know how to build a kernel with debug symbols ;)
> >
> > > Index: usbdi.c
> > > ===================================================================
> > > RCS file: /cvs/src/sys/dev/usb/usbdi.c,v
> > > retrieving revision 1.81
> > > diff -u -p -r1.81 usbdi.c
> > > --- usbdi.c	14 Mar 2015 03:38:50 -0000	1.81
> > > +++ usbdi.c	27 Apr 2015 13:08:33 -0000
> > > @@ -824,10 +824,8 @@ usb_insert_transfer(struct usbd_xfer *xf
> > >  	DPRINTFN(5,("usb_insert_transfer: pipe=%p running=%d timeout=%d\n",
> > >  	    pipe, pipe->running, xfer->timeout));
> > >  #ifdef DIAGNOSTIC
> > > -	if (xfer->busy_free != XFER_FREE) {
> > > -		printf("%s: xfer=%p not free\n", __func__, xfer);
> > > -		return (USBD_INVAL);
> > > -	}
> > > +	if (xfer->busy_free != XFER_FREE)
> > > +		panic("%s: xfer=%p not free\n", __func__, xfer);
> > >  	xfer->busy_free = XFER_ONQU;
> > >  #endif
> > >  	s = splusb();
Re: crash on urtwn removal
On 27/04/15(Mon) 22:45, Remi Locherer wrote:
> On Mon, Apr 27, 2015 at 03:13:06PM +0200, Martin Pieuchot wrote:
> > This trace tells us that the pipe is no longer valid, which means
> > that the device has been removed but a xfer is still referenced by
> > ehci.
> >
> > The output of ps could help understand what's going wrong in such
> > case. If you can, please get it next time :)
> >
> > If you think you can reproduce this bug too, here's a diff that
> > would get us a useful trace:
>
> It seems to be easier to trigger this bug than the first one.

What did you do to trigger it?

> ddb trace and ps output:
> https://relo.ch/urtwncrash_trace_part1.jpg
> https://relo.ch/urtwncrash_trace_part2.jpg
> https://relo.ch/urtwncrash_ps_part1.jpg
> https://relo.ch/urtwncrash_ps_part2.jpg
>
> Unfortunately boot reboot in ddb did not work so I had to upload the
> photos. But at least one line number appeared in the output so now I
> know how to build a kernel with debug symbols ;)
>
> > Index: usbdi.c
> > ===================================================================
> > RCS file: /cvs/src/sys/dev/usb/usbdi.c,v
> > retrieving revision 1.81
> > diff -u -p -r1.81 usbdi.c
> > --- usbdi.c	14 Mar 2015 03:38:50 -0000	1.81
> > +++ usbdi.c	27 Apr 2015 13:08:33 -0000
> > @@ -824,10 +824,8 @@ usb_insert_transfer(struct usbd_xfer *xf
> >  	DPRINTFN(5,("usb_insert_transfer: pipe=%p running=%d timeout=%d\n",
> >  	    pipe, pipe->running, xfer->timeout));
> >  #ifdef DIAGNOSTIC
> > -	if (xfer->busy_free != XFER_FREE) {
> > -		printf("%s: xfer=%p not free\n", __func__, xfer);
> > -		return (USBD_INVAL);
> > -	}
> > +	if (xfer->busy_free != XFER_FREE)
> > +		panic("%s: xfer=%p not free\n", __func__, xfer);
> >  	xfer->busy_free = XFER_ONQU;
> >  #endif
> >  	s = splusb();
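
For readers following the diff: it turns a logged error into a panic, so that the first attempt to queue a transfer that is not free stops the kernel at the offending call and ddb shows who made it. Below is a rough userspace sketch of that idea, not the kernel code. The names XFER_FREE, XFER_ONQU and busy_free are taken from the diff; struct model_xfer, the model_* functions, the MODEL_* return codes and the enum values are made up for illustration, and abort() merely stands in for panic().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* State names follow the diff; the numeric values are placeholders. */
enum xfer_state { XFER_FREE, XFER_ONQU };

/* Hypothetical stand-in for the busy_free field of struct usbd_xfer. */
struct model_xfer {
	enum xfer_state busy_free;
};

enum model_err { MODEL_OK, MODEL_INVAL };

/* Before the diff: report the problem and return an error code. */
enum model_err
model_insert_lenient(struct model_xfer *xfer)
{
	if (xfer->busy_free != XFER_FREE) {
		fprintf(stderr, "%s: xfer=%p not free\n", __func__,
		    (void *)xfer);
		return MODEL_INVAL;
	}
	xfer->busy_free = XFER_ONQU;
	return MODEL_OK;
}

/* After the diff: stop immediately so the trace shows the caller. */
void
model_insert_strict(struct model_xfer *xfer)
{
	if (xfer->busy_free != XFER_FREE) {
		fprintf(stderr, "panic: %s: xfer=%p not free\n", __func__,
		    (void *)xfer);
		abort();	/* stand-in for the kernel's panic() */
	}
	xfer->busy_free = XFER_ONQU;
}

/* Hypothetical completion step: hand the transfer back as free. */
void
model_done(struct model_xfer *xfer)
{
	assert(xfer->busy_free != XFER_FREE);
	xfer->busy_free = XFER_FREE;
}
```

With the lenient variant a second insert of the same transfer only returns MODEL_INVAL, which a caller can ignore; with the strict variant the same mistake aborts on the spot, which is the kind of early, attributable failure the diff is after.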