Re: Isochronous transfer error on USB3
Em Wed, 08 Jan 2014 15:05:12 -0200 Mauro Carvalho Chehab escreveu: > Em Wed, 08 Jan 2014 14:31:28 -0200 > Mauro Carvalho Chehab escreveu: > > > Em Thu, 02 Jan 2014 14:07:22 -0800 > > Sarah Sharp escreveu: > > > > > On Sun, Dec 29, 2013 at 02:54:40AM -0200, Mauro Carvalho Chehab wrote: > > > > It seems that usb_unlink_urb() is causing troubles with xHCI: the > > > > endpoint stops streaming, but, after that, it doesn't start again, > > > > and lots of debug messages are produced. I emailed you the full log > > > > after start streaming in priv (too big for vger), but basically, > > > > it produces: > > > > > > > > [ 1635.754546] xhci_hcd :00:14.0: Endpoint 0x81 not halted, > > > > refusing to reset. > > > > [ 1635.754562] xhci_hcd :00:14.0: Endpoint 0x82 not halted, > > > > refusing to reset. > > > > [ 1635.754577] xhci_hcd :00:14.0: Endpoint 0x83 not halted, > > > > refusing to reset. > > > > [ 1635.754594] xhci_hcd :00:14.0: Endpoint 0x84 not halted, > > > > refusing to reset. > > > > > > I think that's due to the driver (or userspace) attempting to reset the > > > endpoint when it didn't actually receive a stall (-EPIPE) status from an > > > URB. When that happens, the xHCI host controller endpoint "toggle" bits > > > get out of sync with the device toggle bits, and the result is that all > > > transfers will fail to the endpoint from then on until you switch > > > alternate interface settings or unplug/replug the device. > > > > > > Try this patch: > > > > > > http://marc.info/?l=linux-usb&m=138116117104619&w=2 > > > > > > It's still under RFC, and I know it has race conditions, but it will let > > > you quickly test whether this fixes your issue. > > > > Didn't work fine, or at least it didn't solve all the problems. Also, it > > started to cause OOPSes due to the race conditions. > > > > > > > > This has been a long-standing xHCI driver bug. I asked my OPW intern to > > > work on the patch to fix it, but she may be a bit busy with her new job > > > to finish up the RFC. I'll probably have to take over finishing the > > > patch, if this turns out to be your issue. > > > > > > > (Not sure why it is trying to stop all endpoints - as just one endpoint > > > > was > > > > requested to restart). > > > > > > Something is calling into usb_clear_halt() with all the endpoints. > > > Userspace, perhaps? > > > > No, userspace is not doing it. The userspace doesn't even know that this > > device is USB (and were written at the time that all media drivers were > > PCI only - so it doesn't have any USB specific call on it). > > > > > You could add WARN() calls to usb_clear_halt() to > > > see what code is resetting the endpoints. In any case, it's not part of > > > the USB core code to change configuration or alt settings, since I don't > > > see any xHCI driver output from the endpoint bandwidth code in this > > > chunk of the dmesg you sent: > > > > The em28xx-audio.c driver may need to call usb_set_interface() while > > the video is still streaming, in order to unmute the audio. That happens > > when the audio device is opened. > > > > With EHCI, this works properly. > > > > > [ 1649.640783] xhci_hcd :00:14.0: Removing canceled TD starting at > > > 0xb41e8580 (dma). > > > [ 1649.640784] xhci_hcd :00:14.0: TRB to noop at offset 0xb41e8580 > > > [ 1649.643159] xhci_hcd :00:14.0: Endpoint 0x81 not halted, refusing > > > to reset. > > > [ 1649.643188] xhci_hcd :00:14.0: Endpoint 0x82 not halted, refusing > > > to reset. > > > [ 1649.643215] xhci_hcd :00:14.0: Endpoint 0x83 not halted, refusing > > > to reset. > > > [ 1649.643239] xhci_hcd :00:14.0: Endpoint 0x84 not halted, refusing > > > to reset. > > > [ 1649.735539] xhci_hcd :00:14.0: ERROR no room on ep ring, try ring > > > expansion > > > > > > Sarah Sharp > > > > Btw, sometimes, I get such logs: > > > > [ 646.192273] xhci_hcd :00:14.0: Miss service interval error, set skip > > flag > > [ 646.192292] xhci_hcd :00:14.0: Miss service interval error, set skip > > flag > > [ 646.192311] xhci_hcd :00:14.0: Miss service interval error, set skip > > flag > > [ 646.192329] xhci_hcd :00:14.0: Miss service interval error, set skip > > flag > > [ 646.192351] xhci_hcd :00:14.0: Miss service interval error, set skip > > flag > > [ 646.192376] xhci_hcd :00:14.0: Miss service interval error, set skip > > flag > > > > After adding some debug at em28xx-audio, triggering alsa trigger start > > events, I'm getting those: > > > > [ 3078.971224] snd_em28xx_capture_trigger: start capture > > [ 3078.971284] xhci_hcd :00:14.0: ERROR no room on ep ring, try ring > > expansion > > [ 3078.971311] xhci_hcd :00:14.0: ring expansion succeed, now has 4 > > segments > > [ 3078.971350] xhci_hcd :00:14.0: ERROR no room on ep ring, try ring > > expansion > > [ 3078.971387] xhci_hcd :00:14.0: ring expansion succeed, now has 8 > > segments > > [ 3079.034626]
Re: Isochronous transfer error on USB3
Em Wed, 08 Jan 2014 14:31:28 -0200 Mauro Carvalho Chehab escreveu: > Em Thu, 02 Jan 2014 14:07:22 -0800 > Sarah Sharp escreveu: > > > On Sun, Dec 29, 2013 at 02:54:40AM -0200, Mauro Carvalho Chehab wrote: > > > It seems that usb_unlink_urb() is causing troubles with xHCI: the > > > endpoint stops streaming, but, after that, it doesn't start again, > > > and lots of debug messages are produced. I emailed you the full log > > > after start streaming in priv (too big for vger), but basically, > > > it produces: > > > > > > [ 1635.754546] xhci_hcd :00:14.0: Endpoint 0x81 not halted, refusing > > > to reset. > > > [ 1635.754562] xhci_hcd :00:14.0: Endpoint 0x82 not halted, refusing > > > to reset. > > > [ 1635.754577] xhci_hcd :00:14.0: Endpoint 0x83 not halted, refusing > > > to reset. > > > [ 1635.754594] xhci_hcd :00:14.0: Endpoint 0x84 not halted, refusing > > > to reset. > > > > I think that's due to the driver (or userspace) attempting to reset the > > endpoint when it didn't actually receive a stall (-EPIPE) status from an > > URB. When that happens, the xHCI host controller endpoint "toggle" bits > > get out of sync with the device toggle bits, and the result is that all > > transfers will fail to the endpoint from then on until you switch > > alternate interface settings or unplug/replug the device. > > > > Try this patch: > > > > http://marc.info/?l=linux-usb&m=138116117104619&w=2 > > > > It's still under RFC, and I know it has race conditions, but it will let > > you quickly test whether this fixes your issue. > > Didn't work fine, or at least it didn't solve all the problems. Also, it > started to cause OOPSes due to the race conditions. > > > > > This has been a long-standing xHCI driver bug. I asked my OPW intern to > > work on the patch to fix it, but she may be a bit busy with her new job > > to finish up the RFC. I'll probably have to take over finishing the > > patch, if this turns out to be your issue. > > > > > (Not sure why it is trying to stop all endpoints - as just one endpoint > > > was > > > requested to restart). > > > > Something is calling into usb_clear_halt() with all the endpoints. > > Userspace, perhaps? > > No, userspace is not doing it. The userspace doesn't even know that this > device is USB (and were written at the time that all media drivers were > PCI only - so it doesn't have any USB specific call on it). > > > You could add WARN() calls to usb_clear_halt() to > > see what code is resetting the endpoints. In any case, it's not part of > > the USB core code to change configuration or alt settings, since I don't > > see any xHCI driver output from the endpoint bandwidth code in this > > chunk of the dmesg you sent: > > The em28xx-audio.c driver may need to call usb_set_interface() while > the video is still streaming, in order to unmute the audio. That happens > when the audio device is opened. > > With EHCI, this works properly. > > > [ 1649.640783] xhci_hcd :00:14.0: Removing canceled TD starting at > > 0xb41e8580 (dma). > > [ 1649.640784] xhci_hcd :00:14.0: TRB to noop at offset 0xb41e8580 > > [ 1649.643159] xhci_hcd :00:14.0: Endpoint 0x81 not halted, refusing to > > reset. > > [ 1649.643188] xhci_hcd :00:14.0: Endpoint 0x82 not halted, refusing to > > reset. > > [ 1649.643215] xhci_hcd :00:14.0: Endpoint 0x83 not halted, refusing to > > reset. > > [ 1649.643239] xhci_hcd :00:14.0: Endpoint 0x84 not halted, refusing to > > reset. > > [ 1649.735539] xhci_hcd :00:14.0: ERROR no room on ep ring, try ring > > expansion > > > > Sarah Sharp > > Btw, sometimes, I get such logs: > > [ 646.192273] xhci_hcd :00:14.0: Miss service interval error, set skip > flag > [ 646.192292] xhci_hcd :00:14.0: Miss service interval error, set skip > flag > [ 646.192311] xhci_hcd :00:14.0: Miss service interval error, set skip > flag > [ 646.192329] xhci_hcd :00:14.0: Miss service interval error, set skip > flag > [ 646.192351] xhci_hcd :00:14.0: Miss service interval error, set skip > flag > [ 646.192376] xhci_hcd :00:14.0: Miss service interval error, set skip > flag > > After adding some debug at em28xx-audio, triggering alsa trigger start > events, I'm getting those: > > [ 3078.971224] snd_em28xx_capture_trigger: start capture > [ 3078.971284] xhci_hcd :00:14.0: ERROR no room on ep ring, try ring > expansion > [ 3078.971311] xhci_hcd :00:14.0: ring expansion succeed, now has 4 > segments > [ 3078.971350] xhci_hcd :00:14.0: ERROR no room on ep ring, try ring > expansion > [ 3078.971387] xhci_hcd :00:14.0: ring expansion succeed, now has 8 > segments > [ 3079.034626] em28xx_audio_isocirq, 64 packets (first one with size 12) > > Here, some audio data arrives. > > [ 3079.034665] snd_em28xx_capture_trigger: stop capture > > It seems, however, that this didn't arrive in time, and causes an alsa > buffer underrun. So, it cancels the existing URBs. > > PS.
Re: Isochronous transfer error on USB3
Em Thu, 02 Jan 2014 14:07:22 -0800 Sarah Sharp escreveu: > On Sun, Dec 29, 2013 at 02:54:40AM -0200, Mauro Carvalho Chehab wrote: > > It seems that usb_unlink_urb() is causing troubles with xHCI: the > > endpoint stops streaming, but, after that, it doesn't start again, > > and lots of debug messages are produced. I emailed you the full log > > after start streaming in priv (too big for vger), but basically, > > it produces: > > > > [ 1635.754546] xhci_hcd :00:14.0: Endpoint 0x81 not halted, refusing to > > reset. > > [ 1635.754562] xhci_hcd :00:14.0: Endpoint 0x82 not halted, refusing to > > reset. > > [ 1635.754577] xhci_hcd :00:14.0: Endpoint 0x83 not halted, refusing to > > reset. > > [ 1635.754594] xhci_hcd :00:14.0: Endpoint 0x84 not halted, refusing to > > reset. > > I think that's due to the driver (or userspace) attempting to reset the > endpoint when it didn't actually receive a stall (-EPIPE) status from an > URB. When that happens, the xHCI host controller endpoint "toggle" bits > get out of sync with the device toggle bits, and the result is that all > transfers will fail to the endpoint from then on until you switch > alternate interface settings or unplug/replug the device. > > Try this patch: > > http://marc.info/?l=linux-usb&m=138116117104619&w=2 > > It's still under RFC, and I know it has race conditions, but it will let > you quickly test whether this fixes your issue. Didn't work fine, or at least it didn't solve all the problems. Also, it started to cause OOPSes due to the race conditions. > > This has been a long-standing xHCI driver bug. I asked my OPW intern to > work on the patch to fix it, but she may be a bit busy with her new job > to finish up the RFC. I'll probably have to take over finishing the > patch, if this turns out to be your issue. > > > (Not sure why it is trying to stop all endpoints - as just one endpoint was > > requested to restart). > > Something is calling into usb_clear_halt() with all the endpoints. > Userspace, perhaps? No, userspace is not doing it. The userspace doesn't even know that this device is USB (and were written at the time that all media drivers were PCI only - so it doesn't have any USB specific call on it). > You could add WARN() calls to usb_clear_halt() to > see what code is resetting the endpoints. In any case, it's not part of > the USB core code to change configuration or alt settings, since I don't > see any xHCI driver output from the endpoint bandwidth code in this > chunk of the dmesg you sent: The em28xx-audio.c driver may need to call usb_set_interface() while the video is still streaming, in order to unmute the audio. That happens when the audio device is opened. With EHCI, this works properly. > [ 1649.640783] xhci_hcd :00:14.0: Removing canceled TD starting at > 0xb41e8580 (dma). > [ 1649.640784] xhci_hcd :00:14.0: TRB to noop at offset 0xb41e8580 > [ 1649.643159] xhci_hcd :00:14.0: Endpoint 0x81 not halted, refusing to > reset. > [ 1649.643188] xhci_hcd :00:14.0: Endpoint 0x82 not halted, refusing to > reset. > [ 1649.643215] xhci_hcd :00:14.0: Endpoint 0x83 not halted, refusing to > reset. > [ 1649.643239] xhci_hcd :00:14.0: Endpoint 0x84 not halted, refusing to > reset. > [ 1649.735539] xhci_hcd :00:14.0: ERROR no room on ep ring, try ring > expansion > > Sarah Sharp Btw, sometimes, I get such logs: [ 646.192273] xhci_hcd :00:14.0: Miss service interval error, set skip flag [ 646.192292] xhci_hcd :00:14.0: Miss service interval error, set skip flag [ 646.192311] xhci_hcd :00:14.0: Miss service interval error, set skip flag [ 646.192329] xhci_hcd :00:14.0: Miss service interval error, set skip flag [ 646.192351] xhci_hcd :00:14.0: Miss service interval error, set skip flag [ 646.192376] xhci_hcd :00:14.0: Miss service interval error, set skip flag After adding some debug at em28xx-audio, triggering alsa trigger start events, I'm getting those: [ 3078.971224] snd_em28xx_capture_trigger: start capture [ 3078.971284] xhci_hcd :00:14.0: ERROR no room on ep ring, try ring expansion [ 3078.971311] xhci_hcd :00:14.0: ring expansion succeed, now has 4 segments [ 3078.971350] xhci_hcd :00:14.0: ERROR no room on ep ring, try ring expansion [ 3078.971387] xhci_hcd :00:14.0: ring expansion succeed, now has 8 segments [ 3079.034626] em28xx_audio_isocirq, 64 packets (first one with size 12) Here, some audio data arrives. [ 3079.034665] snd_em28xx_capture_trigger: stop capture It seems, however, that this didn't arrive in time, and causes an alsa buffer underrun. So, it cancels the existing URBs. PS.: Even with EHCI, it causes a few ALSA underruns before it gets steady. I suspect that this is due to em28xx time to synchronize audio and video streams. [ 3079.034736] xhci_hcd :00:14.0: Cancel URB 88020790, dev 4, ep 0x83, starting at offset 0x1ffb13850 [ 3079.034755] xhci_hcd :00:14.0: // Ding do
Re: Isochronous transfer error on USB3
On Sun, Dec 29, 2013 at 02:54:40AM -0200, Mauro Carvalho Chehab wrote: > It seems that usb_unlink_urb() is causing troubles with xHCI: the > endpoint stops streaming, but, after that, it doesn't start again, > and lots of debug messages are produced. I emailed you the full log > after start streaming in priv (too big for vger), but basically, > it produces: > > [ 1635.754546] xhci_hcd :00:14.0: Endpoint 0x81 not halted, refusing to > reset. > [ 1635.754562] xhci_hcd :00:14.0: Endpoint 0x82 not halted, refusing to > reset. > [ 1635.754577] xhci_hcd :00:14.0: Endpoint 0x83 not halted, refusing to > reset. > [ 1635.754594] xhci_hcd :00:14.0: Endpoint 0x84 not halted, refusing to > reset. I think that's due to the driver (or userspace) attempting to reset the endpoint when it didn't actually receive a stall (-EPIPE) status from an URB. When that happens, the xHCI host controller endpoint "toggle" bits get out of sync with the device toggle bits, and the result is that all transfers will fail to the endpoint from then on until you switch alternate interface settings or unplug/replug the device. Try this patch: http://marc.info/?l=linux-usb&m=138116117104619&w=2 It's still under RFC, and I know it has race conditions, but it will let you quickly test whether this fixes your issue. This has been a long-standing xHCI driver bug. I asked my OPW intern to work on the patch to fix it, but she may be a bit busy with her new job to finish up the RFC. I'll probably have to take over finishing the patch, if this turns out to be your issue. > (Not sure why it is trying to stop all endpoints - as just one endpoint was > requested to restart). Something is calling into usb_clear_halt() with all the endpoints. Userspace, perhaps? You could add WARN() calls to usb_clear_halt() to see what code is resetting the endpoints. In any case, it's not part of the USB core code to change configuration or alt settings, since I don't see any xHCI driver output from the endpoint bandwidth code in this chunk of the dmesg you sent: [ 1649.640783] xhci_hcd :00:14.0: Removing canceled TD starting at 0xb41e8580 (dma). [ 1649.640784] xhci_hcd :00:14.0: TRB to noop at offset 0xb41e8580 [ 1649.643159] xhci_hcd :00:14.0: Endpoint 0x81 not halted, refusing to reset. [ 1649.643188] xhci_hcd :00:14.0: Endpoint 0x82 not halted, refusing to reset. [ 1649.643215] xhci_hcd :00:14.0: Endpoint 0x83 not halted, refusing to reset. [ 1649.643239] xhci_hcd :00:14.0: Endpoint 0x84 not halted, refusing to reset. [ 1649.735539] xhci_hcd :00:14.0: ERROR no room on ep ring, try ring expansion Sarah Sharp -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Isochronous transfer error on USB3
Em Mon, 18 Mar 2013 09:51:24 -0700 Sarah Sharp escreveu: > On Tue, Mar 12, 2013 at 10:57:22AM +0100, jean-philippe francois wrote: > > 2013/3/12 jean-philippe francois : > > > 2013/3/12 Sarah Sharp : > > >> On Mon, Mar 11, 2013 at 01:21:48PM +0100, jean-philippe francois wrote: > > >>> Hi, > > >>> > > >>> The company I work for is doing USB cameras, for which I wrote the > > >>> drivers (out of tree). > > >> > > >> Kernel driver or userspace driver? > > >> > > > Kernel driver. > > Ok, I don't know what license your driver is under, so I can't legally > look at the C snippet you sent me. Sarah, I'm experiencing a similar error with the in-kernel em28xx driver, plus some fixes that I'm writing for Kernel 3.14. The tree with the driver is at: http://git.linuxtv.org/mchehab/experimental.git/shortlog/refs/heads/em28xx-v4l2 The code with troubles is at: http://git.linuxtv.org/mchehab/experimental.git/blob/refs/heads/em28xx-v4l2:/drivers/media/usb/em28xx/em28xx-audio.c Basically, the device works fine on an USB 2.0 port, but if I connect it on any of the 3 existing USB 3.0 ports on my notebook, the audio endpoint doesn't work. On this device, there are two isoc transfers: one for video (on frame on every 1/60 seconds) and another one for audio. Each on a different endpoint. The video transfers happen fine. The audio transfer doesn't. Due to the way the userspace works, every time the channel changes, the audio stream is stopped and restarted. The code that starts streaming is at line 164: 164 static int em28xx_init_audio_isoc(struct em28xx *dev) 165 { 166 int i, errCode; 167 168 dprintk("Starting isoc transfers\n"); 169 170 /* Start streaming */ 171 for (i = 0; i < EM28XX_AUDIO_BUFS; i++) { 172 memset(dev->adev.transfer_buffer[i], 0x80, 173dev->adev.urb[i]->transfer_buffer_length); 174 175 errCode = usb_submit_urb(dev->adev.urb[i], GFP_ATOMIC); 176 if (errCode) { 177 em28xx_errdev("submit of audio urb failed\n"); 178 em28xx_deinit_isoc_audio(dev); 179 atomic_set(&dev->stream_started, 0); 180 return errCode; 181 } 182 183 } 184 185 return 0; 186 } And the code that stops it is: 61 static int em28xx_deinit_isoc_audio(struct em28xx *dev) 62 { 63 int i; 64 65 dprintk("Stopping isoc\n"); 66 for (i = 0; i < EM28XX_AUDIO_BUFS; i++) { 67 struct urb *urb = dev->adev.urb[i]; 68 69 if (!irqs_disabled()) 70 usb_kill_urb(urb); 71 else 72 usb_unlink_urb(urb); 73 } 74 75 return 0; 76 } It seems that usb_unlink_urb() is causing troubles with xHCI: the endpoint stops streaming, but, after that, it doesn't start again, and lots of debug messages are produced. I emailed you the full log after start streaming in priv (too big for vger), but basically, it produces: [ 1635.754546] xhci_hcd :00:14.0: Endpoint 0x81 not halted, refusing to reset. [ 1635.754562] xhci_hcd :00:14.0: Endpoint 0x82 not halted, refusing to reset. [ 1635.754577] xhci_hcd :00:14.0: Endpoint 0x83 not halted, refusing to reset. [ 1635.754594] xhci_hcd :00:14.0: Endpoint 0x84 not halted, refusing to reset. (Not sure why it is trying to stop all endpoints - as just one endpoint was requested to restart). FYI, my tests are with xHCI 1.0: [3.640583] xhci_hcd :00:14.0: // @c900040c = 0x180 (CAPLENGTH AND HCIVERSION) Any ideas? -- Cheers, Mauro -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Isochronous transfer error on USB3
On Wed, Mar 13, 2013 at 03:34:15PM +0100, jean-philippe francois wrote: > 2013/3/12 jean-philippe francois : > > 2013/3/12 jean-philippe francois : > >> 2013/3/12 Sarah Sharp : > >>> On Mon, Mar 11, 2013 at 01:21:48PM +0100, jean-philippe francois wrote: > USB3 host is an asmedia ASM1042. > > Update : > I have been able to test this on an intel XHCI, and it works fine, > except for some > difference in the treatment of short iso packet in the EHCI and XHCI path. > > On the ASM1042, it also work when the endpoint descriptor is set to > 3x1000, bInterval = 2. > > But it does not with a iso endpoint descriptor of 3x1000, bInterval = 1. That's odd. Can you recompile the xHCI driver with CONFIG_USB_XHCI_HCD_DEBUGGING and CONFIG_USB_DEBUG turned on, and send me dmesg, starting from just before you plug in your device and start a video stream. Please capture trying to use both the non-working endpoint setting, and the working setting. Sarah Sharp -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Isochronous transfer error on USB3
On Tue, Mar 12, 2013 at 10:57:22AM +0100, jean-philippe francois wrote: > 2013/3/12 jean-philippe francois : > > 2013/3/12 Sarah Sharp : > >> On Mon, Mar 11, 2013 at 01:21:48PM +0100, jean-philippe francois wrote: > >>> Hi, > >>> > >>> The company I work for is doing USB cameras, for which I wrote the > >>> drivers (out of tree). > >> > >> Kernel driver or userspace driver? > >> > > Kernel driver. Ok, I don't know what license your driver is under, so I can't legally look at the C snippet you sent me. > >>> Raw camera data are transferred using isochronous transfer, which work > >>> fine when used on USB2 EHCI. > >>> > >>> However when plugging the camera on an USB3 port, xhci spits the > >>> following messages on URB submission : > >>> > >>> [ 1704.989785] xhci_hcd :01:00.0: ERROR Transfer event TRB DMA ptr > >>> not part of current TD > >> > >> Does the device work properly, despite the error messages? I see that > >> there are transfer errors in the log file, along with those messages. > >> Are those expected? > >> > > > > The completion handler for the submitteds urb is never called, so I would > > say > > no transfer is hapenning. After these error messages, nothing happens. > > The urb are not completed, so they are not restarted. > > When stopping the transfer, the HC dies. > > > > Sometimes the completion handler is called, but transfer are never > restarted anyway. What do you mean by the "transfers are never restarted anyway"? > Attached is the interrupt routine code, and the relevant part of the log. > > Individual iso frame transfer are completed, actual_length field is > Ok, but status is -EPROTO. So that's a transfer error. You can't trust the actual_length field if you get a transfer error. > >>> USB3 host is an asmedia ASM1042. > >> > >> Do you know if this is a 1.0 xHCI host? E.g. when you enable > >> CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING, do you see lines > >> like this during driver initialization: > >> > >> CAPLENGTH AND HCIVERSION 0x180: > >> > >> If you see that line start with 0x100, it's a 1.0 xHCI host. If starts > >> with 0x096, it's a 0.96 host; 0x095 would be a 0.95 host. > >> > > > > It is a 0.96 host. : > > 0x960020 (CAPLENGTH AND HCIVERSION) All right, then my hypothesis wasn't correct. > > I don't think it is the same issue here, because no transfer happens or is > > considered complete, and we are at the beginning of the frame, so there > > should > > be no short transfer. That's not a short transfer you're seeing. You're seeing a protocol error, which usually means either: a bad cable, a bad connection between the device and host, or something is misconfigured in the xHCI driver. > > What is the meaning of "Endpoint 0x82 not halted, refusing to reset" > > The failed isochronous urb should happen on this endpoint. If your driver is calling into usb_reset_endpoint when the host endpoint hasn't stalled, you will get a mismatch between the toggle bit state in the xHCI host driver and the device. This is because currently, the xHCI driver refuses to send a Reset Endpoint command if the endpoint isn't actually halted. This is because the hardware will reject the command. The driver should issue a Configure Endpoint command instead. Are you calling into usb_reset_endpoint when the endpoint isn't halted? Sarah Sharp -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Isochronous transfer error on USB3
2013/3/12 jean-philippe francois : > 2013/3/12 jean-philippe francois : >> 2013/3/12 Sarah Sharp : >>> On Mon, Mar 11, 2013 at 01:21:48PM +0100, jean-philippe francois wrote: Hi, The company I work for is doing USB cameras, for which I wrote the drivers (out of tree). >>> >>> Kernel driver or userspace driver? >>> >> Kernel driver. >> Raw camera data are transferred using isochronous transfer, which work fine when used on USB2 EHCI. However when plugging the camera on an USB3 port, xhci spits the following messages on URB submission : [ 1704.989785] xhci_hcd :01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD >>> >>> Does the device work properly, despite the error messages? I see that >>> there are transfer errors in the log file, along with those messages. >>> Are those expected? >>> >> >> The completion handler for the submitteds urb is never called, so I would say >> no transfer is hapenning. After these error messages, nothing happens. >> The urb are not completed, so they are not restarted. >> When stopping the transfer, the HC dies. >> > > Sometimes the completion handler is called, but transfer are never > restarted anyway. > Attached is the interrupt routine code, and the relevant part of the log. > > Individual iso frame transfer are completed, actual_length field is > Ok, but status is -EPROTO. > > USB3 host is an asmedia ASM1042. Update : I have been able to test this on an intel XHCI, and it works fine, except for some difference in the treatment of short iso packet in the EHCI and XHCI path. On the ASM1042, it also work when the endpoint descriptor is set to 3x1000, bInterval = 2. But it does not with a iso endpoint descriptor of 3x1000, bInterval = 1. >>> >>> Do you know if this is a 1.0 xHCI host? E.g. when you enable >>> CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING, do you see lines >>> like this during driver initialization: >>> >>> CAPLENGTH AND HCIVERSION 0x180: >>> >>> If you see that line start with 0x100, it's a 1.0 xHCI host. If starts >>> with 0x096, it's a 0.96 host; 0x095 would be a 0.95 host. >>> >> >> It is a 0.96 host. : >> 0x960020 (CAPLENGTH AND HCIVERSION) >> >>> I ask because I have a todo list entry to fix short packet handling on >>> xHCI 1.0 hosts: >>> >>> "On Intel Lynx Point xHCI, when you plug in a HS USB webcam, the log >>> will fill with the error message: >>> >>> ERROR Transfer event TRB DMA ptr not part of current TD >>> >>> However, the video looks fine, and there is no impact on the driver >>> behavior. >>> >>> This is caused by a change between the xHCI 0.96 and the xHCI 1.0 spec. >>> The change describes in section 4.10.1.1 how the xHCI host controller >>> should handle short transfers that happen on a TD comprised of more than >>> one TRB chained together. The 0.96 spec says that the host will only >>> send one event with a short completion code for the TD. The 1.0 spec >>> says if the short packet happened on a TRB that wasn't the last TRB in >>> the TD, then we will get two events with short completions (one for the >>> short TRB and one for the last TRB with the IOC flag set). The 1.0 spec >>> says that the xHCI driver shall not consider the TD to be done until the >>> second event is received. >>> >> >> I don't think it is the same issue here, because no transfer happens or is >> considered complete, and we are at the beginning of the frame, so there >> should >> be no short transfer. >> >> What is the meaning of "Endpoint 0x82 not halted, refusing to reset" >> The failed isochronous urb should happen on this endpoint. >> >> Thank you for your interest. >> >> Jean-Philippe François >>> The current xHCI driver behavior finishes the TD when it receives the >>> first short event. Then when it receives the second event, it prints the >>> warning message. >>> >>> This also appeared under the xHCI 1.0 host in Intel Panther Point xHCI, >>> but it was mistaken for a known hardware bug, the spurious successful >>> event bug. The work-around that was put into the driver masks this xHCI >>> driver bug. >>> >>> Impact on the driver is minimal for this bug. >>> >>> The easy fix is just to change the XHCI_SPURIOUS_SUCCESS quirk to be >>> applied for all 1.0 xHCI hosts. However, that doesn't solve the race >>> condition that exists when the endpoint is stopped before the second >>> event TRB is received. >>> >>> The harder fix is to either add an event data TRB after the chained TD >>> (and stop using ISP and IOC flags), or make the xHCI ring handling code >>> wait for the second event completion for xHCI 1.0 hosts." >>> >>> Sarah Sharp -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Isochronous transfer error on USB3
2013/3/12 jean-philippe francois : > 2013/3/12 Sarah Sharp : >> On Mon, Mar 11, 2013 at 01:21:48PM +0100, jean-philippe francois wrote: >>> Hi, >>> >>> The company I work for is doing USB cameras, for which I wrote the >>> drivers (out of tree). >> >> Kernel driver or userspace driver? >> > Kernel driver. > >>> Raw camera data are transferred using isochronous transfer, which work >>> fine when used on USB2 EHCI. >>> >>> However when plugging the camera on an USB3 port, xhci spits the >>> following messages on URB submission : >>> >>> [ 1704.989785] xhci_hcd :01:00.0: ERROR Transfer event TRB DMA ptr >>> not part of current TD >> >> Does the device work properly, despite the error messages? I see that >> there are transfer errors in the log file, along with those messages. >> Are those expected? >> > > The completion handler for the submitteds urb is never called, so I would say > no transfer is hapenning. After these error messages, nothing happens. > The urb are not completed, so they are not restarted. > When stopping the transfer, the HC dies. > Sometimes the completion handler is called, but transfer are never restarted anyway. Attached is the interrupt routine code, and the relevant part of the log. Individual iso frame transfer are completed, actual_length field is Ok, but status is -EPROTO. >>> USB3 host is an asmedia ASM1042. >> >> Do you know if this is a 1.0 xHCI host? E.g. when you enable >> CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING, do you see lines >> like this during driver initialization: >> >> CAPLENGTH AND HCIVERSION 0x180: >> >> If you see that line start with 0x100, it's a 1.0 xHCI host. If starts >> with 0x096, it's a 0.96 host; 0x095 would be a 0.95 host. >> > > It is a 0.96 host. : > 0x960020 (CAPLENGTH AND HCIVERSION) > >> I ask because I have a todo list entry to fix short packet handling on >> xHCI 1.0 hosts: >> >> "On Intel Lynx Point xHCI, when you plug in a HS USB webcam, the log >> will fill with the error message: >> >> ERROR Transfer event TRB DMA ptr not part of current TD >> >> However, the video looks fine, and there is no impact on the driver >> behavior. >> >> This is caused by a change between the xHCI 0.96 and the xHCI 1.0 spec. >> The change describes in section 4.10.1.1 how the xHCI host controller >> should handle short transfers that happen on a TD comprised of more than >> one TRB chained together. The 0.96 spec says that the host will only >> send one event with a short completion code for the TD. The 1.0 spec >> says if the short packet happened on a TRB that wasn't the last TRB in >> the TD, then we will get two events with short completions (one for the >> short TRB and one for the last TRB with the IOC flag set). The 1.0 spec >> says that the xHCI driver shall not consider the TD to be done until the >> second event is received. >> > > I don't think it is the same issue here, because no transfer happens or is > considered complete, and we are at the beginning of the frame, so there should > be no short transfer. > > What is the meaning of "Endpoint 0x82 not halted, refusing to reset" > The failed isochronous urb should happen on this endpoint. > > Thank you for your interest. > > Jean-Philippe François >> The current xHCI driver behavior finishes the TD when it receives the >> first short event. Then when it receives the second event, it prints the >> warning message. >> >> This also appeared under the xHCI 1.0 host in Intel Panther Point xHCI, >> but it was mistaken for a known hardware bug, the spurious successful >> event bug. The work-around that was put into the driver masks this xHCI >> driver bug. >> >> Impact on the driver is minimal for this bug. >> >> The easy fix is just to change the XHCI_SPURIOUS_SUCCESS quirk to be >> applied for all 1.0 xHCI hosts. However, that doesn't solve the race >> condition that exists when the endpoint is stopped before the second >> event TRB is received. >> >> The harder fix is to either add an event data TRB after the chained TD >> (and stop using ISP and IOC flags), or make the xHCI ring handling code >> wait for the second event completion for xHCI 1.0 hosts." >> >> Sarah Sharp /* * isochronous IRQ */ static void ovfx2_isoc_irq(struct urb *urb) { int i, global_len; struct usb_ovfx2 *ov; struct ovfx2_sbuf *sbuf; struct list_head * list_in; unsigned long flags = 0; if (!urb->context) { printk(KERN_ERR "%s :no context\n", __func__); return; } sbuf = urb->context; ov = sbuf->ov; if(!ov) { printk(KERN_ERR "%s :ov NULL\n", __func__); } if (!ov->dev || !ov->user) { PDEBUG(4, ov,"no device, or not open"); return; } if (!ov->streaming) { PDEBUG(4, ov,"hmmm... not streaming, but got interrupt"); return; } if (urb->status == -ENOENT || urb->status == -ECONNRESET) { PDEBUG(4, ov, "URB unlinked"); return; } if (urb->status != -EINPROGRESS && urb->status != 0) { dev_err(DEV(ov), "ERROR: urb->status=%d: %s", urb->status, symbolic(urb_errlist, urb->sta
Re: Isochronous transfer error on USB3
2013/3/12 Sarah Sharp : > On Mon, Mar 11, 2013 at 01:21:48PM +0100, jean-philippe francois wrote: >> Hi, >> >> The company I work for is doing USB cameras, for which I wrote the >> drivers (out of tree). > > Kernel driver or userspace driver? > Kernel driver. >> Raw camera data are transferred using isochronous transfer, which work >> fine when used on USB2 EHCI. >> >> However when plugging the camera on an USB3 port, xhci spits the >> following messages on URB submission : >> >> [ 1704.989785] xhci_hcd :01:00.0: ERROR Transfer event TRB DMA ptr >> not part of current TD > > Does the device work properly, despite the error messages? I see that > there are transfer errors in the log file, along with those messages. > Are those expected? > The completion handler for the submitteds urb is never called, so I would say no transfer is hapenning. After these error messages, nothing happens. The urb are not completed, so they are not restarted. When stopping the transfer, the HC dies. >> USB3 host is an asmedia ASM1042. > > Do you know if this is a 1.0 xHCI host? E.g. when you enable > CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING, do you see lines > like this during driver initialization: > > CAPLENGTH AND HCIVERSION 0x180: > > If you see that line start with 0x100, it's a 1.0 xHCI host. If starts > with 0x096, it's a 0.96 host; 0x095 would be a 0.95 host. > It is a 0.96 host. : 0x960020 (CAPLENGTH AND HCIVERSION) > I ask because I have a todo list entry to fix short packet handling on > xHCI 1.0 hosts: > > "On Intel Lynx Point xHCI, when you plug in a HS USB webcam, the log > will fill with the error message: > > ERROR Transfer event TRB DMA ptr not part of current TD > > However, the video looks fine, and there is no impact on the driver > behavior. > > This is caused by a change between the xHCI 0.96 and the xHCI 1.0 spec. > The change describes in section 4.10.1.1 how the xHCI host controller > should handle short transfers that happen on a TD comprised of more than > one TRB chained together. The 0.96 spec says that the host will only > send one event with a short completion code for the TD. The 1.0 spec > says if the short packet happened on a TRB that wasn't the last TRB in > the TD, then we will get two events with short completions (one for the > short TRB and one for the last TRB with the IOC flag set). The 1.0 spec > says that the xHCI driver shall not consider the TD to be done until the > second event is received. > I don't think it is the same issue here, because no transfer happens or is considered complete, and we are at the beginning of the frame, so there should be no short transfer. What is the meaning of "Endpoint 0x82 not halted, refusing to reset" The failed isochronous urb should happen on this endpoint. Thank you for your interest. Jean-Philippe François > The current xHCI driver behavior finishes the TD when it receives the > first short event. Then when it receives the second event, it prints the > warning message. > > This also appeared under the xHCI 1.0 host in Intel Panther Point xHCI, > but it was mistaken for a known hardware bug, the spurious successful > event bug. The work-around that was put into the driver masks this xHCI > driver bug. > > Impact on the driver is minimal for this bug. > > The easy fix is just to change the XHCI_SPURIOUS_SUCCESS quirk to be > applied for all 1.0 xHCI hosts. However, that doesn't solve the race > condition that exists when the endpoint is stopped before the second > event TRB is received. > > The harder fix is to either add an event data TRB after the chained TD > (and stop using ISP and IOC flags), or make the xHCI ring handling code > wait for the second event completion for xHCI 1.0 hosts." > > Sarah Sharp -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Isochronous transfer error on USB3
On Mon, Mar 11, 2013 at 01:21:48PM +0100, jean-philippe francois wrote: > Hi, > > The company I work for is doing USB cameras, for which I wrote the > drivers (out of tree). Kernel driver or userspace driver? > Raw camera data are transferred using isochronous transfer, which work > fine when used on USB2 EHCI. > > However when plugging the camera on an USB3 port, xhci spits the > following messages on URB submission : > > [ 1704.989785] xhci_hcd :01:00.0: ERROR Transfer event TRB DMA ptr > not part of current TD Does the device work properly, despite the error messages? I see that there are transfer errors in the log file, along with those messages. Are those expected? > USB3 host is an asmedia ASM1042. Do you know if this is a 1.0 xHCI host? E.g. when you enable CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING, do you see lines like this during driver initialization: CAPLENGTH AND HCIVERSION 0x180: If you see that line start with 0x100, it's a 1.0 xHCI host. If starts with 0x096, it's a 0.96 host; 0x095 would be a 0.95 host. I ask because I have a todo list entry to fix short packet handling on xHCI 1.0 hosts: "On Intel Lynx Point xHCI, when you plug in a HS USB webcam, the log will fill with the error message: ERROR Transfer event TRB DMA ptr not part of current TD However, the video looks fine, and there is no impact on the driver behavior. This is caused by a change between the xHCI 0.96 and the xHCI 1.0 spec. The change describes in section 4.10.1.1 how the xHCI host controller should handle short transfers that happen on a TD comprised of more than one TRB chained together. The 0.96 spec says that the host will only send one event with a short completion code for the TD. The 1.0 spec says if the short packet happened on a TRB that wasn't the last TRB in the TD, then we will get two events with short completions (one for the short TRB and one for the last TRB with the IOC flag set). The 1.0 spec says that the xHCI driver shall not consider the TD to be done until the second event is received. The current xHCI driver behavior finishes the TD when it receives the first short event. Then when it receives the second event, it prints the warning message. This also appeared under the xHCI 1.0 host in Intel Panther Point xHCI, but it was mistaken for a known hardware bug, the spurious successful event bug. The work-around that was put into the driver masks this xHCI driver bug. Impact on the driver is minimal for this bug. The easy fix is just to change the XHCI_SPURIOUS_SUCCESS quirk to be applied for all 1.0 xHCI hosts. However, that doesn't solve the race condition that exists when the endpoint is stopped before the second event TRB is received. The harder fix is to either add an event data TRB after the chained TD (and stop using ISP and IOC flags), or make the xHCI ring handling code wait for the second event completion for xHCI 1.0 hosts." Sarah Sharp -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html