Re: Isochronous transfer error on USB3

2014-01-08 Thread Mauro Carvalho Chehab
Em Wed, 08 Jan 2014 15:05:12 -0200
Mauro Carvalho Chehab  escreveu:

> Em Wed, 08 Jan 2014 14:31:28 -0200
> Mauro Carvalho Chehab  escreveu:
> 
> > Em Thu, 02 Jan 2014 14:07:22 -0800
> > Sarah Sharp  escreveu:
> > 
> > > On Sun, Dec 29, 2013 at 02:54:40AM -0200, Mauro Carvalho Chehab wrote:
> > > > It seems that usb_unlink_urb() is causing troubles with xHCI: the
> > > > endpoint stops streaming, but, after that, it doesn't start again,
> > > > and lots of debug messages are produced. I emailed you the full log
> > > > after start streaming in priv (too big for vger), but basically, 
> > > > it produces:
> > > > 
> > > > [ 1635.754546] xhci_hcd :00:14.0: Endpoint 0x81 not halted, 
> > > > refusing to reset.
> > > > [ 1635.754562] xhci_hcd :00:14.0: Endpoint 0x82 not halted, 
> > > > refusing to reset.
> > > > [ 1635.754577] xhci_hcd :00:14.0: Endpoint 0x83 not halted, 
> > > > refusing to reset.
> > > > [ 1635.754594] xhci_hcd :00:14.0: Endpoint 0x84 not halted, 
> > > > refusing to reset.
> > > 
> > > I think that's due to the driver (or userspace) attempting to reset the
> > > endpoint when it didn't actually receive a stall (-EPIPE) status from an
> > > URB.  When that happens, the xHCI host controller endpoint "toggle" bits
> > > get out of sync with the device toggle bits, and the result is that all
> > > transfers will fail to the endpoint from then on until you switch
> > > alternate interface settings or unplug/replug the device.
> > > 
> > > Try this patch:
> > > 
> > > http://marc.info/?l=linux-usb&m=138116117104619&w=2
> > > 
> > > It's still under RFC, and I know it has race conditions, but it will let
> > > you quickly test whether this fixes your issue.
> > 
> > Didn't work fine, or at least it didn't solve all the problems. Also, it
> > started to cause OOPSes due to the race conditions.
> > 
> > > 
> > > This has been a long-standing xHCI driver bug.  I asked my OPW intern to
> > > work on the patch to fix it, but she may be a bit busy with her new job
> > > to finish up the RFC.  I'll probably have to take over finishing the
> > > patch, if this turns out to be your issue.
> > > 
> > > > (Not sure why it is trying to stop all endpoints - as just one endpoint 
> > > > was
> > > > requested to restart).
> > > 
> > > Something is calling into usb_clear_halt() with all the endpoints.
> > > Userspace, perhaps? 
> > 
> > No, userspace is not doing it. The userspace doesn't even know that this
> > device is USB (and were written at the time that all media drivers were
> > PCI only - so it doesn't have any USB specific call on it).
> > 
> > > You could add WARN() calls to usb_clear_halt() to
> > > see what code is resetting the endpoints.  In any case, it's not part of
> > > the USB core code to change configuration or alt settings, since I don't
> > > see any xHCI driver output from the endpoint bandwidth code in this
> > > chunk of the dmesg you sent:
> > 
> > The em28xx-audio.c driver may need to call usb_set_interface() while
> > the video is still streaming, in order to unmute the audio. That happens
> > when the audio device is opened.
> > 
> > With EHCI, this works properly.
> > 
> > > [ 1649.640783] xhci_hcd :00:14.0: Removing canceled TD starting at 
> > > 0xb41e8580 (dma).
> > > [ 1649.640784] xhci_hcd :00:14.0: TRB to noop at offset 0xb41e8580
> > > [ 1649.643159] xhci_hcd :00:14.0: Endpoint 0x81 not halted, refusing 
> > > to reset.
> > > [ 1649.643188] xhci_hcd :00:14.0: Endpoint 0x82 not halted, refusing 
> > > to reset.
> > > [ 1649.643215] xhci_hcd :00:14.0: Endpoint 0x83 not halted, refusing 
> > > to reset.
> > > [ 1649.643239] xhci_hcd :00:14.0: Endpoint 0x84 not halted, refusing 
> > > to reset.
> > > [ 1649.735539] xhci_hcd :00:14.0: ERROR no room on ep ring, try ring 
> > > expansion
> > > 
> > > Sarah Sharp
> > 
> > Btw, sometimes, I get such logs:
> > 
> > [  646.192273] xhci_hcd :00:14.0: Miss service interval error, set skip 
> > flag
> > [  646.192292] xhci_hcd :00:14.0: Miss service interval error, set skip 
> > flag
> > [  646.192311] xhci_hcd :00:14.0: Miss service interval error, set skip 
> > flag
> > [  646.192329] xhci_hcd :00:14.0: Miss service interval error, set skip 
> > flag
> > [  646.192351] xhci_hcd :00:14.0: Miss service interval error, set skip 
> > flag
> > [  646.192376] xhci_hcd :00:14.0: Miss service interval error, set skip 
> > flag
> > 
> > After adding some debug at em28xx-audio, triggering alsa trigger start
> > events, I'm getting those:
> > 
> > [ 3078.971224] snd_em28xx_capture_trigger: start capture
> > [ 3078.971284] xhci_hcd :00:14.0: ERROR no room on ep ring, try ring 
> > expansion
> > [ 3078.971311] xhci_hcd :00:14.0: ring expansion succeed, now has 4 
> > segments
> > [ 3078.971350] xhci_hcd :00:14.0: ERROR no room on ep ring, try ring 
> > expansion
> > [ 3078.971387] xhci_hcd :00:14.0: ring expansion succeed, now has 8 
> > segments
> > [ 3079.034626]

Re: Isochronous transfer error on USB3

2014-01-08 Thread Mauro Carvalho Chehab
Em Wed, 08 Jan 2014 14:31:28 -0200
Mauro Carvalho Chehab  escreveu:

> Em Thu, 02 Jan 2014 14:07:22 -0800
> Sarah Sharp  escreveu:
> 
> > On Sun, Dec 29, 2013 at 02:54:40AM -0200, Mauro Carvalho Chehab wrote:
> > > It seems that usb_unlink_urb() is causing troubles with xHCI: the
> > > endpoint stops streaming, but, after that, it doesn't start again,
> > > and lots of debug messages are produced. I emailed you the full log
> > > after start streaming in priv (too big for vger), but basically, 
> > > it produces:
> > > 
> > > [ 1635.754546] xhci_hcd :00:14.0: Endpoint 0x81 not halted, refusing 
> > > to reset.
> > > [ 1635.754562] xhci_hcd :00:14.0: Endpoint 0x82 not halted, refusing 
> > > to reset.
> > > [ 1635.754577] xhci_hcd :00:14.0: Endpoint 0x83 not halted, refusing 
> > > to reset.
> > > [ 1635.754594] xhci_hcd :00:14.0: Endpoint 0x84 not halted, refusing 
> > > to reset.
> > 
> > I think that's due to the driver (or userspace) attempting to reset the
> > endpoint when it didn't actually receive a stall (-EPIPE) status from an
> > URB.  When that happens, the xHCI host controller endpoint "toggle" bits
> > get out of sync with the device toggle bits, and the result is that all
> > transfers will fail to the endpoint from then on until you switch
> > alternate interface settings or unplug/replug the device.
> > 
> > Try this patch:
> > 
> > http://marc.info/?l=linux-usb&m=138116117104619&w=2
> > 
> > It's still under RFC, and I know it has race conditions, but it will let
> > you quickly test whether this fixes your issue.
> 
> Didn't work fine, or at least it didn't solve all the problems. Also, it
> started to cause OOPSes due to the race conditions.
> 
> > 
> > This has been a long-standing xHCI driver bug.  I asked my OPW intern to
> > work on the patch to fix it, but she may be a bit busy with her new job
> > to finish up the RFC.  I'll probably have to take over finishing the
> > patch, if this turns out to be your issue.
> > 
> > > (Not sure why it is trying to stop all endpoints - as just one endpoint 
> > > was
> > > requested to restart).
> > 
> > Something is calling into usb_clear_halt() with all the endpoints.
> > Userspace, perhaps? 
> 
> No, userspace is not doing it. The userspace doesn't even know that this
> device is USB (and were written at the time that all media drivers were
> PCI only - so it doesn't have any USB specific call on it).
> 
> > You could add WARN() calls to usb_clear_halt() to
> > see what code is resetting the endpoints.  In any case, it's not part of
> > the USB core code to change configuration or alt settings, since I don't
> > see any xHCI driver output from the endpoint bandwidth code in this
> > chunk of the dmesg you sent:
> 
> The em28xx-audio.c driver may need to call usb_set_interface() while
> the video is still streaming, in order to unmute the audio. That happens
> when the audio device is opened.
> 
> With EHCI, this works properly.
> 
> > [ 1649.640783] xhci_hcd :00:14.0: Removing canceled TD starting at 
> > 0xb41e8580 (dma).
> > [ 1649.640784] xhci_hcd :00:14.0: TRB to noop at offset 0xb41e8580
> > [ 1649.643159] xhci_hcd :00:14.0: Endpoint 0x81 not halted, refusing to 
> > reset.
> > [ 1649.643188] xhci_hcd :00:14.0: Endpoint 0x82 not halted, refusing to 
> > reset.
> > [ 1649.643215] xhci_hcd :00:14.0: Endpoint 0x83 not halted, refusing to 
> > reset.
> > [ 1649.643239] xhci_hcd :00:14.0: Endpoint 0x84 not halted, refusing to 
> > reset.
> > [ 1649.735539] xhci_hcd :00:14.0: ERROR no room on ep ring, try ring 
> > expansion
> > 
> > Sarah Sharp
> 
> Btw, sometimes, I get such logs:
> 
> [  646.192273] xhci_hcd :00:14.0: Miss service interval error, set skip 
> flag
> [  646.192292] xhci_hcd :00:14.0: Miss service interval error, set skip 
> flag
> [  646.192311] xhci_hcd :00:14.0: Miss service interval error, set skip 
> flag
> [  646.192329] xhci_hcd :00:14.0: Miss service interval error, set skip 
> flag
> [  646.192351] xhci_hcd :00:14.0: Miss service interval error, set skip 
> flag
> [  646.192376] xhci_hcd :00:14.0: Miss service interval error, set skip 
> flag
> 
> After adding some debug at em28xx-audio, triggering alsa trigger start
> events, I'm getting those:
> 
> [ 3078.971224] snd_em28xx_capture_trigger: start capture
> [ 3078.971284] xhci_hcd :00:14.0: ERROR no room on ep ring, try ring 
> expansion
> [ 3078.971311] xhci_hcd :00:14.0: ring expansion succeed, now has 4 
> segments
> [ 3078.971350] xhci_hcd :00:14.0: ERROR no room on ep ring, try ring 
> expansion
> [ 3078.971387] xhci_hcd :00:14.0: ring expansion succeed, now has 8 
> segments
> [ 3079.034626] em28xx_audio_isocirq, 64 packets (first one with size 12)
> 
> Here, some audio data arrives.
> 
> [ 3079.034665] snd_em28xx_capture_trigger: stop capture
> 
> It seems, however, that this didn't arrive in time, and causes an alsa
> buffer underrun. So, it cancels the existing URBs.
> 
> PS.

Re: Isochronous transfer error on USB3

2014-01-08 Thread Mauro Carvalho Chehab
Em Thu, 02 Jan 2014 14:07:22 -0800
Sarah Sharp  escreveu:

> On Sun, Dec 29, 2013 at 02:54:40AM -0200, Mauro Carvalho Chehab wrote:
> > It seems that usb_unlink_urb() is causing troubles with xHCI: the
> > endpoint stops streaming, but, after that, it doesn't start again,
> > and lots of debug messages are produced. I emailed you the full log
> > after start streaming in priv (too big for vger), but basically, 
> > it produces:
> > 
> > [ 1635.754546] xhci_hcd :00:14.0: Endpoint 0x81 not halted, refusing to 
> > reset.
> > [ 1635.754562] xhci_hcd :00:14.0: Endpoint 0x82 not halted, refusing to 
> > reset.
> > [ 1635.754577] xhci_hcd :00:14.0: Endpoint 0x83 not halted, refusing to 
> > reset.
> > [ 1635.754594] xhci_hcd :00:14.0: Endpoint 0x84 not halted, refusing to 
> > reset.
> 
> I think that's due to the driver (or userspace) attempting to reset the
> endpoint when it didn't actually receive a stall (-EPIPE) status from an
> URB.  When that happens, the xHCI host controller endpoint "toggle" bits
> get out of sync with the device toggle bits, and the result is that all
> transfers will fail to the endpoint from then on until you switch
> alternate interface settings or unplug/replug the device.
> 
> Try this patch:
> 
> http://marc.info/?l=linux-usb&m=138116117104619&w=2
> 
> It's still under RFC, and I know it has race conditions, but it will let
> you quickly test whether this fixes your issue.

Didn't work fine, or at least it didn't solve all the problems. Also, it
started to cause OOPSes due to the race conditions.

> 
> This has been a long-standing xHCI driver bug.  I asked my OPW intern to
> work on the patch to fix it, but she may be a bit busy with her new job
> to finish up the RFC.  I'll probably have to take over finishing the
> patch, if this turns out to be your issue.
> 
> > (Not sure why it is trying to stop all endpoints - as just one endpoint was
> > requested to restart).
> 
> Something is calling into usb_clear_halt() with all the endpoints.
> Userspace, perhaps? 

No, userspace is not doing it. The userspace doesn't even know that this
device is USB (and were written at the time that all media drivers were
PCI only - so it doesn't have any USB specific call on it).

> You could add WARN() calls to usb_clear_halt() to
> see what code is resetting the endpoints.  In any case, it's not part of
> the USB core code to change configuration or alt settings, since I don't
> see any xHCI driver output from the endpoint bandwidth code in this
> chunk of the dmesg you sent:

The em28xx-audio.c driver may need to call usb_set_interface() while
the video is still streaming, in order to unmute the audio. That happens
when the audio device is opened.

With EHCI, this works properly.

> [ 1649.640783] xhci_hcd :00:14.0: Removing canceled TD starting at 
> 0xb41e8580 (dma).
> [ 1649.640784] xhci_hcd :00:14.0: TRB to noop at offset 0xb41e8580
> [ 1649.643159] xhci_hcd :00:14.0: Endpoint 0x81 not halted, refusing to 
> reset.
> [ 1649.643188] xhci_hcd :00:14.0: Endpoint 0x82 not halted, refusing to 
> reset.
> [ 1649.643215] xhci_hcd :00:14.0: Endpoint 0x83 not halted, refusing to 
> reset.
> [ 1649.643239] xhci_hcd :00:14.0: Endpoint 0x84 not halted, refusing to 
> reset.
> [ 1649.735539] xhci_hcd :00:14.0: ERROR no room on ep ring, try ring 
> expansion
> 
> Sarah Sharp

Btw, sometimes, I get such logs:

[  646.192273] xhci_hcd :00:14.0: Miss service interval error, set skip flag
[  646.192292] xhci_hcd :00:14.0: Miss service interval error, set skip flag
[  646.192311] xhci_hcd :00:14.0: Miss service interval error, set skip flag
[  646.192329] xhci_hcd :00:14.0: Miss service interval error, set skip flag
[  646.192351] xhci_hcd :00:14.0: Miss service interval error, set skip flag
[  646.192376] xhci_hcd :00:14.0: Miss service interval error, set skip flag

After adding some debug at em28xx-audio, triggering alsa trigger start
events, I'm getting those:

[ 3078.971224] snd_em28xx_capture_trigger: start capture
[ 3078.971284] xhci_hcd :00:14.0: ERROR no room on ep ring, try ring 
expansion
[ 3078.971311] xhci_hcd :00:14.0: ring expansion succeed, now has 4 segments
[ 3078.971350] xhci_hcd :00:14.0: ERROR no room on ep ring, try ring 
expansion
[ 3078.971387] xhci_hcd :00:14.0: ring expansion succeed, now has 8 segments
[ 3079.034626] em28xx_audio_isocirq, 64 packets (first one with size 12)

Here, some audio data arrives.

[ 3079.034665] snd_em28xx_capture_trigger: stop capture

It seems, however, that this didn't arrive in time, and causes an alsa
buffer underrun. So, it cancels the existing URBs.

PS.: Even with EHCI, it causes a few ALSA underruns before it gets steady.
I suspect that this is due to em28xx time to synchronize audio and video
streams.

[ 3079.034736] xhci_hcd :00:14.0: Cancel URB 88020790, dev 4, ep 
0x83, starting at offset 0x1ffb13850
[ 3079.034755] xhci_hcd :00:14.0: // Ding do

Re: Isochronous transfer error on USB3

2014-01-02 Thread Sarah Sharp
On Sun, Dec 29, 2013 at 02:54:40AM -0200, Mauro Carvalho Chehab wrote:
> It seems that usb_unlink_urb() is causing troubles with xHCI: the
> endpoint stops streaming, but, after that, it doesn't start again,
> and lots of debug messages are produced. I emailed you the full log
> after start streaming in priv (too big for vger), but basically, 
> it produces:
> 
> [ 1635.754546] xhci_hcd :00:14.0: Endpoint 0x81 not halted, refusing to 
> reset.
> [ 1635.754562] xhci_hcd :00:14.0: Endpoint 0x82 not halted, refusing to 
> reset.
> [ 1635.754577] xhci_hcd :00:14.0: Endpoint 0x83 not halted, refusing to 
> reset.
> [ 1635.754594] xhci_hcd :00:14.0: Endpoint 0x84 not halted, refusing to 
> reset.

I think that's due to the driver (or userspace) attempting to reset the
endpoint when it didn't actually receive a stall (-EPIPE) status from an
URB.  When that happens, the xHCI host controller endpoint "toggle" bits
get out of sync with the device toggle bits, and the result is that all
transfers will fail to the endpoint from then on until you switch
alternate interface settings or unplug/replug the device.

Try this patch:

http://marc.info/?l=linux-usb&m=138116117104619&w=2

It's still under RFC, and I know it has race conditions, but it will let
you quickly test whether this fixes your issue.

This has been a long-standing xHCI driver bug.  I asked my OPW intern to
work on the patch to fix it, but she may be a bit busy with her new job
to finish up the RFC.  I'll probably have to take over finishing the
patch, if this turns out to be your issue.

> (Not sure why it is trying to stop all endpoints - as just one endpoint was
> requested to restart).

Something is calling into usb_clear_halt() with all the endpoints.
Userspace, perhaps?  You could add WARN() calls to usb_clear_halt() to
see what code is resetting the endpoints.  In any case, it's not part of
the USB core code to change configuration or alt settings, since I don't
see any xHCI driver output from the endpoint bandwidth code in this
chunk of the dmesg you sent:

[ 1649.640783] xhci_hcd :00:14.0: Removing canceled TD starting at 
0xb41e8580 (dma).
[ 1649.640784] xhci_hcd :00:14.0: TRB to noop at offset 0xb41e8580
[ 1649.643159] xhci_hcd :00:14.0: Endpoint 0x81 not halted, refusing to 
reset.
[ 1649.643188] xhci_hcd :00:14.0: Endpoint 0x82 not halted, refusing to 
reset.
[ 1649.643215] xhci_hcd :00:14.0: Endpoint 0x83 not halted, refusing to 
reset.
[ 1649.643239] xhci_hcd :00:14.0: Endpoint 0x84 not halted, refusing to 
reset.
[ 1649.735539] xhci_hcd :00:14.0: ERROR no room on ep ring, try ring 
expansion

Sarah Sharp
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Isochronous transfer error on USB3

2013-12-28 Thread Mauro Carvalho Chehab
Em Mon, 18 Mar 2013 09:51:24 -0700
Sarah Sharp  escreveu:

> On Tue, Mar 12, 2013 at 10:57:22AM +0100, jean-philippe francois wrote:
> > 2013/3/12 jean-philippe francois :
> > > 2013/3/12 Sarah Sharp :
> > >> On Mon, Mar 11, 2013 at 01:21:48PM +0100, jean-philippe francois wrote:
> > >>> Hi,
> > >>>
> > >>> The company I work for is doing USB cameras, for which I wrote the
> > >>> drivers (out of tree).
> > >>
> > >> Kernel driver or userspace driver?
> > >>
> > > Kernel driver.
> 
> Ok, I don't know what license your driver is under, so I can't legally
> look at the C snippet you sent me.

Sarah,

I'm experiencing a similar error with the in-kernel em28xx driver, plus
some fixes that I'm writing for Kernel 3.14. The tree with the driver is
at:

http://git.linuxtv.org/mchehab/experimental.git/shortlog/refs/heads/em28xx-v4l2

The code with troubles is at:


http://git.linuxtv.org/mchehab/experimental.git/blob/refs/heads/em28xx-v4l2:/drivers/media/usb/em28xx/em28xx-audio.c

Basically, the device works fine on an USB 2.0 port, but if I connect
it on any of the 3 existing USB 3.0 ports on my notebook, the audio
endpoint doesn't work.

On this device, there are two isoc transfers: one for video (on frame
on every 1/60 seconds) and another one for audio. Each on a different
endpoint.

The video transfers happen fine. The audio transfer doesn't.

Due to the way the userspace works, every time the channel changes,
the audio stream is stopped and restarted. 

The code that starts streaming is at line 164:

164 static int em28xx_init_audio_isoc(struct em28xx *dev)
165 {
166 int   i, errCode;
167
168 dprintk("Starting isoc transfers\n");
169
170 /* Start streaming */
171 for (i = 0; i < EM28XX_AUDIO_BUFS; i++) {
172 memset(dev->adev.transfer_buffer[i], 0x80,
173dev->adev.urb[i]->transfer_buffer_length);
174
175 errCode = usb_submit_urb(dev->adev.urb[i], GFP_ATOMIC);
176 if (errCode) {
177 em28xx_errdev("submit of audio urb failed\n");
178 em28xx_deinit_isoc_audio(dev);
179 atomic_set(&dev->stream_started, 0);
180 return errCode;
181 }
182
183 }
184
185 return 0;
186 }

And the code that stops it is:

61 static int em28xx_deinit_isoc_audio(struct em28xx *dev)
62 {
63 int i;
64
65 dprintk("Stopping isoc\n");
66 for (i = 0; i < EM28XX_AUDIO_BUFS; i++) {
67 struct urb *urb = dev->adev.urb[i];
68
69 if (!irqs_disabled())
70 usb_kill_urb(urb);
71 else
72 usb_unlink_urb(urb);
73 }
74
75 return 0;
76 }

It seems that usb_unlink_urb() is causing troubles with xHCI: the
endpoint stops streaming, but, after that, it doesn't start again,
and lots of debug messages are produced. I emailed you the full log
after start streaming in priv (too big for vger), but basically, 
it produces:

[ 1635.754546] xhci_hcd :00:14.0: Endpoint 0x81 not halted, refusing to 
reset.
[ 1635.754562] xhci_hcd :00:14.0: Endpoint 0x82 not halted, refusing to 
reset.
[ 1635.754577] xhci_hcd :00:14.0: Endpoint 0x83 not halted, refusing to 
reset.
[ 1635.754594] xhci_hcd :00:14.0: Endpoint 0x84 not halted, refusing to 
reset.

(Not sure why it is trying to stop all endpoints - as just one endpoint was
requested to restart).

FYI, my tests are with xHCI 1.0:

[3.640583] xhci_hcd :00:14.0: // @c900040c = 0x180 
(CAPLENGTH AND HCIVERSION)

Any ideas?
-- 

Cheers,
Mauro
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Isochronous transfer error on USB3

2013-03-18 Thread Sarah Sharp
On Wed, Mar 13, 2013 at 03:34:15PM +0100, jean-philippe francois wrote:
> 2013/3/12 jean-philippe francois :
> > 2013/3/12 jean-philippe francois :
> >> 2013/3/12 Sarah Sharp :
> >>> On Mon, Mar 11, 2013 at 01:21:48PM +0100, jean-philippe francois wrote:
>  USB3 host  is an asmedia ASM1042.
> 
> Update :
> I have been able to test this on an intel XHCI, and it works fine,
> except for some
> difference in the treatment of short iso packet in the EHCI and XHCI path.
> 
> On the ASM1042, it also work when the endpoint descriptor is set to
> 3x1000, bInterval = 2.
> 
> But it does not with a iso endpoint descriptor of 3x1000, bInterval = 1.

That's odd.  Can you recompile the xHCI driver with
CONFIG_USB_XHCI_HCD_DEBUGGING and CONFIG_USB_DEBUG turned on, and send
me dmesg, starting from just before you plug in your device and start a
video stream.  Please capture trying to use both the non-working
endpoint setting, and the working setting.

Sarah Sharp
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Isochronous transfer error on USB3

2013-03-18 Thread Sarah Sharp
On Tue, Mar 12, 2013 at 10:57:22AM +0100, jean-philippe francois wrote:
> 2013/3/12 jean-philippe francois :
> > 2013/3/12 Sarah Sharp :
> >> On Mon, Mar 11, 2013 at 01:21:48PM +0100, jean-philippe francois wrote:
> >>> Hi,
> >>>
> >>> The company I work for is doing USB cameras, for which I wrote the
> >>> drivers (out of tree).
> >>
> >> Kernel driver or userspace driver?
> >>
> > Kernel driver.

Ok, I don't know what license your driver is under, so I can't legally
look at the C snippet you sent me.

> >>> Raw camera data are transferred using isochronous transfer, which work
> >>> fine when used on USB2 EHCI.
> >>>
> >>> However when plugging the camera on an USB3 port, xhci spits the
> >>> following messages on URB submission :
> >>>
> >>> [ 1704.989785] xhci_hcd :01:00.0: ERROR Transfer event TRB DMA ptr
> >>> not part of current TD
> >>
> >> Does the device work properly, despite the error messages?  I see that
> >> there are transfer errors in the log file, along with those messages.
> >> Are those expected?
> >>
> >
> > The completion handler for the submitteds urb is never called, so I would 
> > say
> > no transfer is hapenning. After these error messages, nothing happens.
> > The urb are not completed, so they are not restarted.
> > When stopping the transfer, the HC dies.
> >
> 
> Sometimes the completion handler is called, but transfer are never
> restarted anyway.

What do you mean by the "transfers are never restarted anyway"?

> Attached is the interrupt routine code, and the relevant part of the log.
> 
> Individual iso frame transfer are completed, actual_length field is
> Ok, but status is -EPROTO.

So that's a transfer error.  You can't trust the actual_length field if
you get a transfer error.

> >>> USB3 host  is an asmedia ASM1042.
> >>
> >> Do you know if this is a 1.0 xHCI host?  E.g. when you enable
> >> CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING, do you see lines
> >> like this during driver initialization:
> >>
> >> CAPLENGTH AND HCIVERSION 0x180:
> >>
> >> If you see that line start with 0x100, it's a 1.0 xHCI host.  If starts
> >> with 0x096, it's a 0.96 host; 0x095 would be a 0.95 host.
> >>
> >
> > It is a 0.96 host. :
> > 0x960020 (CAPLENGTH AND HCIVERSION)

All right, then my hypothesis wasn't correct.

> > I don't think it is the same issue here, because no transfer happens or is
> > considered complete, and we are at the beginning of the frame, so there 
> > should
> > be no short transfer.

That's not a short transfer you're seeing.  You're seeing a protocol
error, which usually means either: a bad cable, a bad connection between
the device and host, or something is misconfigured in the xHCI driver.

> > What is the meaning of "Endpoint 0x82  not halted, refusing to reset"
> > The failed isochronous urb should happen on this endpoint.

If your driver is calling into usb_reset_endpoint when the host endpoint
hasn't stalled, you will get a mismatch between the toggle bit state in
the xHCI host driver and the device.  This is because currently, the
xHCI driver refuses to send a Reset Endpoint command if the endpoint
isn't actually halted.  This is because the hardware will reject the
command.  The driver should issue a Configure Endpoint command instead.

Are you calling into usb_reset_endpoint when the endpoint isn't halted?

Sarah Sharp
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Isochronous transfer error on USB3

2013-03-13 Thread jean-philippe francois
2013/3/12 jean-philippe francois :
> 2013/3/12 jean-philippe francois :
>> 2013/3/12 Sarah Sharp :
>>> On Mon, Mar 11, 2013 at 01:21:48PM +0100, jean-philippe francois wrote:
 Hi,

 The company I work for is doing USB cameras, for which I wrote the
 drivers (out of tree).
>>>
>>> Kernel driver or userspace driver?
>>>
>> Kernel driver.
>>
 Raw camera data are transferred using isochronous transfer, which work
 fine when used on USB2 EHCI.

 However when plugging the camera on an USB3 port, xhci spits the
 following messages on URB submission :

 [ 1704.989785] xhci_hcd :01:00.0: ERROR Transfer event TRB DMA ptr
 not part of current TD
>>>
>>> Does the device work properly, despite the error messages?  I see that
>>> there are transfer errors in the log file, along with those messages.
>>> Are those expected?
>>>
>>
>> The completion handler for the submitteds urb is never called, so I would say
>> no transfer is hapenning. After these error messages, nothing happens.
>> The urb are not completed, so they are not restarted.
>> When stopping the transfer, the HC dies.
>>
>
> Sometimes the completion handler is called, but transfer are never
> restarted anyway.
> Attached is the interrupt routine code, and the relevant part of the log.
>
> Individual iso frame transfer are completed, actual_length field is
> Ok, but status is -EPROTO.
>
>
 USB3 host  is an asmedia ASM1042.

Update :
I have been able to test this on an intel XHCI, and it works fine,
except for some
difference in the treatment of short iso packet in the EHCI and XHCI path.

On the ASM1042, it also work when the endpoint descriptor is set to
3x1000, bInterval = 2.

But it does not with a iso endpoint descriptor of 3x1000, bInterval = 1.



>>>
>>> Do you know if this is a 1.0 xHCI host?  E.g. when you enable
>>> CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING, do you see lines
>>> like this during driver initialization:
>>>
>>> CAPLENGTH AND HCIVERSION 0x180:
>>>
>>> If you see that line start with 0x100, it's a 1.0 xHCI host.  If starts
>>> with 0x096, it's a 0.96 host; 0x095 would be a 0.95 host.
>>>
>>
>> It is a 0.96 host. :
>> 0x960020 (CAPLENGTH AND HCIVERSION)
>>
>>> I ask because I have a todo list entry to fix short packet handling on
>>> xHCI 1.0 hosts:
>>>
>>> "On Intel Lynx Point xHCI, when you plug in a HS USB webcam, the log
>>> will fill with the error message:
>>>
>>> ERROR Transfer event TRB DMA ptr not part of current TD
>>>
>>> However, the video looks fine, and there is no impact on the driver
>>> behavior.
>>>
>>> This is caused by a change between the xHCI 0.96 and the xHCI 1.0 spec.
>>> The change describes in section 4.10.1.1 how the xHCI host controller
>>> should handle short transfers that happen on a TD comprised of more than
>>> one TRB chained together. The 0.96 spec says that the host will only
>>> send one event with a short completion code for the TD. The 1.0 spec
>>> says if the short packet happened on a TRB that wasn't the last TRB in
>>> the TD, then we will get two events with short completions (one for the
>>> short TRB and one for the last TRB with the IOC flag set). The 1.0 spec
>>> says that the xHCI driver shall not consider the TD to be done until the
>>> second event is received.
>>>
>>
>> I don't think it is the same issue here, because no transfer happens or is
>> considered complete, and we are at the beginning of the frame, so there 
>> should
>> be no short transfer.
>>
>> What is the meaning of "Endpoint 0x82  not halted, refusing to reset"
>> The failed isochronous urb should happen on this endpoint.
>>
>> Thank you for your interest.
>>
>> Jean-Philippe François
>>> The current xHCI driver behavior finishes the TD when it receives the
>>> first short event. Then when it receives the second event, it prints the
>>> warning message.
>>>
>>> This also appeared under the xHCI 1.0 host in Intel Panther Point xHCI,
>>> but it was mistaken for a known hardware bug, the spurious successful
>>> event bug. The work-around that was put into the driver masks this xHCI
>>> driver bug.
>>>
>>> Impact on the driver is minimal for this bug.
>>>
>>> The easy fix is just to change the XHCI_SPURIOUS_SUCCESS quirk to be
>>> applied for all 1.0 xHCI hosts. However, that doesn't solve the race
>>> condition that exists when the endpoint is stopped before the second
>>> event TRB is received.
>>>
>>> The harder fix is to either add an event data TRB after the chained TD
>>> (and stop using ISP and IOC flags), or make the xHCI ring handling code
>>> wait for the second event completion for xHCI 1.0 hosts."
>>>
>>> Sarah Sharp
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Isochronous transfer error on USB3

2013-03-12 Thread jean-philippe francois
2013/3/12 jean-philippe francois :
> 2013/3/12 Sarah Sharp :
>> On Mon, Mar 11, 2013 at 01:21:48PM +0100, jean-philippe francois wrote:
>>> Hi,
>>>
>>> The company I work for is doing USB cameras, for which I wrote the
>>> drivers (out of tree).
>>
>> Kernel driver or userspace driver?
>>
> Kernel driver.
>
>>> Raw camera data are transferred using isochronous transfer, which work
>>> fine when used on USB2 EHCI.
>>>
>>> However when plugging the camera on an USB3 port, xhci spits the
>>> following messages on URB submission :
>>>
>>> [ 1704.989785] xhci_hcd :01:00.0: ERROR Transfer event TRB DMA ptr
>>> not part of current TD
>>
>> Does the device work properly, despite the error messages?  I see that
>> there are transfer errors in the log file, along with those messages.
>> Are those expected?
>>
>
> The completion handler for the submitteds urb is never called, so I would say
> no transfer is hapenning. After these error messages, nothing happens.
> The urb are not completed, so they are not restarted.
> When stopping the transfer, the HC dies.
>

Sometimes the completion handler is called, but transfer are never
restarted anyway.
Attached is the interrupt routine code, and the relevant part of the log.

Individual iso frame transfer are completed, actual_length field is
Ok, but status is -EPROTO.


>>> USB3 host  is an asmedia ASM1042.
>>
>> Do you know if this is a 1.0 xHCI host?  E.g. when you enable
>> CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING, do you see lines
>> like this during driver initialization:
>>
>> CAPLENGTH AND HCIVERSION 0x180:
>>
>> If you see that line start with 0x100, it's a 1.0 xHCI host.  If starts
>> with 0x096, it's a 0.96 host; 0x095 would be a 0.95 host.
>>
>
> It is a 0.96 host. :
> 0x960020 (CAPLENGTH AND HCIVERSION)
>
>> I ask because I have a todo list entry to fix short packet handling on
>> xHCI 1.0 hosts:
>>
>> "On Intel Lynx Point xHCI, when you plug in a HS USB webcam, the log
>> will fill with the error message:
>>
>> ERROR Transfer event TRB DMA ptr not part of current TD
>>
>> However, the video looks fine, and there is no impact on the driver
>> behavior.
>>
>> This is caused by a change between the xHCI 0.96 and the xHCI 1.0 spec.
>> The change describes in section 4.10.1.1 how the xHCI host controller
>> should handle short transfers that happen on a TD comprised of more than
>> one TRB chained together. The 0.96 spec says that the host will only
>> send one event with a short completion code for the TD. The 1.0 spec
>> says if the short packet happened on a TRB that wasn't the last TRB in
>> the TD, then we will get two events with short completions (one for the
>> short TRB and one for the last TRB with the IOC flag set). The 1.0 spec
>> says that the xHCI driver shall not consider the TD to be done until the
>> second event is received.
>>
>
> I don't think it is the same issue here, because no transfer happens or is
> considered complete, and we are at the beginning of the frame, so there should
> be no short transfer.
>
> What is the meaning of "Endpoint 0x82  not halted, refusing to reset"
> The failed isochronous urb should happen on this endpoint.
>
> Thank you for your interest.
>
> Jean-Philippe François
>> The current xHCI driver behavior finishes the TD when it receives the
>> first short event. Then when it receives the second event, it prints the
>> warning message.
>>
>> This also appeared under the xHCI 1.0 host in Intel Panther Point xHCI,
>> but it was mistaken for a known hardware bug, the spurious successful
>> event bug. The work-around that was put into the driver masks this xHCI
>> driver bug.
>>
>> Impact on the driver is minimal for this bug.
>>
>> The easy fix is just to change the XHCI_SPURIOUS_SUCCESS quirk to be
>> applied for all 1.0 xHCI hosts. However, that doesn't solve the race
>> condition that exists when the endpoint is stopped before the second
>> event TRB is received.
>>
>> The harder fix is to either add an event data TRB after the chained TD
>> (and stop using ISP and IOC flags), or make the xHCI ring handling code
>> wait for the second event completion for xHCI 1.0 hosts."
>>
>> Sarah Sharp
/*
 * isochronous IRQ
 */
static void
ovfx2_isoc_irq(struct urb *urb)
{
	int i, global_len;
	struct usb_ovfx2 *ov;
	struct ovfx2_sbuf *sbuf;
	struct list_head * list_in;
	unsigned long flags = 0;

	if (!urb->context) {
		printk(KERN_ERR "%s :no context\n", __func__);
		return;
	}

	sbuf = urb->context;
	ov = sbuf->ov;

	if(!ov) {
		printk(KERN_ERR "%s :ov NULL\n", __func__);
	}

	if (!ov->dev || !ov->user) {
		PDEBUG(4, ov,"no device, or not open");
		return;
	}

	if (!ov->streaming) {
		PDEBUG(4, ov,"hmmm... not streaming, but got interrupt");
		return;
	}

	if (urb->status == -ENOENT || urb->status == -ECONNRESET) {
		PDEBUG(4, ov, "URB unlinked");
		return;
	}

	if (urb->status != -EINPROGRESS && urb->status != 0) {
		dev_err(DEV(ov), "ERROR: urb->status=%d: %s", urb->status,
		symbolic(urb_errlist, urb->sta

Re: Isochronous transfer error on USB3

2013-03-12 Thread jean-philippe francois
2013/3/12 Sarah Sharp :
> On Mon, Mar 11, 2013 at 01:21:48PM +0100, jean-philippe francois wrote:
>> Hi,
>>
>> The company I work for is doing USB cameras, for which I wrote the
>> drivers (out of tree).
>
> Kernel driver or userspace driver?
>
Kernel driver.

>> Raw camera data are transferred using isochronous transfer, which work
>> fine when used on USB2 EHCI.
>>
>> However when plugging the camera on an USB3 port, xhci spits the
>> following messages on URB submission :
>>
>> [ 1704.989785] xhci_hcd :01:00.0: ERROR Transfer event TRB DMA ptr
>> not part of current TD
>
> Does the device work properly, despite the error messages?  I see that
> there are transfer errors in the log file, along with those messages.
> Are those expected?
>

The completion handler for the submitteds urb is never called, so I would say
no transfer is hapenning. After these error messages, nothing happens.
The urb are not completed, so they are not restarted.
When stopping the transfer, the HC dies.

>> USB3 host  is an asmedia ASM1042.
>
> Do you know if this is a 1.0 xHCI host?  E.g. when you enable
> CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING, do you see lines
> like this during driver initialization:
>
> CAPLENGTH AND HCIVERSION 0x180:
>
> If you see that line start with 0x100, it's a 1.0 xHCI host.  If starts
> with 0x096, it's a 0.96 host; 0x095 would be a 0.95 host.
>

It is a 0.96 host. :
0x960020 (CAPLENGTH AND HCIVERSION)

> I ask because I have a todo list entry to fix short packet handling on
> xHCI 1.0 hosts:
>
> "On Intel Lynx Point xHCI, when you plug in a HS USB webcam, the log
> will fill with the error message:
>
> ERROR Transfer event TRB DMA ptr not part of current TD
>
> However, the video looks fine, and there is no impact on the driver
> behavior.
>
> This is caused by a change between the xHCI 0.96 and the xHCI 1.0 spec.
> The change describes in section 4.10.1.1 how the xHCI host controller
> should handle short transfers that happen on a TD comprised of more than
> one TRB chained together. The 0.96 spec says that the host will only
> send one event with a short completion code for the TD. The 1.0 spec
> says if the short packet happened on a TRB that wasn't the last TRB in
> the TD, then we will get two events with short completions (one for the
> short TRB and one for the last TRB with the IOC flag set). The 1.0 spec
> says that the xHCI driver shall not consider the TD to be done until the
> second event is received.
>

I don't think it is the same issue here, because no transfer happens or is
considered complete, and we are at the beginning of the frame, so there should
be no short transfer.

What is the meaning of "Endpoint 0x82  not halted, refusing to reset"
The failed isochronous urb should happen on this endpoint.

Thank you for your interest.

Jean-Philippe François
> The current xHCI driver behavior finishes the TD when it receives the
> first short event. Then when it receives the second event, it prints the
> warning message.
>
> This also appeared under the xHCI 1.0 host in Intel Panther Point xHCI,
> but it was mistaken for a known hardware bug, the spurious successful
> event bug. The work-around that was put into the driver masks this xHCI
> driver bug.
>
> Impact on the driver is minimal for this bug.
>
> The easy fix is just to change the XHCI_SPURIOUS_SUCCESS quirk to be
> applied for all 1.0 xHCI hosts. However, that doesn't solve the race
> condition that exists when the endpoint is stopped before the second
> event TRB is received.
>
> The harder fix is to either add an event data TRB after the chained TD
> (and stop using ISP and IOC flags), or make the xHCI ring handling code
> wait for the second event completion for xHCI 1.0 hosts."
>
> Sarah Sharp
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Isochronous transfer error on USB3

2013-03-11 Thread Sarah Sharp
On Mon, Mar 11, 2013 at 01:21:48PM +0100, jean-philippe francois wrote:
> Hi,
> 
> The company I work for is doing USB cameras, for which I wrote the
> drivers (out of tree).

Kernel driver or userspace driver?

> Raw camera data are transferred using isochronous transfer, which work
> fine when used on USB2 EHCI.
> 
> However when plugging the camera on an USB3 port, xhci spits the
> following messages on URB submission :
> 
> [ 1704.989785] xhci_hcd :01:00.0: ERROR Transfer event TRB DMA ptr
> not part of current TD

Does the device work properly, despite the error messages?  I see that
there are transfer errors in the log file, along with those messages.
Are those expected?

> USB3 host  is an asmedia ASM1042.

Do you know if this is a 1.0 xHCI host?  E.g. when you enable
CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING, do you see lines
like this during driver initialization:

CAPLENGTH AND HCIVERSION 0x180:

If you see that line start with 0x100, it's a 1.0 xHCI host.  If starts
with 0x096, it's a 0.96 host; 0x095 would be a 0.95 host.

I ask because I have a todo list entry to fix short packet handling on
xHCI 1.0 hosts:

"On Intel Lynx Point xHCI, when you plug in a HS USB webcam, the log
will fill with the error message:

ERROR Transfer event TRB DMA ptr not part of current TD

However, the video looks fine, and there is no impact on the driver
behavior.

This is caused by a change between the xHCI 0.96 and the xHCI 1.0 spec.
The change describes in section 4.10.1.1 how the xHCI host controller
should handle short transfers that happen on a TD comprised of more than
one TRB chained together. The 0.96 spec says that the host will only
send one event with a short completion code for the TD. The 1.0 spec
says if the short packet happened on a TRB that wasn't the last TRB in
the TD, then we will get two events with short completions (one for the
short TRB and one for the last TRB with the IOC flag set). The 1.0 spec
says that the xHCI driver shall not consider the TD to be done until the
second event is received.

The current xHCI driver behavior finishes the TD when it receives the
first short event. Then when it receives the second event, it prints the
warning message.

This also appeared under the xHCI 1.0 host in Intel Panther Point xHCI,
but it was mistaken for a known hardware bug, the spurious successful
event bug. The work-around that was put into the driver masks this xHCI
driver bug.

Impact on the driver is minimal for this bug.

The easy fix is just to change the XHCI_SPURIOUS_SUCCESS quirk to be
applied for all 1.0 xHCI hosts. However, that doesn't solve the race
condition that exists when the endpoint is stopped before the second
event TRB is received.

The harder fix is to either add an event data TRB after the chained TD
(and stop using ISP and IOC flags), or make the xHCI ring handling code
wait for the second event completion for xHCI 1.0 hosts."

Sarah Sharp
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html