On Sat, 7 Oct 2006, Christopher "Monty" Montgomery wrote:

> Let's establish how to report missed slots, I'll update the patches,
> and then try to figure out what's going on with usbaudio.

Okay, let's move on to discuss this.  Part of the problem with our earlier
discussion has been that there are actually 3 queues, any of which can dry
up:

        Queue A is the flow of data from the application to snd-usb-audio
        or some other high-level driver.  If this queue drains it is an
        xrun and quite probably loss-of-sync.  The driver is free to
        report the error to the application any way it wants to.  This is 
        where application latency matters.

        Queue B is the flow of URBs from the driver to ehci-hcd.  If this
        queue drains then the bandwidth is deallocated, something you
        desperately want to avoid.  It's up to the higher-level driver to
        keep the queue non-empty, even if that means submitting URBs with
        dummy data.  Latency has no effect here.

        Queue C is the flow of packets from the host controller to the
        device.  If this queue drains it is a loss-of-sync.  The
        higher-level driver can find out about these occurrences only
        by checking various return status codes from ehci-hcd, and then
        it has to decide how to relay the information to the application.
        This is where kernel and IRQ latency matter.
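
To illustrate the Queue B point, here is a rough user-space sketch (not
actual snd-usb-audio code) of padding a URB's payload with dummy data when
Queue A hasn't supplied enough.  The function name fill_urb_payload() and
the choice of zero bytes as the dummy data are assumptions made purely for
this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: fill one URB payload from whatever the application
 * has provided, padding the remainder with zero bytes so the URB can still
 * be submitted and Queue B never runs empty.
 *
 * Returns the number of dummy bytes that had to be added.
 */
static size_t fill_urb_payload(unsigned char *payload, size_t payload_len,
                               const unsigned char *app_data, size_t app_len)
{
    /* Use as much application data as fits in this payload. */
    size_t n = app_len < payload_len ? app_len : payload_len;

    assert(payload != NULL);

    if (n > 0)
        memcpy(payload, app_data, n);

    /* Assumption: zero bytes are acceptable dummy data for the device. */
    memset(payload + n, 0, payload_len - n);

    return payload_len - n;
}
```

A nonzero return value means Queue A ran short (an xrun from the
application's point of view), but the URB is still submitted so the
bandwidth stays allocated.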

First a word of warning.  ISO transfers are UNRELIABLE!  A data-out packet
sent by the host might not be received by the device, and the host would
have no way to know.  Obviously such errors can't be reported since
ehci-hcd never realizes they occur.  Of course, this doesn't reduce our
obligation to report accurately the errors we _do_ know about.


Errors caused by Queue A draining presumably are already handled in a 
satisfactory manner.  If they aren't, it's a matter for the higher-level 
driver; ehci-hcd has nothing to do with it.

Errors caused by Queue B draining have a fixed meaning according to the
API, and they are catastrophic.  Fortunately they are easily avoided if
the higher-level driver is properly configured.

So we only need to consider errors caused by Queue C draining.  Currently 
there is no standardized way to report these errors back to the 
higher-level driver.  Looking through Documentation/usb/error-codes.txt, 
the closest things we see are these wonderful entries:

-EXDEV                  ISO transfer only partially completed
                        look at individual frame status for details

-EINVAL                 ISO madness, if this happens: Log off and go home

That's why I chose EXDEV in uhci-hcd; it seemed to be the closest match.  
This might be a good time to settle the matter once and for all.

(Incidentally, the descriptions for EAGAIN and EFBIG are confusing and 
possibly overlapping.  We should straighten them out as well.)


To be definite, let's suppose a periodic stream has been allocated a
specific series of slots, and up until slot N-1 everything has been okay.  
Now URB U is submitted, supposedly starting in slot N.  But something has
gone wrong, and ehci-hcd isn't able to add U to the hardware schedule in
time for slot N to be filled.  What should happen?

Sometimes it will be apparent at submission time that U is already too 
late.  For instance, slot N's microframe might already be over.  In such 
cases it is possible to return a submission error.  Let's call this option 
#1.

Sometimes it won't be apparent until later that the slot was missed.  
When this happens, ehci-hcd is unable to report a submission error for U.  
Another possible approach is to report a submission error for the
following URB, U+1.  Let's call this option #2.

The only other reasonable option, #3, is to report an error upon the 
completion of U.  These two events (submission and completion) are the 
only chances ehci-hcd has to communicate with a higher-level driver.

So which option should we use?


As Dave has already mentioned, options #1 and #2 share certain practical
problems.  Returning an error for a submission when in fact the submission
was accepted is _not_ a good policy.  One way around this would be to
return a positive value rather than a negative -Exxx code.  But then it
would be necessary to audit all the USB drivers that use ISO endpoints,
because there could easily be cases where the driver checks only for 0 or
nonzero.

We could in fact return an error code and _not_ accept the submission,
counting on the driver to realize what went wrong and make a new
submission.  I think this is overly complicated.  And it provides no hint
as to exactly how many slots were missed.  Finally, it increases the
amount of work performed by both the higher-level driver and ehci-hcd --
exactly the sort of thing you _don't_ want to do when a stream is lagging
behind.

Lastly, there is the objection that sometimes option #1 can't be used 
because ehci-hcd doesn't know about the error until later.  The same 
objection applies to option #2: If the driver submits several URBs in 
quick succession, ehci-hcd might not realize until after all of them are 
submitted that the first one was submitted too late.

That's why I chose to use option #3 in uhci-hcd.  It has the advantage of
being reliable and not messing up any existing code by changing an
accepted API.  It also indicates exactly which slots were missed, by the
error codes in urb->iso_frame_desc[n].status.  It has the disadvantage of
delaying the notification for some number of milliseconds after sync was
lost.
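
For illustration, here is a rough user-space sketch of how a higher-level
driver's completion handler might pick out the missed slots under option
#3.  The mock_urb and mock_iso_desc types are simplified stand-ins for the
real URB structures, and count_missed_slots() is a hypothetical helper; the
sketch assumes -EXDEV is the per-packet status used for a missed slot.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Mock stand-ins for the kernel structures; only the fields needed here. */
struct mock_iso_desc {
    int status;               /* 0 on success, negative errno on failure */
};

#define MOCK_MAX_PACKETS 8

struct mock_urb {
    int number_of_packets;
    struct mock_iso_desc iso_frame_desc[MOCK_MAX_PACKETS];
};

/*
 * Hypothetical helper: count the packets whose slot was missed and store
 * the index of the first such packet in *first_missed (-1 if none).
 */
static int count_missed_slots(const struct mock_urb *urb, int *first_missed)
{
    int i, missed = 0;

    assert(urb != NULL && first_missed != NULL);

    *first_missed = -1;
    for (i = 0; i < urb->number_of_packets; i++) {
        /* Assumption: -EXDEV marks a packet whose slot went by unused. */
        if (urb->iso_frame_desc[i].status == -EXDEV) {
            if (missed == 0)
                *first_missed = i;
            missed++;
        }
    }
    return missed;
}
```

The driver would call something like this from its completion routine and
then decide how to relay the count to the application.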

I don't know just how bad that disadvantage is.  No doubt it depends on
the nature of the application.  However, the fact remains: once sync has
been lost it's already too late to recover fully.  All you can do is
recover as much as possible, as quickly as possible.  Delaying the error
notification until U completes won't slow down recovery significantly.


There's one final matter I want to bring up, having to do with 
urb->start_frame.  The API doesn't say much about how start_frame should 
be interpreted once a stream is already established.  The assumption is 
that drivers won't use it, specifying URB_ISO_ASAP instead to cause each 
URB to fill the slot immediately following the end of the previous URB.

I don't know how ohci-hcd and ehci-hcd treat start_frame in an established 
stream.  Here's what uhci-hcd does: It compares start_frame with the 
actual frame number of the next slot (the one immediately following the 
end of the previous URB).  If the values agree, well and good.  If they 
disagree, the submission is rejected with -EINVAL.

A different and completely reasonable policy would be to check that
start_frame does indeed match one of the allocated future slots and to
start the URB in that slot, leaving the intervening slots empty.  This 
would make it a little easier for higher-level drivers to retain their 
bandwidth reservation when Queue A runs dry.  However that's not how 
uhci-hcd currently works.
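
As a sketch of what that alternative policy might look like (again
user-space illustration only, not code from any host controller driver):
the function name skipped_slots() is hypothetical, the stream's slots are
assumed to fall every "interval" frames starting at next_frame, and
frame-number wraparound is ignored for simplicity.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical check: accept start_frame only if it lands on one of the
 * stream's future slots.  Returns the number of slots that would be left
 * empty before the URB starts, or -EINVAL if start_frame is in the past
 * or doesn't fall on a slot boundary.
 */
static int skipped_slots(unsigned int next_frame, unsigned int interval,
                         unsigned int start_frame)
{
    unsigned int delta;

    assert(interval > 0);

    if (start_frame < next_frame)
        return -EINVAL;         /* requested slot has already gone by */

    delta = start_frame - next_frame;
    if (delta % interval != 0)
        return -EINVAL;         /* not one of the allocated slots */

    return (int)(delta / interval);
}
```

A return value of 0 corresponds to the case uhci-hcd accepts today; a
positive value is the number of intervening slots left empty.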

Thoughts?

Alan Stern

