On Mon, 1 Jul 2013, Laurent Pinchart wrote:

> > What about error recovery for insane systems?  If we do get a 50-ms gap in
> > the data stream, what's the best way for the UVC driver to learn this has
> > happens and attempt to carry on?
> 
> When isochronous packets are lost video frames get corrupted. UVC doesn't 
> provide sequence numbers for packets, so there's no way to know exactly how 
> many packets have been lost.
> 
> However, several synchronization methods are available for the driver to 
> detect frame boundaries (namely an End-Of-Frame bit and a Frame ID bit that 
> toggles for every frame in isochronous packet headers). The driver already 
> handles those bits and marks video frames as complete when the EOF bit or an 
> FID transition is detected. For uncompressed formats the driver then checks 
> whether the received data size matches the frame size, and marks the frame as 
> bad if it doesn't. That check can't be performed for compressed formats as 
> the 
> frame size is then variable.
> 
> It would thus be helpful to receive a notification when a gap in the data 
> stream is detected, through the URB status field for instance. The driver 
> could then mark the frame being received as faulty and reset its state 
> machine 
> to resynchronize to the next frame boundary.

One possibility is to set urb->status to -EXDEV if an underrun has
caused the entire URB to be too late.  Currently, under those
conditions the HCD rejects the URB submission with a -EXDEV error code,
which doesn't seem to be the right approach.  Drivers don't expect
isochronous _submissions_ to fail, although they are prepared to see
failure statuses for isochronous _completions_.

Of course, it has been true all along that the status fields in the
URB's individual usb_iso_packet_descriptor structures indicate any
errors.  But HCDs tend to set urb->status to 0 always, for isochronous
URBs.  Is there any advantage to setting urb->status to -EXDEV, given
that we already set urb->iso_frame_desc[i].status to -EXDEV for each i?

(There's also an urb->error_count field, which reports how many of the
isochronous packets got an error.  It is hardly used; I think we could
remove it.)

This boils down to three possibilities for how to handle URBs that were
submitted too late -- that is, so late that all the time slots for all
the URB's packets are known to have expired already:

    (1) Reject the submission with -EXDEV -- this is what we do now.

    (2) Accept the submission, but have the URB complete immediately
        with urb->status and all the packet statuses set to -EXDEV.

    (3) Accept the submission, but have the URB complete immediately
        with urb->status set to 0 and all the packet statuses set to 
        -EXDEV.

For (1), the only way to recover is to submit an URB with URB_ISO_ASAP
set.  This is essentially the same as shutting down the stream and 
restarting it.

For (2) and (3), the stream's "next" time slot value would be updated
in the usual way.  For example, if 10 slots had expired and the driver
was submitting URBs containing 4 packets each, then the first and
second URBs would complete right away with errors, and the first two
packets of the third URB would get errors, but the last two packets of
the third URB would be assigned to the two upcoming time slots and
would be treated normally.  Thus recovery would be automatic, at the
cost of "wasting" two URBs.  Since we expect underruns to be rare, 
maybe this is acceptable.

With (2) or (3), the driver could also recover right away by setting
the URB_ISO_ASAP flag on its next URB, but then synchronization would
be lost.  You wouldn't want to do this if synchronization matters.  
But if it doesn't, the driver could simply set URB_ISO_ASAP on every
URB to avoid worrying about the problem -- the flag would have no
effect in the absence of underruns.

(The main reason the current code uses (1) is because (2) and (3)  
require completion to occur _during_ submission.  Resubmissions would
thus be nested, consuming excessive stack space, and there would be a
deadlock if the completion handler tried to acquire a lock that was
held during submission.  When a tasklet is used for completions, none
of these problems arise.)

Any preference?  Clemens, what do you think?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to