Re: [linux-usb-devel] [PATCH] EHCI interrupt transactions failing during CPU frequency changes

Stuart_Hayes Mon, 26 Mar 2007 13:27:00 -0800

Sorry for the delayed response on this... I've been working on it and
other stuff at the same time.  I found a couple bugs with my last patch,
and have tested quite a bit on both Intel & ServerWorks systems.  Here's
the new patch (very similar to the previous one I sent), and responses
to your questions below.


David Brownell wrote:
> On Wednesday 07 March 2007 1:10 pm, [EMAIL PROTECTED] wrote:
>> 
>> OK, I've made an attempt to solve this issue without using such a
>> large hammer.  Instead of shutting off the entire periodic schedule
>> during cpufreq changes, this patch will just inactivate (using the
>> "I" 
>> bit in
>> qh->hw_info1, currently unused in the EHCI driver) any queue heads
>> qh->that
>> are to full/low speed interrupt endpoints, during the cpufreq
>> changes. 
> 
> Makes sense ... although I don't trust your safe_to_inactivate()
> logic, that's inherently racey.  You could check to see if the
> transaction has started, and decide it didn't (and thus set the I
> bit) just when the controller starts that transaction.  The race
> seems unavoidable    
> 

safe_to_inactivate() shouldn't be racy, because I wait one full
microframe after the transaction should have been started before
checking QTD_STS_STS... the controller should write this bit during the
microframe that the transaction was started.  I did actually see
problems (rarely) when I checked QTD_STS_STS in the microframe
immediately after the transaction should have started, since I guess
QTD_STS_STS sometimes gets updated right at the end of the microframe.

> I had way too much ... "fun" isn't quite the word ... chasing issues
> like that with fault handling in the async ring.  Such issues might
> be less common with the periodic schedule; maybe.  When the driver
> loses that race, this comes up as an MMF error right?   
>
 
Yes, that's what happened when I was checking QTD_STS_STS the uframe
immedately after the transaction should have started.

> 
>> I have found that the ICH7 works without this patch, because it
>> caches 7 uframes of the periodic schedule, so it never has to read
>> main 
>> memory in order to complete a split transaction in time.  The
>> Broadcom/Serverworks HT1000 only caches 1 uframe, so it fails
>> without this patch. 
> 
> I've always wondered how that caching is supposed to work.  I suspect
> that you could make the ICH7 fail too, if the periodic schedule were
> busy enough ... it's not like they'll allocate enough silicon to
> cache every possible QH in SRAM inside that controller!  That could
> mean a LOT of memory for caching.  Right?    
> 
> Not to poke too much at all your good work here, but how would ICH7
> fail in that case, given the logic in this patch? 
> 

Good question... maybe the caching indicates how many microframes in
advance the controller COULD cache the transactions--but it won't
necessarily do so if there are too many of them.

So the thing to do is to assume that the controller might have the QH
cached like it claims (i.e., don't mess with the QH if we're within
ehci->i_thresh of starting the transaction), but don't assume it will
necessarily cache everything (i.e., register the cpufreq notifier even
if i_thresh appears to be big enough that you wouldn't need it), just in
case.

> 
>> I have tested this code on both the HT1000 and ICH7 (I tested the
>> patch on the ICH7 with the cpufreq notifier registered, too, even
>> though the patch won't even register it, just to make sure it would
>> work correctly).
> 
> How heavily were your periodic schedules loaded?  The
> /sys/class/usb_host debug file labeled "periodic" should provide a
> textual display.  It would be good to test both hardware configs with
> a "lot" of USB mice hooked up.   
> 
> Not that the current EHCI periodic scheduler handles very many, but
> ISTR that three full/low speed devices per TT was realistic.  And
> high speed hubs with multiple TTs are easy enough to find, given that
> Cypress chip.  At Dell, I expect you have one or two extra USB mice
> and keyboards lying around that could be used to try abusing your
> patch.  :)     
> 

I guess this isn't all that relevant if I enable the notifier even for
the controllers that say that they'll cache 7 uframes, but I ran for >4
days with this load with no problems (I added line feeds, and deleted
repeated QHs & high-speed QHs, in the text below).  (I ran this on both
an AMD/ServerWorks HT1000 system, and an Intel ICH7 system.)

   7:  qh8-0e01/ffff810001665480 (l5 ep1in [1/2] q1 p8)
        qh8-1c02/ffff810001665780 (l7 ep1in [1/2] q1 p8)
        qh8-0e01/ffff810001665d80 (l12 ep1in [1/2] q1 p8)
        qh8-0e01/ffff810001665e40 (l12 ep2in [1/2] q1 p8)
        qh8-7008/ffff810037fee6c0 (l5 ep2in [1/2] q1 p8)
        qh8-0e01/ffff810037fee780 (l13 ep1in [1/2] q1 p4)
  15:  qh16-3804/ffff810001665900 (f8 ep1in [1/2] q1 p8)
        qh16-3804/ffff810001665b40 (l10 ep1in [1/2] q1 p8)
        qh16-7008/ffff810037fee0c0 (f14 ep1in [1/2] q1 p8)
  31:  qh32-7008/ffff810037fee180 (f14 ep2in [1/2] q1 p4)
 127:  qh128-0e01/ffff810001665600 (f6 ep1in [1/2] q1 p1)

>> This patch adds more code, but it avoids a couple of the issues that
>> were brought up with the previous patch:
>> 
>> 1--This patch doesn't mess with isochronous transfers (or high speed
>> interrupt transfers) at all.
> 
> That seems to be a good resolution in both cases.  ISO transfers are
> "best effort" in any case, and you hadn't identified issues except
> with full/low speed interrupt transfers.  
> 
> 
>> 2--The notifier will take very little time unless the cpufreq change
>> happens while a split transaction is in progress (or is already
>> cached 
>> in the controller).
> 
> Nice!
> 
> 
>> I can repost it with a more concise summarization of the problem/fix
>> and a signed-off-by if it looks ok.
> 
> It basically looks OK.  Please start adopting the "func(parms)" style
> in that code, vs "func (parms)" ... IMO it's less readable, but it is
> what Documentation/CodingStyle says we now prefer.  Also, don't
> forget to have a brief comment in the source code saying what the
> cpufreq notifier is trying to avoid.  (ISTR:  MMF failures caused by
> DMA not being able to access memory while it's being reclocked.)     
> 

I have made these changes.

Thanks!
Stuart

take9_i_bit_b.patch
Description: take9_i_bit_b.patch

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV

_______________________________________________
[email protected]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

Re: [linux-usb-devel] [PATCH] EHCI interrupt transactions failing during CPU frequency changes

Reply via email to