Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-14 Thread Greg Kroah-Hartman
On Fri, Jun 14, 2013 at 09:53:52AM +0800, Ming Lei wrote:
 On Fri, Jun 14, 2013 at 8:35 AM, Greg Kroah-Hartman
 gre...@linuxfoundation.org wrote:
  On Thu, Jun 13, 2013 at 03:41:17PM -0400, Alan Stern wrote:
  On Thu, 13 Jun 2013, Greg Kroah-Hartman wrote:
 
   On Thu, Jun 13, 2013 at 10:54:13AM -0400, Alan Stern wrote:
On Thu, 13 Jun 2013, Ming Lei wrote:
   
 - using interrupt threaded handler(default)
 33.440 MB/sec

 - using tasklet(#undef USB_HCD_THREADED_IRQ)
 34.29 MB/sec

 - using hard interrupt handler(by removing HCD_BH in ehci-hcd.c )
 34.260 MB/s


 So looks usb mass storage performance loss can be observed with
 interrupt threaded handler because one mass storage read/write 
 sectors
 requires at least 3 interrupts which wake up usb-storage thread 3 
 times
 (each interrupt wakeup the usb-storage each time), introducing irq 
 threaded
 handler will make 2 threads to be waken up about 6 times for one 
 read/write.

 I think usb mass storage transfer handler need to be rewritten, 
 otherwise
 it may become worsen after using irq threaded handler in USB 3.0.(the
 above device can reach 120MB/sec with hardware handler or tasklet 
 handler,
 which means about ~3K interrupts/sec, so ~6K contexts switch in case 
 of
 using irq threaded handler)

 So how about supporting tasklet first, then convert to interrupt
 threaded handler
 after usb mass storage transfer is rewritten without performance 
 loss?
 (rewriting
 usb mass storage transfer handler may need some time and work since 
 storage
 stability/correctness is extremely important, :-)
   
Maybe we should simply copy what the networking people do.  They are
very concerned about performance and latency; whatever technique they
use should be good for USB too.
  
   Yes, but for old-style usb-storage, is this really a big deal?  We are
   still easily hitting the line-speed of USB for usb-storage with simple
   machines, the bottlenecks that I'm seeing are in the devices themselves,
   and then in the USB wire speed.
 
  What about with USB-3 storage devices?  Many of them still use the
  bulk-only transport instead of UAS.  They may push the limits up.
 
  Are they really?  Have we seen that happen yet?  With the number's I've
 
 Yes, the device I am testing is bulk-only, no uas support , and it is very
 popular in market.
 
  seen published, we are easily serving up enough data to keep the pipe
  full, but that all depends on your CPU / host controller.
 
   Once hardware comes out that uses USB streams, and we get device support
   for the UAS protocol, then we might have a need to change things, but at
   this point in time, for the old driver, I think we are fine.
  
   Unless someone has a workload / benchmark that shows otherwise?
 
  The test results above show a 2.4% degradation for threaded interrupts
  as compared to tasklets.  That's in addition to the bottlenecks caused
  by the device; no doubt it would be worse for a faster device.  This
  result calls into question the benefits of threaded interrupts.
 
  The main reason for moving away from the current scheme is to reduce
  latency for other interrupt handlers.  Ming gave two examples of slow
  USB code that runs in hardirq context now; with his change they would
  run in softirq context and therefore wouldn't delay other interrupts so
  much.  (Interrupt latency is hard to measure, however.)
 
  Yes, I know that people keep wanting to worry about latency issues, and
  the best answer for them has always been, don't use USB. :)
 
 I think we can do it better, why don't do it? :-)

Because of other issues, that have been brought up here already.

But if you can do it without affecting others, that's fine.

  You suffer throughput issues with predicitable latency dependancies, so
 
 This patchset don't cause throughout degradation but decrease latency much,
 also has other advantages.

Like what?

  we need to be careful we don't slow down the 99% of the systems out
  there that do not care about this at all.
 
 Considered great amount of ARM devices in market, I think we need to
 consider the problem on these devices, :-)

Is it a problem on those devices?  I think they have host controller
issues that are way bigger problems than this device driver, right?

greg k-h
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-14 Thread Ming Lei
On Fri, Jun 14, 2013 at 2:05 PM, Greg Kroah-Hartman
gre...@linuxfoundation.org wrote:
 On Fri, Jun 14, 2013 at 09:53:52AM +0800, Ming Lei wrote:
 On Fri, Jun 14, 2013 at 8:35 AM, Greg Kroah-Hartman
 gre...@linuxfoundation.org wrote:
  On Thu, Jun 13, 2013 at 03:41:17PM -0400, Alan Stern wrote:
  On Thu, 13 Jun 2013, Greg Kroah-Hartman wrote:
 
   On Thu, Jun 13, 2013 at 10:54:13AM -0400, Alan Stern wrote:
On Thu, 13 Jun 2013, Ming Lei wrote:
   
 - using interrupt threaded handler(default)
 33.440 MB/sec

 - using tasklet(#undef USB_HCD_THREADED_IRQ)
 34.29 MB/sec

 - using hard interrupt handler(by removing HCD_BH in ehci-hcd.c )
 34.260 MB/s


 So looks usb mass storage performance loss can be observed with
 interrupt threaded handler because one mass storage read/write 
 sectors
 requires at least 3 interrupts which wake up usb-storage thread 3 
 times
 (each interrupt wakeup the usb-storage each time), introducing irq 
 threaded
 handler will make 2 threads to be waken up about 6 times for one 
 read/write.

 I think usb mass storage transfer handler need to be rewritten, 
 otherwise
 it may become worsen after using irq threaded handler in USB 
 3.0.(the
 above device can reach 120MB/sec with hardware handler or tasklet 
 handler,
 which means about ~3K interrupts/sec, so ~6K contexts switch in 
 case of
 using irq threaded handler)

 So how about supporting tasklet first, then convert to interrupt
 threaded handler
 after usb mass storage transfer is rewritten without performance 
 loss?
 (rewriting
 usb mass storage transfer handler may need some time and work since 
 storage
 stability/correctness is extremely important, :-)
   
Maybe we should simply copy what the networking people do.  They are
very concerned about performance and latency; whatever technique they
use should be good for USB too.
  
   Yes, but for old-style usb-storage, is this really a big deal?  We are
   still easily hitting the line-speed of USB for usb-storage with simple
   machines, the bottlenecks that I'm seeing are in the devices themselves,
   and then in the USB wire speed.
 
  What about with USB-3 storage devices?  Many of them still use the
  bulk-only transport instead of UAS.  They may push the limits up.
 
  Are they really?  Have we seen that happen yet?  With the number's I've

 Yes, the device I am testing is bulk-only, no uas support , and it is very
 popular in market.

  seen published, we are easily serving up enough data to keep the pipe
  full, but that all depends on your CPU / host controller.
 
   Once hardware comes out that uses USB streams, and we get device support
   for the UAS protocol, then we might have a need to change things, but at
   this point in time, for the old driver, I think we are fine.
  
   Unless someone has a workload / benchmark that shows otherwise?
 
  The test results above show a 2.4% degradation for threaded interrupts
  as compared to tasklets.  That's in addition to the bottlenecks caused
  by the device; no doubt it would be worse for a faster device.  This
  result calls into question the benefits of threaded interrupts.
 
  The main reason for moving away from the current scheme is to reduce
  latency for other interrupt handlers.  Ming gave two examples of slow
  USB code that runs in hardirq context now; with his change they would
  run in softirq context and therefore wouldn't delay other interrupts so
  much.  (Interrupt latency is hard to measure, however.)
 
  Yes, I know that people keep wanting to worry about latency issues, and
  the best answer for them has always been, don't use USB. :)

 I think we can do it better, why don't do it? :-)

 Because of other issues, that have been brought up here already.

 But if you can do it without affecting others, that's fine.

It won't affect others. As discussed in the thread, we can do the change
with following steps:

1), move URB giveback from hard irq handler to tasklet, but call complete()
with local IRQs disabled;

2), cleanup current drivers which call 'spin_lock' in compele() by
replacing spin_lock() with spin_lock_irqrestore(), and change the API of
complete() callback by claiming it called with local IRQs enabled, but bh is
disabled.

3), enable local IRQs before calling complete().

There is no good reason for complete() to run with interrupt disabled, as
discussed, so the API change and driver cleanup still makes sense.

In fact, only the below two cases requires the 2nd step's change:

- drivers-complete() acquire a subsystem wide lock which may be called
in another hard irq context, and the subsystem wide lock is acquired by
spin_lock() in complete()

- drivers-complete() hold a private lock with spin_lock() but may export
APIs to let other drivers acquire the private lock in its interrupt handler.

Looks both two 

Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-14 Thread Ming Lei
On Fri, Jun 14, 2013 at 10:56 PM, Alan Stern st...@rowland.harvard.edu wrote:
 On Fri, 14 Jun 2013, Ming Lei wrote:

 On Fri, Jun 14, 2013 at 3:41 AM, Alan Stern st...@rowland.harvard.edu 
 wrote:
 
  The main reason for moving away from the current scheme is to reduce
  latency for other interrupt handlers.  Ming gave two examples of slow
  USB code that runs in hardirq context now; with his change they would
  run in softirq context and therefore wouldn't delay other interrupts so
  much.  (Interrupt latency is hard to measure, however.)

 With the two trace points of irq_handler_entry and irq_handler_exit, the
 interrupt latency(or the time taken by hard irq handler) isn't hard to 
 measure.
 One simple script can figure out the average/maximum latency for one irq
 handler, like I did in 4/4.

 But that doesn't measure the time between when the IRQ request is
 issued and when irq_handler_entry runs.

That might be hard to measure, also I am wondering if the time can be
measured only by software, but these patches only focus on the time
between HCD irq entry and irq exit.


Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-14 Thread Alan Stern
On Fri, 14 Jun 2013, Ming Lei wrote:

  With the two trace points of irq_handler_entry and irq_handler_exit, the
  interrupt latency(or the time taken by hard irq handler) isn't hard to 
  measure.
  One simple script can figure out the average/maximum latency for one irq
  handler, like I did in 4/4.
 
  But that doesn't measure the time between when the IRQ request is
  issued and when irq_handler_entry runs.
 
 That might be hard to measure, also I am wondering if the time can be
 measured only by software, but these patches only focus on the time
 between HCD irq entry and irq exit.

Not entirely.  On a UP system, leaving interrupts disabled for a long
time (which is what we do now) increases the delay between when the IRQ
is raised and when it is serviced.  On an SMP system, a long-running 
interrupt handler will delay servicing a different device that shares 
the same IRQ line.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-14 Thread Thomas Gleixner
On Fri, 14 Jun 2013, Alan Stern wrote:

 On Fri, 14 Jun 2013, Ming Lei wrote:
 
   With the two trace points of irq_handler_entry and irq_handler_exit, the
   interrupt latency(or the time taken by hard irq handler) isn't hard to 
   measure.
   One simple script can figure out the average/maximum latency for one irq
   handler, like I did in 4/4.
  
   But that doesn't measure the time between when the IRQ request is
   issued and when irq_handler_entry runs.
  
  That might be hard to measure, also I am wondering if the time can be
  measured only by software, but these patches only focus on the time
  between HCD irq entry and irq exit.
 
 Not entirely.  On a UP system, leaving interrupts disabled for a long
 time (which is what we do now) increases the delay between when the IRQ
 is raised and when it is serviced.  On an SMP system, a long-running 
 interrupt handler will delay servicing a different device that shares 
 the same IRQ line.
 
And on UP it delays ALL other interrupts. I've seen 500us+ caused by
the USB interrupt handlers...

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-14 Thread Ming Lei
On Sat, Jun 15, 2013 at 4:23 AM, Alan Stern st...@rowland.harvard.edu wrote:

 Not entirely.  On a UP system, leaving interrupts disabled for a long
 time (which is what we do now) increases the delay between when the IRQ
 is raised and when it is serviced.  On an SMP system, a long-running

Yes, I mean the HCD IRQ handling time is already too long, and it
isn't affected by the time between raising and servicing the IRQ.

 interrupt handler will delay servicing a different device that shares
 the same IRQ line.

Not always so.

Currently, ARM can only set one irq line to be served by one of CPU
at the same time for power saving, which still results in above situation.
In fact, without irq-balance, on ARM, all IRQs are handled by CPU0 only.

On Sat, Jun 15, 2013 at 5:09 AM, Thomas Gleixner t...@linutronix.de wrote:
 And on UP it delays ALL other interrupts. I've seen 500us+ caused by
 the USB interrupt handlers...

On SMP the above case may be worse than UP, when the same completion
things(from hw view) happen on one of CPU, the releasing  reacquiring HCD
private lock in interrupt handler may cause longer time.


Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-13 Thread Alan Stern
On Thu, 13 Jun 2013, Ming Lei wrote:

 - using interrupt threaded handler(default)
 33.440 MB/sec
 
 - using tasklet(#undef USB_HCD_THREADED_IRQ)
 34.29 MB/sec
 
 - using hard interrupt handler(by removing HCD_BH in ehci-hcd.c )
 34.260 MB/s
 
 
 So looks usb mass storage performance loss can be observed with
 interrupt threaded handler because one mass storage read/write sectors
 requires at least 3 interrupts which wake up usb-storage thread 3 times
 (each interrupt wakeup the usb-storage each time), introducing irq threaded
 handler will make 2 threads to be waken up about 6 times for one read/write.
 
 I think usb mass storage transfer handler need to be rewritten, otherwise
 it may become worsen after using irq threaded handler in USB 3.0.(the
 above device can reach 120MB/sec with hardware handler or tasklet handler,
 which means about ~3K interrupts/sec, so ~6K contexts switch in case of
 using irq threaded handler)
 
 So how about supporting tasklet first, then convert to interrupt
 threaded handler
 after usb mass storage transfer is rewritten without performance loss?
 (rewriting
 usb mass storage transfer handler may need some time and work since storage
 stability/correctness is extremely important, :-)

Maybe we should simply copy what the networking people do.  They are 
very concerned about performance and latency; whatever technique they 
use should be good for USB too.

 Also another problem with irq threaded handler is that there is no sort of
 tasklet_schedule() interface to wakeup the thread handler manually, so
 I have to use work to schedule some URB giveback from drivers(root hub
 transfer, unlink),  even though that isn't a big deal but will cause code a 
 bit
 much/complicated, :-)

Yes, I was going to bring that up.

Thomas, sometimes we need the IRQ handler thread to do some work even
though an interrupt hasn't occurred.  Is there an API for this, or can 
one be added?

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-13 Thread Alan Stern
On Thu, 13 Jun 2013, Ming Lei wrote:

  For reasonably sophisticated host controllers like EHCI, the delay in
  responding to an interrupt doesn't matter much.  But for simpler host
 
 EHCI might still benefit from the change: suppose uvc video is playing on
 one EHCI bus, and mass storage's performance over another EHCI may
 decrease because of the IRQ latency introduced.

If multiple CPUs are involved, then we could see an improvement in the 
case where the two EHCI controllers share the same IRQ line.  With 
different IRQs, though, everything would run in parallel on different 
CPUs, so there is no advantage.

  At this point we are talking about multiple changes:
 
  Moving the givebacks to a tasklet or threaded handler.
 
  Changing the USB completion handlers so that they can be called
  with interrupts enabled.
 
  Allowing the tasklet/thread to run with interrupts enabled.
 
  Moving more of the HCD processing into the tasklet/thread.
 
  Maybe other things too.  In principle, the first and second items can
  be done simultaneously.
 
 Very good summery, we can push the 1st change with disabling local IRQ
 when calling complete(), so that at least DMA unmapping can be done
 with IRQ enabled, and at the same time do the API change, which
 should be a bit slow but the mechanical way proposed by Oliver may
 be OK.
 
 The 3rd item is easy once the 1st two items are completed.
 
 For the 4th one, it might be a long term thing, since refactoring
 HCD interrupt handler is a bit complicated. Also, when the 1st
 three items are completed, hard interrupt handler takes less time,
 often less than 20us at average about EHCI, so I am wondering
 if the 4th is worthy.

Yes, I don't know about that.  If the time spent in the hardirq
processing is reasonably short then moving it to the threaded handler
won't help very much.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-13 Thread Greg Kroah-Hartman
On Thu, Jun 13, 2013 at 10:54:13AM -0400, Alan Stern wrote:
 On Thu, 13 Jun 2013, Ming Lei wrote:
 
  - using interrupt threaded handler(default)
  33.440 MB/sec
  
  - using tasklet(#undef USB_HCD_THREADED_IRQ)
  34.29 MB/sec
  
  - using hard interrupt handler(by removing HCD_BH in ehci-hcd.c )
  34.260 MB/s
  
  
  So looks usb mass storage performance loss can be observed with
  interrupt threaded handler because one mass storage read/write sectors
  requires at least 3 interrupts which wake up usb-storage thread 3 times
  (each interrupt wakeup the usb-storage each time), introducing irq threaded
  handler will make 2 threads to be waken up about 6 times for one read/write.
  
  I think usb mass storage transfer handler need to be rewritten, otherwise
  it may become worsen after using irq threaded handler in USB 3.0.(the
  above device can reach 120MB/sec with hardware handler or tasklet handler,
  which means about ~3K interrupts/sec, so ~6K contexts switch in case of
  using irq threaded handler)
  
  So how about supporting tasklet first, then convert to interrupt
  threaded handler
  after usb mass storage transfer is rewritten without performance loss?
  (rewriting
  usb mass storage transfer handler may need some time and work since storage
  stability/correctness is extremely important, :-)
 
 Maybe we should simply copy what the networking people do.  They are 
 very concerned about performance and latency; whatever technique they 
 use should be good for USB too.

Yes, but for old-style usb-storage, is this really a big deal?  We are
still easily hitting the line-speed of USB for usb-storage with simple
machines, the bottlenecks that I'm seeing are in the devices themselves,
and then in the USB wire speed.

Once hardware comes out that uses USB streams, and we get device support
for the UAS protocol, then we might have a need to change things, but at
this point in time, for the old driver, I think we are fine.

Unless someone has a workload / benchmark that shows otherwise?

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-13 Thread Alan Stern
On Thu, 13 Jun 2013, Greg Kroah-Hartman wrote:

 On Thu, Jun 13, 2013 at 10:54:13AM -0400, Alan Stern wrote:
  On Thu, 13 Jun 2013, Ming Lei wrote:
  
   - using interrupt threaded handler(default)
   33.440 MB/sec
   
   - using tasklet(#undef USB_HCD_THREADED_IRQ)
   34.29 MB/sec
   
   - using hard interrupt handler(by removing HCD_BH in ehci-hcd.c )
   34.260 MB/s
   
   
   So looks usb mass storage performance loss can be observed with
   interrupt threaded handler because one mass storage read/write sectors
   requires at least 3 interrupts which wake up usb-storage thread 3 times
   (each interrupt wakeup the usb-storage each time), introducing irq 
   threaded
   handler will make 2 threads to be waken up about 6 times for one 
   read/write.
   
   I think usb mass storage transfer handler need to be rewritten, otherwise
   it may become worsen after using irq threaded handler in USB 3.0.(the
   above device can reach 120MB/sec with hardware handler or tasklet 
   handler,
   which means about ~3K interrupts/sec, so ~6K contexts switch in case of
   using irq threaded handler)
   
   So how about supporting tasklet first, then convert to interrupt
   threaded handler
   after usb mass storage transfer is rewritten without performance loss?
   (rewriting
   usb mass storage transfer handler may need some time and work since 
   storage
   stability/correctness is extremely important, :-)
  
  Maybe we should simply copy what the networking people do.  They are 
  very concerned about performance and latency; whatever technique they 
  use should be good for USB too.
 
 Yes, but for old-style usb-storage, is this really a big deal?  We are
 still easily hitting the line-speed of USB for usb-storage with simple
 machines, the bottlenecks that I'm seeing are in the devices themselves,
 and then in the USB wire speed.

What about with USB-3 storage devices?  Many of them still use the
bulk-only transport instead of UAS.  They may push the limits up.

 Once hardware comes out that uses USB streams, and we get device support
 for the UAS protocol, then we might have a need to change things, but at
 this point in time, for the old driver, I think we are fine.
 
 Unless someone has a workload / benchmark that shows otherwise?

The test results above show a 2.4% degradation for threaded interrupts 
as compared to tasklets.  That's in addition to the bottlenecks caused 
by the device; no doubt it would be worse for a faster device.  This 
result calls into question the benefits of threaded interrupts.

The main reason for moving away from the current scheme is to reduce
latency for other interrupt handlers.  Ming gave two examples of slow
USB code that runs in hardirq context now; with his change they would
run in softirq context and therefore wouldn't delay other interrupts so
much.  (Interrupt latency is hard to measure, however.)

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-13 Thread Steven Rostedt
On Thu, 2013-06-13 at 15:41 -0400, Alan Stern wrote:

 The test results above show a 2.4% degradation for threaded interrupts 
 as compared to tasklets.  That's in addition to the bottlenecks caused 
 by the device; no doubt it would be worse for a faster device.  This 
 result calls into question the benefits of threaded interrupts.

That's because it was written like a top half and not a full blown
interrupt. I just looked at the patch, and saw this:

+#ifndef USB_HCD_THREADED_IRQ
if (sched) {
if (async)
tasklet_schedule(bh-bh);
else
tasklet_hi_schedule(bh-bh);
}
+#else
+   if (sched)
+   schedule_work(hcd-periodic_bh-work);
+#endif

What is this? The work isn't done by an interrupt thread, but by work queues!

The point of the interrupt thread is that you do *all* the work that
needs to be done when an interrupt comes in. You don't need to delay the
work.

If you just treat a threaded interrupt like a real interrupt and push
off work to something else, then yes, it will degrade performance.

If you go the threaded interrupt route, you need to rethink the
paradigm. There's no reason that the interrupt handler needs to be fast
like it needs to be in true interrupt context. The handler can now use
mutexes, and other full features that currently only threads benefit
from. It should improve locking issues, and can serialize things if
needed.

All this patch did was to switch the main irq to a thread and make a
bottom half into a work queue.

Why couldn't you just do:

if (sched)
usb_giveback_urb_bh(bh);

?

-- Steve


--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-13 Thread Alan Stern
On Thu, 13 Jun 2013, Steven Rostedt wrote:

 On Thu, 2013-06-13 at 15:41 -0400, Alan Stern wrote:
 
  The test results above show a 2.4% degradation for threaded interrupts 
  as compared to tasklets.  That's in addition to the bottlenecks caused 
  by the device; no doubt it would be worse for a faster device.  This 
  result calls into question the benefits of threaded interrupts.
 
 That's because it was written like a top half and not a full blown
 interrupt. I just looked at the patch, and saw this:
 
 +#ifndef USB_HCD_THREADED_IRQ
 if (sched) {
 if (async)
 tasklet_schedule(bh-bh);
 else
 tasklet_hi_schedule(bh-bh);
 }
 +#else
 +   if (sched)
 +   schedule_work(hcd-periodic_bh-work);
 +#endif
 
 What is this? The work isn't done by an interrupt thread, but by work queues!

You don't understand the patch.  Most of the time, sched will be 0 
here and hence the work queue won't be involved.

Yes, part of the work is done by a work queue rather than the interrupt 
thread.  But it is an unimportant part, the part that involves 
transfers to root hubs or transfers that were cancelled.  These things 
can complete without any interrupt occurring, so they can't be handled 
by the interrupt thread.  However, they are the abnormal case; the 
transfers we care about are not to root hubs and they do complete 
normally.

 The point of the interrupt thread is that you do *all* the work that
 needs to be done when an interrupt comes in. You don't need to delay the
 work.

You've got it backward.  The patch doesn't leave part of the work 
undone when an interrupt occurs.  Rather it's the other way around -- 
sometimes work needs to be done when there isn't any interrupt.  This 
could happen in a timer callback, or it could happen as a direct result 
of a function call.

Since there doesn't seem to be any way to invoke the interrupt thread 
in the absence of an interrupt, Ming pushed the job off to a work 
queue.

 If you just treat a threaded interrupt like a real interrupt and push
 off work to something else, then yes, it will degrade performance.
 
 If you go the threaded interrupt route, you need to rethink the
 paradigm. There's no reason that the interrupt handler needs to be fast
 like it needs to be in true interrupt context. The handler can now use
 mutexes, and other full features that currently only threads benefit
 from. It should improve locking issues, and can serialize things if
 needed.
 
 All this patch did was to switch the main irq to a thread and make a
 bottom half into a work queue.

In case it's not clear, the code you quoted above is part of the 
interrupt handler, not part of the thread.

 Why couldn't you just do:
 
   if (sched)
   usb_giveback_urb_bh(bh);
 
 ?

Because usb_giveback_urb_bh() is supposed to run in the context of the
tasklet or interrupt thread or work queue, not in the context of the
interrupt handler.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-13 Thread Steven Rostedt
On Thu, 2013-06-13 at 17:09 -0400, Alan Stern wrote:
 On Thu, 13 Jun 2013, Steven Rostedt wrote:
 
  On Thu, 2013-06-13 at 15:41 -0400, Alan Stern wrote:
  
   The test results above show a 2.4% degradation for threaded interrupts 
   as compared to tasklets.  That's in addition to the bottlenecks caused 
   by the device; no doubt it would be worse for a faster device.  This 
   result calls into question the benefits of threaded interrupts.
  
  That's because it was written like a top half and not a full blown
  interrupt. I just looked at the patch, and saw this:
  
  +#ifndef USB_HCD_THREADED_IRQ
  if (sched) {
  if (async)
  tasklet_schedule(bh-bh);
  else
  tasklet_hi_schedule(bh-bh);
  }
  +#else
  +   if (sched)
  +   schedule_work(hcd-periodic_bh-work);
  +#endif
  
  What is this? The work isn't done by an interrupt thread, but by work 
  queues!
 
 You don't understand the patch.

Hey, I'll admit that I don't understand how USB works ;-)

I also only looked at the second patch without applying it. Thus, a lot
of the changes were out of context for me.


   Most of the time, sched will be 0 
 here and hence the work queue won't be involved.

OK, I only compared that current tasklets were being commented out, and
that's pretty much all I had to go on.

 
 Yes, part of the work is done by a work queue rather than the interrupt 
 thread.  But it is an unimportant part, the part that involves 
 transfers to root hubs or transfers that were cancelled.  These things 
 can complete without any interrupt occurring, so they can't be handled 
 by the interrupt thread.  However, they are the abnormal case; the 
 transfers we care about are not to root hubs and they do complete 
 normally.
 
  The point of the interrupt thread is that you do *all* the work that
  needs to be done when an interrupt comes in. You don't need to delay the
  work.
 
 You've got it backward.  The patch doesn't leave part of the work 
 undone when an interrupt occurs.  Rather it's the other way around -- 
 sometimes work needs to be done when there isn't any interrupt.  This 
 could happen in a timer callback, or it could happen as a direct result 
 of a function call.
 
 Since there doesn't seem to be any way to invoke the interrupt thread 
 in the absence of an interrupt, Ming pushed the job off to a work 
 queue.
 
  If you just treat a threaded interrupt like a real interrupt and push
  off work to something else, then yes, it will degrade performance.
  
  If you go the threaded interrupt route, you need to rethink the
  paradigm. There's no reason that the interrupt handler needs to be fast
  like it needs to be in true interrupt context. The handler can now use
  mutexes, and other full features that currently only threads benefit
  from. It should improve locking issues, and can serialize things if
  needed.
  
  All this patch did was to switch the main irq to a thread and make a
  bottom half into a work queue.
 
 In case it's not clear, the code you quoted above is part of the 
 interrupt handler, not part of the thread.

Got it.

 
  Why couldn't you just do:
  
  if (sched)
  usb_giveback_urb_bh(bh);
  
  ?
 
 Because usb_giveback_urb_bh() is supposed to run in the context of the
 tasklet or interrupt thread or work queue, not in the context of the
 interrupt handler.

I only took a quick look at the second patch. I'm now looking at both
patches applied to the code. I didn't realize this was called from the
top half.

Usually the top half for threaded interrupts is used just to quite the
interrupt line. Either by acknowledging the interrupt or by disabling
the device from sending more interrupts till the bottom half (thread)
can run. This looks to be doing a bit more than that.

I'll look a bit deeper at the patch, but this still doesn't look like a
typical threaded interrupt usage.

-- Steve


--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-13 Thread Alan Stern
On Thu, 13 Jun 2013, Steven Rostedt wrote:

 I only took a quick look at the second patch. I'm now looking at both
 patches applied to the code. I didn't realize this was called from the
 top half.
 
 Usually the top half for threaded interrupts is used just to quite the
 interrupt line. Either by acknowledging the interrupt or by disabling
 the device from sending more interrupts till the bottom half (thread)
 can run. This looks to be doing a bit more than that.

Yes, it does.  Ming left all the host controller processing in the top 
half.  The bottom half merely handles the completion callbacks.

One of the things we discussed during this email thread was the
possibility of moving _all_ the work into the threaded handler.  
That's not as easy to do, though; it requires significant modification
of the host controller driver.  And each controller driver would need
its own modifications, whereas with Ming's approach only the core needs
to be changed.

 I'll look a bit deeper at the patch, but this still doesn't look like a
 typical threaded interrupt usage.

I'll agree with that; it isn't typical.  Ming claims that the work
remaining in the top half often takes less than 20 us on average in his
tests.  Depending on the particular device, the work in the bottom half
can be very quick (little more than waking up a thread in the case of
usb-storage) or quite slow (perhaps on the order of a ms or more for
the UVC video driver on some ARM platforms).

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-13 Thread Greg Kroah-Hartman
On Thu, Jun 13, 2013 at 03:41:17PM -0400, Alan Stern wrote:
 On Thu, 13 Jun 2013, Greg Kroah-Hartman wrote:
 
  On Thu, Jun 13, 2013 at 10:54:13AM -0400, Alan Stern wrote:
   On Thu, 13 Jun 2013, Ming Lei wrote:
   
- using interrupt threaded handler(default)
33.440 MB/sec

- using tasklet(#undef USB_HCD_THREADED_IRQ)
34.29 MB/sec

- using hard interrupt handler(by removing HCD_BH in ehci-hcd.c )
34.260 MB/s


So looks usb mass storage performance loss can be observed with
interrupt threaded handler because one mass storage read/write sectors
requires at least 3 interrupts which wake up usb-storage thread 3 times
(each interrupt wakeup the usb-storage each time), introducing irq 
threaded
handler will make 2 threads to be waken up about 6 times for one 
read/write.

I think usb mass storage transfer handler need to be rewritten, 
otherwise
it may become worsen after using irq threaded handler in USB 3.0.(the
above device can reach 120MB/sec with hardware handler or tasklet 
handler,
which means about ~3K interrupts/sec, so ~6K contexts switch in case of
using irq threaded handler)

So how about supporting tasklet first, then convert to interrupt
threaded handler
after usb mass storage transfer is rewritten without performance loss?
(rewriting
usb mass storage transfer handler may need some time and work since 
storage
stability/correctness is extremely important, :-)
   
   Maybe we should simply copy what the networking people do.  They are 
   very concerned about performance and latency; whatever technique they 
   use should be good for USB too.
  
  Yes, but for old-style usb-storage, is this really a big deal?  We are
  still easily hitting the line-speed of USB for usb-storage with simple
  machines, the bottlenecks that I'm seeing are in the devices themselves,
  and then in the USB wire speed.
 
 What about with USB-3 storage devices?  Many of them still use the
 bulk-only transport instead of UAS.  They may push the limits up.

Are they really?  Have we seen that happen yet?  With the number's I've
seen published, we are easily serving up enough data to keep the pipe
full, but that all depends on your CPU / host controller.

  Once hardware comes out that uses USB streams, and we get device support
  for the UAS protocol, then we might have a need to change things, but at
  this point in time, for the old driver, I think we are fine.
  
  Unless someone has a workload / benchmark that shows otherwise?
 
 The test results above show a 2.4% degradation for threaded interrupts 
 as compared to tasklets.  That's in addition to the bottlenecks caused 
 by the device; no doubt it would be worse for a faster device.  This 
 result calls into question the benefits of threaded interrupts.
 
 The main reason for moving away from the current scheme is to reduce
 latency for other interrupt handlers.  Ming gave two examples of slow
 USB code that runs in hardirq context now; with his change they would
 run in softirq context and therefore wouldn't delay other interrupts so
 much.  (Interrupt latency is hard to measure, however.)

Yes, I know that people keep wanting to worry about latency issues, and
the best answer for them has always been, don't use USB. :)

You suffer throughput issues with predicitable latency dependancies, so
we need to be careful we don't slow down the 99% of the systems out
there that do not care about this at all.

greg k-h
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-13 Thread Ming Lei
On Fri, Jun 14, 2013 at 5:09 AM, Alan Stern st...@rowland.harvard.edu wrote:
 On Thu, 13 Jun 2013, Steven Rostedt wrote:

 On Thu, 2013-06-13 at 15:41 -0400, Alan Stern wrote:

  The test results above show a 2.4% degradation for threaded interrupts
  as compared to tasklets.  That's in addition to the bottlenecks caused
  by the device; no doubt it would be worse for a faster device.  This
  result calls into question the benefits of threaded interrupts.

 That's because it was written like a top half and not a full blown
 interrupt. I just looked at the patch, and saw this:

 +#ifndef USB_HCD_THREADED_IRQ
 if (sched) {
 if (async)
 tasklet_schedule(bh-bh);
 else
 tasklet_hi_schedule(bh-bh);
 }
 +#else
 +   if (sched)
 +   schedule_work(hcd-periodic_bh-work);
 +#endif

 What is this? The work isn't done by an interrupt thread, but by work queues!

 You don't understand the patch.  Most of the time, sched will be 0
 here and hence the work queue won't be involved.

 Yes, part of the work is done by a work queue rather than the interrupt
 thread.  But it is an unimportant part, the part that involves
 transfers to root hubs or transfers that were cancelled.  These things
 can complete without any interrupt occurring, so they can't be handled
 by the interrupt thread.  However, they are the abnormal case; the
 transfers we care about are not to root hubs and they do complete
 normally.

 The point of the interrupt thread is that you do *all* the work that
 needs to be done when an interrupt comes in. You don't need to delay the
 work.

 You've got it backward.  The patch doesn't leave part of the work
 undone when an interrupt occurs.  Rather it's the other way around --
 sometimes work needs to be done when there isn't any interrupt.  This
 could happen in a timer callback, or it could happen as a direct result
 of a function call.

 Since there doesn't seem to be any way to invoke the interrupt thread
 in the absence of an interrupt, Ming pushed the job off to a work
 queue.

 If you just treat a threaded interrupt like a real interrupt and push
 off work to something else, then yes, it will degrade performance.

 If you go the threaded interrupt route, you need to rethink the
 paradigm. There's no reason that the interrupt handler needs to be fast
 like it needs to be in true interrupt context. The handler can now use
 mutexes, and other full features that currently only threads benefit
 from. It should improve locking issues, and can serialize things if
 needed.

 All this patch did was to switch the main irq to a thread and make a
 bottom half into a work queue.

 In case it's not clear, the code you quoted above is part of the
 interrupt handler, not part of the thread.

 Why couldn't you just do:

   if (sched)
   usb_giveback_urb_bh(bh);

 ?

 Because usb_giveback_urb_bh() is supposed to run in the context of the
 tasklet or interrupt thread or work queue, not in the context of the
 interrupt handler.

Exactly, the code should have been the below shape:

#ifndef USB_HCD_THREADED_IRQ
  ..
#else
   if (!in_irq()) {
  bh = hcd-periodic_bh;
  sched = 1;
  }
#endif

Then it will be quite clear.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-13 Thread Ming Lei
On Fri, Jun 14, 2013 at 3:41 AM, Alan Stern st...@rowland.harvard.edu wrote:
 On Thu, 13 Jun 2013, Greg Kroah-Hartman wrote:

 On Thu, Jun 13, 2013 at 10:54:13AM -0400, Alan Stern wrote:
  On Thu, 13 Jun 2013, Ming Lei wrote:
 
   - using interrupt threaded handler(default)
   33.440 MB/sec
  
   - using tasklet(#undef USB_HCD_THREADED_IRQ)
   34.29 MB/sec
  
   - using hard interrupt handler(by removing HCD_BH in ehci-hcd.c )
   34.260 MB/s
  
  
   So looks usb mass storage performance loss can be observed with
   interrupt threaded handler because one mass storage read/write sectors
   requires at least 3 interrupts which wake up usb-storage thread 3 times
   (each interrupt wakeup the usb-storage each time), introducing irq 
   threaded
   handler will make 2 threads to be waken up about 6 times for one 
   read/write.
  
   I think usb mass storage transfer handler need to be rewritten, otherwise
   it may become worsen after using irq threaded handler in USB 3.0.(the
   above device can reach 120MB/sec with hardware handler or tasklet 
   handler,
   which means about ~3K interrupts/sec, so ~6K contexts switch in case of
   using irq threaded handler)
  
   So how about supporting tasklet first, then convert to interrupt
   threaded handler
   after usb mass storage transfer is rewritten without performance loss?
   (rewriting
   usb mass storage transfer handler may need some time and work since 
   storage
   stability/correctness is extremely important, :-)
 
  Maybe we should simply copy what the networking people do.  They are
  very concerned about performance and latency; whatever technique they
  use should be good for USB too.

 Yes, but for old-style usb-storage, is this really a big deal?  We are
 still easily hitting the line-speed of USB for usb-storage with simple
 machines, the bottlenecks that I'm seeing are in the devices themselves,
 and then in the USB wire speed.

 What about with USB-3 storage devices?  Many of them still use the
 bulk-only transport instead of UAS.  They may push the limits up.

Exactly, my test device(sandisk extreme USB 3.0 16G, 0781:5580) is very
popular, which is faster than most USB 3.0 pendrive in market, but the device
is bulk-only, and no UAS support, so I guess most of the USB 3.0 pendrive in
market still may not support UAS.


 Once hardware comes out that uses USB streams, and we get device support
 for the UAS protocol, then we might have a need to change things, but at
 this point in time, for the old driver, I think we are fine.

 Unless someone has a workload / benchmark that shows otherwise?

 The test results above show a 2.4% degradation for threaded interrupts
 as compared to tasklets.  That's in addition to the bottlenecks caused
 by the device; no doubt it would be worse for a faster device.  This
 result calls into question the benefits of threaded interrupts.

If I enable HCD_BH in xhci driver and enable requst_threaded_irq in xhci driver,
the degradation becomes 10%, see below test on the same device connected to
xhci-hcd:

[tom@board]$ps -ax | grep xhci
Warning: bad ps syntax, perhaps a bogus '-'? See http://procps.sf.net/faq.html
 4896 pts/1S+ 0:00 grep --color=auto xhci
[tom@board]$sudo ./ds-msg /dev/sdb 400M 1 4
No. 0, time 121 MB
No. 1, time 122 MB
No. 2, time 124 MB
No. 3, time 122 MB
count=4, total=489 ms, average=122.250 MB
[tom@board]$
[tom@board]$ps -ax | grep xhci
Warning: bad ps syntax, perhaps a bogus '-'? See http://procps.sf.net/faq.html
 6037 ?S  0:00 [irq/42-xhci_hcd]
 6038 ?S  0:00 [irq/43-xhci_hcd]
 6039 ?S  0:00 [irq/44-xhci_hcd]
 6040 ?S  0:00 [irq/45-xhci_hcd]
 6041 ?S  0:00 [irq/46-xhci_hcd]
 6304 pts/1S+ 0:00 grep --color=auto xhci
[tom@board]$
[tom@board]$
[tom@board]$sudo ./ds-msg /dev/sdb 400M 1 4
No. 0, time 107 MB
No. 1, time 108 MB
No. 2, time 108 MB
No. 3, time 109 MB
count=4, total=432 ms, average=108.000 MB

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-13 Thread Ming Lei
On Fri, Jun 14, 2013 at 8:35 AM, Greg Kroah-Hartman
gre...@linuxfoundation.org wrote:
 On Thu, Jun 13, 2013 at 03:41:17PM -0400, Alan Stern wrote:
 On Thu, 13 Jun 2013, Greg Kroah-Hartman wrote:

  On Thu, Jun 13, 2013 at 10:54:13AM -0400, Alan Stern wrote:
   On Thu, 13 Jun 2013, Ming Lei wrote:
  
- using interrupt threaded handler(default)
33.440 MB/sec
   
- using tasklet(#undef USB_HCD_THREADED_IRQ)
34.29 MB/sec
   
- using hard interrupt handler(by removing HCD_BH in ehci-hcd.c )
34.260 MB/s
   
   
So looks usb mass storage performance loss can be observed with
interrupt threaded handler because one mass storage read/write sectors
requires at least 3 interrupts which wake up usb-storage thread 3 times
(each interrupt wakeup the usb-storage each time), introducing irq 
threaded
handler will make 2 threads to be waken up about 6 times for one 
read/write.
   
I think usb mass storage transfer handler need to be rewritten, 
otherwise
it may become worsen after using irq threaded handler in USB 3.0.(the
above device can reach 120MB/sec with hardware handler or tasklet 
handler,
which means about ~3K interrupts/sec, so ~6K contexts switch in case of
using irq threaded handler)
   
So how about supporting tasklet first, then convert to interrupt
threaded handler
after usb mass storage transfer is rewritten without performance loss?
(rewriting
usb mass storage transfer handler may need some time and work since 
storage
stability/correctness is extremely important, :-)
  
   Maybe we should simply copy what the networking people do.  They are
   very concerned about performance and latency; whatever technique they
   use should be good for USB too.
 
  Yes, but for old-style usb-storage, is this really a big deal?  We are
  still easily hitting the line-speed of USB for usb-storage with simple
  machines, the bottlenecks that I'm seeing are in the devices themselves,
  and then in the USB wire speed.

 What about with USB-3 storage devices?  Many of them still use the
 bulk-only transport instead of UAS.  They may push the limits up.

 Are they really?  Have we seen that happen yet?  With the number's I've

Yes, the device I am testing is bulk-only, no uas support , and it is very
popular in market.

 seen published, we are easily serving up enough data to keep the pipe
 full, but that all depends on your CPU / host controller.

  Once hardware comes out that uses USB streams, and we get device support
  for the UAS protocol, then we might have a need to change things, but at
  this point in time, for the old driver, I think we are fine.
 
  Unless someone has a workload / benchmark that shows otherwise?

 The test results above show a 2.4% degradation for threaded interrupts
 as compared to tasklets.  That's in addition to the bottlenecks caused
 by the device; no doubt it would be worse for a faster device.  This
 result calls into question the benefits of threaded interrupts.

 The main reason for moving away from the current scheme is to reduce
 latency for other interrupt handlers.  Ming gave two examples of slow
 USB code that runs in hardirq context now; with his change they would
 run in softirq context and therefore wouldn't delay other interrupts so
 much.  (Interrupt latency is hard to measure, however.)

 Yes, I know that people keep wanting to worry about latency issues, and
 the best answer for them has always been, don't use USB. :)

I think we can do it better, why don't do it? :-)

 You suffer throughput issues with predicitable latency dependancies, so

This patchset don't cause throughout degradation but decrease latency much,
also has other advantages.

 we need to be careful we don't slow down the 99% of the systems out
 there that do not care about this at all.

Considered great amount of ARM devices in market, I think we need to
consider the problem on these devices, :-)

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-13 Thread Ming Lei
On Thu, Jun 13, 2013 at 10:54 PM, Alan Stern st...@rowland.harvard.edu wrote:

 Maybe we should simply copy what the networking people do.  They are
 very concerned about performance and latency; whatever technique they
 use should be good for USB too.

Most of net devices don't use interrupt threaded handler, looks we need to
copy that first, :-)

$git grep -n request_threaded_irq drivers/net/ | wc -l
9

$git grep -n request_threaded_irq drivers/net/wireless | wc -l
5

Also 5 of 9 using interrupt threaded handler are wireless network devices, which
are a bit slow, and only one ethernet driver uses it.

So I plan to use tasklet first, then we can switch to interrupt threaded handler
in the future if mass storage devices are OK with it.

If no one objects to use tasklet, I will post out the v1 patch for review later.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-13 Thread Ming Lei
On Fri, Jun 14, 2013 at 3:41 AM, Alan Stern st...@rowland.harvard.edu wrote:

 The main reason for moving away from the current scheme is to reduce
 latency for other interrupt handlers.  Ming gave two examples of slow
 USB code that runs in hardirq context now; with his change they would
 run in softirq context and therefore wouldn't delay other interrupts so
 much.  (Interrupt latency is hard to measure, however.)

With the two trace points of irq_handler_entry and irq_handler_exit, the
interrupt latency(or the time taken by hard irq handler) isn't hard to measure.
One simple script can figure out the average/maximum latency for one irq
handler, like I did in 4/4.


Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-12 Thread Ming Lei
On Wed, Jun 12, 2013 at 10:43 AM, Ming Lei ming@canonical.com wrote:
 On Wed, Jun 12, 2013 at 3:10 AM, Alan Stern st...@rowland.harvard.edu wrote:

 I don't follow your reasoning.  Consider the following situation, where
 the same spinlock is used in both a URB completion handler and an
 interrupt handler:

 URB completes
 A tasklet calls the completion handler, with interrupts enabled
 The completion handler does
 spin_lock(lock);
 An interrupt occurs
 The interrupt handler does
 spin_lock(lock);   // Deadlock!

 If you mean the interrupt handler is HCD irq handler of the device, no
 such problem since all complete() will run in tasklet context.

 If not, I am wondering why one USB driver need register another hard
 interrupt handler? Could you share such examples? I understand the
 case only exists with timer handler as discussed with Oliver, don't I?

In fact, timer funtion is still run in softirq context, so the deadlock won't
be caused with acquiring same lock(spin_lock) in both timer function
and complete() from tasklet.


Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-12 Thread Thomas Gleixner
On Wed, 12 Jun 2013, Ming Lei wrote:
 On Wed, Jun 12, 2013 at 3:10 AM, Alan Stern st...@rowland.harvard.edu wrote:
  Also the tasklet running in CPU0 can handle the work which should
  have been done by the same tasket scheduled in CPU1,  so we can
  avoid busy-waitting in CPU1 which may be introduced by tasklet_schedule()
  in CPU1.
 
  Or you could have a separate tasklet for each host controller.
 
 Yes, but I will compare tasklet with interrupt threaded handler first.

Yes, please. I read through the thread and I really recommend that you
get rid of the tasklet. tasklets are a complete disaster by design and
we really should make efforts to get rid of the last users instead of
trying to work around their semantical shortcomings.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-12 Thread Oliver Neukum
On Tuesday 11 June 2013 15:10:03 Alan Stern wrote:
 In order to prevent this from happening, you would have to change the
 spin_lock() call in the completion handler to spin_lock_irqsave().  
 Furthermore, you will have to audit every USB driver to make sure that
 all the completion handlers get fixed.

Yes. However, it can be done mechanically. And we know only
the handlers for complete need to be fixed.

Regards
Oliver

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-12 Thread Ming Lei
On Wed, Jun 12, 2013 at 5:11 PM, Oliver Neukum oli...@neukum.org wrote:
 On Tuesday 11 June 2013 15:10:03 Alan Stern wrote:
 In order to prevent this from happening, you would have to change the
 spin_lock() call in the completion handler to spin_lock_irqsave().
 Furthermore, you will have to audit every USB driver to make sure that
 all the completion handlers get fixed.

 Yes. However, it can be done mechanically. And we know only
 the handlers for complete need to be fixed.

I am wondering if the change is needed since timer function is still
run in softirq context instead of hard irq.

Looks Alan concerned that one USB interface driver may have another
hard interrupt handler involved. Is there such kind of USB driver/device
in tree?

Thanks,
--
Ming
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-12 Thread Oliver Neukum
On Wednesday 12 June 2013 18:11:34 Ming Lei wrote:
 On Wed, Jun 12, 2013 at 5:11 PM, Oliver Neukum oli...@neukum.org wrote:
  On Tuesday 11 June 2013 15:10:03 Alan Stern wrote:
  In order to prevent this from happening, you would have to change the
  spin_lock() call in the completion handler to spin_lock_irqsave().
  Furthermore, you will have to audit every USB driver to make sure that
  all the completion handlers get fixed.
 
  Yes. However, it can be done mechanically. And we know only
  the handlers for complete need to be fixed.
 
 I am wondering if the change is needed since timer function is still
 run in softirq context instead of hard irq.
 
 Looks Alan concerned that one USB interface driver may have another
 hard interrupt handler involved. Is there such kind of USB driver/device
 in tree?

No. Suppose a USB network driver.
The complete() handler is written on the assumption that interrupts are off.
So it takes a spinlock from the network subsystem. It does so with spin_lock()

Other network drivers also take the lock. And they may take it from an IRQ 
handler.
If such an IRQ interrupts the tasklet complete() is running in, the CPU will 
deadlock.

The danger is not interrupt handlers in the same driver but IRQ handlers of 
_other_
drivers (PCI, ...) a lock is shared with.

You need to go through all USB drivers and change every spin_lock() that goes
for a lock that is exported to a spin_lock_irqsave()

Regards
Oliver

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-12 Thread Ming Lei
On Wed, Jun 12, 2013 at 6:19 PM, Oliver Neukum oli...@neukum.org wrote:
 On Wednesday 12 June 2013 18:11:34 Ming Lei wrote:
 On Wed, Jun 12, 2013 at 5:11 PM, Oliver Neukum oli...@neukum.org wrote:
  On Tuesday 11 June 2013 15:10:03 Alan Stern wrote:
  In order to prevent this from happening, you would have to change the
  spin_lock() call in the completion handler to spin_lock_irqsave().
  Furthermore, you will have to audit every USB driver to make sure that
  all the completion handlers get fixed.
 
  Yes. However, it can be done mechanically. And we know only
  the handlers for complete need to be fixed.

 I am wondering if the change is needed since timer function is still
 run in softirq context instead of hard irq.

 Looks Alan concerned that one USB interface driver may have another
 hard interrupt handler involved. Is there such kind of USB driver/device
 in tree?

 No. Suppose a USB network driver.
 The complete() handler is written on the assumption that interrupts are off.
 So it takes a spinlock from the network subsystem. It does so with spin_lock()

Looks I misses the case, but IMO, it might not be very common, generally
subsystem provides API for drivers to do something(handle rx/report tx) inside
complete(), and seldom exports subsystem-wide lock directly to drivers.
API has no context info, and should have called spin_lock_irqrestore().


 Other network drivers also take the lock. And they may take it from an IRQ 
 handler.
 If such an IRQ interrupts the tasklet complete() is running in, the CPU will 
 deadlock.

Looks no usbnet drivers hold subsystem(network) locks in its complete().
Both the locks held are per device/per skb list. In usbnet_bh(), there
is one, eg.
netif_rx(), which is a API and disables local IRQ.


 The danger is not interrupt handlers in the same driver but IRQ handlers of 
 _other_
 drivers (PCI, ...) a lock is shared with.

Right, so generally drivers which spin_lock(subsystem lock) in complete()
might be affected.


 You need to go through all USB drivers and change every spin_lock() that goes
 for a lock that is exported to a spin_lock_irqsave()

Yes, that is the safest way.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-12 Thread Alan Stern
On Wed, 12 Jun 2013, Ming Lei wrote:

 If not, I am wondering why one USB driver need register another hard
 interrupt handler? Could you share such examples? I understand the
 case only exists with timer handler as discussed with Oliver, don't I?

Here's another possibility.  A USB driver has a private lock, and it
acquires this lock in its completion handler.  But it also acquires
this lock in other routines, and those other routines may be called by
other drivers or subsystems in interrupt context.

  The drivers _do_ need to be changed, as explained above.  And that
  explanation was only one failure scenario.  There may be others.
 
 OK, I'd like to know what the others are? :-)

Me too.  :-)

  And finally, USB HCDs themselves will benefit from the change, won't they?
 
  How will they benefit?  Merely from not releasing their private locks?
 
 The interrupt of one HCD will be responded lately since another HCD's
 interrupt handler takes very long time. With the change, the problem can
 be avoided.

For reasonably sophisticated host controllers like EHCI, the delay in
responding to an interrupt doesn't matter much.  But for simpler host
controllers (for example, those using PIO instead of DMA) it matters
more, so those other HCDs will benefit more from the conversion.

  Compared with usb-storage task's schedule latency, the tasklet
  schedule delay should be lower at most of situations since the tasklet
  is scheduled inside irq handler.
 
  True.  But the two latencies add.
 
 I think it should be one tasklet latency added, but what is the other one?

I meant that the latency in tasklet scheduling has to be added to the 
latency in scheduling the usb-storage thread.

  Or you could have a separate tasklet for each host controller.
 
 Yes, but I will compare tasklet with interrupt threaded handler first.

Switching to a threaded interrupt handler offers more possibilities.  
Instead of moving just the givebacks to the thread, we could put almost 
all the processing there.  (At least, we can for ehci-hcd.  Other HCDs 
may not be able to do this.)

At this point we are talking about multiple changes:

Moving the givebacks to a tasklet or threaded handler.

Changing the USB completion handlers so that they can be called 
with interrupts enabled.

Allowing the tasklet/thread to run with interrupts enabled.

Moving more of the HCD processing into the tasklet/thread.

Maybe other things too.  In principle, the first and second items can 
be done simultaneously.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-12 Thread Oliver Neukum
On Wednesday 12 June 2013 10:35:44 Alan Stern wrote:
 On Wed, 12 Jun 2013, Ming Lei wrote:
 
  If not, I am wondering why one USB driver need register another hard
  interrupt handler? Could you share such examples? I understand the
  case only exists with timer handler as discussed with Oliver, don't I?
 
 Here's another possibility.  A USB driver has a private lock, and it
 acquires this lock in its completion handler.  But it also acquires
 this lock in other routines, and those other routines may be called by
 other drivers or subsystems in interrupt context.

That is fatal. But again mechanically using _irqsave in complete()
does the job. No other place can be affected.

Regards
Oliver

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-12 Thread Ming Lei
On Wed, Jun 12, 2013 at 3:45 PM, Thomas Gleixner t...@linutronix.de wrote:
 On Wed, 12 Jun 2013, Ming Lei wrote:

 Yes, please. I read through the thread and I really recommend that you
 get rid of the tasklet. tasklets are a complete disaster by design and
 we really should make efforts to get rid of the last users instead of
 trying to work around their semantical shortcomings.

The attached patch001 supports tasklet in HCD and patch002 implements
interrupt threaded handler, then if USB_HCD_THREADED_IRQ
is set, interrupt threaded handler is used to do URB giveback, otherwise
tasklet is used to do that.

The other 3 patches(2/4, 3/4, 4/4) aren't changed now, if anyone want to try it.
(the patch 2 is only for comparing performance difference, and welcome to
review/comment it so that we can get accurate test result to make decision)

Follows the mass storage test result(average over 10 times test) with
tasklet /interrupt threaded handler/hard interrupt handler under  same
environment of lenovo T410(x86), which means the test is switched by
reinserting module of usbcore or ehci-hcd without changing other things
in the machine.

- do below tests 10 times and figure out the average speed

  dd if=/dev/sdN of=/dev/null iflag=direct bs=200M 1

  device: sandisk extreme USB 3.0 16G, host: Lenovo T410 EHCI

- using interrupt threaded handler(default)
33.440 MB/sec

- using tasklet(#undef USB_HCD_THREADED_IRQ)
34.29 MB/sec

- using hard interrupt handler(by removing HCD_BH in ehci-hcd.c )
34.260 MB/s


So looks usb mass storage performance loss can be observed with
interrupt threaded handler because one mass storage read/write sectors
requires at least 3 interrupts which wake up usb-storage thread 3 times
(each interrupt wakeup the usb-storage each time), introducing irq threaded
handler will make 2 threads to be waken up about 6 times for one read/write.

I think usb mass storage transfer handler need to be rewritten, otherwise
it may become worsen after using irq threaded handler in USB 3.0.(the
above device can reach 120MB/sec with hardware handler or tasklet handler,
which means about ~3K interrupts/sec, so ~6K contexts switch in case of
using irq threaded handler)

So how about supporting tasklet first, then convert to interrupt
threaded handler
after usb mass storage transfer is rewritten without performance loss?
(rewriting
usb mass storage transfer handler may need some time and work since storage
stability/correctness is extremely important, :-)

Also another problem with irq threaded handler is that there is no sort of
tasklet_schedule() interface to wakeup the thread handler manually, so
I have to use work to schedule some URB giveback from drivers(root hub
transfer, unlink),  even though that isn't a big deal but will cause code a bit
much/complicated, :-)

Thanks,
--
Ming Lei


0002-USB-support-interrupt-threaded-handler.patch
Description: Binary data


0001-USB-HCD-support-giveback-of-URB-in-tasklet-context.patch
Description: Binary data


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-12 Thread Ming Lei
On Wed, Jun 12, 2013 at 10:35 PM, Alan Stern st...@rowland.harvard.edu wrote:
 On Wed, 12 Jun 2013, Ming Lei wrote:

 If not, I am wondering why one USB driver need register another hard
 interrupt handler? Could you share such examples? I understand the
 case only exists with timer handler as discussed with Oliver, don't I?

 Here's another possibility.  A USB driver has a private lock, and it
 acquires this lock in its completion handler.  But it also acquires
 this lock in other routines, and those other routines may be called by
 other drivers or subsystems in interrupt context.

OK, so the problem is more complicated than I imaged at start, :-(


  The drivers _do_ need to be changed, as explained above.  And that
  explanation was only one failure scenario.  There may be others.

 OK, I'd like to know what the others are? :-)

 Me too.  :-)

  And finally, USB HCDs themselves will benefit from the change, won't they?
 
  How will they benefit?  Merely from not releasing their private locks?

 The interrupt of one HCD will be responded lately since another HCD's
 interrupt handler takes very long time. With the change, the problem can
 be avoided.

 For reasonably sophisticated host controllers like EHCI, the delay in
 responding to an interrupt doesn't matter much.  But for simpler host

EHCI might still benefit from the change: suppose uvc video is playing on
one EHCI bus, and mass storage's performance over another EHCI may
decrease because of the IRQ latency introduced.

 controllers (for example, those using PIO instead of DMA) it matters
 more, so those other HCDs will benefit more from the conversion.

Indeed, musb and ehci/ohci are coexisted on OMAP3/4/5.

  Compared with usb-storage task's schedule latency, the tasklet
  schedule delay should be lower at most of situations since the tasklet
  is scheduled inside irq handler.
 
  True.  But the two latencies add.

 I think it should be one tasklet latency added, but what is the other one?

 I meant that the latency in tasklet scheduling has to be added to the
 latency in scheduling the usb-storage thread.

  Or you could have a separate tasklet for each host controller.

 Yes, but I will compare tasklet with interrupt threaded handler first.

 Switching to a threaded interrupt handler offers more possibilities.
 Instead of moving just the givebacks to the thread, we could put almost
 all the processing there.  (At least, we can for ehci-hcd.  Other HCDs
 may not be able to do this.)

 At this point we are talking about multiple changes:

 Moving the givebacks to a tasklet or threaded handler.

 Changing the USB completion handlers so that they can be called
 with interrupts enabled.

 Allowing the tasklet/thread to run with interrupts enabled.

 Moving more of the HCD processing into the tasklet/thread.

 Maybe other things too.  In principle, the first and second items can
 be done simultaneously.

Very good summery, we can push the 1st change with disabling local IRQ
when calling complete(), so that at least DMA unmapping can be done
with IRQ enabled, and at the same time do the API change, which
should be a bit slow but the mechanical way proposed by Oliver may
be OK.

The 3rd item is easy once the 1st two items are completed.

For the 4th one, it might be a long term thing, since refactoring
HCD interrupt handler is a bit complicated. Also, when the 1st
three items are completed, hard interrupt handler takes less time,
often less than 20us at average about EHCI, so I am wondering
if the 4th is worthy.


Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-11 Thread Ming Lei
On Tue, Jun 11, 2013 at 4:51 AM, Alan Stern st...@rowland.harvard.edu wrote:
 On Mon, 10 Jun 2013, Alan Stern wrote:

  Tasklet doesn't disable local interrupts.

 It is documented that interrupts will be disabled while the completion
 handler runs.  Therefore the tasklet _must_ disable local interrupts.

 You know, it may be that you can get most of the advantages you want by
 enabling local interrupts around the call to unmap_urb_for_dma() in
 usb_hcd_giveback_urb().

No, please don't enable IRQs inside interrupt handler, and you will
get warning dump.

Except for unmap_urb_for_dma() in usb_hcd_giveback_urb(),  I hope
map_urb_for_dma() inside complete() can benefit from the change too,
since map_urb_for_dma() is same time-consuming with unmap_urb_for_dma().

Also I hope complete() can be run with interrupt enabled since
driver's complete() may take long time on some ARCH, as I mentioned
in my last mail. And it isn't good to disable interrupt only for one interface
driver, looks I still don't get real good explanation about disabling
IRQs here, :-)


 This may be a little dangerous, though, because it is possible for an
 URB to be given back at the time it is submitted.  Drivers may not
 expect interrupts to get enabled temporarily when they call
 usb_submit_urb().

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-11 Thread Oliver Neukum
On Tuesday 11 June 2013 13:40:20 Ming Lei wrote:
 On Tue, Jun 11, 2013 at 1:36 AM, Alan Stern st...@rowland.harvard.edu wrote:
  On Mon, 10 Jun 2013, Ming Lei wrote:

 If complete() callback runs in one tasklet context, spin_lock() inside
 complete() is enough, just like hard irq, the tasklet itself is disabled
 during complete(), if the percpu tasklet is converted to single tasklet.
 So no problem when the lock is to protect shared resources between
 complete().

We also have exclusion between complete() and other contexts, i.e. timers.

 When the lock is to protect shared resources between complete() and
 non-IRQ context, currently spin_lock_irqsave() is used in non-IRQ
 context, which is enough to prevent tasklet from being run on the CPU,
 so no problem for this situation too.
 
 When all HCDs support to run URB giveback in tasklet context, we can
 change all spin_lock_irq*() to spin_lock() in drivers URB-complete(), and
 in other places, the spin_lock_irq*() can be changed to spin_lock_bh().

Even now we cannot guarantee that all calls to complete() are in irq.
There is the case of HCD hotunplug and other cases, like timeouts.
They will have to be verified.

Regards
Oliver

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-11 Thread Ming Lei
On Tue, Jun 11, 2013 at 3:18 PM, Oliver Neukum oli...@neukum.org wrote:
 On Tuesday 11 June 2013 13:40:20 Ming Lei wrote:
 On Tue, Jun 11, 2013 at 1:36 AM, Alan Stern st...@rowland.harvard.edu 
 wrote:
  On Mon, 10 Jun 2013, Ming Lei wrote:

 If complete() callback runs in one tasklet context, spin_lock() inside
 complete() is enough, just like hard irq, the tasklet itself is disabled
 during complete(), if the percpu tasklet is converted to single tasklet.
 So no problem when the lock is to protect shared resources between
 complete().

 We also have exclusion between complete() and other contexts, i.e. timers.

The patch also converts the complete() called in timers to tasklet context too.

So could you let me know if that eases your concern? If not, could you explain
your concern about other contexts in a bit detail?


 When the lock is to protect shared resources between complete() and
 non-IRQ context, currently spin_lock_irqsave() is used in non-IRQ
 context, which is enough to prevent tasklet from being run on the CPU,
 so no problem for this situation too.

 When all HCDs support to run URB giveback in tasklet context, we can
 change all spin_lock_irq*() to spin_lock() in drivers URB-complete(), and
 in other places, the spin_lock_irq*() can be changed to spin_lock_bh().

 Even now we cannot guarantee that all calls to complete() are in irq.
 There is the case of HCD hotunplug and other cases, like timeouts.
 They will have to be verified.

All URBs are completed via usb_hcd_giveback_urb(), so there should
be no differences between these cases and the common one about
introducing the patchset.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-11 Thread Ming Lei
On Tue, Jun 11, 2013 at 4:54 AM, Steven Rostedt rost...@goodmis.org wrote:
 On Mon, 2013-06-10 at 16:47 -0400, Alan Stern wrote:
 On Mon, 10 Jun 2013, Steven Rostedt wrote:

and why so many drivers
are using tasklet/softirq?
 
  Because it's easy to set up and device driver authors don't know any
  better ;-). Note, a lot of drivers are now using work queues today,
  which run as a task.
 
  Yes, there's a little more overhead with a task than for a softirq, but
  the problem with softirqs and tasklets is that they can't be preempted,
  and they are more important than all tasks on the system. If you have a
  task that is critical, it must yield for your softirq. Almost!
 
  That is, even if you have a softirq, it may not run in irq context at
  all. If you get too many softirqs at a time (one comes while another one
  is running), it will defer the processing to the ksoftirq thread. This
  not only runs as a task, but also runs as SCHED_OTHER, and must yield to
  other tasks like everyone else too.
 
  Thus, adding it as a softirq does not guarantee that it will be
  processed quickly. It just means that most of the time it will prevent
  anything else from happening while your most important handler in the
  world is running.

 From this, it sounds like you generally advise using threaded interrupt
 handlers rather than tasklets/softirqs.

 Yes, there's plenty of benefits for them, and I highly doubt that any
 USB device would even notice the difference between a softirq and a
 thread for response time latencies.

Steven, thanks for your input.

IMO, about the problem, the most important point between threaded interrupt
handler and tasklet is the latency.

For USB video/audio application, the data need to be handled very quickly.
Also new transfer need to be scheduled without much latency. I am wondering
if interrupt thread handler can respond quickly enough if there is high load on
CPUs.

For USB mass storage driver, it still need to be handled quickly.

If tasklet is taken, tasklet_schedule() is always called from hard interrupt
context, so most of time the latency should be better than interrupt thread
handler latency since tasklet handler is called just after irq handler exits in
the situation.  Also we can avoid to do tasklet_schedule when the tasklet
is being handled.

I will compare, collect and post out some latency data later for the sake of
choosing which one to handle URB giveback.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-11 Thread Oliver Neukum
On Tuesday 11 June 2013 16:14:25 Ming Lei wrote:
 On Tue, Jun 11, 2013 at 3:18 PM, Oliver Neukum oli...@neukum.org wrote:
  On Tuesday 11 June 2013 13:40:20 Ming Lei wrote:
  On Tue, Jun 11, 2013 at 1:36 AM, Alan Stern st...@rowland.harvard.edu 
  wrote:
   On Mon, 10 Jun 2013, Ming Lei wrote:
 
  If complete() callback runs in one tasklet context, spin_lock() inside
  complete() is enough, just like hard irq, the tasklet itself is disabled
  during complete(), if the percpu tasklet is converted to single tasklet.
  So no problem when the lock is to protect shared resources between
  complete().
 
  We also have exclusion between complete() and other contexts, i.e. timers.
 
 The patch also converts the complete() called in timers to tasklet context 
 too.
 
 So could you let me know if that eases your concern? If not, could you explain
 your concern about other contexts in a bit detail?

The driver itself may have submitted a timer and race against it.
What locking do you need in complete() and a timer to lock against
each other?

  Even now we cannot guarantee that all calls to complete() are in irq.
  There is the case of HCD hotunplug and other cases, like timeouts.
  They will have to be verified.
 
 All URBs are completed via usb_hcd_giveback_urb(), so there should
 be no differences between these cases and the common one about
 introducing the patchset.

But it makes no sense to go to a tasklet when you are already in task context.
In those cases you need to do something, essentially blocking the tasklet.

Regards
Oliver


--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-11 Thread Ming Lei
On Tue, Jun 11, 2013 at 4:49 PM, Oliver Neukum oli...@neukum.org wrote:
 On Tuesday 11 June 2013 16:14:25 Ming Lei wrote:
 On Tue, Jun 11, 2013 at 3:18 PM, Oliver Neukum oli...@neukum.org wrote:
  On Tuesday 11 June 2013 13:40:20 Ming Lei wrote:
  On Tue, Jun 11, 2013 at 1:36 AM, Alan Stern st...@rowland.harvard.edu 
  wrote:
   On Mon, 10 Jun 2013, Ming Lei wrote:
 
  If complete() callback runs in one tasklet context, spin_lock() inside
  complete() is enough, just like hard irq, the tasklet itself is disabled
  during complete(), if the percpu tasklet is converted to single tasklet.
  So no problem when the lock is to protect shared resources between
  complete().
 
  We also have exclusion between complete() and other contexts, i.e. timers.

 The patch also converts the complete() called in timers to tasklet context 
 too.

 So could you let me know if that eases your concern? If not, could you 
 explain
 your concern about other contexts in a bit detail?

 The driver itself may have submitted a timer and race against it.
 What locking do you need in complete() and a timer to lock against
 each other?

Good catch.

The problem will come if only spin_lock() is called inside complete(),
I will check main USB drivers in tree to see if there is such use case.


  Even now we cannot guarantee that all calls to complete() are in irq.
  There is the case of HCD hotunplug and other cases, like timeouts.
  They will have to be verified.

 All URBs are completed via usb_hcd_giveback_urb(), so there should
 be no differences between these cases and the common one about
 introducing the patchset.

 But it makes no sense to go to a tasklet when you are already in task context.
 In those cases you need to do something, essentially blocking the tasklet.

At least now, always doing complete() in tasklet handler can simplify
implementation since these cases aren't in hot path.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-11 Thread Oliver Neukum
On Tuesday 11 June 2013 17:27:28 Ming Lei wrote:
 On Tue, Jun 11, 2013 at 4:49 PM, Oliver Neukum oli...@neukum.org wrote:
  On Tuesday 11 June 2013 16:14:25 Ming Lei wrote:

  The driver itself may have submitted a timer and race against it.
  What locking do you need in complete() and a timer to lock against
  each other?
 
 Good catch.
 
 The problem will come if only spin_lock() is called inside complete(),
 I will check main USB drivers in tree to see if there is such use case.

All network drivers race against timeout. I think they just unlink the URB,
but there's a lot of them.

  But it makes no sense to go to a tasklet when you are already in task 
  context.
  In those cases you need to do something, essentially blocking the tasklet.
 
 At least now, always doing complete() in tasklet handler can simplify
 implementation since these cases aren't in hot path.

Well, I am afraid this is not simply the case. These cases are partially
synchronous. For example you need to make sure all calls to complete()
are finished before you disconnect a HCD itself. The same applies to a device
being disconnected.

It the same area, what happens if an URB is unlinked between the irq handler
and the tasklet?

Regards
Oliver

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-11 Thread Ming Lei
On Tue, Jun 11, 2013 at 6:51 PM, Oliver Neukum oli...@neukum.org wrote:
 On Tuesday 11 June 2013 17:27:28 Ming Lei wrote:
 On Tue, Jun 11, 2013 at 4:49 PM, Oliver Neukum oli...@neukum.org wrote:
  On Tuesday 11 June 2013 16:14:25 Ming Lei wrote:

  The driver itself may have submitted a timer and race against it.
  What locking do you need in complete() and a timer to lock against
  each other?

 Good catch.

 The problem will come if only spin_lock() is called inside complete(),
 I will check main USB drivers in tree to see if there is such use case.

 All network drivers race against timeout. I think they just unlink the URB,
 but there's a lot of them.

usbnet is fine since no spin_lock() is used in complete() and the same lock
is acquired in timer handler.

As far as I think of, the problem only exists in the below situation:

complete():
 spin_lock(A)
 ...
 spin_unlock(A)

timer handler():
spin_lock(A)
 
spin_unlock(A)

And we need to replace spin_lock() in complete() with spin_lock_irqsave()
if the change is going to be introduced.


  But it makes no sense to go to a tasklet when you are already in task 
  context.
  In those cases you need to do something, essentially blocking the tasklet.

 At least now, always doing complete() in tasklet handler can simplify
 implementation since these cases aren't in hot path.

 Well, I am afraid this is not simply the case. These cases are partially
 synchronous. For example you need to make sure all calls to complete()
 are finished before you disconnect a HCD itself. The same applies to a device
 being disconnected.

That is fine since the coming giveback() in tasklet context will drop the URB's
reference count(urb-use_count) finally if the scheduled tasklet can't
be dropped.
(looks tasklet schedule can make sure that)


 It the same area, what happens if an URB is unlinked between the irq handler
 and the tasklet?

The unlink will return failure since the URB isn't in queue of ep-urb_list.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-11 Thread Alan Stern
On Tue, 11 Jun 2013, Ming Lei wrote:

  Is there any good reason to do URB giveback with interrupt disabled?
 
  That's a good question.  Offhand I can't think of any drivers that rely
  on this -- although there certainly are places where callback routines
  use spin_lock() rather than spin_lock_irq() or spin_lock_irqsave(),
  because they know that interrupts are already disabled.
 
 Complete() may take long time, for example in UVC's driver, URB's
 complete() may take more than 3 ms, as reported in below link:
 
   http://marc.info/?t=136438111600010r=1w=2
 
 Also complete() is provided by interface driver, and it does many driver
 specific works, so it isn't good to disable interrupt only for one driver.

You probably mean it isn't good to disable interrupts for the sake of
only one driver.  As things stand now, we disable interrupts for all 
callbacks in all drivers.

 If complete() callback runs in one tasklet context, spin_lock() inside
 complete() is enough, just like hard irq, the tasklet itself is disabled
 during complete(), if the percpu tasklet is converted to single tasklet.
 So no problem when the lock is to protect shared resources between
 complete().
 
 When the lock is to protect shared resources between complete() and
 non-IRQ context, currently spin_lock_irqsave() is used in non-IRQ
 context, which is enough to prevent tasklet from being run on the CPU,
 so no problem for this situation too.
 
 When all HCDs support to run URB giveback in tasklet context, we can
 change all spin_lock_irq*() to spin_lock() in drivers URB-complete(), and
 in other places, the spin_lock_irq*() can be changed to spin_lock_bh().

I don't follow your reasoning.  Consider the following situation, where
the same spinlock is used in both a URB completion handler and an
interrupt handler:

URB completes
A tasklet calls the completion handler, with interrupts enabled
The completion handler does
spin_lock(lock);
An interrupt occurs
The interrupt handler does
spin_lock(lock);   // Deadlock!

In order to prevent this from happening, you would have to change the
spin_lock() call in the completion handler to spin_lock_irqsave().  
Furthermore, you will have to audit every USB driver to make sure that
all the completion handlers get fixed.

  However, changing a documented API is not something to be done lightly.
  You would have to audit _every_ USB driver to make sure no trouble
  would arise.
 
 OK, I will write patch to request for changing the API if the improvement
 on the situation isn't objected.

I don't mind.  But don't be surprised if you end up breaking some
drivers.

 In fact, at least now, the change on API is only about document change, and
 no drivers' change is required, isn't it?  We can describe that 
 URB-complete()
 might run in hard irq or softirq context, depends on HCD_BH flag of
 hcd-driver-flags.

The drivers _do_ need to be changed, as explained above.  And that
explanation was only one failure scenario.  There may be others.

 Even with current code, one HCD's interrupt handling still may delay
 the handling for another HCD interrupt handling for quite long, that is
 the motivation to decrease the interrupt handling time of USB HCD, ;-)
 And finally, USB HCDs themselves will benefit from the change, won't they?

How will they benefit?  Merely from not releasing their private locks?  
That involves a fairly small amount of code.

  For async transfer, generally, it should have no effect since there should
  have URBs queued on the qh queue before submitting URB.
 
  That's not how the Bulk-Only mass-storage protocol works.  There are
  times when the protocol _requires_ bulk packets not to be submitted
  until a particular URB has completed.
 
 Yes, I agree. But mass-storage driver itself is very fragile, it will perform
 badly if CPU has a higher load.

Why does that make it fragile?  _Every_ driver and program will behave
worse when the system is under a heavy load.

  And it is better to submit DATA/CSW
 transfer in previous transfer's complete(), IMO.

Actually, it would be even better to submit DATA _before_ the CBW
completes, and to submit the CSW before submitting DATA-OUT.  
Unfortunately, the design of the SG library makes it impossible to
submit the CSW during DATA-IN completion.

 Compared with usb-storage task's schedule latency, the tasklet
 schedule delay should be lower at most of situations since the tasklet
 is scheduled inside irq handler.

True.  But the two latencies add.

 So I decided to replace the percpu tasklet with one single tasklet,
 which can avoid the problem.

Hmmm.  What happens when there is more than one host controller?  The 
tasklet will give back only one URB at a time, and it won't run on more 
than one CPU at a time.  The existing drivers allow URBs to be given 
back on multiple CPUs simultaneously.

 Also the tasklet running in CPU0 can handle the work which should
 have 

Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-11 Thread Ming Lei
On Wed, Jun 12, 2013 at 3:10 AM, Alan Stern st...@rowland.harvard.edu wrote:
 On Tue, 11 Jun 2013, Ming Lei wrote:

  Is there any good reason to do URB giveback with interrupt disabled?
 
  That's a good question.  Offhand I can't think of any drivers that rely
  on this -- although there certainly are places where callback routines
  use spin_lock() rather than spin_lock_irq() or spin_lock_irqsave(),
  because they know that interrupts are already disabled.

 Complete() may take long time, for example in UVC's driver, URB's
 complete() may take more than 3 ms, as reported in below link:

   http://marc.info/?t=136438111600010r=1w=2

 Also complete() is provided by interface driver, and it does many driver
 specific works, so it isn't good to disable interrupt only for one driver.

 You probably mean it isn't good to disable interrupts for the sake of
 only one driver.  As things stand now, we disable interrupts for all
 callbacks in all drivers.

 If complete() callback runs in one tasklet context, spin_lock() inside
 complete() is enough, just like hard irq, the tasklet itself is disabled
 during complete(), if the percpu tasklet is converted to single tasklet.
 So no problem when the lock is to protect shared resources between
 complete().

 When the lock is to protect shared resources between complete() and
 non-IRQ context, currently spin_lock_irqsave() is used in non-IRQ
 context, which is enough to prevent tasklet from being run on the CPU,
 so no problem for this situation too.

 When all HCDs support to run URB giveback in tasklet context, we can
 change all spin_lock_irq*() to spin_lock() in drivers URB-complete(), and
 in other places, the spin_lock_irq*() can be changed to spin_lock_bh().

 I don't follow your reasoning.  Consider the following situation, where
 the same spinlock is used in both a URB completion handler and an
 interrupt handler:

 URB completes
 A tasklet calls the completion handler, with interrupts enabled
 The completion handler does
 spin_lock(lock);
 An interrupt occurs
 The interrupt handler does
 spin_lock(lock);   // Deadlock!

If you mean the interrupt handler is HCD irq handler of the device, no
such problem since all complete() will run in tasklet context.

If not, I am wondering why one USB driver need register another hard
interrupt handler? Could you share such examples? I understand the
case only exists with timer handler as discussed with Oliver, don't I?


 In order to prevent this from happening, you would have to change the
 spin_lock() call in the completion handler to spin_lock_irqsave().
 Furthermore, you will have to audit every USB driver to make sure that
 all the completion handlers get fixed.

I have written one script to search usb drivers which may use timers and
spin_lock() at the same time, about 30 drivers are found, and looks we
can fix them.


  However, changing a documented API is not something to be done lightly.
  You would have to audit _every_ USB driver to make sure no trouble
  would arise.

 OK, I will write patch to request for changing the API if the improvement
 on the situation isn't objected.

 I don't mind.  But don't be surprised if you end up breaking some
 drivers.

Sure, we should change these drivers before changing the API.


 In fact, at least now, the change on API is only about document change, and
 no drivers' change is required, isn't it?  We can describe that 
 URB-complete()
 might run in hard irq or softirq context, depends on HCD_BH flag of
 hcd-driver-flags.

 The drivers _do_ need to be changed, as explained above.  And that
 explanation was only one failure scenario.  There may be others.

OK, I'd like to know what the others are? :-)


 Even with current code, one HCD's interrupt handling still may delay
 the handling for another HCD interrupt handling for quite long, that is
 the motivation to decrease the interrupt handling time of USB HCD, ;-)
 And finally, USB HCDs themselves will benefit from the change, won't they?

 How will they benefit?  Merely from not releasing their private locks?

The interrupt of one HCD will be responded lately since another HCD's
interrupt handler takes very long time. With the change, the problem can
be avoided.

 That involves a fairly small amount of code.

Maybe no much code, but :

1) Inside interrupt handler no submitting and unlinking requests from
drivers can happen any more, so interrupt handler should have became
simple.

2), Also the race between irq handler and related timer handler needn't
to be considered because the private lock can't be released in irq handler.


  For async transfer, generally, it should have no effect since there should
  have URBs queued on the qh queue before submitting URB.
 
  That's not how the Bulk-Only mass-storage protocol works.  There are
  times when the protocol _requires_ bulk packets not to be submitted
  until a particular URB has completed.

 Yes, 

Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-10 Thread Ming Lei
On Sun, Jun 9, 2013 at 11:48 PM, Alan Stern st...@rowland.harvard.edu wrote:
 On Sun, 9 Jun 2013, Ming Lei wrote:

 Hi,

 The patchset supports to run giveback of URB in tasklet context, so that
 DMA unmapping/mapping on transfer buffer and compelte() callback can be
 run with interrupt enabled, then time of HCD interrupt handler can be
 saved much. Also this approach may simplify HCD since HCD lock needn't
 be released any more before calling usb_hcd_giveback_urb().

 That's a lot of material.  However, you haven't specifically said
 _what_ you want to accomplish.  It sounds like you're trying to

All IRQs are disabled(locally) in hard interrupt context, but they are
enabled in softirq context, this patchset can decrease the time a lot.

 minimize the amount of time spent in hardirq context, but I can't be
 sure that is really what you want.

 If it is, this is not a good way to do it.  A better way to minimize
 the time spent in hardirq context is to use a threaded interrupt
 handler.  But why do you want to minimize this time in the first place?

Process context switch may have more cost than switching to softirq,
then USB performance may be effected, that is why I choose to use
tasklet.  Also now there is no any sleep in URB giveback, so it isn't
necessary to introduce process to handle the giveback or completion.

 The CPU will be still have to use at least as much time as before; the
 only difference is that it will be in softirq or in a high-priority
 thread instead of in hardirq.

Yes, as I said above, but we can allow other IRQs to be served in
handling URB giveback in softirq context, that is the value of softirq/tasklet.
Of course, system may have more important things to do during the time.
Generally speaking, for one driver, hard interrupt handler should always
consume less time as possible.


 Also, I don't see how you can avoid releasing the HCD's private lock
 before calling usb_hcd_giveback_urb().  When the completion handler

Please see patch 1/4, :-)

 tries to resubmit the URB, the HCD's enqueue routine will try to
 reacquire the private lock.  If it didn't release the lock earlier, it

Yes, but with the patchset, resubmitting is called in taskelet context,
instead of hardirq context, so there isn't any recursion any more.


Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-10 Thread Alan Stern
[Thomas and Steve, please see the question below.]

On Mon, 10 Jun 2013, Ming Lei wrote:

 On Sun, Jun 9, 2013 at 11:48 PM, Alan Stern st...@rowland.harvard.edu wrote:
  On Sun, 9 Jun 2013, Ming Lei wrote:
 
  Hi,
 
  The patchset supports to run giveback of URB in tasklet context, so that
  DMA unmapping/mapping on transfer buffer and compelte() callback can be
  run with interrupt enabled, then time of HCD interrupt handler can be
  saved much. Also this approach may simplify HCD since HCD lock needn't
  be released any more before calling usb_hcd_giveback_urb().
 
  That's a lot of material.  However, you haven't specifically said
  _what_ you want to accomplish.  It sounds like you're trying to
 
 All IRQs are disabled(locally) in hard interrupt context, but they are
 enabled in softirq context, this patchset can decrease the time a lot.

That isn't clear.  It is documented that USB completion handlers are 
called with local interrupts disabled.  An IRQ arriving before the 
tasklet starts up might therefore be serviced during the short interval 
before the tasklet disables local interrupts, but if that happens it 
would mean that the completion handler would be delayed.

In effect, you are prioritizing other IRQs higher than USB completion
handlers.  Nevertheless, the total time spent with interrupts disabled
will remain the same.

(There's one other side effect that perhaps you haven't considered.  
When multiple URBs are addressed to the same endpoint, their completion 
handlers are called in order of URB completion, which is the same as 
the order of URB submission unless an URB is cancelled.  By delegating 
the completion handlers to tasklets, and particularly by using per-CPU 
tasklets, you may violate this behavior.)

  minimize the amount of time spent in hardirq context, but I can't be
  sure that is really what you want.
 
  If it is, this is not a good way to do it.  A better way to minimize
  the time spent in hardirq context is to use a threaded interrupt
  handler.  But why do you want to minimize this time in the first place?
 
 Process context switch may have more cost than switching to softirq,
 then USB performance may be effected, that is why I choose to use
 tasklet.  Also now there is no any sleep in URB giveback, so it isn't
 necessary to introduce process to handle the giveback or completion.

Thomas and Steve, can you comment on this?  I do not have a good 
understanding of the relative benefits/drawbacks of tasklets vs. 
threaded interrupt handlers.

  The CPU will be still have to use at least as much time as before; the
  only difference is that it will be in softirq or in a high-priority
  thread instead of in hardirq.
 
 Yes, as I said above, but we can allow other IRQs to be served in
 handling URB giveback in softirq context, that is the value of 
 softirq/tasklet.
 Of course, system may have more important things to do during the time.
 Generally speaking, for one driver, hard interrupt handler should always
 consume less time as possible.

As I understand it, the tradeoff between hardirq and softirq is
generally unimportant.  It does matter a lot in realtime settings --
but the RT kernel effectively makes _every_ interrupt handler threaded,
so it won't benefit from this change.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-10 Thread Oliver Neukum
On Monday 10 June 2013 10:03:00 Alan Stern wrote:
 [Thomas and Steve, please see the question below.]
 
 On Mon, 10 Jun 2013, Ming Lei wrote:
 

 That isn't clear.  It is documented that USB completion handlers are 
 called with local interrupts disabled.  An IRQ arriving before the 
 tasklet starts up might therefore be serviced during the short interval 
 before the tasklet disables local interrupts, but if that happens it 
 would mean that the completion handler would be delayed.

That is what tasklets do by definition, isn't it?
 
 In effect, you are prioritizing other IRQs higher than USB completion
 handlers.  Nevertheless, the total time spent with interrupts disabled
 will remain the same.

It pobably slightly increases. You have colder caches twice.
And potentially you swich CPUs.
 
 (There's one other side effect that perhaps you haven't considered.  
 When multiple URBs are addressed to the same endpoint, their completion 
 handlers are called in order of URB completion, which is the same as 
 the order of URB submission unless an URB is cancelled.  By delegating 
 the completion handlers to tasklets, and particularly by using per-CPU 
 tasklets, you may violate this behavior.)

This is quite serious. It mustn't happen.

Regards
Oliver

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-10 Thread Alan Stern
On Mon, 10 Jun 2013, Oliver Neukum wrote:

 On Monday 10 June 2013 10:03:00 Alan Stern wrote:
  [Thomas and Steve, please see the question below.]
  
  On Mon, 10 Jun 2013, Ming Lei wrote:
  
 
  That isn't clear.  It is documented that USB completion handlers are 
  called with local interrupts disabled.  An IRQ arriving before the 
  tasklet starts up might therefore be serviced during the short interval 
  before the tasklet disables local interrupts, but if that happens it 
  would mean that the completion handler would be delayed.
 
 That is what tasklets do by definition, isn't it?

Yes.  Although tasklets in general have no reason to leave interrupts 
disabled.

  In effect, you are prioritizing other IRQs higher than USB completion
  handlers.  Nevertheless, the total time spent with interrupts disabled
  will remain the same.
 
 It pobably slightly increases. You have colder caches twice.
 And potentially you swich CPUs.

Actually, the situation is a little better in one respect, which was
pointed out earlier but not emphasized.  The DMA unmapping can be
deferred to the tasklet, and it can run with interrupts enabled before
the completion hanlder is called.  Since the unmapping sometimes takes
a while to run, this is an advantage.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-10 Thread Ming Lei
On Mon, Jun 10, 2013 at 10:03 PM, Alan Stern st...@rowland.harvard.edu wrote:
 [Thomas and Steve, please see the question below.]

 On Mon, 10 Jun 2013, Ming Lei wrote:

 On Sun, Jun 9, 2013 at 11:48 PM, Alan Stern st...@rowland.harvard.edu 
 wrote:
  On Sun, 9 Jun 2013, Ming Lei wrote:
 
  Hi,
 
  The patchset supports to run giveback of URB in tasklet context, so that
  DMA unmapping/mapping on transfer buffer and compelte() callback can be
  run with interrupt enabled, then time of HCD interrupt handler can be
  saved much. Also this approach may simplify HCD since HCD lock needn't
  be released any more before calling usb_hcd_giveback_urb().
 
  That's a lot of material.  However, you haven't specifically said
  _what_ you want to accomplish.  It sounds like you're trying to

 All IRQs are disabled(locally) in hard interrupt context, but they are
 enabled in softirq context, this patchset can decrease the time a lot.

 That isn't clear.  It is documented that USB completion handlers are
 called with local interrupts disabled.  An IRQ arriving before the

Is there any good reason to do URB giveback with interrupt disabled?

IMO, disabling IRQs isn't necessary for completing URB since the
complete() of URBs for same endpoint is reentrant.

 tasklet starts up might therefore be serviced during the short interval
 before the tasklet disables local interrupts, but if that happens it

Tasklet doesn't disable local interrupts.

 would mean that the completion handler would be delayed.

Yes, the tasklet schedule delay is introduced to URB completion, but
the delay doesn't have much side effect about completing URB.

For async transfer, generally, it should have no effect since there should
have URBs queued on the qh queue before submitting URB.

For periodic transfer, the patch uses high priority tasklet to schedule it.


 In effect, you are prioritizing other IRQs higher than USB completion
 handlers.

Both other IRQs and HCD's IRQ are prioritizing the URB completion.

 Nevertheless, the total time spent with interrupts disabled
 will remain the same.

No, the total time spent with interrupts disabled does not remain same,
since no local IRQs are disabled during URB giveback anymore.


 (There's one other side effect that perhaps you haven't considered.
 When multiple URBs are addressed to the same endpoint, their completion
 handlers are called in order of URB completion, which is the same as
 the order of URB submission unless an URB is cancelled.  By delegating
 the completion handlers to tasklets, and particularly by using per-CPU
 tasklets, you may violate this behavior.)

Even though URB giveback is handled by hard interrupt handler, it still can't
guarantee that 100%,  please see below:

- interrupt for URB A comes in CPU 0, and handled, then HCD private
lock is released, and usb_hcd_giveback_urb() is called to complete URB A

- interrupt for URB B comes in CPU 1 just before CPU0 releases HCD private
lock, so CPU1 holds the private lock when CPU 0 releases the lock and
handles the HCD interrupt.

So now just like two CPUs are racing, if CPU0 wins, URB A is completed
first, otherwise URB B is completed first.

With introducing completing URB in percpu tasklet, the situation is
just very similar.

On ehci hcd, it can make sure that by always letting CPU0 complete all URBs
(use ehci-scanning/ehci-need_rescan flag), but the patch can change percpu
tasklet to tasklet to meet the demand.


  minimize the amount of time spent in hardirq context, but I can't be
  sure that is really what you want.
 
  If it is, this is not a good way to do it.  A better way to minimize
  the time spent in hardirq context is to use a threaded interrupt
  handler.  But why do you want to minimize this time in the first place?

 Process context switch may have more cost than switching to softirq,
 then USB performance may be effected, that is why I choose to use
 tasklet.  Also now there is no any sleep in URB giveback, so it isn't
 necessary to introduce process to handle the giveback or completion.

 Thomas and Steve, can you comment on this?  I do not have a good
 understanding of the relative benefits/drawbacks of tasklets vs.
 threaded interrupt handlers.

  The CPU will be still have to use at least as much time as before; the
  only difference is that it will be in softirq or in a high-priority
  thread instead of in hardirq.

 Yes, as I said above, but we can allow other IRQs to be served in
 handling URB giveback in softirq context, that is the value of 
 softirq/tasklet.
 Of course, system may have more important things to do during the time.
 Generally speaking, for one driver, hard interrupt handler should always
 consume less time as possible.

 As I understand it, the tradeoff between hardirq and softirq is
 generally unimportant.  It does matter a lot in realtime settings --

I don't think so, and obviously softirq allows IRQs to be served quickly,
otherwise why is softirq and tasklet introduced? and why so many 

Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-10 Thread Alan Stern
On Mon, 10 Jun 2013, Ming Lei wrote:

  That isn't clear.  It is documented that USB completion handlers are
  called with local interrupts disabled.  An IRQ arriving before the
 
 Is there any good reason to do URB giveback with interrupt disabled?

That's a good question.  Offhand I can't think of any drivers that rely
on this -- although there certainly are places where callback routines
use spin_lock() rather than spin_lock_irq() or spin_lock_irqsave(),
because they know that interrupts are already disabled.

However, changing a documented API is not something to be done lightly.  
You would have to audit _every_ USB driver to make sure no trouble
would arise.

 IMO, disabling IRQs isn't necessary for completing URB since the
 complete() of URBs for same endpoint is reentrant.

Whether it is _necessary_ isn't the point.  It is _documented_, and 
therefore it cannot be changed easily.

  tasklet starts up might therefore be serviced during the short interval
  before the tasklet disables local interrupts, but if that happens it
 
 Tasklet doesn't disable local interrupts.

It is documented that interrupts will be disabled while the completion 
handler runs.  Therefore the tasklet _must_ disable local interrupts.

  would mean that the completion handler would be delayed.
 
 Yes, the tasklet schedule delay is introduced to URB completion, but
 the delay doesn't have much side effect about completing URB.

What about delays that occur because a separate interrupt is serviced
before the URB's completion handler is called?  With the current code
that can't happen, but with your scheme it can.

 For async transfer, generally, it should have no effect since there should
 have URBs queued on the qh queue before submitting URB.

That's not how the Bulk-Only mass-storage protocol works.  There are
times when the protocol _requires_ bulk packets not to be submitted
until a particular URB has completed.

Since mass-storage is an important use case for USB, its requirements 
should not be ignored.

  In effect, you are prioritizing other IRQs higher than USB completion
  handlers.
 
 Both other IRQs and HCD's IRQ are prioritizing the URB completion.

I don't understand that sentence.  Prioritizing means assigning a 
priority to.  IRQs and HCDs don't assign priorities to anything; 
priorities are assigned by human programmers.

  Nevertheless, the total time spent with interrupts disabled
  will remain the same.
 
 No, the total time spent with interrupts disabled does not remain same,
 since no local IRQs are disabled during URB giveback anymore.

That is a bug.  Local IRQs must be disabled during URB giveback.

  (There's one other side effect that perhaps you haven't considered.
  When multiple URBs are addressed to the same endpoint, their completion
  handlers are called in order of URB completion, which is the same as
  the order of URB submission unless an URB is cancelled.  By delegating
  the completion handlers to tasklets, and particularly by using per-CPU
  tasklets, you may violate this behavior.)
 
 Even though URB giveback is handled by hard interrupt handler, it still can't
 guarantee that 100%,  please see below:
 
 - interrupt for URB A comes in CPU 0, and handled, then HCD private
 lock is released, and usb_hcd_giveback_urb() is called to complete URB A
 
 - interrupt for URB B comes in CPU 1 just before CPU0 releases HCD private
 lock, so CPU1 holds the private lock when CPU 0 releases the lock and
 handles the HCD interrupt.
 
 So now just like two CPUs are racing, if CPU0 wins, URB A is completed
 first, otherwise URB B is completed first.

Although you didn't say so, I assume you mean that A and B are URBs for 
the same endpoint.  This means they use the same host controller and 
hence they use the same IRQ line.

When an interrupt is handled, it is not possible for a second interrupt
to arrive on the same IRQ line before the handler for the first 
interrupt returns.  The kernel disables the IRQ line when the first 
interrupt occurs, and keeps it disabled until all the handlers are 
finished.  Therefore the scenario you described cannot occur.

 With introducing completing URB in percpu tasklet, the situation is
 just very similar.

No, it isn't, for the reason given above.

 On ehci hcd, it can make sure that by always letting CPU0 complete all URBs
 (use ehci-scanning/ehci-need_rescan flag), but the patch can change percpu
 tasklet to tasklet to meet the demand.

Or perhaps you could assign each endpoint to a particular tasklet.

  As I understand it, the tradeoff between hardirq and softirq is
  generally unimportant.  It does matter a lot in realtime settings --
 
 I don't think so, and obviously softirq allows IRQs to be served quickly,
 otherwise why is softirq and tasklet introduced? and why so many drivers
 are using tasklet/softirq?

This is part of what I would like to hear Thomas and Steve comment on.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe 

Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-10 Thread Steven Rostedt
On Mon, 2013-06-10 at 13:36 -0400, Alan Stern wrote:

  I don't think so, and obviously softirq allows IRQs to be served quickly,
  otherwise why is softirq and tasklet introduced?

Because of history. In the old days there were top halves (hard irq) and
bottom halves (like a softirq). The thing about a bottom half was, no
two could run at the same time. That is, if one CPU was running a bottom
half, another CPU would have to wait for that one to complete before it
could run any bottom half. The killer here is that it didn't matter if
it was the same bottom half or not! Microsoft/Mindcraft was kind enough
to point out this inefficiency with some benchmarks showing how horrible
Linux ran on a 4 way box. Thanks to them, softirqs and tasklets were
born :-)

A softirq does the rest of the interrupt work with interrupts enabled.

A tasklet is run from softirq context, but the difference between a
softirq and tasklet is that the same tasklet can not run on two
different CPUs at the same time. It acts similar to a task, as a task
can not run on two different CPUs at the same time either, hence the
name tasklet. But tasklets are better than bottom halves because it
allows for different tasklets to run on different CPUs.



  and why so many drivers
  are using tasklet/softirq?

Because it's easy to set up and device driver authors don't know any
better ;-). Note, a lot of drivers are now using work queues today,
which run as a task.

Yes, there's a little more overhead with a task than for a softirq, but
the problem with softirqs and tasklets is that they can't be preempted,
and they are more important than all tasks on the system. If you have a
task that is critical, it must yield for your softirq. Almost!

That is, even if you have a softirq, it may not run in irq context at
all. If you get too many softirqs at a time (one comes while another one
is running), it will defer the processing to the ksoftirq thread. This
not only runs as a task, but also runs as SCHED_OTHER, and must yield to
other tasks like everyone else too.

Thus, adding it as a softirq does not guarantee that it will be
processed quickly. It just means that most of the time it will prevent
anything else from happening while your most important handler in the
world is running.

-- Steve


--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-10 Thread Alan Stern
On Mon, 10 Jun 2013, Alan Stern wrote:

  Tasklet doesn't disable local interrupts.
 
 It is documented that interrupts will be disabled while the completion 
 handler runs.  Therefore the tasklet _must_ disable local interrupts.

You know, it may be that you can get most of the advantages you want by 
enabling local interrupts around the call to unmap_urb_for_dma() in 
usb_hcd_giveback_urb().

This may be a little dangerous, though, because it is possible for an
URB to be given back at the time it is submitted.  Drivers may not 
expect interrupts to get enabled temporarily when they call 
usb_submit_urb().

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-10 Thread Steven Rostedt
On Mon, 2013-06-10 at 16:47 -0400, Alan Stern wrote:
 On Mon, 10 Jun 2013, Steven Rostedt wrote:
 
and why so many drivers
are using tasklet/softirq?
  
  Because it's easy to set up and device driver authors don't know any
  better ;-). Note, a lot of drivers are now using work queues today,
  which run as a task.
  
  Yes, there's a little more overhead with a task than for a softirq, but
  the problem with softirqs and tasklets is that they can't be preempted,
  and they are more important than all tasks on the system. If you have a
  task that is critical, it must yield for your softirq. Almost!
  
  That is, even if you have a softirq, it may not run in irq context at
  all. If you get too many softirqs at a time (one comes while another one
  is running), it will defer the processing to the ksoftirq thread. This
  not only runs as a task, but also runs as SCHED_OTHER, and must yield to
  other tasks like everyone else too.
  
  Thus, adding it as a softirq does not guarantee that it will be
  processed quickly. It just means that most of the time it will prevent
  anything else from happening while your most important handler in the
  world is running.
 
 From this, it sounds like you generally advise using threaded interrupt 
 handlers rather than tasklets/softirqs.

Yes, there's plenty of benefits for them, and I highly doubt that any
USB device would even notice the difference between a softirq and a
thread for response time latencies.

-- Steve



--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-10 Thread Ming Lei
On Tue, Jun 11, 2013 at 1:36 AM, Alan Stern st...@rowland.harvard.edu wrote:
 On Mon, 10 Jun 2013, Ming Lei wrote:

  That isn't clear.  It is documented that USB completion handlers are
  called with local interrupts disabled.  An IRQ arriving before the

 Is there any good reason to do URB giveback with interrupt disabled?

 That's a good question.  Offhand I can't think of any drivers that rely
 on this -- although there certainly are places where callback routines
 use spin_lock() rather than spin_lock_irq() or spin_lock_irqsave(),
 because they know that interrupts are already disabled.

Complete() may take long time, for example in UVC's driver, URB's
complete() may take more than 3 ms, as reported in below link:

  http://marc.info/?t=136438111600010r=1w=2

Also complete() is provided by interface driver, and it does many driver
specific works, so it isn't good to disable interrupt only for one driver.

If complete() callback runs in one tasklet context, spin_lock() inside
complete() is enough, just like hard irq, the tasklet itself is disabled
during complete(), if the percpu tasklet is converted to single tasklet.
So no problem when the lock is to protect shared resources between
complete().

When the lock is to protect shared resources between complete() and
non-IRQ context, currently spin_lock_irqsave() is used in non-IRQ
context, which is enough to prevent tasklet from being run on the CPU,
so no problem for this situation too.

When all HCDs support to run URB giveback in tasklet context, we can
change all spin_lock_irq*() to spin_lock() in drivers URB-complete(), and
in other places, the spin_lock_irq*() can be changed to spin_lock_bh().


 However, changing a documented API is not something to be done lightly.
 You would have to audit _every_ USB driver to make sure no trouble
 would arise.

OK, I will write patch to request for changing the API if the improvement
on the situation isn't objected.

In fact, at least now, the change on API is only about document change, and
no drivers' change is required, isn't it?  We can describe that URB-complete()
might run in hard irq or softirq context, depends on HCD_BH flag of
hcd-driver-flags.


 IMO, disabling IRQs isn't necessary for completing URB since the
 complete() of URBs for same endpoint is reentrant.

 Whether it is _necessary_ isn't the point.  It is _documented_, and
 therefore it cannot be changed easily.

Same with above.


  tasklet starts up might therefore be serviced during the short interval
  before the tasklet disables local interrupts, but if that happens it

 Tasklet doesn't disable local interrupts.

 It is documented that interrupts will be disabled while the completion
 handler runs.  Therefore the tasklet _must_ disable local interrupts.

Same with above, we can change the document which itself shouldn't
have been the only reason for blocking the change, :-)

  would mean that the completion handler would be delayed.

 Yes, the tasklet schedule delay is introduced to URB completion, but
 the delay doesn't have much side effect about completing URB.

 What about delays that occur because a separate interrupt is serviced
 before the URB's completion handler is called?  With the current code
 that can't happen, but with your scheme it can.

Even with current code, one HCD's interrupt handling still may delay
the handling for another HCD interrupt handling for quite long, that is
the motivation to decrease the interrupt handling time of USB HCD, ;-)
And finally, USB HCDs themselves will benefit from the change, won't they?

Also it shouldn't be a problem since most of interrupt handlers
takes very less time.

 For async transfer, generally, it should have no effect since there should
 have URBs queued on the qh queue before submitting URB.

 That's not how the Bulk-Only mass-storage protocol works.  There are
 times when the protocol _requires_ bulk packets not to be submitted
 until a particular URB has completed.

Yes, I agree. But mass-storage driver itself is very fragile, it will perform
badly if CPU has a higher load.  And it is better to submit DATA/CSW
transfer in previous transfer's complete(), IMO.

Compared with usb-storage task's schedule latency, the tasklet
schedule delay should be lower at most of situations since the tasklet
is scheduled inside irq handler.


 Since mass-storage is an important use case for USB, its requirements
 should not be ignored.

That is why I did many tests on mass storage case, and from the results
in commit log of patch 4/4, looks no performance loss is involved with
the change on usb storage.


  In effect, you are prioritizing other IRQs higher than USB completion
  handlers.

 Both other IRQs and HCD's IRQ are prioritizing the URB completion.

 I don't understand that sentence.  Prioritizing means assigning a
 priority to.  IRQs and HCDs don't assign priorities to anything;
 priorities are assigned by human programmers.

Sorry, prioritizing should have been prioritized.

  

[RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context

2013-06-09 Thread Ming Lei
Hi,

The patchset supports to run giveback of URB in tasklet context, so that
DMA unmapping/mapping on transfer buffer and compelte() callback can be
run with interrupt enabled, then time of HCD interrupt handler can be
saved much. Also this approach may simplify HCD since HCD lock needn't
be released any more before calling usb_hcd_giveback_urb().

The patchset enables the mechanism on EHCI HCD.

Patch 3/4 improves interrupt qh unlink on EHCI HCD when the mechanism
is introduced.

In the commit log of patch 4/4, detailed test result on three machines
(ARM A9/A15 dual core, X86) are provided about transfer performance and
ehci irq handling time. From the result, no transfer performance loss
is found and ehci irq handling time can drop much with the patchset.

 drivers/usb/core/hcd.c|  170 +++--
 drivers/usb/host/ehci-fsl.c   |2 +-
 drivers/usb/host/ehci-grlib.c |2 +-
 drivers/usb/host/ehci-hcd.c   |   18 ++--
 drivers/usb/host/ehci-hub.c   |1 +
 drivers/usb/host/ehci-mv.c|2 +-
 drivers/usb/host/ehci-octeon.c|2 +-
 drivers/usb/host/ehci-pmcmsp.c|2 +-
 drivers/usb/host/ehci-ppc-of.c|2 +-
 drivers/usb/host/ehci-ps3.c   |2 +-
 drivers/usb/host/ehci-q.c |6 +-
 drivers/usb/host/ehci-sched.c |   60 -
 drivers/usb/host/ehci-sead3.c |2 +-
 drivers/usb/host/ehci-sh.c|2 +-
 drivers/usb/host/ehci-tegra.c |2 +-
 drivers/usb/host/ehci-tilegx.c|2 +-
 drivers/usb/host/ehci-timer.c |   43 ++
 drivers/usb/host/ehci-w90x900.c   |2 +-
 drivers/usb/host/ehci-xilinx-of.c |2 +-
 drivers/usb/host/ehci.h   |3 +
 include/linux/usb/hcd.h   |   16 
 21 files changed, 295 insertions(+), 48 deletions(-)


Thanks,
--
Ming Lei

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html