Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Fri, Jun 14, 2013 at 09:53:52AM +0800, Ming Lei wrote: On Fri, Jun 14, 2013 at 8:35 AM, Greg Kroah-Hartman gre...@linuxfoundation.org wrote: On Thu, Jun 13, 2013 at 03:41:17PM -0400, Alan Stern wrote: On Thu, 13 Jun 2013, Greg Kroah-Hartman wrote: On Thu, Jun 13, 2013 at 10:54:13AM -0400, Alan Stern wrote: On Thu, 13 Jun 2013, Ming Lei wrote: - using interrupt threaded handler(default) 33.440 MB/sec - using tasklet(#undef USB_HCD_THREADED_IRQ) 34.29 MB/sec - using hard interrupt handler(by removing HCD_BH in ehci-hcd.c ) 34.260 MB/s So looks usb mass storage performance loss can be observed with interrupt threaded handler because one mass storage read/write sectors requires at least 3 interrupts which wake up usb-storage thread 3 times (each interrupt wakeup the usb-storage each time), introducing irq threaded handler will make 2 threads to be waken up about 6 times for one read/write. I think usb mass storage transfer handler need to be rewritten, otherwise it may become worsen after using irq threaded handler in USB 3.0.(the above device can reach 120MB/sec with hardware handler or tasklet handler, which means about ~3K interrupts/sec, so ~6K contexts switch in case of using irq threaded handler) So how about supporting tasklet first, then convert to interrupt threaded handler after usb mass storage transfer is rewritten without performance loss? (rewriting usb mass storage transfer handler may need some time and work since storage stability/correctness is extremely important, :-) Maybe we should simply copy what the networking people do. They are very concerned about performance and latency; whatever technique they use should be good for USB too. Yes, but for old-style usb-storage, is this really a big deal? We are still easily hitting the line-speed of USB for usb-storage with simple machines, the bottlenecks that I'm seeing are in the devices themselves, and then in the USB wire speed. What about with USB-3 storage devices? Many of them still use the bulk-only transport instead of UAS. They may push the limits up. Are they really? Have we seen that happen yet? With the number's I've Yes, the device I am testing is bulk-only, no uas support , and it is very popular in market. seen published, we are easily serving up enough data to keep the pipe full, but that all depends on your CPU / host controller. Once hardware comes out that uses USB streams, and we get device support for the UAS protocol, then we might have a need to change things, but at this point in time, for the old driver, I think we are fine. Unless someone has a workload / benchmark that shows otherwise? The test results above show a 2.4% degradation for threaded interrupts as compared to tasklets. That's in addition to the bottlenecks caused by the device; no doubt it would be worse for a faster device. This result calls into question the benefits of threaded interrupts. The main reason for moving away from the current scheme is to reduce latency for other interrupt handlers. Ming gave two examples of slow USB code that runs in hardirq context now; with his change they would run in softirq context and therefore wouldn't delay other interrupts so much. (Interrupt latency is hard to measure, however.) Yes, I know that people keep wanting to worry about latency issues, and the best answer for them has always been, don't use USB. :) I think we can do it better, why don't do it? :-) Because of other issues, that have been brought up here already. But if you can do it without affecting others, that's fine. You suffer throughput issues with predicitable latency dependancies, so This patchset don't cause throughout degradation but decrease latency much, also has other advantages. Like what? we need to be careful we don't slow down the 99% of the systems out there that do not care about this at all. Considered great amount of ARM devices in market, I think we need to consider the problem on these devices, :-) Is it a problem on those devices? I think they have host controller issues that are way bigger problems than this device driver, right? greg k-h -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Fri, Jun 14, 2013 at 2:05 PM, Greg Kroah-Hartman gre...@linuxfoundation.org wrote: On Fri, Jun 14, 2013 at 09:53:52AM +0800, Ming Lei wrote: On Fri, Jun 14, 2013 at 8:35 AM, Greg Kroah-Hartman gre...@linuxfoundation.org wrote: On Thu, Jun 13, 2013 at 03:41:17PM -0400, Alan Stern wrote: On Thu, 13 Jun 2013, Greg Kroah-Hartman wrote: On Thu, Jun 13, 2013 at 10:54:13AM -0400, Alan Stern wrote: On Thu, 13 Jun 2013, Ming Lei wrote: - using interrupt threaded handler(default) 33.440 MB/sec - using tasklet(#undef USB_HCD_THREADED_IRQ) 34.29 MB/sec - using hard interrupt handler(by removing HCD_BH in ehci-hcd.c ) 34.260 MB/s So looks usb mass storage performance loss can be observed with interrupt threaded handler because one mass storage read/write sectors requires at least 3 interrupts which wake up usb-storage thread 3 times (each interrupt wakeup the usb-storage each time), introducing irq threaded handler will make 2 threads to be waken up about 6 times for one read/write. I think usb mass storage transfer handler need to be rewritten, otherwise it may become worsen after using irq threaded handler in USB 3.0.(the above device can reach 120MB/sec with hardware handler or tasklet handler, which means about ~3K interrupts/sec, so ~6K contexts switch in case of using irq threaded handler) So how about supporting tasklet first, then convert to interrupt threaded handler after usb mass storage transfer is rewritten without performance loss? (rewriting usb mass storage transfer handler may need some time and work since storage stability/correctness is extremely important, :-) Maybe we should simply copy what the networking people do. They are very concerned about performance and latency; whatever technique they use should be good for USB too. Yes, but for old-style usb-storage, is this really a big deal? We are still easily hitting the line-speed of USB for usb-storage with simple machines, the bottlenecks that I'm seeing are in the devices themselves, and then in the USB wire speed. What about with USB-3 storage devices? Many of them still use the bulk-only transport instead of UAS. They may push the limits up. Are they really? Have we seen that happen yet? With the number's I've Yes, the device I am testing is bulk-only, no uas support , and it is very popular in market. seen published, we are easily serving up enough data to keep the pipe full, but that all depends on your CPU / host controller. Once hardware comes out that uses USB streams, and we get device support for the UAS protocol, then we might have a need to change things, but at this point in time, for the old driver, I think we are fine. Unless someone has a workload / benchmark that shows otherwise? The test results above show a 2.4% degradation for threaded interrupts as compared to tasklets. That's in addition to the bottlenecks caused by the device; no doubt it would be worse for a faster device. This result calls into question the benefits of threaded interrupts. The main reason for moving away from the current scheme is to reduce latency for other interrupt handlers. Ming gave two examples of slow USB code that runs in hardirq context now; with his change they would run in softirq context and therefore wouldn't delay other interrupts so much. (Interrupt latency is hard to measure, however.) Yes, I know that people keep wanting to worry about latency issues, and the best answer for them has always been, don't use USB. :) I think we can do it better, why don't do it? :-) Because of other issues, that have been brought up here already. But if you can do it without affecting others, that's fine. It won't affect others. As discussed in the thread, we can do the change with following steps: 1), move URB giveback from hard irq handler to tasklet, but call complete() with local IRQs disabled; 2), cleanup current drivers which call 'spin_lock' in compele() by replacing spin_lock() with spin_lock_irqrestore(), and change the API of complete() callback by claiming it called with local IRQs enabled, but bh is disabled. 3), enable local IRQs before calling complete(). There is no good reason for complete() to run with interrupt disabled, as discussed, so the API change and driver cleanup still makes sense. In fact, only the below two cases requires the 2nd step's change: - drivers-complete() acquire a subsystem wide lock which may be called in another hard irq context, and the subsystem wide lock is acquired by spin_lock() in complete() - drivers-complete() hold a private lock with spin_lock() but may export APIs to let other drivers acquire the private lock in its interrupt handler. Looks both two
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Fri, Jun 14, 2013 at 10:56 PM, Alan Stern st...@rowland.harvard.edu wrote: On Fri, 14 Jun 2013, Ming Lei wrote: On Fri, Jun 14, 2013 at 3:41 AM, Alan Stern st...@rowland.harvard.edu wrote: The main reason for moving away from the current scheme is to reduce latency for other interrupt handlers. Ming gave two examples of slow USB code that runs in hardirq context now; with his change they would run in softirq context and therefore wouldn't delay other interrupts so much. (Interrupt latency is hard to measure, however.) With the two trace points of irq_handler_entry and irq_handler_exit, the interrupt latency(or the time taken by hard irq handler) isn't hard to measure. One simple script can figure out the average/maximum latency for one irq handler, like I did in 4/4. But that doesn't measure the time between when the IRQ request is issued and when irq_handler_entry runs. That might be hard to measure, also I am wondering if the time can be measured only by software, but these patches only focus on the time between HCD irq entry and irq exit. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Fri, 14 Jun 2013, Ming Lei wrote: With the two trace points of irq_handler_entry and irq_handler_exit, the interrupt latency(or the time taken by hard irq handler) isn't hard to measure. One simple script can figure out the average/maximum latency for one irq handler, like I did in 4/4. But that doesn't measure the time between when the IRQ request is issued and when irq_handler_entry runs. That might be hard to measure, also I am wondering if the time can be measured only by software, but these patches only focus on the time between HCD irq entry and irq exit. Not entirely. On a UP system, leaving interrupts disabled for a long time (which is what we do now) increases the delay between when the IRQ is raised and when it is serviced. On an SMP system, a long-running interrupt handler will delay servicing a different device that shares the same IRQ line. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Fri, 14 Jun 2013, Alan Stern wrote: On Fri, 14 Jun 2013, Ming Lei wrote: With the two trace points of irq_handler_entry and irq_handler_exit, the interrupt latency(or the time taken by hard irq handler) isn't hard to measure. One simple script can figure out the average/maximum latency for one irq handler, like I did in 4/4. But that doesn't measure the time between when the IRQ request is issued and when irq_handler_entry runs. That might be hard to measure, also I am wondering if the time can be measured only by software, but these patches only focus on the time between HCD irq entry and irq exit. Not entirely. On a UP system, leaving interrupts disabled for a long time (which is what we do now) increases the delay between when the IRQ is raised and when it is serviced. On an SMP system, a long-running interrupt handler will delay servicing a different device that shares the same IRQ line. And on UP it delays ALL other interrupts. I've seen 500us+ caused by the USB interrupt handlers... Thanks, tglx -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Sat, Jun 15, 2013 at 4:23 AM, Alan Stern st...@rowland.harvard.edu wrote: Not entirely. On a UP system, leaving interrupts disabled for a long time (which is what we do now) increases the delay between when the IRQ is raised and when it is serviced. On an SMP system, a long-running Yes, I mean the HCD IRQ handling time is already too long, and it isn't affected by the time between raising and servicing the IRQ. interrupt handler will delay servicing a different device that shares the same IRQ line. Not always so. Currently, ARM can only set one irq line to be served by one of CPU at the same time for power saving, which still results in above situation. In fact, without irq-balance, on ARM, all IRQs are handled by CPU0 only. On Sat, Jun 15, 2013 at 5:09 AM, Thomas Gleixner t...@linutronix.de wrote: And on UP it delays ALL other interrupts. I've seen 500us+ caused by the USB interrupt handlers... On SMP the above case may be worse than UP, when the same completion things(from hw view) happen on one of CPU, the releasing reacquiring HCD private lock in interrupt handler may cause longer time. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Thu, 13 Jun 2013, Ming Lei wrote: - using interrupt threaded handler(default) 33.440 MB/sec - using tasklet(#undef USB_HCD_THREADED_IRQ) 34.29 MB/sec - using hard interrupt handler(by removing HCD_BH in ehci-hcd.c ) 34.260 MB/s So looks usb mass storage performance loss can be observed with interrupt threaded handler because one mass storage read/write sectors requires at least 3 interrupts which wake up usb-storage thread 3 times (each interrupt wakeup the usb-storage each time), introducing irq threaded handler will make 2 threads to be waken up about 6 times for one read/write. I think usb mass storage transfer handler need to be rewritten, otherwise it may become worsen after using irq threaded handler in USB 3.0.(the above device can reach 120MB/sec with hardware handler or tasklet handler, which means about ~3K interrupts/sec, so ~6K contexts switch in case of using irq threaded handler) So how about supporting tasklet first, then convert to interrupt threaded handler after usb mass storage transfer is rewritten without performance loss? (rewriting usb mass storage transfer handler may need some time and work since storage stability/correctness is extremely important, :-) Maybe we should simply copy what the networking people do. They are very concerned about performance and latency; whatever technique they use should be good for USB too. Also another problem with irq threaded handler is that there is no sort of tasklet_schedule() interface to wakeup the thread handler manually, so I have to use work to schedule some URB giveback from drivers(root hub transfer, unlink), even though that isn't a big deal but will cause code a bit much/complicated, :-) Yes, I was going to bring that up. Thomas, sometimes we need the IRQ handler thread to do some work even though an interrupt hasn't occurred. Is there an API for this, or can one be added? Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Thu, 13 Jun 2013, Ming Lei wrote: For reasonably sophisticated host controllers like EHCI, the delay in responding to an interrupt doesn't matter much. But for simpler host EHCI might still benefit from the change: suppose uvc video is playing on one EHCI bus, and mass storage's performance over another EHCI may decrease because of the IRQ latency introduced. If multiple CPUs are involved, then we could see an improvement in the case where the two EHCI controllers share the same IRQ line. With different IRQs, though, everything would run in parallel on different CPUs, so there is no advantage. At this point we are talking about multiple changes: Moving the givebacks to a tasklet or threaded handler. Changing the USB completion handlers so that they can be called with interrupts enabled. Allowing the tasklet/thread to run with interrupts enabled. Moving more of the HCD processing into the tasklet/thread. Maybe other things too. In principle, the first and second items can be done simultaneously. Very good summery, we can push the 1st change with disabling local IRQ when calling complete(), so that at least DMA unmapping can be done with IRQ enabled, and at the same time do the API change, which should be a bit slow but the mechanical way proposed by Oliver may be OK. The 3rd item is easy once the 1st two items are completed. For the 4th one, it might be a long term thing, since refactoring HCD interrupt handler is a bit complicated. Also, when the 1st three items are completed, hard interrupt handler takes less time, often less than 20us at average about EHCI, so I am wondering if the 4th is worthy. Yes, I don't know about that. If the time spent in the hardirq processing is reasonably short then moving it to the threaded handler won't help very much. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Thu, Jun 13, 2013 at 10:54:13AM -0400, Alan Stern wrote: On Thu, 13 Jun 2013, Ming Lei wrote: - using interrupt threaded handler(default) 33.440 MB/sec - using tasklet(#undef USB_HCD_THREADED_IRQ) 34.29 MB/sec - using hard interrupt handler(by removing HCD_BH in ehci-hcd.c ) 34.260 MB/s So looks usb mass storage performance loss can be observed with interrupt threaded handler because one mass storage read/write sectors requires at least 3 interrupts which wake up usb-storage thread 3 times (each interrupt wakeup the usb-storage each time), introducing irq threaded handler will make 2 threads to be waken up about 6 times for one read/write. I think usb mass storage transfer handler need to be rewritten, otherwise it may become worsen after using irq threaded handler in USB 3.0.(the above device can reach 120MB/sec with hardware handler or tasklet handler, which means about ~3K interrupts/sec, so ~6K contexts switch in case of using irq threaded handler) So how about supporting tasklet first, then convert to interrupt threaded handler after usb mass storage transfer is rewritten without performance loss? (rewriting usb mass storage transfer handler may need some time and work since storage stability/correctness is extremely important, :-) Maybe we should simply copy what the networking people do. They are very concerned about performance and latency; whatever technique they use should be good for USB too. Yes, but for old-style usb-storage, is this really a big deal? We are still easily hitting the line-speed of USB for usb-storage with simple machines, the bottlenecks that I'm seeing are in the devices themselves, and then in the USB wire speed. Once hardware comes out that uses USB streams, and we get device support for the UAS protocol, then we might have a need to change things, but at this point in time, for the old driver, I think we are fine. Unless someone has a workload / benchmark that shows otherwise? thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Thu, 13 Jun 2013, Greg Kroah-Hartman wrote: On Thu, Jun 13, 2013 at 10:54:13AM -0400, Alan Stern wrote: On Thu, 13 Jun 2013, Ming Lei wrote: - using interrupt threaded handler(default) 33.440 MB/sec - using tasklet(#undef USB_HCD_THREADED_IRQ) 34.29 MB/sec - using hard interrupt handler(by removing HCD_BH in ehci-hcd.c ) 34.260 MB/s So looks usb mass storage performance loss can be observed with interrupt threaded handler because one mass storage read/write sectors requires at least 3 interrupts which wake up usb-storage thread 3 times (each interrupt wakeup the usb-storage each time), introducing irq threaded handler will make 2 threads to be waken up about 6 times for one read/write. I think usb mass storage transfer handler need to be rewritten, otherwise it may become worsen after using irq threaded handler in USB 3.0.(the above device can reach 120MB/sec with hardware handler or tasklet handler, which means about ~3K interrupts/sec, so ~6K contexts switch in case of using irq threaded handler) So how about supporting tasklet first, then convert to interrupt threaded handler after usb mass storage transfer is rewritten without performance loss? (rewriting usb mass storage transfer handler may need some time and work since storage stability/correctness is extremely important, :-) Maybe we should simply copy what the networking people do. They are very concerned about performance and latency; whatever technique they use should be good for USB too. Yes, but for old-style usb-storage, is this really a big deal? We are still easily hitting the line-speed of USB for usb-storage with simple machines, the bottlenecks that I'm seeing are in the devices themselves, and then in the USB wire speed. What about with USB-3 storage devices? Many of them still use the bulk-only transport instead of UAS. They may push the limits up. Once hardware comes out that uses USB streams, and we get device support for the UAS protocol, then we might have a need to change things, but at this point in time, for the old driver, I think we are fine. Unless someone has a workload / benchmark that shows otherwise? The test results above show a 2.4% degradation for threaded interrupts as compared to tasklets. That's in addition to the bottlenecks caused by the device; no doubt it would be worse for a faster device. This result calls into question the benefits of threaded interrupts. The main reason for moving away from the current scheme is to reduce latency for other interrupt handlers. Ming gave two examples of slow USB code that runs in hardirq context now; with his change they would run in softirq context and therefore wouldn't delay other interrupts so much. (Interrupt latency is hard to measure, however.) Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Thu, 2013-06-13 at 15:41 -0400, Alan Stern wrote: The test results above show a 2.4% degradation for threaded interrupts as compared to tasklets. That's in addition to the bottlenecks caused by the device; no doubt it would be worse for a faster device. This result calls into question the benefits of threaded interrupts. That's because it was written like a top half and not a full blown interrupt. I just looked at the patch, and saw this: +#ifndef USB_HCD_THREADED_IRQ if (sched) { if (async) tasklet_schedule(bh-bh); else tasklet_hi_schedule(bh-bh); } +#else + if (sched) + schedule_work(hcd-periodic_bh-work); +#endif What is this? The work isn't done by an interrupt thread, but by work queues! The point of the interrupt thread is that you do *all* the work that needs to be done when an interrupt comes in. You don't need to delay the work. If you just treat a threaded interrupt like a real interrupt and push off work to something else, then yes, it will degrade performance. If you go the threaded interrupt route, you need to rethink the paradigm. There's no reason that the interrupt handler needs to be fast like it needs to be in true interrupt context. The handler can now use mutexes, and other full features that currently only threads benefit from. It should improve locking issues, and can serialize things if needed. All this patch did was to switch the main irq to a thread and make a bottom half into a work queue. Why couldn't you just do: if (sched) usb_giveback_urb_bh(bh); ? -- Steve -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Thu, 13 Jun 2013, Steven Rostedt wrote: On Thu, 2013-06-13 at 15:41 -0400, Alan Stern wrote: The test results above show a 2.4% degradation for threaded interrupts as compared to tasklets. That's in addition to the bottlenecks caused by the device; no doubt it would be worse for a faster device. This result calls into question the benefits of threaded interrupts. That's because it was written like a top half and not a full blown interrupt. I just looked at the patch, and saw this: +#ifndef USB_HCD_THREADED_IRQ if (sched) { if (async) tasklet_schedule(bh-bh); else tasklet_hi_schedule(bh-bh); } +#else + if (sched) + schedule_work(hcd-periodic_bh-work); +#endif What is this? The work isn't done by an interrupt thread, but by work queues! You don't understand the patch. Most of the time, sched will be 0 here and hence the work queue won't be involved. Yes, part of the work is done by a work queue rather than the interrupt thread. But it is an unimportant part, the part that involves transfers to root hubs or transfers that were cancelled. These things can complete without any interrupt occurring, so they can't be handled by the interrupt thread. However, they are the abnormal case; the transfers we care about are not to root hubs and they do complete normally. The point of the interrupt thread is that you do *all* the work that needs to be done when an interrupt comes in. You don't need to delay the work. You've got it backward. The patch doesn't leave part of the work undone when an interrupt occurs. Rather it's the other way around -- sometimes work needs to be done when there isn't any interrupt. This could happen in a timer callback, or it could happen as a direct result of a function call. Since there doesn't seem to be any way to invoke the interrupt thread in the absence of an interrupt, Ming pushed the job off to a work queue. If you just treat a threaded interrupt like a real interrupt and push off work to something else, then yes, it will degrade performance. If you go the threaded interrupt route, you need to rethink the paradigm. There's no reason that the interrupt handler needs to be fast like it needs to be in true interrupt context. The handler can now use mutexes, and other full features that currently only threads benefit from. It should improve locking issues, and can serialize things if needed. All this patch did was to switch the main irq to a thread and make a bottom half into a work queue. In case it's not clear, the code you quoted above is part of the interrupt handler, not part of the thread. Why couldn't you just do: if (sched) usb_giveback_urb_bh(bh); ? Because usb_giveback_urb_bh() is supposed to run in the context of the tasklet or interrupt thread or work queue, not in the context of the interrupt handler. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Thu, 2013-06-13 at 17:09 -0400, Alan Stern wrote: On Thu, 13 Jun 2013, Steven Rostedt wrote: On Thu, 2013-06-13 at 15:41 -0400, Alan Stern wrote: The test results above show a 2.4% degradation for threaded interrupts as compared to tasklets. That's in addition to the bottlenecks caused by the device; no doubt it would be worse for a faster device. This result calls into question the benefits of threaded interrupts. That's because it was written like a top half and not a full blown interrupt. I just looked at the patch, and saw this: +#ifndef USB_HCD_THREADED_IRQ if (sched) { if (async) tasklet_schedule(bh-bh); else tasklet_hi_schedule(bh-bh); } +#else + if (sched) + schedule_work(hcd-periodic_bh-work); +#endif What is this? The work isn't done by an interrupt thread, but by work queues! You don't understand the patch. Hey, I'll admit that I don't understand how USB works ;-) I also only looked at the second patch without applying it. Thus, a lot of the changes were out of context for me. Most of the time, sched will be 0 here and hence the work queue won't be involved. OK, I only compared that current tasklets were being commented out, and that's pretty much all I had to go on. Yes, part of the work is done by a work queue rather than the interrupt thread. But it is an unimportant part, the part that involves transfers to root hubs or transfers that were cancelled. These things can complete without any interrupt occurring, so they can't be handled by the interrupt thread. However, they are the abnormal case; the transfers we care about are not to root hubs and they do complete normally. The point of the interrupt thread is that you do *all* the work that needs to be done when an interrupt comes in. You don't need to delay the work. You've got it backward. The patch doesn't leave part of the work undone when an interrupt occurs. Rather it's the other way around -- sometimes work needs to be done when there isn't any interrupt. This could happen in a timer callback, or it could happen as a direct result of a function call. Since there doesn't seem to be any way to invoke the interrupt thread in the absence of an interrupt, Ming pushed the job off to a work queue. If you just treat a threaded interrupt like a real interrupt and push off work to something else, then yes, it will degrade performance. If you go the threaded interrupt route, you need to rethink the paradigm. There's no reason that the interrupt handler needs to be fast like it needs to be in true interrupt context. The handler can now use mutexes, and other full features that currently only threads benefit from. It should improve locking issues, and can serialize things if needed. All this patch did was to switch the main irq to a thread and make a bottom half into a work queue. In case it's not clear, the code you quoted above is part of the interrupt handler, not part of the thread. Got it. Why couldn't you just do: if (sched) usb_giveback_urb_bh(bh); ? Because usb_giveback_urb_bh() is supposed to run in the context of the tasklet or interrupt thread or work queue, not in the context of the interrupt handler. I only took a quick look at the second patch. I'm now looking at both patches applied to the code. I didn't realize this was called from the top half. Usually the top half for threaded interrupts is used just to quite the interrupt line. Either by acknowledging the interrupt or by disabling the device from sending more interrupts till the bottom half (thread) can run. This looks to be doing a bit more than that. I'll look a bit deeper at the patch, but this still doesn't look like a typical threaded interrupt usage. -- Steve -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Thu, 13 Jun 2013, Steven Rostedt wrote: I only took a quick look at the second patch. I'm now looking at both patches applied to the code. I didn't realize this was called from the top half. Usually the top half for threaded interrupts is used just to quite the interrupt line. Either by acknowledging the interrupt or by disabling the device from sending more interrupts till the bottom half (thread) can run. This looks to be doing a bit more than that. Yes, it does. Ming left all the host controller processing in the top half. The bottom half merely handles the completion callbacks. One of the things we discussed during this email thread was the possibility of moving _all_ the work into the threaded handler. That's not as easy to do, though; it requires significant modification of the host controller driver. And each controller driver would need its own modifications, whereas with Ming's approach only the core needs to be changed. I'll look a bit deeper at the patch, but this still doesn't look like a typical threaded interrupt usage. I'll agree with that; it isn't typical. Ming claims that the work remaining in the top half often takes less than 20 us on average in his tests. Depending on the particular device, the work in the bottom half can be very quick (little more than waking up a thread in the case of usb-storage) or quite slow (perhaps on the order of a ms or more for the UVC video driver on some ARM platforms). Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Thu, Jun 13, 2013 at 03:41:17PM -0400, Alan Stern wrote: On Thu, 13 Jun 2013, Greg Kroah-Hartman wrote: On Thu, Jun 13, 2013 at 10:54:13AM -0400, Alan Stern wrote: On Thu, 13 Jun 2013, Ming Lei wrote: - using interrupt threaded handler(default) 33.440 MB/sec - using tasklet(#undef USB_HCD_THREADED_IRQ) 34.29 MB/sec - using hard interrupt handler(by removing HCD_BH in ehci-hcd.c ) 34.260 MB/s So looks usb mass storage performance loss can be observed with interrupt threaded handler because one mass storage read/write sectors requires at least 3 interrupts which wake up usb-storage thread 3 times (each interrupt wakeup the usb-storage each time), introducing irq threaded handler will make 2 threads to be waken up about 6 times for one read/write. I think usb mass storage transfer handler need to be rewritten, otherwise it may become worsen after using irq threaded handler in USB 3.0.(the above device can reach 120MB/sec with hardware handler or tasklet handler, which means about ~3K interrupts/sec, so ~6K contexts switch in case of using irq threaded handler) So how about supporting tasklet first, then convert to interrupt threaded handler after usb mass storage transfer is rewritten without performance loss? (rewriting usb mass storage transfer handler may need some time and work since storage stability/correctness is extremely important, :-) Maybe we should simply copy what the networking people do. They are very concerned about performance and latency; whatever technique they use should be good for USB too. Yes, but for old-style usb-storage, is this really a big deal? We are still easily hitting the line-speed of USB for usb-storage with simple machines, the bottlenecks that I'm seeing are in the devices themselves, and then in the USB wire speed. What about with USB-3 storage devices? Many of them still use the bulk-only transport instead of UAS. They may push the limits up. Are they really? Have we seen that happen yet? With the number's I've seen published, we are easily serving up enough data to keep the pipe full, but that all depends on your CPU / host controller. Once hardware comes out that uses USB streams, and we get device support for the UAS protocol, then we might have a need to change things, but at this point in time, for the old driver, I think we are fine. Unless someone has a workload / benchmark that shows otherwise? The test results above show a 2.4% degradation for threaded interrupts as compared to tasklets. That's in addition to the bottlenecks caused by the device; no doubt it would be worse for a faster device. This result calls into question the benefits of threaded interrupts. The main reason for moving away from the current scheme is to reduce latency for other interrupt handlers. Ming gave two examples of slow USB code that runs in hardirq context now; with his change they would run in softirq context and therefore wouldn't delay other interrupts so much. (Interrupt latency is hard to measure, however.) Yes, I know that people keep wanting to worry about latency issues, and the best answer for them has always been, don't use USB. :) You suffer throughput issues with predicitable latency dependancies, so we need to be careful we don't slow down the 99% of the systems out there that do not care about this at all. greg k-h -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Fri, Jun 14, 2013 at 5:09 AM, Alan Stern st...@rowland.harvard.edu wrote: On Thu, 13 Jun 2013, Steven Rostedt wrote: On Thu, 2013-06-13 at 15:41 -0400, Alan Stern wrote: The test results above show a 2.4% degradation for threaded interrupts as compared to tasklets. That's in addition to the bottlenecks caused by the device; no doubt it would be worse for a faster device. This result calls into question the benefits of threaded interrupts. That's because it was written like a top half and not a full blown interrupt. I just looked at the patch, and saw this: +#ifndef USB_HCD_THREADED_IRQ if (sched) { if (async) tasklet_schedule(bh-bh); else tasklet_hi_schedule(bh-bh); } +#else + if (sched) + schedule_work(hcd-periodic_bh-work); +#endif What is this? The work isn't done by an interrupt thread, but by work queues! You don't understand the patch. Most of the time, sched will be 0 here and hence the work queue won't be involved. Yes, part of the work is done by a work queue rather than the interrupt thread. But it is an unimportant part, the part that involves transfers to root hubs or transfers that were cancelled. These things can complete without any interrupt occurring, so they can't be handled by the interrupt thread. However, they are the abnormal case; the transfers we care about are not to root hubs and they do complete normally. The point of the interrupt thread is that you do *all* the work that needs to be done when an interrupt comes in. You don't need to delay the work. You've got it backward. The patch doesn't leave part of the work undone when an interrupt occurs. Rather it's the other way around -- sometimes work needs to be done when there isn't any interrupt. This could happen in a timer callback, or it could happen as a direct result of a function call. Since there doesn't seem to be any way to invoke the interrupt thread in the absence of an interrupt, Ming pushed the job off to a work queue. If you just treat a threaded interrupt like a real interrupt and push off work to something else, then yes, it will degrade performance. If you go the threaded interrupt route, you need to rethink the paradigm. There's no reason that the interrupt handler needs to be fast like it needs to be in true interrupt context. The handler can now use mutexes, and other full features that currently only threads benefit from. It should improve locking issues, and can serialize things if needed. All this patch did was to switch the main irq to a thread and make a bottom half into a work queue. In case it's not clear, the code you quoted above is part of the interrupt handler, not part of the thread. Why couldn't you just do: if (sched) usb_giveback_urb_bh(bh); ? Because usb_giveback_urb_bh() is supposed to run in the context of the tasklet or interrupt thread or work queue, not in the context of the interrupt handler. Exactly, the code should have been the below shape: #ifndef USB_HCD_THREADED_IRQ .. #else if (!in_irq()) { bh = hcd-periodic_bh; sched = 1; } #endif Then it will be quite clear. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Fri, Jun 14, 2013 at 3:41 AM, Alan Stern st...@rowland.harvard.edu wrote: On Thu, 13 Jun 2013, Greg Kroah-Hartman wrote: On Thu, Jun 13, 2013 at 10:54:13AM -0400, Alan Stern wrote: On Thu, 13 Jun 2013, Ming Lei wrote: - using interrupt threaded handler(default) 33.440 MB/sec - using tasklet(#undef USB_HCD_THREADED_IRQ) 34.29 MB/sec - using hard interrupt handler(by removing HCD_BH in ehci-hcd.c ) 34.260 MB/s So looks usb mass storage performance loss can be observed with interrupt threaded handler because one mass storage read/write sectors requires at least 3 interrupts which wake up usb-storage thread 3 times (each interrupt wakeup the usb-storage each time), introducing irq threaded handler will make 2 threads to be waken up about 6 times for one read/write. I think usb mass storage transfer handler need to be rewritten, otherwise it may become worsen after using irq threaded handler in USB 3.0.(the above device can reach 120MB/sec with hardware handler or tasklet handler, which means about ~3K interrupts/sec, so ~6K contexts switch in case of using irq threaded handler) So how about supporting tasklet first, then convert to interrupt threaded handler after usb mass storage transfer is rewritten without performance loss? (rewriting usb mass storage transfer handler may need some time and work since storage stability/correctness is extremely important, :-) Maybe we should simply copy what the networking people do. They are very concerned about performance and latency; whatever technique they use should be good for USB too. Yes, but for old-style usb-storage, is this really a big deal? We are still easily hitting the line-speed of USB for usb-storage with simple machines, the bottlenecks that I'm seeing are in the devices themselves, and then in the USB wire speed. What about with USB-3 storage devices? Many of them still use the bulk-only transport instead of UAS. They may push the limits up. Exactly, my test device(sandisk extreme USB 3.0 16G, 0781:5580) is very popular, which is faster than most USB 3.0 pendrive in market, but the device is bulk-only, and no UAS support, so I guess most of the USB 3.0 pendrive in market still may not support UAS. Once hardware comes out that uses USB streams, and we get device support for the UAS protocol, then we might have a need to change things, but at this point in time, for the old driver, I think we are fine. Unless someone has a workload / benchmark that shows otherwise? The test results above show a 2.4% degradation for threaded interrupts as compared to tasklets. That's in addition to the bottlenecks caused by the device; no doubt it would be worse for a faster device. This result calls into question the benefits of threaded interrupts. If I enable HCD_BH in xhci driver and enable requst_threaded_irq in xhci driver, the degradation becomes 10%, see below test on the same device connected to xhci-hcd: [tom@board]$ps -ax | grep xhci Warning: bad ps syntax, perhaps a bogus '-'? See http://procps.sf.net/faq.html 4896 pts/1S+ 0:00 grep --color=auto xhci [tom@board]$sudo ./ds-msg /dev/sdb 400M 1 4 No. 0, time 121 MB No. 1, time 122 MB No. 2, time 124 MB No. 3, time 122 MB count=4, total=489 ms, average=122.250 MB [tom@board]$ [tom@board]$ps -ax | grep xhci Warning: bad ps syntax, perhaps a bogus '-'? See http://procps.sf.net/faq.html 6037 ?S 0:00 [irq/42-xhci_hcd] 6038 ?S 0:00 [irq/43-xhci_hcd] 6039 ?S 0:00 [irq/44-xhci_hcd] 6040 ?S 0:00 [irq/45-xhci_hcd] 6041 ?S 0:00 [irq/46-xhci_hcd] 6304 pts/1S+ 0:00 grep --color=auto xhci [tom@board]$ [tom@board]$ [tom@board]$sudo ./ds-msg /dev/sdb 400M 1 4 No. 0, time 107 MB No. 1, time 108 MB No. 2, time 108 MB No. 3, time 109 MB count=4, total=432 ms, average=108.000 MB Thanks, -- Ming Lei -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Fri, Jun 14, 2013 at 8:35 AM, Greg Kroah-Hartman gre...@linuxfoundation.org wrote: On Thu, Jun 13, 2013 at 03:41:17PM -0400, Alan Stern wrote: On Thu, 13 Jun 2013, Greg Kroah-Hartman wrote: On Thu, Jun 13, 2013 at 10:54:13AM -0400, Alan Stern wrote: On Thu, 13 Jun 2013, Ming Lei wrote: - using interrupt threaded handler(default) 33.440 MB/sec - using tasklet(#undef USB_HCD_THREADED_IRQ) 34.29 MB/sec - using hard interrupt handler(by removing HCD_BH in ehci-hcd.c ) 34.260 MB/s So looks usb mass storage performance loss can be observed with interrupt threaded handler because one mass storage read/write sectors requires at least 3 interrupts which wake up usb-storage thread 3 times (each interrupt wakeup the usb-storage each time), introducing irq threaded handler will make 2 threads to be waken up about 6 times for one read/write. I think usb mass storage transfer handler need to be rewritten, otherwise it may become worsen after using irq threaded handler in USB 3.0.(the above device can reach 120MB/sec with hardware handler or tasklet handler, which means about ~3K interrupts/sec, so ~6K contexts switch in case of using irq threaded handler) So how about supporting tasklet first, then convert to interrupt threaded handler after usb mass storage transfer is rewritten without performance loss? (rewriting usb mass storage transfer handler may need some time and work since storage stability/correctness is extremely important, :-) Maybe we should simply copy what the networking people do. They are very concerned about performance and latency; whatever technique they use should be good for USB too. Yes, but for old-style usb-storage, is this really a big deal? We are still easily hitting the line-speed of USB for usb-storage with simple machines, the bottlenecks that I'm seeing are in the devices themselves, and then in the USB wire speed. What about with USB-3 storage devices? Many of them still use the bulk-only transport instead of UAS. They may push the limits up. Are they really? Have we seen that happen yet? With the number's I've Yes, the device I am testing is bulk-only, no uas support , and it is very popular in market. seen published, we are easily serving up enough data to keep the pipe full, but that all depends on your CPU / host controller. Once hardware comes out that uses USB streams, and we get device support for the UAS protocol, then we might have a need to change things, but at this point in time, for the old driver, I think we are fine. Unless someone has a workload / benchmark that shows otherwise? The test results above show a 2.4% degradation for threaded interrupts as compared to tasklets. That's in addition to the bottlenecks caused by the device; no doubt it would be worse for a faster device. This result calls into question the benefits of threaded interrupts. The main reason for moving away from the current scheme is to reduce latency for other interrupt handlers. Ming gave two examples of slow USB code that runs in hardirq context now; with his change they would run in softirq context and therefore wouldn't delay other interrupts so much. (Interrupt latency is hard to measure, however.) Yes, I know that people keep wanting to worry about latency issues, and the best answer for them has always been, don't use USB. :) I think we can do it better, why don't do it? :-) You suffer throughput issues with predicitable latency dependancies, so This patchset don't cause throughout degradation but decrease latency much, also has other advantages. we need to be careful we don't slow down the 99% of the systems out there that do not care about this at all. Considered great amount of ARM devices in market, I think we need to consider the problem on these devices, :-) Thanks, -- Ming Lei -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Thu, Jun 13, 2013 at 10:54 PM, Alan Stern st...@rowland.harvard.edu wrote: Maybe we should simply copy what the networking people do. They are very concerned about performance and latency; whatever technique they use should be good for USB too. Most of net devices don't use interrupt threaded handler, looks we need to copy that first, :-) $git grep -n request_threaded_irq drivers/net/ | wc -l 9 $git grep -n request_threaded_irq drivers/net/wireless | wc -l 5 Also 5 of 9 using interrupt threaded handler are wireless network devices, which are a bit slow, and only one ethernet driver uses it. So I plan to use tasklet first, then we can switch to interrupt threaded handler in the future if mass storage devices are OK with it. If no one objects to use tasklet, I will post out the v1 patch for review later. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Fri, Jun 14, 2013 at 3:41 AM, Alan Stern st...@rowland.harvard.edu wrote: The main reason for moving away from the current scheme is to reduce latency for other interrupt handlers. Ming gave two examples of slow USB code that runs in hardirq context now; with his change they would run in softirq context and therefore wouldn't delay other interrupts so much. (Interrupt latency is hard to measure, however.) With the two trace points of irq_handler_entry and irq_handler_exit, the interrupt latency(or the time taken by hard irq handler) isn't hard to measure. One simple script can figure out the average/maximum latency for one irq handler, like I did in 4/4. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Wed, Jun 12, 2013 at 10:43 AM, Ming Lei ming@canonical.com wrote: On Wed, Jun 12, 2013 at 3:10 AM, Alan Stern st...@rowland.harvard.edu wrote: I don't follow your reasoning. Consider the following situation, where the same spinlock is used in both a URB completion handler and an interrupt handler: URB completes A tasklet calls the completion handler, with interrupts enabled The completion handler does spin_lock(lock); An interrupt occurs The interrupt handler does spin_lock(lock); // Deadlock! If you mean the interrupt handler is HCD irq handler of the device, no such problem since all complete() will run in tasklet context. If not, I am wondering why one USB driver need register another hard interrupt handler? Could you share such examples? I understand the case only exists with timer handler as discussed with Oliver, don't I? In fact, timer funtion is still run in softirq context, so the deadlock won't be caused with acquiring same lock(spin_lock) in both timer function and complete() from tasklet. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Wed, 12 Jun 2013, Ming Lei wrote: On Wed, Jun 12, 2013 at 3:10 AM, Alan Stern st...@rowland.harvard.edu wrote: Also the tasklet running in CPU0 can handle the work which should have been done by the same tasket scheduled in CPU1, so we can avoid busy-waitting in CPU1 which may be introduced by tasklet_schedule() in CPU1. Or you could have a separate tasklet for each host controller. Yes, but I will compare tasklet with interrupt threaded handler first. Yes, please. I read through the thread and I really recommend that you get rid of the tasklet. tasklets are a complete disaster by design and we really should make efforts to get rid of the last users instead of trying to work around their semantical shortcomings. Thanks, tglx -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Tuesday 11 June 2013 15:10:03 Alan Stern wrote: In order to prevent this from happening, you would have to change the spin_lock() call in the completion handler to spin_lock_irqsave(). Furthermore, you will have to audit every USB driver to make sure that all the completion handlers get fixed. Yes. However, it can be done mechanically. And we know only the handlers for complete need to be fixed. Regards Oliver -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Wed, Jun 12, 2013 at 5:11 PM, Oliver Neukum oli...@neukum.org wrote: On Tuesday 11 June 2013 15:10:03 Alan Stern wrote: In order to prevent this from happening, you would have to change the spin_lock() call in the completion handler to spin_lock_irqsave(). Furthermore, you will have to audit every USB driver to make sure that all the completion handlers get fixed. Yes. However, it can be done mechanically. And we know only the handlers for complete need to be fixed. I am wondering if the change is needed since timer function is still run in softirq context instead of hard irq. Looks Alan concerned that one USB interface driver may have another hard interrupt handler involved. Is there such kind of USB driver/device in tree? Thanks, -- Ming -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Wednesday 12 June 2013 18:11:34 Ming Lei wrote: On Wed, Jun 12, 2013 at 5:11 PM, Oliver Neukum oli...@neukum.org wrote: On Tuesday 11 June 2013 15:10:03 Alan Stern wrote: In order to prevent this from happening, you would have to change the spin_lock() call in the completion handler to spin_lock_irqsave(). Furthermore, you will have to audit every USB driver to make sure that all the completion handlers get fixed. Yes. However, it can be done mechanically. And we know only the handlers for complete need to be fixed. I am wondering if the change is needed since timer function is still run in softirq context instead of hard irq. Looks Alan concerned that one USB interface driver may have another hard interrupt handler involved. Is there such kind of USB driver/device in tree? No. Suppose a USB network driver. The complete() handler is written on the assumption that interrupts are off. So it takes a spinlock from the network subsystem. It does so with spin_lock() Other network drivers also take the lock. And they may take it from an IRQ handler. If such an IRQ interrupts the tasklet complete() is running in, the CPU will deadlock. The danger is not interrupt handlers in the same driver but IRQ handlers of _other_ drivers (PCI, ...) a lock is shared with. You need to go through all USB drivers and change every spin_lock() that goes for a lock that is exported to a spin_lock_irqsave() Regards Oliver -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Wed, Jun 12, 2013 at 6:19 PM, Oliver Neukum oli...@neukum.org wrote: On Wednesday 12 June 2013 18:11:34 Ming Lei wrote: On Wed, Jun 12, 2013 at 5:11 PM, Oliver Neukum oli...@neukum.org wrote: On Tuesday 11 June 2013 15:10:03 Alan Stern wrote: In order to prevent this from happening, you would have to change the spin_lock() call in the completion handler to spin_lock_irqsave(). Furthermore, you will have to audit every USB driver to make sure that all the completion handlers get fixed. Yes. However, it can be done mechanically. And we know only the handlers for complete need to be fixed. I am wondering if the change is needed since timer function is still run in softirq context instead of hard irq. Looks Alan concerned that one USB interface driver may have another hard interrupt handler involved. Is there such kind of USB driver/device in tree? No. Suppose a USB network driver. The complete() handler is written on the assumption that interrupts are off. So it takes a spinlock from the network subsystem. It does so with spin_lock() Looks I misses the case, but IMO, it might not be very common, generally subsystem provides API for drivers to do something(handle rx/report tx) inside complete(), and seldom exports subsystem-wide lock directly to drivers. API has no context info, and should have called spin_lock_irqrestore(). Other network drivers also take the lock. And they may take it from an IRQ handler. If such an IRQ interrupts the tasklet complete() is running in, the CPU will deadlock. Looks no usbnet drivers hold subsystem(network) locks in its complete(). Both the locks held are per device/per skb list. In usbnet_bh(), there is one, eg. netif_rx(), which is a API and disables local IRQ. The danger is not interrupt handlers in the same driver but IRQ handlers of _other_ drivers (PCI, ...) a lock is shared with. Right, so generally drivers which spin_lock(subsystem lock) in complete() might be affected. You need to go through all USB drivers and change every spin_lock() that goes for a lock that is exported to a spin_lock_irqsave() Yes, that is the safest way. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Wed, 12 Jun 2013, Ming Lei wrote: If not, I am wondering why one USB driver need register another hard interrupt handler? Could you share such examples? I understand the case only exists with timer handler as discussed with Oliver, don't I? Here's another possibility. A USB driver has a private lock, and it acquires this lock in its completion handler. But it also acquires this lock in other routines, and those other routines may be called by other drivers or subsystems in interrupt context. The drivers _do_ need to be changed, as explained above. And that explanation was only one failure scenario. There may be others. OK, I'd like to know what the others are? :-) Me too. :-) And finally, USB HCDs themselves will benefit from the change, won't they? How will they benefit? Merely from not releasing their private locks? The interrupt of one HCD will be responded lately since another HCD's interrupt handler takes very long time. With the change, the problem can be avoided. For reasonably sophisticated host controllers like EHCI, the delay in responding to an interrupt doesn't matter much. But for simpler host controllers (for example, those using PIO instead of DMA) it matters more, so those other HCDs will benefit more from the conversion. Compared with usb-storage task's schedule latency, the tasklet schedule delay should be lower at most of situations since the tasklet is scheduled inside irq handler. True. But the two latencies add. I think it should be one tasklet latency added, but what is the other one? I meant that the latency in tasklet scheduling has to be added to the latency in scheduling the usb-storage thread. Or you could have a separate tasklet for each host controller. Yes, but I will compare tasklet with interrupt threaded handler first. Switching to a threaded interrupt handler offers more possibilities. Instead of moving just the givebacks to the thread, we could put almost all the processing there. (At least, we can for ehci-hcd. Other HCDs may not be able to do this.) At this point we are talking about multiple changes: Moving the givebacks to a tasklet or threaded handler. Changing the USB completion handlers so that they can be called with interrupts enabled. Allowing the tasklet/thread to run with interrupts enabled. Moving more of the HCD processing into the tasklet/thread. Maybe other things too. In principle, the first and second items can be done simultaneously. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Wednesday 12 June 2013 10:35:44 Alan Stern wrote: On Wed, 12 Jun 2013, Ming Lei wrote: If not, I am wondering why one USB driver need register another hard interrupt handler? Could you share such examples? I understand the case only exists with timer handler as discussed with Oliver, don't I? Here's another possibility. A USB driver has a private lock, and it acquires this lock in its completion handler. But it also acquires this lock in other routines, and those other routines may be called by other drivers or subsystems in interrupt context. That is fatal. But again mechanically using _irqsave in complete() does the job. No other place can be affected. Regards Oliver -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Wed, Jun 12, 2013 at 3:45 PM, Thomas Gleixner t...@linutronix.de wrote: On Wed, 12 Jun 2013, Ming Lei wrote: Yes, please. I read through the thread and I really recommend that you get rid of the tasklet. tasklets are a complete disaster by design and we really should make efforts to get rid of the last users instead of trying to work around their semantical shortcomings. The attached patch001 supports tasklet in HCD and patch002 implements interrupt threaded handler, then if USB_HCD_THREADED_IRQ is set, interrupt threaded handler is used to do URB giveback, otherwise tasklet is used to do that. The other 3 patches(2/4, 3/4, 4/4) aren't changed now, if anyone want to try it. (the patch 2 is only for comparing performance difference, and welcome to review/comment it so that we can get accurate test result to make decision) Follows the mass storage test result(average over 10 times test) with tasklet /interrupt threaded handler/hard interrupt handler under same environment of lenovo T410(x86), which means the test is switched by reinserting module of usbcore or ehci-hcd without changing other things in the machine. - do below tests 10 times and figure out the average speed dd if=/dev/sdN of=/dev/null iflag=direct bs=200M 1 device: sandisk extreme USB 3.0 16G, host: Lenovo T410 EHCI - using interrupt threaded handler(default) 33.440 MB/sec - using tasklet(#undef USB_HCD_THREADED_IRQ) 34.29 MB/sec - using hard interrupt handler(by removing HCD_BH in ehci-hcd.c ) 34.260 MB/s So looks usb mass storage performance loss can be observed with interrupt threaded handler because one mass storage read/write sectors requires at least 3 interrupts which wake up usb-storage thread 3 times (each interrupt wakeup the usb-storage each time), introducing irq threaded handler will make 2 threads to be waken up about 6 times for one read/write. I think usb mass storage transfer handler need to be rewritten, otherwise it may become worsen after using irq threaded handler in USB 3.0.(the above device can reach 120MB/sec with hardware handler or tasklet handler, which means about ~3K interrupts/sec, so ~6K contexts switch in case of using irq threaded handler) So how about supporting tasklet first, then convert to interrupt threaded handler after usb mass storage transfer is rewritten without performance loss? (rewriting usb mass storage transfer handler may need some time and work since storage stability/correctness is extremely important, :-) Also another problem with irq threaded handler is that there is no sort of tasklet_schedule() interface to wakeup the thread handler manually, so I have to use work to schedule some URB giveback from drivers(root hub transfer, unlink), even though that isn't a big deal but will cause code a bit much/complicated, :-) Thanks, -- Ming Lei 0002-USB-support-interrupt-threaded-handler.patch Description: Binary data 0001-USB-HCD-support-giveback-of-URB-in-tasklet-context.patch Description: Binary data
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Wed, Jun 12, 2013 at 10:35 PM, Alan Stern st...@rowland.harvard.edu wrote: On Wed, 12 Jun 2013, Ming Lei wrote: If not, I am wondering why one USB driver need register another hard interrupt handler? Could you share such examples? I understand the case only exists with timer handler as discussed with Oliver, don't I? Here's another possibility. A USB driver has a private lock, and it acquires this lock in its completion handler. But it also acquires this lock in other routines, and those other routines may be called by other drivers or subsystems in interrupt context. OK, so the problem is more complicated than I imaged at start, :-( The drivers _do_ need to be changed, as explained above. And that explanation was only one failure scenario. There may be others. OK, I'd like to know what the others are? :-) Me too. :-) And finally, USB HCDs themselves will benefit from the change, won't they? How will they benefit? Merely from not releasing their private locks? The interrupt of one HCD will be responded lately since another HCD's interrupt handler takes very long time. With the change, the problem can be avoided. For reasonably sophisticated host controllers like EHCI, the delay in responding to an interrupt doesn't matter much. But for simpler host EHCI might still benefit from the change: suppose uvc video is playing on one EHCI bus, and mass storage's performance over another EHCI may decrease because of the IRQ latency introduced. controllers (for example, those using PIO instead of DMA) it matters more, so those other HCDs will benefit more from the conversion. Indeed, musb and ehci/ohci are coexisted on OMAP3/4/5. Compared with usb-storage task's schedule latency, the tasklet schedule delay should be lower at most of situations since the tasklet is scheduled inside irq handler. True. But the two latencies add. I think it should be one tasklet latency added, but what is the other one? I meant that the latency in tasklet scheduling has to be added to the latency in scheduling the usb-storage thread. Or you could have a separate tasklet for each host controller. Yes, but I will compare tasklet with interrupt threaded handler first. Switching to a threaded interrupt handler offers more possibilities. Instead of moving just the givebacks to the thread, we could put almost all the processing there. (At least, we can for ehci-hcd. Other HCDs may not be able to do this.) At this point we are talking about multiple changes: Moving the givebacks to a tasklet or threaded handler. Changing the USB completion handlers so that they can be called with interrupts enabled. Allowing the tasklet/thread to run with interrupts enabled. Moving more of the HCD processing into the tasklet/thread. Maybe other things too. In principle, the first and second items can be done simultaneously. Very good summery, we can push the 1st change with disabling local IRQ when calling complete(), so that at least DMA unmapping can be done with IRQ enabled, and at the same time do the API change, which should be a bit slow but the mechanical way proposed by Oliver may be OK. The 3rd item is easy once the 1st two items are completed. For the 4th one, it might be a long term thing, since refactoring HCD interrupt handler is a bit complicated. Also, when the 1st three items are completed, hard interrupt handler takes less time, often less than 20us at average about EHCI, so I am wondering if the 4th is worthy. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Tue, Jun 11, 2013 at 4:51 AM, Alan Stern st...@rowland.harvard.edu wrote: On Mon, 10 Jun 2013, Alan Stern wrote: Tasklet doesn't disable local interrupts. It is documented that interrupts will be disabled while the completion handler runs. Therefore the tasklet _must_ disable local interrupts. You know, it may be that you can get most of the advantages you want by enabling local interrupts around the call to unmap_urb_for_dma() in usb_hcd_giveback_urb(). No, please don't enable IRQs inside interrupt handler, and you will get warning dump. Except for unmap_urb_for_dma() in usb_hcd_giveback_urb(), I hope map_urb_for_dma() inside complete() can benefit from the change too, since map_urb_for_dma() is same time-consuming with unmap_urb_for_dma(). Also I hope complete() can be run with interrupt enabled since driver's complete() may take long time on some ARCH, as I mentioned in my last mail. And it isn't good to disable interrupt only for one interface driver, looks I still don't get real good explanation about disabling IRQs here, :-) This may be a little dangerous, though, because it is possible for an URB to be given back at the time it is submitted. Drivers may not expect interrupts to get enabled temporarily when they call usb_submit_urb(). Thanks, -- Ming Lei -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Tuesday 11 June 2013 13:40:20 Ming Lei wrote: On Tue, Jun 11, 2013 at 1:36 AM, Alan Stern st...@rowland.harvard.edu wrote: On Mon, 10 Jun 2013, Ming Lei wrote: If complete() callback runs in one tasklet context, spin_lock() inside complete() is enough, just like hard irq, the tasklet itself is disabled during complete(), if the percpu tasklet is converted to single tasklet. So no problem when the lock is to protect shared resources between complete(). We also have exclusion between complete() and other contexts, i.e. timers. When the lock is to protect shared resources between complete() and non-IRQ context, currently spin_lock_irqsave() is used in non-IRQ context, which is enough to prevent tasklet from being run on the CPU, so no problem for this situation too. When all HCDs support to run URB giveback in tasklet context, we can change all spin_lock_irq*() to spin_lock() in drivers URB-complete(), and in other places, the spin_lock_irq*() can be changed to spin_lock_bh(). Even now we cannot guarantee that all calls to complete() are in irq. There is the case of HCD hotunplug and other cases, like timeouts. They will have to be verified. Regards Oliver -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Tue, Jun 11, 2013 at 3:18 PM, Oliver Neukum oli...@neukum.org wrote: On Tuesday 11 June 2013 13:40:20 Ming Lei wrote: On Tue, Jun 11, 2013 at 1:36 AM, Alan Stern st...@rowland.harvard.edu wrote: On Mon, 10 Jun 2013, Ming Lei wrote: If complete() callback runs in one tasklet context, spin_lock() inside complete() is enough, just like hard irq, the tasklet itself is disabled during complete(), if the percpu tasklet is converted to single tasklet. So no problem when the lock is to protect shared resources between complete(). We also have exclusion between complete() and other contexts, i.e. timers. The patch also converts the complete() called in timers to tasklet context too. So could you let me know if that eases your concern? If not, could you explain your concern about other contexts in a bit detail? When the lock is to protect shared resources between complete() and non-IRQ context, currently spin_lock_irqsave() is used in non-IRQ context, which is enough to prevent tasklet from being run on the CPU, so no problem for this situation too. When all HCDs support to run URB giveback in tasklet context, we can change all spin_lock_irq*() to spin_lock() in drivers URB-complete(), and in other places, the spin_lock_irq*() can be changed to spin_lock_bh(). Even now we cannot guarantee that all calls to complete() are in irq. There is the case of HCD hotunplug and other cases, like timeouts. They will have to be verified. All URBs are completed via usb_hcd_giveback_urb(), so there should be no differences between these cases and the common one about introducing the patchset. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Tue, Jun 11, 2013 at 4:54 AM, Steven Rostedt rost...@goodmis.org wrote: On Mon, 2013-06-10 at 16:47 -0400, Alan Stern wrote: On Mon, 10 Jun 2013, Steven Rostedt wrote: and why so many drivers are using tasklet/softirq? Because it's easy to set up and device driver authors don't know any better ;-). Note, a lot of drivers are now using work queues today, which run as a task. Yes, there's a little more overhead with a task than for a softirq, but the problem with softirqs and tasklets is that they can't be preempted, and they are more important than all tasks on the system. If you have a task that is critical, it must yield for your softirq. Almost! That is, even if you have a softirq, it may not run in irq context at all. If you get too many softirqs at a time (one comes while another one is running), it will defer the processing to the ksoftirq thread. This not only runs as a task, but also runs as SCHED_OTHER, and must yield to other tasks like everyone else too. Thus, adding it as a softirq does not guarantee that it will be processed quickly. It just means that most of the time it will prevent anything else from happening while your most important handler in the world is running. From this, it sounds like you generally advise using threaded interrupt handlers rather than tasklets/softirqs. Yes, there's plenty of benefits for them, and I highly doubt that any USB device would even notice the difference between a softirq and a thread for response time latencies. Steven, thanks for your input. IMO, about the problem, the most important point between threaded interrupt handler and tasklet is the latency. For USB video/audio application, the data need to be handled very quickly. Also new transfer need to be scheduled without much latency. I am wondering if interrupt thread handler can respond quickly enough if there is high load on CPUs. For USB mass storage driver, it still need to be handled quickly. If tasklet is taken, tasklet_schedule() is always called from hard interrupt context, so most of time the latency should be better than interrupt thread handler latency since tasklet handler is called just after irq handler exits in the situation. Also we can avoid to do tasklet_schedule when the tasklet is being handled. I will compare, collect and post out some latency data later for the sake of choosing which one to handle URB giveback. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Tuesday 11 June 2013 16:14:25 Ming Lei wrote: On Tue, Jun 11, 2013 at 3:18 PM, Oliver Neukum oli...@neukum.org wrote: On Tuesday 11 June 2013 13:40:20 Ming Lei wrote: On Tue, Jun 11, 2013 at 1:36 AM, Alan Stern st...@rowland.harvard.edu wrote: On Mon, 10 Jun 2013, Ming Lei wrote: If complete() callback runs in one tasklet context, spin_lock() inside complete() is enough, just like hard irq, the tasklet itself is disabled during complete(), if the percpu tasklet is converted to single tasklet. So no problem when the lock is to protect shared resources between complete(). We also have exclusion between complete() and other contexts, i.e. timers. The patch also converts the complete() called in timers to tasklet context too. So could you let me know if that eases your concern? If not, could you explain your concern about other contexts in a bit detail? The driver itself may have submitted a timer and race against it. What locking do you need in complete() and a timer to lock against each other? Even now we cannot guarantee that all calls to complete() are in irq. There is the case of HCD hotunplug and other cases, like timeouts. They will have to be verified. All URBs are completed via usb_hcd_giveback_urb(), so there should be no differences between these cases and the common one about introducing the patchset. But it makes no sense to go to a tasklet when you are already in task context. In those cases you need to do something, essentially blocking the tasklet. Regards Oliver -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Tue, Jun 11, 2013 at 4:49 PM, Oliver Neukum oli...@neukum.org wrote: On Tuesday 11 June 2013 16:14:25 Ming Lei wrote: On Tue, Jun 11, 2013 at 3:18 PM, Oliver Neukum oli...@neukum.org wrote: On Tuesday 11 June 2013 13:40:20 Ming Lei wrote: On Tue, Jun 11, 2013 at 1:36 AM, Alan Stern st...@rowland.harvard.edu wrote: On Mon, 10 Jun 2013, Ming Lei wrote: If complete() callback runs in one tasklet context, spin_lock() inside complete() is enough, just like hard irq, the tasklet itself is disabled during complete(), if the percpu tasklet is converted to single tasklet. So no problem when the lock is to protect shared resources between complete(). We also have exclusion between complete() and other contexts, i.e. timers. The patch also converts the complete() called in timers to tasklet context too. So could you let me know if that eases your concern? If not, could you explain your concern about other contexts in a bit detail? The driver itself may have submitted a timer and race against it. What locking do you need in complete() and a timer to lock against each other? Good catch. The problem will come if only spin_lock() is called inside complete(), I will check main USB drivers in tree to see if there is such use case. Even now we cannot guarantee that all calls to complete() are in irq. There is the case of HCD hotunplug and other cases, like timeouts. They will have to be verified. All URBs are completed via usb_hcd_giveback_urb(), so there should be no differences between these cases and the common one about introducing the patchset. But it makes no sense to go to a tasklet when you are already in task context. In those cases you need to do something, essentially blocking the tasklet. At least now, always doing complete() in tasklet handler can simplify implementation since these cases aren't in hot path. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Tuesday 11 June 2013 17:27:28 Ming Lei wrote: On Tue, Jun 11, 2013 at 4:49 PM, Oliver Neukum oli...@neukum.org wrote: On Tuesday 11 June 2013 16:14:25 Ming Lei wrote: The driver itself may have submitted a timer and race against it. What locking do you need in complete() and a timer to lock against each other? Good catch. The problem will come if only spin_lock() is called inside complete(), I will check main USB drivers in tree to see if there is such use case. All network drivers race against timeout. I think they just unlink the URB, but there's a lot of them. But it makes no sense to go to a tasklet when you are already in task context. In those cases you need to do something, essentially blocking the tasklet. At least now, always doing complete() in tasklet handler can simplify implementation since these cases aren't in hot path. Well, I am afraid this is not simply the case. These cases are partially synchronous. For example you need to make sure all calls to complete() are finished before you disconnect a HCD itself. The same applies to a device being disconnected. It the same area, what happens if an URB is unlinked between the irq handler and the tasklet? Regards Oliver -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Tue, Jun 11, 2013 at 6:51 PM, Oliver Neukum oli...@neukum.org wrote: On Tuesday 11 June 2013 17:27:28 Ming Lei wrote: On Tue, Jun 11, 2013 at 4:49 PM, Oliver Neukum oli...@neukum.org wrote: On Tuesday 11 June 2013 16:14:25 Ming Lei wrote: The driver itself may have submitted a timer and race against it. What locking do you need in complete() and a timer to lock against each other? Good catch. The problem will come if only spin_lock() is called inside complete(), I will check main USB drivers in tree to see if there is such use case. All network drivers race against timeout. I think they just unlink the URB, but there's a lot of them. usbnet is fine since no spin_lock() is used in complete() and the same lock is acquired in timer handler. As far as I think of, the problem only exists in the below situation: complete(): spin_lock(A) ... spin_unlock(A) timer handler(): spin_lock(A) spin_unlock(A) And we need to replace spin_lock() in complete() with spin_lock_irqsave() if the change is going to be introduced. But it makes no sense to go to a tasklet when you are already in task context. In those cases you need to do something, essentially blocking the tasklet. At least now, always doing complete() in tasklet handler can simplify implementation since these cases aren't in hot path. Well, I am afraid this is not simply the case. These cases are partially synchronous. For example you need to make sure all calls to complete() are finished before you disconnect a HCD itself. The same applies to a device being disconnected. That is fine since the coming giveback() in tasklet context will drop the URB's reference count(urb-use_count) finally if the scheduled tasklet can't be dropped. (looks tasklet schedule can make sure that) It the same area, what happens if an URB is unlinked between the irq handler and the tasklet? The unlink will return failure since the URB isn't in queue of ep-urb_list. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Tue, 11 Jun 2013, Ming Lei wrote: Is there any good reason to do URB giveback with interrupt disabled? That's a good question. Offhand I can't think of any drivers that rely on this -- although there certainly are places where callback routines use spin_lock() rather than spin_lock_irq() or spin_lock_irqsave(), because they know that interrupts are already disabled. Complete() may take long time, for example in UVC's driver, URB's complete() may take more than 3 ms, as reported in below link: http://marc.info/?t=136438111600010r=1w=2 Also complete() is provided by interface driver, and it does many driver specific works, so it isn't good to disable interrupt only for one driver. You probably mean it isn't good to disable interrupts for the sake of only one driver. As things stand now, we disable interrupts for all callbacks in all drivers. If complete() callback runs in one tasklet context, spin_lock() inside complete() is enough, just like hard irq, the tasklet itself is disabled during complete(), if the percpu tasklet is converted to single tasklet. So no problem when the lock is to protect shared resources between complete(). When the lock is to protect shared resources between complete() and non-IRQ context, currently spin_lock_irqsave() is used in non-IRQ context, which is enough to prevent tasklet from being run on the CPU, so no problem for this situation too. When all HCDs support to run URB giveback in tasklet context, we can change all spin_lock_irq*() to spin_lock() in drivers URB-complete(), and in other places, the spin_lock_irq*() can be changed to spin_lock_bh(). I don't follow your reasoning. Consider the following situation, where the same spinlock is used in both a URB completion handler and an interrupt handler: URB completes A tasklet calls the completion handler, with interrupts enabled The completion handler does spin_lock(lock); An interrupt occurs The interrupt handler does spin_lock(lock); // Deadlock! In order to prevent this from happening, you would have to change the spin_lock() call in the completion handler to spin_lock_irqsave(). Furthermore, you will have to audit every USB driver to make sure that all the completion handlers get fixed. However, changing a documented API is not something to be done lightly. You would have to audit _every_ USB driver to make sure no trouble would arise. OK, I will write patch to request for changing the API if the improvement on the situation isn't objected. I don't mind. But don't be surprised if you end up breaking some drivers. In fact, at least now, the change on API is only about document change, and no drivers' change is required, isn't it? We can describe that URB-complete() might run in hard irq or softirq context, depends on HCD_BH flag of hcd-driver-flags. The drivers _do_ need to be changed, as explained above. And that explanation was only one failure scenario. There may be others. Even with current code, one HCD's interrupt handling still may delay the handling for another HCD interrupt handling for quite long, that is the motivation to decrease the interrupt handling time of USB HCD, ;-) And finally, USB HCDs themselves will benefit from the change, won't they? How will they benefit? Merely from not releasing their private locks? That involves a fairly small amount of code. For async transfer, generally, it should have no effect since there should have URBs queued on the qh queue before submitting URB. That's not how the Bulk-Only mass-storage protocol works. There are times when the protocol _requires_ bulk packets not to be submitted until a particular URB has completed. Yes, I agree. But mass-storage driver itself is very fragile, it will perform badly if CPU has a higher load. Why does that make it fragile? _Every_ driver and program will behave worse when the system is under a heavy load. And it is better to submit DATA/CSW transfer in previous transfer's complete(), IMO. Actually, it would be even better to submit DATA _before_ the CBW completes, and to submit the CSW before submitting DATA-OUT. Unfortunately, the design of the SG library makes it impossible to submit the CSW during DATA-IN completion. Compared with usb-storage task's schedule latency, the tasklet schedule delay should be lower at most of situations since the tasklet is scheduled inside irq handler. True. But the two latencies add. So I decided to replace the percpu tasklet with one single tasklet, which can avoid the problem. Hmmm. What happens when there is more than one host controller? The tasklet will give back only one URB at a time, and it won't run on more than one CPU at a time. The existing drivers allow URBs to be given back on multiple CPUs simultaneously. Also the tasklet running in CPU0 can handle the work which should have
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Wed, Jun 12, 2013 at 3:10 AM, Alan Stern st...@rowland.harvard.edu wrote: On Tue, 11 Jun 2013, Ming Lei wrote: Is there any good reason to do URB giveback with interrupt disabled? That's a good question. Offhand I can't think of any drivers that rely on this -- although there certainly are places where callback routines use spin_lock() rather than spin_lock_irq() or spin_lock_irqsave(), because they know that interrupts are already disabled. Complete() may take long time, for example in UVC's driver, URB's complete() may take more than 3 ms, as reported in below link: http://marc.info/?t=136438111600010r=1w=2 Also complete() is provided by interface driver, and it does many driver specific works, so it isn't good to disable interrupt only for one driver. You probably mean it isn't good to disable interrupts for the sake of only one driver. As things stand now, we disable interrupts for all callbacks in all drivers. If complete() callback runs in one tasklet context, spin_lock() inside complete() is enough, just like hard irq, the tasklet itself is disabled during complete(), if the percpu tasklet is converted to single tasklet. So no problem when the lock is to protect shared resources between complete(). When the lock is to protect shared resources between complete() and non-IRQ context, currently spin_lock_irqsave() is used in non-IRQ context, which is enough to prevent tasklet from being run on the CPU, so no problem for this situation too. When all HCDs support to run URB giveback in tasklet context, we can change all spin_lock_irq*() to spin_lock() in drivers URB-complete(), and in other places, the spin_lock_irq*() can be changed to spin_lock_bh(). I don't follow your reasoning. Consider the following situation, where the same spinlock is used in both a URB completion handler and an interrupt handler: URB completes A tasklet calls the completion handler, with interrupts enabled The completion handler does spin_lock(lock); An interrupt occurs The interrupt handler does spin_lock(lock); // Deadlock! If you mean the interrupt handler is HCD irq handler of the device, no such problem since all complete() will run in tasklet context. If not, I am wondering why one USB driver need register another hard interrupt handler? Could you share such examples? I understand the case only exists with timer handler as discussed with Oliver, don't I? In order to prevent this from happening, you would have to change the spin_lock() call in the completion handler to spin_lock_irqsave(). Furthermore, you will have to audit every USB driver to make sure that all the completion handlers get fixed. I have written one script to search usb drivers which may use timers and spin_lock() at the same time, about 30 drivers are found, and looks we can fix them. However, changing a documented API is not something to be done lightly. You would have to audit _every_ USB driver to make sure no trouble would arise. OK, I will write patch to request for changing the API if the improvement on the situation isn't objected. I don't mind. But don't be surprised if you end up breaking some drivers. Sure, we should change these drivers before changing the API. In fact, at least now, the change on API is only about document change, and no drivers' change is required, isn't it? We can describe that URB-complete() might run in hard irq or softirq context, depends on HCD_BH flag of hcd-driver-flags. The drivers _do_ need to be changed, as explained above. And that explanation was only one failure scenario. There may be others. OK, I'd like to know what the others are? :-) Even with current code, one HCD's interrupt handling still may delay the handling for another HCD interrupt handling for quite long, that is the motivation to decrease the interrupt handling time of USB HCD, ;-) And finally, USB HCDs themselves will benefit from the change, won't they? How will they benefit? Merely from not releasing their private locks? The interrupt of one HCD will be responded lately since another HCD's interrupt handler takes very long time. With the change, the problem can be avoided. That involves a fairly small amount of code. Maybe no much code, but : 1) Inside interrupt handler no submitting and unlinking requests from drivers can happen any more, so interrupt handler should have became simple. 2), Also the race between irq handler and related timer handler needn't to be considered because the private lock can't be released in irq handler. For async transfer, generally, it should have no effect since there should have URBs queued on the qh queue before submitting URB. That's not how the Bulk-Only mass-storage protocol works. There are times when the protocol _requires_ bulk packets not to be submitted until a particular URB has completed. Yes,
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Sun, Jun 9, 2013 at 11:48 PM, Alan Stern st...@rowland.harvard.edu wrote: On Sun, 9 Jun 2013, Ming Lei wrote: Hi, The patchset supports to run giveback of URB in tasklet context, so that DMA unmapping/mapping on transfer buffer and compelte() callback can be run with interrupt enabled, then time of HCD interrupt handler can be saved much. Also this approach may simplify HCD since HCD lock needn't be released any more before calling usb_hcd_giveback_urb(). That's a lot of material. However, you haven't specifically said _what_ you want to accomplish. It sounds like you're trying to All IRQs are disabled(locally) in hard interrupt context, but they are enabled in softirq context, this patchset can decrease the time a lot. minimize the amount of time spent in hardirq context, but I can't be sure that is really what you want. If it is, this is not a good way to do it. A better way to minimize the time spent in hardirq context is to use a threaded interrupt handler. But why do you want to minimize this time in the first place? Process context switch may have more cost than switching to softirq, then USB performance may be effected, that is why I choose to use tasklet. Also now there is no any sleep in URB giveback, so it isn't necessary to introduce process to handle the giveback or completion. The CPU will be still have to use at least as much time as before; the only difference is that it will be in softirq or in a high-priority thread instead of in hardirq. Yes, as I said above, but we can allow other IRQs to be served in handling URB giveback in softirq context, that is the value of softirq/tasklet. Of course, system may have more important things to do during the time. Generally speaking, for one driver, hard interrupt handler should always consume less time as possible. Also, I don't see how you can avoid releasing the HCD's private lock before calling usb_hcd_giveback_urb(). When the completion handler Please see patch 1/4, :-) tries to resubmit the URB, the HCD's enqueue routine will try to reacquire the private lock. If it didn't release the lock earlier, it Yes, but with the patchset, resubmitting is called in taskelet context, instead of hardirq context, so there isn't any recursion any more. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
[Thomas and Steve, please see the question below.] On Mon, 10 Jun 2013, Ming Lei wrote: On Sun, Jun 9, 2013 at 11:48 PM, Alan Stern st...@rowland.harvard.edu wrote: On Sun, 9 Jun 2013, Ming Lei wrote: Hi, The patchset supports to run giveback of URB in tasklet context, so that DMA unmapping/mapping on transfer buffer and compelte() callback can be run with interrupt enabled, then time of HCD interrupt handler can be saved much. Also this approach may simplify HCD since HCD lock needn't be released any more before calling usb_hcd_giveback_urb(). That's a lot of material. However, you haven't specifically said _what_ you want to accomplish. It sounds like you're trying to All IRQs are disabled(locally) in hard interrupt context, but they are enabled in softirq context, this patchset can decrease the time a lot. That isn't clear. It is documented that USB completion handlers are called with local interrupts disabled. An IRQ arriving before the tasklet starts up might therefore be serviced during the short interval before the tasklet disables local interrupts, but if that happens it would mean that the completion handler would be delayed. In effect, you are prioritizing other IRQs higher than USB completion handlers. Nevertheless, the total time spent with interrupts disabled will remain the same. (There's one other side effect that perhaps you haven't considered. When multiple URBs are addressed to the same endpoint, their completion handlers are called in order of URB completion, which is the same as the order of URB submission unless an URB is cancelled. By delegating the completion handlers to tasklets, and particularly by using per-CPU tasklets, you may violate this behavior.) minimize the amount of time spent in hardirq context, but I can't be sure that is really what you want. If it is, this is not a good way to do it. A better way to minimize the time spent in hardirq context is to use a threaded interrupt handler. But why do you want to minimize this time in the first place? Process context switch may have more cost than switching to softirq, then USB performance may be effected, that is why I choose to use tasklet. Also now there is no any sleep in URB giveback, so it isn't necessary to introduce process to handle the giveback or completion. Thomas and Steve, can you comment on this? I do not have a good understanding of the relative benefits/drawbacks of tasklets vs. threaded interrupt handlers. The CPU will be still have to use at least as much time as before; the only difference is that it will be in softirq or in a high-priority thread instead of in hardirq. Yes, as I said above, but we can allow other IRQs to be served in handling URB giveback in softirq context, that is the value of softirq/tasklet. Of course, system may have more important things to do during the time. Generally speaking, for one driver, hard interrupt handler should always consume less time as possible. As I understand it, the tradeoff between hardirq and softirq is generally unimportant. It does matter a lot in realtime settings -- but the RT kernel effectively makes _every_ interrupt handler threaded, so it won't benefit from this change. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Monday 10 June 2013 10:03:00 Alan Stern wrote: [Thomas and Steve, please see the question below.] On Mon, 10 Jun 2013, Ming Lei wrote: That isn't clear. It is documented that USB completion handlers are called with local interrupts disabled. An IRQ arriving before the tasklet starts up might therefore be serviced during the short interval before the tasklet disables local interrupts, but if that happens it would mean that the completion handler would be delayed. That is what tasklets do by definition, isn't it? In effect, you are prioritizing other IRQs higher than USB completion handlers. Nevertheless, the total time spent with interrupts disabled will remain the same. It pobably slightly increases. You have colder caches twice. And potentially you swich CPUs. (There's one other side effect that perhaps you haven't considered. When multiple URBs are addressed to the same endpoint, their completion handlers are called in order of URB completion, which is the same as the order of URB submission unless an URB is cancelled. By delegating the completion handlers to tasklets, and particularly by using per-CPU tasklets, you may violate this behavior.) This is quite serious. It mustn't happen. Regards Oliver -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Mon, 10 Jun 2013, Oliver Neukum wrote: On Monday 10 June 2013 10:03:00 Alan Stern wrote: [Thomas and Steve, please see the question below.] On Mon, 10 Jun 2013, Ming Lei wrote: That isn't clear. It is documented that USB completion handlers are called with local interrupts disabled. An IRQ arriving before the tasklet starts up might therefore be serviced during the short interval before the tasklet disables local interrupts, but if that happens it would mean that the completion handler would be delayed. That is what tasklets do by definition, isn't it? Yes. Although tasklets in general have no reason to leave interrupts disabled. In effect, you are prioritizing other IRQs higher than USB completion handlers. Nevertheless, the total time spent with interrupts disabled will remain the same. It pobably slightly increases. You have colder caches twice. And potentially you swich CPUs. Actually, the situation is a little better in one respect, which was pointed out earlier but not emphasized. The DMA unmapping can be deferred to the tasklet, and it can run with interrupts enabled before the completion hanlder is called. Since the unmapping sometimes takes a while to run, this is an advantage. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Mon, Jun 10, 2013 at 10:03 PM, Alan Stern st...@rowland.harvard.edu wrote: [Thomas and Steve, please see the question below.] On Mon, 10 Jun 2013, Ming Lei wrote: On Sun, Jun 9, 2013 at 11:48 PM, Alan Stern st...@rowland.harvard.edu wrote: On Sun, 9 Jun 2013, Ming Lei wrote: Hi, The patchset supports to run giveback of URB in tasklet context, so that DMA unmapping/mapping on transfer buffer and compelte() callback can be run with interrupt enabled, then time of HCD interrupt handler can be saved much. Also this approach may simplify HCD since HCD lock needn't be released any more before calling usb_hcd_giveback_urb(). That's a lot of material. However, you haven't specifically said _what_ you want to accomplish. It sounds like you're trying to All IRQs are disabled(locally) in hard interrupt context, but they are enabled in softirq context, this patchset can decrease the time a lot. That isn't clear. It is documented that USB completion handlers are called with local interrupts disabled. An IRQ arriving before the Is there any good reason to do URB giveback with interrupt disabled? IMO, disabling IRQs isn't necessary for completing URB since the complete() of URBs for same endpoint is reentrant. tasklet starts up might therefore be serviced during the short interval before the tasklet disables local interrupts, but if that happens it Tasklet doesn't disable local interrupts. would mean that the completion handler would be delayed. Yes, the tasklet schedule delay is introduced to URB completion, but the delay doesn't have much side effect about completing URB. For async transfer, generally, it should have no effect since there should have URBs queued on the qh queue before submitting URB. For periodic transfer, the patch uses high priority tasklet to schedule it. In effect, you are prioritizing other IRQs higher than USB completion handlers. Both other IRQs and HCD's IRQ are prioritizing the URB completion. Nevertheless, the total time spent with interrupts disabled will remain the same. No, the total time spent with interrupts disabled does not remain same, since no local IRQs are disabled during URB giveback anymore. (There's one other side effect that perhaps you haven't considered. When multiple URBs are addressed to the same endpoint, their completion handlers are called in order of URB completion, which is the same as the order of URB submission unless an URB is cancelled. By delegating the completion handlers to tasklets, and particularly by using per-CPU tasklets, you may violate this behavior.) Even though URB giveback is handled by hard interrupt handler, it still can't guarantee that 100%, please see below: - interrupt for URB A comes in CPU 0, and handled, then HCD private lock is released, and usb_hcd_giveback_urb() is called to complete URB A - interrupt for URB B comes in CPU 1 just before CPU0 releases HCD private lock, so CPU1 holds the private lock when CPU 0 releases the lock and handles the HCD interrupt. So now just like two CPUs are racing, if CPU0 wins, URB A is completed first, otherwise URB B is completed first. With introducing completing URB in percpu tasklet, the situation is just very similar. On ehci hcd, it can make sure that by always letting CPU0 complete all URBs (use ehci-scanning/ehci-need_rescan flag), but the patch can change percpu tasklet to tasklet to meet the demand. minimize the amount of time spent in hardirq context, but I can't be sure that is really what you want. If it is, this is not a good way to do it. A better way to minimize the time spent in hardirq context is to use a threaded interrupt handler. But why do you want to minimize this time in the first place? Process context switch may have more cost than switching to softirq, then USB performance may be effected, that is why I choose to use tasklet. Also now there is no any sleep in URB giveback, so it isn't necessary to introduce process to handle the giveback or completion. Thomas and Steve, can you comment on this? I do not have a good understanding of the relative benefits/drawbacks of tasklets vs. threaded interrupt handlers. The CPU will be still have to use at least as much time as before; the only difference is that it will be in softirq or in a high-priority thread instead of in hardirq. Yes, as I said above, but we can allow other IRQs to be served in handling URB giveback in softirq context, that is the value of softirq/tasklet. Of course, system may have more important things to do during the time. Generally speaking, for one driver, hard interrupt handler should always consume less time as possible. As I understand it, the tradeoff between hardirq and softirq is generally unimportant. It does matter a lot in realtime settings -- I don't think so, and obviously softirq allows IRQs to be served quickly, otherwise why is softirq and tasklet introduced? and why so many
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Mon, 10 Jun 2013, Ming Lei wrote: That isn't clear. It is documented that USB completion handlers are called with local interrupts disabled. An IRQ arriving before the Is there any good reason to do URB giveback with interrupt disabled? That's a good question. Offhand I can't think of any drivers that rely on this -- although there certainly are places where callback routines use spin_lock() rather than spin_lock_irq() or spin_lock_irqsave(), because they know that interrupts are already disabled. However, changing a documented API is not something to be done lightly. You would have to audit _every_ USB driver to make sure no trouble would arise. IMO, disabling IRQs isn't necessary for completing URB since the complete() of URBs for same endpoint is reentrant. Whether it is _necessary_ isn't the point. It is _documented_, and therefore it cannot be changed easily. tasklet starts up might therefore be serviced during the short interval before the tasklet disables local interrupts, but if that happens it Tasklet doesn't disable local interrupts. It is documented that interrupts will be disabled while the completion handler runs. Therefore the tasklet _must_ disable local interrupts. would mean that the completion handler would be delayed. Yes, the tasklet schedule delay is introduced to URB completion, but the delay doesn't have much side effect about completing URB. What about delays that occur because a separate interrupt is serviced before the URB's completion handler is called? With the current code that can't happen, but with your scheme it can. For async transfer, generally, it should have no effect since there should have URBs queued on the qh queue before submitting URB. That's not how the Bulk-Only mass-storage protocol works. There are times when the protocol _requires_ bulk packets not to be submitted until a particular URB has completed. Since mass-storage is an important use case for USB, its requirements should not be ignored. In effect, you are prioritizing other IRQs higher than USB completion handlers. Both other IRQs and HCD's IRQ are prioritizing the URB completion. I don't understand that sentence. Prioritizing means assigning a priority to. IRQs and HCDs don't assign priorities to anything; priorities are assigned by human programmers. Nevertheless, the total time spent with interrupts disabled will remain the same. No, the total time spent with interrupts disabled does not remain same, since no local IRQs are disabled during URB giveback anymore. That is a bug. Local IRQs must be disabled during URB giveback. (There's one other side effect that perhaps you haven't considered. When multiple URBs are addressed to the same endpoint, their completion handlers are called in order of URB completion, which is the same as the order of URB submission unless an URB is cancelled. By delegating the completion handlers to tasklets, and particularly by using per-CPU tasklets, you may violate this behavior.) Even though URB giveback is handled by hard interrupt handler, it still can't guarantee that 100%, please see below: - interrupt for URB A comes in CPU 0, and handled, then HCD private lock is released, and usb_hcd_giveback_urb() is called to complete URB A - interrupt for URB B comes in CPU 1 just before CPU0 releases HCD private lock, so CPU1 holds the private lock when CPU 0 releases the lock and handles the HCD interrupt. So now just like two CPUs are racing, if CPU0 wins, URB A is completed first, otherwise URB B is completed first. Although you didn't say so, I assume you mean that A and B are URBs for the same endpoint. This means they use the same host controller and hence they use the same IRQ line. When an interrupt is handled, it is not possible for a second interrupt to arrive on the same IRQ line before the handler for the first interrupt returns. The kernel disables the IRQ line when the first interrupt occurs, and keeps it disabled until all the handlers are finished. Therefore the scenario you described cannot occur. With introducing completing URB in percpu tasklet, the situation is just very similar. No, it isn't, for the reason given above. On ehci hcd, it can make sure that by always letting CPU0 complete all URBs (use ehci-scanning/ehci-need_rescan flag), but the patch can change percpu tasklet to tasklet to meet the demand. Or perhaps you could assign each endpoint to a particular tasklet. As I understand it, the tradeoff between hardirq and softirq is generally unimportant. It does matter a lot in realtime settings -- I don't think so, and obviously softirq allows IRQs to be served quickly, otherwise why is softirq and tasklet introduced? and why so many drivers are using tasklet/softirq? This is part of what I would like to hear Thomas and Steve comment on. Alan Stern -- To unsubscribe from this list: send the line unsubscribe
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Mon, 2013-06-10 at 13:36 -0400, Alan Stern wrote: I don't think so, and obviously softirq allows IRQs to be served quickly, otherwise why is softirq and tasklet introduced? Because of history. In the old days there were top halves (hard irq) and bottom halves (like a softirq). The thing about a bottom half was, no two could run at the same time. That is, if one CPU was running a bottom half, another CPU would have to wait for that one to complete before it could run any bottom half. The killer here is that it didn't matter if it was the same bottom half or not! Microsoft/Mindcraft was kind enough to point out this inefficiency with some benchmarks showing how horrible Linux ran on a 4 way box. Thanks to them, softirqs and tasklets were born :-) A softirq does the rest of the interrupt work with interrupts enabled. A tasklet is run from softirq context, but the difference between a softirq and tasklet is that the same tasklet can not run on two different CPUs at the same time. It acts similar to a task, as a task can not run on two different CPUs at the same time either, hence the name tasklet. But tasklets are better than bottom halves because it allows for different tasklets to run on different CPUs. and why so many drivers are using tasklet/softirq? Because it's easy to set up and device driver authors don't know any better ;-). Note, a lot of drivers are now using work queues today, which run as a task. Yes, there's a little more overhead with a task than for a softirq, but the problem with softirqs and tasklets is that they can't be preempted, and they are more important than all tasks on the system. If you have a task that is critical, it must yield for your softirq. Almost! That is, even if you have a softirq, it may not run in irq context at all. If you get too many softirqs at a time (one comes while another one is running), it will defer the processing to the ksoftirq thread. This not only runs as a task, but also runs as SCHED_OTHER, and must yield to other tasks like everyone else too. Thus, adding it as a softirq does not guarantee that it will be processed quickly. It just means that most of the time it will prevent anything else from happening while your most important handler in the world is running. -- Steve -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Mon, 10 Jun 2013, Alan Stern wrote: Tasklet doesn't disable local interrupts. It is documented that interrupts will be disabled while the completion handler runs. Therefore the tasklet _must_ disable local interrupts. You know, it may be that you can get most of the advantages you want by enabling local interrupts around the call to unmap_urb_for_dma() in usb_hcd_giveback_urb(). This may be a little dangerous, though, because it is possible for an URB to be given back at the time it is submitted. Drivers may not expect interrupts to get enabled temporarily when they call usb_submit_urb(). Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Mon, 2013-06-10 at 16:47 -0400, Alan Stern wrote: On Mon, 10 Jun 2013, Steven Rostedt wrote: and why so many drivers are using tasklet/softirq? Because it's easy to set up and device driver authors don't know any better ;-). Note, a lot of drivers are now using work queues today, which run as a task. Yes, there's a little more overhead with a task than for a softirq, but the problem with softirqs and tasklets is that they can't be preempted, and they are more important than all tasks on the system. If you have a task that is critical, it must yield for your softirq. Almost! That is, even if you have a softirq, it may not run in irq context at all. If you get too many softirqs at a time (one comes while another one is running), it will defer the processing to the ksoftirq thread. This not only runs as a task, but also runs as SCHED_OTHER, and must yield to other tasks like everyone else too. Thus, adding it as a softirq does not guarantee that it will be processed quickly. It just means that most of the time it will prevent anything else from happening while your most important handler in the world is running. From this, it sounds like you generally advise using threaded interrupt handlers rather than tasklets/softirqs. Yes, there's plenty of benefits for them, and I highly doubt that any USB device would even notice the difference between a softirq and a thread for response time latencies. -- Steve -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
On Tue, Jun 11, 2013 at 1:36 AM, Alan Stern st...@rowland.harvard.edu wrote: On Mon, 10 Jun 2013, Ming Lei wrote: That isn't clear. It is documented that USB completion handlers are called with local interrupts disabled. An IRQ arriving before the Is there any good reason to do URB giveback with interrupt disabled? That's a good question. Offhand I can't think of any drivers that rely on this -- although there certainly are places where callback routines use spin_lock() rather than spin_lock_irq() or spin_lock_irqsave(), because they know that interrupts are already disabled. Complete() may take long time, for example in UVC's driver, URB's complete() may take more than 3 ms, as reported in below link: http://marc.info/?t=136438111600010r=1w=2 Also complete() is provided by interface driver, and it does many driver specific works, so it isn't good to disable interrupt only for one driver. If complete() callback runs in one tasklet context, spin_lock() inside complete() is enough, just like hard irq, the tasklet itself is disabled during complete(), if the percpu tasklet is converted to single tasklet. So no problem when the lock is to protect shared resources between complete(). When the lock is to protect shared resources between complete() and non-IRQ context, currently spin_lock_irqsave() is used in non-IRQ context, which is enough to prevent tasklet from being run on the CPU, so no problem for this situation too. When all HCDs support to run URB giveback in tasklet context, we can change all spin_lock_irq*() to spin_lock() in drivers URB-complete(), and in other places, the spin_lock_irq*() can be changed to spin_lock_bh(). However, changing a documented API is not something to be done lightly. You would have to audit _every_ USB driver to make sure no trouble would arise. OK, I will write patch to request for changing the API if the improvement on the situation isn't objected. In fact, at least now, the change on API is only about document change, and no drivers' change is required, isn't it? We can describe that URB-complete() might run in hard irq or softirq context, depends on HCD_BH flag of hcd-driver-flags. IMO, disabling IRQs isn't necessary for completing URB since the complete() of URBs for same endpoint is reentrant. Whether it is _necessary_ isn't the point. It is _documented_, and therefore it cannot be changed easily. Same with above. tasklet starts up might therefore be serviced during the short interval before the tasklet disables local interrupts, but if that happens it Tasklet doesn't disable local interrupts. It is documented that interrupts will be disabled while the completion handler runs. Therefore the tasklet _must_ disable local interrupts. Same with above, we can change the document which itself shouldn't have been the only reason for blocking the change, :-) would mean that the completion handler would be delayed. Yes, the tasklet schedule delay is introduced to URB completion, but the delay doesn't have much side effect about completing URB. What about delays that occur because a separate interrupt is serviced before the URB's completion handler is called? With the current code that can't happen, but with your scheme it can. Even with current code, one HCD's interrupt handling still may delay the handling for another HCD interrupt handling for quite long, that is the motivation to decrease the interrupt handling time of USB HCD, ;-) And finally, USB HCDs themselves will benefit from the change, won't they? Also it shouldn't be a problem since most of interrupt handlers takes very less time. For async transfer, generally, it should have no effect since there should have URBs queued on the qh queue before submitting URB. That's not how the Bulk-Only mass-storage protocol works. There are times when the protocol _requires_ bulk packets not to be submitted until a particular URB has completed. Yes, I agree. But mass-storage driver itself is very fragile, it will perform badly if CPU has a higher load. And it is better to submit DATA/CSW transfer in previous transfer's complete(), IMO. Compared with usb-storage task's schedule latency, the tasklet schedule delay should be lower at most of situations since the tasklet is scheduled inside irq handler. Since mass-storage is an important use case for USB, its requirements should not be ignored. That is why I did many tests on mass storage case, and from the results in commit log of patch 4/4, looks no performance loss is involved with the change on usb storage. In effect, you are prioritizing other IRQs higher than USB completion handlers. Both other IRQs and HCD's IRQ are prioritizing the URB completion. I don't understand that sentence. Prioritizing means assigning a priority to. IRQs and HCDs don't assign priorities to anything; priorities are assigned by human programmers. Sorry, prioritizing should have been prioritized.
[RFC PATCH 0/4] USB: HCD/EHCI: giveback of URB in tasklet context
Hi, The patchset supports to run giveback of URB in tasklet context, so that DMA unmapping/mapping on transfer buffer and compelte() callback can be run with interrupt enabled, then time of HCD interrupt handler can be saved much. Also this approach may simplify HCD since HCD lock needn't be released any more before calling usb_hcd_giveback_urb(). The patchset enables the mechanism on EHCI HCD. Patch 3/4 improves interrupt qh unlink on EHCI HCD when the mechanism is introduced. In the commit log of patch 4/4, detailed test result on three machines (ARM A9/A15 dual core, X86) are provided about transfer performance and ehci irq handling time. From the result, no transfer performance loss is found and ehci irq handling time can drop much with the patchset. drivers/usb/core/hcd.c| 170 +++-- drivers/usb/host/ehci-fsl.c |2 +- drivers/usb/host/ehci-grlib.c |2 +- drivers/usb/host/ehci-hcd.c | 18 ++-- drivers/usb/host/ehci-hub.c |1 + drivers/usb/host/ehci-mv.c|2 +- drivers/usb/host/ehci-octeon.c|2 +- drivers/usb/host/ehci-pmcmsp.c|2 +- drivers/usb/host/ehci-ppc-of.c|2 +- drivers/usb/host/ehci-ps3.c |2 +- drivers/usb/host/ehci-q.c |6 +- drivers/usb/host/ehci-sched.c | 60 - drivers/usb/host/ehci-sead3.c |2 +- drivers/usb/host/ehci-sh.c|2 +- drivers/usb/host/ehci-tegra.c |2 +- drivers/usb/host/ehci-tilegx.c|2 +- drivers/usb/host/ehci-timer.c | 43 ++ drivers/usb/host/ehci-w90x900.c |2 +- drivers/usb/host/ehci-xilinx-of.c |2 +- drivers/usb/host/ehci.h |3 + include/linux/usb/hcd.h | 16 21 files changed, 295 insertions(+), 48 deletions(-) Thanks, -- Ming Lei -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html