Re: [PATCH] USB: EHCI: fix for leaking isochronous data
On Thu, 21 Mar 2013, Soeren Moch wrote: > > Hi Alan, Soeren > > > > Could you word the description a bit better. If Alan did not get it > > without a bit of thought, few others are going to understand it > > without a better explanation. > > > > Thanks > > Andrew > > > > Alan, > > can you come up with a better explanation, please? I think your > description how it is supposed to work from here > http://marc.info/?l=linux-usb&m=136345559432055&w=2 > is required to understand the problem and the fix. Okay, I will rewrite your patch description. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] USB: EHCI: fix for leaking isochronous data
On 21.03.2013 22:20, Andrew Lunn wrote: On Thu, Mar 21, 2013 at 05:12:01PM -0400, Alan Stern wrote: On Thu, 21 Mar 2013, Alan Stern wrote: On Thu, 21 Mar 2013, Soeren Moch wrote: Now I found out what is going on here: In itd_urb_transaction() we allocate 9 iTDs for each URB with number_of_packets == 64 in my case. The iTDs are added to sched->td_list. For a frame-aligned scheduling we need 8 iTDs, the 9th one is released back to the front of the streams free_list in iso_sched_free(). This iTD was cleared after allocation and has a frame number of 0 now. So for each allocation when now_frame == 0 we allocate from the dma_pool, not from the free_list. Okay, that is a problem. But it shouldn't be such a big problem, because now_frame should not be equal to 0 very often. Oh, wait, now I get it. We never reach a steady state, because the free list never shrinks, but occasionally it does increase when now_frame is equal to 0. Even though that doesn't happen very often, the effects add up. Very good; tomorrow I will send your patch in. Hi Alan, Soeren Could you word the description a bit better. If Alan did not get it without a bit of thought, few others are going to understand it without a better explanation. Thanks Andrew Alan, can you come up with a better explanation, please? I think your description how it is supposed to work from here http://marc.info/?l=linux-usb&m=136345559432055&w=2 is required to understand the problem and the fix. Thanks, Soeren -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] USB: EHCI: fix for leaking isochronous data
On 21.03.2013 20:10, Arnd Bergmann wrote: On Thursday 21 March 2013, Jason Cooper wrote: On Thu, Mar 21, 2013 at 06:04:59PM +0100, Soeren Moch wrote: Now I found out what is going on here: In itd_urb_transaction() we allocate 9 iTDs for each URB with number_of_packets == 64 in my case. The iTDs are added to sched->td_list. For a frame-aligned scheduling we need 8 iTDs, the 9th one is released back to the front of the streams free_list in iso_sched_free(). This iTD was cleared after allocation and has a frame number of 0 now. So for each allocation when now_frame == 0 we allocate from the dma_pool, not from the free_list. The attached patch invalidates the frame number in each iTD before it is sent to the scheduler. This fixes the problem without the need to iterate over a iTD list. Signed-off-by: Soeren Moch Wow! Great work Soeren! Talk about a long road to a small fix. Thanks for keeping after it. +1 I hardly understand half of the description above, but that much sounds plausible. Is this a bug fix that should get backported to stable kernels? Arnd I think this patch should go to stable kernels, too. But I'm not an expert here... Soeren -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] USB: EHCI: fix for leaking isochronous data
On 03/21/13 22:12, Alan Stern wrote: On Thu, 21 Mar 2013, Alan Stern wrote: On Thu, 21 Mar 2013, Soeren Moch wrote: Now I found out what is going on here: In itd_urb_transaction() we allocate 9 iTDs for each URB with number_of_packets == 64 in my case. The iTDs are added to sched->td_list. For a frame-aligned scheduling we need 8 iTDs, the 9th one is released back to the front of the streams free_list in iso_sched_free(). This iTD was cleared after allocation and has a frame number of 0 now. So for each allocation when now_frame == 0 we allocate from the dma_pool, not from the free_list. Okay, that is a problem. But it shouldn't be such a big problem, because now_frame should not be equal to 0 very often. Oh, wait, now I get it. We never reach a steady state, because the free list never shrinks, but occasionally it does increase when now_frame is equal to 0. Even though that doesn't happen very often, the effects add up. Correct. Approximately twice a second we allocate 9 iTDs from the dma_pool in my case (the frame numbers run from 0 to 511 - not up to 2047, which I thought they should). But since we allocate iTDs every 8 frames in my case (for each active stream), the pool allocations occur only when we hit the allocation sequence frame 0,8,16,... (not one of the other 7 opportunities frame 1,9,17,... - frame 7,15,23,...) Very good; tomorrow I will send your patch in. Alan Stern Thank you. And also thanks to all who helped to track down this issue. Soeren -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] USB: EHCI: fix for leaking isochronous data
On Thu, Mar 21, 2013 at 05:12:01PM -0400, Alan Stern wrote: > On Thu, 21 Mar 2013, Alan Stern wrote: > > > On Thu, 21 Mar 2013, Soeren Moch wrote: > > > > > Now I found out what is going on here: > > > > > > In itd_urb_transaction() we allocate 9 iTDs for each URB with > > > number_of_packets == 64 in my case. The iTDs are added to > > > sched->td_list. For a frame-aligned scheduling we need 8 iTDs, the 9th > > > one is released back to the front of the streams free_list in > > > iso_sched_free(). This iTD was cleared after allocation and has a frame > > > number of 0 now. So for each allocation when now_frame == 0 we allocate > > > from the dma_pool, not from the free_list. > > > > Okay, that is a problem. But it shouldn't be such a big problem, > > because now_frame should not be equal to 0 very often. > > Oh, wait, now I get it. We never reach a steady state, because the > free list never shrinks, but occasionally it does increase when > now_frame is equal to 0. Even though that doesn't happen very often, > the effects add up. > > Very good; tomorrow I will send your patch in. Hi Alan, Soeren Could you word the description a bit better. If Alan did not get it without a bit of thought, few others are going to understand it without a better explanation. Thanks Andrew -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] USB: EHCI: fix for leaking isochronous data
On Thu, 21 Mar 2013, Alan Stern wrote: > On Thu, 21 Mar 2013, Soeren Moch wrote: > > > Now I found out what is going on here: > > > > In itd_urb_transaction() we allocate 9 iTDs for each URB with > > number_of_packets == 64 in my case. The iTDs are added to > > sched->td_list. For a frame-aligned scheduling we need 8 iTDs, the 9th > > one is released back to the front of the streams free_list in > > iso_sched_free(). This iTD was cleared after allocation and has a frame > > number of 0 now. So for each allocation when now_frame == 0 we allocate > > from the dma_pool, not from the free_list. > > Okay, that is a problem. But it shouldn't be such a big problem, > because now_frame should not be equal to 0 very often. Oh, wait, now I get it. We never reach a steady state, because the free list never shrinks, but occasionally it does increase when now_frame is equal to 0. Even though that doesn't happen very often, the effects add up. Very good; tomorrow I will send your patch in. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] USB: EHCI: fix for leaking isochronous data
On Thu, 21 Mar 2013, Soeren Moch wrote: > Now I found out what is going on here: > > In itd_urb_transaction() we allocate 9 iTDs for each URB with > number_of_packets == 64 in my case. The iTDs are added to > sched->td_list. For a frame-aligned scheduling we need 8 iTDs, the 9th > one is released back to the front of the streams free_list in > iso_sched_free(). This iTD was cleared after allocation and has a frame > number of 0 now. So for each allocation when now_frame == 0 we allocate > from the dma_pool, not from the free_list. Okay, that is a problem. But it shouldn't be such a big problem, because now_frame should not be equal to 0 very often. > The attached patch > invalidates the frame number in each iTD before it is sent to the > scheduler. This fixes the problem without the need to iterate over a iTD > list. The patch looks okay. However I would like to understand why the 0 frame value messes things up so much. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] USB: EHCI: fix for leaking isochronous data
Hi On 21/03/13 20:10, Arnd Bergmann wrote: > On Thursday 21 March 2013, Jason Cooper wrote: >> On Thu, Mar 21, 2013 at 06:04:59PM +0100, Soeren Moch wrote: > >>> >>> Now I found out what is going on here: >>> >>> In itd_urb_transaction() we allocate 9 iTDs for each URB with >>> number_of_packets == 64 in my case. The iTDs are added to >>> sched->td_list. For a frame-aligned scheduling we need 8 iTDs, the >>> 9th one is released back to the front of the streams free_list in >>> iso_sched_free(). This iTD was cleared after allocation and has a >>> frame number of 0 now. So for each allocation when now_frame == 0 we >>> allocate from the dma_pool, not from the free_list. The attached >>> patch invalidates the frame number in each iTD before it is sent to >>> the scheduler. This fixes the problem without the need to iterate >>> over a iTD list. >>> >>> Signed-off-by: Soeren Moch >> >> Wow! Great work Soeren! Talk about a long road to a small fix. Thanks >> for keeping after it. > > +1 > > I hardly understand half of the description above, but that much sounds > plausible. Is this a bug fix that should get backported to stable kernels? > + 1 and I can test on my device. Just one comment: I don't know if -1 is the correct way to init it. Michael > Arnd > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] USB: EHCI: fix for leaking isochronous data
On Thursday 21 March 2013, Jason Cooper wrote: > On Thu, Mar 21, 2013 at 06:04:59PM +0100, Soeren Moch wrote: > > > > Now I found out what is going on here: > > > > In itd_urb_transaction() we allocate 9 iTDs for each URB with > > number_of_packets == 64 in my case. The iTDs are added to > > sched->td_list. For a frame-aligned scheduling we need 8 iTDs, the > > 9th one is released back to the front of the streams free_list in > > iso_sched_free(). This iTD was cleared after allocation and has a > > frame number of 0 now. So for each allocation when now_frame == 0 we > > allocate from the dma_pool, not from the free_list. The attached > > patch invalidates the frame number in each iTD before it is sent to > > the scheduler. This fixes the problem without the need to iterate > > over a iTD list. > > > > Signed-off-by: Soeren Moch > > Wow! Great work Soeren! Talk about a long road to a small fix. Thanks > for keeping after it. +1 I hardly understand half of the description above, but that much sounds plausible. Is this a bug fix that should get backported to stable kernels? Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] USB: EHCI: fix for leaking isochronous data
On Thu, Mar 21, 2013 at 06:04:59PM +0100, Soeren Moch wrote: > On 03/17/13 18:36, Alan Stern wrote: > >On Sun, 17 Mar 2013, Soeren Moch wrote: > > > >>For each device only one isochronous endpoint is used (EP IN4, 1x 940 > >>Bytes, Interval 1). > >>When the ENOMEM error occurs, a huge number of iTDs is in the free_list > >>of one stream. This number is much higher than the 2*M entries, which > >>should be there according to your description. > > > >Okay, but how did they get there? With each URB requiring 9 iTDs, and > >about 5 URBs active at any time, there should be about 5*9 = 45 iTDs in > >use and 2*9 = 18 iTDs on the free list. By the time each URB > >completes, it should have released all 9 iTDs back to the free list, > >and each time an URB is submitted, it should be able to acquire all 9 > >of the iTDs that it needs from the free list -- it shouldn't have to > >allocate any from the DMA pool. > > > >Looks like you'll have to investigate what's going on inside > >itd_urb_transaction(). Print out some useful information whenever the > >size of stream->free_list is above 50, such as the value of num_itds, > >how many of the loop iterations could get an iTD from the free list, > >and the value of itd->frame in the case where the "goto alloc_itd" > >statement is followed. > > > >It might be a good idea also to print out the size of the free list in > >itd_complete(), where it calls ehci_urb_done(), and include the value > >of ehci->now_frame. > > > > Now I found out what is going on here: > > In itd_urb_transaction() we allocate 9 iTDs for each URB with > number_of_packets == 64 in my case. The iTDs are added to > sched->td_list. For a frame-aligned scheduling we need 8 iTDs, the > 9th one is released back to the front of the streams free_list in > iso_sched_free(). This iTD was cleared after allocation and has a > frame number of 0 now. So for each allocation when now_frame == 0 we > allocate from the dma_pool, not from the free_list. The attached > patch invalidates the frame number in each iTD before it is sent to > the scheduler. This fixes the problem without the need to iterate > over a iTD list. > > Signed-off-by: Soeren Moch Wow! Great work Soeren! Talk about a long road to a small fix. Thanks for keeping after it. thx, Jason. > > > > --- linux-3.9.0-rc3-guru/drivers/usb/host/ehci-sched.c.orig 2013-03-21 > 17:36:21.0 +0100 > +++ linux-3.9.0-rc3-guru/drivers/usb/host/ehci-sched.c2013-03-21 > 17:38:56.0 +0100 > @@ -1214,6 +1214,7 @@ itd_urb_transaction ( > > memset (itd, 0, sizeof *itd); > itd->itd_dma = itd_dma; > + itd->frame = -1; > list_add (&itd->itd_list, &sched->td_list); > } > spin_unlock_irqrestore (&ehci->lock, flags); > @@ -1915,6 +1916,7 @@ sitd_urb_transaction ( > > memset (sitd, 0, sizeof *sitd); > sitd->sitd_dma = sitd_dma; > + sitd->frame = -1; > list_add (&sitd->sitd_list, &iso_sched->td_list); > } > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] USB: EHCI: fix for leaking isochronous data
On 03/17/13 18:36, Alan Stern wrote: On Sun, 17 Mar 2013, Soeren Moch wrote: For each device only one isochronous endpoint is used (EP IN4, 1x 940 Bytes, Interval 1). When the ENOMEM error occurs, a huge number of iTDs is in the free_list of one stream. This number is much higher than the 2*M entries, which should be there according to your description. Okay, but how did they get there? With each URB requiring 9 iTDs, and about 5 URBs active at any time, there should be about 5*9 = 45 iTDs in use and 2*9 = 18 iTDs on the free list. By the time each URB completes, it should have released all 9 iTDs back to the free list, and each time an URB is submitted, it should be able to acquire all 9 of the iTDs that it needs from the free list -- it shouldn't have to allocate any from the DMA pool. Looks like you'll have to investigate what's going on inside itd_urb_transaction(). Print out some useful information whenever the size of stream->free_list is above 50, such as the value of num_itds, how many of the loop iterations could get an iTD from the free list, and the value of itd->frame in the case where the "goto alloc_itd" statement is followed. It might be a good idea also to print out the size of the free list in itd_complete(), where it calls ehci_urb_done(), and include the value of ehci->now_frame. Now I found out what is going on here: In itd_urb_transaction() we allocate 9 iTDs for each URB with number_of_packets == 64 in my case. The iTDs are added to sched->td_list. For a frame-aligned scheduling we need 8 iTDs, the 9th one is released back to the front of the streams free_list in iso_sched_free(). This iTD was cleared after allocation and has a frame number of 0 now. So for each allocation when now_frame == 0 we allocate from the dma_pool, not from the free_list. The attached patch invalidates the frame number in each iTD before it is sent to the scheduler. This fixes the problem without the need to iterate over a iTD list. Signed-off-by: Soeren Moch --- linux-3.9.0-rc3-guru/drivers/usb/host/ehci-sched.c.orig 2013-03-21 17:36:21.0 +0100 +++ linux-3.9.0-rc3-guru/drivers/usb/host/ehci-sched.c 2013-03-21 17:38:56.0 +0100 @@ -1214,6 +1214,7 @@ itd_urb_transaction ( memset (itd, 0, sizeof *itd); itd->itd_dma = itd_dma; + itd->frame = -1; list_add (&itd->itd_list, &sched->td_list); } spin_unlock_irqrestore (&ehci->lock, flags); @@ -1915,6 +1916,7 @@ sitd_urb_transaction ( memset (sitd, 0, sizeof *sitd); sitd->sitd_dma = sitd_dma; + sitd->frame = -1; list_add (&sitd->sitd_list, &iso_sched->td_list); }
Re: [PATCH] USB: EHCI: fix for leaking isochronous data
On Sun, 17 Mar 2013, Alan Stern wrote: > On Sun, 17 Mar 2013, Soeren Moch wrote: > > > For each device only one isochronous endpoint is used (EP IN4, 1x 940 > > Bytes, Interval 1). > > When the ENOMEM error occurs, a huge number of iTDs is in the free_list > > of one stream. This number is much higher than the 2*M entries, which > > should be there according to your description. > > Okay, but how did they get there? With each URB requiring 9 iTDs, and > about 5 URBs active at any time, there should be about 5*9 = 45 iTDs in > use and 2*9 = 18 iTDs on the free list. By the time each URB > completes, it should have released all 9 iTDs back to the free list, > and each time an URB is submitted, it should be able to acquire all 9 > of the iTDs that it needs from the free list -- it shouldn't have to > allocate any from the DMA pool. > > Looks like you'll have to investigate what's going on inside > itd_urb_transaction(). Print out some useful information whenever the > size of stream->free_list is above 50, such as the value of num_itds, > how many of the loop iterations could get an iTD from the free list, > and the value of itd->frame in the case where the "goto alloc_itd" > statement is followed. > > It might be a good idea also to print out the size of the free list in > itd_complete(), where it calls ehci_urb_done(), and include the value > of ehci->now_frame. One thing I forgot to mention: It would help to have millisecond precision for the timestamps in the system log, for comparison of frame number values. Please enable CONFIG_PRINTK_TIME for the next test. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] USB: EHCI: fix for leaking isochronous data
On Sun, 17 Mar 2013, Soeren Moch wrote: > For each device only one isochronous endpoint is used (EP IN4, 1x 940 > Bytes, Interval 1). > When the ENOMEM error occurs, a huge number of iTDs is in the free_list > of one stream. This number is much higher than the 2*M entries, which > should be there according to your description. Okay, but how did they get there? With each URB requiring 9 iTDs, and about 5 URBs active at any time, there should be about 5*9 = 45 iTDs in use and 2*9 = 18 iTDs on the free list. By the time each URB completes, it should have released all 9 iTDs back to the free list, and each time an URB is submitted, it should be able to acquire all 9 of the iTDs that it needs from the free list -- it shouldn't have to allocate any from the DMA pool. Looks like you'll have to investigate what's going on inside itd_urb_transaction(). Print out some useful information whenever the size of stream->free_list is above 50, such as the value of num_itds, how many of the loop iterations could get an iTD from the free list, and the value of itd->frame in the case where the "goto alloc_itd" statement is followed. It might be a good idea also to print out the size of the free list in itd_complete(), where it calls ehci_urb_done(), and include the value of ehci->now_frame. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] USB: EHCI: fix for leaking isochronous data
On 16.03.2013 18:39, Alan Stern wrote: On Sat, 16 Mar 2013, Soeren Moch wrote: I implemented the counter. The max value is sampled at the beginning of end_free_itds(), the current counter value is sampled at the end of this function. Counter values w/o a max number are from the error path in itd_urb_transaction(). The number of allocated iTDs can grow to higher values (why?), but normally the iTDs are freed during normal operation. Due to some reason the number of iTDs suddenly increases until coherent pool exhaustion. There is no permanent memory leak, all iTDs are released when the user application ends. But imho several thousands of iTDs cannot be the intended behavior... No, it's not. Here's how it's supposed to work: Each ehci_iso_stream structure corresponds to a single isochronous endpoint. The structure has a free list of iTDs that aren't currently in use; when an URB is submitted, its iTDs are taken from the start of this free list if possible. Otherwise new iTDs are allocated from the DMA pool. iTDs on the free list aren't always available. This is because the hardware can continue to access an iTD for up to 1 ms after the iTD has completed. itd->frame stores the frame number (one frame per ms) for when the iTD completes, and ehci->now_frame contains a best estimate of the current frame number. This explains the logic in itd_urb_transaction(). Near the end of itd_complete() you can see where a completed iTD gets added back to the end of the free list. At the very end of itd_complete() is a section of code that takes the entries on the iso_stream's free list and moves them to a global free list (ehci->cached_itd_list). This happens only when the endpoint is no longer in use, i.e., no iTDs are queued for it. The end_free_itds() routine in ehci_timer.c takes iTDs from this global list and releases them back to the DMA pool. The routine doesn't run until at least 1 ms after start_free_itds() is called, to wait for the hardware to stop accessing the iTDs on the list. The idea is that during normal use we will quickly reach a steady state, where an endpoint always has about N URBs queued for it, each URB uses about M iTDs, and there are one or two URB's worth of unused iTDs. Thus there will be N*M iTDs in use plus maybe another 2*M iTDs on the iso_stream's free list. Once we reach this point, every new URB should be able to get the iTDs it needs from the free list (assuming a new URB is submitted every time an URB completes). When the URBs stop being submitted, the pipeline will empty out and after a couple more milliseconds, all (N+2)*M iTDs should be released back to the pool. In your case the situation is complicated by the fact that you're using two devices, each of which has up to four isochronous endpoints. This makes it harder to see what's going on. Probably not all of the endpoints were being used for data transfers. But even if they were, there should not have been more then 800 iTDs allocated at any time (figure that N is around 5 and M is 9). You could simplify the testing by using only one device -- it might not exhaust the pool but your iTD counter would still indicate if something wasn't right. I'm not sure how to figure out what's causing the problem. Maybe you can think of a good way to see where the actual operation differs from the description above. Perhaps start by keeping track of the number of iTDs on each iso_stream's free list and the number in use by each iso_stream. For each device only one isochronous endpoint is used (EP IN4, 1x 940 Bytes, Interval 1). When the ENOMEM error occurs, a huge number of iTDs is in the free_list of one stream. This number is much higher than the 2*M entries, which should be there according to your description. Soeren Moch Mar 17 17:06:49 guruvdr kernel: free iso_stream:0xd802d840 Mar 17 17:06:49 guruvdr kernel: iso_stream_init: stream:0xd802d7e0 dev:5 ep:132 int:1 maxp:940 bw:152 Mar 17 17:06:50 guruvdr kernel: free iso_stream:0xd802d9c0 Mar 17 17:06:50 guruvdr kernel: iso_stream_init: stream:0xd802d360 dev:6 ep:132 int:1 maxp:940 bw:152 Mar 17 17:07:09 guruvdr kernel: itd_counter:42 (max:4549) Mar 17 17:07:09 guruvdr kernel: itd_counter:0 (max:4549) Mar 17 17:07:10 guruvdr kernel: free iso_stream:0xd802d7e0 Mar 17 17:07:11 guruvdr kernel: iso_stream_init: stream:0xd802d7e0 dev:5 ep:132 int:1 maxp:940 bw:152 Mar 17 17:07:11 guruvdr kernel: free iso_stream:0xd802d360 Mar 17 17:07:11 guruvdr kernel: iso_stream_init: stream:0xd802d360 dev:6 ep:132 int:1 maxp:940 bw:152 Mar 17 17:15:13 guruvdr kernel: ERROR: 1024 KiB atomic DMA coherent pool is too small! Mar 17 17:15:13 guruvdr kernel: Please increase it with coherent_pool= kernel parameter! Mar 17 17:15:13 guruvdr kernel: itd_counter:6275 Mar 17 17:15:13 guruvdr kernel: stream0 num_td_list_entries:33 num_free_list_entries:6200 Mar 17 17:15:13 guruvdr kernel: stream1 num_td_list_entries:36 num_free_list_entries:6 Mar 17 17:15:13 guruvdr kernel:
Re: [PATCH] USB: EHCI: fix for leaking isochronous data
On Sat, 16 Mar 2013, Soeren Moch wrote: > I implemented the counter. The max value is sampled at the beginning of > end_free_itds(), the current counter value is sampled at the end of this > function. Counter values w/o a max number are from the error path in > itd_urb_transaction(). > The number of allocated iTDs can grow to higher values (why?), but > normally the iTDs are freed during normal operation. Due to some reason > the number of iTDs suddenly increases until coherent pool exhaustion. > There is no permanent memory leak, all iTDs are released when the user > application ends. But imho several thousands of iTDs cannot be the > intended behavior... No, it's not. Here's how it's supposed to work: Each ehci_iso_stream structure corresponds to a single isochronous endpoint. The structure has a free list of iTDs that aren't currently in use; when an URB is submitted, its iTDs are taken from the start of this free list if possible. Otherwise new iTDs are allocated from the DMA pool. iTDs on the free list aren't always available. This is because the hardware can continue to access an iTD for up to 1 ms after the iTD has completed. itd->frame stores the frame number (one frame per ms) for when the iTD completes, and ehci->now_frame contains a best estimate of the current frame number. This explains the logic in itd_urb_transaction(). Near the end of itd_complete() you can see where a completed iTD gets added back to the end of the free list. At the very end of itd_complete() is a section of code that takes the entries on the iso_stream's free list and moves them to a global free list (ehci->cached_itd_list). This happens only when the endpoint is no longer in use, i.e., no iTDs are queued for it. The end_free_itds() routine in ehci_timer.c takes iTDs from this global list and releases them back to the DMA pool. The routine doesn't run until at least 1 ms after start_free_itds() is called, to wait for the hardware to stop accessing the iTDs on the list. The idea is that during normal use we will quickly reach a steady state, where an endpoint always has about N URBs queued for it, each URB uses about M iTDs, and there are one or two URB's worth of unused iTDs. Thus there will be N*M iTDs in use plus maybe another 2*M iTDs on the iso_stream's free list. Once we reach this point, every new URB should be able to get the iTDs it needs from the free list (assuming a new URB is submitted every time an URB completes). When the URBs stop being submitted, the pipeline will empty out and after a couple more milliseconds, all (N+2)*M iTDs should be released back to the pool. In your case the situation is complicated by the fact that you're using two devices, each of which has up to four isochronous endpoints. This makes it harder to see what's going on. Probably not all of the endpoints were being used for data transfers. But even if they were, there should not have been more then 800 iTDs allocated at any time (figure that N is around 5 and M is 9). You could simplify the testing by using only one device -- it might not exhaust the pool but your iTD counter would still indicate if something wasn't right. I'm not sure how to figure out what's causing the problem. Maybe you can think of a good way to see where the actual operation differs from the description above. Perhaps start by keeping track of the number of iTDs on each iso_stream's free list and the number in use by each iso_stream. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] USB: EHCI: fix for leaking isochronous data
On 14.03.2013 22:33, Alan Stern wrote: Maybe a better way to go about this is, instead of printing out every allocation and deallocation, to keep a running counter. You could have the driver print out the value of this counter every minute or so. Any time the device isn't in use, the counter should be 0. I implemented the counter. The max value is sampled at the beginning of end_free_itds(), the current counter value is sampled at the end of this function. Counter values w/o a max number are from the error path in itd_urb_transaction(). The number of allocated iTDs can grow to higher values (why?), but normally the iTDs are freed during normal operation. Due to some reason the number of iTDs suddenly increases until coherent pool exhaustion. There is no permanent memory leak, all iTDs are released when the user application ends. But imho several thousands of iTDs cannot be the intended behavior... Soeren Mar 16 01:12:52 guruvdr kernel: itd_counter:42 (max:174) Mar 16 01:12:52 guruvdr kernel: itd_counter:0 (max:174) Mar 16 01:13:13 guruvdr kernel: itd_counter:42 (max:174) Mar 16 01:13:13 guruvdr kernel: itd_counter:0 (max:174) Mar 16 01:13:34 guruvdr kernel: itd_counter:87 (max:174) Mar 16 01:13:34 guruvdr kernel: itd_counter:0 (max:174) Mar 16 01:13:55 guruvdr kernel: itd_counter:42 (max:426) Mar 16 01:13:55 guruvdr kernel: itd_counter:0 (max:426) Mar 16 01:15:14 guruvdr kernel: itd_counter:249 (max:1441) Mar 16 01:15:14 guruvdr kernel: itd_counter:0 (max:1441) [...] Mar 16 02:03:33 guruvdr kernel: itd_counter:0 (max:1441) Mar 16 02:04:15 guruvdr kernel: itd_counter:150 (max:1441) Mar 16 02:04:15 guruvdr kernel: itd_counter:0 (max:1441) Mar 16 02:04:36 guruvdr kernel: itd_counter:42 (max:1441) Mar 16 02:04:36 guruvdr kernel: itd_counter:0 (max:1441) Mar 16 02:04:57 guruvdr kernel: itd_counter:42 (max:1441) Mar 16 02:04:57 guruvdr kernel: itd_counter:0 (max:1441) Mar 16 02:05:18 guruvdr kernel: itd_counter:42 (max:1441) Mar 16 02:05:18 guruvdr kernel: itd_counter:0 (max:1441) Mar 16 02:05:39 guruvdr kernel: itd_counter:42 (max:1441) Mar 16 02:05:39 guruvdr kernel: itd_counter:0 (max:1441) Mar 16 02:06:43 guruvdr kernel: itd_counter:254 (max:1441) Mar 16 02:07:44 guruvdr kernel: itd_counter:968 (max:1469) Mar 16 02:08:04 guruvdr kernel: itd_counter:1192 (max:1469) Mar 16 02:08:46 guruvdr kernel: itd_counter:1812 (max:1854) Mar 16 02:09:07 guruvdr kernel: itd_counter:2117 (max:2159) Mar 16 02:09:28 guruvdr kernel: itd_counter:2482 (max:2524) Mar 16 02:09:49 guruvdr kernel: itd_counter:2791 (max:2833) Mar 16 02:10:10 guruvdr kernel: itd_counter:3088 (max:3409) Mar 16 02:10:31 guruvdr kernel: itd_counter:3402 (max:3444) Mar 16 02:10:52 guruvdr kernel: itd_counter:3689 (max:3731) Mar 16 02:11:13 guruvdr kernel: itd_counter:3950 (max:4073) Mar 16 02:11:34 guruvdr kernel: itd_counter:4264 (max:4306) Mar 16 02:11:55 guruvdr kernel: itd_counter:4525 (max:4567) Mar 16 02:12:16 guruvdr kernel: itd_counter:4803 (max:4845) Mar 16 02:12:37 guruvdr kernel: itd_counter:5028 (max:5070) Mar 16 02:13:19 guruvdr kernel: itd_counter:5549 (max:5843) Mar 16 02:13:40 guruvdr kernel: itd_counter:5836 (max:5878) Mar 16 02:14:01 guruvdr kernel: itd_counter:6133 (max:6238) Mar 16 02:14:06 guruvdr kernel: ERROR: 1024 KiB atomic DMA coherent pool is too small! Mar 16 02:14:06 guruvdr kernel: Please increase it with coherent_pool= kernel parameter! Mar 16 02:14:06 guruvdr kernel: itd_counter:6275 Mar 16 02:14:06 guruvdr kernel: orion-ehci orion-ehci.0: can't init itds Mar 16 02:14:06 guruvdr kernel: itd_counter:6275 Mar 16 02:14:06 guruvdr kernel: orion-ehci orion-ehci.0: can't init itds Mar 16 02:14:06 guruvdr kernel: itd_counter:6275 Mar 16 02:14:06 guruvdr kernel: orion-ehci orion-ehci.0: can't init itds Mar 16 02:14:06 guruvdr kernel: itd_counter:6275 Mar 16 02:14:06 guruvdr kernel: orion-ehci orion-ehci.0: can't init itds Mar 16 02:14:07 guruvdr kernel: itd_counter:6275 Mar 16 02:14:07 guruvdr kernel: orion-ehci orion-ehci.0: can't init itds Mar 16 02:14:07 guruvdr kernel: itd_counter:6275 Mar 16 02:14:07 guruvdr kernel: orion-ehci orion-ehci.0: can't init itds Mar 16 02:14:07 guruvdr kernel: itd_counter:6275 Mar 16 02:14:07 guruvdr kernel: orion-ehci orion-ehci.0: can't init itds Mar 16 02:14:07 guruvdr kernel: itd_counter:6275 Mar 16 02:14:07 guruvdr kernel: orion-ehci orion-ehci.0: can't init itds Mar 16 02:14:08 guruvdr kernel: itd_counter:6275 Mar 16 02:14:08 guruvdr kernel: orion-ehci orion-ehci.0: can't init itds Mar 16 02:14:08 guruvdr kernel: itd_counter:78 (max:6275) Mar 16 02:14:22 guruvdr kernel: itd_counter:0 (max:6275) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] USB: EHCI: fix for leaking isochronous data
On Fri, 15 Mar 2013, Soeren Moch wrote: > > The log shows a 1-1 match between allocations and deallocations, except > > for three excess allocations about 45 lines before the end. I have no > > idea what's up with those. They may be an artifact arising from where > > you stopped copying the log data. > > > > There are as many as 400 iTDs being allocated before any are freed. > > That seems like a lot. Are they all for the same isochronous endpoint? > > What's the endpoint's period? How often are URBs submitted? > > I use 2 dvb sticks, capturing digital TV. For each stick 5 URBs on a > single endpoint are used, I think. I'm not sure, which endpoint in which > alternateSetting is active. I attached the output of 'lsusb -v' for the > sticks. > How can I track down the other information you need? Use usbmon (see Documentation/usb/usbmon.txt). Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] USB: EHCI: fix for leaking isochronous data
On 14.03.2013 22:33, Alan Stern wrote: On Thu, 14 Mar 2013, Soeren Moch wrote: If the memory really is being leaked here in some sort of systematic way, we may be able to see it in your debugging output after a few seconds. OK, here are the first seconds of the log. But the buffer exhaustion usually occurs after several hours of runtime... The log shows a 1-1 match between allocations and deallocations, except for three excess allocations about 45 lines before the end. I have no idea what's up with those. They may be an artifact arising from where you stopped copying the log data. There are as many as 400 iTDs being allocated before any are freed. That seems like a lot. Are they all for the same isochronous endpoint? What's the endpoint's period? How often are URBs submitted? I use 2 dvb sticks, capturing digital TV. For each stick 5 URBs on a single endpoint are used, I think. I'm not sure, which endpoint in which alternateSetting is active. I attached the output of 'lsusb -v' for the sticks. How can I track down the other information you need? In general, there shouldn't be more than a couple of millisecond's worth of iTDs allocated for any endpoint, depending on how many URBs are in the pipeline at any time. Maybe the em28xx driver is not doing a good job here!? Soeren Moch Maybe a better way to go about this is, instead of printing out every allocation and deallocation, to keep a running counter. You could have the driver print out the value of this counter every minute or so. Any time the device isn't in use, the counter should be 0. Alan Stern Bus 001 Device 005: ID 0ccd:00b2 TerraTec Electronic GmbH Device Descriptor: bLength18 bDescriptorType 1 bcdUSB 2.00 bDeviceClass0 (Defined at Interface level) bDeviceSubClass 0 bDeviceProtocol 0 bMaxPacketSize064 idVendor 0x0ccd TerraTec Electronic GmbH idProduct 0x00b2 bcdDevice1.00 iManufacturer 3 TERRATEC iProduct1 Cinergy HTC Stick iSerial 2 123456789ABCD bNumConfigurations 1 Configuration Descriptor: bLength 9 bDescriptorType 2 wTotalLength 305 bNumInterfaces 1 bConfigurationValue 1 iConfiguration 0 bmAttributes 0x80 (Bus Powered) MaxPower 500mA Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber0 bAlternateSetting 0 bNumEndpoints 4 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass 0 bInterfaceProtocol255 iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x81 EP 1 IN bmAttributes3 Transfer TypeInterrupt Synch Type None Usage Type Data wMaxPacketSize 0x0001 1x 1 bytes bInterval 11 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x82 EP 2 IN bmAttributes1 Transfer TypeIsochronous Synch Type None Usage Type Data wMaxPacketSize 0x 1x 0 bytes bInterval 1 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x83 EP 3 IN bmAttributes1 Transfer TypeIsochronous Synch Type None Usage Type Data wMaxPacketSize 0x 1x 0 bytes bInterval 4 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x84 EP 4 IN bmAttributes1 Transfer TypeIsochronous Synch Type None Usage Type Data wMaxPacketSize 0x 1x 0 bytes bInterval 1 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber0 bAlternateSetting 1 bNumEndpoints 4 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass 0 bInterfaceProtocol255 iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x81 EP 1 IN bmAttributes3 Transfer TypeInterrupt Synch Type None Usage Type Data wMaxPacketSize 0x0001 1x 1 bytes
Re: [PATCH] USB: EHCI: fix for leaking isochronous data
On Thu, 14 Mar 2013, Soeren Moch wrote: > > If the memory really is being leaked here in some sort of systematic > > way, we may be able to see it in your debugging output after a few > > seconds. > > > > OK, here are the first seconds of the log. But the buffer exhaustion > usually occurs after several hours of runtime... The log shows a 1-1 match between allocations and deallocations, except for three excess allocations about 45 lines before the end. I have no idea what's up with those. They may be an artifact arising from where you stopped copying the log data. There are as many as 400 iTDs being allocated before any are freed. That seems like a lot. Are they all for the same isochronous endpoint? What's the endpoint's period? How often are URBs submitted? In general, there shouldn't be more than a couple of millisecond's worth of iTDs allocated for any endpoint, depending on how many URBs are in the pipeline at any time. Maybe a better way to go about this is, instead of printing out every allocation and deallocation, to keep a running counter. You could have the driver print out the value of this counter every minute or so. Any time the device isn't in use, the counter should be 0. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] USB: EHCI: fix for leaking isochronous data
On 14.03.2013 21:32, Alan Stern wrote: On Thu, 14 Mar 2013, Soeren Moch wrote: I added a debug message to drivers/usb/host/ehci-sched.c:itd_urb_transaction() to log the allocation flags, see log below. But it looks like you didn't add a message to end_free_itds(), so we don't know when the memory gets deallocated. And you didn't print out the values of urb, num_itds, and i, or the value of itd (so we can match up allocations against deallocations). OK, I will implement this more detailed logging. But with several allocations per second and runtime of several hours this will result in a very long logfile. If the memory really is being leaked here in some sort of systematic way, we may be able to see it in your debugging output after a few seconds. OK, here are the first seconds of the log. But the buffer exhaustion usually occurs after several hours of runtime... Soeren Moch syslog.gz Description: GNU Zip compressed data
Re: [PATCH] USB: EHCI: fix for leaking isochronous data
On Thu, 14 Mar 2013, Soeren Moch wrote: > >> I added a debug message to > >> drivers/usb/host/ehci-sched.c:itd_urb_transaction() to log the > >> allocation flags, see log below. > > > > But it looks like you didn't add a message to end_free_itds(), so we > > don't know when the memory gets deallocated. And you didn't print out > > the values of urb, num_itds, and i, or the value of itd (so we can > > match up allocations against deallocations). > > OK, I will implement this more detailed logging. But with several > allocations per second and runtime of several hours this will result in > a very long logfile. If the memory really is being leaked here in some sort of systematic way, we may be able to see it in your debugging output after a few seconds. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] USB: EHCI: fix for leaking isochronous data
On 14.03.2013 19:48, Soeren Moch wrote: On 10.03.2013 21:59, Alan Stern wrote: On Sun, 10 Mar 2013, Soeren Moch wrote: On Wed, 20 Feb 2013, Soeren Moch wrote: Ok. I use 2 em2840-based usb sticks (em28xx driver) attached to a Marvell Kirkwood-SoC with a orion-ehci usb controller. These usb sticks stream dvb data (digital TV) employing isochronous usb transfers (user application is vdr). Starting from linux-3.6 I see ERROR: 1024 KiB atomic DMA coherent pool is too small! in the syslog after several 10 minutes (sometimes hours) of streaming and then streaming stops. In linux-3.6 the memory management for the arm architecture was changed, so that atomic coherent dma allocations are served from a special pool. This pool gets exhausted. The only user of this pool (in my test) is orion-ehci. Although I have only 10 URBs in flight (5 for each stick, resubmitted in the completion handler), I have 256 atomic coherent allocations (memory from the pool is allocated in pages) from orion-ehci when I see this error. So I think there must be a memory leak (memory allocated atomic somewhere below the usb_submit_urb call in em28xx-core.c). With other dvb sticks using usb bulk transfers I never see this error. Since you already found a memory leak in the ehci driver for isoc transfers, I hoped you can help to solve this problem. If there are additional questions, please ask. If there is something I can test, I would be glad to do so. I guess the first thing is to get a dmesg log showing the problem. You should build a kernel with CONFIG_USB_DEBUG enabled and post the part of the dmesg output starting from when you plug in the troublesome DVB stick. Sorry for my late response. Now I built a kernel 3.8.0 with usb_debug enabled. See below for the syslog of device plug-in. It also might help to have a record of all the isochronous-related coherent allocations and deallocations done by the ehci-hcd driver. Are you comfortable making your own debugging changes? The allocations are done by a call to dma_pool_alloc() in drivers/usb/host/ehci-sched.c:itd_urb_transaction() if the device runs at high speed and sitd_urb_transaction() if the device runs at full speed. The deallocations are done by calls to dma_pool_free() in ehci-timer.c:end_free_itds(). I added a debug message to drivers/usb/host/ehci-sched.c:itd_urb_transaction() to log the allocation flags, see log below. But it looks like you didn't add a message to end_free_itds(), so we don't know when the memory gets deallocated. And you didn't print out the values of urb, num_itds, and i, or the value of itd (so we can match up allocations against deallocations). OK, I will implement this more detailed logging. But with several allocations per second and runtime of several hours this will result in a very long logfile. For me this looks like nothing is allocated atomic here, so this function should not be the root cause of the dma coherent pool exhaustion. I don't understand. If non-atomic allocations can't exhaust the pool, why do we see these allocations fail? Good point. Unfortunately I'm not familiar with the memory management details. Arnd, can memory allocated with dma_pool_alloc() and gfp_flags 0x2093 or 0x8093 come from the atomic dma coherent pool? Sorry, I logged the wrong flags. All allocations are GFP_ATOMIC (0x20) and therefore coming from the pool. Soeren Are there other allocation functions which I could track? Yes, but they wouldn't be used for isochronous transfers. See ehci_qtd_alloc(), ehci_qtd_free(), ehci_qh_alloc(), and qh_destroy() in ehci-mem.c, as well as some other one-time-only coherent allocations in that file. Alan Stern Soeren Moch -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] USB: EHCI: fix for leaking isochronous data
On 10.03.2013 21:59, Alan Stern wrote: On Sun, 10 Mar 2013, Soeren Moch wrote: On Wed, 20 Feb 2013, Soeren Moch wrote: Ok. I use 2 em2840-based usb sticks (em28xx driver) attached to a Marvell Kirkwood-SoC with a orion-ehci usb controller. These usb sticks stream dvb data (digital TV) employing isochronous usb transfers (user application is vdr). Starting from linux-3.6 I see ERROR: 1024 KiB atomic DMA coherent pool is too small! in the syslog after several 10 minutes (sometimes hours) of streaming and then streaming stops. In linux-3.6 the memory management for the arm architecture was changed, so that atomic coherent dma allocations are served from a special pool. This pool gets exhausted. The only user of this pool (in my test) is orion-ehci. Although I have only 10 URBs in flight (5 for each stick, resubmitted in the completion handler), I have 256 atomic coherent allocations (memory from the pool is allocated in pages) from orion-ehci when I see this error. So I think there must be a memory leak (memory allocated atomic somewhere below the usb_submit_urb call in em28xx-core.c). With other dvb sticks using usb bulk transfers I never see this error. Since you already found a memory leak in the ehci driver for isoc transfers, I hoped you can help to solve this problem. If there are additional questions, please ask. If there is something I can test, I would be glad to do so. I guess the first thing is to get a dmesg log showing the problem. You should build a kernel with CONFIG_USB_DEBUG enabled and post the part of the dmesg output starting from when you plug in the troublesome DVB stick. Sorry for my late response. Now I built a kernel 3.8.0 with usb_debug enabled. See below for the syslog of device plug-in. It also might help to have a record of all the isochronous-related coherent allocations and deallocations done by the ehci-hcd driver. Are you comfortable making your own debugging changes? The allocations are done by a call to dma_pool_alloc() in drivers/usb/host/ehci-sched.c:itd_urb_transaction() if the device runs at high speed and sitd_urb_transaction() if the device runs at full speed. The deallocations are done by calls to dma_pool_free() in ehci-timer.c:end_free_itds(). I added a debug message to drivers/usb/host/ehci-sched.c:itd_urb_transaction() to log the allocation flags, see log below. But it looks like you didn't add a message to end_free_itds(), so we don't know when the memory gets deallocated. And you didn't print out the values of urb, num_itds, and i, or the value of itd (so we can match up allocations against deallocations). OK, I will implement this more detailed logging. But with several allocations per second and runtime of several hours this will result in a very long logfile. For me this looks like nothing is allocated atomic here, so this function should not be the root cause of the dma coherent pool exhaustion. I don't understand. If non-atomic allocations can't exhaust the pool, why do we see these allocations fail? Good point. Unfortunately I'm not familiar with the memory management details. Arnd, can memory allocated with dma_pool_alloc() and gfp_flags 0x2093 or 0x8093 come from the atomic dma coherent pool? Are there other allocation functions which I could track? Yes, but they wouldn't be used for isochronous transfers. See ehci_qtd_alloc(), ehci_qtd_free(), ehci_qh_alloc(), and qh_destroy() in ehci-mem.c, as well as some other one-time-only coherent allocations in that file. Alan Stern Soeren Moch -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] USB: EHCI: fix for leaking isochronous data
On Sun, 10 Mar 2013, Soeren Moch wrote: > > On Wed, 20 Feb 2013, Soeren Moch wrote: > > > >> Ok. I use 2 em2840-based usb sticks (em28xx driver) attached to a > >> Marvell Kirkwood-SoC with a orion-ehci usb controller. These usb sticks > >> stream dvb data (digital TV) employing isochronous usb transfers (user > >> application is vdr). > >> > >> Starting from linux-3.6 I see > >> ERROR: 1024 KiB atomic DMA coherent pool is too small! > >> in the syslog after several 10 minutes (sometimes hours) of streaming > >> and then streaming stops. > >> > >> In linux-3.6 the memory management for the arm architecture was changed, > >> so that atomic coherent dma allocations are served from a special pool. > >> This pool gets exhausted. The only user of this pool (in my test) is > >> orion-ehci. Although I have only 10 URBs in flight (5 for each stick, > >> resubmitted in the completion handler), I have 256 atomic coherent > >> allocations (memory from the pool is allocated in pages) from orion-ehci > >> when I see this error. So I think there must be a memory leak (memory > >> allocated atomic somewhere below the usb_submit_urb call in em28xx-core.c). > >> > >> With other dvb sticks using usb bulk transfers I never see this error. > >> > >> Since you already found a memory leak in the ehci driver for isoc > >> transfers, I hoped you can help to solve this problem. If there are > >> additional questions, please ask. If there is something I can test, I > >> would be glad to do so. > > > > I guess the first thing is to get a dmesg log showing the problem. You > > should build a kernel with CONFIG_USB_DEBUG enabled and post the part > > of the dmesg output starting from when you plug in the troublesome DVB > > stick. > > Sorry for my late response. Now I built a kernel 3.8.0 with usb_debug > enabled. See below for the syslog of device plug-in. > > > It also might help to have a record of all the isochronous-related > > coherent allocations and deallocations done by the ehci-hcd driver. > > Are you comfortable making your own debugging changes? The allocations > > are done by a call to dma_pool_alloc() in > > drivers/usb/host/ehci-sched.c:itd_urb_transaction() if the device runs > > at high speed and sitd_urb_transaction() if the device runs at full > > speed. The deallocations are done by calls to dma_pool_free() in > > ehci-timer.c:end_free_itds(). > > > > I added a debug message to > drivers/usb/host/ehci-sched.c:itd_urb_transaction() to log the > allocation flags, see log below. But it looks like you didn't add a message to end_free_itds(), so we don't know when the memory gets deallocated. And you didn't print out the values of urb, num_itds, and i, or the value of itd (so we can match up allocations against deallocations). > For me this looks like nothing is > allocated atomic here, so this function should not be the root cause of > the dma coherent pool exhaustion. I don't understand. If non-atomic allocations can't exhaust the pool, why do we see these allocations fail? > Are there other allocation functions > which I could track? Yes, but they wouldn't be used for isochronous transfers. See ehci_qtd_alloc(), ehci_qtd_free(), ehci_qh_alloc(), and qh_destroy() in ehci-mem.c, as well as some other one-time-only coherent allocations in that file. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] USB: EHCI: fix for leaking isochronous data
On 21.02.2013 19:54, Alan Stern wrote: I trust you won't mind if I put this on the public mailing list. In general, problems of this sort should be discussed in public. In addition to creating a permanent record in the various mailing list archives, it also gives other people a chance to learn about these problems and to chime in if they think of something I have overlooked. This problem was already discussed on the kernel and mm mailing list. (see thread starting here: https://lkml.org/lkml/2012/11/6/408 ) Since this seems to be USB related now, of course it makes sense to discuss this problem on the usb mailing list. On Wed, 20 Feb 2013, Soeren Moch wrote: Ok. I use 2 em2840-based usb sticks (em28xx driver) attached to a Marvell Kirkwood-SoC with a orion-ehci usb controller. These usb sticks stream dvb data (digital TV) employing isochronous usb transfers (user application is vdr). Starting from linux-3.6 I see ERROR: 1024 KiB atomic DMA coherent pool is too small! in the syslog after several 10 minutes (sometimes hours) of streaming and then streaming stops. In linux-3.6 the memory management for the arm architecture was changed, so that atomic coherent dma allocations are served from a special pool. This pool gets exhausted. The only user of this pool (in my test) is orion-ehci. Although I have only 10 URBs in flight (5 for each stick, resubmitted in the completion handler), I have 256 atomic coherent allocations (memory from the pool is allocated in pages) from orion-ehci when I see this error. So I think there must be a memory leak (memory allocated atomic somewhere below the usb_submit_urb call in em28xx-core.c). With other dvb sticks using usb bulk transfers I never see this error. Since you already found a memory leak in the ehci driver for isoc transfers, I hoped you can help to solve this problem. If there are additional questions, please ask. If there is something I can test, I would be glad to do so. I guess the first thing is to get a dmesg log showing the problem. You should build a kernel with CONFIG_USB_DEBUG enabled and post the part of the dmesg output starting from when you plug in the troublesome DVB stick. Sorry for my late response. Now I built a kernel 3.8.0 with usb_debug enabled. See below for the syslog of device plug-in. It also might help to have a record of all the isochronous-related coherent allocations and deallocations done by the ehci-hcd driver. Are you comfortable making your own debugging changes? The allocations are done by a call to dma_pool_alloc() in drivers/usb/host/ehci-sched.c:itd_urb_transaction() if the device runs at high speed and sitd_urb_transaction() if the device runs at full speed. The deallocations are done by calls to dma_pool_free() in ehci-timer.c:end_free_itds(). I added a debug message to drivers/usb/host/ehci-sched.c:itd_urb_transaction() to log the allocation flags, see log below. For me this looks like nothing is allocated atomic here, so this function should not be the root cause of the dma coherent pool exhaustion. Are there other allocation functions which I could track? Regards, Soeren Moch Mar 10 15:21:05 guruvdr kernel: hub 1-1:1.0: state 7 ports 4 chg evt 0008 Mar 10 15:21:05 guruvdr kernel: hub 1-1:1.0: port 3, status 0101, change 0001, 12 Mb/s Mar 10 15:21:05 guruvdr kernel: hub 1-1:1.0: debounce: port 3: total 100ms stable 100ms status 0x101 Mar 10 15:21:05 guruvdr kernel: usb 1-1.3: new high-speed USB device number 6 using orion-ehci Mar 10 15:21:06 guruvdr kernel: usb 1-1.3: default language 0x0409 Mar 10 15:21:06 guruvdr kernel: usb 1-1.3: udev 6, busnum 1, minor = 5 Mar 10 15:21:06 guruvdr kernel: usb 1-1.3: New USB device found, idVendor=0ccd, idProduct=00b2 Mar 10 15:21:06 guruvdr kernel: usb 1-1.3: New USB device strings: Mfr=3, Product=1, SerialNumber=2 Mar 10 15:21:06 guruvdr kernel: usb 1-1.3: Product: Cinergy HTC Stick Mar 10 15:21:06 guruvdr kernel: usb 1-1.3: Manufacturer: TERRATEC Mar 10 15:21:06 guruvdr kernel: usb 1-1.3: SerialNumber: 123456789ABCD Mar 10 15:21:06 guruvdr kernel: usb 1-1.3: usb_probe_device Mar 10 15:21:06 guruvdr kernel: usb 1-1.3: configuration #1 chosen from 1 choice Mar 10 15:21:06 guruvdr kernel: usb 1-1.3: adding 1-1.3:1.0 (config #1, interface 0) Mar 10 15:21:06 guruvdr kernel: em28xx 1-1.3:1.0: usb_probe_interface Mar 10 15:21:06 guruvdr kernel: em28xx 1-1.3:1.0: usb_probe_interface - got id Mar 10 15:21:06 guruvdr kernel: em28xx: New device TERRATEC Cinergy HTC Stick @ 480 Mbps (0ccd:00b2, interface 0, class 0) Mar 10 15:21:06 guruvdr kernel: em28xx: Audio Vendor Class interface 0 found Mar 10 15:21:06 guruvdr kernel: em28xx: Video interface 0 found Mar 10 15:21:06 guruvdr kernel: em28xx: DVB interface 0 found Mar 10 15:21:06 guruvdr kernel: em28xx #0: chip ID is em2884 Mar 10 15:21:06 guruvdr kernel: em28xx #0: Identified as Terratec Cinergy HTC Stick (card=82) Mar 10 15:21:06 guruvdr kernel: em28xx #0: Config register raw