Re: [PATCH] dmaengine: pl330: Fix race in pl330_get_desc()
On 27 April 2016 at 19:17, Robin Murphywrote: >> Instead of churning the code, I would suggest either check in a loop >> that we have a desc OR allocate 2 or NR_DEFAULT_DESC descriptors >> there. Probably we get more descriptors at the same cost of memory. > > > Having had a quick look into how beneficial that might be, I discover that > what's provoking the add_desc() race is something else causing > desc_pool->head to get wedged pointing at itself, so list_empty() forever > returns true and we kick off a storm of allocations while desc_pool->tail > fills up with unused descriptors. Looks like I'm going to have to set aside > some time to properly debug and fix this driver if I want to use it for > stress-testing IOMMU code... > OK, thanks. And yes, we still need to fix that potential race in pl330_get_desc, probably by retrying in a loop. >>> >>> I'm also seeing what looks like another occasional race under the same >>> conditions where pl330_tx_submit() blows up from dma_cookie_assign() >>> dereferencing a bogus tx->chan, but I think that's beyond my ability to >>> figure out right now. Similarly the storm of WARNs from >>> pl330_issue_pending() >>> when using a large number of small buffers and dmatest.noverify=1. This >>> one was some obvious low-hanging fruit. >>> >> Sorry, that part of code has changed a lot since I wrote the driver, >> so more details will help me. > > > Here's the actual splat: > > [ 220.649267] Unable to handle kernel paging request at virtual address > 100341338 > [ 220.666576] pgd = ff8008dcb000 > [ 220.679805] [100341338] *pgd=, *pud= > [ 220.696116] Internal error: Oops: 9605 [#1] PREEMPT SMP > [ 220.711579] Modules linked in: > [ 220.724459] CPU: 3 PID: 1249 Comm: dma0chan5-copy0 Not tainted 4.6.0-rc4+ > #527 > [ 220.741652] Hardware name: ARM Juno development board (r1) (DT) > [ 220.757512] task: ffc976255400 ti: ffc975aa4000 task.ti: > ffc975aa4000 > [ 220.774979] PC is at pl330_tx_submit+0x68/0x108 > [ 220.789412] LR is at pl330_tx_submit+0x2c/0x108 > [ 220.803704] pc : [] lr : [] pstate: > 81c5 > [ 220.821078] sp : ffc975aa7c90 > [ 220.834236] x29: ffc975aa7c90 x28: ffc975e81800 > [ 220.849303] x27: ffc97626c018 x26: ff8008cd9000 > [ 220.864314] x25: x24: > [ 220.879300] x23: ff8008d8d410 x22: ff8008ce52f0 > [ 220.894266] x21: ffc976220d18 x20: ffc079138610 > [ 220.909139] x19: ffc976220c60 x18: 0010 > [ 220.923857] x17: 0007 x16: 0001 > [ 220.938418] x15: 0001 x14: 0001020304050607 > [ 220.952886] x13: 08090a0b0c0d0e0f x12: 1011121314151617 > [ 220.967305] x11: 18191a1b1c1d1e1f x10: 0001020304050607 > [ 220.981603] x9 : x8 : 0010 > [ 220.995783] x7 : 0001 x6 : ffc976220ce0 > [ 221.009899] x5 : 000100341330 x4 : ffc079138600 > [ 221.023775] x3 : x2 : ffc97626c158 > [ 221.037417] x1 : ffc97636a790 x0 : 0140 > [ 221.050845] > [ 221.060126] Process dma0chan5-copy0 (pid: 1249, stack limit = > 0xffc975aa4020) > [ 221.075704] Stack: (0xffc975aa7c90 to 0xffc975aa8000) > [ 221.089574] 7c80: ffc975aa7cd0 > ff80083d9e60 > [ 221.105653] 7ca0: ffc9758b88c0 2b12 20b8 > ff8008ce52f0 > [ 221.121765] 7cc0: f9f82b12 ffc0790e5b00 ffc975aa7e30 > ff80080d7138 > [ 221.137833] 7ce0: ffc975e81780 ff8008d90b00 ff8008af0d40 > ffc975e81800 > [ 221.153811] 7d00: ff80083d91c0 > > [ 221.169777] 7d20: ff8008ca7000 > 00323ef9fd9c > [ 221.185714] 7d40: 0008 ffc975aa7e20 ff8008b265d8 > > [ 221.201753] 7d60: 0010 000175aa7dc0 > > [ 221.217898] 7d80: 0259 0001 ff8008b265f0 > ffc9758b88e8 > [ 221.234155] 7da0: ff802b12 ffc975aa7cd0 ffc976220c88 > 005d99c3 > [ 221.250577] 7dc0: 0001 02f40259 ff8008d12be0 > ffc975aa7cc0 > [ 221.267166] 7de0: ffc975aa7df0 ff800bcc0bcc > ffc975aa7df8 > [ 221.283840] 7e00: ffc975aa7df8 c5b6eabfb5a2c4b8 ffc9 > ff80080f2048 > [ 221.300672] 7e20: ffc975aa7e20 ffc975aa7e20 > ff8008085e10 > [ 221.317405] 7e40: ff80080d7068 ffc975e81780 > > [ 221.334211] 7e60: ff80080df900 ff80080d7068 > > [ 221.351119] 7e80: ffc975e81800 > > [ 221.368086] 7ea0: ffc975aa7ea0 ffc975aa7ea0 > ff80 > [ 221.385128] 7ec0:
Re: [PATCH] dmaengine: pl330: Fix race in pl330_get_desc()
On 27 April 2016 at 19:17, Robin Murphy wrote: >> Instead of churning the code, I would suggest either check in a loop >> that we have a desc OR allocate 2 or NR_DEFAULT_DESC descriptors >> there. Probably we get more descriptors at the same cost of memory. > > > Having had a quick look into how beneficial that might be, I discover that > what's provoking the add_desc() race is something else causing > desc_pool->head to get wedged pointing at itself, so list_empty() forever > returns true and we kick off a storm of allocations while desc_pool->tail > fills up with unused descriptors. Looks like I'm going to have to set aside > some time to properly debug and fix this driver if I want to use it for > stress-testing IOMMU code... > OK, thanks. And yes, we still need to fix that potential race in pl330_get_desc, probably by retrying in a loop. >>> >>> I'm also seeing what looks like another occasional race under the same >>> conditions where pl330_tx_submit() blows up from dma_cookie_assign() >>> dereferencing a bogus tx->chan, but I think that's beyond my ability to >>> figure out right now. Similarly the storm of WARNs from >>> pl330_issue_pending() >>> when using a large number of small buffers and dmatest.noverify=1. This >>> one was some obvious low-hanging fruit. >>> >> Sorry, that part of code has changed a lot since I wrote the driver, >> so more details will help me. > > > Here's the actual splat: > > [ 220.649267] Unable to handle kernel paging request at virtual address > 100341338 > [ 220.666576] pgd = ff8008dcb000 > [ 220.679805] [100341338] *pgd=, *pud= > [ 220.696116] Internal error: Oops: 9605 [#1] PREEMPT SMP > [ 220.711579] Modules linked in: > [ 220.724459] CPU: 3 PID: 1249 Comm: dma0chan5-copy0 Not tainted 4.6.0-rc4+ > #527 > [ 220.741652] Hardware name: ARM Juno development board (r1) (DT) > [ 220.757512] task: ffc976255400 ti: ffc975aa4000 task.ti: > ffc975aa4000 > [ 220.774979] PC is at pl330_tx_submit+0x68/0x108 > [ 220.789412] LR is at pl330_tx_submit+0x2c/0x108 > [ 220.803704] pc : [] lr : [] pstate: > 81c5 > [ 220.821078] sp : ffc975aa7c90 > [ 220.834236] x29: ffc975aa7c90 x28: ffc975e81800 > [ 220.849303] x27: ffc97626c018 x26: ff8008cd9000 > [ 220.864314] x25: x24: > [ 220.879300] x23: ff8008d8d410 x22: ff8008ce52f0 > [ 220.894266] x21: ffc976220d18 x20: ffc079138610 > [ 220.909139] x19: ffc976220c60 x18: 0010 > [ 220.923857] x17: 0007 x16: 0001 > [ 220.938418] x15: 0001 x14: 0001020304050607 > [ 220.952886] x13: 08090a0b0c0d0e0f x12: 1011121314151617 > [ 220.967305] x11: 18191a1b1c1d1e1f x10: 0001020304050607 > [ 220.981603] x9 : x8 : 0010 > [ 220.995783] x7 : 0001 x6 : ffc976220ce0 > [ 221.009899] x5 : 000100341330 x4 : ffc079138600 > [ 221.023775] x3 : x2 : ffc97626c158 > [ 221.037417] x1 : ffc97636a790 x0 : 0140 > [ 221.050845] > [ 221.060126] Process dma0chan5-copy0 (pid: 1249, stack limit = > 0xffc975aa4020) > [ 221.075704] Stack: (0xffc975aa7c90 to 0xffc975aa8000) > [ 221.089574] 7c80: ffc975aa7cd0 > ff80083d9e60 > [ 221.105653] 7ca0: ffc9758b88c0 2b12 20b8 > ff8008ce52f0 > [ 221.121765] 7cc0: f9f82b12 ffc0790e5b00 ffc975aa7e30 > ff80080d7138 > [ 221.137833] 7ce0: ffc975e81780 ff8008d90b00 ff8008af0d40 > ffc975e81800 > [ 221.153811] 7d00: ff80083d91c0 > > [ 221.169777] 7d20: ff8008ca7000 > 00323ef9fd9c > [ 221.185714] 7d40: 0008 ffc975aa7e20 ff8008b265d8 > > [ 221.201753] 7d60: 0010 000175aa7dc0 > > [ 221.217898] 7d80: 0259 0001 ff8008b265f0 > ffc9758b88e8 > [ 221.234155] 7da0: ff802b12 ffc975aa7cd0 ffc976220c88 > 005d99c3 > [ 221.250577] 7dc0: 0001 02f40259 ff8008d12be0 > ffc975aa7cc0 > [ 221.267166] 7de0: ffc975aa7df0 ff800bcc0bcc > ffc975aa7df8 > [ 221.283840] 7e00: ffc975aa7df8 c5b6eabfb5a2c4b8 ffc9 > ff80080f2048 > [ 221.300672] 7e20: ffc975aa7e20 ffc975aa7e20 > ff8008085e10 > [ 221.317405] 7e40: ff80080d7068 ffc975e81780 > > [ 221.334211] 7e60: ff80080df900 ff80080d7068 > > [ 221.351119] 7e80: ffc975e81800 > > [ 221.368086] 7ea0: ffc975aa7ea0 ffc975aa7ea0 > ff80 > [ 221.385128] 7ec0: ffc975aa7ec0 ffc975aa7ec0
Re: [PATCH] dmaengine: pl330: Fix race in pl330_get_desc()
Hi Jassi, On 27/04/16 04:26, Jassi Brar wrote: On Tue, Apr 26, 2016 at 10:00 PM, Robin Murphywrote: The current logic in pl330_get_desc() contains a clear race condition, whereby if the descriptor pool is empty, we will create a new descriptor, add it to the pool with the lock held, *release the lock*, then try to remove it from the pool again. Furthermore, if that second attempt then finds the pool empty because someone else got there first, we conclude that something must have gone terribly wrong and it's the absolute end of the world. We may retry or simply increase the number of descriptors allocated in a batch from 1 to, say, NR_DEFAULT_DESC. diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c index 372b435..7179a4d 100644 --- a/drivers/dma/pl330.c +++ b/drivers/dma/pl330.c @@ -2449,7 +2449,7 @@ static struct dma_pl330_desc *pl330_get_desc(struct dma_pl330_chan *pch) /* If the DMAC pool is empty, alloc new */ if (!desc) { - if (!add_desc(pl330, GFP_ATOMIC, 1)) + if (!add_desc(pl330, GFP_ATOMIC, NR_DEFAULT_DESC)) return NULL; /* Try again */ I'm not sure I see how leaving the obvious race in place and just making it n times harder to hit is in any way not a terrible idea compared to actually fixing it. ... [ 709.328891] dma-pl330 7ff0.dma: pl330_get_desc:2459 ALERT! ** 10 printk messages dropped ** [ 709.352280] dma-pl330 7ff0.dma: __pl330_prep_dma_memcpy:2493 Unable to fetch desc ** 11 printk messages dropped ** [ 709.372930] dma-pl330 7ff0.dma: pl330_get_desc:2459 ALERT! ** 2 printk messages dropped ** [ 709.372953] dma-pl330 7ff0.dma: pl330_get_desc:2459 ALERT! ** 8 printk messages dropped ** [ 709.374157] dma-pl330 7ff0.dma: __pl330_prep_dma_memcpy:2493 Unable to fetch desc ** 41 printk messages dropped ** [ 709.393095] dmatest: dma0chan4-copy1: result #4545: 'prep error' with src_off=0x3a32 dst_off=0x46dd len=0xc5c3 (0) ... Down with this sort of thing! The most sensible thing to do is avoid the allocate-add-remove dance entirely and simply use the freshly-allocated descriptor straight away. ... but you still try to first pluck from the list. Well, that's the point of having a pool, right? Try to reuse a free descriptor if there is one, otherwise expand the pool. It's just that in the latter case, race or not, it's absolutely pointless to add the new descriptor to the pool only to immediately remove it again, when you can simply start using it instead - either way there's still an extra descriptor released back to the pool once we're done with it. Instead of churning the code, I would suggest either check in a loop that we have a desc OR allocate 2 or NR_DEFAULT_DESC descriptors there. Probably we get more descriptors at the same cost of memory. Having had a quick look into how beneficial that might be, I discover that what's provoking the add_desc() race is something else causing desc_pool->head to get wedged pointing at itself, so list_empty() forever returns true and we kick off a storm of allocations while desc_pool->tail fills up with unused descriptors. Looks like I'm going to have to set aside some time to properly debug and fix this driver if I want to use it for stress-testing IOMMU code... I'm also seeing what looks like another occasional race under the same conditions where pl330_tx_submit() blows up from dma_cookie_assign() dereferencing a bogus tx->chan, but I think that's beyond my ability to figure out right now. Similarly the storm of WARNs from pl330_issue_pending() when using a large number of small buffers and dmatest.noverify=1. This one was some obvious low-hanging fruit. Sorry, that part of code has changed a lot since I wrote the driver, so more details will help me. Here's the actual splat: [ 220.649267] Unable to handle kernel paging request at virtual address 100341338 [ 220.666576] pgd = ff8008dcb000 [ 220.679805] [100341338] *pgd=, *pud= [ 220.696116] Internal error: Oops: 9605 [#1] PREEMPT SMP [ 220.711579] Modules linked in: [ 220.724459] CPU: 3 PID: 1249 Comm: dma0chan5-copy0 Not tainted 4.6.0-rc4+ #527 [ 220.741652] Hardware name: ARM Juno development board (r1) (DT) [ 220.757512] task: ffc976255400 ti: ffc975aa4000 task.ti: ffc975aa4000 [ 220.774979] PC is at pl330_tx_submit+0x68/0x108 [ 220.789412] LR is at pl330_tx_submit+0x2c/0x108 [ 220.803704] pc : [] lr : [] pstate: 81c5 [ 220.821078] sp : ffc975aa7c90 [ 220.834236] x29: ffc975aa7c90 x28: ffc975e81800 [ 220.849303] x27: ffc97626c018 x26: ff8008cd9000 [ 220.864314] x25: x24: [ 220.879300] x23: ff8008d8d410 x22: ff8008ce52f0 [ 220.894266] x21: ffc976220d18 x20: ffc079138610 [ 220.909139] x19: ffc976220c60 x18: 0010 [ 220.923857] x17: 0007
Re: [PATCH] dmaengine: pl330: Fix race in pl330_get_desc()
Hi Jassi, On 27/04/16 04:26, Jassi Brar wrote: On Tue, Apr 26, 2016 at 10:00 PM, Robin Murphy wrote: The current logic in pl330_get_desc() contains a clear race condition, whereby if the descriptor pool is empty, we will create a new descriptor, add it to the pool with the lock held, *release the lock*, then try to remove it from the pool again. Furthermore, if that second attempt then finds the pool empty because someone else got there first, we conclude that something must have gone terribly wrong and it's the absolute end of the world. We may retry or simply increase the number of descriptors allocated in a batch from 1 to, say, NR_DEFAULT_DESC. diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c index 372b435..7179a4d 100644 --- a/drivers/dma/pl330.c +++ b/drivers/dma/pl330.c @@ -2449,7 +2449,7 @@ static struct dma_pl330_desc *pl330_get_desc(struct dma_pl330_chan *pch) /* If the DMAC pool is empty, alloc new */ if (!desc) { - if (!add_desc(pl330, GFP_ATOMIC, 1)) + if (!add_desc(pl330, GFP_ATOMIC, NR_DEFAULT_DESC)) return NULL; /* Try again */ I'm not sure I see how leaving the obvious race in place and just making it n times harder to hit is in any way not a terrible idea compared to actually fixing it. ... [ 709.328891] dma-pl330 7ff0.dma: pl330_get_desc:2459 ALERT! ** 10 printk messages dropped ** [ 709.352280] dma-pl330 7ff0.dma: __pl330_prep_dma_memcpy:2493 Unable to fetch desc ** 11 printk messages dropped ** [ 709.372930] dma-pl330 7ff0.dma: pl330_get_desc:2459 ALERT! ** 2 printk messages dropped ** [ 709.372953] dma-pl330 7ff0.dma: pl330_get_desc:2459 ALERT! ** 8 printk messages dropped ** [ 709.374157] dma-pl330 7ff0.dma: __pl330_prep_dma_memcpy:2493 Unable to fetch desc ** 41 printk messages dropped ** [ 709.393095] dmatest: dma0chan4-copy1: result #4545: 'prep error' with src_off=0x3a32 dst_off=0x46dd len=0xc5c3 (0) ... Down with this sort of thing! The most sensible thing to do is avoid the allocate-add-remove dance entirely and simply use the freshly-allocated descriptor straight away. ... but you still try to first pluck from the list. Well, that's the point of having a pool, right? Try to reuse a free descriptor if there is one, otherwise expand the pool. It's just that in the latter case, race or not, it's absolutely pointless to add the new descriptor to the pool only to immediately remove it again, when you can simply start using it instead - either way there's still an extra descriptor released back to the pool once we're done with it. Instead of churning the code, I would suggest either check in a loop that we have a desc OR allocate 2 or NR_DEFAULT_DESC descriptors there. Probably we get more descriptors at the same cost of memory. Having had a quick look into how beneficial that might be, I discover that what's provoking the add_desc() race is something else causing desc_pool->head to get wedged pointing at itself, so list_empty() forever returns true and we kick off a storm of allocations while desc_pool->tail fills up with unused descriptors. Looks like I'm going to have to set aside some time to properly debug and fix this driver if I want to use it for stress-testing IOMMU code... I'm also seeing what looks like another occasional race under the same conditions where pl330_tx_submit() blows up from dma_cookie_assign() dereferencing a bogus tx->chan, but I think that's beyond my ability to figure out right now. Similarly the storm of WARNs from pl330_issue_pending() when using a large number of small buffers and dmatest.noverify=1. This one was some obvious low-hanging fruit. Sorry, that part of code has changed a lot since I wrote the driver, so more details will help me. Here's the actual splat: [ 220.649267] Unable to handle kernel paging request at virtual address 100341338 [ 220.666576] pgd = ff8008dcb000 [ 220.679805] [100341338] *pgd=, *pud= [ 220.696116] Internal error: Oops: 9605 [#1] PREEMPT SMP [ 220.711579] Modules linked in: [ 220.724459] CPU: 3 PID: 1249 Comm: dma0chan5-copy0 Not tainted 4.6.0-rc4+ #527 [ 220.741652] Hardware name: ARM Juno development board (r1) (DT) [ 220.757512] task: ffc976255400 ti: ffc975aa4000 task.ti: ffc975aa4000 [ 220.774979] PC is at pl330_tx_submit+0x68/0x108 [ 220.789412] LR is at pl330_tx_submit+0x2c/0x108 [ 220.803704] pc : [] lr : [] pstate: 81c5 [ 220.821078] sp : ffc975aa7c90 [ 220.834236] x29: ffc975aa7c90 x28: ffc975e81800 [ 220.849303] x27: ffc97626c018 x26: ff8008cd9000 [ 220.864314] x25: x24: [ 220.879300] x23: ff8008d8d410 x22: ff8008ce52f0 [ 220.894266] x21: ffc976220d18 x20: ffc079138610 [ 220.909139] x19: ffc976220c60 x18: 0010 [ 220.923857] x17: 0007 x16: 0001 [
Re: [PATCH] dmaengine: pl330: Fix race in pl330_get_desc()
On Tue, Apr 26, 2016 at 10:00 PM, Robin Murphywrote: > The current logic in pl330_get_desc() contains a clear race condition, > whereby if the descriptor pool is empty, we will create a new > descriptor, add it to the pool with the lock held, *release the lock*, > then try to remove it from the pool again. > > Furthermore, if that second attempt then finds the pool empty because > someone else got there first, we conclude that something must have gone > terribly wrong and it's the absolute end of the world. > We may retry or simply increase the number of descriptors allocated in a batch from 1 to, say, NR_DEFAULT_DESC. diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c index 372b435..7179a4d 100644 --- a/drivers/dma/pl330.c +++ b/drivers/dma/pl330.c @@ -2449,7 +2449,7 @@ static struct dma_pl330_desc *pl330_get_desc(struct dma_pl330_chan *pch) /* If the DMAC pool is empty, alloc new */ if (!desc) { - if (!add_desc(pl330, GFP_ATOMIC, 1)) + if (!add_desc(pl330, GFP_ATOMIC, NR_DEFAULT_DESC)) return NULL; /* Try again */ > ... > [ 709.328891] dma-pl330 7ff0.dma: pl330_get_desc:2459 ALERT! > ** 10 printk messages dropped ** [ 709.352280] dma-pl330 7ff0.dma: > __pl330_prep_dma_memcpy:2493 Unable to fetch desc > ** 11 printk messages dropped ** [ 709.372930] dma-pl330 7ff0.dma: > pl330_get_desc:2459 ALERT! > ** 2 printk messages dropped ** [ 709.372953] dma-pl330 7ff0.dma: > pl330_get_desc:2459 ALERT! > ** 8 printk messages dropped ** [ 709.374157] dma-pl330 7ff0.dma: > __pl330_prep_dma_memcpy:2493 Unable to fetch desc > ** 41 printk messages dropped ** [ 709.393095] dmatest: dma0chan4-copy1: > result #4545: 'prep error' with src_off=0x3a32 dst_off=0x46dd len=0xc5c3 (0) > ... > > Down with this sort of thing! The most sensible thing to do is avoid the > allocate-add-remove dance entirely and simply use the freshly-allocated > descriptor straight away. > ... but you still try to first pluck from the list. Instead of churning the code, I would suggest either check in a loop that we have a desc OR allocate 2 or NR_DEFAULT_DESC descriptors there. Probably we get more descriptors at the same cost of memory. > > I'm also seeing what looks like another occasional race under the same > conditions where pl330_tx_submit() blows up from dma_cookie_assign() > dereferencing a bogus tx->chan, but I think that's beyond my ability to > figure out right now. Similarly the storm of WARNs from pl330_issue_pending() > when using a large number of small buffers and dmatest.noverify=1. This > one was some obvious low-hanging fruit. > Sorry, that part of code has changed a lot since I wrote the driver, so more details will help me. Thanks.
Re: [PATCH] dmaengine: pl330: Fix race in pl330_get_desc()
On Tue, Apr 26, 2016 at 10:00 PM, Robin Murphy wrote: > The current logic in pl330_get_desc() contains a clear race condition, > whereby if the descriptor pool is empty, we will create a new > descriptor, add it to the pool with the lock held, *release the lock*, > then try to remove it from the pool again. > > Furthermore, if that second attempt then finds the pool empty because > someone else got there first, we conclude that something must have gone > terribly wrong and it's the absolute end of the world. > We may retry or simply increase the number of descriptors allocated in a batch from 1 to, say, NR_DEFAULT_DESC. diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c index 372b435..7179a4d 100644 --- a/drivers/dma/pl330.c +++ b/drivers/dma/pl330.c @@ -2449,7 +2449,7 @@ static struct dma_pl330_desc *pl330_get_desc(struct dma_pl330_chan *pch) /* If the DMAC pool is empty, alloc new */ if (!desc) { - if (!add_desc(pl330, GFP_ATOMIC, 1)) + if (!add_desc(pl330, GFP_ATOMIC, NR_DEFAULT_DESC)) return NULL; /* Try again */ > ... > [ 709.328891] dma-pl330 7ff0.dma: pl330_get_desc:2459 ALERT! > ** 10 printk messages dropped ** [ 709.352280] dma-pl330 7ff0.dma: > __pl330_prep_dma_memcpy:2493 Unable to fetch desc > ** 11 printk messages dropped ** [ 709.372930] dma-pl330 7ff0.dma: > pl330_get_desc:2459 ALERT! > ** 2 printk messages dropped ** [ 709.372953] dma-pl330 7ff0.dma: > pl330_get_desc:2459 ALERT! > ** 8 printk messages dropped ** [ 709.374157] dma-pl330 7ff0.dma: > __pl330_prep_dma_memcpy:2493 Unable to fetch desc > ** 41 printk messages dropped ** [ 709.393095] dmatest: dma0chan4-copy1: > result #4545: 'prep error' with src_off=0x3a32 dst_off=0x46dd len=0xc5c3 (0) > ... > > Down with this sort of thing! The most sensible thing to do is avoid the > allocate-add-remove dance entirely and simply use the freshly-allocated > descriptor straight away. > ... but you still try to first pluck from the list. Instead of churning the code, I would suggest either check in a loop that we have a desc OR allocate 2 or NR_DEFAULT_DESC descriptors there. Probably we get more descriptors at the same cost of memory. > > I'm also seeing what looks like another occasional race under the same > conditions where pl330_tx_submit() blows up from dma_cookie_assign() > dereferencing a bogus tx->chan, but I think that's beyond my ability to > figure out right now. Similarly the storm of WARNs from pl330_issue_pending() > when using a large number of small buffers and dmatest.noverify=1. This > one was some obvious low-hanging fruit. > Sorry, that part of code has changed a lot since I wrote the driver, so more details will help me. Thanks.