RE: [PATCH v4 2/3] dmaengine: imx-sdma: fix dma freezes
On 2019-9-19 22:30 Philipp Puschmann wrote:
> For some years and since many kernel versions there are reports that the
> RX UART SDMA channel stops working at some point. The workaround was to
> disable DMA for RX. This commit tries to fix the problem itself.
>
> Due to its license i wasn't able to debug the sdma script itself but it
> somehow leads to blocking the scheduling of the channel script when a
> running sdma script does not find any free descriptor in the ring to put
> its data into.
>
> If we detect such a potential case we manually restart the channel.
>
> As sdmac->desc is constant we can move desc out of the loop.
>
> Fixes: 1ec1e82f2510 ("dmaengine: Add Freescale i.MX SDMA support")

In fact, this is a refinement rather than a bug fix: it only restores the
cyclic transfer in a corner case. There are two causes for that corner case:

1. An improper number of BDs or BD length for the cyclic transfer, so that
   BDs are consumed very quickly. The worst case is the UART aging timer,
   where a single byte may consume one BD. For this case, increasing the
   number of BDs, as in your UART patch, is the right approach.
2. High CPU load, so that the SDMA interrupt handler can't run in time to
   set the BD_DONE flag again and eventually all BDs are consumed. In this
   case, the patch may hide other coding issues, such as long windows with
   interrupts disabled (spin_lock_irq).

So I think this is more of a refine/restore patch, and it would be better to
add a clear message telling the user that the channel is being restored and
that unexpectedly high CPU load is occurring.

> Signed-off-by: Philipp Puschmann
> Reviewed-by: Lucas Stach
> ---
>
> Changelog v4:
>  - fixed the fixes tag
>
> Changelog v3:
>  - use correct dma_wmb() instead of dma_wb()
>  - add fixes tag
>
> Changelog v2:
>  - clarify comment and commit description
>
>  drivers/dma/imx-sdma.c | 21 +
>  1 file changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
> index e029a2443cfc..a32b5962630e 100644
> --- a/drivers/dma/imx-sdma.c
> +++ b/drivers/dma/imx-sdma.c
> @@ -775,21 +775,23 @@ static void sdma_start_desc(struct sdma_channel *sdmac)
>  static void sdma_update_channel_loop(struct sdma_channel *sdmac)
>  {
>  	struct sdma_buffer_descriptor *bd;
> -	int error = 0;
> -	enum dma_status old_status = sdmac->status;
> +	struct sdma_desc *desc = sdmac->desc;
> +	int error = 0, cnt = 0;
> +	enum dma_status old_status = sdmac->status;
>
>  	/*
>  	 * loop mode. Iterate over descriptors, re-setup them and
>  	 * call callback function.
>  	 */
> -	while (sdmac->desc) {
> -		struct sdma_desc *desc = sdmac->desc;
> +	while (desc) {
>
>  		bd = &desc->bd[desc->buf_tail];
>
>  		if (bd->mode.status & BD_DONE)
>  			break;
>
> +		cnt++;
> +
>  		if (bd->mode.status & BD_RROR) {
>  			bd->mode.status &= ~BD_RROR;
>  			sdmac->status = DMA_ERROR;
> @@ -822,6 +824,17 @@ static void sdma_update_channel_loop(struct sdma_channel *sdmac)
>  		if (error)
>  			sdmac->status = old_status;
>  	}
> +
> +	/* In some situations it may happen that the sdma does not found any
> +	 * usable descriptor in the ring to put data into. The channel is
> +	 * stopped then. While there is no specific error condition we can
> +	 * check for, a necessary condition is that all available buffers for
> +	 * the current channel have been written to by the sdma script. In
> +	 * this case and after we have made the buffers available again,
> +	 * we restart the channel.
> +	 */
> +	if (cnt >= desc->num_bd)
> +		sdma_enable_channel(sdmac->sdma, sdmac->channel);
>  }
>
>  static void mxc_sdma_handle_channel_normal(struct sdma_channel *data)
> --
> 2.23.0
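
The restart condition and the suggested log message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not driver code: `struct ring`, `engine_fill()` and `reclaim()` are hypothetical names, the ring size is arbitrary, and BD_DONE is assumed to mean "descriptor is owned by the engine", as the discussion above implies. The engine stalls when it finds no free descriptor; the handler counts the descriptors it hands back and restarts the channel, with a message, once it has reclaimed the whole ring.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define BD_DONE 0x01 /* assumed meaning: descriptor is owned by the engine */
#define NUM_BD  4    /* arbitrary ring size for this model */

/* Hypothetical model of one cyclic channel; not the driver's structures. */
struct ring {
	unsigned char status[NUM_BD]; /* per-descriptor status bits */
	int head;                     /* next descriptor the engine fills */
	int tail;                     /* next descriptor the handler reclaims */
	bool running;                 /* models the channel enable state */
	int restarts;                 /* restarts issued by the handler */
};

/* Engine side: fill up to n descriptors, stall if none is free. */
static void engine_fill(struct ring *r, int n)
{
	for (int i = 0; i < n && r->running; i++) {
		if (!(r->status[r->head] & BD_DONE)) {
			r->running = false; /* ring exhausted: channel stops */
			return;
		}
		r->status[r->head] &= ~BD_DONE;
		r->head = (r->head + 1) % NUM_BD;
	}
}

/* Handler side: hand filled descriptors back and count them. */
static int reclaim(struct ring *r)
{
	int cnt = 0;

	while (cnt < NUM_BD && !(r->status[r->tail] & BD_DONE)) {
		r->status[r->tail] |= BD_DONE;
		r->tail = (r->tail + 1) % NUM_BD;
		cnt++;
	}
	assert(cnt <= NUM_BD);

	/* Necessary (not sufficient) condition for a stalled channel. */
	if (cnt >= NUM_BD) {
		fprintf(stderr, "channel restarted: all %d BDs were consumed\n",
			NUM_BD);
		r->running = true;
		r->restarts++;
	}
	return cnt;
}
```

In this model a restart is issued whenever the whole ring was consumed, even if the engine had not actually stalled yet, which mirrors the "necessary condition" wording in the patch comment.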
Re: [PATCH v4 2/3] dmaengine: imx-sdma: fix dma freezes
On Fri, 2019-09-20 at 10:53 +0200, Philipp Puschmann wrote:
> Hi Jan,
>
> On 19.09.19 at 17:19, Jan Lübbe wrote:
> > Hi Philipp,
> >
> > see below...
> >
> > On Thu, 2019-09-19 at 16:29 +0200, Philipp Puschmann wrote:
> > > For some years and since many kernel versions there are reports that the
> > > RX UART SDMA channel stops working at some point. The workaround was to
> > > disable DMA for RX. This commit tries to fix the problem itself.
> > >
> > > Due to its license i wasn't able to debug the sdma script itself but it
> > > somehow leads to blocking the scheduling of the channel script when a
> > > running sdma script does not find any free descriptor in the ring to put
> > > its data into.
> > >
> > > If we detect such a potential case we manually restart the channel.
> > >
> > > As sdmac->desc is constant we can move desc out of the loop.
> > >
> > > Fixes: 1ec1e82f2510 ("dmaengine: Add Freescale i.MX SDMA support")
> > > Signed-off-by: Philipp Puschmann
> > > Reviewed-by: Lucas Stach
> > > ---
> > >
> > > Changelog v4:
> > >  - fixed the fixes tag
> > >
> > > Changelog v3:
> > >  - use correct dma_wmb() instead of dma_wb()
> > >  - add fixes tag
> > >
> > > Changelog v2:
> > >  - clarify comment and commit description
> > >
> > >  drivers/dma/imx-sdma.c | 21 +
> > >  1 file changed, 17 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
> > > index e029a2443cfc..a32b5962630e 100644
> > > --- a/drivers/dma/imx-sdma.c
> > > +++ b/drivers/dma/imx-sdma.c
> > > @@ -775,21 +775,23 @@ static void sdma_start_desc(struct sdma_channel *sdmac)
> > >  static void sdma_update_channel_loop(struct sdma_channel *sdmac)
> > >  {
> > >  	struct sdma_buffer_descriptor *bd;
> > > -	int error = 0;
> > > -	enum dma_status old_status = sdmac->status;
> > > +	struct sdma_desc *desc = sdmac->desc;
> > > +	int error = 0, cnt = 0;
> > > +	enum dma_status old_status = sdmac->status;
> > >
> > >  	/*
> > >  	 * loop mode. Iterate over descriptors, re-setup them and
> > >  	 * call callback function.
> > >  	 */
> > > -	while (sdmac->desc) {
> > > -		struct sdma_desc *desc = sdmac->desc;
> > > +	while (desc) {
> > >
> > >  		bd = &desc->bd[desc->buf_tail];
> > >
> > >  		if (bd->mode.status & BD_DONE)
> > >  			break;
> > >
> > > +		cnt++;
> > > +
> > >  		if (bd->mode.status & BD_RROR) {
> > >  			bd->mode.status &= ~BD_RROR;
> > >  			sdmac->status = DMA_ERROR;
> > > @@ -822,6 +824,17 @@ static void sdma_update_channel_loop(struct sdma_channel *sdmac)
> > >  		if (error)
> > >  			sdmac->status = old_status;
> > >  	}
> > > +
> > > +	/* In some situations it may happen that the sdma does not found any
> >                                                            ^ hasn't
> > > +	 * usable descriptor in the ring to put data into. The channel is
> > > +	 * stopped then. While there is no specific error condition we can
> > > +	 * check for, a necessary condition is that all available buffers for
> > > +	 * the current channel have been written to by the sdma script. In
> > > +	 * this case and after we have made the buffers available again,
> > > +	 * we restart the channel.
> > > +	 */
> >
> > Are you sure we can't miss cases where we only had to make some buffers
> > available again, but the SDMA already ran out of buffers before?
> Think so, yes.
> > A while ago, I was debugging a similar issue triggered by receiving
> > data with a wrong baud rate, which leads to all descriptors being
> > marked with the error flag very quickly (and the SDMA stalling).
> > I noticed that you can check if the channel is still running by
> > checking the SDMA_H_STATSTOP register & BIT(sdmac->channel).
>
> I think checking this register is the better approach. Then I could drop
> the cnt variable. With cnt gone, I would propose moving the check and the
> re-enable to the end of the while loop, so that the channel is re-enabled
> after the first buffer has been freed.

You certainly don't want to have an MMIO read at each iteration of the
loop, as that would be quite a bit of overhead.
I'm not sure it's worth trying to minimize the channel re-enable latency.
You only get into this situation because of bad system latencies before
this part of the code runs, so the little bit of latency added by cleaning
the descriptors before trying to re-enable the channel will probably not do
much further harm, and you don't risk running into the out-of-descriptors
error again immediately. Remember, in a preemptible kernel the task
cleaning the descriptors could be put to sleep immediately after you have
cleaned a single descriptor and kicked the channel back to life.

> > I also added a flag for the sdmac->flags field to allow stopping the
> > channel from the callback (otherwise it would enable the channel
> > again).
>
> Could memory and compiler ordering
Re: [PATCH v4 2/3] dmaengine: imx-sdma: fix dma freezes
Hi Jan,

On 19.09.19 at 17:19, Jan Lübbe wrote:
> Hi Philipp,
>
> see below...
>
> On Thu, 2019-09-19 at 16:29 +0200, Philipp Puschmann wrote:
>> For some years and since many kernel versions there are reports that the
>> RX UART SDMA channel stops working at some point. The workaround was to
>> disable DMA for RX. This commit tries to fix the problem itself.
>>
>> Due to its license i wasn't able to debug the sdma script itself but it
>> somehow leads to blocking the scheduling of the channel script when a
>> running sdma script does not find any free descriptor in the ring to put
>> its data into.
>>
>> If we detect such a potential case we manually restart the channel.
>>
>> As sdmac->desc is constant we can move desc out of the loop.
>>
>> Fixes: 1ec1e82f2510 ("dmaengine: Add Freescale i.MX SDMA support")
>> Signed-off-by: Philipp Puschmann
>> Reviewed-by: Lucas Stach
>> ---
>>
>> Changelog v4:
>>  - fixed the fixes tag
>>
>> Changelog v3:
>>  - use correct dma_wmb() instead of dma_wb()
>>  - add fixes tag
>>
>> Changelog v2:
>>  - clarify comment and commit description
>>
>>  drivers/dma/imx-sdma.c | 21 +
>>  1 file changed, 17 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
>> index e029a2443cfc..a32b5962630e 100644
>> --- a/drivers/dma/imx-sdma.c
>> +++ b/drivers/dma/imx-sdma.c
>> @@ -775,21 +775,23 @@ static void sdma_start_desc(struct sdma_channel *sdmac)
>>  static void sdma_update_channel_loop(struct sdma_channel *sdmac)
>>  {
>>  	struct sdma_buffer_descriptor *bd;
>> -	int error = 0;
>> -	enum dma_status old_status = sdmac->status;
>> +	struct sdma_desc *desc = sdmac->desc;
>> +	int error = 0, cnt = 0;
>> +	enum dma_status old_status = sdmac->status;
>>
>>  	/*
>>  	 * loop mode. Iterate over descriptors, re-setup them and
>>  	 * call callback function.
>>  	 */
>> -	while (sdmac->desc) {
>> -		struct sdma_desc *desc = sdmac->desc;
>> +	while (desc) {
>>
>>  		bd = &desc->bd[desc->buf_tail];
>>
>>  		if (bd->mode.status & BD_DONE)
>>  			break;
>>
>> +		cnt++;
>> +
>>  		if (bd->mode.status & BD_RROR) {
>>  			bd->mode.status &= ~BD_RROR;
>>  			sdmac->status = DMA_ERROR;
>> @@ -822,6 +824,17 @@ static void sdma_update_channel_loop(struct sdma_channel *sdmac)
>>  		if (error)
>>  			sdmac->status = old_status;
>>  	}
>> +
>> +	/* In some situations it may happen that the sdma does not found any
>                                                            ^ hasn't
>> +	 * usable descriptor in the ring to put data into. The channel is
>> +	 * stopped then. While there is no specific error condition we can
>> +	 * check for, a necessary condition is that all available buffers for
>> +	 * the current channel have been written to by the sdma script. In
>> +	 * this case and after we have made the buffers available again,
>> +	 * we restart the channel.
>> +	 */
>
> Are you sure we can't miss cases where we only had to make some buffers
> available again, but the SDMA already ran out of buffers before?

Think so, yes.

>
> A while ago, I was debugging a similar issue triggered by receiving
> data with a wrong baud rate, which leads to all descriptors being
> marked with the error flag very quickly (and the SDMA stalling).
> I noticed that you can check if the channel is still running by
> checking the SDMA_H_STATSTOP register & BIT(sdmac->channel).

I think checking this register is the better approach. Then I could drop
the cnt variable. With cnt gone, I would propose moving the check and the
re-enable to the end of the while loop, so that the channel is re-enabled
after the first buffer has been freed.

>
> I also added a flag for the sdmac->flags field to allow stopping the
> channel from the callback (otherwise it would enable the channel
> again).

Could memory and compiler ordering be a problem here?
I'm not that familiar with this kind of problem, but is this

	sdmac->flags &= ~IMX_DMA_ACTIVE;
	writel_relaxed(BIT(channel), sdma->regs + SDMA_H_STATSTOP);

guaranteed to be free of race conditions?

Regards,
Philipp

>
> Attached is my current version of that patch for reference.
>
>> +	if (cnt >= desc->num_bd)
>> +		sdma_enable_channel(sdmac->sdma, sdmac->channel);
>>  }
>>
>>  static void mxc_sdma_handle_channel_normal(struct sdma_channel *data)
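
The register-based check discussed in this message can be modelled in plain C. This is a hedged sketch, not the attached patch: `struct fake_sdma`, `struct fake_chan` and the helper functions are hypothetical, the value of IMX_DMA_ACTIVE is an assumption (only the name appears in the thread), and the status register is modelled as a simple bitmask in which a set bit means "channel running". The model reads the status once per call rather than once per descriptor, and it runs sequentially, so it says nothing about the memory-ordering question raised above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n)         (1u << (n))
#define IMX_DMA_ACTIVE BIT(0) /* name from the thread; value is an assumption */

/* Hypothetical stand-in for the controller: one modelled status register. */
struct fake_sdma {
	uint32_t statstop;  /* models SDMA_H_STATSTOP: bit set = channel running */
	unsigned int reads; /* number of modelled register reads */
};

/* Hypothetical stand-in for a channel. */
struct fake_chan {
	struct fake_sdma *sdma;
	unsigned int channel;
	uint32_t flags;
};

static uint32_t read_statstop(struct fake_sdma *s)
{
	s->reads++;
	return s->statstop;
}

static void chan_start(struct fake_chan *c)
{
	c->flags |= IMX_DMA_ACTIVE;
	c->sdma->statstop |= BIT(c->channel);
}

/* Mirrors the order in the snippet above: clear the flag, then stop. */
static void chan_stop(struct fake_chan *c)
{
	c->flags &= ~IMX_DMA_ACTIVE;
	c->sdma->statstop &= ~BIT(c->channel);
}

/* Called once after the descriptors were cleaned, not per iteration. */
static bool restart_if_stalled(struct fake_chan *c)
{
	assert(c && c->sdma);

	if (!(c->flags & IMX_DMA_ACTIVE))
		return false; /* stopped on purpose, leave it alone */
	if (read_statstop(c->sdma) & BIT(c->channel))
		return false; /* still running */

	c->sdma->statstop |= BIT(c->channel); /* models re-enabling the channel */
	return true;
}
```

The flag check comes first so that a channel stopped deliberately, for example from a callback, is never re-enabled and costs no register read.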
Re: [PATCH v4 2/3] dmaengine: imx-sdma: fix dma freezes
Hi Philipp,

see below...

On Thu, 2019-09-19 at 16:29 +0200, Philipp Puschmann wrote:
> For some years and since many kernel versions there are reports that the
> RX UART SDMA channel stops working at some point. The workaround was to
> disable DMA for RX. This commit tries to fix the problem itself.
>
> Due to its license i wasn't able to debug the sdma script itself but it
> somehow leads to blocking the scheduling of the channel script when a
> running sdma script does not find any free descriptor in the ring to put
> its data into.
>
> If we detect such a potential case we manually restart the channel.
>
> As sdmac->desc is constant we can move desc out of the loop.
>
> Fixes: 1ec1e82f2510 ("dmaengine: Add Freescale i.MX SDMA support")
> Signed-off-by: Philipp Puschmann
> Reviewed-by: Lucas Stach
> ---
>
> Changelog v4:
>  - fixed the fixes tag
>
> Changelog v3:
>  - use correct dma_wmb() instead of dma_wb()
>  - add fixes tag
>
> Changelog v2:
>  - clarify comment and commit description
>
>  drivers/dma/imx-sdma.c | 21 +
>  1 file changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
> index e029a2443cfc..a32b5962630e 100644
> --- a/drivers/dma/imx-sdma.c
> +++ b/drivers/dma/imx-sdma.c
> @@ -775,21 +775,23 @@ static void sdma_start_desc(struct sdma_channel *sdmac)
>  static void sdma_update_channel_loop(struct sdma_channel *sdmac)
>  {
>  	struct sdma_buffer_descriptor *bd;
> -	int error = 0;
> -	enum dma_status old_status = sdmac->status;
> +	struct sdma_desc *desc = sdmac->desc;
> +	int error = 0, cnt = 0;
> +	enum dma_status old_status = sdmac->status;
>
>  	/*
>  	 * loop mode. Iterate over descriptors, re-setup them and
>  	 * call callback function.
>  	 */
> -	while (sdmac->desc) {
> -		struct sdma_desc *desc = sdmac->desc;
> +	while (desc) {
>
>  		bd = &desc->bd[desc->buf_tail];
>
>  		if (bd->mode.status & BD_DONE)
>  			break;
>
> +		cnt++;
> +
>  		if (bd->mode.status & BD_RROR) {
>  			bd->mode.status &= ~BD_RROR;
>  			sdmac->status = DMA_ERROR;
> @@ -822,6 +824,17 @@ static void sdma_update_channel_loop(struct sdma_channel *sdmac)
>  		if (error)
>  			sdmac->status = old_status;
>  	}
> +
> +	/* In some situations it may happen that the sdma does not found any
                                                           ^ hasn't
> +	 * usable descriptor in the ring to put data into. The channel is
> +	 * stopped then. While there is no specific error condition we can
> +	 * check for, a necessary condition is that all available buffers for
> +	 * the current channel have been written to by the sdma script. In
> +	 * this case and after we have made the buffers available again,
> +	 * we restart the channel.
> +	 */

Are you sure we can't miss cases where we only had to make some buffers
available again, but the SDMA already ran out of buffers before?

A while ago, I was debugging a similar issue triggered by receiving
data with a wrong baud rate, which leads to all descriptors being
marked with the error flag very quickly (and the SDMA stalling).
I noticed that you can check if the channel is still running by
checking the SDMA_H_STATSTOP register & BIT(sdmac->channel).

I also added a flag for the sdmac->flags field to allow stopping the
channel from the callback (otherwise it would enable the channel
again).

Attached is my current version of that patch for reference.

> +	if (cnt >= desc->num_bd)
> +		sdma_enable_channel(sdmac->sdma, sdmac->channel);
>  }
>
>  static void mxc_sdma_handle_channel_normal(struct sdma_channel *data)

From 73d7dcf84dac5512c50448ff6adf084f1a9bd6f9 Mon Sep 17 00:00:00 2001
From: Jan Luebbe
Date: Tue, 16 Apr 2019 18:35:04 +0200
Subject: [PATCH] dmaengine: imx-sdma: restart stopped cyclic transfers

For cyclic DMA transfers, we have at least two cases where we can run
out of descriptors available to the engine:
- Interrupts are disabled for too long and all buffers are filled with
  data.
- DMA errors (such as generated by baud rate mismatch with imx-uart) use
  up all descriptors before we can react.

In this case, SDMA stops the channel and no further transfers are done
until the respective channel is disabled and re-enabled.

The best we can do in this case is to check if the transfer should still
be enabled (it could have been disabled during
sdma_update_channel_loop), but the SDMA channel is stopped. In this
case, we re-start the channel.

To avoid racing with changes to the sdmac->status field (which is
written and restored in sdma_update_channel_loop), we add a new flag
(IMX_DMA_ACTIVE) to indicate that the channel is currently active.

Signed-off-by: Jan Luebbe
---
 drivers/dma/imx-sdma.c | 13 +
 1 file changed, 13