RE: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload

2012-09-12 Thread Liu Qiang-B32616
> >> Will this engine be coordinating with another to handle memory copies?
> >>  The dma mapping code for async_tx/raid is broken when dma mapping
> >> requests overlap or cross dma device boundaries [1].
> >>
> >> [1]: http://marc.info/?l=linux-arm-kernel&m=129407269402930&w=2
> > Yes, it needs fsl-dma to handle memcpy copies.
> > I read your link, the unmap address is stored in talitos hwdesc, the
> address will be unmapped when async_tx ack this descriptor, I know fsl-
> dma won't wait this ack flag in current kernel, so I fix it in fsl-dma
> patch 5/8. Do you mean that?
> 
> Unfortunately no.  I'm open to other suggestions. but as far as I can
> see it requires deeper changes to rip out the dma mapping that happens
> in async_tx and the automatic unmapping done by drivers.  It should
> all be pushed to the client (md).
> 
> Currently async_tx hides hardware details from md such that it doesn't
> even care if the operation is offloaded to hardware at all, but that
> takes things too far.  In the worst case an copy->xor chain handled by
> multiple channels results in :
> 
> 1/ dma_map(copy_chan...)
> 2/ dma_map(xor_chan...)
> 3/ 
> 4/ dma_unmap(copy_chan...)
> 5/  <---initiated by the copy_chan
> 6/ dma_unmap(xor_chan...)
> 
> Step 2 violates the dma api since the buffers belong to the xor_chan
> until unmap.  Step 5 also causes the random completion context of the
> copy channel to bleed into submission context of the xor channel which
> is problematic.  So the order needs to be:
> 
> 1/ dma_map(copy_chan...)
> 2/ 
> 3/ dma_unmap(copy_chan...)
> 4/ dma_map(xor_chan...)
> 5/  <--initiated by md in a static context
> 6/ dma_unmap(xor_chan...)
> 
> Also, if xor_chan and copy_chan lie with the same dma mapping domain
> (iommu or parent device) then we can map the stripe once and skip the
> extra maintenance for the duration of the chain of operations.  This
> dumps a lot of hardware details on md, but I think it is the only way
> to get consistent semantics when arbitrary offload devices are
> involved.
Thanks for your answer and links, I did some investigate these days,

first, powerpc processor should be hardware assured cache coherency, it should
be ok for hardware when in step 5 (but I will avoid map same address on 
different
device).

second, I have a workaround to make dma_map/unmap by order when using 2 
different
device to offload, I will submit next descriptor until current descriptor 
complete,
if (submit->flags & ASYNC_TX_ACK)
async_tx_ack(tx);
if (depend_tx)
async_tx_ack(depend_tx);

+   /* do more check to support 2 devices offload? */
+   if (dma_wait_for_async_tx(tx) == DMA_ERROR)
+   panic("%s: DMA_ERROR waiting for tx\n", __func__);
}
EXPORT_SYMBOL_GPL(async_tx_submit);

Also use your example, 
1/ dma_map(copy_chan...)
2/ tx->submit(tx); async_tx_ack(tx);
3/ dma_unmap(copy_chan...)
4/ dma_map(xor_chan...)
5/  <-- initialized by tx->submit(tx);
6/ dma_unmap(xor_chan...)

Under this way, actually dma_run_dependency() is useless, so this can make sure 
copy
and xor with same page processed by order, and only one descriptor per channel 
is
served. dma_unmap in driver is controlled by client (tx->flags)

How's you thinking or any suggestions? I test it on our powerpc, I don't know 
whether
it does work on other architecture.

Thanks.

> 
> --
> Dan


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH v7 6/8] fsl-dma: use spin_lock_bh to instead of spin_lock_irqsave

2012-09-04 Thread Liu Qiang-B32616
> -Original Message-
> From: dan.j.willi...@gmail.com [mailto:dan.j.willi...@gmail.com] On
> Behalf Of Dan Williams
> Sent: Sunday, September 02, 2012 4:41 PM
> To: Liu Qiang-B32616
> Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-
> ker...@vger.kernel.org; vinod.k...@intel.com; Phillips Kim-R1AAHA;
> herb...@gondor.hengli.com.au; da...@davemloft.net; a...@arndb.de;
> gre...@linuxfoundation.org; Li Yang-R58472; Tabi Timur-B04825
> Subject: Re: [PATCH v7 6/8] fsl-dma: use spin_lock_bh to instead of
> spin_lock_irqsave
> 
> On Thu, Aug 9, 2012 at 1:23 AM,   wrote:
> > From: Qiang Liu 
> >
> > The use of spin_lock_irqsave() is a stronger locking mechanism than is
> > required throughout the driver. The minimum locking required should be
> > used instead. Interrupts will be turned off and context will be saved,
> > there is needless to use irqsave.
> >
> > Change all instances of spin_lock_irqsave() to spin_lock_bh().
> > All manipulation of protected fields is done using tasklet context or
> > weaker, which makes spin_lock_bh() the correct choice.
> 
> It seems you are coordinating fsl-dma copy and talitos xor operations.
>  It looks like fsl-dma will be called through
> talitos_process_pending()->dma_run_dependencies(), which is
> potentially called in hard irq context.
> 
> This all comes back to the need to fix raid offload to manage the
> channels explicitly rather than the current dependency chains.
So you mean I must implement talitos_run_dependencies() and 
fsldma_run_dependencies()? Invoke async_tx->callback() respectively.
How about avoiding irq context in talitos?

> 
> --
> Dan


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload

2012-09-04 Thread Liu Qiang-B32616
> -Original Message-
> From: dan.j.willi...@gmail.com [mailto:dan.j.willi...@gmail.com] On
> Behalf Of Dan Williams
> Sent: Sunday, September 02, 2012 4:12 PM
> To: Liu Qiang-B32616
> Cc: linux-cry...@vger.kernel.org; herb...@gondor.hengli.com.au;
> da...@davemloft.net; linux-ker...@vger.kernel.org; linuxppc-
> d...@lists.ozlabs.org; Li Yang-R58472; Phillips Kim-R1AAHA;
> vinod.k...@intel.com; dan.j.willi...@intel.com; a...@arndb.de;
> gre...@linuxfoundation.org
> Subject: Re: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload
> 
> On Thu, Aug 9, 2012 at 1:20 AM,   wrote:
> > From: Qiang Liu 
> >
> > Expose Talitos's XOR functionality to be used for RAID parity
> > calculation via the Async_tx layer.
> >
> > Cc: Herbert Xu 
> > Cc: David S. Miller 
> > Signed-off-by: Dipen Dudhat 
> > Signed-off-by: Maneesh Gupta 
> > Signed-off-by: Kim Phillips 
> > Signed-off-by: Vishnu Suresh 
> > Signed-off-by: Qiang Liu 
> > ---
> >  drivers/crypto/Kconfig   |9 +
> >  drivers/crypto/talitos.c |  413
> ++
> >  drivers/crypto/talitos.h |   53 ++
> >  3 files changed, 475 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
> > index be6b2ba..f0a7c29 100644
> > --- a/drivers/crypto/Kconfig
> > +++ b/drivers/crypto/Kconfig
> > @@ -222,6 +222,15 @@ config CRYPTO_DEV_TALITOS
> >   To compile this driver as a module, choose M here: the module
> >   will be called talitos.
> >
> > +config CRYPTO_DEV_TALITOS_RAIDXOR
> > +   bool "Talitos RAID5 XOR Calculation Offload"
> > +   default y
> 
> No, default y.  The user should explicitly enable this.
Ok, I will change it to N.

> 
> > +   select DMA_ENGINE
> > +   depends on CRYPTO_DEV_TALITOS
> > +   help
> > + Say 'Y' here to use the Freescale Security Engine (SEC) to
> > + offload RAID XOR parity Calculation
> > +
> 
> Is it faster than cpu xor?
Yes, I tested with IOZone, cpu load is also reduced.
I think one possible reason is the processor of p1022ds is not strong as 
others, so the performance is improved obviously.

> 
> Will this engine be coordinating with another to handle memory copies?
>  The dma mapping code for async_tx/raid is broken when dma mapping
> requests overlap or cross dma device boundaries [1].
> 
> [1]: http://marc.info/?l=linux-arm-kernel&m=129407269402930&w=2
Yes, it needs fsl-dma to handle memcpy copies.
I read your link, the unmap address is stored in talitos hwdesc, the address 
will be unmapped when async_tx ack this descriptor, I know fsl-dma won't wait 
this ack flag in current kernel, so I fix it in fsl-dma patch 5/8. Do you mean 
that?

> 
> > +static void talitos_process_pending(struct talitos_xor_chan *xor_chan)
> > +{
> > +   struct talitos_xor_desc *desc, *_desc;
> > +   unsigned long flags;
> > +   int status;
> > +   struct talitos_private *priv;
> > +   int ch;
> > +
> > +   priv = dev_get_drvdata(xor_chan->dev);
> > +   ch = atomic_inc_return(&priv->last_chan) &
> > + (priv->num_channels - 1);
> 
> Maybe a comment about why this this is duplicated from
> talitos_cra_init()?  It sticks out here and I had to go looking to
> find out why this channel number increments on submit.
My fault, I will correct it as below,

atomic_read((&priv->last_chan) & (priv->num_channels - 1);

> 
> 
> > +   spin_lock_irqsave(&xor_chan->desc_lock, flags);
> > +
> > +   list_for_each_entry_safe(desc, _desc, &xor_chan->pending_q,
> node) {
> > +   status = talitos_submit(xor_chan->dev, ch, &desc-
> >hwdesc,
> > +   talitos_release_xor, desc);
> > +   if (status != -EINPROGRESS)
> > +   break;
> > +
> > +   list_del(&desc->node);
> > +   list_add_tail(&desc->node, &xor_chan->in_progress_q);
> > +   }
> > +
> > +   spin_unlock_irqrestore(&xor_chan->desc_lock, flags);
> > +}
> > +
> > +static void talitos_xor_run_tx_complete_actions(struct
> talitos_xor_desc *desc,
> > +   struct talitos_xor_chan *xor_chan)
> > +{
> > +   struct device *dev = xor_chan->dev;
> > +   dma_addr_t dest, addr;
> > +   unsigned int src_cnt = desc->unmap_src_cnt;
> > + 

RE: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload

2012-08-31 Thread Liu Qiang-B32616
> -Original Message-
> From: Geanta Neag Horia Ioan-B05471
> Sent: Friday, August 31, 2012 6:39 PM
> To: Liu Qiang-B32616; linux-cry...@vger.kernel.org;
> dan.j.willi...@gmail.com; herb...@gondor.hengli.com.au;
> da...@davemloft.net; linux-ker...@vger.kernel.org; linuxppc-
> d...@lists.ozlabs.org
> Cc: Li Yang-R58472; Phillips Kim-R1AAHA; vinod.k...@intel.com; Dan
> Williams; a...@arndb.de; gre...@linuxfoundation.org
> Subject: RE: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload
> 
> On Fri, 31 Aug 2012 06:08:05 +0300, Liu Qiang-B32616
> 
> wrote:
> > > -Original Message-
> > > From: Geanta Neag Horia Ioan-B05471
> > > Sent: Thursday, August 30, 2012 10:23 PM
> > > To: Liu Qiang-B32616; linux-cry...@vger.kernel.org;
> > > dan.j.willi...@gmail.com; herb...@gondor.hengli.com.au;
> > > da...@davemloft.net; linux-ker...@vger.kernel.org; linuxppc-
> > > d...@lists.ozlabs.org
> > > Cc: Li Yang-R58472; Phillips Kim-R1AAHA; vinod.k...@intel.com;
> > > dan.j.willi...@intel.com; a...@arndb.de; gre...@linuxfoundation.org;
> Liu
> > > Qiang-B32616
> > > Subject: RE: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload
> > >
> > > On Thu, 9 Aug 2012 11:20:48 +0300, qiang@freescale.com wrote:
> > > > From: Qiang Liu 
> > > >
> > > > Expose Talitos's XOR functionality to be used for RAID parity
> > > > calculation via the Async_tx layer.
> > > >
> > > > Cc: Herbert Xu 
> > > > Cc: David S. Miller 
> > > > Signed-off-by: Dipen Dudhat 
> > > > Signed-off-by: Maneesh Gupta 
> > > > Signed-off-by: Kim Phillips 
> > > > Signed-off-by: Vishnu Suresh 
> > > > Signed-off-by: Qiang Liu 
> > > > ---
> > > >  drivers/crypto/Kconfig   |9 +
> > > >  drivers/crypto/talitos.c |  413
> > > ++
> > > >  drivers/crypto/talitos.h |   53 ++
> > > >  3 files changed, 475 insertions(+), 0 deletions(-)
> > >
> 
> > > > +static int talitos_alloc_chan_resources(struct dma_chan *chan)
> > > > +{
> > > > +   struct talitos_xor_chan *xor_chan;
> > > > +   struct talitos_xor_desc *desc;
> > > > +   LIST_HEAD(tmp_list);
> > > > +   int i;
> > > > +
> > > > +   xor_chan = container_of(chan, struct talitos_xor_chan,
> common);
> > > > +
> > > > +   if (!list_empty(&xor_chan->free_desc))
> > > > +   return xor_chan->total_desc;
> > > > +
> > > > +   for (i = 0; i < TALITOS_MAX_DESCRIPTOR_NR; i++) {
> > > > +   desc = talitos_xor_alloc_descriptor(xor_chan,
> > > > +   GFP_KERNEL | GFP_DMA);
> > >
> > > talitos_xor_alloc_descriptor() is called here without holding
> > > the xor_chan->desc_lock and it increments xor_chan->total_desc.
> > > Isn't this an issue ?
> >
> > No, please refer to the code as below,
> > +   list_add_tail(&desc->node, &tmp_list);
> >
> > The list is temporary list, it will be merged to xor_chan->free_desc in
> next step, here is protected
> > by lock,
> > +   spin_lock_bh(&xor_chan->desc_lock);
> > +   list_splice_init(&tmp_list, &xor_chan->free_desc);
> > +   spin_unlock_bh(&xor_chan->desc_lock);
> 
> I was not referring to the list, but to xor_chan->total_desc variable.
> The following access:
> talitos_alloc_chan_resources()->talitos_xor_alloc_descriptor()-
> >total_desc++
> is not protected by the xor_chan->desc_lock.
Ok, I will correct it in next series. Thanks.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload

2012-08-30 Thread Liu Qiang-B32616
> -Original Message-
> From: Geanta Neag Horia Ioan-B05471
> Sent: Thursday, August 30, 2012 10:23 PM
> To: Liu Qiang-B32616; linux-cry...@vger.kernel.org;
> dan.j.willi...@gmail.com; herb...@gondor.hengli.com.au;
> da...@davemloft.net; linux-ker...@vger.kernel.org; linuxppc-
> d...@lists.ozlabs.org
> Cc: Li Yang-R58472; Phillips Kim-R1AAHA; vinod.k...@intel.com;
> dan.j.willi...@intel.com; a...@arndb.de; gre...@linuxfoundation.org; Liu
> Qiang-B32616
> Subject: RE: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload
> 
> On Thu, 9 Aug 2012 11:20:48 +0300, qiang@freescale.com wrote:
> > From: Qiang Liu 
> >
> > Expose Talitos's XOR functionality to be used for RAID parity
> > calculation via the Async_tx layer.
> >
> > Cc: Herbert Xu 
> > Cc: David S. Miller 
> > Signed-off-by: Dipen Dudhat 
> > Signed-off-by: Maneesh Gupta 
> > Signed-off-by: Kim Phillips 
> > Signed-off-by: Vishnu Suresh 
> > Signed-off-by: Qiang Liu 
> > ---
> >  drivers/crypto/Kconfig   |9 +
> >  drivers/crypto/talitos.c |  413
> ++
> >  drivers/crypto/talitos.h |   53 ++
> >  3 files changed, 475 insertions(+), 0 deletions(-)
> 
> 
> > +static void talitos_xor_run_tx_complete_actions(struct
> talitos_xor_desc *desc,
> > +   struct talitos_xor_chan *xor_chan)
> > +{
> > +   struct device *dev = xor_chan->dev;
> > +   dma_addr_t dest, addr;
> > +   unsigned int src_cnt = desc->unmap_src_cnt;
> > +   unsigned int len = desc->unmap_len;
> > +   enum dma_ctrl_flags flags = desc->async_tx.flags;
> > +   struct dma_async_tx_descriptor *tx = &desc->async_tx;
> > +
> > +   /* unmap dma addresses */
> > +   dest = desc->hwdesc.ptr[6].ptr;
> > +   if (likely(!(flags & DMA_COMPL_SKIP_DEST_UNMAP)))
> > +   dma_unmap_page(dev, dest, len, DMA_BIDIRECTIONAL);
> > +
> > +   desc->idx = 6 - src_cnt;
> > +   if (likely(!(flags & DMA_COMPL_SKIP_SRC_UNMAP))) {
> > +   while(desc->idx < 6) {
> > +   addr = desc->hwdesc.ptr[desc->idx++].ptr;
> > +   if (addr == dest)
> > +   continue;
> > +   dma_unmap_page(dev, addr, len, DMA_TO_DEVICE);
> > +   }
> > +   }
> 
> No need for braces around the while block.
I will remove it.

> 
> > +   /* run dependent operations */
> > +   dma_run_dependencies(tx);
> > +}
> 
> 
> > +static void talitos_release_xor(struct device *dev, struct
> talitos_desc *hwdesc,
> > +   void *context, int error)
> > +{
> > +   struct talitos_xor_desc *desc = context;
> > +   struct talitos_xor_chan *xor_chan;
> > +   dma_async_tx_callback callback;
> > +   void *callback_param;
> > +
> > +   if (unlikely(error))
> > +   dev_err(dev, "xor operation: talitos error %d\n", error);
> > +
> > +   xor_chan = container_of(desc->async_tx.chan, struct
> talitos_xor_chan,
> > +   common);
> > +   spin_lock_bh(&xor_chan->desc_lock);
> > +   if (xor_chan->completed_cookie < desc->async_tx.cookie)
> > +   xor_chan->completed_cookie = desc->async_tx.cookie;
> > +
> > +   callback = desc->async_tx.callback;
> > +   callback_param = desc->async_tx.callback_param;
> > +
> > +   if (callback) {
> > +   spin_unlock_bh(&xor_chan->desc_lock);
> > +   callback(callback_param);
> > +   spin_lock_bh(&xor_chan->desc_lock);
> > +   }
> 
> Since callback_param is used only here, maybe:
> 
> if (callback) {
>   void *callback_param = desc->async_tx.callback_param;
> 
>   spin_unlock_bh(&xor_chan->desc_lock);
>   callback(callback_param);
>   spin_lock_bh(&xor_chan->desc_lock);
> }
Fine. I will modify it in next.

> 
> > +
> > +   talitos_xor_run_tx_complete_actions(desc, xor_chan);
> > +
> > +   list_del(&desc->node);
> > +   list_add_tail(&desc->node, &xor_chan->free_desc);
> > +   spin_unlock_bh(&xor_chan->desc_lock);
> > +   if (!list_empty(&xor_chan->pending_q))
> > +   talitos_process_pending(xor_chan);
> > +}
> 
> 
> > +static int talitos_alloc_chan_resources(struct dma_chan *chan)
> > +{
> > +   struct talitos_xor_chan *xor_chan;
> > +   struct talitos_xor_desc *desc;
> > +   LIST_HEAD(tmp_list);
> > +   int i;
> > 

RE: [PATCH v7 0/8] Raid: enable talitos xor offload for improving performance

2012-08-29 Thread Liu Qiang-B32616
> -Original Message-
> From: Dan Williams [mailto:d...@fb.com]
> Sent: Wednesday, August 29, 2012 10:53 PM
> To: Liu Qiang-B32616
> Cc: vinod.k...@intel.com; a...@arndb.de; herb...@gondor.apana.org.au;
> gre...@linuxfoundation.org; linuxppc-dev@lists.ozlabs.org; linux-
> ker...@vger.kernel.org; linux-cry...@vger.kernel.org; Ira W. Snyder
> Subject: Re: [PATCH v7 0/8] Raid: enable talitos xor offload for
> improving performance
> 
> On Wed, 2012-08-29 at 11:15 +, Liu Qiang-B32616 wrote:
> > Hi Dan,
> >
> > Ping?
> > Can you apply these patches? Thanks.
> >
> 
> I'm working my way through them.
> 
> The first thing I notice is that xor_chan->desc_lock is taken
> inconsistently.  I.e. spin_lock_irqsave() in talitos_process_pending()
> and spin_lock_bh() everywhere else.  Have you run these patches with
> lockdep?
Thanks for your reply.
LOCKDEP is enabled as you suggested, there is not any info about "inconsistent 
lock state" displayed.
I don't know whether it's enough.

I'm confused about the attribute of DMA_INTERRUPT, my understanding is this 
interface is only used to trigger an interrupt (make sure all former operations 
are finished before switching to other channels), but fsl-dma will trigger an 
interrupt by "Programmed Error". I'm wondering whether other hardware are same 
with fsl-dma (the interrupt is a normal interrupt, but not an error) i.e. 
xscale-iop?
If other hardware also trigger an interrupt by an abnormal error, maybe my 
patch 2/8 should be reverted because it violates the rules of this attribute.

BTW, could you please reply in the patch if you have any comments. Thanks.

> 
> --
> Dan
> 
> 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH v7 0/8] Raid: enable talitos xor offload for improving performance

2012-08-29 Thread Liu Qiang-B32616
Hi Dan,

Ping?
Can you apply these patches? Thanks.


- Qiang

> -Original Message-
> From: dan.j.willi...@gmail.com [mailto:dan.j.willi...@gmail.com] On
> Behalf Of Dan Williams
> Sent: Wednesday, August 15, 2012 4:02 AM
> To: Liu Qiang-B32616
> Cc: dan.j.willi...@intel.com; vinod.k...@intel.com; a...@arndb.de;
> herb...@gondor.apana.org.au; gre...@linuxfoundation.org; linuxppc-
> d...@lists.ozlabs.org; linux-ker...@vger.kernel.org; linux-
> cry...@vger.kernel.org; Ira W. Snyder
> Subject: Re: [PATCH v7 0/8] Raid: enable talitos xor offload for
> improving performance
> 
> On Tue, Aug 14, 2012 at 2:04 AM, Liu Qiang-B32616 
> wrote:
> > Hi Vinod,
> >
> > Would you like to apply this series from patch 2/8 to 7/8) in your tree?
> > The link as below,
> > http://patchwork.ozlabs.org/patch/176023/
> > http://patchwork.ozlabs.org/patch/176024/
> > http://patchwork.ozlabs.org/patch/176025/
> > http://patchwork.ozlabs.org/patch/176026/
> > http://patchwork.ozlabs.org/patch/176027/
> > http://patchwork.ozlabs.org/patch/176028/
> >
> 
> Hi, sorry for the recent silence I've been transitioning and am now
> just catching up.  I'll take a look and then it's fine for these to go
> through Vinod's tree.
> 
> --
> Dan


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH v7 0/8] Raid: enable talitos xor offload for improving performance

2012-08-15 Thread Liu Qiang-B32616
> -Original Message-
> From: dan.j.willi...@gmail.com [mailto:dan.j.willi...@gmail.com] On
> Behalf Of Dan Williams
> Sent: Wednesday, August 15, 2012 4:02 AM
> To: Liu Qiang-B32616
> Cc: dan.j.willi...@intel.com; vinod.k...@intel.com; a...@arndb.de;
> herb...@gondor.apana.org.au; gre...@linuxfoundation.org; linuxppc-
> d...@lists.ozlabs.org; linux-ker...@vger.kernel.org; linux-
> cry...@vger.kernel.org; Ira W. Snyder
> Subject: Re: [PATCH v7 0/8] Raid: enable talitos xor offload for
> improving performance
> 
> On Tue, Aug 14, 2012 at 2:04 AM, Liu Qiang-B32616 
> wrote:
> > Hi Vinod,
> >
> > Would you like to apply this series from patch 2/8 to 7/8) in your tree?
> > The link as below,
> > http://patchwork.ozlabs.org/patch/176023/
> > http://patchwork.ozlabs.org/patch/176024/
> > http://patchwork.ozlabs.org/patch/176025/
> > http://patchwork.ozlabs.org/patch/176026/
> > http://patchwork.ozlabs.org/patch/176027/
> > http://patchwork.ozlabs.org/patch/176028/
> >
> 
> Hi, sorry for the recent silence I've been transitioning and am now
> just catching up.  I'll take a look and then it's fine for these to go
> through Vinod's tree.
Hello Dan,

Please review, this issue has been continued since many years. I hope we can fix
it this time. Thanks.

> 
> --
> Dan


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH v7 0/8] Raid: enable talitos xor offload for improving performance

2012-08-14 Thread Liu Qiang-B32616
Hi Vinod,

Would you like to apply this series from patch 2/8 to 7/8) in your tree?
The link as below,
http://patchwork.ozlabs.org/patch/176023/
http://patchwork.ozlabs.org/patch/176024/
http://patchwork.ozlabs.org/patch/176025/
http://patchwork.ozlabs.org/patch/176026/
http://patchwork.ozlabs.org/patch/176027/
http://patchwork.ozlabs.org/patch/176028/

After that, Herbert will merge patch 1/8, 
http://patchwork.ozlabs.org/patch/176022/
and Greg apply patch 8/8, http://patchwork.ozlabs.org/patch/176029/

Thanks.
 

> -Original Message-
> From: Ira W. Snyder [mailto:i...@ovro.caltech.edu]
> Sent: Friday, August 10, 2012 1:03 AM
> To: Liu Qiang-B32616
> Cc: linux-cry...@vger.kernel.org; vinod.k...@intel.com;
> dan.j.willi...@intel.com; herb...@gondor.hengli.com.au; a...@arndb.de;
> gre...@linuxfoundation.org; linuxppc-dev@lists.ozlabs.org; linux-
> ker...@vger.kernel.org; dan.j.willi...@gmail.com
> Subject: Re: [PATCH v7 0/8] Raid: enable talitos xor offload for
> improving performance
> 
> On Thu, Aug 09, 2012 at 04:19:35PM +0800, qiang@freescale.com wrote:
> > Hi all,
> >
> > The following 8 patches enabling fsl-dma and talitos offload raid
> > operations for improving raid performance and balancing CPU load.
> >
> > These patches include talitos, fsl-dma and carma module (caram uses
> > some features of fsl-dma).
> >
> > Write performance will be improved by 25-30% tested by iozone.
> > Write performance is improved about 2% after using spin_lock_bh
> > replace spin_lock_irqsave.
> > CPU load will be reduced by 8%.
> >
> > "fwiw, I gave v5 a test-drive, setting up a RAID5 array on ramdisks
> > [1], and this patchseries, along with FSL_DMA && NET_DMA set seems to
> > be holding water, so this series gets my:"
> >
> > Tested-by: Kim Phillips 
> >
> 
> The fsldma parts of the series all look great to me.
> 
> Thanks,
> Ira
> 
> > [1] mdadm --create --verbose --force /dev/md0 --level=raid5 --raid-
> devices=4 \
> > /dev/ram[0123]
> >
> > Changes in v7:
> > - add test result which is provided by Kim Phillips;
> > - correct one coding style issue in patch 5/8;
> > - add comments by Arnd Bergmann in patch 6/8;
> >
> > Changes in v6:
> > - swap the order of original patch 3/6 and 4/6;
> > - merge Ira's patch to reduce the size of original patch;
> > - merge Ira's patch of carma in 8/8;
> > - update documents and descriptions according to Ira's advice;
> >
> > Changes in v5:
> > - add detail description in patch 3/6 about the process of
> completed
> > descriptor, the process is in align with fsl-dma Reference Manual,
> > illustrate the potential risk and how to reproduce it;
> > - drop the patch 7/7 in v4 according to Timur's comments;
> >
> > Changes in v4:
> > - fix an error in talitos when dest addr is same with src addr,
> dest
> > should be freed only one time if src is same with dest addr;
> > - correct coding style in fsl-dma according to Ira's comments;
> > - fix a race condition in fsl-dma fsl_tx_status(), remove the
> interface
> > which is used to free descriptors in queue ld_completed, this
> interface
> > has been included in fsldma_cleanup_descriptor(), in v3, there is
> one
> > place missed spin_lock protect;
> > - split the original patch 3/4 up to 2 patches 3/7 and 4/7
> according to
> > Li Yang's comments;
> > - fix a warning of unitialized cookie;
> > - add memory copy self test in fsl-dma;
> > - add more detail description about use spin_lock_bh() to instead
> of
> > spin_lock_irqsave() according to Timur's comments.
> >
> > Changes in v3:
> > - change release process of fsl-dma descriptor for resolve the
> > potential race condition;
> > - add test result when use spin_lock_bh replace spin_lock_irqsave;
> > - modify the benchmark results according to the latest patch.
> >
> > Changes in v2:
> > - rebase onto cryptodev tree;
> > - split the patch 3/4 up to 3 independent patches;
> > - remove the patch 4/4, the fix is not for cryptodev tree;
> >
> > Qiang Liu (8):
> >   Talitos: Support for async_tx XOR offload
> >   fsl-dma: remove attribute DMA_INTERRUPT of dmaengine
> >   fsl-dma: add fsl_dma_free_descriptor() to reduce code duplication
> >   fsl-dma: move functions to avoid forward declarations
> >   fsl-dma: change release process of dma descriptor for supporting
> async_tx
> &

RE: [PATCH v6 5/8] fsl-dma: change release process of dma descriptor for supporting async_tx

2012-08-07 Thread Liu Qiang-B32616
Hi Ira,

I remember you said you always disable CONFIG_NET_DMA to make sure dma engine 
won't be locked, actually if this options may affect some behavior of fsl-dma 
in current kernel, it is possible to issue pending descriptors if there is any 
network actions.

Set CONFIG_NET_DMA=y, dma_issue_pending_all will be invoked in net/core/dev.c 
(e.g. triggered by NAPI), it will activate all dma channels which are 
registered to kernel. I don't know whether it will make some troubles in your 
system.

If any question please let me know. Thanks.


> -Original Message-
> From: Ira W. Snyder [mailto:i...@ovro.caltech.edu]
> Sent: Tuesday, August 07, 2012 1:51 AM
> To: Liu Qiang-B32616
> Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org;
> dan.j.willi...@intel.com; linux-ker...@vger.kernel.org;
> dan.j.willi...@gmail.com; vinod.k...@intel.com; a...@arndb.de;
> gre...@linuxfoundation.org; herb...@gondor.hengli.com.au;
> da...@davemloft.net
> Subject: Re: [PATCH v6 5/8] fsl-dma: change release process of dma
> descriptor for supporting async_tx
> 
> On Mon, Aug 06, 2012 at 06:14:33PM +0800, qiang@freescale.com wrote:
> > From: Qiang Liu 
> >
> > Fix the potential risk when enable config NET_DMA and ASYNC_TX.
> > Async_tx is lack of support in current release process of dma
> descriptor,
> > all descriptors will be released whatever is acked or no-acked by
> async_tx,
> > so there is a potential race condition when dma engine is uesd by
> others
> > clients (e.g. when enable NET_DMA to offload TCP).
> >
> > In our case, a race condition which is raised when use both of talitos
> > and dmaengine to offload xor is because napi scheduler will sync all
> > pending requests in dma channels, it affects the process of raid
> operations
> > due to ack_tx is not checked in fsl dma. The no-acked descriptor is
> freed
> > which is submitted just now, as a dependent tx, this freed descriptor
> trigger
> > BUG_ON(async_tx_test_ack(depend_tx)) in async_tx_submit().
> >
> > TASK = ee1a94a0[1390] 'md0_raid5' THREAD: ecf4 CPU: 0
> > GPR00: 0001 ecf41ca0 ee44/921a94a0 003f 0001 c00593e4
>  0001
> > GPR08:  a7a7a7a7 0001 045/92002 42028042 100a38d4
> ed576d98 
> > GPR16: ed5a11b0  2b162000 0200 046/92000 2d555000
> ed3015e8 c15a7aa0
> > GPR24:  c155fc40  ecb63220 ecf41d28 e47/92f640bb0
> ef640c30 ecf41ca0
> > NIP [c02b048c] async_tx_submit+0x6c/0x2b4
> > LR [c02b068c] async_tx_submit+0x26c/0x2b4
> > Call Trace:
> > [ecf41ca0] [c02b068c] async_tx_submit+0x26c/0x2b448/92 (unreliable)
> > [ecf41cd0] [c02b0a4c] async_memcpy+0x240/0x25c
> > [ecf41d20] [c0421064] async_copy_data+0xa0/0x17c
> > [ecf41d70] [c0421cf4] __raid_run_ops+0x874/0xe10
> > [ecf41df0] [c0426ee4] handle_stripe+0x820/0x25e8
> > [ecf41e90] [c0429080] raid5d+0x3d4/0x5b4
> > [ecf41f40] [c04329b8] md_thread+0x138/0x16c
> > [ecf41f90] [c008277c] kthread+0x8c/0x90
> > [ecf41ff0] [c0011630] kernel_thread+0x4c/0x68
> >
> > Another modification in this patch is the change of completed
> descriptors,
> > there is a potential risk which caused by exception interrupt, all
> descriptors
> > in ld_running list are seemed completed when an interrupt raised, it
> works fine
> > under normal condition, but if there is an exception occured, it cannot
> work
> > as our excepted. Hardware should not be depend on s/w list, the right
> way is
> > to read current descriptor address register to find the last completed
> > descriptor. If an interrupt is raised by an error, all descriptors in
> ld_running
> > should not be seemed finished, or these unfinished descriptors in
> ld_running
> > will be released wrongly.
> >
> > A simple way to reproduce,
> > Enable dmatest first, then insert some bad descriptors which can
> trigger
> > Programming Error interrupts before the good descriptors. Last, the
> good
> > descriptors will be freed before they are processsed because of the
> exception
> > intrerrupt.
> >
> > Note: the bad descriptors are only for simulating an exception
> interrupt.
> > This case can illustrate the potential risk in current fsl-dma very
> well.
> >
> > Cc: Dan Williams 
> > Cc: Dan Williams 
> > Cc: Vinod Koul 
> > Cc: Li Yang 
> > Signed-off-by: Qiang Liu 
> > Signed-off-by: Ira W. Snyder 
> 
> There are two minor nitpicks below. Other than that, the patch looks
> excellent to me.
> 
> Ira
> 
> > ---
> >  drivers/dma/fsldma.c |  232 ++--

RE: [PATCH v6 0/8] Raid: enable talitos xor offload for improving performance

2012-08-06 Thread Liu Qiang-B32616
> -Original Message-
> From: Phillips Kim-R1AAHA
> Sent: Tuesday, August 07, 2012 9:35 AM
> To: Liu Qiang-B32616
> Cc: linux-cry...@vger.kernel.org; vinod.k...@intel.com;
> dan.j.willi...@intel.com; herb...@gondor.hengli.com.au; a...@arndb.de;
> gre...@linuxfoundation.org; linuxppc-dev@lists.ozlabs.org; linux-
> ker...@vger.kernel.org; dan.j.willi...@gmail.com; Li Yang-R58472
> Subject: Re: [PATCH v6 0/8] Raid: enable talitos xor offload for
> improving performance
> 
> On Mon, 6 Aug 2012 18:10:15 +0800
>  wrote:
> 
> > Changes in v6:
> > - swap the order of original patch 3/6 and 4/6;
> > - merge Ira's patch to reduce the size of original patch;
> > - merge Ira's patch of carma in 8/8;
> > - update documents and descriptions according to Ira's advice;
> 
> fwiw, I gave v5 a test-drive, setting up a RAID5 array on ramdisks [1],
> and this patchseries, along with FSL_DMA && NET_DMA set seems to be
> holding water, so this series gets my:
> 
> Tested-by: Kim Phillips 
Thanks, Kim. I will add this line in v7:)

> 
> Thanks,
> 
> Kim
> 
> [1] mdadm --create --verbose --force /dev/md0 --level=raid5 --raid-
> devices=4 /dev/ram[0123]

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH v6 5/8] fsl-dma: change release process of dma descriptor for supporting async_tx

2012-08-06 Thread Liu Qiang-B32616
> -Original Message-
> From: Ira W. Snyder [mailto:i...@ovro.caltech.edu]
> Sent: Tuesday, August 07, 2012 1:51 AM
> To: Liu Qiang-B32616
> Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org;
> dan.j.willi...@intel.com; linux-ker...@vger.kernel.org;
> dan.j.willi...@gmail.com; vinod.k...@intel.com; a...@arndb.de;
> gre...@linuxfoundation.org; herb...@gondor.hengli.com.au;
> da...@davemloft.net
> Subject: Re: [PATCH v6 5/8] fsl-dma: change release process of dma
> descriptor for supporting async_tx
> 
> On Mon, Aug 06, 2012 at 06:14:33PM +0800, qiang@freescale.com wrote:
> > From: Qiang Liu 
> >
> > Fix the potential risk when enable config NET_DMA and ASYNC_TX.
> > Async_tx is lack of support in current release process of dma
> descriptor,
> > all descriptors will be released whatever is acked or no-acked by
> async_tx,
> > so there is a potential race condition when dma engine is uesd by
> others
> > clients (e.g. when enable NET_DMA to offload TCP).
> >
> > In our case, a race condition which is raised when use both of talitos
> > and dmaengine to offload xor is because napi scheduler will sync all
> > pending requests in dma channels, it affects the process of raid
> operations
> > due to ack_tx is not checked in fsl dma. The no-acked descriptor is
> freed
> > which is submitted just now, as a dependent tx, this freed descriptor
> trigger
> > BUG_ON(async_tx_test_ack(depend_tx)) in async_tx_submit().
> >
> > TASK = ee1a94a0[1390] 'md0_raid5' THREAD: ecf4 CPU: 0
> > GPR00: 0001 ecf41ca0 ee44/921a94a0 003f 0001 c00593e4
>  0001
> > GPR08:  a7a7a7a7 0001 045/92002 42028042 100a38d4
> ed576d98 
> > GPR16: ed5a11b0  2b162000 0200 046/92000 2d555000
> ed3015e8 c15a7aa0
> > GPR24:  c155fc40  ecb63220 ecf41d28 e47/92f640bb0
> ef640c30 ecf41ca0
> > NIP [c02b048c] async_tx_submit+0x6c/0x2b4
> > LR [c02b068c] async_tx_submit+0x26c/0x2b4
> > Call Trace:
> > [ecf41ca0] [c02b068c] async_tx_submit+0x26c/0x2b448/92 (unreliable)
> > [ecf41cd0] [c02b0a4c] async_memcpy+0x240/0x25c
> > [ecf41d20] [c0421064] async_copy_data+0xa0/0x17c
> > [ecf41d70] [c0421cf4] __raid_run_ops+0x874/0xe10
> > [ecf41df0] [c0426ee4] handle_stripe+0x820/0x25e8
> > [ecf41e90] [c0429080] raid5d+0x3d4/0x5b4
> > [ecf41f40] [c04329b8] md_thread+0x138/0x16c
> > [ecf41f90] [c008277c] kthread+0x8c/0x90
> > [ecf41ff0] [c0011630] kernel_thread+0x4c/0x68
> >
> > Another modification in this patch is the change of completed
> descriptors,
> > there is a potential risk which caused by exception interrupt, all
> descriptors
> > in ld_running list are seemed completed when an interrupt raised, it
> works fine
> > under normal condition, but if there is an exception occured, it cannot
> work
> > as our excepted. Hardware should not be depend on s/w list, the right
> way is
> > to read current descriptor address register to find the last completed
> > descriptor. If an interrupt is raised by an error, all descriptors in
> ld_running
> > should not be seemed finished, or these unfinished descriptors in
> ld_running
> > will be released wrongly.
> >
> > A simple way to reproduce,
> > Enable dmatest first, then insert some bad descriptors which can
> trigger
> > Programming Error interrupts before the good descriptors. Last, the
> good
> > descriptors will be freed before they are processsed because of the
> exception
> > intrerrupt.
> >
> > Note: the bad descriptors are only for simulating an exception
> interrupt.
> > This case can illustrate the potential risk in current fsl-dma very
> well.
> >
> > Cc: Dan Williams 
> > Cc: Dan Williams 
> > Cc: Vinod Koul 
> > Cc: Li Yang 
> > Signed-off-by: Qiang Liu 
> > Signed-off-by: Ira W. Snyder 
> 
> There are two minor nitpicks below. Other than that, the patch looks
> excellent to me.
> 
> Ira
> 
> > ---
> >  drivers/dma/fsldma.c |  232 ++
> 
> >  drivers/dma/fsldma.h |   17 +++-
> >  2 files changed, 174 insertions(+), 75 deletions(-)
> >
> > diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
> > index 36490a3..938d8c1 100644
> > --- a/drivers/dma/fsldma.c
> > +++ b/drivers/dma/fsldma.c
> > @@ -472,6 +472,110 @@ static struct fsl_desc_sw
> *fsl_dma_alloc_descriptor(struct fsldma_chan *chan)
> >  }
> >
> >  /**
> > + * fsldma_clean_completed_descriptor - free

RE: [PATCH v6 6/8] fsl-dma: use spin_lock_bh to instead of spin_lock_irqsave

2012-08-06 Thread Liu Qiang-B32616
> -Original Message-
> From: Arnd Bergmann [mailto:a...@arndb.de]
> Sent: Monday, August 06, 2012 7:57 PM
> To: Liu Qiang-B32616
> Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org;
> dan.j.willi...@intel.com; linux-ker...@vger.kernel.org;
> dan.j.willi...@gmail.com; vinod.k...@intel.com; Phillips Kim-R1AAHA;
> herb...@gondor.hengli.com.au; da...@davemloft.net;
> gre...@linuxfoundation.org; Li Yang-R58472; Tabi Timur-B04825
> Subject: Re: [PATCH v6 6/8] fsl-dma: use spin_lock_bh to instead of
> spin_lock_irqsave
> 
> On Monday 06 August 2012, qiang@freescale.com wrote:
> >
> > From: Qiang Liu 
> >
> > The use of spin_lock_irqsave() is a stronger locking mechanism than is
> > required throughout the driver. The minimum locking required should be
> > used instead. Interrupts will be turned off and context will be saved,
> > there is needless to use irqsave.
> >
> > Change all instances of spin_lock_irqsave() to spin_lock_bh().
> > All manipulation of protected fields is done using tasklet context or
> > weaker, which makes spin_lock_bh() the correct choice.
> >
> > Cc: Dan Williams 
> > Cc: Dan Williams 
> > Cc: Vinod Koul 
> > Cc: Li Yang 
> > Cc: Timur Tabi 
> > Signed-off-by: Qiang Liu 
> > Acked-by: Ira W. Snyder 
> 
> Acked-by: Arnd Bergmann 
> 
> You could actually change the use of spin_lock_bh inside of the tasklet
> function (dma_do_tasklet) do just spin_lock(), because softirqs are
> already disabled there, but your version is also ok.
Yes, you are right, it will disable softirq.
Thank you very much.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH 6/8] fsl-dma: use spin_lock_bh to instead of spin_lock_irqsave

2012-08-06 Thread Liu Qiang-B32616
Hi all,

Please ignore this one. Thanks.

> -Original Message-
> From: Liu Qiang-B32616
> Sent: Monday, August 06, 2012 6:15 PM
> To: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org;
> dan.j.willi...@intel.com; linux-ker...@vger.kernel.org;
> dan.j.willi...@gmail.com; vinod.k...@intel.com
> Cc: Phillips Kim-R1AAHA; herb...@gondor.hengli.com.au;
> da...@davemloft.net; a...@arndb.de; gre...@linuxfoundation.org; Liu
> Qiang-B32616; Li Yang-R58472; Tabi Timur-B04825
> Subject: [PATCH 6/8] fsl-dma: use spin_lock_bh to instead of
> spin_lock_irqsave
> 
> From: Qiang Liu 
> 
> The use of spin_lock_irqsave() is a stronger locking mechanism than is
> required throughout the driver. The minimum locking required should be
> used instead. Interrupts will be turned off and context will be saved,
> there is needless to use irqsave.
> 
> Change all instances of spin_lock_irqsave() to spin_lock_bh().
> All manipulation of protected fields is done using tasklet context or
> weaker, which makes spin_lock_bh() the correct choice.
> 
> Cc: Dan Williams 
> Cc: Dan Williams 
> Cc: Vinod Koul 
> Cc: Li Yang 
> Cc: Timur Tabi 
> Signed-off-by: Qiang Liu 
> Acked-by: Ira W. Snyder 
> ---
>  drivers/dma/fsldma.c |   30 --
>  1 files changed, 12 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
> index 938d8c1..3f809df 100644
> --- a/drivers/dma/fsldma.c
> +++ b/drivers/dma/fsldma.c
> @@ -405,10 +405,9 @@ static dma_cookie_t fsl_dma_tx_submit(struct
> dma_async_tx_descriptor *tx)
>   struct fsldma_chan *chan = to_fsl_chan(tx->chan);
>   struct fsl_desc_sw *desc = tx_to_fsl_desc(tx);
>   struct fsl_desc_sw *child;
> - unsigned long flags;
>   dma_cookie_t cookie;
> 
> - spin_lock_irqsave(&chan->desc_lock, flags);
> + spin_lock_bh(&chan->desc_lock);
> 
>   /*
>* assign cookies to all of the software descriptors
> @@ -421,7 +420,7 @@ static dma_cookie_t fsl_dma_tx_submit(struct
> dma_async_tx_descriptor *tx)
>   /* put this transaction onto the tail of the pending queue */
>   append_ld_queue(chan, desc);
> 
> - spin_unlock_irqrestore(&chan->desc_lock, flags);
> + spin_unlock_bh(&chan->desc_lock);
> 
>   return cookie;
>  }
> @@ -761,15 +760,14 @@ static void fsldma_free_desc_list_reverse(struct
> fsldma_chan *chan,
>  static void fsl_dma_free_chan_resources(struct dma_chan *dchan)
>  {
>   struct fsldma_chan *chan = to_fsl_chan(dchan);
> - unsigned long flags;
> 
>   chan_dbg(chan, "free all channel resources\n");
> - spin_lock_irqsave(&chan->desc_lock, flags);
> + spin_lock_bh(&chan->desc_lock);
>   fsldma_cleanup_descriptors(chan);
>   fsldma_free_desc_list(chan, &chan->ld_pending);
>   fsldma_free_desc_list(chan, &chan->ld_running);
>   fsldma_free_desc_list(chan, &chan->ld_completed);
> - spin_unlock_irqrestore(&chan->desc_lock, flags);
> + spin_unlock_bh(&chan->desc_lock);
> 
>   dma_pool_destroy(chan->desc_pool);
>   chan->desc_pool = NULL;
> @@ -988,7 +986,6 @@ static int fsl_dma_device_control(struct dma_chan
> *dchan,
>  {
>   struct dma_slave_config *config;
>   struct fsldma_chan *chan;
> - unsigned long flags;
>   int size;
> 
>   if (!dchan)
> @@ -998,7 +995,7 @@ static int fsl_dma_device_control(struct dma_chan
> *dchan,
> 
>   switch (cmd) {
>   case DMA_TERMINATE_ALL:
> - spin_lock_irqsave(&chan->desc_lock, flags);
> + spin_lock_bh(&chan->desc_lock);
> 
>   /* Halt the DMA engine */
>   dma_halt(chan);
> @@ -1009,7 +1006,7 @@ static int fsl_dma_device_control(struct dma_chan
> *dchan,
>   fsldma_free_desc_list(chan, &chan->ld_completed);
>   chan->idle = true;
> 
> - spin_unlock_irqrestore(&chan->desc_lock, flags);
> + spin_unlock_bh(&chan->desc_lock);
>   return 0;
> 
>   case DMA_SLAVE_CONFIG:
> @@ -1051,11 +1048,10 @@ static int fsl_dma_device_control(struct dma_chan
> *dchan,
>  static void fsl_dma_memcpy_issue_pending(struct dma_chan *dchan)
>  {
>   struct fsldma_chan *chan = to_fsl_chan(dchan);
> - unsigned long flags;
> 
> - spin_lock_irqsave(&chan->desc_lock, flags);
> + spin_lock_bh(&chan->desc_lock);
>   fsl_chan_xfer_ld_queue(chan);
> - spin_unlock_irqrestore(&chan->desc_lock, flags);
> + spin_unlock_bh(&chan->desc_lock);
&

RE: [PATCH v5 3/6] fsl-dma: change release process of dma descriptor for supporting async_tx

2012-08-02 Thread Liu Qiang-B32616
Hi Ira,

Here, I want to talk about the issue of dma lock when use dmatest with original 
patch.

I do some tests on p1022ds, 2 cores, 6 dma channels (actually is 4 channels, I 
am investigating why is 6, but it doesn't matter). I would like to share with 
you to find something:)

First, it is easy to reproduce your issue with 6 channels after up to 200,000 
iterations per channel, there is not any response from terminal, then I run top 
to see the 2 cores load, I found cpu load is 100%, terminal won't update any 
info after a long time;
Second, I only enable 4 channels to run dmatest, I found cpu load is 50% per 
channel, terminal also update info after a long time;
Last, I only enable 1 channel to run dmatest, serial interrupt is attached to 
core1, and dma channel interrupt is attached to core0, I found terminal is very 
slow after boot up, but it's only slow whatever how many interrupts raised by 
the only dma channel (over 900,000 iterations).

So I want to know, our test results are same? How do you know dma engine is 
self locked but not for heavy cpu load, there is no chance to schedule?
I also read dmatest.c, there is not any sleep to release cpu (msleep(100) if tx 
is null).

If our test results are same, I think I can explain why your original patch can 
"fix" the issue of dma lock, because your patch avoid mass cpu data operation. 
83xx is old, and P1 series is low end chip of Freescale, this maybe the reason 
why p4080 cannot reproduce this issue.

You can have a test to verify my point if you are available. I will follow this 
issue until it's resolved. I will let you know if any other new clues.

Thanks.

 

> -Original Message-
> From: Ira W. Snyder [mailto:i...@ovro.caltech.edu]
> Sent: Thursday, August 02, 2012 1:25 AM
> To: Liu Qiang-B32616
> Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-
> ker...@vger.kernel.org; dan.j.willi...@gmail.com; Vinod Koul;
> herb...@gondor.hengli.com.au; Dan Williams; da...@davemloft.net
> Subject: Re: [PATCH v5 3/6] fsl-dma: change release process of dma
> descriptor for supporting async_tx
> 
> On Wed, Aug 01, 2012 at 04:49:17PM +0800, qiang@freescale.com wrote:
> > From: Qiang Liu 
> >
> > Fix the potential risk when enable config NET_DMA and ASYNC_TX.
> > Async_tx is lack of support in current release process of dma
> descriptor,
> > all descriptors will be released whatever is acked or no-acked by
> async_tx,
> > so there is a potential race condition when dma engine is uesd by
> others
> > clients (e.g. when enable NET_DMA to offload TCP).
> >
> > In our case, a race condition which is raised when use both of talitos
> > and dmaengine to offload xor is because napi scheduler will sync all
> > pending requests in dma channels, it affects the process of raid
> operations
> > due to ack_tx is not checked in fsl dma. The no-acked descriptor is
> freed
> > which is submitted just now, as a dependent tx, this freed descriptor
> trigger
> > BUG_ON(async_tx_test_ack(depend_tx)) in async_tx_submit().
> >
> > TASK = ee1a94a0[1390] 'md0_raid5' THREAD: ecf4 CPU: 0
> > GPR00: 0001 ecf41ca0 ee44/921a94a0 003f 0001 c00593e4
>  0001
> > GPR08:  a7a7a7a7 0001 045/92002 42028042 100a38d4
> ed576d98 
> > GPR16: ed5a11b0  2b162000 0200 046/92000 2d555000
> ed3015e8 c15a7aa0
> > GPR24:  c155fc40  ecb63220 ecf41d28 e47/92f640bb0
> ef640c30 ecf41ca0
> > NIP [c02b048c] async_tx_submit+0x6c/0x2b4
> > LR [c02b068c] async_tx_submit+0x26c/0x2b4
> > Call Trace:
> > [ecf41ca0] [c02b068c] async_tx_submit+0x26c/0x2b448/92 (unreliable)
> > [ecf41cd0] [c02b0a4c] async_memcpy+0x240/0x25c
> > [ecf41d20] [c0421064] async_copy_data+0xa0/0x17c
> > [ecf41d70] [c0421cf4] __raid_run_ops+0x874/0xe10
> > [ecf41df0] [c0426ee4] handle_stripe+0x820/0x25e8
> > [ecf41e90] [c0429080] raid5d+0x3d4/0x5b4
> > [ecf41f40] [c04329b8] md_thread+0x138/0x16c
> > [ecf41f90] [c008277c] kthread+0x8c/0x90
> > [ecf41ff0] [c0011630] kernel_thread+0x4c/0x68
> >
> > Another major modification in this patch is the change to completed
> descriptors,
> > there is a potential risk which caused by exception interrupt, all
> descriptors
> > in ld_running list are seemed completed when an interrupt raised, it
> works fine
> > under normal condition, but if there is an exception occured, it cannot
> work
> > as our excepted. Hardware should not depend on s/w list, the right way
> is
> > to read current descriptor address register to find the last completed
> > descriptor. If an interrupt is raised by an error, all descriptors in
> ld_running
> >

RE: [PATCH v5 3/6] fsl-dma: change release process of dma descriptor for supporting async_tx

2012-08-02 Thread Liu Qiang-B32616
> -Original Message-
> From: Ira W. Snyder [mailto:i...@ovro.caltech.edu]
> Sent: Thursday, August 02, 2012 1:25 AM
> To: Liu Qiang-B32616
> Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-
> ker...@vger.kernel.org; dan.j.willi...@gmail.com; Vinod Koul;
> herb...@gondor.hengli.com.au; Dan Williams; da...@davemloft.net
> Subject: Re: [PATCH v5 3/6] fsl-dma: change release process of dma
> descriptor for supporting async_tx
> 
> On Wed, Aug 01, 2012 at 04:49:17PM +0800, qiang@freescale.com wrote:
> > From: Qiang Liu 
> >
> > Fix the potential risk when enable config NET_DMA and ASYNC_TX.
> > Async_tx is lack of support in current release process of dma
> > descriptor, all descriptors will be released whatever is acked or
> > no-acked by async_tx, so there is a potential race condition when dma
> > engine is uesd by others clients (e.g. when enable NET_DMA to offload
> TCP).
> >
> > In our case, a race condition which is raised when use both of talitos
> > and dmaengine to offload xor is because napi scheduler will sync all
> > pending requests in dma channels, it affects the process of raid
> > operations due to ack_tx is not checked in fsl dma. The no-acked
> > descriptor is freed which is submitted just now, as a dependent tx,
> > this freed descriptor trigger
> > BUG_ON(async_tx_test_ack(depend_tx)) in async_tx_submit().
> >
> > TASK = ee1a94a0[1390] 'md0_raid5' THREAD: ecf4 CPU: 0
> > GPR00: 0001 ecf41ca0 ee44/921a94a0 003f 0001 c00593e4
> >  0001
> > GPR08:  a7a7a7a7 0001 045/92002 42028042 100a38d4
> > ed576d98 
> > GPR16: ed5a11b0  2b162000 0200 046/92000 2d555000
> > ed3015e8 c15a7aa0
> > GPR24:  c155fc40  ecb63220 ecf41d28 e47/92f640bb0
> > ef640c30 ecf41ca0 NIP [c02b048c] async_tx_submit+0x6c/0x2b4 LR
> > [c02b068c] async_tx_submit+0x26c/0x2b4 Call Trace:
> > [ecf41ca0] [c02b068c] async_tx_submit+0x26c/0x2b448/92 (unreliable)
> > [ecf41cd0] [c02b0a4c] async_memcpy+0x240/0x25c [ecf41d20] [c0421064]
> > async_copy_data+0xa0/0x17c [ecf41d70] [c0421cf4]
> > __raid_run_ops+0x874/0xe10 [ecf41df0] [c0426ee4]
> > handle_stripe+0x820/0x25e8 [ecf41e90] [c0429080] raid5d+0x3d4/0x5b4
> > [ecf41f40] [c04329b8] md_thread+0x138/0x16c [ecf41f90] [c008277c]
> > kthread+0x8c/0x90 [ecf41ff0] [c0011630] kernel_thread+0x4c/0x68
> >
> > Another major modification in this patch is the change to completed
> > descriptors, there is a potential risk which caused by exception
> > interrupt, all descriptors in ld_running list are seemed completed
> > when an interrupt raised, it works fine under normal condition, but if
> > there is an exception occured, it cannot work as our excepted.
> > Hardware should not depend on s/w list, the right way is to read
> > current descriptor address register to find the last completed
> > descriptor. If an interrupt is raised by an error, all descriptors in
> > ld_running should not be seemed finished, or these unfinished
> descriptors in ld_running will be released wrongly.
> >
> > A simple way to reproduce,
> > Enable dmatest first, then insert some bad descriptors which can
> > trigger Programming Error interrupts before the good descriptors.
> > Last, the good descriptors will be freed before they are processsed
> > because of the exception intrerrupt.
> >
> > Note: the bad descriptors are only for simulating an exception
> interrupt.
> > This case can illustrate the potential risk in current fsl-dma very
> well.
> >
> 
> I've never managed to trigger a PE (programming error) interrupt on the
> 83xx hardware. Any time I intentionally caused an error, the hardware
> wedged itself. The CB (channel busy) bit is stuck high, and cannot be
> cleared without a hard reset of the board.
Sorry, the exception indeed will be occurred, actually, the capability 
DMA_INTERRUPT
will reproduce the issue under conditions. It will trigger a exception because 
of
bad descriptor (length is zero, src and dst are zero, a->b->c->bada->badb->d, 
we cannot find out which one is really finished in an interrupt tasklet).
So, we'd better consider this case.

BTW, I have already found out your patch which is used to resolve the issue of 
dma lock,
http://lkml.indiana.edu/hypermail/linux/kernel/1103.0/01519.html

> 
> I agree the "snoop on the hardware" technique works. As far as I can tell,
> you have implemented the code correctly.
> 
> The MPC8349EARM.pdf from Freescale indicates that the hardware will halt
> in response to a programming error, and generate a PE i

RE: [PATCH v5 5/6] fsl-dma: use spin_lock_bh to instead of spin_lock_irqsave

2012-08-01 Thread Liu Qiang-B32616
> -Original Message-
> From: Ira W. Snyder [mailto:i...@ovro.caltech.edu]
> Sent: Thursday, August 02, 2012 1:31 AM
> To: Liu Qiang-B32616
> Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-
> ker...@vger.kernel.org; dan.j.willi...@gmail.com; Vinod Koul; Tabi Timur-
> B04825; herb...@gondor.hengli.com.au; Dan Williams; da...@davemloft.net
> Subject: Re: [PATCH v5 5/6] fsl-dma: use spin_lock_bh to instead of
> spin_lock_irqsave
> 
> On Wed, Aug 01, 2012 at 04:50:09PM +0800, qiang@freescale.com wrote:
> > From: Qiang Liu 
> >
> > - use spin_lock_bh() is the right way to use async_tx api,
> > dma_run_dependencies() should not be protected by spin_lock_irqsave();
> > - use spin_lock_bh to instead of spin_lock_irqsave for improving
> > performance, There is not any place to access descriptor queues in
> > fsl-dma ISR except its tasklet, spin_lock_bh() is more proper here.
> > Interrupts will be turned off and context will be save in irqsave,
> there is needless to use irqsave..
> >
> 
> This description is not very clear English. I understand it is not your
> native language. Let me try to help.
> 
> """
> The use of spin_lock_irqsave() is a stronger locking mechanism than is
> required throughout the driver. The minimum locking required should be
> used instead.
> 
> Change all instances of spin_lock_irqsave() to spin_lock_bh(). All
> manipulation of protected fields is done using tasklet context or weaker,
> which makes spin_lock_bh() the correct choice.
> """
I will modify the description in v6, please check later.

> 
> Other than that,
> Acked-by: Ira W. Snyder 
Thanks.

> 
> > Cc: Dan Williams 
> > Cc: Vinod Koul 
> > Cc: Li Yang 
> > Cc: Timur Tabi 
> > Signed-off-by: Qiang Liu 
> > ---
> >  drivers/dma/fsldma.c |   30 --
> >  1 files changed, 12 insertions(+), 18 deletions(-)
> >
> > diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index
> > bb883c0..e3814aa 100644
> > --- a/drivers/dma/fsldma.c
> > +++ b/drivers/dma/fsldma.c
> > @@ -645,10 +645,9 @@ static dma_cookie_t fsl_dma_tx_submit(struct
> dma_async_tx_descriptor *tx)
> > struct fsldma_chan *chan = to_fsl_chan(tx->chan);
> > struct fsl_desc_sw *desc = tx_to_fsl_desc(tx);
> > struct fsl_desc_sw *child;
> > -   unsigned long flags;
> > dma_cookie_t cookie;
> >
> > -   spin_lock_irqsave(&chan->desc_lock, flags);
> > +   spin_lock_bh(&chan->desc_lock);
> >
> > /*
> >  * assign cookies to all of the software descriptors @@ -661,7
> > +660,7 @@ static dma_cookie_t fsl_dma_tx_submit(struct
> dma_async_tx_descriptor *tx)
> > /* put this transaction onto the tail of the pending queue */
> > append_ld_queue(chan, desc);
> >
> > -   spin_unlock_irqrestore(&chan->desc_lock, flags);
> > +   spin_unlock_bh(&chan->desc_lock);
> >
> > return cookie;
> >  }
> > @@ -770,15 +769,14 @@ static void fsldma_free_desc_list_reverse(struct
> > fsldma_chan *chan,  static void fsl_dma_free_chan_resources(struct
> > dma_chan *dchan)  {
> > struct fsldma_chan *chan = to_fsl_chan(dchan);
> > -   unsigned long flags;
> >
> > chan_dbg(chan, "free all channel resources\n");
> > -   spin_lock_irqsave(&chan->desc_lock, flags);
> > +   spin_lock_bh(&chan->desc_lock);
> > fsldma_cleanup_descriptor(chan);
> > fsldma_free_desc_list(chan, &chan->ld_pending);
> > fsldma_free_desc_list(chan, &chan->ld_running);
> > fsldma_free_desc_list(chan, &chan->ld_completed);
> > -   spin_unlock_irqrestore(&chan->desc_lock, flags);
> > +   spin_unlock_bh(&chan->desc_lock);
> >
> > dma_pool_destroy(chan->desc_pool);
> > chan->desc_pool = NULL;
> > @@ -997,7 +995,6 @@ static int fsl_dma_device_control(struct dma_chan
> > *dchan,  {
> > struct dma_slave_config *config;
> > struct fsldma_chan *chan;
> > -   unsigned long flags;
> > int size;
> >
> > if (!dchan)
> > @@ -1007,7 +1004,7 @@ static int fsl_dma_device_control(struct
> > dma_chan *dchan,
> >
> > switch (cmd) {
> > case DMA_TERMINATE_ALL:
> > -   spin_lock_irqsave(&chan->desc_lock, flags);
> > +   spin_lock_bh(&chan->desc_lock);
> >
> > /* Halt the DMA engine */
> > dma_halt(chan);
> > @@ -1017,7 +1014,7 @@ static int fsl_dma_device_control(str

RE: [PATCH v5 4/6] fsl-dma: move the function ahead of its invoke function

2012-08-01 Thread Liu Qiang-B32616
> -Original Message-
> From: Ira W. Snyder [mailto:i...@ovro.caltech.edu]
> Sent: Thursday, August 02, 2012 12:31 AM
> To: Liu Qiang-B32616
> Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-
> ker...@vger.kernel.org; dan.j.willi...@gmail.com; Vinod Koul;
> herb...@gondor.hengli.com.au; Dan Williams; da...@davemloft.net
> Subject: Re: [PATCH v5 4/6] fsl-dma: move the function ahead of its
> invoke function
> 
> On Wed, Aug 01, 2012 at 04:49:43PM +0800, qiang@freescale.com wrote:
> > From: Qiang Liu 
> >
> > Move the function fsldma_cleanup_descriptor() and
> fsl_chan_xfer_ld_queue()
> > ahead of its invoke function for avoiding redundant definition.
> >
> > Cc: Dan Williams 
> > Cc: Vinod Koul 
> > Cc: Li Yang 
> > Signed-off-by: Qiang Liu 
> > ---
> >  drivers/dma/fsldma.c |  252 +-
> 
> >  1 files changed, 124 insertions(+), 128 deletions(-)
> >
> > diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
> > index 87f52c0..bb883c0 100644
> > --- a/drivers/dma/fsldma.c
> > +++ b/drivers/dma/fsldma.c
> > @@ -400,9 +400,6 @@ out_splice:
> > list_splice_tail_init(&desc->tx_list, &chan->ld_pending);
> >  }
> >
> > -static void fsldma_cleanup_descriptor(struct fsldma_chan *chan);
> > -static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan);
> > -
> 
> Please swap the order of this patch (patch 4/6) and the previous patch
> (patch 3/6).
> 
> You added these lines in the patch 3/6 and deleted them here. If you
> reverse the order of the patches, this doesn't happen.
> 
> Adding lines only to delete them in the next patch should be avoided.
I will swap the order in v6. Thanks.

> 
> >  /**
> >   * fsldma_clean_completed_descriptor - free all descriptors which
> >   * has been completed and acked
> > @@ -519,6 +516,130 @@ fsldma_clean_running_descriptor(struct
> fsldma_chan *chan,
> > return 0;
> >  }
> >
> > +/**
> > + * fsl_chan_xfer_ld_queue - transfer any pending transactions
> > + * @chan : Freescale DMA channel
> > + *
> > + * HARDWARE STATE: idle
> > + * LOCKING: must hold chan->desc_lock
> > + */
> > +static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)
> > +{
> > +   struct fsl_desc_sw *desc;
> > +
> > +   /*
> > +* If the list of pending descriptors is empty, then we
> > +* don't need to do any work at all
> > +*/
> > +   if (list_empty(&chan->ld_pending)) {
> > +   chan_dbg(chan, "no pending LDs\n");
> > +   return;
> > +   }
> > +
> > +   /*
> > +* The DMA controller is not idle, which means that the interrupt
> > +* handler will start any queued transactions when it runs after
> > +* this transaction finishes
> > +*/
> > +   if (!chan->idle) {
> > +   chan_dbg(chan, "DMA controller still busy\n");
> > +   return;
> > +   }
> > +
> > +   /*
> > +* If there are some link descriptors which have not been
> > +* transferred, we need to start the controller
> > +*/
> > +
> > +   /*
> > +* Move all elements from the queue of pending transactions
> > +* onto the list of running transactions
> > +*/
> > +   chan_dbg(chan, "idle, starting controller\n");
> > +   desc = list_first_entry(&chan->ld_pending, struct fsl_desc_sw,
> node);
> > +   list_splice_tail_init(&chan->ld_pending, &chan->ld_running);
> > +
> > +   /*
> > +* The 85xx DMA controller doesn't clear the channel start bit
> > +* automatically at the end of a transfer. Therefore we must clear
> > +* it in software before starting the transfer.
> > +*/
> > +   if ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX) {
> > +   u32 mode;
> > +
> > +   mode = DMA_IN(chan, &chan->regs->mr, 32);
> > +   mode &= ~FSL_DMA_MR_CS;
> > +   DMA_OUT(chan, &chan->regs->mr, mode, 32);
> > +   }
> > +
> > +   /*
> > +* Program the descriptor's address into the DMA controller,
> > +* then start the DMA transaction
> > +*/
> > +   set_cdar(chan, desc->async_tx.phys);
> > +   get_cdar(chan);
> > +
> > +   dma_start(chan);
> > +   chan->idle = false;
> > +}
> > +
> > +/**
> > + * fsldma_cleanup_de

RE: [PATCH v5 2/6] fsl-dma: remove attribute DMA_INTERRUPT of dmaengine

2012-08-01 Thread Liu Qiang-B32616
> -Original Message-
> From: Ira W. Snyder [mailto:i...@ovro.caltech.edu]
> Sent: Thursday, August 02, 2012 12:36 AM
> To: Liu Qiang-B32616
> Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-
> ker...@vger.kernel.org; dan.j.willi...@gmail.com; Vinod Koul;
> herb...@gondor.hengli.com.au; Dan Williams; da...@davemloft.net
> Subject: Re: [PATCH v5 2/6] fsl-dma: remove attribute DMA_INTERRUPT of
> dmaengine
> 
> On Wed, Aug 01, 2012 at 04:49:08PM +0800, qiang@freescale.com wrote:
> > From: Qiang Liu 
> >
> > Delete attribute DMA_INTERRUPT because fsl-dma doesn't support this
> > function, exception will be thrown if talitos is used to offload xor at
> the same time.
> >
> 
> I have no problem with this patch.
> 
> However, it ***WILL BREAK*** both drivers in drivers/misc/carma. Please
> add my patch 7/7 titled "[PATCH 7/7] carma: remove unnecessary
> DMA_INTERRUPT capability" to your series. I suggest placing it
> immediately after this patch in your series.
> 
> The carma drivers use the fsldma driver exclusively.
Fine, thanks.

> 
> > Cc: Dan Williams 
> > Cc: Vinod Koul 
> > Cc: Li Yang 
> > Signed-off-by: Qiang Liu 
> > Acked-by: Ira W. Snyder 
> > ---
> >  drivers/dma/fsldma.c |   31 ---
> >  1 files changed, 0 insertions(+), 31 deletions(-)
> >
> > diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index
> > 8f84761..4f2f212 100644
> > --- a/drivers/dma/fsldma.c
> > +++ b/drivers/dma/fsldma.c
> > @@ -543,35 +543,6 @@ static void fsl_dma_free_chan_resources(struct
> > dma_chan *dchan)  }
> >
> >  static struct dma_async_tx_descriptor *
> > -fsl_dma_prep_interrupt(struct dma_chan *dchan, unsigned long flags)
> > -{
> > -   struct fsldma_chan *chan;
> > -   struct fsl_desc_sw *new;
> > -
> > -   if (!dchan)
> > -   return NULL;
> > -
> > -   chan = to_fsl_chan(dchan);
> > -
> > -   new = fsl_dma_alloc_descriptor(chan);
> > -   if (!new) {
> > -   chan_err(chan, "%s\n", msg_ld_oom);
> > -   return NULL;
> > -   }
> > -
> > -   new->async_tx.cookie = -EBUSY;
> > -   new->async_tx.flags = flags;
> > -
> > -   /* Insert the link descriptor to the LD ring */
> > -   list_add_tail(&new->node, &new->tx_list);
> > -
> > -   /* Set End-of-link to the last link descriptor of new list */
> > -   set_ld_eol(chan, new);
> > -
> > -   return &new->async_tx;
> > -}
> > -
> > -static struct dma_async_tx_descriptor *  fsl_dma_prep_memcpy(struct
> > dma_chan *dchan,
> > dma_addr_t dma_dst, dma_addr_t dma_src,
> > size_t len, unsigned long flags)
> > @@ -1352,12 +1323,10 @@ static int __devinit fsldma_of_probe(struct
> platform_device *op)
> > fdev->irq = irq_of_parse_and_map(op->dev.of_node, 0);
> >
> > dma_cap_set(DMA_MEMCPY, fdev->common.cap_mask);
> > -   dma_cap_set(DMA_INTERRUPT, fdev->common.cap_mask);
> > dma_cap_set(DMA_SG, fdev->common.cap_mask);
> > dma_cap_set(DMA_SLAVE, fdev->common.cap_mask);
> > fdev->common.device_alloc_chan_resources =
> fsl_dma_alloc_chan_resources;
> > fdev->common.device_free_chan_resources =
> fsl_dma_free_chan_resources;
> > -   fdev->common.device_prep_dma_interrupt = fsl_dma_prep_interrupt;
> > fdev->common.device_prep_dma_memcpy = fsl_dma_prep_memcpy;
> > fdev->common.device_prep_dma_sg = fsl_dma_prep_sg;
> > fdev->common.device_tx_status = fsl_tx_status;
> > --
> > 1.7.5.1
> >
> >
> > ___
> > Linuxppc-dev mailing list
> > Linuxppc-dev@lists.ozlabs.org
> > https://lists.ozlabs.org/listinfo/linuxppc-dev


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH v5 3/6] fsl-dma: change release process of dma descriptor for supporting async_tx

2012-08-01 Thread Liu Qiang-B32616
Hi Ira,

I hope we can discuss fsl-dma in this thread. In this patch I give a simple 
case to illustrate why I must correct the release process of finished 
descriptors.
There is potential risk in current fsl-dma, the finished cookie value and 
finished async_tx descriptor should be judged by hardware, but not only depend 
on the s/w queue ld_running. I know h/w is very fast, but the driver should be 
in align with h/w.

Thanks.

> -Original Message-
> From: Liu Qiang-B32616
> Sent: Wednesday, August 01, 2012 4:49 PM
> To: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-
> ker...@vger.kernel.org; dan.j.willi...@gmail.com
> Cc: Phillips Kim-R1AAHA; herb...@gondor.hengli.com.au;
> da...@davemloft.net; Liu Qiang-B32616; Dan Williams; Vinod Koul; Li Yang-
> R58472; Ira W. Snyder
> Subject: [PATCH v5 3/6] fsl-dma: change release process of dma descriptor
> for supporting async_tx
> 
> From: Qiang Liu 
> 
> Fix the potential risk when enable config NET_DMA and ASYNC_TX.
> Async_tx is lack of support in current release process of dma descriptor,
> all descriptors will be released whatever is acked or no-acked by
> async_tx, so there is a potential race condition when dma engine is uesd
> by others clients (e.g. when enable NET_DMA to offload TCP).
> 
> In our case, a race condition which is raised when use both of talitos
> and dmaengine to offload xor is because napi scheduler will sync all
> pending requests in dma channels, it affects the process of raid
> operations due to ack_tx is not checked in fsl dma. The no-acked
> descriptor is freed which is submitted just now, as a dependent tx, this
> freed descriptor trigger
> BUG_ON(async_tx_test_ack(depend_tx)) in async_tx_submit().
> 
> TASK = ee1a94a0[1390] 'md0_raid5' THREAD: ecf4 CPU: 0
> GPR00: 0001 ecf41ca0 ee44/921a94a0 003f 0001 c00593e4
>  0001
> GPR08:  a7a7a7a7 0001 045/92002 42028042 100a38d4
> ed576d98 
> GPR16: ed5a11b0  2b162000 0200 046/92000 2d555000
> ed3015e8 c15a7aa0
> GPR24:  c155fc40  ecb63220 ecf41d28 e47/92f640bb0
> ef640c30 ecf41ca0 NIP [c02b048c] async_tx_submit+0x6c/0x2b4 LR [c02b068c]
> async_tx_submit+0x26c/0x2b4 Call Trace:
> [ecf41ca0] [c02b068c] async_tx_submit+0x26c/0x2b448/92 (unreliable)
> [ecf41cd0] [c02b0a4c] async_memcpy+0x240/0x25c [ecf41d20] [c0421064]
> async_copy_data+0xa0/0x17c [ecf41d70] [c0421cf4]
> __raid_run_ops+0x874/0xe10 [ecf41df0] [c0426ee4]
> handle_stripe+0x820/0x25e8 [ecf41e90] [c0429080] raid5d+0x3d4/0x5b4
> [ecf41f40] [c04329b8] md_thread+0x138/0x16c [ecf41f90] [c008277c]
> kthread+0x8c/0x90 [ecf41ff0] [c0011630] kernel_thread+0x4c/0x68
> 
> Another major modification in this patch is the change to completed
> descriptors, there is a potential risk which caused by exception
> interrupt, all descriptors in ld_running list are seemed completed when
> an interrupt raised, it works fine under normal condition, but if there
> is an exception occured, it cannot work as our excepted. Hardware should
> not depend on s/w list, the right way is to read current descriptor
> address register to find the last completed descriptor. If an interrupt
> is raised by an error, all descriptors in ld_running should not be seemed
> finished, or these unfinished descriptors in ld_running will be released
> wrongly.
> 
> A simple way to reproduce,
> Enable dmatest first, then insert some bad descriptors which can trigger
> Programming Error interrupts before the good descriptors. Last, the good
> descriptors will be freed before they are processsed because of the
> exception intrerrupt.
> 
> Note: the bad descriptors are only for simulating an exception interrupt.
> This case can illustrate the potential risk in current fsl-dma very well.
> 
> Cc: Dan Williams 
> Cc: Dan Williams 
> Cc: Vinod Koul 
> Cc: Li Yang 
> Cc: Ira W. Snyder 
> Signed-off-by: Qiang Liu 
> ---
>  drivers/dma/fsldma.c |  242 +++-
> --
>  drivers/dma/fsldma.h |1 +
>  2 files changed, 172 insertions(+), 71 deletions(-)
> 
> diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index
> 4f2f212..87f52c0 100644
> --- a/drivers/dma/fsldma.c
> +++ b/drivers/dma/fsldma.c
> @@ -400,6 +400,125 @@ out_splice:
>   list_splice_tail_init(&desc->tx_list, &chan->ld_pending);  }
> 
> +static void fsldma_cleanup_descriptor(struct fsldma_chan *chan); static
> +void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan);
> +
> +/**
> + * fsldma_clean_completed_descriptor - free all descriptors which
> + * has been completed and acked
> + * @chan: Freescale DMA channel
> + *
> + * This func

RE: [PATCH 5/7] fsl-dma: fix support for async_tx API

2012-07-31 Thread Liu Qiang-B32616
> -Original Message-
> From: Ira W. Snyder [mailto:i...@ovro.caltech.edu]
> Sent: Wednesday, August 01, 2012 7:46 AM
> To: linux-cry...@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org; Liu Qiang-B32616; Ira W. Snyder; Dan
> Williams; Vinod Koul; Li Yang-R58472; Liu Qiang-B32616
> Subject: [PATCH 5/7] fsl-dma: fix support for async_tx API
> 
> From: "Ira W. Snyder" 
> 
> The current fsldma driver does not support the async_tx API. This is due
> to a bug: all descriptors are released when the hardware is finished
> with them. The async_tx API requires that the ACK bit is set by software
> before descriptors are freed.
> 
> This bug is easiest to reproduce by enabling both CONFIG_NET_DMA and
> CONFIG_ASYNC_TX, and using the hardware offload support in MD RAID. The
> network stack will force operations on shared DMA channels, and will
> free descriptors which are being used by the MD RAID code.
> 
> The BUG_ON(async_tx_test_ack(depend_tx)) test in async_tx_submit() will
> be hit, and panic the machine.
> 
> TASK = ee1a94a0[1390] 'md0_raid5' THREAD: ecf4 CPU: 0
> GPR00: 0001 ecf41ca0 ee44/921a94a0 003f 0001 c00593e4
>  0001
> GPR08:  a7a7a7a7 0001 045/92002 42028042 100a38d4
> ed576d98 
> GPR16: ed5a11b0  2b162000 0200 046/92000 2d555000
> ed3015e8 c15a7aa0
> GPR24:  c155fc40  ecb63220 ecf41d28 e47/92f640bb0
> ef640c30 ecf41ca0
> NIP [c02b048c] async_tx_submit+0x6c/0x2b4
> LR [c02b068c] async_tx_submit+0x26c/0x2b4
> Call Trace:
> [ecf41ca0] [c02b068c] async_tx_submit+0x26c/0x2b448/92 (unreliable)
> [ecf41cd0] [c02b0a4c] async_memcpy+0x240/0x25c
> [ecf41d20] [c0421064] async_copy_data+0xa0/0x17c
> [ecf41d70] [c0421cf4] __raid_run_ops+0x874/0xe10
> [ecf41df0] [c0426ee4] handle_stripe+0x820/0x25e8
> [ecf41e90] [c0429080] raid5d+0x3d4/0x5b4
> [ecf41f40] [c04329b8] md_thread+0x138/0x16c
> [ecf41f90] [c008277c] kthread+0x8c/0x90
> [ecf41ff0] [c0011630] kernel_thread+0x4c/0x68
> 
> Cc: Dan Williams 
> Cc: Vinod Koul 
> Cc: Li Yang 
> Cc: Qiang Liu 
I have no idea where the log is from, if the log is from my patch, please note 
it.

> Signed-off-by: Ira W. Snyder 
> ---
>  drivers/dma/fsldma.c |  156 +++-
> --
>  drivers/dma/fsldma.h |1 +
>  2 files changed, 97 insertions(+), 60 deletions(-)
> 
> diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
> index 80edc63..380c1b7 100644
> --- a/drivers/dma/fsldma.c
> +++ b/drivers/dma/fsldma.c
> @@ -410,16 +410,15 @@ static void fsl_dma_free_descriptor(struct
> fsldma_chan *chan, struct fsl_desc_sw
>  }
> 
>  /**
> - * fsldma_cleanup_descriptor - cleanup and free a single link descriptor
> + * fsldma_run_tx_complete_actions - run callback and unmap descriptor
>   * @chan: Freescale DMA channel
>   * @desc: descriptor to cleanup and free
>   *
>   * This function is used on a descriptor which has been executed by the
> DMA
> - * controller. It will run any callbacks, submit any dependencies, and
> then
> - * free the descriptor.
> + * controller. It will run the callback and unmap the descriptor, if
> requested.
>   */
> -static void fsldma_cleanup_descriptor(struct fsldma_chan *chan,
> -   struct fsl_desc_sw *desc)
> +static void fsldma_run_tx_complete_actions(struct fsldma_chan *chan,
> +struct fsl_desc_sw *desc)
>  {
Ah.. I am the original author of this design. Especially some important 
interfaces,
at least you should add my name or note the idea before you modify it?

>   struct dma_async_tx_descriptor *txd = &desc->async_tx;
>   struct device *dev = chan->common.device->dev;
> @@ -427,6 +426,10 @@ static void fsldma_cleanup_descriptor(struct
> fsldma_chan *chan,
>   dma_addr_t dst = get_desc_dst(chan, desc);
>   u32 len = get_desc_cnt(chan, desc);
> 
> + /* Cookies with value zero are already cleaned up */
> + if (txd->cookie == 0)
> + return;
> +
>   /* Run the link descriptor callback function */
>   if (txd->callback) {
>  #ifdef FSL_DMA_LD_DEBUG
> @@ -435,9 +438,6 @@ static void fsldma_cleanup_descriptor(struct
> fsldma_chan *chan,
>   txd->callback(txd->callback_param);
>   }
> 
> - /* Run any dependencies */
> - dma_run_dependencies(txd);
> -
>   /* Unmap the dst buffer, if requested */
>   if (!(txd->flags & DMA_COMPL_SKIP_DEST_UNMAP)) {
>   if (txd->flags & DMA_COMPL_DEST_UNMAP_SINGLE)
> @@ -454,7 +454,68 @@ static void fsldma_cleanup_descriptor(struct
> fsldma_chan *chan

RE: [PATCH 7/7] carma: remove unnecessary DMA_INTERRUPT capability

2012-07-31 Thread Liu Qiang-B32616
> -Original Message-
> From: Ira W. Snyder [mailto:i...@ovro.caltech.edu]
> Sent: Wednesday, August 01, 2012 7:46 AM
> To: linux-cry...@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org; Liu Qiang-B32616; Ira W. Snyder
> Subject: [PATCH 7/7] carma: remove unnecessary DMA_INTERRUPT capability
> 
> From: "Ira W. Snyder" 
> 
> These drivers set the DMA_INTERRUPT capability bit when requesting a DMA
> controller channel. This was historical, and is no longer needed.
> 
> Recent changes to the drivers/dma/fsldma.c driver have removed support
> for this flag. This makes the carma drivers unable to find a DMA channel
> with the required capabilities.
> 
> Signed-off-by: Ira W. Snyder 
> ---
>  drivers/misc/carma/carma-fpga-program.c |1 -
>  drivers/misc/carma/carma-fpga.c |3 +--
>  2 files changed, 1 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/misc/carma/carma-fpga-program.c
> b/drivers/misc/carma/carma-fpga-program.c
> index a2d25e4..eaddfe9 100644
> --- a/drivers/misc/carma/carma-fpga-program.c
> +++ b/drivers/misc/carma/carma-fpga-program.c
> @@ -978,7 +978,6 @@ static int fpga_of_probe(struct platform_device *op)
>   dev_set_drvdata(priv->dev, priv);
>   dma_cap_zero(mask);
>   dma_cap_set(DMA_MEMCPY, mask);
> - dma_cap_set(DMA_INTERRUPT, mask);
>   dma_cap_set(DMA_SLAVE, mask);
>   dma_cap_set(DMA_SG, mask);
> 
> diff --git a/drivers/misc/carma/carma-fpga.c b/drivers/misc/carma/carma-
> fpga.c index 8c279da..861b298 100644
> --- a/drivers/misc/carma/carma-fpga.c
> +++ b/drivers/misc/carma/carma-fpga.c
> @@ -666,7 +666,7 @@ static int data_submit_dma(struct fpga_device *priv,
> struct data_buf *buf)
>   src = SYS_FPGA_BLOCK;
>   tx = chan->device->device_prep_dma_memcpy(chan, dst, src,
> REG_BLOCK_SIZE,
> -   DMA_PREP_INTERRUPT);
> +   0);
>   if (!tx) {
>   dev_err(priv->dev, "unable to prep SYS-FPGA DMA\n");
>   return -ENOMEM;
> @@ -1333,7 +1333,6 @@ static int data_of_probe(struct platform_device *op)
> 
>   dma_cap_zero(mask);
>   dma_cap_set(DMA_MEMCPY, mask);
> - dma_cap_set(DMA_INTERRUPT, mask);
>   dma_cap_set(DMA_SLAVE, mask);
>   dma_cap_set(DMA_SG, mask);
Hi Ira, attached link is the comments of Dan Williams to the original patch of 
fsl-dma.
http://lkml.indiana.edu/hypermail/linux/kernel/0910.1/03836.html

> --
> 1.7.8.6
> 


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH 0/7] fsl-dma: fixes for Freescale DMA driver

2012-07-31 Thread Liu Qiang-B32616
Hi Ira,

My comments inline.

> -Original Message-
> From: Ira W. Snyder [mailto:i...@ovro.caltech.edu]
> Sent: Wednesday, August 01, 2012 7:46 AM
> To: linux-cry...@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org; Liu Qiang-B32616; Ira W. Snyder
> Subject: [PATCH 0/7] fsl-dma: fixes for Freescale DMA driver
> 
> From: "Ira W. Snyder" 
> 
> Hello everyone,
> 
> This is my alternative (simpler) attempt at solving the problems reported
> by Qiang Liu with the async_tx API and MD RAID hardware offload support
> when using the Freescale DMA driver.
> 
> The bug is caused by this driver freeing descriptors before they have
> been ACKed by software using the async_tx API.
> 
> I don't like Qiang Liu's code to check where the hardware is in the
> processing of the descriptor chain, and try to free a partial list of
> descriptors. This was a source of bugs in this driver before I fixed them
> several years ago.
It's a bug which you think the whole list is completed when an interrupt is 
raised, there is a potential risk when an interrupt is raised by "Programmed 
Error". The "ld_running" is a s/w concept, we should not depend on it to judge 
the status of descriptors list.

I know you don't like this process, but it's a safe and common process. You can 
refer to other dma drivers, like ioap-adma, mv-xor and ibm-ppc440x-adma.
Said far point, usb also take this method to judge which descriptor is 
completed, I don't know which device can use a s/w list to free all 
descriptors, you can refer to the implement of dl_reverse_done_list().

If you find any problem in my patch, please point out, or you can give a link 
about the bug you mentioned many years ago.

Thanks.


> 
> Instead, the DMA controller raises an interrupt every time it has
> completed a descriptor chain. This means it is ready for new descriptors:
> no need to try and figure out where it is in the middle of a descriptor
> chain.
Attached again,
The interrupt is only report the state of hardware, we cannot assume all 
descriptors are finished when an interrupt is raised.

> 
> Qiang Liu: I do not have a hardware setup capable of using MD RAID.
> Please test these patches to see if they fix the bug you reported. You
> may use these patches as-is, or build upon them.
I hope we can discuss it based on my patch. If you think my patch will involve 
some issue, please point out. I'm willing to fix it.
Thanks.

> 
> I have tested this using the drivers/dma/dmatest.c driver, as well as the
> CARMA drivers. There are no regressions that I can find.
> 
> [  355.069679] dma0chan3-copy0: terminating after 10 tests, 0
> failures (status 0) [  355.192278] dma0chan2-copy0: terminating after
> 10 tests, 0 failures (status 0)
> 
> Ira W. Snyder (5):
>   fsl-dma: minimize locking overhead
>   fsl-dma: add fsl_dma_free_descriptor() to reduce code duplication
>   fsl-dma: move functions to avoid forward declarations
>   fsl-dma: fix support for async_tx API
>   carma: remove unnecessary DMA_INTERRUPT capability
> 
> Qiang Liu (2):
>   fsl-dma: remove attribute DMA_INTERRUPT of dmaengine
>   fsl-dma: fix a warning of unitialized cookie
> 
>  drivers/dma/fsldma.c|  318 +++--
> --
>  drivers/dma/fsldma.h|1 +
>  drivers/misc/carma/carma-fpga-program.c |1 -
>  drivers/misc/carma/carma-fpga.c |3 +-
>  4 files changed, 159 insertions(+), 164 deletions(-)
> 
> --
> 1.7.8.6
> 


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH v4 3/7] fsl-dma: change release process of dma descriptor for supporting async_tx

2012-07-31 Thread Liu Qiang-B32616
> -Original Message-
> From: Ira W. Snyder [mailto:i...@ovro.caltech.edu]
> Sent: Wednesday, August 01, 2012 6:14 AM
> To: Liu Qiang-B32616
> Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Phillips
> Kim-R1AAHA; da...@davemloft.net; dan.j.willi...@gmail.com; Vinod Koul; Li
> Yang-R58472; herb...@gondor.apana.org.au
> Subject: Re: [PATCH v4 3/7] fsl-dma: change release process of dma
> descriptor for supporting async_tx
> 
> On Tue, Jul 31, 2012 at 04:09:28AM +, Liu Qiang-B32616 wrote:
> > > -Original Message-
> > > From: Ira W. Snyder [mailto:i...@ovro.caltech.edu]
> > > Sent: Tuesday, July 31, 2012 5:10 AM
> > > To: Liu Qiang-B32616
> > > Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org;
> Phillips
> > > Kim-R1AAHA; herb...@gondor.hengli.com.au; da...@davemloft.net; Dan
> > > Williams; Vinod Koul; Li Yang-R58472
> > > Subject: Re: [PATCH v4 3/7] fsl-dma: change release process of dma
> > > descriptor for supporting async_tx
> > >
> > > On Fri, Jul 27, 2012 at 05:16:09PM +0800, qiang@freescale.com
> wrote:
> > > > From: Qiang Liu 
> > > >
> > > > Fix the potential risk when enable config NET_DMA and ASYNC_TX.
> > > > Async_tx is lack of support in current release process of dma
> > > descriptor,
> > > > all descriptors will be released whatever is acked or no-acked by
> > > async_tx,
> > > > so there is a potential race condition when dma engine is uesd by
> > > others
> > > > clients (e.g. when enable NET_DMA to offload TCP).
> > > >
> > > > In our case, a race condition which is raised when use both of
> talitos
> > > > and dmaengine to offload xor is because napi scheduler will sync
> all
> > > > pending requests in dma channels, it affects the process of raid
> > > operations
> > > > due to ack_tx is not checked in fsl dma. The no-acked descriptor is
> > > freed
> > > > which is submitted just now, as a dependent tx, this freed
> descriptor
> > > trigger
> > > > BUG_ON(async_tx_test_ack(depend_tx)) in async_tx_submit().
> > > >
> > >
> > > I'm preparing an alternative version of this patch that I think is
> > > easier to understand (it is much shorter). I'll post it up here as
> soon
> > > as I finish testing.
> > Can you give a simple description/idea about your patch? My patch is
> for fix the
> > problems when I build a raid environment with talitos offload xor.
> > I think the new interface is clear enough and similar with the
> implement of other dma devices.
> >
> > And do you have any comments about this patch?
> >
> 
> My patch will fix the same problem, in a simpler way. It will not
> involve checking if the hardware is finished with a descriptor on
> ld_running.
> 
> > >
> > > It would be nice to know how to easily reproduce this bug, without
> > > needing to set up a RAID system. I don't have access to any such
> > > hardware. A driver similar to drivers/dma/dmatest.c (using the
> async_tx
> > > API instead) would be wonderful.
> > You can refer to raid5.c if you do not want to use hardware. Or you can
> use
> > you ram (or other storage devices) to build a raid env to test.
> > Thanks.
> >
> > >
> > > Thanks,
> > > Ira
> > >
> > > > TASK = ee1a94a0[1390] 'md0_raid5' THREAD: ecf4 CPU: 0
> > > > GPR00: 0001 ecf41ca0 ee44/921a94a0 003f 0001 c00593e4
> > >  0001
> > > > GPR08:  a7a7a7a7 0001 045/92002 42028042 100a38d4
> > > ed576d98 
> > > > GPR16: ed5a11b0  2b162000 0200 046/92000 2d555000
> > > ed3015e8 c15a7aa0
> > > > GPR24:  c155fc40  ecb63220 ecf41d28 e47/92f640bb0
> > > ef640c30 ecf41ca0
> > > > NIP [c02b048c] async_tx_submit+0x6c/0x2b4
> > > > LR [c02b068c] async_tx_submit+0x26c/0x2b4
> > > > Call Trace:
> > > > [ecf41ca0] [c02b068c] async_tx_submit+0x26c/0x2b448/92 (unreliable)
> > > > [ecf41cd0] [c02b0a4c] async_memcpy+0x240/0x25c
> > > > [ecf41d20] [c0421064] async_copy_data+0xa0/0x17c
> > > > [ecf41d70] [c0421cf4] __raid_run_ops+0x874/0xe10
> > > > [ecf41df0] [c0426ee4] handle_stripe+0x820/0x25e8
> > > > [ecf41e90] [c0429080] raid5d+0x3d4/0x5b4
> > > > [ecf41f40] [c04329b8] md_thread+0x138/0x16c
> > > > [ecf41f90] [c

RE: [PATCH] dma/fsldma: fix a compilation warnings

2012-07-31 Thread Liu Qiang-B32616
Hi Kumar,

> From: Linuxppc-dev 
> [linuxppc-dev-bounces+qiang.liu=freescale@lists.ozlabs.org] on behalf of 
> Kumar Gala [ga...@kernel.crashing.org]
> Sent: Tuesday, July 31, 2012 8:57 AM
> To: dan.j.willi...@intel.com
> Cc: vinod.k...@intel.com; linuxppc-...@ozlabs.org
> Subject: [PATCH] dma/fsldma: fix a compilation warnings

> drivers/dma/fsldma.c: In function 'fsl_dma_tx_submit':
> drivers/dma/fsldma.c:409:15: warning: 'cookie' may be used uninitialized in 
> this function
> 
> Signed-off-by: Kumar Gala 
> ---
>  drivers/dma/fsldma.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
> index 8f84761..6194eb7 100644
> --- a/drivers/dma/fsldma.c
> +++ b/drivers/dma/fsldma.c
> @@ -406,7 +406,7 @@ static dma_cookie_t fsl_dma_tx_submit(struct 
> dma_async_tx_descriptor *tx)
>struct fsl_desc_sw *desc = tx_to_fsl_desc(tx);
>struct fsl_desc_sw *child;
> unsigned long flags;
> -   dma_cookie_t cookie;
> +   dma_cookie_t cookie = 0;
I have already submitted a patch to fix it. Please refer to:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2012-July/099819.html

Thanks.

> spin_lock_irqsave(&chan->desc_lock, flags);
> 
> --
> 1.7.9.7

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH v4 3/7] fsl-dma: change release process of dma descriptor for supporting async_tx

2012-07-30 Thread Liu Qiang-B32616
> -Original Message-
> From: Ira W. Snyder [mailto:i...@ovro.caltech.edu]
> Sent: Tuesday, July 31, 2012 5:10 AM
> To: Liu Qiang-B32616
> Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Phillips
> Kim-R1AAHA; herb...@gondor.hengli.com.au; da...@davemloft.net; Dan
> Williams; Vinod Koul; Li Yang-R58472
> Subject: Re: [PATCH v4 3/7] fsl-dma: change release process of dma
> descriptor for supporting async_tx
> 
> On Fri, Jul 27, 2012 at 05:16:09PM +0800, qiang@freescale.com wrote:
> > From: Qiang Liu 
> >
> > Fix the potential risk when enable config NET_DMA and ASYNC_TX.
> > Async_tx is lack of support in current release process of dma
> descriptor,
> > all descriptors will be released whatever is acked or no-acked by
> async_tx,
> > so there is a potential race condition when dma engine is uesd by
> others
> > clients (e.g. when enable NET_DMA to offload TCP).
> >
> > In our case, a race condition which is raised when use both of talitos
> > and dmaengine to offload xor is because napi scheduler will sync all
> > pending requests in dma channels, it affects the process of raid
> operations
> > due to ack_tx is not checked in fsl dma. The no-acked descriptor is
> freed
> > which is submitted just now, as a dependent tx, this freed descriptor
> trigger
> > BUG_ON(async_tx_test_ack(depend_tx)) in async_tx_submit().
> >
> 
> I'm preparing an alternative version of this patch that I think is
> easier to understand (it is much shorter). I'll post it up here as soon
> as I finish testing.
Can you give a simple description/idea about your patch? My patch is for fix the
problems when I build a raid environment with talitos offload xor.
I think the new interface is clear enough and similar with the implement of 
other dma devices.

And do you have any comments about this patch?

> 
> It would be nice to know how to easily reproduce this bug, without
> needing to set up a RAID system. I don't have access to any such
> hardware. A driver similar to drivers/dma/dmatest.c (using the async_tx
> API instead) would be wonderful.
You can refer to raid5.c if you do not want to use hardware. Or you can use
you ram (or other storage devices) to build a raid env to test.
Thanks.

> 
> Thanks,
> Ira
> 
> > TASK = ee1a94a0[1390] 'md0_raid5' THREAD: ecf4 CPU: 0
> > GPR00: 0001 ecf41ca0 ee44/921a94a0 003f 0001 c00593e4
>  0001
> > GPR08:  a7a7a7a7 0001 045/92002 42028042 100a38d4
> ed576d98 
> > GPR16: ed5a11b0  2b162000 0200 046/92000 2d555000
> ed3015e8 c15a7aa0
> > GPR24:  c155fc40  ecb63220 ecf41d28 e47/92f640bb0
> ef640c30 ecf41ca0
> > NIP [c02b048c] async_tx_submit+0x6c/0x2b4
> > LR [c02b068c] async_tx_submit+0x26c/0x2b4
> > Call Trace:
> > [ecf41ca0] [c02b068c] async_tx_submit+0x26c/0x2b448/92 (unreliable)
> > [ecf41cd0] [c02b0a4c] async_memcpy+0x240/0x25c
> > [ecf41d20] [c0421064] async_copy_data+0xa0/0x17c
> > [ecf41d70] [c0421cf4] __raid_run_ops+0x874/0xe10
> > [ecf41df0] [c0426ee4] handle_stripe+0x820/0x25e8
> > [ecf41e90] [c0429080] raid5d+0x3d4/0x5b4
> > [ecf41f40] [c04329b8] md_thread+0x138/0x16c
> > [ecf41f90] [c008277c] kthread+0x8c/0x90
> > [ecf41ff0] [c0011630] kernel_thread+0x4c/0x68
> >
> > Cc: Dan Williams 
> > Cc: Vinod Koul 
> > Cc: Li Yang 
> > Cc: Ira W. Snyder 
> > Signed-off-by: Qiang Liu 
> > ---
> >  drivers/dma/fsldma.c |  242 +++---
> 
> >  drivers/dma/fsldma.h |1 +
> >  2 files changed, 172 insertions(+), 71 deletions(-)
> >
> > diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
> > index 4f2f212..87f52c0 100644
> > --- a/drivers/dma/fsldma.c
> > +++ b/drivers/dma/fsldma.c
> > @@ -400,6 +400,125 @@ out_splice:
> > list_splice_tail_init(&desc->tx_list, &chan->ld_pending);
> >  }
> >
> > +static void fsldma_cleanup_descriptor(struct fsldma_chan *chan);
> > +static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan);
> > +
> > +/**
> > + * fsldma_clean_completed_descriptor - free all descriptors which
> > + * has been completed and acked
> > + * @chan: Freescale DMA channel
> > + *
> > + * This function is used on all completed and acked descriptors.
> > + * All descriptors should only be freed in this function.
> > + */
> > +static int
> > +fsldma_clean_completed_descriptor(struct fsldma_chan *chan)
> > +{
> > +   struct fsl_desc_sw *desc, *_desc;
> > +
> > +   /* Run the callback for each d

RE: [linuxppc-release] [PATCH v4 7/7] fsl-dma: add memcpy self test interface

2012-07-30 Thread Liu Qiang-B32616
> -Original Message-
> From: linux-crypto-ow...@vger.kernel.org [mailto:linux-crypto-
> ow...@vger.kernel.org] On Behalf Of Ira W. Snyder
> Sent: Tuesday, July 31, 2012 2:33 AM
> To: Tabi Timur-B04825
> Cc: Liu Qiang-B32616; linux-cry...@vger.kernel.org; linuxppc-
> d...@lists.ozlabs.org; Vinod Koul; herb...@gondor.hengli.com.au; Dan
> Williams; Li Yang-R58472; da...@davemloft.net
> Subject: Re: [linuxppc-release] [PATCH v4 7/7] fsl-dma: add memcpy self
> test interface
> 
> On Mon, Jul 30, 2012 at 12:48:41PM -0500, Timur Tabi wrote:
> > qiang@freescale.com wrote:
> > >
> > > Add memory copy self test when probe device, fsl-dma will be disabled
> > > if self test failed.
> >
> > Is this a real problem that can occur?  The DMA driver used to have a
> > self-test, but I removed it a long time ago because it was pointless.
> I
> > don't see why we need to add another one back in.
> >
> > --
> > Timur Tabi
> > Linux kernel developer at Freescale
> >
> 
> I made a comment that a test suite for the async_tx API would be very
> helpful in diagnosing similar problems in this and other DMA drivers.
> Something standalone, similar to the drivers/dma/dmatest.c driver, using
> the async_tx API.
> 
> I think this was misinterpreted into me asking that the driver have a
> built-in self test.
I will drop it in next version. Thanks.

> 
> Ira
> --
> To unsubscribe from this list: send the line "unsubscribe linux-crypto"
> in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH v4 0/7] Raid: enable talitos xor offload for improving performance

2012-07-29 Thread Liu Qiang-B32616
Hi,

Vinod, Dan, ping?


> -Original Message-
> From: linux-crypto-ow...@vger.kernel.org [mailto:linux-crypto-
> ow...@vger.kernel.org] On Behalf Of qiang@freescale.com
> Sent: Friday, July 27, 2012 5:16 PM
> To: linux-cry...@vger.kernel.org; vinod.k...@intel.com;
> dan.j.willi...@intel.com; herb...@gondor.hengli.com.au; linuxppc-
> d...@lists.ozlabs.org
> Cc: Li Yang-R58472; Phillips Kim-R1AAHA
> Subject: [PATCH v4 0/7] Raid: enable talitos xor offload for improving
> performance
> 
> Hi,
> 
> The following 7 patches enabling fsl-dma and talitos offload raid
> operations for improving raid performance and balancing CPU load.
> 
> Write performance will be improved by 25-30% tested by iozone.
> Write performance is improved about 2% after using spin_lock_bh replace
> spin_lock_irqsave.
> CPU load will be reduced by 8%.
> 
> Changes in V4:
>   - fix an error in talitos when dest addr is same with src addr,
> dest
>   should be freed only one time if src is same with dest addr;
>   - correct coding style in fsl-dma according to Ira's comments;
>   - fix a race condition in fsl-dma fsl_tx_status(), remove the
> interface
>   which is used to free descriptors in queue ld_completed, this
> interface
>   has been included in fsldma_cleanup_descriptor(), in v3, there is
> one
>   place missed spin_lock protect;
>   - split the original patch 3/4 up to 2 patches 3/7 and 4/7
> according to
>   Li Yang's comments.
>   - fix a warning of unitialized cookie;
>   - add memory copy self test in fsl-dma;
>   - add more detail description about use spin_lock_bh() to instead
> of
>   spin_lock_irqsave() according to Timur's comments;
> 
> Changes in v3:
>   - change release process of fsl-dma descriptor for resolve the
>   potential race condition
>   - add test result when use spin_lock_bh replace spin_lock_irqsave
>   - modify the benchmark results according to the latest patch
> 
> Changes in v2:
>   - rebase onto cryptodev tree
>   - split the patch 3/4 up to 3 independent patches
>   - remove the patch 4/4, the fix is not for cryptodev tree
> 
> 
> Qiang Liu (4):
>   Talitos: Support for async_tx XOR offload
>   fsl-dma: remove attribute DMA_INTERRUPT of dmaengine
>   fsl-dma: change release process of dma descriptor for supporting
> async_tx
>   fsl-dma: use spin_lock_bh to instead of spin_lock_irqsave
> 
> Qiang Liu (7):
>   Talitos: Support for async_tx XOR offload
>   fsl-dma: remove attribute DMA_INTERRUPT of dmaengine
>   fsl-dma: change release process of dma descriptor for supporting
> async_tx
>   fsl-dma: move the function ahead of its invoke function
>   fsl-dma: use spin_lock_bh to instead of spin_lock_irqsave
>   fsl-dma: fix a warning of unitialized cookie
>   fsl-dma: add memcpy self test interface
> 
>  drivers/crypto/Kconfig   |9 +
>  drivers/crypto/talitos.c |  413 ++
>  drivers/crypto/talitos.h |   53 +
>  drivers/dma/fsldma.c |  550 +---
> --
>  drivers/dma/fsldma.h |1 +
>  5 files changed, 822 insertions(+), 204 deletions(-)
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-crypto"
> in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH v4 3/7] fsl-dma: change release process of dma descriptor for supporting async_tx

2012-07-29 Thread Liu Qiang-B32616
Hi Dan and Vinod,

Can you apply these patches of fsl-dma to -next if there is not any comments?

Thanks.

> -Original Message-
> From: Liu Qiang-B32616
> Sent: Friday, July 27, 2012 5:16 PM
> To: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
> Cc: Phillips Kim-R1AAHA; herb...@gondor.hengli.com.au;
> da...@davemloft.net; Liu Qiang-B32616; Dan Williams; Vinod Koul; Li Yang-
> R58472; Ira W. Snyder
> Subject: [PATCH v4 3/7] fsl-dma: change release process of dma descriptor
> for supporting async_tx
> 
> From: Qiang Liu 
> 
> Fix the potential risk when enable config NET_DMA and ASYNC_TX.
> Async_tx is lack of support in current release process of dma descriptor,
> all descriptors will be released whatever is acked or no-acked by
> async_tx,
> so there is a potential race condition when dma engine is uesd by others
> clients (e.g. when enable NET_DMA to offload TCP).
> 
> In our case, a race condition which is raised when use both of talitos
> and dmaengine to offload xor is because napi scheduler will sync all
> pending requests in dma channels, it affects the process of raid
> operations
> due to ack_tx is not checked in fsl dma. The no-acked descriptor is freed
> which is submitted just now, as a dependent tx, this freed descriptor
> trigger
> BUG_ON(async_tx_test_ack(depend_tx)) in async_tx_submit().
> 
> TASK = ee1a94a0[1390] 'md0_raid5' THREAD: ecf4 CPU: 0
> GPR00: 0001 ecf41ca0 ee44/921a94a0 003f 0001 c00593e4
>  0001
> GPR08:  a7a7a7a7 0001 045/92002 42028042 100a38d4
> ed576d98 
> GPR16: ed5a11b0  2b162000 0200 046/92000 2d555000
> ed3015e8 c15a7aa0
> GPR24:  c155fc40  ecb63220 ecf41d28 e47/92f640bb0
> ef640c30 ecf41ca0
> NIP [c02b048c] async_tx_submit+0x6c/0x2b4
> LR [c02b068c] async_tx_submit+0x26c/0x2b4
> Call Trace:
> [ecf41ca0] [c02b068c] async_tx_submit+0x26c/0x2b448/92 (unreliable)
> [ecf41cd0] [c02b0a4c] async_memcpy+0x240/0x25c
> [ecf41d20] [c0421064] async_copy_data+0xa0/0x17c
> [ecf41d70] [c0421cf4] __raid_run_ops+0x874/0xe10
> [ecf41df0] [c0426ee4] handle_stripe+0x820/0x25e8
> [ecf41e90] [c0429080] raid5d+0x3d4/0x5b4
> [ecf41f40] [c04329b8] md_thread+0x138/0x16c
> [ecf41f90] [c008277c] kthread+0x8c/0x90
> [ecf41ff0] [c0011630] kernel_thread+0x4c/0x68
> 
> Cc: Dan Williams 
> Cc: Vinod Koul 
> Cc: Li Yang 
> Cc: Ira W. Snyder 
> Signed-off-by: Qiang Liu 
> ---
>  drivers/dma/fsldma.c |  242 +++-
> --
>  drivers/dma/fsldma.h |1 +
>  2 files changed, 172 insertions(+), 71 deletions(-)
> 
> diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
> index 4f2f212..87f52c0 100644
> --- a/drivers/dma/fsldma.c
> +++ b/drivers/dma/fsldma.c
> @@ -400,6 +400,125 @@ out_splice:
>   list_splice_tail_init(&desc->tx_list, &chan->ld_pending);
>  }
> 
> +static void fsldma_cleanup_descriptor(struct fsldma_chan *chan);
> +static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan);
> +
> +/**
> + * fsldma_clean_completed_descriptor - free all descriptors which
> + * has been completed and acked
> + * @chan: Freescale DMA channel
> + *
> + * This function is used on all completed and acked descriptors.
> + * All descriptors should only be freed in this function.
> + */
> +static int
> +fsldma_clean_completed_descriptor(struct fsldma_chan *chan)
> +{
> + struct fsl_desc_sw *desc, *_desc;
> +
> + /* Run the callback for each descriptor, in order */
> + list_for_each_entry_safe(desc, _desc, &chan->ld_completed, node) {
> +
> + if (async_tx_test_ack(&desc->async_tx)) {
> + /* Remove from the list of transactions */
> + list_del(&desc->node);
> +#ifdef FSL_DMA_LD_DEBUG
> + chan_dbg(chan, "LD %p free\n", desc);
> +#endif
> + dma_pool_free(chan->desc_pool, desc,
> + desc->async_tx.phys);
> + }
> + }
> +
> + return 0;
> +}
> +
> +/**
> + * fsldma_run_tx_complete_actions - cleanup and free a single link
> descriptor
> + * @chan: Freescale DMA channel
> + * @desc: descriptor to cleanup and free
> + * @cookie: Freescale DMA transaction identifier
> + *
> + * This function is used on a descriptor which has been executed by the
> DMA
> + * controller. It will run any callbacks, submit any dependencies.
> + */
> +static dma_cookie_t fsldma_run_tx_complete_actions(struct fsl_desc_sw
> *desc,
> + struct fsldma_chan *chan, dma_cookie_t cookie)
> +{
> + struct dma_asyn

RE: [PATCH v3 3/4] fsl-dma: change release process of dma descriptor for supporting async_tx

2012-07-17 Thread Liu Qiang-B32616
Hi Ira,

My comments inline.

Hi Ira and Dan,

Here is a introduction about my scenario of this case.

My case is use async_tx API to offload raid5,
Use fsl-talitos to compute parity calculation, fsl-dma to process memcpy.

What bug occurred to me?
Exception is thrown by BUG_ON(async_tx_ack_test(depend_tx)) in 
async_tx_submit().

TASK = ee1a94a0[1390] 'md0_raid5' THREAD: ecf4 CPU: 0
GPR00: 0001 ecf41ca0 ee44/921a94a0 003f 0001 c00593e4  
0001 
GPR08:  a7a7a7a7 0001 045/92002 42028042 100a38d4 ed576d98 
 
GPR16: ed5a11b0  2b162000 0200 046/92000 2d555000 ed3015e8 
c15a7aa0 
GPR24:  c155fc40  ecb63220 ecf41d28 e47/92f640bb0 ef640c30 
ecf41ca0 
NIP [c02b048c] async_tx_submit+0x6c/0x2b4
LR [c02b068c] async_tx_submit+0x26c/0x2b4
Call Trace:
[ecf41ca0] [c02b068c] async_tx_submit+0x26c/0x2b448/92 (unreliable)
[ecf41cd0] [c02b0a4c] async_memcpy+0x240/0x25c
[ecf41d20] [c0421064] async_copy_data+0xa0/0x17c
[ecf41d70] [c0421cf4] __raid_run_ops+0x874/0xe10
[ecf41df0] [c0426ee4] handle_stripe+0x820/0x25e8
[ecf41e90] [c0429080] raid5d+0x3d4/0x5b4
[ecf41f40] [c04329b8] md_thread+0x138/0x16c
[ecf41f90] [c008277c] kthread+0x8c/0x90
[ecf41ff0] [c0011630] kernel_thread+0x4c/0x68

What is the root cause of this bug?
The async_tx descriptor(depend_tx) is free'ed before async_tx_submit ack it.
That means async_tx descriptor should be acked before it's free'ed. In other
word, only ack'ed async_tx descriptor should be free'ed by dma engine.

How to fix it?
Add a queue ld_completed, move all completed descriptors in this queue, free
these descriptors at a proper time (these descriptors also should be acked).

Dan, do you think my understand about "ack" is right? Thanks. 

- Qiang

> -Original Message-
> From: Ira W. Snyder [mailto:i...@ovro.caltech.edu]
> Sent: Wednesday, July 18, 2012 12:17 AM
> To: Liu Qiang-B32616
> Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Phillips
> Kim-R1AAHA; herb...@gondor.hengli.com.au; da...@davemloft.net; Dan
> Williams; Vinod Koul; Li Yang-R58472
> Subject: Re: [PATCH v3 3/4] fsl-dma: change release process of dma
> descriptor for supporting async_tx
> 
> On Tue, Jul 17, 2012 at 07:06:33AM +, Liu Qiang-B32616 wrote:
> > > -Original Message-
> > > From: Ira W. Snyder [mailto:i...@ovro.caltech.edu]
> > > Sent: Tuesday, July 17, 2012 4:01 AM
> > > To: Liu Qiang-B32616
> > > Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org;
> Phillips
> > > Kim-R1AAHA; herb...@gondor.hengli.com.au; da...@davemloft.net; Dan
> > > Williams; Vinod Koul; Li Yang-R58472
> > > Subject: Re: [PATCH v3 3/4] fsl-dma: change release process of dma
> > > descriptor for supporting async_tx
> > >
> > > On Mon, Jul 16, 2012 at 12:08:29PM +0800, Qiang Liu wrote:
> > > > Fix the potential risk when enable config NET_DMA and ASYNC_TX.
> > > > Async_tx is lack of support in current release process of dma
> > > > descriptor, all descriptors will be released whatever is acked or
> > > > no-acked by async_tx, so there is a potential race condition when
> dma
> > > > engine is uesd by others clients (e.g. when enable NET_DMA to
> offload
> > > TCP).
> > > >
> > > > In our case, a race condition which is raised when use both of
> talitos
> > > > and dmaengine to offload xor is because napi scheduler will sync
> all
> > > > pending requests in dma channels, it affects the process of raid
> > > > operations due to ack_tx is not checked in fsl dma. The no-acked
> > > > descriptor is freed which is submitted just now, as a dependent tx,
> > > > this freed descriptor trigger
> > > > BUG_ON(async_tx_test_ack(depend_tx)) in async_tx_submit().
> > > >
> > > > Cc: Dan Williams 
> > > > Cc: Vinod Koul 
> > > > Cc: Li Yang 
> > > > Cc: Ira W. Snyder 
> > > > Signed-off-by: Qiang Liu 
> > > > ---
> > > >  drivers/dma/fsldma.c |  378 +-
> 
> > > ---
> > > >  drivers/dma/fsldma.h |1 +
> > > >  2 files changed, 225 insertions(+), 154 deletions(-)
> > > >
> > > > diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index
> > > > 4f2f212..4ee1b8f 100644
> > > > --- a/drivers/dma/fsldma.c
> > > > +++ b/drivers/dma/fsldma.c
> > > > @@ -400,6 +400,217 @@ out_splice:
> > > > list_splice_tail_init(&desc->tx_list, &chan->ld_pending);  }
> > > >
> > > >

RE: [PATCH v3 3/4] fsl-dma: change release process of dma descriptor for supporting async_tx

2012-07-17 Thread Liu Qiang-B32616
> -Original Message-
> From: Ira W. Snyder [mailto:i...@ovro.caltech.edu]
> Sent: Tuesday, July 17, 2012 4:01 AM
> To: Liu Qiang-B32616
> Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Phillips
> Kim-R1AAHA; herb...@gondor.hengli.com.au; da...@davemloft.net; Dan
> Williams; Vinod Koul; Li Yang-R58472
> Subject: Re: [PATCH v3 3/4] fsl-dma: change release process of dma
> descriptor for supporting async_tx
> 
> On Mon, Jul 16, 2012 at 12:08:29PM +0800, Qiang Liu wrote:
> > Fix the potential risk when enable config NET_DMA and ASYNC_TX.
> > Async_tx is lack of support in current release process of dma
> > descriptor, all descriptors will be released whatever is acked or
> > no-acked by async_tx, so there is a potential race condition when dma
> > engine is uesd by others clients (e.g. when enable NET_DMA to offload
> TCP).
> >
> > In our case, a race condition which is raised when use both of talitos
> > and dmaengine to offload xor is because napi scheduler will sync all
> > pending requests in dma channels, it affects the process of raid
> > operations due to ack_tx is not checked in fsl dma. The no-acked
> > descriptor is freed which is submitted just now, as a dependent tx,
> > this freed descriptor trigger
> > BUG_ON(async_tx_test_ack(depend_tx)) in async_tx_submit().
> >
> > Cc: Dan Williams 
> > Cc: Vinod Koul 
> > Cc: Li Yang 
> > Cc: Ira W. Snyder 
> > Signed-off-by: Qiang Liu 
> > ---
> >  drivers/dma/fsldma.c |  378 +-
> ---
> >  drivers/dma/fsldma.h |1 +
> >  2 files changed, 225 insertions(+), 154 deletions(-)
> >
> > diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index
> > 4f2f212..4ee1b8f 100644
> > --- a/drivers/dma/fsldma.c
> > +++ b/drivers/dma/fsldma.c
> > @@ -400,6 +400,217 @@ out_splice:
> > list_splice_tail_init(&desc->tx_list, &chan->ld_pending);  }
> >
> > +/**
> > + * fsl_chan_xfer_ld_queue - transfer any pending transactions
> > + * @chan : Freescale DMA channel
> > + *
> > + * HARDWARE STATE: idle
> > + * LOCKING: must hold chan->desc_lock  */ static void
> > +fsl_chan_xfer_ld_queue(struct fsldma_chan *chan) {
> > +   struct fsl_desc_sw *desc;
> > +
> > +   /*
> > +* If the list of pending descriptors is empty, then we
> > +* don't need to do any work at all
> > +*/
> > +   if (list_empty(&chan->ld_pending)) {
> > +   chan_dbg(chan, "no pending LDs\n");
> > +   return;
> > +   }
> > +
> > +   /*
> > +* The DMA controller is not idle, which means that the interrupt
> > +* handler will start any queued transactions when it runs after
> > +* this transaction finishes
> > +*/
> > +   if (!chan->idle) {
> > +   chan_dbg(chan, "DMA controller still busy\n");
> > +   return;
> > +   }
> > +
> > +   /*
> > +* If there are some link descriptors which have not been
> > +* transferred, we need to start the controller
> > +*/
> > +
> > +   /*
> > +* Move all elements from the queue of pending transactions
> > +* onto the list of running transactions
> > +*/
> > +   chan_dbg(chan, "idle, starting controller\n");
> > +   desc = list_first_entry(&chan->ld_pending, struct fsl_desc_sw,
> node);
> > +   list_splice_tail_init(&chan->ld_pending, &chan->ld_running);
> > +
> > +   /*
> > +* The 85xx DMA controller doesn't clear the channel start bit
> > +* automatically at the end of a transfer. Therefore we must clear
> > +* it in software before starting the transfer.
> > +*/
> > +   if ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX) {
> > +   u32 mode;
> > +
> > +   mode = DMA_IN(chan, &chan->regs->mr, 32);
> > +   mode &= ~FSL_DMA_MR_CS;
> > +   DMA_OUT(chan, &chan->regs->mr, mode, 32);
> > +   }
> > +
> > +   /*
> > +* Program the descriptor's address into the DMA controller,
> > +* then start the DMA transaction
> > +*/
> > +   set_cdar(chan, desc->async_tx.phys);
> > +   get_cdar(chan);
> > +
> > +   dma_start(chan);
> > +   chan->idle = false;
> > +}
> > +
> 
> Please add a note about the locking requirements here, and for the other
> new functions you've added.
> 
> You call this f

RE: [linuxppc-release] [PATCH v3 4/4] fsl-dma: use spin_lock_bh to instead of spin_lock_irqsave

2012-07-16 Thread Liu Qiang-B32616
> -Original Message-
> From: Tabi Timur-B04825
> Sent: Monday, July 16, 2012 10:25 PM
> To: Liu Qiang-B32616
> Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Vinod
> Koul; herb...@gondor.hengli.com.au; Dan Williams; Li Yang-R58472;
> da...@davemloft.net
> Subject: Re: [linuxppc-release] [PATCH v3 4/4] fsl-dma: use spin_lock_bh
> to instead of spin_lock_irqsave
> 
> Qiang Liu wrote:
> > Use spin_lock_bh to instead of spin_lock_irqsave for improving
> performance.
> 
> You forgot to include the evidence that performance has improved, as well
> as an explanation why it's okay to use spin_lock_bh, and why it's faster.
>   I told you to respin the patch with that information in the patch
> description.
I attached the test result in v3 0/4, performance is improved by 2%.
For my understanding, there is not any place to access descriptor lists
in fsl-dma interrupt service handler except its tasklet, spin_lock_bh()
is born for this. Interrupts will be turned off and context will be save in
irqsave, there is needless to use irqsave in our case. You can refer to the
implement of mv_xor.c or ioap-adma.c.
If you think my explanation is ok, I can add it in the patch.
Thanks.

> 
> --
> Timur Tabi
> Linux kernel developer at Freescale


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH v3 3/4] fsl-dma: change release process of dma descriptor for supporting async_tx

2012-07-16 Thread Liu Qiang-B32616
> -Original Message-
> From: linux-crypto-ow...@vger.kernel.org [mailto:linux-crypto-
> ow...@vger.kernel.org] On Behalf Of Li Yang
> Sent: Tuesday, July 17, 2012 11:37 AM
> To: Liu Qiang-B32616
> Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Ira W.
> Snyder; Vinod Koul; herb...@gondor.hengli.com.au; Dan Williams;
> da...@davemloft.net
> Subject: Re: [PATCH v3 3/4] fsl-dma: change release process of dma
> descriptor for supporting async_tx
> 
> On Mon, Jul 16, 2012 at 12:08 PM, Qiang Liu 
> wrote:
> > Fix the potential risk when enable config NET_DMA and ASYNC_TX.
> > Async_tx is lack of support in current release process of dma
> > descriptor, all descriptors will be released whatever is acked or
> > no-acked by async_tx, so there is a potential race condition when dma
> > engine is uesd by others clients (e.g. when enable NET_DMA to offload
> TCP).
> >
> > In our case, a race condition which is raised when use both of talitos
> > and dmaengine to offload xor is because napi scheduler will sync all
> > pending requests in dma channels, it affects the process of raid
> > operations due to ack_tx is not checked in fsl dma. The no-acked
> > descriptor is freed which is submitted just now, as a dependent tx,
> > this freed descriptor trigger
> > BUG_ON(async_tx_test_ack(depend_tx)) in async_tx_submit().
> >
> > Cc: Dan Williams 
> > Cc: Vinod Koul 
> > Cc: Li Yang 
> > Cc: Ira W. Snyder 
> > Signed-off-by: Qiang Liu 
> 
> Also separate the function ordering change and real code change into
> different patches when you work on the next patch set.
Accept, I will spilt it up in v4. Thanks.

> 
> - Leo
> --
> To unsubscribe from this list: send the line "unsubscribe linux-crypto"
> in the body of a message to majord...@vger.kernel.org More majordomo info
> at  http://vger.kernel.org/majordomo-info.html

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH v3 0/4] Raid: enable talitos xor offload for improving performance

2012-07-16 Thread Liu Qiang-B32616
> -Original Message-
> From: Phillips Kim-R1AAHA
> Sent: Tuesday, July 17, 2012 9:04 AM
> To: Liu Qiang-B32616
> Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Li Yang-
> R58472; dan.j.willi...@intel.com; herb...@gondor.hengli.com.au
> Subject: Re: [PATCH v3 0/4] Raid: enable talitos xor offload for
> improving performance
> 
> On Mon, 16 Jul 2012 12:07:16 +0800
> Qiang Liu  wrote:
> 
> >  drivers/crypto/Kconfig   |9 +
> >  drivers/crypto/talitos.c |  410
> +++
> >  drivers/crypto/talitos.h |   53 ++
> >  drivers/dma/fsldma.c |  436 +-
> 
> >  drivers/dma/fsldma.h |1 +
> >  5 files changed, 708 insertions(+), 201 deletions(-)
> 
> Given the pending talitos patches, this patchseries doesn't apply
> cleanly:  can you rebase onto [1], which is based on Herbert's
> cryptodev tree and contain's Horia's four patches?  They didn't get
> any negative comments, so I assume eventually they will be applied,
> and doing so will make Herbert's life easier.
> 
> I applied the series on Herbert's cryptodev, and while fsldma
> already had this build warning:
> 
> drivers/dma/fsldma.c: In function 'fsl_dma_tx_submit':
> drivers/dma/fsldma.c:636:2: warning: 'cookie' may be used uninitialized
> in this function [-Wuninitialized]
Kim, I will fix it in an another separate patch.

> 
> this patchseries introduces a new one:
> 
> drivers/dma/fsldma.c: In function 'dma_do_tasklet':
> drivers/dma/fsldma.c:1134:16: warning: unused variable 'flags' [-Wunused-
> variable]
Sorry, my bad. I will correct it in v4 5/6 (according to Li Yang's comments,
v3 3/4 should be split up). Thanks.

> 
> I'll wait for a re-post, after these and Ira's comments are
> addressed, before trying to test again.
I agree, I think his comments is very important.

> 
> Thanks,
> 
> Kim
> 
> [1] git://git.freescale.com/crypto/cryptodev.git

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [linuxppc-release] [PATCH v2 4/4] fsl-dma: use spin_lock_bh to instead of spin_lock_irqsave

2012-07-12 Thread Liu Qiang-B32616
> -Original Message-
> From: Tabi Timur-B04825
> Sent: Wednesday, July 11, 2012 11:01 PM
> To: Liu Qiang-B32616
> Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Vinod
> Koul; herb...@gondor.hengli.com.au; Dan Williams; Li Yang-R58472;
> da...@davemloft.net
> Subject: Re: [linuxppc-release] [PATCH v2 4/4] fsl-dma: use spin_lock_bh
> to instead of spin_lock_irqsave
> 
> Qiang Liu wrote:
> > Use spin_lock_bh to instead of spin_lock_irqsave for improving
> performance.
> 
> Please provide some evidence that performance has improved, as well as an
> explanation why it's okay to use spin_lock_bh, and why it's faster.
I compared my test result before and after this patch, write performance can
improved by 15%. I will send the latest patches sooner because of Ira's concern.
I will give a complete description about the improvement of spin_lock_bh().

About your question, spin_lock_bh is used in the case of bottom/half as its
name, there is no need to protect a running/pending list with spin_lock_irqsave.

Thanks.

> --
> Timur Tabi
> Linux kernel developer at Freescale

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH v2 3/4] fsl-dma: change the release process of dma descriptor

2012-07-12 Thread Liu Qiang-B32616
> -Original Message-
> From: Linuxppc-dev [mailto:linuxppc-dev-
> bounces+qiang.liu=freescale@lists.ozlabs.org] On Behalf Of Liu Qiang-
> B32616
> Sent: Thursday, July 12, 2012 3:12 PM
> To: Ira W. Snyder
> Cc: herb...@gondor.apana.org.au; Vinod Koul; linux-cry...@vger.kernel.org;
> Dan Williams; linuxppc-dev@lists.ozlabs.org; da...@davemloft.net
> Subject: RE: [PATCH v2 3/4] fsl-dma: change the release process of dma
> descriptor
> 
> > -Original Message-
> > From: Ira W. Snyder [mailto:i...@ovro.caltech.edu]
> > Sent: Thursday, July 12, 2012 12:31 AM
> > To: Liu Qiang-B32616
> > Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Vinod
> > Koul; herb...@gondor.hengli.com.au; Dan Williams; da...@davemloft.net
> > Subject: Re: [PATCH v2 3/4] fsl-dma: change the release process of dma
> > descriptor
> >
> > On Wed, Jul 11, 2012 at 05:01:25PM +0800, Qiang Liu wrote:
> > > Modify the release process of dma descriptor for avoiding exception
> > when
> > > enable config NET_DMA, release dma descriptor from 1st to last
> > > second,
> > the
> > > last descriptor which is reserved in current descriptor register may
> > not be
> > > completed, race condition will be raised if free current descriptor.
> > >
> > > A race condition which is raised when use both of talitos and
> > > dmaengine
> > to
> > > offload xor is because napi scheduler (NET_DMA is enabled) will sync
> > all
> > > pending requests in dma channels, it affects the process of raid
> > operations.
> > > The descriptor is freed which is submitted just now, but async_tx
> > > must
> > check
> > > whether this depend tx descriptor is acked, there are poison
> > > contents
> > in the
> > > invalid address, then BUG_ON() is thrown, so this descriptor will be
> > freed
> > > in the next time.
> > >
> >
> > This patch seems to be covering up a bug in the driver, rather than
> > actually fixing it.
> Yes, it's fine for fsl-dma itself, but it cannot work under complex
> condition.
> For example, we enable NET_DMA and SEC xor offload, if NAPI task is waken
> up to synchronize pending requests when raid5 dma copy was submitted, the
> process order of raid5 tx descriptor is not as our expected.
> Unfortunately, sometime we have to check this dependent tx descriptor
> which has was already released.
> 
> >
> > When it was written, it was expected that dma_do_tasklet() would run
> > only when the controller was idle.
> >
> > > Cc: Dan Williams 
> > > Cc: Vinod Koul 
> > > Cc: Li Yang 
> > > Signed-off-by: Qiang Liu 
> > > ---
> > >  drivers/dma/fsldma.c |   15 ---
> > >  1 files changed, 12 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index
> > > 4f2f212..0ba3e40 100644
> > > --- a/drivers/dma/fsldma.c
> > > +++ b/drivers/dma/fsldma.c
> > > @@ -1035,14 +1035,22 @@ static irqreturn_t fsldma_chan_irq(int irq,
> > void *data)
> > >  static void dma_do_tasklet(unsigned long data)  {
> > >   struct fsldma_chan *chan = (struct fsldma_chan *)data;
> > > - struct fsl_desc_sw *desc, *_desc;
> > > + struct fsl_desc_sw *desc, *_desc, *prev = NULL;
> > >   LIST_HEAD(ld_cleanup);
> > >   unsigned long flags;
> > > + dma_addr_t curr_phys = get_cdar(chan);
> > >
> > >   chan_dbg(chan, "tasklet entry\n");
> > >
> > >   spin_lock_irqsave(&chan->desc_lock, flags);
> > >
> > > + /* find the descriptor which is already completed */
> > > + list_for_each_entry_safe(desc, _desc, &chan->ld_running, node) {
> > > + if (prev && desc->async_tx.phys == curr_phys)
> > > + break;
> > > + prev = desc;
> > > + }
> > > +
> >
> > If the DMA controller was still busy processing transactions, you
> > should have gotten the printout "irq: controller not idle!" from
> > fsldma_chan_irq() just before it scheduled the dma_do_tasklet() to run.
> > If you did not get this printout, how was dma_do_tasklet() entered
> > with the controller still busy? I don't understand how it can happen.
> Hi ira, this issue should be not related to dma status. The last
> descriptor is left as usb null descriptor, actually, this descriptor is
> used as usb null descriptor, at any case, I believe it has been already
> completed, but I will rele

RE: [PATCH v2 3/4] fsl-dma: change the release process of dma descriptor

2012-07-12 Thread Liu Qiang-B32616
> -Original Message-
> From: Ira W. Snyder [mailto:i...@ovro.caltech.edu]
> Sent: Thursday, July 12, 2012 12:31 AM
> To: Liu Qiang-B32616
> Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Vinod
> Koul; herb...@gondor.hengli.com.au; Dan Williams; da...@davemloft.net
> Subject: Re: [PATCH v2 3/4] fsl-dma: change the release process of dma
> descriptor
> 
> On Wed, Jul 11, 2012 at 05:01:25PM +0800, Qiang Liu wrote:
> > Modify the release process of dma descriptor for avoiding exception
> when
> > enable config NET_DMA, release dma descriptor from 1st to last second,
> the
> > last descriptor which is reserved in current descriptor register may
> not be
> > completed, race condition will be raised if free current descriptor.
> >
> > A race condition which is raised when use both of talitos and dmaengine
> to
> > offload xor is because napi scheduler (NET_DMA is enabled) will sync
> all
> > pending requests in dma channels, it affects the process of raid
> operations.
> > The descriptor is freed which is submitted just now, but async_tx must
> check
> > whether this depend tx descriptor is acked, there are poison contents
> in the
> > invalid address, then BUG_ON() is thrown, so this descriptor will be
> freed
> > in the next time.
> >
> 
> This patch seems to be covering up a bug in the driver, rather than
> actually fixing it.
Yes, it's fine for fsl-dma itself, but it cannot work under complex condition.
For example, we enable NET_DMA and SEC xor offload, if NAPI task is waken up to
synchronize pending requests when raid5 dma copy was submitted, the process 
order
of raid5 tx descriptor is not as our expected. Unfortunately, sometime we have
to check this dependent tx descriptor which has was already released.

> 
> When it was written, it was expected that dma_do_tasklet() would run
> only when the controller was idle.
> 
> > Cc: Dan Williams 
> > Cc: Vinod Koul 
> > Cc: Li Yang 
> > Signed-off-by: Qiang Liu 
> > ---
> >  drivers/dma/fsldma.c |   15 ---
> >  1 files changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
> > index 4f2f212..0ba3e40 100644
> > --- a/drivers/dma/fsldma.c
> > +++ b/drivers/dma/fsldma.c
> > @@ -1035,14 +1035,22 @@ static irqreturn_t fsldma_chan_irq(int irq,
> void *data)
> >  static void dma_do_tasklet(unsigned long data)
> >  {
> > struct fsldma_chan *chan = (struct fsldma_chan *)data;
> > -   struct fsl_desc_sw *desc, *_desc;
> > +   struct fsl_desc_sw *desc, *_desc, *prev = NULL;
> > LIST_HEAD(ld_cleanup);
> > unsigned long flags;
> > +   dma_addr_t curr_phys = get_cdar(chan);
> >
> > chan_dbg(chan, "tasklet entry\n");
> >
> > spin_lock_irqsave(&chan->desc_lock, flags);
> >
> > +   /* find the descriptor which is already completed */
> > +   list_for_each_entry_safe(desc, _desc, &chan->ld_running, node) {
> > +   if (prev && desc->async_tx.phys == curr_phys)
> > +   break;
> > +   prev = desc;
> > +   }
> > +
> 
> If the DMA controller was still busy processing transactions, you should
> have gotten the printout "irq: controller not idle!" from
> fsldma_chan_irq() just before it scheduled the dma_do_tasklet() to run.
> If you did not get this printout, how was dma_do_tasklet() entered with
> the controller still busy? I don't understand how it can happen.
Hi ira, this issue should be not related to dma status. The last descriptor
is left as usb null descriptor, actually, this descriptor is used as usb null
descriptor, at any case, I believe it has been already completed, but I
will released it in next chain, it doesn't affect the upper api to use the
data, and make sure async_tx api won't raise an exception 
(BUG_ON(async_tx_test_ack(depend_tx)), this depend_tx is the desc->async_tx).

> 
> If you test without your spin_lock_bh() and spin_unlock_bh() conversion
> patch, do you still hit the error?
The error still happened. spin_lock_bh() and spin_unlock_bh() are modified
after this patch.

> 
> What happens if a user submits exactly one DMA transaction, and then
> leaves the system idle? The callback for the last descriptor in the
> chain will never get run, right? That's a bug.
It won't be happened if use fsl-dma, because the right order is 
xor-copy-xor->callback, The callback which you concerned is implemented 
in talitos driver, callback should be null in fsl-dma.

> 
> > /* update the cookie if we have some descriptors to cleanup */
> > if

RE: [PATCH v2 2/4] fsl-dma: remove attribute DMA_INTERRUPT of dmaengine

2012-07-11 Thread Liu Qiang-B32616
> -Original Message-
> From: Ira W. Snyder [mailto:i...@ovro.caltech.edu]
> Sent: Thursday, July 12, 2012 12:17 AM
> To: Liu Qiang-B32616
> Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Vinod
> Koul; herb...@gondor.hengli.com.au; Dan Williams; da...@davemloft.net
> Subject: Re: [PATCH v2 2/4] fsl-dma: remove attribute DMA_INTERRUPT of
> dmaengine
> 
> On Wed, Jul 11, 2012 at 05:00:53PM +0800, Qiang Liu wrote:
> > Delete attribute DMA_INTERRUPT because fsl-dma doesn't support this
> function,
> > exception will be thrown if talitos is used to offload xor at the same
> time.
> >
> 
> Both drivers/misc/carma/carma-fpga.c and
> drivers/misc/carma/carma-fpga-program.c expect the DMA_INTERRUPT
> property, though they do not use it. The mask is set for historical
> reasons. It is safe to delete the line "dma_cap_set(DMA_INTERRUPT,
> mask);"
> from both drivers.
> 
> I don't know which other drivers may expect this feature to be present.
> These are only the ones which I maintain.
The only place is in async_tx api which the feature is used. But it does not
work fine as expected of DMA_INTERRUPT. In fsl-dma, it will raise a program
error if source or destination address is zero.

> 
> Other than that, you can add my:
> Acked-by: Ira W. Snyder 
Thanks.

> 
> > Cc: Dan Williams 
> > Cc: Vinod Koul 
> > Cc: Li Yang 
> > Signed-off-by: Qiang Liu 
> > ---
> >  drivers/dma/fsldma.c |   31 ---
> >  1 files changed, 0 insertions(+), 31 deletions(-)
> >
> > diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
> > index 8f84761..4f2f212 100644
> > --- a/drivers/dma/fsldma.c
> > +++ b/drivers/dma/fsldma.c
> > @@ -543,35 +543,6 @@ static void fsl_dma_free_chan_resources(struct
> dma_chan *dchan)
> >  }
> >
> >  static struct dma_async_tx_descriptor *
> > -fsl_dma_prep_interrupt(struct dma_chan *dchan, unsigned long flags)
> > -{
> > -   struct fsldma_chan *chan;
> > -   struct fsl_desc_sw *new;
> > -
> > -   if (!dchan)
> > -   return NULL;
> > -
> > -   chan = to_fsl_chan(dchan);
> > -
> > -   new = fsl_dma_alloc_descriptor(chan);
> > -   if (!new) {
> > -   chan_err(chan, "%s\n", msg_ld_oom);
> > -   return NULL;
> > -   }
> > -
> > -   new->async_tx.cookie = -EBUSY;
> > -   new->async_tx.flags = flags;
> > -
> > -   /* Insert the link descriptor to the LD ring */
> > -   list_add_tail(&new->node, &new->tx_list);
> > -
> > -   /* Set End-of-link to the last link descriptor of new list */
> > -   set_ld_eol(chan, new);
> > -
> > -   return &new->async_tx;
> > -}
> > -
> > -static struct dma_async_tx_descriptor *
> >  fsl_dma_prep_memcpy(struct dma_chan *dchan,
> > dma_addr_t dma_dst, dma_addr_t dma_src,
> > size_t len, unsigned long flags)
> > @@ -1352,12 +1323,10 @@ static int __devinit fsldma_of_probe(struct
> platform_device *op)
> > fdev->irq = irq_of_parse_and_map(op->dev.of_node, 0);
> >
> > dma_cap_set(DMA_MEMCPY, fdev->common.cap_mask);
> > -   dma_cap_set(DMA_INTERRUPT, fdev->common.cap_mask);
> > dma_cap_set(DMA_SG, fdev->common.cap_mask);
> > dma_cap_set(DMA_SLAVE, fdev->common.cap_mask);
> > fdev->common.device_alloc_chan_resources =
> fsl_dma_alloc_chan_resources;
> > fdev->common.device_free_chan_resources =
> fsl_dma_free_chan_resources;
> > -   fdev->common.device_prep_dma_interrupt = fsl_dma_prep_interrupt;
> > fdev->common.device_prep_dma_memcpy = fsl_dma_prep_memcpy;
> > fdev->common.device_prep_dma_sg = fsl_dma_prep_sg;
> > fdev->common.device_tx_status = fsl_tx_status;
> > --
> > 1.7.5.1
> >
> >
> > ___
> > Linuxppc-dev mailing list
> > Linuxppc-dev@lists.ozlabs.org
> > https://lists.ozlabs.org/listinfo/linuxppc-dev


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH 4/4] Talitos: fix the issue of dma memory leak

2012-07-11 Thread Liu Qiang-B32616
> -Original Message-
> From: Geanta Neag Horia Ioan-B05471
> Sent: Wednesday, July 11, 2012 3:09 PM
> To: Liu Qiang-B32616
> Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Li Yang-
> R58472; Phillips Kim-R1AAHA; Herbert Xu; David S. Miller
> Subject: RE: [PATCH 4/4] Talitos: fix the issue of dma memory leak
> 
> On Tue, 10 Jul 2012 09:00:14 +0300, Qiang Liu 
> wrote:
> > An error will be happened when test with mass data:
> > "DMA-API: device driver tries to sync DMA memory it has not
> > allocated";
> > "DMA-API: debugging out of memory - disabling"
> > dma mapping memory of request->desc is not released by right device,
> > it should be private->dev but not dev;
> >
> > Cc: Herbert Xu 
> > Cc: David S. Miller 
> > Signed-off-by: Qiang Liu 
> > ---
> >  drivers/crypto/talitos.c |2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-) diff --git
> > a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c index
> > 81f8497..a7da48c 100644
> > --- a/drivers/crypto/talitos.c
> > +++ b/drivers/crypto/talitos.c
> > @@ -264,7 +264,7 @@ static void flush_channel(struct device *dev, int
> > ch, int error, int reset_ch)
> > else
> > status = error;
> > -   dma_unmap_single(dev, request->dma_desc,
> > +dma_unmap_single(priv->dev, request->dma_desc,
> >  sizeof(struct talitos_desc),
> >  DMA_BIDIRECTIONAL);
> 
> Are you sure this fix applies to the upstream version of talitos?
> (i.e. have you encountered the error while running on cryptodev.git ?)
> 
I found it on our own git, I'm going to test it. I will send the result later.
Thanks.

> Looks to me this is a fix for the not-upstreamed-yet NAPI patch (which
> needs to be reworked according to Dave's feedback).
> 
> When you respin the patch series, consider removing this one.
Thanks for your mention, I will remove it if I cannot reproduce it.

> Cheers,
> Horia
> 


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH 1/4] Talitos: move the data structure into header file

2012-07-10 Thread Liu Qiang-B32616
> -Original Message-
> From: Phillips Kim-R1AAHA
> Sent: Wednesday, July 11, 2012 8:11 AM
> To: Liu Qiang-B32616
> Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Li Yang-
> R58472; Herbert Xu; David S. Miller; Geanta Neag Horia Ioan-B05471
> Subject: Re: [PATCH 1/4] Talitos: move the data structure into header
> file
> 
> On Tue, 10 Jul 2012 13:56:46 +0800
> Qiang Liu  wrote:
> 
> > Move the declaration of talitos data structure into talitos.h.
> >
> > Cc: Herbert Xu 
> > Cc: David S. Miller 
> > Signed-off-by: Qiang Liu 
> > ---
> 
> this patch has already been submitted [1].
> 
> Subsequent patches in this series also don't apply cleanly:  can
> you rebase onto [2], which is based on Herbert's cryptodev tree and
> contain's Horia's four patches, and re-send?
Kim, Thanks for your note, I will rebase the cryptodev tree and resend
the patch again.

> 
> Also note that upstream talitos does not yet contain NAPI support
> [3].
Thanks.
> 
> Thanks,
> 
> Kim
> 
> [1] http://www.mail-archive.com/linux-
> cry...@vger.kernel.org/msg07299.html
> [2] git://git.freescale.com/crypto/cryptodev.git
> [3] http://www.mail-archive.com/linux-
> cry...@vger.kernel.org/msg07289.html

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [linuxppc-release] [PATCH 4/4] Talitos: fix the issue of dma memory leak

2012-07-10 Thread Liu Qiang-B32616
> -Original Message-
> From: Tabi Timur-B04825
> Sent: Wednesday, July 11, 2012 5:26 AM
> To: Liu Qiang-B32616
> Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Herbert
> Xu; Li Yang-R58472; David S. Miller
> Subject: Re: [linuxppc-release] [PATCH 4/4] Talitos: fix the issue of dma
> memory leak
> 
> Qiang Liu wrote:
> > An error will be happened when test with mass data:
> 
> Please don't use the phrase "fix the issue" in patch summaries.  It's
> redundant.
> 
> This patch should be titled,
> 
> "drivers/crypto: fix memory leak in Talitos driver"
> 
> > diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c index
> > 81f8497..a7da48c 100644
> > --- a/drivers/crypto/talitos.c
> > +++ b/drivers/crypto/talitos.c
> > @@ -264,7 +264,7 @@ static void flush_channel(struct device *dev, int
> ch, int error, int reset_ch)
> > else
> > status = error;
> >
> > -   dma_unmap_single(dev, request->dma_desc,
> > +dma_unmap_single(priv->dev, request->dma_desc,
> 
> You have an indentation problem here.
My fault, I will correct it and resend again. Thanks.

> 
> --
> Timur Tabi
> Linux kernel developer at Freescale

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH 3/4] fsl-dma: support attribute of DMA_MEMORY when async_tx enabled

2012-07-10 Thread Liu Qiang-B32616
> -Original Message-
> From: Dan Williams [mailto:dan.j.willi...@intel.com]
> Sent: Wednesday, July 11, 2012 3:39 AM
> To: Liu Qiang-B32616
> Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Li Yang-
> R58472; Phillips Kim-R1AAHA; Vinod Koul
> Subject: Re: [PATCH 3/4] fsl-dma: support attribute of DMA_MEMORY when
> async_tx enabled
> 
> On Mon, Jul 9, 2012 at 10:59 PM, Qiang Liu 
> wrote:
> > - delete attribute of DMA_INTERRUPT because fsl-dma doesn't support
> > this function, exception will be thrown if talitos is used to compute
> xor
> > at the same time;
> > - change the release process of dma descriptor for avoiding exception
> when
> > enable config NET_DMA, release dma descriptor from 1st to last second,
> the
> > last descriptor which is reserved in current descriptor register may
> not be
> > completed, race condition will be raised if free current descriptor;
> > - use spin_lock_bh to instead of spin_lock_irqsave for improving
> performance;
> >
> > A race condition which is raised when use both of talitos and dmaengine
> to
> > offload xor is because napi scheduler will sync all pending requests in
> dma
> > channels, it will affect the process of raid operations. The descriptor
> is
> > freed which is submitted just now, but async_tx must check whether this
> depend
> > tx descriptor is acked, there are poison contents in the invalid
> address,
> > then BUG_ON() is thrown, so this descriptor will be freed in the next
> time.
> >
> > Cc: Dan Williams 
> > Cc: Vinod Koul 
> > Cc: Li Yang 
> > Signed-off-by: Qiang Liu 
> > ---
> 
> From the description this sounds like 3 or 4 patches.  Can you split it
> up?
I will split this patch according to my description and resend again. Thanks. 


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH V3] fsl-sata: add support for interrupt coalsecing feature

2012-02-28 Thread Liu Qiang-B32616
Hi Jeff,

Do you plan to apply it to upstream, or any suggestions? Thanks.

> -Original Message-
> From: linux-ide-ow...@vger.kernel.org [mailto:linux-ide-
> ow...@vger.kernel.org] On Behalf Of Li Yang
> Sent: Wednesday, February 15, 2012 3:51 PM
> To: Liu Qiang-B32616
> Cc: jgar...@pobox.com; linux-...@vger.kernel.org; linux-
> ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
> Subject: Re: [PATCH V3] fsl-sata: add support for interrupt coalsecing
> feature
> 
> On Wed, Feb 15, 2012 at 3:40 PM, Qiang Liu 
> wrote:
> > Adds support for interrupt coalescing feature to reduce interrupt
> events.
> > Provides a mechanism of adjusting coalescing count and timeout tick by
> > sysfs at runtime, so that tradeoff of latency and CPU load can be made
> > depending on different applications.
> >
> > Signed-off-by: Qiang Liu 
> 
> Acked-by: Li Yang 
> 
> - Leo
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ide" in
> the body of a message to majord...@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH V2 RESEND] fsl-sata: I/O load balancing

2012-02-16 Thread Liu Qiang-B32616
> -Original Message-
> From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
> Sent: Friday, February 17, 2012 11:18 AM
> To: Liu Qiang-B32616
> Cc: jgar...@pobox.com; linux-...@vger.kernel.org; linuxppc-
> d...@lists.ozlabs.org; linux-ker...@vger.kernel.org
> Subject: RE: [PATCH V2 RESEND] fsl-sata: I/O load balancing
> 
> On Fri, 2012-02-17 at 01:54 +, Liu Qiang-B32616 wrote:
> > The default will be set in a common interface
> > fsl_sata_set_irq_coalescing when initialize the controller. This
> > interface will check the range of intr count and ticks and make sure
> the values are reasonably.
> 
> Allright, but the current defaults are basically no coalescing right ?
> 
> > It's hard to find a aggressive value to adapt all scenarios, so I use
> > echo to adjust the value. I remember P5020 have some performance issue,
> I will check it.
> > BTW, which filesystem do you use? Ext2 is lower than ext4 because
> > metadata is continuously wrote to disk. You can try ext4 or xfs.
> 
> ext3 at the moment, I plan to switch to ext4 when I finish that fsck pass
> which is taking hours... I am not aware of the 5020 performance issues,
> is this something documented and/or fixable ?
> 

My patch may not meet your requirement:(
For your performance requirement, I suggest use ext4 filesystem and SSD.

Thanks.

> Cheers,
> Ben.
> 
> 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH V2 RESEND] fsl-sata: I/O load balancing

2012-02-16 Thread Liu Qiang-B32616
> -Original Message-
> From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
> Sent: Friday, February 17, 2012 8:40 AM
> To: Liu Qiang-B32616
> Cc: jgar...@pobox.com; linux-...@vger.kernel.org; linuxppc-
> d...@lists.ozlabs.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH V2 RESEND] fsl-sata: I/O load balancing
> 
> On Wed, 2012-02-15 at 13:49 +0800, Qiang Liu wrote:
> ]
> > diff --git a/drivers/ata/sata_fsl.c b/drivers/ata/sata_fsl.c index
> > 0120b0d..d6577b9 100644
> > --- a/drivers/ata/sata_fsl.c
> > +++ b/drivers/ata/sata_fsl.c
> > @@ -6,7 +6,7 @@
> >   * Author: Ashish Kalra 
> >   * Li Yang 
> >   *
> > - * Copyright (c) 2006-2007, 2011 Freescale Semiconductor, Inc.
> > + * Copyright (c) 2006-2007, 2011-2012 Freescale Semiconductor, Inc.
> >   *
> >   * This program is free software; you can redistribute  it and/or
> modify it
> >   * under  the terms of  the GNU General  Public License as published
> > by the @@ -26,6 +26,15 @@  #include   #include
> > 
> >
> > +static unsigned int intr_coalescing_count;
> > +module_param(intr_coalescing_count, int, S_IRUGO);
> > +MODULE_PARM_DESC(intr_coalescing_count,
> > +"INT coalescing count threshold (1..31)");
> > +
> > +static unsigned int intr_coalescing_ticks;
> > +module_param(intr_coalescing_ticks, int, S_IRUGO);
> > +MODULE_PARM_DESC(intr_coalescing_ticks,
> > +"INT coalescing timer threshold in AHB ticks");
> 
> You don't seem to provide very useful defaults... for example,
> intr_coalescing_count starts at 0 which isn't even in range (the code
> fixes that up but still...).
> 
> I tried a debian install on the p5020ds here and I find SATA to be
> extremely slow, generating millions of interrupts. I think the defaults
> should be a lot more aggressive at coalescing.

Hello Ben,

The default will be set in a common interface fsl_sata_set_irq_coalescing when
initialize the controller. This interface will check the range of intr count
and ticks and make sure the values are reasonably.

It's hard to find a aggressive value to adapt all scenarios, so I use echo to 
adjust
the value. I remember P5020 have some performance issue, I will check it.
BTW, which filesystem do you use? Ext2 is lower than ext4 because metadata is 
continuously wrote to disk. You can try ext4 or xfs.

Thanks.

> 
> Cheers,
> Ben.
> 
> >  /* Controller information */
> >  enum {
> > SATA_FSL_QUEUE_DEPTH= 16,
> > @@ -83,6 +92,16 @@ enum {
> >  };
> >
> >  /*
> > + * Interrupt Coalescing Control Register bitdefs  */ enum {
> > +   ICC_MIN_INT_COUNT_THRESHOLD = 1,
> > +   ICC_MAX_INT_COUNT_THRESHOLD = ((1 << 5) - 1),
> > +   ICC_MIN_INT_TICKS_THRESHOLD = 0,
> > +   ICC_MAX_INT_TICKS_THRESHOLD = ((1 << 19) - 1),
> > +   ICC_SAFE_INT_TICKS  = 1,
> > +};
> > +
> > +/*
> >  * Host Controller command register set - per port  */  enum { @@
> > -263,8 +282,65 @@ struct sata_fsl_host_priv {
> > void __iomem *csr_base;
> > int irq;
> > int data_snoop;
> > +   struct device_attribute intr_coalescing;
> >  };
> >
> > +static void fsl_sata_set_irq_coalescing(struct ata_host *host,
> > +   unsigned int count, unsigned int ticks) {
> > +   struct sata_fsl_host_priv *host_priv = host->private_data;
> > +   void __iomem *hcr_base = host_priv->hcr_base;
> > +
> > +   if (count > ICC_MAX_INT_COUNT_THRESHOLD)
> > +   count = ICC_MAX_INT_COUNT_THRESHOLD;
> > +   else if (count < ICC_MIN_INT_COUNT_THRESHOLD)
> > +   count = ICC_MIN_INT_COUNT_THRESHOLD;
> > +
> > +   if (ticks > ICC_MAX_INT_TICKS_THRESHOLD)
> > +   ticks = ICC_MAX_INT_TICKS_THRESHOLD;
> > +   else if ((ICC_MIN_INT_TICKS_THRESHOLD == ticks) &&
> > +   (count > ICC_MIN_INT_COUNT_THRESHOLD))
> > +   ticks = ICC_SAFE_INT_TICKS;
> > +
> > +   spin_lock(&host->lock);
> > +   iowrite32((count << 24 | ticks), hcr_base + ICC);
> > +
> > +   intr_coalescing_count = count;
> > +   intr_coalescing_ticks = ticks;
> > +   spin_unlock(&host->lock);
> > +
> > +   DPRINTK("intrrupt coalescing, count = 0x%x, ticks = %x\n",
> > +   intr_coalescing_count, intr_coalescing_ticks);
> > +   DPRINTK("ICC register status: (hcr base: 0x%x) = 0x%x\n",
> > +   hcr_base, ioread32(hcr_base + ICC)); }
> > +
> > +static 

RE: [PATCH V2 RESEND] fsl-sata: I/O load balancing

2012-02-14 Thread Liu Qiang-B32616
> -Original Message-
> From: linux-ide-ow...@vger.kernel.org [mailto:linux-ide-
> ow...@vger.kernel.org] On Behalf Of Li Yang
> Sent: Wednesday, February 15, 2012 3:22 PM
> To: Liu Qiang-B32616
> Cc: jgar...@pobox.com; linux-...@vger.kernel.org; linuxppc-
> d...@lists.ozlabs.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH V2 RESEND] fsl-sata: I/O load balancing
> 
> On Wed, Feb 15, 2012 at 1:49 PM, Qiang Liu 
> wrote:
> 
> Hi Liu Qiang,
> 
> The patch is fine except for the title and comment.  It's too vague to
> say I/O load balancing.  You need explicit description like "add
> interrupt coalescing support" as title.
> 
> > Reduce interrupt signals through reset Interrupt Coalescing Control Reg.
> > Provide dynamic method to adjust interrupt signals and timer ticks by
> sysfs.
> > It is a tradeoff for different applications.
> 
> How about:
> 
> Adds support for interrupt coalescing feature to reduce interrupt events.
> Provides mechanism of adjusting coalescing count and timeout tick by
> sysfs on runtime, so that tradeoff of latency and CPU load can be made
> depending on applications.
> 
Ok, I will change the title and comments more specifically.

> - Leo
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ide" in
> the body of a message to majord...@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH V2] fsl-sata: I/O load balancing

2012-02-13 Thread Liu Qiang-B32616
> -Original Message-
> From: linux-ide-ow...@vger.kernel.org [mailto:linux-ide-
> ow...@vger.kernel.org] On Behalf Of Jeff Garzik
> Sent: Saturday, February 11, 2012 2:26 AM
> To: Liu Qiang-B32616
> Cc: linux-...@vger.kernel.org; linux-ker...@vger.kernel.org; linuxppc-
> d...@lists.ozlabs.org
> Subject: Re: [PATCH V2] fsl-sata: I/O load balancing
> 
> On 01/19/2012 09:19 PM, qiang@freescale.com wrote:
> > From: Qiang Liu
> >
> > Reduce interrupt signals through reset Interrupt Coalescing Control Reg.
> > Provide dynamic method to adjust interrupt signals and timer ticks by
> sysfs.
> > It is a tradeoff for different applications.
> >
> > Signed-off-by: Qiang Liu
> > ---
> >
> > change for V2
> > support dynamic config interrupt coalescing register by /sysfs
> > test random small file with iometer
> > Description:
> >1. fsl-sata interrupt will be raised 130 thousand times when write
> 8G file
> >  (dd if=/dev/zero of=/dev/sda2 bs=128K count=65536);
> >2. most of interrupts raised because of only 1-4 commands completed;
> >3. only 30 thousand times will be raised after set max interrupt
> threshold,
> >  more interrupts are coalesced as the description of ICC;
> >
> > Test methods and results:
> >1. test sequential large file performance,
> >[root@p2020ds root]# echo 31 524287>  \
> > /sys/devices/soc.0/ffe18000.sata/intr_coalescing
> >[root@p2020ds root]# dd if=/dev/zero of=/dev/sda2 bs=128K
> count=65536&
> >[root@p2020ds root]# top
> >
> >CPU %  |  dd   |  flush-8:0 | softirq
> >---
> >before | 20-22 |17-19   |7
> >---
> >after  | 18-21 |15-16   |5
> >---
> >2. test random small file with iometer,
> >   iometer paramters:
> > 4 I/Os burst length, 1MB transfer request size, 100% write, 2MB
> file size
> > as default configuration of interrupt coalescing register, 1
> interrupts and
> >   no timeout config, total write performance is 119MB per second,
> > after config with the maximum value, write performance is 110MB per
> second.
> >
> >After compare the test results, a configuable interrupt coalescing
> should be
> >better when cope with flexible context.
> >
> >   drivers/ata/sata_fsl.c |  111
> ++--
> >   1 files changed, 107 insertions(+), 4 deletions(-)
> 
> Doesn't seem to apply to upstream, or another less recent -rc...
Thanks, I will resend it latterly.
> 
>   Jeff
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ide" in
> the body of a message to majord...@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH V2] fsl-sata: I/O load balancing

2012-02-10 Thread Liu Qiang-B32616
Hi Jeff,

Do you have any suggestion about this patch :)


> -Original Message-
> From: Liu Qiang-B32616
> Sent: Friday, January 20, 2012 10:19 AM
> To: jgar...@pobox.com; linux-...@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Liu
> Qiang-B32616
> Subject: [PATCH V2] fsl-sata: I/O load balancing
> 
> From: Qiang Liu 
> 
> Reduce interrupt signals through reset Interrupt Coalescing Control Reg.
> Provide dynamic method to adjust interrupt signals and timer ticks by
> sysfs.
> It is a tradeoff for different applications.
> 
> Signed-off-by: Qiang Liu 
> ---
> 
> change for V2
>   support dynamic config interrupt coalescing register by /sysfs
>   test random small file with iometer
> Description:
>   1. fsl-sata interrupt will be raised 130 thousand times when write 8G
> file
> (dd if=/dev/zero of=/dev/sda2 bs=128K count=65536);
>   2. most of interrupts raised because of only 1-4 commands completed;
>   3. only 30 thousand times will be raised after set max interrupt
> threshold,
> more interrupts are coalesced as the description of ICC;
> 
> Test methods and results:
>   1. test sequential large file performance,
>   [root@p2020ds root]# echo 31 524287 > \
>   /sys/devices/soc.0/ffe18000.sata/intr_coalescing
>   [root@p2020ds root]# dd if=/dev/zero of=/dev/sda2 bs=128K count=65536 &
>   [root@p2020ds root]# top
> 
>   CPU %  |  dd   |  flush-8:0 | softirq
>   ---
>   before | 20-22 |17-19   |7
>   ---
>   after  | 18-21 |15-16   |5
>   ---
>   2. test random small file with iometer,  iometer paramters:
>4 I/Os burst length, 1MB transfer request size, 100% write, 2MB file
> size
>as default configuration of interrupt coalescing register, 1
> interrupts and  no timeout config, total write performance is 119MB per
> second,
>after config with the maximum value, write performance is 110MB per
> second.
> 
>   After compare the test results, a configuable interrupt coalescing
> should be
>   better when cope with flexible context.
> 
>  drivers/ata/sata_fsl.c |  111
> ++--
>  1 files changed, 107 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/ata/sata_fsl.c b/drivers/ata/sata_fsl.c index
> 3547000..41ca495 100644
> --- a/drivers/ata/sata_fsl.c
> +++ b/drivers/ata/sata_fsl.c
> @@ -6,7 +6,7 @@
>   * Author: Ashish Kalra 
>   * Li Yang 
>   *
> - * Copyright (c) 2006-2007, 2011 Freescale Semiconductor, Inc.
> + * Copyright (c) 2006-2007, 2011-2012 Freescale Semiconductor, Inc.
>   *
>   * This program is free software; you can redistribute  it and/or modify
> it
>   * under  the terms of  the GNU General  Public License as published by
> the @@ -26,6 +26,15 @@  #include   #include
> 
> 
> +static unsigned int intr_coalescing_count;
> +module_param(intr_coalescing_count, int, S_IRUGO);
> +MODULE_PARM_DESC(intr_coalescing_count,
> +  "INT coalescing count threshold (1..31)");
> +
> +static unsigned int intr_coalescing_ticks;
> +module_param(intr_coalescing_ticks, int, S_IRUGO);
> +MODULE_PARM_DESC(intr_coalescing_ticks,
> +  "INT coalescing timer threshold in AHB ticks");
>  /* Controller information */
>  enum {
>   SATA_FSL_QUEUE_DEPTH= 16,
> @@ -83,6 +92,16 @@ enum {
>  };
> 
>  /*
> + * Interrupt Coalescing Control Register bitdefs  */ enum {
> + ICC_MIN_INT_COUNT_THRESHOLD = 1,
> + ICC_MAX_INT_COUNT_THRESHOLD = ((1 << 5) - 1),
> + ICC_MIN_INT_TICKS_THRESHOLD = 0,
> + ICC_MAX_INT_TICKS_THRESHOLD = ((1 << 19) - 1),
> + ICC_SAFE_INT_TICKS  = 1,
> +};
> +
> +/*
>  * Host Controller command register set - per port  */  enum { @@ -263,9
> +282,66 @@ struct sata_fsl_host_priv {
>   int irq;
>   int data_snoop;
>   u32 quirks;
> + struct device_attribute intr_coalescing;
>  #define SATA_FSL_QUIRK_P3P5_ERRATA   (1 << 0)
>  };
> 
> +static void fsl_sata_set_irq_coalescing(struct ata_host *host,
> + unsigned int count, unsigned int ticks) {
> + struct sata_fsl_host_priv *host_priv = host->private_data;
> + void __iomem *hcr_base = host_priv->hcr_base;
> +
> + if (count > ICC_MAX_INT_COUNT_THRESHOLD)
> + count = ICC_MAX_INT_COUNT_THRESHOLD;
> + else if (count < ICC_MIN_INT_COUNT_THRESHOLD)
> + count = ICC_MIN_INT_COUNT_THRESHOLD;
> +
> + if (ticks > ICC_MAX_INT_TICKS_THRES

RE: [PATCH][SDK v1.2] sata: I/O load balancing

2012-01-13 Thread Liu Qiang-B32616


> From: linux-ide-ow...@vger.kernel.org [linux-ide-ow...@vger.kernel.org] on 
> behalf of Li Yang [le...@freescale.com]
> Sent: Friday, January 13, 2012 3:36 AM
> To: Liu Qiang-B32616
> Cc: jgar...@pobox.com; linux-...@vger.kernel.org; 
> linux-ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Liu Qiang-B32616
> Subject: Re: [PATCH][SDK v1.2] sata: I/O load balancing

> On Fri, Jan 13, 2012 at 4:25 PM, Qiang Liu  wrote:
> > From: Qiang Liu 
> >
> > Reduce interrupt singnals through reset Interrupt Coalescing Control Reg.
> > Increase the threshold value of interrupt and timer will reduce the number
> > of complete interrupt sharply. Improve the system performance effectively.

> There is always a trade off of throughput and latency by using
> interrupt coalescing.  It's not reasonable to assume that the
> throughput is the only factor to be considered and set the coalescing
> threshold and timeout to the max value by default.  Have you carried
> out other benchmark like copying many small files?
No, I didn't test small file. I think this won't affect system load. I can have 
a test
and describe the result in next patch.

> It will be safer to make the coalescing runtime configurable like the
> sata_mv driver, IMO.
Ok, I will do it like this in next patch.


- Leo
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev