[bug report] crypto: brcm - Add Broadcom SPU driver

2017-02-13 Thread Dan Carpenter
Hello Rob Rice,

The patch 9d12ba86f818: "crypto: brcm - Add Broadcom SPU driver" from
Feb 3, 2017, leads to the following static checker warning:

drivers/crypto/bcm/cipher.c:2340 ahash_finup()
warn: 'tmpbuf' was already freed.

drivers/crypto/bcm/cipher.c
  2316  /* Copy data from req scatterlist to tmp buffer */
  2317  gfp = (req->base.flags & (CRYPTO_TFM_REQ_MAY_BACKLOG |
  2318         CRYPTO_TFM_REQ_MAY_SLEEP)) ? GFP_KERNEL : GFP_ATOMIC;
  2319  tmpbuf = kmalloc(req->nbytes, gfp);
  2320  if (!tmpbuf) {
  2321  ret = -ENOMEM;
  2322  goto ahash_finup_exit;
  2323  }
  2324  
  2325  if (sg_copy_to_buffer(req->src, nents, tmpbuf, req->nbytes) !=
  2326  req->nbytes) {
  2327  ret = -EINVAL;
  2328  goto ahash_finup_free;
  2329  }
  2330  
  2331  /* Call synchronous update */
  2332  ret = crypto_shash_finup(ctx->shash, tmpbuf, req->nbytes,
  2333   req->result);
  2334  kfree(tmpbuf);
        ^^^^^^^^^^^^^
  2335  } else {
  2336  /* Otherwise call the internal function which uses SPU hw */
  2337  return __ahash_finup(req);
  2338  }
  2339  ahash_finup_free:
  2340  kfree(tmpbuf);
        ^^^^^^^^^^^^^
  2341  
  2342  ahash_finup_exit:
  2343  /* Done with hash, can deallocate it now */
  2344  crypto_free_shash(ctx->shash->tfm);
  2345  kfree(ctx->shash);
  2346  return ret;
  2347  }

I'm only working 30 minutes per day to keep a hand in.  I'm not
sending patches this month.
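
For reference only, here is an untested sketch (not a submitted patch) of
one way the double free could be avoided: drop the explicit kfree() after
crypto_shash_finup() and let the synchronous path fall through to the
ahash_finup_free label, which already frees tmpbuf exactly once on both
the success and the error path.

		/* Call synchronous update */
		ret = crypto_shash_finup(ctx->shash, tmpbuf, req->nbytes,
					 req->result);
		/* fall through to ahash_finup_free, which frees tmpbuf */
	} else {
		/* Otherwise call the internal function which uses SPU hw */
		return __ahash_finup(req);
	}

ahash_finup_free:
	kfree(tmpbuf);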

regards,
dan carpenter


[PATCH v4 4/4] dt-bindings: Add DT bindings document for Broadcom SBA RAID driver

2017-02-13 Thread Anup Patel
This patch adds the DT bindings document for newly added Broadcom
SBA RAID driver.

Signed-off-by: Anup Patel 
Reviewed-by: Ray Jui 
Reviewed-by: Scott Branden 
---
 .../devicetree/bindings/dma/brcm,iproc-sba.txt | 29 ++
 1 file changed, 29 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt

diff --git a/Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt 
b/Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt
new file mode 100644
index 000..092913a
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt
@@ -0,0 +1,29 @@
+* Broadcom SBA RAID engine
+
+Required properties:
+- compatible: Should be one of the following
+ "brcm,iproc-sba"
+ "brcm,iproc-sba-v2"
+  The "brcm,iproc-sba" has support for only 6 PQ coefficients
+  The "brcm,iproc-sba-v2" has support for only 30 PQ coefficients
+- mboxes: List of phandle and mailbox channel specifiers
+
+Example:
+
+raid_mbox: mbox@6740 {
+   ...
+   #mbox-cells = <3>;
+   ...
+};
+
+raid0 {
+   compatible = "brcm,iproc-sba-v2";
+   mboxes = <&raid_mbox 0 0x1 0x>,
+            <&raid_mbox 1 0x1 0x>,
+            <&raid_mbox 2 0x1 0x>,
+            <&raid_mbox 3 0x1 0x>,
+            <&raid_mbox 4 0x1 0x>,
+            <&raid_mbox 5 0x1 0x>,
+            <&raid_mbox 6 0x1 0x>,
+            <&raid_mbox 7 0x1 0x>;
+};
-- 
2.7.4



[PATCH v4 0/4] Broadcom SBA RAID support

2017-02-13 Thread Anup Patel
The Broadcom SBA RAID is a stream-based device which provides
RAID5/6 offload.

It requires a SoC specific ring manager (such as the Broadcom FlexRM
ring manager) to provide a ring-based programming interface. Due to
this, the Broadcom SBA RAID driver (mailbox client) implements a DMA
device having one DMA channel using a set of mailbox channels
provided by the Broadcom SoC specific ring manager driver (mailbox
controller).

The Broadcom SBA RAID hardware requires the PQ disk position instead
of the PQ disk coefficient. To address this, we have added a raid_gflog
table which helps the driver convert a PQ disk coefficient to the
corresponding PQ disk position.

This patchset is based on Linux-4.10-rc2 and depends on patchset
"[PATCH v4 0/2] Broadcom FlexRM ring manager support"

It is also available at sba-raid-v4 branch of
https://github.com/Broadcom/arm64-linux.git

Changes since v3:
 - Replaced SBA_ENC() with sba_cmd_enc() inline function
 - Use list_first_entry_or_null() wherever possible
 - Removed unwanted braces around loops wherever possible
 - Use lockdep_assert_held() where required

Changes since v2:
 - Dropped patch to handle DMA devices having support for fewer
   PQ coefficients in Linux Async Tx
 - Added work-around in bcm-sba-raid driver to handle unsupported
   PQ coefficients using multiple SBA requests

Changes since v1:
 - Dropped patch to add mbox_channel_device() API
 - Used GENMASK and BIT macros wherever possible in bcm-sba-raid driver
 - Replaced C_MDATA macros with static inline functions in
   bcm-sba-raid driver
 - Removed sba_alloc_chan_resources() callback in bcm-sba-raid driver
 - Used dev_err() instead of dev_info() wherever applicable
 - Removed call to sba_issue_pending() from sba_tx_submit() in
   bcm-sba-raid driver
 - Implemented SBA request chaining for handling (len > sba->req_size)
   in bcm-sba-raid driver
 - Implemented device_terminate_all() callback in bcm-sba-raid driver

Anup Patel (4):
  lib/raid6: Add log-of-2 table for RAID6 HW requiring disk position
  async_tx: Fix DMA_PREP_FENCE usage in do_async_gen_syndrome()
  dmaengine: Add Broadcom SBA RAID driver
  dt-bindings: Add DT bindings document for Broadcom SBA RAID driver

 .../devicetree/bindings/dma/brcm,iproc-sba.txt |   29 +
 crypto/async_tx/async_pq.c |5 +-
 drivers/dma/Kconfig|   13 +
 drivers/dma/Makefile   |1 +
 drivers/dma/bcm-sba-raid.c | 1694 
 include/linux/raid/pq.h|1 +
 lib/raid6/mktables.c   |   20 +
 7 files changed, 1760 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt
 create mode 100644 drivers/dma/bcm-sba-raid.c

-- 
2.7.4



[PATCH v4 3/4] dmaengine: Add Broadcom SBA RAID driver

2017-02-13 Thread Anup Patel
The Broadcom stream buffer accelerator (SBA) provides offloading
capabilities for RAID operations. This SBA offload engine is
accessible via Broadcom SoC specific ring manager.

This patch adds Broadcom SBA RAID driver which provides one
DMA device with RAID capabilities using one or more Broadcom
SoC specific ring manager channels. The SBA RAID driver in its
current shape implements memcpy, xor, and pq operations.

Signed-off-by: Anup Patel 
Reviewed-by: Ray Jui 
---
 drivers/dma/Kconfig|   13 +
 drivers/dma/Makefile   |1 +
 drivers/dma/bcm-sba-raid.c | 1694 
 3 files changed, 1708 insertions(+)
 create mode 100644 drivers/dma/bcm-sba-raid.c

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 263495d..bf8fb84 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -99,6 +99,19 @@ config AXI_DMAC
  controller is often used in Analog Device's reference designs for FPGA
  platforms.
 
+config BCM_SBA_RAID
+   tristate "Broadcom SBA RAID engine support"
+   depends on (ARM64 && MAILBOX && RAID6_PQ) || COMPILE_TEST
+   select DMA_ENGINE
+   select DMA_ENGINE_RAID
+   select ASYNC_TX_ENABLE_CHANNEL_SWITCH
+   default ARCH_BCM_IPROC
+   help
+ Enable support for Broadcom SBA RAID Engine. The SBA RAID
+ engine is available on most of the Broadcom iProc SoCs. It
+ has the capability to offload memcpy, xor and pq computation
+ for raid5/6.
+
 config COH901318
bool "ST-Ericsson COH901318 DMA support"
select DMA_ENGINE
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index a4fa336..ba96bdd 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_AMCC_PPC440SPE_ADMA) += ppc4xx/
 obj-$(CONFIG_AT_HDMAC) += at_hdmac.o
 obj-$(CONFIG_AT_XDMAC) += at_xdmac.o
 obj-$(CONFIG_AXI_DMAC) += dma-axi-dmac.o
+obj-$(CONFIG_BCM_SBA_RAID) += bcm-sba-raid.o
 obj-$(CONFIG_COH901318) += coh901318.o coh901318_lli.o
 obj-$(CONFIG_DMA_BCM2835) += bcm2835-dma.o
 obj-$(CONFIG_DMA_JZ4740) += dma-jz4740.o
diff --git a/drivers/dma/bcm-sba-raid.c b/drivers/dma/bcm-sba-raid.c
new file mode 100644
index 000..279e5e2
--- /dev/null
+++ b/drivers/dma/bcm-sba-raid.c
@@ -0,0 +1,1694 @@
+/*
+ * Copyright (C) 2017 Broadcom
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/*
+ * Broadcom SBA RAID Driver
+ *
+ * The Broadcom stream buffer accelerator (SBA) provides offloading
+ * capabilities for RAID operations. The SBA offload engine is accessible
+ * via Broadcom SoC specific ring manager. Two or more offload engines
+ * can share the same Broadcom SoC specific ring manager. Because of this,
+ * the Broadcom SoC specific ring manager driver is implemented as a mailbox
+ * controller driver and the offload engine drivers are implemented as
+ * mailbox clients.
+ *
+ * Typically, a Broadcom SoC specific ring manager will implement a larger
+ * number of hardware rings over one or more SBA hardware devices. By
+ * design, the internal buffer size of an SBA hardware device is limited,
+ * but all offload operations supported by SBA can be broken down into
+ * multiple small-size requests and executed in parallel on multiple SBA
+ * hardware devices to achieve high throughput.
+ *
+ * The Broadcom SBA RAID driver does not require any register programming
+ * except submitting request to SBA hardware device via mailbox channels.
+ * This driver implements a DMA device with one DMA channel using a set
+ * of mailbox channels provided by Broadcom SoC specific ring manager
+ * driver. To exploit parallelism (as described above), all DMA requests
+ * coming to the SBA RAID DMA channel are broken down into smaller requests
+ * and submitted to multiple mailbox channels in round-robin fashion.
+ * To have more SBA DMA channels, we can create more SBA device nodes
+ * in the Broadcom SoC specific DTS based on the number of hardware rings
+ * supported by the Broadcom SoC ring manager.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "dmaengine.h"
+
+/* SBA command related defines */
+#define SBA_TYPE_SHIFT 48
+#define SBA_TYPE_MASK  GENMASK(1, 0)
+#define SBA_TYPE_A 0x0
+#define SBA_TYPE_B 0x2
+#define SBA_TYPE_C 0x3
+#define SBA_USER_DEF_SHIFT 32
+#define SBA_USER_DEF_MASK  GENMASK(15, 0)
+#define SBA_R_MDATA_SHIFT  24
+#define SBA_R_MDATA_MASK   GENMASK(7, 0)
+#define SBA_C_MDATA_MS_SHIFT   18
+#define 

[PATCH v4 1/4] lib/raid6: Add log-of-2 table for RAID6 HW requiring disk position

2017-02-13 Thread Anup Patel
The raid6_gfexp table represents {2}^n values for 0 <= n < 256. The
Linux async_tx framework passes values from raid6_gfexp as coefficients
for each source to the prep_dma_pq() callback of a DMA channel with PQ
capability. This creates a problem for RAID6 offload engines (such as
Broadcom SBA) which take the disk position (i.e. the log of {2}) instead
of the multiplicative coefficient from the raid6_gfexp table.

This patch adds a raid6_gflog table holding the log-of-2 value for any
given x such that 0 <= x < 256. For any given disk coefficient x, the
corresponding disk position is given by raid6_gflog[x]. A RAID6
offload engine driver can use this newly added raid6_gflog table to
get the disk position from the multiplicative coefficient.
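
For illustration only (this helper is hypothetical and not part of the
patch), a PQ offload driver could translate the coefficient handed to
prep_dma_pq() back into a disk position roughly like this:

	#include <linux/raid/pq.h>

	/*
	 * Sketch: async_tx passes raid6_gfexp[pos] (i.e. {2}^pos) as the
	 * coefficient; raid6_gflog inverts that, so raid6_gflog[coef]
	 * yields the disk position expected by position-based RAID6 engines.
	 */
	static u8 example_coef_to_disk_pos(u8 coef)
	{
		return raid6_gflog[coef];
	}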

Signed-off-by: Anup Patel 
Reviewed-by: Scott Branden 
Reviewed-by: Ray Jui 
---
 include/linux/raid/pq.h |  1 +
 lib/raid6/mktables.c| 20 
 2 files changed, 21 insertions(+)

diff --git a/include/linux/raid/pq.h b/include/linux/raid/pq.h
index 4d57bba..30f9453 100644
--- a/include/linux/raid/pq.h
+++ b/include/linux/raid/pq.h
@@ -142,6 +142,7 @@ int raid6_select_algo(void);
 extern const u8 raid6_gfmul[256][256] __attribute__((aligned(256)));
 extern const u8 raid6_vgfmul[256][32] __attribute__((aligned(256)));
 extern const u8 raid6_gfexp[256]  __attribute__((aligned(256)));
+extern const u8 raid6_gflog[256]  __attribute__((aligned(256)));
 extern const u8 raid6_gfinv[256]  __attribute__((aligned(256)));
 extern const u8 raid6_gfexi[256]  __attribute__((aligned(256)));
 
diff --git a/lib/raid6/mktables.c b/lib/raid6/mktables.c
index 39787db..e824d08 100644
--- a/lib/raid6/mktables.c
+++ b/lib/raid6/mktables.c
@@ -125,6 +125,26 @@ int main(int argc, char *argv[])
printf("EXPORT_SYMBOL(raid6_gfexp);\n");
printf("#endif\n");
 
+   /* Compute log-of-2 table */
+   printf("\nconst u8 __attribute__((aligned(256)))\n"
+  "raid6_gflog[256] =\n" "{\n");
+   for (i = 0; i < 256; i += 8) {
+   printf("\t");
+   for (j = 0; j < 8; j++) {
+   v = 255;
+   for (k = 0; k < 256; k++)
+   if (exptbl[k] == (i + j)) {
+   v = k;
+   break;
+   }
+   printf("0x%02x,%c", v, (j == 7) ? '\n' : ' ');
+   }
+   }
+   printf("};\n");
+   printf("#ifdef __KERNEL__\n");
+   printf("EXPORT_SYMBOL(raid6_gflog);\n");
+   printf("#endif\n");
+
/* Compute inverse table x^-1 == x^254 */
printf("\nconst u8 __attribute__((aligned(256)))\n"
   "raid6_gfinv[256] =\n" "{\n");
-- 
2.7.4



[PATCH v4 2/4] async_tx: Fix DMA_PREP_FENCE usage in do_async_gen_syndrome()

2017-02-13 Thread Anup Patel
The DMA_PREP_FENCE flag is to be used when preparing a Tx descriptor
whose output will be consumed by the next/dependent Tx descriptor.

DMA_PREP_FENCE is not set correctly in do_async_gen_syndrome() when
calling dma->device_prep_dma_pq() under the following conditions:
1. ASYNC_TX_FENCE is not set in submit->flags
2. DMA_PREP_FENCE is not set in dma_flags
3. src_cnt (= disks - 2) is greater than dma_maxpq(dma, dma_flags)

This patch fixes the DMA_PREP_FENCE usage in do_async_gen_syndrome(),
taking inspiration from the do_async_xor() implementation.

Signed-off-by: Anup Patel 
Reviewed-by: Ray Jui 
Reviewed-by: Scott Branden 
---
 crypto/async_tx/async_pq.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/crypto/async_tx/async_pq.c b/crypto/async_tx/async_pq.c
index f83de99..56bd612 100644
--- a/crypto/async_tx/async_pq.c
+++ b/crypto/async_tx/async_pq.c
@@ -62,9 +62,6 @@ do_async_gen_syndrome(struct dma_chan *chan,
dma_addr_t dma_dest[2];
int src_off = 0;
 
-   if (submit->flags & ASYNC_TX_FENCE)
-   dma_flags |= DMA_PREP_FENCE;
-
while (src_cnt > 0) {
submit->flags = flags_orig;
pq_src_cnt = min(src_cnt, dma_maxpq(dma, dma_flags));
@@ -83,6 +80,8 @@ do_async_gen_syndrome(struct dma_chan *chan,
if (cb_fn_orig)
dma_flags |= DMA_PREP_INTERRUPT;
}
+   if (submit->flags & ASYNC_TX_FENCE)
+   dma_flags |= DMA_PREP_FENCE;
 
/* Drivers force forward progress in case they can not provide
 * a descriptor
-- 
2.7.4



[PATCH] crypto: fix typo in doc

2017-02-13 Thread Gilad Ben-Yossef
Fix a single letter typo in api-skcipher.rst.

Signed-off-by: Gilad Ben-Yossef 
---
 Documentation/crypto/api-skcipher.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/crypto/api-skcipher.rst 
b/Documentation/crypto/api-skcipher.rst
index b20028a..4eec4a9 100644
--- a/Documentation/crypto/api-skcipher.rst
+++ b/Documentation/crypto/api-skcipher.rst
@@ -59,4 +59,4 @@ Synchronous Block Cipher API - Deprecated
:doc: Synchronous Block Cipher API
 
 .. kernel-doc:: include/linux/crypto.h
-   :functions: crypto_alloc_blkcipher rypto_free_blkcipher 
crypto_has_blkcipher crypto_blkcipher_name crypto_blkcipher_ivsize 
crypto_blkcipher_blocksize crypto_blkcipher_setkey crypto_blkcipher_encrypt 
crypto_blkcipher_encrypt_iv crypto_blkcipher_decrypt 
crypto_blkcipher_decrypt_iv crypto_blkcipher_set_iv crypto_blkcipher_get_iv
+   :functions: crypto_alloc_blkcipher crypto_free_blkcipher 
crypto_has_blkcipher crypto_blkcipher_name crypto_blkcipher_ivsize 
crypto_blkcipher_blocksize crypto_blkcipher_setkey crypto_blkcipher_encrypt 
crypto_blkcipher_encrypt_iv crypto_blkcipher_decrypt 
crypto_blkcipher_decrypt_iv crypto_blkcipher_set_iv crypto_blkcipher_get_iv
-- 
2.1.4



Re: [PATCH v3 3/4] dmaengine: Add Broadcom SBA RAID driver

2017-02-13 Thread Anup Patel
On Fri, Feb 10, 2017 at 11:20 PM, Dan Williams  wrote:
> On Fri, Feb 10, 2017 at 1:07 AM, Anup Patel  wrote:
>> The Broadcom stream buffer accelerator (SBA) provides offloading
>> capabilities for RAID operations. This SBA offload engine is
>> accessible via Broadcom SoC specific ring manager.
>>
>> This patch adds Broadcom SBA RAID driver which provides one
>> DMA device with RAID capabilities using one or more Broadcom
>> SoC specific ring manager channels. The SBA RAID driver in its
>> current shape implements memcpy, xor, and pq operations.
>>
>> Signed-off-by: Anup Patel 
>> Reviewed-by: Ray Jui 
>> ---
>>  drivers/dma/Kconfig|   13 +
>>  drivers/dma/Makefile   |1 +
>>  drivers/dma/bcm-sba-raid.c | 1711 
>> 
>>  3 files changed, 1725 insertions(+)
>>  create mode 100644 drivers/dma/bcm-sba-raid.c
>>
>> diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
>> index 263495d..bf8fb84 100644
>> --- a/drivers/dma/Kconfig
>> +++ b/drivers/dma/Kconfig
>> @@ -99,6 +99,19 @@ config AXI_DMAC
>>   controller is often used in Analog Device's reference designs for 
>> FPGA
>>   platforms.
>>
>> +config BCM_SBA_RAID
>> +   tristate "Broadcom SBA RAID engine support"
>> +   depends on (ARM64 && MAILBOX && RAID6_PQ) || COMPILE_TEST
>> +   select DMA_ENGINE
>> +   select DMA_ENGINE_RAID
>> +   select ASYNC_TX_ENABLE_CHANNEL_SWITCH
>
> ASYNC_TX_ENABLE_CHANNEL_SWITCH violates the DMA mapping API and
> Russell has warned it's especially problematic on ARM [1].  If you
> need channel switching for this offload engine to be useful then you
> need to move DMA mapping and channel switching responsibilities to MD
> itself.
>
> [1]: 
> http://lists.infradead.org/pipermail/linux-arm-kernel/2011-January/036753.html

In the case of BCM-SBA-RAID, the underlying "struct device" for each
DMA channel is the mailbox controller "struct device" (i.e. the Ring
Manager device). This is because the Ring Manager HW is the front-facing
device which we program to submit work to the BCM-SBA-RAID HW.

This means the DMA channels provided by the BCM-SBA-RAID driver
will use the same "struct device" for DMA mappings; hence channel
switching between BCM-SBA-RAID DMA channels is safe.

Due to the above, we can safely enable the
ASYNC_TX_ENABLE_CHANNEL_SWITCH option for the
BCM-SBA-RAID driver.
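
To make the point concrete, a minimal sketch of what this looks like in
driver code (names assumed for illustration, not quoted from the actual
driver):

	/*
	 * The DMA device registered by the SBA RAID driver points at the
	 * ring manager's (mailbox controller's) struct device, so every
	 * SBA DMA channel performs DMA mappings against the same device.
	 */
	sba->dma_dev.dev = sba->mbox_dev;	/* ring manager device */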

Regards,
Anup


Re: [PATCH v3 3/4] dmaengine: Add Broadcom SBA RAID driver

2017-02-13 Thread Anup Patel
On Mon, Feb 13, 2017 at 2:43 PM, Anup Patel  wrote:
> On Fri, Feb 10, 2017 at 11:20 PM, Dan Williams  
> wrote:
>> On Fri, Feb 10, 2017 at 1:07 AM, Anup Patel  wrote:
>>> The Broadcom stream buffer accelerator (SBA) provides offloading
>>> capabilities for RAID operations. This SBA offload engine is
>>> accessible via Broadcom SoC specific ring manager.
>>>
>>> This patch adds Broadcom SBA RAID driver which provides one
>>> DMA device with RAID capabilities using one or more Broadcom
>>> SoC specific ring manager channels. The SBA RAID driver in its
>>> current shape implements memcpy, xor, and pq operations.
>>>
>>> Signed-off-by: Anup Patel 
>>> Reviewed-by: Ray Jui 
>>> ---
>>>  drivers/dma/Kconfig|   13 +
>>>  drivers/dma/Makefile   |1 +
>>>  drivers/dma/bcm-sba-raid.c | 1711 
>>> 
>>>  3 files changed, 1725 insertions(+)
>>>  create mode 100644 drivers/dma/bcm-sba-raid.c
>>>
>>> diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
>>> index 263495d..bf8fb84 100644
>>> --- a/drivers/dma/Kconfig
>>> +++ b/drivers/dma/Kconfig
>>> @@ -99,6 +99,19 @@ config AXI_DMAC
>>>   controller is often used in Analog Device's reference designs for 
>>> FPGA
>>>   platforms.
>>>
>>> +config BCM_SBA_RAID
>>> +   tristate "Broadcom SBA RAID engine support"
>>> +   depends on (ARM64 && MAILBOX && RAID6_PQ) || COMPILE_TEST
>>> +   select DMA_ENGINE
>>> +   select DMA_ENGINE_RAID
>>> +   select ASYNC_TX_ENABLE_CHANNEL_SWITCH
>>
>> ASYNC_TX_ENABLE_CHANNEL_SWITCH violates the DMA mapping API and
>> Russell has warned it's especially problematic on ARM [1].  If you
>> need channel switching for this offload engine to be useful then you
>> need to move DMA mapping and channel switching responsibilities to MD
>> itself.
>>
>> [1]: 
>> http://lists.infradead.org/pipermail/linux-arm-kernel/2011-January/036753.html
>
> Actually driver works fine with/without
> ASYNC_TX_ENABLE_CHANNEL_SWITCH enabled
> so I am fine with removing dependency on this config option.

I stand corrected.

Previously, when I tried removing ASYNC_TX_ENABLE_CHANNEL_SWITCH
from the BCM_SBA_RAID config option it still worked, but only because
other drivers such as xgene-dma and mv_xor_v2 select this option.

The BCM-SBA-RAID driver does require the
ASYNC_TX_ENABLE_CHANNEL_SWITCH option.

There is no issue reported for ASYNC_TX_ENABLE_CHANNEL_SWITCH
with the ARM64 kernel. The issue you pointed out was with the ARM
kernel.

We will have to select ASYNC_TX_ENABLE_CHANNEL_SWITCH for the
BCM-SBA-RAID driver just like other ARM64 RAID drivers such as
xgene-dma and mv_xor_v2 (refer to the XGENE_DMA and MV_XOR_V2
options).

Regards,
Anup


Re: [PATCH v3] crypto: algapi - make crypto_xor() and crypto_inc() alignment agnostic

2017-02-13 Thread Jason A. Donenfeld
On Sun, Feb 5, 2017 at 11:06 AM, Ard Biesheuvel
 wrote:
> +   if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) ||
> +   !((unsigned long)b & (__alignof__(*b) - 1)))

Why not simply use the IS_ALIGNED macro?

Also, you might consider checking whether this is a constant, so
that you can avoid an unnecessary branch. Alternatively, if you want
to use the branch, I'd be interested in you writing back saying, "I
tested both cases, and branching is faster than always using the slow
unaligned path."
> +   while (((unsigned long)dst & (relalign - 1)) && len > 0) {

IS_ALIGNED

> +static inline void crypto_xor(u8 *dst, const u8 *src, unsigned int size)
> +{
> +   if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
> +   __builtin_constant_p(size) &&
> +   (size % sizeof(unsigned long)) == 0) {

You can expand this condition to be:

if ( (is_constant(size) && size%sizeof(ulong)==0) &&
(efficient_unaligned || (is_constant(dst) && is_constant(src) &&
is_aligned(dst) && is_aligned(src))) )

It might seem complex, but it all gets compiled out.
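
A rough C rendering of that suggested condition (illustration only, using
IS_ALIGNED and __builtin_constant_p; this is not the actual patch):

	if (__builtin_constant_p(size) &&
	    (size % sizeof(unsigned long)) == 0 &&
	    (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) ||
	     (__builtin_constant_p(dst) && __builtin_constant_p(src) &&
	      IS_ALIGNED((unsigned long)dst, __alignof__(unsigned long)) &&
	      IS_ALIGNED((unsigned long)src, __alignof__(unsigned long))))) {
		/* word-at-a-time XOR path */
	} else {
		/* byte-wise fallback */
	}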


Re: crypto: NULL deref in sha512_mb_mgr_get_comp_job_avx2

2017-02-13 Thread Tim Chen
On Sat, 2017-02-11 at 18:50 +0800, Herbert Xu wrote:
> On Wed, Feb 01, 2017 at 10:45:02AM -0800, Tim Chen wrote:
> > 
> > 
> > One theory that Megha and I have is that perhaps the flusher
> > and regular computation updates are stepping on each other. 
> > Can you try this patch and see if it helps?
> Patch applied.  Thanks.

Herbert,

Megha is now able to create a test setup that reproduces a problem
similar to the one reported by Dmitry.  This patch did not completely
fix it, so maybe you can hold off on merging this patch to mainline
till we can develop a more complete fix.

Thanks.

Tim


Qualcomm QCE driver: XTS setkey only allows 128 bit AES

2017-02-13 Thread Stephan Müller
Hi,

The Qualcomm QCE driver implementation defines:

.flags  = QCE_ALG_AES | QCE_MODE_XTS,
.name   = "xts(aes)",
.drv_name   = "xts-aes-qce",
.blocksize  = AES_BLOCK_SIZE,
.ivsize = AES_BLOCK_SIZE,
.min_keysize= AES_MIN_KEY_SIZE,
.max_keysize= AES_MAX_KEY_SIZE,

and

alg->cra_ablkcipher.min_keysize = def->min_keysize;
alg->cra_ablkcipher.max_keysize = def->max_keysize;
alg->cra_ablkcipher.setkey = qce_ablkcipher_setkey;

Thus, this driver limits the key to 128 through 256 bits. Furthermore,
the common setkey function is used.

May I ask how the key for AES XTS is supposed to be handled here,
considering that the kernel crypto API expects the AES key and the tweak
key to be set via one setkey call? I.e., the setkey should expect 256
through 512 bits.
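
For reference, a hedged sketch of how an xts(aes) setkey is commonly
structured (example_xts_setkey is a hypothetical name, not taken from the
QCE driver): the concatenated data key and tweak key arrive in one buffer,
so the advertised min/max keysize would be doubled and the driver splits
the buffer in half before programming the hardware.

	#include <crypto/aes.h>
	#include <crypto/xts.h>
	#include <linux/crypto.h>

	static int example_xts_setkey(struct crypto_ablkcipher *cipher,
				      const u8 *key, unsigned int keylen)
	{
		int ret;

		/* Rejects odd lengths (and, in FIPS mode, identical halves) */
		ret = xts_check_key(crypto_ablkcipher_tfm(cipher), key, keylen);
		if (ret)
			return ret;

		/* First half: data (cipher) key, second half: tweak key;
		 * program the hardware with key[0..keylen/2) and
		 * key[keylen/2..keylen) here. */
		return 0;
	}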

Thanks.

Ciao
Stephan


Re: [PATCH v4 0/3] Add Broadcom SPU Crypto Driver

2017-02-13 Thread Jon Mason
On Sat, Feb 11, 2017 at 5:54 AM, Herbert Xu  wrote:
> On Fri, Feb 03, 2017 at 12:55:31PM -0500, Rob Rice wrote:
>> Changes in v4:
>> - Added Rob Herring's Acked-by to patch 1/3 for bindings doc
>> - In response to Herbert's comment, in ahash_export() and
>>   ahash_import(), only copy the hash state, not state params
>>   related to cipher or aead algos.
>> - Noticed that hmac_offset in iproc_reqctx_s and spu_hash_params
>>   wasn't really used. So removed.
>
> Patches 1-2 applied.  Thanks.

Thanks Herbert!

Florian, could you please include patch #3 in your DT branch?

Thanks,
Jon

> --
> Email: Herbert Xu 
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] lz4: fix performance regressions

2017-02-13 Thread Willy Tarreau
On Mon, Feb 13, 2017 at 12:53:49PM +0100, Sven Schmidt wrote:
> On Sun, Feb 12, 2017 at 10:41:17PM +0100, Willy Tarreau wrote:
> > On Sun, Feb 12, 2017 at 04:20:00PM +0100, Sven Schmidt wrote:
> > > On Sun, Feb 12, 2017 at 02:05:08PM +0100, Willy Tarreau wrote:
> > > > Hi Sven,
> > > > 
> > > > On Sun, Feb 12, 2017 at 12:16:18PM +0100, Sven Schmidt wrote:
> > > > > Fix performance regressions compared to current kernel LZ4
> > > > 
> > > > Your patch contains mostly style cleanups which certainly are welcome
> > > > but make the whole patch hard to review. These cleanups would have been
> > > > better into a separate, preliminary patch IMHO.
> > > > 
> > > > Regards,
> > > > Willy
> > > 
> > > Hi Willy,
> > > 
> > > the problem was, I wanted to compare my version to the upstream LZ4 to 
> > > find bugs (as with my last patch version: wrong indentation in LZ4HC 
> > > in two for loops). But since the LZ4 code is a pain to read, I made 
> > > additional style cleanups "on the way".
> > 
> > Oh I can easily understand!
> > 
> > > Hope you can manage to review the patch though, because it is difficult 
> > > to separate the cleanups now.
> > 
> > When I need to split a patch into pieces, usually what I do is that I
> > revert it, re-apply it without committing, then "git add -p", validate
> > all the hunks to be taken as the first patch (ie here the cleanups),
> > commit, then commit the rest as a separate one. It seems to me that the
> > fix is in the last few hunks though I'm not sure yet.
> > 
> > Thanks,
> > Willy
> 
> Hi Willy,
> 
> I didn't know about this 'trick' until now. Thanks for sharing it! I gave it 
> a short try recently, that's really cool!
> 
> Since the problem discussed in this branch of this thread seems to be solved 
> (see Minchans E-Mail), I won't split the patches, though.
> Or is there an actual need for doing so? I will send an updated patchset 
> (containing these patches + the other ones suggested by Eric) later.

It's probably too late for this time, but keep it in mind for next time :-)

willy


Re: [PATCH v7 0/5] Update LZ4 compressor module

2017-02-13 Thread Sven Schmidt
On Mon, Feb 13, 2017 at 09:03:24AM +0900, Minchan Kim wrote:
> Hi Sven,
> 
> On Sun, Feb 12, 2017 at 12:16:17PM +0100, Sven Schmidt wrote:
> > 
> > 
> > 
> > On 02/10/2017 01:13 AM, Minchan Kim wrote:
> > > Hello Sven,
> > >
> > > On Thu, Feb 09, 2017 at 11:56:17AM +0100, Sven Schmidt wrote:
> > >> Hey Minchan,
> > >>
> > >> On Thu, Feb 09, 2017 at 08:31:21AM +0900, Minchan Kim wrote:
> > >>> Hello Sven,
> > >>>
> > >>> On Sun, Feb 05, 2017 at 08:09:03PM +0100, Sven Schmidt wrote:
> > 
> >  This patchset is for updating the LZ4 compression module to a version 
> >  based
> >  on LZ4 v1.7.3 allowing to use the fast compression algorithm aka LZ4 
> >  fast
> >  which provides an "acceleration" parameter as a tradeoff between
> >  high compression ratio and high compression speed.
> > 
> >  We want to use LZ4 fast in order to support compression in lustre
> >  and (mostly, based on that) investigate data reduction techniques in 
> >  behalf of
> >  storage systems.
> > 
> >  Also, it will be useful for other users of LZ4 compression, as with 
> >  LZ4 fast
> >  it is possible to enable applications to use fast and/or high 
> >  compression
> >  depending on the usecase.
> >  For instance, ZRAM is offering a LZ4 backend and could benefit from an 
> >  updated
> >  LZ4 in the kernel.
> > 
> >  LZ4 homepage: http://www.lz4.org/
> >  LZ4 source repository: https://github.com/lz4/lz4
> >  Source version: 1.7.3
> > 
> >  Benchmark (taken from [1], Core i5-4300U @1.9GHz):
> >  ------------|--------------|----------------|------
> >  Compressor  | Compression  | Decompression  | Ratio
> >  ------------|--------------|----------------|------
> >  memcpy  |  4200 MB/s   |  4200 MB/s | 1.000
> >  LZ4 fast 50 |  1080 MB/s   |  2650 MB/s | 1.375
> >  LZ4 fast 17 |   680 MB/s   |  2220 MB/s | 1.607
> >  LZ4 fast 5  |   475 MB/s   |  1920 MB/s | 1.886
> >  LZ4 default |   385 MB/s   |  1850 MB/s | 2.101
> > 
> >  [1] 
> >  http://fastcompression.blogspot.de/2015/04/sampling-or-faster-lz4.html
> > 
> >  [PATCH 1/5] lib: Update LZ4 compressor module
> >  [PATCH 2/5] lib/decompress_unlz4: Change module to work with new LZ4 
> >  module version
> >  [PATCH 3/5] crypto: Change LZ4 modules to work with new LZ4 module 
> >  version
> >  [PATCH 4/5] fs/pstore: fs/squashfs: Change usage of LZ4 to work with 
> >  new LZ4 version
> >  [PATCH 5/5] lib/lz4: Remove back-compat wrappers
> > >>>
> > >>> Today, I did zram-lz4 performance test with fio in current mmotm and
> > >>> found it makes regression about 20%.
> > >>>
> > >>> "lz4-update" means current mmots(git://git.cmpxchg.org/linux-mmots.git) 
> > >>> so
> > >>> applied your 5 patches. (But now sure current mmots has recent uptodate
> > >>> patches)
> > >>> "revert" means I reverted your 5 patches in current mmots.
> > >>>
> > >>>                   revert    lz4-update
> > >>>
> > >>>       seq-write     1547          1339    86.55%
> > >>>      rand-write    22775         19381    85.10%
> > >>>        seq-read     7035          5589    79.45%
> > >>>       rand-read    78556         68479    87.17%
> > >>>    mixed-seq(R)     1305          1066    81.69%
> > >>>    mixed-seq(W)     1205           984    81.66%
> > >>>   mixed-rand(R)    17421         14993    86.06%
> > >>>   mixed-rand(W)    17391         14968    86.07%
> > >>
> > >> which parts of the output (as well as units) are these values exactly?
> > >> I did not work with fio until now, so I think I might ask before 
> > >> misinterpreting my results.
> > >
> > > It is IOPS.
> > >
> > >>  
> > >>> My fio description file
> > >>>
> > >>> [global]
> > >>> bs=4k
> > >>> ioengine=sync
> > >>> size=100m
> > >>> numjobs=1
> > >>> group_reporting
> > >>> buffer_compress_percentage=30
> > >>> scramble_buffers=0
> > >>> filename=/dev/zram0
> > >>> loops=10
> > >>> fsync_on_close=1
> > >>>
> > >>> [seq-write]
> > >>> bs=64k
> > >>> rw=write
> > >>> stonewall
> > >>>
> > >>> [rand-write]
> > >>> rw=randwrite
> > >>> stonewall
> > >>>
> > >>> [seq-read]
> > >>> bs=64k
> > >>> rw=read
> > >>> stonewall
> > >>>
> > >>> [rand-read]
> > >>> rw=randread
> > >>> stonewall
> > >>>
> > >>> [mixed-seq]
> > >>> bs=64k
> > >>> rw=rw
> > >>> stonewall
> > >>>
> > >>> [mixed-rand]
> > >>> rw=randrw
> > >>> stonewall
> > >>>
> > >>
> > >> Great, this makes it easy for me to reproduce your test.
> > >
> > > If you have trouble to reproduce, feel free to ask me. I'm happy to test 
> > > it. :)
> > >
> > > Thanks!
> > >
> > 
> > Hi Minchan,
> > 
> > I will send an updated patch as a reply to this E-Mail. Would be really 
> > grateful If you'd test it and provide feedback!
> > The patch should be applied to the current mmots tree.
> > 
> > In fact, the updated LZ4 _is_ slower than the 

[PATCH] hw_random: update help description for omap-rng

2017-02-13 Thread Russell King
omap-rng also supports Marvell Armada 7k/8k SoCs, but no mention of this
is made in the help text, despite the dependency being added. Explicitly
mention these SoCs in the help description so people know that it covers
more than just TI SoCs.

Fixes: 383212425c92 ("hwrng: omap - Add device variant for SafeXcel IP-76 found 
in Armada 8K")
Signed-off-by: Russell King 
---
 drivers/char/hw_random/Kconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/char/hw_random/Kconfig b/drivers/char/hw_random/Kconfig
index ceff2fc524b1..0cafe08919c9 100644
--- a/drivers/char/hw_random/Kconfig
+++ b/drivers/char/hw_random/Kconfig
@@ -172,8 +172,8 @@ config HW_RANDOM_OMAP
default HW_RANDOM
---help---
  This driver provides kernel-side support for the Random Number
- Generator hardware found on OMAP16xx, OMAP2/3/4/5 and AM33xx/AM43xx
- multimedia processors.
+ Generator hardware found on OMAP16xx, OMAP2/3/4/5, AM33xx/AM43xx
+ multimedia processors, and Marvell Armada 7k/8k SoCs.
 
  To compile this driver as a module, choose M here: the
  module will be called omap-rng.
-- 
2.7.4



Re: [PATCH] lz4: fix performance regressions

2017-02-13 Thread Sven Schmidt
On Sun, Feb 12, 2017 at 10:41:17PM +0100, Willy Tarreau wrote:
> On Sun, Feb 12, 2017 at 04:20:00PM +0100, Sven Schmidt wrote:
> > On Sun, Feb 12, 2017 at 02:05:08PM +0100, Willy Tarreau wrote:
> > > Hi Sven,
> > > 
> > > On Sun, Feb 12, 2017 at 12:16:18PM +0100, Sven Schmidt wrote:
> > > > Fix performance regressions compared to current kernel LZ4
> > > 
> > > Your patch contains mostly style cleanups which certainly are welcome
> > > but make the whole patch hard to review. These cleanups would have been
> > > better into a separate, preliminary patch IMHO.
> > > 
> > > Regards,
> > > Willy
> > 
> > Hi Willy,
> > 
> > the problem was, I wanted to compare my version to the upstream LZ4 to find 
> > bugs (as with my last patch version: wrong indentation in LZ4HC 
> > in two for loops). But since the LZ4 code is a pain to read, I made 
> > additional style cleanups "on the way".
> 
> Oh I can easily understand!
> 
> > Hope you can manage to review the patch though, because it is difficult to 
> > separate the cleanups now.
> 
> When I need to split a patch into pieces, usually what I do is that I
> revert it, re-apply it without committing, then "git add -p", validate
> all the hunks to be taken as the first patch (ie here the cleanups),
> commit, then commit the rest as a separate one. It seems to me that the
> fix is in the last few hunks though I'm not sure yet.
> 
> Thanks,
> Willy

Hi Willy,

I didn't know about this 'trick' until now. Thanks for sharing it! I gave it a 
short try recently, that's really cool!

Since the problem discussed in this branch of this thread seems to be solved 
(see Minchans E-Mail), I won't split the patches, though.
Or is there an actual need for doing so? I will send an updated patchset 
(containing these patches + the other ones suggested by Eric) later.

Regards,

Sven


[PATCH v3 1/2] crypto: skcipher AF_ALG - overhaul memory management

2017-02-13 Thread Stephan Müller
The updated memory management is described in the top part of the code.
As one benefit of the changed memory management, the AIO and synchronous
operations are now implemented in one common function. The AF_ALG
operation uses the async kernel crypto API interface for each cipher
operation. Thus, the only difference between the AIO and sync operation
types visible from user space is:

1. the callback function to be invoked when the asynchronous operation
   is completed

2. whether to wait for the completion of the kernel crypto API operation
   or not

In addition, the code structure is adjusted to match the structure of
algif_aead for easier code assessment.

The user space interface changed slightly as follows: the old AIO
operation returned zero upon success and < 0 in case of an error to user
space. As all other AF_ALG interfaces (including the sync skcipher
interface) returned the number of processed bytes upon success and < 0
in case of an error, the new skcipher interface (regardless of AIO or
sync) returns the number of processed bytes in case of success.
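
For context only (this is not part of the patch), a minimal user-space
sketch of the synchronous skcipher flow, illustrating the "number of
processed bytes" return convention; error handling is omitted and the key,
IV and plaintext are dummy values:

	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>
	#include <sys/socket.h>
	#include <linux/if_alg.h>

	#ifndef SOL_ALG
	#define SOL_ALG 279
	#endif

	int main(void)
	{
		struct sockaddr_alg sa = {
			.salg_family = AF_ALG,
			.salg_type   = "skcipher",
			.salg_name   = "cbc(aes)",
		};
		unsigned char key[16] = { 0 };		/* demo key */
		unsigned char ivdata[16] = { 0 };	/* demo IV */
		unsigned char pt[16] = "0123456789abcde";
		unsigned char ct[16];
		char cbuf[CMSG_SPACE(4) +
			  CMSG_SPACE(sizeof(struct af_alg_iv) + 16)] = { 0 };
		struct msghdr msg = { 0 };
		struct iovec iov = { .iov_base = pt, .iov_len = sizeof(pt) };
		struct cmsghdr *cmsg;
		struct af_alg_iv *aiv;
		int tfmfd, opfd;

		tfmfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
		bind(tfmfd, (struct sockaddr *)&sa, sizeof(sa));
		setsockopt(tfmfd, SOL_ALG, ALG_SET_KEY, key, sizeof(key));
		opfd = accept(tfmfd, NULL, 0);

		msg.msg_control = cbuf;
		msg.msg_controllen = sizeof(cbuf);
		msg.msg_iov = &iov;
		msg.msg_iovlen = 1;

		cmsg = CMSG_FIRSTHDR(&msg);		/* select encryption */
		cmsg->cmsg_level = SOL_ALG;
		cmsg->cmsg_type = ALG_SET_OP;
		cmsg->cmsg_len = CMSG_LEN(4);
		*(__u32 *)CMSG_DATA(cmsg) = ALG_OP_ENCRYPT;

		cmsg = CMSG_NXTHDR(&msg, cmsg);		/* pass the IV */
		cmsg->cmsg_level = SOL_ALG;
		cmsg->cmsg_type = ALG_SET_IV;
		cmsg->cmsg_len = CMSG_LEN(sizeof(*aiv) + sizeof(ivdata));
		aiv = (struct af_alg_iv *)CMSG_DATA(cmsg);
		aiv->ivlen = sizeof(ivdata);
		memcpy(aiv->iv, ivdata, sizeof(ivdata));

		sendmsg(opfd, &msg, 0);
		/* As with the reworked AIO path, success reports processed bytes. */
		printf("processed %zd bytes\n", read(opfd, ct, sizeof(ct)));

		close(opfd);
		close(tfmfd);
		return 0;
	}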

Signed-off-by: Stephan Mueller 
---
 crypto/algif_skcipher.c | 477 
 1 file changed, 198 insertions(+), 279 deletions(-)

diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
index a9e79d8..d873de2 100644
--- a/crypto/algif_skcipher.c
+++ b/crypto/algif_skcipher.c
@@ -10,6 +10,25 @@
  * Software Foundation; either version 2 of the License, or (at your option)
  * any later version.
  *
+ * The following concept of the memory management is used:
+ *
+ * The kernel maintains two SGLs, the TX SGL and the RX SGL. The TX SGL is
+ * filled by user space with the data submitted via sendpage/sendmsg. Filling
+ * up the TX SGL does not cause a crypto operation -- the data will only be
+ * tracked by the kernel. Upon receipt of one recvmsg call, the caller must
+ * provide a buffer which is tracked with the RX SGL.
+ *
+ * During the processing of the recvmsg operation, the cipher request is
+ * allocated and prepared. To support multiple recvmsg operations operating
+ * on one TX SGL, an offset pointer into the TX SGL is maintained. The TX SGL
+ * that is used for the crypto request is advanced via scatterwalk_ffwd()
+ * by the offset pointer to obtain the start address the crypto operation
+ * shall use for the input data.
+ *
+ * After the completion of the crypto operation, the RX SGL and the cipher
+ * request is released. The processed TX SGL parts are released together with
+ * the RX SGL release and the offset pointer is reduced by the released
+ * data.
  */
 
 #include 
@@ -31,78 +50,50 @@ struct skcipher_sg_list {
struct scatterlist sg[0];
 };
 
+struct skcipher_rsgl {
+   struct af_alg_sgl sgl;
+   struct list_head list;
+};
+
+struct skcipher_async_req {
+   struct kiocb *iocb;
+   struct sock *sk;
+
+   struct skcipher_rsgl first_sgl;
+   struct list_head rsgl_list;
+
+   unsigned int areqlen;
+   struct skcipher_request req;
+};
+
 struct skcipher_tfm {
struct crypto_skcipher *skcipher;
bool has_key;
 };
 
 struct skcipher_ctx {
-   struct list_head tsgl;
-   struct af_alg_sgl rsgl;
+   struct list_head tsgl_list;
 
void *iv;
 
struct af_alg_completion completion;
 
-   atomic_t inflight;
+   unsigned int inflight;
size_t used;
+   size_t processed;
 
-   unsigned int len;
bool more;
bool merge;
bool enc;
 
-   struct skcipher_request req;
-};
-
-struct skcipher_async_rsgl {
-   struct af_alg_sgl sgl;
-   struct list_head list;
+   unsigned int len;
 };
 
-struct skcipher_async_req {
-   struct kiocb *iocb;
-   struct skcipher_async_rsgl first_sgl;
-   struct list_head list;
-   struct scatterlist *tsg;
-   atomic_t *inflight;
-   struct skcipher_request req;
-};
+static DECLARE_WAIT_QUEUE_HEAD(skcipher_aio_finish_wait);
 
 #define MAX_SGL_ENTS ((4096 - sizeof(struct skcipher_sg_list)) / \
  sizeof(struct scatterlist) - 1)
 
-static void skcipher_free_async_sgls(struct skcipher_async_req *sreq)
-{
-   struct skcipher_async_rsgl *rsgl, *tmp;
-   struct scatterlist *sgl;
-   struct scatterlist *sg;
-   int i, n;
-
-   list_for_each_entry_safe(rsgl, tmp, &sreq->list, list) {
-   af_alg_free_sg(&rsgl->sgl);
-   if (rsgl != &sreq->first_sgl)
-   kfree(rsgl);
-   }
-   sgl = sreq->tsg;
-   n = sg_nents(sgl);
-   for_each_sg(sgl, sg, n, i)
-   put_page(sg_page(sg));
-
-   kfree(sreq->tsg);
-}
-
-static void skcipher_async_cb(struct crypto_async_request *req, int err)
-{
-   struct skcipher_async_req *sreq = req->data;
-   struct kiocb *iocb = sreq->iocb;
-
-   atomic_dec(sreq->inflight);
-   skcipher_free_async_sgls(sreq);
-   kzfree(sreq);
-   iocb->ki_complete(iocb, err, err);
-}
-
 static inline 

[PATCH v3 0/2] crypto: AF_ALG memory management fix

2017-02-13 Thread Stephan Müller
Hi Herbert,

Changes v3:
* in *_pull_tsgl: make sure ctx->processed cannot be less than zero
* perform fuzzing of all input parameters with bogus values

Changes v2:
* import fix from Harsh Jain  to remove SG
  from list before freeing
* fix return code used for ki_complete to match AIO behavior
  with sync behavior
* rename variable list -> tsgl_list
* update the algif_aead patch to include a dynamic TX SGL
  allocation similar to what algif_skcipher does. This allows
  concurrent continuous read/write operations to the extent
  you requested. Although I have not implemented "pairs of
  TX/RX SGLs" as I think that is even more overhead, the
  implementation conceptually defines such pairs. The recvmsg
  call defines how much from the input data is processed.
  The caller can have arbitrary number of sendmsg calls
  where the data is added to the TX SGL before an recvmsg
  asks the kernel to process a given amount (or all) of the
  TX SGL.

With the changes, you will see a lot of code duplication now
as I deliberately tried to use the same struct and variable names,
the same function names and even the same order of functions.
If you agree to this patch, I volunteer to provide a followup
patch that will extract the code duplication into common
functions.

Please find attached memory management updates to

- simplify the code: the old AIO memory management is very
  complex and seemingly very fragile -- the update now
  eliminates all reported bugs in the skcipher and AEAD
  interfaces which allowed the kernel to be crashed by
  an unprivileged user

- streamline the code: there is one code path for AIO and sync
  operation; the code between algif_skcipher and algif_aead
  is very similar (if that patch set is accepted, I volunteer
  to reduce code duplication by moving service operations
  into af_alg.c and to further unify the TX SGL handling)

- unify the AIO and sync operation which only differ in the
  kernel crypto API callback and whether to wait for the
  crypto operation or not

- fix all reported bugs regarding the handling of multiple
  IOCBs.

The following testing was performed:

- stress testing to verify that no memleaks exist

- testing using Tadeusz Struk's AIO test tool (see
  https://github.com/tstruk/afalg_async_test) -- the AEAD test
  is not applicable any more due to the changed user space
  interface; the skcipher test works once the user space
  interface change is honored in the test code

- using the libkcapi test suite, all tests including the
  originally failing ones (AIO with multiple IOCBs) work now --
  the current libkcapi code artificially limits the AEAD
  operation to one IOCB. After altering the libkcapi code
  to allow multiple IOCBs, the testing works flawless.

Stephan Mueller (2):
  crypto: skcipher AF_ALG - overhaul memory management
  crypto: aead AF_ALG - overhaul memory management

 crypto/algif_aead.c | 673 +---
 crypto/algif_skcipher.c | 477 ++
 2 files changed, 554 insertions(+), 596 deletions(-)

-- 
2.9.3




[PATCH v3 2/2] crypto: aead AF_ALG - overhaul memory management

2017-02-13 Thread Stephan Müller
The updated memory management is described in the top part of the code.
As one benefit of the changed memory management, the AIO and synchronous
operations are now implemented in one common function. The AF_ALG
operation uses the async kernel crypto API interface for each cipher
operation. Thus, the only difference between the AIO and sync operation
types visible from user space is:

1. the callback function to be invoked when the asynchronous operation
   is completed

2. whether to wait for the completion of the kernel crypto API operation
   or not

The change includes the overhaul of the TX and RX SGL handling. The TX
SGL holding the data sent from user space to the kernel is now dynamic
similar to algif_skcipher. This dynamic nature allows one thread to
continuously send data while a second thread receives the data. These
threads do not need to synchronize, as the kernel processes as much data
from the TX SGL as is needed to fill the RX SGL.

The caller reading the data from the kernel defines the amount of data
to be processed. Considering that the interface covers AEAD
authenticating ciphers, the reader must provide the buffer in the
correct size. Thus the reader defines the encryption size.

Signed-off-by: Stephan Mueller 
---
 crypto/algif_aead.c | 673 +++-
 1 file changed, 356 insertions(+), 317 deletions(-)

diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c
index 533265f..ed49fce 100644
--- a/crypto/algif_aead.c
+++ b/crypto/algif_aead.c
@@ -11,6 +11,26 @@
  * under the terms of the GNU General Public License as published by the Free
  * Software Foundation; either version 2 of the License, or (at your option)
  * any later version.
+ *
+ * The following concept of the memory management is used:
+ *
+ * The kernel maintains two SGLs, the TX SGL and the RX SGL. The TX SGL is
+ * filled by user space with the data submitted via sendpage/sendmsg. Filling
+ * up the TX SGL does not cause a crypto operation -- the data will only be
+ * tracked by the kernel. Upon receipt of one recvmsg call, the caller must
+ * provide a buffer which is tracked with the RX SGL.
+ *
+ * During the processing of the recvmsg operation, the cipher request is
+ * allocated and prepared. To support multiple recvmsg operations operating
+ * on one TX SGL, an offset pointer into the TX SGL is maintained. The TX SGL
+ * that is used for the crypto request is advanced via scatterwalk_ffwd()
+ * by the offset pointer to obtain the start address the crypto operation
+ * shall use for the input data.
+ *
+ * After the completion of the crypto operation, the RX SGL and the cipher
+ * request is released. The processed TX SGL parts are released together with
+ * the RX SGL release and the offset pointer is reduced by the released
+ * data.
  */
 
 #include 
@@ -24,45 +44,55 @@
 #include 
 #include 
 
-struct aead_sg_list {
-   unsigned int cur;
-   struct scatterlist sg[ALG_MAX_PAGES];
+struct aead_tsgl {
+   struct list_head list;
+   unsigned int cur;   /* Last processed SG entry */
+   struct scatterlist sg[0];   /* Array of SGs forming the SGL */
 };
 
-struct aead_async_rsgl {
+struct aead_rsgl {
struct af_alg_sgl sgl;
struct list_head list;
 };
 
 struct aead_async_req {
-   struct scatterlist *tsgl;
-   struct aead_async_rsgl first_rsgl;
-   struct list_head list;
struct kiocb *iocb;
-   unsigned int tsgls;
-   char iv[];
+   struct sock *sk;
+
+   struct aead_rsgl first_rsgl;/* First RX SG */
+   struct list_head rsgl_list; /* Track RX SGs */
+
+   unsigned int outlen;/* Filled output buf length */
+
+   unsigned int areqlen;   /* Length of this data struct */
+   struct aead_request aead_req;   /* req ctx trails this struct */
 };
 
 struct aead_ctx {
-   struct aead_sg_list tsgl;
-   struct aead_async_rsgl first_rsgl;
-   struct list_head list;
+   struct list_head tsgl_list; /* Link to TX SGL */
 
void *iv;
+   size_t aead_assoclen;
 
-   struct af_alg_completion completion;
+   struct af_alg_completion completion;/* sync work queue */
 
-   unsigned long used;
+   unsigned int inflight;  /* Outstanding AIO ops */
+   size_t used;/* TX bytes sent to kernel */
+   size_t processed;   /* Processed TX bytes */
 
-   unsigned int len;
-   bool more;
-   bool merge;
-   bool enc;
+   bool more;  /* More data to be expected? */
+   bool merge; /* Merge new data into existing SG */
+   bool enc;   /* Crypto operation: enc, dec */
 
-   size_t aead_assoclen;
-   struct aead_request aead_req;
+   unsigned int len;   /* Length of allocated memory for this struct */
+   struct crypto_aead *aead_tfm;
 };
 
+static DECLARE_WAIT_QUEUE_HEAD(aead_aio_finish_wait);
+
+#define MAX_SGL_ENTS ((4096 - 

Re: [PATCH v3 3/4] dmaengine: Add Broadcom SBA RAID driver

2017-02-13 Thread Anup Patel
On Fri, Feb 10, 2017 at 11:20 PM, Dan Williams  wrote:
> On Fri, Feb 10, 2017 at 1:07 AM, Anup Patel  wrote:
>> The Broadcom stream buffer accelerator (SBA) provides offloading
>> capabilities for RAID operations. This SBA offload engine is
>> accessible via Broadcom SoC specific ring manager.
>>
>> This patch adds Broadcom SBA RAID driver which provides one
>> DMA device with RAID capabilities using one or more Broadcom
>> SoC specific ring manager channels. The SBA RAID driver in its
>> current shape implements memcpy, xor, and pq operations.
>>
>> Signed-off-by: Anup Patel 
>> Reviewed-by: Ray Jui 
>> ---
>>  drivers/dma/Kconfig|   13 +
>>  drivers/dma/Makefile   |1 +
>>  drivers/dma/bcm-sba-raid.c | 1711 
>> 
>>  3 files changed, 1725 insertions(+)
>>  create mode 100644 drivers/dma/bcm-sba-raid.c
>>
>> diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
>> index 263495d..bf8fb84 100644
>> --- a/drivers/dma/Kconfig
>> +++ b/drivers/dma/Kconfig
>> @@ -99,6 +99,19 @@ config AXI_DMAC
>>   controller is often used in Analog Device's reference designs for 
>> FPGA
>>   platforms.
>>
>> +config BCM_SBA_RAID
>> +   tristate "Broadcom SBA RAID engine support"
>> +   depends on (ARM64 && MAILBOX && RAID6_PQ) || COMPILE_TEST
>> +   select DMA_ENGINE
>> +   select DMA_ENGINE_RAID
>> +   select ASYNC_TX_ENABLE_CHANNEL_SWITCH
>
> ASYNC_TX_ENABLE_CHANNEL_SWITCH violates the DMA mapping API and
> Russell has warned it's especially problematic on ARM [1].  If you
> need channel switching for this offload engine to be useful then you
> need to move DMA mapping and channel switching responsibilities to MD
> itself.
>
> [1]: 
> http://lists.infradead.org/pipermail/linux-arm-kernel/2011-January/036753.html

Actually, the driver works fine with or without
ASYNC_TX_ENABLE_CHANNEL_SWITCH enabled,
so I am fine with removing the dependency on this config option.

>
>
> [..]
>> diff --git a/drivers/dma/bcm-sba-raid.c b/drivers/dma/bcm-sba-raid.c
>> new file mode 100644
>> index 000..bab9918
>> --- /dev/null
>> +++ b/drivers/dma/bcm-sba-raid.c
>> @@ -0,0 +1,1711 @@
>> +/*
>> + * Copyright (C) 2017 Broadcom
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + */
>> +
>> +/*
>> + * Broadcom SBA RAID Driver
>> + *
>> + * The Broadcom stream buffer accelerator (SBA) provides offloading
>> + * capabilities for RAID operations. The SBA offload engine is accessible
>> + * via Broadcom SoC specific ring manager. Two or more offload engines
>> + * can share same Broadcom SoC specific ring manager due to this Broadcom
>> + * SoC specific ring manager driver is implemented as a mailbox controller
>> + * driver and offload engine drivers are implemented as mailbox clients.
>> + *
>> + * Typically, Broadcom SoC specific ring manager will implement larger
>> + * number of hardware rings over one or more SBA hardware devices. By
>> + * design, the internal buffer size of SBA hardware device is limited
>> + * but all offload operations supported by SBA can be broken down into
>> + * multiple small size requests and executed in parallel on multiple SBA
>> + * hardware devices for achieving high throughput.
>> + *
>> + * The Broadcom SBA RAID driver does not require any register programming
>> + * except submitting request to SBA hardware device via mailbox channels.
>> + * This driver implements a DMA device with one DMA channel using a set
>> + * of mailbox channels provided by Broadcom SoC specific ring manager
>> + * driver. To exploit parallelism (as described above), all DMA request
>> + * coming to SBA RAID DMA channel are broken down to smaller requests
>> + * and submitted to multiple mailbox channels in round-robin fashion.
>> + * For having more SBA DMA channels, we can create more SBA device nodes
>> + * in Broadcom SoC specific DTS based on number of hardware rings supported
>> + * by Broadcom SoC ring manager.
>> + */
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include "dmaengine.h"
>> +
>> +/* SBA command helper macros */
>> +#define SBA_DEC(_d, _s, _m)(((_d) >> (_s)) & (_m))
>> +#define SBA_ENC(_d, _v, _s, _m)\
>> +   do {\
>> +   (_d) &= ~((u64)(_m) << (_s));   \
>> +   (_d) |= (((u64)(_v) & (_m)) << (_s));   \
>> +   } while (0)
>
> Reusing a macro argument multiple times is problematic, consider
> SBA_ENC(..., arg++, ...), and hiding assignments in a macro makes this
> hard to read. The compiler should inline it
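
For illustration, here is a sketch of how the macro could become an inline
function so each argument is evaluated exactly once and the assignment is
explicit (the v4 changelog mentions a sba_cmd_enc() helper; the body below
is an assumed equivalent, not quoted from the driver):

	static inline u64 sba_cmd_enc(u64 cmd, u32 val, u32 shift, u32 mask)
	{
		cmd &= ~((u64)mask << shift);
		cmd |= ((u64)(val & mask) << shift);
		return cmd;
	}

	/* Usage: cmd = sba_cmd_enc(cmd, type, SBA_TYPE_SHIFT, SBA_TYPE_MASK); */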