[bug report] crypto: brcm - Add Broadcom SPU driver
Hello Rob Rice,

The patch 9d12ba86f818: "crypto: brcm - Add Broadcom SPU driver" from Feb 3, 2017, leads to the following static checker warning:

	drivers/crypto/bcm/cipher.c:2340 ahash_finup()
	warn: 'tmpbuf' was already freed.

drivers/crypto/bcm/cipher.c
  2316		/* Copy data from req scatterlist to tmp buffer */
  2317		gfp = (req->base.flags & (CRYPTO_TFM_REQ_MAY_BACKLOG |
  2318		       CRYPTO_TFM_REQ_MAY_SLEEP)) ? GFP_KERNEL : GFP_ATOMIC;
  2319		tmpbuf = kmalloc(req->nbytes, gfp);
  2320		if (!tmpbuf) {
  2321			ret = -ENOMEM;
  2322			goto ahash_finup_exit;
  2323		}
  2324
  2325		if (sg_copy_to_buffer(req->src, nents, tmpbuf, req->nbytes) !=
  2326		    req->nbytes) {
  2327			ret = -EINVAL;
  2328			goto ahash_finup_free;
  2329		}
  2330
  2331		/* Call synchronous update */
  2332		ret = crypto_shash_finup(ctx->shash, tmpbuf, req->nbytes,
  2333					 req->result);
  2334		kfree(tmpbuf);
		^
  2335	} else {
  2336		/* Otherwise call the internal function which uses SPU hw */
  2337		return __ahash_finup(req);
  2338	}
  2339  ahash_finup_free:
  2340		kfree(tmpbuf);
		^
  2341
  2342  ahash_finup_exit:
  2343		/* Done with hash, can deallocate it now */
  2344		crypto_free_shash(ctx->shash->tfm);
  2345		kfree(ctx->shash);
  2346		return ret;
  2347	}

I'm only working 30 minutes per day to keep a hand in, so I'm not sending patches this month.

regards,
dan carpenter
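The warning flags a double free: the success path calls kfree(tmpbuf) at line 2334 and then falls through to the ahash_finup_free label, which frees it again. A small userspace sketch of the single-free structure the fix would want, with malloc/free standing in for kmalloc/kfree and a hypothetical finup_fallback() standing in for the driver function:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the finup fallback path: allocate a bounce
 * buffer, copy into it, run the operation, and free the buffer on
 * exactly one exit path.  The point of the sketch is that the success
 * path must NOT free and then also fall through to a cleanup label
 * that frees again. */
static int finup_fallback(const unsigned char *src, size_t len, int *result)
{
	unsigned char *tmpbuf;
	size_t i;

	tmpbuf = malloc(len);		/* kmalloc(req->nbytes, gfp) */
	if (!tmpbuf)
		return -ENOMEM;

	memcpy(tmpbuf, src, len);	/* sg_copy_to_buffer() stand-in */

	*result = 0;			/* crypto_shash_finup() stand-in */
	for (i = 0; i < len; i++)
		*result += tmpbuf[i];

	/* single free on the single exit path -- no early free() above */
	free(tmpbuf);
	return 0;
}
```

The equivalent kernel fix is simply to drop the early kfree() and let the success path reach the common ahash_finup_free label once.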
[PATCH v4 4/4] dt-bindings: Add DT bindings document for Broadcom SBA RAID driver
This patch adds the DT bindings document for the newly added Broadcom SBA RAID driver.

Signed-off-by: Anup Patel
Reviewed-by: Ray Jui
Reviewed-by: Scott Branden
---
 .../devicetree/bindings/dma/brcm,iproc-sba.txt | 29 ++
 1 file changed, 29 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt

diff --git a/Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt b/Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt
new file mode 100644
index 000..092913a
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt
@@ -0,0 +1,29 @@
+* Broadcom SBA RAID engine
+
+Required properties:
+- compatible: Should be one of the following
+  "brcm,iproc-sba"
+  "brcm,iproc-sba-v2"
+  The "brcm,iproc-sba" has support for only 6 PQ coefficients
+  The "brcm,iproc-sba-v2" has support for only 30 PQ coefficients
+- mboxes: List of phandle and mailbox channel specifiers
+
+Example:
+
+raid_mbox: mbox@6740 {
+	...
+	#mbox-cells = <3>;
+	...
+};
+
+raid0 {
+	compatible = "brcm,iproc-sba-v2";
+	mboxes = <_mbox 0 0x1 0x>,
+		 <_mbox 1 0x1 0x>,
+		 <_mbox 2 0x1 0x>,
+		 <_mbox 3 0x1 0x>,
+		 <_mbox 4 0x1 0x>,
+		 <_mbox 5 0x1 0x>,
+		 <_mbox 6 0x1 0x>,
+		 <_mbox 7 0x1 0x>;
+};
--
2.7.4
[PATCH v4 0/4] Broadcom SBA RAID support
The Broadcom SBA RAID is a stream-based device which provides RAID5/6 offload. It requires a SoC specific ring manager (such as the Broadcom FlexRM ring manager) to provide a ring-based programming interface. Due to this, the Broadcom SBA RAID driver (mailbox client) implements a DMA device having one DMA channel using a set of mailbox channels provided by the Broadcom SoC specific ring manager driver (mailbox controller).

The Broadcom SBA RAID hardware requires a PQ disk position instead of a PQ disk coefficient. To address this, we have added a raid6_gflog table which helps the driver convert a PQ disk coefficient to a PQ disk position.

This patchset is based on Linux-4.10-rc2 and depends on the patchset "[PATCH v4 0/2] Broadcom FlexRM ring manager support".

It is also available at the sba-raid-v4 branch of https://github.com/Broadcom/arm64-linux.git

Changes since v3:
- Replaced SBA_ENC() with sba_cmd_enc() inline function
- Use list_first_entry_or_null() wherever possible
- Remove unwanted braces around loops wherever possible
- Use lockdep_assert_held() where required

Changes since v2:
- Dropped patch to handle DMA devices having support for fewer PQ
  coefficients in Linux Async Tx
- Added work-around in bcm-sba-raid driver to handle unsupported
  PQ coefficients using multiple SBA requests

Changes since v1:
- Dropped patch to add mbox_channel_device() API
- Used GENMASK and BIT macros wherever possible in bcm-sba-raid driver
- Replaced C_MDATA macros with static inline functions in
  bcm-sba-raid driver
- Removed sba_alloc_chan_resources() callback in bcm-sba-raid driver
- Used dev_err() instead of dev_info() wherever applicable
- Removed call to sba_issue_pending() from sba_tx_submit() in
  bcm-sba-raid driver
- Implemented SBA request chaining for handling (len > sba->req_size)
  in bcm-sba-raid driver
- Implemented device_terminate_all() callback in bcm-sba-raid driver

Anup Patel (4):
  lib/raid6: Add log-of-2 table for RAID6 HW requiring disk position
  async_tx: Fix DMA_PREP_FENCE usage in do_async_gen_syndrome()
  dmaengine: Add Broadcom SBA RAID driver
  dt-bindings: Add DT bindings document for Broadcom SBA RAID driver

 .../devicetree/bindings/dma/brcm,iproc-sba.txt |   29 +
 crypto/async_tx/async_pq.c                     |    5 +-
 drivers/dma/Kconfig                            |   13 +
 drivers/dma/Makefile                           |    1 +
 drivers/dma/bcm-sba-raid.c                     | 1694
 include/linux/raid/pq.h                        |    1 +
 lib/raid6/mktables.c                           |   20 +
 7 files changed, 1760 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt

--
2.7.4
[PATCH v4 3/4] dmaengine: Add Broadcom SBA RAID driver
The Broadcom stream buffer accelerator (SBA) provides offloading capabilities for RAID operations. This SBA offload engine is accessible via a Broadcom SoC specific ring manager.

This patch adds the Broadcom SBA RAID driver, which provides one DMA device with RAID capabilities using one or more Broadcom SoC specific ring manager channels. The SBA RAID driver in its current shape implements memcpy, xor, and pq operations.

Signed-off-by: Anup Patel
Reviewed-by: Ray Jui
---
 drivers/dma/Kconfig        |   13 +
 drivers/dma/Makefile       |    1 +
 drivers/dma/bcm-sba-raid.c | 1694
 3 files changed, 1708 insertions(+)
 create mode 100644 drivers/dma/bcm-sba-raid.c

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 263495d..bf8fb84 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -99,6 +99,19 @@ config AXI_DMAC
 	  controller is often used in Analog Device's reference designs for FPGA
 	  platforms.
 
+config BCM_SBA_RAID
+	tristate "Broadcom SBA RAID engine support"
+	depends on (ARM64 && MAILBOX && RAID6_PQ) || COMPILE_TEST
+	select DMA_ENGINE
+	select DMA_ENGINE_RAID
+	select ASYNC_TX_ENABLE_CHANNEL_SWITCH
+	default ARCH_BCM_IPROC
+	help
+	  Enable support for Broadcom SBA RAID Engine. The SBA RAID
+	  engine is available on most of the Broadcom iProc SoCs. It
+	  has the capability to offload memcpy, xor and pq computation
+	  for raid5/6.
+
 config COH901318
 	bool "ST-Ericsson COH901318 DMA support"
 	select DMA_ENGINE

diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index a4fa336..ba96bdd 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_AMCC_PPC440SPE_ADMA) += ppc4xx/
 obj-$(CONFIG_AT_HDMAC) += at_hdmac.o
 obj-$(CONFIG_AT_XDMAC) += at_xdmac.o
 obj-$(CONFIG_AXI_DMAC) += dma-axi-dmac.o
+obj-$(CONFIG_BCM_SBA_RAID) += bcm-sba-raid.o
 obj-$(CONFIG_COH901318) += coh901318.o coh901318_lli.o
 obj-$(CONFIG_DMA_BCM2835) += bcm2835-dma.o
 obj-$(CONFIG_DMA_JZ4740) += dma-jz4740.o

diff --git a/drivers/dma/bcm-sba-raid.c b/drivers/dma/bcm-sba-raid.c
new file mode 100644
index 000..279e5e2
--- /dev/null
+++ b/drivers/dma/bcm-sba-raid.c
@@ -0,0 +1,1694 @@
+/*
+ * Copyright (C) 2017 Broadcom
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/*
+ * Broadcom SBA RAID Driver
+ *
+ * The Broadcom stream buffer accelerator (SBA) provides offloading
+ * capabilities for RAID operations. The SBA offload engine is accessible
+ * via Broadcom SoC specific ring manager. Two or more offload engines
+ * can share the same Broadcom SoC specific ring manager; due to this, the
+ * Broadcom SoC specific ring manager driver is implemented as a mailbox
+ * controller driver and offload engine drivers are implemented as
+ * mailbox clients.
+ *
+ * Typically, a Broadcom SoC specific ring manager will implement a larger
+ * number of hardware rings over one or more SBA hardware devices. By
+ * design, the internal buffer size of an SBA hardware device is limited,
+ * but all offload operations supported by SBA can be broken down into
+ * multiple small-size requests and executed in parallel on multiple SBA
+ * hardware devices to achieve high throughput.
+ *
+ * The Broadcom SBA RAID driver does not require any register programming
+ * except submitting requests to the SBA hardware device via mailbox
+ * channels. This driver implements a DMA device with one DMA channel
+ * using a set of mailbox channels provided by the Broadcom SoC specific
+ * ring manager driver. To exploit parallelism (as described above), all
+ * DMA requests coming to the SBA RAID DMA channel are broken down into
+ * smaller requests and submitted to multiple mailbox channels in
+ * round-robin fashion. To have more SBA DMA channels, we can create more
+ * SBA device nodes in the Broadcom SoC specific DTS based on the number
+ * of hardware rings supported by the Broadcom SoC ring manager.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "dmaengine.h"
+
+/* SBA command related defines */
+#define SBA_TYPE_SHIFT		48
+#define SBA_TYPE_MASK		GENMASK(1, 0)
+#define SBA_TYPE_A		0x0
+#define SBA_TYPE_B		0x2
+#define SBA_TYPE_C		0x3
+#define SBA_USER_DEF_SHIFT	32
+#define SBA_USER_DEF_MASK	GENMASK(15, 0)
+#define SBA_R_MDATA_SHIFT	24
+#define SBA_R_MDATA_MASK	GENMASK(7, 0)
+#define SBA_C_MDATA_MS_SHIFT	18
+#define
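The round-robin splitting described in the driver comment above can be sketched outside the kernel as follows. split_round_robin() is a hypothetical stand-in for the driver's request-breaking logic, not actual driver code: one large request is cut into sub-requests of at most the SBA buffer size, each handed to the next mailbox channel in turn.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch (not driver code): break one DMA request of
 * 'len' bytes into sub-requests of at most 'req_size' bytes and hand
 * them to 'nr_chans' mailbox channels in round-robin order.  Returns
 * the number of sub-requests; chans[i] records the channel used for
 * sub-request i. */
static size_t split_round_robin(size_t len, size_t req_size,
				size_t nr_chans, size_t *chans, size_t max)
{
	size_t n = 0, next = 0;

	while (len && n < max) {
		size_t chunk = len < req_size ? len : req_size;

		chans[n++] = next;		/* submit chunk on channel 'next' */
		next = (next + 1) % nr_chans;	/* round-robin to the next ring */
		len -= chunk;
	}
	return n;
}
```

With, say, a 100-byte request, a 32-byte limit, and 3 channels, this yields four sub-requests spread over channels 0, 1, 2, 0 — which is how the driver gets parallelism across hardware rings.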
[PATCH v4 1/4] lib/raid6: Add log-of-2 table for RAID6 HW requiring disk position
The raid6_gfexp table represents {2}^n values for 0 <= n < 256. The Linux async_tx framework passes values from raid6_gfexp as coefficients for each source to the prep_dma_pq() callback of a DMA channel with PQ capability. This creates a problem for RAID6 offload engines (such as Broadcom SBA) which take a disk position (i.e. the log of {2}) instead of multiplicative coefficients from the raid6_gfexp table.

This patch adds a raid6_gflog table having the log-of-2 value for any given x such that 0 <= x < 256. For any given disk coefficient x, the corresponding disk position is given by raid6_gflog[x]. A RAID6 offload engine driver can use this newly added raid6_gflog table to get the disk position from a multiplicative coefficient.

Signed-off-by: Anup Patel
Reviewed-by: Scott Branden
Reviewed-by: Ray Jui
---
 include/linux/raid/pq.h |  1 +
 lib/raid6/mktables.c    | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/include/linux/raid/pq.h b/include/linux/raid/pq.h
index 4d57bba..30f9453 100644
--- a/include/linux/raid/pq.h
+++ b/include/linux/raid/pq.h
@@ -142,6 +142,7 @@ int raid6_select_algo(void);
 extern const u8 raid6_gfmul[256][256] __attribute__((aligned(256)));
 extern const u8 raid6_vgfmul[256][32] __attribute__((aligned(256)));
 extern const u8 raid6_gfexp[256] __attribute__((aligned(256)));
+extern const u8 raid6_gflog[256] __attribute__((aligned(256)));
 extern const u8 raid6_gfinv[256] __attribute__((aligned(256)));
 extern const u8 raid6_gfexi[256] __attribute__((aligned(256)));

diff --git a/lib/raid6/mktables.c b/lib/raid6/mktables.c
index 39787db..e824d08 100644
--- a/lib/raid6/mktables.c
+++ b/lib/raid6/mktables.c
@@ -125,6 +125,26 @@ int main(int argc, char *argv[])
 	printf("EXPORT_SYMBOL(raid6_gfexp);\n");
 	printf("#endif\n");
 
+	/* Compute log-of-2 table */
+	printf("\nconst u8 __attribute__((aligned(256)))\n"
+	       "raid6_gflog[256] =\n" "{\n");
+	for (i = 0; i < 256; i += 8) {
+		printf("\t");
+		for (j = 0; j < 8; j++) {
+			v = 255;
+			for (k = 0; k < 256; k++)
+				if (exptbl[k] == (i + j)) {
+					v = k;
+					break;
+				}
+			printf("0x%02x,%c", v, (j == 7) ? '\n' : ' ');
+		}
+	}
+	printf("};\n");
+	printf("#ifdef __KERNEL__\n");
+	printf("EXPORT_SYMBOL(raid6_gflog);\n");
+	printf("#endif\n");
+
 	/* Compute inverse table x^-1 == x^254 */
 	printf("\nconst u8 __attribute__((aligned(256)))\n"
 	       "raid6_gfinv[256] =\n" "{\n");
--
2.7.4
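The relationship between the two tables is easy to reproduce in userspace: gfexp is 2^n over GF(2^8) with the RAID-6 polynomial x^8 + x^4 + x^3 + x^2 + 1 (feedback 0x1d), and gflog is its inverse, with entries that have no preimage left at 255 — matching the v = 255 default in the hunk above. A sketch, not mktables.c itself:

```c
#include <assert.h>
#include <stdint.h>

/* Build the power-of-2 table and its inverse log-of-2 table over
 * GF(2^8) with the RAID-6 reduction polynomial (feedback 0x1d).
 * gflog[x] gives the disk position n for a multiplicative
 * coefficient x == 2^n; values never produced stay 255. */
static void build_tables(uint8_t gfexp[256], uint8_t gflog[256])
{
	uint8_t v = 1;
	int n, x;

	for (n = 0; n < 256; n++) {
		gfexp[n] = v;
		/* multiply by 2 in GF(2^8): shift, reduce on overflow */
		v = (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0));
	}

	for (x = 0; x < 256; x++)
		gflog[x] = 255;			/* "no preimage" default */
	for (n = 254; n >= 0; n--)		/* smallest exponent wins */
		gflog[gfexp[n]] = (uint8_t)n;
}
```

Since 2 is a primitive element here, the exponents 0..254 hit every nonzero byte exactly once, so gflog[gfexp[n]] == n for 0 <= n < 255 and gflog[0] stays 255.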
[PATCH v4 2/4] async_tx: Fix DMA_PREP_FENCE usage in do_async_gen_syndrome()
The DMA_PREP_FENCE flag is to be used when preparing a Tx descriptor if the output of that Tx descriptor is to be used by the next/dependent Tx descriptor. DMA_PREP_FENCE will not be set correctly in do_async_gen_syndrome() when calling dma->device_prep_dma_pq() under the following conditions:
1. ASYNC_TX_FENCE is not set in submit->flags
2. DMA_PREP_FENCE is not set in dma_flags
3. src_cnt (= disks - 2) is greater than dma_maxpq(dma, dma_flags)

This patch fixes the DMA_PREP_FENCE usage in do_async_gen_syndrome(), taking inspiration from the do_async_xor() implementation.

Signed-off-by: Anup Patel
Reviewed-by: Ray Jui
Reviewed-by: Scott Branden
---
 crypto/async_tx/async_pq.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/crypto/async_tx/async_pq.c b/crypto/async_tx/async_pq.c
index f83de99..56bd612 100644
--- a/crypto/async_tx/async_pq.c
+++ b/crypto/async_tx/async_pq.c
@@ -62,9 +62,6 @@ do_async_gen_syndrome(struct dma_chan *chan,
 	dma_addr_t dma_dest[2];
 	int src_off = 0;
 
-	if (submit->flags & ASYNC_TX_FENCE)
-		dma_flags |= DMA_PREP_FENCE;
-
 	while (src_cnt > 0) {
 		submit->flags = flags_orig;
 		pq_src_cnt = min(src_cnt, dma_maxpq(dma, dma_flags));
@@ -83,6 +80,8 @@ do_async_gen_syndrome(struct dma_chan *chan,
 			if (cb_fn_orig)
 				dma_flags |= DMA_PREP_INTERRUPT;
 		}
+		if (submit->flags & ASYNC_TX_FENCE)
+			dma_flags |= DMA_PREP_FENCE;
 
 		/* Drivers force forward progress in case they can not provide
 		 * a descriptor
--
2.7.4
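The shape of the fix is easiest to see in a tiny model of the chunking loop: dma_flags is rebuilt for every sub-request, so the fence bit has to be (re)applied inside the loop, or the second and later chunks silently lose it. The names and flag values below are stand-ins, not the async_tx code:

```c
#include <assert.h>
#include <stddef.h>

#define ASYNC_TX_FENCE		0x1	/* stand-ins for the real flag values */
#define DMA_PREP_FENCE		0x2
#define DMA_PREP_CONTINUE	0x4

/* Minimal model of the fixed loop: each iteration starts from fresh
 * dma_flags (as the real loop effectively does when src_cnt exceeds
 * dma_maxpq()), so the fence bit must be derived from submit_flags
 * inside the loop.  out[i] records the flags used for chunk i. */
static size_t prep_chunks(size_t src_cnt, size_t maxpq,
			  unsigned submit_flags, unsigned *out, size_t max)
{
	size_t n = 0;

	while (src_cnt > 0 && n < max) {
		unsigned dma_flags = 0;
		size_t pq_src_cnt = src_cnt < maxpq ? src_cnt : maxpq;

		if (n > 0)
			dma_flags |= DMA_PREP_CONTINUE;
		if (submit_flags & ASYNC_TX_FENCE)
			dma_flags |= DMA_PREP_FENCE;	/* the fix: per chunk */

		out[n++] = dma_flags;
		src_cnt -= pq_src_cnt;
	}
	return n;
}
```

With the bit set only once before the loop, only the first chunk would carry DMA_PREP_FENCE; the per-iteration placement guarantees every sub-descriptor is fenced.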
[PATCH] crypto: fix typo in doc
Fix a single letter typo in api-skcipher.rst.

Signed-off-by: Gilad Ben-Yossef
---
 Documentation/crypto/api-skcipher.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/crypto/api-skcipher.rst b/Documentation/crypto/api-skcipher.rst
index b20028a..4eec4a9 100644
--- a/Documentation/crypto/api-skcipher.rst
+++ b/Documentation/crypto/api-skcipher.rst
@@ -59,4 +59,4 @@ Synchronous Block Cipher API - Deprecated
    :doc: Synchronous Block Cipher API
 
 .. kernel-doc:: include/linux/crypto.h
-   :functions: crypto_alloc_blkcipher rypto_free_blkcipher crypto_has_blkcipher crypto_blkcipher_name crypto_blkcipher_ivsize crypto_blkcipher_blocksize crypto_blkcipher_setkey crypto_blkcipher_encrypt crypto_blkcipher_encrypt_iv crypto_blkcipher_decrypt crypto_blkcipher_decrypt_iv crypto_blkcipher_set_iv crypto_blkcipher_get_iv
+   :functions: crypto_alloc_blkcipher crypto_free_blkcipher crypto_has_blkcipher crypto_blkcipher_name crypto_blkcipher_ivsize crypto_blkcipher_blocksize crypto_blkcipher_setkey crypto_blkcipher_encrypt crypto_blkcipher_encrypt_iv crypto_blkcipher_decrypt crypto_blkcipher_decrypt_iv crypto_blkcipher_set_iv crypto_blkcipher_get_iv
--
2.1.4
Re: [PATCH v3 3/4] dmaengine: Add Broadcom SBA RAID driver
On Fri, Feb 10, 2017 at 11:20 PM, Dan Williams wrote:
> On Fri, Feb 10, 2017 at 1:07 AM, Anup Patel wrote:
>> The Broadcom stream buffer accelerator (SBA) provides offloading
>> capabilities for RAID operations. This SBA offload engine is
>> accessible via Broadcom SoC specific ring manager.
>>
>> This patch adds Broadcom SBA RAID driver which provides one
>> DMA device with RAID capabilities using one or more Broadcom
>> SoC specific ring manager channels. The SBA RAID driver in its
>> current shape implements memcpy, xor, and pq operations.
>>
>> Signed-off-by: Anup Patel
>> Reviewed-by: Ray Jui
>> ---
>>  drivers/dma/Kconfig        |   13 +
>>  drivers/dma/Makefile       |    1 +
>>  drivers/dma/bcm-sba-raid.c | 1711
>>  3 files changed, 1725 insertions(+)
>>  create mode 100644 drivers/dma/bcm-sba-raid.c
>>
>> diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
>> index 263495d..bf8fb84 100644
>> --- a/drivers/dma/Kconfig
>> +++ b/drivers/dma/Kconfig
>> @@ -99,6 +99,19 @@ config AXI_DMAC
>>           controller is often used in Analog Device's reference designs for FPGA
>>           platforms.
>>
>> +config BCM_SBA_RAID
>> +       tristate "Broadcom SBA RAID engine support"
>> +       depends on (ARM64 && MAILBOX && RAID6_PQ) || COMPILE_TEST
>> +       select DMA_ENGINE
>> +       select DMA_ENGINE_RAID
>> +       select ASYNC_TX_ENABLE_CHANNEL_SWITCH
>
> ASYNC_TX_ENABLE_CHANNEL_SWITCH violates the DMA mapping API and
> Russell has warned it's especially problematic on ARM [1]. If you
> need channel switching for this offload engine to be useful then you
> need to move DMA mapping and channel switching responsibilities to MD
> itself.
>
> [1]: http://lists.infradead.org/pipermail/linux-arm-kernel/2011-January/036753.html

In the case of BCM-SBA-RAID, the underlying "struct device" for each DMA channel is the mailbox controller "struct device" (i.e. the Ring Manager device). This is because the Ring Manager HW is the front-facing device which we program to submit work to the BCM-SBA-RAID HW.

This means the DMA channels provided by the BCM-SBA-RAID driver all use the same "struct device" for DMA mappings, hence channel switching between BCM-SBA-RAID DMA channels is safe.

Due to the above, we can safely enable the ASYNC_TX_ENABLE_CHANNEL_SWITCH option for the BCM-SBA-RAID driver.

Regards,
Anup
Re: [PATCH v3 3/4] dmaengine: Add Broadcom SBA RAID driver
On Mon, Feb 13, 2017 at 2:43 PM, Anup Patel wrote:
> On Fri, Feb 10, 2017 at 11:20 PM, Dan Williams wrote:
>> On Fri, Feb 10, 2017 at 1:07 AM, Anup Patel wrote:
>>> The Broadcom stream buffer accelerator (SBA) provides offloading
>>> capabilities for RAID operations. This SBA offload engine is
>>> accessible via Broadcom SoC specific ring manager.
>>>
>>> This patch adds Broadcom SBA RAID driver which provides one
>>> DMA device with RAID capabilities using one or more Broadcom
>>> SoC specific ring manager channels. The SBA RAID driver in its
>>> current shape implements memcpy, xor, and pq operations.
>>>
>>> Signed-off-by: Anup Patel
>>> Reviewed-by: Ray Jui
>>> ---
>>>  drivers/dma/Kconfig        |   13 +
>>>  drivers/dma/Makefile       |    1 +
>>>  drivers/dma/bcm-sba-raid.c | 1711
>>>  3 files changed, 1725 insertions(+)
>>>  create mode 100644 drivers/dma/bcm-sba-raid.c
>>>
>>> diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
>>> index 263495d..bf8fb84 100644
>>> --- a/drivers/dma/Kconfig
>>> +++ b/drivers/dma/Kconfig
>>> @@ -99,6 +99,19 @@ config AXI_DMAC
>>>           controller is often used in Analog Device's reference designs for FPGA
>>>           platforms.
>>>
>>> +config BCM_SBA_RAID
>>> +       tristate "Broadcom SBA RAID engine support"
>>> +       depends on (ARM64 && MAILBOX && RAID6_PQ) || COMPILE_TEST
>>> +       select DMA_ENGINE
>>> +       select DMA_ENGINE_RAID
>>> +       select ASYNC_TX_ENABLE_CHANNEL_SWITCH
>>
>> ASYNC_TX_ENABLE_CHANNEL_SWITCH violates the DMA mapping API and
>> Russell has warned it's especially problematic on ARM [1]. If you
>> need channel switching for this offload engine to be useful then you
>> need to move DMA mapping and channel switching responsibilities to MD
>> itself.
>>
>> [1]: http://lists.infradead.org/pipermail/linux-arm-kernel/2011-January/036753.html
>
> Actually driver works fine with/without
> ASYNC_TX_ENABLE_CHANNEL_SWITCH enabled
> so I am fine with removing dependency on this config option.

I stand corrected.

Previously, when I had tried removing ASYNC_TX_ENABLE_CHANNEL_SWITCH from the BCM_SBA_RAID config option, it worked because other drivers such as xgene-dma and mv_xor_v2 were selecting this option. The BCM-SBA-RAID driver does require the ASYNC_TX_ENABLE_CHANNEL_SWITCH option.

There is no issue reported for ASYNC_TX_ENABLE_CHANNEL_SWITCH with the ARM64 kernel. The issue you pointed out was with the ARM kernel.

We will have to select ASYNC_TX_ENABLE_CHANNEL_SWITCH for the BCM-SBA-RAID driver, just like other ARM64 RAID drivers such as xgene-dma and mv_xor_v2. (Refer to the XGENE_DMA and MV_XOR_V2 options.)

Regards,
Anup
Re: [PATCH v3] crypto: algapi - make crypto_xor() and crypto_inc() alignment agnostic
On Sun, Feb 5, 2017 at 11:06 AM, Ard Biesheuvel wrote:

> +	if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) ||
> +	    !((unsigned long)b & (__alignof__(*b) - 1)))

Why not simply use the IS_ALIGNED macro?

Also, you might consider checking whether this is a constant, so that you can avoid an unnecessary branch. Alternatively, if you want to use the branch, I'd be interested in you writing back saying, "I tested both cases, and branching is faster than always using the slow unaligned path."

> +	while (((unsigned long)dst & (relalign - 1)) && len > 0) {

IS_ALIGNED

> +static inline void crypto_xor(u8 *dst, const u8 *src, unsigned int size)
> +{
> +	if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
> +	    __builtin_constant_p(size) &&
> +	    (size % sizeof(unsigned long)) == 0) {

You can expand this condition to be:

	if ((is_constant(size) && size % sizeof(ulong) == 0) &&
	    (efficient_unaligned ||
	     (is_constant(dst) && is_constant(src) &&
	      is_aligned(dst) && is_aligned(src))))

It might seem complex, but it all gets compiled out.
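The pattern under discussion — take the word-at-a-time path only when both pointers are suitably aligned, otherwise fall back to bytes — looks roughly like this in userspace. xor_buf() and this IS_ALIGNED definition are illustrative stand-ins, not the kernel's crypto_xor():

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Same idea as the kernel macro: power-of-two alignment check. */
#define IS_ALIGNED(p, a) (((uintptr_t)(p) & ((a) - 1)) == 0)

/* XOR src into dst: a word at a time when both pointers are aligned
 * to unsigned long, byte-at-a-time otherwise and for the tail.  On
 * arches with efficient unaligned access the alignment check could
 * be compiled out, which is what the review comment is about. */
static void xor_buf(uint8_t *dst, const uint8_t *src, size_t len)
{
	if (IS_ALIGNED(dst, sizeof(unsigned long)) &&
	    IS_ALIGNED(src, sizeof(unsigned long))) {
		while (len >= sizeof(unsigned long)) {
			*(unsigned long *)dst ^= *(const unsigned long *)src;
			dst += sizeof(unsigned long);
			src += sizeof(unsigned long);
			len -= sizeof(unsigned long);
		}
	}
	while (len--)			/* unaligned case and tail bytes */
		*dst++ ^= *src++;
}
```

Both paths produce identical results; the aligned path simply does one load/xor/store per machine word instead of per byte.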
Re: crypto: NULL deref in sha512_mb_mgr_get_comp_job_avx2
On Sat, 2017-02-11 at 18:50 +0800, Herbert Xu wrote:
> On Wed, Feb 01, 2017 at 10:45:02AM -0800, Tim Chen wrote:
>>
>> One theory that Megha and I have is that perhaps the flusher
>> and regular computation updates are stepping on each other.
>> Can you try this patch and see if it helps?
>
> Patch applied. Thanks.

Herbert,

Megha is now able to create a test setup that reproduces a problem similar to the one reported by Dmitry. This patch did not completely fix it. So maybe you can hold off on merging this patch to the mainline till we can develop a more complete fix.

Thanks.

Tim
Qualcomm QCE driver: XTS setkey only allows 128 bit AES
Hi,

The Qualcomm QCE driver implementation defines:

	.flags		= QCE_ALG_AES | QCE_MODE_XTS,
	.name		= "xts(aes)",
	.drv_name	= "xts-aes-qce",
	.blocksize	= AES_BLOCK_SIZE,
	.ivsize		= AES_BLOCK_SIZE,
	.min_keysize	= AES_MIN_KEY_SIZE,
	.max_keysize	= AES_MAX_KEY_SIZE,

and

	alg->cra_ablkcipher.min_keysize = def->min_keysize;
	alg->cra_ablkcipher.max_keysize = def->max_keysize;
	alg->cra_ablkcipher.setkey = qce_ablkcipher_setkey;

Thus, this driver has the limits of 128 to 256 bits for the key. Furthermore, the common setkey function is used.

May I ask how the key for AES XTS is supposed to be handled here, considering that the kernel crypto API expects that the AES key and the tweak key are set via one setkey call? I.e. the setkey should expect 256 through 512 bits.

Thanks.

Ciao
Stephan
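For reference, the convention Stephan describes: the XTS "key" handed to setkey() is the AES data key and the tweak key concatenated, so the accepted key length must be twice a valid AES key size. A hypothetical sketch of the split — not the QCE driver's code, and whether 2x192-bit keys are accepted is a policy choice assumed here:

```c
#include <string.h>

/* Split a concatenated XTS key into the data-encryption key and the
 * tweak key.  keylen is the total length in bytes; each half must be
 * a valid AES key size (16/24/32 bytes, i.e. 128/192/256 bits), so
 * the total is 256 to 512 bits -- not 128 to 256 as the quoted
 * min/max_keysize limits would allow.  Returns 0 on success, -1 on a
 * key length that cannot be split this way. */
static int xts_split_key(const unsigned char *key, unsigned int keylen,
			 unsigned char *cipher_key, unsigned char *tweak_key)
{
	unsigned int half = keylen / 2;

	if (keylen & 1)
		return -1;
	if (half != 16 && half != 24 && half != 32)
		return -1;	/* each half must itself be a valid AES key */

	memcpy(cipher_key, key, half);		/* first half: data key */
	memcpy(tweak_key, key + half, half);	/* second half: tweak key */
	return 0;
}
```

Under this convention, a 32-byte setkey() means AES-128-XTS (two 128-bit keys), which is exactly why min/max limits copied from a plain AES cipher definition are too small for xts(aes).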
Re: [PATCH v4 0/3] Add Broadcom SPU Crypto Driver
On Sat, Feb 11, 2017 at 5:54 AM, Herbert Xu wrote:
> On Fri, Feb 03, 2017 at 12:55:31PM -0500, Rob Rice wrote:
>> Changes in v4:
>> - Added Rob Herring's Acked-by to patch 1/3 for bindings doc
>> - In response to Herbert's comment, in ahash_export() and
>>   ahash_import(), only copy the hash state, not state params
>>   related to cipher or aead algos.
>> - Noticed that hmac_offset in iproc_reqctx_s and spu_hash_params
>>   wasn't really used. So removed.
>
> Patches 1-2 applied. Thanks.

Thanks Herbert!

Florian, could you please include patch #3 in your DT branch?

Thanks,
Jon

> --
> Email: Herbert Xu
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] lz4: fix performance regressions
On Mon, Feb 13, 2017 at 12:53:49PM +0100, Sven Schmidt wrote:
> On Sun, Feb 12, 2017 at 10:41:17PM +0100, Willy Tarreau wrote:
>> On Sun, Feb 12, 2017 at 04:20:00PM +0100, Sven Schmidt wrote:
>>> On Sun, Feb 12, 2017 at 02:05:08PM +0100, Willy Tarreau wrote:
>>>> Hi Sven,
>>>>
>>>> On Sun, Feb 12, 2017 at 12:16:18PM +0100, Sven Schmidt wrote:
>>>>> Fix performance regressions compared to current kernel LZ4
>>>>
>>>> Your patch contains mostly style cleanups which certainly are welcome
>>>> but make the whole patch hard to review. These cleanups would have been
>>>> better into a separate, preliminary patch IMHO.
>>>>
>>>> Regards,
>>>> Willy
>>>
>>> Hi Willy,
>>>
>>> the problem was, I wanted to compare my version to the upstream LZ4 to
>>> find bugs (as with my last patch version: wrong indentation in LZ4HC
>>> in two for loops). But since the LZ4 code is a pain to read, I made
>>> additional style cleanups "on the way".
>>
>> Oh I can easily understand!
>>
>>> Hope you can manage to review the patch though, because it is difficult
>>> to separate the cleanups now.
>>
>> When I need to split a patch into pieces, usually what I do is that I
>> revert it, re-apply it without committing, then "git add -p", validate
>> all the hunks to be taken as the first patch (ie here the cleanups),
>> commit, then commit the rest as a separate one. It seems to me that the
>> fix is in the last few hunks though I'm not sure yet.
>>
>> Thanks,
>> Willy
>
> Hi Willy,
>
> I didn't know about this 'trick' until now. Thanks for sharing it! I gave it
> a short try recently, that's really cool!
>
> Since the problem discussed in this branch of this thread seems to be solved
> (see Minchan's e-mail), I won't split the patches, though.
> Or is there an actual need for doing so? I will send an updated patchset
> (containing these patches + the other ones suggested by Eric) later.

It's probably too late for this time, but keep it in mind for next time :-)

Willy
Re: [PATCH v7 0/5] Update LZ4 compressor module
On Mon, Feb 13, 2017 at 09:03:24AM +0900, Minchan Kim wrote:
> Hi Sven,
>
> On Sun, Feb 12, 2017 at 12:16:17PM +0100, Sven Schmidt wrote:
> >
> > On 02/10/2017 01:13 AM, Minchan Kim wrote:
> > > Hello Sven,
> > >
> > > On Thu, Feb 09, 2017 at 11:56:17AM +0100, Sven Schmidt wrote:
> > >> Hey Minchan,
> > >>
> > >> On Thu, Feb 09, 2017 at 08:31:21AM +0900, Minchan Kim wrote:
> > >>> Hello Sven,
> > >>>
> > >>> On Sun, Feb 05, 2017 at 08:09:03PM +0100, Sven Schmidt wrote:
> > > >
> > > > This patchset is for updating the LZ4 compression module to a version based
> > > > on LZ4 v1.7.3 allowing to use the fast compression algorithm aka LZ4 fast
> > > > which provides an "acceleration" parameter as a tradeoff between
> > > > high compression ratio and high compression speed.
> > > >
> > > > We want to use LZ4 fast in order to support compression in lustre
> > > > and (mostly, based on that) investigate data reduction techniques
> > > > in behalf of storage systems.
> > > >
> > > > Also, it will be useful for other users of LZ4 compression, as with
> > > > LZ4 fast it is possible to enable applications to use fast and/or
> > > > high compression depending on the usecase.
> > > > For instance, ZRAM is offering a LZ4 backend and could benefit from
> > > > an updated LZ4 in the kernel.
> > > >
> > > > LZ4 homepage: http://www.lz4.org/
> > > > LZ4 source repository: https://github.com/lz4/lz4
> > > > Source version: 1.7.3
> > > >
> > > > Benchmark (taken from [1], Core i5-4300U @1.9GHz):
> > > > ------------|-------------|---------------|------
> > > > Compressor  | Compression | Decompression | Ratio
> > > > ------------|-------------|---------------|------
> > > > memcpy      | 4200 MB/s   | 4200 MB/s     | 1.000
> > > > LZ4 fast 50 | 1080 MB/s   | 2650 MB/s     | 1.375
> > > > LZ4 fast 17 |  680 MB/s   | 2220 MB/s     | 1.607
> > > > LZ4 fast 5  |  475 MB/s   | 1920 MB/s     | 1.886
> > > > LZ4 default |  385 MB/s   | 1850 MB/s     | 2.101
> > > >
> > > > [1] http://fastcompression.blogspot.de/2015/04/sampling-or-faster-lz4.html
> > > >
> > > > [PATCH 1/5] lib: Update LZ4 compressor module
> > > > [PATCH 2/5] lib/decompress_unlz4: Change module to work with new LZ4 module version
> > > > [PATCH 3/5] crypto: Change LZ4 modules to work with new LZ4 module version
> > > > [PATCH 4/5] fs/pstore: fs/squashfs: Change usage of LZ4 to work with new LZ4 version
> > > > [PATCH 5/5] lib/lz4: Remove back-compat wrappers
> > >>>
> > >>> Today, I did zram-lz4 performance test with fio in current mmotm and
> > >>> found it makes regression about 20%.
> > >>>
> > >>> "lz4-update" means current mmots(git://git.cmpxchg.org/linux-mmots.git) so
> > >>> applied your 5 patches. (But not sure current mmots has recent uptodate
> > >>> patches)
> > >>> "revert" means I reverted your 5 patches in current mmots.
> > >>>
> > >>>                 revert    lz4-update
> > >>>
> > >>> seq-write         1547          1339    86.55%
> > >>> rand-write       22775         19381    85.10%
> > >>> seq-read          7035          5589    79.45%
> > >>> rand-read        78556         68479    87.17%
> > >>> mixed-seq(R)      1305          1066    81.69%
> > >>> mixed-seq(W)      1205           984    81.66%
> > >>> mixed-rand(R)    17421         14993    86.06%
> > >>> mixed-rand(W)    17391         14968    86.07%
> > >>
> > >> which parts of the output (as well as units) are these values exactly?
> > >> I did not work with fio until now, so I think I might ask before
> > >> misinterpreting my results.
> > >
> > > It is IOPS.
> > >
> > >>
> > >>> My fio description file
> > >>>
> > >>> [global]
> > >>> bs=4k
> > >>> ioengine=sync
> > >>> size=100m
> > >>> numjobs=1
> > >>> group_reporting
> > >>> buffer_compress_percentage=30
> > >>> scramble_buffers=0
> > >>> filename=/dev/zram0
> > >>> loops=10
> > >>> fsync_on_close=1
> > >>>
> > >>> [seq-write]
> > >>> bs=64k
> > >>> rw=write
> > >>> stonewall
> > >>>
> > >>> [rand-write]
> > >>> rw=randwrite
> > >>> stonewall
> > >>>
> > >>> [seq-read]
> > >>> bs=64k
> > >>> rw=read
> > >>> stonewall
> > >>>
> > >>> [rand-read]
> > >>> rw=randread
> > >>> stonewall
> > >>>
> > >>> [mixed-seq]
> > >>> bs=64k
> > >>> rw=rw
> > >>> stonewall
> > >>>
> > >>> [mixed-rand]
> > >>> rw=randrw
> > >>> stonewall
> > >>
> > >> Great, this makes it easy for me to reproduce your test.
> > >
> > > If you have trouble to reproduce, feel free to ask me. I'm happy to test it. :)
> > >
> > > Thanks!
> >
> > Hi Minchan,
> >
> > I will send an updated patch as a reply to this E-Mail. Would be really
> > grateful if you'd test it and provide feedback!
> > The patch should be applied to the current mmots tree.
> >
> > In fact, the updated LZ4 _is_ slower than the
[PATCH] hw_random: update help description for omap-rng
omap-rng also supports Marvell Armada 7k/8k SoCs, but no mention of this is made in the help text, despite the dependency being added.

Explicitly mention these SoCs in the help description so people know that it covers more than just TI SoCs.

Fixes: 383212425c92 ("hwrng: omap - Add device variant for SafeXcel IP-76 found in Armada 8K")
Signed-off-by: Russell King
---
 drivers/char/hw_random/Kconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/char/hw_random/Kconfig b/drivers/char/hw_random/Kconfig
index ceff2fc524b1..0cafe08919c9 100644
--- a/drivers/char/hw_random/Kconfig
+++ b/drivers/char/hw_random/Kconfig
@@ -172,8 +172,8 @@ config HW_RANDOM_OMAP
 	default HW_RANDOM
 	---help---
 	  This driver provides kernel-side support for the Random Number
-	  Generator hardware found on OMAP16xx, OMAP2/3/4/5 and AM33xx/AM43xx
-	  multimedia processors.
+	  Generator hardware found on OMAP16xx, OMAP2/3/4/5, AM33xx/AM43xx
+	  multimedia processors, and Marvell Armada 7k/8k SoCs.
 
 	  To compile this driver as a module, choose M here: the module
 	  will be called omap-rng.
--
2.7.4
Re: [PATCH] lz4: fix performance regressions
On Sun, Feb 12, 2017 at 10:41:17PM +0100, Willy Tarreau wrote:
> On Sun, Feb 12, 2017 at 04:20:00PM +0100, Sven Schmidt wrote:
>> On Sun, Feb 12, 2017 at 02:05:08PM +0100, Willy Tarreau wrote:
>>> Hi Sven,
>>>
>>> On Sun, Feb 12, 2017 at 12:16:18PM +0100, Sven Schmidt wrote:
>>>> Fix performance regressions compared to current kernel LZ4
>>>
>>> Your patch contains mostly style cleanups which certainly are welcome
>>> but make the whole patch hard to review. These cleanups would have been
>>> better into a separate, preliminary patch IMHO.
>>>
>>> Regards,
>>> Willy
>>
>> Hi Willy,
>>
>> the problem was, I wanted to compare my version to the upstream LZ4 to
>> find bugs (as with my last patch version: wrong indentation in LZ4HC
>> in two for loops). But since the LZ4 code is a pain to read, I made
>> additional style cleanups "on the way".
>
> Oh I can easily understand!
>
>> Hope you can manage to review the patch though, because it is difficult
>> to separate the cleanups now.
>
> When I need to split a patch into pieces, usually what I do is that I
> revert it, re-apply it without committing, then "git add -p", validate
> all the hunks to be taken as the first patch (ie here the cleanups),
> commit, then commit the rest as a separate one. It seems to me that the
> fix is in the last few hunks though I'm not sure yet.
>
> Thanks,
> Willy

Hi Willy,

I didn't know about this 'trick' until now. Thanks for sharing it! I gave it a short try recently, that's really cool!

Since the problem discussed in this branch of this thread seems to be solved (see Minchan's e-mail), I won't split the patches, though. Or is there an actual need for doing so? I will send an updated patchset (containing these patches + the other ones suggested by Eric) later.

Regards,
Sven
[PATCH v3 1/2] crypto: skcipher AF_ALG - overhaul memory management
The updated memory management is described in the top part of the code. As one
benefit of the changed memory management, the AIO and synchronous operation is
now implemented in one common function. The AF_ALG operation uses the async
kernel crypto API interface for each cipher operation. Thus, the only
difference between the AIO and sync operation types visible from user space is:

1. the callback function to be invoked when the asynchronous operation is
   completed

2. whether to wait for the completion of the kernel crypto API operation or not

In addition, the code structure is adjusted to match the structure of
algif_aead for easier code assessment.

The user space interface changed slightly as follows: the old AIO operation
returned zero upon success and < 0 in case of an error to user space. As all
other AF_ALG interfaces (including the sync skcipher interface) returned the
number of processed bytes upon success and < 0 in case of an error, the new
skcipher interface (regardless of AIO or sync) returns the number of processed
bytes in case of success.

Signed-off-by: Stephan Mueller
---
 crypto/algif_skcipher.c | 477
 1 file changed, 198 insertions(+), 279 deletions(-)

diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
index a9e79d8..d873de2 100644
--- a/crypto/algif_skcipher.c
+++ b/crypto/algif_skcipher.c
@@ -10,6 +10,25 @@
  * Software Foundation; either version 2 of the License, or (at your option)
  * any later version.
  *
+ * The following concept of the memory management is used:
+ *
+ * The kernel maintains two SGLs, the TX SGL and the RX SGL. The TX SGL is
+ * filled by user space with the data submitted via sendpage/sendmsg. Filling
+ * up the TX SGL does not cause a crypto operation -- the data will only be
+ * tracked by the kernel. Upon receipt of one recvmsg call, the caller must
+ * provide a buffer which is tracked with the RX SGL.
+ *
+ * During the processing of the recvmsg operation, the cipher request is
+ * allocated and prepared.
+ * To support multiple recvmsg operations operating
+ * on one TX SGL, an offset pointer into the TX SGL is maintained. The TX SGL
+ * that is used for the crypto request is scatterwalk_ffwd by the offset
+ * pointer to obtain the start address the crypto operation shall use for
+ * the input data.
+ *
+ * After the completion of the crypto operation, the RX SGL and the cipher
+ * request is released. The processed TX SGL parts are released together with
+ * the RX SGL release and the offset pointer is reduced by the released
+ * data.
  */

 #include

@@ -31,78 +50,50 @@ struct skcipher_sg_list {
 	struct scatterlist sg[0];
 };

+struct skcipher_rsgl {
+	struct af_alg_sgl sgl;
+	struct list_head list;
+};
+
+struct skcipher_async_req {
+	struct kiocb *iocb;
+	struct sock *sk;
+
+	struct skcipher_rsgl first_sgl;
+	struct list_head rsgl_list;
+
+	unsigned int areqlen;
+	struct skcipher_request req;
+};
+
 struct skcipher_tfm {
 	struct crypto_skcipher *skcipher;
 	bool has_key;
 };

 struct skcipher_ctx {
-	struct list_head tsgl;
-	struct af_alg_sgl rsgl;
+	struct list_head tsgl_list;
 	void *iv;
 	struct af_alg_completion completion;
-	atomic_t inflight;
+	unsigned int inflight;
 	size_t used;
+	size_t processed;
-	unsigned int len;
 	bool more;
 	bool merge;
 	bool enc;
-	struct skcipher_request req;
-};
-
-struct skcipher_async_rsgl {
-	struct af_alg_sgl sgl;
-	struct list_head list;
+	unsigned int len;
 };

-struct skcipher_async_req {
-	struct kiocb *iocb;
-	struct skcipher_async_rsgl first_sgl;
-	struct list_head list;
-	struct scatterlist *tsg;
-	atomic_t *inflight;
-	struct skcipher_request req;
-};
+static DECLARE_WAIT_QUEUE_HEAD(skcipher_aio_finish_wait);

 #define MAX_SGL_ENTS ((4096 - sizeof(struct skcipher_sg_list)) / \
 		      sizeof(struct scatterlist) - 1)

-static void skcipher_free_async_sgls(struct skcipher_async_req *sreq)
-{
-	struct skcipher_async_rsgl *rsgl, *tmp;
-	struct scatterlist *sgl;
-	struct scatterlist *sg;
-	int i, n;
-
-	list_for_each_entry_safe(rsgl, tmp, &sreq->list, list) {
-		af_alg_free_sg(&rsgl->sgl);
-		if (rsgl != &sreq->first_sgl)
-			kfree(rsgl);
-	}
-	sgl = sreq->tsg;
-	n = sg_nents(sgl);
-	for_each_sg(sgl, sg, n, i)
-		put_page(sg_page(sg));
-
-	kfree(sreq->tsg);
-}
-
-static void skcipher_async_cb(struct crypto_async_request *req, int err)
-{
-	struct skcipher_async_req *sreq = req->data;
-	struct kiocb *iocb = sreq->iocb;
-
-	atomic_dec(sreq->inflight);
-	skcipher_free_async_sgls(sreq);
-	kzfree(sreq);
-	iocb->ki_complete(iocb, err, err);
-}
-
 static inline
[PATCH v3 0/2] crypto: AF_ALG memory management fix
Hi Herbert,

Changes v3:
* in *_pull_tsgl: make sure ctx->processed cannot be less than zero
* perform fuzzing of all input parameters with bogus values

Changes v2:
* import fix from Harsh Jain to remove SG from list before freeing
* fix return code used for ki_complete to match AIO behavior with sync
  behavior
* rename variable list -> tsgl_list
* update the algif_aead patch to include a dynamic TX SGL allocation similar
  to what algif_skcipher does. This allows concurrent continuous read/write
  operations to the extent you requested. Although I have not implemented
  "pairs of TX/RX SGLs" as I think that is even more overhead, the
  implementation conceptually defines such pairs. The recvmsg call defines
  how much of the input data is processed. The caller can have an arbitrary
  number of sendmsg calls where the data is added to the TX SGL before a
  recvmsg call asks the kernel to process a given amount (or all) of the
  TX SGL.

With the changes, you will see a lot of code duplication now as I
deliberately tried to use the same struct and variable names, the same
function names and even the same order of functions. If you agree to this
patch, I volunteer to provide a followup patch that will extract the code
duplication into common functions.
Please find attached memory management updates to

- simplify the code: the old AIO memory management is very complex and
  seemingly very fragile -- the update now eliminates all reported bugs in
  the skcipher and AEAD interfaces which allowed the kernel to be crashed by
  an unprivileged user

- streamline the code: there is one code path for AIO and sync operation; the
  code between algif_skcipher and algif_aead is very similar (if that patch
  set is accepted, I volunteer to reduce code duplication by moving service
  operations into af_alg.c and to further unify the TX SGL handling)

- unify the AIO and sync operation which only differ in the kernel crypto API
  callback and whether to wait for the crypto operation or not

- fix all reported bugs regarding the handling of multiple IOCBs

The following testing was performed:

- stress testing to verify that no memleaks exist

- testing using Tadeusz Struk's AIO test tool (see
  https://github.com/tstruk/afalg_async_test) -- the AEAD test is not
  applicable any more due to the changed user space interface; the skcipher
  test works once the user space interface change is honored in the test code

- using the libkcapi test suite, all tests including the originally failing
  ones (AIO with multiple IOCBs) work now -- the current libkcapi code
  artificially limits the AEAD operation to one IOCB. After altering the
  libkcapi code to allow multiple IOCBs, the testing works flawlessly.

Stephan Mueller (2):
  crypto: skcipher AF_ALG - overhaul memory management
  crypto: aead AF_ALG - overhaul memory management

 crypto/algif_aead.c     | 673 +---
 crypto/algif_skcipher.c | 477 ++
 2 files changed, 554 insertions(+), 596 deletions(-)

--
2.9.3
[PATCH v3 2/2] crypto: aead AF_ALG - overhaul memory management
The updated memory management is described in the top part of the code. As one
benefit of the changed memory management, the AIO and synchronous operation is
now implemented in one common function. The AF_ALG operation uses the async
kernel crypto API interface for each cipher operation. Thus, the only
difference between the AIO and sync operation types visible from user space is:

1. the callback function to be invoked when the asynchronous operation is
   completed

2. whether to wait for the completion of the kernel crypto API operation or not

The change includes the overhaul of the TX and RX SGL handling. The TX SGL
holding the data sent from user space to the kernel is now dynamic, similar to
algif_skcipher. This dynamic nature allows a continuous operation of a thread
sending data and a second thread receiving the data. These threads do not need
to synchronize as the kernel processes as much data from the TX SGL as is
needed to fill the RX SGL. The caller reading the data from the kernel defines
the amount of data to be processed. Considering that the interface covers AEAD
authenticating ciphers, the reader must provide the buffer in the correct
size. Thus the reader defines the encryption size.

Signed-off-by: Stephan Mueller
---
 crypto/algif_aead.c | 673 +++-
 1 file changed, 356 insertions(+), 317 deletions(-)

diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c
index 533265f..ed49fce 100644
--- a/crypto/algif_aead.c
+++ b/crypto/algif_aead.c
@@ -11,6 +11,26 @@
  * under the terms of the GNU General Public License as published by the Free
  * Software Foundation; either version 2 of the License, or (at your option)
  * any later version.
+ *
+ * The following concept of the memory management is used:
+ *
+ * The kernel maintains two SGLs, the TX SGL and the RX SGL. The TX SGL is
+ * filled by user space with the data submitted via sendpage/sendmsg. Filling
+ * up the TX SGL does not cause a crypto operation -- the data will only be
+ * tracked by the kernel.
+ * Upon receipt of one recvmsg call, the caller must
+ * provide a buffer which is tracked with the RX SGL.
+ *
+ * During the processing of the recvmsg operation, the cipher request is
+ * allocated and prepared. To support multiple recvmsg operations operating
+ * on one TX SGL, an offset pointer into the TX SGL is maintained. The TX SGL
+ * that is used for the crypto request is scatterwalk_ffwd by the offset
+ * pointer to obtain the start address the crypto operation shall use for
+ * the input data.
+ *
+ * After the completion of the crypto operation, the RX SGL and the cipher
+ * request is released. The processed TX SGL parts are released together with
+ * the RX SGL release and the offset pointer is reduced by the released
+ * data.
  */

 #include

@@ -24,45 +44,55 @@
 #include
 #include

-struct aead_sg_list {
-	unsigned int cur;
-	struct scatterlist sg[ALG_MAX_PAGES];
+struct aead_tsgl {
+	struct list_head list;
+	unsigned int cur;		/* Last processed SG entry */
+	struct scatterlist sg[0];	/* Array of SGs forming the SGL */
 };

-struct aead_async_rsgl {
+struct aead_rsgl {
 	struct af_alg_sgl sgl;
 	struct list_head list;
 };

 struct aead_async_req {
-	struct scatterlist *tsgl;
-	struct aead_async_rsgl first_rsgl;
-	struct list_head list;
 	struct kiocb *iocb;
-	unsigned int tsgls;
-	char iv[];
+	struct sock *sk;
+
+	struct aead_rsgl first_rsgl;	/* First RX SG */
+	struct list_head rsgl_list;	/* Track RX SGs */
+
+	unsigned int outlen;		/* Filled output buf length */
+
+	unsigned int areqlen;		/* Length of this data struct */
+	struct aead_request aead_req;	/* req ctx trails this struct */
 };

 struct aead_ctx {
-	struct aead_sg_list tsgl;
-	struct aead_async_rsgl first_rsgl;
-	struct list_head list;
+	struct list_head tsgl_list;	/* Link to TX SGL */
 	void *iv;
+	size_t aead_assoclen;

-	struct af_alg_completion completion;
+	struct af_alg_completion completion;	/* sync work queue */

-	unsigned long used;
+	unsigned int inflight;	/* Outstanding AIO ops */
+	size_t used;		/* TX bytes
sent to kernel */
+	size_t processed;	/* Processed TX bytes */

-	unsigned int len;
-	bool more;
-	bool merge;
-	bool enc;
+	bool more;		/* More data to be expected? */
+	bool merge;		/* Merge new data into existing SG */
+	bool enc;		/* Crypto operation: enc, dec */

-	size_t aead_assoclen;
-	struct aead_request aead_req;
+	unsigned int len;	/* Length of allocated memory for this struct */
+	struct crypto_aead *aead_tfm;
 };

+static DECLARE_WAIT_QUEUE_HEAD(aead_aio_finish_wait);
+
+#define MAX_SGL_ENTS ((4096 -
Re: [PATCH v3 3/4] dmaengine: Add Broadcom SBA RAID driver
On Fri, Feb 10, 2017 at 11:20 PM, Dan Williams wrote:
> On Fri, Feb 10, 2017 at 1:07 AM, Anup Patel wrote:
>> The Broadcom stream buffer accelerator (SBA) provides offloading
>> capabilities for RAID operations. This SBA offload engine is
>> accessible via Broadcom SoC specific ring manager.
>>
>> This patch adds Broadcom SBA RAID driver which provides one
>> DMA device with RAID capabilities using one or more Broadcom
>> SoC specific ring manager channels. The SBA RAID driver in its
>> current shape implements memcpy, xor, and pq operations.
>>
>> Signed-off-by: Anup Patel
>> Reviewed-by: Ray Jui
>> ---
>>  drivers/dma/Kconfig        |   13 +
>>  drivers/dma/Makefile       |    1 +
>>  drivers/dma/bcm-sba-raid.c | 1711
>>  3 files changed, 1725 insertions(+)
>>  create mode 100644 drivers/dma/bcm-sba-raid.c
>>
>> diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
>> index 263495d..bf8fb84 100644
>> --- a/drivers/dma/Kconfig
>> +++ b/drivers/dma/Kconfig
>> @@ -99,6 +99,19 @@ config AXI_DMAC
>>           controller is often used in Analog Device's reference designs
>>           for FPGA platforms.
>>
>> +config BCM_SBA_RAID
>> +       tristate "Broadcom SBA RAID engine support"
>> +       depends on (ARM64 && MAILBOX && RAID6_PQ) || COMPILE_TEST
>> +       select DMA_ENGINE
>> +       select DMA_ENGINE_RAID
>> +       select ASYNC_TX_ENABLE_CHANNEL_SWITCH
>
> ASYNC_TX_ENABLE_CHANNEL_SWITCH violates the DMA mapping API and
> Russell has warned it's especially problematic on ARM [1]. If you
> need channel switching for this offload engine to be useful then you
> need to move DMA mapping and channel switching responsibilities to MD
> itself.
>
> [1]: http://lists.infradead.org/pipermail/linux-arm-kernel/2011-January/036753.html

Actually the driver works fine with/without ASYNC_TX_ENABLE_CHANNEL_SWITCH
enabled, so I am fine with removing the dependency on this config option.

> [..]
>> diff --git a/drivers/dma/bcm-sba-raid.c b/drivers/dma/bcm-sba-raid.c
>> new file mode 100644
>> index 000..bab9918
>> --- /dev/null
>> +++ b/drivers/dma/bcm-sba-raid.c
>> @@ -0,0 +1,1711 @@
>> +/*
>> + * Copyright (C) 2017 Broadcom
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + */
>> +
>> +/*
>> + * Broadcom SBA RAID Driver
>> + *
>> + * The Broadcom stream buffer accelerator (SBA) provides offloading
>> + * capabilities for RAID operations. The SBA offload engine is accessible
>> + * via Broadcom SoC specific ring manager. Two or more offload engines
>> + * can share the same Broadcom SoC specific ring manager; due to this,
>> + * the Broadcom SoC specific ring manager driver is implemented as a
>> + * mailbox controller driver and offload engine drivers are implemented
>> + * as mailbox clients.
>> + *
>> + * Typically, a Broadcom SoC specific ring manager will implement a
>> + * larger number of hardware rings over one or more SBA hardware devices.
>> + * By design, the internal buffer size of the SBA hardware device is
>> + * limited but all offload operations supported by SBA can be broken down
>> + * into multiple small size requests and executed in parallel on multiple
>> + * SBA hardware devices for achieving high throughput.
>> + *
>> + * The Broadcom SBA RAID driver does not require any register programming
>> + * except submitting requests to the SBA hardware device via mailbox
>> + * channels. This driver implements a DMA device with one DMA channel
>> + * using a set of mailbox channels provided by the Broadcom SoC specific
>> + * ring manager driver. To exploit parallelism (as described above), all
>> + * DMA requests coming to the SBA RAID DMA channel are broken down to
>> + * smaller requests and submitted to multiple mailbox channels in
>> + * round-robin fashion.
>> + * For having more SBA DMA channels, we can create more SBA device nodes
>> + * in the Broadcom SoC specific DTS based on the number of hardware rings
>> + * supported by the Broadcom SoC ring manager.
>> + */
>> +
>> +#include
>> +#include
>> +#include
>> +#include
>> +#include
>> +#include
>> +#include
>> +#include
>> +#include
>> +#include
>> +
>> +#include "dmaengine.h"
>> +
>> +/* SBA command helper macros */
>> +#define SBA_DEC(_d, _s, _m)		(((_d) >> (_s)) & (_m))
>> +#define SBA_ENC(_d, _v, _s, _m)			\
>> +	do {						\
>> +		(_d) &= ~((u64)(_m) << (_s));		\
>> +		(_d) |= (((u64)(_v) & (_m)) << (_s));	\
>> +	} while (0)
>
> Reusing a macro argument multiple times is problematic, consider
> SBA_ENC(..., arg++, ...), and hiding assignments in a macro makes this
> hard to read. The compiler should inline it