[PATCH 2/2] arm64: dts: hisilicon hip07: drop commas from @address node names
Reported-by: Rob Herring Signed-off-by: Jonathan Cameron Fixes: e4a1f7858ab8 ("arm64: dts: hisi: add SEC crypto accelerator nodes for hip07 SoC") --- arch/arm64/boot/dts/hisilicon/hip07.dtsi | 28 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/arch/arm64/boot/dts/hisilicon/hip07.dtsi b/arch/arm64/boot/dts/hisilicon/hip07.dtsi index c33adefc3061..bc0113445655 100644 --- a/arch/arm64/boot/dts/hisilicon/hip07.dtsi +++ b/arch/arm64/boot/dts/hisilicon/hip07.dtsi @@ -949,35 +949,35 @@ reg = <0x0 0xc600 0x0 0x4>; }; - p0_its_dsa_b: interrupt-controller@8,c600 { + p0_its_dsa_b: interrupt-controller@8c600 { compatible = "arm,gic-v3-its"; msi-controller; #msi-cells = <1>; reg = <0x8 0xc600 0x0 0x4>; }; - p1_its_peri_a: interrupt-controller@400,4c00 { + p1_its_peri_a: interrupt-controller@4004c00 { compatible = "arm,gic-v3-its"; msi-controller; #msi-cells = <1>; reg = <0x400 0x4c00 0x0 0x4>; }; - p1_its_peri_b: interrupt-controller@400,6c00 { + p1_its_peri_b: interrupt-controller@4006c00 { compatible = "arm,gic-v3-its"; msi-controller; #msi-cells = <1>; reg = <0x400 0x6c00 0x0 0x4>; }; - p1_its_dsa_a: interrupt-controller@400,c600 { + p1_its_dsa_a: interrupt-controller@400c600 { compatible = "arm,gic-v3-its"; msi-controller; #msi-cells = <1>; reg = <0x400 0xc600 0x0 0x4>; }; - p1_its_dsa_b: interrupt-controller@408,c600 { + p1_its_dsa_b: interrupt-controller@408c600 { compatible = "arm,gic-v3-its"; msi-controller; #msi-cells = <1>; @@ -1066,7 +1066,7 @@ num-pins = <3>; }; }; - p0_mbigen_alg_b:interrupt-controller@8,d008 { + p0_mbigen_alg_b:interrupt-controller@8d008 { compatible = "hisilicon,mbigen-v2"; reg = <0x8 0xd008 0x0 0x1>; @@ -1083,7 +1083,7 @@ num-pins = <3>; }; }; - p1_mbigen_alg_a:interrupt-controller@400,d008 { + p1_mbigen_alg_a:interrupt-controller@400d008 { compatible = "hisilicon,mbigen-v2"; reg = <0x400 0xd008 0x0 0x1>; @@ -1100,7 +1100,7 @@ num-pins = <3>; }; }; - p1_mbigen_alg_b:interrupt-controller@408,d008 { + p1_mbigen_alg_b:interrupt-controller@408d008 { compatible = "hisilicon,mbigen-v2"; reg = <0x408 0xd008 0x0 0x1>; @@ -1187,7 +1187,7 @@ hisilicon,broken-prefetch-cmd; /* smmu-cb-memtype = <0x0 0x1>;*/ }; - p0_smmu_alg_b: smmu_alg@8,d004 { + p0_smmu_alg_b: smmu_alg@8d004 { compatible = "arm,smmu-v3"; reg = <0x8 0xd004 0x0 0x2>; interrupt-parent = <_mbigen_smmu_alg_b>; @@ -1200,7 +1200,7 @@ hisilicon,broken-prefetch-cmd; /* smmu-cb-memtype = <0x0 0x1>;*/ }; - p1_smmu_alg_a: smmu_alg@400,d004 { + p1_smmu_alg_a: smmu_alg@400d004 { compatible = "arm,smmu-v3"; reg = <0x400 0xd004 0x0 0x2>; interrupt-parent = <_mbigen_smmu_alg_a>; @@ -1213,7 +1213,7 @@ hisilicon,broken-prefetch-cmd; /* smmu-cb-memtype = <0x0 0x1>;*/ }; - p1_smmu_alg_b: smmu_alg@408,d004 { + p1_smmu_alg_b: smmu_alg@408d004 { compatible = "arm,smmu-v3"; reg = <0x408 0xd004 0x0 0x2>; interrupt-parent = <_mbigen_smmu_alg_b>; @@ -1763,7 +1763,7 @@ <605 1>, <606 4>, <607 1>, <608 4>; }; - p0_sec_b: crypto@8,d200 { + p0_sec_b: crypto@8d200 { compatible = "hisilicon,hip07-sec";
[PATCH 1/2] dt-bindings: crypto: hip07-sec, drop incorrect commas
There should not be a comma in the address used for the instance so drop them. This is a left over from a review of the final version before Herbert Xu picked the series up. Reported-by: Rob Herring Signed-off-by: Jonathan Cameron Fixes: 8f026dc887fe ("dt-bindings: Add bidnings for Hisilicon SEC crypto accelerators") --- .../devicetree/bindings/crypto/hisilicon,hip07-sec.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt b/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt index 78d2db9d4de5..d28fd1af01b4 100644 --- a/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt +++ b/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt @@ -24,7 +24,7 @@ Optional properties: Example: -p1_sec_a: crypto@400,d200 { +p1_sec_a: crypto@400d200 { compatible = "hisilicon,hip07-sec"; reg = <0x400 0xd000 0x0 0x1 0x400 0xd200 0x0 0x1 -- 2.18.0
[PATCH 0/2] dt-bindings: Cleanup spurious commas in hisilicon hip07 dt.
Rob Herring picked up on an issue in our sec dt binding doc where we had an unnecessary comma in the addresses provided as part of the node name. The comment wasn't fixed before the series was applied, hence this follow up. This was a cut and paste job from a few of the its entries already present in the hip07 dtsi. This series tidies up this loose end. It has no particular urgency as nothing stops working if you have a comma present and it is unlikely an userspace code would be using these. Jonathan Cameron (2): dt-bindings: crypto: hip07-sec, drop incorrect commas arm64: dts: hisilicon hip07: drop commas from @address node names .../bindings/crypto/hisilicon,hip07-sec.txt | 2 +- arch/arm64/boot/dts/hisilicon/hip07.dtsi | 28 +-- 2 files changed, 15 insertions(+), 15 deletions(-) -- 2.18.0
Re: [bug report] crypto: hisilicon - SEC security accelerator driver
On Mon, 6 Aug 2018 23:12:44 +0300 Dan Carpenter wrote: > Hello Jonathan Cameron, > > The patch 915e4e8413da: "crypto: hisilicon - SEC security accelerator > driver" from Jul 23, 2018, leads to the following static checker > warning: > > drivers/crypto/hisilicon/sec/sec_algs.c:865 sec_alg_skcipher_crypto() > error: double free of 'split_sizes' > > drivers/crypto/hisilicon/sec/sec_algs.c >808 >809 /* Cleanup - all elements in pointer arrays have been coppied > */ >810 kfree(splits_in_nents); >811 kfree(splits_in); >812 kfree(splits_out_nents); >813 kfree(splits_out); >814 kfree(split_sizes); Thanks Dan, I clearly missed up the error paths when adding the backlog support. Sorry for the delay, this came in just as I went on vacation. Should get a patch out later this week. Easiest is probably going to be just doing this kfree block later, after the last error handling path. Jonathan > ^^^ > Free > >815 >816 /* Grab a big lock for a long time to avoid concurrency > issues */ >817 mutex_lock(>queuelock); >818 >819 /* >820 * Can go on to queue if we have space in either: >821 * 1) The hardware queue and no software queue >822 * 2) The software queue >823 * AND there is nothing in the backlog. If there is backlog > we >824 * have to only queue to the backlog queue and return busy. >825 */ >826 if ((!sec_queue_can_enqueue(queue, steps) && >827 (!queue->havesoftqueue || >828kfifo_avail(>softqueue) > steps)) || >829 !list_empty(>backlog)) { >830 if ((skreq->base.flags & CRYPTO_TFM_REQ_MAY_BACKLOG)) > { >831 list_add_tail(_req->backlog_head, > >backlog); >832 mutex_unlock(>queuelock); >833 return -EBUSY; >834 } >835 >836 ret = -EBUSY; >837 mutex_unlock(>queuelock); >838 goto err_free_elements; > ^^ >839 } >840 ret = sec_send_request(sec_req, queue); >841 mutex_unlock(>queuelock); >842 if (ret) >843 goto err_free_elements; > ^^ >844 >845 return -EINPROGRESS; >846 >847 err_free_elements: >848 list_for_each_entry_safe(el, temp, _req->elements, head) { >849 list_del(>head); >850 sec_alg_free_el(el, info); >851 } >852 if (crypto_skcipher_ivsize(atfm)) >853 dma_unmap_single(info->dev, sec_req->dma_iv, >854 crypto_skcipher_ivsize(atfm), >855 DMA_BIDIRECTIONAL); >856 err_unmap_out_sg: >857 if (skreq->src != skreq->dst) >858 sec_unmap_sg_on_err(skreq->dst, steps, splits_out, >859 splits_out_nents, > sec_req->len_out, >860 info->dev); >861 err_unmap_in_sg: >862 sec_unmap_sg_on_err(skreq->src, steps, splits_in, > splits_in_nents, >863 sec_req->len_in, info->dev); >864 err_free_split_sizes: >865 kfree(split_sizes); > ^^^ > Double free. > >866 >867 return ret; >868 } > > regards, > dan carpenter
[PATCH 2/3] crypto: hisilicon SEC security accelerator driver
This accelerator is found inside hisilicon hip06 and hip07 SoCs. Each instance provides a number of queues which feed a different number of backend acceleration units. The queues are operating in an out of order mode in the interests of throughput. The silicon does not do tracking of dependencies between multiple 'messages' or update of the IVs as appropriate for training. Hence where relevant we need to do this in software. Signed-off-by: Jonathan Cameron --- drivers/crypto/Kconfig |2 + drivers/crypto/Makefile |1 + drivers/crypto/hisilicon/Kconfig| 14 + drivers/crypto/hisilicon/Makefile |2 + drivers/crypto/hisilicon/sec/Makefile |3 + drivers/crypto/hisilicon/sec/sec_algs.c | 1122 ++ drivers/crypto/hisilicon/sec/sec_drv.c | 1323 +++ drivers/crypto/hisilicon/sec/sec_drv.h | 428 ++ 8 files changed, 2895 insertions(+) diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig index d1ea1a07cecb..d0b80d0d1f8b 100644 --- a/drivers/crypto/Kconfig +++ b/drivers/crypto/Kconfig @@ -750,4 +750,6 @@ config CRYPTO_DEV_CCREE cryptographic operations on the system REE. If unsure say Y. +source "drivers/crypto/hisilicon/Kconfig" + endif # CRYPTO_HW diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile index 7ae87b4f6c8d..ee43aed8cb69 100644 --- a/drivers/crypto/Makefile +++ b/drivers/crypto/Makefile @@ -45,3 +45,4 @@ obj-$(CONFIG_CRYPTO_DEV_VMX) += vmx/ obj-$(CONFIG_CRYPTO_DEV_BCM_SPU) += bcm/ obj-$(CONFIG_CRYPTO_DEV_SAFEXCEL) += inside-secure/ obj-$(CONFIG_CRYPTO_DEV_ARTPEC6) += axis/ +obj-y += hisilicon/ diff --git a/drivers/crypto/hisilicon/Kconfig b/drivers/crypto/hisilicon/Kconfig new file mode 100644 index ..8ca9c503bcb0 --- /dev/null +++ b/drivers/crypto/hisilicon/Kconfig @@ -0,0 +1,14 @@ +# SPDX-License-Identifier: GPL-2.0 + +config CRYPTO_DEV_HISI_SEC + tristate "Support for Hisilicon SEC crypto block cipher accelerator" + select CRYPTO_BLKCIPHER + select CRYPTO_ALGAPI + select SG_SPLIT + depends on ARM64 || COMPILE_TEST + depends on HAS_IOMEM + help + Support for Hisilicon SEC Engine in Hip06 and Hip07 + + To compile this as a module, choose M here: the module + will be called hisi_sec. diff --git a/drivers/crypto/hisilicon/Makefile b/drivers/crypto/hisilicon/Makefile new file mode 100644 index ..463f46ace182 --- /dev/null +++ b/drivers/crypto/hisilicon/Makefile @@ -0,0 +1,2 @@ +# SPDX-License-Identifier: GPL-2.0 +obj-$(CONFIG_CRYPTO_DEV_HISI_SEC) += sec/ diff --git a/drivers/crypto/hisilicon/sec/Makefile b/drivers/crypto/hisilicon/sec/Makefile new file mode 100644 index ..a55b698e0c27 --- /dev/null +++ b/drivers/crypto/hisilicon/sec/Makefile @@ -0,0 +1,3 @@ +# SPDX-License-Identifier: GPL-2.0 +obj-$(CONFIG_CRYPTO_DEV_HISI_SEC) += hisi_sec.o +hisi_sec-y = sec_algs.o sec_drv.o diff --git a/drivers/crypto/hisilicon/sec/sec_algs.c b/drivers/crypto/hisilicon/sec/sec_algs.c new file mode 100644 index ..d69d3ce358b0 --- /dev/null +++ b/drivers/crypto/hisilicon/sec/sec_algs.c @@ -0,0 +1,1122 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2016-2017 Hisilicon Limited. */ +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include + +#include "sec_drv.h" + +#define SEC_MAX_CIPHER_KEY 64 +#define SEC_REQ_LIMIT SZ_32M + +struct sec_c_alg_cfg { + unsigned c_alg : 3; + unsigned c_mode : 3; + unsigned key_len: 2; + unsigned c_width: 2; +}; + +static const struct sec_c_alg_cfg sec_c_alg_cfgs[] = { + [SEC_C_DES_ECB_64] = { + .c_alg = SEC_C_ALG_DES, + .c_mode = SEC_C_MODE_ECB, + .key_len = SEC_KEY_LEN_DES, + }, + [SEC_C_DES_CBC_64] = { + .c_alg = SEC_C_ALG_DES, + .c_mode = SEC_C_MODE_CBC, + .key_len = SEC_KEY_LEN_DES, + }, + [SEC_C_3DES_ECB_192_3KEY] = { + .c_alg = SEC_C_ALG_3DES, + .c_mode = SEC_C_MODE_ECB, + .key_len = SEC_KEY_LEN_3DES_3_KEY, + }, + [SEC_C_3DES_ECB_192_2KEY] = { + .c_alg = SEC_C_ALG_3DES, + .c_mode = SEC_C_MODE_ECB, + .key_len = SEC_KEY_LEN_3DES_2_KEY, + }, + [SEC_C_3DES_CBC_192_3KEY] = { + .c_alg = SEC_C_ALG_3DES, + .c_mode = SEC_C_MODE_CBC, + .key_len = SEC_KEY_LEN_3DES_3_KEY, + }, + [SEC_C_3DES_CBC_192_2KEY] = { + .c_alg = SEC_C_ALG_3DES, + .c_mode = SEC_C_MODE_CBC, + .key_len = SEC_KEY_LEN_3DES_2_KEY, + }, + [SEC_C_AES_ECB_128] = { + .c_alg = SEC_C_ALG_AES, + .c_mode = SEC_C_MODE
[PATCH V2 0/3] Hisilicon SEC crypto driver (hip06 / hip07)
The driver provides in kernel support for the Hisilicon SEC accelerator found in the hip06 and hip07 SoCs. There are 4 such units on the D05 board for which an appropriate DT binding has been provided. ACPI also works with an appropriate UEFI build. The hardware does not update the IV in chaining or counting modes. This is done in the drive ron completion of the cipher operation. The driver support AES, DES and 3DES block ciphers in a range of modes (others to follow). Hash and AAED support to follow. Sorry for the delay on this one, other priorities and all that... Changes since V1. 1) DT binding fixes suggested by Rob Herring in patches 1 and 3. 2) Added XTS key check as suggested by Stephan Muller. 3) A trivial use after free found during testing of the above. Changes since RFC. 1) Addition of backlog queuing as needed to support dm-crypt usecases. 2) iommu presence tests now done as Robin Murphy suggested. 3) Hardware limiation to 32MB requests worked aroud in driver so it will now support very large requests (512*32MB). Larger request handling than this would require a longer queue with the associate overheads and is considered unlikely to be necessary. 4) The specific handling related to the inline IV patch set from Stephan has been dropped for now. 5) Interrupt handler was previous more complex than necessary so has been reworked. 6) Use of the bounce buffer for small packeets is dropped for now. This is a performance optimization that made the code harder to review and can be reintroduced as necessary at a later date. 7) Restructuring of some code to simplify hash and aaed (hash implemented but not ready fo upstream at this time) 8) Various minor fixes and reworks of the code * several off by one errors in the cleanup paths * single template for enc and dec * drop dec_key as not used (enc_key was used in both cases) * drop dma pool for IVs as it breaks chaining. * lots of spinlocks changed to mutexes as not taken in atomic context. * nasty memory leak cleaned up. Jonathan Cameron (3): dt-bindings: Add bindings for Hisilicon SEC crypto accelerators. crypto: hisilicon SEC security accelerator driver arm64: dts: hisi: add SEC crypto accelerator nodes for hip07 SoC .../bindings/crypto/hisilicon,hip07-sec.txt| 67 + arch/arm64/boot/dts/hisilicon/hip07.dtsi | 284 + drivers/crypto/Kconfig |2 + drivers/crypto/Makefile|1 + drivers/crypto/hisilicon/Kconfig | 14 + drivers/crypto/hisilicon/Makefile |2 + drivers/crypto/hisilicon/sec/Makefile |3 + drivers/crypto/hisilicon/sec/sec_algs.c| 1122 + drivers/crypto/hisilicon/sec/sec_drv.c | 1323 drivers/crypto/hisilicon/sec/sec_drv.h | 428 +++ 10 files changed, 3246 insertions(+) create mode 100644 Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt create mode 100644 drivers/crypto/hisilicon/Kconfig create mode 100644 drivers/crypto/hisilicon/Makefile create mode 100644 drivers/crypto/hisilicon/sec/Makefile create mode 100644 drivers/crypto/hisilicon/sec/sec_algs.c create mode 100644 drivers/crypto/hisilicon/sec/sec_drv.c create mode 100644 drivers/crypto/hisilicon/sec/sec_drv.h -- 2.16.2
[PATCH 3/3] arm64: dts: hisi: add SEC crypto accelerator nodes for hip07 SoC
Enable all 4 SEC units available on d05 boards. Signed-off-by: Jonathan Cameron --- arch/arm64/boot/dts/hisilicon/hip07.dtsi | 284 +++ 1 file changed, 284 insertions(+) diff --git a/arch/arm64/boot/dts/hisilicon/hip07.dtsi b/arch/arm64/boot/dts/hisilicon/hip07.dtsi index 0600a6a84ab7..6d046f4f7019 100644 --- a/arch/arm64/boot/dts/hisilicon/hip07.dtsi +++ b/arch/arm64/boot/dts/hisilicon/hip07.dtsi @@ -1049,7 +1049,74 @@ num-pins = <2>; }; }; + p0_mbigen_alg_a:interrupt-controller@d008 { + compatible = "hisilicon,mbigen-v2"; + reg = <0x0 0xd008 0x0 0x1>; + p0_mbigen_sec_a: intc_sec { + msi-parent = <_its_dsa_a 0x40400>; + interrupt-controller; + #interrupt-cells = <2>; + num-pins = <33>; + }; + p0_mbigen_smmu_alg_a: intc_smmu_alg { + msi-parent = <_its_dsa_a 0x40b1b>; + interrupt-controller; + #interrupt-cells = <2>; + num-pins = <3>; + }; + }; + p0_mbigen_alg_b:interrupt-controller@8,d008 { + compatible = "hisilicon,mbigen-v2"; + reg = <0x8 0xd008 0x0 0x1>; + + p0_mbigen_sec_b: intc_sec { + msi-parent = <_its_dsa_b 0x42400>; + interrupt-controller; + #interrupt-cells = <2>; + num-pins = <33>; + }; + p0_mbigen_smmu_alg_b: intc_smmu_alg { + msi-parent = <_its_dsa_b 0x42b1b>; + interrupt-controller; + #interrupt-cells = <2>; + num-pins = <3>; + }; + }; + p1_mbigen_alg_a:interrupt-controller@400,d008 { + compatible = "hisilicon,mbigen-v2"; + reg = <0x400 0xd008 0x0 0x1>; + + p1_mbigen_sec_a: intc_sec { + msi-parent = <_its_dsa_a 0x44400>; + interrupt-controller; + #interrupt-cells = <2>; + num-pins = <33>; + }; + p1_mbigen_smmu_alg_a: intc_smmu_alg { + msi-parent = <_its_dsa_a 0x44b1b>; + interrupt-controller; + #interrupt-cells = <2>; + num-pins = <3>; + }; + }; + p1_mbigen_alg_b:interrupt-controller@408,d008 { + compatible = "hisilicon,mbigen-v2"; + reg = <0x408 0xd008 0x0 0x1>; + + p1_mbigen_sec_b: intc_sec { + msi-parent = <_its_dsa_b 0x46400>; + interrupt-controller; + #interrupt-cells = <2>; + num-pins = <33>; + }; + p1_mbigen_smmu_alg_b: intc_smmu_alg { + msi-parent = <_its_dsa_b 0x46b1b>; + interrupt-controller; + #interrupt-cells = <2>; + num-pins = <3>; + }; + }; p0_mbigen_dsa_a: interrupt-controller@c008 { compatible = "hisilicon,mbigen-v2"; reg = <0x0 0xc008 0x0 0x1>; @@ -1107,6 +1174,58 @@ hisilicon,broken-prefetch-cmd; status = "disabled"; }; + p0_smmu_alg_a: smmu_alg@d004 { + compatible = "arm,smmu-v3"; + reg = <0x0 0xd004 0x0 0x2>; + interrupt-parent = <_mbigen_smmu_alg_a>; + interrupts = <733 1>, + <734 1>, + <735 1>; + interrupt-names = "eventq", "gerror", "priq"; + #iommu-cells = <1>; + dma-coherent; + hisilicon,broken-prefetch-cmd; + /* smmu-cb-memtype = <0x0 0x1>;*/ + }; + p0_smmu_alg_b: smmu_alg@8,d004 { + compatible = "arm,smmu-v3"; + reg = <0x8 0xd004 0x0 0x2>; + interrupt-parent = <_mbigen_smmu_alg_b>; + interrupts = <733 1>, + <734 1>, + <735 1>; + interrupt-names = "eventq", "gerror", "priq"; + #iommu-cells = <1>; + dma-coherent; + hisilicon,broken-prefetch-cmd; +
[PATCH 1/3] dt-bindings: Add bindings for Hisilicon SEC crypto accelerators.
The hip06 and hip07 SoCs contain a number of these crypto units which accelerate AES and DES operations. Signed-off-by: Jonathan Cameron --- .../bindings/crypto/hisilicon,hip07-sec.txt| 67 ++ 1 file changed, 67 insertions(+) diff --git a/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt b/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt new file mode 100644 index ..78d2db9d4de5 --- /dev/null +++ b/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt @@ -0,0 +1,67 @@ +* Hisilicon hip07 Security Accelerator (SEC) + +Required properties: +- compatible: Must contain one of + - "hisilicon,hip06-sec" + - "hisilicon,hip07-sec" +- reg: Memory addresses and lengths of the memory regions through which + this device is controlled. + Region 0 has registers to control the backend processing engines. + Region 1 has registers for functionality common to all queues. + Regions 2-18 have registers for the 16 individual queues which are isolated + both in hardware and within the driver. +- interrupts: Interrupt specifiers. + Refer to interrupt-controller/interrupts.txt for generic interrupt client node + bindings. + Interrupt 0 is for the SEC unit error queue. + Interrupt 2N + 1 is the completion interrupt for queue N. + Interrupt 2N + 2 is the error interrupt for queue N. +- dma-coherent: The driver assumes coherent dma is possible. + +Optional properties: +- iommus: The SEC units are behind smmu-v3 iommus. + Refer to iommu/arm,smmu-v3.txt for more information. + +Example: + +p1_sec_a: crypto@400,d200 { + compatible = "hisilicon,hip07-sec"; + reg = <0x400 0xd000 0x0 0x1 + 0x400 0xd200 0x0 0x1 + 0x400 0xd201 0x0 0x1 + 0x400 0xd202 0x0 0x1 + 0x400 0xd203 0x0 0x1 + 0x400 0xd204 0x0 0x1 + 0x400 0xd205 0x0 0x1 + 0x400 0xd206 0x0 0x1 + 0x400 0xd207 0x0 0x1 + 0x400 0xd208 0x0 0x1 + 0x400 0xd209 0x0 0x1 + 0x400 0xd20a 0x0 0x1 + 0x400 0xd20b 0x0 0x1 + 0x400 0xd20c 0x0 0x1 + 0x400 0xd20d 0x0 0x1 + 0x400 0xd20e 0x0 0x1 + 0x400 0xd20f 0x0 0x1 + 0x400 0xd210 0x0 0x1>; + interrupt-parent = <_mbigen_sec_a>; + iommus = <_smmu_alg_a 0x600>; + dma-coherent; + interrupts = <576 4>, +<577 1>, <578 4>, +<579 1>, <580 4>, +<581 1>, <582 4>, +<583 1>, <584 4>, +<585 1>, <586 4>, +<587 1>, <588 4>, +<589 1>, <590 4>, +<591 1>, <592 4>, +<593 1>, <594 4>, +<595 1>, <596 4>, +<597 1>, <598 4>, +<599 1>, <600 4>, +<601 1>, <602 4>, +<603 1>, <604 4>, +<605 1>, <606 4>, +<607 1>, <608 4>; +}; -- 2.16.2
Re: [PATCH 2/3] crypto: hisilicon SEC security accelerator driver
On Fri, 20 Jul 2018 20:17:22 +0200 Stephan Müller wrote: > Am Montag, 16. Juli 2018, 12:43:41 CEST schrieb Jonathan Cameron: > > Hi Jonathan, > > > +static int sec_alg_skcipher_setkey_aes_xts(struct crypto_skcipher *tfm, > > + const u8 *key, unsigned int > > keylen) +{ > > + enum sec_cipher_alg alg; > > + > > + switch (keylen) { > > + case AES_KEYSIZE_128 * 2: > > + alg = SEC_C_AES_XTS_128; > > + break; > > + case AES_KEYSIZE_256 * 2: > > + alg = SEC_C_AES_XTS_256; > > + break; > > + default: > > + return -EINVAL; > > + } > > + > > + return sec_alg_skcipher_setkey(tfm, key, keylen, alg); > > +} > > Can you please call the function xts_check_key or xts_verify_key before > setting the key? > Will do. Thanks, Jonathan
Re: [PATCH 1/3] dt-bindings: Add bindings for Hisilicon SEC crypto accelerators.
On Fri, 20 Jul 2018 10:30:10 -0600 Rob Herring wrote: > On Mon, Jul 16, 2018 at 11:43:40AM +0100, Jonathan Cameron wrote: > > The hip06 and hip07 SoCs contain a number of these crypto units which > > accelerate AES and DES operations. > > > > Signed-off-by: Jonathan Cameron > > --- > > .../bindings/crypto/hisilicon,hip07-sec.txt| 69 > > ++ > > 1 file changed, 69 insertions(+) > > > > diff --git > > a/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt > > b/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt > > new file mode 100644 > > index ..00b838706c98 > > --- /dev/null > > +++ b/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt > > @@ -0,0 +1,69 @@ > > +* Hisilicon hip07 Security Accelerator (SEC) > > + > > +Required properties: > > +- compatible: Must contain one of > > + - "hisilicon,hip06-sec" > > + - "hisilicon,hip07-sec" > > +- reg: Memory addresses and lengths of the memory regions used by the > > driver. > > You know my feelings about the word "driver" in bindings. :) Oops. Will fix. Oddly rare I actually write one of these docs so making newbie mistakes :( > > > + Region 0 has registers to control the backend processing engines. > > + Region 1 has registers for functionality common to all queues. > > + Regions 2-18 have registers for the individual queues which are isolated > > + both in hardware and within the driver. > > It's always 16 queues? Might state that somewhere. Technically complex because some of them may be in use by the secure world and hence not available and not seen in the DT. Right now the driver doesn't support that, and it's not happening on any existing platforms but I was trying to avoid stating there were definitely 16 available. I guess if that happens I'll fix the binding when they do it. (moderately unlikely to happen now given age of this platform, but you never know). > > > +- interrupts: Interrupt specifiers. > > + Refer to interrupt-controller/interrupts.txt for generic interrupt > > client node > > + bindings. > > + Interrupt 0 is for the SEC unit error queue. > > + Interrupt 2N + 1 is the completion interrupt for queue N. > > + Interrupt 2N + 2 is the error interrupt for queue N. > > +- dma-coherent: The driver assumes coherent dma is possible. > > + > > +Optional properties: > > +- iommus: The SEC units are behind smmu-v3 iommus. > > + Refer to iommu/arm,smmu-v3.txt for more information. > > + > > +Example: > > +Second socket, first unit chosen to illustrate need for 64 bit addresses. > > I don't follow the 64-bit address comment. Without it the address is in the 32bit range so internal review raised the question of why we needed to provide 64 bit registers as 32 bit ones are more compact. reg = <0xd00 0x1 etc > > > + > > +p1_sec_a: sec@d200 { > > crypto@... > > The unit-address should be from the 1st reg entry and why isn't the > 0x400 part included? Will fix here an obviously in the DT patch itself. > > > + compatible = "hisilicon,hip07-sec"; > > + #address-cells = <2>; > > + #size-cells = <2>; > > These aren't needed here as there are no child nodes. Oops I always forget those. > > > + reg = <0x400 0xd000 0x0 0x1 > > + 0x400 0xd200 0x0 0x1 > > + 0x400 0xd201 0x0 0x1 > > + 0x400 0xd202 0x0 0x1 > > + 0x400 0xd203 0x0 0x1 > > + 0x400 0xd204 0x0 0x1 > > + 0x400 0xd205 0x0 0x1 > > + 0x400 0xd206 0x0 0x1 > > + 0x400 0xd207 0x0 0x1 > > + 0x400 0xd208 0x0 0x1 > > + 0x400 0xd209 0x0 0x1 > > + 0x400 0xd20a 0x0 0x1 > > + 0x400 0xd20b 0x0 0x1 > > + 0x400 0xd20c 0x0 0x1 > > + 0x400 0xd20d 0x0 0x1 > > + 0x400 0xd20e 0x0 0x1 > > + 0x400 0xd20f 0x0 0x1 > > + 0x400 0xd210 0x0 0x1>; > > + interrupt-parent = <_mbigen_sec_a>; > > + iommus = <_smmu_alg_a 0x600>; > > + dma-coherent; > > + interrupts = <576 4>, > > +<577 1>,<578 4>, > > space needed after the comma. > > > +<579 1>,<580 4>, > > +<581 1>,<582 4>, > > +<583 1>,<584 4>, > > +<585 1>,<586 4>, > > +<587 1>,<588 4>, > > +<589 1>,<590 4>, > > +<591 1>,<592 4>, > > +<593 1>,<594 4>, > > +<595 1>,<596 4>, > > +<597 1>,<598 4>, > > +<599 1>,<600 4>, > > +<601 1>,<602 4>, > > +<603 1>,<604 4>, > > +<605 1>,<606 4>, > > +<607 1>,<608 4>; > > +}; > > -- > > 2.16.2 Thanks Rob. Jonathan > > > >
[PATCH 2/3] crypto: hisilicon SEC security accelerator driver
This accelerator is found inside hisilicon hip06 and hip07 SoCs. Each instance provides a number of queues which feed a different number of backend acceleration units. The queues are operating in an out of order mode in the interests of throughput. The silicon does not do tracking of dependencies between multiple 'messages' or update of the IVs as appropriate for training. Hence where relevant we need to do this in software. Signed-off-by: Jonathan Cameron --- drivers/crypto/Kconfig |2 + drivers/crypto/Makefile |1 + drivers/crypto/hisilicon/Kconfig| 14 + drivers/crypto/hisilicon/Makefile |2 + drivers/crypto/hisilicon/sec/Makefile |3 + drivers/crypto/hisilicon/sec/sec_algs.c | 1116 ++ drivers/crypto/hisilicon/sec/sec_drv.c | 1323 +++ drivers/crypto/hisilicon/sec/sec_drv.h | 428 ++ 8 files changed, 2889 insertions(+) diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig index d1ea1a07cecb..d0b80d0d1f8b 100644 --- a/drivers/crypto/Kconfig +++ b/drivers/crypto/Kconfig @@ -750,4 +750,6 @@ config CRYPTO_DEV_CCREE cryptographic operations on the system REE. If unsure say Y. +source "drivers/crypto/hisilicon/Kconfig" + endif # CRYPTO_HW diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile index 7ae87b4f6c8d..ee43aed8cb69 100644 --- a/drivers/crypto/Makefile +++ b/drivers/crypto/Makefile @@ -45,3 +45,4 @@ obj-$(CONFIG_CRYPTO_DEV_VMX) += vmx/ obj-$(CONFIG_CRYPTO_DEV_BCM_SPU) += bcm/ obj-$(CONFIG_CRYPTO_DEV_SAFEXCEL) += inside-secure/ obj-$(CONFIG_CRYPTO_DEV_ARTPEC6) += axis/ +obj-y += hisilicon/ diff --git a/drivers/crypto/hisilicon/Kconfig b/drivers/crypto/hisilicon/Kconfig new file mode 100644 index ..8ca9c503bcb0 --- /dev/null +++ b/drivers/crypto/hisilicon/Kconfig @@ -0,0 +1,14 @@ +# SPDX-License-Identifier: GPL-2.0 + +config CRYPTO_DEV_HISI_SEC + tristate "Support for Hisilicon SEC crypto block cipher accelerator" + select CRYPTO_BLKCIPHER + select CRYPTO_ALGAPI + select SG_SPLIT + depends on ARM64 || COMPILE_TEST + depends on HAS_IOMEM + help + Support for Hisilicon SEC Engine in Hip06 and Hip07 + + To compile this as a module, choose M here: the module + will be called hisi_sec. diff --git a/drivers/crypto/hisilicon/Makefile b/drivers/crypto/hisilicon/Makefile new file mode 100644 index ..463f46ace182 --- /dev/null +++ b/drivers/crypto/hisilicon/Makefile @@ -0,0 +1,2 @@ +# SPDX-License-Identifier: GPL-2.0 +obj-$(CONFIG_CRYPTO_DEV_HISI_SEC) += sec/ diff --git a/drivers/crypto/hisilicon/sec/Makefile b/drivers/crypto/hisilicon/sec/Makefile new file mode 100644 index ..a55b698e0c27 --- /dev/null +++ b/drivers/crypto/hisilicon/sec/Makefile @@ -0,0 +1,3 @@ +# SPDX-License-Identifier: GPL-2.0 +obj-$(CONFIG_CRYPTO_DEV_HISI_SEC) += hisi_sec.o +hisi_sec-y = sec_algs.o sec_drv.o diff --git a/drivers/crypto/hisilicon/sec/sec_algs.c b/drivers/crypto/hisilicon/sec/sec_algs.c new file mode 100644 index ..a9401e98d2e2 --- /dev/null +++ b/drivers/crypto/hisilicon/sec/sec_algs.c @@ -0,0 +1,1116 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2016-2017 Hisilicon Limited. */ +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +#include "sec_drv.h" + +#define SEC_MAX_CIPHER_KEY 64 +#define SEC_REQ_LIMIT SZ_32M + +struct sec_c_alg_cfg { + unsigned c_alg : 3; + unsigned c_mode : 3; + unsigned key_len: 2; + unsigned c_width: 2; +}; + +static const struct sec_c_alg_cfg sec_c_alg_cfgs[] = { + [SEC_C_DES_ECB_64] = { + .c_alg = SEC_C_ALG_DES, + .c_mode = SEC_C_MODE_ECB, + .key_len = SEC_KEY_LEN_DES, + }, + [SEC_C_DES_CBC_64] = { + .c_alg = SEC_C_ALG_DES, + .c_mode = SEC_C_MODE_CBC, + .key_len = SEC_KEY_LEN_DES, + }, + [SEC_C_3DES_ECB_192_3KEY] = { + .c_alg = SEC_C_ALG_3DES, + .c_mode = SEC_C_MODE_ECB, + .key_len = SEC_KEY_LEN_3DES_3_KEY, + }, + [SEC_C_3DES_ECB_192_2KEY] = { + .c_alg = SEC_C_ALG_3DES, + .c_mode = SEC_C_MODE_ECB, + .key_len = SEC_KEY_LEN_3DES_2_KEY, + }, + [SEC_C_3DES_CBC_192_3KEY] = { + .c_alg = SEC_C_ALG_3DES, + .c_mode = SEC_C_MODE_CBC, + .key_len = SEC_KEY_LEN_3DES_3_KEY, + }, + [SEC_C_3DES_CBC_192_2KEY] = { + .c_alg = SEC_C_ALG_3DES, + .c_mode = SEC_C_MODE_CBC, + .key_len = SEC_KEY_LEN_3DES_2_KEY, + }, + [SEC_C_AES_ECB_128] = { + .c_alg = SEC_C_ALG_AES, + .c_mode = SEC_C_MODE
[PATCH 0/3] Hisilicon SEC crypto driver (hip06 / hip07)
The driver provides in kernel support for hte Hisilicon SEC accelerator found in the hip06 and hip07 SoCs. There are 4 such units on the D05 board for which an appropriate DT binding has been provided. ACPI also works with an appropriate UEFI build. The hardware does not update the IV in chaining or counting modes. This is done in the drive ron completion of the cipher operation. The driver support AES, DES and 3DES block ciphers in a range of modes (others to follow). Hash and AAED support to follow. Sorry for the delay on this one, other priorities and all that... Changes since RFC. 1) Addition of backlog queuing as needed to support dm-crypt usecases. 2) iommu presence tests now done as Robin Murphy suggested. 3) Hardware limiation to 32MB requests worked aroud in driver so it will now support very large requests (512*32MB). Larger request handling than this would require a longer queue with the associate overheads and is considered unlikely to be necessary. 4) The specific handling related to the inline IV patch set from Stephan has been dropped for now. 5) Interrupt handler was previous more complex than necessary so has been reworked. 6) Use of the bounce buffer for small packeets is dropped for now. This is a performance optimization that made the code harder to review and can be reintroduced as necessary at a later date. 7) Restructuring of some code to simplify hash and aaed (hash implemented but not ready fo upstream at this time) 8) Various minor fixes and reworks of the code * several off by one errors in the cleanup paths * single template for enc and dec * drop dec_key as not used (enc_key was used in both cases) * drop dma pool for IVs as it breaks chaining. * lots of spinlocks changed to mutexes as not taken in atomic context. * nasty memory leak cleaned up. Jonathan Cameron (3): dt-bindings: Add bindings for Hisilicon SEC crypto accelerators. crypto: hisilicon SEC security accelerator driver arm64: dts: hisi: add SEC crypto accelerator nodes for hip07 SoC .../bindings/crypto/hisilicon,hip07-sec.txt| 69 + arch/arm64/boot/dts/hisilicon/hip07.dtsi | 285 + drivers/crypto/Kconfig |2 + drivers/crypto/Makefile|1 + drivers/crypto/hisilicon/Kconfig | 14 + drivers/crypto/hisilicon/Makefile |2 + drivers/crypto/hisilicon/sec/Makefile |3 + drivers/crypto/hisilicon/sec/sec_algs.c| 1116 + drivers/crypto/hisilicon/sec/sec_drv.c | 1323 drivers/crypto/hisilicon/sec/sec_drv.h | 428 +++ 10 files changed, 3243 insertions(+) create mode 100644 Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt create mode 100644 drivers/crypto/hisilicon/Kconfig create mode 100644 drivers/crypto/hisilicon/Makefile create mode 100644 drivers/crypto/hisilicon/sec/Makefile create mode 100644 drivers/crypto/hisilicon/sec/sec_algs.c create mode 100644 drivers/crypto/hisilicon/sec/sec_drv.c create mode 100644 drivers/crypto/hisilicon/sec/sec_drv.h -- 2.16.2
[PATCH 1/3] dt-bindings: Add bindings for Hisilicon SEC crypto accelerators.
The hip06 and hip07 SoCs contain a number of these crypto units which accelerate AES and DES operations. Signed-off-by: Jonathan Cameron --- .../bindings/crypto/hisilicon,hip07-sec.txt| 69 ++ 1 file changed, 69 insertions(+) diff --git a/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt b/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt new file mode 100644 index ..00b838706c98 --- /dev/null +++ b/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt @@ -0,0 +1,69 @@ +* Hisilicon hip07 Security Accelerator (SEC) + +Required properties: +- compatible: Must contain one of + - "hisilicon,hip06-sec" + - "hisilicon,hip07-sec" +- reg: Memory addresses and lengths of the memory regions used by the driver. + Region 0 has registers to control the backend processing engines. + Region 1 has registers for functionality common to all queues. + Regions 2-18 have registers for the individual queues which are isolated + both in hardware and within the driver. +- interrupts: Interrupt specifiers. + Refer to interrupt-controller/interrupts.txt for generic interrupt client node + bindings. + Interrupt 0 is for the SEC unit error queue. + Interrupt 2N + 1 is the completion interrupt for queue N. + Interrupt 2N + 2 is the error interrupt for queue N. +- dma-coherent: The driver assumes coherent dma is possible. + +Optional properties: +- iommus: The SEC units are behind smmu-v3 iommus. + Refer to iommu/arm,smmu-v3.txt for more information. + +Example: +Second socket, first unit chosen to illustrate need for 64 bit addresses. + +p1_sec_a: sec@d200 { + compatible = "hisilicon,hip07-sec"; + #address-cells = <2>; + #size-cells = <2>; + reg = <0x400 0xd000 0x0 0x1 + 0x400 0xd200 0x0 0x1 + 0x400 0xd201 0x0 0x1 + 0x400 0xd202 0x0 0x1 + 0x400 0xd203 0x0 0x1 + 0x400 0xd204 0x0 0x1 + 0x400 0xd205 0x0 0x1 + 0x400 0xd206 0x0 0x1 + 0x400 0xd207 0x0 0x1 + 0x400 0xd208 0x0 0x1 + 0x400 0xd209 0x0 0x1 + 0x400 0xd20a 0x0 0x1 + 0x400 0xd20b 0x0 0x1 + 0x400 0xd20c 0x0 0x1 + 0x400 0xd20d 0x0 0x1 + 0x400 0xd20e 0x0 0x1 + 0x400 0xd20f 0x0 0x1 + 0x400 0xd210 0x0 0x1>; + interrupt-parent = <_mbigen_sec_a>; + iommus = <_smmu_alg_a 0x600>; + dma-coherent; + interrupts = <576 4>, +<577 1>,<578 4>, +<579 1>,<580 4>, +<581 1>,<582 4>, +<583 1>,<584 4>, +<585 1>,<586 4>, +<587 1>,<588 4>, +<589 1>,<590 4>, +<591 1>,<592 4>, +<593 1>,<594 4>, +<595 1>,<596 4>, +<597 1>,<598 4>, +<599 1>,<600 4>, +<601 1>,<602 4>, +<603 1>,<604 4>, +<605 1>,<606 4>, +<607 1>,<608 4>; +}; -- 2.16.2
[PATCH 3/3] arm64: dts: hisi: add SEC crypto accelerator nodes for hip07 SoC
Enable all 4 SEC units available on d05 boards. Signed-off-by: Jonathan Cameron --- arch/arm64/boot/dts/hisilicon/hip07.dtsi | 285 +++ 1 file changed, 285 insertions(+) diff --git a/arch/arm64/boot/dts/hisilicon/hip07.dtsi b/arch/arm64/boot/dts/hisilicon/hip07.dtsi index 0600a6a84ab7..a29c4466af26 100644 --- a/arch/arm64/boot/dts/hisilicon/hip07.dtsi +++ b/arch/arm64/boot/dts/hisilicon/hip07.dtsi @@ -1049,7 +1049,74 @@ num-pins = <2>; }; }; + p0_mbigen_alg_a:interrupt-controller@d008 { + compatible = "hisilicon,mbigen-v2"; + reg = <0x0 0xd008 0x0 0x1>; + p0_mbigen_sec_a: intc_sec { + msi-parent = <_its_dsa_a 0x40400>; + interrupt-controller; + #interrupt-cells = <2>; + num-pins = <33>; + }; + p0_mbigen_smmu_alg_a: intc_smmu_alg { + msi-parent = <_its_dsa_a 0x40b1b>; + interrupt-controller; + #interrupt-cells = <2>; + num-pins = <3>; + }; + }; + p0_mbigen_alg_b:interrupt-controller@8,d008 { + compatible = "hisilicon,mbigen-v2"; + reg = <0x8 0xd008 0x0 0x1>; + + p0_mbigen_sec_b: intc_sec { + msi-parent = <_its_dsa_b 0x42400>; + interrupt-controller; + #interrupt-cells = <2>; + num-pins = <33>; + }; + p0_mbigen_smmu_alg_b: intc_smmu_alg { + msi-parent = <_its_dsa_b 0x42b1b>; + interrupt-controller; + #interrupt-cells = <2>; + num-pins = <3>; + }; + }; + p1_mbigen_alg_a:interrupt-controller@400,d008 { + compatible = "hisilicon,mbigen-v2"; + reg = <0x400 0xd008 0x0 0x1>; + + p1_mbigen_sec_a: intc_sec { + msi-parent = <_its_dsa_a 0x44400>; + interrupt-controller; + #interrupt-cells = <2>; + num-pins = <33>; + }; + p1_mbigen_smmu_alg_a: intc_smmu_alg { + msi-parent = <_its_dsa_a 0x44b1b>; + interrupt-controller; + #interrupt-cells = <2>; + num-pins = <3>; + }; + }; + p1_mbigen_alg_b:interrupt-controller@408,d008 { + compatible = "hisilicon,mbigen-v2"; + reg = <0x408 0xd008 0x0 0x1>; + + p1_mbigen_sec_b: intc_sec { + msi-parent = <_its_dsa_b 0x46400>; + interrupt-controller; + #interrupt-cells = <2>; + num-pins = <33>; + }; + p1_mbigen_smmu_alg_b: intc_smmu_alg { + msi-parent = <_its_dsa_b 0x46b1b>; + interrupt-controller; + #interrupt-cells = <2>; + num-pins = <3>; + }; + }; p0_mbigen_dsa_a: interrupt-controller@c008 { compatible = "hisilicon,mbigen-v2"; reg = <0x0 0xc008 0x0 0x1>; @@ -1107,6 +1174,58 @@ hisilicon,broken-prefetch-cmd; status = "disabled"; }; + p0_smmu_alg_a: smmu_alg@d004 { + compatible = "arm,smmu-v3"; + reg = <0x0 0xd004 0x0 0x2>; + interrupt-parent = <_mbigen_smmu_alg_a>; + interrupts = <733 1>, + <734 1>, + <735 1>; + interrupt-names = "eventq", "gerror", "priq"; + #iommu-cells = <1>; + dma-coherent; + hisilicon,broken-prefetch-cmd; + /* smmu-cb-memtype = <0x0 0x1>;*/ + }; + p0_smmu_alg_b: smmu_alg@8,d004 { + compatible = "arm,smmu-v3"; + reg = <0x8 0xd004 0x0 0x2>; + interrupt-parent = <_mbigen_smmu_alg_b>; + interrupts = <733 1>, + <734 1>, + <735 1>; + interrupt-names = "eventq", "gerror", "priq"; + #iommu-cells = <1>; + dma-coherent; + hisilicon,broken-prefetch-cmd; +
Re: [PATCH v3 0/4] crypto: AF_ALG AIO improvements
On Tue, 27 Feb 2018 15:08:58 +0100 Stephan Müllerwrote: > Am Freitag, 23. Februar 2018, 13:00:26 CET schrieb Herbert Xu: > > Hi Herbert, Hi Stephan / Herbert, > > > On Fri, Feb 23, 2018 at 09:33:33AM +0100, Stephan Müller wrote: > > > A simple copy operation, however, will imply that in one AIO recvmsg > > > request, only *one* IOCB can be set and processed. > > > > Sure, but the recvmsg will return as soon as the crypto API encrypt > > or decrypt function returns. It's still fully async. It's just > > that the setup part needs to be done with sendmsg/recvmsg. > > Wouldn't a copy of the ctx->iv into a per-request buffer change the behavoir > of the AF_ALG interface significantly? > > Today, if multiple IOCBs are submitted, most cipher implementations would > serialize the requests (e.g. all implementations that behave synchronous in > nature like all software implementations). > > Thus, when copying the ctx->iv into a separate per-request buffer, suddenly > all block-chained cipher operations are not block chained any more. Agreed - specific handling would be needed to ensure the IV is written to each copy to maintain the chain. Not nice at all. > > > > Even if we wanted to do what you stated, just inlining the IV isn't > > enough. You'd also need to inline the assoclen, and probably the > > optype in case you want to mix encrypt/decrypt too. > > Maybe that is what we have to do. The one element I could do with more clarity on here is use cases as it feels like the discussion is a little unfocused (helps with performance runs, but is it really useful?) When do we want to have separate IVs per request but a shared key? I think this is relevant for ctr modes in particular where userspace can provide the relevant ctrs but the key is shared. Storage encryption modes such as XTS can also benefit. My own knowledge is too abstract to give good answers to these. > > > > However, I must say that I don't see the point of going all the way > > to support such a bulk submission interface (e.g., lio_listio). > > IMHO, the point is that AF_ALG is the only interface to allow userspace to > utilize hardware crypto implementations. For example, on a small chip with > hardware crypto support, your user space code can offload crypto to that > hardware to free CPU time. > > How else would somebody access its crypto accelerators? This is also useful at the high end where we may well be throwing this bulk submission at a set of crypto units (hidden behind a queue) to parallelize when possible. Just because we have lots of cpu power doesn't mean it makes sense to use it for crypto :) We 'could' just do it all in userspace via vfio, but there are the usual disadvantages in that approach in terms of generality etc. > > > > Remember, the algif interface due to its inherent overhead is meant > > for bulk data. That is, the processing time for each request is > > dominated by the actual processing, not the submission process. > > I see that. And for smaller chips with crypto support, this would be the case > IMHO. Especially if we streamline the AF_ALG overhead such that we reduce the > number of syscalls and user/kernel space roundtrips. For larger devices the ability to run large numbers of requests and 'know' that they don't need to be chained is useful (because they have separate IVs). This allows you to let the hardware handle them in parallel (either because the hardware handles dependency tracking, or because we have done it in the driver.) Applies just as well for large blocks with lower overhead. You could do this by opening lots of separate sockets and simply providing them all with the same key. However, this assumes the hardware / driver can handle very large numbers of contexts (ours can though we only implement a subset of this functionality in the current driver to keep things simple). If we 'fake' such support in the driver then there is inherent nastiness around having to let the hardware queues drain before you can change the IV) and that the overhead of operating such a pool of sockets in your program isn't significant. Managing such a pool of sockets would also be a significant complexity overhead in complexity of the user space code. > > > > If you're instead processing lots of tiny requests, do NOT use > > algif because it's not designed for that. > > The only issue in this case is that it makes the operation slower. > > > > Therefore spending too much time to optimise the submission overhead > > seems pointless to me. > > > > Cheers, > > > Ciao > Stephan > > Thanks, Jonathan
Re: ahash - scheduling whilst atomic.
On Mon, 26 Feb 2018 14:02:33 + Jonathan Cameron <jonathan.came...@huawei.com> wrote: > Hi All, Please ignore - stupid bug in my code. Sorry for the noise. Jonathan > > I seem to get much the same whether using af_alg or tcrypt to poke > the support for ahashes that I just added to our driver. > > Taking the af_alg path as that is the one I've chased further. > We return -EINPROGRESS (along with it seems lots of other ahash > implementations) if we have some data in flight. The hardware is > asynchronous and we don't currently bother to check if it is > already done or not (may do so later as an optimization). > > In hash_sendmsg (algif_hash.c) we have > > err = crypto_wait_req(crypto_ahash_update(>req), >wait); > > Now crypto_wait does a wait_for_completion if it gets -EINPROGRESS > from the crypto_ahash_update call. > > Given hash_sendmsg is called from a socket we can't do a wait_for_completion > here safely. I note for the various wait calls in the equivalent skcipher > code > we drop through with an error if they would otherwise sleep. > > I'm guessing that I'm missing something! As far as I can tell the first > few drivers I looked at caam_hash and crypto4xx do exactly the same thing > I am doing (superficially anyway)... > > Anyone have any thoughts? > > Thanks, > > Jonathan > ___ > Linuxarm mailing list > linux...@huawei.com > http://hulk.huawei.com/mailman/listinfo/linuxarm
ahash - scheduling whilst atomic.
Hi All, I seem to get much the same whether using af_alg or tcrypt to poke the support for ahashes that I just added to our driver. Taking the af_alg path as that is the one I've chased further. We return -EINPROGRESS (along with it seems lots of other ahash implementations) if we have some data in flight. The hardware is asynchronous and we don't currently bother to check if it is already done or not (may do so later as an optimization). In hash_sendmsg (algif_hash.c) we have err = crypto_wait_req(crypto_ahash_update(>req), >wait); Now crypto_wait does a wait_for_completion if it gets -EINPROGRESS from the crypto_ahash_update call. Given hash_sendmsg is called from a socket we can't do a wait_for_completion here safely. I note for the various wait calls in the equivalent skcipher code we drop through with an error if they would otherwise sleep. I'm guessing that I'm missing something! As far as I can tell the first few drivers I looked at caam_hash and crypto4xx do exactly the same thing I am doing (superficially anyway)... Anyone have any thoughts? Thanks, Jonathan
Re: [PATCH v2 4/4] crypto: add CRYPTO_TFM_REQ_PARALLEL flag
On Wed, 7 Feb 2018 16:43:10 +0100 Stephan Mueller <smuel...@chronox.de> wrote: > Am Mittwoch, 7. Februar 2018, 16:39:11 CET schrieb Jonathan Cameron: > > Hi Jonathan, > > > On Wed, 7 Feb 2018 13:48:32 +0100 > > > > Stephan Mueller <smuel...@chronox.de> wrote: > > > Am Mittwoch, 7. Februar 2018, 08:44:04 CET schrieb Stephan Müller: > > > > > > Hi, > > > > > > > diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c > > > > index 3970ad7f6fd0..da010405eea0 100644 > > > > --- a/crypto/algif_aead.c > > > > +++ b/crypto/algif_aead.c > > > > @@ -66,13 +66,22 @@ static int aead_sendmsg(struct socket *sock, struct > > > > msghdr *msg, size_t size) { > > > > > > > > struct sock *sk = sock->sk; > > > > struct alg_sock *ask = alg_sk(sk); > > > > > > > > + struct af_alg_ctx *ctx = ask->private; > > > > > > > > struct sock *psk = ask->parent; > > > > struct alg_sock *pask = alg_sk(psk); > > > > struct aead_tfm *aeadc = pask->private; > > > > > > > > - struct crypto_aead *tfm = aeadc->aead; > > > > - unsigned int ivsize = crypto_aead_ivsize(tfm); > > > > + struct crypto_aead *aead = aeadc->aead; > > > > + struct crypto_tfm *tfm = crypto_aead_tfm(aead); > > > > + unsigned int ivsize = crypto_aead_ivsize(aead); > > > > + int ret = af_alg_sendmsg(sock, msg, size, ivsize); > > > > + > > > > + if (ret < 0) > > > > + return ret; > > > > > > > > - return af_alg_sendmsg(sock, msg, size, ivsize); > > > > + if (ctx->iiv == ALG_IIV_USE) > > > > > > This should be ALG_IIV_DISABLE of course. > > > > You say that, but my initial reading was that the core > > was requesting that the driver do things in parallel > > irrespective of whether the driver thought it was safe. > > So I would think this was correct. > > > > Definitely needs some documentation or a clearer name. > > How about: > > ALG_IV_SERIAL_PROCESSING (was ALG_IIV_DISABLE) > ALG_IV_PARALLEL_PROCESSING (was ALG_IIV_USE) > Actually those were fine on the basis that inline iv is obvious enough, it was CRYPTO_TFM_REQ_PARALLEL that was causing me confusion. Sorry, wasn't terribly clear on that! Jonathan > Ciao > Stephan > >
Re: [PATCH v2 4/4] crypto: add CRYPTO_TFM_REQ_PARALLEL flag
On Wed, 7 Feb 2018 13:48:32 +0100 Stephan Muellerwrote: > Am Mittwoch, 7. Februar 2018, 08:44:04 CET schrieb Stephan Müller: > > Hi, > > > > diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c > > index 3970ad7f6fd0..da010405eea0 100644 > > --- a/crypto/algif_aead.c > > +++ b/crypto/algif_aead.c > > @@ -66,13 +66,22 @@ static int aead_sendmsg(struct socket *sock, struct > > msghdr *msg, size_t size) { > > struct sock *sk = sock->sk; > > struct alg_sock *ask = alg_sk(sk); > > + struct af_alg_ctx *ctx = ask->private; > > struct sock *psk = ask->parent; > > struct alg_sock *pask = alg_sk(psk); > > struct aead_tfm *aeadc = pask->private; > > - struct crypto_aead *tfm = aeadc->aead; > > - unsigned int ivsize = crypto_aead_ivsize(tfm); > > + struct crypto_aead *aead = aeadc->aead; > > + struct crypto_tfm *tfm = crypto_aead_tfm(aead); > > + unsigned int ivsize = crypto_aead_ivsize(aead); > > + int ret = af_alg_sendmsg(sock, msg, size, ivsize); > > + > > + if (ret < 0) > > + return ret; > > > > - return af_alg_sendmsg(sock, msg, size, ivsize); > > + if (ctx->iiv == ALG_IIV_USE) > > This should be ALG_IIV_DISABLE of course. You say that, but my initial reading was that the core was requesting that the driver do things in parallel irrespective of whether the driver thought it was safe. So I would think this was correct. Definitely needs some documentation or a clearer name. > > > + tfm->crt_flags |= CRYPTO_TFM_REQ_PARALLEL; > > + > > + return ret; > > } > > > > static int crypto_aead_copy_sgl(struct crypto_skcipher *null_tfm, > > diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c > > index aee602a3ec24..22554cf7 100644 > > --- a/crypto/algif_skcipher.c > > +++ b/crypto/algif_skcipher.c > > @@ -43,12 +43,21 @@ static int skcipher_sendmsg(struct socket *sock, struct > > msghdr *msg, { > > struct sock *sk = sock->sk; > > struct alg_sock *ask = alg_sk(sk); > > + struct af_alg_ctx *ctx = ask->private; > > struct sock *psk = ask->parent; > > struct alg_sock *pask = alg_sk(psk); > > - struct crypto_skcipher *tfm = pask->private; > > - unsigned ivsize = crypto_skcipher_ivsize(tfm); > > + struct crypto_skcipher *skc = pask->private; > > + struct crypto_tfm *tfm = crypto_skcipher_tfm(skc); > > + unsigned int ivsize = crypto_skcipher_ivsize(skc); > > + int ret = af_alg_sendmsg(sock, msg, size, ivsize); > > + > > + if (ret < 0) > > + return ret; > > > > - return af_alg_sendmsg(sock, msg, size, ivsize); > > + if (ctx->iiv == ALG_IIV_USE) > > Dto. > > I will wait for some more comments before sending a new patch. > > Ciao > Stephan > >
Re: [PATCH v2 0/4] crypto: AF_ALG AIO improvements
On Wed, 7 Feb 2018 08:42:59 +0100 Stephan Müller <smuel...@chronox.de> wrote: > Hi Herbert, > > Herbert, the patch 1 is meant for stable. However, this patch as is > only applies to the new AF_ALG interface implementation. Though, > the issue goes back to the first implementation of AIO support. > Shall I try prepare a patch for the old AF_ALG implementation > as well? > > Changes from v1: > > * integrate the inline IV and locking patch into one patch set > > * reverse the order of lock context IV patch and inline IV patch -- > the reason is to allow the first patch to be back-ported to stable > > * mark the first patch (locking of the context IV) as applicable to > stable as there is an existing inconsistency which was demonstrated > by Harsh with the Chelsio driver vs the AES-NI driver > > * modify the inline IV patch to have proper unlocking of the mutex > in case of errors > > * prevent locking if no IV is defined by cipher > > * add a patch to allow crypto drivers to report whether they support > serialization -- in this case the locking in AF_ALG shall be > disabled > > * add a patch to inform the crypto drivers that their serialization > support should actually be enabled and used because AF_ALG does not > serialize the interdependent parallel AIO requests > > * streamline the code in patch 1 and 2 slightly > > I would like to ask the folks with real AIO hardware (Harsh, Jonathan) > to test the patches. Especially, is the locking patch should be tested > by Harsh as you have seen the issue with your hardware. > I've updated my driver to take advantage of this and it is working pretty much the same as it was with the dirty hacks I was running before. So Tested-by: Jonathan Cameron <jonathan.came...@huawei.com> with that minor change to sensitivity to having it set to IIV multiple times. Hmm. Naming of the parallel request probably needs some thought though. Will reply to that patch. > Thanks. > > Stephan Mueller (4): > crypto: AF_ALG AIO - lock context IV > crypto: AF_ALG - inline IV support > crypto: AF_ALG - allow driver to serialize IV access > crypto: add CRYPTO_TFM_REQ_PARALLEL flag > > crypto/af_alg.c | 119 > +++- > crypto/algif_aead.c | 86 +--- > crypto/algif_skcipher.c | 38 ++ > include/crypto/if_alg.h | 37 ++ > include/linux/crypto.h | 16 ++ > include/uapi/linux/if_alg.h | 6 ++- > 6 files changed, 249 insertions(+), 53 deletions(-) >
Re: [PATCH v2 2/4] crypto: AF_ALG - inline IV support
On Wed, 7 Feb 2018 08:43:32 +0100 Stephan Müllerwrote: > The kernel crypto API requires the caller to set an IV in the request > data structure. That request data structure shall define one particular > cipher operation. During the cipher operation, the IV is read by the > cipher implementation and eventually the potentially updated IV (e.g. > in case of CBC) is written back to the memory location the request data > structure points to. > > AF_ALG allows setting the IV with a sendmsg request, where the IV is > stored in the AF_ALG context that is unique to one particular AF_ALG > socket. Note the analogy: an AF_ALG socket is like a TFM where one > recvmsg operation uses one request with the TFM from the socket. > > AF_ALG these days supports AIO operations with multiple IOCBs. I.e. > with one recvmsg call, multiple IOVECs can be specified. Each > individual IOCB (derived from one IOVEC) implies that one request data > structure is created with the data to be processed by the cipher > implementation. The IV that was set with the sendmsg call is registered > with the request data structure before the cipher operation. > > As of now, the individual IOCBs are serialized with respect to the IV > handling. This implies that the kernel does not perform a truly parallel > invocation of the cipher implementations. However, if the user wants to > perform cryptographic operations on multiple IOCBs where each IOCB is > truly independent from the other, parallel invocations are possible. > This would require that each IOCB provides its own IV to ensure true > separation of the IOCBs. > > The solution is to allow providing the IV data supplied as part of the > plaintext/ciphertext. To do so, the AF_ALG interface treats the > ALG_SET_OP flag usable with sendmsg as a bit-array allowing to set the > cipher operation together with the flag whether the operation should > enable support for inline IV handling. > > If inline IV handling is enabled, the IV is expected to be the first > part of the input plaintext/ciphertext. This IV is only used for one > cipher operation and will not retained in the kernel for subsequent > cipher operations. > > The inline IV handling support is only allowed to be enabled during > the first sendmsg call for a context. Any subsequent sendmsg calls are > not allowed to change the setting of the inline IV handling (either > enable or disable it) as this would open up a race condition with the > locking and unlocking of the ctx->ivlock mutex. > > The AEAD support required a slight re-arragning of the code, because > obtaining the IV implies that ctx->used is updated. Thus, the ctx->used > access in _aead_recvmsg must be moved below the IV gathering. > > The AEAD code to find the first SG with data in the TX SGL is moved to a > common function as it is required by the IV gathering function as well. > > This patch does not change the existing interface where user space is > allowed to provide an IV via sendmsg. It only extends the interface by > giving the user the choice to provide the IV either via sendmsg (the > current approach) or as part of the data (the additional approach). > > Signed-off-by: Stephan Mueller It's not a 'bug' as such, but this does currently break the crypto-perf test for aio and iiv in the libkcapi branch. We can perhaps make it more forgiving... I suggest we let noop cases where we set IIV on when it is already on to not result in an error but to just be ignored. Jonathan > --- > crypto/af_alg.c | 93 > ++--- > crypto/algif_aead.c | 58 > crypto/algif_skcipher.c | 12 +++--- > include/crypto/if_alg.h | 21 +- > include/uapi/linux/if_alg.h | 6 ++- > 5 files changed, 143 insertions(+), 47 deletions(-) > > diff --git a/crypto/af_alg.c b/crypto/af_alg.c > index e7887621aa44..973233000d18 100644 > --- a/crypto/af_alg.c > +++ b/crypto/af_alg.c > @@ -14,6 +14,7 @@ > > #include > #include > +#include > #include > #include > #include > @@ -834,6 +835,7 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr > *msg, size_t size, > struct af_alg_control con = {}; > long copied = 0; > bool enc = 0; > + int iiv = ALG_IIV_DISABLE; > bool init = 0; > int err = 0; > > @@ -843,7 +845,7 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr > *msg, size_t size, > return err; > > init = 1; > - switch (con.op) { > + switch (con.op & ALG_OP_CIPHER_MASK) { > case ALG_OP_ENCRYPT: > enc = 1; > break; > @@ -854,6 +856,9 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr > *msg, size_t size, > return -EINVAL; > } > > + if (con.op & ALG_OP_INLINE_IV) > + iiv = ALG_IIV_USE; > + >
Re: [RFC PATCH 2/3] crypto: hisilicon hacv1 driver
On Sat, 3 Feb 2018 12:16:18 +0100 Stephan Müller <smuel...@chronox.de> wrote: > Am Dienstag, 30. Januar 2018, 16:29:52 CET schrieb Jonathan Cameron: > > Hi Jonathan, > > > + /* Special path for single element SGLs with small packets. */ > > + if (sg_is_last(sgl) && sgl->length <= SEC_SMALL_PACKET_SIZE) { > > This looks strangely familiar. Is this code affected by a similar issue fixed > in https://patchwork.kernel.org/patch/10173981? Not as far as I know - this section is about optimizing the setup of the IOMMU. It's purely a performance optimization. It is really costly to do the translation setup for lots of small regions. These small regions are often contiguous anyway making the cost even more ridiculous. The use of a dma pool allows us to keep the iommu setup constant(ish). It is cheaper to copy into an element of this, already mapped, pool than it is to set up the iommu mappings for a new region. I could drop this for the initial submission and bring it in as an optimization with supporting numbers as a follow up patch. > > > +static int sec_alg_skcipher_setkey(struct crypto_skcipher *tfm, > > + const u8 *key, unsigned int keylen, > > + enum sec_cipher_alg alg) > > +{ > > + struct sec_alg_skcipher_ctx *ctx = crypto_skcipher_ctx(tfm); > > + struct device *dev; > > + > > + spin_lock(>lock); > > + if (ctx->enc_key) { > > + /* rekeying */ > > + dev = SEC_Q_DEV(ctx->queue); > > + memset(ctx->enc_key, 0, SEC_MAX_CIPHER_KEY); > > + memset(ctx->dec_key, 0, SEC_MAX_CIPHER_KEY); > > + memset(>enc_req, 0, sizeof(ctx->enc_req)); > > + memset(>dec_req, 0, sizeof(ctx->dec_req)); > > + } else { > > + /* new key */ > > + dev = SEC_Q_DEV(ctx->queue); > > + ctx->enc_key = dma_zalloc_coherent(dev, SEC_MAX_CIPHER_KEY, > > + >enc_pkey, > > GFP_ATOMIC); + if (!ctx->enc_key) { > > + spin_unlock(>lock); > > + return -ENOMEM; > > + } > > + ctx->dec_key = dma_zalloc_coherent(dev, SEC_MAX_CIPHER_KEY, > > + >dec_pkey, > > GFP_ATOMIC); + if (!ctx->dec_key) { > > + spin_unlock(>lock); > > + goto out_free_enc; > > + } > > + } > > + spin_unlock(>lock); > > + if (sec_alg_skcipher_init_context(tfm, key, keylen, alg)) > > + goto out_free_all; > > + > > + return 0; > > + > > +out_free_all: > > + memset(ctx->dec_key, 0, SEC_MAX_CIPHER_KEY); > > + dma_free_coherent(dev, SEC_MAX_CIPHER_KEY, > > + ctx->dec_key, ctx->dec_pkey); > > + ctx->dec_key = NULL; > > +out_free_enc: > > + memset(ctx->enc_key, 0, SEC_MAX_CIPHER_KEY); > > + dma_free_coherent(dev, SEC_MAX_CIPHER_KEY, > > + ctx->enc_key, ctx->enc_pkey); > > + ctx->enc_key = NULL; > > Please use memzero_explicit. Will do - thanks! Jonathan > > + > > + return -ENOMEM; > > +
Re: [PATCH] crypto: AF_ALG AIO - lock context IV
On Thu, 1 Feb 2018 12:07:21 +0200 Gilad Ben-Yossefwrote: > On Thu, Feb 1, 2018 at 12:04 PM, Stephan Mueller wrote: > > Am Donnerstag, 1. Februar 2018, 10:35:07 CET schrieb Gilad Ben-Yossef: > > > > Hi Gilad, > > > >> > > >> > Which works well for the sort of optimization I did and for hardware that > >> > can do iv dependency tracking itself. If hardware dependency tracking was > >> > avilable, you would be able to queue up requests with a chained IV > >> > without taking any special precautions at all. The hardware would > >> > handle the IV update dependencies. > >> > > >> > So in conclusion, Stephan's approach should work and give us a nice > >> > small patch suitable for stable inclusion. > >> > > >> > However, if people know that their setup overhead can be put in parallel > >> > with previous requests (even when the IV is not yet updated) then they > >> > will > >> > probably want to do something inside their own driver and set the flag > >> > that Stephan is proposing adding to bypass the mutex approach. > >> > >> The patches from Stephan looks good to me, but I think we can do better > >> for the long term approach you are discussing. > > > > What you made me think of is the following: shouldn't we relay the inline IV > > flag on to the crypto drivers? > > > > The reason is to allow a clean implementation of the enabling or disabling > > of > > the dependency handling in the driver. Jonathan's driver, for example, > > decides > > based on the pointer value of the IV buffer whether it is the same buffer > > and > > thus dependency handling is to be applied. This is fragile. I agree it's inelegant and a flag would be better than pointer tricks (though they are safe currently - we never know what might change in future) It was really a minimal example rather than a suggestion of the ultimate solution ;) I was planning on suggesting a flag myself once the basic discussion concluded the approach was worthwhile. > > > > As AF_ALG knows whether the inline IV with separate IVs per request or the > > serialization with one IV buffer for all requests is requested, it should > > relay this state on to the drivers. Thus, for example, Jonathan's driver can > > be changed to rely on this flag instead on the buffer pointer value to > > decide > > whether to enable its dependency handling. > > Yes, that is exactly what I was trying to point out :-) Agreed - make things explicit rather than basically relying on knowing how the above layers are working. Thanks, Jonathan
Re: [PATCH] crypto: AF_ALG AIO - lock context IV
On Tue, 30 Jan 2018 15:51:40 + Jonathan Cameron <jonathan.came...@huawei.com> wrote: > On Tue, 30 Jan 2018 09:27:04 +0100 > Stephan Müller <smuel...@chronox.de> wrote: A few clarifications from me after discussions with Stephan this morning. > > > Hi Harsh, > > > > may I as you to try the attached patch on your Chelsio driver? Please > > invoke > > the following command from libkcapi: > > > > kcapi -d 2 -x 9 -e -c "cbc(aes)" -k > > 8d7dd9b0170ce0b5f2f8e1aa768e01e91da8bfc67fd486d081b28254c99eb423 -i > > 7fbc02ebf5b93322329df9bfccb635af -p 48981da18e4bb9ef7e2e3162d16b1910 > > > > The result should now be a fully block-chained result. > > > > Note, the patch goes on top of the inline IV patch. > > > > Thanks > > > > ---8<--- > > > > In case of AIO where multiple IOCBs are sent to the kernel and inline IV > > is not used, the ctx->iv is used for each IOCB. The ctx->iv is read for > > the crypto operation and written back after the crypto operation. Thus, > > processing of multiple IOCBs must be serialized. > > > > As the AF_ALG interface is used by user space, a mutex provides the > > serialization which puts the caller to sleep in case a previous IOCB > > processing is not yet finished. > > > > If multiple IOCBs arrive that all are blocked, the mutex' FIFO operation > > of processing arriving requests ensures that the blocked IOCBs are > > unblocked in the right order. > > > > Signed-off-by: Stephan Mueller <smuel...@chronox.de> Firstly, as far as I can tell (not tested it yet) the patch does the job and is about the best we can easily do in the AF_ALG code. I'd suggest that this (or v2 anyway) is stable material as well (which, as Stephan observed, will require reordering of the two patches). > > As a reference point, holding outside the kernel (in at least > the case of our hardware with 8K CBC) drops us down to around 30% > of performance with separate IVs. Putting a queue in kernel > (and hence allowing most setup of descriptors / DMA etc) gets us > up to 50% of raw non chained performance. > > So whilst this will work in principle I suspect anyone who > cares about the cost will be looking at adding their own > software queuing and dependency handling in driver. How > much that matters will depend on just how quick the hardware > is vs overheads. > > Anyhow, just posted the driver as an RFC. There are a few corners > that need resolving and the outcome of this thread effects whether > we need to play games with IVs or not. > > https://marc.info/?l=linux-crypto-vger=151732626428206=2 > grep for softqueue. > > We have a number of processing units that will grab requests > from the HW queues in parallel. So 'lots' of requests > from a given HW queue may be in flight at the same time (the problem > case). > > Logic is fairly simple. > 1. Only create a softqueue if the encryption mode requires chaining. >Currently in this driver that is CBC and CTR modes only. >Otherwise put everything straight in the hardware monitored queues. > 2. If we have an IV embedded within the request it will have a different >address to the one used previously (guaranteed if that one is still >in flight and it doesn't matter if it isn't). If that occurs we >can safely add it to the hardware queue. To avoid DoS issues against >and earlier set of messages using the a chained IV we actually make >sure nothing is in the software queue (not needed for correctness) > 3. If IV matches previous IV and there are existing elements in SW or HW >queues we need to hold it in the SW queue. > 4. On receiving a response from the hardware and if the hardware queue >is empty, we can release an element from the software queue into the >hardware queue. The differences are better shown with pictures... To compare the two approaches. If we break up the data flow the alternatives are 1) Mutex causes queuing in AF ALG code The key thing is the [Build / Map HW Descs] for small packets, up to perhaps 32K, is a significant task we can't avoid. At 8k it looks like it takes perhaps 20-30% of the time (though I only have end to end performance numbers so far) [REQUEST 1 from userspace] | [REQUEST 2 from userspace] [AF_ALG/SOCKET] | | [AF_ALG/SOCKET] NOTHING BLOCKING (lock mut) | |Queued on Mutex [BUILD / MAP HW DESCS] | | | [Place in HW Queue]| |
Re: [PATCH] crypto: AF_ALG AIO - lock context IV
On Tue, 30 Jan 2018 09:27:04 +0100 Stephan Müllerwrote: > Hi Harsh, > > may I as you to try the attached patch on your Chelsio driver? Please invoke > the following command from libkcapi: > > kcapi -d 2 -x 9 -e -c "cbc(aes)" -k > 8d7dd9b0170ce0b5f2f8e1aa768e01e91da8bfc67fd486d081b28254c99eb423 -i > 7fbc02ebf5b93322329df9bfccb635af -p 48981da18e4bb9ef7e2e3162d16b1910 > > The result should now be a fully block-chained result. > > Note, the patch goes on top of the inline IV patch. > > Thanks > > ---8<--- > > In case of AIO where multiple IOCBs are sent to the kernel and inline IV > is not used, the ctx->iv is used for each IOCB. The ctx->iv is read for > the crypto operation and written back after the crypto operation. Thus, > processing of multiple IOCBs must be serialized. > > As the AF_ALG interface is used by user space, a mutex provides the > serialization which puts the caller to sleep in case a previous IOCB > processing is not yet finished. > > If multiple IOCBs arrive that all are blocked, the mutex' FIFO operation > of processing arriving requests ensures that the blocked IOCBs are > unblocked in the right order. > > Signed-off-by: Stephan Mueller As a reference point, holding outside the kernel (in at least the case of our hardware with 8K CBC) drops us down to around 30% of performance with separate IVs. Putting a queue in kernel (and hence allowing most setup of descriptors / DMA etc) gets us up to 50% of raw non chained performance. So whilst this will work in principle I suspect anyone who cares about the cost will be looking at adding their own software queuing and dependency handling in driver. How much that matters will depend on just how quick the hardware is vs overheads. Anyhow, just posted the driver as an RFC. There are a few corners that need resolving and the outcome of this thread effects whether we need to play games with IVs or not. https://marc.info/?l=linux-crypto-vger=151732626428206=2 grep for softqueue. We have a number of processing units that will grab requests from the HW queues in parallel. So 'lots' of requests from a given HW queue may be in flight at the same time (the problem case). Logic is fairly simple. 1. Only create a softqueue if the encryption mode requires chaining. Currently in this driver that is CBC and CTR modes only. Otherwise put everything straight in the hardware monitored queues. 2. If we have an IV embedded within the request it will have a different address to the one used previously (guaranteed if that one is still in flight and it doesn't matter if it isn't). If that occurs we can safely add it to the hardware queue. To avoid DoS issues against and earlier set of messages using the a chained IV we actually make sure nothing is in the software queue (not needed for correctness) 3. If IV matches previous IV and there are existing elements in SW or HW queues we need to hold it in the SW queue. 4. On receiving a response from the hardware and if the hardware queue is empty, we can release an element from the software queue into the hardware queue. This maintains chaining when needed and gets a good chunk of the lost performance back. I believe (though yet to verify) the remaining lost performance is mostly down to the fact we can't coalesce interrupts if chaining. Note the driver is deliberately a touch 'simplistic' so there may be other optimization opportunities to get some of the lost performance back or it may be fundamental due to the fact the raw processing can't be in parallel. > --- > crypto/af_alg.c | 22 ++ > crypto/algif_aead.c | 2 ++ > crypto/algif_skcipher.c | 2 ++ > include/crypto/if_alg.h | 3 +++ > 4 files changed, 29 insertions(+) > > diff --git a/crypto/af_alg.c b/crypto/af_alg.c > index 87394dc90ac0..3007c9d899da 100644 > --- a/crypto/af_alg.c > +++ b/crypto/af_alg.c > @@ -1059,6 +1059,8 @@ void af_alg_async_cb(struct crypto_async_request *_req, > int err) > struct kiocb *iocb = areq->iocb; > unsigned int resultlen; > > + af_alg_put_iv(sk); > + > /* Buffer size written by crypto operation. */ > resultlen = areq->outlen; > > @@ -1227,6 +1229,11 @@ int af_alg_get_iv(struct sock *sk, struct > af_alg_async_req *areq) > > /* No inline IV or cipher has no IV, use ctx IV buffer. */ > if (!ctx->iiv || !ctx->ivlen) { > + int ret = mutex_lock_interruptible(>ivlock); > + > + if (ret) > + return ret; > + > areq->iv = ctx->iv; > areq->ivlen = 0;// areq->iv will not be freed > return 0; > @@ -1252,6 +1259,21 @@ int af_alg_get_iv(struct sock *sk, struct > af_alg_async_req *areq) > } > EXPORT_SYMBOL_GPL(af_alg_get_iv); > > +/** > + * af_alg_put_iv - release lock on IV in case CTX IV is used > + * > + * @sk [in] AF_ALG socket > + */ > +void af_alg_put_iv(struct
[RFC PATCH 2/3] crypto: hisilicon hacv1 driver
From: Jonathan Cameron <jonathan.came...@huawei.com> This accelerator is found inside hisilicon hip06 and hip07 SoCs. Each instance provides a number of queues which feed a different number of backend accleration units. The queues are operating in an out of order mode in the interests of throughput. The silicon does not do tracking of dependencies between multiple 'messages' or update of the IVs as appropriate for training. Hence where relevant we need to do this in software. Signed-off-by: Jonathan Cameron <jonathan.came...@huawei.com> --- drivers/crypto/Kconfig |2 + drivers/crypto/Makefile |1 + drivers/crypto/hisilicon/Kconfig| 16 + drivers/crypto/hisilicon/Makefile |2 + drivers/crypto/hisilicon/sec/Makefile |3 + drivers/crypto/hisilicon/sec/sec_algs.c | 1082 +++ drivers/crypto/hisilicon/sec/sec_drv.c | 1418 +++ drivers/crypto/hisilicon/sec/sec_drv.h | 282 ++ 8 files changed, 2806 insertions(+) diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig index 47ec920d5b71..ea85924427e9 100644 --- a/drivers/crypto/Kconfig +++ b/drivers/crypto/Kconfig @@ -731,4 +731,6 @@ config CRYPTO_DEV_ARTPEC6 To compile this driver as a module, choose M here. +source "drivers/crypto/hisilicon/Kconfig" + endif # CRYPTO_HW diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile index 2513d13ea2c4..ed237414ca1b 100644 --- a/drivers/crypto/Makefile +++ b/drivers/crypto/Makefile @@ -45,3 +45,4 @@ obj-$(CONFIG_CRYPTO_DEV_VMX) += vmx/ obj-$(CONFIG_CRYPTO_DEV_BCM_SPU) += bcm/ obj-$(CONFIG_CRYPTO_DEV_SAFEXCEL) += inside-secure/ obj-$(CONFIG_CRYPTO_DEV_ARTPEC6) += axis/ +obj-$(CONFIG_CRYPTO_DEV_HISILICON) += hisilicon/ diff --git a/drivers/crypto/hisilicon/Kconfig b/drivers/crypto/hisilicon/Kconfig new file mode 100644 index ..cd9296687235 --- /dev/null +++ b/drivers/crypto/hisilicon/Kconfig @@ -0,0 +1,16 @@ +# SPDX-License-Identifier: GPL-2.0 + +config CRYPTO_DEV_HISILICON + bool + +config CRYPTO_DEV_HISI_SEC + tristate "Support for Hisilicon SEC crypto block cipher accelerator" + select CRYPTO_DEV_HISILICON + select CRYPTO_BLKCIPHER + select CRYPTO_ALGAPI + depends on ARM64 + help + Support for Hisilicon SEC Engine in Hip06 and Hip07 + + To compile this as a module, choose M here: the module + will be called hisi_sec. diff --git a/drivers/crypto/hisilicon/Makefile b/drivers/crypto/hisilicon/Makefile new file mode 100644 index ..463f46ace182 --- /dev/null +++ b/drivers/crypto/hisilicon/Makefile @@ -0,0 +1,2 @@ +# SPDX-License-Identifier: GPL-2.0 +obj-$(CONFIG_CRYPTO_DEV_HISI_SEC) += sec/ diff --git a/drivers/crypto/hisilicon/sec/Makefile b/drivers/crypto/hisilicon/sec/Makefile new file mode 100644 index ..a55b698e0c27 --- /dev/null +++ b/drivers/crypto/hisilicon/sec/Makefile @@ -0,0 +1,3 @@ +# SPDX-License-Identifier: GPL-2.0 +obj-$(CONFIG_CRYPTO_DEV_HISI_SEC) += hisi_sec.o +hisi_sec-y = sec_algs.o sec_drv.o diff --git a/drivers/crypto/hisilicon/sec/sec_algs.c b/drivers/crypto/hisilicon/sec/sec_algs.c new file mode 100644 index ..6b82636f7979 --- /dev/null +++ b/drivers/crypto/hisilicon/sec/sec_algs.c @@ -0,0 +1,1082 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2016-2017 Hisilicon Limited. */ +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +#include "sec_drv.h" + +enum sec_cipher_type { + SEC_CIPHER_NULL, + SEC_CIPHER_ENCRYPT, + SEC_CIPHER_DECRYPT, + SEC_CIPHER_PASS, + SEC_CIPHER_INVALID, +}; + +#define SEC_C_ALG_DES 0 +#define SEC_C_ALG_3DES 1 +#define SEC_C_ALG_AES 2 + +#define SEC_C_MODE_ECB 0 +#define SEC_C_MODE_CBC 1 +#define SEC_C_MODE_CTR 4 +#define SEC_C_MODE_CCM 5 +#define SEC_C_MODE_GCM 6 +#define SEC_C_MODE_XTS 7 + +#define SEC_KEY_LEN_AES_1280 +#define SEC_KEY_LEN_AES_1921 +#define SEC_KEY_LEN_AES_2562 +#define SEC_KEY_LEN_DES1 +#define SEC_KEY_LEN_3DES_3_KEY 1 +#define SEC_KEY_LEN_3DES_2_KEY 3 + +#define SEC_C_WIDTH_AES_128BIT 0 +#define SEC_C_WIDTH_AES_8BIT 1 +#define SEC_C_WIDTH_AES_1BIT 2 +#define SEC_C_WIDTH_DES_64BIT 0 +#define SEC_C_WIDTH_DES_8BIT 1 +#define SEC_C_WIDTH_DES_1BIT 2 + +enum sec_cipher_alg { + SEC_DES_ECB_64, + SEC_DES_CBC_64, + + SEC_3DES_ECB_192_3KEY, + SEC_3DES_ECB_192_2KEY, + + SEC_3DES_CBC_192_3KEY, + SEC_3DES_CBC_192_2KEY, + + SEC_AES_ECB_128, + SEC_AES_ECB_192, + SEC_AES_ECB_256, + + SEC_AES_CBC_128, +
[RFC PATCH 3/3] arm64: dts: hisi: add SEC crypto accelerator nodes for hip07 SoC
From: Jonathan Cameron <jonathan.came...@huawei.com> Enable all 4 SEC units available on d05 boards. Signed-off-by: Jonathan Cameron <jonathan.came...@huawei.com> --- arch/arm64/boot/dts/hisilicon/hip07.dtsi | 293 +++ 1 file changed, 293 insertions(+) diff --git a/arch/arm64/boot/dts/hisilicon/hip07.dtsi b/arch/arm64/boot/dts/hisilicon/hip07.dtsi index 2c01a21c3665..81b57c8ce720 100644 --- a/arch/arm64/boot/dts/hisilicon/hip07.dtsi +++ b/arch/arm64/boot/dts/hisilicon/hip07.dtsi @@ -1049,7 +1049,74 @@ num-pins = <2>; }; }; + p0_mbigen_alg_a:interrupt-controller@d008 { + compatible = "hisilicon,mbigen-v2"; + reg = <0x0 0xd008 0x0 0x1>; + p0_mbigen_sec_a: intc_sec { + msi-parent = <_its_dsa_a 0x40400>; + interrupt-controller; + #interrupt-cells = <2>; + num-pins = <33>; + }; + p0_mbigen_smmu_alg_a: intc_smmu_alg { + msi-parent = <_its_dsa_a 0x40b1b>; + interrupt-controller; + #interrupt-cells = <2>; + num-pins = <3>; + }; + }; + p0_mbigen_alg_b:interrupt-controller@8,d008 { + compatible = "hisilicon,mbigen-v2"; + reg = <0x8 0xd008 0x0 0x1>; + + p0_mbigen_sec_b: intc_sec { + msi-parent = <_its_dsa_b 0x42400>; + interrupt-controller; + #interrupt-cells = <2>; + num-pins = <33>; + }; + p0_mbigen_smmu_alg_b: intc_smmu_alg { + msi-parent = <_its_dsa_b 0x42b1b>; + interrupt-controller; + #interrupt-cells = <2>; + num-pins = <3>; + }; + }; + p1_mbigen_alg_a:interrupt-controller@400,d008 { + compatible = "hisilicon,mbigen-v2"; + reg = <0x400 0xd008 0x0 0x1>; + + p1_mbigen_sec_a: intc_sec { + msi-parent = <_its_dsa_a 0x44400>; + interrupt-controller; + #interrupt-cells = <2>; + num-pins = <33>; + }; + p1_mbigen_smmu_alg_a: intc_smmu_alg { + msi-parent = <_its_dsa_a 0x44b1b>; + interrupt-controller; + #interrupt-cells = <2>; + num-pins = <3>; + }; + }; + p1_mbigen_alg_b:interrupt-controller@408,d008 { + compatible = "hisilicon,mbigen-v2"; + reg = <0x408 0xd008 0x0 0x1>; + + p1_mbigen_sec_b: intc_sec { + msi-parent = <_its_dsa_b 0x46400>; + interrupt-controller; + #interrupt-cells = <2>; + num-pins = <33>; + }; + p1_mbigen_smmu_alg_b: intc_smmu_alg { + msi-parent = <_its_dsa_b 0x46b1b>; + interrupt-controller; + #interrupt-cells = <2>; + num-pins = <3>; + }; + }; p0_mbigen_dsa_a: interrupt-controller@c008 { compatible = "hisilicon,mbigen-v2"; reg = <0x0 0xc008 0x0 0x1>; @@ -1083,6 +1150,58 @@ }; }; + p0_smmu_alg_a: smmu_alg@d004 { + compatible = "arm,smmu-v3"; + reg = <0x0 0xd004 0x0 0x2>; + interrupt-parent = <_mbigen_smmu_alg_a>; + interrupts = <733 1>, + <734 1>, + <735 1>; + interrupt-names = "eventq", "gerror", "priq"; + #iommu-cells = <1>; + dma-coherent; + hisilicon,broken-prefetch-cmd; + /* smmu-cb-memtype = <0x0 0x1>;*/ + }; + p0_smmu_alg_b: smmu_alg@8,d004 { + compatible = "arm,smmu-v3"; + reg = <0x8 0xd004 0x0 0x2>; + interrupt-parent = <_mbigen_smmu_alg_b>; + interrupts = <733 1>, + <734 1>, + <735 1>; + interrupt-names = "eventq", "gerror", "priq"; + #iommu-cells = <1>; + dma-coherent; + hisilicon,broken-prefet
[RFC PATCH 0/3] Hisilicon SEC accelerator driver
From: Jonathan Cameron <jonathan.came...@huawei.com> This an RFC for a couple of reasons: 1) There is some ongoing discussion on how to handle IVs in the AF_ALG interface when doing async IO. There are a few corners of handling in here that are performance enhancements based on Stephan's recent set. 2) The handling of iommu presence is ugly. I need to discuss this with iommu maintainer and interested parties. 3) The hardware is limited to 32MB per queued job. We need to handle splitting this up into multiple jobs and related IV chaining. This driver provides in kernel support for the Hisilicon SEC accelerators found on the hip06 and hip07 SoCs. There are 4 such units on the D05 board for which DT bindings are provided. ACPI configuration also works with an appropriate UEFI build. The hardware does not update the IV in chaining or counting modes. This is done in the driver on completion of the cipher operation. If we are relying on IV chaining with current AF_ALG it is necessary to ensure we do not stream the elements directly into the hardware queues. For modes where this necessary we use a front end software queue. The driver currently supports AES, DES and 3DES block ciphers. Hash support and AAED support to follow. Jonathan Cameron (3): dt-bindings: Add bindings for Hisilicon SEC crypto accelerators. crypto: hisilicon hacv1 driver arm64: dts: hisi: add SEC crypto accelerator nodes for hip07 SoC .../bindings/crypto/hisilicon,hip07-sec.txt| 71 + arch/arm64/boot/dts/hisilicon/hip07.dtsi | 293 drivers/crypto/Kconfig |2 + drivers/crypto/Makefile|1 + drivers/crypto/hisilicon/Kconfig | 16 + drivers/crypto/hisilicon/Makefile |2 + drivers/crypto/hisilicon/sec/Makefile |3 + drivers/crypto/hisilicon/sec/sec_algs.c| 1082 +++ drivers/crypto/hisilicon/sec/sec_drv.c | 1418 drivers/crypto/hisilicon/sec/sec_drv.h | 282 10 files changed, 3170 insertions(+) create mode 100644 Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt create mode 100644 drivers/crypto/hisilicon/Kconfig create mode 100644 drivers/crypto/hisilicon/Makefile create mode 100644 drivers/crypto/hisilicon/sec/Makefile create mode 100644 drivers/crypto/hisilicon/sec/sec_algs.c create mode 100644 drivers/crypto/hisilicon/sec/sec_drv.c create mode 100644 drivers/crypto/hisilicon/sec/sec_drv.h -- 2.11.0
[RFC PATCH 1/3] dt-bindings: Add bindings for Hisilicon SEC crypto accelerators.
From: Jonathan Cameron <jonathan.came...@huawei.com> The hip06 and hip07 SoCs contain a number of these crypto units which accelerate AES and DES operations. Signed-off-by: Jonathan Cameron <jonathan.came...@huawei.com> --- .../bindings/crypto/hisilicon,hip07-sec.txt| 71 ++ 1 file changed, 71 insertions(+) diff --git a/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt b/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt new file mode 100644 index ..bf81118e560c --- /dev/null +++ b/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt @@ -0,0 +1,71 @@ +* Hisilicon hip07 Security Accelerator (SEC) + +Required properties: +- compatible: Must contain one of + - "hisilicon,hip06-sec" + - "hisilicon,hip07-sec" +- #address-cells: Must be <2> as 64 bit addresses in reg. +- #size-cells: Must be <2> as 64 bit lengths in reg. +- reg: Memory addresses and lengths of the memory regions used by the driver. + Region 0 has registers to control the backend processing engines. + Region 1 has registers for functionality common to all queues. + Regions 2-18 have registers for the individual queues which are isolated + both in hardware and within the driver. +- interrupts: Interrupt specifiers. + Refer to interrupt-controller/interrupts.txt for generic interrupt client node + bindings. + Interrupt 0 is for the SEC unit error queue. + Interrupt 2N + 1 is the completion interrupt for queue N. + Interrupt 2N + 2 is the error interrupt for queue N. +- dma-coherent: The driver assumes coherent dma is possible. + +Optional properties: +- iommus: The SEC units are behind smmu-v3 iommus. + Refer to iommu/arm,smmu-v3.txt for more information. + +Example: +Second socket, first unit chosen to illustrate need for 64 bit addresses. + +p1_sec_a: sec@d200 { + compatible = "hisilicon,hip07-sec"; + #address-cells = <2>; + #size-cells = <2>; + reg = <0x400 0xd000 0x0 0x1 + 0x400 0xd200 0x0 0x1 + 0x400 0xd201 0x0 0x1 + 0x400 0xd202 0x0 0x1 + 0x400 0xd203 0x0 0x1 + 0x400 0xd204 0x0 0x1 + 0x400 0xd205 0x0 0x1 + 0x400 0xd206 0x0 0x1 + 0x400 0xd207 0x0 0x1 + 0x400 0xd208 0x0 0x1 + 0x400 0xd209 0x0 0x1 + 0x400 0xd20a 0x0 0x1 + 0x400 0xd20b 0x0 0x1 + 0x400 0xd20c 0x0 0x1 + 0x400 0xd20d 0x0 0x1 + 0x400 0xd20e 0x0 0x1 + 0x400 0xd20f 0x0 0x1 + 0x400 0xd210 0x0 0x1>; + interrupt-parent = <_mbigen_sec_a>; + iommus = <_smmu_alg_a 0x600>; + dma-coherent; + interrupts = <576 4>, +<577 1>,<578 4>, +<579 1>,<580 4>, +<581 1>,<582 4>, +<583 1>,<584 4>, +<585 1>,<586 4>, +<587 1>,<588 4>, +<589 1>,<590 4>, +<591 1>,<592 4>, +<593 1>,<594 4>, +<595 1>,<596 4>, +<597 1>,<598 4>, +<599 1>,<600 4>, +<601 1>,<602 4>, +<603 1>,<604 4>, +<605 1>,<606 4>, +<607 1>,<608 4>; +}; -- 2.11.0
Re: [PATCH] crypto: AF_ALG - inline IV support
On Mon, 22 Jan 2018 15:30:39 +0100 Stephan Mueller <smuel...@chronox.de> wrote: > Am Montag, 22. Januar 2018, 15:11:53 CET schrieb Jonathan Cameron: > > Hi Jonathan, Hi Stephan, > > > On Mon, 15 Jan 2018 10:35:34 +0100 > > > > Stephan Mueller <smuel...@chronox.de> wrote: > > > The kernel crypto API requires the caller to set an IV in the request > > > data structure. That request data structure shall define one particular > > > cipher operation. During the cipher operation, the IV is read by the > > > cipher implementation and eventually the potentially updated IV (e.g. > > > in case of CBC) is written back to the memory location the request data > > > structure points to. > > > > > > AF_ALG allows setting the IV with a sendmsg request, where the IV is > > > stored in the AF_ALG context that is unique to one particular AF_ALG > > > socket. Note the analogy: an AF_ALG socket is like a TFM where one > > > recvmsg operation uses one request with the TFM from the socket. > > > > > > AF_ALG these days supports AIO operations with multiple IOCBs. I.e. > > > with one recvmsg call, multiple IOVECs can be specified. Each > > > individual IOCB (derived from one IOVEC) implies that one request data > > > structure is created with the data to be processed by the cipher > > > implementation. The IV that was set with the sendmsg call is registered > > > with the request data structure before the cipher operation. > > > > > > In case of an AIO operation, the cipher operation invocation returns > > > immediately, queuing the request to the hardware. While the AIO request > > > is processed by the hardware, recvmsg processes the next IOVEC for > > > which another request is created. Again, the IV buffer from the AF_ALG > > > socket context is registered with the new request and the cipher > > > operation is invoked. > > > > > > You may now see that there is a potential race condition regarding the > > > IV handling, because there is *no* separate IV buffer for the different > > > requests. This is nicely demonstrated with libkcapi using the following > > > command which creates an AIO request with two IOCBs each encrypting one > > > AES block in CBC mode: > > > > > > kcapi -d 2 -x 9 -e -c "cbc(aes)" -k > > > 8d7dd9b0170ce0b5f2f8e1aa768e01e91da8bfc67fd486d081b28254c99eb423 -i > > > 7fbc02ebf5b93322329df9bfccb635af -p 48981da18e4bb9ef7e2e3162d16b1910 > > > > > > When the first AIO request finishes before the 2nd AIO request is > > > processed, the returned value is: > > > > > > 8b19050f66582cb7f7e4b6c873819b7108afa0eaa7de29bac7d903576b674c32 > > > > > > I.e. two blocks where the IV output from the first request is the IV input > > > to the 2nd block. > > > > > > In case the first AIO request is not completed before the 2nd request > > > commences, the result is two identical AES blocks (i.e. both use the > > > same IV): > > > > > > 8b19050f66582cb7f7e4b6c873819b718b19050f66582cb7f7e4b6c873819b71 > > > > > > This inconsistent result may even lead to the conclusion that there can > > > be a memory corruption in the IV buffer if both AIO requests write to > > > the IV buffer at the same time. > > > > > > The solution is to allow providing the IV data supplied as part of the > > > plaintext/ciphertext. To do so, the AF_ALG interface treats the > > > ALG_SET_OP flag usable with sendmsg as a bit-array allowing to set the > > > cipher operation together with the flag whether the operation should > > > enable support for inline IV handling. > > > > > > If inline IV handling is enabled, the IV is expected to be the first > > > part of the input plaintext/ciphertext. This IV is only used for one > > > cipher operation and will not retained in the kernel for subsequent > > > cipher operations. > > > > > > The AEAD support required a slight re-arragning of the code, because > > > obtaining the IV implies that ctx->used is updated. Thus, the ctx->used > > > access in _aead_recvmsg must be moved below the IV gathering. > > > > > > The AEAD code to find the first SG with data in the TX SGL is moved to a > > > common function as it is required by the IV gathering function as well. > > > > > > This patch does not change the existing interface where user space is > > > allowed to provide an IV via sendm
Re: [PATCH] crypto: AF_ALG - inline IV support
On Mon, 15 Jan 2018 10:35:34 +0100 Stephan Mueller <smuel...@chronox.de> wrote: > The kernel crypto API requires the caller to set an IV in the request > data structure. That request data structure shall define one particular > cipher operation. During the cipher operation, the IV is read by the > cipher implementation and eventually the potentially updated IV (e.g. > in case of CBC) is written back to the memory location the request data > structure points to. > > AF_ALG allows setting the IV with a sendmsg request, where the IV is > stored in the AF_ALG context that is unique to one particular AF_ALG > socket. Note the analogy: an AF_ALG socket is like a TFM where one > recvmsg operation uses one request with the TFM from the socket. > > AF_ALG these days supports AIO operations with multiple IOCBs. I.e. > with one recvmsg call, multiple IOVECs can be specified. Each > individual IOCB (derived from one IOVEC) implies that one request data > structure is created with the data to be processed by the cipher > implementation. The IV that was set with the sendmsg call is registered > with the request data structure before the cipher operation. > > In case of an AIO operation, the cipher operation invocation returns > immediately, queuing the request to the hardware. While the AIO request > is processed by the hardware, recvmsg processes the next IOVEC for > which another request is created. Again, the IV buffer from the AF_ALG > socket context is registered with the new request and the cipher > operation is invoked. > > You may now see that there is a potential race condition regarding the > IV handling, because there is *no* separate IV buffer for the different > requests. This is nicely demonstrated with libkcapi using the following > command which creates an AIO request with two IOCBs each encrypting one > AES block in CBC mode: > > kcapi -d 2 -x 9 -e -c "cbc(aes)" -k > 8d7dd9b0170ce0b5f2f8e1aa768e01e91da8bfc67fd486d081b28254c99eb423 -i > 7fbc02ebf5b93322329df9bfccb635af -p 48981da18e4bb9ef7e2e3162d16b1910 > > When the first AIO request finishes before the 2nd AIO request is > processed, the returned value is: > > 8b19050f66582cb7f7e4b6c873819b7108afa0eaa7de29bac7d903576b674c32 > > I.e. two blocks where the IV output from the first request is the IV input > to the 2nd block. > > In case the first AIO request is not completed before the 2nd request > commences, the result is two identical AES blocks (i.e. both use the > same IV): > > 8b19050f66582cb7f7e4b6c873819b718b19050f66582cb7f7e4b6c873819b71 > > This inconsistent result may even lead to the conclusion that there can > be a memory corruption in the IV buffer if both AIO requests write to > the IV buffer at the same time. > > The solution is to allow providing the IV data supplied as part of the > plaintext/ciphertext. To do so, the AF_ALG interface treats the > ALG_SET_OP flag usable with sendmsg as a bit-array allowing to set the > cipher operation together with the flag whether the operation should > enable support for inline IV handling. > > If inline IV handling is enabled, the IV is expected to be the first > part of the input plaintext/ciphertext. This IV is only used for one > cipher operation and will not retained in the kernel for subsequent > cipher operations. > > The AEAD support required a slight re-arragning of the code, because > obtaining the IV implies that ctx->used is updated. Thus, the ctx->used > access in _aead_recvmsg must be moved below the IV gathering. > > The AEAD code to find the first SG with data in the TX SGL is moved to a > common function as it is required by the IV gathering function as well. > > This patch does not change the existing interface where user space is > allowed to provide an IV via sendmsg. It only extends the interface by > giving the user the choice to provide the IV either via sendmsg (the > current approach) or as part of the data (the additional approach). > > Signed-off-by: Stephan Mueller <smuel...@chronox.de> Firstly it works Tested-by: Jonathan Cameron <jonathan.came...@huawei.com> Now to be really useful for my particular hardware (which doesn't do dependency tracking in its out of order queues) what I need to know is whether I need to let previous queued up entries finish before I can run this one or not. Now there is an obvious 'hack' I can use, which is that the above only matters is if the IV is still 'in flight'. Given it is still inflight the memory is in use. Thus until it is finished it's address is clearly only being used by 'this IV'. Hence I can just compare against the address of the IV for the previous packet. If it changed we know there is no need to wait for previous packets to finish. I'll s
Re: [RFC] AF_ALG AIO and IV
On Tue, 16 Jan 2018 07:28:06 +0100 Stephan Mueller <smuel...@chronox.de> wrote: > Am Montag, 15. Januar 2018, 15:42:58 CET schrieb Jonathan Cameron: > > Hi Jonathan, > > > > What about: > > > > > > sendmsg(IV, data) > > > sendmsg(data) > > > .. > > > AIO recvmsg with multiple IOCBs > > > AIO recvmsg with multiple IOCBs > > > .. > > > sendmsg(IV, data) > > > .. > > > > > > This implies, however, that before the sendmsg with the second IV is sent, > > > all AIO operations from the first invocation would need to be finished. > > Yes that works fine, but rather restricts the flow - you would end up > > waiting until you could concatenate a bunch of data in userspace so as to > > trade off against the slow down whenever you need to synchronize back up to > > userspace. > > I think the solution is already present and even libkcapi's architecture is > set up to handle this scenario: > > We have 2 types of FDs: one obtained from socket() and one from accept(). The > socket-FD is akin to the TFM. The accept FD is exactly what you want: > > tfmfd = socket() > setkey(tfmfd) > opfd = accept() > opfd2 = accept() > sendmsg(opfd, IV, data) > recvmsg(opfd, data) > > sendmsg(opfd2, IV, data) > recvmsg(opfd2, data) > > sendmsg(opfd, data) > .. > > There can be multipe FDs from accept and these are the "identifiers" for your > cipher operation stream that belongs together. > > libkcapi has already the architecture for this type of work, but it is not > exposed to the API yet. The internal calls for sendmsg/recvmsg all take an > (op)FD parameter. E.g. _kcapi_common_send_meta_fd has the fdptr variable. > internal.h currently wraps this call into _kcapi_common_send_meta where the > handle->opfd variable is used. > > The idea why I implemented that is because the caller could maintain an array > of opfds. If we would expose these internal calls with the FD argument, you > can maintain multiple opfds implementing your use case. > > The only change would be to expose the internal libkcapi calls. > > Ciao > Stephan Thanks, I'll take a look at this soonish. Having a busy week. Jonathan > >
Re: [RFC] AF_ALG AIO and IV
On Mon, 15 Jan 2018 15:31:42 +0100 Stephan Mueller <smuel...@chronox.de> wrote: > Am Montag, 15. Januar 2018, 15:25:38 CET schrieb Jonathan Cameron: > > Hi Jonathan, > > > On Mon, 15 Jan 2018 14:15:42 +0100 > > > > Stephan Mueller <smuel...@chronox.de> wrote: > > > Am Montag, 15. Januar 2018, 13:59:27 CET schrieb Jonathan Cameron: > > > > > > Hi Jonathan, > > > > > > > > But there may be hardware that cannot/will not track such > > > > > dependencies. > > > > > Yet, it has multiple hardware queues. Such hardware can still handle > > > > > parallel requests when they are totally independent from each other. > > > > > For > > > > > such a case, AF_ALG currently has no support, because it lacks the > > > > > support for setting multiple IVs for the multiple concurrent calls. > > > > > > > > Agreed, something like your new support is needed - I just suspect we > > > > need > > > > a level between one socket one iv chain and every IOCB with own IV and > > > > right now the only way to hit that balance is to have a separate socket > > > > for each IV chain. Not exactly efficient use of resources though it > > > > will > > > > work. > > > > > > How would you propose such support via AF_ALG? > > > Wouldn't it be possible to > > > arrange the IOVEC array in user space appropriately before calling the > > > AF_ALG interface? In this case, I would still see that the current AF_ALG > > > (plus the patch) would support all use cases I am aware of. > > > > I'm not sure how that would work, but maybe I'm missing something - are you > > suggesting we could contrive the situation where the kernel side can tell > > it is getting the same IV multiple times and hence know that it should chain > > it? We are talking streaming here - we don't have the data for the later > > elements when the first ones are queued up. > > > > One approach to handling token based IV - where the token refers to an IV > > without being it's value would be to add another flag similar to the one > > you used for inline IV. > > What about: > > sendmsg(IV, data) > sendmsg(data) > .. > AIO recvmsg with multiple IOCBs > AIO recvmsg with multiple IOCBs > .. > sendmsg(IV, data) > .. > > This implies, however, that before the sendmsg with the second IV is sent, > all > AIO operations from the first invocation would need to be finished. Yes that works fine, but rather restricts the flow - you would end up waiting until you could concatenate a bunch of data in userspace so as to trade off against the slow down whenever you need to synchronize back up to userspace. > > > > You would then set the IV as you have done, but also provide a magic value > > by which to track the chain. Later IOCBs using the same IV chain would > > just provide the magic token. > > > > You'd also need some way of retiring the IV eventually once you were done > > with it or ultimately you would run out of resources. > > Let me think about that approach a bit. > > > > > > > What AF_ALG should do is to enable different vendors like yourself to use > > > the most appropriate solution. AF_ALG shall not limit users in any way. > > Agreed, but we also need to have some consistency for userspace to have some > > awareness of what it should be using. Last thing we want is lots of higher > > level software having to have knowledge of the encryption hardware > > underneath. Hence I think we should keep the options to the minimum > > possible or put the burden on drivers that must play well with all options > > (be it not as efficiently for the ones that work badly for them). > > > > > Thus, AF_ALG allows multiple sockets, if desired. It allows a stream usage > > > with one setiv call applicable to multiple cipher operations. And with the > > > offered patch it would allow multiple concurrent and yet independent > > > cipher > > > operations. Whichever use case is right for you, AF_ALG should not block > > > you from applying it. Yet, what is good for you may not be good for > > > others. Thus, these others may implement a different usage strategy for > > > AF_ALG. The good thing is that this strategy is defined by user space. > > > > > > In case you see a use case that is prevented by AF_ALG, it would be great > > > to hear about it to see whether we can support it. > > > > The usecas
Re: [RFC] AF_ALG AIO and IV
On Mon, 15 Jan 2018 14:25:38 + Jonathan Cameron <jonathan.came...@huawei.com> wrote: > On Mon, 15 Jan 2018 14:15:42 +0100 > Stephan Mueller <smuel...@chronox.de> wrote: > > > Am Montag, 15. Januar 2018, 13:59:27 CET schrieb Jonathan Cameron: > > > > Hi Jonathan, > > > > > > > > But there may be hardware that cannot/will not track such dependencies. > > > > Yet, it has multiple hardware queues. Such hardware can still handle > > > > parallel requests when they are totally independent from each other. For > > > > such a case, AF_ALG currently has no support, because it lacks the > > > > support for setting multiple IVs for the multiple concurrent calls. > > > > > > Agreed, something like your new support is needed - I just suspect we need > > > a level between one socket one iv chain and every IOCB with own IV and > > > right now the only way to hit that balance is to have a separate socket > > > for each IV chain. Not exactly efficient use of resources though it will > > > work. > > > > How would you propose such support via AF_ALG? > > Wouldn't it be possible to > > arrange the IOVEC array in user space appropriately before calling the > > AF_ALG > > interface? In this case, I would still see that the current AF_ALG (plus > > the > > patch) would support all use cases I am aware of. > > I'm not sure how that would work, but maybe I'm missing something - are you > suggesting we could contrive the situation where the kernel side can tell > it is getting the same IV multiple times and hence know that it should chain > it? We are talking streaming here - we don't have the data for the > later elements when the first ones are queued up. > > One approach to handling token based IV - where the token refers to an IV > without > being it's value would be to add another flag similar to the one you used for > inline IV. > > You would then set the IV as you have done, but also provide a magic value by > which to track the chain. Later IOCBs using the same IV chain would just > provide the magic token. > > You'd also need some way of retiring the IV eventually once you were done > with it or ultimately you would run out of resources. > > > > > > > > So the only one left is the case 3 above where the hardware is capable > > > > > of doing the dependency tracking. > > > > > > > > > > We can support that in two ways but one is rather heavyweight in terms > > > > > of > > > > > resources. > > > > > > > > > > 1) Whenever we want to allocate a new context we spin up a new socket > > > > > and > > > > > effectively associate a single IV with that (and it's chained updates) > > > > > much > > > > > like we do in the existing interface. > > > > > > > > I would not like that because it is too heavyweight. Moreover, > > > > considering > > > > the kernel crypto API logic, a socket is the user space equivalent of a > > > > TFM. I.e. for setting an IV, you do not need to re-instantiate a TFM. > > > > > > > > > > Agreed, though as I mention above if you have multiple processes you > > > probably want to give them their own resources anyway (own socket and > > > probably hardware queue if you can spare one) so as to avoid denial of > > > service from one to another. > > > > That is for sure, different processes shall never share a socket as > > otherwise > > one can obtain data belonging to the other which would violate process > > isolation. > > > > > > > 2) We allow a token based tracking of IVs. So userspace code > > > > > maintains > > > > > a counter and tags ever message and the initial IV setup with that > > > > > counter. > > > > > > > > I think the option I offer with the patch, we have an even more > > > > lightweight > > > > approach. > > > > > > Except that I think you have to go all the way back to userspace - unless > > > I > > > am missing the point - you can't have multiple elements of a stream queued > > > up. Performance will stink if you have a small number of contexts and > > > can't > > > keep the processing engines busy. At the moment option 1 here is the only > > > way to implement this. > > > > What AF_ALG should do is to enable different
Re: [RFC] AF_ALG AIO and IV
On Mon, 15 Jan 2018 14:15:42 +0100 Stephan Mueller <smuel...@chronox.de> wrote: > Am Montag, 15. Januar 2018, 13:59:27 CET schrieb Jonathan Cameron: > > Hi Jonathan, > > > > > > But there may be hardware that cannot/will not track such dependencies. > > > Yet, it has multiple hardware queues. Such hardware can still handle > > > parallel requests when they are totally independent from each other. For > > > such a case, AF_ALG currently has no support, because it lacks the > > > support for setting multiple IVs for the multiple concurrent calls. > > > > Agreed, something like your new support is needed - I just suspect we need > > a level between one socket one iv chain and every IOCB with own IV and > > right now the only way to hit that balance is to have a separate socket > > for each IV chain. Not exactly efficient use of resources though it will > > work. > > How would you propose such support via AF_ALG? > Wouldn't it be possible to > arrange the IOVEC array in user space appropriately before calling the AF_ALG > interface? In this case, I would still see that the current AF_ALG (plus the > patch) would support all use cases I am aware of. I'm not sure how that would work, but maybe I'm missing something - are you suggesting we could contrive the situation where the kernel side can tell it is getting the same IV multiple times and hence know that it should chain it? We are talking streaming here - we don't have the data for the later elements when the first ones are queued up. One approach to handling token based IV - where the token refers to an IV without being it's value would be to add another flag similar to the one you used for inline IV. You would then set the IV as you have done, but also provide a magic value by which to track the chain. Later IOCBs using the same IV chain would just provide the magic token. You'd also need some way of retiring the IV eventually once you were done with it or ultimately you would run out of resources. > > > > > So the only one left is the case 3 above where the hardware is capable > > > > of doing the dependency tracking. > > > > > > > > We can support that in two ways but one is rather heavyweight in terms > > > > of > > > > resources. > > > > > > > > 1) Whenever we want to allocate a new context we spin up a new socket > > > > and > > > > effectively associate a single IV with that (and it's chained updates) > > > > much > > > > like we do in the existing interface. > > > > > > I would not like that because it is too heavyweight. Moreover, considering > > > the kernel crypto API logic, a socket is the user space equivalent of a > > > TFM. I.e. for setting an IV, you do not need to re-instantiate a TFM. > > > > Agreed, though as I mention above if you have multiple processes you > > probably want to give them their own resources anyway (own socket and > > probably hardware queue if you can spare one) so as to avoid denial of > > service from one to another. > > That is for sure, different processes shall never share a socket as otherwise > one can obtain data belonging to the other which would violate process > isolation. > > > > > 2) We allow a token based tracking of IVs. So userspace code maintains > > > > a counter and tags ever message and the initial IV setup with that > > > > counter. > > > > > > I think the option I offer with the patch, we have an even more > > > lightweight > > > approach. > > > > Except that I think you have to go all the way back to userspace - unless I > > am missing the point - you can't have multiple elements of a stream queued > > up. Performance will stink if you have a small number of contexts and can't > > keep the processing engines busy. At the moment option 1 here is the only > > way to implement this. > > What AF_ALG should do is to enable different vendors like yourself to use the > most appropriate solution. AF_ALG shall not limit users in any way. Agreed, but we also need to have some consistency for userspace to have some awareness of what it should be using. Last thing we want is lots of higher level software having to have knowledge of the encryption hardware underneath. Hence I think we should keep the options to the minimum possible or put the burden on drivers that must play well with all options (be it not as efficiently for the ones that work badly for them). > > Thus, AF_ALG allows multiple sockets, if desired. It allows a stream usage > with one setiv call applicable to multiple
Re: [RFC] AF_ALG AIO and IV
On Mon, 15 Jan 2018 13:07:16 +0100 Stephan Mueller <smuel...@chronox.de> wrote: > Am Montag, 15. Januar 2018, 12:05:03 CET schrieb Jonathan Cameron: > > Hi Jonathan, > > > On Fri, 12 Jan 2018 14:21:15 +0100 > > > > Stephan Mueller <smuel...@chronox.de> wrote: > > > Hi, > > > > > > The kernel crypto API requires the caller to set an IV in the request data > > > structure. That request data structure shall define one particular cipher > > > operation. During the cipher operation, the IV is read by the cipher > > > implementation and eventually the potentially updated IV (e.g. in case of > > > CBC) is written back to the memory location the request data structure > > > points to. > > Silly question, are we obliged to always write it back? > > Well, in general, yes. The AF_ALG interface should allow a "stream" mode of > operation: > > socket > accept > setsockopt(setkey) > sendmsg(IV, data) > recvmsg(data) > sendmsg(data) > recvmsg(data) > .. > > For such synchronous operation, I guess it is clear that the IV needs to be > written back. > > If you want to play with it, use the "stream" API of libkcapi and the > associated test cases. Thanks for the pointer - will do. > > > In CBC it is > > obviously the same as the last n bytes of the encrypted message. I guess > > for ease of handling it makes sense to do so though. > > > > > AF_ALG allows setting the IV with a sendmsg request, where the IV is > > > stored in the AF_ALG context that is unique to one particular AF_ALG > > > socket. Note the analogy: an AF_ALG socket is like a TFM where one > > > recvmsg operation uses one request with the TFM from the socket. > > > > > > AF_ALG these days supports AIO operations with multiple IOCBs. I.e. with > > > one recvmsg call, multiple IOVECs can be specified. Each individual IOCB > > > (derived from one IOVEC) implies that one request data structure is > > > created with the data to be processed by the cipher implementation. The > > > IV that was set with the sendmsg call is registered with the request data > > > structure before the cipher operation. > > > > > > In case of an AIO operation, the cipher operation invocation returns > > > immediately, queuing the request to the hardware. While the AIO request is > > > processed by the hardware, recvmsg processes the next IOVEC for which > > > another request is created. Again, the IV buffer from the AF_ALG socket > > > context is registered with the new request and the cipher operation is > > > invoked. > > > > > > You may now see that there is a potential race condition regarding the IV > > > handling, because there is *no* separate IV buffer for the different > > > requests. This is nicely demonstrated with libkcapi using the following > > > command which creates an AIO request with two IOCBs each encrypting one > > > AES block in CBC mode: > > > > > > kcapi -d 2 -x 9 -e -c "cbc(aes)" -k > > > 8d7dd9b0170ce0b5f2f8e1aa768e01e91da8bfc67fd486d081b28254c99eb423 -i > > > 7fbc02ebf5b93322329df9bfccb635af -p 48981da18e4bb9ef7e2e3162d16b1910 > > > > > > When the first AIO request finishes before the 2nd AIO request is > > > processed, the returned value is: > > > > > > 8b19050f66582cb7f7e4b6c873819b7108afa0eaa7de29bac7d903576b674c32 > > > > > > I.e. two blocks where the IV output from the first request is the IV input > > > to the 2nd block. > > > > > > In case the first AIO request is not completed before the 2nd request > > > commences, the result is two identical AES blocks (i.e. both use the same > > > IV): > > > > > > 8b19050f66582cb7f7e4b6c873819b718b19050f66582cb7f7e4b6c873819b71 > > > > > > This inconsistent result may even lead to the conclusion that there can be > > > a memory corruption in the IV buffer if both AIO requests write to the IV > > > buffer at the same time. > > > > > > This needs to be solved somehow. I see the following options which I would > > > like to have vetted by the community. > > > > Taking some 'entirely hypothetical' hardware with the following structure > > for all my responses - it's about as flexible as I think we'll see in the > > near future - though I'm sure someone has something more complex out there > > :) > > > > N hardware queues feeding M processing engines in a scheduler driven > &g
Re: [RFC] AF_ALG AIO and IV
On Fri, 12 Jan 2018 14:21:15 +0100 Stephan Muellerwrote: > Hi, > > The kernel crypto API requires the caller to set an IV in the request data > structure. That request data structure shall define one particular cipher > operation. During the cipher operation, the IV is read by the cipher > implementation and eventually the potentially updated IV (e.g. in case of > CBC) > is written back to the memory location the request data structure points to. Silly question, are we obliged to always write it back? In CBC it is obviously the same as the last n bytes of the encrypted message. I guess for ease of handling it makes sense to do so though. > > AF_ALG allows setting the IV with a sendmsg request, where the IV is stored > in > the AF_ALG context that is unique to one particular AF_ALG socket. Note the > analogy: an AF_ALG socket is like a TFM where one recvmsg operation uses one > request with the TFM from the socket. > > AF_ALG these days supports AIO operations with multiple IOCBs. I.e. with one > recvmsg call, multiple IOVECs can be specified. Each individual IOCB (derived > from one IOVEC) implies that one request data structure is created with the > data to be processed by the cipher implementation. The IV that was set with > the sendmsg call is registered with the request data structure before the > cipher operation. > > In case of an AIO operation, the cipher operation invocation returns > immediately, queuing the request to the hardware. While the AIO request is > processed by the hardware, recvmsg processes the next IOVEC for which another > request is created. Again, the IV buffer from the AF_ALG socket context is > registered with the new request and the cipher operation is invoked. > > You may now see that there is a potential race condition regarding the IV > handling, because there is *no* separate IV buffer for the different > requests. > This is nicely demonstrated with libkcapi using the following command which > creates an AIO request with two IOCBs each encrypting one AES block in CBC > mode: > > kcapi -d 2 -x 9 -e -c "cbc(aes)" -k > 8d7dd9b0170ce0b5f2f8e1aa768e01e91da8bfc67fd486d081b28254c99eb423 -i > 7fbc02ebf5b93322329df9bfccb635af -p 48981da18e4bb9ef7e2e3162d16b1910 > > When the first AIO request finishes before the 2nd AIO request is processed, > the returned value is: > > 8b19050f66582cb7f7e4b6c873819b7108afa0eaa7de29bac7d903576b674c32 > > I.e. two blocks where the IV output from the first request is the IV input to > the 2nd block. > > In case the first AIO request is not completed before the 2nd request > commences, the result is two identical AES blocks (i.e. both use the same IV): > > 8b19050f66582cb7f7e4b6c873819b718b19050f66582cb7f7e4b6c873819b71 > > This inconsistent result may even lead to the conclusion that there can be a > memory corruption in the IV buffer if both AIO requests write to the IV > buffer > at the same time. > > This needs to be solved somehow. I see the following options which I would > like to have vetted by the community. > Taking some 'entirely hypothetical' hardware with the following structure for all my responses - it's about as flexible as I think we'll see in the near future - though I'm sure someone has something more complex out there :) N hardware queues feeding M processing engines in a scheduler driven fashion. Actually we might have P sets of these, but load balancing and tracking and transferring contexts between these is a complexity I think we can ignore. If you want to use more than one of these P you'll just have to handle it yourself in userspace. Note messages may be shorter than IOCBs which raises another question I've been meaning to ask. Are all crypto algorithms obliged to run unlimited length IOCBs? If there are M messages in a particular queue and none elsewhere it is capable of processing them all at once (and perhaps returning out of order but we can fudge them back in order in the driver to avoid that additional complexity from an interface point of view). So I'm going to look at this from the hardware point of view - you have well addressed software management above. Three ways context management can be handled (in CBC this is basically just the IV). 1. Each 'work item' queued on a hardware queue has it's IV embedded with the data. This requires external synchronization if we are chaining across multiple 'work items' - note the hardware may have restrictions that mean it has to split large pieces of data up to encrypt them. Not all hardware may support per 'work item' IVs (I haven't done a survey to find out if everyone does...) 2. Each queue has a context assigned. We get a new queue whenever we want to have a different context. Runs out eventually but our hypothetical hardware may support a lot of queues. Note this version could be 'faked' by putting a cryptoengine queue on the front of the hardware queues. 3. The hardware supports IV
Re: AF_ALG: skb limits
On Sat, 13 Jan 2018 15:04:20 +0100 Stephan Müller <smuel...@chronox.de> wrote: > Am Dienstag, 12. Dezember 2017, 14:59:21 CET schrieb Jonathan Cameron: > Hi Stephan, > Hi Jonathan, > > > On Fri, 8 Dec 2017 13:43:20 +0100 > > > > Stephan Mueller <smuel...@chronox.de> wrote: > > > Am Freitag, 8. Dezember 2017, 12:39:06 CET schrieb Jonathan Cameron: > > > > > > Hi Jonathan, > > > > > > > As a heads up, the other nasties we've found so far are around hitting > > > > limits on the various socket buffers. When you run into those you can > > > > end > > > > up with parts of the data to be encrypted going through without it being > > > > obvious. > > > > > > The entire code uses sock_alloc to prevent user space to chew up kernel > > > memory. I am aware that if you have a too low skb buffer limit, parts of > > > the cipher operation will not succeed as a malloc will fail. > > > > > > But that is returned with an error to user space. > > > > I 'think' there are some cases where you don't get an error to userspace - > > or at least not directly. > > > > In af_alg_get_rsgl, we have an af_alg_readable(sk) call which simply breaks > > out of the loop - leaving us with len set to potentially part of the > > required length - without setting a appropriate return value. > > > > I can't immediately see how this one is reported to userspace. > > af_alg_get_rsgl is used to calculate and obtain the length of the buffers > receiving the result (what is called RX buffers). That length is used for the > cipher operation and finally reported to user space. > > For example (TX is the buffer given to the kernel during sendmsg, RX during > recvmsg) where the example disregards the alignment to block size: > > sendmsg TX: 1000 bytes > recvmsg RX: 100 bytes given by user space, 100 bytes obtained by kernel > => recvmsg returns 100 > recvmsg RX: 1000 bytes given by user space, 900 bytes obtained by kernel > => recvmsg returns 900 > > Note, the TX and RX buffer sizes may *not* be identical or even related. User > space may send one large input block as TX, but "process" it with numerous > small recvmsg calls. Each recvmsg call defines the cipher operation size. > This > is of particular importance for an AEAD cipher. Sure. > > > > At least, with my driver, the first we see is an error returned by the > > processing engine. That ultimately gets reported to userspace the aio > > path. > > So, you say that the error is returned. Isn't that what you want or expect? Given the issue is detected at sending to the device I would expect it to be reported then not when we get corrupt data back from the device. Note this isn't the crypto core detecting it but the hardware picking up on a malformed configuration. There could easily be other hardware out there that would not detect this at all and would simply, transparently, return the wrong value. Now it's possible it would also be detected as a wrong length at the receive end, but I'm not getting that far. I would have expected the system to have not queued it up for processing in the first place if not all the data is there (because we have no space for the result). Jonathan > > > > I'm chasing a particularly evil bug in my driver at the moment that is > > locking up the CPU and IOMMUs - if I ever figure that one out I'll run some > > fuzzing around these limits and see if we can find other cases. > > Ciao > Stephan > >
Re: [PATCH] crypto: Fix race around ctx->rcvused by making it atomic_t
On Tue, 19 Dec 2017 12:11:19 +0100 Stephan Mueller <smuel...@chronox.de> wrote: > Am Dienstag, 19. Dezember 2017, 11:31:22 CET schrieb Jonathan Cameron: > > Hi Jonathan, > > > This variable was increased and decreased without any protection. > > Result was an occasional misscount and negative wrap around resulting > > in false resource allocation failures. > > > > Signed-off-by: Jonathan Cameron <jonathan.came...@huawei.com> > > Fixes: 2d97591ef43d ("crypto: af_alg - consolidation of duplicate code") > > --- > > Resending due to incompetence on my part - sorry all! > > > > The fixes tag is probably not ideal as I'm not 100% sure when this actually > > became a bug. The code in question was introduced in > > > > Fixes: e870456d8e7c ("crypto: algif_skcipher - overhaul memory management") > > and related. > > rcvused moved in > > Fixes: 2d97591ef43d ("crypto: af_alg - consolidation of duplicate code") > > So I have gone with the latter option as that is where it will cleanly > > apply. However, it probably doesn't matter as both are present only in the > > 4.14 final kernel. > > > > crypto/af_alg.c | 4 ++-- > > crypto/algif_aead.c | 2 +- > > crypto/algif_skcipher.c | 2 +- > > include/crypto/if_alg.h | 5 +++-- > > 4 files changed, 7 insertions(+), 6 deletions(-) > > > > diff --git a/crypto/af_alg.c b/crypto/af_alg.c > > index 8612aa36a3ef..759cfa678c04 100644 > > --- a/crypto/af_alg.c > > +++ b/crypto/af_alg.c > > @@ -664,7 +664,7 @@ void af_alg_free_areq_sgls(struct af_alg_async_req > > *areq) unsigned int i; > > > > list_for_each_entry_safe(rsgl, tmp, >rsgl_list, list) { > > - ctx->rcvused -= rsgl->sg_num_bytes; > > + atomic_sub(rsgl->sg_num_bytes, >rcvused); > > af_alg_free_sg(>sgl); > > list_del(>list); > > if (rsgl != >first_rsgl) > > @@ -1173,7 +1173,7 @@ int af_alg_get_rsgl(struct sock *sk, struct msghdr > > *msg, int flags, > > > > areq->last_rsgl = rsgl; > > len += err; > > - ctx->rcvused += err; > > + atomic_add(err, >rcvused); > > rsgl->sg_num_bytes = err; > > iov_iter_advance(>msg_iter, err); > > } > > diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c > > index db9378a45296..d3da3b0869f5 100644 > > --- a/crypto/algif_aead.c > > +++ b/crypto/algif_aead.c > > @@ -550,7 +550,7 @@ static int aead_accept_parent_nokey(void *private, > > struct sock *sk) INIT_LIST_HEAD(>tsgl_list); > > ctx->len = len; > > ctx->used = 0; > > - ctx->rcvused = 0; > > + atomic_set(>rcvused, 0); > > I think ATOMIC_INIT(0) is more suitable here. It's ugly to use it to assign a dynamic element like this. ctx->rcvused = (atomic_t)ATOMIC_INIT(0); Need the cast to avoid getting error: expected expression before '{' token #define ATOMIC_INIT(i) { (i) } There are only two drivers in the kernel doing this vs a lot doing initialization using the atomic_set option. I'm happy to change it though if you would prefer. > > > ctx->more = 0; > > ctx->merge = 0; > > ctx->enc = 0; > > diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c > > index c7c75ef41952..a5b4898f625a 100644 > > --- a/crypto/algif_skcipher.c > > +++ b/crypto/algif_skcipher.c > > @@ -388,7 +388,7 @@ static int skcipher_accept_parent_nokey(void *private, > > struct sock *sk) INIT_LIST_HEAD(>tsgl_list); > > ctx->len = len; > > ctx->used = 0; > > - ctx->rcvused = 0; > > + atomic_set(>rcvused, 0); > > dto. > > > ctx->more = 0; > > ctx->merge = 0; > > ctx->enc = 0; > > diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h > > index 38d9c5861ed8..f38227a78eae 100644 > > --- a/include/crypto/if_alg.h > > +++ b/include/crypto/if_alg.h > > @@ -18,6 +18,7 @@ > > #include > > #include > > #include > > +#include > > #include > > > > #include > > @@ -150,7 +151,7 @@ struct af_alg_ctx { > > struct crypto_wait wait; > > > > size_t used; > > - size_t rcvused; > > + atomic_t rcvused; > > > > bool more; > > bool merge; > > @@ -215,7 +216,7 @@ static inline int af_alg_rcvbuf(struct sock *sk) > > struct af_alg_ctx *ctx = ask->private; > > > > return max_t(int, max_t(int, sk->sk_rcvbuf & PAGE_MASK, PAGE_SIZE) - > > - ctx->rcvused, 0); > > +atomic_read(>rcvused), 0); > > } > > > > /** > > Other than the comments above: > > Reviewed-by: Stephan Mueller <smuel...@chronox.de> > > > Ciao > Stephan
Re: [Plinth PATCH]{topost} crypto: Fix race around ctx->rcvused by making it atomic_t
Sorry idiot moment. Sent that out with our internal markings. Resending shortly. On Tue, 19 Dec 2017 10:27:24 + Jonathan Cameron <jonathan.came...@huawei.com> wrote: > This variable was increased and decreased without any protection. > Result was an occasional misscount and negative wrap around resulting > in false resource allocation failures. > > Signed-off-by: Jonathan Cameron <jonathan.came...@huawei.com> > Fixes: 2d97591ef43d ("crypto: af_alg - consolidation of duplicate code") > --- > The fixes tag is probably not ideal as I'm not 100% sure when this actually > became a bug. The code in question was introduced in > > Fixes: e870456d8e7c ("crypto: algif_skcipher - overhaul memory management") > and related. > rcvused moved in > Fixes: 2d97591ef43d ("crypto: af_alg - consolidation of duplicate code") > So I have gone with the latter option as that is where it will cleanly apply. > However, it probably doesn't matter as both are present only in the 4.14 > final kernel. > > crypto/af_alg.c | 4 ++-- > crypto/algif_aead.c | 2 +- > crypto/algif_skcipher.c | 2 +- > include/crypto/if_alg.h | 5 +++-- > 4 files changed, 7 insertions(+), 6 deletions(-) > > diff --git a/crypto/af_alg.c b/crypto/af_alg.c > index 8612aa36a3ef..759cfa678c04 100644 > --- a/crypto/af_alg.c > +++ b/crypto/af_alg.c > @@ -664,7 +664,7 @@ void af_alg_free_areq_sgls(struct af_alg_async_req *areq) > unsigned int i; > > list_for_each_entry_safe(rsgl, tmp, >rsgl_list, list) { > - ctx->rcvused -= rsgl->sg_num_bytes; > + atomic_sub(rsgl->sg_num_bytes, >rcvused); > af_alg_free_sg(>sgl); > list_del(>list); > if (rsgl != >first_rsgl) > @@ -1173,7 +1173,7 @@ int af_alg_get_rsgl(struct sock *sk, struct msghdr > *msg, int flags, > > areq->last_rsgl = rsgl; > len += err; > - ctx->rcvused += err; > + atomic_add(err, >rcvused); > rsgl->sg_num_bytes = err; > iov_iter_advance(>msg_iter, err); > } > diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c > index db9378a45296..d3da3b0869f5 100644 > --- a/crypto/algif_aead.c > +++ b/crypto/algif_aead.c > @@ -550,7 +550,7 @@ static int aead_accept_parent_nokey(void *private, struct > sock *sk) > INIT_LIST_HEAD(>tsgl_list); > ctx->len = len; > ctx->used = 0; > - ctx->rcvused = 0; > + atomic_set(>rcvused, 0); > ctx->more = 0; > ctx->merge = 0; > ctx->enc = 0; > diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c > index c7c75ef41952..a5b4898f625a 100644 > --- a/crypto/algif_skcipher.c > +++ b/crypto/algif_skcipher.c > @@ -388,7 +388,7 @@ static int skcipher_accept_parent_nokey(void *private, > struct sock *sk) > INIT_LIST_HEAD(>tsgl_list); > ctx->len = len; > ctx->used = 0; > - ctx->rcvused = 0; > + atomic_set(>rcvused, 0); > ctx->more = 0; > ctx->merge = 0; > ctx->enc = 0; > diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h > index 38d9c5861ed8..f38227a78eae 100644 > --- a/include/crypto/if_alg.h > +++ b/include/crypto/if_alg.h > @@ -18,6 +18,7 @@ > #include > #include > #include > +#include > #include > > #include > @@ -150,7 +151,7 @@ struct af_alg_ctx { > struct crypto_wait wait; > > size_t used; > - size_t rcvused; > + atomic_t rcvused; > > bool more; > bool merge; > @@ -215,7 +216,7 @@ static inline int af_alg_rcvbuf(struct sock *sk) > struct af_alg_ctx *ctx = ask->private; > > return max_t(int, max_t(int, sk->sk_rcvbuf & PAGE_MASK, PAGE_SIZE) - > - ctx->rcvused, 0); > + atomic_read(>rcvused), 0); > } > > /**
[Plinth PATCH]{topost} crypto: Fix race around ctx->rcvused by making it atomic_t
This variable was increased and decreased without any protection. Result was an occasional misscount and negative wrap around resulting in false resource allocation failures. Signed-off-by: Jonathan Cameron <jonathan.came...@huawei.com> Fixes: 2d97591ef43d ("crypto: af_alg - consolidation of duplicate code") --- The fixes tag is probably not ideal as I'm not 100% sure when this actually became a bug. The code in question was introduced in Fixes: e870456d8e7c ("crypto: algif_skcipher - overhaul memory management") and related. rcvused moved in Fixes: 2d97591ef43d ("crypto: af_alg - consolidation of duplicate code") So I have gone with the latter option as that is where it will cleanly apply. However, it probably doesn't matter as both are present only in the 4.14 final kernel. crypto/af_alg.c | 4 ++-- crypto/algif_aead.c | 2 +- crypto/algif_skcipher.c | 2 +- include/crypto/if_alg.h | 5 +++-- 4 files changed, 7 insertions(+), 6 deletions(-) diff --git a/crypto/af_alg.c b/crypto/af_alg.c index 8612aa36a3ef..759cfa678c04 100644 --- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -664,7 +664,7 @@ void af_alg_free_areq_sgls(struct af_alg_async_req *areq) unsigned int i; list_for_each_entry_safe(rsgl, tmp, >rsgl_list, list) { - ctx->rcvused -= rsgl->sg_num_bytes; + atomic_sub(rsgl->sg_num_bytes, >rcvused); af_alg_free_sg(>sgl); list_del(>list); if (rsgl != >first_rsgl) @@ -1173,7 +1173,7 @@ int af_alg_get_rsgl(struct sock *sk, struct msghdr *msg, int flags, areq->last_rsgl = rsgl; len += err; - ctx->rcvused += err; + atomic_add(err, >rcvused); rsgl->sg_num_bytes = err; iov_iter_advance(>msg_iter, err); } diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c index db9378a45296..d3da3b0869f5 100644 --- a/crypto/algif_aead.c +++ b/crypto/algif_aead.c @@ -550,7 +550,7 @@ static int aead_accept_parent_nokey(void *private, struct sock *sk) INIT_LIST_HEAD(>tsgl_list); ctx->len = len; ctx->used = 0; - ctx->rcvused = 0; + atomic_set(>rcvused, 0); ctx->more = 0; ctx->merge = 0; ctx->enc = 0; diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c index c7c75ef41952..a5b4898f625a 100644 --- a/crypto/algif_skcipher.c +++ b/crypto/algif_skcipher.c @@ -388,7 +388,7 @@ static int skcipher_accept_parent_nokey(void *private, struct sock *sk) INIT_LIST_HEAD(>tsgl_list); ctx->len = len; ctx->used = 0; - ctx->rcvused = 0; + atomic_set(>rcvused, 0); ctx->more = 0; ctx->merge = 0; ctx->enc = 0; diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h index 38d9c5861ed8..f38227a78eae 100644 --- a/include/crypto/if_alg.h +++ b/include/crypto/if_alg.h @@ -18,6 +18,7 @@ #include #include #include +#include #include #include @@ -150,7 +151,7 @@ struct af_alg_ctx { struct crypto_wait wait; size_t used; - size_t rcvused; + atomic_t rcvused; bool more; bool merge; @@ -215,7 +216,7 @@ static inline int af_alg_rcvbuf(struct sock *sk) struct af_alg_ctx *ctx = ask->private; return max_t(int, max_t(int, sk->sk_rcvbuf & PAGE_MASK, PAGE_SIZE) - - ctx->rcvused, 0); +atomic_read(>rcvused), 0); } /** -- 2.11.0
RFC: Crypto: Race in updating available writable socket memory.
Hi All, I came across this one but am not sure how heavy weight a fix we want to apply. I was getting failures on af_alg_readable in af_alg_get_rsgl which made little sense as I had wmem set to a very large value. This was caused by a race between ctx->rcvused += err in af_alg_get_rsgl and ctx->rcvused -= rsgl->sg_num_bytes in af_alg_free_areq_sgls which was being called from an interrupt thread. It was wrapping in a downwards direction once in a large number of cycles. Testing was using the AIO interface and libkcapi on the userspace side and a (hopefully) soon to be posted new Hisilicon 161x crypto accelerator driver. So it seems that this field needs some protection. 1. Could make it an atomic_t (works fine, but feels rather inelegant!) 2. Leverage the complexity seen in sock.c if it is needed... So the question is whether making it an atomic_t is the right way to go. Thanks, Jonathan
Re: AF_ALG: skb limits
On Fri, 8 Dec 2017 13:43:20 +0100 Stephan Mueller <smuel...@chronox.de> wrote: > Am Freitag, 8. Dezember 2017, 12:39:06 CET schrieb Jonathan Cameron: > > Hi Jonathan, > > > > > As a heads up, the other nasties we've found so far are around hitting > > limits on the various socket buffers. When you run into those you can end > > up with parts of the data to be encrypted going through without it being > > obvious. > > > > The entire code uses sock_alloc to prevent user space to chew up kernel > memory. I am aware that if you have a too low skb buffer limit, parts of the > cipher operation will not succeed as a malloc will fail. > > But that is returned with an error to user space. I 'think' there are some cases where you don't get an error to userspace - or at least not directly. In af_alg_get_rsgl, we have an af_alg_readable(sk) call which simply breaks out of the loop - leaving us with len set to potentially part of the required length - without setting a appropriate return value. I can't immediately see how this one is reported to userspace. At least, with my driver, the first we see is an error returned by the processing engine. That ultimately gets reported to userspace the aio path. I'm chasing a particularly evil bug in my driver at the moment that is locking up the CPU and IOMMUs - if I ever figure that one out I'll run some fuzzing around these limits and see if we can find other cases. Jonathan > If you observe such an > error, the entire message you wanted to read with recvmsg must be considered > tainted (i.e. you do not know which part of the message has been encrypted or > not). Thus, the recvmsg must be invoked again on the same buffer sent to the > kernel if you want to ensure you encrypted the data. > > Could you please help me understand where that functionality fails? > > PS: If you want to give more memory to your sockets, you have to tweak /proc/ > sys/net/core/optmem_max. > > Ciao > Stephan
Re: [PATCH] crypto: AF_ALG - fix race accessing cipher request
On Fri, 8 Dec 2017 11:50:37 +0100 Stephan Müller <smuel...@chronox.de> wrote: > Hi Herbert, > > This patch would go on top of 7d2c3f54e6f646887d019faa45f35d6fe9fe82ce > "crypto: af_alg - remove locking in async callback" found in Linus' tree > which is not yet in the cryptodev-2.6 tree. > > In addition, this patch is already on top of the other patches discussed > on this list fixing similar issues. I.e. depending in which order you apply > the patches, there may be a hunk. In case you want me to rebase the patch, > please let me know. > > ---8<--- > When invoking an asynchronous cipher operation, the invocation of the > callback may be performed before the subsequent operations in the > initial code path are invoked. The callback deletes the cipher request > data structure which implies that after the invocation of the > asynchronous cipher operation, this data structure must not be accessed > any more. > > The setting of the return code size with the request data structure must > therefore be moved before the invocation of the asynchronous cipher > operation. > > Fixes: e870456d8e7c ("crypto: algif_skcipher - overhaul memory management") > Fixes: d887c52d6ae4 ("crypto: algif_aead - overhaul memory management") > Reported-by: syzbot <syzkal...@googlegroups.com> > Cc: <sta...@vger.kernel.org> # v4.14+ > Signed-off-by: Stephan Mueller <smuel...@chronox.de> Acked-by: Jonathan Cameron <jonathan.came...@huawei.com> Have an identical patch in my queue having observed this happening on our hardware - though only had the skcipher case. As a heads up, the other nasties we've found so far are around hitting limits on the various socket buffers. When you run into those you can end up with parts of the data to be encrypted going through without it being obvious. Will update on that properly once I've chased down a local problem that is limiting the length of my test runs to a few million messages. Jonathan > --- > crypto/algif_aead.c | 10 +- > crypto/algif_skcipher.c | 10 +- > 2 files changed, 10 insertions(+), 10 deletions(-) > > diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c > index 15e561dc47b5..d7ec2f4ebaf9 100644 > --- a/crypto/algif_aead.c > +++ b/crypto/algif_aead.c > @@ -291,6 +291,10 @@ static int _aead_recvmsg(struct socket *sock, struct > msghdr *msg, > /* AIO operation */ > sock_hold(sk); > areq->iocb = msg->msg_iocb; > + > + /* Remember output size that will be generated. */ > + areq->outlen = outlen; > + > aead_request_set_callback(>cra_u.aead_req, > CRYPTO_TFM_REQ_MAY_BACKLOG, > af_alg_async_cb, areq); > @@ -298,12 +302,8 @@ static int _aead_recvmsg(struct socket *sock, struct > msghdr *msg, >crypto_aead_decrypt(>cra_u.aead_req); > > /* AIO operation in progress */ > - if (err == -EINPROGRESS || err == -EBUSY) { > - /* Remember output size that will be generated. */ > - areq->outlen = outlen; > - > + if (err == -EINPROGRESS || err == -EBUSY) > return -EIOCBQUEUED; > - } > > sock_put(sk); > } else { > diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c > index 6fb595cd63ac..baef9bfccdda 100644 > --- a/crypto/algif_skcipher.c > +++ b/crypto/algif_skcipher.c > @@ -125,6 +125,10 @@ static int _skcipher_recvmsg(struct socket *sock, struct > msghdr *msg, > /* AIO operation */ > sock_hold(sk); > areq->iocb = msg->msg_iocb; > + > + /* Remember output size that will be generated. */ > + areq->outlen = len; > + > skcipher_request_set_callback(>cra_u.skcipher_req, > CRYPTO_TFM_REQ_MAY_SLEEP, > af_alg_async_cb, areq); > @@ -133,12 +137,8 @@ static int _skcipher_recvmsg(struct socket *sock, struct > msghdr *msg, > crypto_skcipher_decrypt(>cra_u.skcipher_req); > > /* AIO operation in progress */ > - if (err == -EINPROGRESS || err == -EBUSY) { > - /* Remember output size that will be generated. */ > - areq->outlen = len; > - > + if (err == -EINPROGRESS || err == -EBUSY) > return -EIOCBQUEUED; > - } > > sock_put(sk); > } else {
Re: [PATCH v3] crypto: AF_ALG - remove locking in async callback
On Fri, 24 Nov 2017 17:04:19 +0100 Stephan Muellerwrote: > Am Freitag, 24. November 2017, 08:37:39 CET schrieb Herbert Xu: > > Hi Herbert, > > > On Fri, Nov 10, 2017 at 01:20:55PM +0100, Stephan Müller wrote: > > > The code paths protected by the socket-lock do not use or modify the > > > socket in a non-atomic fashion. The actions pertaining the socket do not > > > even need to be handled as an atomic operation. Thus, the socket-lock > > > can be safely ignored. > > > > > > This fixes a bug regarding scheduling in atomic as the callback function > > > may be invoked in interrupt context. > > > > > > In addition, the sock_hold is moved before the AIO encrypt/decrypt > > > operation to ensure that the socket is always present. This avoids a > > > tiny race window where the socket is unprotected and yet used by the AIO > > > operation. > > > > > > Finally, the release of resources for a crypto operation is moved into a > > > common function of af_alg_free_resources. > > > > > > Fixes: e870456d8e7c8 ("crypto: algif_skcipher - overhaul memory > > > management") Fixes: d887c52d6ae43 ("crypto: algif_aead - overhaul memory > > > management") Reported-by: Romain Izard > > > Signed-off-by: Stephan Mueller > > > > Patch applied. Thanks. > > Thanks a lot. > > Would it make sense to feed it to stable? > > Ciao > Stephan My view would be definitely. Ran into this precise issue whilst testing a new driver 4.14 today... Jonathan
Re: New Linux accelerators discussion list [was: Re: Fostering linux community collaboration on hardware accelerators]
+ linux-accelerat...@lists.ozlabs.org Seems sensible to have this email actually go to the new list so at least it appears in the archive. Sorry all, I should have thought of this before pressing send, Jonathan On Tue, 17 Oct 2017 13:48:10 +0100 Jonathan Cameron <jonathan.came...@huawei.com> wrote: > On Tue, 17 Oct 2017 11:00:40 +1100 > Andrew Donnellan <andrew.donnel...@au1.ibm.com> wrote: > > > On 17/10/17 01:07, Jonathan Cameron wrote: > > > > > > > > >>> So as ever with a linux community focusing on a particular topic, the > > >>> obvious solution is a mailing list. There are a number of options on how > > >>> do this. > > >>> > > >>> 1) Ask one of the industry bodies to host? Who? > > >>> > > >>> 2) Put together a compelling argument for > > >>> linux-accelerat...@vger.kernel.org > > >>> as probably the most generic location for such a list. > > >> > > >> Happy to offer linux-accelerat...@lists.ozlabs.org, which I can get set > > >> up immediately (and if we want patchwork, patchwork.ozlabs.org is > > >> available as always, no matter where the list is hosted). > > >> > > > > > > That would be great! Thanks for doing this. Much easier to find out what > > > such a list is useful for by the practical option of having a list and > > > see what people do with it. > > > > [+ LKML] > > > > We now have a mailing list: > > > >List: linux-accelerat...@lists.ozlabs.org > >Info: https://lists.ozlabs.org/listinfo/linux-accelerators > >Archives: https://lists.ozlabs.org/pipermail/linux-accelerators > > > > I haven't set up Patchwork as yet, but if people think that's a good > > idea I can get that done too. > > > > > > Andrew > > > > Thanks Andrew. > > A quick summary of initial thoughts on scope of this list for anyone > entering the discussion at this point. > Note it will hopefully evolve in whatever direction people find helpful. > This contains some suggestions not a barrier to wider scope! > > There are a number of projects / applications involving what are > termed hardware accelerators. These include: > * Traditional offload engines > - Crypto, compression, media transcoding and similar accelerators > - usecases including kTLS, Storage Encryption etc. > * Dynamic FPGA type accelerators > * ODP, DPDK and similar networking data plane - particularly dual stack > solutions where the kernel 'plays nicely' with userspace drivers. > * AI Accelerators > * Shared Virtual Memory (SVM) bus systems including Open-CAPI, CCIX etc > * Fast data flow to/from userspace applications. > > A number of existing project focus on these: > * Mainline kernel drivers > - Numerous crypto drivers etc > - Open-CAPI > * Various networking data plane activities > * VFIO based and similar userspace drivers > Hopefully this list can provide a broader forum where more general > questions are being considered. > > The discussion that lead to this list was that a number of people would > like a general open forum on which to discuss ideas with scope beyond > simply one kernel subsystem or one particular userspace framework. > > Topics might include > * RFCs and early reviews of new approaches. > * Similar hardware - who is trying to solve the same problems? > * What would we ideally want from new hardware iterations? > * Hardware description - the question of how to chose a particular > crypto engine is very dependent on the particular data flows. > Sometimes hardware accelerators don't actually help due to overheads. > Understanding those barriers would be very useful. > * Upstream paths - blockers and how to work with the communities to > overcome them. > * Fostering stronger userspace communities to allow these accelerators to > be easily harnessed. > - A number of projects have been highlighted in this thread > OpenStack (cyborg project), openMP accelerator support > * Robustness / security of userspace frameworks. > * Virtualisation of accelerators > > Anyhow, this email was just meant to draw together some thoughts. > It will be interesting to see what the list actually gets used for :) > > Jonathan
Re: New Linux accelerators discussion list [was: Re: Fostering linux community collaboration on hardware accelerators]
On Tue, 17 Oct 2017 11:00:40 +1100 Andrew Donnellan <andrew.donnel...@au1.ibm.com> wrote: > On 17/10/17 01:07, Jonathan Cameron wrote: > > > > > >>> So as ever with a linux community focusing on a particular topic, the > >>> obvious solution is a mailing list. There are a number of options on how > >>> do this. > >>> > >>> 1) Ask one of the industry bodies to host? Who? > >>> > >>> 2) Put together a compelling argument for > >>> linux-accelerat...@vger.kernel.org > >>> as probably the most generic location for such a list. > >> > >> Happy to offer linux-accelerat...@lists.ozlabs.org, which I can get set > >> up immediately (and if we want patchwork, patchwork.ozlabs.org is > >> available as always, no matter where the list is hosted). > >> > > > > That would be great! Thanks for doing this. Much easier to find out what > > such a list is useful for by the practical option of having a list and > > see what people do with it. > > [+ LKML] > > We now have a mailing list: > >List: linux-accelerat...@lists.ozlabs.org >Info: https://lists.ozlabs.org/listinfo/linux-accelerators >Archives: https://lists.ozlabs.org/pipermail/linux-accelerators > > I haven't set up Patchwork as yet, but if people think that's a good > idea I can get that done too. > > > Andrew > Thanks Andrew. A quick summary of initial thoughts on scope of this list for anyone entering the discussion at this point. Note it will hopefully evolve in whatever direction people find helpful. This contains some suggestions not a barrier to wider scope! There are a number of projects / applications involving what are termed hardware accelerators. These include: * Traditional offload engines - Crypto, compression, media transcoding and similar accelerators - usecases including kTLS, Storage Encryption etc. * Dynamic FPGA type accelerators * ODP, DPDK and similar networking data plane - particularly dual stack solutions where the kernel 'plays nicely' with userspace drivers. * AI Accelerators * Shared Virtual Memory (SVM) bus systems including Open-CAPI, CCIX etc * Fast data flow to/from userspace applications. A number of existing project focus on these: * Mainline kernel drivers - Numerous crypto drivers etc - Open-CAPI * Various networking data plane activities * VFIO based and similar userspace drivers Hopefully this list can provide a broader forum where more general questions are being considered. The discussion that lead to this list was that a number of people would like a general open forum on which to discuss ideas with scope beyond simply one kernel subsystem or one particular userspace framework. Topics might include * RFCs and early reviews of new approaches. * Similar hardware - who is trying to solve the same problems? * What would we ideally want from new hardware iterations? * Hardware description - the question of how to chose a particular crypto engine is very dependent on the particular data flows. Sometimes hardware accelerators don't actually help due to overheads. Understanding those barriers would be very useful. * Upstream paths - blockers and how to work with the communities to overcome them. * Fostering stronger userspace communities to allow these accelerators to be easily harnessed. - A number of projects have been highlighted in this thread OpenStack (cyborg project), openMP accelerator support * Robustness / security of userspace frameworks. * Virtualisation of accelerators Anyhow, this email was just meant to draw together some thoughts. It will be interesting to see what the list actually gets used for :) Jonathan
Re: Fostering linux community collaboration on hardware accelerators
> > So as ever with a linux community focusing on a particular topic, the > > obvious solution is a mailing list. There are a number of options on how > > do this. > > > > 1) Ask one of the industry bodies to host? Who? > > > > 2) Put together a compelling argument for linux-accelerat...@vger.kernel.org > > as probably the most generic location for such a list. > > Happy to offer linux-accelerat...@lists.ozlabs.org, which I can get set > up immediately (and if we want patchwork, patchwork.ozlabs.org is > available as always, no matter where the list is hosted). > That would be great! Thanks for doing this. Much easier to find out what such a list is useful for by the practical option of having a list and see what people do with it. Thanks, Jonathan > > More open questions are > > 1) Scope? > > * Would anyone ever use such an overarching list? > > * Are we better off with the usual adhoc list of 'interested parties' + > > lkml? > > * Do we actually need to define the full scope - are we better with a > > vague > > definition? > > I think a list with a broad and vaguely defined scope is a good idea - > it would certainly be helpful to us to be able to follow what other > contributors are doing that could be relevant to our CAPI and OpenCAPI work. > > > > > 2) Is there an existing community we can use to discuss these issues? > > (beyond the obvious firehose of LKML). > > > > 3) Who else to approach for input on these general questions? > > > > In parallel to this there are elements such as git / patchwork etc but > > they can all be done as they are needed. > > > > Thanks > > > > -- > > Jonathan Cameron > > Huawei > > >
Re: Fostering linux community collaboration on hardware accelerators
On Wed, 11 Oct 2017 11:51:49 -0400 Sandy Harris <sandyinch...@gmail.com> wrote: > I shortened the cc list radically. If the discussion continues, it may > be a good idea to add people back. I added John Gilmore since I cite > one of his posts below. Fair enough - I have cc'd back in our internal list for now. I cynically take the view that people in this community are very good at ignoring emails they aren't interested in :) > > Jonathan Cameron <jonathan.came...@huawei.com> wrote: > > > On behalf of Huawei, I am looking into options to foster a wider community > > around the various ongoing projects related to Accelerator support within > > Linux. The particular area of interest to Huawei is that of harnessing > > accelerators from userspace, but in a collaborative way with the kernel > > still able to make efficient use of them, where appropriate. > > > > We are keen to foster a wider community than one just focused on > > our own current technology. > > Good stuff, but there are problems. e.g. see the thread starting > with my message here: > https://www.mail-archive.com/linux-crypto@vger.kernel.org/msg27274.html An interesting email - we have heard (and observed) the same working with the optimized ARM algorithm implementations. In particular the point about 'when a given implementation is quickest' is important. This isn't even obvious for the various ARM options available in mainline. There is almost always more overhead (for now ;) in using a crypto accelerators than doing it on the CPU which puts an obvious issue in place for small data packets. It's not easy - however not all crypto needs to come out of the kernel again and with care you can avoid duplicating the overheads. e.g. * Kernel TLS moves the crypto down into the kernel so data goes in once (which has to happen anyway to get it to the network card). * Storage encryption - again data is going across the kernel boundary anyway. > > My perspective is that of a crypto guy working on general-purpose > CPUs, anything from 32-bit ARM up. There are certainly problems for > devices with massive loads like a high-end router or with more limited > CPUs that I will not even pretend to address. Agreed - the application area is certainly not general. This stuff consumes a lot of silicon - you really have to need it to make it worth the expense. > For me, far & away the biggest issue is having a good source of random > numbers; more-or-less all crypto depends on that. The Linux random(4) > RNG gets close, but there are cases where it may not be well > initialized soon enough on some systems. If a system provides a > hardware RNG, I will certainly use it to feed random(4). I do not care > nearly as much about anything else that might be in a hardware crypto > accelerator. Absolutely - that needs to be part of a complete solution. > > Separate accelerator devices require management, separating accesses > by different kernel threads or by user processes if they are allowed > to play, keeping them from seeing each other's keys, perhaps saving & > restoring state sometimes. This can be assisted a lot by hardware context management (this look similar to what you see on high end network cards with things that look like queue steering etc). Note, what we are trying to avoid is everything going full userspace driver stack as has happened for some of those sorts of systems (e.g. Solarflare's offering) This hardware is appearing in crypto accelerators - until it is there I agree this is a really nasty problem - be it perhaps not an unsolvable one. There are intermediate levels where the hardware can handle a small number of contexts but to be truely useful you need to be able to use lots and have the throughput to support them all with reasonable latency. > Things that can be built into the CPU -- > RNG instruction or register, AES instructions, Intel's instruction for > 128-bit finite field multiplication which accelerates AES-GCM, > probably some I have not heard of -- require less management and are > usable by any process, assuming either compiler support or some > assembler code. As a software guy, I'd far rather the hardware > designers gave me those than anything that needs a driver. I absolutely agree. If you can get away with doing crypto on a CPU (under constraints of power usage and other needs for the cpu) then do it that way. The targets for this stuff are where crypto has become a bottleneck not cases where a good software implementation is quick enough. Our initial focus in my team at Huawei, has been crypto partly because we have a reasonably capable engine to play with - it is just one of many options moving forward. Jonathan
Re: Fostering linux community collaboration on hardware accelerators
On Tue, 10 Oct 2017 16:55:06 +0200 Francois Ozog <francois.o...@linaro.org> wrote: > Hi, Hi Francois, Firstly, to anyone seeing this on the mailing lists, there was an attachment so it presumably bounced. > > I am really happy you started this thread. I think we need to have Grant > Likely from Arm involved in the topic. I only have his HPE mail and not his > new Arm. I also added Andrea Gallo who has visibility of all segment groups > in Linaro. Good idea. Given the simple nature of your average ARM email address I've one with guessing Grant's *crosses fingers*. > > > As you say, there are more questions than answers at this stage and I'd > like to share thoughts and related activities I already conducted so that > you can take those into account (if they are relevant) to build your action > plan. > > > On the scope of accelerators I include media related (transcoding...) and > FPGAs. Two more for the list - I knew I've forget some big groups ;) > I kicked off discussions with various parties on how to build an > FPGA focused community (see attached document on FPGAs). The FPGAs will > need proper "factory" of userland drivers and I expect WD to be > instrumental in this. > > On the "ODP type" requirements: I think we have to include both ODP and > DPDK. > > Robustness/security of the userland framework: if anything happens to > userland, the kernel has to recover. This issue is a key one I'd missed when drawing up my list. Glad you raised it. > Exposing pieces of PCI config space or > portions of MMIOs may be a new area of intellectual property for Arm. > > I was leading acceleration activities in ETSI NFV (IFA stream 2 group that > was created by Yun-Chao Hu@ Huawei, Bruno Chatras@Orange and myself) and > talked to Orange, BT, Telecom Italia and AT yesterday. The guys I know > are keen to participate to an advisory board related to acceleration in > Opensource Communities. They see the value of continuing the work done in > SDOs (ETSI, IETF, IEEE...) in OCs (Linaro, Linux Foundation...). Interesting - of course such a structure might be useful. Of course their input along with that of others would be of great assistance. However, any formal arrangement or consortium does bring costs that may put some individuals or companies off involvement. To my mind, any Linux focused effort needs to incorporate those interests, but take in wider views. That's not to say that any Linux community focused effort would not be a good place for ideas coming out of industry consortiums / advisory groups to be presented and evaluated as an integral part of the community. >I think > their participation to the acceleration efforts is key to ensure we solve > the right problem the right way. I am trying to organize an accelerator > workshop "accelerators: from SDO to Opensource) at the ETSI NFV F2F meeting > (NFV#20 December 4-8th in Sophia Antipolis, France). An agenda item is also > how to deal with Intel EPA that introduce bias in platform and accelerator > "scheduling" in Openstack terminology. > > As of particular interest is accelerator management in the context of Cloud > and NFV. Huawei is leading efforts in Cyborg project (OpenStack) and I > think, while it should be kept as a separate effort, we should be aligning > our efforts in a way or the other. Management, in a more general sense isn't something I had really considered. Efforts such as Cyborg seem to sit at a higher level in the stack but clearly you are correct in suggesting that aligning efforts makes sense. > > CCIX will have a massive impact on networking. With DMA it is all about > packet MOVEMENT. With CCIX this is about cacheline games. Dealing with > headroom and tail room of packets will no longer be an issue. Achieving > 50Gbps and above line rate bumped into DMA transactions per second problem. > Wit CCIX this is no longer an issue (at least on paper). This is true of both CCIX and some competing / alternative technologies. I am very keen to embrace that competition as well so that any results of collaboration help everyone! Not to mention there is plenty of existing hardware with adhoc solutions in place already. The one big element I clearly forgot to mention in the below scope is virtualization (oops). Any solutions clearly need to take that into account as well. Jonathan > > > > Cordially, > > François-Frédéric > > On 10 October 2017 at 13:28, Jonathan Cameron <jonathan.came...@huawei.com> > wrote: > > > Hi All, > > > > Please forward this email to anyone you think may be interested. > > > > On behalf of Huawei, I am looking into options to foster a wider community > > around the various ongoing projects related to Accelerator support
Fostering linux community collaboration on hardware accelerators
Hi All, Please forward this email to anyone you think may be interested. On behalf of Huawei, I am looking into options to foster a wider community around the various ongoing projects related to Accelerator support within Linux. The particular area of interest to Huawei is that of harnessing accelerators from userspace, but in a collaborative way with the kernel still able to make efficient use of them, where appropriate. We are keen to foster a wider community than one just focused on our own current technology. This is a field with no clear answers, so the widest possible range of input is needed! The address list of this email is drawn from people we have had discussions with or who have been suggested in response to Kenneth Lee's wrapdrive presentation at Linaro Connect and earlier presentations on the more general issue. A few relevant lists added to hopefully catch anyone we missed. My apologies to anyone who got swept up in this and isn't interested! Here we are defining accelerators fairly broadly - suggestions for a better term are also welcome. The infrastructure may be appropriate for: * Traditional offload engines - cryptography, compression and similar * Upcoming AI accelerators * ODP type requirements for access to elements of networking * Systems utilizing SVM including CCIX and other cache coherent buses * Many things we haven't thought of yet... As I see it, there are several aspects to this: 1) Kernel drivers for accelerators themselves. * Traditional drivers such as crypto etc - These already have their own communities. The main focus of such work will always be through them. - What a more general community could add here would be an overview of the shared infrastructure of such devices. This is particularly true around VFIO based (or similar) userspace interfaces with a non trivial userspace component. * How to support new types of accelerator? 2) The need for lightweight access paths from userspace that 'play well' and share resources etc with standard in-kernel drivers. This is the area that Kenneth Lee and Huawei have been focusing on with their wrapdrive effort. We know there are other similar efforts going on in other companies. * This may involve interacting with existing kernel communities such as those around VFIO and mdev. * Resource management when we may have many consumers - not all hardware has appropriate features to deal with this. 3) Usecases for accelerators. e.g. * kTLS * Storage encryption * ODP - networking dataplane * AI toolkits Discussions we want to get started include: * A wider range of hardware than we are currently considering. What makes sense to target / what hardware do people have they would like to support? * Upstream paths - potential blockers and how to overcome them. The standard kernel drivers should be fairly straightforward, but once we start looking at systems with a heavier userspace component, things will get more controversial! * Fostering stronger userspace communities to allow these these accelerators to be easily harnessed. So as ever with a linux community focusing on a particular topic, the obvious solution is a mailing list. There are a number of options on how do this. 1) Ask one of the industry bodies to host? Who? 2) Put together a compelling argument for linux-accelerat...@vger.kernel.org as probably the most generic location for such a list. More open questions are 1) Scope? * Would anyone ever use such an overarching list? * Are we better off with the usual adhoc list of 'interested parties' + lkml? * Do we actually need to define the full scope - are we better with a vague definition? 2) Is there an existing community we can use to discuss these issues? (beyond the obvious firehose of LKML). 3) Who else to approach for input on these general questions? In parallel to this there are elements such as git / patchwork etc but they can all be done as they are needed. Thanks -- Jonathan Cameron Huawei
Re: [PATCH v5 05/19] crypto: introduce crypto wait for async op
On Mon, 14 Aug 2017 18:21:15 +0300 Gilad Ben-Yossefwrote: > Invoking a possibly async. crypto op and waiting for completion > while correctly handling backlog processing is a common task > in the crypto API implementation and outside users of it. > > This patch adds a generic implementation for doing so in > preparation for using it across the board instead of hand > rolled versions. > > Signed-off-by: Gilad Ben-Yossef > CC: Eric Biggers One really trivial point inline I happened to notice. > --- > crypto/api.c | 13 + > include/linux/crypto.h | 41 + > 2 files changed, 54 insertions(+) > > diff --git a/crypto/api.c b/crypto/api.c > index 941cd4c..2a2479d 100644 > --- a/crypto/api.c > +++ b/crypto/api.c > @@ -24,6 +24,7 @@ > #include > #include > #include > +#include > #include "internal.h" > > LIST_HEAD(crypto_alg_list); > @@ -595,5 +596,17 @@ int crypto_has_alg(const char *name, u32 type, u32 mask) > } > EXPORT_SYMBOL_GPL(crypto_has_alg); > > +void crypto_req_done(struct crypto_async_request *req, int err) > +{ > + struct crypto_wait *wait = req->data; > + > + if (err == -EINPROGRESS) > + return; > + > + wait->err = err; > + complete(>completion); > +} > +EXPORT_SYMBOL_GPL(crypto_req_done); > + > MODULE_DESCRIPTION("Cryptographic core API"); > MODULE_LICENSE("GPL"); > diff --git a/include/linux/crypto.h b/include/linux/crypto.h > index 84da997..bb00186 100644 > --- a/include/linux/crypto.h > +++ b/include/linux/crypto.h > @@ -24,6 +24,7 @@ > #include > #include > #include > +#include > > /* > * Autoloaded crypto modules should only use a prefixed name to avoid > allowing > @@ -468,6 +469,45 @@ struct crypto_alg { > } CRYPTO_MINALIGN_ATTR; > > /* > + * A helper struct for waiting for completion of async crypto ops > + */ > +struct crypto_wait { > + struct completion completion; > + int err; > +}; > + > +/* > + * Macro for declaring a crypto op async wait object on stack > + */ > +#define DECLARE_CRYPTO_WAIT(_wait) \ > + struct crypto_wait _wait = { \ > + COMPLETION_INITIALIZER_ONSTACK((_wait).completion), 0 } > + > +/* > + * Async ops completion helper functioons > + */ > +void crypto_req_done(struct crypto_async_request *req, int err); > + > +static inline int crypto_wait_req(int err, struct crypto_wait *wait) > +{ > + switch (err) { > + case -EINPROGRESS: > + case -EBUSY: > + wait_for_completion(>completion); > + reinit_completion(>completion); > + err = wait->err; > + break; > + }; > + > + return err; > +} > + > +static inline void crypto_init_wait(struct crypto_wait *wait) > +{ > + init_completion(>completion); > +} > + > +/* > * Algorithm registration interface. > */ > int crypto_register_alg(struct crypto_alg *alg); > @@ -1604,5 +1644,6 @@ static inline int crypto_comp_decompress(struct > crypto_comp *tfm, > src, slen, dst, dlen); > } > > + Trivial observation. Shouldn't have this bonus blank line inserted here. > #endif /* _LINUX_CRYPTO_H */ >
Re: [PATCH v2 00/11] getting back -Wmaybe-uninitialized
On 11/11/16 19:49, Arnd Bergmann wrote: > On Friday, November 11, 2016 9:13:00 AM CET Linus Torvalds wrote: >> On Thu, Nov 10, 2016 at 8:44 AM, Arnd Bergmannwrote: >>> >>> Please merge these directly if you are happy with the result. >> >> I will take this. > > Thanks a lot! > >> I do see two warnings, but they both seem to be valid and recent, >> though, so I have no issues with the spurious cases. > > Ok, both of them should have my fixes coming your way already. > >> Warning #1: >> >> sound/soc/qcom/lpass-platform.c: In function ‘lpass_platform_pcmops_open’: >> sound/soc/qcom/lpass-platform.c:83:29: warning: ‘dma_ch’ may be used >> uninitialized in this function [-Wmaybe-uninitialized] >> drvdata->substream[dma_ch] = substream; >> ~~~^~~ >> >> and 'dma_ch' usage there really is crazy and wrong. Broken by >> 022d00ee0b55 ("ASoC: lpass-platform: Fix broken pcm data usage") > > Right, the patches crossed here, the bugfix patch that introduced > this came into linux-next over the kernel summit, and the fix I > sent on Tuesday made it into Mark Brown's tree on Wednesday but not > before you pulled alsa tree. It should be fixed the next time you > pull from the alsa tree, the commit is > > 3b89e4b77ef9 ("ASoC: lpass-platform: initialize dma channel number") > >> Warning #2 is not a real bug, but it's reasonable that gcc doesn't >> know that storage_bytes (chip->read_size) has to be 2/4. Again, >> introduced recently by commit 231147ee77f3 ("iio: maxim_thermocouple: >> Align 16 bit big endian value of raw reads"), so you didn't see it. > > This is the one I mentioned in the commit message as one that > is fixed in linux-next and that should make it in soon. > >> drivers/iio/temperature/maxim_thermocouple.c: In function >> ‘maxim_thermocouple_read_raw’: >> drivers/iio/temperature/maxim_thermocouple.c:141:5: warning: ‘ret’ >> may be used uninitialized in this function [-Wmaybe-uninitialized] >> if (ret) >>^ >> drivers/iio/temperature/maxim_thermocouple.c:128:6: note: ‘ret’ was >> declared here >> int ret; >> ^~~ >> >> and I guess that code can just initialize 'ret' to '-EINVAL' or >> something to just make the theoretical "somehow we had a wrong >> chip->read_size" case error out cleanly. > > Right, that was my conclusion too. I sent the bugfix on Oct 25 > for linux-next but it didn't make it in until this Monday, after > you pulled the patch that introduced it on Oct 29. > > The commit in staging-testing is > 32cb7d27e65d ("iio: maxim_thermocouple: detect invalid storage size in > read()") > > Greg and Jonathan, I see now that this is part of the 'iio-for-4.10b' > branch, so I suspect you were not planning to send this before the > merge window. Could you make sure this ends up in v4.9 so we get > a clean build when -Wmaybe-uninitialized gets enabled again? I'll queue this up and send a pull to Greg tomorrow. Was highly doubtful that a false warning suppression (be it an understandable one) was worth sending mid cycle, hence it was taking the slow route. Jonathan > > Arnd > -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: linux-next: Tree for Jan 23 (mtd/ubi and iio and crypto/crc32[c])
On 01/23/2013 11:41 PM, Randy Dunlap wrote: On 01/23/13 15:23, Herbert Xu wrote: On Wed, Jan 23, 2013 at 03:10:01PM -0800, Randy Dunlap wrote: On 01/22/13 22:43, Stephen Rothwell wrote: Hi all, Changes since 20130122: on i386: ERROR: crc32_le [drivers/mtd/ubi/ubi.ko] undefined! ERROR: iio_triggered_buffer_setup [drivers/iio/adc/max1363.ko] undefined! ERROR: iio_triggered_buffer_cleanup [drivers/iio/adc/max1363.ko] undefined! ERROR: __crc32c_le [crypto/crc32c.ko] undefined! ERROR: crc32_le [crypto/crc32.ko] undefined! Weird. You do have CONFIG_CRC32 selected, so did it compile a lib/crc32.ko or not? Hmph, this seems to be some kind of rebuild problem. It's not reproducible, so the mtd/ubi and crypto reports above are incorrect -- please ignore them. Only the iio one is reproducible. Sorry about that. For iio one had a patch for a while, just not had a chance to send a pull request to Greg until today. Sorry for the slow response. -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html