from:"Jonathan Cameron"

[PATCH 2/2] arm64: dts: hisilicon hip07: drop commas from @address node names

2018-09-12 Thread Jonathan Cameron

Reported-by: Rob Herring 
Signed-off-by: Jonathan Cameron 
Fixes: e4a1f7858ab8 ("arm64: dts: hisi: add SEC crypto accelerator nodes for 
hip07 SoC")
---
 arch/arm64/boot/dts/hisilicon/hip07.dtsi | 28 
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/boot/dts/hisilicon/hip07.dtsi 
b/arch/arm64/boot/dts/hisilicon/hip07.dtsi
index c33adefc3061..bc0113445655 100644
--- a/arch/arm64/boot/dts/hisilicon/hip07.dtsi
+++ b/arch/arm64/boot/dts/hisilicon/hip07.dtsi
@@ -949,35 +949,35 @@
reg = <0x0 0xc600 0x0 0x4>;
};
 
-   p0_its_dsa_b: interrupt-controller@8,c600 {
+   p0_its_dsa_b: interrupt-controller@8c600 {
compatible = "arm,gic-v3-its";
msi-controller;
#msi-cells = <1>;
reg = <0x8 0xc600 0x0 0x4>;
};
 
-   p1_its_peri_a: interrupt-controller@400,4c00 {
+   p1_its_peri_a: interrupt-controller@4004c00 {
compatible = "arm,gic-v3-its";
msi-controller;
#msi-cells = <1>;
reg = <0x400 0x4c00 0x0 0x4>;
};
 
-   p1_its_peri_b: interrupt-controller@400,6c00 {
+   p1_its_peri_b: interrupt-controller@4006c00 {
compatible = "arm,gic-v3-its";
msi-controller;
#msi-cells = <1>;
reg = <0x400 0x6c00 0x0 0x4>;
};
 
-   p1_its_dsa_a: interrupt-controller@400,c600 {
+   p1_its_dsa_a: interrupt-controller@400c600 {
compatible = "arm,gic-v3-its";
msi-controller;
#msi-cells = <1>;
reg = <0x400 0xc600 0x0 0x4>;
};
 
-   p1_its_dsa_b: interrupt-controller@408,c600 {
+   p1_its_dsa_b: interrupt-controller@408c600 {
compatible = "arm,gic-v3-its";
msi-controller;
#msi-cells = <1>;
@@ -1066,7 +1066,7 @@
num-pins = <3>;
};
};
-   p0_mbigen_alg_b:interrupt-controller@8,d008 {
+   p0_mbigen_alg_b:interrupt-controller@8d008 {
compatible = "hisilicon,mbigen-v2";
reg = <0x8 0xd008 0x0 0x1>;
 
@@ -1083,7 +1083,7 @@
num-pins = <3>;
};
};
-   p1_mbigen_alg_a:interrupt-controller@400,d008 {
+   p1_mbigen_alg_a:interrupt-controller@400d008 {
compatible = "hisilicon,mbigen-v2";
reg = <0x400 0xd008 0x0 0x1>;
 
@@ -1100,7 +1100,7 @@
num-pins = <3>;
};
};
-   p1_mbigen_alg_b:interrupt-controller@408,d008 {
+   p1_mbigen_alg_b:interrupt-controller@408d008 {
compatible = "hisilicon,mbigen-v2";
reg = <0x408 0xd008 0x0 0x1>;
 
@@ -1187,7 +1187,7 @@
hisilicon,broken-prefetch-cmd;
/* smmu-cb-memtype = <0x0 0x1>;*/
};
-   p0_smmu_alg_b: smmu_alg@8,d004 {
+   p0_smmu_alg_b: smmu_alg@8d004 {
compatible = "arm,smmu-v3";
reg = <0x8 0xd004 0x0 0x2>;
interrupt-parent = <_mbigen_smmu_alg_b>;
@@ -1200,7 +1200,7 @@
hisilicon,broken-prefetch-cmd;
/* smmu-cb-memtype = <0x0 0x1>;*/
};
-   p1_smmu_alg_a: smmu_alg@400,d004 {
+   p1_smmu_alg_a: smmu_alg@400d004 {
compatible = "arm,smmu-v3";
reg = <0x400 0xd004 0x0 0x2>;
interrupt-parent = <_mbigen_smmu_alg_a>;
@@ -1213,7 +1213,7 @@
hisilicon,broken-prefetch-cmd;
/* smmu-cb-memtype = <0x0 0x1>;*/
};
-   p1_smmu_alg_b: smmu_alg@408,d004 {
+   p1_smmu_alg_b: smmu_alg@408d004 {
compatible = "arm,smmu-v3";
reg = <0x408 0xd004 0x0 0x2>;
interrupt-parent = <_mbigen_smmu_alg_b>;
@@ -1763,7 +1763,7 @@
 <605 1>, <606 4>,
 <607 1>, <608 4>;
};
-   p0_sec_b: crypto@8,d200 {
+   p0_sec_b: crypto@8d200 {
compatible = "hisilicon,hip07-sec";

[PATCH 1/2] dt-bindings: crypto: hip07-sec, drop incorrect commas

2018-09-12 Thread Jonathan Cameron

There should not be a comma in the address used for the instance
so drop them.

This is a left over from a review of the final version before
Herbert Xu picked the series up.

Reported-by: Rob Herring 
Signed-off-by: Jonathan Cameron 
Fixes: 8f026dc887fe ("dt-bindings: Add bidnings for Hisilicon SEC crypto 
accelerators")
---
 .../devicetree/bindings/crypto/hisilicon,hip07-sec.txt  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt 
b/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt
index 78d2db9d4de5..d28fd1af01b4 100644
--- a/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt
+++ b/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt
@@ -24,7 +24,7 @@ Optional properties:
 
 Example:
 
-p1_sec_a: crypto@400,d200 {
+p1_sec_a: crypto@400d200 {
compatible = "hisilicon,hip07-sec";
reg = <0x400 0xd000 0x0 0x1
   0x400 0xd200 0x0 0x1
-- 
2.18.0

[PATCH 0/2] dt-bindings: Cleanup spurious commas in hisilicon hip07 dt.

2018-09-12 Thread Jonathan Cameron

Rob Herring picked up on an issue in our sec dt binding doc where we had
an unnecessary comma in the addresses provided as part of the node name.
The comment wasn't fixed before the series was applied, hence this follow up.

This was a cut and paste job from a few of the its entries already present
in the hip07 dtsi.

This series tidies up this loose end.  It has no particular urgency
as nothing stops working if you have a comma present and it is unlikely
an userspace code would be using these.

Jonathan Cameron (2):
  dt-bindings: crypto: hip07-sec, drop incorrect commas
  arm64: dts: hisilicon hip07: drop commas from @address node names

 .../bindings/crypto/hisilicon,hip07-sec.txt   |  2 +-
 arch/arm64/boot/dts/hisilicon/hip07.dtsi  | 28 +--
 2 files changed, 15 insertions(+), 15 deletions(-)

-- 
2.18.0

Re: [bug report] crypto: hisilicon - SEC security accelerator driver

2018-08-15 Thread Jonathan Cameron

On Mon, 6 Aug 2018 23:12:44 +0300
Dan Carpenter  wrote:

> Hello Jonathan Cameron,
> 
> The patch 915e4e8413da: "crypto: hisilicon - SEC security accelerator
> driver" from Jul 23, 2018, leads to the following static checker
> warning:
> 
>   drivers/crypto/hisilicon/sec/sec_algs.c:865 sec_alg_skcipher_crypto()
>   error: double free of 'split_sizes'
> 
> drivers/crypto/hisilicon/sec/sec_algs.c
>808  
>809  /* Cleanup - all elements in pointer arrays have been coppied 
> */
>810  kfree(splits_in_nents);
>811  kfree(splits_in);
>812  kfree(splits_out_nents);
>813  kfree(splits_out);
>814  kfree(split_sizes);

Thanks Dan,

I clearly missed up the error paths when adding the backlog support.
Sorry for the delay, this came in just as I went on vacation.  Should get
a patch out later this week.

Easiest is probably going to be just doing this kfree block later, after the 
last error
handling path.

Jonathan


>   ^^^
> Free
> 
>815  
>816  /* Grab a big lock for a long time to avoid concurrency 
> issues */
>817  mutex_lock(>queuelock);
>818  
>819  /*
>820   * Can go on to queue if we have space in either:
>821   * 1) The hardware queue and no software queue
>822   * 2) The software queue
>823   * AND there is nothing in the backlog.  If there is backlog 
> we
>824   * have to only queue to the backlog queue and return busy.
>825   */
>826  if ((!sec_queue_can_enqueue(queue, steps) &&
>827   (!queue->havesoftqueue ||
>828kfifo_avail(>softqueue) > steps)) ||
>829  !list_empty(>backlog)) {
>830  if ((skreq->base.flags & CRYPTO_TFM_REQ_MAY_BACKLOG)) 
> {
>831  list_add_tail(_req->backlog_head, 
> >backlog);
>832  mutex_unlock(>queuelock);
>833  return -EBUSY;
>834  }
>835  
>836  ret = -EBUSY;
>837  mutex_unlock(>queuelock);
>838  goto err_free_elements;
> ^^
>839  }
>840  ret = sec_send_request(sec_req, queue);
>841  mutex_unlock(>queuelock);
>842  if (ret)
>843  goto err_free_elements;
> ^^
>844  
>845  return -EINPROGRESS;
>846  
>847  err_free_elements:
>848  list_for_each_entry_safe(el, temp, _req->elements, head) {
>849  list_del(>head);
>850  sec_alg_free_el(el, info);
>851  }
>852  if (crypto_skcipher_ivsize(atfm))
>853  dma_unmap_single(info->dev, sec_req->dma_iv,
>854   crypto_skcipher_ivsize(atfm),
>855   DMA_BIDIRECTIONAL);
>856  err_unmap_out_sg:
>857  if (skreq->src != skreq->dst)
>858  sec_unmap_sg_on_err(skreq->dst, steps, splits_out,
>859  splits_out_nents, 
> sec_req->len_out,
>860  info->dev);
>861  err_unmap_in_sg:
>862  sec_unmap_sg_on_err(skreq->src, steps, splits_in, 
> splits_in_nents,
>863  sec_req->len_in, info->dev);
>864  err_free_split_sizes:
>865  kfree(split_sizes);
> ^^^
> Double free.
> 
>866  
>867  return ret;
>868  }
> 
> regards,
> dan carpenter

[PATCH 2/3] crypto: hisilicon SEC security accelerator driver

2018-07-23 Thread Jonathan Cameron

This accelerator is found inside hisilicon hip06 and hip07 SoCs.
Each instance provides a number of queues which feed a different number of
backend acceleration units.

The queues are operating in an out of order mode in the interests of
throughput. The silicon does not do tracking of dependencies between
multiple 'messages' or update of the IVs as appropriate for training.
Hence where relevant we need to do this in software.

Signed-off-by: Jonathan Cameron 
---
 drivers/crypto/Kconfig  |2 +
 drivers/crypto/Makefile |1 +
 drivers/crypto/hisilicon/Kconfig|   14 +
 drivers/crypto/hisilicon/Makefile   |2 +
 drivers/crypto/hisilicon/sec/Makefile   |3 +
 drivers/crypto/hisilicon/sec/sec_algs.c | 1122 ++
 drivers/crypto/hisilicon/sec/sec_drv.c  | 1323 +++
 drivers/crypto/hisilicon/sec/sec_drv.h  |  428 ++
 8 files changed, 2895 insertions(+)

diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index d1ea1a07cecb..d0b80d0d1f8b 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -750,4 +750,6 @@ config CRYPTO_DEV_CCREE
  cryptographic operations on the system REE.
  If unsure say Y.
 
+source "drivers/crypto/hisilicon/Kconfig"
+
 endif # CRYPTO_HW
diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
index 7ae87b4f6c8d..ee43aed8cb69 100644
--- a/drivers/crypto/Makefile
+++ b/drivers/crypto/Makefile
@@ -45,3 +45,4 @@ obj-$(CONFIG_CRYPTO_DEV_VMX) += vmx/
 obj-$(CONFIG_CRYPTO_DEV_BCM_SPU) += bcm/
 obj-$(CONFIG_CRYPTO_DEV_SAFEXCEL) += inside-secure/
 obj-$(CONFIG_CRYPTO_DEV_ARTPEC6) += axis/
+obj-y += hisilicon/
diff --git a/drivers/crypto/hisilicon/Kconfig b/drivers/crypto/hisilicon/Kconfig
new file mode 100644
index ..8ca9c503bcb0
--- /dev/null
+++ b/drivers/crypto/hisilicon/Kconfig
@@ -0,0 +1,14 @@
+# SPDX-License-Identifier: GPL-2.0
+
+config CRYPTO_DEV_HISI_SEC
+   tristate "Support for Hisilicon SEC crypto block cipher accelerator"
+   select CRYPTO_BLKCIPHER
+   select CRYPTO_ALGAPI
+   select SG_SPLIT
+   depends on ARM64 || COMPILE_TEST
+   depends on HAS_IOMEM
+   help
+ Support for Hisilicon SEC Engine in Hip06 and Hip07
+
+ To compile this as a module, choose M here: the module
+ will be called hisi_sec.
diff --git a/drivers/crypto/hisilicon/Makefile 
b/drivers/crypto/hisilicon/Makefile
new file mode 100644
index ..463f46ace182
--- /dev/null
+++ b/drivers/crypto/hisilicon/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_CRYPTO_DEV_HISI_SEC) += sec/
diff --git a/drivers/crypto/hisilicon/sec/Makefile 
b/drivers/crypto/hisilicon/sec/Makefile
new file mode 100644
index ..a55b698e0c27
--- /dev/null
+++ b/drivers/crypto/hisilicon/sec/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_CRYPTO_DEV_HISI_SEC) += hisi_sec.o
+hisi_sec-y = sec_algs.o sec_drv.o
diff --git a/drivers/crypto/hisilicon/sec/sec_algs.c 
b/drivers/crypto/hisilicon/sec/sec_algs.c
new file mode 100644
index ..d69d3ce358b0
--- /dev/null
+++ b/drivers/crypto/hisilicon/sec/sec_algs.c
@@ -0,0 +1,1122 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2016-2017 Hisilicon Limited. */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "sec_drv.h"
+
+#define SEC_MAX_CIPHER_KEY 64
+#define SEC_REQ_LIMIT SZ_32M
+
+struct sec_c_alg_cfg {
+   unsigned c_alg  : 3;
+   unsigned c_mode : 3;
+   unsigned key_len: 2;
+   unsigned c_width: 2;
+};
+
+static const struct sec_c_alg_cfg sec_c_alg_cfgs[] =  {
+   [SEC_C_DES_ECB_64] = {
+   .c_alg = SEC_C_ALG_DES,
+   .c_mode = SEC_C_MODE_ECB,
+   .key_len = SEC_KEY_LEN_DES,
+   },
+   [SEC_C_DES_CBC_64] = {
+   .c_alg = SEC_C_ALG_DES,
+   .c_mode = SEC_C_MODE_CBC,
+   .key_len = SEC_KEY_LEN_DES,
+   },
+   [SEC_C_3DES_ECB_192_3KEY] = {
+   .c_alg = SEC_C_ALG_3DES,
+   .c_mode = SEC_C_MODE_ECB,
+   .key_len = SEC_KEY_LEN_3DES_3_KEY,
+   },
+   [SEC_C_3DES_ECB_192_2KEY] = {
+   .c_alg = SEC_C_ALG_3DES,
+   .c_mode = SEC_C_MODE_ECB,
+   .key_len = SEC_KEY_LEN_3DES_2_KEY,
+   },
+   [SEC_C_3DES_CBC_192_3KEY] = {
+   .c_alg = SEC_C_ALG_3DES,
+   .c_mode = SEC_C_MODE_CBC,
+   .key_len = SEC_KEY_LEN_3DES_3_KEY,
+   },
+   [SEC_C_3DES_CBC_192_2KEY] = {
+   .c_alg = SEC_C_ALG_3DES,
+   .c_mode = SEC_C_MODE_CBC,
+   .key_len = SEC_KEY_LEN_3DES_2_KEY,
+   },
+   [SEC_C_AES_ECB_128] = {
+   .c_alg = SEC_C_ALG_AES,
+   .c_mode = SEC_C_MODE

[PATCH V2 0/3] Hisilicon SEC crypto driver (hip06 / hip07)

2018-07-23 Thread Jonathan Cameron

The driver provides in kernel support for the Hisilicon SEC accelerator
found in the hip06 and hip07 SoCs.  There are 4 such units on the D05
board for which an appropriate DT binding has been provided.  ACPI also
works with an appropriate UEFI build.

The hardware does not update the IV in chaining or counting modes.
This is done in the drive ron completion of the cipher operation.

The driver support AES, DES and 3DES block ciphers in a range of
modes (others to follow). Hash and AAED support to follow.

Sorry for the delay on this one, other priorities and all that...

Changes since V1.
1) DT binding fixes suggested by Rob Herring in patches 1 and 3.
2) Added XTS key check as suggested by Stephan Muller.
3) A trivial use after free found during testing of the above.

Changes since RFC.
1) Addition of backlog queuing as needed to support dm-crypt usecases.
2) iommu presence tests now done as Robin Murphy suggested.
3) Hardware limiation to 32MB requests worked aroud in driver so it will
   now support very large requests (512*32MB).  Larger request handling
   than this would require a longer queue with the associate overheads and
   is considered unlikely to be necessary.
4) The specific handling related to the inline IV patch set from Stephan
   has been dropped for now.
5) Interrupt handler was previous more complex than necessary so has been
   reworked.
6) Use of the bounce buffer for small packeets is dropped for now.  This is a
   performance optimization that made the code harder to review and can be
   reintroduced as necessary at a later date.
7) Restructuring of some code to simplify hash and aaed (hash implemented
   but not ready fo upstream at this time)
8) Various minor fixes and reworks of the code
   * several off by one errors in the cleanup paths
   * single template for enc and dec
   * drop dec_key as not used (enc_key was used in both cases)
   * drop dma pool for IVs as it breaks chaining.
   * lots of spinlocks changed to mutexes as not taken in atomic context.
   * nasty memory leak cleaned up.

Jonathan Cameron (3):
  dt-bindings: Add bindings for Hisilicon SEC crypto accelerators.
  crypto: hisilicon SEC security accelerator driver
  arm64: dts: hisi: add SEC crypto accelerator nodes for hip07 SoC

 .../bindings/crypto/hisilicon,hip07-sec.txt|   67 +
 arch/arm64/boot/dts/hisilicon/hip07.dtsi   |  284 +
 drivers/crypto/Kconfig |2 +
 drivers/crypto/Makefile|1 +
 drivers/crypto/hisilicon/Kconfig   |   14 +
 drivers/crypto/hisilicon/Makefile  |2 +
 drivers/crypto/hisilicon/sec/Makefile  |3 +
 drivers/crypto/hisilicon/sec/sec_algs.c| 1122 +
 drivers/crypto/hisilicon/sec/sec_drv.c | 1323 
 drivers/crypto/hisilicon/sec/sec_drv.h |  428 +++
 10 files changed, 3246 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt
 create mode 100644 drivers/crypto/hisilicon/Kconfig
 create mode 100644 drivers/crypto/hisilicon/Makefile
 create mode 100644 drivers/crypto/hisilicon/sec/Makefile
 create mode 100644 drivers/crypto/hisilicon/sec/sec_algs.c
 create mode 100644 drivers/crypto/hisilicon/sec/sec_drv.c
 create mode 100644 drivers/crypto/hisilicon/sec/sec_drv.h

-- 
2.16.2

[PATCH 3/3] arm64: dts: hisi: add SEC crypto accelerator nodes for hip07 SoC

2018-07-23 Thread Jonathan Cameron

Enable all 4 SEC units available on d05 boards.

Signed-off-by: Jonathan Cameron 
---
 arch/arm64/boot/dts/hisilicon/hip07.dtsi | 284 +++
 1 file changed, 284 insertions(+)

diff --git a/arch/arm64/boot/dts/hisilicon/hip07.dtsi 
b/arch/arm64/boot/dts/hisilicon/hip07.dtsi
index 0600a6a84ab7..6d046f4f7019 100644
--- a/arch/arm64/boot/dts/hisilicon/hip07.dtsi
+++ b/arch/arm64/boot/dts/hisilicon/hip07.dtsi
@@ -1049,7 +1049,74 @@
num-pins = <2>;
};
};
+   p0_mbigen_alg_a:interrupt-controller@d008 {
+   compatible = "hisilicon,mbigen-v2";
+   reg = <0x0 0xd008 0x0 0x1>;
 
+   p0_mbigen_sec_a: intc_sec {
+   msi-parent = <_its_dsa_a 0x40400>;
+   interrupt-controller;
+   #interrupt-cells = <2>;
+   num-pins = <33>;
+   };
+   p0_mbigen_smmu_alg_a: intc_smmu_alg {
+   msi-parent = <_its_dsa_a 0x40b1b>;
+   interrupt-controller;
+   #interrupt-cells = <2>;
+   num-pins = <3>;
+   };
+   };
+   p0_mbigen_alg_b:interrupt-controller@8,d008 {
+   compatible = "hisilicon,mbigen-v2";
+   reg = <0x8 0xd008 0x0 0x1>;
+
+   p0_mbigen_sec_b: intc_sec {
+   msi-parent = <_its_dsa_b 0x42400>;
+   interrupt-controller;
+   #interrupt-cells = <2>;
+   num-pins = <33>;
+   };
+   p0_mbigen_smmu_alg_b: intc_smmu_alg {
+   msi-parent = <_its_dsa_b 0x42b1b>;
+   interrupt-controller;
+   #interrupt-cells = <2>;
+   num-pins = <3>;
+   };
+   };
+   p1_mbigen_alg_a:interrupt-controller@400,d008 {
+   compatible = "hisilicon,mbigen-v2";
+   reg = <0x400 0xd008 0x0 0x1>;
+
+   p1_mbigen_sec_a: intc_sec {
+   msi-parent = <_its_dsa_a 0x44400>;
+   interrupt-controller;
+   #interrupt-cells = <2>;
+   num-pins = <33>;
+   };
+   p1_mbigen_smmu_alg_a: intc_smmu_alg {
+   msi-parent = <_its_dsa_a 0x44b1b>;
+   interrupt-controller;
+   #interrupt-cells = <2>;
+   num-pins = <3>;
+   };
+   };
+   p1_mbigen_alg_b:interrupt-controller@408,d008 {
+   compatible = "hisilicon,mbigen-v2";
+   reg = <0x408 0xd008 0x0 0x1>;
+
+   p1_mbigen_sec_b: intc_sec {
+   msi-parent = <_its_dsa_b 0x46400>;
+   interrupt-controller;
+   #interrupt-cells = <2>;
+   num-pins = <33>;
+   };
+   p1_mbigen_smmu_alg_b: intc_smmu_alg {
+   msi-parent = <_its_dsa_b 0x46b1b>;
+   interrupt-controller;
+   #interrupt-cells = <2>;
+   num-pins = <3>;
+   };
+   };
p0_mbigen_dsa_a: interrupt-controller@c008 {
compatible = "hisilicon,mbigen-v2";
reg = <0x0 0xc008 0x0 0x1>;
@@ -1107,6 +1174,58 @@
hisilicon,broken-prefetch-cmd;
status = "disabled";
};
+   p0_smmu_alg_a: smmu_alg@d004 {
+   compatible = "arm,smmu-v3";
+   reg = <0x0 0xd004 0x0 0x2>;
+   interrupt-parent = <_mbigen_smmu_alg_a>;
+   interrupts = <733 1>,
+   <734 1>,
+   <735 1>;
+   interrupt-names = "eventq", "gerror", "priq";
+   #iommu-cells = <1>;
+   dma-coherent;
+   hisilicon,broken-prefetch-cmd;
+   /* smmu-cb-memtype = <0x0 0x1>;*/
+   };
+   p0_smmu_alg_b: smmu_alg@8,d004 {
+   compatible = "arm,smmu-v3";
+   reg = <0x8 0xd004 0x0 0x2>;
+   interrupt-parent = <_mbigen_smmu_alg_b>;
+   interrupts = <733 1>,
+   <734 1>,
+   <735 1>;
+   interrupt-names = "eventq", "gerror", "priq";
+   #iommu-cells = <1>;
+   dma-coherent;
+   hisilicon,broken-prefetch-cmd;
+

[PATCH 1/3] dt-bindings: Add bindings for Hisilicon SEC crypto accelerators.

2018-07-23 Thread Jonathan Cameron

The hip06 and hip07 SoCs contain a number of these crypto units which
accelerate AES and DES operations.

Signed-off-by: Jonathan Cameron 
---
 .../bindings/crypto/hisilicon,hip07-sec.txt| 67 ++
 1 file changed, 67 insertions(+)

diff --git a/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt 
b/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt
new file mode 100644
index ..78d2db9d4de5
--- /dev/null
+++ b/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt
@@ -0,0 +1,67 @@
+* Hisilicon hip07 Security Accelerator (SEC)
+
+Required properties:
+- compatible: Must contain one of
+  - "hisilicon,hip06-sec"
+  - "hisilicon,hip07-sec"
+- reg: Memory addresses and lengths of the memory regions through which
+  this device is controlled.
+  Region 0 has registers to control the backend processing engines.
+  Region 1 has registers for functionality common to all queues.
+  Regions 2-18 have registers for the 16 individual queues which are isolated
+  both in hardware and within the driver.
+- interrupts: Interrupt specifiers.
+  Refer to interrupt-controller/interrupts.txt for generic interrupt client 
node
+  bindings.
+  Interrupt 0 is for the SEC unit error queue.
+  Interrupt 2N + 1 is the completion interrupt for queue N.
+  Interrupt 2N + 2 is the error interrupt for queue N.
+- dma-coherent:  The driver assumes coherent dma is possible.
+
+Optional properties:
+- iommus: The SEC units are behind smmu-v3 iommus.
+  Refer to iommu/arm,smmu-v3.txt for more information.
+
+Example:
+
+p1_sec_a: crypto@400,d200 {
+   compatible = "hisilicon,hip07-sec";
+   reg = <0x400 0xd000 0x0 0x1
+  0x400 0xd200 0x0 0x1
+  0x400 0xd201 0x0 0x1
+  0x400 0xd202 0x0 0x1
+  0x400 0xd203 0x0 0x1
+  0x400 0xd204 0x0 0x1
+  0x400 0xd205 0x0 0x1
+  0x400 0xd206 0x0 0x1
+  0x400 0xd207 0x0 0x1
+  0x400 0xd208 0x0 0x1
+  0x400 0xd209 0x0 0x1
+  0x400 0xd20a 0x0 0x1
+  0x400 0xd20b 0x0 0x1
+  0x400 0xd20c 0x0 0x1
+  0x400 0xd20d 0x0 0x1
+  0x400 0xd20e 0x0 0x1
+  0x400 0xd20f 0x0 0x1
+  0x400 0xd210 0x0 0x1>;
+   interrupt-parent = <_mbigen_sec_a>;
+   iommus = <_smmu_alg_a 0x600>;
+   dma-coherent;
+   interrupts = <576 4>,
+<577 1>, <578 4>,
+<579 1>, <580 4>,
+<581 1>, <582 4>,
+<583 1>, <584 4>,
+<585 1>, <586 4>,
+<587 1>, <588 4>,
+<589 1>, <590 4>,
+<591 1>, <592 4>,
+<593 1>, <594 4>,
+<595 1>, <596 4>,
+<597 1>, <598 4>,
+<599 1>, <600 4>,
+<601 1>, <602 4>,
+<603 1>, <604 4>,
+<605 1>, <606 4>,
+<607 1>, <608 4>;
+};
-- 
2.16.2

Re: [PATCH 2/3] crypto: hisilicon SEC security accelerator driver

2018-07-23 Thread Jonathan Cameron

On Fri, 20 Jul 2018 20:17:22 +0200
Stephan Müller  wrote:

> Am Montag, 16. Juli 2018, 12:43:41 CEST schrieb Jonathan Cameron:
> 
> Hi Jonathan,
> 
> > +static int sec_alg_skcipher_setkey_aes_xts(struct crypto_skcipher *tfm,
> > +  const u8 *key, unsigned int
> > keylen) +{
> > +   enum sec_cipher_alg alg;
> > +
> > +   switch (keylen) {
> > +   case AES_KEYSIZE_128 * 2:
> > +   alg = SEC_C_AES_XTS_128;
> > +   break;
> > +   case AES_KEYSIZE_256 * 2:
> > +   alg = SEC_C_AES_XTS_256;
> > +   break;
> > +   default:
> > +   return -EINVAL;
> > +   }
> > +
> > +   return sec_alg_skcipher_setkey(tfm, key, keylen, alg);
> > +}  
> 
> Can you please call the function xts_check_key or xts_verify_key before 
> setting the key?
> 
Will do.

Thanks,

Jonathan

Re: [PATCH 1/3] dt-bindings: Add bindings for Hisilicon SEC crypto accelerators.

2018-07-20 Thread Jonathan Cameron

On Fri, 20 Jul 2018 10:30:10 -0600
Rob Herring  wrote:

> On Mon, Jul 16, 2018 at 11:43:40AM +0100, Jonathan Cameron wrote:
> > The hip06 and hip07 SoCs contain a number of these crypto units which
> > accelerate AES and DES operations.
> > 
> > Signed-off-by: Jonathan Cameron 
> > ---
> >  .../bindings/crypto/hisilicon,hip07-sec.txt| 69 
> > ++
> >  1 file changed, 69 insertions(+)
> > 
> > diff --git 
> > a/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt 
> > b/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt
> > new file mode 100644
> > index ..00b838706c98
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt
> > @@ -0,0 +1,69 @@
> > +* Hisilicon hip07 Security Accelerator (SEC)
> > +
> > +Required properties:
> > +- compatible: Must contain one of
> > +  - "hisilicon,hip06-sec"
> > +  - "hisilicon,hip07-sec"
> > +- reg: Memory addresses and lengths of the memory regions used by the 
> > driver.  
> 
> You know my feelings about the word "driver" in bindings. :)

Oops. Will fix.  Oddly rare I actually write one of these docs so making newbie
mistakes :(

> 
> > +  Region 0 has registers to control the backend processing engines.
> > +  Region 1 has registers for functionality common to all queues.
> > +  Regions 2-18 have registers for the individual queues which are isolated
> > +  both in hardware and within the driver.  
> 
> It's always 16 queues? Might state that somewhere.

Technically complex because some of them may be in use by the secure world
and hence not available and not seen in the DT.
Right now the driver doesn't support that, and it's not happening on any
existing platforms but I was trying to avoid stating there were definitely
16 available. I guess if that happens I'll fix the binding when they do it.
(moderately unlikely to happen now given age of this platform, but you never
know).

> 
> > +- interrupts: Interrupt specifiers.
> > +  Refer to interrupt-controller/interrupts.txt for generic interrupt 
> > client node
> > +  bindings.
> > +  Interrupt 0 is for the SEC unit error queue.
> > +  Interrupt 2N + 1 is the completion interrupt for queue N.
> > +  Interrupt 2N + 2 is the error interrupt for queue N.
> > +- dma-coherent:  The driver assumes coherent dma is possible.
> > +
> > +Optional properties:
> > +- iommus: The SEC units are behind smmu-v3 iommus.
> > +  Refer to iommu/arm,smmu-v3.txt for more information.
> > +
> > +Example:
> > +Second socket, first unit chosen to illustrate need for 64 bit addresses.  
> 
> I don't follow the 64-bit address comment.

Without it the address is in the 32bit range so internal review raised
the question of why we needed to provide 64 bit registers as 32 bit ones
are more compact.

reg = <0xd00 0x1

etc

> 
> > +
> > +p1_sec_a: sec@d200 {  
> 
> crypto@...
> 
> The unit-address should be from the 1st reg entry and why isn't the 
> 0x400 part included?

Will fix here an obviously in the DT patch itself.

> 
> > +   compatible = "hisilicon,hip07-sec";
> > +   #address-cells = <2>;
> > +   #size-cells = <2>;  
> 
> These aren't needed here as there are no child nodes.
Oops I always forget those.
> 
> > +   reg = <0x400 0xd000 0x0 0x1
> > +  0x400 0xd200 0x0 0x1
> > +  0x400 0xd201 0x0 0x1
> > +  0x400 0xd202 0x0 0x1
> > +  0x400 0xd203 0x0 0x1
> > +  0x400 0xd204 0x0 0x1
> > +  0x400 0xd205 0x0 0x1
> > +  0x400 0xd206 0x0 0x1
> > +  0x400 0xd207 0x0 0x1
> > +  0x400 0xd208 0x0 0x1
> > +  0x400 0xd209 0x0 0x1
> > +  0x400 0xd20a 0x0 0x1
> > +  0x400 0xd20b 0x0 0x1
> > +  0x400 0xd20c 0x0 0x1
> > +  0x400 0xd20d 0x0 0x1
> > +  0x400 0xd20e 0x0 0x1
> > +  0x400 0xd20f 0x0 0x1
> > +  0x400 0xd210 0x0 0x1>;
> > +   interrupt-parent = <_mbigen_sec_a>;
> > +   iommus = <_smmu_alg_a 0x600>;
> > +   dma-coherent;
> > +   interrupts = <576 4>,
> > +<577 1>,<578 4>,  
> 
> space needed after the comma.
> 
> > +<579 1>,<580 4>,
> > +<581 1>,<582 4>,
> > +<583 1>,<584 4>,
> > +<585 1>,<586 4>,
> > +<587 1>,<588 4>,
> > +<589 1>,<590 4>,
> > +<591 1>,<592 4>,
> > +<593 1>,<594 4>,
> > +<595 1>,<596 4>,
> > +<597 1>,<598 4>,
> > +<599 1>,<600 4>,
> > +<601 1>,<602 4>,
> > +<603 1>,<604 4>,
> > +<605 1>,<606 4>,
> > +<607 1>,<608 4>;
> > +};
> > -- 
> > 2.16.2

Thanks Rob.

Jonathan
> > 
> >

[PATCH 2/3] crypto: hisilicon SEC security accelerator driver

2018-07-16 Thread Jonathan Cameron

This accelerator is found inside hisilicon hip06 and hip07 SoCs.
Each instance provides a number of queues which feed a different number of
backend acceleration units.

The queues are operating in an out of order mode in the interests of
throughput. The silicon does not do tracking of dependencies between
multiple 'messages' or update of the IVs as appropriate for training.
Hence where relevant we need to do this in software.

Signed-off-by: Jonathan Cameron 
---
 drivers/crypto/Kconfig  |2 +
 drivers/crypto/Makefile |1 +
 drivers/crypto/hisilicon/Kconfig|   14 +
 drivers/crypto/hisilicon/Makefile   |2 +
 drivers/crypto/hisilicon/sec/Makefile   |3 +
 drivers/crypto/hisilicon/sec/sec_algs.c | 1116 ++
 drivers/crypto/hisilicon/sec/sec_drv.c  | 1323 +++
 drivers/crypto/hisilicon/sec/sec_drv.h  |  428 ++
 8 files changed, 2889 insertions(+)

diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index d1ea1a07cecb..d0b80d0d1f8b 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -750,4 +750,6 @@ config CRYPTO_DEV_CCREE
  cryptographic operations on the system REE.
  If unsure say Y.
 
+source "drivers/crypto/hisilicon/Kconfig"
+
 endif # CRYPTO_HW
diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
index 7ae87b4f6c8d..ee43aed8cb69 100644
--- a/drivers/crypto/Makefile
+++ b/drivers/crypto/Makefile
@@ -45,3 +45,4 @@ obj-$(CONFIG_CRYPTO_DEV_VMX) += vmx/
 obj-$(CONFIG_CRYPTO_DEV_BCM_SPU) += bcm/
 obj-$(CONFIG_CRYPTO_DEV_SAFEXCEL) += inside-secure/
 obj-$(CONFIG_CRYPTO_DEV_ARTPEC6) += axis/
+obj-y += hisilicon/
diff --git a/drivers/crypto/hisilicon/Kconfig b/drivers/crypto/hisilicon/Kconfig
new file mode 100644
index ..8ca9c503bcb0
--- /dev/null
+++ b/drivers/crypto/hisilicon/Kconfig
@@ -0,0 +1,14 @@
+# SPDX-License-Identifier: GPL-2.0
+
+config CRYPTO_DEV_HISI_SEC
+   tristate "Support for Hisilicon SEC crypto block cipher accelerator"
+   select CRYPTO_BLKCIPHER
+   select CRYPTO_ALGAPI
+   select SG_SPLIT
+   depends on ARM64 || COMPILE_TEST
+   depends on HAS_IOMEM
+   help
+ Support for Hisilicon SEC Engine in Hip06 and Hip07
+
+ To compile this as a module, choose M here: the module
+ will be called hisi_sec.
diff --git a/drivers/crypto/hisilicon/Makefile 
b/drivers/crypto/hisilicon/Makefile
new file mode 100644
index ..463f46ace182
--- /dev/null
+++ b/drivers/crypto/hisilicon/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_CRYPTO_DEV_HISI_SEC) += sec/
diff --git a/drivers/crypto/hisilicon/sec/Makefile 
b/drivers/crypto/hisilicon/sec/Makefile
new file mode 100644
index ..a55b698e0c27
--- /dev/null
+++ b/drivers/crypto/hisilicon/sec/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_CRYPTO_DEV_HISI_SEC) += hisi_sec.o
+hisi_sec-y = sec_algs.o sec_drv.o
diff --git a/drivers/crypto/hisilicon/sec/sec_algs.c 
b/drivers/crypto/hisilicon/sec/sec_algs.c
new file mode 100644
index ..a9401e98d2e2
--- /dev/null
+++ b/drivers/crypto/hisilicon/sec/sec_algs.c
@@ -0,0 +1,1116 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2016-2017 Hisilicon Limited. */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "sec_drv.h"
+
+#define SEC_MAX_CIPHER_KEY 64
+#define SEC_REQ_LIMIT SZ_32M
+
+struct sec_c_alg_cfg {
+   unsigned c_alg  : 3;
+   unsigned c_mode : 3;
+   unsigned key_len: 2;
+   unsigned c_width: 2;
+};
+
+static const struct sec_c_alg_cfg sec_c_alg_cfgs[] =  {
+   [SEC_C_DES_ECB_64] = {
+   .c_alg = SEC_C_ALG_DES,
+   .c_mode = SEC_C_MODE_ECB,
+   .key_len = SEC_KEY_LEN_DES,
+   },
+   [SEC_C_DES_CBC_64] = {
+   .c_alg = SEC_C_ALG_DES,
+   .c_mode = SEC_C_MODE_CBC,
+   .key_len = SEC_KEY_LEN_DES,
+   },
+   [SEC_C_3DES_ECB_192_3KEY] = {
+   .c_alg = SEC_C_ALG_3DES,
+   .c_mode = SEC_C_MODE_ECB,
+   .key_len = SEC_KEY_LEN_3DES_3_KEY,
+   },
+   [SEC_C_3DES_ECB_192_2KEY] = {
+   .c_alg = SEC_C_ALG_3DES,
+   .c_mode = SEC_C_MODE_ECB,
+   .key_len = SEC_KEY_LEN_3DES_2_KEY,
+   },
+   [SEC_C_3DES_CBC_192_3KEY] = {
+   .c_alg = SEC_C_ALG_3DES,
+   .c_mode = SEC_C_MODE_CBC,
+   .key_len = SEC_KEY_LEN_3DES_3_KEY,
+   },
+   [SEC_C_3DES_CBC_192_2KEY] = {
+   .c_alg = SEC_C_ALG_3DES,
+   .c_mode = SEC_C_MODE_CBC,
+   .key_len = SEC_KEY_LEN_3DES_2_KEY,
+   },
+   [SEC_C_AES_ECB_128] = {
+   .c_alg = SEC_C_ALG_AES,
+   .c_mode = SEC_C_MODE

[PATCH 0/3] Hisilicon SEC crypto driver (hip06 / hip07)

2018-07-16 Thread Jonathan Cameron

The driver provides in kernel support for hte Hisilicon SEC accelerator
found in the hip06 and hip07 SoCs.  There are 4 such units on the D05
board for which an appropriate DT binding has been provided.  ACPI also
works with an appropriate UEFI build.

The hardware does not update the IV in chaining or counting modes.
This is done in the drive ron completion of the cipher operation.

The driver support AES, DES and 3DES block ciphers in a range of
modes (others to follow). Hash and AAED support to follow.

Sorry for the delay on this one, other priorities and all that...

Changes since RFC.
1) Addition of backlog queuing as needed to support dm-crypt usecases.
2) iommu presence tests now done as Robin Murphy suggested.
3) Hardware limiation to 32MB requests worked aroud in driver so it will
   now support very large requests (512*32MB).  Larger request handling
   than this would require a longer queue with the associate overheads and
   is considered unlikely to be necessary.
4) The specific handling related to the inline IV patch set from Stephan
   has been dropped for now.
5) Interrupt handler was previous more complex than necessary so has been
   reworked.
6) Use of the bounce buffer for small packeets is dropped for now.  This is a
   performance optimization that made the code harder to review and can be
   reintroduced as necessary at a later date.
7) Restructuring of some code to simplify hash and aaed (hash implemented
   but not ready fo upstream at this time)
8) Various minor fixes and reworks of the code
   * several off by one errors in the cleanup paths
   * single template for enc and dec
   * drop dec_key as not used (enc_key was used in both cases)
   * drop dma pool for IVs as it breaks chaining.
   * lots of spinlocks changed to mutexes as not taken in atomic context.
   * nasty memory leak cleaned up.

Jonathan Cameron (3):
  dt-bindings: Add bindings for Hisilicon SEC crypto accelerators.
  crypto: hisilicon SEC security accelerator driver
  arm64: dts: hisi: add SEC crypto accelerator nodes for hip07 SoC

 .../bindings/crypto/hisilicon,hip07-sec.txt|   69 +
 arch/arm64/boot/dts/hisilicon/hip07.dtsi   |  285 +
 drivers/crypto/Kconfig |2 +
 drivers/crypto/Makefile|1 +
 drivers/crypto/hisilicon/Kconfig   |   14 +
 drivers/crypto/hisilicon/Makefile  |2 +
 drivers/crypto/hisilicon/sec/Makefile  |3 +
 drivers/crypto/hisilicon/sec/sec_algs.c| 1116 +
 drivers/crypto/hisilicon/sec/sec_drv.c | 1323 
 drivers/crypto/hisilicon/sec/sec_drv.h |  428 +++
 10 files changed, 3243 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt
 create mode 100644 drivers/crypto/hisilicon/Kconfig
 create mode 100644 drivers/crypto/hisilicon/Makefile
 create mode 100644 drivers/crypto/hisilicon/sec/Makefile
 create mode 100644 drivers/crypto/hisilicon/sec/sec_algs.c
 create mode 100644 drivers/crypto/hisilicon/sec/sec_drv.c
 create mode 100644 drivers/crypto/hisilicon/sec/sec_drv.h

-- 
2.16.2

[PATCH 1/3] dt-bindings: Add bindings for Hisilicon SEC crypto accelerators.

2018-07-16 Thread Jonathan Cameron

The hip06 and hip07 SoCs contain a number of these crypto units which
accelerate AES and DES operations.

Signed-off-by: Jonathan Cameron 
---
 .../bindings/crypto/hisilicon,hip07-sec.txt| 69 ++
 1 file changed, 69 insertions(+)

diff --git a/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt 
b/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt
new file mode 100644
index ..00b838706c98
--- /dev/null
+++ b/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt
@@ -0,0 +1,69 @@
+* Hisilicon hip07 Security Accelerator (SEC)
+
+Required properties:
+- compatible: Must contain one of
+  - "hisilicon,hip06-sec"
+  - "hisilicon,hip07-sec"
+- reg: Memory addresses and lengths of the memory regions used by the driver.
+  Region 0 has registers to control the backend processing engines.
+  Region 1 has registers for functionality common to all queues.
+  Regions 2-18 have registers for the individual queues which are isolated
+  both in hardware and within the driver.
+- interrupts: Interrupt specifiers.
+  Refer to interrupt-controller/interrupts.txt for generic interrupt client 
node
+  bindings.
+  Interrupt 0 is for the SEC unit error queue.
+  Interrupt 2N + 1 is the completion interrupt for queue N.
+  Interrupt 2N + 2 is the error interrupt for queue N.
+- dma-coherent:  The driver assumes coherent dma is possible.
+
+Optional properties:
+- iommus: The SEC units are behind smmu-v3 iommus.
+  Refer to iommu/arm,smmu-v3.txt for more information.
+
+Example:
+Second socket, first unit chosen to illustrate need for 64 bit addresses.
+
+p1_sec_a: sec@d200 {
+   compatible = "hisilicon,hip07-sec";
+   #address-cells = <2>;
+   #size-cells = <2>;
+   reg = <0x400 0xd000 0x0 0x1
+  0x400 0xd200 0x0 0x1
+  0x400 0xd201 0x0 0x1
+  0x400 0xd202 0x0 0x1
+  0x400 0xd203 0x0 0x1
+  0x400 0xd204 0x0 0x1
+  0x400 0xd205 0x0 0x1
+  0x400 0xd206 0x0 0x1
+  0x400 0xd207 0x0 0x1
+  0x400 0xd208 0x0 0x1
+  0x400 0xd209 0x0 0x1
+  0x400 0xd20a 0x0 0x1
+  0x400 0xd20b 0x0 0x1
+  0x400 0xd20c 0x0 0x1
+  0x400 0xd20d 0x0 0x1
+  0x400 0xd20e 0x0 0x1
+  0x400 0xd20f 0x0 0x1
+  0x400 0xd210 0x0 0x1>;
+   interrupt-parent = <_mbigen_sec_a>;
+   iommus = <_smmu_alg_a 0x600>;
+   dma-coherent;
+   interrupts = <576 4>,
+<577 1>,<578 4>,
+<579 1>,<580 4>,
+<581 1>,<582 4>,
+<583 1>,<584 4>,
+<585 1>,<586 4>,
+<587 1>,<588 4>,
+<589 1>,<590 4>,
+<591 1>,<592 4>,
+<593 1>,<594 4>,
+<595 1>,<596 4>,
+<597 1>,<598 4>,
+<599 1>,<600 4>,
+<601 1>,<602 4>,
+<603 1>,<604 4>,
+<605 1>,<606 4>,
+<607 1>,<608 4>;
+};
-- 
2.16.2

[PATCH 3/3] arm64: dts: hisi: add SEC crypto accelerator nodes for hip07 SoC

2018-07-16 Thread Jonathan Cameron

Enable all 4 SEC units available on d05 boards.

Signed-off-by: Jonathan Cameron 
---
 arch/arm64/boot/dts/hisilicon/hip07.dtsi | 285 +++
 1 file changed, 285 insertions(+)

diff --git a/arch/arm64/boot/dts/hisilicon/hip07.dtsi 
b/arch/arm64/boot/dts/hisilicon/hip07.dtsi
index 0600a6a84ab7..a29c4466af26 100644
--- a/arch/arm64/boot/dts/hisilicon/hip07.dtsi
+++ b/arch/arm64/boot/dts/hisilicon/hip07.dtsi
@@ -1049,7 +1049,74 @@
num-pins = <2>;
};
};
+   p0_mbigen_alg_a:interrupt-controller@d008 {
+   compatible = "hisilicon,mbigen-v2";
+   reg = <0x0 0xd008 0x0 0x1>;
 
+   p0_mbigen_sec_a: intc_sec {
+   msi-parent = <_its_dsa_a 0x40400>;
+   interrupt-controller;
+   #interrupt-cells = <2>;
+   num-pins = <33>;
+   };
+   p0_mbigen_smmu_alg_a: intc_smmu_alg {
+   msi-parent = <_its_dsa_a 0x40b1b>;
+   interrupt-controller;
+   #interrupt-cells = <2>;
+   num-pins = <3>;
+   };
+   };
+   p0_mbigen_alg_b:interrupt-controller@8,d008 {
+   compatible = "hisilicon,mbigen-v2";
+   reg = <0x8 0xd008 0x0 0x1>;
+
+   p0_mbigen_sec_b: intc_sec {
+   msi-parent = <_its_dsa_b 0x42400>;
+   interrupt-controller;
+   #interrupt-cells = <2>;
+   num-pins = <33>;
+   };
+   p0_mbigen_smmu_alg_b: intc_smmu_alg {
+   msi-parent = <_its_dsa_b 0x42b1b>;
+   interrupt-controller;
+   #interrupt-cells = <2>;
+   num-pins = <3>;
+   };
+   };
+   p1_mbigen_alg_a:interrupt-controller@400,d008 {
+   compatible = "hisilicon,mbigen-v2";
+   reg = <0x400 0xd008 0x0 0x1>;
+
+   p1_mbigen_sec_a: intc_sec {
+   msi-parent = <_its_dsa_a 0x44400>;
+   interrupt-controller;
+   #interrupt-cells = <2>;
+   num-pins = <33>;
+   };
+   p1_mbigen_smmu_alg_a: intc_smmu_alg {
+   msi-parent = <_its_dsa_a 0x44b1b>;
+   interrupt-controller;
+   #interrupt-cells = <2>;
+   num-pins = <3>;
+   };
+   };
+   p1_mbigen_alg_b:interrupt-controller@408,d008 {
+   compatible = "hisilicon,mbigen-v2";
+   reg = <0x408 0xd008 0x0 0x1>;
+
+   p1_mbigen_sec_b: intc_sec {
+   msi-parent = <_its_dsa_b 0x46400>;
+   interrupt-controller;
+   #interrupt-cells = <2>;
+   num-pins = <33>;
+   };
+   p1_mbigen_smmu_alg_b: intc_smmu_alg {
+   msi-parent = <_its_dsa_b 0x46b1b>;
+   interrupt-controller;
+   #interrupt-cells = <2>;
+   num-pins = <3>;
+   };
+   };
p0_mbigen_dsa_a: interrupt-controller@c008 {
compatible = "hisilicon,mbigen-v2";
reg = <0x0 0xc008 0x0 0x1>;
@@ -1107,6 +1174,58 @@
hisilicon,broken-prefetch-cmd;
status = "disabled";
};
+   p0_smmu_alg_a: smmu_alg@d004 {
+   compatible = "arm,smmu-v3";
+   reg = <0x0 0xd004 0x0 0x2>;
+   interrupt-parent = <_mbigen_smmu_alg_a>;
+   interrupts = <733 1>,
+   <734 1>,
+   <735 1>;
+   interrupt-names = "eventq", "gerror", "priq";
+   #iommu-cells = <1>;
+   dma-coherent;
+   hisilicon,broken-prefetch-cmd;
+   /* smmu-cb-memtype = <0x0 0x1>;*/
+   };
+   p0_smmu_alg_b: smmu_alg@8,d004 {
+   compatible = "arm,smmu-v3";
+   reg = <0x8 0xd004 0x0 0x2>;
+   interrupt-parent = <_mbigen_smmu_alg_b>;
+   interrupts = <733 1>,
+   <734 1>,
+   <735 1>;
+   interrupt-names = "eventq", "gerror", "priq";
+   #iommu-cells = <1>;
+   dma-coherent;
+   hisilicon,broken-prefetch-cmd;
+

Re: [PATCH v3 0/4] crypto: AF_ALG AIO improvements

2018-03-02 Thread Jonathan Cameron

On Tue, 27 Feb 2018 15:08:58 +0100
Stephan Müller  wrote:

> Am Freitag, 23. Februar 2018, 13:00:26 CET schrieb Herbert Xu:
> 
> Hi Herbert,

Hi Stephan / Herbert,

> 
> > On Fri, Feb 23, 2018 at 09:33:33AM +0100, Stephan Müller wrote:  
> > > A simple copy operation, however, will imply that in one AIO recvmsg
> > > request, only *one* IOCB can be set and processed.  
> > 
> > Sure, but the recvmsg will return as soon as the crypto API encrypt
> > or decrypt function returns.  It's still fully async.  It's just
> > that the setup part needs to be done with sendmsg/recvmsg.  
> 
> Wouldn't a copy of the ctx->iv into a per-request buffer change the behavoir 
> of the AF_ALG interface significantly?
> 
> Today, if multiple IOCBs are submitted, most cipher implementations would 
> serialize the requests (e.g. all implementations that behave synchronous in 
> nature like all software implementations).
> 
> Thus, when copying the ctx->iv into a separate per-request buffer, suddenly 
> all block-chained cipher operations are not block chained any more.

Agreed - specific handling would be needed to ensure the IV is written to
each copy to maintain the chain.  Not nice at all.

> > 
> > Even if we wanted to do what you stated, just inlining the IV isn't
> > enough.  You'd also need to inline the assoclen, and probably the
> > optype in case you want to mix encrypt/decrypt too.  
> 
> Maybe that is what we have to do.

The one element I could do with more clarity on here is use cases
as it feels like the discussion is a little unfocused (helps with
performance runs, but is it really useful?)

When do we want to have separate IVs per request but a shared key?

I think this is relevant for ctr modes in particular where userspace
can provide the relevant ctrs but the key is shared.
Storage encryption modes such as XTS can also benefit.

My own knowledge is too abstract to give good answers to these.

> > 
> > However, I must say that I don't see the point of going all the way
> > to support such a bulk submission interface (e.g., lio_listio).  
> 
> IMHO, the point is that AF_ALG is the only interface to allow userspace to 
> utilize hardware crypto implementations. For example, on a small chip with 
> hardware crypto support, your user space code can offload crypto to that 
> hardware to free CPU time.
> 
> How else would somebody access its crypto accelerators?

This is also useful at the high end where we may well be throwing this
bulk submission at a set of crypto units (hidden behind a queue) to
parallelize when possible.

Just because we have lots of cpu power doesn't mean it makes sense
to use it for crypto :)

We 'could' just do it all in userspace via vfio, but there are the usual
disadvantages in that approach in terms of generality etc.

> > 
> > Remember, the algif interface due to its inherent overhead is meant
> > for bulk data.  That is, the processing time for each request is
> > dominated by the actual processing, not the submission process.  
> 
> I see that. And for smaller chips with crypto support, this would be the case 
> IMHO. Especially if we streamline the AF_ALG overhead such that we reduce the 
> number of syscalls and user/kernel space roundtrips.

For larger devices the ability to run large numbers of requests and 'know'
that they don't need to be chained is useful (because they have separate IVs).
This allows you to let the hardware handle them in parallel (either because
the hardware handles dependency tracking, or because we have done it in the
driver.)  Applies just as well for large blocks with lower overhead.

You could do this by opening lots of separate sockets and simply providing
them all with the same key.  However, this assumes the hardware / driver can
handle very large numbers of contexts (ours can though we only implement a
subset of this functionality in the current driver to keep things simple).

If we 'fake' such support in the driver then there is inherent nastiness
around having to let the hardware queues drain before you can change the IV)
and that the overhead of operating such a pool of sockets in your program
isn't significant.  

Managing such a pool of sockets would also be a significant complexity
overhead in complexity of the user space code.

> > 
> > If you're instead processing lots of tiny requests, do NOT use
> > algif because it's not designed for that.  
> 
> The only issue in this case is that it makes the operation slower. 
> > 
> > Therefore spending too much time to optimise the submission overhead
> > seems pointless to me.
> > 
> > Cheers,  
> 
> 
> Ciao
> Stephan
> 
> 
Thanks,

Jonathan

Re: ahash - scheduling whilst atomic.

2018-02-26 Thread Jonathan Cameron

On Mon, 26 Feb 2018 14:02:33 +
Jonathan Cameron <jonathan.came...@huawei.com> wrote:

> Hi All,

Please ignore - stupid bug in my code.

Sorry for the noise.

Jonathan
> 
> I seem to get much the same whether using af_alg or tcrypt to poke
> the support for ahashes that I just added to our driver.
> 
> Taking the af_alg path as that is the one I've chased further.
> We return -EINPROGRESS (along with it seems lots of other ahash
> implementations) if we have some data in flight. The hardware is
> asynchronous and we don't currently bother to check if it is
> already done or not (may do so later as an optimization).
> 
> In hash_sendmsg (algif_hash.c) we have
> 
> err = crypto_wait_req(crypto_ahash_update(>req), >wait);
> 
> Now crypto_wait does a wait_for_completion if it gets -EINPROGRESS
> from the crypto_ahash_update call.
> 
> Given hash_sendmsg is called from a socket we can't do a wait_for_completion
> here safely.  I note for the various wait calls in the equivalent skcipher 
> code
> we drop through with an error if they would otherwise sleep.
> 
> I'm guessing that I'm missing something!  As far as I can tell the first
> few drivers I looked at caam_hash and crypto4xx do exactly the same thing
> I am doing (superficially anyway)...
> 
> Anyone have any thoughts?
> 
> Thanks,
> 
> Jonathan
> ___
> Linuxarm mailing list
> linux...@huawei.com
> http://hulk.huawei.com/mailman/listinfo/linuxarm

ahash - scheduling whilst atomic.

2018-02-26 Thread Jonathan Cameron

Hi All,

I seem to get much the same whether using af_alg or tcrypt to poke
the support for ahashes that I just added to our driver.

Taking the af_alg path as that is the one I've chased further.
We return -EINPROGRESS (along with it seems lots of other ahash
implementations) if we have some data in flight. The hardware is
asynchronous and we don't currently bother to check if it is
already done or not (may do so later as an optimization).

In hash_sendmsg (algif_hash.c) we have

err = crypto_wait_req(crypto_ahash_update(>req), >wait);

Now crypto_wait does a wait_for_completion if it gets -EINPROGRESS
from the crypto_ahash_update call.

Given hash_sendmsg is called from a socket we can't do a wait_for_completion
here safely.  I note for the various wait calls in the equivalent skcipher code
we drop through with an error if they would otherwise sleep.

I'm guessing that I'm missing something!  As far as I can tell the first
few drivers I looked at caam_hash and crypto4xx do exactly the same thing
I am doing (superficially anyway)...

Anyone have any thoughts?

Thanks,

Jonathan

Re: [PATCH v2 4/4] crypto: add CRYPTO_TFM_REQ_PARALLEL flag

2018-02-07 Thread Jonathan Cameron

On Wed, 7 Feb 2018 16:43:10 +0100
Stephan Mueller <smuel...@chronox.de> wrote:

> Am Mittwoch, 7. Februar 2018, 16:39:11 CET schrieb Jonathan Cameron:
> 
> Hi Jonathan,
> 
> > On Wed, 7 Feb 2018 13:48:32 +0100
> > 
> > Stephan Mueller <smuel...@chronox.de> wrote:  
> > > Am Mittwoch, 7. Februar 2018, 08:44:04 CET schrieb Stephan Müller:
> > > 
> > > Hi,
> > >   
> > > > diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c
> > > > index 3970ad7f6fd0..da010405eea0 100644
> > > > --- a/crypto/algif_aead.c
> > > > +++ b/crypto/algif_aead.c
> > > > @@ -66,13 +66,22 @@ static int aead_sendmsg(struct socket *sock, struct
> > > > msghdr *msg, size_t size) {
> > > > 
> > > > struct sock *sk = sock->sk;
> > > > struct alg_sock *ask = alg_sk(sk);
> > > > 
> > > > +   struct af_alg_ctx *ctx = ask->private;
> > > > 
> > > > struct sock *psk = ask->parent;
> > > > struct alg_sock *pask = alg_sk(psk);
> > > > struct aead_tfm *aeadc = pask->private;
> > > > 
> > > > -   struct crypto_aead *tfm = aeadc->aead;
> > > > -   unsigned int ivsize = crypto_aead_ivsize(tfm);
> > > > +   struct crypto_aead *aead = aeadc->aead;
> > > > +   struct crypto_tfm *tfm = crypto_aead_tfm(aead);
> > > > +   unsigned int ivsize = crypto_aead_ivsize(aead);
> > > > +   int ret = af_alg_sendmsg(sock, msg, size, ivsize);
> > > > +
> > > > +   if (ret < 0)
> > > > +   return ret;
> > > > 
> > > > -   return af_alg_sendmsg(sock, msg, size, ivsize);
> > > > +   if (ctx->iiv == ALG_IIV_USE)  
> > > 
> > > This should be ALG_IIV_DISABLE of course.  
> > 
> > You say that, but my initial reading was that the core
> > was requesting that the driver do things in parallel
> > irrespective of whether the driver thought it was safe.
> > So I would think this was correct.
> > 
> > Definitely needs some documentation or a clearer name.  
> 
> How about:
> 
> ALG_IV_SERIAL_PROCESSING (was ALG_IIV_DISABLE)
> ALG_IV_PARALLEL_PROCESSING (was ALG_IIV_USE)
> 
Actually those were fine on the basis that inline iv is
obvious enough, it was CRYPTO_TFM_REQ_PARALLEL that
was causing me confusion.

Sorry, wasn't terribly clear on that!

Jonathan

> Ciao
> Stephan
> 
>

Re: [PATCH v2 4/4] crypto: add CRYPTO_TFM_REQ_PARALLEL flag

2018-02-07 Thread Jonathan Cameron

On Wed, 7 Feb 2018 13:48:32 +0100
Stephan Mueller  wrote:

> Am Mittwoch, 7. Februar 2018, 08:44:04 CET schrieb Stephan Müller:
> 
> Hi,
> 
> 
> > diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c
> > index 3970ad7f6fd0..da010405eea0 100644
> > --- a/crypto/algif_aead.c
> > +++ b/crypto/algif_aead.c
> > @@ -66,13 +66,22 @@ static int aead_sendmsg(struct socket *sock, struct
> > msghdr *msg, size_t size) {
> > struct sock *sk = sock->sk;
> > struct alg_sock *ask = alg_sk(sk);
> > +   struct af_alg_ctx *ctx = ask->private;
> > struct sock *psk = ask->parent;
> > struct alg_sock *pask = alg_sk(psk);
> > struct aead_tfm *aeadc = pask->private;
> > -   struct crypto_aead *tfm = aeadc->aead;
> > -   unsigned int ivsize = crypto_aead_ivsize(tfm);
> > +   struct crypto_aead *aead = aeadc->aead;
> > +   struct crypto_tfm *tfm = crypto_aead_tfm(aead);
> > +   unsigned int ivsize = crypto_aead_ivsize(aead);
> > +   int ret = af_alg_sendmsg(sock, msg, size, ivsize);
> > +
> > +   if (ret < 0)
> > +   return ret;
> > 
> > -   return af_alg_sendmsg(sock, msg, size, ivsize);
> > +   if (ctx->iiv == ALG_IIV_USE)  
> 
> This should be ALG_IIV_DISABLE of course.

You say that, but my initial reading was that the core
was requesting that the driver do things in parallel
irrespective of whether the driver thought it was safe.
So I would think this was correct.

Definitely needs some documentation or a clearer name.

> 
> > +   tfm->crt_flags |= CRYPTO_TFM_REQ_PARALLEL;
> > +
> > +   return ret;
> >  }
> > 
> >  static int crypto_aead_copy_sgl(struct crypto_skcipher *null_tfm,
> > diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
> > index aee602a3ec24..22554cf7 100644
> > --- a/crypto/algif_skcipher.c
> > +++ b/crypto/algif_skcipher.c
> > @@ -43,12 +43,21 @@ static int skcipher_sendmsg(struct socket *sock, struct
> > msghdr *msg, {
> > struct sock *sk = sock->sk;
> > struct alg_sock *ask = alg_sk(sk);
> > +   struct af_alg_ctx *ctx = ask->private;
> > struct sock *psk = ask->parent;
> > struct alg_sock *pask = alg_sk(psk);
> > -   struct crypto_skcipher *tfm = pask->private;
> > -   unsigned ivsize = crypto_skcipher_ivsize(tfm);
> > +   struct crypto_skcipher *skc = pask->private;
> > +   struct crypto_tfm *tfm = crypto_skcipher_tfm(skc);
> > +   unsigned int ivsize = crypto_skcipher_ivsize(skc);
> > +   int ret = af_alg_sendmsg(sock, msg, size, ivsize);
> > +
> > +   if (ret < 0)
> > +   return ret;
> > 
> > -   return af_alg_sendmsg(sock, msg, size, ivsize);
> > +   if (ctx->iiv == ALG_IIV_USE)  
> 
> Dto.
> 
> I will wait for some more comments before sending a new patch.
> 
> Ciao
> Stephan
> 
>

Re: [PATCH v2 0/4] crypto: AF_ALG AIO improvements

2018-02-07 Thread Jonathan Cameron

On Wed, 7 Feb 2018 08:42:59 +0100
Stephan Müller <smuel...@chronox.de> wrote:

> Hi Herbert,
> 
> Herbert, the patch 1 is meant for stable. However, this patch as is
> only applies to the new AF_ALG interface implementation. Though,
> the issue goes back to the first implementation of AIO support.
> Shall I try prepare a patch for the old AF_ALG implementation
> as well?
> 
> Changes from v1:
> 
> * integrate the inline IV and locking patch into one patch set
> 
> * reverse the order of lock context IV patch and inline IV patch --
>   the reason is to allow the first patch to be back-ported to stable
> 
> * mark the first patch (locking of the context IV) as applicable to
>   stable as there is an existing inconsistency which was demonstrated
>   by Harsh with the Chelsio driver vs the AES-NI driver
> 
> * modify the inline IV patch to have proper unlocking of the mutex
>   in case of errors
> 
> * prevent locking if no IV is defined by cipher
> 
> * add a patch to allow crypto drivers to report whether they support
>   serialization -- in this case the locking in AF_ALG shall be
>   disabled
> 
> * add a patch to inform the crypto drivers that their serialization
>   support should actually be enabled and used because AF_ALG does not
>   serialize the interdependent parallel AIO requests
> 
> * streamline the code in patch 1 and 2 slightly
> 
> I would like to ask the folks with real AIO hardware (Harsh, Jonathan)
> to test the patches. Especially, is the locking patch should be tested
> by Harsh as you have seen the issue with your hardware.
> 

I've updated my driver to take advantage of this and it is working
pretty much the same as it was with the dirty hacks I was running before.

So Tested-by: Jonathan Cameron <jonathan.came...@huawei.com> with
that minor change to sensitivity to having it set to IIV multiple times.

Hmm. Naming of the parallel request probably needs some thought though.
Will reply to that patch.


> Thanks.
> 
> Stephan Mueller (4):
>   crypto: AF_ALG AIO - lock context IV
>   crypto: AF_ALG - inline IV support
>   crypto: AF_ALG - allow driver to serialize IV access
>   crypto: add CRYPTO_TFM_REQ_PARALLEL flag
> 
>  crypto/af_alg.c | 119 
> +++-
>  crypto/algif_aead.c |  86 +---
>  crypto/algif_skcipher.c |  38 ++
>  include/crypto/if_alg.h |  37 ++
>  include/linux/crypto.h  |  16 ++
>  include/uapi/linux/if_alg.h |   6 ++-
>  6 files changed, 249 insertions(+), 53 deletions(-)
>

Re: [PATCH v2 2/4] crypto: AF_ALG - inline IV support

2018-02-07 Thread Jonathan Cameron

On Wed, 7 Feb 2018 08:43:32 +0100
Stephan Müller  wrote:

> The kernel crypto API requires the caller to set an IV in the request
> data structure. That request data structure shall define one particular
> cipher operation. During the cipher operation, the IV is read by the
> cipher implementation and eventually the potentially updated IV (e.g.
> in case of CBC) is written back to the memory location the request data
> structure points to.
> 
> AF_ALG allows setting the IV with a sendmsg request, where the IV is
> stored in the AF_ALG context that is unique to one particular AF_ALG
> socket. Note the analogy: an AF_ALG socket is like a TFM where one
> recvmsg operation uses one request with the TFM from the socket.
> 
> AF_ALG these days supports AIO operations with multiple IOCBs. I.e.
> with one recvmsg call, multiple IOVECs can be specified. Each
> individual IOCB (derived from one IOVEC) implies that one request data
> structure is created with the data to be processed by the cipher
> implementation. The IV that was set with the sendmsg call is registered
> with the request data structure before the cipher operation.
> 
> As of now, the individual IOCBs are serialized with respect to the IV
> handling. This implies that the kernel does not perform a truly parallel
> invocation of the cipher implementations. However, if the user wants to
> perform cryptographic operations on multiple IOCBs where each IOCB is
> truly independent from the other, parallel invocations are possible.
> This would require that each IOCB provides its own IV to ensure true
> separation of the IOCBs.
> 
> The solution is to allow providing the IV data supplied as part of the
> plaintext/ciphertext. To do so, the AF_ALG interface treats the
> ALG_SET_OP flag usable with sendmsg as a bit-array allowing to set the
> cipher operation together with the flag whether the operation should
> enable support for inline IV handling.
> 
> If inline IV handling is enabled, the IV is expected to be the first
> part of the input plaintext/ciphertext. This IV is only used for one
> cipher operation and will not retained in the kernel for subsequent
> cipher operations.
> 
> The inline IV handling support is only allowed to be enabled during
> the first sendmsg call for a context. Any subsequent sendmsg calls are
> not allowed to change the setting of the inline IV handling (either
> enable or disable it) as this would open up a race condition with the
> locking and unlocking of the ctx->ivlock mutex.
> 
> The AEAD support required a slight re-arragning of the code, because
> obtaining the IV implies that ctx->used is updated. Thus, the ctx->used
> access in _aead_recvmsg must be moved below the IV gathering.
> 
> The AEAD code to find the first SG with data in the TX SGL is moved to a
> common function as it is required by the IV gathering function as well.
> 
> This patch does not change the existing interface where user space is
> allowed to provide an IV via sendmsg. It only extends the interface by
> giving the user the choice to provide the IV either via sendmsg (the
> current approach) or as part of the data (the additional approach).
> 
> Signed-off-by: Stephan Mueller 

It's not a 'bug' as such, but this does currently break the crypto-perf
test for aio and iiv in the libkcapi branch.  We can perhaps make it
more forgiving...

I suggest we let noop cases where we set IIV on when it is already on
to not result in an error but to just be ignored.

Jonathan
> ---
>  crypto/af_alg.c | 93 
> ++---
>  crypto/algif_aead.c | 58 
>  crypto/algif_skcipher.c | 12 +++---
>  include/crypto/if_alg.h | 21 +-
>  include/uapi/linux/if_alg.h |  6 ++-
>  5 files changed, 143 insertions(+), 47 deletions(-)
> 
> diff --git a/crypto/af_alg.c b/crypto/af_alg.c
> index e7887621aa44..973233000d18 100644
> --- a/crypto/af_alg.c
> +++ b/crypto/af_alg.c
> @@ -14,6 +14,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -834,6 +835,7 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr 
> *msg, size_t size,
>   struct af_alg_control con = {};
>   long copied = 0;
>   bool enc = 0;
> + int iiv = ALG_IIV_DISABLE;
>   bool init = 0;
>   int err = 0;
>  
> @@ -843,7 +845,7 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr 
> *msg, size_t size,
>   return err;
>  
>   init = 1;
> - switch (con.op) {
> + switch (con.op & ALG_OP_CIPHER_MASK) {
>   case ALG_OP_ENCRYPT:
>   enc = 1;
>   break;
> @@ -854,6 +856,9 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr 
> *msg, size_t size,
>   return -EINVAL;
>   }
>  
> + if (con.op & ALG_OP_INLINE_IV)
> + iiv = ALG_IIV_USE;
> +
>

Re: [RFC PATCH 2/3] crypto: hisilicon hacv1 driver

2018-02-05 Thread Jonathan Cameron

On Sat, 3 Feb 2018 12:16:18 +0100
Stephan Müller <smuel...@chronox.de> wrote:

> Am Dienstag, 30. Januar 2018, 16:29:52 CET schrieb Jonathan Cameron:
> 
> Hi Jonathan,
> 
> > +   /* Special path for single element SGLs with small packets. */
> > +   if (sg_is_last(sgl) && sgl->length <= SEC_SMALL_PACKET_SIZE) {  
> 
> This looks strangely familiar. Is this code affected by a similar issue fixed 
> in https://patchwork.kernel.org/patch/10173981?

Not as far as I know - this section is about optimizing the setup of
the IOMMU. It's purely a performance optimization.

It is really costly to do the translation setup for lots of small regions.
These small regions are often contiguous anyway making the cost even more
ridiculous.  The use of a dma pool allows us to keep the iommu setup
constant(ish). It is cheaper to copy into an element of this, already
mapped, pool than it is to set up the iommu mappings for a new region.

I could drop this for the initial submission and bring it in as an
optimization with supporting numbers as a follow up patch.

> 
> > +static int sec_alg_skcipher_setkey(struct crypto_skcipher *tfm,
> > +  const u8 *key, unsigned int keylen,
> > +  enum sec_cipher_alg alg)
> > +{
> > +   struct sec_alg_skcipher_ctx *ctx = crypto_skcipher_ctx(tfm);
> > +   struct device *dev;
> > +
> > +   spin_lock(>lock);
> > +   if (ctx->enc_key) {
> > +   /* rekeying */
> > +   dev = SEC_Q_DEV(ctx->queue);
> > +   memset(ctx->enc_key, 0, SEC_MAX_CIPHER_KEY);
> > +   memset(ctx->dec_key, 0, SEC_MAX_CIPHER_KEY);
> > +   memset(>enc_req, 0, sizeof(ctx->enc_req));
> > +   memset(>dec_req, 0, sizeof(ctx->dec_req));
> > +   } else {
> > +   /* new key */
> > +   dev = SEC_Q_DEV(ctx->queue);
> > +   ctx->enc_key = dma_zalloc_coherent(dev, SEC_MAX_CIPHER_KEY,
> > +  >enc_pkey,
> > GFP_ATOMIC); +   if (!ctx->enc_key) {
> > +   spin_unlock(>lock);
> > +   return -ENOMEM;
> > +   }
> > +   ctx->dec_key = dma_zalloc_coherent(dev, SEC_MAX_CIPHER_KEY,
> > +  >dec_pkey,
> > GFP_ATOMIC); +   if (!ctx->dec_key) {
> > +   spin_unlock(>lock);
> > +   goto out_free_enc;
> > +   }
> > +   }
> > +   spin_unlock(>lock);
> > +   if (sec_alg_skcipher_init_context(tfm, key, keylen, alg))
> > +   goto out_free_all;
> > +
> > +   return 0;
> > +
> > +out_free_all:
> > +   memset(ctx->dec_key, 0, SEC_MAX_CIPHER_KEY);
> > +   dma_free_coherent(dev, SEC_MAX_CIPHER_KEY,
> > + ctx->dec_key, ctx->dec_pkey);
> > +   ctx->dec_key = NULL;
> > +out_free_enc:
> > +   memset(ctx->enc_key, 0, SEC_MAX_CIPHER_KEY);
> > +   dma_free_coherent(dev, SEC_MAX_CIPHER_KEY,
> > + ctx->enc_key, ctx->enc_pkey);
> > +   ctx->enc_key = NULL;  
> 
> Please use memzero_explicit.

Will do - thanks!

Jonathan
> > +
> > +   return -ENOMEM;
> > +

Re: [PATCH] crypto: AF_ALG AIO - lock context IV

2018-02-01 Thread Jonathan Cameron

On Thu, 1 Feb 2018 12:07:21 +0200
Gilad Ben-Yossef  wrote:

> On Thu, Feb 1, 2018 at 12:04 PM, Stephan Mueller  wrote:
> > Am Donnerstag, 1. Februar 2018, 10:35:07 CET schrieb Gilad Ben-Yossef:
> >
> > Hi Gilad,
> >  
> >> >
> >> > Which works well for the sort of optimization I did and for hardware that
> >> > can do iv dependency tracking itself. If hardware dependency tracking was
> >> > avilable, you would be able to queue up requests with a chained IV
> >> > without taking any special precautions at all. The hardware would
> >> > handle the IV update dependencies.
> >> >
> >> > So in conclusion, Stephan's approach should work and give us a nice
> >> > small patch suitable for stable inclusion.
> >> >
> >> > However, if people know that their setup overhead can be put in parallel
> >> > with previous requests (even when the IV is not yet updated) then they
> >> > will
> >> > probably want to do something inside their own driver and set the flag
> >> > that Stephan is proposing adding to bypass the mutex approach.  
> >>
> >> The patches from Stephan looks good to me, but I think we can do better
> >> for the long term approach you are discussing.  
> >
> > What you made me think of is the following: shouldn't we relay the inline IV
> > flag on to the crypto drivers?
> >
> > The reason is to allow a clean implementation of the enabling or disabling 
> > of
> > the dependency handling in the driver. Jonathan's driver, for example, 
> > decides
> > based on the pointer value of the IV buffer whether it is the same buffer 
> > and
> > thus dependency handling is to be applied. This is fragile.

I agree it's inelegant and a flag would be better than pointer tricks (though
they are safe currently - we never know what might change in future)
It was really a minimal example rather than a suggestion of the ultimate
solution ;)  I was planning on suggesting a flag myself once the basic
discussion concluded the approach was worthwhile.

> >
> > As AF_ALG knows whether the inline IV with separate IVs per request or the
> > serialization with one IV buffer for all requests is requested, it should
> > relay this state on to the drivers. Thus, for example, Jonathan's driver can
> > be changed to rely on this flag instead on the buffer pointer value to 
> > decide
> > whether to enable its dependency handling.  
> 
> Yes, that is exactly what I was trying to point out :-)

Agreed - make things explicit rather than basically relying on knowing how
the above layers are working.

Thanks,

Jonathan

Re: [PATCH] crypto: AF_ALG AIO - lock context IV

2018-01-31 Thread Jonathan Cameron

On Tue, 30 Jan 2018 15:51:40 +
Jonathan Cameron <jonathan.came...@huawei.com> wrote:

> On Tue, 30 Jan 2018 09:27:04 +0100
> Stephan Müller <smuel...@chronox.de> wrote:

A few clarifications from me after discussions with Stephan this morning.

> 
> > Hi Harsh,
> > 
> > may I as you to try the attached patch on your Chelsio driver? Please 
> > invoke 
> > the following command from libkcapi:
> > 
> > kcapi  -d 2 -x 9  -e -c "cbc(aes)" -k 
> > 8d7dd9b0170ce0b5f2f8e1aa768e01e91da8bfc67fd486d081b28254c99eb423 -i 
> > 7fbc02ebf5b93322329df9bfccb635af -p 48981da18e4bb9ef7e2e3162d16b1910
> > 
> > The result should now be a fully block-chained result.
> > 
> > Note, the patch goes on top of the inline IV patch.
> > 
> > Thanks
> > 
> > ---8<---
> > 
> > In case of AIO where multiple IOCBs are sent to the kernel and inline IV
> > is not used, the ctx->iv is used for each IOCB. The ctx->iv is read for
> > the crypto operation and written back after the crypto operation. Thus,
> > processing of multiple IOCBs must be serialized.
> > 
> > As the AF_ALG interface is used by user space, a mutex provides the
> > serialization which puts the caller to sleep in case a previous IOCB
> > processing is not yet finished.
> > 
> > If multiple IOCBs arrive that all are blocked, the mutex' FIFO operation
> > of processing arriving requests ensures that the blocked IOCBs are
> > unblocked in the right order.
> > 
> > Signed-off-by: Stephan Mueller <smuel...@chronox.de> 

Firstly, as far as I can tell (not tested it yet) the patch does the
job and is about the best we can easily do in the AF_ALG code.

I'd suggest that this (or v2 anyway) is stable material as well
(which, as Stephan observed, will require reordering of the two patches).

> 
> As a reference point, holding outside the kernel (in at least
> the case of our hardware with 8K CBC) drops us down to around 30%
> of performance with separate IVs.  Putting a queue in kernel
> (and hence allowing most setup of descriptors / DMA etc) gets us
> up to 50% of raw non chained performance.
> 
> So whilst this will work in principle I suspect anyone who
> cares about the cost will be looking at adding their own
> software queuing and dependency handling in driver.  How
> much that matters will depend on just how quick the hardware
> is vs overheads.
> 
> Anyhow, just posted the driver as an RFC. There are a few corners
> that need resolving and the outcome of this thread effects whether
> we need to play games with IVs or not.
> 
> https://marc.info/?l=linux-crypto-vger=151732626428206=2
> grep for softqueue.
> 
> We have a number of processing units that will grab requests
> from the HW queues in parallel.  So 'lots' of requests
> from a given HW queue may be in flight at the same time (the problem
> case).
> 
> Logic is fairly simple.
> 1. Only create a softqueue if the encryption mode requires chaining.
>Currently in this driver that is CBC and CTR modes only.
>Otherwise put everything straight in the hardware monitored queues.
> 2. If we have an IV embedded within the request it will have a different
>address to the one used previously (guaranteed if that one is still
>in flight and it doesn't matter if it isn't).  If that occurs we
>can safely add it to the hardware queue. To avoid DoS issues against
>and earlier set of messages using the a chained IV we actually make
>sure nothing is in the software queue (not needed for correctness)
> 3. If IV matches previous IV and there are existing elements in SW or HW
>queues we need to hold it in the SW queue.
> 4. On receiving a response from the hardware and if the hardware queue
>is empty, we can release an element from the software queue into the
>hardware queue.

The differences are better shown with pictures...
To compare the two approaches. If we break up the data flow the
alternatives are

1) Mutex causes queuing in AF ALG code

The key thing is the [Build / Map HW Descs] for small packets,
up to perhaps 32K, is a significant task we can't avoid.
At 8k it looks like it takes perhaps 20-30% of the time
(though I only have end to end performance numbers so far)


[REQUEST 1 from userspace]   
   |  [REQUEST 2 from userspace] 
[AF_ALG/SOCKET]   |
   |   [AF_ALG/SOCKET]
NOTHING BLOCKING (lock mut)   |
   |Queued on Mutex
 [BUILD / MAP HW DESCS]   | 
   |  |
   [Place in HW Queue]|
   |

Re: [PATCH] crypto: AF_ALG AIO - lock context IV

2018-01-30 Thread Jonathan Cameron

On Tue, 30 Jan 2018 09:27:04 +0100
Stephan Müller  wrote:

> Hi Harsh,
> 
> may I as you to try the attached patch on your Chelsio driver? Please invoke 
> the following command from libkcapi:
> 
> kcapi  -d 2 -x 9  -e -c "cbc(aes)" -k 
> 8d7dd9b0170ce0b5f2f8e1aa768e01e91da8bfc67fd486d081b28254c99eb423 -i 
> 7fbc02ebf5b93322329df9bfccb635af -p 48981da18e4bb9ef7e2e3162d16b1910
> 
> The result should now be a fully block-chained result.
> 
> Note, the patch goes on top of the inline IV patch.
> 
> Thanks
> 
> ---8<---
> 
> In case of AIO where multiple IOCBs are sent to the kernel and inline IV
> is not used, the ctx->iv is used for each IOCB. The ctx->iv is read for
> the crypto operation and written back after the crypto operation. Thus,
> processing of multiple IOCBs must be serialized.
> 
> As the AF_ALG interface is used by user space, a mutex provides the
> serialization which puts the caller to sleep in case a previous IOCB
> processing is not yet finished.
> 
> If multiple IOCBs arrive that all are blocked, the mutex' FIFO operation
> of processing arriving requests ensures that the blocked IOCBs are
> unblocked in the right order.
> 
> Signed-off-by: Stephan Mueller 

As a reference point, holding outside the kernel (in at least
the case of our hardware with 8K CBC) drops us down to around 30%
of performance with separate IVs.  Putting a queue in kernel
(and hence allowing most setup of descriptors / DMA etc) gets us
up to 50% of raw non chained performance.

So whilst this will work in principle I suspect anyone who
cares about the cost will be looking at adding their own
software queuing and dependency handling in driver.  How
much that matters will depend on just how quick the hardware
is vs overheads.

Anyhow, just posted the driver as an RFC. There are a few corners
that need resolving and the outcome of this thread effects whether
we need to play games with IVs or not.

https://marc.info/?l=linux-crypto-vger=151732626428206=2
grep for softqueue.

We have a number of processing units that will grab requests
from the HW queues in parallel.  So 'lots' of requests
from a given HW queue may be in flight at the same time (the problem
case).

Logic is fairly simple.
1. Only create a softqueue if the encryption mode requires chaining.
   Currently in this driver that is CBC and CTR modes only.
   Otherwise put everything straight in the hardware monitored queues.
2. If we have an IV embedded within the request it will have a different
   address to the one used previously (guaranteed if that one is still
   in flight and it doesn't matter if it isn't).  If that occurs we
   can safely add it to the hardware queue. To avoid DoS issues against
   and earlier set of messages using the a chained IV we actually make
   sure nothing is in the software queue (not needed for correctness)
3. If IV matches previous IV and there are existing elements in SW or HW
   queues we need to hold it in the SW queue.
4. On receiving a response from the hardware and if the hardware queue
   is empty, we can release an element from the software queue into the
   hardware queue.

This maintains chaining when needed and gets a good chunk of the lost
performance back.  I believe (though yet to verify) the remaining lost
performance is mostly down to the fact we can't coalesce interrupts if
chaining.

Note the driver is deliberately a touch 'simplistic' so there may be other
optimization opportunities to get some of the lost performance back or
it may be fundamental due to the fact the raw processing can't be in
parallel.

> ---
>  crypto/af_alg.c | 22 ++
>  crypto/algif_aead.c |  2 ++
>  crypto/algif_skcipher.c |  2 ++
>  include/crypto/if_alg.h |  3 +++
>  4 files changed, 29 insertions(+)
> 
> diff --git a/crypto/af_alg.c b/crypto/af_alg.c
> index 87394dc90ac0..3007c9d899da 100644
> --- a/crypto/af_alg.c
> +++ b/crypto/af_alg.c
> @@ -1059,6 +1059,8 @@ void af_alg_async_cb(struct crypto_async_request *_req, 
> int err)
>   struct kiocb *iocb = areq->iocb;
>   unsigned int resultlen;
>  
> + af_alg_put_iv(sk);
> +
>   /* Buffer size written by crypto operation. */
>   resultlen = areq->outlen;
>  
> @@ -1227,6 +1229,11 @@ int af_alg_get_iv(struct sock *sk, struct 
> af_alg_async_req *areq)
>  
>   /* No inline IV or cipher has no IV, use ctx IV buffer. */
>   if (!ctx->iiv || !ctx->ivlen) {
> + int ret = mutex_lock_interruptible(>ivlock);
> +
> + if (ret)
> + return ret;
> +
>   areq->iv = ctx->iv;
>   areq->ivlen = 0;// areq->iv will not be freed
>   return 0;
> @@ -1252,6 +1259,21 @@ int af_alg_get_iv(struct sock *sk, struct 
> af_alg_async_req *areq)
>  }
>  EXPORT_SYMBOL_GPL(af_alg_get_iv);
>  
> +/**
> + * af_alg_put_iv - release lock on IV in case CTX IV is used
> + *
> + * @sk [in] AF_ALG socket
> + */
> +void af_alg_put_iv(struct

[RFC PATCH 2/3] crypto: hisilicon hacv1 driver

2018-01-30 Thread Jonathan Cameron

From: Jonathan Cameron <jonathan.came...@huawei.com>

This accelerator is found inside hisilicon hip06 and hip07 SoCs.
Each instance provides a number of queues which feed a different number of
backend accleration units.

The queues are operating in an out of order mode in the interests of
throughput. The silicon does not do tracking of dependencies between
multiple 'messages' or update of the IVs as appropriate for training.
Hence where relevant we need to do this in software.

Signed-off-by: Jonathan Cameron <jonathan.came...@huawei.com>
---
 drivers/crypto/Kconfig  |2 +
 drivers/crypto/Makefile |1 +
 drivers/crypto/hisilicon/Kconfig|   16 +
 drivers/crypto/hisilicon/Makefile   |2 +
 drivers/crypto/hisilicon/sec/Makefile   |3 +
 drivers/crypto/hisilicon/sec/sec_algs.c | 1082 +++
 drivers/crypto/hisilicon/sec/sec_drv.c  | 1418 +++
 drivers/crypto/hisilicon/sec/sec_drv.h  |  282 ++
 8 files changed, 2806 insertions(+)

diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index 47ec920d5b71..ea85924427e9 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -731,4 +731,6 @@ config CRYPTO_DEV_ARTPEC6
 
  To compile this driver as a module, choose M here.
 
+source "drivers/crypto/hisilicon/Kconfig"
+
 endif # CRYPTO_HW
diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
index 2513d13ea2c4..ed237414ca1b 100644
--- a/drivers/crypto/Makefile
+++ b/drivers/crypto/Makefile
@@ -45,3 +45,4 @@ obj-$(CONFIG_CRYPTO_DEV_VMX) += vmx/
 obj-$(CONFIG_CRYPTO_DEV_BCM_SPU) += bcm/
 obj-$(CONFIG_CRYPTO_DEV_SAFEXCEL) += inside-secure/
 obj-$(CONFIG_CRYPTO_DEV_ARTPEC6) += axis/
+obj-$(CONFIG_CRYPTO_DEV_HISILICON) += hisilicon/
diff --git a/drivers/crypto/hisilicon/Kconfig b/drivers/crypto/hisilicon/Kconfig
new file mode 100644
index ..cd9296687235
--- /dev/null
+++ b/drivers/crypto/hisilicon/Kconfig
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: GPL-2.0
+
+config CRYPTO_DEV_HISILICON
+   bool
+
+config CRYPTO_DEV_HISI_SEC
+   tristate "Support for Hisilicon SEC crypto block cipher accelerator"
+   select CRYPTO_DEV_HISILICON
+   select CRYPTO_BLKCIPHER
+   select CRYPTO_ALGAPI
+   depends on ARM64
+   help
+ Support for Hisilicon SEC Engine in Hip06 and Hip07
+
+ To compile this as a module, choose M here: the module
+ will be called hisi_sec.
diff --git a/drivers/crypto/hisilicon/Makefile 
b/drivers/crypto/hisilicon/Makefile
new file mode 100644
index ..463f46ace182
--- /dev/null
+++ b/drivers/crypto/hisilicon/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_CRYPTO_DEV_HISI_SEC) += sec/
diff --git a/drivers/crypto/hisilicon/sec/Makefile 
b/drivers/crypto/hisilicon/sec/Makefile
new file mode 100644
index ..a55b698e0c27
--- /dev/null
+++ b/drivers/crypto/hisilicon/sec/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_CRYPTO_DEV_HISI_SEC) += hisi_sec.o
+hisi_sec-y = sec_algs.o sec_drv.o
diff --git a/drivers/crypto/hisilicon/sec/sec_algs.c 
b/drivers/crypto/hisilicon/sec/sec_algs.c
new file mode 100644
index ..6b82636f7979
--- /dev/null
+++ b/drivers/crypto/hisilicon/sec/sec_algs.c
@@ -0,0 +1,1082 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2016-2017 Hisilicon Limited. */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "sec_drv.h"
+
+enum sec_cipher_type {
+   SEC_CIPHER_NULL,
+   SEC_CIPHER_ENCRYPT,
+   SEC_CIPHER_DECRYPT,
+   SEC_CIPHER_PASS,
+   SEC_CIPHER_INVALID,
+};
+
+#define SEC_C_ALG_DES  0
+#define SEC_C_ALG_3DES 1
+#define SEC_C_ALG_AES  2
+
+#define SEC_C_MODE_ECB 0
+#define SEC_C_MODE_CBC 1
+#define SEC_C_MODE_CTR 4
+#define SEC_C_MODE_CCM 5
+#define SEC_C_MODE_GCM 6
+#define SEC_C_MODE_XTS 7
+
+#define SEC_KEY_LEN_AES_1280
+#define SEC_KEY_LEN_AES_1921
+#define SEC_KEY_LEN_AES_2562
+#define SEC_KEY_LEN_DES1
+#define SEC_KEY_LEN_3DES_3_KEY 1
+#define SEC_KEY_LEN_3DES_2_KEY 3
+
+#define SEC_C_WIDTH_AES_128BIT 0
+#define SEC_C_WIDTH_AES_8BIT   1
+#define SEC_C_WIDTH_AES_1BIT   2
+#define SEC_C_WIDTH_DES_64BIT  0
+#define SEC_C_WIDTH_DES_8BIT   1
+#define SEC_C_WIDTH_DES_1BIT   2
+
+enum sec_cipher_alg {
+   SEC_DES_ECB_64,
+   SEC_DES_CBC_64,
+
+   SEC_3DES_ECB_192_3KEY,
+   SEC_3DES_ECB_192_2KEY,
+
+   SEC_3DES_CBC_192_3KEY,
+   SEC_3DES_CBC_192_2KEY,
+
+   SEC_AES_ECB_128,
+   SEC_AES_ECB_192,
+   SEC_AES_ECB_256,
+
+   SEC_AES_CBC_128,
+

[RFC PATCH 3/3] arm64: dts: hisi: add SEC crypto accelerator nodes for hip07 SoC

2018-01-30 Thread Jonathan Cameron

From: Jonathan Cameron <jonathan.came...@huawei.com>

Enable all 4 SEC units available on d05 boards.

Signed-off-by: Jonathan Cameron <jonathan.came...@huawei.com>
---
 arch/arm64/boot/dts/hisilicon/hip07.dtsi | 293 +++
 1 file changed, 293 insertions(+)

diff --git a/arch/arm64/boot/dts/hisilicon/hip07.dtsi 
b/arch/arm64/boot/dts/hisilicon/hip07.dtsi
index 2c01a21c3665..81b57c8ce720 100644
--- a/arch/arm64/boot/dts/hisilicon/hip07.dtsi
+++ b/arch/arm64/boot/dts/hisilicon/hip07.dtsi
@@ -1049,7 +1049,74 @@
num-pins = <2>;
};
};
+   p0_mbigen_alg_a:interrupt-controller@d008 {
+   compatible = "hisilicon,mbigen-v2";
+   reg = <0x0 0xd008 0x0 0x1>;
 
+   p0_mbigen_sec_a: intc_sec {
+   msi-parent = <_its_dsa_a 0x40400>;
+   interrupt-controller;
+   #interrupt-cells = <2>;
+   num-pins = <33>;
+   };
+   p0_mbigen_smmu_alg_a: intc_smmu_alg {
+   msi-parent = <_its_dsa_a 0x40b1b>;
+   interrupt-controller;
+   #interrupt-cells = <2>;
+   num-pins = <3>;
+   };
+   };
+   p0_mbigen_alg_b:interrupt-controller@8,d008 {
+   compatible = "hisilicon,mbigen-v2";
+   reg = <0x8 0xd008 0x0 0x1>;
+
+   p0_mbigen_sec_b: intc_sec {
+   msi-parent = <_its_dsa_b 0x42400>;
+   interrupt-controller;
+   #interrupt-cells = <2>;
+   num-pins = <33>;
+   };
+   p0_mbigen_smmu_alg_b: intc_smmu_alg {
+   msi-parent = <_its_dsa_b 0x42b1b>;
+   interrupt-controller;
+   #interrupt-cells = <2>;
+   num-pins = <3>;
+   };
+   };
+   p1_mbigen_alg_a:interrupt-controller@400,d008 {
+   compatible = "hisilicon,mbigen-v2";
+   reg = <0x400 0xd008 0x0 0x1>;
+
+   p1_mbigen_sec_a: intc_sec {
+   msi-parent = <_its_dsa_a 0x44400>;
+   interrupt-controller;
+   #interrupt-cells = <2>;
+   num-pins = <33>;
+   };
+   p1_mbigen_smmu_alg_a: intc_smmu_alg {
+   msi-parent = <_its_dsa_a 0x44b1b>;
+   interrupt-controller;
+   #interrupt-cells = <2>;
+   num-pins = <3>;
+   };
+   };
+   p1_mbigen_alg_b:interrupt-controller@408,d008 {
+   compatible = "hisilicon,mbigen-v2";
+   reg = <0x408 0xd008 0x0 0x1>;
+
+   p1_mbigen_sec_b: intc_sec {
+   msi-parent = <_its_dsa_b 0x46400>;
+   interrupt-controller;
+   #interrupt-cells = <2>;
+   num-pins = <33>;
+   };
+   p1_mbigen_smmu_alg_b: intc_smmu_alg {
+   msi-parent = <_its_dsa_b 0x46b1b>;
+   interrupt-controller;
+   #interrupt-cells = <2>;
+   num-pins = <3>;
+   };
+   };
p0_mbigen_dsa_a: interrupt-controller@c008 {
compatible = "hisilicon,mbigen-v2";
reg = <0x0 0xc008 0x0 0x1>;
@@ -1083,6 +1150,58 @@
};
};
 
+   p0_smmu_alg_a: smmu_alg@d004 {
+   compatible = "arm,smmu-v3";
+   reg = <0x0 0xd004 0x0 0x2>;
+   interrupt-parent = <_mbigen_smmu_alg_a>;
+   interrupts = <733 1>,
+   <734 1>,
+   <735 1>;
+   interrupt-names = "eventq", "gerror", "priq";
+   #iommu-cells = <1>;
+   dma-coherent;
+   hisilicon,broken-prefetch-cmd;
+   /* smmu-cb-memtype = <0x0 0x1>;*/
+   };
+   p0_smmu_alg_b: smmu_alg@8,d004 {
+   compatible = "arm,smmu-v3";
+   reg = <0x8 0xd004 0x0 0x2>;
+   interrupt-parent = <_mbigen_smmu_alg_b>;
+   interrupts = <733 1>,
+   <734 1>,
+   <735 1>;
+   interrupt-names = "eventq", "gerror", "priq";
+   #iommu-cells = <1>;
+   dma-coherent;
+   hisilicon,broken-prefet

[RFC PATCH 0/3] Hisilicon SEC accelerator driver

2018-01-30 Thread Jonathan Cameron

From: Jonathan Cameron <jonathan.came...@huawei.com>

This an RFC for a couple of reasons:
1) There is some ongoing discussion on how to handle IVs in the AF_ALG
   interface when doing async IO.  There are a few corners of handling
   in here that are performance enhancements based on Stephan's recent
   set.
2) The handling of iommu presence is ugly.  I need to discuss this with
   iommu maintainer and interested parties.
3) The hardware is limited to 32MB per queued job. We need to handle
   splitting this up into multiple jobs and related IV chaining.

This driver provides in kernel support for the Hisilicon SEC
accelerators found on the hip06 and hip07 SoCs.  There are 4 such
units on the D05 board for which DT bindings are provided.  ACPI
configuration also works with an appropriate UEFI build.

The hardware does not update the IV in chaining or counting modes.
This is done in the driver on completion of the cipher operation.

If we are relying on IV chaining with current AF_ALG it is necessary
to ensure we do not stream the elements directly into the hardware
queues.  For modes where this necessary we use a front end software
queue.

The driver currently supports AES, DES and 3DES block ciphers.
Hash support and AAED support to follow.

Jonathan Cameron (3):
  dt-bindings: Add bindings for Hisilicon SEC crypto accelerators.
  crypto: hisilicon hacv1 driver
  arm64: dts: hisi: add SEC crypto accelerator nodes for hip07 SoC

 .../bindings/crypto/hisilicon,hip07-sec.txt|   71 +
 arch/arm64/boot/dts/hisilicon/hip07.dtsi   |  293 
 drivers/crypto/Kconfig |2 +
 drivers/crypto/Makefile|1 +
 drivers/crypto/hisilicon/Kconfig   |   16 +
 drivers/crypto/hisilicon/Makefile  |2 +
 drivers/crypto/hisilicon/sec/Makefile  |3 +
 drivers/crypto/hisilicon/sec/sec_algs.c| 1082 +++
 drivers/crypto/hisilicon/sec/sec_drv.c | 1418 
 drivers/crypto/hisilicon/sec/sec_drv.h |  282 
 10 files changed, 3170 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt
 create mode 100644 drivers/crypto/hisilicon/Kconfig
 create mode 100644 drivers/crypto/hisilicon/Makefile
 create mode 100644 drivers/crypto/hisilicon/sec/Makefile
 create mode 100644 drivers/crypto/hisilicon/sec/sec_algs.c
 create mode 100644 drivers/crypto/hisilicon/sec/sec_drv.c
 create mode 100644 drivers/crypto/hisilicon/sec/sec_drv.h

-- 
2.11.0

[RFC PATCH 1/3] dt-bindings: Add bindings for Hisilicon SEC crypto accelerators.

2018-01-30 Thread Jonathan Cameron

From: Jonathan Cameron <jonathan.came...@huawei.com>

The hip06 and hip07 SoCs contain a number of these crypto units which
accelerate AES and DES operations.

Signed-off-by: Jonathan Cameron <jonathan.came...@huawei.com>
---
 .../bindings/crypto/hisilicon,hip07-sec.txt| 71 ++
 1 file changed, 71 insertions(+)

diff --git a/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt 
b/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt
new file mode 100644
index ..bf81118e560c
--- /dev/null
+++ b/Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt
@@ -0,0 +1,71 @@
+* Hisilicon hip07 Security Accelerator (SEC)
+
+Required properties:
+- compatible: Must contain one of
+  - "hisilicon,hip06-sec"
+  - "hisilicon,hip07-sec"
+- #address-cells: Must be <2> as 64 bit addresses in reg.
+- #size-cells: Must be <2> as 64 bit lengths in reg.
+- reg: Memory addresses and lengths of the memory regions used by the driver.
+  Region 0 has registers to control the backend processing engines.
+  Region 1 has registers for functionality common to all queues.
+  Regions 2-18 have registers for the individual queues which are isolated
+  both in hardware and within the driver.
+- interrupts: Interrupt specifiers.
+  Refer to interrupt-controller/interrupts.txt for generic interrupt client 
node
+  bindings.
+  Interrupt 0 is for the SEC unit error queue.
+  Interrupt 2N + 1 is the completion interrupt for queue N.
+  Interrupt 2N + 2 is the error interrupt for queue N.
+- dma-coherent:  The driver assumes coherent dma is possible.
+
+Optional properties:
+- iommus: The SEC units are behind smmu-v3 iommus.
+  Refer to iommu/arm,smmu-v3.txt for more information.
+
+Example:
+Second socket, first unit chosen to illustrate need for 64 bit addresses.
+
+p1_sec_a: sec@d200 {
+   compatible = "hisilicon,hip07-sec";
+   #address-cells = <2>;
+   #size-cells = <2>;
+   reg = <0x400 0xd000 0x0 0x1
+  0x400 0xd200 0x0 0x1
+  0x400 0xd201 0x0 0x1
+  0x400 0xd202 0x0 0x1
+  0x400 0xd203 0x0 0x1
+  0x400 0xd204 0x0 0x1
+  0x400 0xd205 0x0 0x1
+  0x400 0xd206 0x0 0x1
+  0x400 0xd207 0x0 0x1
+  0x400 0xd208 0x0 0x1
+  0x400 0xd209 0x0 0x1
+  0x400 0xd20a 0x0 0x1
+  0x400 0xd20b 0x0 0x1
+  0x400 0xd20c 0x0 0x1
+  0x400 0xd20d 0x0 0x1
+  0x400 0xd20e 0x0 0x1
+  0x400 0xd20f 0x0 0x1
+  0x400 0xd210 0x0 0x1>;
+   interrupt-parent = <_mbigen_sec_a>;
+   iommus = <_smmu_alg_a 0x600>;
+   dma-coherent;
+   interrupts = <576 4>,
+<577 1>,<578 4>,
+<579 1>,<580 4>,
+<581 1>,<582 4>,
+<583 1>,<584 4>,
+<585 1>,<586 4>,
+<587 1>,<588 4>,
+<589 1>,<590 4>,
+<591 1>,<592 4>,
+<593 1>,<594 4>,
+<595 1>,<596 4>,
+<597 1>,<598 4>,
+<599 1>,<600 4>,
+<601 1>,<602 4>,
+<603 1>,<604 4>,
+<605 1>,<606 4>,
+<607 1>,<608 4>;
+};
-- 
2.11.0

Re: [PATCH] crypto: AF_ALG - inline IV support

2018-01-22 Thread Jonathan Cameron

On Mon, 22 Jan 2018 15:30:39 +0100
Stephan Mueller <smuel...@chronox.de> wrote:

> Am Montag, 22. Januar 2018, 15:11:53 CET schrieb Jonathan Cameron:
> 
> Hi Jonathan,
Hi Stephan,
> 
> > On Mon, 15 Jan 2018 10:35:34 +0100
> > 
> > Stephan Mueller <smuel...@chronox.de> wrote:  
> > > The kernel crypto API requires the caller to set an IV in the request
> > > data structure. That request data structure shall define one particular
> > > cipher operation. During the cipher operation, the IV is read by the
> > > cipher implementation and eventually the potentially updated IV (e.g.
> > > in case of CBC) is written back to the memory location the request data
> > > structure points to.
> > > 
> > > AF_ALG allows setting the IV with a sendmsg request, where the IV is
> > > stored in the AF_ALG context that is unique to one particular AF_ALG
> > > socket. Note the analogy: an AF_ALG socket is like a TFM where one
> > > recvmsg operation uses one request with the TFM from the socket.
> > > 
> > > AF_ALG these days supports AIO operations with multiple IOCBs. I.e.
> > > with one recvmsg call, multiple IOVECs can be specified. Each
> > > individual IOCB (derived from one IOVEC) implies that one request data
> > > structure is created with the data to be processed by the cipher
> > > implementation. The IV that was set with the sendmsg call is registered
> > > with the request data structure before the cipher operation.
> > > 
> > > In case of an AIO operation, the cipher operation invocation returns
> > > immediately, queuing the request to the hardware. While the AIO request
> > > is processed by the hardware, recvmsg processes the next IOVEC for
> > > which another request is created. Again, the IV buffer from the AF_ALG
> > > socket context is registered with the new request and the cipher
> > > operation is invoked.
> > > 
> > > You may now see that there is a potential race condition regarding the
> > > IV handling, because there is *no* separate IV buffer for the different
> > > requests. This is nicely demonstrated with libkcapi using the following
> > > command which creates an AIO request with two IOCBs each encrypting one
> > > AES block in CBC mode:
> > > 
> > > kcapi  -d 2 -x 9  -e -c "cbc(aes)" -k
> > > 8d7dd9b0170ce0b5f2f8e1aa768e01e91da8bfc67fd486d081b28254c99eb423 -i
> > > 7fbc02ebf5b93322329df9bfccb635af -p 48981da18e4bb9ef7e2e3162d16b1910
> > > 
> > > When the first AIO request finishes before the 2nd AIO request is
> > > processed, the returned value is:
> > > 
> > > 8b19050f66582cb7f7e4b6c873819b7108afa0eaa7de29bac7d903576b674c32
> > > 
> > > I.e. two blocks where the IV output from the first request is the IV input
> > > to the 2nd block.
> > > 
> > > In case the first AIO request is not completed before the 2nd request
> > > commences, the result is two identical AES blocks (i.e. both use the
> > > same IV):
> > > 
> > > 8b19050f66582cb7f7e4b6c873819b718b19050f66582cb7f7e4b6c873819b71
> > > 
> > > This inconsistent result may even lead to the conclusion that there can
> > > be a memory corruption in the IV buffer if both AIO requests write to
> > > the IV buffer at the same time.
> > > 
> > > The solution is to allow providing the IV data supplied as part of the
> > > plaintext/ciphertext. To do so, the AF_ALG interface treats the
> > > ALG_SET_OP flag usable with sendmsg as a bit-array allowing to set the
> > > cipher operation together with the flag whether the operation should
> > > enable support for inline IV handling.
> > > 
> > > If inline IV handling is enabled, the IV is expected to be the first
> > > part of the input plaintext/ciphertext. This IV is only used for one
> > > cipher operation and will not retained in the kernel for subsequent
> > > cipher operations.
> > > 
> > > The AEAD support required a slight re-arragning of the code, because
> > > obtaining the IV implies that ctx->used is updated. Thus, the ctx->used
> > > access in _aead_recvmsg must be moved below the IV gathering.
> > > 
> > > The AEAD code to find the first SG with data in the TX SGL is moved to a
> > > common function as it is required by the IV gathering function as well.
> > > 
> > > This patch does not change the existing interface where user space is
> > > allowed to provide an IV via sendm

Re: [PATCH] crypto: AF_ALG - inline IV support

2018-01-22 Thread Jonathan Cameron

On Mon, 15 Jan 2018 10:35:34 +0100
Stephan Mueller <smuel...@chronox.de> wrote:

> The kernel crypto API requires the caller to set an IV in the request
> data structure. That request data structure shall define one particular
> cipher operation. During the cipher operation, the IV is read by the
> cipher implementation and eventually the potentially updated IV (e.g.
> in case of CBC) is written back to the memory location the request data
> structure points to.
> 
> AF_ALG allows setting the IV with a sendmsg request, where the IV is
> stored in the AF_ALG context that is unique to one particular AF_ALG
> socket. Note the analogy: an AF_ALG socket is like a TFM where one
> recvmsg operation uses one request with the TFM from the socket.
> 
> AF_ALG these days supports AIO operations with multiple IOCBs. I.e.
> with one recvmsg call, multiple IOVECs can be specified. Each
> individual IOCB (derived from one IOVEC) implies that one request data
> structure is created with the data to be processed by the cipher
> implementation. The IV that was set with the sendmsg call is registered
> with the request data structure before the cipher operation.
> 
> In case of an AIO operation, the cipher operation invocation returns
> immediately, queuing the request to the hardware. While the AIO request
> is processed by the hardware, recvmsg processes the next IOVEC for
> which another request is created. Again, the IV buffer from the AF_ALG
> socket context is registered with the new request and the cipher
> operation is invoked.
> 
> You may now see that there is a potential race condition regarding the
> IV handling, because there is *no* separate IV buffer for the different
> requests. This is nicely demonstrated with libkcapi using the following
> command which creates an AIO request with two IOCBs each encrypting one
> AES block in CBC mode:
> 
> kcapi  -d 2 -x 9  -e -c "cbc(aes)" -k
> 8d7dd9b0170ce0b5f2f8e1aa768e01e91da8bfc67fd486d081b28254c99eb423 -i
> 7fbc02ebf5b93322329df9bfccb635af -p 48981da18e4bb9ef7e2e3162d16b1910
> 
> When the first AIO request finishes before the 2nd AIO request is
> processed, the returned value is:
> 
> 8b19050f66582cb7f7e4b6c873819b7108afa0eaa7de29bac7d903576b674c32
> 
> I.e. two blocks where the IV output from the first request is the IV input
> to the 2nd block.
> 
> In case the first AIO request is not completed before the 2nd request
> commences, the result is two identical AES blocks (i.e. both use the
> same IV):
> 
> 8b19050f66582cb7f7e4b6c873819b718b19050f66582cb7f7e4b6c873819b71
> 
> This inconsistent result may even lead to the conclusion that there can
> be a memory corruption in the IV buffer if both AIO requests write to
> the IV buffer at the same time.
> 
> The solution is to allow providing the IV data supplied as part of the
> plaintext/ciphertext. To do so, the AF_ALG interface treats the
> ALG_SET_OP flag usable with sendmsg as a bit-array allowing to set the
> cipher operation together with the flag whether the operation should
> enable support for inline IV handling.
> 
> If inline IV handling is enabled, the IV is expected to be the first
> part of the input plaintext/ciphertext. This IV is only used for one
> cipher operation and will not retained in the kernel for subsequent
> cipher operations.
> 
> The AEAD support required a slight re-arragning of the code, because
> obtaining the IV implies that ctx->used is updated. Thus, the ctx->used
> access in _aead_recvmsg must be moved below the IV gathering.
> 
> The AEAD code to find the first SG with data in the TX SGL is moved to a
> common function as it is required by the IV gathering function as well.
> 
> This patch does not change the existing interface where user space is
> allowed to provide an IV via sendmsg. It only extends the interface by
> giving the user the choice to provide the IV either via sendmsg (the
> current approach) or as part of the data (the additional approach).
> 
> Signed-off-by: Stephan Mueller <smuel...@chronox.de>

Firstly it works
Tested-by: Jonathan Cameron <jonathan.came...@huawei.com>

Now to be really useful for my particular hardware (which doesn't do dependency
tracking in its out of order queues) what I need to know is whether I need to
let previous queued up entries finish before I can run this one or not.

Now there is an obvious 'hack' I can use, which is that the above only matters
is if the IV is still 'in flight'. Given it is still inflight the memory is
in use. Thus until it is finished it's address is clearly only being used by
'this IV'.

Hence I can just compare against the address of the IV for the previous packet.
If it changed we know there is no need to wait for previous packets to finish.

I'll s

Re: [RFC] AF_ALG AIO and IV

2018-01-16 Thread Jonathan Cameron

On Tue, 16 Jan 2018 07:28:06 +0100
Stephan Mueller <smuel...@chronox.de> wrote:

> Am Montag, 15. Januar 2018, 15:42:58 CET schrieb Jonathan Cameron:
> 
> Hi Jonathan,
> 
> > > What about:
> > > 
> > > sendmsg(IV, data)
> > > sendmsg(data)
> > > ..
> > > AIO recvmsg with multiple IOCBs
> > > AIO recvmsg with multiple IOCBs
> > > ..
> > > sendmsg(IV, data)
> > > ..
> > > 
> > > This implies, however, that before the sendmsg with the second IV is sent,
> > > all AIO operations from the first invocation would need to be finished.  
> > Yes that works fine, but rather restricts the flow - you would end up
> > waiting until you could concatenate a bunch of data in userspace so as to
> > trade off against the slow down whenever you need to synchronize back up to
> > userspace.  
> 
> I think the solution is already present and even libkcapi's architecture is 
> set up to handle this scenario:
> 
> We have 2 types of FDs: one obtained from socket() and one from accept(). The 
> socket-FD is akin to the TFM. The accept FD is exactly what you want:
> 
> tfmfd = socket()
> setkey(tfmfd)
> opfd = accept()
> opfd2 = accept()
> sendmsg(opfd, IV, data)
> recvmsg(opfd, data)
> 
> sendmsg(opfd2, IV, data)
> recvmsg(opfd2, data)
> 
> sendmsg(opfd, data)
> ..
> 
> There can be multipe FDs from accept and these are the "identifiers" for your 
> cipher operation stream that belongs together.
> 
> libkcapi has already the architecture for this type of work, but it is not 
> exposed to the API yet. The internal calls for sendmsg/recvmsg all take an 
> (op)FD parameter. E.g. _kcapi_common_send_meta_fd has the fdptr variable. 
> internal.h currently wraps this call into _kcapi_common_send_meta where the 
> handle->opfd variable is used.
> 
> The idea why I implemented that is because the caller could maintain an array 
> of opfds. If we would expose these internal calls with the FD argument, you 
> can maintain multiple opfds implementing your use case.
> 
> The only change would be to expose the internal libkcapi calls.
> 
> Ciao
> Stephan

Thanks, I'll take a look at this soonish. Having a busy week.

Jonathan
> 
>

Re: [RFC] AF_ALG AIO and IV

2018-01-15 Thread Jonathan Cameron

On Mon, 15 Jan 2018 15:31:42 +0100
Stephan Mueller <smuel...@chronox.de> wrote:

> Am Montag, 15. Januar 2018, 15:25:38 CET schrieb Jonathan Cameron:
> 
> Hi Jonathan,
> 
> > On Mon, 15 Jan 2018 14:15:42 +0100
> > 
> > Stephan Mueller <smuel...@chronox.de> wrote:  
> > > Am Montag, 15. Januar 2018, 13:59:27 CET schrieb Jonathan Cameron:
> > > 
> > > Hi Jonathan,
> > >   
> > > > > But there may be hardware that cannot/will not track such
> > > > > dependencies.
> > > > > Yet, it has multiple hardware queues. Such hardware can still handle
> > > > > parallel requests when they are totally independent from each other.
> > > > > For
> > > > > such a case, AF_ALG currently has no support, because it lacks the
> > > > > support for setting multiple IVs for the multiple concurrent calls.  
> > > > 
> > > > Agreed, something like your new support is needed - I just suspect we
> > > > need
> > > > a level between one socket one iv chain and every IOCB with own IV and
> > > > right now the only way to hit that balance is to have a separate socket
> > > > for each IV chain.  Not exactly efficient use of resources though it
> > > > will
> > > > work.  
> > > 
> > > How would you propose such support via AF_ALG?
> > > Wouldn't it be possible to
> > > arrange the IOVEC array in user space appropriately before calling the
> > > AF_ALG interface? In this case, I would still see that the current AF_ALG
> > > (plus the patch) would support all use cases I am aware of.  
> > 
> > I'm not sure how that would work, but maybe I'm missing something - are you
> > suggesting we could contrive the situation where the kernel side can tell
> > it is getting the same IV multiple times and hence know that it should chain
> > it?  We are talking streaming here - we don't have the data for the later
> > elements when the first ones are queued up.
> > 
> > One approach to handling token based IV - where the token refers to an IV
> > without being it's value would be to add another flag similar to the one
> > you used for inline IV.  
> 
> What about:
> 
> sendmsg(IV, data)
> sendmsg(data)
> ..
> AIO recvmsg with multiple IOCBs
> AIO recvmsg with multiple IOCBs
> ..
> sendmsg(IV, data)
> ..
> 
> This implies, however, that before the sendmsg with the second IV is sent, 
> all 
> AIO operations from the first invocation would need to be finished.

Yes that works fine, but rather restricts the flow - you would end up waiting
until you could concatenate a bunch of data in userspace so as to trade
off against the slow down whenever you need to synchronize back up to userspace.

> > 
> > You would then set the IV as you have done, but also provide a magic value
> > by which to track the chain.  Later IOCBs using the same IV chain would
> > just provide the magic token.
> > 
> > You'd also need some way of retiring the IV eventually once you were done
> > with it or ultimately you would run out of resources.  
> 
> Let me think about that approach a bit.
> 
> > > 
> > > What AF_ALG should do is to enable different vendors like yourself to use
> > > the most appropriate solution. AF_ALG shall not limit users in any way.  
> > Agreed, but we also need to have some consistency for userspace to have some
> > awareness of what it should be using.  Last thing we want is lots of higher
> > level software having to have knowledge of the encryption hardware
> > underneath. Hence I think we should keep the options to the minimum
> > possible or put the burden on drivers that must play well with all options
> > (be it not as efficiently for the ones that work badly for them).
> >   
> > > Thus, AF_ALG allows multiple sockets, if desired. It allows a stream usage
> > > with one setiv call applicable to multiple cipher operations. And with the
> > > offered patch it would allow multiple concurrent and yet independent
> > > cipher
> > > operations. Whichever use case is right for you, AF_ALG should not block
> > > you from applying it. Yet, what is good for you may not be good for
> > > others. Thus, these others may implement a different usage strategy for
> > > AF_ALG. The good thing is that this strategy is defined by user space.
> > > 
> > > In case you see a use case that is prevented by AF_ALG, it would be great
> > > to hear about it to see whether we can support it.  
> > 
> > The usecas

Re: [RFC] AF_ALG AIO and IV

2018-01-15 Thread Jonathan Cameron

On Mon, 15 Jan 2018 14:25:38 +
Jonathan Cameron <jonathan.came...@huawei.com> wrote:

> On Mon, 15 Jan 2018 14:15:42 +0100
> Stephan Mueller <smuel...@chronox.de> wrote:
> 
> > Am Montag, 15. Januar 2018, 13:59:27 CET schrieb Jonathan Cameron:
> > 
> > Hi Jonathan,  
> > > > 
> > > > But there may be hardware that cannot/will not track such dependencies.
> > > > Yet, it has multiple hardware queues. Such hardware can still handle
> > > > parallel requests when they are totally independent from each other. For
> > > > such a case, AF_ALG currently has no support, because it lacks the
> > > > support for setting multiple IVs for the multiple concurrent calls.
> > > 
> > > Agreed, something like your new support is needed - I just suspect we need
> > > a level between one socket one iv chain and every IOCB with own IV and
> > > right now the only way to hit that balance is to have a separate socket
> > > for each IV chain.  Not exactly efficient use of resources though it will
> > > work.
> > 
> > How would you propose such support via AF_ALG? 
> > Wouldn't it be possible to 
> > arrange the IOVEC array in user space appropriately before calling the 
> > AF_ALG 
> > interface? In this case, I would still see that the current AF_ALG (plus 
> > the 
> > patch) would support all use cases I am aware of.  
> 
> I'm not sure how that would work, but maybe I'm missing something - are you
> suggesting we could contrive the situation where the kernel side can tell
> it is getting the same IV multiple times and hence know that it should chain
> it?  We are talking streaming here - we don't have the data for the
> later elements when the first ones are queued up.
> 
> One approach to handling token based IV - where the token refers to an IV 
> without
> being it's value would be to add another flag similar to the one you used for
> inline IV.
> 
> You would then set the IV as you have done, but also provide a magic value by
> which to track the chain.  Later IOCBs using the same IV chain would just
> provide the magic token.
> 
> You'd also need some way of retiring the IV eventually once you were done
> with it or ultimately you would run out of resources.
> 
> >   
> > > > > So the only one left is the case 3 above where the hardware is capable
> > > > > of doing the dependency tracking.
> > > > > 
> > > > > We can support that in two ways but one is rather heavyweight in terms
> > > > > of
> > > > > resources.
> > > > > 
> > > > > 1) Whenever we want to allocate a new context we spin up a new socket
> > > > > and
> > > > > effectively associate a single IV with that (and it's chained updates)
> > > > > much
> > > > > like we do in the existing interface.
> > > > 
> > > > I would not like that because it is too heavyweight. Moreover, 
> > > > considering
> > > > the kernel crypto API logic, a socket is the user space equivalent of a
> > > > TFM. I.e. for setting an IV, you do not need to re-instantiate a TFM.   
> > > >  
> > > 
> > > Agreed, though as I mention above if you have multiple processes you
> > > probably want to give them their own resources anyway (own socket and
> > > probably hardware queue if you can spare one) so as to avoid denial of
> > > service from one to another.
> > 
> > That is for sure, different processes shall never share a socket as 
> > otherwise 
> > one can obtain data belonging to the other which would violate process 
> > isolation.
> >   
> > > > > 2) We allow a token based tracking of IVs.  So userspace code 
> > > > > maintains
> > > > > a counter and tags ever message and the initial IV setup with that
> > > > > counter.
> > > > 
> > > > I think the option I offer with the patch, we have an even more
> > > > lightweight
> > > > approach.
> > > 
> > > Except that I think you have to go all the way back to userspace - unless 
> > > I
> > > am missing the point - you can't have multiple elements of a stream queued
> > > up. Performance will stink if you have a small number of contexts and 
> > > can't
> > > keep the processing engines busy.  At the moment option 1 here is the only
> > > way to implement this.
> > 
> > What AF_ALG should do is to enable different

Re: [RFC] AF_ALG AIO and IV

2018-01-15 Thread Jonathan Cameron

On Mon, 15 Jan 2018 14:15:42 +0100
Stephan Mueller <smuel...@chronox.de> wrote:

> Am Montag, 15. Januar 2018, 13:59:27 CET schrieb Jonathan Cameron:
> 
> Hi Jonathan,
> > > 
> > > But there may be hardware that cannot/will not track such dependencies.
> > > Yet, it has multiple hardware queues. Such hardware can still handle
> > > parallel requests when they are totally independent from each other. For
> > > such a case, AF_ALG currently has no support, because it lacks the
> > > support for setting multiple IVs for the multiple concurrent calls.  
> > 
> > Agreed, something like your new support is needed - I just suspect we need
> > a level between one socket one iv chain and every IOCB with own IV and
> > right now the only way to hit that balance is to have a separate socket
> > for each IV chain.  Not exactly efficient use of resources though it will
> > work.  
> 
> How would you propose such support via AF_ALG? 
> Wouldn't it be possible to 
> arrange the IOVEC array in user space appropriately before calling the AF_ALG 
> interface? In this case, I would still see that the current AF_ALG (plus the 
> patch) would support all use cases I am aware of.

I'm not sure how that would work, but maybe I'm missing something - are you
suggesting we could contrive the situation where the kernel side can tell
it is getting the same IV multiple times and hence know that it should chain
it?  We are talking streaming here - we don't have the data for the
later elements when the first ones are queued up.

One approach to handling token based IV - where the token refers to an IV 
without
being it's value would be to add another flag similar to the one you used for
inline IV.

You would then set the IV as you have done, but also provide a magic value by
which to track the chain.  Later IOCBs using the same IV chain would just
provide the magic token.

You'd also need some way of retiring the IV eventually once you were done
with it or ultimately you would run out of resources.

> 
> > > > So the only one left is the case 3 above where the hardware is capable
> > > > of doing the dependency tracking.
> > > > 
> > > > We can support that in two ways but one is rather heavyweight in terms
> > > > of
> > > > resources.
> > > > 
> > > > 1) Whenever we want to allocate a new context we spin up a new socket
> > > > and
> > > > effectively associate a single IV with that (and it's chained updates)
> > > > much
> > > > like we do in the existing interface.  
> > > 
> > > I would not like that because it is too heavyweight. Moreover, considering
> > > the kernel crypto API logic, a socket is the user space equivalent of a
> > > TFM. I.e. for setting an IV, you do not need to re-instantiate a TFM.  
> > 
> > Agreed, though as I mention above if you have multiple processes you
> > probably want to give them their own resources anyway (own socket and
> > probably hardware queue if you can spare one) so as to avoid denial of
> > service from one to another.  
> 
> That is for sure, different processes shall never share a socket as otherwise 
> one can obtain data belonging to the other which would violate process 
> isolation.
> 
> > > > 2) We allow a token based tracking of IVs.  So userspace code maintains
> > > > a counter and tags ever message and the initial IV setup with that
> > > > counter.  
> > > 
> > > I think the option I offer with the patch, we have an even more
> > > lightweight
> > > approach.  
> > 
> > Except that I think you have to go all the way back to userspace - unless I
> > am missing the point - you can't have multiple elements of a stream queued
> > up. Performance will stink if you have a small number of contexts and can't
> > keep the processing engines busy.  At the moment option 1 here is the only
> > way to implement this.  
> 
> What AF_ALG should do is to enable different vendors like yourself to use the 
> most appropriate solution. AF_ALG shall not limit users in any way.

Agreed, but we also need to have some consistency for userspace to have some
awareness of what it should be using.  Last thing we want is lots of higher
level software having to have knowledge of the encryption hardware underneath.
Hence I think we should keep the options to the minimum possible or put the
burden on drivers that must play well with all options (be it not as efficiently
for the ones that work badly for them).

> 
> Thus, AF_ALG allows multiple sockets, if desired. It allows a stream usage 
> with one setiv call applicable to multiple

Re: [RFC] AF_ALG AIO and IV

2018-01-15 Thread Jonathan Cameron

On Mon, 15 Jan 2018 13:07:16 +0100
Stephan Mueller <smuel...@chronox.de> wrote:

> Am Montag, 15. Januar 2018, 12:05:03 CET schrieb Jonathan Cameron:
> 
> Hi Jonathan,
> 
> > On Fri, 12 Jan 2018 14:21:15 +0100
> > 
> > Stephan Mueller <smuel...@chronox.de> wrote:  
> > > Hi,
> > > 
> > > The kernel crypto API requires the caller to set an IV in the request data
> > > structure. That request data structure shall define one particular cipher
> > > operation. During the cipher operation, the IV is read by the cipher
> > > implementation and eventually the potentially updated IV (e.g. in case of
> > > CBC) is written back to the memory location the request data structure
> > > points to.  
> > Silly question, are we obliged to always write it back?  
> 
> Well, in general, yes. The AF_ALG interface should allow a "stream" mode of 
> operation:
> 
> socket
> accept
> setsockopt(setkey)
> sendmsg(IV, data)
> recvmsg(data)
> sendmsg(data)
> recvmsg(data)
> ..
> 
> For such synchronous operation, I guess it is clear that the IV needs to be 
> written back.
> 
> If you want to play with it, use the "stream" API of libkcapi and the 
> associated test cases.

Thanks for the pointer - will do.

> 
> > In CBC it is
> > obviously the same as the last n bytes of the encrypted message.  I guess
> > for ease of handling it makes sense to do so though.
> >   
> > > AF_ALG allows setting the IV with a sendmsg request, where the IV is
> > > stored in the AF_ALG context that is unique to one particular AF_ALG
> > > socket. Note the analogy: an AF_ALG socket is like a TFM where one
> > > recvmsg operation uses one request with the TFM from the socket.
> > > 
> > > AF_ALG these days supports AIO operations with multiple IOCBs. I.e. with
> > > one recvmsg call, multiple IOVECs can be specified. Each individual IOCB
> > > (derived from one IOVEC) implies that one request data structure is
> > > created with the data to be processed by the cipher implementation. The
> > > IV that was set with the sendmsg call is registered with the request data
> > > structure before the cipher operation.
> > > 
> > > In case of an AIO operation, the cipher operation invocation returns
> > > immediately, queuing the request to the hardware. While the AIO request is
> > > processed by the hardware, recvmsg processes the next IOVEC for which
> > > another request is created. Again, the IV buffer from the AF_ALG socket
> > > context is registered with the new request and the cipher operation is
> > > invoked.
> > > 
> > > You may now see that there is a potential race condition regarding the IV
> > > handling, because there is *no* separate IV buffer for the different
> > > requests. This is nicely demonstrated with libkcapi using the following
> > > command which creates an AIO request with two IOCBs each encrypting one
> > > AES block in CBC mode:
> > > 
> > > kcapi  -d 2 -x 9  -e -c "cbc(aes)" -k
> > > 8d7dd9b0170ce0b5f2f8e1aa768e01e91da8bfc67fd486d081b28254c99eb423 -i
> > > 7fbc02ebf5b93322329df9bfccb635af -p 48981da18e4bb9ef7e2e3162d16b1910
> > > 
> > > When the first AIO request finishes before the 2nd AIO request is
> > > processed, the returned value is:
> > > 
> > > 8b19050f66582cb7f7e4b6c873819b7108afa0eaa7de29bac7d903576b674c32
> > > 
> > > I.e. two blocks where the IV output from the first request is the IV input
> > > to the 2nd block.
> > > 
> > > In case the first AIO request is not completed before the 2nd request
> > > commences, the result is two identical AES blocks (i.e. both use the same
> > > IV):
> > > 
> > > 8b19050f66582cb7f7e4b6c873819b718b19050f66582cb7f7e4b6c873819b71
> > > 
> > > This inconsistent result may even lead to the conclusion that there can be
> > > a memory corruption in the IV buffer if both AIO requests write to the IV
> > > buffer at the same time.
> > > 
> > > This needs to be solved somehow. I see the following options which I would
> > > like to have vetted by the community.  
> > 
> > Taking some 'entirely hypothetical' hardware with the following structure
> > for all my responses - it's about as flexible as I think we'll see in the
> > near future - though I'm sure someone has something more complex out there
> > :)
> > 
> > N hardware queues feeding M processing engines in a scheduler driven
> &g

Re: [RFC] AF_ALG AIO and IV

2018-01-15 Thread Jonathan Cameron

On Fri, 12 Jan 2018 14:21:15 +0100
Stephan Mueller  wrote:

> Hi,
> 
> The kernel crypto API requires the caller to set an IV in the request data 
> structure. That request data structure shall define one particular cipher 
> operation. During the cipher operation, the IV is read by the cipher 
> implementation and eventually the potentially updated IV (e.g. in case of 
> CBC) 
> is written back to the memory location the request data structure points to.

Silly question, are we obliged to always write it back? In CBC it is obviously
the same as the last n bytes of the encrypted message.  I guess for ease of
handling it makes sense to do so though.

> 
> AF_ALG allows setting the IV with a sendmsg request, where the IV is stored 
> in 
> the AF_ALG context that is unique to one particular AF_ALG socket. Note the 
> analogy: an AF_ALG socket is like a TFM where one recvmsg operation uses one 
> request with the TFM from the socket.
> 
> AF_ALG these days supports AIO operations with multiple IOCBs. I.e. with one 
> recvmsg call, multiple IOVECs can be specified. Each individual IOCB (derived 
> from one IOVEC) implies that one request data structure is created with the 
> data to be processed by the cipher implementation. The IV that was set with 
> the sendmsg call is registered with the request data structure before the 
> cipher operation.
> 
> In case of an AIO operation, the cipher operation invocation returns 
> immediately, queuing the request to the hardware. While the AIO request is 
> processed by the hardware, recvmsg processes the next IOVEC for which another 
> request is created. Again, the IV buffer from the AF_ALG socket context is 
> registered with the new request and the cipher operation is invoked.
> 
> You may now see that there is a potential race condition regarding the IV 
> handling, because there is *no* separate IV buffer for the different 
> requests. 
> This is nicely demonstrated with libkcapi using the following command which 
> creates an AIO request with two IOCBs each encrypting one AES block in CBC 
> mode:
> 
> kcapi  -d 2 -x 9  -e -c "cbc(aes)" -k 
> 8d7dd9b0170ce0b5f2f8e1aa768e01e91da8bfc67fd486d081b28254c99eb423 -i 
> 7fbc02ebf5b93322329df9bfccb635af -p 48981da18e4bb9ef7e2e3162d16b1910
> 
> When the first AIO request finishes before the 2nd AIO request is processed, 
> the returned value is:
> 
> 8b19050f66582cb7f7e4b6c873819b7108afa0eaa7de29bac7d903576b674c32
> 
> I.e. two blocks where the IV output from the first request is the IV input to 
> the 2nd block.
> 
> In case the first AIO request is not completed before the 2nd request 
> commences, the result is two identical AES blocks (i.e. both use the same IV):
> 
> 8b19050f66582cb7f7e4b6c873819b718b19050f66582cb7f7e4b6c873819b71
> 
> This inconsistent result may even lead to the conclusion that there can be a 
> memory corruption in the IV buffer if both AIO requests write to the IV 
> buffer 
> at the same time.
> 
> This needs to be solved somehow. I see the following options which I would 
> like to have vetted by the community.
>

Taking some 'entirely hypothetical' hardware with the following structure
for all my responses - it's about as flexible as I think we'll see in the
near future - though I'm sure someone has something more complex out there :)

N hardware queues feeding M processing engines in a scheduler driven fashion.
Actually we might have P sets of these, but load balancing and tracking and
transferring contexts between these is a complexity I think we can ignore.
If you want to use more than one of these P you'll just have to handle it
yourself in userspace.  Note messages may be shorter than IOCBs which
raises another question I've been meaning to ask.  Are all crypto algorithms
obliged to run unlimited length IOCBs?

If there are M messages in a particular queue and none elsewhere it is
capable of processing them all at once (and perhaps returning out of order but
we can fudge them back in order in the driver to avoid that additional
complexity from an interface point of view).

So I'm going to look at this from the hardware point of view - you have
well addressed software management above.

Three ways context management can be handled (in CBC this is basically just
the IV).

1. Each 'work item' queued on a hardware queue has it's IV embedded with the
data.  This requires external synchronization if we are chaining across
multiple 'work items' - note the hardware may have restrictions that mean
it has to split large pieces of data up to encrypt them.  Not all hardware
may support per 'work item' IVs (I haven't done a survey to find out if
everyone does...)

2. Each queue has a context assigned.  We get a new queue whenever we want
to have a different context.  Runs out eventually but our hypothetical
hardware may support a lot of queues.  Note this version could be 'faked'
by putting a cryptoengine queue on the front of the hardware queues.

3. The hardware supports IV

Re: AF_ALG: skb limits

2018-01-15 Thread Jonathan Cameron

On Sat, 13 Jan 2018 15:04:20 +0100
Stephan Müller <smuel...@chronox.de> wrote:

> Am Dienstag, 12. Dezember 2017, 14:59:21 CET schrieb Jonathan Cameron:
>
Hi Stephan,
 
> Hi Jonathan,
> 
> > On Fri, 8 Dec 2017 13:43:20 +0100
> > 
> > Stephan Mueller <smuel...@chronox.de> wrote:  
> > > Am Freitag, 8. Dezember 2017, 12:39:06 CET schrieb Jonathan Cameron:
> > > 
> > > Hi Jonathan,
> > >   
> > > > As a heads up, the other nasties we've found so far are around hitting
> > > > limits on the various socket buffers.  When you run into those you can
> > > > end
> > > > up with parts of the data to be encrypted going through without it being
> > > > obvious.  
> > > 
> > > The entire code uses sock_alloc to prevent user space to chew up kernel
> > > memory. I am aware that if you have a too low skb buffer limit, parts of
> > > the cipher operation will not succeed as a malloc will fail.
> > > 
> > > But that is returned with an error to user space.  
> > 
> > I 'think' there are some cases where you don't get an error to userspace -
> > or at least not directly.
> > 
> > In af_alg_get_rsgl, we have an af_alg_readable(sk) call which simply breaks
> > out of the loop - leaving us with len set to potentially part of the
> > required length - without setting a appropriate return value.
> > 
> > I can't immediately see how this one is reported to userspace.  
> 
> af_alg_get_rsgl is used to calculate and obtain the length of the buffers 
> receiving the result (what is called RX buffers). That length is used for the 
> cipher operation and finally reported to user space.
> 
> For example (TX is the buffer given to the kernel during sendmsg, RX during 
> recvmsg) where the example disregards the alignment to block size:
> 
> sendmsg TX: 1000 bytes
> recvmsg RX: 100 bytes given by user space, 100 bytes obtained by kernel
>   => recvmsg returns 100  
> recvmsg RX: 1000 bytes given by user space, 900 bytes obtained by kernel
>   => recvmsg returns 900  
> 
> Note, the TX and RX buffer sizes may *not* be identical or even related. User 
> space may send one large input block as TX, but "process" it with numerous 
> small recvmsg calls. Each recvmsg call defines the cipher operation size. 
> This 
> is of particular importance for an AEAD cipher.
Sure.
> > 
> > At least, with my driver, the first we see is an error returned by the
> > processing engine. That ultimately gets reported to userspace the aio
> > path.  
> 
> So, you say that the error is returned. Isn't that what you want or expect?

Given the issue is detected at sending to the device I would expect it to
be reported then not when we get corrupt data back from the device.  Note
this isn't the crypto core detecting it but the hardware picking up on a 
malformed
configuration.  There could easily be other hardware out there that would not
detect this at all and would simply, transparently, return the wrong value.

Now it's possible it would also be detected as a wrong length at the receive
end, but I'm not getting that far.  I would have expected the system to have
not queued it up for processing in the first place if not all the data is
there (because we have no space for the result).

Jonathan

> > 
> > I'm chasing a particularly evil bug in my driver at the moment that is
> > locking up the CPU and IOMMUs - if I ever figure that one out I'll run some
> > fuzzing around these limits and see if we can find other cases.  
> 
> Ciao
> Stephan
> 
>

Re: [PATCH] crypto: Fix race around ctx->rcvused by making it atomic_t

2017-12-19 Thread Jonathan Cameron

On Tue, 19 Dec 2017 12:11:19 +0100
Stephan Mueller <smuel...@chronox.de> wrote:

> Am Dienstag, 19. Dezember 2017, 11:31:22 CET schrieb Jonathan Cameron:
> 
> Hi Jonathan,
> 
> > This variable was increased and decreased without any protection.
> > Result was an occasional misscount and negative wrap around resulting
> > in false resource allocation failures.
> > 
> > Signed-off-by: Jonathan Cameron <jonathan.came...@huawei.com>
> > Fixes: 2d97591ef43d ("crypto: af_alg - consolidation of duplicate code")
> > ---
> > Resending due to incompetence on my part - sorry all!
> > 
> > The fixes tag is probably not ideal as I'm not 100% sure when this actually
> > became a bug.  The code in question was introduced in
> > 
> > Fixes: e870456d8e7c ("crypto: algif_skcipher - overhaul memory management")
> > and related.
> > rcvused moved in
> > Fixes: 2d97591ef43d ("crypto: af_alg - consolidation of duplicate code")
> > So I have gone with the latter option as that is where it will cleanly
> > apply. However, it probably doesn't matter as both are present only in the
> > 4.14 final kernel.
> > 
> >  crypto/af_alg.c | 4 ++--
> >  crypto/algif_aead.c | 2 +-
> >  crypto/algif_skcipher.c | 2 +-
> >  include/crypto/if_alg.h | 5 +++--
> >  4 files changed, 7 insertions(+), 6 deletions(-)
> > 
> > diff --git a/crypto/af_alg.c b/crypto/af_alg.c
> > index 8612aa36a3ef..759cfa678c04 100644
> > --- a/crypto/af_alg.c
> > +++ b/crypto/af_alg.c
> > @@ -664,7 +664,7 @@ void af_alg_free_areq_sgls(struct af_alg_async_req
> > *areq) unsigned int i;
> > 
> > list_for_each_entry_safe(rsgl, tmp, >rsgl_list, list) {
> > -   ctx->rcvused -= rsgl->sg_num_bytes;
> > +   atomic_sub(rsgl->sg_num_bytes, >rcvused);
> > af_alg_free_sg(>sgl);
> > list_del(>list);
> > if (rsgl != >first_rsgl)
> > @@ -1173,7 +1173,7 @@ int af_alg_get_rsgl(struct sock *sk, struct msghdr
> > *msg, int flags,
> > 
> > areq->last_rsgl = rsgl;
> > len += err;
> > -   ctx->rcvused += err;
> > +   atomic_add(err, >rcvused);
> > rsgl->sg_num_bytes = err;
> > iov_iter_advance(>msg_iter, err);
> > }
> > diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c
> > index db9378a45296..d3da3b0869f5 100644
> > --- a/crypto/algif_aead.c
> > +++ b/crypto/algif_aead.c
> > @@ -550,7 +550,7 @@ static int aead_accept_parent_nokey(void *private,
> > struct sock *sk) INIT_LIST_HEAD(>tsgl_list);
> > ctx->len = len;
> > ctx->used = 0;
> > -   ctx->rcvused = 0;
> > +   atomic_set(>rcvused, 0);
> 
> I think ATOMIC_INIT(0) is more suitable here.

It's ugly to use it to assign a dynamic element like this.

ctx->rcvused = (atomic_t)ATOMIC_INIT(0);

Need the cast to avoid getting 
error: expected expression before '{' token
#define ATOMIC_INIT(i) { (i) }
There are only two drivers in the kernel doing this vs
a lot doing initialization using the atomic_set option.

I'm happy to change it though if you would prefer.

> 
> > ctx->more = 0;
> > ctx->merge = 0;
> > ctx->enc = 0;
> > diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
> > index c7c75ef41952..a5b4898f625a 100644
> > --- a/crypto/algif_skcipher.c
> > +++ b/crypto/algif_skcipher.c
> > @@ -388,7 +388,7 @@ static int skcipher_accept_parent_nokey(void *private,
> > struct sock *sk) INIT_LIST_HEAD(>tsgl_list);
> > ctx->len = len;
> > ctx->used = 0;
> > -   ctx->rcvused = 0;
> > +   atomic_set(>rcvused, 0);
> 
> dto.
> 
> > ctx->more = 0;
> > ctx->merge = 0;
> > ctx->enc = 0;
> > diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h
> > index 38d9c5861ed8..f38227a78eae 100644
> > --- a/include/crypto/if_alg.h
> > +++ b/include/crypto/if_alg.h
> > @@ -18,6 +18,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> > 
> >  #include 
> > @@ -150,7 +151,7 @@ struct af_alg_ctx {
> > struct crypto_wait wait;
> > 
> > size_t used;
> > -   size_t rcvused;
> > +   atomic_t rcvused;
> > 
> > bool more;
> > bool merge;
> > @@ -215,7 +216,7 @@ static inline int af_alg_rcvbuf(struct sock *sk)
> > struct af_alg_ctx *ctx = ask->private;
> > 
> > return max_t(int, max_t(int, sk->sk_rcvbuf & PAGE_MASK, PAGE_SIZE) -
> > - ctx->rcvused, 0);
> > +atomic_read(>rcvused), 0);
> >  }
> > 
> >  /**
> 
> Other than the comments above:
> 
> Reviewed-by: Stephan Mueller <smuel...@chronox.de>
> 
> 
> Ciao
> Stephan

Re: [Plinth PATCH]{topost} crypto: Fix race around ctx->rcvused by making it atomic_t

2017-12-19 Thread Jonathan Cameron

Sorry idiot moment.  Sent that out with our internal markings.
Resending shortly.

On Tue, 19 Dec 2017 10:27:24 +
Jonathan Cameron <jonathan.came...@huawei.com> wrote:

> This variable was increased and decreased without any protection.
> Result was an occasional misscount and negative wrap around resulting
> in false resource allocation failures.
> 
> Signed-off-by: Jonathan Cameron <jonathan.came...@huawei.com>
> Fixes: 2d97591ef43d ("crypto: af_alg - consolidation of duplicate code")
> ---
> The fixes tag is probably not ideal as I'm not 100% sure when this actually
> became a bug.  The code in question was introduced in
> 
> Fixes: e870456d8e7c ("crypto: algif_skcipher - overhaul memory management")
> and related.
> rcvused moved in 
> Fixes: 2d97591ef43d ("crypto: af_alg - consolidation of duplicate code")
> So I have gone with the latter option as that is where it will cleanly apply.
> However, it probably doesn't matter as both are present only in the 4.14
> final kernel.
> 
>  crypto/af_alg.c | 4 ++--
>  crypto/algif_aead.c | 2 +-
>  crypto/algif_skcipher.c | 2 +-
>  include/crypto/if_alg.h | 5 +++--
>  4 files changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/crypto/af_alg.c b/crypto/af_alg.c
> index 8612aa36a3ef..759cfa678c04 100644
> --- a/crypto/af_alg.c
> +++ b/crypto/af_alg.c
> @@ -664,7 +664,7 @@ void af_alg_free_areq_sgls(struct af_alg_async_req *areq)
>   unsigned int i;
>  
>   list_for_each_entry_safe(rsgl, tmp, >rsgl_list, list) {
> - ctx->rcvused -= rsgl->sg_num_bytes;
> + atomic_sub(rsgl->sg_num_bytes, >rcvused);
>   af_alg_free_sg(>sgl);
>   list_del(>list);
>   if (rsgl != >first_rsgl)
> @@ -1173,7 +1173,7 @@ int af_alg_get_rsgl(struct sock *sk, struct msghdr 
> *msg, int flags,
>  
>   areq->last_rsgl = rsgl;
>   len += err;
> - ctx->rcvused += err;
> + atomic_add(err, >rcvused);
>   rsgl->sg_num_bytes = err;
>   iov_iter_advance(>msg_iter, err);
>   }
> diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c
> index db9378a45296..d3da3b0869f5 100644
> --- a/crypto/algif_aead.c
> +++ b/crypto/algif_aead.c
> @@ -550,7 +550,7 @@ static int aead_accept_parent_nokey(void *private, struct 
> sock *sk)
>   INIT_LIST_HEAD(>tsgl_list);
>   ctx->len = len;
>   ctx->used = 0;
> - ctx->rcvused = 0;
> + atomic_set(>rcvused, 0);
>   ctx->more = 0;
>   ctx->merge = 0;
>   ctx->enc = 0;
> diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
> index c7c75ef41952..a5b4898f625a 100644
> --- a/crypto/algif_skcipher.c
> +++ b/crypto/algif_skcipher.c
> @@ -388,7 +388,7 @@ static int skcipher_accept_parent_nokey(void *private, 
> struct sock *sk)
>   INIT_LIST_HEAD(>tsgl_list);
>   ctx->len = len;
>   ctx->used = 0;
> - ctx->rcvused = 0;
> + atomic_set(>rcvused, 0);
>   ctx->more = 0;
>   ctx->merge = 0;
>   ctx->enc = 0;
> diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h
> index 38d9c5861ed8..f38227a78eae 100644
> --- a/include/crypto/if_alg.h
> +++ b/include/crypto/if_alg.h
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include 
> @@ -150,7 +151,7 @@ struct af_alg_ctx {
>   struct crypto_wait wait;
>  
>   size_t used;
> - size_t rcvused;
> + atomic_t rcvused;
>  
>   bool more;
>   bool merge;
> @@ -215,7 +216,7 @@ static inline int af_alg_rcvbuf(struct sock *sk)
>   struct af_alg_ctx *ctx = ask->private;
>  
>   return max_t(int, max_t(int, sk->sk_rcvbuf & PAGE_MASK, PAGE_SIZE) -
> -   ctx->rcvused, 0);
> +  atomic_read(>rcvused), 0);
>  }
>  
>  /**

[Plinth PATCH]{topost} crypto: Fix race around ctx->rcvused by making it atomic_t

2017-12-19 Thread Jonathan Cameron

This variable was increased and decreased without any protection.
Result was an occasional misscount and negative wrap around resulting
in false resource allocation failures.

Signed-off-by: Jonathan Cameron <jonathan.came...@huawei.com>
Fixes: 2d97591ef43d ("crypto: af_alg - consolidation of duplicate code")
---
The fixes tag is probably not ideal as I'm not 100% sure when this actually
became a bug.  The code in question was introduced in

Fixes: e870456d8e7c ("crypto: algif_skcipher - overhaul memory management")
and related.
rcvused moved in 
Fixes: 2d97591ef43d ("crypto: af_alg - consolidation of duplicate code")
So I have gone with the latter option as that is where it will cleanly apply.
However, it probably doesn't matter as both are present only in the 4.14
final kernel.

 crypto/af_alg.c | 4 ++--
 crypto/algif_aead.c | 2 +-
 crypto/algif_skcipher.c | 2 +-
 include/crypto/if_alg.h | 5 +++--
 4 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index 8612aa36a3ef..759cfa678c04 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -664,7 +664,7 @@ void af_alg_free_areq_sgls(struct af_alg_async_req *areq)
unsigned int i;
 
list_for_each_entry_safe(rsgl, tmp, >rsgl_list, list) {
-   ctx->rcvused -= rsgl->sg_num_bytes;
+   atomic_sub(rsgl->sg_num_bytes, >rcvused);
af_alg_free_sg(>sgl);
list_del(>list);
if (rsgl != >first_rsgl)
@@ -1173,7 +1173,7 @@ int af_alg_get_rsgl(struct sock *sk, struct msghdr *msg, 
int flags,
 
areq->last_rsgl = rsgl;
len += err;
-   ctx->rcvused += err;
+   atomic_add(err, >rcvused);
rsgl->sg_num_bytes = err;
iov_iter_advance(>msg_iter, err);
}
diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c
index db9378a45296..d3da3b0869f5 100644
--- a/crypto/algif_aead.c
+++ b/crypto/algif_aead.c
@@ -550,7 +550,7 @@ static int aead_accept_parent_nokey(void *private, struct 
sock *sk)
INIT_LIST_HEAD(>tsgl_list);
ctx->len = len;
ctx->used = 0;
-   ctx->rcvused = 0;
+   atomic_set(>rcvused, 0);
ctx->more = 0;
ctx->merge = 0;
ctx->enc = 0;
diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
index c7c75ef41952..a5b4898f625a 100644
--- a/crypto/algif_skcipher.c
+++ b/crypto/algif_skcipher.c
@@ -388,7 +388,7 @@ static int skcipher_accept_parent_nokey(void *private, 
struct sock *sk)
INIT_LIST_HEAD(>tsgl_list);
ctx->len = len;
ctx->used = 0;
-   ctx->rcvused = 0;
+   atomic_set(>rcvused, 0);
ctx->more = 0;
ctx->merge = 0;
ctx->enc = 0;
diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h
index 38d9c5861ed8..f38227a78eae 100644
--- a/include/crypto/if_alg.h
+++ b/include/crypto/if_alg.h
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -150,7 +151,7 @@ struct af_alg_ctx {
struct crypto_wait wait;
 
size_t used;
-   size_t rcvused;
+   atomic_t rcvused;
 
bool more;
bool merge;
@@ -215,7 +216,7 @@ static inline int af_alg_rcvbuf(struct sock *sk)
struct af_alg_ctx *ctx = ask->private;
 
return max_t(int, max_t(int, sk->sk_rcvbuf & PAGE_MASK, PAGE_SIZE) -
- ctx->rcvused, 0);
+atomic_read(>rcvused), 0);
 }
 
 /**
-- 
2.11.0

RFC: Crypto: Race in updating available writable socket memory.

2017-12-18 Thread Jonathan Cameron

Hi All,

I came across this one but am not sure how heavy weight a fix we want to apply.

I was getting failures on af_alg_readable in af_alg_get_rsgl which made little
sense as I had wmem set to a very large value.  This was caused by a race 
between
ctx->rcvused += err in af_alg_get_rsgl and
ctx->rcvused -= rsgl->sg_num_bytes in af_alg_free_areq_sgls which was being
called from an interrupt thread.  It was wrapping in a downwards direction once
in a large number of cycles.

Testing was using the AIO interface and libkcapi on the userspace side and a
(hopefully) soon to be posted new Hisilicon 161x crypto accelerator driver.

So it seems that this field needs some protection.
1. Could make it an atomic_t (works fine, but feels rather inelegant!)
2. Leverage the complexity seen in sock.c if it is needed...

So the question is whether making it an atomic_t is the right way to go.

Thanks,

Jonathan

Re: AF_ALG: skb limits

2017-12-12 Thread Jonathan Cameron

On Fri, 8 Dec 2017 13:43:20 +0100
Stephan Mueller <smuel...@chronox.de> wrote:

> Am Freitag, 8. Dezember 2017, 12:39:06 CET schrieb Jonathan Cameron:
> 
> Hi Jonathan,
> 
> > 
> > As a heads up, the other nasties we've found so far are around hitting
> > limits on the various socket buffers.  When you run into those you can end
> > up with parts of the data to be encrypted going through without it being
> > obvious.
> >   
> 
> The entire code uses sock_alloc to prevent user space to chew up kernel 
> memory. I am aware that if you have a too low skb buffer limit, parts of the 
> cipher operation will not succeed as a malloc will fail.
> 
> But that is returned with an error to user space. 
I 'think' there are some cases where you don't get an error to userspace - or
at least not directly.

In af_alg_get_rsgl, we have an af_alg_readable(sk) call which simply breaks out
of the loop - leaving us with len set to potentially part of the required length
- without setting a appropriate return value.

I can't immediately see how this one is reported to userspace.

At least, with my driver, the first we see is an error returned by the 
processing
engine.  That ultimately gets reported to userspace the aio path.

I'm chasing a particularly evil bug in my driver at the moment that is locking
up the CPU and IOMMUs - if I ever figure that one out I'll run some fuzzing
around these limits and see if we can find other cases.

Jonathan

> If you observe such an 
> error, the entire message you wanted to read with recvmsg must be considered 
> tainted (i.e. you do not know which part of the message has been encrypted or 
> not). Thus, the recvmsg must be invoked again on the same buffer sent to the 
> kernel if you want to ensure you encrypted the data.
> 
> Could you please help me understand where that functionality fails?
> 
> PS: If you want to give more memory to your sockets, you have to tweak /proc/
> sys/net/core/optmem_max.
> 
> Ciao
> Stephan

Re: [PATCH] crypto: AF_ALG - fix race accessing cipher request

2017-12-08 Thread Jonathan Cameron

On Fri, 8 Dec 2017 11:50:37 +0100
Stephan Müller <smuel...@chronox.de> wrote:

> Hi Herbert,
> 
> This patch would go on top of 7d2c3f54e6f646887d019faa45f35d6fe9fe82ce
> "crypto: af_alg - remove locking in async callback" found in Linus' tree
> which is not yet in the cryptodev-2.6 tree.
> 
> In addition, this patch is already on top of the other patches discussed
> on this list fixing similar issues. I.e. depending in which order you apply
> the patches, there may be a hunk. In case you want me to rebase the patch,
> please let me know.
> 
> ---8<---
> When invoking an asynchronous cipher operation, the invocation of the
> callback may be performed before the subsequent operations in the
> initial code path are invoked. The callback deletes the cipher request
> data structure which implies that after the invocation of the
> asynchronous cipher operation, this data structure must not be accessed
> any more.
> 
> The setting of the return code size with the request data structure must
> therefore be moved before the invocation of the asynchronous cipher
> operation.
> 
> Fixes: e870456d8e7c ("crypto: algif_skcipher - overhaul memory management")
> Fixes: d887c52d6ae4 ("crypto: algif_aead - overhaul memory management")
> Reported-by: syzbot <syzkal...@googlegroups.com>
> Cc: <sta...@vger.kernel.org> # v4.14+
> Signed-off-by: Stephan Mueller <smuel...@chronox.de>
Acked-by: Jonathan Cameron <jonathan.came...@huawei.com>

Have an identical patch in my queue having observed this happening on our
hardware - though only had the skcipher case.

As a heads up, the other nasties we've found so far are around hitting limits
on the various socket buffers.  When you run into those you can end up with
parts of the data to be encrypted going through without it being obvious.

Will update on that properly once I've chased down a local problem that
is limiting the length of my test runs to a few million messages.

Jonathan

> ---
>  crypto/algif_aead.c | 10 +-
>  crypto/algif_skcipher.c | 10 +-
>  2 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c
> index 15e561dc47b5..d7ec2f4ebaf9 100644
> --- a/crypto/algif_aead.c
> +++ b/crypto/algif_aead.c
> @@ -291,6 +291,10 @@ static int _aead_recvmsg(struct socket *sock, struct 
> msghdr *msg,
>   /* AIO operation */
>   sock_hold(sk);
>   areq->iocb = msg->msg_iocb;
> +
> + /* Remember output size that will be generated. */
> + areq->outlen = outlen;
> +
>   aead_request_set_callback(>cra_u.aead_req,
> CRYPTO_TFM_REQ_MAY_BACKLOG,
> af_alg_async_cb, areq);
> @@ -298,12 +302,8 @@ static int _aead_recvmsg(struct socket *sock, struct 
> msghdr *msg,
>crypto_aead_decrypt(>cra_u.aead_req);
>  
>   /* AIO operation in progress */
> - if (err == -EINPROGRESS || err == -EBUSY) {
> - /* Remember output size that will be generated. */
> - areq->outlen = outlen;
> -
> + if (err == -EINPROGRESS || err == -EBUSY)
>   return -EIOCBQUEUED;
> - }
>  
>   sock_put(sk);
>   } else {
> diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
> index 6fb595cd63ac..baef9bfccdda 100644
> --- a/crypto/algif_skcipher.c
> +++ b/crypto/algif_skcipher.c
> @@ -125,6 +125,10 @@ static int _skcipher_recvmsg(struct socket *sock, struct 
> msghdr *msg,
>   /* AIO operation */
>   sock_hold(sk);
>   areq->iocb = msg->msg_iocb;
> +
> + /* Remember output size that will be generated. */
> + areq->outlen = len;
> +
>   skcipher_request_set_callback(>cra_u.skcipher_req,
> CRYPTO_TFM_REQ_MAY_SLEEP,
> af_alg_async_cb, areq);
> @@ -133,12 +137,8 @@ static int _skcipher_recvmsg(struct socket *sock, struct 
> msghdr *msg,
>   crypto_skcipher_decrypt(>cra_u.skcipher_req);
>  
>   /* AIO operation in progress */
> - if (err == -EINPROGRESS || err == -EBUSY) {
> - /* Remember output size that will be generated. */
> - areq->outlen = len;
> -
> + if (err == -EINPROGRESS || err == -EBUSY)
>   return -EIOCBQUEUED;
> - }
>  
>   sock_put(sk);
>   } else {

Re: [PATCH v3] crypto: AF_ALG - remove locking in async callback

2017-11-24 Thread Jonathan Cameron

On Fri, 24 Nov 2017 17:04:19 +0100
Stephan Mueller  wrote:

> Am Freitag, 24. November 2017, 08:37:39 CET schrieb Herbert Xu:
> 
> Hi Herbert,
> 
> > On Fri, Nov 10, 2017 at 01:20:55PM +0100, Stephan Müller wrote:  
> > > The code paths protected by the socket-lock do not use or modify the
> > > socket in a non-atomic fashion. The actions pertaining the socket do not
> > > even need to be handled as an atomic operation. Thus, the socket-lock
> > > can be safely ignored.
> > > 
> > > This fixes a bug regarding scheduling in atomic as the callback function
> > > may be invoked in interrupt context.
> > > 
> > > In addition, the sock_hold is moved before the AIO encrypt/decrypt
> > > operation to ensure that the socket is always present. This avoids a
> > > tiny race window where the socket is unprotected and yet used by the AIO
> > > operation.
> > > 
> > > Finally, the release of resources for a crypto operation is moved into a
> > > common function of af_alg_free_resources.
> > > 
> > > Fixes: e870456d8e7c8 ("crypto: algif_skcipher - overhaul memory
> > > management") Fixes: d887c52d6ae43 ("crypto: algif_aead - overhaul memory
> > > management") Reported-by: Romain Izard 
> > > Signed-off-by: Stephan Mueller   
> > 
> > Patch applied.  Thanks.  
> 
> Thanks a lot.
> 
> Would it make sense to feed it to stable?
> 
> Ciao
> Stephan
My view would be definitely.  Ran into this precise issue whilst testing
a new driver 4.14 today...

Jonathan

Re: New Linux accelerators discussion list [was: Re: Fostering linux community collaboration on hardware accelerators]

2017-10-17 Thread Jonathan Cameron

+ linux-accelerat...@lists.ozlabs.org

Seems sensible to have this email actually go to the new list so
at least it appears in the archive.

Sorry all, I should have thought of this before pressing send,

Jonathan

On Tue, 17 Oct 2017 13:48:10 +0100
Jonathan Cameron <jonathan.came...@huawei.com> wrote:

> On Tue, 17 Oct 2017 11:00:40 +1100
> Andrew Donnellan <andrew.donnel...@au1.ibm.com> wrote:
> 
> > On 17/10/17 01:07, Jonathan Cameron wrote:  
> > > 
> > > 
> > >>> So as ever with a linux community focusing on a particular topic, the
> > >>> obvious solution is a mailing list. There are a number of options on how
> > >>> do this.
> > >>>
> > >>> 1) Ask one of the industry bodies to host? Who?
> > >>>
> > >>> 2) Put together a compelling argument for 
> > >>> linux-accelerat...@vger.kernel.org
> > >>> as probably the most generic location for such a list.
> > >>
> > >> Happy to offer linux-accelerat...@lists.ozlabs.org, which I can get set
> > >> up immediately (and if we want patchwork, patchwork.ozlabs.org is
> > >> available as always, no matter where the list is hosted).
> > >>
> > > 
> > > That would be great! Thanks for doing this. Much easier to find out what
> > > such a list is useful for by the practical option of having a list and
> > > see what people do with it.
> > 
> > [+ LKML]
> > 
> > We now have a mailing list:
> > 
> >List:  linux-accelerat...@lists.ozlabs.org
> >Info:  https://lists.ozlabs.org/listinfo/linux-accelerators
> >Archives:  https://lists.ozlabs.org/pipermail/linux-accelerators
> > 
> > I haven't set up Patchwork as yet, but if people think that's a good 
> > idea I can get that done too.
> > 
> > 
> > Andrew
> >   
> 
> Thanks Andrew.
> 
> A quick summary of initial thoughts on scope of this list for anyone
> entering the discussion at this point.
> Note it will hopefully evolve in whatever direction people find helpful.
> This contains some suggestions not a barrier to wider scope!
> 
> There are a number of projects / applications involving what are
> termed hardware accelerators.  These include:
> * Traditional offload engines
>   - Crypto, compression, media transcoding and similar accelerators
>   - usecases including kTLS, Storage Encryption etc.
> * Dynamic FPGA type accelerators
> * ODP, DPDK and similar networking data plane - particularly dual stack
>   solutions where the kernel 'plays nicely' with userspace drivers.
> * AI Accelerators
> * Shared Virtual Memory (SVM) bus systems including Open-CAPI, CCIX etc
> * Fast data flow to/from userspace applications.
> 
> A number of existing project focus on these:
> * Mainline kernel drivers
>   - Numerous crypto drivers etc
>   - Open-CAPI
> * Various networking data plane activities
> * VFIO based and similar userspace drivers
> Hopefully this list can provide a broader forum where more general
> questions are being considered.
> 
> The discussion that lead to this list was that a number of people would
> like a general open forum on which to discuss ideas with scope beyond
> simply one kernel subsystem or one particular userspace framework.
> 
> Topics might include
> * RFCs and early reviews of new approaches.
> * Similar hardware - who is trying to solve the same problems?
> * What would we ideally want from new hardware iterations?
> * Hardware description - the question of how to chose a particular
>   crypto engine is very dependent on the particular data flows.
>   Sometimes hardware accelerators don't actually help due to overheads.
>   Understanding those barriers would be very useful.
> * Upstream paths - blockers and how to work with the communities to
>   overcome them.
> * Fostering stronger userspace communities to allow these accelerators to 
>   be easily harnessed.
>   - A number of projects have been highlighted in this thread
> OpenStack (cyborg project), openMP accelerator support 
> * Robustness / security of userspace frameworks.
> * Virtualisation of accelerators
> 
> Anyhow, this email was just meant to draw together some thoughts.
> It will be interesting to see what the list actually gets used for :)
> 
> Jonathan

Re: New Linux accelerators discussion list [was: Re: Fostering linux community collaboration on hardware accelerators]

2017-10-17 Thread Jonathan Cameron

On Tue, 17 Oct 2017 11:00:40 +1100
Andrew Donnellan <andrew.donnel...@au1.ibm.com> wrote:

> On 17/10/17 01:07, Jonathan Cameron wrote:
> > 
> >   
> >>> So as ever with a linux community focusing on a particular topic, the
> >>> obvious solution is a mailing list. There are a number of options on how
> >>> do this.
> >>>
> >>> 1) Ask one of the industry bodies to host? Who?
> >>>
> >>> 2) Put together a compelling argument for 
> >>> linux-accelerat...@vger.kernel.org
> >>> as probably the most generic location for such a list.  
> >>
> >> Happy to offer linux-accelerat...@lists.ozlabs.org, which I can get set
> >> up immediately (and if we want patchwork, patchwork.ozlabs.org is
> >> available as always, no matter where the list is hosted).
> >>  
> > 
> > That would be great! Thanks for doing this. Much easier to find out what
> > such a list is useful for by the practical option of having a list and
> > see what people do with it.  
> 
> [+ LKML]
> 
> We now have a mailing list:
> 
>List:  linux-accelerat...@lists.ozlabs.org
>Info:  https://lists.ozlabs.org/listinfo/linux-accelerators
>Archives:  https://lists.ozlabs.org/pipermail/linux-accelerators
> 
> I haven't set up Patchwork as yet, but if people think that's a good 
> idea I can get that done too.
> 
> 
> Andrew
> 

Thanks Andrew.

A quick summary of initial thoughts on scope of this list for anyone
entering the discussion at this point.
Note it will hopefully evolve in whatever direction people find helpful.
This contains some suggestions not a barrier to wider scope!

There are a number of projects / applications involving what are
termed hardware accelerators.  These include:
* Traditional offload engines
- Crypto, compression, media transcoding and similar accelerators
- usecases including kTLS, Storage Encryption etc.
* Dynamic FPGA type accelerators
* ODP, DPDK and similar networking data plane - particularly dual stack
  solutions where the kernel 'plays nicely' with userspace drivers.
* AI Accelerators
* Shared Virtual Memory (SVM) bus systems including Open-CAPI, CCIX etc
* Fast data flow to/from userspace applications.

A number of existing project focus on these:
* Mainline kernel drivers
- Numerous crypto drivers etc
- Open-CAPI
* Various networking data plane activities
* VFIO based and similar userspace drivers
Hopefully this list can provide a broader forum where more general
questions are being considered.

The discussion that lead to this list was that a number of people would
like a general open forum on which to discuss ideas with scope beyond
simply one kernel subsystem or one particular userspace framework.

Topics might include
* RFCs and early reviews of new approaches.
* Similar hardware - who is trying to solve the same problems?
* What would we ideally want from new hardware iterations?
* Hardware description - the question of how to chose a particular
  crypto engine is very dependent on the particular data flows.
  Sometimes hardware accelerators don't actually help due to overheads.
  Understanding those barriers would be very useful.
* Upstream paths - blockers and how to work with the communities to
  overcome them.
* Fostering stronger userspace communities to allow these accelerators to 
  be easily harnessed.
- A number of projects have been highlighted in this thread
  OpenStack (cyborg project), openMP accelerator support 
* Robustness / security of userspace frameworks.
* Virtualisation of accelerators

Anyhow, this email was just meant to draw together some thoughts.
It will be interesting to see what the list actually gets used for :)

Jonathan

Re: Fostering linux community collaboration on hardware accelerators

2017-10-16 Thread Jonathan Cameron



> > So as ever with a linux community focusing on a particular topic, the
> > obvious solution is a mailing list. There are a number of options on how
> > do this.
> > 
> > 1) Ask one of the industry bodies to host? Who?
> > 
> > 2) Put together a compelling argument for linux-accelerat...@vger.kernel.org
> > as probably the most generic location for such a list.  
> 
> Happy to offer linux-accelerat...@lists.ozlabs.org, which I can get set 
> up immediately (and if we want patchwork, patchwork.ozlabs.org is 
> available as always, no matter where the list is hosted).
> 

That would be great! Thanks for doing this. Much easier to find out what
such a list is useful for by the practical option of having a list and
see what people do with it.

Thanks,

Jonathan
> > More open questions are
> > 1) Scope?
> >   * Would anyone ever use such an overarching list?
> >   * Are we better off with the usual adhoc list of 'interested parties' + 
> > lkml?
> >   * Do we actually need to define the full scope - are we better with a 
> > vague
> > definition?  
> 
> I think a list with a broad and vaguely defined scope is a good idea - 
> it would certainly be helpful to us to be able to follow what other 
> contributors are doing that could be relevant to our CAPI and OpenCAPI work.
> 
> > 
> > 2) Is there an existing community we can use to discuss these issues?
> > (beyond the obvious firehose of LKML).
> > 
> > 3) Who else to approach for input on these general questions?
> > 
> > In parallel to this there are elements such as git / patchwork etc but
> > they can all be done as they are needed.
> > 
> > Thanks
> > 
> > --
> > Jonathan Cameron
> > Huawei
> >   
>

Re: Fostering linux community collaboration on hardware accelerators

2017-10-11 Thread Jonathan Cameron

On Wed, 11 Oct 2017 11:51:49 -0400
Sandy Harris <sandyinch...@gmail.com> wrote:

> I shortened the cc list radically. If the discussion continues, it may
> be a good idea to add people back. I added John Gilmore since I cite
> one of his posts below.

Fair enough - I have cc'd back in our internal list for now.
I cynically take the view that people in this community are very
good at ignoring emails they aren't interested in :)

> 
> Jonathan Cameron <jonathan.came...@huawei.com> wrote:
> 
> > On behalf of Huawei, I am looking into options to foster a wider community
> > around the various ongoing projects related to Accelerator support within
> > Linux.  The particular area of interest to Huawei is that of harnessing
> > accelerators from userspace, but in a collaborative way with the kernel
> > still able to make efficient use of them, where appropriate.
> >
> > We are keen to foster a wider community than one just focused on
> > our own current technology.  
> 
> Good stuff, but there are problems. e.g. see the thread starting
> with my message here:
> https://www.mail-archive.com/linux-crypto@vger.kernel.org/msg27274.html

An interesting email - we have heard (and observed) the same
working with the optimized ARM algorithm implementations. In particular
the point about 'when a given implementation is quickest' is important.
This isn't even obvious for the various ARM options available in mainline.

There is almost always more overhead (for now ;) in using a crypto
accelerators than doing it on the CPU which puts an obvious issue
in place for small data packets.

It's not easy - however not all crypto needs to come out of the kernel
again and with care you can avoid duplicating the overheads. e.g. 
* Kernel TLS moves the crypto down into the kernel so data goes in 
  once (which has to happen anyway to get it to the network card).
* Storage encryption - again data is going across the kernel boundary anyway.

> 
> My perspective is that of a crypto guy working on general-purpose
> CPUs, anything from 32-bit ARM up. There are certainly problems for
> devices with massive loads like a high-end router or with more limited
> CPUs that I will not even pretend to address.

Agreed - the application area is certainly not general. This stuff
consumes a lot of silicon - you really have to need it to make it worth
the expense.

> For me, far & away the biggest issue is having a good source of random
> numbers; more-or-less all crypto depends on that. The Linux random(4)
> RNG gets close, but there are cases where it may not be well
> initialized soon enough on some systems. If a system provides a
> hardware RNG, I will certainly use it to feed random(4). I do not care
> nearly as much about anything else that might be in a hardware crypto
> accelerator.

Absolutely - that needs to be part of a complete solution.

> 
> Separate accelerator devices require management, separating accesses
> by different kernel threads or by user processes if they are allowed
> to play, keeping them from seeing each other's keys, perhaps saving &
> restoring state sometimes. 

This can be assisted a lot by hardware context management (this look similar
to what you see on high end network cards with things that look like
queue steering etc).  Note, what we are trying to avoid is everything
going full userspace driver stack as has happened for some of those sorts of
systems (e.g. Solarflare's offering)

This hardware is appearing in crypto accelerators - until it is there
I agree this is a really nasty problem - be it perhaps not an unsolvable
one.  There are intermediate levels where the hardware can handle a small
number of contexts but to be truely useful you need to be able to use lots
and have the throughput to support them all with reasonable latency.

> Things that can be built into the CPU --
> RNG instruction or register, AES instructions, Intel's instruction for
> 128-bit finite field multiplication which accelerates AES-GCM,
> probably some I have not heard of -- require less management and are
> usable by  any process, assuming either compiler support or some
> assembler code. As a software guy, I'd far rather the hardware
> designers gave me those than anything that needs a driver.

I absolutely agree. If you can get away with doing crypto
on a CPU (under constraints of power usage and other needs for the cpu)
then do it that way. 

The targets for this stuff are where crypto has become a bottleneck
not cases where a good software implementation is quick enough.

Our initial focus in my team at Huawei, has been crypto partly because we
have a reasonably capable engine to play with - it is just one of
many options moving forward.

Jonathan

Re: Fostering linux community collaboration on hardware accelerators

2017-10-11 Thread Jonathan Cameron

On Tue, 10 Oct 2017 16:55:06 +0200
Francois Ozog <francois.o...@linaro.org> wrote:

> Hi,
Hi Francois,

Firstly, to anyone seeing this on the mailing lists, there was an attachment so
it presumably bounced.

> 
> I am really happy you started this thread. I think we need to have Grant
> Likely from Arm involved in the topic. I only have his HPE mail and not his
> new Arm. I also added Andrea Gallo who has visibility of all segment groups
> in Linaro.

Good idea. Given the simple nature of your average ARM email address I've one 
with
guessing Grant's *crosses fingers*.

> 
> 
> As you say, there are more questions than answers at this stage and I'd
> like to share thoughts and related activities I already conducted so that
> you can take those into account (if they are relevant) to build your action
> plan.
> 
> 
> On the scope of accelerators I include media related (transcoding...) and
> FPGAs.

Two more for the list - I knew I've forget some big groups ;)

> I kicked off discussions with various parties on how to build an
> FPGA focused community (see attached document on FPGAs). The FPGAs will
> need proper "factory" of userland drivers and I expect WD to be
> instrumental in this.
> 
> On the "ODP type" requirements: I think we have to include both ODP and
> DPDK.
> 
> Robustness/security of the userland framework: if anything happens to
> userland, the kernel has to recover.

This issue is a key one I'd missed when drawing up my list. Glad you
raised it.

> Exposing pieces of PCI config space or
> portions of MMIOs may be a new area of intellectual property for Arm.
> 
> I was leading acceleration activities in ETSI NFV (IFA stream 2 group that
> was created by Yun-Chao Hu@ Huawei, Bruno Chatras@Orange and myself) and
> talked to Orange, BT, Telecom Italia and AT yesterday. The guys I know
> are keen to participate to an advisory board related to acceleration in
> Opensource Communities. They see the value of continuing the work done in
> SDOs (ETSI, IETF, IEEE...) in OCs (Linaro, Linux Foundation...). 

Interesting - of course such a structure might be useful. Of course their
input along with that of others would be of great assistance.

However, any formal arrangement or consortium does bring costs that may put
some individuals or companies off involvement.   To my mind, any Linux
focused effort needs to incorporate those interests, but take in wider views.
That's not to say that any Linux community focused effort would not be a
good place for ideas coming out of industry consortiums / advisory groups
to be presented and evaluated as an integral part of the community.

>I think
> their participation to the acceleration efforts is key to ensure we solve
> the right problem the right way. I am trying to organize an accelerator
> workshop "accelerators: from SDO to Opensource) at the ETSI NFV F2F meeting
> (NFV#20 December 4-8th in Sophia Antipolis, France). An agenda item is also
> how to deal with Intel EPA that introduce bias in platform and accelerator
> "scheduling" in Openstack terminology.
> 
> As of particular interest is accelerator management in the context of Cloud
> and NFV. Huawei is leading efforts in Cyborg project (OpenStack) and I
> think, while it should be kept  as a separate effort, we should be aligning
> our efforts in a way or the other.

Management, in a more general sense isn't something I had really considered.
Efforts such as Cyborg seem to sit at a higher level in the stack but clearly
you are correct in suggesting that aligning efforts makes sense.

> 
> CCIX will have a massive impact on networking. With DMA it is all about
> packet MOVEMENT. With CCIX this is about cacheline games. Dealing with
> headroom and tail room of packets will no longer be an issue. Achieving
> 50Gbps and above line rate bumped into DMA transactions per second problem.
> Wit CCIX this is no longer an issue (at least on paper).

This is true of both CCIX and some competing / alternative technologies.
I am very keen to embrace that competition as well so that any results
of collaboration help everyone!  Not to mention there is plenty of
existing hardware with adhoc solutions in place already.

The one big element I clearly forgot to mention in the below scope is
virtualization (oops).  Any solutions clearly need to take that into
account as well.

Jonathan

> 
> 
> 
> Cordially,
> 
> François-Frédéric
> 
> On 10 October 2017 at 13:28, Jonathan Cameron <jonathan.came...@huawei.com>
> wrote:
> 
> > Hi All,
> >
> > Please forward this email to anyone you think may be interested.
> >
> > On behalf of Huawei, I am looking into options to foster a wider community
> > around the various ongoing projects related to Accelerator support

Fostering linux community collaboration on hardware accelerators

2017-10-10 Thread Jonathan Cameron

Hi All,

Please forward this email to anyone you think may be interested.

On behalf of Huawei, I am looking into options to foster a wider community
around the various ongoing projects related to Accelerator support within
Linux.  The particular area of interest to Huawei is that of harnessing
accelerators from userspace, but in a collaborative way with the kernel
still able to make efficient use of them, where appropriate.

We are keen to foster a wider community than one just focused on
our own current technology.  This is a field with no clear answers, so the
widest possible range of input is needed!

The address list of this email is drawn from people we have had discussions
with or who have been suggested in response to Kenneth Lee's wrapdrive
presentation at Linaro Connect and earlier presentations on the more general
issue. A few relevant lists added to hopefully catch anyone we missed.
My apologies to anyone who got swept up in this and isn't interested!

Here we are defining accelerators fairly broadly - suggestions for a better
term are also welcome.

The infrastructure may be appropriate for:
* Traditional offload engines - cryptography, compression and similar
* Upcoming AI accelerators
* ODP type requirements for access to elements of networking
* Systems utilizing SVM including CCIX and other cache coherent buses
* Many things we haven't thought of yet...

As I see it, there are several aspects to this:

1) Kernel drivers for accelerators themselves.
   * Traditional drivers such as crypto etc
- These already have their own communities. The main
  focus of such work will always be through them.
- What a more general community could add here would be an
  overview of the shared infrastructure of such devices.
  This is particularly true around VFIO based (or similar)
  userspace interfaces with a non trivial userspace component.
   * How to support new types of accelerator?

2) The need for lightweight access paths from userspace that 'play well' and
   share resources etc with standard in-kernel drivers.  This is the area
   that Kenneth Lee and Huawei have been focusing on with their wrapdrive
   effort. We know there are other similar efforts going on in other companies.
   * This may involve interacting with existing kernel communities such as
 those around VFIO and mdev.
   * Resource management when we may have many consumers - not all hardware
 has appropriate features to deal with this.

3) Usecases for accelerators. e.g.
   * kTLS
   * Storage encryption
   * ODP - networking dataplane
   * AI toolkits

Discussions we want to get started include:
* A wider range of hardware than we are currently considering. What makes
  sense to target / what hardware do people have they would like to support?
* Upstream paths - potential blockers and how to overcome them. The standard
  kernel drivers should be fairly straightforward, but once we start looking at
  systems with a heavier userspace component, things will get more
  controversial!
* Fostering stronger userspace communities to allow these these accelerators
  to be easily harnessed.

So as ever with a linux community focusing on a particular topic, the
obvious solution is a mailing list. There are a number of options on how
do this.

1) Ask one of the industry bodies to host? Who?

2) Put together a compelling argument for linux-accelerat...@vger.kernel.org
as probably the most generic location for such a list.

More open questions are
1) Scope?
 * Would anyone ever use such an overarching list?
 * Are we better off with the usual adhoc list of 'interested parties' + lkml?
 * Do we actually need to define the full scope - are we better with a vague
   definition?

2) Is there an existing community we can use to discuss these issues?
   (beyond the obvious firehose of LKML).

3) Who else to approach for input on these general questions?

In parallel to this there are elements such as git / patchwork etc but
they can all be done as they are needed.

Thanks

--
Jonathan Cameron
Huawei

Re: [PATCH v5 05/19] crypto: introduce crypto wait for async op

2017-08-14 Thread Jonathan Cameron

On Mon, 14 Aug 2017 18:21:15 +0300
Gilad Ben-Yossef  wrote:

> Invoking a possibly async. crypto op and waiting for completion
> while correctly handling backlog processing is a common task
> in the crypto API implementation and outside users of it.
> 
> This patch adds a generic implementation for doing so in
> preparation for using it across the board instead of hand
> rolled versions.
> 
> Signed-off-by: Gilad Ben-Yossef 
> CC: Eric Biggers 

One really trivial point inline I happened to notice.

> ---
>  crypto/api.c   | 13 +
>  include/linux/crypto.h | 41 +
>  2 files changed, 54 insertions(+)
> 
> diff --git a/crypto/api.c b/crypto/api.c
> index 941cd4c..2a2479d 100644
> --- a/crypto/api.c
> +++ b/crypto/api.c
> @@ -24,6 +24,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "internal.h"
>  
>  LIST_HEAD(crypto_alg_list);
> @@ -595,5 +596,17 @@ int crypto_has_alg(const char *name, u32 type, u32 mask)
>  }
>  EXPORT_SYMBOL_GPL(crypto_has_alg);
>  
> +void crypto_req_done(struct crypto_async_request *req, int err)
> +{
> + struct crypto_wait *wait = req->data;
> +
> + if (err == -EINPROGRESS)
> + return;
> +
> + wait->err = err;
> + complete(>completion);
> +}
> +EXPORT_SYMBOL_GPL(crypto_req_done);
> +
>  MODULE_DESCRIPTION("Cryptographic core API");
>  MODULE_LICENSE("GPL");
> diff --git a/include/linux/crypto.h b/include/linux/crypto.h
> index 84da997..bb00186 100644
> --- a/include/linux/crypto.h
> +++ b/include/linux/crypto.h
> @@ -24,6 +24,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  /*
>   * Autoloaded crypto modules should only use a prefixed name to avoid 
> allowing
> @@ -468,6 +469,45 @@ struct crypto_alg {
>  } CRYPTO_MINALIGN_ATTR;
>  
>  /*
> + * A helper struct for waiting for completion of async crypto ops
> + */
> +struct crypto_wait {
> + struct completion completion;
> + int err;
> +};
> +
> +/*
> + * Macro for declaring a crypto op async wait object on stack
> + */
> +#define DECLARE_CRYPTO_WAIT(_wait) \
> + struct crypto_wait _wait = { \
> + COMPLETION_INITIALIZER_ONSTACK((_wait).completion), 0 }
> +
> +/*
> + * Async ops completion helper functioons
> + */
> +void crypto_req_done(struct crypto_async_request *req, int err);
> +
> +static inline int crypto_wait_req(int err, struct crypto_wait *wait)
> +{
> + switch (err) {
> + case -EINPROGRESS:
> + case -EBUSY:
> + wait_for_completion(>completion);
> + reinit_completion(>completion);
> + err = wait->err;
> + break;
> + };
> +
> + return err;
> +}
> +
> +static inline void crypto_init_wait(struct crypto_wait *wait)
> +{
> + init_completion(>completion);
> +}
> +
> +/*
>   * Algorithm registration interface.
>   */
>  int crypto_register_alg(struct crypto_alg *alg);
> @@ -1604,5 +1644,6 @@ static inline int crypto_comp_decompress(struct 
> crypto_comp *tfm,
>   src, slen, dst, dlen);
>  }
>  
> +

Trivial observation.  Shouldn't have this bonus blank line inserted here.

>  #endif   /* _LINUX_CRYPTO_H */
>

Re: [PATCH v2 00/11] getting back -Wmaybe-uninitialized

2016-11-12 Thread Jonathan Cameron

On 11/11/16 19:49, Arnd Bergmann wrote:
> On Friday, November 11, 2016 9:13:00 AM CET Linus Torvalds wrote:
>> On Thu, Nov 10, 2016 at 8:44 AM, Arnd Bergmann  wrote:
>>>
>>> Please merge these directly if you are happy with the result.
>>
>> I will take this.
> 
> Thanks a lot!
>  
>> I do see two warnings, but they both seem to be valid and recent,
>> though, so I have no issues with the spurious cases.
> 
> Ok, both of them should have my fixes coming your way already.
> 
>> Warning #1:
>>
>>   sound/soc/qcom/lpass-platform.c: In function ‘lpass_platform_pcmops_open’:
>>   sound/soc/qcom/lpass-platform.c:83:29: warning: ‘dma_ch’ may be used
>> uninitialized in this function [-Wmaybe-uninitialized]
>> drvdata->substream[dma_ch] = substream;
>> ~~~^~~
>>
>> and 'dma_ch' usage there really is crazy and wrong. Broken by
>> 022d00ee0b55 ("ASoC: lpass-platform: Fix broken pcm data usage")
> 
> Right, the patches crossed here, the bugfix patch that introduced
> this came into linux-next over the kernel summit, and the fix I
> sent on Tuesday made it into Mark Brown's tree on Wednesday but not
> before you pulled alsa tree. It should be fixed the next time you
> pull from the alsa tree, the commit is
> 
> 3b89e4b77ef9 ("ASoC: lpass-platform: initialize dma channel number")
>  
>> Warning #2 is not a real bug, but it's reasonable that gcc doesn't
>> know that storage_bytes (chip->read_size) has to be 2/4. Again,
>> introduced recently by commit 231147ee77f3 ("iio: maxim_thermocouple:
>> Align 16 bit big endian value of raw reads"), so you didn't see it.
> 
> This is the one I mentioned in the commit message as one that
> is fixed in linux-next and that should make it in soon.
> 
>>   drivers/iio/temperature/maxim_thermocouple.c: In function
>> ‘maxim_thermocouple_read_raw’:
>>   drivers/iio/temperature/maxim_thermocouple.c:141:5: warning: ‘ret’
>> may be used uninitialized in this function [-Wmaybe-uninitialized]
>> if (ret)
>>^
>>   drivers/iio/temperature/maxim_thermocouple.c:128:6: note: ‘ret’ was
>> declared here
>> int ret;
>> ^~~
>>
>> and I guess that code can just initialize 'ret' to '-EINVAL' or
>> something to just make the theoretical "somehow we had a wrong
>> chip->read_size" case error out cleanly.
> 
> Right, that was my conclusion too. I sent the bugfix on Oct 25
> for linux-next but it didn't make it in until this Monday, after
> you pulled the patch that introduced it on Oct 29.
> 
> The commit in staging-testing is
> 32cb7d27e65d ("iio: maxim_thermocouple: detect invalid storage size in 
> read()")
> 
> Greg and Jonathan, I see now that this is part of the 'iio-for-4.10b'
> branch, so I suspect you were not planning to send this before the
> merge window. Could you make sure this ends up in v4.9 so we get
> a clean build when -Wmaybe-uninitialized gets enabled again?
I'll queue this up and send a pull to Greg tomorrow.

Was highly doubtful that a false warning suppression (be it an
understandable one) was worth sending mid cycle, hence it was
taking the slow route.

Jonathan
> 
>   Arnd
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: linux-next: Tree for Jan 23 (mtd/ubi and iio and crypto/crc32[c])

2013-01-26 Thread Jonathan Cameron

On 01/23/2013 11:41 PM, Randy Dunlap wrote:
 On 01/23/13 15:23, Herbert Xu wrote:
 On Wed, Jan 23, 2013 at 03:10:01PM -0800, Randy Dunlap wrote:
 On 01/22/13 22:43, Stephen Rothwell wrote:
 Hi all,

 Changes since 20130122:



 on i386:

 ERROR: crc32_le [drivers/mtd/ubi/ubi.ko] undefined!
 ERROR: iio_triggered_buffer_setup [drivers/iio/adc/max1363.ko] undefined!
 ERROR: iio_triggered_buffer_cleanup [drivers/iio/adc/max1363.ko] 
 undefined!
 ERROR: __crc32c_le [crypto/crc32c.ko] undefined!
 ERROR: crc32_le [crypto/crc32.ko] undefined!

 Weird.  You do have CONFIG_CRC32 selected, so did it compile
 a lib/crc32.ko or not?
 
 Hmph, this seems to be some kind of rebuild problem.  It's not
 reproducible, so the mtd/ubi and crypto reports above are incorrect --
 please ignore them.  Only the iio one is reproducible.
 
 Sorry about that.
 
For iio one had a patch for a while, just not had a chance to send
a pull request to Greg until today.  Sorry for the slow response.
--
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

54 matches

Mail list logo