Re: [PATCH 0/2] crypto: add new driver for Marvell CESA

2015-04-09 Thread Stephan Mueller
On Thursday 9 April 2015 at 16:58:41, Boris Brezillon wrote:

Hi Boris,

Hello,

This is an attempt to replace the mv_cesa driver by a new one to address
some limitations of the existing driver.
From a performance and CPU load point of view the most important
limitation is the lack of DMA support, thus preventing us from chaining
crypto operations.

I know we usually try to adapt existing drivers instead of replacing them
by new ones, but after trying to refactor the mv_cesa driver I realized it
would take longer than writing a new one from scratch.

Here are the main features brought by this new driver:
- support for armada SoCs (up to 38x) while keeping support for older ones
  (Orion and Kirkwood)
- DMA mode to offload the CPU in case of intensive crypto usage
- new algorithms: SHA256, DES and 3DES

I'd like to thank Arnaud, who has carefully reviewed several iterations of
this driver, helped me improve my implementation, provided support for
several crypto algorithms, provided support for armada-370 and tested
the driver on different platforms, hence the SoB and dual MODULE_AUTHOR
in the driver code.

Your patch 1/2 did not make it to the crypto list. Too big? It is on the lkml
list though.

Best Regards,

Boris

Boris Brezillon (2):
  crypto: add new driver for Marvell CESA
  crypto: marvell/CESA: update DT bindings documentation

 .../devicetree/bindings/crypto/mv_cesa.txt |   50 +-
 drivers/crypto/Kconfig |2 +
 drivers/crypto/Makefile|2 +-
 drivers/crypto/marvell/Makefile|1 +
 drivers/crypto/marvell/cesa.c  |  539
 drivers/crypto/marvell/cesa.h  |  802
 drivers/crypto/marvell/cipher.c|  761 +++
 drivers/crypto/marvell/hash.c  | 1349
 drivers/crypto/marvell/tdma.c  |  223
 drivers/crypto/mv_cesa.c   | 1193 -
 drivers/crypto/mv_cesa.h   |  150 ---
 11 files changed, 3716 insertions(+), 1356 deletions(-)
 create mode 100644 drivers/crypto/marvell/Makefile
 create mode 100644 drivers/crypto/marvell/cesa.c
 create mode 100644 drivers/crypto/marvell/cesa.h
 create mode 100644 drivers/crypto/marvell/cipher.c
 create mode 100644 drivers/crypto/marvell/hash.c
 create mode 100644 drivers/crypto/marvell/tdma.c
 delete mode 100644 drivers/crypto/mv_cesa.c
 delete mode 100644 drivers/crypto/mv_cesa.h


Ciao
Stephan


[PATCH 0/2] crypto: add new driver for Marvell CESA

2015-04-09 Thread Boris Brezillon
Hello,

This is an attempt to replace the mv_cesa driver by a new one to address
some limitations of the existing driver.
From a performance and CPU load point of view the most important
limitation is the lack of DMA support, thus preventing us from chaining
crypto operations.

I know we usually try to adapt existing drivers instead of replacing them
by new ones, but after trying to refactor the mv_cesa driver I realized it
would take longer than writing a new one from scratch.

Here are the main features brought by this new driver:
- support for armada SoCs (up to 38x) while keeping support for older ones
  (Orion and Kirkwood)
- DMA mode to offload the CPU in case of intensive crypto usage
- new algorithms: SHA256, DES and 3DES

I'd like to thank Arnaud, who has carefully reviewed several iterations of
this driver, helped me improve my implementation, provided support for
several crypto algorithms, provided support for armada-370 and tested
the driver on different platforms, hence the SoB and dual MODULE_AUTHOR
in the driver code.

Best Regards,

Boris

Boris Brezillon (2):
  crypto: add new driver for Marvell CESA
  crypto: marvell/CESA: update DT bindings documentation

 .../devicetree/bindings/crypto/mv_cesa.txt |   50 +-
 drivers/crypto/Kconfig |2 +
 drivers/crypto/Makefile|2 +-
 drivers/crypto/marvell/Makefile|1 +
 drivers/crypto/marvell/cesa.c  |  539 
 drivers/crypto/marvell/cesa.h  |  802 
 drivers/crypto/marvell/cipher.c|  761 +++
 drivers/crypto/marvell/hash.c  | 1349 
 drivers/crypto/marvell/tdma.c  |  223 
 drivers/crypto/mv_cesa.c   | 1193 -
 drivers/crypto/mv_cesa.h   |  150 ---
 11 files changed, 3716 insertions(+), 1356 deletions(-)
 create mode 100644 drivers/crypto/marvell/Makefile
 create mode 100644 drivers/crypto/marvell/cesa.c
 create mode 100644 drivers/crypto/marvell/cesa.h
 create mode 100644 drivers/crypto/marvell/cipher.c
 create mode 100644 drivers/crypto/marvell/hash.c
 create mode 100644 drivers/crypto/marvell/tdma.c
 delete mode 100644 drivers/crypto/mv_cesa.c
 delete mode 100644 drivers/crypto/mv_cesa.h

-- 
1.9.1



[PATCH v4] crypto: remove instance when test failed

2015-04-09 Thread Stephan Mueller
A cipher instance is added to the list of instances unconditionally
regardless of whether the associated test failed. However, a failed
test implies that during another lookup, the cipher instance will
be added to the list again as it will not be found by the lookup
code.

That means that the list can be filled up with instances whose tests
failed.

Note: tests only fail in reality in FIPS mode when a cipher is not
marked as fips_allowed=1. This can be seen with cmac(des3_ede) that does
not have a fips_allowed=1. When allocating the cipher, the allocation
fails with -ENOENT due to the missing fips_allowed=1 flag (which
causes the testmgr to return EINVAL). Yet, the instance of
cmac(des3_ede) is shown in /proc/crypto. Allocating the cipher again
fails again, but a 2nd instance is listed in /proc/crypto.
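
A hypothetical reproduction sketch (not part of the patch; the function and
flow are illustrative only): allocating such a cipher twice fails both times,
yet without this fix each attempt leaves another instance behind in
/proc/crypto.

#include <crypto/hash.h>
#include <linux/err.h>
#include <linux/printk.h>

static void cmac_des3_repro(void)
{
	struct crypto_shash *tfm;
	int i;

	for (i = 0; i < 2; i++) {
		/* In FIPS mode this fails with -ENOENT because the test is
		 * not fips_allowed, but the instance created by the cmac
		 * template stays registered each time. */
		tfm = crypto_alloc_shash("cmac(des3_ede)", 0, 0);
		if (IS_ERR(tfm))
			pr_info("attempt %d: error %ld\n", i, PTR_ERR(tfm));
		else
			crypto_free_shash(tfm);
	}
	/* Without the patch, /proc/crypto now lists cmac(des3_ede) twice. */
}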

The patch simply de-registers the instance when the testing failed.

Signed-off-by: Stephan Mueller smuel...@chronox.de
---
 crypto/algapi.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/crypto/algapi.c b/crypto/algapi.c
index f1d0307..1907d5b 100644
--- a/crypto/algapi.c
+++ b/crypto/algapi.c
@@ -522,7 +522,10 @@ int crypto_register_instance(struct crypto_template *tmpl,
 
 	err = crypto_check_alg(&inst->alg);
 	if (err)
-		goto err;
+		return err;
+
+	if (unlikely(!crypto_mod_get(&inst->alg)))
+		return -EAGAIN;
 
 	inst->alg.cra_module = tmpl->module;
 	inst->alg.cra_flags |= CRYPTO_ALG_INSTANCE;
@@ -544,9 +547,14 @@ unlock:
goto err;
 
crypto_wait_for_test(larval);
+
+   /* Remove instance if test failed */
+	if (!(inst->alg.cra_flags & CRYPTO_ALG_TESTED))
+   crypto_unregister_instance(inst);
err = 0;
 
 err:
+	crypto_mod_put(&inst->alg);
return err;
 }
 EXPORT_SYMBOL_GPL(crypto_register_instance);
-- 
2.1.0




[PATCH v4 05/16] crypto: sha256-generic: move to generic glue implementation

2015-04-09 Thread Ard Biesheuvel
This updates the generic SHA-256 implementation to use the
new shared SHA-256 glue code.

It also implements a .finup hook crypto_sha256_finup() and exports
it to other modules. The import and export() functions and the
.statesize member are dropped, since the default implementation
is perfectly suitable for this module.

Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
---
 crypto/sha256_generic.c | 133 
 include/crypto/sha.h|   3 ++
 2 files changed, 23 insertions(+), 113 deletions(-)

diff --git a/crypto/sha256_generic.c b/crypto/sha256_generic.c
index b001ff5c2efc..78431163ed3c 100644
--- a/crypto/sha256_generic.c
+++ b/crypto/sha256_generic.c
@@ -23,6 +23,7 @@
 #include <linux/mm.h>
 #include <linux/types.h>
 #include <crypto/sha.h>
+#include <crypto/sha256_base.h>
 #include <asm/byteorder.h>
 #include <asm/unaligned.h>
 
@@ -214,138 +215,43 @@ static void sha256_transform(u32 *state, const u8 *input)
memzero_explicit(W, 64 * sizeof(u32));
 }
 
-static int sha224_init(struct shash_desc *desc)
+static void sha256_generic_block_fn(struct sha256_state *sst, u8 const *src,
+   int blocks)
 {
-   struct sha256_state *sctx = shash_desc_ctx(desc);
-   sctx-state[0] = SHA224_H0;
-   sctx-state[1] = SHA224_H1;
-   sctx-state[2] = SHA224_H2;
-   sctx-state[3] = SHA224_H3;
-   sctx-state[4] = SHA224_H4;
-   sctx-state[5] = SHA224_H5;
-   sctx-state[6] = SHA224_H6;
-   sctx-state[7] = SHA224_H7;
-   sctx-count = 0;
-
-   return 0;
-}
-
-static int sha256_init(struct shash_desc *desc)
-{
-   struct sha256_state *sctx = shash_desc_ctx(desc);
-   sctx-state[0] = SHA256_H0;
-   sctx-state[1] = SHA256_H1;
-   sctx-state[2] = SHA256_H2;
-   sctx-state[3] = SHA256_H3;
-   sctx-state[4] = SHA256_H4;
-   sctx-state[5] = SHA256_H5;
-   sctx-state[6] = SHA256_H6;
-   sctx-state[7] = SHA256_H7;
-   sctx-count = 0;
-
-   return 0;
+   while (blocks--) {
+		sha256_transform(sst->state, src);
+   src += SHA256_BLOCK_SIZE;
+   }
 }
 
 int crypto_sha256_update(struct shash_desc *desc, const u8 *data,
  unsigned int len)
 {
-   struct sha256_state *sctx = shash_desc_ctx(desc);
-   unsigned int partial, done;
-   const u8 *src;
-
-   partial = sctx-count  0x3f;
-   sctx-count += len;
-   done = 0;
-   src = data;
-
-   if ((partial + len)  63) {
-   if (partial) {
-   done = -partial;
-   memcpy(sctx-buf + partial, data, done + 64);
-   src = sctx-buf;
-   }
-
-   do {
-   sha256_transform(sctx-state, src);
-   done += 64;
-   src = data + done;
-   } while (done + 63  len);
-
-   partial = 0;
-   }
-   memcpy(sctx-buf + partial, src, len - done);
-
-   return 0;
+   return sha256_base_do_update(desc, data, len, sha256_generic_block_fn);
 }
 EXPORT_SYMBOL(crypto_sha256_update);
 
 static int sha256_final(struct shash_desc *desc, u8 *out)
 {
-   struct sha256_state *sctx = shash_desc_ctx(desc);
-   __be32 *dst = (__be32 *)out;
-   __be64 bits;
-   unsigned int index, pad_len;
-   int i;
-   static const u8 padding[64] = { 0x80, };
-
-   /* Save number of bits */
-   bits = cpu_to_be64(sctx-count  3);
-
-   /* Pad out to 56 mod 64. */
-   index = sctx-count  0x3f;
-   pad_len = (index  56) ? (56 - index) : ((64+56) - index);
-   crypto_sha256_update(desc, padding, pad_len);
-
-   /* Append length (before padding) */
-   crypto_sha256_update(desc, (const u8 *)bits, sizeof(bits));
-
-   /* Store state in digest */
-   for (i = 0; i  8; i++)
-   dst[i] = cpu_to_be32(sctx-state[i]);
-
-   /* Zeroize sensitive information. */
-   memset(sctx, 0, sizeof(*sctx));
-
-   return 0;
+   sha256_base_do_finalize(desc, sha256_generic_block_fn);
+   return sha256_base_finish(desc, out);
 }
 
-static int sha224_final(struct shash_desc *desc, u8 *hash)
+int crypto_sha256_finup(struct shash_desc *desc, const u8 *data,
+   unsigned int len, u8 *hash)
 {
-   u8 D[SHA256_DIGEST_SIZE];
-
-   sha256_final(desc, D);
-
-   memcpy(hash, D, SHA224_DIGEST_SIZE);
-   memzero_explicit(D, SHA256_DIGEST_SIZE);
-
-   return 0;
-}
-
-static int sha256_export(struct shash_desc *desc, void *out)
-{
-   struct sha256_state *sctx = shash_desc_ctx(desc);
-
-   memcpy(out, sctx, sizeof(*sctx));
-   return 0;
-}
-
-static int sha256_import(struct shash_desc *desc, const void *in)
-{
-   struct sha256_state *sctx = shash_desc_ctx(desc);
-
-   memcpy(sctx, in, sizeof(*sctx));
-   return 0;
+   sha256_base_do_update(desc, data, len, sha256_generic_block_fn);
+  

[PATCH v4 07/16] crypto/arm: move SHA-1 ARM asm implementation to base layer

2015-04-09 Thread Ard Biesheuvel
This removes all the boilerplate from the existing implementation,
and replaces it with calls into the base layer.

Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
---
 arch/arm/crypto/sha1-ce-glue.c   |   3 +-
 arch/arm/{include/asm => }/crypto/sha1.h |   3 +
 arch/arm/crypto/sha1_glue.c  | 112 +--
 arch/arm/crypto/sha1_neon_glue.c |   2 +-
 4 files changed, 22 insertions(+), 98 deletions(-)
 rename arch/arm/{include/asm => }/crypto/sha1.h (67%)

diff --git a/arch/arm/crypto/sha1-ce-glue.c b/arch/arm/crypto/sha1-ce-glue.c
index a9dd90df9fd7..e93b24c1af1f 100644
--- a/arch/arm/crypto/sha1-ce-glue.c
+++ b/arch/arm/crypto/sha1-ce-glue.c
@@ -13,12 +13,13 @@
 #include <linux/crypto.h>
 #include <linux/module.h>
 
-#include <asm/crypto/sha1.h>
 #include <asm/hwcap.h>
 #include <asm/neon.h>
 #include <asm/simd.h>
 #include <asm/unaligned.h>
 
+#include "sha1.h"
+
 MODULE_DESCRIPTION("SHA1 secure hash using ARMv8 Crypto Extensions");
 MODULE_AUTHOR("Ard Biesheuvel ard.biesheu...@linaro.org");
 MODULE_LICENSE("GPL v2");
diff --git a/arch/arm/include/asm/crypto/sha1.h b/arch/arm/crypto/sha1.h
similarity index 67%
rename from arch/arm/include/asm/crypto/sha1.h
rename to arch/arm/crypto/sha1.h
index 75e6a417416b..ffd8bd08b1a7 100644
--- a/arch/arm/include/asm/crypto/sha1.h
+++ b/arch/arm/crypto/sha1.h
@@ -7,4 +7,7 @@
 extern int sha1_update_arm(struct shash_desc *desc, const u8 *data,
   unsigned int len);
 
+extern int sha1_finup_arm(struct shash_desc *desc, const u8 *data,
+  unsigned int len, u8 *out);
+
 #endif
diff --git a/arch/arm/crypto/sha1_glue.c b/arch/arm/crypto/sha1_glue.c
index e31b0440c613..6fc73bf8766d 100644
--- a/arch/arm/crypto/sha1_glue.c
+++ b/arch/arm/crypto/sha1_glue.c
@@ -22,127 +22,47 @@
 #include <linux/cryptohash.h>
 #include <linux/types.h>
 #include <crypto/sha.h>
+#include <crypto/sha1_base.h>
 #include <asm/byteorder.h>
-#include <asm/crypto/sha1.h>
 
+#include "sha1.h"
 
 asmlinkage void sha1_block_data_order(u32 *digest,
const unsigned char *data, unsigned int rounds);
 
-
-static int sha1_init(struct shash_desc *desc)
-{
-   struct sha1_state *sctx = shash_desc_ctx(desc);
-
-   *sctx = (struct sha1_state){
-   .state = { SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4 },
-   };
-
-   return 0;
-}
-
-
-static int __sha1_update(struct sha1_state *sctx, const u8 *data,
-unsigned int len, unsigned int partial)
-{
-   unsigned int done = 0;
-
-   sctx-count += len;
-
-   if (partial) {
-   done = SHA1_BLOCK_SIZE - partial;
-   memcpy(sctx-buffer + partial, data, done);
-   sha1_block_data_order(sctx-state, sctx-buffer, 1);
-   }
-
-   if (len - done = SHA1_BLOCK_SIZE) {
-   const unsigned int rounds = (len - done) / SHA1_BLOCK_SIZE;
-   sha1_block_data_order(sctx-state, data + done, rounds);
-   done += rounds * SHA1_BLOCK_SIZE;
-   }
-
-   memcpy(sctx-buffer, data + done, len - done);
-   return 0;
-}
-
-
 int sha1_update_arm(struct shash_desc *desc, const u8 *data,
unsigned int len)
 {
-   struct sha1_state *sctx = shash_desc_ctx(desc);
-   unsigned int partial = sctx-count % SHA1_BLOCK_SIZE;
-   int res;
+   /* make sure casting to sha1_block_fn() is safe */
+   BUILD_BUG_ON(offsetof(struct sha1_state, state) != 0);
 
-   /* Handle the fast case right here */
-   if (partial + len  SHA1_BLOCK_SIZE) {
-   sctx-count += len;
-   memcpy(sctx-buffer + partial, data, len);
-   return 0;
-   }
-   res = __sha1_update(sctx, data, len, partial);
-   return res;
+   return sha1_base_do_update(desc, data, len,
+  (sha1_block_fn *)sha1_block_data_order);
 }
 EXPORT_SYMBOL_GPL(sha1_update_arm);
 
-
-/* Add padding and return the message digest. */
 static int sha1_final(struct shash_desc *desc, u8 *out)
 {
-   struct sha1_state *sctx = shash_desc_ctx(desc);
-   unsigned int i, index, padlen;
-   __be32 *dst = (__be32 *)out;
-   __be64 bits;
-   static const u8 padding[SHA1_BLOCK_SIZE] = { 0x80, };
-
-   bits = cpu_to_be64(sctx-count  3);
-
-   /* Pad out to 56 mod 64 and append length */
-   index = sctx-count % SHA1_BLOCK_SIZE;
-   padlen = (index  56) ? (56 - index) : ((SHA1_BLOCK_SIZE+56) - index);
-   /* We need to fill a whole block for __sha1_update() */
-   if (padlen = 56) {
-   sctx-count += padlen;
-   memcpy(sctx-buffer + index, padding, padlen);
-   } else {
-   __sha1_update(sctx, padding, padlen, index);
-   }
-   __sha1_update(sctx, (const u8 *)bits, sizeof(bits), 56);
-
-   /* Store state in digest */
-   for (i = 0; i  5; i++)
-   dst[i] = cpu_to_be32(sctx-state[i]);
-
-   /* Wipe context */
-   

[PATCH 2/2] crypto: marvell/CESA: update DT bindings documentation

2015-04-09 Thread Boris Brezillon
Document the new compatible strings, document the new method of referencing the
crypto SRAM (deprecating the old one), and document the 'clocks' and
'clock-names' properties.

Signed-off-by: Boris Brezillon boris.brezil...@free-electrons.com
---
 .../devicetree/bindings/crypto/mv_cesa.txt | 50 --
 1 file changed, 38 insertions(+), 12 deletions(-)

diff --git a/Documentation/devicetree/bindings/crypto/mv_cesa.txt 
b/Documentation/devicetree/bindings/crypto/mv_cesa.txt
index 47229b1..4ce9bc5 100644
--- a/Documentation/devicetree/bindings/crypto/mv_cesa.txt
+++ b/Documentation/devicetree/bindings/crypto/mv_cesa.txt
@@ -1,20 +1,46 @@
 Marvell Cryptographic Engines And Security Accelerator
 
 Required properties:
-- compatible : should be "marvell,orion-crypto"
-- reg : base physical address of the engine and length of memory mapped
-	region, followed by base physical address of sram and its memory
-	length
-- reg-names : "regs", "sram";
-- interrupts : interrupt number
+- compatible: should be one of the following strings
+	       "marvell,orion-crypto"
+	       "marvell,kirkwood-crypto"
+	       "marvell,armada-370-crypto"
+	       "marvell,armada-xp-crypto"
+	       "marvell,armada-375-crypto"
+	       "marvell,armada-38x-crypto"
+- reg: base physical address of the engine and length of memory mapped
+       region
+- reg-names: "regs"
+- interrupts: interrupt number
+- clocks: reference to the crypto engines clocks. This property is not
+	  required for orion and kirkwood platforms
+- clock-names: "cesaX" and "cesazX", X should be replaced by the crypto engine
+	       id.
+	       This property is not required for the orion and kirkwood
+	       platforms.
+	       "cesazX" clocks are not required on armada-370 platforms
+- marvell,crypto-srams: phandle to crypto SRAM definitions
+
+Optional properties:
+- marvell,crypto-sram-size: SRAM size reserved for crypto operations, if not
+			    specified the whole SRAM is used (2KB)
+
+Deprecated properties:
+- reg: base physical address of the engine and length of memory mapped
+       region, followed by base physical address of sram and its memory
+       length
+- reg-names: "regs", "sram"
 
 Examples:
 
-	crypto@30000 {
-		compatible = "marvell,orion-crypto";
-		reg = <0x30000 0x10000>,
-		      <0x4000000 0x800>;
-		reg-names = "regs", "sram";
-		interrupts = <22>;
+	crypto@90000 {
+		compatible = "marvell,armada-xp-crypto";
+		reg = <0x90000 0x10000>;
+		reg-names = "regs";
+		interrupts = <48>, <49>;
+		clocks = <&gateclk 23>, <&gateclk 23>;
+		clock-names = "cesa0", "cesa1";
+		marvell,crypto-srams = <&crypto_sram0>, <&crypto_sram1>;
+		marvell,crypto-sram-size = <0x600>;
 		status = "okay";
 	};
-- 
1.9.1



Re: [PATCH 0/2] crypto: add new driver for Marvell CESA

2015-04-09 Thread Sebastian Hesselbarth

On 09.04.2015 16:58, Boris Brezillon wrote:

This is an attempt to replace the mv_cesa driver by a new one to address
some limitations of the existing driver.
 From a performance and CPU load point of view the most important
limitation is the lack of DMA support, thus preventing us from chaining
crypto operations.

I know we usually try to adapt existing drivers instead of replacing them
by new ones, but after trying to refactor the mv_cesa driver I realized it
would take longer than writing a new one from scratch.


Boris,

if you include a bunch of performance measurements, I guess it will help
you get agreement on replacing the driver instead of reworking it.


Here are the main features brought by this new driver:
- support for armada SoCs (up to 38x) while keeping support for older ones
   (Orion and Kirkwood)


Unfortunately, the list above is missing Dove SoCs which also have a
CESA engine with TDMA support. I checked the registers _very_ quickly
but it seems that they are compatible with Kirkwood's CESA.


- DMA mode to offload the CPU in case of intensive crypto usage
- new algorithms: SHA256, DES and 3DES


[...]

Boris Brezillon (2):
   crypto: add new driver for Marvell CESA
   crypto: marvell/CESA: update DT bindings documentation


IMHO, the patch set should be split up in:
- new core driver
- add support for TDMA on platforms that support it
- new cipher algorithms
- removal of old mv_cesa

I'd love to test on Dove, but time is still very limited. I guess the
patches will receive another round anyway; maybe I'll find some time before
the final version.

Sebastian


  .../devicetree/bindings/crypto/mv_cesa.txt |   50 +-
  drivers/crypto/Kconfig |2 +
  drivers/crypto/Makefile|2 +-
  drivers/crypto/marvell/Makefile|1 +
  drivers/crypto/marvell/cesa.c  |  539 
  drivers/crypto/marvell/cesa.h  |  802 
  drivers/crypto/marvell/cipher.c|  761 +++
  drivers/crypto/marvell/hash.c  | 1349 
  drivers/crypto/marvell/tdma.c  |  223 
  drivers/crypto/mv_cesa.c   | 1193 -
  drivers/crypto/mv_cesa.h   |  150 ---
  11 files changed, 3716 insertions(+), 1356 deletions(-)
  create mode 100644 drivers/crypto/marvell/Makefile
  create mode 100644 drivers/crypto/marvell/cesa.c
  create mode 100644 drivers/crypto/marvell/cesa.h
  create mode 100644 drivers/crypto/marvell/cipher.c
  create mode 100644 drivers/crypto/marvell/hash.c
  create mode 100644 drivers/crypto/marvell/tdma.c
  delete mode 100644 drivers/crypto/mv_cesa.c
  delete mode 100644 drivers/crypto/mv_cesa.h





Re: [PATCH v3] crypto: remove instance when test failed

2015-04-09 Thread Stephan Mueller
On Thursday 9 April 2015 at 17:40:35, Herbert Xu wrote:

Hi Herbert,

On Thu, Apr 09, 2015 at 11:22:19AM +0200, Stephan Mueller wrote:
 I tested it and this approach does not work.
 
 If I see that right, the reason for that is the following: The suggestion is
 to grab the ref count at the start of the function followed by a
 __crypto_register_alg. __crypto_register_alg however sets the refcount to 1
 unconditionally. That means that the final put of the alg will most likely
 set the refcount to 0, which causes an issue with all other operations (at
 least I cannot allocate HMAC or CMAC any more -- the ones I currently
 test).
 
 So, the grabbing of the alg must happen after the invocation of
 __crypto_register_alg.

Well let's move it then.

Perfect. This now works with the changed patch too. I will resend my patch as 
v4 -- note that this new patch depends on your patch here as otherwise 
instances do not work at all. :-)

---8---
crypto: api - Move alg ref count init to crypto_check_alg

We currently initialise the crypto_alg ref count in the function
__crypto_register_alg.  As one of the callers of that function
crypto_register_instance needs to obtain a ref count before it
calls __crypto_register_alg, we need to move the initialisation
out of there.

Since both callers of __crypto_register_alg call crypto_check_alg,
this is the logical place to perform the initialisation.

Signed-off-by: Herbert Xu herb...@gondor.apana.org.au

Acked-by: Stephan Mueller smuel...@chronox.de

diff --git a/crypto/algapi.c b/crypto/algapi.c
index f1d0307..1462c68 100644
--- a/crypto/algapi.c
+++ b/crypto/algapi.c
@@ -64,6 +64,8 @@ static int crypto_check_alg(struct crypto_alg *alg)
 	if (alg->cra_priority < 0)
 		return -EINVAL;
 
+	atomic_set(&alg->cra_refcnt, 1);
+
 	return crypto_set_driver_name(alg);
 }
 
@@ -187,7 +189,6 @@ static struct crypto_larval *__crypto_register_alg(struct crypto_alg *alg)
 
 	ret = -EEXIST;
 
-	atomic_set(&alg->cra_refcnt, 1);
 	list_for_each_entry(q, &crypto_alg_list, cra_list) {
 		if (q == alg)
 			goto err;


Ciao
Stephan


[PATCH v4 08/16] crypto/arm: move SHA-1 NEON implementation to base layer

2015-04-09 Thread Ard Biesheuvel
This removes all the boilerplate from the existing implementation,
and replaces it with calls into the base layer.

Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
---
 arch/arm/crypto/sha1_neon_glue.c | 135 +++
 1 file changed, 24 insertions(+), 111 deletions(-)

diff --git a/arch/arm/crypto/sha1_neon_glue.c b/arch/arm/crypto/sha1_neon_glue.c
index 5d9a1b4aac73..4e22f122f966 100644
--- a/arch/arm/crypto/sha1_neon_glue.c
+++ b/arch/arm/crypto/sha1_neon_glue.c
@@ -25,7 +25,7 @@
 #include <linux/cryptohash.h>
 #include <linux/types.h>
 #include <crypto/sha.h>
-#include <asm/byteorder.h>
+#include <crypto/sha1_base.h>
 #include <asm/neon.h>
 #include <asm/simd.h>
 
@@ -34,138 +34,51 @@
 asmlinkage void sha1_transform_neon(void *state_h, const char *data,
unsigned int rounds);
 
-
-static int sha1_neon_init(struct shash_desc *desc)
-{
-   struct sha1_state *sctx = shash_desc_ctx(desc);
-
-   *sctx = (struct sha1_state){
-   .state = { SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4 },
-   };
-
-   return 0;
-}
-
-static int __sha1_neon_update(struct shash_desc *desc, const u8 *data,
-  unsigned int len, unsigned int partial)
-{
-   struct sha1_state *sctx = shash_desc_ctx(desc);
-   unsigned int done = 0;
-
-   sctx-count += len;
-
-   if (partial) {
-   done = SHA1_BLOCK_SIZE - partial;
-   memcpy(sctx-buffer + partial, data, done);
-   sha1_transform_neon(sctx-state, sctx-buffer, 1);
-   }
-
-   if (len - done = SHA1_BLOCK_SIZE) {
-   const unsigned int rounds = (len - done) / SHA1_BLOCK_SIZE;
-
-   sha1_transform_neon(sctx-state, data + done, rounds);
-   done += rounds * SHA1_BLOCK_SIZE;
-   }
-
-   memcpy(sctx-buffer, data + done, len - done);
-
-   return 0;
-}
-
 static int sha1_neon_update(struct shash_desc *desc, const u8 *data,
-unsigned int len)
+ unsigned int len)
 {
struct sha1_state *sctx = shash_desc_ctx(desc);
-   unsigned int partial = sctx-count % SHA1_BLOCK_SIZE;
-   int res;
 
-   /* Handle the fast case right here */
-   if (partial + len  SHA1_BLOCK_SIZE) {
-   sctx-count += len;
-   memcpy(sctx-buffer + partial, data, len);
+   if (!may_use_simd() ||
+   (sctx-count % SHA1_BLOCK_SIZE) + len  SHA1_BLOCK_SIZE)
+   return sha1_update_arm(desc, data, len);
 
-   return 0;
-   }
-
-   if (!may_use_simd()) {
-   res = sha1_update_arm(desc, data, len);
-   } else {
-   kernel_neon_begin();
-   res = __sha1_neon_update(desc, data, len, partial);
-   kernel_neon_end();
-   }
-
-   return res;
-}
-
-
-/* Add padding and return the message digest. */
-static int sha1_neon_final(struct shash_desc *desc, u8 *out)
-{
-   struct sha1_state *sctx = shash_desc_ctx(desc);
-   unsigned int i, index, padlen;
-   __be32 *dst = (__be32 *)out;
-   __be64 bits;
-   static const u8 padding[SHA1_BLOCK_SIZE] = { 0x80, };
-
-   bits = cpu_to_be64(sctx-count  3);
-
-   /* Pad out to 56 mod 64 and append length */
-   index = sctx-count % SHA1_BLOCK_SIZE;
-   padlen = (index  56) ? (56 - index) : ((SHA1_BLOCK_SIZE+56) - index);
-   if (!may_use_simd()) {
-   sha1_update_arm(desc, padding, padlen);
-   sha1_update_arm(desc, (const u8 *)bits, sizeof(bits));
-   } else {
-   kernel_neon_begin();
-   /* We need to fill a whole block for __sha1_neon_update() */
-   if (padlen = 56) {
-   sctx-count += padlen;
-   memcpy(sctx-buffer + index, padding, padlen);
-   } else {
-   __sha1_neon_update(desc, padding, padlen, index);
-   }
-   __sha1_neon_update(desc, (const u8 *)bits, sizeof(bits), 56);
-   kernel_neon_end();
-   }
-
-   /* Store state in digest */
-   for (i = 0; i  5; i++)
-   dst[i] = cpu_to_be32(sctx-state[i]);
-
-   /* Wipe context */
-   memset(sctx, 0, sizeof(*sctx));
+   kernel_neon_begin();
+   sha1_base_do_update(desc, data, len,
+   (sha1_block_fn *)sha1_transform_neon);
+   kernel_neon_end();
 
return 0;
 }
 
-static int sha1_neon_export(struct shash_desc *desc, void *out)
+static int sha1_neon_finup(struct shash_desc *desc, const u8 *data,
+  unsigned int len, u8 *out)
 {
-   struct sha1_state *sctx = shash_desc_ctx(desc);
+   if (!may_use_simd())
+   return sha1_finup_arm(desc, data, len, out);
 
-   memcpy(out, sctx, sizeof(*sctx));
+   kernel_neon_begin();
+   if (len)
+   sha1_base_do_update(desc, data, len,
+ 

[PATCH v4 01/16] crypto: sha1: implement base layer for SHA-1

2015-04-09 Thread Ard Biesheuvel
To reduce the number of copies of boilerplate code throughout
the tree, this patch implements generic glue for the SHA-1
algorithm. This allows a specific arch or hardware implementation
to only implement the special handling that it needs.

The users need to supply an implementation of

  void (sha1_block_fn)(struct sha1_state *sst, u8 const *src, int blocks)

and pass it to the SHA-1 base functions. For easy casting between the
prototype above and existing block functions that take a 'u32 state[]'
as their first argument, the 'state' member of struct sha1_state is
moved to the base of the struct.
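
A minimal usage sketch (illustrative, not part of this series): a glue module
supplies its block function and wires the base helpers into a shash_alg.
Everything prefixed with my_ or named "sha1-example" is a placeholder.

#include <crypto/internal/hash.h>
#include <crypto/sha.h>
#include <crypto/sha1_base.h>
#include <linux/module.h>

/* placeholder for an arch- or hardware-specific compression function */
static void my_sha1_block(struct sha1_state *sst, u8 const *src, int blocks)
{
	/* process 'blocks' consecutive 64-byte blocks starting at 'src' */
}

static int my_sha1_update(struct shash_desc *desc, const u8 *data,
			  unsigned int len)
{
	return sha1_base_do_update(desc, data, len, my_sha1_block);
}

static int my_sha1_final(struct shash_desc *desc, u8 *out)
{
	sha1_base_do_finalize(desc, my_sha1_block);
	return sha1_base_finish(desc, out);
}

static struct shash_alg my_sha1_alg = {
	.digestsize	= SHA1_DIGEST_SIZE,
	.init		= sha1_base_init,	/* provided by sha1_base.h */
	.update		= my_sha1_update,
	.final		= my_sha1_final,
	.descsize	= sizeof(struct sha1_state),
	.base		= {
		.cra_name	 = "sha1",
		.cra_driver_name = "sha1-example",
		.cra_priority	 = 100,
		.cra_blocksize	 = SHA1_BLOCK_SIZE,
		.cra_module	 = THIS_MODULE,
	},
};

static int __init my_sha1_mod_init(void)
{
	return crypto_register_shash(&my_sha1_alg);
}

static void __exit my_sha1_mod_fini(void)
{
	crypto_unregister_shash(&my_sha1_alg);
}

module_init(my_sha1_mod_init);
module_exit(my_sha1_mod_fini);
MODULE_LICENSE("GPL v2");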

Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
---
 include/crypto/sha.h   |   2 +-
 include/crypto/sha1_base.h | 106 +
 2 files changed, 107 insertions(+), 1 deletion(-)
 create mode 100644 include/crypto/sha1_base.h

diff --git a/include/crypto/sha.h b/include/crypto/sha.h
index 190f8a0e0242..a9aad8e63f43 100644
--- a/include/crypto/sha.h
+++ b/include/crypto/sha.h
@@ -65,8 +65,8 @@
 #define SHA512_H7  0x5be0cd19137e2179ULL
 
 struct sha1_state {
-   u64 count;
u32 state[SHA1_DIGEST_SIZE / 4];
+   u64 count;
u8 buffer[SHA1_BLOCK_SIZE];
 };
 
diff --git a/include/crypto/sha1_base.h b/include/crypto/sha1_base.h
new file mode 100644
index ..41ccc5f325a0
--- /dev/null
+++ b/include/crypto/sha1_base.h
@@ -0,0 +1,106 @@
+/*
+ * sha1_base.h - core logic for SHA-1 implementations
+ *
+ * Copyright (C) 2015 Linaro Ltd ard.biesheu...@linaro.org
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <crypto/internal/hash.h>
+#include <crypto/sha.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+
+#include <asm/unaligned.h>
+
+typedef void (sha1_block_fn)(struct sha1_state *sst, u8 const *src, int blocks);
+
+static inline int sha1_base_init(struct shash_desc *desc)
+{
+	struct sha1_state *sctx = shash_desc_ctx(desc);
+
+	sctx->state[0] = SHA1_H0;
+	sctx->state[1] = SHA1_H1;
+	sctx->state[2] = SHA1_H2;
+	sctx->state[3] = SHA1_H3;
+	sctx->state[4] = SHA1_H4;
+	sctx->count = 0;
+
+	return 0;
+}
+
+static inline int sha1_base_do_update(struct shash_desc *desc,
+				      const u8 *data,
+				      unsigned int len,
+				      sha1_block_fn *block_fn)
+{
+	struct sha1_state *sctx = shash_desc_ctx(desc);
+	unsigned int partial = sctx->count % SHA1_BLOCK_SIZE;
+
+	sctx->count += len;
+
+	if (unlikely((partial + len) >= SHA1_BLOCK_SIZE)) {
+		int blocks;
+
+		if (partial) {
+			int p = SHA1_BLOCK_SIZE - partial;
+
+			memcpy(sctx->buffer + partial, data, p);
+			data += p;
+			len -= p;
+
+			block_fn(sctx, sctx->buffer, 1);
+		}
+
+		blocks = len / SHA1_BLOCK_SIZE;
+		len %= SHA1_BLOCK_SIZE;
+
+		if (blocks) {
+			block_fn(sctx, data, blocks);
+			data += blocks * SHA1_BLOCK_SIZE;
+		}
+		partial = 0;
+	}
+	if (len)
+		memcpy(sctx->buffer + partial, data, len);
+
+	return 0;
+}
+
+static inline int sha1_base_do_finalize(struct shash_desc *desc,
+					sha1_block_fn *block_fn)
+{
+	const int bit_offset = SHA1_BLOCK_SIZE - sizeof(__be64);
+	struct sha1_state *sctx = shash_desc_ctx(desc);
+	__be64 *bits = (__be64 *)(sctx->buffer + bit_offset);
+	unsigned int partial = sctx->count % SHA1_BLOCK_SIZE;
+
+	sctx->buffer[partial++] = 0x80;
+	if (partial > bit_offset) {
+		memset(sctx->buffer + partial, 0x0, SHA1_BLOCK_SIZE - partial);
+		partial = 0;
+
+		block_fn(sctx, sctx->buffer, 1);
+	}
+
+	memset(sctx->buffer + partial, 0x0, bit_offset - partial);
+	*bits = cpu_to_be64(sctx->count << 3);
+	block_fn(sctx, sctx->buffer, 1);
+
+	return 0;
+}
+
+static inline int sha1_base_finish(struct shash_desc *desc, u8 *out)
+{
+	struct sha1_state *sctx = shash_desc_ctx(desc);
+	__be32 *digest = (__be32 *)out;
+	int i;
+
+	for (i = 0; i < SHA1_DIGEST_SIZE / sizeof(__be32); i++)
+		put_unaligned_be32(sctx->state[i], digest++);
+
+	*sctx = (struct sha1_state){};
+	return 0;
+}
-- 
1.8.3.2



[PATCH v4 02/16] crypto: sha256: implement base layer for SHA-256

2015-04-09 Thread Ard Biesheuvel
To reduce the number of copies of boilerplate code throughout
the tree, this patch implements generic glue for the SHA-256
algorithm. This allows a specific arch or hardware implementation
to only implement the special handling that it needs.

The users need to supply an implementation of

  void (sha256_block_fn)(struct sha256_state *sst, u8 const *src, int blocks)

and pass it to the SHA-256 base functions. For easy casting between the
prototype above and existing block functions that take a 'u32 state[]'
as their first argument, the 'state' member of struct sha256_state is
moved to the base of the struct.
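
A sketch of the cast this layout change enables (an assumed illustration,
mirroring what the ARM SHA-1 glue in patch 07/16 does): an existing block
function that takes a 'u32 state[]' as its first argument can be handed to
the base layer directly, guarded by a BUILD_BUG_ON. "existing_sha256_block"
and "my_sha256_update" are placeholders.

#include <crypto/sha.h>
#include <crypto/sha256_base.h>
#include <linux/bug.h>
#include <linux/linkage.h>
#include <linux/stddef.h>

/* stand-in for a pre-existing transform with a 'u32 *digest' first argument */
asmlinkage void existing_sha256_block(u32 *digest, const u8 *data, int blocks);

static int my_sha256_update(struct shash_desc *desc, const u8 *data,
			    unsigned int len)
{
	/* the cast is only valid while 'state' stays the first member */
	BUILD_BUG_ON(offsetof(struct sha256_state, state) != 0);

	return sha256_base_do_update(desc, data, len,
				     (sha256_block_fn *)existing_sha256_block);
}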

Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
---
 include/crypto/sha.h |   2 +-
 include/crypto/sha256_base.h | 128 +++
 2 files changed, 129 insertions(+), 1 deletion(-)
 create mode 100644 include/crypto/sha256_base.h

diff --git a/include/crypto/sha.h b/include/crypto/sha.h
index a9aad8e63f43..a75bc80cc776 100644
--- a/include/crypto/sha.h
+++ b/include/crypto/sha.h
@@ -71,8 +71,8 @@ struct sha1_state {
 };
 
 struct sha256_state {
-   u64 count;
u32 state[SHA256_DIGEST_SIZE / 4];
+   u64 count;
u8 buf[SHA256_BLOCK_SIZE];
 };
 
diff --git a/include/crypto/sha256_base.h b/include/crypto/sha256_base.h
new file mode 100644
index ..d1f2195bb7de
--- /dev/null
+++ b/include/crypto/sha256_base.h
@@ -0,0 +1,128 @@
+/*
+ * sha256_base.h - core logic for SHA-256 implementations
+ *
+ * Copyright (C) 2015 Linaro Ltd ard.biesheu...@linaro.org
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <crypto/internal/hash.h>
+#include <crypto/sha.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+
+#include <asm/unaligned.h>
+
+typedef void (sha256_block_fn)(struct sha256_state *sst, u8 const *src,
+			       int blocks);
+
+static inline int sha224_base_init(struct shash_desc *desc)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+
+	sctx->state[0] = SHA224_H0;
+	sctx->state[1] = SHA224_H1;
+	sctx->state[2] = SHA224_H2;
+	sctx->state[3] = SHA224_H3;
+	sctx->state[4] = SHA224_H4;
+	sctx->state[5] = SHA224_H5;
+	sctx->state[6] = SHA224_H6;
+	sctx->state[7] = SHA224_H7;
+	sctx->count = 0;
+
+	return 0;
+}
+
+static inline int sha256_base_init(struct shash_desc *desc)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+
+	sctx->state[0] = SHA256_H0;
+	sctx->state[1] = SHA256_H1;
+	sctx->state[2] = SHA256_H2;
+	sctx->state[3] = SHA256_H3;
+	sctx->state[4] = SHA256_H4;
+	sctx->state[5] = SHA256_H5;
+	sctx->state[6] = SHA256_H6;
+	sctx->state[7] = SHA256_H7;
+	sctx->count = 0;
+
+	return 0;
+}
+
+static inline int sha256_base_do_update(struct shash_desc *desc,
+					const u8 *data,
+					unsigned int len,
+					sha256_block_fn *block_fn)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+	unsigned int partial = sctx->count % SHA256_BLOCK_SIZE;
+
+	sctx->count += len;
+
+	if (unlikely((partial + len) >= SHA256_BLOCK_SIZE)) {
+		int blocks;
+
+		if (partial) {
+			int p = SHA256_BLOCK_SIZE - partial;
+
+			memcpy(sctx->buf + partial, data, p);
+			data += p;
+			len -= p;
+
+			block_fn(sctx, sctx->buf, 1);
+		}
+
+		blocks = len / SHA256_BLOCK_SIZE;
+		len %= SHA256_BLOCK_SIZE;
+
+		if (blocks) {
+			block_fn(sctx, data, blocks);
+			data += blocks * SHA256_BLOCK_SIZE;
+		}
+		partial = 0;
+	}
+	if (len)
+		memcpy(sctx->buf + partial, data, len);
+
+	return 0;
+}
+
+static inline int sha256_base_do_finalize(struct shash_desc *desc,
+					  sha256_block_fn *block_fn)
+{
+	const int bit_offset = SHA256_BLOCK_SIZE - sizeof(__be64);
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+	__be64 *bits = (__be64 *)(sctx->buf + bit_offset);
+	unsigned int partial = sctx->count % SHA256_BLOCK_SIZE;
+
+	sctx->buf[partial++] = 0x80;
+	if (partial > bit_offset) {
+		memset(sctx->buf + partial, 0x0, SHA256_BLOCK_SIZE - partial);
+		partial = 0;
+
+		block_fn(sctx, sctx->buf, 1);
+	}
+
+	memset(sctx->buf + partial, 0x0, bit_offset - partial);
+	*bits = cpu_to_be64(sctx->count << 3);
+	block_fn(sctx, sctx->buf, 1);
+
+	return 0;
+}
+
+static inline int sha256_base_finish(struct shash_desc *desc, u8 *out)
+{
+	unsigned int 

[PATCH v4 12/16] crypto/arm64: move SHA-1 ARMv8 implementation to base layer

2015-04-09 Thread Ard Biesheuvel
This removes all the boilerplate from the existing implementation,
and replaces it with calls into the base layer.

Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
---
 arch/arm64/crypto/sha1-ce-core.S |  33 -
 arch/arm64/crypto/sha1-ce-glue.c | 151 ---
 2 files changed, 59 insertions(+), 125 deletions(-)

diff --git a/arch/arm64/crypto/sha1-ce-core.S b/arch/arm64/crypto/sha1-ce-core.S
index 09d57d98609c..033aae6d732a 100644
--- a/arch/arm64/crypto/sha1-ce-core.S
+++ b/arch/arm64/crypto/sha1-ce-core.S
@@ -66,8 +66,8 @@
.word   0x5a827999, 0x6ed9eba1, 0x8f1bbcdc, 0xca62c1d6
 
/*
-* void sha1_ce_transform(int blocks, u8 const *src, u32 *state,
-*u8 *head, long bytes)
+* void sha1_ce_transform(struct sha1_ce_state *sst, u8 const *src,
+*int blocks)
 */
 ENTRY(sha1_ce_transform)
/* load round constants */
@@ -78,25 +78,22 @@ ENTRY(sha1_ce_transform)
ld1r{k3.4s}, [x6]
 
/* load state */
-   ldr dga, [x2]
-   ldr dgb, [x2, #16]
+   ldr dga, [x0]
+   ldr dgb, [x0, #16]
 
-   /* load partial state (if supplied) */
-   cbz x3, 0f
-   ld1 {v8.4s-v11.4s}, [x3]
-   b   1f
+   /* load sha1_ce_state::finalize */
+   ldr w4, [x0, #:lo12:sha1_ce_offsetof_finalize]
 
/* load input */
 0: ld1 {v8.4s-v11.4s}, [x1], #64
-   sub w0, w0, #1
+   sub w2, w2, #1
 
-1:
 CPU_LE(rev32   v8.16b, v8.16b  )
 CPU_LE(rev32   v9.16b, v9.16b  )
 CPU_LE(rev32   v10.16b, v10.16b)
 CPU_LE(rev32   v11.16b, v11.16b)
 
-2: add t0.4s, v8.4s, k0.4s
+1: add t0.4s, v8.4s, k0.4s
mov dg0v.16b, dgav.16b
 
add_update  c, ev, k0,  8,  9, 10, 11, dgb
@@ -127,15 +124,15 @@ CPU_LE(   rev32   v11.16b, v11.16b)
add dgbv.2s, dgbv.2s, dg1v.2s
add dgav.4s, dgav.4s, dg0v.4s
 
-   cbnzw0, 0b
+   cbnzw2, 0b
 
/*
 * Final block: add padding and total bit count.
-* Skip if we have no total byte count in x4. In that case, the input
-* size was not a round multiple of the block size, and the padding is
-* handled by the C code.
+* Skip if the input size was not a round multiple of the block size,
+* the padding is handled by the C code in that case.
 */
cbz x4, 3f
+   ldr x4, [x0, #:lo12:sha1_ce_offsetof_count]
moviv9.2d, #0
mov x8, #0x8000
moviv10.2d, #0
@@ -144,10 +141,10 @@ CPU_LE(   rev32   v11.16b, v11.16b)
mov x4, #0
mov v11.d[0], xzr
mov v11.d[1], x7
-   b   2b
+   b   1b
 
/* store new state */
-3: str dga, [x2]
-   str dgb, [x2, #16]
+3: str dga, [x0]
+   str dgb, [x0, #16]
ret
 ENDPROC(sha1_ce_transform)
diff --git a/arch/arm64/crypto/sha1-ce-glue.c b/arch/arm64/crypto/sha1-ce-glue.c
index 6fe83f37a750..114e7cc5de8c 100644
--- a/arch/arm64/crypto/sha1-ce-glue.c
+++ b/arch/arm64/crypto/sha1-ce-glue.c
@@ -12,144 +12,81 @@
 #include <asm/unaligned.h>
 #include <crypto/internal/hash.h>
 #include <crypto/sha.h>
+#include <crypto/sha1_base.h>
 #include <linux/cpufeature.h>
 #include <linux/crypto.h>
 #include <linux/module.h>
 
+#define ASM_EXPORT(sym, val) \
+	asm(".globl " #sym "; .set " #sym ", %0" :: "I"(val));
+
 MODULE_DESCRIPTION("SHA1 secure hash using ARMv8 Crypto Extensions");
 MODULE_AUTHOR("Ard Biesheuvel ard.biesheu...@linaro.org");
 MODULE_LICENSE("GPL v2");
 
-asmlinkage void sha1_ce_transform(int blocks, u8 const *src, u32 *state,
- u8 *head, long bytes);
+struct sha1_ce_state {
+   struct sha1_state   sst;
+   u32 finalize;
+};
 
-static int sha1_init(struct shash_desc *desc)
-{
-   struct sha1_state *sctx = shash_desc_ctx(desc);
+asmlinkage void sha1_ce_transform(struct sha1_ce_state *sst, u8 const *src,
+ int blocks);
 
-   *sctx = (struct sha1_state){
-   .state = { SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4 },
-   };
-   return 0;
-}
-
-static int sha1_update(struct shash_desc *desc, const u8 *data,
-  unsigned int len)
+static int sha1_ce_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len)
 {
-   struct sha1_state *sctx = shash_desc_ctx(desc);
-   unsigned int partial = sctx-count % SHA1_BLOCK_SIZE;
-
- 

[PATCH v4 09/16] crypto/arm: move SHA-1 ARMv8 implementation to base layer

2015-04-09 Thread Ard Biesheuvel
This removes all the boilerplate from the existing implementation,
and replaces it with calls into the base layer.

Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
---
 arch/arm/crypto/Kconfig|   1 -
 arch/arm/crypto/sha1-ce-core.S |  23 +++--
 arch/arm/crypto/sha1-ce-glue.c | 107 ++---
 3 files changed, 33 insertions(+), 98 deletions(-)

diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index 458729d2ce22..5ed98bc6f95d 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -31,7 +31,6 @@ config CRYPTO_SHA1_ARM_CE
tristate SHA1 digest algorithm (ARM v8 Crypto Extensions)
depends on KERNEL_MODE_NEON
select CRYPTO_SHA1_ARM
-   select CRYPTO_SHA1
select CRYPTO_HASH
help
  SHA-1 secure hash standard (FIPS 180-1/DFIPS 180-2) implemented
diff --git a/arch/arm/crypto/sha1-ce-core.S b/arch/arm/crypto/sha1-ce-core.S
index 4aad520935d8..b623f51ccbcf 100644
--- a/arch/arm/crypto/sha1-ce-core.S
+++ b/arch/arm/crypto/sha1-ce-core.S
@@ -61,8 +61,8 @@
.word   0xca62c1d6, 0xca62c1d6, 0xca62c1d6, 0xca62c1d6
 
/*
-* void sha1_ce_transform(int blocks, u8 const *src, u32 *state,
-*u8 *head);
+* void sha1_ce_transform(struct sha1_state *sst, u8 const *src,
+*int blocks);
 */
 ENTRY(sha1_ce_transform)
/* load round constants */
@@ -71,23 +71,14 @@ ENTRY(sha1_ce_transform)
vld1.32 {k2-k3}, [ip, :128]
 
/* load state */
-   vld1.32 {dga}, [r2]
-   vldrdgbs, [r2, #16]
-
-   /* load partial input (if supplied) */
-   teq r3, #0
-   beq 0f
-   vld1.32 {q8-q9}, [r3]!
-   vld1.32 {q10-q11}, [r3]
-   teq r0, #0
-   b   1f
+   vld1.32 {dga}, [r0]
+   vldrdgbs, [r0, #16]
 
/* load input */
 0: vld1.32 {q8-q9}, [r1]!
vld1.32 {q10-q11}, [r1]!
-   subsr0, r0, #1
+   subsr2, r2, #1
 
-1:
 #ifndef CONFIG_CPU_BIG_ENDIAN
vrev32.8q8, q8
vrev32.8q9, q9
@@ -128,7 +119,7 @@ ENTRY(sha1_ce_transform)
bne 0b
 
/* store new state */
-   vst1.32 {dga}, [r2]
-   vstrdgbs, [r2, #16]
+   vst1.32 {dga}, [r0]
+   vstrdgbs, [r0, #16]
bx  lr
 ENDPROC(sha1_ce_transform)
diff --git a/arch/arm/crypto/sha1-ce-glue.c b/arch/arm/crypto/sha1-ce-glue.c
index e93b24c1af1f..80bc2fcd241a 100644
--- a/arch/arm/crypto/sha1-ce-glue.c
+++ b/arch/arm/crypto/sha1-ce-glue.c
@@ -10,13 +10,13 @@
 
 #include <crypto/internal/hash.h>
 #include <crypto/sha.h>
+#include <crypto/sha1_base.h>
 #include <linux/crypto.h>
 #include <linux/module.h>
 
 #include <asm/hwcap.h>
 #include <asm/neon.h>
 #include <asm/simd.h>
-#include <asm/unaligned.h>
 
 #include "sha1.h"
 
@@ -24,107 +24,52 @@ MODULE_DESCRIPTION(SHA1 secure hash using ARMv8 Crypto 
Extensions);
 MODULE_AUTHOR(Ard Biesheuvel ard.biesheu...@linaro.org);
 MODULE_LICENSE(GPL v2);
 
-asmlinkage void sha1_ce_transform(int blocks, u8 const *src, u32 *state, 
- u8 *head);
+asmlinkage void sha1_ce_transform(struct sha1_state *sst, u8 const *src,
+ int blocks);
 
-static int sha1_init(struct shash_desc *desc)
+static int sha1_ce_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len)
 {
struct sha1_state *sctx = shash_desc_ctx(desc);
 
-   *sctx = (struct sha1_state){
-   .state = { SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4 },
-   };
-   return 0;
-}
-
-static int sha1_update(struct shash_desc *desc, const u8 *data,
-  unsigned int len)
-{
-   struct sha1_state *sctx = shash_desc_ctx(desc);
-   unsigned int partial;
-
-   if (!may_use_simd())
+   if (!may_use_simd() ||
+   (sctx-count % SHA1_BLOCK_SIZE) + len  SHA1_BLOCK_SIZE)
return sha1_update_arm(desc, data, len);
 
-   partial = sctx-count % SHA1_BLOCK_SIZE;
-   sctx-count += len;
-
-   if ((partial + len) = SHA1_BLOCK_SIZE) {
-   int blocks;
+   kernel_neon_begin();
+   sha1_base_do_update(desc, data, len, sha1_ce_transform);
+   kernel_neon_end();
 
-   if (partial) {
-   int p = SHA1_BLOCK_SIZE - partial;
-
-   memcpy(sctx-buffer + partial, data, p);
-   data += p;
-   len -= p;
-   }
-
-   blocks = len / SHA1_BLOCK_SIZE;
-   len %= SHA1_BLOCK_SIZE;
-
-   kernel_neon_begin();
-   sha1_ce_transform(blocks, data, sctx-state,
- partial ? sctx-buffer : NULL);
-   

[PATCH v4 06/16] crypto: sha512-generic: move to generic glue implementation

2015-04-09 Thread Ard Biesheuvel
This updates the generic SHA-512 implementation to use the
generic shared SHA-512 glue code.

It also implements a .finup hook crypto_sha512_finup() and exports
it to other modules. The import and export() functions and the
.statesize member are dropped, since the default implementation
is perfectly suitable for this module.

Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
---
 crypto/sha512_generic.c | 123 +---
 include/crypto/sha.h|   3 ++
 2 files changed, 24 insertions(+), 102 deletions(-)

diff --git a/crypto/sha512_generic.c b/crypto/sha512_generic.c
index 1c3c3767e079..eba965d18bfc 100644
--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -18,6 +18,7 @@
 #include <linux/crypto.h>
 #include <linux/types.h>
 #include <crypto/sha.h>
+#include <crypto/sha512_base.h>
 #include <linux/percpu.h>
 #include <asm/byteorder.h>
 #include <asm/unaligned.h>
@@ -130,125 +131,42 @@ sha512_transform(u64 *state, const u8 *input)
a = b = c = d = e = f = g = h = t1 = t2 = 0;
 }
 
-static int
-sha512_init(struct shash_desc *desc)
+static void sha512_generic_block_fn(struct sha512_state *sst, u8 const *src,
+   int blocks)
 {
-   struct sha512_state *sctx = shash_desc_ctx(desc);
-   sctx-state[0] = SHA512_H0;
-   sctx-state[1] = SHA512_H1;
-   sctx-state[2] = SHA512_H2;
-   sctx-state[3] = SHA512_H3;
-   sctx-state[4] = SHA512_H4;
-   sctx-state[5] = SHA512_H5;
-   sctx-state[6] = SHA512_H6;
-   sctx-state[7] = SHA512_H7;
-   sctx-count[0] = sctx-count[1] = 0;
-
-   return 0;
-}
-
-static int
-sha384_init(struct shash_desc *desc)
-{
-   struct sha512_state *sctx = shash_desc_ctx(desc);
-   sctx-state[0] = SHA384_H0;
-   sctx-state[1] = SHA384_H1;
-   sctx-state[2] = SHA384_H2;
-   sctx-state[3] = SHA384_H3;
-   sctx-state[4] = SHA384_H4;
-   sctx-state[5] = SHA384_H5;
-   sctx-state[6] = SHA384_H6;
-   sctx-state[7] = SHA384_H7;
-   sctx-count[0] = sctx-count[1] = 0;
-
-   return 0;
+   while (blocks--) {
+   sha512_transform(sst-state, src);
+   src += SHA512_BLOCK_SIZE;
+   }
 }
 
 int crypto_sha512_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
 {
-   struct sha512_state *sctx = shash_desc_ctx(desc);
-
-   unsigned int i, index, part_len;
-
-   /* Compute number of bytes mod 128 */
-   index = sctx-count[0]  0x7f;
-
-   /* Update number of bytes */
-   if ((sctx-count[0] += len)  len)
-   sctx-count[1]++;
-
-part_len = 128 - index;
-
-   /* Transform as many times as possible. */
-   if (len = part_len) {
-   memcpy(sctx-buf[index], data, part_len);
-   sha512_transform(sctx-state, sctx-buf);
-
-   for (i = part_len; i + 127  len; i+=128)
-   sha512_transform(sctx-state, data[i]);
-
-   index = 0;
-   } else {
-   i = 0;
-   }
-
-   /* Buffer remaining input */
-   memcpy(sctx-buf[index], data[i], len - i);
-
-   return 0;
+   return sha512_base_do_update(desc, data, len, sha512_generic_block_fn);
 }
 EXPORT_SYMBOL(crypto_sha512_update);
 
-static int
-sha512_final(struct shash_desc *desc, u8 *hash)
+static int sha512_final(struct shash_desc *desc, u8 *hash)
 {
-   struct sha512_state *sctx = shash_desc_ctx(desc);
-static u8 padding[128] = { 0x80, };
-   __be64 *dst = (__be64 *)hash;
-   __be64 bits[2];
-   unsigned int index, pad_len;
-   int i;
-
-   /* Save number of bits */
-   bits[1] = cpu_to_be64(sctx-count[0]  3);
-   bits[0] = cpu_to_be64(sctx-count[1]  3 | sctx-count[0]  61);
-
-   /* Pad out to 112 mod 128. */
-   index = sctx-count[0]  0x7f;
-   pad_len = (index  112) ? (112 - index) : ((128+112) - index);
-   crypto_sha512_update(desc, padding, pad_len);
-
-   /* Append length (before padding) */
-   crypto_sha512_update(desc, (const u8 *)bits, sizeof(bits));
-
-   /* Store state in digest */
-   for (i = 0; i  8; i++)
-   dst[i] = cpu_to_be64(sctx-state[i]);
-
-   /* Zeroize sensitive information. */
-   memset(sctx, 0, sizeof(struct sha512_state));
-
-   return 0;
+   sha512_base_do_finalize(desc, sha512_generic_block_fn);
+   return sha512_base_finish(desc, hash);
 }
 
-static int sha384_final(struct shash_desc *desc, u8 *hash)
+int crypto_sha512_finup(struct shash_desc *desc, const u8 *data,
+   unsigned int len, u8 *hash)
 {
-   u8 D[64];
-
-   sha512_final(desc, D);
-
-   memcpy(hash, D, 48);
-   memzero_explicit(D, 64);
-
-   return 0;
+   sha512_base_do_update(desc, data, len, sha512_generic_block_fn);
+   return sha512_final(desc, hash);
 }
+EXPORT_SYMBOL(crypto_sha512_finup);
 
 static struct shash_alg sha512_algs[2] = { {

[PATCH v4 04/16] crypto: sha1-generic: move to generic glue implementation

2015-04-09 Thread Ard Biesheuvel
This updates the generic SHA-1 implementation to use the generic
shared SHA-1 glue code.

It also implements a .finup hook crypto_sha1_finup() and exports
it to other modules. The import and export() functions and the
.statesize member are dropped, since the default implementation
is perfectly suitable for this module.

Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
---
 crypto/sha1_generic.c | 102 ++
 include/crypto/sha.h  |   3 ++
 2 files changed, 23 insertions(+), 82 deletions(-)

diff --git a/crypto/sha1_generic.c b/crypto/sha1_generic.c
index a3e50c37eb6f..39e3acc438d9 100644
--- a/crypto/sha1_generic.c
+++ b/crypto/sha1_generic.c
@@ -23,111 +23,49 @@
 #include <linux/cryptohash.h>
 #include <linux/types.h>
 #include <crypto/sha.h>
+#include <crypto/sha1_base.h>
 #include <asm/byteorder.h>
 
-static int sha1_init(struct shash_desc *desc)
+static void sha1_generic_block_fn(struct sha1_state *sst, u8 const *src,
+ int blocks)
 {
-   struct sha1_state *sctx = shash_desc_ctx(desc);
+   u32 temp[SHA_WORKSPACE_WORDS];
 
-   *sctx = (struct sha1_state){
-   .state = { SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4 },
-   };
-
-   return 0;
+   while (blocks--) {
+   sha_transform(sst-state, src, temp);
+   src += SHA1_BLOCK_SIZE;
+   }
+   memzero_explicit(temp, sizeof(temp));
 }
 
 int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
-   unsigned int len)
+  unsigned int len)
 {
-   struct sha1_state *sctx = shash_desc_ctx(desc);
-   unsigned int partial, done;
-   const u8 *src;
-
-   partial = sctx-count % SHA1_BLOCK_SIZE;
-   sctx-count += len;
-   done = 0;
-   src = data;
-
-   if ((partial + len) = SHA1_BLOCK_SIZE) {
-   u32 temp[SHA_WORKSPACE_WORDS];
-
-   if (partial) {
-   done = -partial;
-   memcpy(sctx-buffer + partial, data,
-  done + SHA1_BLOCK_SIZE);
-   src = sctx-buffer;
-   }
-
-   do {
-   sha_transform(sctx-state, src, temp);
-   done += SHA1_BLOCK_SIZE;
-   src = data + done;
-   } while (done + SHA1_BLOCK_SIZE = len);
-
-   memzero_explicit(temp, sizeof(temp));
-   partial = 0;
-   }
-   memcpy(sctx-buffer + partial, src, len - done);
-
-   return 0;
+   return sha1_base_do_update(desc, data, len, sha1_generic_block_fn);
 }
 EXPORT_SYMBOL(crypto_sha1_update);
 
-
-/* Add padding and return the message digest. */
 static int sha1_final(struct shash_desc *desc, u8 *out)
 {
-   struct sha1_state *sctx = shash_desc_ctx(desc);
-   __be32 *dst = (__be32 *)out;
-   u32 i, index, padlen;
-   __be64 bits;
-   static const u8 padding[64] = { 0x80, };
-
-   bits = cpu_to_be64(sctx-count  3);
-
-   /* Pad out to 56 mod 64 */
-   index = sctx-count  0x3f;
-   padlen = (index  56) ? (56 - index) : ((64+56) - index);
-   crypto_sha1_update(desc, padding, padlen);
-
-   /* Append length */
-   crypto_sha1_update(desc, (const u8 *)bits, sizeof(bits));
-
-   /* Store state in digest */
-   for (i = 0; i  5; i++)
-   dst[i] = cpu_to_be32(sctx-state[i]);
-
-   /* Wipe context */
-   memset(sctx, 0, sizeof *sctx);
-
-   return 0;
+   sha1_base_do_finalize(desc, sha1_generic_block_fn);
+   return sha1_base_finish(desc, out);
 }
 
-static int sha1_export(struct shash_desc *desc, void *out)
+int crypto_sha1_finup(struct shash_desc *desc, const u8 *data,
+ unsigned int len, u8 *out)
 {
-   struct sha1_state *sctx = shash_desc_ctx(desc);
-
-   memcpy(out, sctx, sizeof(*sctx));
-   return 0;
-}
-
-static int sha1_import(struct shash_desc *desc, const void *in)
-{
-   struct sha1_state *sctx = shash_desc_ctx(desc);
-
-   memcpy(sctx, in, sizeof(*sctx));
-   return 0;
+   sha1_base_do_update(desc, data, len, sha1_generic_block_fn);
+   return sha1_final(desc, out);
 }
+EXPORT_SYMBOL(crypto_sha1_finup);
 
 static struct shash_alg alg = {
.digestsize =   SHA1_DIGEST_SIZE,
-   .init   =   sha1_init,
+   .init   =   sha1_base_init,
.update =   crypto_sha1_update,
.final  =   sha1_final,
-   .export =   sha1_export,
-   .import =   sha1_import,
+   .finup  =   crypto_sha1_finup,
.descsize   =   sizeof(struct sha1_state),
-   .statesize  =   sizeof(struct sha1_state),
.base   =   {
.cra_name   =   sha1,
.cra_driver_name=   sha1-generic,
diff --git a/include/crypto/sha.h 

[PATCH v4 15/16] crypto/x86: move SHA-224/256 SSSE3 implementation to base layer

2015-04-09 Thread Ard Biesheuvel
This removes all the boilerplate from the existing implementation,
and replaces it with calls into the base layer. It also changes the
prototypes of the core asm functions to be compatible with the base
prototype

  void (sha256_block_fn)(struct sha256_state *sst, u8 const *src, int blocks)

so that they can be passed to the base layer directly.
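
The description above implies a glue pattern like the following (a hedged
sketch; the complete function is in the diff below, which this archive
truncates). irq_fpu_usable(), kernel_fpu_begin() and kernel_fpu_end() are the
existing x86 FPU guards, and crypto_sha256_update() is the generic fallback
exported by patch 05/16.

static int sha256_ssse3_update(struct shash_desc *desc, const u8 *data,
			       unsigned int len)
{
	struct sha256_state *sctx = shash_desc_ctx(desc);

	/* fall back to the generic code when the FPU cannot be used or the
	 * update is too small to complete a block */
	if (!irq_fpu_usable() ||
	    (sctx->count % SHA256_BLOCK_SIZE) + len < SHA256_BLOCK_SIZE)
		return crypto_sha256_update(desc, data, len);

	kernel_fpu_begin();
	sha256_base_do_update(desc, data, len,
			      (sha256_block_fn *)sha256_transform_ssse3);
	kernel_fpu_end();

	return 0;
}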

Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
---
 arch/x86/crypto/sha256-avx-asm.S|  10 +-
 arch/x86/crypto/sha256-avx2-asm.S   |  10 +-
 arch/x86/crypto/sha256-ssse3-asm.S  |  10 +-
 arch/x86/crypto/sha256_ssse3_glue.c | 193 +++-
 4 files changed, 50 insertions(+), 173 deletions(-)

diff --git a/arch/x86/crypto/sha256-avx-asm.S b/arch/x86/crypto/sha256-avx-asm.S
index 642f15687a0a..92b3b5d75ba9 100644
--- a/arch/x86/crypto/sha256-avx-asm.S
+++ b/arch/x86/crypto/sha256-avx-asm.S
@@ -96,10 +96,10 @@ SHUF_DC00 = %xmm12  # shuffle xDxC - DC00
 BYTE_FLIP_MASK = %xmm13
 
 NUM_BLKS = %rdx   # 3rd arg
-CTX = %rsi# 2nd arg
-INP = %rdi# 1st arg
+INP = %rsi# 2nd arg
+CTX = %rdi# 1st arg
 
-SRND = %rdi   # clobbers INP
+SRND = %rsi   # clobbers INP
 c = %ecx
 d = %r8d
 e = %edx
@@ -342,8 +342,8 @@ a = TMP_
 
 
 ## void sha256_transform_avx(void *input_data, UINT32 digest[8], UINT64 
num_blks)
-## arg 1 : pointer to input data
-## arg 2 : pointer to digest
+## arg 1 : pointer to digest
+## arg 2 : pointer to input data
 ## arg 3 : Num blocks
 
 .text
diff --git a/arch/x86/crypto/sha256-avx2-asm.S 
b/arch/x86/crypto/sha256-avx2-asm.S
index 9e86944c539d..570ec5ec62d7 100644
--- a/arch/x86/crypto/sha256-avx2-asm.S
+++ b/arch/x86/crypto/sha256-avx2-asm.S
@@ -91,12 +91,12 @@ BYTE_FLIP_MASK = %ymm13
 X_BYTE_FLIP_MASK = %xmm13 # XMM version of BYTE_FLIP_MASK
 
 NUM_BLKS = %rdx# 3rd arg
-CTX= %rsi  # 2nd arg
-INP= %rdi  # 1st arg
+INP= %rsi  # 2nd arg
+CTX= %rdi  # 1st arg
 c  = %ecx
 d  = %r8d
 e   = %edx # clobbers NUM_BLKS
-y3 = %edi  # clobbers INP
+y3 = %esi  # clobbers INP
 
 
 TBL= %rbp
@@ -523,8 +523,8 @@ STACK_SIZE  = _RSP  + _RSP_SIZE
 
 
 ## void sha256_transform_rorx(void *input_data, UINT32 digest[8], UINT64 
num_blks)
-## arg 1 : pointer to input data
-## arg 2 : pointer to digest
+## arg 1 : pointer to digest
+## arg 2 : pointer to input data
 ## arg 3 : Num blocks
 
 .text
diff --git a/arch/x86/crypto/sha256-ssse3-asm.S 
b/arch/x86/crypto/sha256-ssse3-asm.S
index f833b74d902b..2cedc44e8121 100644
--- a/arch/x86/crypto/sha256-ssse3-asm.S
+++ b/arch/x86/crypto/sha256-ssse3-asm.S
@@ -88,10 +88,10 @@ SHUF_DC00 = %xmm11  # shuffle xDxC - DC00
 BYTE_FLIP_MASK = %xmm12
 
 NUM_BLKS = %rdx   # 3rd arg
-CTX = %rsi# 2nd arg
-INP = %rdi# 1st arg
+INP = %rsi# 2nd arg
+CTX = %rdi# 1st arg
 
-SRND = %rdi   # clobbers INP
+SRND = %rsi   # clobbers INP
 c = %ecx
 d = %r8d
 e = %edx
@@ -348,8 +348,8 @@ a = TMP_
 
 
 ## void sha256_transform_ssse3(void *input_data, UINT32 digest[8], UINT64 
num_blks)
-## arg 1 : pointer to input data
-## arg 2 : pointer to digest
+## arg 1 : pointer to digest
+## arg 2 : pointer to input data
 ## arg 3 : Num blocks
 
 .text
diff --git a/arch/x86/crypto/sha256_ssse3_glue.c 
b/arch/x86/crypto/sha256_ssse3_glue.c
index 8fad72f4dfd2..ccc338881ee8 100644
--- a/arch/x86/crypto/sha256_ssse3_glue.c
+++ b/arch/x86/crypto/sha256_ssse3_glue.c
@@ -36,195 +36,74 @@
 #include <linux/cryptohash.h>
 #include <linux/types.h>
 #include <crypto/sha.h>
-#include <asm/byteorder.h>
+#include <crypto/sha256_base.h>
 #include <asm/i387.h>
 #include <asm/xcr.h>
 #include <asm/xsave.h>
 #include <linux/string.h>
 
-asmlinkage void sha256_transform_ssse3(const char *data, u32 *digest,
-u64 rounds);
+asmlinkage void sha256_transform_ssse3(u32 *digest, const char *data,
+  u64 rounds);
 #ifdef CONFIG_AS_AVX
-asmlinkage void sha256_transform_avx(const char *data, u32 *digest,
+asmlinkage void sha256_transform_avx(u32 *digest, const char *data,
 u64 rounds);
 #endif
 #ifdef CONFIG_AS_AVX2
-asmlinkage void sha256_transform_rorx(const char *data, u32 *digest,
-u64 rounds);
+asmlinkage void sha256_transform_rorx(u32 *digest, const char *data,
+ u64 rounds);
 #endif
 
-static asmlinkage void (*sha256_transform_asm)(const char *, u32 *, u64);
-
-
-static int sha256_ssse3_init(struct shash_desc *desc)
-{
-   struct 

[PATCH v4 03/16] crypto: sha512: implement base layer for SHA-512

2015-04-09 Thread Ard Biesheuvel
To reduce the number of copies of boilerplate code throughout
the tree, this patch implements generic glue for the SHA-512
algorithm. This allows a specific arch or hardware implementation
to only implement the special handling that it needs.

The users need to supply an implementation of

  void (sha512_block_fn)(struct sha512_state *sst, u8 const *src, int blocks)

and pass it to the SHA-512 base functions. For easy casting between the
prototype above and existing block functions that take a 'u64 state[]'
as their first argument, the 'state' member of struct sha512_state is
moved to the base of the struct.
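To illustrate the intended use (a minimal sketch, not part of the patch): an arch driver supplies its block function and forwards to the base helpers, for example in its finup handler. The my_sha512_block()/my_sha512_finup() names are placeholders, and sha512_base_finish() is the finishing helper defined further down in the new header, past the point where the quoted diff is cut off:

/* placeholder for the arch- or hardware-specific core routine */
static void my_sha512_block(struct sha512_state *sst, u8 const *src,
                            int blocks);

static int my_sha512_finup(struct shash_desc *desc, const u8 *data,
                           unsigned int len, u8 *out)
{
        if (len)
                sha512_base_do_update(desc, data, len, my_sha512_block);
        sha512_base_do_finalize(desc, my_sha512_block);

        return sha512_base_finish(desc, out);
}

Moving 'state' to offset 0 is what makes it safe for glue code to cast an existing routine taking a plain u64 state[] to sha512_block_fn, typically guarded by a BUILD_BUG_ON(offsetof(struct sha512_state, state) != 0) as in the other patches of this series.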

Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
---
 include/crypto/sha.h |   2 +-
 include/crypto/sha512_base.h | 131 +++
 2 files changed, 132 insertions(+), 1 deletion(-)
 create mode 100644 include/crypto/sha512_base.h

diff --git a/include/crypto/sha.h b/include/crypto/sha.h
index a75bc80cc776..05e82cbc4d8f 100644
--- a/include/crypto/sha.h
+++ b/include/crypto/sha.h
@@ -77,8 +77,8 @@ struct sha256_state {
 };
 
 struct sha512_state {
-   u64 count[2];
u64 state[SHA512_DIGEST_SIZE / 8];
+   u64 count[2];
u8 buf[SHA512_BLOCK_SIZE];
 };
 
diff --git a/include/crypto/sha512_base.h b/include/crypto/sha512_base.h
new file mode 100644
index ..6c5341e005ea
--- /dev/null
+++ b/include/crypto/sha512_base.h
@@ -0,0 +1,131 @@
+/*
+ * sha512_base.h - core logic for SHA-512 implementations
+ *
+ * Copyright (C) 2015 Linaro Ltd ard.biesheu...@linaro.org
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <crypto/internal/hash.h>
+#include <crypto/sha.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+
+#include <asm/unaligned.h>
+
+typedef void (sha512_block_fn)(struct sha512_state *sst, u8 const *src,
+  int blocks);
+
+static inline int sha384_base_init(struct shash_desc *desc)
+{
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+
+   sctx->state[0] = SHA384_H0;
+   sctx->state[1] = SHA384_H1;
+   sctx->state[2] = SHA384_H2;
+   sctx->state[3] = SHA384_H3;
+   sctx->state[4] = SHA384_H4;
+   sctx->state[5] = SHA384_H5;
+   sctx->state[6] = SHA384_H6;
+   sctx->state[7] = SHA384_H7;
+   sctx->count[0] = sctx->count[1] = 0;
+
+   return 0;
+}
+
+static inline int sha512_base_init(struct shash_desc *desc)
+{
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+
+   sctx->state[0] = SHA512_H0;
+   sctx->state[1] = SHA512_H1;
+   sctx->state[2] = SHA512_H2;
+   sctx->state[3] = SHA512_H3;
+   sctx->state[4] = SHA512_H4;
+   sctx->state[5] = SHA512_H5;
+   sctx->state[6] = SHA512_H6;
+   sctx->state[7] = SHA512_H7;
+   sctx->count[0] = sctx->count[1] = 0;
+
+   return 0;
+}
+
+static inline int sha512_base_do_update(struct shash_desc *desc,
+   const u8 *data,
+   unsigned int len,
+   sha512_block_fn *block_fn)
+{
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+   unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE;
+
+   sctx->count[0] += len;
+   if (sctx->count[0] < len)
+   sctx->count[1]++;
+
+   if (unlikely((partial + len) >= SHA512_BLOCK_SIZE)) {
+   int blocks;
+
+   if (partial) {
+   int p = SHA512_BLOCK_SIZE - partial;
+
+   memcpy(sctx->buf + partial, data, p);
+   data += p;
+   len -= p;
+
+   block_fn(sctx, sctx->buf, 1);
+   }
+
+   blocks = len / SHA512_BLOCK_SIZE;
+   len %= SHA512_BLOCK_SIZE;
+
+   if (blocks) {
+   block_fn(sctx, data, blocks);
+   data += blocks * SHA512_BLOCK_SIZE;
+   }
+   partial = 0;
+   }
+   if (len)
+   memcpy(sctx->buf + partial, data, len);
+
+   return 0;
+}
+
+static inline int sha512_base_do_finalize(struct shash_desc *desc,
+ sha512_block_fn *block_fn)
+{
+   const int bit_offset = SHA512_BLOCK_SIZE - sizeof(__be64[2]);
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+   __be64 *bits = (__be64 *)(sctx->buf + bit_offset);
+   unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE;
+
+   sctx->buf[partial++] = 0x80;
+   if (partial > bit_offset) {
+   memset(sctx->buf + partial, 0x0, SHA512_BLOCK_SIZE - partial);
+   partial = 0;
+
+   block_fn(sctx, sctx->buf, 1);
+   }
+
+   memset(sctx->buf + partial, 0x0, bit_offset - partial);
+   bits[0] = cpu_to_be64(sctx->count[1] << 3 | sctx->count[0] >> 61);
+   

[PATCH v4 11/16] crypto/arm: move SHA-224/256 ARMv8 implementation to base layer

2015-04-09 Thread Ard Biesheuvel
This removes all the boilerplate from the existing implementation,
and replaces it with calls into the base layer.

Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
---
 arch/arm/crypto/Kconfig|   2 +-
 arch/arm/crypto/sha2-ce-core.S |  19 ++---
 arch/arm/crypto/sha2-ce-glue.c | 155 +
 3 files changed, 39 insertions(+), 137 deletions(-)

diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index 5ed98bc6f95d..a267529d9577 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -39,7 +39,7 @@ config CRYPTO_SHA1_ARM_CE
 config CRYPTO_SHA2_ARM_CE
tristate "SHA-224/256 digest algorithm (ARM v8 Crypto Extensions)"
depends on KERNEL_MODE_NEON
-   select CRYPTO_SHA256
+   select CRYPTO_SHA256_ARM
select CRYPTO_HASH
help
  SHA-256 secure hash standard (DFIPS 180-2) implemented
diff --git a/arch/arm/crypto/sha2-ce-core.S b/arch/arm/crypto/sha2-ce-core.S
index 96af09fe957b..87ec11a5f405 100644
--- a/arch/arm/crypto/sha2-ce-core.S
+++ b/arch/arm/crypto/sha2-ce-core.S
@@ -69,27 +69,18 @@
.word   0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
 
/*
-* void sha2_ce_transform(int blocks, u8 const *src, u32 *state,
-*u8 *head);
+* void sha2_ce_transform(struct sha256_state *sst, u8 const *src,
+*   int blocks);
 */
 ENTRY(sha2_ce_transform)
/* load state */
-   vld1.32 {dga-dgb}, [r2]
-
-   /* load partial input (if supplied) */
-   teq r3, #0
-   beq 0f
-   vld1.32 {q0-q1}, [r3]!
-   vld1.32 {q2-q3}, [r3]
-   teq r0, #0
-   b   1f
+   vld1.32 {dga-dgb}, [r0]
 
/* load input */
 0: vld1.32 {q0-q1}, [r1]!
vld1.32 {q2-q3}, [r1]!
-   subsr0, r0, #1
+   subsr2, r2, #1
 
-1:
 #ifndef CONFIG_CPU_BIG_ENDIAN
vrev32.8q0, q0
vrev32.8q1, q1
@@ -129,6 +120,6 @@ ENTRY(sha2_ce_transform)
bne 0b
 
/* store new state */
-   vst1.32 {dga-dgb}, [r2]
+   vst1.32 {dga-dgb}, [r0]
bx  lr
 ENDPROC(sha2_ce_transform)
diff --git a/arch/arm/crypto/sha2-ce-glue.c b/arch/arm/crypto/sha2-ce-glue.c
index 0449eca3aab3..0755b2d657f3 100644
--- a/arch/arm/crypto/sha2-ce-glue.c
+++ b/arch/arm/crypto/sha2-ce-glue.c
@@ -10,6 +10,7 @@
 
 #include <crypto/internal/hash.h>
 #include <crypto/sha.h>
+#include <crypto/sha256_base.h>
 #include <linux/crypto.h>
 #include <linux/module.h>
 
@@ -18,148 +19,60 @@
 #include <asm/neon.h>
 #include <asm/unaligned.h>
 
+#include "sha256_glue.h"
+
 MODULE_DESCRIPTION("SHA-224/SHA-256 secure hash using ARMv8 Crypto Extensions");
 MODULE_AUTHOR("Ard Biesheuvel <ard.biesheu...@linaro.org>");
 MODULE_LICENSE("GPL v2");
 
-asmlinkage void sha2_ce_transform(int blocks, u8 const *src, u32 *state,
- u8 *head);
+asmlinkage void sha2_ce_transform(struct sha256_state *sst, u8 const *src,
+ int blocks);
 
-static int sha224_init(struct shash_desc *desc)
+static int sha2_ce_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len)
 {
struct sha256_state *sctx = shash_desc_ctx(desc);
 
-   *sctx = (struct sha256_state){
-   .state = {
-   SHA224_H0, SHA224_H1, SHA224_H2, SHA224_H3,
-   SHA224_H4, SHA224_H5, SHA224_H6, SHA224_H7,
-   }
-   };
-   return 0;
-}
+   if (!may_use_simd() ||
+   (sctx->count % SHA256_BLOCK_SIZE) + len < SHA256_BLOCK_SIZE)
+   return crypto_sha256_arm_update(desc, data, len);
 
-static int sha256_init(struct shash_desc *desc)
-{
-   struct sha256_state *sctx = shash_desc_ctx(desc);
+   kernel_neon_begin();
+   sha256_base_do_update(desc, data, len,
+ (sha256_block_fn *)sha2_ce_transform);
+   kernel_neon_end();
 
-   *sctx = (struct sha256_state){
-   .state = {
-   SHA256_H0, SHA256_H1, SHA256_H2, SHA256_H3,
-   SHA256_H4, SHA256_H5, SHA256_H6, SHA256_H7,
-   }
-   };
return 0;
 }
 
-static int sha2_update(struct shash_desc *desc, const u8 *data,
-  unsigned int len)
+static int sha2_ce_finup(struct shash_desc *desc, const u8 *data,
+unsigned int len, u8 *out)
 {
-   struct sha256_state *sctx = shash_desc_ctx(desc);
-   unsigned int partial;
-
if (!may_use_simd())
-   return crypto_sha256_update(desc, data, len);
-
-   partial = sctx->count % SHA256_BLOCK_SIZE;
-   sctx->count += len;
-
-   if ((partial + len) >= SHA256_BLOCK_SIZE) {
-   int blocks;
-
-   if (partial) {
-  

Re: [PATCH 0/2] crypto: add new driver for Marvell CESA

2015-04-09 Thread Andrew Lunn
On Thu, Apr 09, 2015 at 04:58:41PM +0200, Boris Brezillon wrote:
 Hello,
 
 This is an attempt to replace the mv_cesa driver by a new one to address
 some limitations of the existing driver.
 From a performance and CPU load point of view the most important
 limitation is the lack of DMA support, thus preventing us from chaining
 crypto operations.
 
 I know we usually try to adapt existing drivers instead of replacing them
 by new ones, but after trying to refactor the mv_cesa driver I realized it
 would take longer than writing a new one from scratch.

Hi Boris

What is the situation with backwards compatibility? I see you have
kept the old compatibility string, and added lots of new ones, and
deprecated some properties. Will an old DT blob still work?
 
 Thanks
Andrew


[PATCH v4 16/16] crypto/x86: move SHA-384/512 SSSE3 implementation to base layer

2015-04-09 Thread Ard Biesheuvel
This removes all the boilerplate from the existing implementation,
and replaces it with calls into the base layer.  It also changes the
prototypes of the core asm functions to be compatible with the base
prototype

  void (sha512_block_fn)(struct sha512_state *sst, u8 const *src, int blocks)

so that they can be passed to the base layer directly.

Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
---
 arch/x86/crypto/sha512-avx-asm.S|   6 +-
 arch/x86/crypto/sha512-avx2-asm.S   |   6 +-
 arch/x86/crypto/sha512-ssse3-asm.S  |   6 +-
 arch/x86/crypto/sha512_ssse3_glue.c | 202 +++-
 4 files changed, 44 insertions(+), 176 deletions(-)

diff --git a/arch/x86/crypto/sha512-avx-asm.S b/arch/x86/crypto/sha512-avx-asm.S
index 974dde9bc6cd..565274d6a641 100644
--- a/arch/x86/crypto/sha512-avx-asm.S
+++ b/arch/x86/crypto/sha512-avx-asm.S
@@ -54,9 +54,9 @@
 
 # Virtual Registers
 # ARG1
-msg= %rdi
+digest = %rdi
 # ARG2
-digest = %rsi
+msg= %rsi
 # ARG3
 msglen = %rdx
 T1 = %rcx
@@ -271,7 +271,7 @@ frame_size = frame_GPRSAVE + GPRSAVE_SIZE
 .endm
 
 
-# void sha512_transform_avx(const void* M, void* D, u64 L)
+# void sha512_transform_avx(void* D, const void* M, u64 L)
 # Purpose: Updates the SHA512 digest stored at D with the message stored in M.
 # The size of the message pointed to by M must be an integer multiple of SHA512
 # message blocks.
diff --git a/arch/x86/crypto/sha512-avx2-asm.S 
b/arch/x86/crypto/sha512-avx2-asm.S
index 568b96105f5c..a4771dcd1fcf 100644
--- a/arch/x86/crypto/sha512-avx2-asm.S
+++ b/arch/x86/crypto/sha512-avx2-asm.S
@@ -70,9 +70,9 @@ XFER  = YTMP0
 BYTE_FLIP_MASK  = %ymm9
 
 # 1st arg
-INP = %rdi
+CTX = %rdi
 # 2nd arg
-CTX = %rsi
+INP = %rsi
 # 3rd arg
 NUM_BLKS= %rdx
 
@@ -562,7 +562,7 @@ frame_size = frame_GPRSAVE + GPRSAVE_SIZE
 .endm
 
 
-# void sha512_transform_rorx(const void* M, void* D, uint64_t L)#
+# void sha512_transform_rorx(void* D, const void* M, uint64_t L)#
 # Purpose: Updates the SHA512 digest stored at D with the message stored in M.
 # The size of the message pointed to by M must be an integer multiple of SHA512
 #   message blocks.
diff --git a/arch/x86/crypto/sha512-ssse3-asm.S 
b/arch/x86/crypto/sha512-ssse3-asm.S
index fb56855d51f5..e610e29cbc81 100644
--- a/arch/x86/crypto/sha512-ssse3-asm.S
+++ b/arch/x86/crypto/sha512-ssse3-asm.S
@@ -53,9 +53,9 @@
 
 # Virtual Registers
 # ARG1
-msg =  %rdi
+digest =   %rdi
 # ARG2
-digest =   %rsi
+msg =  %rsi
 # ARG3
 msglen =   %rdx
 T1 =   %rcx
@@ -269,7 +269,7 @@ frame_size = frame_GPRSAVE + GPRSAVE_SIZE
 .endm
 
 
-# void sha512_transform_ssse3(const void* M, void* D, u64 L)#
+# void sha512_transform_ssse3(void* D, const void* M, u64 L)#
 # Purpose: Updates the SHA512 digest stored at D with the message stored in M.
 # The size of the message pointed to by M must be an integer multiple of SHA512
 #   message blocks.
diff --git a/arch/x86/crypto/sha512_ssse3_glue.c 
b/arch/x86/crypto/sha512_ssse3_glue.c
index 0b6af26832bf..d9fa4c1e063f 100644
--- a/arch/x86/crypto/sha512_ssse3_glue.c
+++ b/arch/x86/crypto/sha512_ssse3_glue.c
@@ -34,205 +34,75 @@
 #include <linux/cryptohash.h>
 #include <linux/types.h>
 #include <crypto/sha.h>
-#include <asm/byteorder.h>
+#include <crypto/sha512_base.h>
 #include <asm/i387.h>
 #include <asm/xcr.h>
 #include <asm/xsave.h>
 
 #include <linux/string.h>
 
-asmlinkage void sha512_transform_ssse3(const char *data, u64 *digest,
-u64 rounds);
+asmlinkage void sha512_transform_ssse3(u64 *digest, const char *data,
+  u64 rounds);
 #ifdef CONFIG_AS_AVX
-asmlinkage void sha512_transform_avx(const char *data, u64 *digest,
+asmlinkage void sha512_transform_avx(u64 *digest, const char *data,
 u64 rounds);
 #endif
 #ifdef CONFIG_AS_AVX2
-asmlinkage void sha512_transform_rorx(const char *data, u64 *digest,
-u64 rounds);
+asmlinkage void sha512_transform_rorx(u64 *digest, const char *data,
+ u64 rounds);
 #endif
 
-static asmlinkage void (*sha512_transform_asm)(const char *, u64 *, u64);
-
-
-static int sha512_ssse3_init(struct shash_desc *desc)
-{
-   struct sha512_state *sctx = shash_desc_ctx(desc);
-
-   sctx->state[0] = SHA512_H0;
-   sctx->state[1] = SHA512_H1;
-   sctx->state[2] = SHA512_H2;
-   sctx->state[3] = SHA512_H3;
-   sctx->state[4] = SHA512_H4;
-   sctx->state[5] = SHA512_H5;
-   sctx->state[6] = SHA512_H6;
-   sctx->state[7] = SHA512_H7;
-   sctx->count[0] = sctx->count[1] = 0;
-
-   return 0;
-}
+static void (*sha512_transform_asm)(u64 *, const char *, u64);
 

[PATCH v4 14/16] crypto/x86: move SHA-1 SSSE3 implementation to base layer

2015-04-09 Thread Ard Biesheuvel
This removes all the boilerplate from the existing implementation,
and replaces it with calls into the base layer.

Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
---
 arch/x86/crypto/sha1_ssse3_glue.c | 139 --
 1 file changed, 28 insertions(+), 111 deletions(-)

diff --git a/arch/x86/crypto/sha1_ssse3_glue.c 
b/arch/x86/crypto/sha1_ssse3_glue.c
index 6c20fe04a738..33d1b9dc14cc 100644
--- a/arch/x86/crypto/sha1_ssse3_glue.c
+++ b/arch/x86/crypto/sha1_ssse3_glue.c
@@ -28,7 +28,7 @@
 #include <linux/cryptohash.h>
 #include <linux/types.h>
 #include <crypto/sha.h>
-#include <asm/byteorder.h>
+#include <crypto/sha1_base.h>
 #include <asm/i387.h>
 #include <asm/xcr.h>
 #include <asm/xsave.h>
@@ -44,132 +44,51 @@ asmlinkage void sha1_transform_avx(u32 *digest, const char 
*data,
 #define SHA1_AVX2_BLOCK_OPTSIZE4   /* optimal 4*64 bytes of SHA1 
blocks */
 
 asmlinkage void sha1_transform_avx2(u32 *digest, const char *data,
-   unsigned int rounds);
+   unsigned int rounds);
 #endif
 
-static asmlinkage void (*sha1_transform_asm)(u32 *, const char *, unsigned 
int);
-
-
-static int sha1_ssse3_init(struct shash_desc *desc)
-{
-   struct sha1_state *sctx = shash_desc_ctx(desc);
-
-   *sctx = (struct sha1_state){
-   .state = { SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4 },
-   };
-
-   return 0;
-}
-
-static int __sha1_ssse3_update(struct shash_desc *desc, const u8 *data,
-  unsigned int len, unsigned int partial)
-{
-   struct sha1_state *sctx = shash_desc_ctx(desc);
-   unsigned int done = 0;
-
-   sctx->count += len;
-
-   if (partial) {
-   done = SHA1_BLOCK_SIZE - partial;
-   memcpy(sctx->buffer + partial, data, done);
-   sha1_transform_asm(sctx->state, sctx->buffer, 1);
-   }
-
-   if (len - done >= SHA1_BLOCK_SIZE) {
-   const unsigned int rounds = (len - done) / SHA1_BLOCK_SIZE;
-
-   sha1_transform_asm(sctx->state, data + done, rounds);
-   done += rounds * SHA1_BLOCK_SIZE;
-   }
-
-   memcpy(sctx->buffer, data + done, len - done);
-
-   return 0;
-}
+static void (*sha1_transform_asm)(u32 *, const char *, unsigned int);
 
 static int sha1_ssse3_update(struct shash_desc *desc, const u8 *data,
 unsigned int len)
 {
struct sha1_state *sctx = shash_desc_ctx(desc);
-   unsigned int partial = sctx->count % SHA1_BLOCK_SIZE;
-   int res;
 
-   /* Handle the fast case right here */
-   if (partial + len < SHA1_BLOCK_SIZE) {
-   sctx->count += len;
-   memcpy(sctx->buffer + partial, data, len);
+   if (!irq_fpu_usable() ||
+   (sctx->count % SHA1_BLOCK_SIZE) + len < SHA1_BLOCK_SIZE)
+   return crypto_sha1_update(desc, data, len);
 
-   return 0;
-   }
+   /* make sure casting to sha1_block_fn() is safe */
+   BUILD_BUG_ON(offsetof(struct sha1_state, state) != 0);
 
-   if (!irq_fpu_usable()) {
-   res = crypto_sha1_update(desc, data, len);
-   } else {
-   kernel_fpu_begin();
-   res = __sha1_ssse3_update(desc, data, len, partial);
-   kernel_fpu_end();
-   }
-
-   return res;
-}
-
-
-/* Add padding and return the message digest. */
-static int sha1_ssse3_final(struct shash_desc *desc, u8 *out)
-{
-   struct sha1_state *sctx = shash_desc_ctx(desc);
-   unsigned int i, index, padlen;
-   __be32 *dst = (__be32 *)out;
-   __be64 bits;
-   static const u8 padding[SHA1_BLOCK_SIZE] = { 0x80, };
-
-   bits = cpu_to_be64(sctx->count << 3);
-
-   /* Pad out to 56 mod 64 and append length */
-   index = sctx->count % SHA1_BLOCK_SIZE;
-   padlen = (index < 56) ? (56 - index) : ((SHA1_BLOCK_SIZE+56) - index);
-   if (!irq_fpu_usable()) {
-   crypto_sha1_update(desc, padding, padlen);
-   crypto_sha1_update(desc, (const u8 *)&bits, sizeof(bits));
-   } else {
-   kernel_fpu_begin();
-   /* We need to fill a whole block for __sha1_ssse3_update() */
-   if (padlen <= 56) {
-   sctx->count += padlen;
-   memcpy(sctx->buffer + index, padding, padlen);
-   } else {
-   __sha1_ssse3_update(desc, padding, padlen, index);
-   }
-   __sha1_ssse3_update(desc, (const u8 *)&bits, sizeof(bits), 56);
-   kernel_fpu_end();
-   }
-
-   /* Store state in digest */
-   for (i = 0; i < 5; i++)
-   dst[i] = cpu_to_be32(sctx->state[i]);
-
-   /* Wipe context */
-   memset(sctx, 0, sizeof(*sctx));
+   kernel_fpu_begin();
+   sha1_base_do_update(desc, data, len,
+   (sha1_block_fn *)sha1_transform_asm);
+   kernel_fpu_end();
 
return 0;
 }
 

Re: [PATCH v3] crypto: remove instance when test failed

2015-04-09 Thread Herbert Xu
On Thu, Apr 09, 2015 at 09:36:03AM +0200, Stephan Mueller wrote:

 diff --git a/crypto/algapi.c b/crypto/algapi.c
 index f1d0307..cfca1de 100644
 --- a/crypto/algapi.c
 +++ b/crypto/algapi.c
 @@ -533,6 +533,13 @@ int crypto_register_instance(struct crypto_template 
 *tmpl,
   if (IS_ERR(larval))
   goto unlock;
  
 + err = -EAGAIN;
 + if (unlikely(!crypto_mod_get(&inst->alg))) {
 + up_write(&crypto_alg_sem);
 + crypto_unregister_instance(inst);
 + goto err;
 + }

Just grab the reference count as soon as you enter the function
and then you can unconditionally drop the reference count at the
end.  If you fail to grab it then just return an error and the
caller will free it for you.
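
In other words, a rough sketch of the suggested flow (simplified; the real function does more work under the lock, and the error handling here is only illustrative):

int crypto_register_instance(struct crypto_template *tmpl,
                             struct crypto_instance *inst)
{
        struct crypto_larval *larval;
        int err;

        /* grab the reference as soon as we enter ... */
        if (unlikely(!crypto_mod_get(&inst->alg)))
                return -EAGAIN;         /* caller frees the instance */

        down_write(&crypto_alg_sem);
        larval = __crypto_register_alg(&inst->alg);
        /* ... existing registration work under the lock ... */
        up_write(&crypto_alg_sem);

        if (IS_ERR(larval)) {
                err = PTR_ERR(larval);
        } else {
                crypto_wait_for_test(larval);
                err = 0;
        }

        /* ... and drop it unconditionally on the way out. */
        crypto_mod_put(&inst->alg);
        return err;
}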

Cheers,
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt