Re: [PATCH 0/8] crypto: arm64+generic - SHA3/SHA-512/SM-3 roundup

2018-01-26 Thread Herbert Xu
On Fri, Jan 19, 2018 at 12:04:32PM +, Ard Biesheuvel wrote:
> This supersedes all outstanding patches from me related to SHA-3, SHA-512
> or SM-3.
> 
> - fix a correctness issue in the SHA-3 code (#1) and a performance issue (#2),
>   the first one is definitely a -stable candidate, the second one potentially
>   as well
> - patches #3 and #4 make the generic SHA-3 code reusable as a fallback for the
>   accelerated code introduced in #6
> - patch #5 adds some SHA-3 test cases
> - patch #6 implements SHA-3 using special arm64 instructions
> - patch #7 implements the Chinese SM3 secure hash algorithm using special
>   arm64 instructions
> - patch #8 contains some fixes for the recently queued SHA-512 arm64 code.
> 
> Ard Biesheuvel (8):
>   crypto/generic: sha3 - fixes for alignment and big endian operation
>   crypto/generic: sha3: rewrite KECCAK transform to help the compiler
> optimize
>   crypto/generic: sha3 - simplify code
>   crypto/generic: sha3 - export init/update/final routines
>   crypto/testmgr: sha3 - add new testcases
>   crypto/arm64: sha3 - new v8.2 Crypto Extensions implementation
>   crypto/arm64: sm3 - new v8.2 Crypto Extensions implementation
>   crypto/arm64: sha512 - fix/improve new v8.2 Crypto Extensions code
> 
>  arch/arm64/crypto/Kconfig  |  12 +
>  arch/arm64/crypto/Makefile |   6 +
>  arch/arm64/crypto/sha3-ce-core.S   | 210 
>  arch/arm64/crypto/sha3-ce-glue.c   | 161 ++
>  arch/arm64/crypto/sha512-ce-core.S | 145 +++---
>  arch/arm64/crypto/sha512-glue.c|   1 +
>  arch/arm64/crypto/sm3-ce-core.S| 141 +
>  arch/arm64/crypto/sm3-ce-glue.c|  92 
>  crypto/sha3_generic.c  | 332 ++--
>  crypto/testmgr.h   | 550 
>  include/crypto/sha3.h  |   6 +-
>  11 files changed, 1413 insertions(+), 243 deletions(-)
>  create mode 100644 arch/arm64/crypto/sha3-ce-core.S
>  create mode 100644 arch/arm64/crypto/sha3-ce-glue.c
>  create mode 100644 arch/arm64/crypto/sm3-ce-core.S
>  create mode 100644 arch/arm64/crypto/sm3-ce-glue.c

All applied.  Thanks.
-- 
Email: Herbert Xu <herb...@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 0/8] crypto: arm64+generic - SHA3/SHA-512/SM-3 roundup

2018-01-22 Thread Ard Biesheuvel
On 22 January 2018 at 20:51, Arnd Bergmann  wrote:
> On Mon, Jan 22, 2018 at 3:54 PM, Arnd Bergmann  wrote:
>> On Fri, Jan 19, 2018 at 1:04 PM, Ard Biesheuvel
>> I'm doing a little more randconfig build testing here now, will write back by
>> the end of today in the unlikely case that I find anything else wrong.
>
> Did a few hundred randconfig builds, everything fine as expected.
>

Thanks Arnd


Re: [PATCH 0/8] crypto: arm64+generic - SHA3/SHA-512/SM-3 roundup

2018-01-22 Thread Arnd Bergmann
On Mon, Jan 22, 2018 at 3:54 PM, Arnd Bergmann  wrote:
> On Fri, Jan 19, 2018 at 1:04 PM, Ard Biesheuvel
> I'm doing a little more randconfig build testing here now, will write back by
> the end of today in the unlikely case that I find anything else wrong.

Did a few hundred randconfig builds, everything fine as expected.

   Arnd


Re: [PATCH 0/8] crypto: arm64+generic - SHA3/SHA-512/SM-3 roundup

2018-01-22 Thread Arnd Bergmann
On Fri, Jan 19, 2018 at 1:04 PM, Ard Biesheuvel
<ard.biesheu...@linaro.org> wrote:
> This supersedes all outstanding patches from me related to SHA-3, SHA-512
> or SM-3.
>
> - fix a correctness issue in the SHA-3 code (#1) and a performance issue (#2),
>   the first one is definitely a -stable candidate, the second one potentially
>   as well
> - patches #3 and #4 make the generic SHA-3 code reusable as a fallback for the
>   accelerated code introduced in #6
> - patch #5 adds some SHA-3 test cases
> - patch #6 implements SHA-3 using special arm64 instructions
> - patch #7 implements the Chinese SM3 secure hash algorithm using special
>   arm64 instructions
> - patch #8 contains some fixes for the recently queued SHA-512 arm64 code.
>
> Ard Biesheuvel (8):
>   crypto/generic: sha3 - fixes for alignment and big endian operation
>   crypto/generic: sha3: rewrite KECCAK transform to help the compiler
> optimize
>   crypto/generic: sha3 - simplify code
>   crypto/generic: sha3 - export init/update/final routines
>   crypto/testmgr: sha3 - add new testcases
>   crypto/arm64: sha3 - new v8.2 Crypto Extensions implementation
>   crypto/arm64: sm3 - new v8.2 Crypto Extensions implementation
>   crypto/arm64: sha512 - fix/improve new v8.2 Crypto Extensions code

I can confirm that patch 8 fixes the issues I saw earlier, it would be
good to have that merged quickly.

I'm doing a little more randconfig build testing here now, will write back by
the end of today in the unlikely case that I find anything else wrong.

  Arnd


[PATCH 0/8] crypto: arm64+generic - SHA3/SHA-512/SM-3 roundup

2018-01-19 Thread Ard Biesheuvel
This supersedes all outstanding patches from me related to SHA-3, SHA-512
or SM-3.

- fix a correctness issue in the SHA-3 code (#1) and a performance issue (#2),
  the first one is definitely a -stable candidate, the second one potentially
  as well
- patches #3 and #4 make the generic SHA-3 code reusable as a fallback for the
  accelerated code introduced in #6
- patch #5 adds some SHA-3 test cases
- patch #6 implements SHA-3 using special arm64 instructions
- patch #7 implements the Chinese SM3 secure hash algorithm using special
  arm64 instructions
- patch #8 contains some fixes for the recently queued SHA-512 arm64 code.
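A quick userspace cross-check for the SHA-3 digests exercised by the new testmgr cases (patch #5) can be generated with Python's hashlib — a convenience sketch for reviewers, not part of the series:

```python
import hashlib

# Reference digests for the SHA-3 variants over the classic "abc" input,
# handy when comparing against the kernel's sha3-generic/sha3-ce output.
for algo in ("sha3_224", "sha3_256", "sha3_384", "sha3_512"):
    print(algo, hashlib.new(algo, b"abc").hexdigest())
```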

Ard Biesheuvel (8):
  crypto/generic: sha3 - fixes for alignment and big endian operation
  crypto/generic: sha3: rewrite KECCAK transform to help the compiler
optimize
  crypto/generic: sha3 - simplify code
  crypto/generic: sha3 - export init/update/final routines
  crypto/testmgr: sha3 - add new testcases
  crypto/arm64: sha3 - new v8.2 Crypto Extensions implementation
  crypto/arm64: sm3 - new v8.2 Crypto Extensions implementation
  crypto/arm64: sha512 - fix/improve new v8.2 Crypto Extensions code

 arch/arm64/crypto/Kconfig  |  12 +
 arch/arm64/crypto/Makefile |   6 +
 arch/arm64/crypto/sha3-ce-core.S   | 210 
 arch/arm64/crypto/sha3-ce-glue.c   | 161 ++
 arch/arm64/crypto/sha512-ce-core.S | 145 +++---
 arch/arm64/crypto/sha512-glue.c|   1 +
 arch/arm64/crypto/sm3-ce-core.S| 141 +
 arch/arm64/crypto/sm3-ce-glue.c|  92 
 crypto/sha3_generic.c  | 332 ++--
 crypto/testmgr.h   | 550 
 include/crypto/sha3.h  |   6 +-
 11 files changed, 1413 insertions(+), 243 deletions(-)
 create mode 100644 arch/arm64/crypto/sha3-ce-core.S
 create mode 100644 arch/arm64/crypto/sha3-ce-glue.c
 create mode 100644 arch/arm64/crypto/sm3-ce-core.S
 create mode 100644 arch/arm64/crypto/sm3-ce-glue.c

-- 
2.11.0



Re: [RFT PATCH] crypto: arm64 - implement SHA-512 using special instructions

2018-01-18 Thread Herbert Xu
On Tue, Jan 09, 2018 at 06:23:02PM +, Ard Biesheuvel wrote:
> Implement the SHA-512 using the new special instructions that have
> been introduced as an optional extension in ARMv8.2.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>

Patch applied.  Thanks.
-- 
Email: Herbert Xu <herb...@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [RFT PATCH] crypto: arm64 - implement SHA-512 using special instructions

2018-01-16 Thread Ard Biesheuvel
On 16 January 2018 at 08:16, Steve Capper <steve.cap...@arm.com> wrote:
> On Tue, Jan 09, 2018 at 06:23:02PM +, Ard Biesheuvel wrote:
>> Implement the SHA-512 using the new special instructions that have
>> been introduced as an optional extension in ARMv8.2.
>
> Hi Ard,
> I have tested this applied on top of 4.15-rc7 running in a model.
>
> For sha512-ce, I verified that tcrypt successfully passed tests for modes:
> 12, 104, 189, 190, 306, 406 and 424.
> (and I double checked that sha512-ce was being used).
>
> Similarly for sha384-ce, I tested the following modes:
> 11, 103, 187, 188, 305 and 405.
>
> Also, I had:
> CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=n
>
> So FWIW, please feel free to add:
> Tested-by: Steve Capper <steve.cap...@arm.com>
>

Excellent! Thanks a lot Steve.


Re: [RFT PATCH] crypto: arm64 - implement SHA-512 using special instructions

2018-01-16 Thread Steve Capper
On Tue, Jan 09, 2018 at 06:23:02PM +, Ard Biesheuvel wrote:
> Implement the SHA-512 using the new special instructions that have
> been introduced as an optional extension in ARMv8.2.

Hi Ard,
I have tested this applied on top of 4.15-rc7 running in a model.

For sha512-ce, I verified that tcrypt successfully passed tests for modes:
12, 104, 189, 190, 306, 406 and 424.
(and I double checked that sha512-ce was being used).

Similarly for sha384-ce, I tested the following modes:
11, 103, 187, 188, 305 and 405. 

Also, I had:
CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=n

So FWIW, please feel free to add:
Tested-by: Steve Capper <steve.cap...@arm.com>

Cheers,
-- 
Steve
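For anyone re-running the tcrypt modes Steve lists above, the expected digests can be cross-checked against a userspace implementation, e.g. Python's hashlib (a convenience sketch, not part of the kernel test flow):

```python
import hashlib

# FIPS 180-2 "abc" test vector, a quick sanity reference when comparing
# kernel (sha512-ce / sha384-ce) output against userspace.
msg = b"abc"
print(hashlib.sha512(msg).hexdigest())
print(hashlib.sha384(msg).hexdigest())
```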


[RFT PATCH] crypto: arm64 - implement SHA-512 using special instructions

2018-01-09 Thread Ard Biesheuvel
Implement the SHA-512 using the new special instructions that have
been introduced as an optional extension in ARMv8.2.

Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 arch/arm64/crypto/Kconfig  |   6 ++
 arch/arm64/crypto/Makefile |   3 +
 arch/arm64/crypto/sha512-ce-core.S | 207 +
 arch/arm64/crypto/sha512-ce-glue.c | 119 +
 4 files changed, 335 insertions(+)
 create mode 100644 arch/arm64/crypto/sha512-ce-core.S
 create mode 100644 arch/arm64/crypto/sha512-ce-glue.c

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 70c517aa4501..aad288f4b9de 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -29,6 +29,12 @@ config CRYPTO_SHA2_ARM64_CE
select CRYPTO_HASH
select CRYPTO_SHA256_ARM64
 
+config CRYPTO_SHA512_ARM64_CE
+   tristate "SHA-384/SHA-512 digest algorithm (ARMv8 Crypto Extensions)"
+   depends on KERNEL_MODE_NEON
+   select CRYPTO_HASH
+   select CRYPTO_SHA512_ARM64
+
 config CRYPTO_GHASH_ARM64_CE
tristate "GHASH/AES-GCM using ARMv8 Crypto Extensions"
depends on KERNEL_MODE_NEON
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index b5edc5918c28..d7573d31d397 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -14,6 +14,9 @@ sha1-ce-y := sha1-ce-glue.o sha1-ce-core.o
 obj-$(CONFIG_CRYPTO_SHA2_ARM64_CE) += sha2-ce.o
 sha2-ce-y := sha2-ce-glue.o sha2-ce-core.o
 
+obj-$(CONFIG_CRYPTO_SHA512_ARM64_CE) += sha512-ce.o
+sha512-ce-y := sha512-ce-glue.o sha512-ce-core.o
+
 obj-$(CONFIG_CRYPTO_GHASH_ARM64_CE) += ghash-ce.o
 ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o
 
diff --git a/arch/arm64/crypto/sha512-ce-core.S 
b/arch/arm64/crypto/sha512-ce-core.S
new file mode 100644
index ..6c562f8df0b0
--- /dev/null
+++ b/arch/arm64/crypto/sha512-ce-core.S
@@ -0,0 +1,207 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * sha512-ce-core.S - core SHA-384/SHA-512 transform using v8 Crypto Extensions
+ *
+ * Copyright (C) 2018 Linaro Ltd <ard.biesheu...@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+
+   //
+   // Temporary - for testing only. binutils has no support for these yet
+   //
+   .irp    b,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
+   .set    .Lq\b, \b
+   .set    .Lv\b\().2d, \b
+   .endr
+
+   .macro  sha512h, rd, rn, rm
+   .inst   0xce608000 | .L\rd | (.L\rn << 5) | (.L\rm << 16)
+   .endm
+
+   .macro  sha512h2, rd, rn, rm
+   .inst   0xce608400 | .L\rd | (.L\rn << 5) | (.L\rm << 16)
+   .endm
+
+   .macro  sha512su0, rd, rn
+   .inst   0xcec08000 | .L\rd | (.L\rn << 5)
+   .endm
+
+   .macro  sha512su1, rd, rn, rm
+   .inst   0xce608800 | .L\rd | (.L\rn << 5) | (.L\rm << 16)
+   .endm
+
+   .text
+   .arch   armv8-a+crypto
+
+   /*
+* The SHA-512 round constants
+*/
+   .align  4
+.Lsha512_rcon:
+   .quad   0x428a2f98d728ae22, 0x7137449123ef65cd
+   .quad   0xb5c0fbcfec4d3b2f, 0xe9b5dba58189dbbc
+   .quad   0x3956c25bf348b538, 0x59f111f1b605d019
+   .quad   0x923f82a4af194f9b, 0xab1c5ed5da6d8118
+   .quad   0xd807aa98a3030242, 0x12835b0145706fbe
+   .quad   0x243185be4ee4b28c, 0x550c7dc3d5ffb4e2
+   .quad   0x72be5d74f27b896f, 0x80deb1fe3b1696b1
+   .quad   0x9bdc06a725c71235, 0xc19bf174cf692694
+   .quad   0xe49b69c19ef14ad2, 0xefbe4786384f25e3
+   .quad   0x0fc19dc68b8cd5b5, 0x240ca1cc77ac9c65
+   .quad   0x2de92c6f592b0275, 0x4a7484aa6ea6e483
+   .quad   0x5cb0a9dcbd41fbd4, 0x76f988da831153b5
+   .quad   0x983e5152ee66dfab, 0xa831c66d2db43210
+   .quad   0xb00327c898fb213f, 0xbf597fc7beef0ee4
+   .quad   0xc6e00bf33da88fc2, 0xd5a79147930aa725
+   .quad   0x06ca6351e003826f, 0x142929670a0e6e70
+   .quad   0x27b70a8546d22ffc, 0x2e1b21385c26c926
+   .quad   0x4d2c6dfc5ac42aed, 0x53380d139d95b3df
+   .quad   0x650a73548baf63de, 0x766a0abb3c77b2a8
+   .quad   0x81c2c92e47edaee6, 0x92722c851482353b
+   .quad   0xa2bfe8a14cf10364, 0xa81a664bbc423001
+   .quad   0xc24b8b70d0f89791, 0xc76c51a30654be30
+   .quad   0xd192e819d6ef5218, 0xd69906245565a910
+   .quad   0xf40e35855771202a, 0x106aa07032bbd1b8
+   .quad   0x19a4c116b8d2d0c8, 0x1e376c085141ab53
+ 
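The `.inst` macros in the patch above hand-encode the ARMv8.2 SHA-512 instructions because binutils did not yet know them. The bitfield layout can be reproduced as a sketch (helper names invented for illustration):

```python
# Reproduce the instruction encodings that the .inst macros in
# sha512-ce-core.S emit: base opcode OR'ed with the SIMD register
# numbers rd/rn/rm (0..31) at bit positions 0, 5 and 16.

def sha512h(rd, rn, rm):
    return 0xce608000 | rd | (rn << 5) | (rm << 16)

def sha512h2(rd, rn, rm):
    return 0xce608400 | rd | (rn << 5) | (rm << 16)

def sha512su0(rd, rn):
    return 0xcec08000 | rd | (rn << 5)

def sha512su1(rd, rn, rm):
    return 0xce608800 | rd | (rn << 5) | (rm << 16)

# e.g. "sha512h q0, q1, v2.2d" assembles to:
print(hex(sha512h(0, 1, 2)))  # 0xce628020
```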

Re: [PATCH 1/2] crypto: arm/sha512 - accelerated SHA-512 using ARM generic ASM and NEON

2015-05-11 Thread Ard Biesheuvel
On 11 May 2015 at 08:59, Herbert Xu herb...@gondor.apana.org.au wrote:
 On Fri, May 08, 2015 at 10:46:21AM +0200, Ard Biesheuvel wrote:

 diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
 index 8da2207b0072..08b5fb85bff5 100644
 --- a/arch/arm/crypto/Kconfig
 +++ b/arch/arm/crypto/Kconfig
 @@ -53,20 +53,14 @@ config CRYPTO_SHA256_ARM
 SHA-256 secure hash standard (DFIPS 180-2) implemented
 using optimized ARM assembler and NEON, when available.

 -config CRYPTO_SHA512_ARM_NEON
 - tristate "SHA384 and SHA512 digest algorithm (ARM NEON)"
 - depends on KERNEL_MODE_NEON
 - select CRYPTO_SHA512
 +config CRYPTO_SHA512_ARM
 + tristate "SHA-384/512 digest algorithm (ARM-asm and NEON)"
 + depends on !CPU_V7M
   select CRYPTO_HASH
 + depends on !CPU_V7M

 This looks like a duplicate, no?

Yes, you are right. Let me figure out what's going on and send you a
new version.
--
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] crypto/arm: accelerated SHA-512 using ARM generic ASM and NEON

2015-04-13 Thread Ard Biesheuvel
On 13 April 2015 at 06:13, Herbert Xu herb...@gondor.apana.org.au wrote:
 On Sat, Apr 11, 2015 at 09:15:10PM +0200, Ard Biesheuvel wrote:

 @Herbert: could you please apply this onto cryptodev before sending
 out your pull request for v4.1?

 Done.

 And please disregard $subject, I will post a v3 with a similar
 'depends on' added (unless you're ok to add it yourself)

 Please resend the patch.  But I'll process it after the merge
 window closes so no hurry.



OK, all fine.

Thanks Herbert!


Re: [PATCH v2] crypto/arm: accelerated SHA-512 using ARM generic ASM and NEON

2015-04-12 Thread Herbert Xu
On Sat, Apr 11, 2015 at 09:15:10PM +0200, Ard Biesheuvel wrote:

 @Herbert: could you please apply this onto cryptodev before sending
 out your pull request for v4.1?

Done.

 And please disregard $subject, I will post a v3 with a similar
 'depends on' added (unless you're ok to add it yourself)

Please resend the patch.  But I'll process it after the merge
window closes so no hurry.

Thanks,
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v2] crypto/arm: accelerated SHA-512 using ARM generic ASM and NEON

2015-04-11 Thread Ard Biesheuvel
On 11 April 2015 at 10:48, Arnd Bergmann a...@arndb.de wrote:
 On Saturday 11 April 2015 09:35:15 Ard Biesheuvel wrote:
 On 10 April 2015 at 22:23, Ard Biesheuvel ard.biesheu...@linaro.org wrote:
 
  On 10 apr. 2015, at 22:08, Arnd Bergmann a...@arndb.de wrote:
 
  On Friday 10 April 2015 16:29:08 Ard Biesheuvel wrote:
  +#if __ARM_MAX_ARCH__>=7
  +.arch  armv7-a
  +.fpu   neon
  +
 
  This will cause a build failure on an ARMv7-M build, which is incompatible
  with .arch  armv7-a and .fpu   neon.
 
 
  The neon part depends on CONFIG_KERNEL_MODE_NEON, which would never be set 
  for that platform, I suppose

 On second thought, that is not entirely true, but I still don't think
 there is a problem here:
 the .arch/.fpu declarations are understood perfectly fine by GAS when
 targeting ARMv7-M. Only, it will emit code that is incompatible with
 it. However, this code is invoked at runtime only if a NEON unit has
 been detected, so it will just be ignored on ARMv7-M

 Sorry, I should have collected my findings better when replying to your
 patch. What I remembered was that I saw a problem in this area in linux-next
 with randconfig builds, but I did not notice that it was for a different
 file, and I had not double-checked that patch yet in order to send it
 out.

 See below for the patch I'm currently using for my randconfig builder.
 Before you apply this, please check again which files are affected, as
 it's possible that there are other modules that suffer from the same
 problem.

 Arnd

 8---
 Subject: [PATCH] ARM: crypto: avoid sha256 code on ARMv7-M

 The sha256 assembly implementation can deal with all architecture levels
 from ARMv4 to ARMv7-A, but not with ARMv7-M. Enabling it in an
 ARMv7-M kernel results in this build failure:

 arm-linux-gnueabi-ld: error: arch/arm/crypto/sha256_glue.o: Conflicting 
 architecture profiles M/A
 arm-linux-gnueabi-ld: failed to merge target specific data of file 
 arch/arm/crypto/sha256_glue.o

 This adds a Kconfig dependency to prevent the code from being disabled

... enabled?

 for ARMv7-M.

 Signed-off-by: Arnd Bergmann <a...@arndb.de>

 diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
 index 458729d2ce22..76463da22f81 100644
 --- a/arch/arm/crypto/Kconfig
 +++ b/arch/arm/crypto/Kconfig
 @@ -49,6 +49,7 @@ config CRYPTO_SHA2_ARM_CE
  config CRYPTO_SHA256_ARM
  tristate "SHA-224/256 digest algorithm (ARM-asm and NEON)"
 select CRYPTO_HASH
 +   depends on !CPU_V7M
 help
   SHA-256 secure hash standard (DFIPS 180-2) implemented
   using optimized ARM assembler and NEON, when available.


@Herbert: could you please apply this onto cryptodev before sending
out your pull request for v4.1?
And please disregard $subject, I will post a v3 with a similar
'depends on' added (unless you're ok to add it yourself)

Thanks,
Ard.


Re: [PATCH v2] crypto/arm: accelerated SHA-512 using ARM generic ASM and NEON

2015-04-11 Thread Arnd Bergmann
On Saturday 11 April 2015 12:27:18 Ard Biesheuvel wrote:
 
 Ah, I see it now. The new SHA-256 module, as well as the SHA-512 one I am
 proposing here, both use a single .o containing the !neon and NEON
 implementations, and only expose the latter if KERNEL_MODE_NEON. This way,
 we can use the exact same .S file as OpenSSL, which should mean less
 maintenance burden.
 
 So your fix seems the most appropriate, even if it means ARMv7-M won't be
 able to use the !neon part either.
 
 

Ok, sounds good. If someone wants to change that code to work on ARMv7-M,
they probably want that fix in the openssl version as well, and then we
can update both.

Arnd


Re: [PATCH v2] crypto/arm: accelerated SHA-512 using ARM generic ASM and NEON

2015-04-11 Thread Arnd Bergmann
On Saturday 11 April 2015 09:35:15 Ard Biesheuvel wrote:
 On 10 April 2015 at 22:23, Ard Biesheuvel ard.biesheu...@linaro.org wrote:
 
  On 10 apr. 2015, at 22:08, Arnd Bergmann a...@arndb.de wrote:
 
  On Friday 10 April 2015 16:29:08 Ard Biesheuvel wrote:
  +#if __ARM_MAX_ARCH__>=7
  +.arch  armv7-a
  +.fpu   neon
  +
 
  This will cause a build failure on an ARMv7-M build, which is incompatible
  with .arch  armv7-a and .fpu   neon.
 
 
  The neon part depends on CONFIG_KERNEL_MODE_NEON, which would never be set 
  for that platform, I suppose
 
 On second thought, that is not entirely true, but I still don't think
 there is a problem here:
 the .arch/.fpu declarations are understood perfectly fine by GAS when
 targeting ARMv7-M. Only, it will emit code that is incompatible with
 it. However, this code is invoked at runtime only if a NEON unit has
 been detected, so it will just be ignored on ARMv7-M

Sorry, I should have collected my findings better when replying to your
patch. What I remembered was that I saw a problem in this area in linux-next
with randconfig builds, but I did not notice that it was for a different
file, and I had not double-checked that patch yet in order to send it
out.

See below for the patch I'm currently using for my randconfig builder.
Before you apply this, please check again which files are affected, as
it's possible that there are other modules that suffer from the same
problem.

Arnd

8---
Subject: [PATCH] ARM: crypto: avoid sha256 code on ARMv7-M

The sha256 assembly implementation can deal with all architecture levels
from ARMv4 to ARMv7-A, but not with ARMv7-M. Enabling it in an
ARMv7-M kernel results in this build failure:

arm-linux-gnueabi-ld: error: arch/arm/crypto/sha256_glue.o: Conflicting 
architecture profiles M/A
arm-linux-gnueabi-ld: failed to merge target specific data of file 
arch/arm/crypto/sha256_glue.o

This adds a Kconfig dependency to prevent the code from being disabled
for ARMv7-M.

Signed-off-by: Arnd Bergmann <a...@arndb.de>

diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index 458729d2ce22..76463da22f81 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -49,6 +49,7 @@ config CRYPTO_SHA2_ARM_CE
 config CRYPTO_SHA256_ARM
 tristate "SHA-224/256 digest algorithm (ARM-asm and NEON)"
select CRYPTO_HASH
+   depends on !CPU_V7M
help
  SHA-256 secure hash standard (DFIPS 180-2) implemented
  using optimized ARM assembler and NEON, when available.



Re: [PATCH v2] crypto/arm: accelerated SHA-512 using ARM generic ASM and NEON

2015-04-11 Thread Ard Biesheuvel
On 10 April 2015 at 22:23, Ard Biesheuvel ard.biesheu...@linaro.org wrote:

 On 10 apr. 2015, at 22:08, Arnd Bergmann a...@arndb.de wrote:

 On Friday 10 April 2015 16:29:08 Ard Biesheuvel wrote:
 +#if __ARM_MAX_ARCH__>=7
 +.arch  armv7-a
 +.fpu   neon
 +

 This will cause a build failure on an ARMv7-M build, which is incompatible
 with .arch  armv7-a and .fpu   neon.


 The neon part depends on CONFIG_KERNEL_MODE_NEON, which would never be set 
 for that platform, I suppose

On second thought, that is not entirely true, but I still don't think
there is a problem here:
the .arch/.fpu declarations are understood perfectly fine by GAS when
targeting ARMv7-M. Only, it will emit code that is incompatible with
it. However, this code is invoked at runtime only if a NEON unit has
been detected, so it will just be ignored on ARMv7-M

-- 
Ard.
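Ard's point above — the NEON path is compiled in unconditionally, but only registered and used when a NEON unit is detected at runtime — can be sketched as a small model (HWCAP_NEON is the ARM ELF hwcap bit the kernel exposes; the function and variant names are invented for illustration):

```python
# Sketch of runtime dispatch: the glue code always registers the plain
# ASM variant, and exposes the NEON variant only when the CPU advertises
# NEON via the ELF hwcaps (elf_hwcap & HWCAP_NEON in the kernel).
HWCAP_NEON = 1 << 12  # bit used by the ARM kernel's ELF hwcap word

def variants_to_register(hwcap):
    variants = ["sha512-asm"]          # !neon variant, always available
    if hwcap & HWCAP_NEON:
        variants.append("sha512-neon")  # exposed only on NEON-capable CPUs
    return variants

print(variants_to_register(HWCAP_NEON))  # ['sha512-asm', 'sha512-neon']
print(variants_to_register(0))           # ['sha512-asm']
```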


Re: [PATCH v2] crypto/arm: accelerated SHA-512 using ARM generic ASM and NEON

2015-04-11 Thread Ard Biesheuvel

 On 11 apr. 2015, at 10:48, Arnd Bergmann a...@arndb.de wrote:
 
 On Saturday 11 April 2015 09:35:15 Ard Biesheuvel wrote:
 On 10 April 2015 at 22:23, Ard Biesheuvel ard.biesheu...@linaro.org wrote:
 
 On 10 apr. 2015, at 22:08, Arnd Bergmann a...@arndb.de wrote:
 
 On Friday 10 April 2015 16:29:08 Ard Biesheuvel wrote:
  +#if __ARM_MAX_ARCH__>=7
 +.arch  armv7-a
 +.fpu   neon
 +
 
 This will cause a build failure on an ARMv7-M build, which is incompatible
 with .arch  armv7-a and .fpu   neon.
 
 The neon part depends on CONFIG_KERNEL_MODE_NEON, which would never be set 
 for that platform, I suppose
 
 On second thought, that is not entirely true, but I still don't think
  there is a problem here:
 the .arch/.fpu declarations are understood perfectly fine by GAS when
 targeting ARMv7-M. Only, it will emit code that is incompatible with
 it. However, this code is invoked at runtime only if a NEON unit has
 been detected, so it will just be ignored on ARMv7-M
 
 Sorry, I should have collected my findings better when replying to your
 patch. What I remembered was that I saw a problem in this area in linux-next
 with randconfig builds, but I did not notice that it was for a different
 file, and I had not double-checked that patch yet in order to send it
 out.
 
 See below for the patch I'm currently using for my randconfig builder.
 Before you apply this, please check again which files are affected, as
 it's possible that there are other modules that suffer from the same
 problem.
 

Ah, I see it now. The new SHA-256 module, as well as the SHA-512 one I am
proposing here, both use a single .o containing the !neon and NEON
implementations, and only expose the latter if KERNEL_MODE_NEON. This way, we
can use the exact same .S file as OpenSSL, which should mean less maintenance
burden.

So your fix seems the most appropriate, even if it means ARMv7-M won't be able
to use the !neon part either.


Arnd
 
 8---
 Subject: [PATCH] ARM: crypto: avoid sha256 code on ARMv7-M
 
 The sha256 assembly implementation can deal with all architecture levels
 from ARMv4 to ARMv7-A, but not with ARMv7-M. Enabling it in an
 ARMv7-M kernel results in this build failure:
 
 arm-linux-gnueabi-ld: error: arch/arm/crypto/sha256_glue.o: Conflicting 
 architecture profiles M/A
 arm-linux-gnueabi-ld: failed to merge target specific data of file 
 arch/arm/crypto/sha256_glue.o
 
 This adds a Kconfig dependency to prevent the code from being disabled
 for ARMv7-M.
 
  Signed-off-by: Arnd Bergmann <a...@arndb.de>
 
 diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
 index 458729d2ce22..76463da22f81 100644
 --- a/arch/arm/crypto/Kconfig
 +++ b/arch/arm/crypto/Kconfig
 @@ -49,6 +49,7 @@ config CRYPTO_SHA2_ARM_CE
 config CRYPTO_SHA256_ARM
 tristate "SHA-224/256 digest algorithm (ARM-asm and NEON)"
select CRYPTO_HASH
 +depends on !CPU_V7M
help
  SHA-256 secure hash standard (DFIPS 180-2) implemented
  using optimized ARM assembler and NEON, when available.
 


Re: [PATCH v2] crypto/arm: accelerated SHA-512 using ARM generic ASM and NEON

2015-04-10 Thread Arnd Bergmann
On Friday 10 April 2015 16:29:08 Ard Biesheuvel wrote:
 +#if __ARM_MAX_ARCH__>=7
 +.arch  armv7-a
 +.fpu   neon
 +
 

This will cause a build failure on an ARMv7-M build, which is incompatible
with .arch  armv7-a and .fpu   neon.

Arnd


Re: [PATCH v2] crypto/arm: accelerated SHA-512 using ARM generic ASM and NEON

2015-04-10 Thread Ard Biesheuvel

 On 10 apr. 2015, at 22:08, Arnd Bergmann a...@arndb.de wrote:
 
 On Friday 10 April 2015 16:29:08 Ard Biesheuvel wrote:
 +#if __ARM_MAX_ARCH__>=7
 +.arch  armv7-a
 +.fpu   neon
 +
 
 This will cause a build failure on an ARMv7-M build, which is incompatible
 with .arch  armv7-a and .fpu   neon.
 

The neon part depends on CONFIG_KERNEL_MODE_NEON, which would never be set for
that platform, I suppose.


[PATCH v4 03/16] crypto: sha512: implement base layer for SHA-512

2015-04-09 Thread Ard Biesheuvel
To reduce the number of copies of boilerplate code throughout
the tree, this patch implements generic glue for the SHA-512
algorithm. This allows a specific arch or hardware implementation
to only implement the special handling that it needs.

The users need to supply an implementation of

  void (sha512_block_fn)(struct sha512_state *sst, u8 const *src, int blocks)

and pass it to the SHA-512 base functions. For easy casting between the
prototype above and existing block functions that take a 'u64 state[]'
as their first argument, the 'state' member of struct sha512_state is
moved to the base of the struct.

Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 include/crypto/sha.h |   2 +-
 include/crypto/sha512_base.h | 131 +++
 2 files changed, 132 insertions(+), 1 deletion(-)
 create mode 100644 include/crypto/sha512_base.h

diff --git a/include/crypto/sha.h b/include/crypto/sha.h
index a75bc80cc776..05e82cbc4d8f 100644
--- a/include/crypto/sha.h
+++ b/include/crypto/sha.h
@@ -77,8 +77,8 @@ struct sha256_state {
 };
 
 struct sha512_state {
-   u64 count[2];
u64 state[SHA512_DIGEST_SIZE / 8];
+   u64 count[2];
u8 buf[SHA512_BLOCK_SIZE];
 };
 
diff --git a/include/crypto/sha512_base.h b/include/crypto/sha512_base.h
new file mode 100644
index ..6c5341e005ea
--- /dev/null
+++ b/include/crypto/sha512_base.h
@@ -0,0 +1,131 @@
+/*
+ * sha512_base.h - core logic for SHA-512 implementations
+ *
+ * Copyright (C) 2015 Linaro Ltd ard.biesheu...@linaro.org
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <crypto/internal/hash.h>
+#include <crypto/sha.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+
+#include <asm/unaligned.h>
+
+typedef void (sha512_block_fn)(struct sha512_state *sst, u8 const *src,
+  int blocks);
+
+static inline int sha384_base_init(struct shash_desc *desc)
+{
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+
+   sctx->state[0] = SHA384_H0;
+   sctx->state[1] = SHA384_H1;
+   sctx->state[2] = SHA384_H2;
+   sctx->state[3] = SHA384_H3;
+   sctx->state[4] = SHA384_H4;
+   sctx->state[5] = SHA384_H5;
+   sctx->state[6] = SHA384_H6;
+   sctx->state[7] = SHA384_H7;
+   sctx->count[0] = sctx->count[1] = 0;
+
+   return 0;
+}
+
+static inline int sha512_base_init(struct shash_desc *desc)
+{
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+
+   sctx->state[0] = SHA512_H0;
+   sctx->state[1] = SHA512_H1;
+   sctx->state[2] = SHA512_H2;
+   sctx->state[3] = SHA512_H3;
+   sctx->state[4] = SHA512_H4;
+   sctx->state[5] = SHA512_H5;
+   sctx->state[6] = SHA512_H6;
+   sctx->state[7] = SHA512_H7;
+   sctx->count[0] = sctx->count[1] = 0;
+
+   return 0;
+}
+
+static inline int sha512_base_do_update(struct shash_desc *desc,
+   const u8 *data,
+   unsigned int len,
+   sha512_block_fn *block_fn)
+{
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+   unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE;
+
+   sctx->count[0] += len;
+   if (sctx->count[0] < len)
+   sctx->count[1]++;
+
+   if (unlikely((partial + len) >= SHA512_BLOCK_SIZE)) {
+   int blocks;
+
+   if (partial) {
+   int p = SHA512_BLOCK_SIZE - partial;
+
+   memcpy(sctx->buf + partial, data, p);
+   data += p;
+   len -= p;
+
+   block_fn(sctx, sctx->buf, 1);
+   }
+
+   blocks = len / SHA512_BLOCK_SIZE;
+   len %= SHA512_BLOCK_SIZE;
+
+   if (blocks) {
+   block_fn(sctx, data, blocks);
+   data += blocks * SHA512_BLOCK_SIZE;
+   }
+   partial = 0;
+   }
+   if (len)
+   memcpy(sctx->buf + partial, data, len);
+
+   return 0;
+}
+
+static inline int sha512_base_do_finalize(struct shash_desc *desc,
+ sha512_block_fn *block_fn)
+{
+   const int bit_offset = SHA512_BLOCK_SIZE - sizeof(__be64[2]);
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+   __be64 *bits = (__be64 *)(sctx->buf + bit_offset);
+   unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE;
+
+   sctx->buf[partial++] = 0x80;
+   if (partial > bit_offset) {
+   memset(sctx->buf + partial, 0x0, SHA512_BLOCK_SIZE - partial);
+   partial = 0;
+
+   block_fn(sctx, sctx->buf, 1);
+   }
+
+   memset(sctx->buf + partial, 0x0, bit_offset - partial);
+   bits[0] = cpu_to_be64(sctx->count[1] << 3 | sctx->count[0] >> 61);
+   bits[1] = cpu_to_be64(sctx->count[0] << 3);
+   block_fn(sctx, sctx->buf, 1);
+
+   return 0;
+}

[PATCH v3 03/16] crypto: sha512: implement base layer for SHA-512

2015-04-07 Thread Ard Biesheuvel
To reduce the number of copies of boilerplate code throughout
the tree, this patch implements generic glue for the SHA-512
algorithm. This allows a specific arch or hardware implementation
to only implement the special handling that it needs.

Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
---
 include/crypto/sha512_base.h | 147 +++
 1 file changed, 147 insertions(+)
 create mode 100644 include/crypto/sha512_base.h

diff --git a/include/crypto/sha512_base.h b/include/crypto/sha512_base.h
new file mode 100644
index ..44351f781dce
--- /dev/null
+++ b/include/crypto/sha512_base.h
@@ -0,0 +1,147 @@
+/*
+ * sha512_base.h - core logic for SHA-512 implementations
+ *
+ * Copyright (C) 2015 Linaro Ltd ard.biesheu...@linaro.org
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <crypto/internal/hash.h>
+#include <crypto/sha.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+
+#include <asm/unaligned.h>
+
+typedef void (sha512_block_fn)(int blocks, u8 const *src, u64 *state,
+  const u8 *head, void *p);
+
+static inline int sha384_base_init(struct shash_desc *desc)
+{
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+
+   sctx->state[0] = SHA384_H0;
+   sctx->state[1] = SHA384_H1;
+   sctx->state[2] = SHA384_H2;
+   sctx->state[3] = SHA384_H3;
+   sctx->state[4] = SHA384_H4;
+   sctx->state[5] = SHA384_H5;
+   sctx->state[6] = SHA384_H6;
+   sctx->state[7] = SHA384_H7;
+   sctx->count[0] = sctx->count[1] = 0;
+
+   return 0;
+}
+
+static inline int sha512_base_init(struct shash_desc *desc)
+{
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+
+   sctx->state[0] = SHA512_H0;
+   sctx->state[1] = SHA512_H1;
+   sctx->state[2] = SHA512_H2;
+   sctx->state[3] = SHA512_H3;
+   sctx->state[4] = SHA512_H4;
+   sctx->state[5] = SHA512_H5;
+   sctx->state[6] = SHA512_H6;
+   sctx->state[7] = SHA512_H7;
+   sctx->count[0] = sctx->count[1] = 0;
+
+   return 0;
+}
+
+static inline int sha512_base_export(struct shash_desc *desc, void *out)
+{
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+   struct sha512_state *dst = out;
+
+   *dst = *sctx;
+
+   return 0;
+}
+
+static inline int sha512_base_import(struct shash_desc *desc, const void *in)
+{
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+   struct sha512_state const *src = in;
+
+   *sctx = *src;
+
+   return 0;
+}
+
+static inline int sha512_base_do_update(struct shash_desc *desc,
+   const u8 *data, unsigned int len,
+   sha512_block_fn *block_fn, void *p)
+{
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+   unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE;
+
+   sctx->count[0] += len;
+   if (sctx->count[0] < len)
+   sctx->count[1]++;
+
+   if (unlikely((partial + len) >= SHA512_BLOCK_SIZE)) {
+   int blocks;
+
+   if (partial) {
+   int p = SHA512_BLOCK_SIZE - partial;
+
+   memcpy(sctx->buf + partial, data, p);
+   data += p;
+   len -= p;
+   }
+
+   blocks = len / SHA512_BLOCK_SIZE;
+   len %= SHA512_BLOCK_SIZE;
+
+   block_fn(blocks, data, sctx->state,
+partial ? sctx->buf : NULL, p);
+   data += blocks * SHA512_BLOCK_SIZE;
+   partial = 0;
+   }
+   if (len)
+   memcpy(sctx->buf + partial, data, len);
+
+   return 0;
+}
+
+static inline int sha512_base_do_finalize(struct shash_desc *desc,
+  sha512_block_fn *block_fn, void *p)
+{
+   const int bit_offset = SHA512_BLOCK_SIZE - sizeof(__be64[2]);
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+   __be64 *bits = (__be64 *)(sctx->buf + bit_offset);
+   unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE;
+
+   sctx->buf[partial++] = 0x80;
+   if (partial > bit_offset) {
+   memset(sctx->buf + partial, 0x0, SHA512_BLOCK_SIZE - partial);
+   partial = 0;
+
+   block_fn(1, sctx->buf, sctx->state, NULL, p);
+   }
+
+   memset(sctx->buf + partial, 0x0, bit_offset - partial);
+   bits[0] = cpu_to_be64(sctx->count[1] << 3 | sctx->count[0] >> 61);
+   bits[1] = cpu_to_be64(sctx->count[0] << 3);
+   block_fn(1, sctx->buf, sctx->state, NULL, p);
+
+   return 0;
+}
+
+static inline int sha512_base_finish(struct shash_desc *desc, u8 *out)
+{
+   unsigned int digest_size = crypto_shash_digestsize(desc->tfm);
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+   __be64 *digest = (__be64 *)out;
+   int i

Re: [PATCH] crypto/arm: accelerated SHA-512 using ARM generic ASM and NEON

2015-04-06 Thread Ard Biesheuvel
On 29 March 2015 at 16:07, Andy Polyakov ap...@openssl.org wrote:
 This updates the SHA-512 NEON module with the faster and more
 versatile implementation from the OpenSSL project. It consists
 of both a NEON and a generic ASM version of the core SHA-512
 transform, where the NEON version reverts to the ASM version
 when invoked in non-process context.

 Performance relative to the generic implementation (measured
 using tcrypt.ko mode=306 sec=1 running on a Cortex-A57 under
 KVM):

   input size  block size  asm   neon  old neon
 
   8192        8192        1.51  3.51  2.69

 One should keep in mind that improvement coefficients vary greatly from
 platform to platform. Normally you *should* observe higher coefficients
 in asm column and *can* observe smaller differences between neon and
 old neon. BTW, 1.51 is unexpectedly low, I wonder which compiler
 version stands for 1.0?

This was Linaro GCC 4.9

 Nor can I replicate difference between neon
 and old neon, I get smaller difference, 17%, on Cortex-A57. Well, I'm
 comparing in user-land, but it shouldn't be that significant at large
 blocks...


That is a bit surprising, indeed. The primary difference is that this
executes under a 32-bit kernel, whereas your testing uses 32-bit
OpenSSL under a 64-bit kernel in 32-bit compatibility mode. I can't
really explain why that should make a difference at all, but it's
worth noting.


 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
 ---

 This should get the same treatment as Sami's sha256 version: I would like
 to wait until the OpenSSL source file hits the upstream repository so that
 I can refer to its sha1 hash in the commit log.

 Update is committed as
 http://git.openssl.org/gitweb/?p=openssl.git;a=commitdiff;h=b1a5d1c652086257930a1f62ae51c9cdee654b2c.
 Note that the file I've initially sent privately was a little bit off.
 Sorry about that. But that little bit is just a commentary update that
 adds performance result for Cortex-A15. So that kernel patch as
 originally posted is 100% functionally equivalent.


Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 resend 01/14] crypto: sha512: implement base layer for SHA-512

2015-03-31 Thread Herbert Xu
On Mon, Mar 30, 2015 at 11:48:20AM +0200, Ard Biesheuvel wrote:
 To reduce the number of copies of boilerplate code throughout
 the tree, this patch implements generic glue for the SHA-512
 algorithm. This allows a specific arch or hardware implementation
 to only implement the special handling that it needs.
 
 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org

Creating yet another module for this is too much.  Please just
add these generic helpers to the generic module.

Thanks,
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[PATCH v2 01/14] crypto: sha512: implement base layer for SHA-512

2015-03-30 Thread Ard Biesheuvel
To reduce the number of copies of boilerplate code throughout
the tree, this patch implements generic glue for the SHA-512
algorithm. This allows a specific arch or hardware implementation
to only implement the special handling that it needs.

Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
---
 crypto/Kconfig   |   3 ++
 crypto/Makefile  |   1 +
 crypto/sha512_base.c | 143 +++
 include/crypto/sha.h |  20 +++
 4 files changed, 167 insertions(+)
 create mode 100644 crypto/sha512_base.c

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 88639937a934..3400cf4e3cdb 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -641,6 +641,9 @@ config CRYPTO_SHA256_SPARC64
  SHA-256 secure hash standard (DFIPS 180-2) implemented
  using sparc64 crypto instructions, when available.
 
+config CRYPTO_SHA512_BASE
+   tristate
+
 config CRYPTO_SHA512
	tristate "SHA384 and SHA512 digest algorithms"
select CRYPTO_HASH
diff --git a/crypto/Makefile b/crypto/Makefile
index 97b7d3ac87e7..6174bf2592fe 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -45,6 +45,7 @@ obj-$(CONFIG_CRYPTO_RMD256) += rmd256.o
 obj-$(CONFIG_CRYPTO_RMD320) += rmd320.o
 obj-$(CONFIG_CRYPTO_SHA1) += sha1_generic.o
 obj-$(CONFIG_CRYPTO_SHA256) += sha256_generic.o
+obj-$(CONFIG_CRYPTO_SHA512_BASE) += sha512_base.o
 obj-$(CONFIG_CRYPTO_SHA512) += sha512_generic.o
 obj-$(CONFIG_CRYPTO_WP512) += wp512.o
 obj-$(CONFIG_CRYPTO_TGR192) += tgr192.o
diff --git a/crypto/sha512_base.c b/crypto/sha512_base.c
new file mode 100644
index ..9a60829e06c4
--- /dev/null
+++ b/crypto/sha512_base.c
@@ -0,0 +1,143 @@
+/*
+ * sha512_base.c - core logic for SHA-512 implementations
+ *
+ * Copyright (C) 2015 Linaro Ltd ard.biesheu...@linaro.org
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <crypto/internal/hash.h>
+#include <crypto/sha.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+
+#include <asm/unaligned.h>
+
+int crypto_sha384_base_init(struct shash_desc *desc)
+{
+   static const u64 sha384_init_state[] = {
+   SHA384_H0, SHA384_H1, SHA384_H2, SHA384_H3,
+   SHA384_H4, SHA384_H5, SHA384_H6, SHA384_H7,
+   };
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+
+   memcpy(sctx->state, sha384_init_state, sizeof(sctx->state));
+   sctx->count[0] = sctx->count[1] = 0;
+   return 0;
+}
+EXPORT_SYMBOL(crypto_sha384_base_init);
+
+int crypto_sha512_base_init(struct shash_desc *desc)
+{
+   static const u64 sha512_init_state[] = {
+   SHA512_H0, SHA512_H1, SHA512_H2, SHA512_H3,
+   SHA512_H4, SHA512_H5, SHA512_H6, SHA512_H7,
+   };
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+
+   memcpy(sctx->state, sha512_init_state, sizeof(sctx->state));
+   sctx->count[0] = sctx->count[1] = 0;
+   return 0;
+}
+EXPORT_SYMBOL(crypto_sha512_base_init);
+
+int crypto_sha512_base_export(struct shash_desc *desc, void *out)
+{
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+   struct sha512_state *dst = out;
+
+   *dst = *sctx;
+
+   return 0;
+}
+EXPORT_SYMBOL(crypto_sha512_base_export);
+
+int crypto_sha512_base_import(struct shash_desc *desc, const void *in)
+{
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+   struct sha512_state const *src = in;
+
+   *sctx = *src;
+
+   return 0;
+}
+EXPORT_SYMBOL(crypto_sha512_base_import);
+
+int crypto_sha512_base_do_update(struct shash_desc *desc, const u8 *data,
+unsigned int len, sha512_block_fn *block_fn,
+void *p)
+{
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+   unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE;
+
+   sctx->count[0] += len;
+   if (sctx->count[0] < len)
+   sctx->count[1]++;
+
+   if (unlikely((partial + len) >= SHA512_BLOCK_SIZE)) {
+   int blocks;
+
+   if (partial) {
+   int p = SHA512_BLOCK_SIZE - partial;
+
+   memcpy(sctx->buf + partial, data, p);
+   data += p;
+   len -= p;
+   }
+
+   blocks = len / SHA512_BLOCK_SIZE;
+   len %= SHA512_BLOCK_SIZE;
+
+   block_fn(blocks, data, sctx->state,
+partial ? sctx->buf : NULL, p);
+   data += blocks * SHA512_BLOCK_SIZE;
+   partial = 0;
+   }
+   if (len)
+   memcpy(sctx->buf + partial, data, len);
+
+   return 0;
+}
+EXPORT_SYMBOL(crypto_sha512_base_do_update);
+
+int crypto_sha512_base_do_finalize(struct shash_desc *desc,
+  sha512_block_fn *block_fn, void *p)
+{
+   const int bit_offset = SHA512_BLOCK_SIZE - sizeof(__be64[2]);

[RFC PATCH 6/6] arm/crypto: accelerated SHA-512 using ARM generic ASM and NEON

2015-03-30 Thread Ard Biesheuvel
This updates the SHA-512 NEON module with the faster and more
versatile implementation from the OpenSSL project. It consists
of both a NEON and a generic ASM version of the core SHA-512
transform, where the NEON version reverts to the ASM version
when invoked in non-process context.

Performance relative to the generic implementation (measured
using tcrypt.ko mode=306 sec=1 running on a Cortex-A57 under
KVM):

  input size  block size  asm   neon  old neon

  16          16          1.39  2.54  2.21
  64          16          1.32  2.33  2.09
  64          64          1.38  2.53  2.19
  256         16          1.31  2.28  2.06
  256         64          1.38  2.54  2.25
  256         256         1.40  2.77  2.39
  1024        16          1.29  2.22  2.01
  1024        256         1.40  2.82  2.45
  1024        1024        1.41  2.93  2.53
  2048        16          1.33  2.21  2.00
  2048        256         1.40  2.84  2.46
  2048        1024        1.41  2.96  2.55
  2048        2048        1.41  2.98  2.56
  4096        16          1.34  2.20  1.99
  4096        256         1.40  2.84  2.46
  4096        1024        1.41  2.97  2.56
  4096        4096        1.41  3.01  2.58
  8192        16          1.34  2.19  1.99
  8192        256         1.40  2.85  2.47
  8192        1024        1.41  2.98  2.56
  8192        4096        1.41  2.71  2.59
  8192        8192        1.51  3.51  2.69

Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
---
 arch/arm/crypto/Kconfig   |8 +
 arch/arm/crypto/Makefile  |8 +-
 arch/arm/crypto/sha512-armv4.pl   |  656 
 arch/arm/crypto/sha512-core.S_shipped | 1814 +
 arch/arm/crypto/sha512-glue.c |  137 +++
 arch/arm/crypto/sha512-neon-glue.c|  111 ++
 arch/arm/crypto/sha512.h  |8 +
 7 files changed, 2741 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm/crypto/sha512-armv4.pl
 create mode 100644 arch/arm/crypto/sha512-core.S_shipped
 create mode 100644 arch/arm/crypto/sha512-glue.c
 create mode 100644 arch/arm/crypto/sha512-neon-glue.c
 create mode 100644 arch/arm/crypto/sha512.h

diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index 458729d2ce22..6b50c6d77b77 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -53,6 +53,14 @@ config CRYPTO_SHA256_ARM
  SHA-256 secure hash standard (DFIPS 180-2) implemented
  using optimized ARM assembler and NEON, when available.
 
+config CRYPTO_SHA512_ARM
+   tristate "SHA-384/512 digest algorithm (ARM-asm and NEON)"
+   select CRYPTO_HASH
+   select CRYPTO_SHA512_BASE
+   help
+ SHA-512 secure hash standard (DFIPS 180-2) implemented
+ using optimized ARM assembler and NEON, when available.
+
 config CRYPTO_SHA512_ARM_NEON
	tristate "SHA384 and SHA512 digest algorithm (ARM NEON)"
depends on KERNEL_MODE_NEON
diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index ef46e898f98b..322a6ca999a2 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
 obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o
+obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o
 obj-$(CONFIG_CRYPTO_SHA512_ARM_NEON) += sha512-arm-neon.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o
 obj-$(CONFIG_CRYPTO_SHA2_ARM_CE) += sha2-arm-ce.o
@@ -19,6 +20,8 @@ sha1-arm-y:= sha1-armv4-large.o sha1_glue.o
 sha1-arm-neon-y:= sha1-armv7-neon.o sha1_neon_glue.o
 sha256-arm-neon-$(CONFIG_KERNEL_MODE_NEON) := sha256_neon_glue.o
 sha256-arm-y   := sha256-core.o sha256_glue.o $(sha256-arm-neon-y)
+sha512-arm-neon-$(CONFIG_KERNEL_MODE_NEON) := sha512-neon-glue.o
+sha512-arm-y   := sha512-core.o sha512-glue.o $(sha512-arm-neon-y)
 sha512-arm-neon-y := sha512-armv7-neon.o sha512_neon_glue.o
 sha1-arm-ce-y  := sha1-ce-core.o sha1-ce-glue.o
 sha2-arm-ce-y  := sha2-ce-core.o sha2-ce-glue.o
@@ -34,4 +37,7 @@ $(src)/aesbs-core.S_shipped: $(src)/bsaes-armv7.pl
 $(src)/sha256-core.S_shipped: $(src)/sha256-armv4.pl
$(call cmd,perl)
 
-.PRECIOUS: $(obj)/aesbs-core.S $(obj)/sha256-core.S
+$(src)/sha512-core.S_shipped: $(src)/sha512-armv4.pl
+   $(call cmd,perl)
+
+.PRECIOUS: $(obj)/aesbs-core.S $(obj)/sha256-core.S $(obj)/sha512-core.S
diff --git a/arch/arm/crypto/sha512-armv4.pl b/arch/arm/crypto/sha512-armv4.pl
new file mode 100644
index ..7e540f8439da
--- /dev/null
+++ b/arch/arm/crypto/sha512-armv4.pl
@@ -0,0 +1,656 @@
+#!/usr/bin/env perl
+
+# 
+# Written by Andy Polyakov ap

[PATCH v2 resend 01/14] crypto: sha512: implement base layer for SHA-512

2015-03-30 Thread Ard Biesheuvel
To reduce the number of copies of boilerplate code throughout
the tree, this patch implements generic glue for the SHA-512
algorithm. This allows a specific arch or hardware implementation
to only implement the special handling that it needs.

Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
---
 crypto/Kconfig   |   3 ++
 crypto/Makefile  |   1 +
 crypto/sha512_base.c | 143 +++
 include/crypto/sha.h |  20 +++
 4 files changed, 167 insertions(+)
 create mode 100644 crypto/sha512_base.c

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 88639937a934..3400cf4e3cdb 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -641,6 +641,9 @@ config CRYPTO_SHA256_SPARC64
  SHA-256 secure hash standard (DFIPS 180-2) implemented
  using sparc64 crypto instructions, when available.
 
+config CRYPTO_SHA512_BASE
+   tristate
+
 config CRYPTO_SHA512
	tristate "SHA384 and SHA512 digest algorithms"
select CRYPTO_HASH
diff --git a/crypto/Makefile b/crypto/Makefile
index 97b7d3ac87e7..6174bf2592fe 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -45,6 +45,7 @@ obj-$(CONFIG_CRYPTO_RMD256) += rmd256.o
 obj-$(CONFIG_CRYPTO_RMD320) += rmd320.o
 obj-$(CONFIG_CRYPTO_SHA1) += sha1_generic.o
 obj-$(CONFIG_CRYPTO_SHA256) += sha256_generic.o
+obj-$(CONFIG_CRYPTO_SHA512_BASE) += sha512_base.o
 obj-$(CONFIG_CRYPTO_SHA512) += sha512_generic.o
 obj-$(CONFIG_CRYPTO_WP512) += wp512.o
 obj-$(CONFIG_CRYPTO_TGR192) += tgr192.o
diff --git a/crypto/sha512_base.c b/crypto/sha512_base.c
new file mode 100644
index ..9a60829e06c4
--- /dev/null
+++ b/crypto/sha512_base.c
@@ -0,0 +1,143 @@
+/*
+ * sha512_base.c - core logic for SHA-512 implementations
+ *
+ * Copyright (C) 2015 Linaro Ltd ard.biesheu...@linaro.org
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <crypto/internal/hash.h>
+#include <crypto/sha.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+
+#include <asm/unaligned.h>
+
+int crypto_sha384_base_init(struct shash_desc *desc)
+{
+   static const u64 sha384_init_state[] = {
+   SHA384_H0, SHA384_H1, SHA384_H2, SHA384_H3,
+   SHA384_H4, SHA384_H5, SHA384_H6, SHA384_H7,
+   };
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+
+   memcpy(sctx->state, sha384_init_state, sizeof(sctx->state));
+   sctx->count[0] = sctx->count[1] = 0;
+   return 0;
+}
+EXPORT_SYMBOL(crypto_sha384_base_init);
+
+int crypto_sha512_base_init(struct shash_desc *desc)
+{
+   static const u64 sha512_init_state[] = {
+   SHA512_H0, SHA512_H1, SHA512_H2, SHA512_H3,
+   SHA512_H4, SHA512_H5, SHA512_H6, SHA512_H7,
+   };
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+
+   memcpy(sctx->state, sha512_init_state, sizeof(sctx->state));
+   sctx->count[0] = sctx->count[1] = 0;
+   return 0;
+}
+EXPORT_SYMBOL(crypto_sha512_base_init);
+
+int crypto_sha512_base_export(struct shash_desc *desc, void *out)
+{
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+   struct sha512_state *dst = out;
+
+   *dst = *sctx;
+
+   return 0;
+}
+EXPORT_SYMBOL(crypto_sha512_base_export);
+
+int crypto_sha512_base_import(struct shash_desc *desc, const void *in)
+{
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+   struct sha512_state const *src = in;
+
+   *sctx = *src;
+
+   return 0;
+}
+EXPORT_SYMBOL(crypto_sha512_base_import);
+
+int crypto_sha512_base_do_update(struct shash_desc *desc, const u8 *data,
+unsigned int len, sha512_block_fn *block_fn,
+void *p)
+{
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+   unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE;
+
+   sctx->count[0] += len;
+   if (sctx->count[0] < len)
+   sctx->count[1]++;
+
+   if (unlikely((partial + len) >= SHA512_BLOCK_SIZE)) {
+   int blocks;
+
+   if (partial) {
+   int p = SHA512_BLOCK_SIZE - partial;
+
+   memcpy(sctx->buf + partial, data, p);
+   data += p;
+   len -= p;
+   }
+
+   blocks = len / SHA512_BLOCK_SIZE;
+   len %= SHA512_BLOCK_SIZE;
+
+   block_fn(blocks, data, sctx->state,
+partial ? sctx->buf : NULL, p);
+   data += blocks * SHA512_BLOCK_SIZE;
+   partial = 0;
+   }
+   if (len)
+   memcpy(sctx->buf + partial, data, len);
+
+   return 0;
+}
+EXPORT_SYMBOL(crypto_sha512_base_do_update);
+
+int crypto_sha512_base_do_finalize(struct shash_desc *desc,
+  sha512_block_fn *block_fn, void *p)
+{
+   const int bit_offset = SHA512_BLOCK_SIZE - sizeof(__be64[2]);

AW: [RFC PATCH 1/6] crypto: sha512: implement base layer for SHA-512

2015-03-29 Thread Markus Stockhausen
 From: Ard Biesheuvel [ard.biesheu...@linaro.org]
 Sent: Sunday, 29 March 2015 12:38
 To: Markus Stockhausen
 Cc: linux-arm-ker...@lists.infradead.org; linux-crypto@vger.kernel.org; 
 samitolva...@google.com; herb...@gondor.apana.org.au; jussi.kivili...@iki.fi
 Subject: Re: [RFC PATCH 1/6] crypto: sha512: implement base layer for SHA-512
 
 ...
 +int sha512_base_do_update(struct shash_desc *desc, const u8 *data,
 + unsigned int len, sha512_block_fn *block_fn, void *p)
 +{
 +   struct sha512_state *sctx = shash_desc_ctx(desc);
 +   unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE;
 +
 +   sctx->count[0] += len;
 +   if (sctx->count[0] < len)
 +   sctx->count[1]++;

 You should check whether an early kick-out at this point, when the buffer
 won't be filled up, is faster than first taking care of the big data. That
 can improve performance for small blocks while large blocks might be
 unaffected.

 +
 +   if ((partial + len) >= SHA512_BLOCK_SIZE) {

Isn't this early kickout? The if is only entered if there is enough
data to run the block function, otherwise it is a straight memcpy. I
could add an unlikely() here to favor the small data case

I did my tests only on low end hardware. 32bit PPC e500 single core 800MHz
256K cache. Maybe it prefers early return statements. 

Additionally, I ended up clearing the context in the finish function with a
simple inlined loop of 32-bit writes. Everything else (e.g. memzero) resulted
in slower processing. I don't know what your clearing syntax will produce
after compilation.

Markus


Re: [PATCH] crypto/arm: accelerated SHA-512 using ARM generic ASM and NEON

2015-03-29 Thread Andy Polyakov
 This updates the SHA-512 NEON module with the faster and more
 versatile implementation from the OpenSSL project. It consists
 of both a NEON and a generic ASM version of the core SHA-512
 transform, where the NEON version reverts to the ASM version
 when invoked in non-process context.
 
 Performance relative to the generic implementation (measured
 using tcrypt.ko mode=306 sec=1 running on a Cortex-A57 under
 KVM):
 
    input size  block size  asm   neon  old neon
  
    8192        8192        1.51  3.51  2.69

One should keep in mind that improvement coefficients vary greatly from
platform to platform. Normally you *should* observe higher coefficients
in asm column and *can* observe smaller differences between neon and
old neon. BTW, 1.51 is unexpectedly low, I wonder which compiler
version stands for 1.0? Nor can I replicate difference between neon
and old neon, I get smaller difference, 17%, on Cortex-A57. Well, I'm
comparing in user-land, but it shouldn't be that significant at large
blocks...

 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
 ---
 
 This should get the same treatment as Sami's sha256 version: I would like
 to wait until the OpenSSL source file hits the upstream repository so that
 I can refer to its sha1 hash in the commit log.

Update is committed as
http://git.openssl.org/gitweb/?p=openssl.git;a=commitdiff;h=b1a5d1c652086257930a1f62ae51c9cdee654b2c.
Note that the file I've initially sent privately was a little bit off.
Sorry about that. But that little bit is just a commentary update that
adds performance result for Cortex-A15. So that kernel patch as
originally posted is 100% functionally equivalent.

Cheers.



Re: [RFC PATCH 1/6] crypto: sha512: implement base layer for SHA-512

2015-03-29 Thread Ard Biesheuvel
On 29 March 2015 at 10:29, Markus Stockhausen stockhau...@collogia.de wrote:
 From: linux-crypto-ow...@vger.kernel.org 
 [linux-crypto-ow...@vger.kernel.org] on behalf of Ard 
 Biesheuvel [ard.biesheu...@linaro.org]
 Sent: Saturday, 28 March 2015 23:10
 To: linux-arm-ker...@lists.infradead.org; linux-crypto@vger.kernel.org; 
 samitolva...@google.com; herb...@gondor.apana.org.au; jussi.kivili...@iki.fi
 Cc: Ard Biesheuvel
 Subject: [RFC PATCH 1/6] crypto: sha512: implement base layer for SHA-512

 To reduce the number of copies of boilerplate code throughout
 the tree, this patch implements generic glue for the SHA-512
 algorithm. This allows a specific arch or hardware implementation
 to only implement the special handling that it needs.

 Hi Ard,

 Implementing a common layer is a very good idea - I didn't like to
 implement the glue code once again for some recently developed
 PPC crypto modules. From my very short crypto experience I was
 surprised that my optimized implementations degraded disproportional
 for small calculations in the =256byte update scenarios in contrast to
 some very old basic implementations. Below you will find some hints,
 that might fit your implementation too. Thus all new implementations
 based on your framework could benefit immediately.


Thanks for taking a look!

 ...
 +int sha384_base_init(struct shash_desc *desc)
 +{
 +   struct sha512_state *sctx = shash_desc_ctx(desc);
 +
 +   *sctx = (struct sha512_state){
 +   .state = {
 +   SHA384_H0, SHA384_H1, SHA384_H2, SHA384_H3,
 +   SHA384_H4, SHA384_H5, SHA384_H6, SHA384_H7,
 +   }
 +   };
 +   return 0;
 +}

 IIRC the above code will initialize the whole context including the 64/128
 byte buffer. Direct assignment of the 8 hashes was faster in my case.


Ah, I missed that. I will change it.

 ...
  +int sha512_base_do_update(struct shash_desc *desc, const u8 *data,
  + unsigned int len, sha512_block_fn *block_fn, void *p)
  +{
  +   struct sha512_state *sctx = shash_desc_ctx(desc);
  +   unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE;
  +
  +   sctx->count[0] += len;
  +   if (sctx->count[0] < len)
  +   sctx->count[1]++;

  You should check whether an early kick-out at this point, when the buffer
  won't be filled up, is faster than first taking care of the big data. That
  can improve performance for small blocks while large blocks might be
  unaffected.

 +
  +   if ((partial + len) >= SHA512_BLOCK_SIZE) {

Isn't this early kickout? The if is only entered if there is enough
data to run the block function, otherwise it is a straight memcpy. I
could add an unlikely() here to favor the small data case


 +   int blocks;
 +
 +   if (partial) {
 +   int p = SHA512_BLOCK_SIZE - partial;
 +
  +   memcpy(sctx->buf + partial, data, p);
  +   data += p;
  +   len -= p;
  +   }
  +
  +   blocks = len / SHA512_BLOCK_SIZE;
  +   len %= SHA512_BLOCK_SIZE;
  +
  +   block_fn(blocks, data, sctx->state,
  +partial ? sctx->buf : NULL, p);
  +   data += blocks * SHA512_BLOCK_SIZE;
  +   partial = 0;
  +   }
  +   if (len)
  +   memcpy(sctx->buf + partial, data, len);
 +
 +   return 0;
 +}
 +EXPORT_SYMBOL(sha512_base_do_update);
 +
 +int sha512_base_do_finalize(struct shash_desc *desc, sha512_block_fn 
 *block_fn,
 +   void *p)
 +{
 +   static const u8 padding[SHA512_BLOCK_SIZE] = { 0x80, };
 +
 +   struct sha512_state *sctx = shash_desc_ctx(desc);
 +   unsigned int padlen;
 +   __be64 bits[2];
 +
  +   padlen = SHA512_BLOCK_SIZE -
  +(sctx->count[0] + sizeof(bits)) % SHA512_BLOCK_SIZE;
  +
  +   bits[0] = cpu_to_be64(sctx->count[1] << 3 |
  + sctx->count[0] >> 61);
  +   bits[1] = cpu_to_be64(sctx->count[0] << 3);
 +
 +   sha512_base_do_update(desc, padding, padlen, block_fn, p);

  I know that this is the most intuitive and straightforward implementation
  for handling finalization. Nevertheless, the somewhat obscure generic md5
  algorithm gives best-in-class performance for hash finalization of small
  input data.


Well, memcpy'ing a buffer consisting almost entirely of zeroes doesn't
quite feel right, indeed.
I will instead follow the md5 suggestion.

 For comparison: from the raw numbers, the sha1-ppc-spe assembler module
 written by me is only 10% faster than the old sha1-powerpc assembler module.
 Both are simple assembler algorithms without hardware acceleration. For large
 blocks I gain another 8% by avoiding function calls, because the core module
 may process several blocks. But for small single-block updates the above glue
 code optimizations gave

 16 byte block single update: +24%
 64 byte block single update: +16

AW: [RFC PATCH 1/6] crypto: sha512: implement base layer for SHA-512

2015-03-29 Thread Markus Stockhausen
 From: linux-crypto-ow...@vger.kernel.org 
 [linux-crypto-ow...@vger.kernel.org] on behalf of Ard 
 Biesheuvel [ard.biesheu...@linaro.org]
 Sent: Saturday, 28 March 2015 23:10
 To: linux-arm-ker...@lists.infradead.org; linux-crypto@vger.kernel.org; 
 samitolva...@google.com; herb...@gondor.apana.org.au; jussi.kivili...@iki.fi
 Cc: Ard Biesheuvel
 Subject: [RFC PATCH 1/6] crypto: sha512: implement base layer for SHA-512
 
 To reduce the number of copies of boilerplate code throughout
 the tree, this patch implements generic glue for the SHA-512
 algorithm. This allows a specific arch or hardware implementation
 to only implement the special handling that it needs.

Hi Ard,

Implementing a common layer is a very good idea - I didn't enjoy
implementing the glue code once again for some recently developed
PPC crypto modules. From my very short crypto experience I was
surprised that my optimized implementations degraded disproportionately
for small calculations in the <= 256-byte update scenarios, in contrast
to some very old basic implementations. Below you will find some hints
that might fit your implementation too. Thus all new implementations
based on your framework could benefit immediately.

 ...
 +int sha384_base_init(struct shash_desc *desc)
 +{
 +   struct sha512_state *sctx = shash_desc_ctx(desc);
 +
 +   *sctx = (struct sha512_state){
 +   .state = {
 +   SHA384_H0, SHA384_H1, SHA384_H2, SHA384_H3,
 +   SHA384_H4, SHA384_H5, SHA384_H6, SHA384_H7,
 +   }
 +   };
 +   return 0;
 +}

IIRC the above code will initialize the whole context including the 64/128
byte buffer. Direct assignment of the 8 hashes was faster in my case. 

 ...
 +int sha512_base_do_update(struct shash_desc *desc, const u8 *data,
 + unsigned int len, sha512_block_fn *block_fn, void 
 *p)
 +{
 +   struct sha512_state *sctx = shash_desc_ctx(desc);
 +   unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE;
 +
 +   sctx->count[0] += len;
 +   if (sctx->count[0] < len)
 +   sctx->count[1]++;

You should check whether an early kick-out at this point, taken when the
buffer won't be filled up, is faster than first taking care of the big-data
case. That can improve performance for small blocks, while large blocks
should be unaffected.

 +
 +   if ((partial + len) >= SHA512_BLOCK_SIZE) {
 +   int blocks;
 +
 +   if (partial) {
 +   int p = SHA512_BLOCK_SIZE - partial;
 +
 +   memcpy(sctx->buf + partial, data, p);
 +   data += p;
 +   len -= p;
 +   }
 +
 +   blocks = len / SHA512_BLOCK_SIZE;
 +   len %= SHA512_BLOCK_SIZE;
 +
 +   block_fn(blocks, data, sctx->state,
 +partial ? sctx->buf : NULL, p);
 +   data += blocks * SHA512_BLOCK_SIZE;
 +   partial = 0;
 +   }
 +   if (len)
 +   memcpy(sctx->buf + partial, data, len);
 +
 +   return 0;
 +}
 +EXPORT_SYMBOL(sha512_base_do_update);
 +
 +int sha512_base_do_finalize(struct shash_desc *desc, sha512_block_fn 
 *block_fn,
 +   void *p)
 +{
 +   static const u8 padding[SHA512_BLOCK_SIZE] = { 0x80, };
 +
 +   struct sha512_state *sctx = shash_desc_ctx(desc);
 +   unsigned int padlen;
 +   __be64 bits[2];
 +
 +   padlen = SHA512_BLOCK_SIZE -
 +(sctx->count[0] + sizeof(bits)) % SHA512_BLOCK_SIZE;
 +
 +   bits[0] = cpu_to_be64(sctx->count[1] << 3 |
 + sctx->count[0] >> 61);
 +   bits[1] = cpu_to_be64(sctx->count[0] << 3);
 +
 +   sha512_base_do_update(desc, padding, padlen, block_fn, p);

I know that this is the most intuitive and straightforward implementation for
handling finalization. Nevertheless, the (maybe a little obscure) generic md5
algorithm gives best-in-class performance for hash finalization of small
input data.

For comparison: From the raw numbers, the sha1-ppc-spe assembler module
written by me is only 10% faster than the old sha1-powerpc assembler module.
Both are simple assembler algorithms without hardware acceleration. For large
blocks I gain another 8% by avoiding function calls, because the core module
may process several blocks. But for small single-block updates the above glue
code optimizations gave

16byte block single update: +24%
64byte block single update: +16%
256byte block single update +12%

Considering CPU assisted SHA calculations that percentage may be even higher.

Maybe worth the effort ... 

Markus


[RFC PATCH 1/6] crypto: sha512: implement base layer for SHA-512

2015-03-28 Thread Ard Biesheuvel
To reduce the number of copies of boilerplate code throughout
the tree, this patch implements generic glue for the SHA-512
algorithm. This allows a specific arch or hardware implementation
to only implement the special handling that it needs.

Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
---
 crypto/Kconfig   |   3 ++
 crypto/Makefile  |   1 +
 crypto/sha512_base.c | 143 +++
 include/crypto/sha.h |  20 +++
 4 files changed, 167 insertions(+)
 create mode 100644 crypto/sha512_base.c

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 88639937a934..3400cf4e3cdb 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -641,6 +641,9 @@ config CRYPTO_SHA256_SPARC64
  SHA-256 secure hash standard (DFIPS 180-2) implemented
  using sparc64 crypto instructions, when available.
 
+config CRYPTO_SHA512_BASE
+   tristate
+
 config CRYPTO_SHA512
	tristate "SHA384 and SHA512 digest algorithms"
select CRYPTO_HASH
diff --git a/crypto/Makefile b/crypto/Makefile
index 97b7d3ac87e7..6174bf2592fe 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -45,6 +45,7 @@ obj-$(CONFIG_CRYPTO_RMD256) += rmd256.o
 obj-$(CONFIG_CRYPTO_RMD320) += rmd320.o
 obj-$(CONFIG_CRYPTO_SHA1) += sha1_generic.o
 obj-$(CONFIG_CRYPTO_SHA256) += sha256_generic.o
+obj-$(CONFIG_CRYPTO_SHA512_BASE) += sha512_base.o
 obj-$(CONFIG_CRYPTO_SHA512) += sha512_generic.o
 obj-$(CONFIG_CRYPTO_WP512) += wp512.o
 obj-$(CONFIG_CRYPTO_TGR192) += tgr192.o
diff --git a/crypto/sha512_base.c b/crypto/sha512_base.c
new file mode 100644
index ..488e24cc6f0a
--- /dev/null
+++ b/crypto/sha512_base.c
@@ -0,0 +1,143 @@
+/*
+ * sha512_base.c - core logic for SHA-512 implementations
+ *
+ * Copyright (C) 2015 Linaro Ltd ard.biesheu...@linaro.org
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <crypto/internal/hash.h>
+#include <crypto/sha.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+
+#include <asm/unaligned.h>
+
+int sha384_base_init(struct shash_desc *desc)
+{
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+
+   *sctx = (struct sha512_state){
+   .state = {
+   SHA384_H0, SHA384_H1, SHA384_H2, SHA384_H3,
+   SHA384_H4, SHA384_H5, SHA384_H6, SHA384_H7,
+   }
+   };
+   return 0;
+}
+EXPORT_SYMBOL(sha384_base_init);
+
+int sha512_base_init(struct shash_desc *desc)
+{
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+
+   *sctx = (struct sha512_state){
+   .state = {
+   SHA512_H0, SHA512_H1, SHA512_H2, SHA512_H3,
+   SHA512_H4, SHA512_H5, SHA512_H6, SHA512_H7,
+   }
+   };
+   return 0;
+}
+EXPORT_SYMBOL(sha512_base_init);
+
+int sha512_base_export(struct shash_desc *desc, void *out)
+{
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+   struct sha512_state *dst = out;
+
+   *dst = *sctx;
+
+   return 0;
+}
+EXPORT_SYMBOL(sha512_base_export);
+
+int sha512_base_import(struct shash_desc *desc, const void *in)
+{
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+   struct sha512_state const *src = in;
+
+   *sctx = *src;
+
+   return 0;
+}
+EXPORT_SYMBOL(sha512_base_import);
+
+int sha512_base_do_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len, sha512_block_fn *block_fn, void *p)
+{
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+   unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE;
+
+   sctx->count[0] += len;
+   if (sctx->count[0] < len)
+   sctx->count[1]++;
+
+   if ((partial + len) >= SHA512_BLOCK_SIZE) {
+   int blocks;
+
+   if (partial) {
+   int p = SHA512_BLOCK_SIZE - partial;
+
+   memcpy(sctx->buf + partial, data, p);
+   data += p;
+   len -= p;
+   }
+
+   blocks = len / SHA512_BLOCK_SIZE;
+   len %= SHA512_BLOCK_SIZE;
+
+   block_fn(blocks, data, sctx->state,
+partial ? sctx->buf : NULL, p);
+   data += blocks * SHA512_BLOCK_SIZE;
+   partial = 0;
+   }
+   if (len)
+   memcpy(sctx->buf + partial, data, len);
+
+   return 0;
+}
+EXPORT_SYMBOL(sha512_base_do_update);
+
+int sha512_base_do_finalize(struct shash_desc *desc, sha512_block_fn *block_fn,
+   void *p)
+{
+   static const u8 padding[SHA512_BLOCK_SIZE] = { 0x80, };
+
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+   unsigned int padlen;
+   __be64 bits[2];
+
+   padlen = SHA512_BLOCK_SIZE -
+(sctx->count[0] + sizeof(bits)) % SHA512_BLOCK_SIZE;
+
+   bits[0

[RFC PATCH 6/6] arm/crypto: accelerated SHA-512 using ARM generic ASM and NEON

2015-03-28 Thread Ard Biesheuvel
This updates the SHA-512 NEON module with the faster and more
versatile implementation from the OpenSSL project. It consists
of both a NEON and a generic ASM version of the core SHA-512
transform, where the NEON version reverts to the ASM version
when invoked in non-process context.

Performance relative to the generic implementation (measured
using tcrypt.ko mode=306 sec=1 running on a Cortex-A57 under
KVM):

  input size  block size  asm   neon  old neon

  16          16          1.39  2.54  2.21
  64          16          1.32  2.33  2.09
  64          64          1.38  2.53  2.19
  256         16          1.31  2.28  2.06
  256         64          1.38  2.54  2.25
  256         256         1.40  2.77  2.39
  1024        16          1.29  2.22  2.01
  1024        256         1.40  2.82  2.45
  1024        1024        1.41  2.93  2.53
  2048        16          1.33  2.21  2.00
  2048        256         1.40  2.84  2.46
  2048        1024        1.41  2.96  2.55
  2048        2048        1.41  2.98  2.56
  4096        16          1.34  2.20  1.99
  4096        256         1.40  2.84  2.46
  4096        1024        1.41  2.97  2.56
  4096        4096        1.41  3.01  2.58
  8192        16          1.34  2.19  1.99
  8192        256         1.40  2.85  2.47
  8192        1024        1.41  2.98  2.56
  8192        4096        1.41  2.71  2.59
  8192        8192        1.51  3.51  2.69

Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
---
 arch/arm/crypto/Kconfig   |8 +
 arch/arm/crypto/Makefile  |8 +-
 arch/arm/crypto/sha512-armv4.pl   |  656 
 arch/arm/crypto/sha512-core.S_shipped | 1814 +
 arch/arm/crypto/sha512-glue.c |  137 +++
 arch/arm/crypto/sha512-neon-glue.c|  111 ++
 arch/arm/crypto/sha512.h  |8 +
 7 files changed, 2741 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm/crypto/sha512-armv4.pl
 create mode 100644 arch/arm/crypto/sha512-core.S_shipped
 create mode 100644 arch/arm/crypto/sha512-glue.c
 create mode 100644 arch/arm/crypto/sha512-neon-glue.c
 create mode 100644 arch/arm/crypto/sha512.h

diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index 458729d2ce22..6b50c6d77b77 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -53,6 +53,14 @@ config CRYPTO_SHA256_ARM
  SHA-256 secure hash standard (DFIPS 180-2) implemented
  using optimized ARM assembler and NEON, when available.
 
+config CRYPTO_SHA512_ARM
+   tristate "SHA-384/512 digest algorithm (ARM-asm and NEON)"
+   select CRYPTO_HASH
+   select CRYPTO_SHA512_BASE
+   help
+ SHA-512 secure hash standard (DFIPS 180-2) implemented
+ using optimized ARM assembler and NEON, when available.
+
 config CRYPTO_SHA512_ARM_NEON
	tristate "SHA384 and SHA512 digest algorithm (ARM NEON)"
depends on KERNEL_MODE_NEON
diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index ef46e898f98b..322a6ca999a2 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
 obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o
+obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o
 obj-$(CONFIG_CRYPTO_SHA512_ARM_NEON) += sha512-arm-neon.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o
 obj-$(CONFIG_CRYPTO_SHA2_ARM_CE) += sha2-arm-ce.o
@@ -19,6 +20,8 @@ sha1-arm-y:= sha1-armv4-large.o sha1_glue.o
 sha1-arm-neon-y:= sha1-armv7-neon.o sha1_neon_glue.o
 sha256-arm-neon-$(CONFIG_KERNEL_MODE_NEON) := sha256_neon_glue.o
 sha256-arm-y   := sha256-core.o sha256_glue.o $(sha256-arm-neon-y)
+sha512-arm-neon-$(CONFIG_KERNEL_MODE_NEON) := sha512-neon-glue.o
+sha512-arm-y   := sha512-core.o sha512-glue.o $(sha512-arm-neon-y)
 sha512-arm-neon-y := sha512-armv7-neon.o sha512_neon_glue.o
 sha1-arm-ce-y  := sha1-ce-core.o sha1-ce-glue.o
 sha2-arm-ce-y  := sha2-ce-core.o sha2-ce-glue.o
@@ -34,4 +37,7 @@ $(src)/aesbs-core.S_shipped: $(src)/bsaes-armv7.pl
 $(src)/sha256-core.S_shipped: $(src)/sha256-armv4.pl
$(call cmd,perl)
 
-.PRECIOUS: $(obj)/aesbs-core.S $(obj)/sha256-core.S
+$(src)/sha512-core.S_shipped: $(src)/sha512-armv4.pl
+   $(call cmd,perl)
+
+.PRECIOUS: $(obj)/aesbs-core.S $(obj)/sha256-core.S $(obj)/sha512-core.S
diff --git a/arch/arm/crypto/sha512-armv4.pl b/arch/arm/crypto/sha512-armv4.pl
new file mode 100644
index ..7e540f8439da
--- /dev/null
+++ b/arch/arm/crypto/sha512-armv4.pl
@@ -0,0 +1,656 @@
+#!/usr/bin/env perl
+
+# 
+# Written by Andy Polyakov ap

Re: [PATCH] crypto/arm: accelerated SHA-512 using ARM generic ASM and NEON

2015-03-28 Thread Jussi Kivilinna
On 28.03.2015 09:28, Ard Biesheuvel wrote:
 This updates the SHA-512 NEON module with the faster and more
 versatile implementation from the OpenSSL project. It consists
 of both a NEON and a generic ASM version of the core SHA-512
 transform, where the NEON version reverts to the ASM version
 when invoked in non-process context.
 
 Performance relative to the generic implementation (measured
 using tcrypt.ko mode=306 sec=1 running on a Cortex-A57 under
 KVM):
 
   input size  block size  asm   neon  old neon
 
   16          16          1.39  2.54  2.21
   64          16          1.32  2.33  2.09
   64          64          1.38  2.53  2.19
   256         16          1.31  2.28  2.06
   256         64          1.38  2.54  2.25
   256         256         1.40  2.77  2.39
   1024        16          1.29  2.22  2.01
   1024        256         1.40  2.82  2.45
   1024        1024        1.41  2.93  2.53
   2048        16          1.33  2.21  2.00
   2048        256         1.40  2.84  2.46
   2048        1024        1.41  2.96  2.55
   2048        2048        1.41  2.98  2.56
   4096        16          1.34  2.20  1.99
   4096        256         1.40  2.84  2.46
   4096        1024        1.41  2.97  2.56
   4096        4096        1.41  3.01  2.58
   8192        16          1.34  2.19  1.99
   8192        256         1.40  2.85  2.47
   8192        1024        1.41  2.98  2.56
   8192        4096        1.41  2.71  2.59
   8192        8192        1.51  3.51  2.69
 
 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
 ---
 
 This should get the same treatment as Sami's sha256 version: I would like
 to wait until the OpenSSL source file hits the upstream repository so that
 I can refer to its sha1 hash in the commit log.
 
  arch/arm/crypto/Kconfig   |2 -
  arch/arm/crypto/Makefile  |8 +-
  arch/arm/crypto/sha512-armv4.pl   |  656 
  arch/arm/crypto/sha512-armv7-neon.S   |  455 -
  arch/arm/crypto/sha512-core.S_shipped | 1814 
 +
  arch/arm/crypto/sha512.h  |   14 +
  arch/arm/crypto/sha512_glue.c |  255 +
  arch/arm/crypto/sha512_neon_glue.c|  155 +--
  8 files changed, 2762 insertions(+), 597 deletions(-)
  create mode 100644 arch/arm/crypto/sha512-armv4.pl
  delete mode 100644 arch/arm/crypto/sha512-armv7-neon.S

Acked-by: Jussi Kivilinna jussi.kivili...@iki.fi

  create mode 100644 arch/arm/crypto/sha512-core.S_shipped
  create mode 100644 arch/arm/crypto/sha512.h
  create mode 100644 arch/arm/crypto/sha512_glue.c
 
--
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] crypto/arm: accelerated SHA-512 using ARM generic ASM and NEON

2015-03-28 Thread Ard Biesheuvel
This updates the SHA-512 NEON module with the faster and more
versatile implementation from the OpenSSL project. It consists
of both a NEON and a generic ASM version of the core SHA-512
transform, where the NEON version reverts to the ASM version
when invoked in non-process context.

Performance relative to the generic implementation (measured
using tcrypt.ko mode=306 sec=1 running on a Cortex-A57 under
KVM):

input size  block size  asm   neon  old neon

16          16          1.39  2.54  2.21
64          16          1.32  2.33  2.09
64          64          1.38  2.53  2.19
256         16          1.31  2.28  2.06
256         64          1.38  2.54  2.25
256         256         1.40  2.77  2.39
1024        16          1.29  2.22  2.01
1024        256         1.40  2.82  2.45
1024        1024        1.41  2.93  2.53
2048        16          1.33  2.21  2.00
2048        256         1.40  2.84  2.46
2048        1024        1.41  2.96  2.55
2048        2048        1.41  2.98  2.56
4096        16          1.34  2.20  1.99
4096        256         1.40  2.84  2.46
4096        1024        1.41  2.97  2.56
4096        4096        1.41  3.01  2.58
8192        16          1.34  2.19  1.99
8192        256         1.40  2.85  2.47
8192        1024        1.41  2.98  2.56
8192        4096        1.41  2.71  2.59
8192        8192        1.51  3.51  2.69

Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
---

This should get the same treatment as Sami's sha256 version: I would like
to wait until the OpenSSL source file hits the upstream repository so that
I can refer to its sha1 hash in the commit log.

 arch/arm/crypto/Kconfig   |2 -
 arch/arm/crypto/Makefile  |8 +-
 arch/arm/crypto/sha512-armv4.pl   |  656 
 arch/arm/crypto/sha512-armv7-neon.S   |  455 -
 arch/arm/crypto/sha512-core.S_shipped | 1814 +
 arch/arm/crypto/sha512.h  |   14 +
 arch/arm/crypto/sha512_glue.c |  255 +
 arch/arm/crypto/sha512_neon_glue.c|  155 +--
 8 files changed, 2762 insertions(+), 597 deletions(-)
 create mode 100644 arch/arm/crypto/sha512-armv4.pl
 delete mode 100644 arch/arm/crypto/sha512-armv7-neon.S
 create mode 100644 arch/arm/crypto/sha512-core.S_shipped
 create mode 100644 arch/arm/crypto/sha512.h
 create mode 100644 arch/arm/crypto/sha512_glue.c

diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index 458729d2ce22..846694ad2b7d 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -55,8 +55,6 @@ config CRYPTO_SHA256_ARM
 
 config CRYPTO_SHA512_ARM_NEON
	tristate "SHA384 and SHA512 digest algorithm (ARM NEON)"
-   depends on KERNEL_MODE_NEON
-   select CRYPTO_SHA512
select CRYPTO_HASH
help
  SHA-512 secure hash standard (DFIPS 180-2) implemented
diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index ef46e898f98b..c0ed9b68fe12 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -19,7 +19,8 @@ sha1-arm-y:= sha1-armv4-large.o sha1_glue.o
 sha1-arm-neon-y:= sha1-armv7-neon.o sha1_neon_glue.o
 sha256-arm-neon-$(CONFIG_KERNEL_MODE_NEON) := sha256_neon_glue.o
 sha256-arm-y   := sha256-core.o sha256_glue.o $(sha256-arm-neon-y)
-sha512-arm-neon-y := sha512-armv7-neon.o sha512_neon_glue.o
+sha512-arm-neon-$(CONFIG_KERNEL_MODE_NEON) := sha512_neon_glue.o
+sha512-arm-neon-y := sha512-core.o sha512_glue.o $(sha512-arm-neon-y)
 sha1-arm-ce-y  := sha1-ce-core.o sha1-ce-glue.o
 sha2-arm-ce-y  := sha2-ce-core.o sha2-ce-glue.o
 aes-arm-ce-y   := aes-ce-core.o aes-ce-glue.o
@@ -34,4 +35,7 @@ $(src)/aesbs-core.S_shipped: $(src)/bsaes-armv7.pl
 $(src)/sha256-core.S_shipped: $(src)/sha256-armv4.pl
$(call cmd,perl)
 
-.PRECIOUS: $(obj)/aesbs-core.S $(obj)/sha256-core.S
+$(src)/sha512-core.S_shipped: $(src)/sha512-armv4.pl
+   $(call cmd,perl)
+
+.PRECIOUS: $(obj)/aesbs-core.S $(obj)/sha256-core.S $(obj)/sha512-core.S
diff --git a/arch/arm/crypto/sha512-armv4.pl b/arch/arm/crypto/sha512-armv4.pl
new file mode 100644
index ..7e540f8439da
--- /dev/null
+++ b/arch/arm/crypto/sha512-armv4.pl
@@ -0,0 +1,656 @@
+#!/usr/bin/env perl
+
+# 
+# Written by Andy Polyakov ap...@openssl.org for the OpenSSL
+# project. The module is, however, dual licensed under OpenSSL and
+# CRYPTOGAMS licenses depending on where you obtain it. For further
+# details see http://www.openssl.org/~appro/cryptogams

Re: [PATCH] crypto: testmgr: add empty and large test vectors for SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512

2014-04-16 Thread Herbert Xu
On Sat, Apr 12, 2014 at 03:35:29PM +0300, Jussi Kivilinna wrote:
 Patch adds large test-vectors for SHA algorithms for better code coverage in
 optimized assembly implementations. Empty test-vectors are also added, as some
 crypto drivers appear to have special case handling for empty input.
 
 Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi

Patch applied.  Thanks!
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[PATCH] crypto: testmgr: add empty and large test vectors for SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512

2014-04-12 Thread Jussi Kivilinna
Patch adds large test-vectors for SHA algorithms for better code coverage in
optimized assembly implementations. Empty test-vectors are also added, as some
crypto drivers appear to have special case handling for empty input.

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---

This patch depends on the crypto: add test cases for SHA-1, SHA-224, SHA-256
and AES-CCM patch from Ard Biesheuvel.
---
 crypto/testmgr.h |  728 +-
 1 file changed, 721 insertions(+), 7 deletions(-)

diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index 84ac0f0..7d1438e 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -487,10 +487,15 @@ static struct hash_testvec crct10dif_tv_template[] = {
 * SHA1 test vectors from FIPS PUB 180-1
  * Long vector from CAVS 5.0
  */
-#define SHA1_TEST_VECTORS  4
+#define SHA1_TEST_VECTORS  6
 
 static struct hash_testvec sha1_tv_template[] = {
{
+   .plaintext = "",
+   .psize  = 0,
+   .digest = "\xda\x39\xa3\xee\x5e\x6b\x4b\x0d\x32\x55"
+ "\xbf\xef\x95\x60\x18\x90\xaf\xd8\x07\x09",
+   }, {
	.plaintext = "abc",
	.psize  = 3,
	.digest = "\xa9\x99\x3e\x36\x47\x06\x81\x6a\xba\x3e"
@@ -534,6 +539,139 @@ static struct hash_testvec sha1_tv_template[] = {
.psize  = 64,
	.digest = "\xc8\x71\xf6\x9a\x63\xcc\xa9\x84\x84\x82"
		  "\x64\xe7\x79\x95\x5d\xd7\x19\x41\x7c\x91",
+   }, {
+   .plaintext = \x08\x9f\x13\xaa\x41\xd8\x4c\xe3
+\x7a\x11\x85\x1c\xb3\x27\xbe\x55
+\xec\x60\xf7\x8e\x02\x99\x30\xc7
+\x3b\xd2\x69\x00\x74\x0b\xa2\x16
+\xad\x44\xdb\x4f\xe6\x7d\x14\x88
+\x1f\xb6\x2a\xc1\x58\xef\x63\xfa
+\x91\x05\x9c\x33\xca\x3e\xd5\x6c
+\x03\x77\x0e\xa5\x19\xb0\x47\xde
+\x52\xe9\x80\x17\x8b\x22\xb9\x2d
+\xc4\x5b\xf2\x66\xfd\x94\x08\x9f
+\x36\xcd\x41\xd8\x6f\x06\x7a\x11
+\xa8\x1c\xb3\x4a\xe1\x55\xec\x83
+\x1a\x8e\x25\xbc\x30\xc7\x5e\xf5
+\x69\x00\x97\x0b\xa2\x39\xd0\x44
+\xdb\x72\x09\x7d\x14\xab\x1f\xb6
+\x4d\xe4\x58\xef\x86\x1d\x91\x28
+\xbf\x33\xca\x61\xf8\x6c\x03\x9a
+\x0e\xa5\x3c\xd3\x47\xde\x75\x0c
+\x80\x17\xae\x22\xb9\x50\xe7\x5b
+\xf2\x89\x20\x94\x2b\xc2\x36\xcd
+\x64\xfb\x6f\x06\x9d\x11\xa8\x3f
+\xd6\x4a\xe1\x78\x0f\x83\x1a\xb1
+\x25\xbc\x53\xea\x5e\xf5\x8c\x00
+\x97\x2e\xc5\x39\xd0\x67\xfe\x72
+\x09\xa0\x14\xab\x42\xd9\x4d\xe4
+\x7b\x12\x86\x1d\xb4\x28\xbf\x56
+\xed\x61\xf8\x8f\x03\x9a\x31\xc8
+\x3c\xd3\x6a\x01\x75\x0c\xa3\x17
+\xae\x45\xdc\x50\xe7\x7e\x15\x89
+\x20\xb7\x2b\xc2\x59\xf0\x64\xfb
+\x92\x06\x9d\x34\xcb\x3f\xd6\x6d
+\x04\x78\x0f\xa6\x1a\xb1\x48\xdf
+\x53\xea\x81\x18\x8c\x23\xba\x2e
+\xc5\x5c\xf3\x67\xfe\x95\x09\xa0
+\x37\xce\x42\xd9\x70\x07\x7b\x12
+\xa9\x1d\xb4\x4b\xe2\x56\xed\x84
+\x1b\x8f\x26\xbd\x31\xc8\x5f\xf6
+\x6a\x01\x98\x0c\xa3\x3a\xd1\x45
+\xdc\x73\x0a\x7e\x15\xac\x20\xb7
+\x4e\xe5\x59\xf0\x87\x1e\x92\x29
+\xc0\x34\xcb\x62\xf9\x6d\x04\x9b
+\x0f\xa6\x3d\xd4\x48\xdf\x76\x0d
+\x81\x18\xaf\x23\xba\x51\xe8\x5c
+\xf3\x8a\x21\x95\x2c\xc3\x37\xce
+\x65\xfc\x70\x07\x9e\x12\xa9\x40
+\xd7\x4b\xe2\x79\x10\x84\x1b\xb2
+\x26\xbd\x54\xeb\x5f\xf6\x8d\x01
+\x98\x2f\xc6\x3a\xd1\x68\xff\x73
+\x0a\xa1\x15\xac\x43\xda\x4e\xe5
+\x7c\x13\x87\x1e\xb5\x29\xc0\x57
+\xee\x62\xf9\x90\x04\x9b\x32\xc9
+\x3d\xd4\x6b\x02\x76\x0d\xa4\x18
+\xaf\x46\xdd\x51\xe8\x7f\x16\x8a
+\x21\xb8\x2c\xc3\x5a\xf1\x65\xfc
+\x93\x07\x9e\x35\xcc\x40\xd7\x6e
+

Re: [PATCH] crypto: Fix byte counter overflow in SHA-512

2012-04-05 Thread Herbert Xu
On Fri, Mar 16, 2012 at 08:26:28PM +, Kent Yoder wrote:
 The current code only increments the upper 64 bits of the SHA-512 byte
 counter when the number of bytes hashed happens to hit 2^64 exactly.
 
 This patch increments the upper 64 bits whenever the lower 64 bits
 overflows.
 
 Signed-off-by: Kent Yoder k...@linux.vnet.ibm.com

Good catch.  Patch applied to crypto and stable.  Thanks a lot!
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[PATCH] crypto: Fix byte counter overflow in SHA-512

2012-03-16 Thread Kent Yoder
The current code only increments the upper 64 bits of the SHA-512 byte
counter when the number of bytes hashed happens to hit 2^64 exactly.

This patch increments the upper 64 bits whenever the lower 64 bits
overflows.

Signed-off-by: Kent Yoder k...@linux.vnet.ibm.com
---
 crypto/sha512_generic.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/crypto/sha512_generic.c b/crypto/sha512_generic.c
index 107f6f7..dd30f40 100644
--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -174,7 +174,7 @@ sha512_update(struct shash_desc *desc, const u8 *data, 
unsigned int len)
	index = sctx->count[0] & 0x7f;
 
	/* Update number of bytes */
-	if (!(sctx->count[0] += len))
+	if ((sctx->count[0] += len) < len)
		sctx->count[1]++;
 
 part_len = 128 - index;
-- 
1.7.5.4
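The semantics of that one-line change can be checked in isolation. This is a
user-space sketch of the counter logic, not the kernel code; `lo`/`hi` stand
in for `sctx->count[0]`/`sctx->count[1]`:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified 128-bit byte counter as used by sha512_update(). */
struct ctr {
	uint64_t lo, hi;
};

static void add_len_old(struct ctr *c, uint64_t len)
{
	if (!(c->lo += len))		/* only fires on an exact wrap to 0 */
		c->hi++;
}

static void add_len_fixed(struct ctr *c, uint64_t len)
{
	if ((c->lo += len) < len)	/* fires on every unsigned wraparound */
		c->hi++;
}
```

Any update that wraps the low word past zero, rather than landing exactly on
it, loses a carry with the old test and keeps it with the fixed one.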




Re: sha-512...

2012-02-15 Thread Alexey Dobriyan
On Wed, Feb 15, 2012 at 12:23:52AM -0500, David Miller wrote:
 From: Herbert Xu herb...@gondor.hengli.com.au
 Date: Wed, 15 Feb 2012 16:16:08 +1100
 
  OK, so we grew by 1136 - 888 = 248.  Keep in mind that 128 of
  that is expected since we moved W onto the stack.
 
 Right.
 
  I guess we could go back to the percpu solution, what do you
  think?
 
 I'm not entirely sure, we might have to.
 
 sha512 is notorious for generating terrible code with gcc on 32-bit
 targets, so...  The sha512 test in the glibc testsuite tends to
 timeout on 32-bit sparc. :-)

Cherry-picking the ror64() commit largely fixes the issue (on sparc-defconfig):

 sha512_transform:
   0:   9d e3 bc 78 save  %sp, -904, %sp

git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
b85a088f15f2070b7180735a231012843a5ac96c
crypto: sha512 - use standard ror64()
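For context, ror64() is just a plain 64-bit right-rotate. A user-space sketch
equivalent to the helper that commit switches to (the kernel version takes
the shift unmasked; the `& 63` here only avoids undefined behaviour for a
shift of 0 in standalone C):

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 64-bit word right by shift bits. Expressed this way, compilers
 * recognise the rotate idiom instead of spilling wide shifts to the stack,
 * which is what hurt the 32-bit sparc/i386 builds discussed above. */
static inline uint64_t ror64(uint64_t word, unsigned int shift)
{
	return (word >> (shift & 63)) | (word << ((64 - shift) & 63));
}
```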


Re: sha-512...

2012-02-15 Thread David Miller
From: Alexey Dobriyan adobri...@gmail.com
Date: Wed, 15 Feb 2012 22:27:52 +0300

 On Wed, Feb 15, 2012 at 12:23:52AM -0500, David Miller wrote:
 From: Herbert Xu herb...@gondor.hengli.com.au
 Date: Wed, 15 Feb 2012 16:16:08 +1100
 
  OK, so we grew by 1136 - 888 = 248.  Keep in mind that 128 of
  that is expected since we moved W onto the stack.
 
 Right.
 
  I guess we could go back to the percpu solution, what do you
  think?
 
 I'm not entirely sure, we might have to.
 
 sha512 is notorious for generating terrible code with gcc on 32-bit
 targets, so...  The sha512 test in the glibc testsuite tends to
 timeout on 32-bit sparc. :-)
 
 Cherrypicking ror64() commit largely fixes the issue (on sparc-defconfig):
 
    sha512_transform:
  0:   9d e3 bc 78 save  %sp, -904, %sp
 
 git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
 b85a088f15f2070b7180735a231012843a5ac96c
 crypto: sha512 - use standard ror64()

I'm happy with a solution that involves pushing this change to Linus's
tree; it's pretty clear why it helps so much, although I'm disappointed
that gcc can't see that the u64 shift argument passed in is always a
constant and therefore well within the range of a 32-bit value, ho hum
:-)

In fact, in my tree, this change brings the stack allocation instruction
down to:

save%sp, -824, %sp  !

which is actually BETTER than what the old per-cpu code got:

save%sp, -984, %sp  !

Therefore I highly recommend we apply that ror() change to Linus's
tree now. :-)



Re: sha-512...

2012-02-15 Thread Herbert Xu
On Wed, Feb 15, 2012 at 04:00:10PM -0500, David Miller wrote:

 In fact, in my tree, this change brings the stack allocation instruction
 down to:
 
 save%sp, -824, %sp  !
 
 which is actually BETTER than what the old per-cpu code got:
 
 save%sp, -984, %sp  !
 
 Therefore I highly recommend we apply that ror() change to Linus's
 tree now. :-)

Great, I'll push that out today.

Thanks!
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


sha-512...

2012-02-14 Thread David Miller

FYI, I just started seeing this on sparc32 after all those
sha512 optimizations:

crypto/sha512_generic.c: In function 'sha512_transform':
crypto/sha512_generic.c:135:1: warning: the frame size of 1136 bytes is larger 
than 1024 bytes [-Wframe-larger-than=]


Re: sha-512...

2012-02-14 Thread Herbert Xu
On Tue, Feb 14, 2012 at 10:58:33PM -0500, David Miller wrote:
 
 FYI, I just started seeing this on sparc32 after all those
 sha512 optimizations:
 
 crypto/sha512_generic.c: In function 'sha512_transform':
 crypto/sha512_generic.c:135:1: warning: the frame size of 1136 bytes is 
 larger than 1024 bytes [-Wframe-larger-than=]

Is that with the latest patch applied?

crypto: sha512 - Avoid stack bloat on i386

If so then this is not good.  What was the original stack usage,
i.e., if you revert to the original percpu code?

Thanks,
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: sha-512...

2012-02-14 Thread David Miller
From: Herbert Xu herb...@gondor.hengli.com.au
Date: Wed, 15 Feb 2012 15:01:28 +1100

 On Tue, Feb 14, 2012 at 10:58:33PM -0500, David Miller wrote:
 
 FYI, I just started seeing this on sparc32 after all those
 sha512 optimizations:
 
 crypto/sha512_generic.c: In function 'sha512_transform':
 crypto/sha512_generic.c:135:1: warning: the frame size of 1136 bytes is 
 larger than 1024 bytes [-Wframe-larger-than=]
 
 Is that with the latest patch applied?
 
   crypto: sha512 - Avoid stack bloat on i386
 
 If so then this is not good.

Yes.  And, of course, with that commit reverted it's even worse.
Reverting it makes the stack frame twice as large.

 What was the original stack usage, i.e., if you revert to the
 original percpu code?

If I revert:

commit 3a92d687c8015860a19213e3c102cad6b722f83c
commit 58d7d18b5268febb8b1391c6dffc8e2aaa751fcd
commit 51fc6dc8f948047364f7d42a4ed89b416c6cc0a3
commit 84e31fdb7c797a7303e0cc295cb9bc8b73fb872d

the stackframe goes down to 888 bytes.

More detailed, the progression is:

master  1136
revert 3a92d687c8015860a19213e3c102cad6b722f83c 2408
revert 58d7d18b5268febb8b1391c6dffc8e2aaa751fcd 2408
revert 51fc6dc8f948047364f7d42a4ed89b416c6cc0a3 1520
revert 84e31fdb7c797a7303e0cc295cb9bc8b73fb872d 888


Re: sha-512...

2012-02-14 Thread Herbert Xu
On Wed, Feb 15, 2012 at 12:11:13AM -0500, David Miller wrote:
>
> > On Tue, Feb 14, 2012 at 10:58:33PM -0500, David Miller wrote:
> >
> > > FYI, I just started seeing this on sparc32 after all those
> > > sha512 optimizations:
> > >
> > > crypto/sha512_generic.c: In function 'sha512_transform':
> > > crypto/sha512_generic.c:135:1: warning: the frame size of 1136 bytes is
> > > larger than 1024 bytes [-Wframe-larger-than=]
> >
> > Is that with the latest patch applied?
> >
> >   crypto: sha512 - Avoid stack bloat on i386
> >
> > If so then this is not good.
>
> Yes.  And, of course, with that commit reverted it's even worse.
> Reverting it makes the stack frame twice as large.
>
> > What was the original stack usage, i.e., if you revert to the
> > original percpu code?
>
> If I revert:
>
> commit 3a92d687c8015860a19213e3c102cad6b722f83c
> commit 58d7d18b5268febb8b1391c6dffc8e2aaa751fcd
> commit 51fc6dc8f948047364f7d42a4ed89b416c6cc0a3
> commit 84e31fdb7c797a7303e0cc295cb9bc8b73fb872d
>
> the stackframe goes down to 888 bytes.
>
> More detailed, the progression is:
>
> master                                          1136
> revert 3a92d687c8015860a19213e3c102cad6b722f83c 2408
> revert 58d7d18b5268febb8b1391c6dffc8e2aaa751fcd 2408
> revert 51fc6dc8f948047364f7d42a4ed89b416c6cc0a3 1520
> revert 84e31fdb7c797a7303e0cc295cb9bc8b73fb872d  888

OK, so we grew by 1136 - 888 = 248.  Keep in mind that 128 of
that is expected since we moved W onto the stack.

I guess we could go back to the percpu solution, what do you
think?

Cheers,
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: sha-512...

2012-02-14 Thread David Miller
From: Herbert Xu herb...@gondor.hengli.com.au
Date: Wed, 15 Feb 2012 16:16:08 +1100

> OK, so we grew by 1136 - 888 = 248.  Keep in mind that 128 of
> that is expected since we moved W onto the stack.

Right.

> I guess we could go back to the percpu solution, what do you
> think?

I'm not entirely sure, we might have to.

sha512 is notorious for generating terrible code with gcc on 32-bit
targets, so...  The sha512 test in the glibc testsuite tends to
time out on 32-bit sparc. :-)