Re: [PATCH 0/8] crypto: arm64+generic - SHA3/SHA-512/SM-3 roundup
On Fri, Jan 19, 2018 at 12:04:32PM +, Ard Biesheuvel wrote:
> This supersedes all outstanding patches from me related to SHA-3, SHA-512
> or SM-3.
>
> - fix a correctness issue in the SHA-3 code (#1) and a performance issue (#2),
>   the first one is definitely a -stable candidate, the second one potentially
>   as well
> - patches #3 and #4 make the generic SHA-3 code reusable as a fallback for the
>   accelerated code introduced in #6
> - patch #5 adds some SHA-3 test cases
> - patch #6 implements SHA-3 using special arm64 instructions
> - patch #7 implements the Chinese SM3 secure hash algorithm using special
>   arm64 instructions
> - patch #8 contains some fixes for the recently queued SHA-512 arm64 code.
>
> Ard Biesheuvel (8):
>   crypto/generic: sha3 - fixes for alignment and big endian operation
>   crypto/generic: sha3: rewrite KECCAK transform to help the compiler
>     optimize
>   crypto/generic: sha3 - simplify code
>   crypto/generic: sha3 - export init/update/final routines
>   crypto/testmgr: sha3 - add new testcases
>   crypto/arm64: sha3 - new v8.2 Crypto Extensions implementation
>   crypto/arm64: sm3 - new v8.2 Crypto Extensions implementation
>   crypto/arm64: sha512 - fix/improve new v8.2 Crypto Extensions code
>
>  arch/arm64/crypto/Kconfig          |  12 +
>  arch/arm64/crypto/Makefile         |   6 +
>  arch/arm64/crypto/sha3-ce-core.S   | 210
>  arch/arm64/crypto/sha3-ce-glue.c   | 161 ++
>  arch/arm64/crypto/sha512-ce-core.S | 145 +++---
>  arch/arm64/crypto/sha512-glue.c    |   1 +
>  arch/arm64/crypto/sm3-ce-core.S    | 141 +
>  arch/arm64/crypto/sm3-ce-glue.c    |  92
>  crypto/sha3_generic.c              | 332 ++--
>  crypto/testmgr.h                   | 550
>  include/crypto/sha3.h              |   6 +-
>  11 files changed, 1413 insertions(+), 243 deletions(-)
>  create mode 100644 arch/arm64/crypto/sha3-ce-core.S
>  create mode 100644 arch/arm64/crypto/sha3-ce-glue.c
>  create mode 100644 arch/arm64/crypto/sm3-ce-core.S
>  create mode 100644 arch/arm64/crypto/sm3-ce-glue.c

All applied.  Thanks.
--
Email: Herbert Xu <herb...@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH 0/8] crypto: arm64+generic - SHA3/SHA-512/SM-3 roundup
On 22 January 2018 at 20:51, Arnd Bergmann wrote:
> On Mon, Jan 22, 2018 at 3:54 PM, Arnd Bergmann wrote:
>> On Fri, Jan 19, 2018 at 1:04 PM, Ard Biesheuvel
>> I'm doing a little more randconfig build testing here now, will write back
>> by the end of today in the unlikely case that I find anything else wrong.
>
> Did a few hundred randconfig builds, everything fine as expected.

Thanks Arnd
Re: [PATCH 0/8] crypto: arm64+generic - SHA3/SHA-512/SM-3 roundup
On Mon, Jan 22, 2018 at 3:54 PM, Arnd Bergmann wrote:
> On Fri, Jan 19, 2018 at 1:04 PM, Ard Biesheuvel
> I'm doing a little more randconfig build testing here now, will write back
> by the end of today in the unlikely case that I find anything else wrong.

Did a few hundred randconfig builds, everything fine as expected.

	Arnd
Re: [PATCH 0/8] crypto: arm64+generic - SHA3/SHA-512/SM-3 roundup
On Fri, Jan 19, 2018 at 1:04 PM, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote:
> This supersedes all outstanding patches from me related to SHA-3, SHA-512
> or SM-3.
>
> - fix a correctness issue in the SHA-3 code (#1) and a performance issue (#2),
>   the first one is definitely a -stable candidate, the second one potentially
>   as well
> - patches #3 and #4 make the generic SHA-3 code reusable as a fallback for the
>   accelerated code introduced in #6
> - patch #5 adds some SHA-3 test cases
> - patch #6 implements SHA-3 using special arm64 instructions
> - patch #7 implements the Chinese SM3 secure hash algorithm using special
>   arm64 instructions
> - patch #8 contains some fixes for the recently queued SHA-512 arm64 code.
>
> Ard Biesheuvel (8):
>   crypto/generic: sha3 - fixes for alignment and big endian operation
>   crypto/generic: sha3: rewrite KECCAK transform to help the compiler
>     optimize
>   crypto/generic: sha3 - simplify code
>   crypto/generic: sha3 - export init/update/final routines
>   crypto/testmgr: sha3 - add new testcases
>   crypto/arm64: sha3 - new v8.2 Crypto Extensions implementation
>   crypto/arm64: sm3 - new v8.2 Crypto Extensions implementation
>   crypto/arm64: sha512 - fix/improve new v8.2 Crypto Extensions code

I can confirm that patch 8 fixes the issues I saw earlier; it would be good
to have that merged quickly.

I'm doing a little more randconfig build testing here now, will write back
by the end of today in the unlikely case that I find anything else wrong.

	Arnd
[PATCH 0/8] crypto: arm64+generic - SHA3/SHA-512/SM-3 roundup
This supersedes all outstanding patches from me related to SHA-3, SHA-512
or SM-3.

- fix a correctness issue in the SHA-3 code (#1) and a performance issue (#2),
  the first one is definitely a -stable candidate, the second one potentially
  as well
- patches #3 and #4 make the generic SHA-3 code reusable as a fallback for the
  accelerated code introduced in #6
- patch #5 adds some SHA-3 test cases
- patch #6 implements SHA-3 using special arm64 instructions
- patch #7 implements the Chinese SM3 secure hash algorithm using special
  arm64 instructions
- patch #8 contains some fixes for the recently queued SHA-512 arm64 code.

Ard Biesheuvel (8):
  crypto/generic: sha3 - fixes for alignment and big endian operation
  crypto/generic: sha3: rewrite KECCAK transform to help the compiler
    optimize
  crypto/generic: sha3 - simplify code
  crypto/generic: sha3 - export init/update/final routines
  crypto/testmgr: sha3 - add new testcases
  crypto/arm64: sha3 - new v8.2 Crypto Extensions implementation
  crypto/arm64: sm3 - new v8.2 Crypto Extensions implementation
  crypto/arm64: sha512 - fix/improve new v8.2 Crypto Extensions code

 arch/arm64/crypto/Kconfig          |  12 +
 arch/arm64/crypto/Makefile         |   6 +
 arch/arm64/crypto/sha3-ce-core.S   | 210
 arch/arm64/crypto/sha3-ce-glue.c   | 161 ++
 arch/arm64/crypto/sha512-ce-core.S | 145 +++---
 arch/arm64/crypto/sha512-glue.c    |   1 +
 arch/arm64/crypto/sm3-ce-core.S    | 141 +
 arch/arm64/crypto/sm3-ce-glue.c    |  92
 crypto/sha3_generic.c              | 332 ++--
 crypto/testmgr.h                   | 550
 include/crypto/sha3.h              |   6 +-
 11 files changed, 1413 insertions(+), 243 deletions(-)
 create mode 100644 arch/arm64/crypto/sha3-ce-core.S
 create mode 100644 arch/arm64/crypto/sha3-ce-glue.c
 create mode 100644 arch/arm64/crypto/sm3-ce-core.S
 create mode 100644 arch/arm64/crypto/sm3-ce-glue.c

--
2.11.0
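Patch #2 above rewrites the generic KECCAK transform, and patch #6 accelerates it with the v8.2 SHA-3 instructions. For readers following along, the Keccak-f[1600] permutation and SHA3-256 sponge that these patches implement can be sketched in plain Python and cross-checked against hashlib (this is an illustrative model written for this write-up, not the kernel code):

```python
import hashlib

# Keccak-f[1600] round constants and rotation offsets (FIPS 202).
RC = [0x0000000000000001, 0x0000000000008082, 0x800000000000808A,
      0x8000000080008000, 0x000000000000808B, 0x0000000080000001,
      0x8000000080008081, 0x8000000000008009, 0x000000000000008A,
      0x0000000000000088, 0x0000000080008009, 0x000000008000000A,
      0x000000008000808B, 0x800000000000008B, 0x8000000000008089,
      0x8000000000008003, 0x8000000000008002, 0x8000000000000080,
      0x000000000000800A, 0x800000008000000A, 0x8000000080008081,
      0x8000000000008080, 0x0000000080000001, 0x8000000080008008]
ROT = [[0, 36, 3, 41, 18], [1, 44, 10, 45, 2], [62, 6, 43, 15, 61],
       [28, 55, 25, 21, 56], [27, 20, 39, 8, 14]]
MASK = (1 << 64) - 1

def rol(v, n):
    return ((v << n) | (v >> (64 - n))) & MASK if n else v

def keccak_f(a):                       # a[x][y]: 5x5 matrix of 64-bit lanes
    for rc in RC:
        # theta
        c = [a[x][0] ^ a[x][1] ^ a[x][2] ^ a[x][3] ^ a[x][4] for x in range(5)]
        d = [c[(x - 1) % 5] ^ rol(c[(x + 1) % 5], 1) for x in range(5)]
        a = [[a[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho + pi
        b = [[0] * 5 for _ in range(5)]
        for x in range(5):
            for y in range(5):
                b[y][(2 * x + 3 * y) % 5] = rol(a[x][y], ROT[x][y])
        # chi + iota
        a = [[b[x][y] ^ (~b[(x + 1) % 5][y] & b[(x + 2) % 5][y])
              for y in range(5)] for x in range(5)]
        a[0][0] ^= rc
    return a

def sha3_256(msg):
    rate = 136                         # 200-byte state minus 2*32-byte capacity
    a = [[0] * 5 for _ in range(5)]
    q = rate - len(msg) % rate         # SHA-3 domain padding: 0x06 ... 0x80
    msg += b'\x86' if q == 1 else b'\x06' + b'\x00' * (q - 2) + b'\x80'
    for off in range(0, len(msg), rate):
        for i in range(rate // 8):     # absorb: lane i sits at (x, y) = (i%5, i//5)
            a[i % 5][i // 5] ^= int.from_bytes(msg[off + 8 * i:off + 8 * i + 8], 'little')
        a = keccak_f(a)
    out = b''.join(a[i % 5][i // 5].to_bytes(8, 'little') for i in range(rate // 8))
    return out[:32]

assert sha3_256(b"abc") == hashlib.sha3_256(b"abc").digest()
print("ok")
```

The arm64 instructions in patch #6 (EOR3, RAX1, XAR, BCAX) fuse exactly the theta/rho/chi steps that dominate this loop.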
Re: [RFT PATCH] crypto: arm64 - implement SHA-512 using special instructions
On Tue, Jan 09, 2018 at 06:23:02PM +, Ard Biesheuvel wrote:
> Implement SHA-512 using the new special instructions that have
> been introduced as an optional extension in ARMv8.2.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>

Patch applied.  Thanks.
Re: [RFT PATCH] crypto: arm64 - implement SHA-512 using special instructions
On 16 January 2018 at 08:16, Steve Capper <steve.cap...@arm.com> wrote:
> On Tue, Jan 09, 2018 at 06:23:02PM +, Ard Biesheuvel wrote:
>> Implement SHA-512 using the new special instructions that have
>> been introduced as an optional extension in ARMv8.2.
>
> Hi Ard,
> I have tested this applied on top of 4.15-rc7 running in a model.
>
> For sha512-ce, I verified that tcrypt successfully passed tests for modes:
> 12, 104, 189, 190, 306, 406 and 424.
> (and I double-checked that sha512-ce was being used).
>
> Similarly for sha384-ce, I tested the following modes:
> 11, 103, 187, 188, 305 and 405.
>
> Also, I had:
> CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=n
>
> So FWIW, please feel free to add:
> Tested-by: Steve Capper <steve.cap...@arm.com>

Excellent! Thanks a lot, Steve.
Re: [RFT PATCH] crypto: arm64 - implement SHA-512 using special instructions
On Tue, Jan 09, 2018 at 06:23:02PM +, Ard Biesheuvel wrote:
> Implement SHA-512 using the new special instructions that have
> been introduced as an optional extension in ARMv8.2.

Hi Ard,
I have tested this applied on top of 4.15-rc7 running in a model.

For sha512-ce, I verified that tcrypt successfully passed tests for modes:
12, 104, 189, 190, 306, 406 and 424.
(and I double-checked that sha512-ce was being used).

Similarly for sha384-ce, I tested the following modes:
11, 103, 187, 188, 305 and 405.

Also, I had:
CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=n

So FWIW, please feel free to add:
Tested-by: Steve Capper <steve.cap...@arm.com>

Cheers,
--
Steve
[RFT PATCH] crypto: arm64 - implement SHA-512 using special instructions
Implement SHA-512 using the new special instructions that have been
introduced as an optional extension in ARMv8.2.

Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 arch/arm64/crypto/Kconfig          |   6 ++
 arch/arm64/crypto/Makefile         |   3 +
 arch/arm64/crypto/sha512-ce-core.S | 207 +
 arch/arm64/crypto/sha512-ce-glue.c | 119 +
 4 files changed, 335 insertions(+)
 create mode 100644 arch/arm64/crypto/sha512-ce-core.S
 create mode 100644 arch/arm64/crypto/sha512-ce-glue.c

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 70c517aa4501..aad288f4b9de 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -29,6 +29,12 @@ config CRYPTO_SHA2_ARM64_CE
 	select CRYPTO_HASH
 	select CRYPTO_SHA256_ARM64
 
+config CRYPTO_SHA512_ARM64_CE
+	tristate "SHA-384/SHA-512 digest algorithm (ARMv8 Crypto Extensions)"
+	depends on KERNEL_MODE_NEON
+	select CRYPTO_HASH
+	select CRYPTO_SHA512_ARM64
+
 config CRYPTO_GHASH_ARM64_CE
 	tristate "GHASH/AES-GCM using ARMv8 Crypto Extensions"
 	depends on KERNEL_MODE_NEON
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index b5edc5918c28..d7573d31d397 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -14,6 +14,9 @@ sha1-ce-y := sha1-ce-glue.o sha1-ce-core.o
 obj-$(CONFIG_CRYPTO_SHA2_ARM64_CE) += sha2-ce.o
 sha2-ce-y := sha2-ce-glue.o sha2-ce-core.o
 
+obj-$(CONFIG_CRYPTO_SHA512_ARM64_CE) += sha512-ce.o
+sha512-ce-y := sha512-ce-glue.o sha512-ce-core.o
+
 obj-$(CONFIG_CRYPTO_GHASH_ARM64_CE) += ghash-ce.o
 ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o
diff --git a/arch/arm64/crypto/sha512-ce-core.S b/arch/arm64/crypto/sha512-ce-core.S
new file mode 100644
index ..6c562f8df0b0
--- /dev/null
+++ b/arch/arm64/crypto/sha512-ce-core.S
@@ -0,0 +1,207 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * sha512-ce-core.S - core SHA-384/SHA-512 transform using v8 Crypto Extensions
+ *
+ * Copyright (C) 2018 Linaro Ltd <ard.biesheu...@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include
+#include
+
+	//
+	// Temporary - for testing only. binutils has no support for these yet
+	//
+	.irp	b,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
+	.set	.Lq\b, \b
+	.set	.Lv\b\().2d, \b
+	.endr
+
+	.macro	sha512h, rd, rn, rm
+	.inst	0xce608000 | .L\rd | (.L\rn << 5) | (.L\rm << 16)
+	.endm
+
+	.macro	sha512h2, rd, rn, rm
+	.inst	0xce608400 | .L\rd | (.L\rn << 5) | (.L\rm << 16)
+	.endm
+
+	.macro	sha512su0, rd, rn
+	.inst	0xcec08000 | .L\rd | (.L\rn << 5)
+	.endm
+
+	.macro	sha512su1, rd, rn, rm
+	.inst	0xce608800 | .L\rd | (.L\rn << 5) | (.L\rm << 16)
+	.endm
+
+	.text
+	.arch	armv8-a+crypto
+
+	/*
+	 * The SHA-512 round constants
+	 */
+	.align	4
+.Lsha512_rcon:
+	.quad	0x428a2f98d728ae22, 0x7137449123ef65cd
+	.quad	0xb5c0fbcfec4d3b2f, 0xe9b5dba58189dbbc
+	.quad	0x3956c25bf348b538, 0x59f111f1b605d019
+	.quad	0x923f82a4af194f9b, 0xab1c5ed5da6d8118
+	.quad	0xd807aa98a3030242, 0x12835b0145706fbe
+	.quad	0x243185be4ee4b28c, 0x550c7dc3d5ffb4e2
+	.quad	0x72be5d74f27b896f, 0x80deb1fe3b1696b1
+	.quad	0x9bdc06a725c71235, 0xc19bf174cf692694
+	.quad	0xe49b69c19ef14ad2, 0xefbe4786384f25e3
+	.quad	0x0fc19dc68b8cd5b5, 0x240ca1cc77ac9c65
+	.quad	0x2de92c6f592b0275, 0x4a7484aa6ea6e483
+	.quad	0x5cb0a9dcbd41fbd4, 0x76f988da831153b5
+	.quad	0x983e5152ee66dfab, 0xa831c66d2db43210
+	.quad	0xb00327c898fb213f, 0xbf597fc7beef0ee4
+	.quad	0xc6e00bf33da88fc2, 0xd5a79147930aa725
+	.quad	0x06ca6351e003826f, 0x142929670a0e6e70
+	.quad	0x27b70a8546d22ffc, 0x2e1b21385c26c926
+	.quad	0x4d2c6dfc5ac42aed, 0x53380d139d95b3df
+	.quad	0x650a73548baf63de, 0x766a0abb3c77b2a8
+	.quad	0x81c2c92e47edaee6, 0x92722c851482353b
+	.quad	0xa2bfe8a14cf10364, 0xa81a664bbc423001
+	.quad	0xc24b8b70d0f89791, 0xc76c51a30654be30
+	.quad	0xd192e819d6ef5218, 0xd69906245565a910
+	.quad	0xf40e35855771202a, 0x106aa07032bbd1b8
+	.quad	0x19a4c116b8d2d0c8, 0x1e376c085141ab53
+
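Because binutils had no mnemonics for these instructions yet, the patch hand-assembles them with `.inst`. The bit layout those macros encode (Rd in bits [4:0], Rn in bits [9:5], Rm in bits [20:16], OR'd into the base opcode) can be sanity-checked with a small Python helper; the helper and its names are mine, not part of the patch:

```python
# Base opcodes taken from the .inst macros in sha512-ce-core.S.
SHA512H, SHA512H2, SHA512SU0, SHA512SU1 = 0xce608000, 0xce608400, 0xcec08000, 0xce608800

def encode3(base, rd, rn, rm):
    # Three-operand form: Rd | Rn << 5 | Rm << 16, merged into the base opcode.
    return base | rd | (rn << 5) | (rm << 16)

def encode2(base, rd, rn):
    # Two-operand form (sha512su0): Rd | Rn << 5.
    return base | rd | (rn << 5)

# e.g. the word emitted for "sha512h q0, q1, v2.2d":
print(hex(encode3(SHA512H, 0, 1, 2)))   # 0xce628020
```

Once binutils grew real support for `armv8.2-a+sha3`, the macros could be dropped and the mnemonics used directly.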
Re: [PATCH 1/2] crypto: arm/sha512 - accelerated SHA-512 using ARM generic ASM and NEON
On 11 May 2015 at 08:59, Herbert Xu <herb...@gondor.apana.org.au> wrote:
> On Fri, May 08, 2015 at 10:46:21AM +0200, Ard Biesheuvel wrote:
>> diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
>> index 8da2207b0072..08b5fb85bff5 100644
>> --- a/arch/arm/crypto/Kconfig
>> +++ b/arch/arm/crypto/Kconfig
>> @@ -53,20 +53,14 @@ config CRYPTO_SHA256_ARM
>>  	  SHA-256 secure hash standard (DFIPS 180-2) implemented
>>  	  using optimized ARM assembler and NEON, when available.
>>
>> -config CRYPTO_SHA512_ARM_NEON
>> -	tristate "SHA384 and SHA512 digest algorithm (ARM NEON)"
>> -	depends on KERNEL_MODE_NEON
>> -	select CRYPTO_SHA512
>> +config CRYPTO_SHA512_ARM
>> +	tristate "SHA-384/512 digest algorithm (ARM-asm and NEON)"
>> +	depends on !CPU_V7M
>>  	select CRYPTO_HASH
>> +	depends on !CPU_V7M
>
> This looks like a duplicate, no?

Yes, you are right. Let me figure out what's going on and send you a new
version.

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
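With the stray duplicate line dropped, the corrected entry would presumably read as follows (a sketch, not the reposted patch; the help text is assumed by analogy with the neighbouring SHA-256 entry, which is not shown in the hunk):

```
config CRYPTO_SHA512_ARM
	tristate "SHA-384/512 digest algorithm (ARM-asm and NEON)"
	depends on !CPU_V7M
	select CRYPTO_HASH
	help
	  SHA-512 secure hash standard (DFIPS 180-2) implemented
	  using optimized ARM assembler and NEON, when available.
```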
Re: [PATCH v2] crypto/arm: accelerated SHA-512 using ARM generic ASM and NEON
On 13 April 2015 at 06:13, Herbert Xu <herb...@gondor.apana.org.au> wrote:
> On Sat, Apr 11, 2015 at 09:15:10PM +0200, Ard Biesheuvel wrote:
>> @Herbert: could you please apply this onto cryptodev before sending out
>> your pull request for v4.1?
>
> Done.
>
>> And please disregard $subject, I will post a v3 with a similar
>> 'depends on' added (unless you're ok to add it yourself)
>
> Please resend the patch. But I'll process it after the merge window
> closes so no hurry.

OK, all fine. Thanks Herbert!
Re: [PATCH v2] crypto/arm: accelerated SHA-512 using ARM generic ASM and NEON
On Sat, Apr 11, 2015 at 09:15:10PM +0200, Ard Biesheuvel wrote:
> @Herbert: could you please apply this onto cryptodev before sending out
> your pull request for v4.1?

Done.

> And please disregard $subject, I will post a v3 with a similar
> 'depends on' added (unless you're ok to add it yourself)

Please resend the patch. But I'll process it after the merge window closes
so no hurry.

Thanks,
Re: [PATCH v2] crypto/arm: accelerated SHA-512 using ARM generic ASM and NEON
On 11 April 2015 at 10:48, Arnd Bergmann <a...@arndb.de> wrote:
> On Saturday 11 April 2015 09:35:15 Ard Biesheuvel wrote:
>> On 10 April 2015 at 22:23, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote:
>>> On 10 apr. 2015, at 22:08, Arnd Bergmann <a...@arndb.de> wrote:
>>>> On Friday 10 April 2015 16:29:08 Ard Biesheuvel wrote:
>>>>> +#if __ARM_MAX_ARCH__>=7
>>>>> +.arch	armv7-a
>>>>> +.fpu	neon
>>>>> +
>>>>
>>>> This will cause a build failure on an ARMv7-M build, which is
>>>> incompatible with ".arch armv7-a" and ".fpu neon".
>>>
>>> The neon part depends on CONFIG_KERNEL_MODE_NEON, which would never be
>>> set for that platform, I suppose
>>
>> On second thought, that is not entirely true, but I still don't think
>> there is a problem here: the .arch/.fpu declarations are understood
>> perfectly fine by GAS when targeting ARMv7-M. Only, it will emit code
>> that is incompatible with it. However, this code is invoked at runtime
>> only if a NEON unit has been detected, so it will just be ignored on
>> ARMv7-M
>
> Sorry, I should have collected my findings better when replying to your
> patch. What I remembered was that I saw a problem in this area in
> linux-next with randconfig builds, but I did not notice that it was for a
> different file, and I had not double-checked that patch yet in order to
> send it out.
>
> See below for the patch I'm currently using for my randconfig builder.
> Before you apply this, please check again which files are affected, as
> it's possible that there are other modules that suffer from the same
> problem.
>
> 	Arnd
>
> 8<---
> Subject: [PATCH] ARM: crypto: avoid sha256 code on ARMv7-M
>
> The sha256 assembly implementation can deal with all architecture levels
> from ARMv4 to ARMv7-A, but not with ARMv7-M. Enabling it in an ARMv7-M
> kernel results in this build failure:
>
> arm-linux-gnueabi-ld: error: arch/arm/crypto/sha256_glue.o: Conflicting architecture profiles M/A
> arm-linux-gnueabi-ld: failed to merge target specific data of file arch/arm/crypto/sha256_glue.o
>
> This adds a Kconfig dependency to prevent the code from being disabled

... enabled?

> for ARMv7-M.
>
> Signed-off-by: Arnd Bergmann <a...@arndb.de>
>
> diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
> index 458729d2ce22..76463da22f81 100644
> --- a/arch/arm/crypto/Kconfig
> +++ b/arch/arm/crypto/Kconfig
> @@ -49,6 +49,7 @@ config CRYPTO_SHA2_ARM_CE
>
>  config CRYPTO_SHA256_ARM
>  	tristate "SHA-224/256 digest algorithm (ARM-asm and NEON)"
>  	select CRYPTO_HASH
> +	depends on !CPU_V7M
>  	help
>  	  SHA-256 secure hash standard (DFIPS 180-2) implemented
>  	  using optimized ARM assembler and NEON, when available.

@Herbert: could you please apply this onto cryptodev before sending out
your pull request for v4.1?

And please disregard $subject, I will post a v3 with a similar
'depends on' added (unless you're ok to add it yourself)

Thanks,
Ard.
Re: [PATCH v2] crypto/arm: accelerated SHA-512 using ARM generic ASM and NEON
On Saturday 11 April 2015 12:27:18 Ard Biesheuvel wrote:
> Ah I see it now. The new SHA-256 module as well as the SHA-512 I am
> proposing here both use a single .o containing the !neon and neon
> implementations, and only expose the latter if KERNEL_MODE_NEON. This
> way, we can use the exact same .S file as OpenSSL, which should mean
> less maintenance burden. So your fix seems the most appropriate, even if
> it means v7m won't be able to use the !neon part either.

Ok, sounds good. If someone wants to change that code to work on ARMv7-M,
they probably want that fix in the openssl version as well, and then we
can update both.

	Arnd
Re: [PATCH v2] crypto/arm: accelerated SHA-512 using ARM generic ASM and NEON
On Saturday 11 April 2015 09:35:15 Ard Biesheuvel wrote:
> On 10 April 2015 at 22:23, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote:
>> On 10 apr. 2015, at 22:08, Arnd Bergmann <a...@arndb.de> wrote:
>>> On Friday 10 April 2015 16:29:08 Ard Biesheuvel wrote:
>>>> +#if __ARM_MAX_ARCH__>=7
>>>> +.arch	armv7-a
>>>> +.fpu	neon
>>>> +
>>>
>>> This will cause a build failure on an ARMv7-M build, which is
>>> incompatible with ".arch armv7-a" and ".fpu neon".
>>
>> The neon part depends on CONFIG_KERNEL_MODE_NEON, which would never be
>> set for that platform, I suppose
>
> On second thought, that is not entirely true, but I still don't think
> there is a problem here: the .arch/.fpu declarations are understood
> perfectly fine by GAS when targeting ARMv7-M. Only, it will emit code
> that is incompatible with it. However, this code is invoked at runtime
> only if a NEON unit has been detected, so it will just be ignored on
> ARMv7-M

Sorry, I should have collected my findings better when replying to your
patch. What I remembered was that I saw a problem in this area in
linux-next with randconfig builds, but I did not notice that it was for a
different file, and I had not double-checked that patch yet in order to
send it out.

See below for the patch I'm currently using for my randconfig builder.
Before you apply this, please check again which files are affected, as
it's possible that there are other modules that suffer from the same
problem.

	Arnd

8<---
Subject: [PATCH] ARM: crypto: avoid sha256 code on ARMv7-M

The sha256 assembly implementation can deal with all architecture levels
from ARMv4 to ARMv7-A, but not with ARMv7-M. Enabling it in an ARMv7-M
kernel results in this build failure:

arm-linux-gnueabi-ld: error: arch/arm/crypto/sha256_glue.o: Conflicting architecture profiles M/A
arm-linux-gnueabi-ld: failed to merge target specific data of file arch/arm/crypto/sha256_glue.o

This adds a Kconfig dependency to prevent the code from being disabled
for ARMv7-M.

Signed-off-by: Arnd Bergmann <a...@arndb.de>

diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index 458729d2ce22..76463da22f81 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -49,6 +49,7 @@ config CRYPTO_SHA2_ARM_CE

 config CRYPTO_SHA256_ARM
 	tristate "SHA-224/256 digest algorithm (ARM-asm and NEON)"
 	select CRYPTO_HASH
+	depends on !CPU_V7M
 	help
 	  SHA-256 secure hash standard (DFIPS 180-2) implemented
 	  using optimized ARM assembler and NEON, when available.
Re: [PATCH v2] crypto/arm: accelerated SHA-512 using ARM generic ASM and NEON
On 10 April 2015 at 22:23, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote:
>> On 10 apr. 2015, at 22:08, Arnd Bergmann <a...@arndb.de> wrote:
>>> On Friday 10 April 2015 16:29:08 Ard Biesheuvel wrote:
>>> +#if __ARM_MAX_ARCH__>=7
>>> +.arch	armv7-a
>>> +.fpu	neon
>>> +
>>
>> This will cause a build failure on an ARMv7-M build, which is
>> incompatible with ".arch armv7-a" and ".fpu neon".
>
> The neon part depends on CONFIG_KERNEL_MODE_NEON, which would never be
> set for that platform, I suppose

On second thought, that is not entirely true, but I still don't think there
is a problem here: the .arch/.fpu declarations are understood perfectly
fine by GAS when targeting ARMv7-M. Only, it will emit code that is
incompatible with it. However, this code is invoked at runtime only if a
NEON unit has been detected, so it will just be ignored on ARMv7-M

--
Ard.
Re: [PATCH v2] crypto/arm: accelerated SHA-512 using ARM generic ASM and NEON
On 11 apr. 2015, at 10:48, Arnd Bergmann <a...@arndb.de> wrote:
> On Saturday 11 April 2015 09:35:15 Ard Biesheuvel wrote:
>> On 10 April 2015 at 22:23, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote:
>>> On 10 apr. 2015, at 22:08, Arnd Bergmann <a...@arndb.de> wrote:
>>>> On Friday 10 April 2015 16:29:08 Ard Biesheuvel wrote:
>>>>> +#if __ARM_MAX_ARCH__>=7
>>>>> +.arch	armv7-a
>>>>> +.fpu	neon
>>>>> +
>>>>
>>>> This will cause a build failure on an ARMv7-M build, which is
>>>> incompatible with ".arch armv7-a" and ".fpu neon".
>>>
>>> The neon part depends on CONFIG_KERNEL_MODE_NEON, which would never be
>>> set for that platform, I suppose
>>
>> On second thought, that is not entirely true, but I still don't think
>> there is a problem here: the .arch/.fpu declarations are understood
>> perfectly fine by GAS when targeting ARMv7-M. Only, it will emit code
>> that is incompatible with it. However, this code is invoked at runtime
>> only if a NEON unit has been detected, so it will just be ignored on
>> ARMv7-M
>
> Sorry, I should have collected my findings better when replying to your
> patch. What I remembered was that I saw a problem in this area in
> linux-next with randconfig builds, but I did not notice that it was for a
> different file, and I had not double-checked that patch yet in order to
> send it out.
>
> See below for the patch I'm currently using for my randconfig builder.
> Before you apply this, please check again which files are affected, as
> it's possible that there are other modules that suffer from the same
> problem.

Ah I see it now. The new SHA-256 module as well as the SHA-512 I am
proposing here both use a single .o containing the !neon and neon
implementations, and only expose the latter if KERNEL_MODE_NEON. This way,
we can use the exact same .S file as OpenSSL, which should mean less
maintenance burden. So your fix seems the most appropriate, even if it
means v7m won't be able to use the !neon part either.

> 	Arnd
>
> 8<---
> Subject: [PATCH] ARM: crypto: avoid sha256 code on ARMv7-M
>
> The sha256 assembly implementation can deal with all architecture levels
> from ARMv4 to ARMv7-A, but not with ARMv7-M. Enabling it in an ARMv7-M
> kernel results in this build failure:
>
> arm-linux-gnueabi-ld: error: arch/arm/crypto/sha256_glue.o: Conflicting architecture profiles M/A
> arm-linux-gnueabi-ld: failed to merge target specific data of file arch/arm/crypto/sha256_glue.o
>
> This adds a Kconfig dependency to prevent the code from being disabled
> for ARMv7-M.
>
> Signed-off-by: Arnd Bergmann <a...@arndb.de>
>
> diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
> index 458729d2ce22..76463da22f81 100644
> --- a/arch/arm/crypto/Kconfig
> +++ b/arch/arm/crypto/Kconfig
> @@ -49,6 +49,7 @@ config CRYPTO_SHA2_ARM_CE
>
>  config CRYPTO_SHA256_ARM
>  	tristate "SHA-224/256 digest algorithm (ARM-asm and NEON)"
>  	select CRYPTO_HASH
> +	depends on !CPU_V7M
>  	help
>  	  SHA-256 secure hash standard (DFIPS 180-2) implemented
>  	  using optimized ARM assembler and NEON, when available.
Re: [PATCH v2] crypto/arm: accelerated SHA-512 using ARM generic ASM and NEON
On Friday 10 April 2015 16:29:08 Ard Biesheuvel wrote:
> +#if __ARM_MAX_ARCH__>=7
> +.arch	armv7-a
> +.fpu	neon
> +

This will cause a build failure on an ARMv7-M build, which is incompatible
with ".arch armv7-a" and ".fpu neon".

	Arnd
Re: [PATCH v2] crypto/arm: accelerated SHA-512 using ARM generic ASM and NEON
On 10 apr. 2015, at 22:08, Arnd Bergmann <a...@arndb.de> wrote:
> On Friday 10 April 2015 16:29:08 Ard Biesheuvel wrote:
>> +#if __ARM_MAX_ARCH__>=7
>> +.arch	armv7-a
>> +.fpu	neon
>> +
>
> This will cause a build failure on an ARMv7-M build, which is
> incompatible with ".arch armv7-a" and ".fpu neon".

The neon part depends on CONFIG_KERNEL_MODE_NEON, which would never be set
for that platform, I suppose
[PATCH v4 03/16] crypto: sha512: implement base layer for SHA-512
To reduce the number of copies of boilerplate code throughout the tree,
this patch implements generic glue for the SHA-512 algorithm. This allows
a specific arch or hardware implementation to only implement the special
handling that it needs.

The users need to supply an implementation of

  void (sha512_block_fn)(struct sha512_state *sst, u8 const *src, int blocks)

and pass it to the SHA-512 base functions. For easy casting between the
prototype above and existing block functions that take a 'u64 state[]' as
their first argument, the 'state' member of struct sha512_state is moved
to the base of the struct.

Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 include/crypto/sha.h         |   2 +-
 include/crypto/sha512_base.h | 131 +++
 2 files changed, 132 insertions(+), 1 deletion(-)
 create mode 100644 include/crypto/sha512_base.h

diff --git a/include/crypto/sha.h b/include/crypto/sha.h
index a75bc80cc776..05e82cbc4d8f 100644
--- a/include/crypto/sha.h
+++ b/include/crypto/sha.h
@@ -77,8 +77,8 @@ struct sha256_state {
 };
 
 struct sha512_state {
-	u64 count[2];
 	u64 state[SHA512_DIGEST_SIZE / 8];
+	u64 count[2];
 	u8 buf[SHA512_BLOCK_SIZE];
 };
diff --git a/include/crypto/sha512_base.h b/include/crypto/sha512_base.h
new file mode 100644
index ..6c5341e005ea
--- /dev/null
+++ b/include/crypto/sha512_base.h
@@ -0,0 +1,131 @@
+/*
+ * sha512_base.h - core logic for SHA-512 implementations
+ *
+ * Copyright (C) 2015 Linaro Ltd <ard.biesheu...@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <crypto/internal/hash.h>
+#include <crypto/sha.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+
+#include <asm/unaligned.h>
+
+typedef void (sha512_block_fn)(struct sha512_state *sst, u8 const *src,
+			       int blocks);
+
+static inline int sha384_base_init(struct shash_desc *desc)
+{
+	struct sha512_state *sctx = shash_desc_ctx(desc);
+
+	sctx->state[0] = SHA384_H0;
+	sctx->state[1] = SHA384_H1;
+	sctx->state[2] = SHA384_H2;
+	sctx->state[3] = SHA384_H3;
+	sctx->state[4] = SHA384_H4;
+	sctx->state[5] = SHA384_H5;
+	sctx->state[6] = SHA384_H6;
+	sctx->state[7] = SHA384_H7;
+	sctx->count[0] = sctx->count[1] = 0;
+
+	return 0;
+}
+
+static inline int sha512_base_init(struct shash_desc *desc)
+{
+	struct sha512_state *sctx = shash_desc_ctx(desc);
+
+	sctx->state[0] = SHA512_H0;
+	sctx->state[1] = SHA512_H1;
+	sctx->state[2] = SHA512_H2;
+	sctx->state[3] = SHA512_H3;
+	sctx->state[4] = SHA512_H4;
+	sctx->state[5] = SHA512_H5;
+	sctx->state[6] = SHA512_H6;
+	sctx->state[7] = SHA512_H7;
+	sctx->count[0] = sctx->count[1] = 0;
+
+	return 0;
+}
+
+static inline int sha512_base_do_update(struct shash_desc *desc,
+					const u8 *data,
+					unsigned int len,
+					sha512_block_fn *block_fn)
+{
+	struct sha512_state *sctx = shash_desc_ctx(desc);
+	unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE;
+
+	sctx->count[0] += len;
+	if (sctx->count[0] < len)
+		sctx->count[1]++;
+
+	if (unlikely((partial + len) >= SHA512_BLOCK_SIZE)) {
+		int blocks;
+
+		if (partial) {
+			int p = SHA512_BLOCK_SIZE - partial;
+
+			memcpy(sctx->buf + partial, data, p);
+			data += p;
+			len -= p;
+
+			block_fn(sctx, sctx->buf, 1);
+		}
+
+		blocks = len / SHA512_BLOCK_SIZE;
+		len %= SHA512_BLOCK_SIZE;
+
+		if (blocks) {
+			block_fn(sctx, data, blocks);
+			data += blocks * SHA512_BLOCK_SIZE;
+		}
+		partial = 0;
+	}
+	if (len)
+		memcpy(sctx->buf + partial, data, len);
+
+	return 0;
+}
+
+static inline int sha512_base_do_finalize(struct shash_desc *desc,
+					  sha512_block_fn *block_fn)
+{
+	const int bit_offset = SHA512_BLOCK_SIZE - sizeof(__be64[2]);
+	struct sha512_state *sctx = shash_desc_ctx(desc);
+	__be64 *bits = (__be64 *)(sctx->buf + bit_offset);
+	unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE;
+
+	sctx->buf[partial++] = 0x80;
+	if (partial > bit_offset) {
+		memset(sctx->buf + partial, 0x0, SHA512_BLOCK_SIZE - partial);
+		partial = 0;
+
+		block_fn(sctx, sctx->buf, 1);
+	}
+
+	memset(sctx->buf + partial, 0x0, bit_offset - partial);
+	bits[0] = cpu_to_be64(sctx->count[1] << 3 | sctx->count[0] >> 61);
+	bits[1
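The buffering and length encoding performed by sha512_base_do_update()/sha512_base_do_finalize() above can be modelled and tested outside the kernel. Below is a pure-Python sketch of that logic; the block function is mocked as a callback that just records the data which would be handed to the arch-specific transform (the class, names, and single byte counter are mine, standing in for the kernel's count[0]/count[1] pair):

```python
BLOCK = 128          # SHA-512 block size in bytes
LEN_FIELD = 16       # trailing 128-bit big-endian bit count

def sha512_pad(msg):
    # Reference padding per FIPS 180-4: 0x80, zeros, 128-bit bit length.
    zeros = (BLOCK - 1 - LEN_FIELD - len(msg)) % BLOCK
    return msg + b'\x80' + b'\x00' * zeros + (len(msg) * 8).to_bytes(LEN_FIELD, 'big')

class Sha512Base:
    """Model of the sha512_base_do_update()/_do_finalize() driver logic."""
    def __init__(self, block_fn):
        self.block_fn = block_fn     # always receives a multiple of BLOCK bytes
        self.buf = b''               # partial-block buffer (sctx->buf)
        self.count = 0               # total bytes hashed (sctx->count[0..1])

    def update(self, data):
        self.count += len(data)
        self.buf += data
        n = len(self.buf) // BLOCK * BLOCK
        if n:                        # flush every complete block
            self.block_fn(self.buf[:n])
            self.buf = self.buf[n:]

    def finalize(self):
        bit_offset = BLOCK - LEN_FIELD
        partial = self.buf + b'\x80'
        if len(partial) > bit_offset:   # no room for the length field:
            self.block_fn(partial + b'\x00' * (BLOCK - len(partial)))
            partial = b''               # spill into one extra block
        partial += b'\x00' * (bit_offset - len(partial))
        self.block_fn(partial + (self.count * 8).to_bytes(LEN_FIELD, 'big'))

# The blocks fed to the transform are exactly the FIPS-padded message,
# regardless of how the input is chunked:
fed = []
h = Sha512Base(fed.append)
msg = bytes(range(256)) * 3
for chunk in (msg[:5], msg[5:200], msg[200:]):
    h.update(chunk)
h.finalize()
assert b''.join(fed) == sha512_pad(msg)
print("ok")
```

This is exactly the boilerplate the patch factors out: only the block function differs between the generic, NEON, and Crypto Extensions drivers.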
[PATCH v3 03/16] crypto: sha512: implement base layer for SHA-512
To reduce the number of copies of boilerplate code throughout the tree, this patch implements generic glue for the SHA-512 algorithm. This allows a specific arch or hardware implementation to only implement the special handling that it needs. Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- include/crypto/sha512_base.h | 147 +++ 1 file changed, 147 insertions(+) create mode 100644 include/crypto/sha512_base.h diff --git a/include/crypto/sha512_base.h b/include/crypto/sha512_base.h new file mode 100644 index ..44351f781dce --- /dev/null +++ b/include/crypto/sha512_base.h @@ -0,0 +1,147 @@ +/* + * sha512_base.h - core logic for SHA-512 implementations + * + * Copyright (C) 2015 Linaro Ltd ard.biesheu...@linaro.org + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include crypto/internal/hash.h +#include crypto/sha.h +#include linux/crypto.h +#include linux/module.h + +#include asm/unaligned.h + +typedef void (sha512_block_fn)(int blocks, u8 const *src, u64 *state, + const u8 *head, void *p); + +static inline int sha384_base_init(struct shash_desc *desc) +{ + struct sha512_state *sctx = shash_desc_ctx(desc); + + sctx-state[0] = SHA384_H0; + sctx-state[1] = SHA384_H1; + sctx-state[2] = SHA384_H2; + sctx-state[3] = SHA384_H3; + sctx-state[4] = SHA384_H4; + sctx-state[5] = SHA384_H5; + sctx-state[6] = SHA384_H6; + sctx-state[7] = SHA384_H7; + sctx-count[0] = sctx-count[1] = 0; + + return 0; +} + +static inline int sha512_base_init(struct shash_desc *desc) +{ + struct sha512_state *sctx = shash_desc_ctx(desc); + + sctx-state[0] = SHA512_H0; + sctx-state[1] = SHA512_H1; + sctx-state[2] = SHA512_H2; + sctx-state[3] = SHA512_H3; + sctx-state[4] = SHA512_H4; + sctx-state[5] = SHA512_H5; + sctx-state[6] = SHA512_H6; + sctx-state[7] = SHA512_H7; + sctx-count[0] = sctx-count[1] = 0; + + return 0; +} + +static inline 
int sha512_base_export(struct shash_desc *desc, void *out) +{ + struct sha512_state *sctx = shash_desc_ctx(desc); + struct sha512_state *dst = out; + + *dst = *sctx; + + return 0; +} + +static inline int sha512_base_import(struct shash_desc *desc, const void *in) +{ + struct sha512_state *sctx = shash_desc_ctx(desc); + struct sha512_state const *src = in; + + *sctx = *src; + + return 0; +} + +static inline int sha512_base_do_update(struct shash_desc *desc, const u8 *data, + unsigned int len, + sha512_block_fn *block_fn, void *p) +{ + struct sha512_state *sctx = shash_desc_ctx(desc); + unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE; + + sctx->count[0] += len; + if (sctx->count[0] < len) + sctx->count[1]++; + + if (unlikely((partial + len) >= SHA512_BLOCK_SIZE)) { + int blocks; + + if (partial) { + int p = SHA512_BLOCK_SIZE - partial; + + memcpy(sctx->buf + partial, data, p); + data += p; + len -= p; + } + + blocks = len / SHA512_BLOCK_SIZE; + len %= SHA512_BLOCK_SIZE; + + block_fn(blocks, data, sctx->state, +partial ?
sctx->buf : NULL, p); + data += blocks * SHA512_BLOCK_SIZE; + partial = 0; + } + if (len) + memcpy(sctx->buf + partial, data, len); + + return 0; +} + +static inline int sha512_base_do_finalize(struct shash_desc *desc, + sha512_block_fn *block_fn, void *p) +{ + const int bit_offset = SHA512_BLOCK_SIZE - sizeof(__be64[2]); + struct sha512_state *sctx = shash_desc_ctx(desc); + __be64 *bits = (__be64 *)(sctx->buf + bit_offset); + unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE; + + sctx->buf[partial++] = 0x80; + if (partial > bit_offset) { + memset(sctx->buf + partial, 0x0, SHA512_BLOCK_SIZE - partial); + partial = 0; + + block_fn(1, sctx->buf, sctx->state, NULL, p); + } + + memset(sctx->buf + partial, 0x0, bit_offset - partial); + bits[0] = cpu_to_be64(sctx->count[1] << 3 | sctx->count[0] >> 61); + bits[1] = cpu_to_be64(sctx->count[0] << 3); + block_fn(1, sctx->buf, sctx->state, NULL, p); + + return 0; +} + +static inline int sha512_base_finish(struct shash_desc *desc, u8 *out) +{ + unsigned int digest_size = crypto_shash_digestsize(desc->tfm); + struct sha512_state *sctx = shash_desc_ctx(desc); + __be64 *digest = (__be64 *)out; + int i
Re: [PATCH] crypto/arm: accelerated SHA-512 using ARM generic ASM and NEON
On 29 March 2015 at 16:07, Andy Polyakov ap...@openssl.org wrote: This updates the SHA-512 NEON module with the faster and more versatile implementation from the OpenSSL project. It consists of both a NEON and a generic ASM version of the core SHA-512 transform, where the NEON version reverts to the ASM version when invoked in non-process context. Performance relative to the generic implementation (measured using tcrypt.ko mode=306 sec=1 running on a Cortex-A57 under KVM): input size 8192, block size 8192: asm 1.51, neon 3.51, old neon 2.69. One should keep in mind that improvement coefficients vary greatly from platform to platform. Normally you *should* observe higher coefficients in the asm column and *can* observe smaller differences between neon and old neon. BTW, 1.51 is unexpectedly low, I wonder which compiler version stands for 1.0? This was Linaro GCC 4.9. Nor can I replicate the difference between neon and old neon; I get a smaller difference, 17%, on Cortex-A57. Well, I'm comparing in user-land, but it shouldn't be that significant at large blocks... That is a bit surprising, indeed. The primary difference is that this executes under a 32-bit kernel, whereas your testing uses 32-bit OpenSSL under a 64-bit kernel in 32-bit compatibility mode. I can't really explain how that should make a difference at all, but it's worth noting. Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- This should get the same treatment as Sami's sha256 version: I would like to wait until the OpenSSL source file hits the upstream repository so that I can refer to its sha1 hash in the commit log. The update is committed as http://git.openssl.org/gitweb/?p=openssl.git;a=commitdiff;h=b1a5d1c652086257930a1f62ae51c9cdee654b2c. Note that the file I initially sent privately was a little bit off. Sorry about that. But that little bit is just a commentary update that adds a performance result for Cortex-A15, so the kernel patch as originally posted is 100% functionally equivalent. Thanks.
-- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 resend 01/14] crypto: sha512: implement base layer for SHA-512
On Mon, Mar 30, 2015 at 11:48:20AM +0200, Ard Biesheuvel wrote: To reduce the number of copies of boilerplate code throughout the tree, this patch implements generic glue for the SHA-512 algorithm. This allows a specific arch or hardware implementation to only implement the special handling that it needs. Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org Creating yet another module for this is too much. Please just add these generic helpers to the generic module. Thanks, -- Email: Herbert Xu herb...@gondor.apana.org.au Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[PATCH v2 01/14] crypto: sha512: implement base layer for SHA-512
To reduce the number of copies of boilerplate code throughout the tree, this patch implements generic glue for the SHA-512 algorithm. This allows a specific arch or hardware implementation to only implement the special handling that it needs. Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- crypto/Kconfig | 3 ++ crypto/Makefile | 1 + crypto/sha512_base.c | 143 +++ include/crypto/sha.h | 20 +++ 4 files changed, 167 insertions(+) create mode 100644 crypto/sha512_base.c diff --git a/crypto/Kconfig b/crypto/Kconfig index 88639937a934..3400cf4e3cdb 100644 --- a/crypto/Kconfig +++ b/crypto/Kconfig @@ -641,6 +641,9 @@ config CRYPTO_SHA256_SPARC64 SHA-256 secure hash standard (DFIPS 180-2) implemented using sparc64 crypto instructions, when available. +config CRYPTO_SHA512_BASE + tristate + config CRYPTO_SHA512 tristate "SHA384 and SHA512 digest algorithms" select CRYPTO_HASH diff --git a/crypto/Makefile b/crypto/Makefile index 97b7d3ac87e7..6174bf2592fe 100644 --- a/crypto/Makefile +++ b/crypto/Makefile @@ -45,6 +45,7 @@ obj-$(CONFIG_CRYPTO_RMD256) += rmd256.o obj-$(CONFIG_CRYPTO_RMD320) += rmd320.o obj-$(CONFIG_CRYPTO_SHA1) += sha1_generic.o obj-$(CONFIG_CRYPTO_SHA256) += sha256_generic.o +obj-$(CONFIG_CRYPTO_SHA512_BASE) += sha512_base.o obj-$(CONFIG_CRYPTO_SHA512) += sha512_generic.o obj-$(CONFIG_CRYPTO_WP512) += wp512.o obj-$(CONFIG_CRYPTO_TGR192) += tgr192.o diff --git a/crypto/sha512_base.c b/crypto/sha512_base.c new file mode 100644 index ..9a60829e06c4 --- /dev/null +++ b/crypto/sha512_base.c @@ -0,0 +1,143 @@ +/* + * sha512_base.c - core logic for SHA-512 implementations + * + * Copyright (C) 2015 Linaro Ltd ard.biesheu...@linaro.org + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation.
+ */ + +#include <crypto/internal/hash.h> +#include <crypto/sha.h> +#include <linux/crypto.h> +#include <linux/module.h> + +#include <asm/unaligned.h> + +int crypto_sha384_base_init(struct shash_desc *desc) +{ + static const u64 sha384_init_state[] = { + SHA384_H0, SHA384_H1, SHA384_H2, SHA384_H3, + SHA384_H4, SHA384_H5, SHA384_H6, SHA384_H7, + }; + struct sha512_state *sctx = shash_desc_ctx(desc); + + memcpy(sctx->state, sha384_init_state, sizeof(sctx->state)); + sctx->count[0] = sctx->count[1] = 0; + return 0; +} +EXPORT_SYMBOL(crypto_sha384_base_init); + +int crypto_sha512_base_init(struct shash_desc *desc) +{ + static const u64 sha512_init_state[] = { + SHA512_H0, SHA512_H1, SHA512_H2, SHA512_H3, + SHA512_H4, SHA512_H5, SHA512_H6, SHA512_H7, + }; + struct sha512_state *sctx = shash_desc_ctx(desc); + + memcpy(sctx->state, sha512_init_state, sizeof(sctx->state)); + sctx->count[0] = sctx->count[1] = 0; + return 0; +} +EXPORT_SYMBOL(crypto_sha512_base_init); + +int crypto_sha512_base_export(struct shash_desc *desc, void *out) +{ + struct sha512_state *sctx = shash_desc_ctx(desc); + struct sha512_state *dst = out; + + *dst = *sctx; + + return 0; +} +EXPORT_SYMBOL(crypto_sha512_base_export); + +int crypto_sha512_base_import(struct shash_desc *desc, const void *in) +{ + struct sha512_state *sctx = shash_desc_ctx(desc); + struct sha512_state const *src = in; + + *sctx = *src; + + return 0; +} +EXPORT_SYMBOL(crypto_sha512_base_import); + +int crypto_sha512_base_do_update(struct shash_desc *desc, const u8 *data, +unsigned int len, sha512_block_fn *block_fn, +void *p) +{ + struct sha512_state *sctx = shash_desc_ctx(desc); + unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE; + + sctx->count[0] += len; + if (sctx->count[0] < len) + sctx->count[1]++; + + if (unlikely((partial + len) >= SHA512_BLOCK_SIZE)) { + int blocks; + + if (partial) { + int p = SHA512_BLOCK_SIZE - partial; + + memcpy(sctx->buf + partial, data, p); + data += p; + len -= p; + } + + blocks = len / SHA512_BLOCK_SIZE; + len
%= SHA512_BLOCK_SIZE; + + block_fn(blocks, data, sctx->state, +partial ? sctx->buf : NULL, p); + data += blocks * SHA512_BLOCK_SIZE; + partial = 0; + } + if (len) + memcpy(sctx->buf + partial, data, len); + + return 0; +} +EXPORT_SYMBOL(crypto_sha512_base_do_update); + +int crypto_sha512_base_do_finalize(struct shash_desc *desc, + sha512_block_fn *block_fn, void *p) +{ + const int bit_offset = SHA512_BLOCK_SIZE
[RFC PATCH 6/6] arm/crypto: accelerated SHA-512 using ARM generic ASM and NEON
This updates the SHA-512 NEON module with the faster and more versatile implementation from the OpenSSL project. It consists of both a NEON and a generic ASM version of the core SHA-512 transform, where the NEON version reverts to the ASM version when invoked in non-process context. Performance relative to the generic implementation (measured using tcrypt.ko mode=306 sec=1 running on a Cortex-A57 under KVM):

input size	block size	asm	neon	old neon
16	16	1.39	2.54	2.21
64	16	1.32	2.33	2.09
64	64	1.38	2.53	2.19
256	16	1.31	2.28	2.06
256	64	1.38	2.54	2.25
256	256	1.40	2.77	2.39
1024	16	1.29	2.22	2.01
1024	256	1.40	2.82	2.45
1024	1024	1.41	2.93	2.53
2048	16	1.33	2.21	2.00
2048	256	1.40	2.84	2.46
2048	1024	1.41	2.96	2.55
2048	2048	1.41	2.98	2.56
4096	16	1.34	2.20	1.99
4096	256	1.40	2.84	2.46
4096	1024	1.41	2.97	2.56
4096	4096	1.41	3.01	2.58
8192	16	1.34	2.19	1.99
8192	256	1.40	2.85	2.47
8192	1024	1.41	2.98	2.56
8192	4096	1.41	2.71	2.59
8192	8192	1.51	3.51	2.69

Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- arch/arm/crypto/Kconfig | 8 + arch/arm/crypto/Makefile | 8 +- arch/arm/crypto/sha512-armv4.pl | 656 arch/arm/crypto/sha512-core.S_shipped | 1814 + arch/arm/crypto/sha512-glue.c | 137 +++ arch/arm/crypto/sha512-neon-glue.c | 111 ++ arch/arm/crypto/sha512.h | 8 + 7 files changed, 2741 insertions(+), 1 deletion(-) create mode 100644 arch/arm/crypto/sha512-armv4.pl create mode 100644 arch/arm/crypto/sha512-core.S_shipped create mode 100644 arch/arm/crypto/sha512-glue.c create mode 100644 arch/arm/crypto/sha512-neon-glue.c create mode 100644 arch/arm/crypto/sha512.h diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig index 458729d2ce22..6b50c6d77b77 100644 --- a/arch/arm/crypto/Kconfig +++ b/arch/arm/crypto/Kconfig @@ -53,6 +53,14 @@ config CRYPTO_SHA256_ARM SHA-256 secure hash standard (DFIPS 180-2) implemented using optimized ARM assembler and NEON, when available.
+config CRYPTO_SHA512_ARM + tristate "SHA-384/512 digest algorithm (ARM-asm and NEON)" + select CRYPTO_HASH + select CRYPTO_SHA512_BASE + help + SHA-512 secure hash standard (DFIPS 180-2) implemented + using optimized ARM assembler and NEON, when available. + config CRYPTO_SHA512_ARM_NEON tristate "SHA384 and SHA512 digest algorithm (ARM NEON)" depends on KERNEL_MODE_NEON diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile index ef46e898f98b..322a6ca999a2 100644 --- a/arch/arm/crypto/Makefile +++ b/arch/arm/crypto/Makefile @@ -8,6 +8,7 @@ obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o +obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o obj-$(CONFIG_CRYPTO_SHA512_ARM_NEON) += sha512-arm-neon.o obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o obj-$(CONFIG_CRYPTO_SHA2_ARM_CE) += sha2-arm-ce.o @@ -19,6 +20,8 @@ sha1-arm-y := sha1-armv4-large.o sha1_glue.o sha1-arm-neon-y := sha1-armv7-neon.o sha1_neon_glue.o sha256-arm-neon-$(CONFIG_KERNEL_MODE_NEON) := sha256_neon_glue.o sha256-arm-y := sha256-core.o sha256_glue.o $(sha256-arm-neon-y) +sha512-arm-neon-$(CONFIG_KERNEL_MODE_NEON) := sha512-neon-glue.o +sha512-arm-y := sha512-core.o sha512-glue.o $(sha512-arm-neon-y) sha512-arm-neon-y := sha512-armv7-neon.o sha512_neon_glue.o sha1-arm-ce-y := sha1-ce-core.o sha1-ce-glue.o sha2-arm-ce-y := sha2-ce-core.o sha2-ce-glue.o @@ -34,4 +37,7 @@ $(src)/aesbs-core.S_shipped: $(src)/bsaes-armv7.pl $(src)/sha256-core.S_shipped: $(src)/sha256-armv4.pl $(call cmd,perl) -.PRECIOUS: $(obj)/aesbs-core.S $(obj)/sha256-core.S +$(src)/sha512-core.S_shipped: $(src)/sha512-armv4.pl + $(call cmd,perl) + +.PRECIOUS: $(obj)/aesbs-core.S $(obj)/sha256-core.S $(obj)/sha512-core.S diff --git a/arch/arm/crypto/sha512-armv4.pl b/arch/arm/crypto/sha512-armv4.pl new file mode 100644 index ..7e540f8439da --- /dev/null +++ b/arch/arm/crypto/sha512-armv4.pl
@@ -0,0 +1,656 @@ +#!/usr/bin/env perl + +# +# Written by Andy Polyakov ap
[PATCH v2 resend 01/14] crypto: sha512: implement base layer for SHA-512
To reduce the number of copies of boilerplate code throughout the tree, this patch implements generic glue for the SHA-512 algorithm. This allows a specific arch or hardware implementation to only implement the special handling that it needs. Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- crypto/Kconfig | 3 ++ crypto/Makefile | 1 + crypto/sha512_base.c | 143 +++ include/crypto/sha.h | 20 +++ 4 files changed, 167 insertions(+) create mode 100644 crypto/sha512_base.c diff --git a/crypto/Kconfig b/crypto/Kconfig index 88639937a934..3400cf4e3cdb 100644 --- a/crypto/Kconfig +++ b/crypto/Kconfig @@ -641,6 +641,9 @@ config CRYPTO_SHA256_SPARC64 SHA-256 secure hash standard (DFIPS 180-2) implemented using sparc64 crypto instructions, when available. +config CRYPTO_SHA512_BASE + tristate + config CRYPTO_SHA512 tristate "SHA384 and SHA512 digest algorithms" select CRYPTO_HASH diff --git a/crypto/Makefile b/crypto/Makefile index 97b7d3ac87e7..6174bf2592fe 100644 --- a/crypto/Makefile +++ b/crypto/Makefile @@ -45,6 +45,7 @@ obj-$(CONFIG_CRYPTO_RMD256) += rmd256.o obj-$(CONFIG_CRYPTO_RMD320) += rmd320.o obj-$(CONFIG_CRYPTO_SHA1) += sha1_generic.o obj-$(CONFIG_CRYPTO_SHA256) += sha256_generic.o +obj-$(CONFIG_CRYPTO_SHA512_BASE) += sha512_base.o obj-$(CONFIG_CRYPTO_SHA512) += sha512_generic.o obj-$(CONFIG_CRYPTO_WP512) += wp512.o obj-$(CONFIG_CRYPTO_TGR192) += tgr192.o diff --git a/crypto/sha512_base.c b/crypto/sha512_base.c new file mode 100644 index ..9a60829e06c4 --- /dev/null +++ b/crypto/sha512_base.c @@ -0,0 +1,143 @@ +/* + * sha512_base.c - core logic for SHA-512 implementations + * + * Copyright (C) 2015 Linaro Ltd ard.biesheu...@linaro.org + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation.
+ */ + +#include <crypto/internal/hash.h> +#include <crypto/sha.h> +#include <linux/crypto.h> +#include <linux/module.h> + +#include <asm/unaligned.h> + +int crypto_sha384_base_init(struct shash_desc *desc) +{ + static const u64 sha384_init_state[] = { + SHA384_H0, SHA384_H1, SHA384_H2, SHA384_H3, + SHA384_H4, SHA384_H5, SHA384_H6, SHA384_H7, + }; + struct sha512_state *sctx = shash_desc_ctx(desc); + + memcpy(sctx->state, sha384_init_state, sizeof(sctx->state)); + sctx->count[0] = sctx->count[1] = 0; + return 0; +} +EXPORT_SYMBOL(crypto_sha384_base_init); + +int crypto_sha512_base_init(struct shash_desc *desc) +{ + static const u64 sha512_init_state[] = { + SHA512_H0, SHA512_H1, SHA512_H2, SHA512_H3, + SHA512_H4, SHA512_H5, SHA512_H6, SHA512_H7, + }; + struct sha512_state *sctx = shash_desc_ctx(desc); + + memcpy(sctx->state, sha512_init_state, sizeof(sctx->state)); + sctx->count[0] = sctx->count[1] = 0; + return 0; +} +EXPORT_SYMBOL(crypto_sha512_base_init); + +int crypto_sha512_base_export(struct shash_desc *desc, void *out) +{ + struct sha512_state *sctx = shash_desc_ctx(desc); + struct sha512_state *dst = out; + + *dst = *sctx; + + return 0; +} +EXPORT_SYMBOL(crypto_sha512_base_export); + +int crypto_sha512_base_import(struct shash_desc *desc, const void *in) +{ + struct sha512_state *sctx = shash_desc_ctx(desc); + struct sha512_state const *src = in; + + *sctx = *src; + + return 0; +} +EXPORT_SYMBOL(crypto_sha512_base_import); + +int crypto_sha512_base_do_update(struct shash_desc *desc, const u8 *data, +unsigned int len, sha512_block_fn *block_fn, +void *p) +{ + struct sha512_state *sctx = shash_desc_ctx(desc); + unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE; + + sctx->count[0] += len; + if (sctx->count[0] < len) + sctx->count[1]++; + + if (unlikely((partial + len) >= SHA512_BLOCK_SIZE)) { + int blocks; + + if (partial) { + int p = SHA512_BLOCK_SIZE - partial; + + memcpy(sctx->buf + partial, data, p); + data += p; + len -= p; + } + + blocks = len / SHA512_BLOCK_SIZE; + len
%= SHA512_BLOCK_SIZE; + + block_fn(blocks, data, sctx->state, +partial ? sctx->buf : NULL, p); + data += blocks * SHA512_BLOCK_SIZE; + partial = 0; + } + if (len) + memcpy(sctx->buf + partial, data, len); + + return 0; +} +EXPORT_SYMBOL(crypto_sha512_base_do_update); + +int crypto_sha512_base_do_finalize(struct shash_desc *desc, + sha512_block_fn *block_fn, void *p) +{ + const int bit_offset = SHA512_BLOCK_SIZE
AW: [RFC PATCH 1/6] crypto: sha512: implement base layer for SHA-512
From: Ard Biesheuvel [ard.biesheu...@linaro.org] Sent: Sunday, 29 March 2015 12:38 To: Markus Stockhausen Cc: linux-arm-ker...@lists.infradead.org; linux-crypto@vger.kernel.org; samitolva...@google.com; herb...@gondor.apana.org.au; jussi.kivili...@iki.fi Subject: Re: [RFC PATCH 1/6] crypto: sha512: implement base layer for SHA-512 ... +int sha512_base_do_update(struct shash_desc *desc, const u8 *data, + unsigned int len, sha512_block_fn *block_fn, void *p) +{ + struct sha512_state *sctx = shash_desc_ctx(desc); + unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE; + + sctx->count[0] += len; + if (sctx->count[0] < len) + sctx->count[1]++; You should check whether an early kick-out at this point, when the buffer won't be filled up, is faster than first taking care of the big-data path. That can improve performance for small blocks while large blocks might be unaffected. + + if ((partial + len) >= SHA512_BLOCK_SIZE) { Isn't this the early kickout? The if is only entered if there is enough data to run the block function; otherwise it is a straight memcpy. I could add an unlikely() here to favor the small-data case. I did my tests only on low-end hardware: 32-bit PPC e500, single core, 800 MHz, 256K cache. Maybe it prefers early return statements. Additionally, I ended up clearing the context in the finish function with a simple inlined loop of 32-bit writes. Everything else (e.g. memzero) resulted in slower processing. I don't know what your clearing syntax will produce after compilation. Markus

This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden. E-mails sent over the internet may have been written under a wrong name or been manipulated. That is why this message sent as an e-mail is not a legally binding declaration of intention. Collogia Unternehmensberatung AG Ubierring 11 D-50678 Köln Executive board: Kadir Akin, Dr. Michael Höhnerbach President of the supervisory board: Hans Kristian Langva Registry office: district court Cologne Register number: HRB 52 497
Re: [PATCH] crypto/arm: accelerated SHA-512 using ARM generic ASM and NEON
This updates the SHA-512 NEON module with the faster and more versatile implementation from the OpenSSL project. It consists of both a NEON and a generic ASM version of the core SHA-512 transform, where the NEON version reverts to the ASM version when invoked in non-process context. Performance relative to the generic implementation (measured using tcrypt.ko mode=306 sec=1 running on a Cortex-A57 under KVM): input size 8192, block size 8192: asm 1.51, neon 3.51, old neon 2.69. One should keep in mind that improvement coefficients vary greatly from platform to platform. Normally you *should* observe higher coefficients in the asm column and *can* observe smaller differences between neon and old neon. BTW, 1.51 is unexpectedly low, I wonder which compiler version stands for 1.0? Nor can I replicate the difference between neon and old neon; I get a smaller difference, 17%, on Cortex-A57. Well, I'm comparing in user-land, but it shouldn't be that significant at large blocks... Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- This should get the same treatment as Sami's sha256 version: I would like to wait until the OpenSSL source file hits the upstream repository so that I can refer to its sha1 hash in the commit log. The update is committed as http://git.openssl.org/gitweb/?p=openssl.git;a=commitdiff;h=b1a5d1c652086257930a1f62ae51c9cdee654b2c. Note that the file I initially sent privately was a little bit off. Sorry about that. But that little bit is just a commentary update that adds a performance result for Cortex-A15, so the kernel patch as originally posted is 100% functionally equivalent. Cheers.
Re: [RFC PATCH 1/6] crypto: sha512: implement base layer for SHA-512
On 29 March 2015 at 10:29, Markus Stockhausen stockhau...@collogia.de wrote: From: linux-crypto-ow...@vger.kernel.org [linux-crypto-ow...@vger.kernel.org] on behalf of Ard Biesheuvel [ard.biesheu...@linaro.org] Sent: Saturday, 28 March 2015 23:10 To: linux-arm-ker...@lists.infradead.org; linux-crypto@vger.kernel.org; samitolva...@google.com; herb...@gondor.apana.org.au; jussi.kivili...@iki.fi Cc: Ard Biesheuvel Subject: [RFC PATCH 1/6] crypto: sha512: implement base layer for SHA-512 To reduce the number of copies of boilerplate code throughout the tree, this patch implements generic glue for the SHA-512 algorithm. This allows a specific arch or hardware implementation to only implement the special handling that it needs. Hi Ard, Implementing a common layer is a very good idea - I didn't like to implement the glue code once again for some recently developed PPC crypto modules. From my very short crypto experience I was surprised that my optimized implementations degraded disproportionately for small calculations in the <= 256-byte update scenarios, in contrast to some very old basic implementations. Below you will find some hints that might fit your implementation too. Thus all new implementations based on your framework could benefit immediately. Thanks for taking a look! ... +int sha384_base_init(struct shash_desc *desc) +{ + struct sha512_state *sctx = shash_desc_ctx(desc); + + *sctx = (struct sha512_state){ + .state = { + SHA384_H0, SHA384_H1, SHA384_H2, SHA384_H3, + SHA384_H4, SHA384_H5, SHA384_H6, SHA384_H7, + } + }; + return 0; +} IIRC the above code will initialize the whole context including the 64/128 byte buffer. Direct assignment of the 8 hashes was faster in my case. Ah, I missed that. I will change it. ...
+int sha512_base_do_update(struct shash_desc *desc, const u8 *data, + unsigned int len, sha512_block_fn *block_fn, void *p) +{ + struct sha512_state *sctx = shash_desc_ctx(desc); + unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE; + + sctx->count[0] += len; + if (sctx->count[0] < len) + sctx->count[1]++; You should check whether an early kick-out at this point, when the buffer won't be filled up, is faster than first taking care of the big-data path. That can improve performance for small blocks while large blocks might be unaffected. + + if ((partial + len) >= SHA512_BLOCK_SIZE) { Isn't this the early kickout? The if is only entered if there is enough data to run the block function; otherwise it is a straight memcpy. I could add an unlikely() here to favor the small-data case + int blocks; + + if (partial) { + int p = SHA512_BLOCK_SIZE - partial; + + memcpy(sctx->buf + partial, data, p); + data += p; + len -= p; + } + + blocks = len / SHA512_BLOCK_SIZE; + len %= SHA512_BLOCK_SIZE; + + block_fn(blocks, data, sctx->state, +partial ? sctx->buf : NULL, p); + data += blocks * SHA512_BLOCK_SIZE; + partial = 0; + } + if (len) + memcpy(sctx->buf + partial, data, len); + + return 0; +} +EXPORT_SYMBOL(sha512_base_do_update); + +int sha512_base_do_finalize(struct shash_desc *desc, sha512_block_fn *block_fn, + void *p) +{ + static const u8 padding[SHA512_BLOCK_SIZE] = { 0x80, }; + + struct sha512_state *sctx = shash_desc_ctx(desc); + unsigned int padlen; + __be64 bits[2]; + + padlen = SHA512_BLOCK_SIZE - +(sctx->count[0] + sizeof(bits)) % SHA512_BLOCK_SIZE; + + bits[0] = cpu_to_be64(sctx->count[1] << 3 | + sctx->count[0] >> 61); + bits[1] = cpu_to_be64(sctx->count[0] << 3); + + sha512_base_do_update(desc, padding, padlen, block_fn, p); I know that this is the most intuitive and straightforward implementation for handling finalization. Nevertheless, the (maybe a little obscure) generic md5 algorithm gives best-in-class performance for hash finalization of small input data.
Well, memcpy'ing a buffer consisting almost entirely of zeroes doesn't quite feel right, indeed. I will instead follow the md5 suggestion. For comparison: from the raw numbers, the sha1-ppc-spe assembler module written by me is only 10% faster than the old sha1-powerpc assembler module. Both are simple assembler algorithms without hardware acceleration. For large blocks I gain another 8% by avoiding function calls, because the core module may process several blocks. But for small single-block updates the above glue code optimizations gave: 16-byte block single update: +24%; 64-byte block single update: +16%
AW: [RFC PATCH 1/6] crypto: sha512: implement base layer for SHA-512
From: linux-crypto-ow...@vger.kernel.org [linux-crypto-ow...@vger.kernel.org] on behalf of Ard Biesheuvel [ard.biesheu...@linaro.org] Sent: Saturday, 28 March 2015 23:10 To: linux-arm-ker...@lists.infradead.org; linux-crypto@vger.kernel.org; samitolva...@google.com; herb...@gondor.apana.org.au; jussi.kivili...@iki.fi Cc: Ard Biesheuvel Subject: [RFC PATCH 1/6] crypto: sha512: implement base layer for SHA-512 To reduce the number of copies of boilerplate code throughout the tree, this patch implements generic glue for the SHA-512 algorithm. This allows a specific arch or hardware implementation to only implement the special handling that it needs. Hi Ard, Implementing a common layer is a very good idea - I didn't like to implement the glue code once again for some recently developed PPC crypto modules. From my very short crypto experience I was surprised that my optimized implementations degraded disproportionately for small calculations in the <= 256-byte update scenarios, in contrast to some very old basic implementations. Below you will find some hints that might fit your implementation too. Thus all new implementations based on your framework could benefit immediately. ... +int sha384_base_init(struct shash_desc *desc) +{ + struct sha512_state *sctx = shash_desc_ctx(desc); + + *sctx = (struct sha512_state){ + .state = { + SHA384_H0, SHA384_H1, SHA384_H2, SHA384_H3, + SHA384_H4, SHA384_H5, SHA384_H6, SHA384_H7, + } + }; + return 0; +} IIRC the above code will initialize the whole context including the 64/128 byte buffer. Direct assignment of the 8 hashes was faster in my case. ...
+int sha512_base_do_update(struct shash_desc *desc, const u8 *data, + unsigned int len, sha512_block_fn *block_fn, void *p) +{ + struct sha512_state *sctx = shash_desc_ctx(desc); + unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE; + + sctx->count[0] += len; + if (sctx->count[0] < len) + sctx->count[1]++; You should check whether an early kick-out at this point, when the buffer won't be filled up, is faster than first taking care of the big-data path. That can improve performance for small blocks while large blocks might be unaffected. + + if ((partial + len) >= SHA512_BLOCK_SIZE) { + int blocks; + + if (partial) { + int p = SHA512_BLOCK_SIZE - partial; + + memcpy(sctx->buf + partial, data, p); + data += p; + len -= p; + } + + blocks = len / SHA512_BLOCK_SIZE; + len %= SHA512_BLOCK_SIZE; + + block_fn(blocks, data, sctx->state, +partial ? sctx->buf : NULL, p); + data += blocks * SHA512_BLOCK_SIZE; + partial = 0; + } + if (len) + memcpy(sctx->buf + partial, data, len); + + return 0; +} +EXPORT_SYMBOL(sha512_base_do_update); + +int sha512_base_do_finalize(struct shash_desc *desc, sha512_block_fn *block_fn, + void *p) +{ + static const u8 padding[SHA512_BLOCK_SIZE] = { 0x80, }; + + struct sha512_state *sctx = shash_desc_ctx(desc); + unsigned int padlen; + __be64 bits[2]; + + padlen = SHA512_BLOCK_SIZE - +(sctx->count[0] + sizeof(bits)) % SHA512_BLOCK_SIZE; + + bits[0] = cpu_to_be64(sctx->count[1] << 3 | + sctx->count[0] >> 61); + bits[1] = cpu_to_be64(sctx->count[0] << 3); + + sha512_base_do_update(desc, padding, padlen, block_fn, p); I know that this is the most intuitive and straightforward implementation for handling finalization. Nevertheless, the (maybe a little obscure) generic md5 algorithm gives best-in-class performance for hash finalization of small input data. For comparison: from the raw numbers, the sha1-ppc-spe assembler module written by me is only 10% faster than the old sha1-powerpc assembler module. Both are simple assembler algorithms without hardware acceleration.
For large blocks I gain another 8% by avoiding function calls, because the core module may process several blocks. But for small single-block updates the above glue code optimizations gave: 16-byte block single update: +24%; 64-byte block single update: +16%; 256-byte block single update: +12%. Considering CPU-assisted SHA calculations, that percentage may be even higher. Maybe worth the effort ... Markus
[RFC PATCH 1/6] crypto: sha512: implement base layer for SHA-512
To reduce the number of copies of boilerplate code throughout the tree, this patch implements generic glue for the SHA-512 algorithm. This allows a specific arch or hardware implementation to only implement the special handling that it needs. Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- crypto/Kconfig | 3 ++ crypto/Makefile | 1 + crypto/sha512_base.c | 143 +++ include/crypto/sha.h | 20 +++ 4 files changed, 167 insertions(+) create mode 100644 crypto/sha512_base.c diff --git a/crypto/Kconfig b/crypto/Kconfig index 88639937a934..3400cf4e3cdb 100644 --- a/crypto/Kconfig +++ b/crypto/Kconfig @@ -641,6 +641,9 @@ config CRYPTO_SHA256_SPARC64 SHA-256 secure hash standard (DFIPS 180-2) implemented using sparc64 crypto instructions, when available. +config CRYPTO_SHA512_BASE + tristate + config CRYPTO_SHA512 tristate "SHA384 and SHA512 digest algorithms" select CRYPTO_HASH diff --git a/crypto/Makefile b/crypto/Makefile index 97b7d3ac87e7..6174bf2592fe 100644 --- a/crypto/Makefile +++ b/crypto/Makefile @@ -45,6 +45,7 @@ obj-$(CONFIG_CRYPTO_RMD256) += rmd256.o obj-$(CONFIG_CRYPTO_RMD320) += rmd320.o obj-$(CONFIG_CRYPTO_SHA1) += sha1_generic.o obj-$(CONFIG_CRYPTO_SHA256) += sha256_generic.o +obj-$(CONFIG_CRYPTO_SHA512_BASE) += sha512_base.o obj-$(CONFIG_CRYPTO_SHA512) += sha512_generic.o obj-$(CONFIG_CRYPTO_WP512) += wp512.o obj-$(CONFIG_CRYPTO_TGR192) += tgr192.o diff --git a/crypto/sha512_base.c b/crypto/sha512_base.c new file mode 100644 index ..488e24cc6f0a --- /dev/null +++ b/crypto/sha512_base.c @@ -0,0 +1,143 @@ +/* + * sha512_base.c - core logic for SHA-512 implementations + * + * Copyright (C) 2015 Linaro Ltd ard.biesheu...@linaro.org + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation.
+ */ + +#include <crypto/internal/hash.h> +#include <crypto/sha.h> +#include <linux/crypto.h> +#include <linux/module.h> + +#include <asm/unaligned.h> + +int sha384_base_init(struct shash_desc *desc) +{ + struct sha512_state *sctx = shash_desc_ctx(desc); + + *sctx = (struct sha512_state){ + .state = { + SHA384_H0, SHA384_H1, SHA384_H2, SHA384_H3, + SHA384_H4, SHA384_H5, SHA384_H6, SHA384_H7, + } + }; + return 0; +} +EXPORT_SYMBOL(sha384_base_init); + +int sha512_base_init(struct shash_desc *desc) +{ + struct sha512_state *sctx = shash_desc_ctx(desc); + + *sctx = (struct sha512_state){ + .state = { + SHA512_H0, SHA512_H1, SHA512_H2, SHA512_H3, + SHA512_H4, SHA512_H5, SHA512_H6, SHA512_H7, + } + }; + return 0; +} +EXPORT_SYMBOL(sha512_base_init); + +int sha512_base_export(struct shash_desc *desc, void *out) +{ + struct sha512_state *sctx = shash_desc_ctx(desc); + struct sha512_state *dst = out; + + *dst = *sctx; + + return 0; +} +EXPORT_SYMBOL(sha512_base_export); + +int sha512_base_import(struct shash_desc *desc, const void *in) +{ + struct sha512_state *sctx = shash_desc_ctx(desc); + struct sha512_state const *src = in; + + *sctx = *src; + + return 0; +} +EXPORT_SYMBOL(sha512_base_import); + +int sha512_base_do_update(struct shash_desc *desc, const u8 *data, + unsigned int len, sha512_block_fn *block_fn, void *p) +{ + struct sha512_state *sctx = shash_desc_ctx(desc); + unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE; + + sctx->count[0] += len; + if (sctx->count[0] < len) + sctx->count[1]++; + + if ((partial + len) >= SHA512_BLOCK_SIZE) { + int blocks; + + if (partial) { + int p = SHA512_BLOCK_SIZE - partial; + + memcpy(sctx->buf + partial, data, p); + data += p; + len -= p; + } + + blocks = len / SHA512_BLOCK_SIZE; + len %= SHA512_BLOCK_SIZE; + + block_fn(blocks, data, sctx->state, + partial ?
sctx->buf : NULL, p); + data += blocks * SHA512_BLOCK_SIZE; + partial = 0; + } + if (len) + memcpy(sctx->buf + partial, data, len); + + return 0; +} +EXPORT_SYMBOL(sha512_base_do_update); + +int sha512_base_do_finalize(struct shash_desc *desc, sha512_block_fn *block_fn, + void *p) +{ + static const u8 padding[SHA512_BLOCK_SIZE] = { 0x80, }; + + struct sha512_state *sctx = shash_desc_ctx(desc); + unsigned int padlen; + __be64 bits[2]; + + padlen = SHA512_BLOCK_SIZE - + (sctx->count[0] + sizeof(bits)) % SHA512_BLOCK_SIZE; + + bits[0
[RFC PATCH 6/6] arm/crypto: accelerated SHA-512 using ARM generic ASM and NEON
This updates the SHA-512 NEON module with the faster and more versatile implementation from the OpenSSL project. It consists of both a NEON and a generic ASM version of the core SHA-512 transform, where the NEON version reverts to the ASM version when invoked in non-process context. Performance relative to the generic implementation (measured using tcrypt.ko mode=306 sec=1 running on a Cortex-A57 under KVM):

input size  block size  asm   neon  old neon
        16          16  1.39  2.54  2.21
        64          16  1.32  2.33  2.09
        64          64  1.38  2.53  2.19
       256          16  1.31  2.28  2.06
       256          64  1.38  2.54  2.25
       256         256  1.40  2.77  2.39
      1024          16  1.29  2.22  2.01
      1024         256  1.40  2.82  2.45
      1024        1024  1.41  2.93  2.53
      2048          16  1.33  2.21  2.00
      2048         256  1.40  2.84  2.46
      2048        1024  1.41  2.96  2.55
      2048        2048  1.41  2.98  2.56
      4096          16  1.34  2.20  1.99
      4096         256  1.40  2.84  2.46
      4096        1024  1.41  2.97  2.56
      4096        4096  1.41  3.01  2.58
      8192          16  1.34  2.19  1.99
      8192         256  1.40  2.85  2.47
      8192        1024  1.41  2.98  2.56
      8192        4096  1.41  2.71  2.59
      8192        8192  1.51  3.51  2.69

Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- arch/arm/crypto/Kconfig | 8 + arch/arm/crypto/Makefile | 8 +- arch/arm/crypto/sha512-armv4.pl | 656 arch/arm/crypto/sha512-core.S_shipped | 1814 + arch/arm/crypto/sha512-glue.c | 137 +++ arch/arm/crypto/sha512-neon-glue.c | 111 ++ arch/arm/crypto/sha512.h | 8 + 7 files changed, 2741 insertions(+), 1 deletion(-) create mode 100644 arch/arm/crypto/sha512-armv4.pl create mode 100644 arch/arm/crypto/sha512-core.S_shipped create mode 100644 arch/arm/crypto/sha512-glue.c create mode 100644 arch/arm/crypto/sha512-neon-glue.c create mode 100644 arch/arm/crypto/sha512.h diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig index 458729d2ce22..6b50c6d77b77 100644 --- a/arch/arm/crypto/Kconfig +++ b/arch/arm/crypto/Kconfig @@ -53,6 +53,14 @@ config CRYPTO_SHA256_ARM SHA-256 secure hash standard (DFIPS 180-2) implemented using optimized ARM assembler and NEON, when available.
+config CRYPTO_SHA512_ARM + tristate "SHA-384/512 digest algorithm (ARM-asm and NEON)" + select CRYPTO_HASH + select CRYPTO_SHA512_BASE + help + SHA-512 secure hash standard (DFIPS 180-2) implemented + using optimized ARM assembler and NEON, when available. + config CRYPTO_SHA512_ARM_NEON tristate "SHA384 and SHA512 digest algorithm (ARM NEON)" depends on KERNEL_MODE_NEON diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile index ef46e898f98b..322a6ca999a2 100644 --- a/arch/arm/crypto/Makefile +++ b/arch/arm/crypto/Makefile @@ -8,6 +8,7 @@ obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o +obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o obj-$(CONFIG_CRYPTO_SHA512_ARM_NEON) += sha512-arm-neon.o obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o obj-$(CONFIG_CRYPTO_SHA2_ARM_CE) += sha2-arm-ce.o @@ -19,6 +20,8 @@ sha1-arm-y := sha1-armv4-large.o sha1_glue.o sha1-arm-neon-y := sha1-armv7-neon.o sha1_neon_glue.o sha256-arm-neon-$(CONFIG_KERNEL_MODE_NEON) := sha256_neon_glue.o sha256-arm-y := sha256-core.o sha256_glue.o $(sha256-arm-neon-y) +sha512-arm-neon-$(CONFIG_KERNEL_MODE_NEON) := sha512-neon-glue.o +sha512-arm-y := sha512-core.o sha512-glue.o $(sha512-arm-neon-y) sha512-arm-neon-y := sha512-armv7-neon.o sha512_neon_glue.o sha1-arm-ce-y := sha1-ce-core.o sha1-ce-glue.o sha2-arm-ce-y := sha2-ce-core.o sha2-ce-glue.o @@ -34,4 +37,7 @@ $(src)/aesbs-core.S_shipped: $(src)/bsaes-armv7.pl $(src)/sha256-core.S_shipped: $(src)/sha256-armv4.pl $(call cmd,perl) -.PRECIOUS: $(obj)/aesbs-core.S $(obj)/sha256-core.S +$(src)/sha512-core.S_shipped: $(src)/sha512-armv4.pl + $(call cmd,perl) + +.PRECIOUS: $(obj)/aesbs-core.S $(obj)/sha256-core.S $(obj)/sha512-core.S diff --git a/arch/arm/crypto/sha512-armv4.pl b/arch/arm/crypto/sha512-armv4.pl new file mode 100644 index ..7e540f8439da --- /dev/null +++ b/arch/arm/crypto/sha512-armv4.pl
@@ -0,0 +1,656 @@ +#!/usr/bin/env perl + +# +# Written by Andy Polyakov ap
Re: [PATCH] crypto/arm: accelerated SHA-512 using ARM generic ASM and NEON
On 28.03.2015 09:28, Ard Biesheuvel wrote: This updates the SHA-512 NEON module with the faster and more versatile implementation from the OpenSSL project. It consists of both a NEON and a generic ASM version of the core SHA-512 transform, where the NEON version reverts to the ASM version when invoked in non-process context. Performance relative to the generic implementation (measured using tcrypt.ko mode=306 sec=1 running on a Cortex-A57 under KVM):

input size  block size  asm   neon  old neon
        16          16  1.39  2.54  2.21
        64          16  1.32  2.33  2.09
        64          64  1.38  2.53  2.19
       256          16  1.31  2.28  2.06
       256          64  1.38  2.54  2.25
       256         256  1.40  2.77  2.39
      1024          16  1.29  2.22  2.01
      1024         256  1.40  2.82  2.45
      1024        1024  1.41  2.93  2.53
      2048          16  1.33  2.21  2.00
      2048         256  1.40  2.84  2.46
      2048        1024  1.41  2.96  2.55
      2048        2048  1.41  2.98  2.56
      4096          16  1.34  2.20  1.99
      4096         256  1.40  2.84  2.46
      4096        1024  1.41  2.97  2.56
      4096        4096  1.41  3.01  2.58
      8192          16  1.34  2.19  1.99
      8192         256  1.40  2.85  2.47
      8192        1024  1.41  2.98  2.56
      8192        4096  1.41  2.71  2.59
      8192        8192  1.51  3.51  2.69

Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- This should get the same treatment as Sami's sha256 version: I would like to wait until the OpenSSL source file hits the upstream repository so that I can refer to its sha1 hash in the commit log.
arch/arm/crypto/Kconfig | 2 - arch/arm/crypto/Makefile | 8 +- arch/arm/crypto/sha512-armv4.pl | 656 arch/arm/crypto/sha512-armv7-neon.S | 455 - arch/arm/crypto/sha512-core.S_shipped | 1814 + arch/arm/crypto/sha512.h | 14 + arch/arm/crypto/sha512_glue.c | 255 + arch/arm/crypto/sha512_neon_glue.c | 155 +-- 8 files changed, 2762 insertions(+), 597 deletions(-) create mode 100644 arch/arm/crypto/sha512-armv4.pl delete mode 100644 arch/arm/crypto/sha512-armv7-neon.S create mode 100644 arch/arm/crypto/sha512-core.S_shipped create mode 100644 arch/arm/crypto/sha512.h create mode 100644 arch/arm/crypto/sha512_glue.c Acked-by: Jussi Kivilinna jussi.kivili...@iki.fi -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] crypto/arm: accelerated SHA-512 using ARM generic ASM and NEON
This updates the SHA-512 NEON module with the faster and more versatile implementation from the OpenSSL project. It consists of both a NEON and a generic ASM version of the core SHA-512 transform, where the NEON version reverts to the ASM version when invoked in non-process context. Performance relative to the generic implementation (measured using tcrypt.ko mode=306 sec=1 running on a Cortex-A57 under KVM):

input size  block size  asm   neon  old neon
        16          16  1.39  2.54  2.21
        64          16  1.32  2.33  2.09
        64          64  1.38  2.53  2.19
       256          16  1.31  2.28  2.06
       256          64  1.38  2.54  2.25
       256         256  1.40  2.77  2.39
      1024          16  1.29  2.22  2.01
      1024         256  1.40  2.82  2.45
      1024        1024  1.41  2.93  2.53
      2048          16  1.33  2.21  2.00
      2048         256  1.40  2.84  2.46
      2048        1024  1.41  2.96  2.55
      2048        2048  1.41  2.98  2.56
      4096          16  1.34  2.20  1.99
      4096         256  1.40  2.84  2.46
      4096        1024  1.41  2.97  2.56
      4096        4096  1.41  3.01  2.58
      8192          16  1.34  2.19  1.99
      8192         256  1.40  2.85  2.47
      8192        1024  1.41  2.98  2.56
      8192        4096  1.41  2.71  2.59
      8192        8192  1.51  3.51  2.69

Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- This should get the same treatment as Sami's sha256 version: I would like to wait until the OpenSSL source file hits the upstream repository so that I can refer to its sha1 hash in the commit log.
arch/arm/crypto/Kconfig | 2 - arch/arm/crypto/Makefile | 8 +- arch/arm/crypto/sha512-armv4.pl | 656 arch/arm/crypto/sha512-armv7-neon.S | 455 - arch/arm/crypto/sha512-core.S_shipped | 1814 + arch/arm/crypto/sha512.h | 14 + arch/arm/crypto/sha512_glue.c | 255 + arch/arm/crypto/sha512_neon_glue.c | 155 +-- 8 files changed, 2762 insertions(+), 597 deletions(-) create mode 100644 arch/arm/crypto/sha512-armv4.pl delete mode 100644 arch/arm/crypto/sha512-armv7-neon.S create mode 100644 arch/arm/crypto/sha512-core.S_shipped create mode 100644 arch/arm/crypto/sha512.h create mode 100644 arch/arm/crypto/sha512_glue.c diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig index 458729d2ce22..846694ad2b7d 100644 --- a/arch/arm/crypto/Kconfig +++ b/arch/arm/crypto/Kconfig @@ -55,8 +55,6 @@ config CRYPTO_SHA256_ARM config CRYPTO_SHA512_ARM_NEON tristate "SHA384 and SHA512 digest algorithm (ARM NEON)" - depends on KERNEL_MODE_NEON - select CRYPTO_SHA512 select CRYPTO_HASH help SHA-512 secure hash standard (DFIPS 180-2) implemented diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile index ef46e898f98b..c0ed9b68fe12 100644 --- a/arch/arm/crypto/Makefile +++ b/arch/arm/crypto/Makefile @@ -19,7 +19,8 @@ sha1-arm-y := sha1-armv4-large.o sha1_glue.o sha1-arm-neon-y := sha1-armv7-neon.o sha1_neon_glue.o sha256-arm-neon-$(CONFIG_KERNEL_MODE_NEON) := sha256_neon_glue.o sha256-arm-y := sha256-core.o sha256_glue.o $(sha256-arm-neon-y) -sha512-arm-neon-y := sha512-armv7-neon.o sha512_neon_glue.o +sha512-arm-neon-$(CONFIG_KERNEL_MODE_NEON) := sha512_neon_glue.o +sha512-arm-neon-y := sha512-core.o sha512_glue.o $(sha512-arm-neon-y) sha1-arm-ce-y := sha1-ce-core.o sha1-ce-glue.o sha2-arm-ce-y := sha2-ce-core.o sha2-ce-glue.o aes-arm-ce-y := aes-ce-core.o aes-ce-glue.o @@ -34,4 +35,7 @@ $(src)/aesbs-core.S_shipped: $(src)/bsaes-armv7.pl $(src)/sha256-core.S_shipped: $(src)/sha256-armv4.pl $(call cmd,perl) -.PRECIOUS: $(obj)/aesbs-core.S $(obj)/sha256-core.S
+$(src)/sha512-core.S_shipped: $(src)/sha512-armv4.pl + $(call cmd,perl) + +.PRECIOUS: $(obj)/aesbs-core.S $(obj)/sha256-core.S $(obj)/sha512-core.S diff --git a/arch/arm/crypto/sha512-armv4.pl b/arch/arm/crypto/sha512-armv4.pl new file mode 100644 index ..7e540f8439da --- /dev/null +++ b/arch/arm/crypto/sha512-armv4.pl @@ -0,0 +1,656 @@ +#!/usr/bin/env perl + +# +# Written by Andy Polyakov ap...@openssl.org for the OpenSSL +# project. The module is, however, dual licensed under OpenSSL and +# CRYPTOGAMS licenses depending on where you obtain it. For further +# details see http://www.openssl.org/~appro/cryptogams
Re: [PATCH] crypto: testmgr: add empty and large test vectors for SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512
On Sat, Apr 12, 2014 at 03:35:29PM +0300, Jussi Kivilinna wrote: Patch adds large test-vectors for SHA algorithms for better code coverage in optimized assembly implementations. Empty test-vectors are also added, as some crypto drivers appear to have special case handling for empty input. Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi Patch applied. Thanks! -- Email: Herbert Xu herb...@gondor.apana.org.au Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[PATCH] crypto: testmgr: add empty and large test vectors for SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512
Patch adds large test-vectors for SHA algorithms for better code coverage in optimized assembly implementations. Empty test-vectors are also added, as some crypto drivers appear to have special case handling for empty input. Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi --- This patch depends on the crypto: add test cases for SHA-1, SHA-224, SHA-256 and AES-CCM patch from Ard Biesheuvel. --- crypto/testmgr.h | 728 +- 1 file changed, 721 insertions(+), 7 deletions(-) diff --git a/crypto/testmgr.h b/crypto/testmgr.h index 84ac0f0..7d1438e 100644 --- a/crypto/testmgr.h +++ b/crypto/testmgr.h @@ -487,10 +487,15 @@ static struct hash_testvec crct10dif_tv_template[] = { * SHA1 test vectors from from FIPS PUB 180-1 * Long vector from CAVS 5.0 */ -#define SHA1_TEST_VECTORS 4 +#define SHA1_TEST_VECTORS 6 static struct hash_testvec sha1_tv_template[] = { { + .plaintext = , + .psize = 0, + .digest = \xda\x39\xa3\xee\x5e\x6b\x4b\x0d\x32\x55 + \xbf\xef\x95\x60\x18\x90\xaf\xd8\x07\x09, + }, { .plaintext = abc, .psize = 3, .digest = \xa9\x99\x3e\x36\x47\x06\x81\x6a\xba\x3e @@ -534,6 +539,139 @@ static struct hash_testvec sha1_tv_template[] = { .psize = 64, .digest = \xc8\x71\xf6\x9a\x63\xcc\xa9\x84\x84\x82 \x64\xe7\x79\x95\x5d\xd7\x19\x41\x7c\x91, + }, { + .plaintext = \x08\x9f\x13\xaa\x41\xd8\x4c\xe3 +\x7a\x11\x85\x1c\xb3\x27\xbe\x55 +\xec\x60\xf7\x8e\x02\x99\x30\xc7 +\x3b\xd2\x69\x00\x74\x0b\xa2\x16 +\xad\x44\xdb\x4f\xe6\x7d\x14\x88 +\x1f\xb6\x2a\xc1\x58\xef\x63\xfa +\x91\x05\x9c\x33\xca\x3e\xd5\x6c +\x03\x77\x0e\xa5\x19\xb0\x47\xde +\x52\xe9\x80\x17\x8b\x22\xb9\x2d +\xc4\x5b\xf2\x66\xfd\x94\x08\x9f +\x36\xcd\x41\xd8\x6f\x06\x7a\x11 +\xa8\x1c\xb3\x4a\xe1\x55\xec\x83 +\x1a\x8e\x25\xbc\x30\xc7\x5e\xf5 +\x69\x00\x97\x0b\xa2\x39\xd0\x44 +\xdb\x72\x09\x7d\x14\xab\x1f\xb6 +\x4d\xe4\x58\xef\x86\x1d\x91\x28 +\xbf\x33\xca\x61\xf8\x6c\x03\x9a +\x0e\xa5\x3c\xd3\x47\xde\x75\x0c +\x80\x17\xae\x22\xb9\x50\xe7\x5b +\xf2\x89\x20\x94\x2b\xc2\x36\xcd +\x64\xfb\x6f\x06\x9d\x11\xa8\x3f 
+\xd6\x4a\xe1\x78\x0f\x83\x1a\xb1 +\x25\xbc\x53\xea\x5e\xf5\x8c\x00 +\x97\x2e\xc5\x39\xd0\x67\xfe\x72 +\x09\xa0\x14\xab\x42\xd9\x4d\xe4 +\x7b\x12\x86\x1d\xb4\x28\xbf\x56 +\xed\x61\xf8\x8f\x03\x9a\x31\xc8 +\x3c\xd3\x6a\x01\x75\x0c\xa3\x17 +\xae\x45\xdc\x50\xe7\x7e\x15\x89 +\x20\xb7\x2b\xc2\x59\xf0\x64\xfb +\x92\x06\x9d\x34\xcb\x3f\xd6\x6d +\x04\x78\x0f\xa6\x1a\xb1\x48\xdf +\x53\xea\x81\x18\x8c\x23\xba\x2e +\xc5\x5c\xf3\x67\xfe\x95\x09\xa0 +\x37\xce\x42\xd9\x70\x07\x7b\x12 +\xa9\x1d\xb4\x4b\xe2\x56\xed\x84 +\x1b\x8f\x26\xbd\x31\xc8\x5f\xf6 +\x6a\x01\x98\x0c\xa3\x3a\xd1\x45 +\xdc\x73\x0a\x7e\x15\xac\x20\xb7 +\x4e\xe5\x59\xf0\x87\x1e\x92\x29 +\xc0\x34\xcb\x62\xf9\x6d\x04\x9b +\x0f\xa6\x3d\xd4\x48\xdf\x76\x0d +\x81\x18\xaf\x23\xba\x51\xe8\x5c +\xf3\x8a\x21\x95\x2c\xc3\x37\xce +\x65\xfc\x70\x07\x9e\x12\xa9\x40 +\xd7\x4b\xe2\x79\x10\x84\x1b\xb2 +\x26\xbd\x54\xeb\x5f\xf6\x8d\x01 +\x98\x2f\xc6\x3a\xd1\x68\xff\x73 +\x0a\xa1\x15\xac\x43\xda\x4e\xe5 +\x7c\x13\x87\x1e\xb5\x29\xc0\x57 +\xee\x62\xf9\x90\x04\x9b\x32\xc9 +\x3d\xd4\x6b\x02\x76\x0d\xa4\x18 +\xaf\x46\xdd\x51\xe8\x7f\x16\x8a +\x21\xb8\x2c\xc3\x5a\xf1\x65\xfc +\x93\x07\x9e\x35\xcc\x40\xd7\x6e +
Re: [PATCH] crypto: Fix byte counter overflow in SHA-512
On Fri, Mar 16, 2012 at 08:26:28PM +, Kent Yoder wrote: The current code only increments the upper 64 bits of the SHA-512 byte counter when the number of bytes hashed happens to hit 2^64 exactly. This patch increments the upper 64 bits whenever the lower 64 bits overflows. Signed-off-by: Kent Yoder k...@linux.vnet.ibm.com Good catch. Patch applied to crypto and stable. Thanks a lot! -- Email: Herbert Xu herb...@gondor.apana.org.au Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[PATCH] crypto: Fix byte counter overflow in SHA-512
The current code only increments the upper 64 bits of the SHA-512 byte counter when the number of bytes hashed happens to hit 2^64 exactly. This patch increments the upper 64 bits whenever the lower 64 bits overflows. Signed-off-by: Kent Yoder k...@linux.vnet.ibm.com --- crypto/sha512_generic.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/crypto/sha512_generic.c b/crypto/sha512_generic.c index 107f6f7..dd30f40 100644 --- a/crypto/sha512_generic.c +++ b/crypto/sha512_generic.c @@ -174,7 +174,7 @@ sha512_update(struct shash_desc *desc, const u8 *data, unsigned int len) index = sctx->count[0] & 0x7f; /* Update number of bytes */ - if (!(sctx->count[0] += len)) + if ((sctx->count[0] += len) < len) sctx->count[1]++; part_len = 128 - index; -- 1.7.5.4
Re: sha-512...
On Wed, Feb 15, 2012 at 12:23:52AM -0500, David Miller wrote: From: Herbert Xu herb...@gondor.hengli.com.au Date: Wed, 15 Feb 2012 16:16:08 +1100 OK, so we grew by 1136 - 888 = 248. Keep in mind that 128 of that is expected since we moved W onto the stack. Right. I guess we could go back to the percpu solution, what do you think? I'm not entirely sure, we might have to. sha512 is notorious for generating terrible code with gcc on 32-bit targets, so... The sha512 test in the glibc testsuite tends to timeout on 32-bit sparc. :-) Cherry-picking the ror64() commit largely fixes the issue (on sparc-defconfig): sha512_transform: 0: 9d e3 bc 78 save %sp, -904, %sp git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git b85a088f15f2070b7180735a231012843a5ac96c crypto: sha512 - use standard ror64()
Re: sha-512...
From: Alexey Dobriyan adobri...@gmail.com Date: Wed, 15 Feb 2012 22:27:52 +0300 On Wed, Feb 15, 2012 at 12:23:52AM -0500, David Miller wrote: From: Herbert Xu herb...@gondor.hengli.com.au Date: Wed, 15 Feb 2012 16:16:08 +1100 OK, so we grew by 1136 - 888 = 248. Keep in mind that 128 of that is expected since we moved W onto the stack. Right. I guess we could go back to the percpu solution, what do you think? I'm not entirely sure, we might have to. sha512 is notorious for generating terrible code with gcc on 32-bit targets, so... The sha512 test in the glibc testsuite tends to timeout on 32-bit sparc. :-) Cherry-picking the ror64() commit largely fixes the issue (on sparc-defconfig): sha512_transform: 0: 9d e3 bc 78 save %sp, -904, %sp git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git b85a088f15f2070b7180735a231012843a5ac96c crypto: sha512 - use standard ror64() I'm happy with a solution that involves pushing this change to Linus's tree; it's pretty clear why it helps so much, although I'm disappointed that gcc can't see that the u64 shift argument passed in is always a constant and therefore well within the range of a 32-bit value, ho hum :-) In fact, in my tree, this change brings the stack allocation instruction down to: save %sp, -824, %sp ! which is actually BETTER than what the old per-cpu code got: save %sp, -984, %sp ! Therefore I highly recommend we apply that ror() change to Linus's tree now. :-)
Re: sha-512...
On Wed, Feb 15, 2012 at 04:00:10PM -0500, David Miller wrote: In fact, in my tree, this change brings the stack allocation instruction down to: save %sp, -824, %sp ! which is actually BETTER than what the old per-cpu code got: save %sp, -984, %sp ! Therefore I highly recommend we apply that ror() change to Linus's tree now. :-) Great, I'll push that out today. Thanks! -- Email: Herbert Xu herb...@gondor.apana.org.au Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
sha-512...
FYI, I just started seeing this on sparc32 after all those sha512 optimizations: crypto/sha512_generic.c: In function 'sha512_transform': crypto/sha512_generic.c:135:1: warning: the frame size of 1136 bytes is larger than 1024 bytes [-Wframe-larger-than=]
Re: sha-512...
On Tue, Feb 14, 2012 at 10:58:33PM -0500, David Miller wrote: FYI, I just started seeing this on sparc32 after all those sha512 optimizations: crypto/sha512_generic.c: In function 'sha512_transform': crypto/sha512_generic.c:135:1: warning: the frame size of 1136 bytes is larger than 1024 bytes [-Wframe-larger-than=] Is that with the latest patch applied? crypto: sha512 - Avoid stack bloat on i386 If so then this is not good. What was the original stack usage, i.e., if you revert to the original percpu code? Thanks, -- Email: Herbert Xu herb...@gondor.apana.org.au Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: sha-512...
From: Herbert Xu herb...@gondor.hengli.com.au Date: Wed, 15 Feb 2012 15:01:28 +1100 On Tue, Feb 14, 2012 at 10:58:33PM -0500, David Miller wrote: FYI, I just started seeing this on sparc32 after all those sha512 optimizations: crypto/sha512_generic.c: In function 'sha512_transform': crypto/sha512_generic.c:135:1: warning: the frame size of 1136 bytes is larger than 1024 bytes [-Wframe-larger-than=] Is that with the latest patch applied? crypto: sha512 - Avoid stack bloat on i386 If so then this is not good. Yes. And, of course, with that commit reverted it's even worse. Reverting it makes the stack frame twice as large. What was the original stack usage, i.e., if you revert to the original percpu code? If I revert: commit 3a92d687c8015860a19213e3c102cad6b722f83c commit 58d7d18b5268febb8b1391c6dffc8e2aaa751fcd commit 51fc6dc8f948047364f7d42a4ed89b416c6cc0a3 commit 84e31fdb7c797a7303e0cc295cb9bc8b73fb872d the stackframe goes down to 888 bytes. More detailed, the progression is:

master                                          1136
revert 3a92d687c8015860a19213e3c102cad6b722f83c 2408
revert 58d7d18b5268febb8b1391c6dffc8e2aaa751fcd 2408
revert 51fc6dc8f948047364f7d42a4ed89b416c6cc0a3 1520
revert 84e31fdb7c797a7303e0cc295cb9bc8b73fb872d  888
Re: sha-512...
On Wed, Feb 15, 2012 at 12:11:13AM -0500, David Miller wrote: On Tue, Feb 14, 2012 at 10:58:33PM -0500, David Miller wrote: FYI, I just started seeing this on sparc32 after all those sha512 optimizations: crypto/sha512_generic.c: In function 'sha512_transform': crypto/sha512_generic.c:135:1: warning: the frame size of 1136 bytes is larger than 1024 bytes [-Wframe-larger-than=] Is that with the latest patch applied? crypto: sha512 - Avoid stack bloat on i386 If so then this is not good. Yes. And, of course, with that commit reverted it's even worse. Reverting it makes the stack frame twice as large. What was the original stack usage, i.e., if you revert to the original percpu code? If I revert: commit 3a92d687c8015860a19213e3c102cad6b722f83c commit 58d7d18b5268febb8b1391c6dffc8e2aaa751fcd commit 51fc6dc8f948047364f7d42a4ed89b416c6cc0a3 commit 84e31fdb7c797a7303e0cc295cb9bc8b73fb872d the stackframe goes down to 888 bytes. More detailed, the progression is:

master                                          1136
revert 3a92d687c8015860a19213e3c102cad6b722f83c 2408
revert 58d7d18b5268febb8b1391c6dffc8e2aaa751fcd 2408
revert 51fc6dc8f948047364f7d42a4ed89b416c6cc0a3 1520
revert 84e31fdb7c797a7303e0cc295cb9bc8b73fb872d  888

OK, so we grew by 1136 - 888 = 248. Keep in mind that 128 of that is expected since we moved W onto the stack. I guess we could go back to the percpu solution, what do you think? Cheers, -- Email: Herbert Xu herb...@gondor.apana.org.au Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: sha-512...
From: Herbert Xu herb...@gondor.hengli.com.au Date: Wed, 15 Feb 2012 16:16:08 +1100 OK, so we grew by 1136 - 888 = 248. Keep in mind that 128 of that is expected since we moved W onto the stack. Right. I guess we could go back to the percpu solution, what do you think? I'm not entirely sure, we might have to. sha512 is notorious for generating terrible code with gcc on 32-bit targets, so... The sha512 test in the glibc testsuite tends to timeout on 32-bit sparc. :-)