Re: [PATCH] crypto/arm: accelerated SHA-512 using ARM generic ASM and NEON
On 28.03.2015 09:28, Ard Biesheuvel wrote:
> This updates the SHA-512 NEON module with the faster and more versatile
> implementation from the OpenSSL project. It consists of both a NEON and a
> generic ASM version of the core SHA-512 transform, where the NEON version
> reverts to the ASM version when invoked in non-process context.
>
> Performance relative to the generic implementation (measured using
> tcrypt.ko mode=306 sec=1 running on a Cortex-A57 under KVM):
>
>   input size  block size   asm    neon   old neon
>
>       16          16       1.39   2.54   2.21
>       64          16       1.32   2.33   2.09
>       64          64       1.38   2.53   2.19
>      256          16       1.31   2.28   2.06
>      256          64       1.38   2.54   2.25
>      256         256       1.40   2.77   2.39
>     1024          16       1.29   2.22   2.01
>     1024         256       1.40   2.82   2.45
>     1024        1024       1.41   2.93   2.53
>     2048          16       1.33   2.21   2.00
>     2048         256       1.40   2.84   2.46
>     2048        1024       1.41   2.96   2.55
>     2048        2048       1.41   2.98   2.56
>     4096          16       1.34   2.20   1.99
>     4096         256       1.40   2.84   2.46
>     4096        1024       1.41   2.97   2.56
>     4096        4096       1.41   3.01   2.58
>     8192          16       1.34   2.19   1.99
>     8192         256       1.40   2.85   2.47
>     8192        1024       1.41   2.98   2.56
>     8192        4096       1.41   2.71   2.59
>     8192        8192       1.51   3.51   2.69
>
> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
> ---
>
> This should get the same treatment as Sami's sha256 version: I would like
> to wait until the OpenSSL source file hits the upstream repository so that
> I can refer to its sha1 hash in the commit log.
>
>  arch/arm/crypto/Kconfig               |    2 -
>  arch/arm/crypto/Makefile              |    8 +-
>  arch/arm/crypto/sha512-armv4.pl       |  656
>  arch/arm/crypto/sha512-armv7-neon.S   |  455 -
>  arch/arm/crypto/sha512-core.S_shipped | 1814 +
>  arch/arm/crypto/sha512.h              |   14 +
>  arch/arm/crypto/sha512_glue.c         |  255 +
>  arch/arm/crypto/sha512_neon_glue.c    |  155 +--
>  8 files changed, 2762 insertions(+), 597 deletions(-)
>  create mode 100644 arch/arm/crypto/sha512-armv4.pl
>  delete mode 100644 arch/arm/crypto/sha512-armv7-neon.S

Acked-by: Jussi Kivilinna <jussi.kivili...@iki.fi>

>  create mode 100644 arch/arm/crypto/sha512-core.S_shipped
>  create mode 100644 arch/arm/crypto/sha512.h
>  create mode 100644 arch/arm/crypto/sha512_glue.c

--
To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
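The fallback described in the patch text (the NEON version reverts to the ASM version when invoked in non-process context) can be pictured with a small userspace sketch. Everything below is hypothetical: the stub functions, the `sha512_block_fn` type and the two boolean flags are stand-ins chosen for illustration, not the names or checks used in the actual glue code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature for a "process N blocks" transform. */
typedef void (*sha512_block_fn)(uint64_t state[8], const uint8_t *data,
				size_t nblocks);

/* Stand-in for the scalar ASM transform; it only tags the state. */
static void sha512_blocks_asm_stub(uint64_t state[8], const uint8_t *data,
				   size_t nblocks)
{
	(void)data;
	(void)nblocks;
	state[0] = 1;
}

/* Stand-in for the NEON transform; it only tags the state. */
static void sha512_blocks_neon_stub(uint64_t state[8], const uint8_t *data,
				    size_t nblocks)
{
	(void)data;
	(void)nblocks;
	state[0] = 2;
}

/*
 * Use NEON only when the CPU has it AND the current context may use SIMD
 * registers; otherwise fall back to the scalar ASM version.
 * Both flags are assumptions of this sketch.
 */
static sha512_block_fn pick_sha512_transform(bool have_neon, bool simd_usable)
{
	if (have_neon && simd_usable)
		return sha512_blocks_neon_stub;
	return sha512_blocks_asm_stub;
}
```

A caller would evaluate the two conditions on every update and call whichever function is returned, so a request arriving in a context where SIMD is not usable still makes progress, only more slowly.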
Re: rfc4543 testvectors in testmgr.h kernel
On 10.02.2015 18:22, Marcus Meissner wrote:
> Hi Jussi,
>
> We were trying to use rfc4543(gcm(aes)) in the kernel for FIPS mode, but
> the testvectors seem to fail.

You probably need to add '.fips_allowed = 1,' in testmgr.c for
rfc4543(gcm(aes)) to enable the algorithm in FIPS mode.

> Did you verify that they work? Are these the ones from Page 18 of
> https://tools.ietf.org/html/draft-mcgrew-gcm-test-01, as there the
> plaintext and aad seem to be switched?

The rfc4543() wrapper constructs the aad from '.assoc' and '.input'.

-Jussi

> Ciao, Marcus
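Why a missing '.fips_allowed = 1,' can look like a failing test vector may be easier to see with a toy model. The sketch below is not the testmgr.c code: the `alg_entry` struct, the `demo_table` contents and the lookup function are all hypothetical, and only the idea of a per-algorithm flag that is consulted in FIPS mode comes from the message above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a per-algorithm test entry. */
struct alg_entry {
	const char *name;
	int fips_allowed;
};

/* Illustrative values only; not copied from the kernel source. */
static const struct alg_entry demo_table[] = {
	{ "gcm(aes)", 1 },
	{ "rfc4543(gcm(aes))", 0 },
};

/*
 * Return 1 if the algorithm may be used, 0 if it is refused.
 * Outside FIPS mode everything is usable; in FIPS mode only entries
 * flagged fips_allowed are. Unknown names are refused in FIPS mode
 * (an assumption of this model).
 */
static int alg_usable(const struct alg_entry *tbl, size_t n,
		      const char *name, int fips_mode)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(tbl[i].name, name) == 0)
			return !fips_mode || tbl[i].fips_allowed;
	}
	return !fips_mode;
}
```

In this model the vectors themselves never run for an unflagged algorithm in FIPS mode, which is consistent with the suggestion to set the flag before suspecting the test data.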
Re: Kernel crypto API: cryptoperf performance measurement
On 2014-08-20 21:14, Milan Broz wrote:
> On 08/20/2014 03:25 PM, Jussi Kivilinna wrote:
>>> One to four GB per second for XTS? 12 GB per second for AES CBC? Somehow
>>> that does not sound right.
>>
>> Agreed, those do not look correct... I wonder what happened there. On
>> new run, I got more sane results:
>
> Which cryptsetup version are you using?
>
> There was a bug in that test on fast machines (fixed in 1.6.3, I hope :)

I had version 1.6.1 at hand.

> But anyway, it is not intended as rigorous speed test,
> it was intended for comparison of ciphers speed on particular machine.

True, but it's nice easy test when compared to parsing results from tcrypt
speed tests.

-Jussi

> Test basically tries to encrypt 1MB block (or multiple of this
> if machine is too fast). All it runs through kernel userspace crypto API
> interface.
> (Real FDE is always slower because it runs over 512bytes blocks.)
>
> Milan
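The benchmark idea Milan describes (encrypt a 1MB block, or a multiple of it if the machine is too fast) boils down to two small calculations, sketched below. This is not cryptsetup's code: the function names are made up, and the doubling rule in `next_workload` is an assumption, since the message only says "multiple of this".

```c
#include <stddef.h>

/* Throughput in MiB/s for `bytes` processed in `seconds`. */
static double mib_per_sec(size_t bytes, double seconds)
{
	if (seconds <= 0.0)
		return 0.0; /* too fast to measure: the result is meaningless */
	return ((double)bytes / (1024.0 * 1024.0)) / seconds;
}

/*
 * If a run finished faster than `min_seconds`, grow the workload so the
 * timer resolution stops dominating the result. Doubling is an
 * assumption of this sketch.
 */
static size_t next_workload(size_t bytes, double elapsed, double min_seconds)
{
	return (elapsed < min_seconds) ? bytes * 2 : bytes;
}
```

A measurement whose elapsed time is close to the timer resolution can produce wildly inflated numbers, which would be one plausible reading of the implausible figures discussed in this thread on a fast machine.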
Re: Kernel crypto API: cryptoperf performance measurement
Hello,

On 2014-08-19 21:23, Stephan Mueller wrote:
> On Tuesday, 19 August 2014, 10:17:36, Jussi Kivilinna wrote:
>
> Hi Jussi,
>
>> Hello,
>>
>> On 2014-08-17 18:55, Stephan Mueller wrote:
>>> Hi,
>>>
>>> during playing around with the kernel crypto API, I implemented a
>>> performance measurement tool kit for the various kernel crypto API cipher
>>> types. The cryptoperf tool kit is provided in [1].
>>>
>>> Comments are welcome.
>>
>> Your results are quite slow compared to, for example cryptsetup
>> benchmark, which uses kernel crypto from userspace.
>>
>> With Intel i5-2450M (turbo enabled), I get:
>>
>> #  Algorithm | Key  |  Encryption   |  Decryption
>>      aes-cbc   128b    524,0 MiB/s   11909,1 MiB/s
>>  serpent-cbc   128b     60,9 MiB/s     219,4 MiB/s
>>  twofish-cbc   128b    143,4 MiB/s     240,3 MiB/s
>>      aes-cbc   256b    330,4 MiB/s    1242,8 MiB/s
>>  serpent-cbc   256b     66,1 MiB/s     220,3 MiB/s
>>  twofish-cbc   256b    143,5 MiB/s     221,8 MiB/s
>>      aes-xts   256b   1268,7 MiB/s    4193,0 MiB/s
>>  serpent-xts   256b    234,8 MiB/s     224,6 MiB/s
>>  twofish-xts   256b    253,5 MiB/s     254,7 MiB/s
>>      aes-xts   512b   2535,0 MiB/s    2945,0 MiB/s
>>  serpent-xts   512b    274,2 MiB/s     242,3 MiB/s
>>  twofish-xts   512b    250,0 MiB/s     245,8 MiB/s
>
> One to four GB per second for XTS? 12 GB per second for AES CBC? Somehow
> that does not sound right.

Agreed, those do not look correct... I wonder what happened there. On new
run, I got more sane results:

#  Algorithm | Key  |  Encryption   |  Decryption
     aes-cbc   128b    139,1 MiB/s    1713,6 MiB/s
 serpent-cbc   128b     62,2 MiB/s     232,9 MiB/s
 twofish-cbc   128b    116,3 MiB/s     243,7 MiB/s
     aes-cbc   256b    375,1 MiB/s    1159,4 MiB/s
 serpent-cbc   256b     62,1 MiB/s     214,9 MiB/s
 twofish-cbc   256b    139,3 MiB/s     217,5 MiB/s
     aes-xts   256b   1296,4 MiB/s    1272,5 MiB/s
 serpent-xts   256b    283,3 MiB/s     275,6 MiB/s
 twofish-xts   256b    294,8 MiB/s     299,3 MiB/s
     aes-xts   512b    984,3 MiB/s     991,1 MiB/s
 serpent-xts   512b    227,7 MiB/s     220,6 MiB/s
 twofish-xts   512b    220,6 MiB/s     220,2 MiB/s

-Jussi
Re: Kernel crypto API: cryptoperf performance measurement
Hello,

On 2014-08-17 18:55, Stephan Mueller wrote:
> Hi,
>
> during playing around with the kernel crypto API, I implemented a
> performance measurement tool kit for the various kernel crypto API cipher
> types. The cryptoperf tool kit is provided in [1].
>
> Comments are welcome.

Your results are quite slow compared to, for example cryptsetup benchmark,
which uses kernel crypto from userspace.

With Intel i5-2450M (turbo enabled), I get:

#  Algorithm | Key  |  Encryption   |  Decryption
     aes-cbc   128b    524,0 MiB/s   11909,1 MiB/s
 serpent-cbc   128b     60,9 MiB/s     219,4 MiB/s
 twofish-cbc   128b    143,4 MiB/s     240,3 MiB/s
     aes-cbc   256b    330,4 MiB/s    1242,8 MiB/s
 serpent-cbc   256b     66,1 MiB/s     220,3 MiB/s
 twofish-cbc   256b    143,5 MiB/s     221,8 MiB/s
     aes-xts   256b   1268,7 MiB/s    4193,0 MiB/s
 serpent-xts   256b    234,8 MiB/s     224,6 MiB/s
 twofish-xts   256b    253,5 MiB/s     254,7 MiB/s
     aes-xts   512b   2535,0 MiB/s    2945,0 MiB/s
 serpent-xts   512b    274,2 MiB/s     242,3 MiB/s
 twofish-xts   512b    250,0 MiB/s     245,8 MiB/s

> In general, the results are as expected, i.e. the assembler implementations
> are faster than the pure C implementations. However, there are curious
> results which probably should be checked by the maintainers of the
> respective ciphers (hoping that my tool works correctly ;-) ):
>
> ablkcipher
> ----------
>
> - cryptd is slower by factor 10 across the board
>
> blkcipher
> ---------
>
> - Blowfish x86_64 assembler together with the generic C block chaining
>   modes is significantly slower than Blowfish implemented in generic C
>
> - Blowfish x86_64 assembler in ECB is significantly slower than generic C
>   Blowfish ECB
>
> - Serpent assembler implementations are not significantly faster than
>   generic C implementations
>
> - AES-NI ECB, LRW, CTR is significantly slower than AES i586 assembler.
>
> - AES-NI ECB, LRW, CTR is not significantly faster than AES generic C

Quite many assembly implementations get their speed-up from processing
several cipher blocks in parallel, which is only possible with modes of
operation that allow it (CTR, XTS, LRW, CBC(dec)). For small buffer sizes,
these implementations will use the non-parallel implementation of the
cipher.

-Jussi

> rng
> ---
>
> - The ANSI X9.31 RNG seems to work massively faster than the underlying
>   AES cipher (by about a factor of 5). I am unsure about the cause of this.
>
> Caveat
> ------
>
> Please note that there is one small error which I am unsure how to fix it
> as documented in the TODO file.
>
> [1] http://www.chronox.de/cryptoperf.html
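Jussi's point about parallel block processing and small buffers can be sketched as a simple work split. The numbers below are assumptions for illustration: a 16-byte cipher block and an 8-block parallel width; real implementations differ in width per cipher and per instruction set, and the `split_work` helper is hypothetical.

```c
#include <stddef.h>

#define CIPHER_BLOCK_SIZE 16 /* bytes per cipher block (assumed) */
#define PARALLEL_BLOCKS    8 /* blocks per parallel batch (assumed) */

struct work_split {
	size_t parallel_blocks; /* blocks handled by the wide code path */
	size_t single_blocks;   /* blocks handled one at a time */
};

/*
 * Split a request into full parallel batches plus a one-block-at-a-time
 * tail. Requests smaller than one batch never reach the fast path.
 */
static struct work_split split_work(size_t nbytes)
{
	struct work_split s;
	size_t nblocks = nbytes / CIPHER_BLOCK_SIZE;

	s.parallel_blocks = (nblocks / PARALLEL_BLOCKS) * PARALLEL_BLOCKS;
	s.single_blocks = nblocks - s.parallel_blocks;
	return s;
}
```

Under these assumptions a 64-byte request is processed entirely by the non-parallel path, which is why benchmarks run on small buffers may show little or no gain from an assembler implementation.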
Re: [PATCH] [v3] crypto: sha512: add ARM NEON implementation
On 29.07.2014 15:35, Ard Biesheuvel wrote:
> On 30 June 2014 18:39, Jussi Kivilinna <jussi.kivili...@iki.fi> wrote:
>> This patch adds ARM NEON assembly implementation of SHA-512 and SHA-384
>> algorithms.
>>
>> tcrypt benchmark results on Cortex-A8, sha512-generic vs sha512-neon-asm:
>>
>> block-size  bytes/update  old-vs-new
>>     16           16         2.99x
>>     64           16         2.67x
>>     64           64         3.00x
>>    256           16         2.64x
>>    256           64         3.06x
>>    256          256         3.33x
>>   1024           16         2.53x
>>   1024          256         3.39x
>>   1024         1024         3.52x
>>   2048           16         2.50x
>>   2048          256         3.41x
>>   2048         1024         3.54x
>>   2048         2048         3.57x
>>   4096           16         2.49x
>>   4096          256         3.42x
>>   4096         1024         3.56x
>>   4096         4096         3.59x
>>   8192           16         2.48x
>>   8192          256         3.42x
>>   8192         1024         3.56x
>>   8192         4096         3.60x
>>   8192         8192         3.60x
>>
>> Acked-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
>> Tested-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
>> Signed-off-by: Jussi Kivilinna <jussi.kivili...@iki.fi>
>> ---
>>
>> Changes in v2:
>>  - Use ENTRY/ENDPROC
>>  - Don't provide Thumb2 version
>>
>> v3:
>>  - Changelog moved below '---'
>
> Hi Jussi,
>
> What is the status of these patches?
> Have you sent them to Russell's patch tracker?

I sent them to patch tracker moment ago. Thanks for the reminder.

-Jussi
[PATCH 2/2] [v3] crypto: sha1: add ARM NEON implementation
This patch adds ARM NEON assembly implementation of SHA-1 algorithm.

tcrypt benchmark results on Cortex-A8, sha1-arm-asm vs sha1-neon-asm:

block-size  bytes/update  old-vs-new
    16           16         1.04x
    64           16         1.02x
    64           64         1.05x
   256           16         1.03x
   256           64         1.04x
   256          256         1.30x
  1024           16         1.03x
  1024          256         1.36x
  1024         1024         1.52x
  2048           16         1.03x
  2048          256         1.39x
  2048         1024         1.55x
  2048         2048         1.59x
  4096           16         1.03x
  4096          256         1.40x
  4096         1024         1.57x
  4096         4096         1.62x
  8192           16         1.03x
  8192          256         1.40x
  8192         1024         1.58x
  8192         4096         1.63x
  8192         8192         1.63x

Acked-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
Tested-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
Signed-off-by: Jussi Kivilinna <jussi.kivili...@iki.fi>
---

Changes in v2:
 - Use ENTRY/ENDPROC
 - Don't provide Thumb2 version
 - Move constants to .text section
 - Further tweaks to implementation for ~10% speed-up.

v3:
 - Changelog moved below '---'
---
 arch/arm/crypto/Makefile           |    2
 arch/arm/crypto/sha1-armv7-neon.S  |  634
 arch/arm/crypto/sha1_glue.c        |    8
 arch/arm/crypto/sha1_neon_glue.c   |  197 +++
 arch/arm/include/asm/crypto/sha1.h |   10 +
 crypto/Kconfig                     |   11 +
 6 files changed, 859 insertions(+), 3 deletions(-)
 create mode 100644 arch/arm/crypto/sha1-armv7-neon.S
 create mode 100644 arch/arm/crypto/sha1_neon_glue.c
 create mode 100644 arch/arm/include/asm/crypto/sha1.h

diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index 81cda39..374956d 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -5,10 +5,12 @@
 obj-$(CONFIG_CRYPTO_AES_ARM) += aes-arm.o
 obj-$(CONFIG_CRYPTO_AES_ARM_BS) += aes-arm-bs.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
+obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
 
 aes-arm-y := aes-armv4.o aes_glue.o
 aes-arm-bs-y := aesbs-core.o aesbs-glue.o
 sha1-arm-y := sha1-armv4-large.o sha1_glue.o
+sha1-arm-neon-y := sha1-armv7-neon.o sha1_neon_glue.o
 
 quiet_cmd_perl = PERL    $@
       cmd_perl = $(PERL) $(<) > $(@)
diff --git a/arch/arm/crypto/sha1-armv7-neon.S b/arch/arm/crypto/sha1-armv7-neon.S
new file mode 100644
index 000..50013c0
--- /dev/null
+++ b/arch/arm/crypto/sha1-armv7-neon.S
@@ -0,0 +1,634 @@
+/* sha1-armv7-neon.S - ARM/NEON accelerated SHA-1 transform function
+ *
+ * Copyright © 2013-2014 Jussi Kivilinna <jussi.kivili...@iki.fi>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+#include <linux/linkage.h>
+
+
+.syntax unified
+.code   32
+.fpu neon
+
+.text
+
+
+/* Context structure */
+
+#define state_h0 0
+#define state_h1 4
+#define state_h2 8
+#define state_h3 12
+#define state_h4 16
+
+
+/* Constants */
+
+#define K1  0x5A827999
+#define K2  0x6ED9EBA1
+#define K3  0x8F1BBCDC
+#define K4  0xCA62C1D6
+.align 4
+.LK_VEC:
+.LK1:	.long K1, K1, K1, K1
+.LK2:	.long K2, K2, K2, K2
+.LK3:	.long K3, K3, K3, K3
+.LK4:	.long K4, K4, K4, K4
+
+
+/* Register macros */
+
+#define RSTATE r0
+#define RDATA r1
+#define RNBLKS r2
+#define ROLDSTACK r3
+#define RWK lr
+
+#define _a r4
+#define _b r5
+#define _c r6
+#define _d r7
+#define _e r8
+
+#define RT0 r9
+#define RT1 r10
+#define RT2 r11
+#define RT3 r12
+
+#define W0 q0
+#define W1 q1
+#define W2 q2
+#define W3 q3
+#define W4 q4
+#define W5 q5
+#define W6 q6
+#define W7 q7
+
+#define tmp0 q8
+#define tmp1 q9
+#define tmp2 q10
+#define tmp3 q11
+
+#define qK1 q12
+#define qK2 q13
+#define qK3 q14
+#define qK4 q15
+
+
+/* Round function macros. */
+
+#define WK_offs(i) (((i) & 15) * 4)
+
+#define _R_F1(a,b,c,d,e,i,pre1,pre2,pre3,i16,\
+	      W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28) \
+	ldr RT3, [sp, WK_offs(i)]; \
+		pre1(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+	bic RT0, d, b; \
+	add e, e, a, ror #(32 - 5); \
+	and RT1, c, b; \
+		pre2(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+	add RT0, RT0, RT3; \
+	add e, e, RT1; \
+	ror b, #(32 - 30); \
+		pre3(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+	add e, e, RT0;
+
+#define _R_F2(a,b,c,d,e,i,pre1,pre2,pre3,i16,\
+	      W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28) \
+	ldr RT3, [sp, WK_offs(i)]; \
+		pre1(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24
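The scalar part of the `_R_F1` macro above can be read as follows; this is a C restatement written for illustration, not code from the patch. It assumes the mask in `WK_offs` is `& 15` (a 16-entry ring of precomputed W+K words on the stack) and that `bic`/`and`/`add` together compute the SHA-1 "choose" function for the first 20 rounds. The helper names are made up.

```c
#include <stdint.h>

/* Textbook SHA-1 round function for rounds 0..19: Ch(b, c, d). */
static uint32_t sha1_f1_reference(uint32_t b, uint32_t c, uint32_t d)
{
	return (b & c) | (~b & d);
}

/*
 * The same value computed the way the macro does it:
 *   bic RT0, d, b   -> d & ~b
 *   and RT1, c, b   -> c & b
 * The two terms never share a set bit, so adding them (the macro adds
 * both into e) gives the same result as OR-ing them.
 */
static uint32_t sha1_f1_split(uint32_t b, uint32_t c, uint32_t d)
{
	uint32_t t0 = d & ~b;
	uint32_t t1 = c & b;

	return t0 + t1;
}

/* 'ror #(32 - n)' is a left rotate by n; valid for n in 1..31. */
static uint32_t rol32(uint32_t x, unsigned int n)
{
	return (x << n) | (x >> (32 - n));
}

/* Byte offset of round i's W+K word in a 16-entry ring (assumed mask). */
static unsigned int wk_offs(unsigned int i)
{
	return (i & 15) * 4;
}
```

Splitting Ch into two independent terms lets the adds into `e` be scheduled around the interleaved `pre1`/`pre2`/`pre3` NEON work instead of waiting for a combined value.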
[PATCH] [v3] crypto: sha512: add ARM NEON implementation
This patch adds ARM NEON assembly implementation of SHA-512 and SHA-384
algorithms.

tcrypt benchmark results on Cortex-A8, sha512-generic vs sha512-neon-asm:

block-size  bytes/update  old-vs-new
    16           16         2.99x
    64           16         2.67x
    64           64         3.00x
   256           16         2.64x
   256           64         3.06x
   256          256         3.33x
  1024           16         2.53x
  1024          256         3.39x
  1024         1024         3.52x
  2048           16         2.50x
  2048          256         3.41x
  2048         1024         3.54x
  2048         2048         3.57x
  4096           16         2.49x
  4096          256         3.42x
  4096         1024         3.56x
  4096         4096         3.59x
  8192           16         2.48x
  8192          256         3.42x
  8192         1024         3.56x
  8192         4096         3.60x
  8192         8192         3.60x

Acked-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
Tested-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
Signed-off-by: Jussi Kivilinna <jussi.kivili...@iki.fi>
---

Changes in v2:
 - Use ENTRY/ENDPROC
 - Don't provide Thumb2 version

v3:
 - Changelog moved below '---'
---
 arch/arm/crypto/Makefile            |    2
 arch/arm/crypto/sha512-armv7-neon.S |  455 +++
 arch/arm/crypto/sha512_neon_glue.c  |  305 +++
 crypto/Kconfig                      |   15 +
 4 files changed, 777 insertions(+)
 create mode 100644 arch/arm/crypto/sha512-armv7-neon.S
 create mode 100644 arch/arm/crypto/sha512_neon_glue.c

diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index 374956d..b48fa34 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -6,11 +6,13 @@ obj-$(CONFIG_CRYPTO_AES_ARM) += aes-arm.o
 obj-$(CONFIG_CRYPTO_AES_ARM_BS) += aes-arm-bs.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
+obj-$(CONFIG_CRYPTO_SHA512_ARM_NEON) += sha512-arm-neon.o
 
 aes-arm-y := aes-armv4.o aes_glue.o
 aes-arm-bs-y := aesbs-core.o aesbs-glue.o
 sha1-arm-y := sha1-armv4-large.o sha1_glue.o
 sha1-arm-neon-y := sha1-armv7-neon.o sha1_neon_glue.o
+sha512-arm-neon-y := sha512-armv7-neon.o sha512_neon_glue.o
 
 quiet_cmd_perl = PERL    $@
       cmd_perl = $(PERL) $(<) > $(@)
diff --git a/arch/arm/crypto/sha512-armv7-neon.S b/arch/arm/crypto/sha512-armv7-neon.S
new file mode 100644
index 000..fe99472
--- /dev/null
+++ b/arch/arm/crypto/sha512-armv7-neon.S
@@ -0,0 +1,455 @@
+/* sha512-armv7-neon.S - ARM/NEON assembly implementation of SHA-512 transform
+ *
+ * Copyright © 2013-2014 Jussi Kivilinna <jussi.kivili...@iki.fi>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+#include <linux/linkage.h>
+
+
+.syntax unified
+.code   32
+.fpu neon
+
+.text
+
+/* structure of SHA512_CONTEXT */
+#define hd_a 0
+#define hd_b ((hd_a) + 8)
+#define hd_c ((hd_b) + 8)
+#define hd_d ((hd_c) + 8)
+#define hd_e ((hd_d) + 8)
+#define hd_f ((hd_e) + 8)
+#define hd_g ((hd_f) + 8)
+
+/* register macros */
+#define RK %r2
+
+#define RA d0
+#define RB d1
+#define RC d2
+#define RD d3
+#define RE d4
+#define RF d5
+#define RG d6
+#define RH d7
+
+#define RT0 d8
+#define RT1 d9
+#define RT2 d10
+#define RT3 d11
+#define RT4 d12
+#define RT5 d13
+#define RT6 d14
+#define RT7 d15
+
+#define RT01q q4
+#define RT23q q5
+#define RT45q q6
+#define RT67q q7
+
+#define RW0 d16
+#define RW1 d17
+#define RW2 d18
+#define RW3 d19
+#define RW4 d20
+#define RW5 d21
+#define RW6 d22
+#define RW7 d23
+#define RW8 d24
+#define RW9 d25
+#define RW10 d26
+#define RW11 d27
+#define RW12 d28
+#define RW13 d29
+#define RW14 d30
+#define RW15 d31
+
+#define RW01q q8
+#define RW23q q9
+#define RW45q q10
+#define RW67q q11
+#define RW89q q12
+#define RW1011q q13
+#define RW1213q q14
+#define RW1415q q15
+
+/***
+ * ARM assembly implementation of sha512 transform
+ ***/
+#define rounds2_0_63(ra, rb, rc, rd, re, rf, rg, rh, rw0, rw1, rw01q, rw2, \
+                     rw23q, rw1415q, rw9, rw10, interleave_op, arg1) \
+	/* t1 = h + Sum1 (e) + Ch (e, f, g) + k[t] + w[t]; */ \
+	vshr.u64 RT2, re, #14; \
+	vshl.u64 RT3, re, #64 - 14; \
+	interleave_op(arg1); \
+	vshr.u64 RT4, re, #18; \
+	vshl.u64 RT5, re, #64 - 18; \
+	vld1.64 {RT0}, [RK]!; \
+	veor.64 RT23q, RT23q, RT45q; \
+	vshr.u64 RT4, re, #41; \
+	vshl.u64 RT5, re, #64 - 41; \
+	vadd.u64 RT0, RT0, rw0; \
+	veor.64 RT23q, RT23q, RT45q
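The `vshr.u64`/`vshl.u64` pairs in the macro above build 64-bit rotations out of two shifts, since the macro uses shift amounts 14, 18 and 41, which are the rotation counts of the SHA-512 Sum1 function. The C below restates that arithmetic for a single 64-bit word as a reading aid; the function names are chosen here and the code is not taken from the patch.

```c
#include <stdint.h>

/* Rotate right built from a shift pair, valid for n in 1..63. */
static uint64_t rotr64(uint64_t x, unsigned int n)
{
	return (x >> n) | (x << (64 - n));
}

/* SHA-512 Sum1(e) = ROTR14(e) ^ ROTR18(e) ^ ROTR41(e). */
static uint64_t sha512_sum1(uint64_t e)
{
	return rotr64(e, 14) ^ rotr64(e, 18) ^ rotr64(e, 41);
}

/* SHA-512 Ch(e, f, g): pick bits of f where e is set, else bits of g. */
static uint64_t sha512_ch(uint64_t e, uint64_t f, uint64_t g)
{
	return (e & f) ^ (~e & g);
}
```

In the assembly the two halves of each rotation sit in adjacent d registers, so a single q-register `veor` folds a right-shifted and a left-shifted half at once; the final combination of the halves lies beyond the truncated part of the listing.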
[PATCH 1/2] [v3] crypto: sha1/ARM: make use of common SHA-1 structures
Common SHA-1 structures are defined in <crypto/sha.h> for code sharing.

This patch changes SHA-1/ARM glue code to use these structures.

Acked-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
Signed-off-by: Jussi Kivilinna <jussi.kivili...@iki.fi>
---
 arch/arm/crypto/sha1_glue.c |   50 +++
 1 file changed, 22 insertions(+), 28 deletions(-)

diff --git a/arch/arm/crypto/sha1_glue.c b/arch/arm/crypto/sha1_glue.c
index 76cd976..c494e57 100644
--- a/arch/arm/crypto/sha1_glue.c
+++ b/arch/arm/crypto/sha1_glue.c
@@ -24,31 +24,25 @@
 #include <crypto/sha.h>
 #include <asm/byteorder.h>
 
-struct SHA1_CTX {
-	uint32_t h0,h1,h2,h3,h4;
-	u64 count;
-	u8 data[SHA1_BLOCK_SIZE];
-};
 
-asmlinkage void sha1_block_data_order(struct SHA1_CTX *digest,
+asmlinkage void sha1_block_data_order(u32 *digest,
 		const unsigned char *data, unsigned int rounds);
 
 
 static int sha1_init(struct shash_desc *desc)
 {
-	struct SHA1_CTX *sctx = shash_desc_ctx(desc);
-	memset(sctx, 0, sizeof(*sctx));
-	sctx->h0 = SHA1_H0;
-	sctx->h1 = SHA1_H1;
-	sctx->h2 = SHA1_H2;
-	sctx->h3 = SHA1_H3;
-	sctx->h4 = SHA1_H4;
+	struct sha1_state *sctx = shash_desc_ctx(desc);
+
+	*sctx = (struct sha1_state){
+		.state = { SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4 },
+	};
+
 	return 0;
 }
 
 
-static int __sha1_update(struct SHA1_CTX *sctx, const u8 *data,
-			       unsigned int len, unsigned int partial)
+static int __sha1_update(struct sha1_state *sctx, const u8 *data,
+			 unsigned int len, unsigned int partial)
 {
 	unsigned int done = 0;
 
@@ -56,17 +50,17 @@ static int __sha1_update(struct SHA1_CTX *sctx, const u8 *data,
 
 	if (partial) {
 		done = SHA1_BLOCK_SIZE - partial;
-		memcpy(sctx->data + partial, data, done);
-		sha1_block_data_order(sctx, sctx->data, 1);
+		memcpy(sctx->buffer + partial, data, done);
+		sha1_block_data_order(sctx->state, sctx->buffer, 1);
 	}
 
 	if (len - done >= SHA1_BLOCK_SIZE) {
 		const unsigned int rounds = (len - done) / SHA1_BLOCK_SIZE;
-		sha1_block_data_order(sctx, data + done, rounds);
+		sha1_block_data_order(sctx->state, data + done, rounds);
 		done += rounds * SHA1_BLOCK_SIZE;
 	}
 
-	memcpy(sctx->data, data + done, len - done);
+	memcpy(sctx->buffer, data + done, len - done);
 	return 0;
 }
 
@@ -74,14 +68,14 @@ static int __sha1_update(struct SHA1_CTX *sctx, const u8 *data,
 static int sha1_update(struct shash_desc *desc, const u8 *data,
 			     unsigned int len)
 {
-	struct SHA1_CTX *sctx = shash_desc_ctx(desc);
+	struct sha1_state *sctx = shash_desc_ctx(desc);
 	unsigned int partial = sctx->count % SHA1_BLOCK_SIZE;
 	int res;
 
 	/* Handle the fast case right here */
 	if (partial + len < SHA1_BLOCK_SIZE) {
 		sctx->count += len;
-		memcpy(sctx->data + partial, data, len);
+		memcpy(sctx->buffer + partial, data, len);
 		return 0;
 	}
 	res = __sha1_update(sctx, data, len, partial);
@@ -92,7 +86,7 @@ static int sha1_update(struct shash_desc *desc, const u8 *data,
 /* Add padding and return the message digest. */
 static int sha1_final(struct shash_desc *desc, u8 *out)
 {
-	struct SHA1_CTX *sctx = shash_desc_ctx(desc);
+	struct sha1_state *sctx = shash_desc_ctx(desc);
 	unsigned int i, index, padlen;
 	__be32 *dst = (__be32 *)out;
 	__be64 bits;
@@ -106,7 +100,7 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
 	/* We need to fill a whole block for __sha1_update() */
 	if (padlen <= 56) {
 		sctx->count += padlen;
-		memcpy(sctx->data + index, padding, padlen);
+		memcpy(sctx->buffer + index, padding, padlen);
 	} else {
 		__sha1_update(sctx, padding, padlen, index);
 	}
@@ -114,7 +108,7 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
 
 	/* Store state in digest */
 	for (i = 0; i < 5; i++)
-		dst[i] = cpu_to_be32(((u32 *)sctx)[i]);
+		dst[i] = cpu_to_be32(sctx->state[i]);
 
 	/* Wipe context */
 	memset(sctx, 0, sizeof(*sctx));
@@ -124,7 +118,7 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
 
 static int sha1_export(struct shash_desc *desc, void *out)
 {
-	struct SHA1_CTX *sctx = shash_desc_ctx(desc);
+	struct sha1_state *sctx = shash_desc_ctx(desc);
 	memcpy(out, sctx, sizeof(*sctx));
 	return 0;
 }
@@ -132,7 +126,7 @@ static int sha1_export(struct shash_desc *desc, void *out)
 
 static int sha1_import(struct shash_desc *desc, const void *in)
 {
-	struct SHA1_CTX *sctx = shash_desc_ctx(desc);
+	struct sha1_state *sctx = shash_desc_ctx(desc
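The `padlen` value used by sha1_final() in the diff above is computed outside the quoted hunks. The sketch below assumes the standard SHA-1 rule (pad so that exactly 8 bytes remain in the final block for the bit-length field); it is a standalone illustration with a made-up helper name, not a copy of the glue code.

```c
#define SHA1_BLOCK_SIZE 64

/*
 * Number of padding bytes (a 0x80 byte followed by zeros) needed after
 * `count` message bytes so that the 8-byte length field ends exactly on
 * a block boundary. Always in the range 1..64.
 */
static unsigned int sha1_padlen(unsigned long long count)
{
	unsigned int index = (unsigned int)(count % SHA1_BLOCK_SIZE);

	return (index < 56) ? (56 - index) : ((SHA1_BLOCK_SIZE + 56) - index);
}
```

With this rule the padding fits in the current buffer whenever the buffer holds fewer than 56 bytes; otherwise it spills over a block boundary and a block has to be processed before the length is appended.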
Re: [PATCH] [v3] crypto: sha512: add ARM NEON implementation
On 30.06.2014 21:13, Ard Biesheuvel wrote:
> On 30 June 2014 18:39, Jussi Kivilinna <jussi.kivili...@iki.fi> wrote:
>> This patch adds ARM NEON assembly implementation of SHA-512 and SHA-384
>> algorithms.
>>
>> tcrypt benchmark results on Cortex-A8, sha512-generic vs sha512-neon-asm:
>>
>> block-size  bytes/update  old-vs-new
>>     16           16         2.99x
>>     64           16         2.67x
>>     64           64         3.00x
>>    256           16         2.64x
>>    256           64         3.06x
>>    256          256         3.33x
>>   1024           16         2.53x
>>   1024          256         3.39x
>>   1024         1024         3.52x
>>   2048           16         2.50x
>>   2048          256         3.41x
>>   2048         1024         3.54x
>>   2048         2048         3.57x
>>   4096           16         2.49x
>>   4096          256         3.42x
>>   4096         1024         3.56x
>>   4096         4096         3.59x
>>   8192           16         2.48x
>>   8192          256         3.42x
>>   8192         1024         3.56x
>>   8192         4096         3.60x
>>   8192         8192         3.60x
>>
>> Acked-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
>> Tested-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
>> Signed-off-by: Jussi Kivilinna <jussi.kivili...@iki.fi>
>
> Likewise for this one: if nobody has any more comments, this should go
> into the patch system.
>
> One remaining question though: is this code (and the SHA1 code) known to
> be broken for big endian or just untested?

Untested and probably broken, so therefore I've disabled when
CPU_BIG_ENDIAN=y.

-Jussi

> Thanks,
> Ard.
Re: [PATCH 2/2] crypto: sha1: add ARM NEON implementation
On 28.06.2014 23:07, Ard Biesheuvel wrote:
> Hi Jussi,
>
> On 28 June 2014 12:40, Jussi Kivilinna <jussi.kivili...@iki.fi> wrote:
>> This patch adds ARM NEON assembly implementation of SHA-1 algorithm.
>>
>> tcrypt benchmark results on Cortex-A8, sha1-arm-asm vs sha1-neon-asm:
>>
>> block-size  bytes/update  old-vs-new
>>     16           16         1.06x
>>     64           16         1.05x
>>     64           64         1.09x
>>    256           16         1.04x
>>    256           64         1.11x
>>    256          256         1.28x
>>   1024           16         1.04x
>>   1024          256         1.34x
>>   1024         1024         1.42x
>>   2048           16         1.04x
>>   2048          256         1.35x
>>   2048         1024         1.44x
>>   2048         2048         1.46x
>>   4096           16         1.04x
>>   4096          256         1.36x
>>   4096         1024         1.45x
>>   4096         4096         1.48x
>>   8192           16         1.04x
>>   8192          256         1.36x
>>   8192         1024         1.46x
>>   8192         4096         1.49x
>>   8192         8192         1.49x
>
> This is a nice result: about the same speedup as OpenSSL when comparing
> the ALU asm implementation with the NEON.
>
>> Signed-off-by: Jussi Kivilinna <jussi.kivili...@iki.fi>
>> ---
>>  arch/arm/crypto/Makefile           |    2
>>  arch/arm/crypto/sha1-armv7-neon.S  |  635
>>  arch/arm/crypto/sha1_glue.c        |    8
>>  arch/arm/crypto/sha1_neon_glue.c   |  197 +++
>>  arch/arm/include/asm/crypto/sha1.h |   10 +
>>  crypto/Kconfig                     |   11 +
>>  6 files changed, 860 insertions(+), 3 deletions(-)
>>  create mode 100644 arch/arm/crypto/sha1-armv7-neon.S
>>  create mode 100644 arch/arm/crypto/sha1_neon_glue.c
>>  create mode 100644 arch/arm/include/asm/crypto/sha1.h
>>
>> diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
>> index 81cda39..374956d 100644
>> --- a/arch/arm/crypto/Makefile
>> +++ b/arch/arm/crypto/Makefile
>> @@ -5,10 +5,12 @@
>>  obj-$(CONFIG_CRYPTO_AES_ARM) += aes-arm.o
>>  obj-$(CONFIG_CRYPTO_AES_ARM_BS) += aes-arm-bs.o
>>  obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
>> +obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
>>
>>  aes-arm-y := aes-armv4.o aes_glue.o
>>  aes-arm-bs-y := aesbs-core.o aesbs-glue.o
>>  sha1-arm-y := sha1-armv4-large.o sha1_glue.o
>> +sha1-arm-neon-y := sha1-armv7-neon.o sha1_neon_glue.o
>>
>>  quiet_cmd_perl = PERL    $@
>>        cmd_perl = $(PERL) $(<) > $(@)
>> diff --git a/arch/arm/crypto/sha1-armv7-neon.S b/arch/arm/crypto/sha1-armv7-neon.S
>> new file mode 100644
>> index 000..beb1ed1
>> --- /dev/null
>> +++ b/arch/arm/crypto/sha1-armv7-neon.S
>> @@ -0,0 +1,635 @@
>> +/* sha1-armv7-neon.S - ARM/NEON accelerated SHA-1 transform function
>> + *
>> + * Copyright © 2013-2014 Jussi Kivilinna <jussi.kivili...@iki.fi>
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms of the GNU General Public License as published by the Free
>> + * Software Foundation; either version 2 of the License, or (at your option)
>> + * any later version.
>> + */
>> +
>> +.syntax unified
>> +#ifdef __thumb2__
>> +.thumb
>> +#else
>> +.code   32
>> +#endif
>
> This is all NEON code, which has no size benefit from being assembled as
> Thumb-2. (NEON instructions are 4 bytes in either case)
> If we drop the Thumb-2 versions, there's one less version to test.

Ok, I'll drop the .thumb part for both SHA1 and SHA512.

>> +.fpu neon
>> +
>> +.data
>> +
>> +#define GET_DATA_POINTER(reg, name, rtmp) ldr reg, =name
>> +
>
> [...]
>
>> +.align 4
>> +.LK_VEC:
>> +.LK1:	.long K1, K1, K1, K1
>> +.LK2:	.long K2, K2, K2, K2
>> +.LK3:	.long K3, K3, K3, K3
>> +.LK4:	.long K4, K4, K4, K4
>
> If you are going to put these constants in a different section, they
> belong in .rodata not .data.
> But why not just keep them in .text? In that case, you can replace the
> above 'ldr reg, =name' with 'adr reg ,name' (or adrl if required) and get
> rid of the .ltorg and the literal pool.

Ok, I'll move these to .text.

Actually I realized that these values can be loaded to still free NEON
registers for additional speed up.

>> +/*
>> + * Transform nblks*64 bytes (nblks*16 32-bit words) at DATA.
>> + *
>> + * unsigned int
>> + * sha1_transform_neon (void *ctx, const unsigned char *data,
>> + *                      unsigned int nblks)
>> + */
>> +.align 3
>> +.globl sha1_transform_neon
>> +.type  sha1_transform_neon,%function;
>> +
>> +sha1_transform_neon:
>
> ENTRY(sha1_transform_neon) [and matching ENDPROC() below]

Sure.

-Jussi
[PATCH 2/2] [v2] crypto: sha1: add ARM NEON implementation
This patch adds an ARM NEON assembly implementation of the SHA-1 algorithm.

tcrypt benchmark results on Cortex-A8, sha1-arm-asm vs sha1-neon-asm:

block-size      bytes/update    old-vs-new
16              16              1.04x
64              16              1.02x
64              64              1.05x
256             16              1.03x
256             64              1.04x
256             256             1.30x
1024            16              1.03x
1024            256             1.36x
1024            1024            1.52x
2048            16              1.03x
2048            256             1.39x
2048            1024            1.55x
2048            2048            1.59x
4096            16              1.03x
4096            256             1.40x
4096            1024            1.57x
4096            4096            1.62x
8192            16              1.03x
8192            256             1.40x
8192            1024            1.58x
8192            4096            1.63x
8192            8192            1.63x

Changes in v2:
 - Use ENTRY/ENDPROC
 - Don't provide Thumb2 version
 - Move constants to .text section
 - Further tweaks to implementation for ~10% speed-up.

Signed-off-by: Jussi Kivilinna <jussi.kivili...@iki.fi>
---
 arch/arm/crypto/Makefile           |    2
 arch/arm/crypto/sha1-armv7-neon.S  |  634
 arch/arm/crypto/sha1_glue.c        |    8
 arch/arm/crypto/sha1_neon_glue.c   |  197 +++
 arch/arm/include/asm/crypto/sha1.h |   10 +
 crypto/Kconfig                     |   11 +
 6 files changed, 859 insertions(+), 3 deletions(-)
 create mode 100644 arch/arm/crypto/sha1-armv7-neon.S
 create mode 100644 arch/arm/crypto/sha1_neon_glue.c
 create mode 100644 arch/arm/include/asm/crypto/sha1.h

diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index 81cda39..374956d 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -5,10 +5,12 @@
 obj-$(CONFIG_CRYPTO_AES_ARM) += aes-arm.o
 obj-$(CONFIG_CRYPTO_AES_ARM_BS) += aes-arm-bs.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
+obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
 
 aes-arm-y := aes-armv4.o aes_glue.o
 aes-arm-bs-y := aesbs-core.o aesbs-glue.o
 sha1-arm-y := sha1-armv4-large.o sha1_glue.o
+sha1-arm-neon-y:= sha1-armv7-neon.o sha1_neon_glue.o
 
 quiet_cmd_perl = PERL    $@
 cmd_perl = $(PERL) $(<) > $(@)
diff --git a/arch/arm/crypto/sha1-armv7-neon.S b/arch/arm/crypto/sha1-armv7-neon.S
new file mode 100644
index 000..50013c0
--- /dev/null
+++ b/arch/arm/crypto/sha1-armv7-neon.S
@@ -0,0 +1,634 @@
+/* sha1-armv7-neon.S - ARM/NEON accelerated SHA-1 transform function
+ *
+ * Copyright © 2013-2014 Jussi Kivilinna <jussi.kivili...@iki.fi>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+#include <linux/linkage.h>
+
+
+.syntax unified
+.code 32
+.fpu neon
+
+.text
+
+
+/* Context structure */
+
+#define state_h0 0
+#define state_h1 4
+#define state_h2 8
+#define state_h3 12
+#define state_h4 16
+
+
+/* Constants */
+
+#define K1 0x5A827999
+#define K2 0x6ED9EBA1
+#define K3 0x8F1BBCDC
+#define K4 0xCA62C1D6
+.align 4
+.LK_VEC:
+.LK1: .long K1, K1, K1, K1
+.LK2: .long K2, K2, K2, K2
+.LK3: .long K3, K3, K3, K3
+.LK4: .long K4, K4, K4, K4
+
+
+/* Register macros */
+
+#define RSTATE r0
+#define RDATA r1
+#define RNBLKS r2
+#define ROLDSTACK r3
+#define RWK lr
+
+#define _a r4
+#define _b r5
+#define _c r6
+#define _d r7
+#define _e r8
+
+#define RT0 r9
+#define RT1 r10
+#define RT2 r11
+#define RT3 r12
+
+#define W0 q0
+#define W1 q1
+#define W2 q2
+#define W3 q3
+#define W4 q4
+#define W5 q5
+#define W6 q6
+#define W7 q7
+
+#define tmp0 q8
+#define tmp1 q9
+#define tmp2 q10
+#define tmp3 q11
+
+#define qK1 q12
+#define qK2 q13
+#define qK3 q14
+#define qK4 q15
+
+
+/* Round function macros. */
+
+#define WK_offs(i) (((i) & 15) * 4)
+
+#define _R_F1(a,b,c,d,e,i,pre1,pre2,pre3,i16,\
+	      W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28) \
+	ldr RT3, [sp, WK_offs(i)]; \
+	pre1(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+	bic RT0, d, b; \
+	add e, e, a, ror #(32 - 5); \
+	and RT1, c, b; \
+	pre2(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+	add RT0, RT0, RT3; \
+	add e, e, RT1; \
+	ror b, #(32 - 30); \
+	pre3(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+	add e, e, RT0;
+
+#define _R_F2(a,b,c,d,e,i,pre1,pre2,pre3,i16,\
+	      W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28) \
+	ldr RT3, [sp, WK_offs(i)]; \
+	pre1(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+	eor RT0, d, b; \
+	add e, e, a, ror #(32 - 5); \
+	eor RT0, RT0, c; \
+	pre2(i16,W,W_m04,W_m08,W_m12
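For readers following the _R_F1 macro above, here is a minimal C sketch of what one such round computes. It is an illustration under assumptions, not the code from the patch: the helper names rol32, wk_offs and sha1_round_f1 are hypothetical, and the pre1/pre2/pre3 hooks (which interleave NEON message-schedule work) are left out.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rol32(uint32_t x, unsigned n)
{
	return (x << n) | (x >> (32 - n));
}

/* Byte offset of the precomputed W[i]+K word in a 16-entry ring buffer,
 * mirroring WK_offs(i) = (i & 15) * 4. */
static unsigned wk_offs(unsigned i)
{
	return (i & 15) * 4;
}

/* One SHA-1 round for rounds 0..19 (hypothetical helper):
 *   e += rol(a, 5) + Ch(b, c, d) + wk;  b = rol(b, 30);
 * Ch is built as (c & b) + (d & ~b). The two terms never share a set bit,
 * so the addition gives the same result as a bitwise OR, which is why the
 * assembly can use plain 'add' instructions. */
static void sha1_round_f1(uint32_t a, uint32_t *b, uint32_t c, uint32_t d,
			  uint32_t *e, uint32_t wk)
{
	uint32_t f = (c & *b) + (d & ~*b);

	*e += rol32(a, 5) + f + wk;
	*b = rol32(*b, 30);
}
```

The caller rotates the roles of a..e between rounds instead of moving values, which is what the a,b,c,d,e macro parameters allow in the assembly.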
[PATCH 1/2] [v2] crypto: sha1/ARM: make use of common SHA-1 structures
Common SHA-1 structures are defined in crypto/sha.h for code sharing.

This patch changes SHA-1/ARM glue code to use these structures.

Acked-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
Signed-off-by: Jussi Kivilinna <jussi.kivili...@iki.fi>
---
 arch/arm/crypto/sha1_glue.c |   50 +++
 1 file changed, 22 insertions(+), 28 deletions(-)

diff --git a/arch/arm/crypto/sha1_glue.c b/arch/arm/crypto/sha1_glue.c
index 76cd976..c494e57 100644
--- a/arch/arm/crypto/sha1_glue.c
+++ b/arch/arm/crypto/sha1_glue.c
@@ -24,31 +24,25 @@
 #include <crypto/sha.h>
 #include <asm/byteorder.h>
-struct SHA1_CTX {
-	uint32_t h0,h1,h2,h3,h4;
-	u64 count;
-	u8 data[SHA1_BLOCK_SIZE];
-};
-asmlinkage void sha1_block_data_order(struct SHA1_CTX *digest,
+asmlinkage void sha1_block_data_order(u32 *digest,
 		const unsigned char *data, unsigned int rounds);
 static int sha1_init(struct shash_desc *desc)
 {
-	struct SHA1_CTX *sctx = shash_desc_ctx(desc);
-	memset(sctx, 0, sizeof(*sctx));
-	sctx->h0 = SHA1_H0;
-	sctx->h1 = SHA1_H1;
-	sctx->h2 = SHA1_H2;
-	sctx->h3 = SHA1_H3;
-	sctx->h4 = SHA1_H4;
+	struct sha1_state *sctx = shash_desc_ctx(desc);
+
+	*sctx = (struct sha1_state){
+		.state = { SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4 },
+	};
+
 	return 0;
 }
-static int __sha1_update(struct SHA1_CTX *sctx, const u8 *data,
-			       unsigned int len, unsigned int partial)
+static int __sha1_update(struct sha1_state *sctx, const u8 *data,
+			 unsigned int len, unsigned int partial)
 {
 	unsigned int done = 0;
@@ -56,17 +50,17 @@ static int __sha1_update(struct SHA1_CTX *sctx, const u8 *data,
 	if (partial) {
 		done = SHA1_BLOCK_SIZE - partial;
-		memcpy(sctx->data + partial, data, done);
-		sha1_block_data_order(sctx, sctx->data, 1);
+		memcpy(sctx->buffer + partial, data, done);
+		sha1_block_data_order(sctx->state, sctx->buffer, 1);
 	}
 	if (len - done >= SHA1_BLOCK_SIZE) {
 		const unsigned int rounds = (len - done) / SHA1_BLOCK_SIZE;
-		sha1_block_data_order(sctx, data + done, rounds);
+		sha1_block_data_order(sctx->state, data + done, rounds);
 		done += rounds * SHA1_BLOCK_SIZE;
 	}
-	memcpy(sctx->data, data + done, len - done);
+	memcpy(sctx->buffer, data + done, len - done);
 	return 0;
 }
@@ -74,14 +68,14 @@ static int __sha1_update(struct SHA1_CTX *sctx, const u8 *data,
 static int sha1_update(struct shash_desc *desc, const u8 *data,
 		unsigned int len)
 {
-	struct SHA1_CTX *sctx = shash_desc_ctx(desc);
+	struct sha1_state *sctx = shash_desc_ctx(desc);
 	unsigned int partial = sctx->count % SHA1_BLOCK_SIZE;
 	int res;
 	/* Handle the fast case right here */
 	if (partial + len < SHA1_BLOCK_SIZE) {
 		sctx->count += len;
-		memcpy(sctx->data + partial, data, len);
+		memcpy(sctx->buffer + partial, data, len);
 		return 0;
 	}
 	res = __sha1_update(sctx, data, len, partial);
@@ -92,7 +86,7 @@ static int sha1_update(struct shash_desc *desc, const u8 *data,
 /* Add padding and return the message digest. */
 static int sha1_final(struct shash_desc *desc, u8 *out)
 {
-	struct SHA1_CTX *sctx = shash_desc_ctx(desc);
+	struct sha1_state *sctx = shash_desc_ctx(desc);
 	unsigned int i, index, padlen;
 	__be32 *dst = (__be32 *)out;
 	__be64 bits;
@@ -106,7 +100,7 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
 	/* We need to fill a whole block for __sha1_update() */
 	if (padlen <= 56) {
 		sctx->count += padlen;
-		memcpy(sctx->data + index, padding, padlen);
+		memcpy(sctx->buffer + index, padding, padlen);
 	} else {
 		__sha1_update(sctx, padding, padlen, index);
 	}
@@ -114,7 +108,7 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
 	/* Store state in digest */
 	for (i = 0; i < 5; i++)
-		dst[i] = cpu_to_be32(((u32 *)sctx)[i]);
+		dst[i] = cpu_to_be32(sctx->state[i]);
 	/* Wipe context */
 	memset(sctx, 0, sizeof(*sctx));
@@ -124,7 +118,7 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
 static int sha1_export(struct shash_desc *desc, void *out)
 {
-	struct SHA1_CTX *sctx = shash_desc_ctx(desc);
+	struct sha1_state *sctx = shash_desc_ctx(desc);
 	memcpy(out, sctx, sizeof(*sctx));
 	return 0;
 }
@@ -132,7 +126,7 @@ static int sha1_export(struct shash_desc *desc, void *out)
 static int sha1_import(struct shash_desc *desc, const void *in)
 {
-	struct SHA1_CTX *sctx = shash_desc_ctx(desc);
+	struct sha1_state *sctx = shash_desc_ctx(desc
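The buffering logic that the glue code applies around sha1_block_data_order() can be sketched in plain C as follows. This is a user-space illustration only: struct hash_state, transform() and hash_update() are hypothetical names, and the transform is a stand-in that merely counts blocks instead of hashing them.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 64

/* Hypothetical state, loosely modelled on struct sha1_state. */
struct hash_state {
	uint64_t count;              /* total bytes seen so far */
	uint8_t buffer[BLOCK_SIZE];  /* bytes not yet forming a full block */
	unsigned blocks_done;        /* stand-in for the real digest words */
};

/* Stand-in for the assembly block function: only counts blocks. */
static void transform(struct hash_state *st, const uint8_t *data,
		      unsigned blocks)
{
	(void)data;
	st->blocks_done += blocks;
}

static void hash_update(struct hash_state *st, const uint8_t *data,
			unsigned len)
{
	unsigned partial = st->count % BLOCK_SIZE;
	unsigned done = 0;

	st->count += len;

	/* Fast case: still less than one block, just buffer the input. */
	if (partial + len < BLOCK_SIZE) {
		memcpy(st->buffer + partial, data, len);
		return;
	}

	/* Complete the previously buffered partial block first. */
	if (partial) {
		done = BLOCK_SIZE - partial;
		memcpy(st->buffer + partial, data, done);
		transform(st, st->buffer, 1);
	}

	/* Process whole blocks directly from the caller's data. */
	if (len - done >= BLOCK_SIZE) {
		unsigned blocks = (len - done) / BLOCK_SIZE;

		transform(st, data + done, blocks);
		done += blocks * BLOCK_SIZE;
	}

	/* Keep the tail for the next call. */
	memcpy(st->buffer, data + done, len - done);
}
```

Passing only the digest words and the data pointer to the block function, as the patch does with sctx->state, keeps the assembly independent of the layout of the surrounding context structure.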
[PATCH] [v2] crypto: sha512: add ARM NEON implementation
This patch adds an ARM NEON assembly implementation of the SHA-512 and SHA-384
algorithms.

tcrypt benchmark results on Cortex-A8, sha512-generic vs sha512-neon-asm:

block-size      bytes/update    old-vs-new
16              16              2.99x
64              16              2.67x
64              64              3.00x
256             16              2.64x
256             64              3.06x
256             256             3.33x
1024            16              2.53x
1024            256             3.39x
1024            1024            3.52x
2048            16              2.50x
2048            256             3.41x
2048            1024            3.54x
2048            2048            3.57x
4096            16              2.49x
4096            256             3.42x
4096            1024            3.56x
4096            4096            3.59x
8192            16              2.48x
8192            256             3.42x
8192            1024            3.56x
8192            4096            3.60x
8192            8192            3.60x

Changes in v2:
 - Use ENTRY/ENDPROC
 - Don't provide Thumb2 version

Signed-off-by: Jussi Kivilinna <jussi.kivili...@iki.fi>
---
 arch/arm/crypto/Makefile            |    2
 arch/arm/crypto/sha512-armv7-neon.S |  455 +++
 arch/arm/crypto/sha512_neon_glue.c  |  305 +++
 crypto/Kconfig                      |   15 +
 4 files changed, 777 insertions(+)
 create mode 100644 arch/arm/crypto/sha512-armv7-neon.S
 create mode 100644 arch/arm/crypto/sha512_neon_glue.c

diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index 374956d..b48fa34 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -6,11 +6,13 @@ obj-$(CONFIG_CRYPTO_AES_ARM) += aes-arm.o
 obj-$(CONFIG_CRYPTO_AES_ARM_BS) += aes-arm-bs.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
+obj-$(CONFIG_CRYPTO_SHA512_ARM_NEON) += sha512-arm-neon.o
 
 aes-arm-y := aes-armv4.o aes_glue.o
 aes-arm-bs-y := aesbs-core.o aesbs-glue.o
 sha1-arm-y := sha1-armv4-large.o sha1_glue.o
 sha1-arm-neon-y:= sha1-armv7-neon.o sha1_neon_glue.o
+sha512-arm-neon-y := sha512-armv7-neon.o sha512_neon_glue.o
 
 quiet_cmd_perl = PERL    $@
 cmd_perl = $(PERL) $(<) > $(@)
diff --git a/arch/arm/crypto/sha512-armv7-neon.S b/arch/arm/crypto/sha512-armv7-neon.S
new file mode 100644
index 000..fe99472
--- /dev/null
+++ b/arch/arm/crypto/sha512-armv7-neon.S
@@ -0,0 +1,455 @@
+/* sha512-armv7-neon.S - ARM/NEON assembly implementation of SHA-512 transform
+ *
+ * Copyright © 2013-2014 Jussi Kivilinna <jussi.kivili...@iki.fi>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+#include <linux/linkage.h>
+
+
+.syntax unified
+.code 32
+.fpu neon
+
+.text
+
+/* structure of SHA512_CONTEXT */
+#define hd_a 0
+#define hd_b ((hd_a) + 8)
+#define hd_c ((hd_b) + 8)
+#define hd_d ((hd_c) + 8)
+#define hd_e ((hd_d) + 8)
+#define hd_f ((hd_e) + 8)
+#define hd_g ((hd_f) + 8)
+
+/* register macros */
+#define RK %r2
+
+#define RA d0
+#define RB d1
+#define RC d2
+#define RD d3
+#define RE d4
+#define RF d5
+#define RG d6
+#define RH d7
+
+#define RT0 d8
+#define RT1 d9
+#define RT2 d10
+#define RT3 d11
+#define RT4 d12
+#define RT5 d13
+#define RT6 d14
+#define RT7 d15
+
+#define RT01q q4
+#define RT23q q5
+#define RT45q q6
+#define RT67q q7
+
+#define RW0 d16
+#define RW1 d17
+#define RW2 d18
+#define RW3 d19
+#define RW4 d20
+#define RW5 d21
+#define RW6 d22
+#define RW7 d23
+#define RW8 d24
+#define RW9 d25
+#define RW10 d26
+#define RW11 d27
+#define RW12 d28
+#define RW13 d29
+#define RW14 d30
+#define RW15 d31
+
+#define RW01q q8
+#define RW23q q9
+#define RW45q q10
+#define RW67q q11
+#define RW89q q12
+#define RW1011q q13
+#define RW1213q q14
+#define RW1415q q15
+
+/***
+ * ARM assembly implementation of sha512 transform
+ ***/
+#define rounds2_0_63(ra, rb, rc, rd, re, rf, rg, rh, rw0, rw1, rw01q, rw2, \
+		     rw23q, rw1415q, rw9, rw10, interleave_op, arg1) \
+	/* t1 = h + Sum1 (e) + Ch (e, f, g) + k[t] + w[t]; */ \
+	vshr.u64 RT2, re, #14; \
+	vshl.u64 RT3, re, #64 - 14; \
+	interleave_op(arg1); \
+	vshr.u64 RT4, re, #18; \
+	vshl.u64 RT5, re, #64 - 18; \
+	vld1.64 {RT0}, [RK]!; \
+	veor.64 RT23q, RT23q, RT45q; \
+	vshr.u64 RT4, re, #41; \
+	vshl.u64 RT5, re, #64 - 41; \
+	vadd.u64 RT0, RT0, rw0; \
+	veor.64 RT23q, RT23q, RT45q; \
+	vmov.64 RT7, re; \
+	veor.64 RT1, RT2, RT3; \
+	vbsl.64 RT7, rf, rg; \
+	\
+	vadd.u64 RT1, RT1, rh; \
+	vshr.u64 RT2
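The comment in the macro above, "t1 = h + Sum1 (e) + Ch (e, f, g) + k[t] + w[t]", can be spelled out in scalar C. The sketch below is an illustration with hypothetical helper names (ror64, sha512_sum1, sha512_ch, sha512_t1); it follows the standard SHA-512 definitions and is not taken from the patch.

```c
#include <stdint.h>

/* Rotate right by n bits (0 < n < 64), built from a right shift and a left
 * shift in the same way the NEON code pairs vshr.u64 with vshl.u64. */
static uint64_t ror64(uint64_t x, unsigned n)
{
	return (x >> n) | (x << (64 - n));
}

/* Sum1(e) = ROTR14(e) ^ ROTR18(e) ^ ROTR41(e) */
static uint64_t sha512_sum1(uint64_t e)
{
	return ror64(e, 14) ^ ror64(e, 18) ^ ror64(e, 41);
}

/* Ch(e, f, g): each bit of e selects the matching bit of f (if set) or of g
 * (if clear). A bitwise select like vbsl.64 does this in one instruction. */
static uint64_t sha512_ch(uint64_t e, uint64_t f, uint64_t g)
{
	return (e & f) | (~e & g);
}

/* t1 for one round; k is the round constant, w the schedule word. */
static uint64_t sha512_t1(uint64_t h, uint64_t e, uint64_t f, uint64_t g,
			  uint64_t k, uint64_t w)
{
	return h + sha512_sum1(e) + sha512_ch(e, f, g) + k + w;
}
```

Because the two halves of each rotate never overlap, combining them with XOR (as the veor instructions do) gives the same result as the OR used in ror64() here.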
[PATCH 1/2] crypto: sha1/ARM: make use of common SHA-1 structures
Common SHA-1 structures are defined in crypto/sha.h for code sharing.

This patch changes SHA-1/ARM glue code to use these structures.

Signed-off-by: Jussi Kivilinna <jussi.kivili...@iki.fi>
---
 arch/arm/crypto/sha1_glue.c |   50 +++
 1 file changed, 22 insertions(+), 28 deletions(-)

diff --git a/arch/arm/crypto/sha1_glue.c b/arch/arm/crypto/sha1_glue.c
index 76cd976..c494e57 100644
--- a/arch/arm/crypto/sha1_glue.c
+++ b/arch/arm/crypto/sha1_glue.c
@@ -24,31 +24,25 @@
 #include <crypto/sha.h>
 #include <asm/byteorder.h>
-struct SHA1_CTX {
-	uint32_t h0,h1,h2,h3,h4;
-	u64 count;
-	u8 data[SHA1_BLOCK_SIZE];
-};
-asmlinkage void sha1_block_data_order(struct SHA1_CTX *digest,
+asmlinkage void sha1_block_data_order(u32 *digest,
 		const unsigned char *data, unsigned int rounds);
 static int sha1_init(struct shash_desc *desc)
 {
-	struct SHA1_CTX *sctx = shash_desc_ctx(desc);
-	memset(sctx, 0, sizeof(*sctx));
-	sctx->h0 = SHA1_H0;
-	sctx->h1 = SHA1_H1;
-	sctx->h2 = SHA1_H2;
-	sctx->h3 = SHA1_H3;
-	sctx->h4 = SHA1_H4;
+	struct sha1_state *sctx = shash_desc_ctx(desc);
+
+	*sctx = (struct sha1_state){
+		.state = { SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4 },
+	};
+
 	return 0;
 }
-static int __sha1_update(struct SHA1_CTX *sctx, const u8 *data,
-			       unsigned int len, unsigned int partial)
+static int __sha1_update(struct sha1_state *sctx, const u8 *data,
+			 unsigned int len, unsigned int partial)
 {
 	unsigned int done = 0;
@@ -56,17 +50,17 @@ static int __sha1_update(struct SHA1_CTX *sctx, const u8 *data,
 	if (partial) {
 		done = SHA1_BLOCK_SIZE - partial;
-		memcpy(sctx->data + partial, data, done);
-		sha1_block_data_order(sctx, sctx->data, 1);
+		memcpy(sctx->buffer + partial, data, done);
+		sha1_block_data_order(sctx->state, sctx->buffer, 1);
 	}
 	if (len - done >= SHA1_BLOCK_SIZE) {
 		const unsigned int rounds = (len - done) / SHA1_BLOCK_SIZE;
-		sha1_block_data_order(sctx, data + done, rounds);
+		sha1_block_data_order(sctx->state, data + done, rounds);
 		done += rounds * SHA1_BLOCK_SIZE;
 	}
-	memcpy(sctx->data, data + done, len - done);
+	memcpy(sctx->buffer, data + done, len - done);
 	return 0;
 }
@@ -74,14 +68,14 @@ static int __sha1_update(struct SHA1_CTX *sctx, const u8 *data,
 static int sha1_update(struct shash_desc *desc, const u8 *data,
 		unsigned int len)
 {
-	struct SHA1_CTX *sctx = shash_desc_ctx(desc);
+	struct sha1_state *sctx = shash_desc_ctx(desc);
 	unsigned int partial = sctx->count % SHA1_BLOCK_SIZE;
 	int res;
 	/* Handle the fast case right here */
 	if (partial + len < SHA1_BLOCK_SIZE) {
 		sctx->count += len;
-		memcpy(sctx->data + partial, data, len);
+		memcpy(sctx->buffer + partial, data, len);
 		return 0;
 	}
 	res = __sha1_update(sctx, data, len, partial);
@@ -92,7 +86,7 @@ static int sha1_update(struct shash_desc *desc, const u8 *data,
 /* Add padding and return the message digest. */
 static int sha1_final(struct shash_desc *desc, u8 *out)
 {
-	struct SHA1_CTX *sctx = shash_desc_ctx(desc);
+	struct sha1_state *sctx = shash_desc_ctx(desc);
 	unsigned int i, index, padlen;
 	__be32 *dst = (__be32 *)out;
 	__be64 bits;
@@ -106,7 +100,7 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
 	/* We need to fill a whole block for __sha1_update() */
 	if (padlen <= 56) {
 		sctx->count += padlen;
-		memcpy(sctx->data + index, padding, padlen);
+		memcpy(sctx->buffer + index, padding, padlen);
 	} else {
 		__sha1_update(sctx, padding, padlen, index);
 	}
@@ -114,7 +108,7 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
 	/* Store state in digest */
 	for (i = 0; i < 5; i++)
-		dst[i] = cpu_to_be32(((u32 *)sctx)[i]);
+		dst[i] = cpu_to_be32(sctx->state[i]);
 	/* Wipe context */
 	memset(sctx, 0, sizeof(*sctx));
@@ -124,7 +118,7 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
 static int sha1_export(struct shash_desc *desc, void *out)
 {
-	struct SHA1_CTX *sctx = shash_desc_ctx(desc);
+	struct sha1_state *sctx = shash_desc_ctx(desc);
 	memcpy(out, sctx, sizeof(*sctx));
 	return 0;
 }
@@ -132,7 +126,7 @@ static int sha1_export(struct shash_desc *desc, void *out)
 static int sha1_import(struct shash_desc *desc, const void *in)
 {
-	struct SHA1_CTX *sctx = shash_desc_ctx(desc);
+	struct sha1_state *sctx = shash_desc_ctx(desc);
 	memcpy(sctx, in, sizeof(*sctx
[PATCH 2/2] crypto: sha1: add ARM NEON implementation
This patch adds an ARM NEON assembly implementation of the SHA-1 algorithm.

tcrypt benchmark results on Cortex-A8, sha1-arm-asm vs sha1-neon-asm:

block-size      bytes/update    old-vs-new
16              16              1.06x
64              16              1.05x
64              64              1.09x
256             16              1.04x
256             64              1.11x
256             256             1.28x
1024            16              1.04x
1024            256             1.34x
1024            1024            1.42x
2048            16              1.04x
2048            256             1.35x
2048            1024            1.44x
2048            2048            1.46x
4096            16              1.04x
4096            256             1.36x
4096            1024            1.45x
4096            4096            1.48x
8192            16              1.04x
8192            256             1.36x
8192            1024            1.46x
8192            4096            1.49x
8192            8192            1.49x

Signed-off-by: Jussi Kivilinna <jussi.kivili...@iki.fi>
---
 arch/arm/crypto/Makefile           |    2
 arch/arm/crypto/sha1-armv7-neon.S  |  635
 arch/arm/crypto/sha1_glue.c        |    8
 arch/arm/crypto/sha1_neon_glue.c   |  197 +++
 arch/arm/include/asm/crypto/sha1.h |   10 +
 crypto/Kconfig                     |   11 +
 6 files changed, 860 insertions(+), 3 deletions(-)
 create mode 100644 arch/arm/crypto/sha1-armv7-neon.S
 create mode 100644 arch/arm/crypto/sha1_neon_glue.c
 create mode 100644 arch/arm/include/asm/crypto/sha1.h

diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index 81cda39..374956d 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -5,10 +5,12 @@
 obj-$(CONFIG_CRYPTO_AES_ARM) += aes-arm.o
 obj-$(CONFIG_CRYPTO_AES_ARM_BS) += aes-arm-bs.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
+obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
 
 aes-arm-y := aes-armv4.o aes_glue.o
 aes-arm-bs-y := aesbs-core.o aesbs-glue.o
 sha1-arm-y := sha1-armv4-large.o sha1_glue.o
+sha1-arm-neon-y:= sha1-armv7-neon.o sha1_neon_glue.o
 
 quiet_cmd_perl = PERL    $@
 cmd_perl = $(PERL) $(<) > $(@)
diff --git a/arch/arm/crypto/sha1-armv7-neon.S b/arch/arm/crypto/sha1-armv7-neon.S
new file mode 100644
index 000..beb1ed1
--- /dev/null
+++ b/arch/arm/crypto/sha1-armv7-neon.S
@@ -0,0 +1,635 @@
+/* sha1-armv7-neon.S - ARM/NEON accelerated SHA-1 transform function
+ *
+ * Copyright © 2013-2014 Jussi Kivilinna <jussi.kivili...@iki.fi>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+.syntax unified
+#ifdef __thumb2__
+.thumb
+#else
+.code 32
+#endif
+.fpu neon
+
+.data
+
+#define GET_DATA_POINTER(reg, name, rtmp) ldr reg, =name
+
+/* Context structure */
+
+#define state_h0 0
+#define state_h1 4
+#define state_h2 8
+#define state_h3 12
+#define state_h4 16
+
+
+/* Constants */
+
+#define K1 0x5A827999
+#define K2 0x6ED9EBA1
+#define K3 0x8F1BBCDC
+#define K4 0xCA62C1D6
+.align 4
+.LK_VEC:
+.LK1: .long K1, K1, K1, K1
+.LK2: .long K2, K2, K2, K2
+.LK3: .long K3, K3, K3, K3
+.LK4: .long K4, K4, K4, K4
+
+
+.text
+
+/* Register macros */
+
+#define RSTATE r0
+#define RDATA r1
+#define RNBLKS r2
+#define ROLDSTACK r3
+#define RK lr
+#define RWK r12
+
+#define _a r4
+#define _b r5
+#define _c r6
+#define _d r7
+#define _e r8
+
+#define RT0 r9
+#define RT1 r10
+#define RT2 r11
+
+#define W0 q0
+#define W1 q1
+#define W2 q2
+#define W3 q3
+#define W4 q4
+#define W5 q5
+#define W6 q6
+#define W7 q7
+
+#define tmp0 q8
+#define tmp1 q9
+#define tmp2 q10
+#define tmp3 q11
+
+#define curK q12
+
+
+/* Round function macros. */
+
+#define WK_offs(i) (((i) & 15) * 4)
+
+#define _R_F1(a,b,c,d,e,i,pre1,pre2,pre3,i16,\
+	      W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28) \
+	and RT0, c, b; \
+	pre1(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+	add e, e, a, ror #(32 - 5); \
+	ldr RT2, [sp, WK_offs(i)]; \
+	bic RT1, d, b; \
+	add e, RT2; \
+	pre2(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+	ror b, #(32 - 30); \
+	eor RT0, RT1; \
+	pre3(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+	add e, RT0;
+
+#define _R_F2(a,b,c,d,e,i,pre1,pre2,pre3,i16,\
+	      W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28) \
+	eor RT0, c, b; \
+	pre1(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+	add e, e, a, ror #(32 - 5); \
+	ldr RT2, [sp, WK_offs(i)]; \
+	eor RT0, d; \
+	pre2(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+	add e, RT2; \
+	ror b, #(32 - 30); \
+	pre3(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28
[PATCH] crypto: sha512: add ARM NEON implementation
This patch adds an ARM NEON assembly implementation of the SHA-512 and SHA-384
algorithms.

tcrypt benchmark results on Cortex-A8, sha512-generic vs sha512-neon-asm:

block-size      bytes/update    old-vs-new
16              16              2.99x
64              16              2.67x
64              64              3.00x
256             16              2.64x
256             64              3.06x
256             256             3.33x
1024            16              2.53x
1024            256             3.39x
1024            1024            3.52x
2048            16              2.50x
2048            256             3.41x
2048            1024            3.54x
2048            2048            3.57x
4096            16              2.49x
4096            256             3.42x
4096            1024            3.56x
4096            4096            3.59x
8192            16              2.48x
8192            256             3.42x
8192            1024            3.56x
8192            4096            3.60x
8192            8192            3.60x

Signed-off-by: Jussi Kivilinna <jussi.kivili...@iki.fi>
---
 arch/arm/crypto/Makefile            |    2
 arch/arm/crypto/sha512-armv7-neon.S |  461 +++
 arch/arm/crypto/sha512_neon_glue.c  |  305 +++
 crypto/Kconfig                      |   15 +
 4 files changed, 783 insertions(+)
 create mode 100644 arch/arm/crypto/sha512-armv7-neon.S
 create mode 100644 arch/arm/crypto/sha512_neon_glue.c

diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index 374956d..b48fa34 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -6,11 +6,13 @@ obj-$(CONFIG_CRYPTO_AES_ARM) += aes-arm.o
 obj-$(CONFIG_CRYPTO_AES_ARM_BS) += aes-arm-bs.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
+obj-$(CONFIG_CRYPTO_SHA512_ARM_NEON) += sha512-arm-neon.o
 
 aes-arm-y := aes-armv4.o aes_glue.o
 aes-arm-bs-y := aesbs-core.o aesbs-glue.o
 sha1-arm-y := sha1-armv4-large.o sha1_glue.o
 sha1-arm-neon-y:= sha1-armv7-neon.o sha1_neon_glue.o
+sha512-arm-neon-y := sha512-armv7-neon.o sha512_neon_glue.o
 
 quiet_cmd_perl = PERL    $@
 cmd_perl = $(PERL) $(<) > $(@)
diff --git a/arch/arm/crypto/sha512-armv7-neon.S b/arch/arm/crypto/sha512-armv7-neon.S
new file mode 100644
index 000..cdc6385
--- /dev/null
+++ b/arch/arm/crypto/sha512-armv7-neon.S
@@ -0,0 +1,461 @@
+/* sha512-armv7-neon.S - ARM/NEON assembly implementation of SHA-512 transform
+ *
+ * Copyright © 2013-2014 Jussi Kivilinna <jussi.kivili...@iki.fi>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+.syntax unified
+#ifdef __thumb2__
+.thumb
+#else
+.code 32
+#endif
+.fpu neon
+
+.text
+
+/* structure of SHA512_CONTEXT */
+#define hd_a 0
+#define hd_b ((hd_a) + 8)
+#define hd_c ((hd_b) + 8)
+#define hd_d ((hd_c) + 8)
+#define hd_e ((hd_d) + 8)
+#define hd_f ((hd_e) + 8)
+#define hd_g ((hd_f) + 8)
+
+/* register macros */
+#define RK %r2
+
+#define RA d0
+#define RB d1
+#define RC d2
+#define RD d3
+#define RE d4
+#define RF d5
+#define RG d6
+#define RH d7
+
+#define RT0 d8
+#define RT1 d9
+#define RT2 d10
+#define RT3 d11
+#define RT4 d12
+#define RT5 d13
+#define RT6 d14
+#define RT7 d15
+
+#define RT01q q4
+#define RT23q q5
+#define RT45q q6
+#define RT67q q7
+
+#define RW0 d16
+#define RW1 d17
+#define RW2 d18
+#define RW3 d19
+#define RW4 d20
+#define RW5 d21
+#define RW6 d22
+#define RW7 d23
+#define RW8 d24
+#define RW9 d25
+#define RW10 d26
+#define RW11 d27
+#define RW12 d28
+#define RW13 d29
+#define RW14 d30
+#define RW15 d31
+
+#define RW01q q8
+#define RW23q q9
+#define RW45q q10
+#define RW67q q11
+#define RW89q q12
+#define RW1011q q13
+#define RW1213q q14
+#define RW1415q q15
+
+/***
+ * ARM assembly implementation of sha512 transform
+ ***/
+#define rounds2_0_63(ra, rb, rc, rd, re, rf, rg, rh, rw0, rw1, rw01q, rw2, \
+		     rw23q, rw1415q, rw9, rw10, interleave_op, arg1) \
+	/* t1 = h + Sum1 (e) + Ch (e, f, g) + k[t] + w[t]; */ \
+	vshr.u64 RT2, re, #14; \
+	vshl.u64 RT3, re, #64 - 14; \
+	interleave_op(arg1); \
+	vshr.u64 RT4, re, #18; \
+	vshl.u64 RT5, re, #64 - 18; \
+	vld1.64 {RT0}, [RK]!; \
+	veor.64 RT23q, RT23q, RT45q; \
+	vshr.u64 RT4, re, #41; \
+	vshl.u64 RT5, re, #64 - 41; \
+	vadd.u64 RT0, RT0, rw0; \
+	veor.64 RT23q, RT23q, RT45q; \
+	vmov.64 RT7, re; \
+	veor.64 RT1, RT2, RT3; \
+	vbsl.64 RT7, rf, rg; \
+	\
+	vadd.u64 RT1, RT1, rh; \
+	vshr.u64 RT2, ra, #28; \
+	vshl.u64 RT3, ra, #64 - 28
[PATCH] crypto: des3_ede/x86-64: fix sparse warning
This patch fixes the following sparse warnings:

  CHECK   arch/x86/crypto/des3_ede_glue.c
arch/x86/crypto/des3_ede_glue.c:308:52: warning: restricted __be64 degrades to integer
arch/x86/crypto/des3_ede_glue.c:309:52: warning: restricted __be64 degrades to integer
arch/x86/crypto/des3_ede_glue.c:310:52: warning: restricted __be64 degrades to integer
arch/x86/crypto/des3_ede_glue.c:326:44: warning: restricted __be64 degrades to integer

Reported-by: kbuild test robot <fengguang...@intel.com>
Signed-off-by: Jussi Kivilinna <jussi.kivili...@iki.fi>
---
 arch/x86/crypto/des3_ede_glue.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/crypto/des3_ede_glue.c b/arch/x86/crypto/des3_ede_glue.c
index ebc4215..0e9c066 100644
--- a/arch/x86/crypto/des3_ede_glue.c
+++ b/arch/x86/crypto/des3_ede_glue.c
@@ -289,8 +289,8 @@ static unsigned int __ctr_crypt(struct blkcipher_desc *desc,
 	struct des3_ede_x86_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
 	unsigned int bsize = DES3_EDE_BLOCK_SIZE;
 	unsigned int nbytes = walk->nbytes;
-	u64 *src = (u64 *)walk->src.virt.addr;
-	u64 *dst = (u64 *)walk->dst.virt.addr;
+	__be64 *src = (__be64 *)walk->src.virt.addr;
+	__be64 *dst = (__be64 *)walk->dst.virt.addr;
 	u64 ctrblk = be64_to_cpu(*(__be64 *)walk->iv);
 	__be64 ctrblocks[3];
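The __be64 annotation lets sparse check at build time that big-endian data in memory and CPU-order integers are never mixed without a conversion. The same separation can be sketched in portable user-space C for a CTR-style counter. The helpers load_be64, store_be64 and ctr_fill are hypothetical names for this illustration and are not kernel APIs.

```c
#include <stdint.h>

/* Read a 64-bit big-endian value from memory, whatever the host byte order. */
static uint64_t load_be64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Write a CPU-order value to memory as 8 big-endian bytes. */
static void store_be64(uint8_t *p, uint64_t v)
{
	for (int i = 7; i >= 0; i--) {
		p[i] = (uint8_t)v;
		v >>= 8;
	}
}

/* Fill n consecutive 8-byte counter blocks starting from the big-endian IV,
 * then write the advanced counter back to the IV. All arithmetic happens on
 * the CPU-order value; memory only ever holds big-endian bytes. */
static void ctr_fill(uint8_t iv[8], uint8_t (*blocks)[8], unsigned n)
{
	uint64_t ctr = load_be64(iv);

	for (unsigned i = 0; i < n; i++)
		store_be64(blocks[i], ctr++);
	store_be64(iv, ctr);
}
```

In the kernel, be64_to_cpu() and cpu_to_be64() play the role of the two helpers, and the __be64 type makes a forgotten conversion show up as the "degrades to integer" warning quoted above.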
[PATCH] crypto: sha512_ssse3: fix byte count to bit count conversion
The byte-to-bit-count computation is only partly converted to big-endian and
mixes in CPU-endian values. The problem was noticed by sparse, with these
warnings:

  CHECK   arch/x86/crypto/sha512_ssse3_glue.c
arch/x86/crypto/sha512_ssse3_glue.c:144:19: warning: restricted __be64 degrades to integer
arch/x86/crypto/sha512_ssse3_glue.c:144:17: warning: incorrect type in assignment (different base types)
arch/x86/crypto/sha512_ssse3_glue.c:144:17:    expected restricted __be64 <noident>
arch/x86/crypto/sha512_ssse3_glue.c:144:17:    got unsigned long long

Cc: Tim Chen <tim.c.c...@linux.intel.com>
Signed-off-by: Jussi Kivilinna <jussi.kivili...@iki.fi>
---
 arch/x86/crypto/sha512_ssse3_glue.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/crypto/sha512_ssse3_glue.c b/arch/x86/crypto/sha512_ssse3_glue.c
index f30cd10..8626b03 100644
--- a/arch/x86/crypto/sha512_ssse3_glue.c
+++ b/arch/x86/crypto/sha512_ssse3_glue.c
@@ -141,7 +141,7 @@ static int sha512_ssse3_final(struct shash_desc *desc, u8 *out)
 	/* save number of bits */
 	bits[1] = cpu_to_be64(sctx->count[0] << 3);
-	bits[0] = cpu_to_be64(sctx->count[1] << 3) | sctx->count[0] >> 61;
+	bits[0] = cpu_to_be64(sctx->count[1] << 3 | sctx->count[0] >> 61);
 	/* Pad out to 112 mod 128 and append length */
 	index = sctx->count[0] & 0x7f;
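The arithmetic behind the fix can be sketched in portable C: a 128-bit byte count held in two 64-bit words is multiplied by 8, and the three bits shifted out of the low word must be carried into the high word before any byte-order conversion. The function names below are hypothetical and the code is an illustration, not the kernel implementation.

```c
#include <stdint.h>

/* Convert a 128-bit byte count (lo, hi) into a 128-bit bit count.
 * The top three bits of lo are carried into the bottom of the high word. */
static void bytes_to_bits128(uint64_t lo, uint64_t hi,
			     uint64_t *bits_lo, uint64_t *bits_hi)
{
	*bits_lo = lo << 3;
	*bits_hi = (hi << 3) | (lo >> 61);
}

/* Write a CPU-order value to memory as 8 big-endian bytes. */
static void store_be64(uint8_t *p, uint64_t v)
{
	for (int i = 7; i >= 0; i--) {
		p[i] = (uint8_t)v;
		v >>= 8;
	}
}

/* Emit the bit count as the 16-byte big-endian length field that closes the
 * SHA-512 padding: high word first, then low word. The whole value is
 * combined in CPU order and converted to big-endian only once, at the end. */
static void store_bitcount_be128(uint8_t out[16], uint64_t lo, uint64_t hi)
{
	uint64_t bits_lo, bits_hi;

	bytes_to_bits128(lo, hi, &bits_lo, &bits_hi);
	store_be64(out, bits_hi);
	store_be64(out + 8, bits_lo);
}
```

In the pre-patch expression the carry term `count[0] >> 61` was OR-ed into a value that had already been byte-swapped, so on a little-endian CPU the carried bits would land in the wrong byte whenever they were non-zero.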
[PATCH 2/2] crypto: des_3des - add x86-64 assembly implementation
This patch adds an x86_64 assembly implementation of the Triple DES EDE cipher
algorithm. Two assembly implementations are provided. The first is a regular
'one block at a time' encrypt/decrypt function. The second is a 'three blocks
at a time' function that gains performance on out-of-order CPUs.

tcrypt test results:

Intel Core i5-4570:

des3_ede-asm vs des3_ede-generic:
size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec
16B     1.21x   1.22x   1.27x   1.36x   1.25x   1.25x
64B     1.98x   1.96x   1.23x   2.04x   2.01x   2.00x
256B    2.34x   2.37x   1.21x   2.40x   2.38x   2.39x
1024B   2.50x   2.47x   1.22x   2.51x   2.52x   2.51x
8192B   2.51x   2.53x   1.21x   2.56x   2.54x   2.55x

Signed-off-by: Jussi Kivilinna <jussi.kivili...@iki.fi>
---
 arch/x86/crypto/Makefile          |    2
 arch/x86/crypto/des3_ede-asm_64.S |  805 +
 arch/x86/crypto/des3_ede_glue.c   |  509 +++
 crypto/Kconfig                    |   13 +
 crypto/des_generic.c              |   22 +
 include/crypto/des.h              |    3
 6 files changed, 1349 insertions(+), 5 deletions(-)
 create mode 100644 arch/x86/crypto/des3_ede-asm_64.S
 create mode 100644 arch/x86/crypto/des3_ede_glue.c

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 61d6e28..a470de2 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_CRYPTO_SALSA20_586) += salsa20-i586.o
 obj-$(CONFIG_CRYPTO_SERPENT_SSE2_586) += serpent-sse2-i586.o
 
 obj-$(CONFIG_CRYPTO_AES_X86_64) += aes-x86_64.o
+obj-$(CONFIG_CRYPTO_DES3_EDE_X86_64) += des3_ede-x86_64.o
 obj-$(CONFIG_CRYPTO_CAMELLIA_X86_64) += camellia-x86_64.o
 obj-$(CONFIG_CRYPTO_BLOWFISH_X86_64) += blowfish-x86_64.o
 obj-$(CONFIG_CRYPTO_TWOFISH_X86_64) += twofish-x86_64.o
@@ -52,6 +53,7 @@ salsa20-i586-y := salsa20-i586-asm_32.o salsa20_glue.o
 serpent-sse2-i586-y := serpent-sse2-i586-asm_32.o serpent_sse2_glue.o
 
 aes-x86_64-y := aes-x86_64-asm_64.o aes_glue.o
+des3_ede-x86_64-y := des3_ede-asm_64.o des3_ede_glue.o
 camellia-x86_64-y := camellia-x86_64-asm_64.o camellia_glue.o
 blowfish-x86_64-y := blowfish-x86_64-asm_64.o blowfish_glue.o
 twofish-x86_64-y := twofish-x86_64-asm_64.o twofish_glue.o
diff --git a/arch/x86/crypto/des3_ede-asm_64.S b/arch/x86/crypto/des3_ede-asm_64.S
new file mode 100644
index 000..038f6ae
--- /dev/null
+++ b/arch/x86/crypto/des3_ede-asm_64.S
@@ -0,0 +1,805 @@
+/*
+ * des3_ede-asm_64.S - x86-64 assembly implementation of 3DES cipher
+ *
+ * Copyright © 2014 Jussi Kivilinna <jussi.kivili...@iki.fi>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/linkage.h>
+
+.file "des3_ede-asm_64.S"
+.text
+
+#define s1 .L_s1
+#define s2 ((s1) + (64*8))
+#define s3 ((s2) + (64*8))
+#define s4 ((s3) + (64*8))
+#define s5 ((s4) + (64*8))
+#define s6 ((s5) + (64*8))
+#define s7 ((s6) + (64*8))
+#define s8 ((s7) + (64*8))
+
+/* register macros */
+#define CTX %rdi
+
+#define RL0 %r8
+#define RL1 %r9
+#define RL2 %r10
+
+#define RL0d %r8d
+#define RL1d %r9d
+#define RL2d %r10d
+
+#define RR0 %r11
+#define RR1 %r12
+#define RR2 %r13
+
+#define RR0d %r11d
+#define RR1d %r12d
+#define RR2d %r13d
+
+#define RW0 %rax
+#define RW1 %rbx
+#define RW2 %rcx
+
+#define RW0d %eax
+#define RW1d %ebx
+#define RW2d %ecx
+
+#define RW0bl %al
+#define RW1bl %bl
+#define RW2bl %cl
+
+#define RW0bh %ah
+#define RW1bh %bh
+#define RW2bh %ch
+
+#define RT0 %r15
+#define RT1 %rbp
+#define RT2 %r14
+#define RT3 %rdx
+
+#define RT0d %r15d
+#define RT1d %ebp
+#define RT2d %r14d
+#define RT3d %edx
+
+/***
+ * 1-way 3DES
+ ***/
+#define do_permutation(a, b, offset, mask) \
+	movl a, RT0d; \
+	shrl $(offset), RT0d; \
+	xorl b, RT0d; \
+	andl $(mask), RT0d; \
+	xorl RT0d, b; \
+	shll $(offset), RT0d; \
+	xorl RT0d, a;
+
+#define expand_to_64bits(val, mask) \
+	movl val##d, RT0d; \
+	rorl $4, RT0d; \
+	shlq $32, RT0; \
+	orq RT0, val; \
+	andq mask, val;
+
+#define compress_to_64bits(val) \
+	movq val, RT0; \
+	shrq $32, RT0; \
+	roll $4, RT0d; \
+	orl RT0d, val##d;
+
+#define initial_permutation(left, right) \
+	do_permutation(left##d, right##d, 4, 0x0f0f0f0f); \
+	do_permutation(left##d, right##d, 16, 0x); \
+	do_permutation(left##d, left##d, 2, 0x); \
+	do_permutation(right##d
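The do_permutation macro above is a masked bit exchange between two words, often called a delta swap. A minimal C sketch of the same seven-instruction sequence, written as a hypothetical helper function rather than a macro, is:

```c
#include <stdint.h>

/* Exchange the bits of *b selected by mask with the bits of *a selected by
 * (mask << offset). Mirrors the macro step by step:
 *   t = ((a >> offset) ^ b) & mask;  b ^= t;  a ^= t << offset;
 * Applying the same call twice restores the original values. */
static void do_permutation(uint32_t *a, uint32_t *b, unsigned offset,
			   uint32_t mask)
{
	uint32_t t = ((*a >> offset) ^ *b) & mask;

	*b ^= t;
	*a ^= t << offset;
}
```

With offset 4 and mask 0x0f0f0f0f, the first step shown in initial_permutation, the high nibble of every byte in a trades places with the low nibble of the corresponding byte in b. Chaining a few such swaps with different offsets and masks is a common way to implement the DES bit permutations without table lookups.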
[PATCH 1/2] crypto: tcrypt - add ctr(des3_ede) sync speed test
Signed-off-by: Jussi Kivilinna <jussi.kivili...@iki.fi>
---
 crypto/tcrypt.c |    6 ++
 1 file changed, 6 insertions(+)

diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index ba247cf..164ec0e 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -1585,6 +1585,12 @@ static int do_test(int m)
 		test_cipher_speed("cbc(des3_ede)", DECRYPT, sec,
 				des3_speed_template, DES3_SPEED_VECTORS,
 				speed_template_24);
+		test_cipher_speed("ctr(des3_ede)", ENCRYPT, sec,
+				des3_speed_template, DES3_SPEED_VECTORS,
+				speed_template_24);
+		test_cipher_speed("ctr(des3_ede)", DECRYPT, sec,
+				des3_speed_template, DES3_SPEED_VECTORS,
+				speed_template_24);
 		break;
 
 	case 202:
Re: [PATCH resend 13/15] arm64/crypto: add voluntary preemption to Crypto Extensions SHA1
On 01.05.2014 18:51, Ard Biesheuvel wrote:
> The Crypto Extensions based SHA1 implementation uses the NEON register file,
> and hence runs with preemption disabled. This patch adds a TIF_NEED_RESCHED
> check to its inner loop so we at least give up the CPU voluntarily when we
> are running in process context and have been tagged for preemption by the
> scheduler.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
> ---
<snip>
> @@ -42,6 +42,7 @@ static int sha1_update(struct shash_desc *desc, const u8 *data,
>  	sctx->count += len;
>
>  	if ((partial + len) >= SHA1_BLOCK_SIZE) {
> +		struct thread_info *ti = NULL;
>  		int blocks;
>
>  		if (partial) {
> @@ -52,16 +53,30 @@ static int sha1_update(struct shash_desc *desc, const u8 *data,
>  			len -= p;
>  		}
>
> +		/*
> +		 * Pass current's thread info pointer to sha1_ce_transform()
> +		 * below if we want it to play nice under preemption.
> +		 */
> +		if ((IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY) ||
> +		     IS_ENABLED(CONFIG_PREEMPT)) && !in_interrupt())
> +			ti = current_thread_info();
> +
>  		blocks = len / SHA1_BLOCK_SIZE;
>  		len %= SHA1_BLOCK_SIZE;
>
> -		kernel_neon_begin_partial(16);
> -		sha1_ce_transform(blocks, data, sctx->state,
> -				  partial ? sctx->buffer : NULL, 0);
> -		kernel_neon_end();
> +		do {
> +			int rem;
> +
> +			kernel_neon_begin_partial(16);
> +			rem = sha1_ce_transform(blocks, data, sctx->state,
> +						partial ? sctx->buffer : NULL,
> +						0, ti);
> +			kernel_neon_end();
>
> -		data += blocks * SHA1_BLOCK_SIZE;
> -		partial = 0;
> +			data += (blocks - rem) * SHA1_BLOCK_SIZE;
> +			blocks = rem;
> +			partial = 0;
> +		} while (unlikely(ti && blocks > 0));
>  	}
>  	if (len)
>  		memcpy(sctx->buffer + partial, data, len);
> @@ -94,6 +109,7 @@ static int sha1_finup(struct shash_desc *desc, const u8 *data,
>  			      unsigned int len, u8 *out)
>  {
>  	struct sha1_state *sctx = shash_desc_ctx(desc);
> +	struct thread_info *ti = NULL;
>  	__be32 *dst = (__be32 *)out;
>  	int blocks;
>  	int i;
> @@ -111,9 +127,20 @@ static int sha1_finup(struct shash_desc *desc, const u8 *data,
>  	 */
>  	blocks = len / SHA1_BLOCK_SIZE;
>
> -	kernel_neon_begin_partial(16);
> -	sha1_ce_transform(blocks, data, sctx->state, NULL, len);
> -	kernel_neon_end();
> +	if ((IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY) ||
> +	     IS_ENABLED(CONFIG_PREEMPT)) && !in_interrupt())
> +		ti = current_thread_info();
> +
> +	do {
> +		int rem;
> +
> +		kernel_neon_begin_partial(16);
> +		rem = sha1_ce_transform(blocks, data, sctx->state,
> +					NULL, len, ti);
> +		kernel_neon_end();
> +		data += (blocks - rem) * SHA1_BLOCK_SIZE;
> +		blocks = rem;
> +	} while (unlikely(ti && blocks > 0));

These two loops look very similar. How about renaming the assembly function
to __sha1_ce_transform and moving this loop into a new sha1_ce_transform?

Otherwise, the patches look good.

-Jussi
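A minimal user-space sketch of the refactoring suggested above, under stated assumptions: stub_sha1_ce_transform_asm() is a hypothetical stand-in for the renamed assembly routine, the `budget` parameter models "the routine yielded early because a reschedule was pending", and the kernel_neon_begin_partial()/kernel_neon_end() calls appear only as comments. It is not the code that was eventually merged; it only shows how a single wrapper could own the retry loop so that sha1_update() and sha1_finup() do not each repeat it.

```c
#include <assert.h>
#include <stddef.h>

#define SHA1_BLOCK_SIZE 64

/* Counts invocations of the stub so callers can observe the retries. */
static int calls;

/*
 * Hypothetical stand-in for the assembly routine (__sha1_ce_transform in
 * the suggestion). It "processes" at most `budget` blocks per call and
 * returns the number of blocks left unprocessed.
 */
static int stub_sha1_ce_transform_asm(int blocks, const unsigned char *data,
				      int budget)
{
	(void)data;
	calls++;
	return blocks > budget ? blocks - budget : 0;
}

/*
 * Wrapper that owns the retry loop. Returns the data pointer advanced past
 * every processed block. Unlike the quoted patch, this model always loops
 * until no blocks remain; the patch only loops when `ti` is non-NULL.
 */
static const unsigned char *sha1_transform_all(int blocks,
					       const unsigned char *data,
					       int budget)
{
	assert(blocks >= 0 && budget > 0);

	do {
		int rem;

		/* kernel_neon_begin_partial(16) would go here */
		rem = stub_sha1_ce_transform_asm(blocks, data, budget);
		/* kernel_neon_end() would go here; preemption can happen now */

		data += (size_t)(blocks - rem) * SHA1_BLOCK_SIZE;
		blocks = rem;
	} while (blocks > 0);

	return data;
}
```

With this shape, both callers reduce to a single call to the wrapper, and the NEON begin/end pairing lives in one place.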
[PATCH] crypto: testmgr: add empty and large test vectors for SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512
Patch adds large test-vectors for SHA algorithms for better code coverage in optimized assembly implementations. Empty test-vectors are also added, as some crypto drivers appear to have special case handling for empty input. Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi --- This patch depends on the crypto: add test cases for SHA-1, SHA-224, SHA-256 and AES-CCM patch from Ard Biesheuvel. --- crypto/testmgr.h | 728 +- 1 file changed, 721 insertions(+), 7 deletions(-) diff --git a/crypto/testmgr.h b/crypto/testmgr.h index 84ac0f0..7d1438e 100644 --- a/crypto/testmgr.h +++ b/crypto/testmgr.h @@ -487,10 +487,15 @@ static struct hash_testvec crct10dif_tv_template[] = { * SHA1 test vectors from from FIPS PUB 180-1 * Long vector from CAVS 5.0 */ -#define SHA1_TEST_VECTORS 4 +#define SHA1_TEST_VECTORS 6 static struct hash_testvec sha1_tv_template[] = { { + .plaintext = , + .psize = 0, + .digest = \xda\x39\xa3\xee\x5e\x6b\x4b\x0d\x32\x55 + \xbf\xef\x95\x60\x18\x90\xaf\xd8\x07\x09, + }, { .plaintext = abc, .psize = 3, .digest = \xa9\x99\x3e\x36\x47\x06\x81\x6a\xba\x3e @@ -534,6 +539,139 @@ static struct hash_testvec sha1_tv_template[] = { .psize = 64, .digest = \xc8\x71\xf6\x9a\x63\xcc\xa9\x84\x84\x82 \x64\xe7\x79\x95\x5d\xd7\x19\x41\x7c\x91, + }, { + .plaintext = \x08\x9f\x13\xaa\x41\xd8\x4c\xe3 +\x7a\x11\x85\x1c\xb3\x27\xbe\x55 +\xec\x60\xf7\x8e\x02\x99\x30\xc7 +\x3b\xd2\x69\x00\x74\x0b\xa2\x16 +\xad\x44\xdb\x4f\xe6\x7d\x14\x88 +\x1f\xb6\x2a\xc1\x58\xef\x63\xfa +\x91\x05\x9c\x33\xca\x3e\xd5\x6c +\x03\x77\x0e\xa5\x19\xb0\x47\xde +\x52\xe9\x80\x17\x8b\x22\xb9\x2d +\xc4\x5b\xf2\x66\xfd\x94\x08\x9f +\x36\xcd\x41\xd8\x6f\x06\x7a\x11 +\xa8\x1c\xb3\x4a\xe1\x55\xec\x83 +\x1a\x8e\x25\xbc\x30\xc7\x5e\xf5 +\x69\x00\x97\x0b\xa2\x39\xd0\x44 +\xdb\x72\x09\x7d\x14\xab\x1f\xb6 +\x4d\xe4\x58\xef\x86\x1d\x91\x28 +\xbf\x33\xca\x61\xf8\x6c\x03\x9a +\x0e\xa5\x3c\xd3\x47\xde\x75\x0c +\x80\x17\xae\x22\xb9\x50\xe7\x5b +\xf2\x89\x20\x94\x2b\xc2\x36\xcd +\x64\xfb\x6f\x06\x9d\x11\xa8\x3f 
+\xd6\x4a\xe1\x78\x0f\x83\x1a\xb1 +\x25\xbc\x53\xea\x5e\xf5\x8c\x00 +\x97\x2e\xc5\x39\xd0\x67\xfe\x72 +\x09\xa0\x14\xab\x42\xd9\x4d\xe4 +\x7b\x12\x86\x1d\xb4\x28\xbf\x56 +\xed\x61\xf8\x8f\x03\x9a\x31\xc8 +\x3c\xd3\x6a\x01\x75\x0c\xa3\x17 +\xae\x45\xdc\x50\xe7\x7e\x15\x89 +\x20\xb7\x2b\xc2\x59\xf0\x64\xfb +\x92\x06\x9d\x34\xcb\x3f\xd6\x6d +\x04\x78\x0f\xa6\x1a\xb1\x48\xdf +\x53\xea\x81\x18\x8c\x23\xba\x2e +\xc5\x5c\xf3\x67\xfe\x95\x09\xa0 +\x37\xce\x42\xd9\x70\x07\x7b\x12 +\xa9\x1d\xb4\x4b\xe2\x56\xed\x84 +\x1b\x8f\x26\xbd\x31\xc8\x5f\xf6 +\x6a\x01\x98\x0c\xa3\x3a\xd1\x45 +\xdc\x73\x0a\x7e\x15\xac\x20\xb7 +\x4e\xe5\x59\xf0\x87\x1e\x92\x29 +\xc0\x34\xcb\x62\xf9\x6d\x04\x9b +\x0f\xa6\x3d\xd4\x48\xdf\x76\x0d +\x81\x18\xaf\x23\xba\x51\xe8\x5c +\xf3\x8a\x21\x95\x2c\xc3\x37\xce +\x65\xfc\x70\x07\x9e\x12\xa9\x40 +\xd7\x4b\xe2\x79\x10\x84\x1b\xb2 +\x26\xbd\x54\xeb\x5f\xf6\x8d\x01 +\x98\x2f\xc6\x3a\xd1\x68\xff\x73 +\x0a\xa1\x15\xac\x43\xda\x4e\xe5 +\x7c\x13\x87\x1e\xb5\x29\xc0\x57 +\xee\x62\xf9\x90\x04\x9b\x32\xc9 +\x3d\xd4\x6b\x02\x76\x0d\xa4\x18 +\xaf\x46\xdd\x51\xe8\x7f\x16\x8a +\x21\xb8\x2c\xc3\x5a\xf1\x65\xfc +\x93\x07\x9e\x35\xcc\x40\xd7\x6e +\x05\x79\x10\xa7
Re: [PATCH 2/2] SHA1 transform: x86_64 AVX2 optimization - glue build - resend with email correction
On 27.02.2014 19:42, chandramouli narayanan wrote:
> This git patch adds the glue, build and configuration changes
> to include x86_64 AVX2 optimization of SHA1 transform to
> crypto support. The patch has been tested with 3.14.0-rc1
> kernel.
>
> On a Haswell desktop, with turbo disabled and all cpus running
> at maximum frequency, tcrypt shows AVX2 performance improvement
> from 3% for 256 bytes update to 16% for 1024 bytes update over
> AVX implementation.
>
> Signed-off-by: Chandramouli Narayanan <mo...@linux.intel.com>
>
..snip..
>  static int __init sha1_ssse3_mod_init(void)
>  {
> +	char *algo_name;
>  	/* test for SSSE3 first */
> -	if (cpu_has_ssse3)
> +	if (cpu_has_ssse3) {
>  		sha1_transform_asm = sha1_transform_ssse3;
> +		algo_name = "SSSE3";
> +	}
>
>  #ifdef CONFIG_AS_AVX
>  	/* allow AVX to override SSSE3, it's a little faster */
> -	if (avx_usable())
> -		sha1_transform_asm = sha1_transform_avx;
> +	if (avx_usable()) {
> +		if (cpu_has_avx) {
> +			sha1_transform_asm = sha1_transform_avx;
> +			algo_name = "AVX";
> +		}
> +#ifdef CONFIG_AS_AVX2
> +		if (cpu_has_avx2) {

Wouldn't you also need to check for BMI2, as __sha1_transform_avx2 uses 'rorx'?
For example, commit 16c0c4e1656c14ef9deac189a4240b5ca19c6919 added a BMI2 check
for SHA-256.

-Jussi

> +			/* allow AVX2 to override AVX, it's a little faster */
> +			sha1_transform_asm = __sha1_transform_avx2;
> +			algo_name = "AVX2";
> +		}
> +#endif
> +	}
>  #endif
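The selection order discussed above can be sketched in plain user-space C. Everything here is an assumption made for illustration: struct cpu_features and pick_sha1_impl() are hypothetical names, and the flags are ordinary booleans rather than the kernel's cpu_has_* macros. The only point is that the AVX2 path is taken when BMI2 is also present, since the AVX2 routine is said to use 'rorx'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature snapshot; the kernel uses cpu_has_* checks instead. */
struct cpu_features {
	bool ssse3;
	bool avx;
	bool avx2;
	bool bmi2;
};

enum sha1_impl { IMPL_NONE, IMPL_SSSE3, IMPL_AVX, IMPL_AVX2 };

/* Later checks override earlier ones, mirroring the quoted init function. */
static enum sha1_impl pick_sha1_impl(const struct cpu_features *f)
{
	enum sha1_impl impl = IMPL_NONE;

	assert(f != NULL);

	if (f->ssse3)
		impl = IMPL_SSSE3;
	if (f->avx)
		impl = IMPL_AVX;
	/* AVX2 alone is not enough: 'rorx' is a BMI2 instruction. */
	if (f->avx && f->avx2 && f->bmi2)
		impl = IMPL_AVX2;

	return impl;
}
```

A CPU that reports AVX2 without BMI2 falls back to the AVX routine in this model instead of hitting an undefined instruction.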
Re: [patch] crypto: remove a duplicate checks in __cbc_decrypt()
On 13.02.2014 16:58, Dan Carpenter wrote:
> We checked "nbytes < bsize" before so it can't happen here.
>
> Signed-off-by: Dan Carpenter <dan.carpen...@oracle.com>

Acked-by: Jussi Kivilinna <jussi.kivili...@iki.fi>

> ---
> This doesn't change how the code works, but maybe there is a bug in the
> original code. Please review?
>
> diff --git a/arch/x86/crypto/cast5_avx_glue.c b/arch/x86/crypto/cast5_avx_glue.c
> index e6a3700489b9..e57e20ab5e0b 100644
> --- a/arch/x86/crypto/cast5_avx_glue.c
> +++ b/arch/x86/crypto/cast5_avx_glue.c
> @@ -203,9 +203,6 @@ static unsigned int __cbc_decrypt(struct blkcipher_desc *desc,
>  			src -= 1;
>  			dst -= 1;
>  		} while (nbytes >= bsize * CAST5_PARALLEL_BLOCKS);
> -
> -		if (nbytes < bsize)
> -			goto done;
>  	}
>
>  	/* Handle leftovers */
> diff --git a/arch/x86/crypto/blowfish_glue.c b/arch/x86/crypto/blowfish_glue.c
> index 50ec333b70e6..8af519ed73d1 100644
> --- a/arch/x86/crypto/blowfish_glue.c
> +++ b/arch/x86/crypto/blowfish_glue.c
> @@ -223,9 +223,6 @@ static unsigned int __cbc_decrypt(struct blkcipher_desc *desc,
>  			src -= 1;
>  			dst -= 1;
>  		} while (nbytes >= bsize * 4);
> -
> -		if (nbytes < bsize)
> -			goto done;
>  	}
>
>  	/* Handle leftovers */
Re: [RFC] Unaligned CTR mode tests in crypto/testmgr.h
On 30.10.2013 23:06, Joel Fernandes wrote:
> On 10/30/2013 06:09 AM, Jussi Kivilinna wrote:
>> On 30.10.2013 02:11, Joel Fernandes wrote:
>>> Hi,
>>>
>>> Some tests such as test 5 in AES CTR mode in crypto/testmgr.h have an
>>> unaligned input buffer size such as 499, which is not aligned to any > 0
>>> power of 2.
>>>
>>> Due to this, the omap-aes driver, and I think atmel-aes too, error out when
>>> encryption is requested for these buffers:
>>>
>>> pr_err("request size is not exact amount of AES blocks\n") or a similar message.
>>>
>>> Is this failure considered a bug? How do we fix it?
>>
>> Counter mode turns a block cipher into a stream cipher, and the implementation
>> must handle buffer lengths that do not match the block size of the underlying
>> block cipher.
>>
>>> How were the result output vectors generated, did you use 0 padding? Do we
>>> 0 pad the inputs to align in these cases to get correct results?
>>
>> See crypto/ctr.c:crypto_ctr_crypt_final() for how to handle trailing bytes when
>> 'buflen % AES_BLOCK_SIZE != 0'.
>>
>> Basically, you encrypt the last counter block to generate the last keystream
>> block and xor only the first 'buflen % AES_BLOCK_SIZE' bytes of that keystream
>> block with the tail bytes of the source buffer:
>>
>>   key_last[0..15] = ENC(K, counter[0..15]);
>>   dst_last[0..trailbytes-1] = src_last[0..trailbytes-1] ^ key_last[0..trailbytes-1];
>>   /* key_last[trailbytes..15] discarded. */
>>
>> Or, if you want to use hardware that only does block-size aligned CTR
>> encryption, you can pad the input to a block-size aligned length, do the
>> encryption, and then discard the padding bytes afterwards:
>>
>>   src_padded[0..trailbytes-1] = src_last[0..trailbytes-1]
>>   src_padded[trailbytes..15] = /* don't care, can be anything/uninitialized */
>>   src_padded[0..15] = ENC_HW_CTR(src_padded[0..15]);
>>   dst_last[0..trailbytes-1] = src_padded[0..trailbytes-1];
>>   /* src_padded[trailbytes..15] discarded. */
>>
>> Here, ENC_HW_CTR(in) internally does:
>>
>>   keystream[0..15] = ENC(K, counter[0..15]);
>>   INC_CTR(counter);
>>   out[0..15] = in[0..15] ^ keystream[0..15];
>
> Thanks, I'll try that. Just one question: is it safe to assume the output buffer
> (req->dst) is capable of holding that many bytes?
>
> In your algorithm above, we're assuming, without allocating explicitly, that
> the output buffer passed to the driver has trailbytes..15 available.
>
> Because otherwise we are in danger of introducing a memory leak, if we just
> assume they are available in the output buffer.

In the above example, I meant src_padded to be a temporary block-sized buffer
that handles the last trailing bytes. I don't think you can assume that
req->dst has this extra space.

> That said, I don't want to allocate a new buffer in the driver and then copy
> the encrypted data back into the output buffer, because I did a lot of hard
> work to get rid of such code as it is slower.

Could you handle the first 'buflen - buflen % blocksize' bytes as done currently,
without extra copies, and then handle the trailing bytes separately?

-Jussi

> thanks,
>
> -Joel
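The two tail-handling options described above, combined with the "bulk first, tail separately" suggestion, can be sketched in user-space C. This is a sketch under stated assumptions: toy_enc() is a hypothetical placeholder for ENC(K, counter) and is NOT a real cipher, ctr_blocks() models hardware that only accepts block-aligned lengths (ENC_HW_CTR), and the counter is incremented as a big-endian integer. The padded variant only ever writes `tail` bytes back, so it needs no extra space in the destination buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16

/* Toy stand-in for ENC(K, counter); hypothetical and NOT secure. */
static void toy_enc(uint8_t key, const uint8_t ctr[BLOCK], uint8_t out[BLOCK])
{
	for (size_t i = 0; i < BLOCK; i++)
		out[i] = (uint8_t)((ctr[i] ^ key) * 167u + i);
}

/* Increment the counter block as a big-endian integer. */
static void inc_ctr(uint8_t ctr[BLOCK])
{
	for (int i = BLOCK - 1; i >= 0; i--)
		if (++ctr[i] != 0)
			break;
}

/* Models block-aligned-only CTR hardware: len must be a multiple of BLOCK. */
static void ctr_blocks(uint8_t key, uint8_t ctr[BLOCK], uint8_t *buf, size_t len)
{
	uint8_t ks[BLOCK];

	assert(len % BLOCK == 0);
	for (size_t off = 0; off < len; off += BLOCK) {
		toy_enc(key, ctr, ks);
		inc_ctr(ctr);
		for (size_t i = 0; i < BLOCK; i++)
			buf[off + i] ^= ks[i];
	}
}

/* Option 1: generate one more keystream block and xor only the tail. */
static void ctr_crypt_keystream_tail(uint8_t key, uint8_t ctr[BLOCK],
				     uint8_t *buf, size_t len)
{
	size_t tail = len % BLOCK, bulk = len - tail;
	uint8_t ks[BLOCK];

	ctr_blocks(key, ctr, buf, bulk);	/* in place, no extra copies */
	if (tail) {
		toy_enc(key, ctr, ks);
		inc_ctr(ctr);
		for (size_t i = 0; i < tail; i++)
			buf[bulk + i] ^= ks[i];
	}
}

/* Option 2: run the tail through a temporary block-sized buffer. */
static void ctr_crypt_padded_tail(uint8_t key, uint8_t ctr[BLOCK],
				  uint8_t *buf, size_t len)
{
	size_t tail = len % BLOCK, bulk = len - tail;
	uint8_t tmp[BLOCK] = {0};	/* padding content does not matter */

	ctr_blocks(key, ctr, buf, bulk);	/* in place, no extra copies */
	if (tail) {
		memcpy(tmp, buf + bulk, tail);
		ctr_blocks(key, ctr, tmp, BLOCK);
		memcpy(buf + bulk, tmp, tail);	/* padding bytes are dropped */
	}
}
```

Both variants touch exactly `len` bytes of the caller's buffer and copy at most one block, which is the property the discussion is after; applying either function twice with the same key and starting counter restores the input.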
Re: [RFC] Unaligned CTR mode tests in crypto/testmgr.h
On 30.10.2013 02:11, Joel Fernandes wrote:
> Hi,
>
> Some tests such as test 5 in AES CTR mode in crypto/testmgr.h have an
> unaligned input buffer size such as 499, which is not aligned to any > 0
> power of 2.
>
> Due to this, the omap-aes driver, and I think atmel-aes too, error out when
> encryption is requested for these buffers:
>
> pr_err("request size is not exact amount of AES blocks\n") or a similar message.
>
> Is this failure considered a bug? How do we fix it?

Counter mode turns a block cipher into a stream cipher, and the implementation
must handle buffer lengths that do not match the block size of the underlying
block cipher.

> How were the result output vectors generated, did you use 0 padding? Do we
> 0 pad the inputs to align in these cases to get correct results?

See crypto/ctr.c:crypto_ctr_crypt_final() for how to handle trailing bytes when
'buflen % AES_BLOCK_SIZE != 0'.

Basically, you encrypt the last counter block to generate the last keystream
block and xor only the first 'buflen % AES_BLOCK_SIZE' bytes of that keystream
block with the tail bytes of the source buffer:

  key_last[0..15] = ENC(K, counter[0..15]);
  dst_last[0..trailbytes-1] = src_last[0..trailbytes-1] ^ key_last[0..trailbytes-1];
  /* key_last[trailbytes..15] discarded. */

Or, if you want to use hardware that only does block-size aligned CTR
encryption, you can pad the input to a block-size aligned length, do the
encryption, and then discard the padding bytes afterwards:

  src_padded[0..trailbytes-1] = src_last[0..trailbytes-1]
  src_padded[trailbytes..15] = /* don't care, can be anything/uninitialized */
  src_padded[0..15] = ENC_HW_CTR(src_padded[0..15]);
  dst_last[0..trailbytes-1] = src_padded[0..trailbytes-1];
  /* src_padded[trailbytes..15] discarded. */

Here, ENC_HW_CTR(in) internally does:

  keystream[0..15] = ENC(K, counter[0..15]);
  INC_CTR(counter);
  out[0..15] = in[0..15] ^ keystream[0..15];

-Jussi

> thanks,
>
> -Joel
Re: [PATCH] Documentation: kerneli typo in description for Serpent cipher algorithm Bug #60848
On 02.10.2013 21:12, Rob Landley wrote:
> On 10/02/2013 11:10:37 AM, Kevin Mulvey wrote:
>> change kerneli to kernel as well as kerneli.org to kernel.org
>>
>> Signed-off-by: Kevin Mulvey <ke...@kevinmulvey.net>
>
> There's a bug number for this?
>
> Acked, queued. (Although I'm not sure the value of pointing to www.kernel.org
> for this.)

I think kerneli.org is correct.. see old website at
http://web.archive.org/web/20010201085500/http://www.kerneli.org/

-Jussi

> Thanks,
>
> Rob
Re: [PATCH 1/4] crypto: create generic version of ablk_helper
On 20.09.2013 21:46, Ard Biesheuvel wrote: Create a generic version of ablk_helper so it can be reused by other architectures. Acked-by: Jussi Kivilinna jussi.kivili...@iki.fi Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org Why resent this patch here when this was in the earlier patchset? http://marc.info/?l=linux-crypto-vgerm=137966378813818w=2 -Jussi --- crypto/Kconfig | 4 ++ crypto/Makefile | 1 + crypto/ablk_helper.c | 150 +++ include/asm-generic/simd.h | 14 include/crypto/ablk_helper.h | 31 + 5 files changed, 200 insertions(+) create mode 100644 crypto/ablk_helper.c create mode 100644 include/asm-generic/simd.h create mode 100644 include/crypto/ablk_helper.h diff --git a/crypto/Kconfig b/crypto/Kconfig index 69ce573..8179ae6 100644 --- a/crypto/Kconfig +++ b/crypto/Kconfig @@ -179,6 +179,10 @@ config CRYPTO_ABLK_HELPER_X86 depends on X86 select CRYPTO_CRYPTD +config CRYPTO_ABLK_HELPER + tristate + select CRYPTO_CRYPTD + config CRYPTO_GLUE_HELPER_X86 tristate depends on X86 diff --git a/crypto/Makefile b/crypto/Makefile index 80019ba..5e1bdb1 100644 --- a/crypto/Makefile +++ b/crypto/Makefile @@ -104,3 +104,4 @@ obj-$(CONFIG_CRYPTO_USER_API_SKCIPHER) += algif_skcipher.o obj-$(CONFIG_XOR_BLOCKS) += xor.o obj-$(CONFIG_ASYNC_CORE) += async_tx/ obj-$(CONFIG_ASYMMETRIC_KEY_TYPE) += asymmetric_keys/ +obj-$(CONFIG_CRYPTO_ABLK_HELPER) += ablk_helper.o diff --git a/crypto/ablk_helper.c b/crypto/ablk_helper.c new file mode 100644 index 000..62568b1 --- /dev/null +++ b/crypto/ablk_helper.c @@ -0,0 +1,150 @@ +/* + * Shared async block cipher helpers + * + * Copyright (c) 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi + * + * Based on aesni-intel_glue.c by: + * Copyright (C) 2008, Intel Corp. 
+ *Author: Huang Ying ying.hu...@intel.com + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 + * USA + * + */ + +#include linux/kernel.h +#include linux/crypto.h +#include linux/init.h +#include linux/module.h +#include linux/hardirq.h +#include crypto/algapi.h +#include crypto/cryptd.h +#include crypto/ablk_helper.h +#include asm/simd.h + +int ablk_set_key(struct crypto_ablkcipher *tfm, const u8 *key, + unsigned int key_len) +{ + struct async_helper_ctx *ctx = crypto_ablkcipher_ctx(tfm); + struct crypto_ablkcipher *child = ctx-cryptd_tfm-base; + int err; + + crypto_ablkcipher_clear_flags(child, CRYPTO_TFM_REQ_MASK); + crypto_ablkcipher_set_flags(child, crypto_ablkcipher_get_flags(tfm) + CRYPTO_TFM_REQ_MASK); + err = crypto_ablkcipher_setkey(child, key, key_len); + crypto_ablkcipher_set_flags(tfm, crypto_ablkcipher_get_flags(child) + CRYPTO_TFM_RES_MASK); + return err; +} +EXPORT_SYMBOL_GPL(ablk_set_key); + +int __ablk_encrypt(struct ablkcipher_request *req) +{ + struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(req); + struct async_helper_ctx *ctx = crypto_ablkcipher_ctx(tfm); + struct blkcipher_desc desc; + + desc.tfm = cryptd_ablkcipher_child(ctx-cryptd_tfm); + desc.info = req-info; + desc.flags = 0; + + return crypto_blkcipher_crt(desc.tfm)-encrypt( + desc, req-dst, req-src, req-nbytes); +} 
+EXPORT_SYMBOL_GPL(__ablk_encrypt); + +int ablk_encrypt(struct ablkcipher_request *req) +{ + struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(req); + struct async_helper_ctx *ctx = crypto_ablkcipher_ctx(tfm); + + if (!may_use_simd()) { + struct ablkcipher_request *cryptd_req = + ablkcipher_request_ctx(req); + + memcpy(cryptd_req, req, sizeof(*req)); + ablkcipher_request_set_tfm(cryptd_req, ctx-cryptd_tfm-base); + + return crypto_ablkcipher_encrypt(cryptd_req); + } else { + return __ablk_encrypt(req); + } +} +EXPORT_SYMBOL_GPL(ablk_encrypt); + +int ablk_decrypt(struct ablkcipher_request *req) +{ + struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(req); + struct async_helper_ctx *ctx
Re: [PATCH 4/4] ARM: add support for bit sliced AES using NEON instructions
On 20.09.2013 21:46, Ard Biesheuvel wrote:
> This implementation of the AES algorithm gives around 45% speedup on Cortex-A15
> for CTR mode and for XTS in encryption mode. Both CBC and XTS in decryption mode
> are slightly faster (5 - 10% on Cortex-A15). [As CBC in encryption mode can only
> be performed sequentially, there is no speedup in this case.]
>
> Unlike the core AES cipher (on which this module also depends), this algorithm
> uses bit slicing to process up to 8 blocks in parallel in constant time. This
> algorithm does not rely on any lookup tables so it is believed to be
> invulnerable to cache timing attacks.
>
> The core code has been adopted from the OpenSSL project (in collaboration
> with the original author, on cc). For ease of maintenance, this version is
> identical to the upstream OpenSSL code, i.e., all modifications that were
> required to make it suitable for inclusion into the kernel have already been
> merged upstream.
>
> Cc: Andy Polyakov <ap...@openssl.org>
> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
> ---
[..snip..]
> +	bcc	.Ldec_done
> +	@ multiplication by 0x0e

Decryption can probably be made faster by implementing InvMixColumns slightly
differently. Instead of implementing the inverse MixColumns matrix directly, use
a preprocessing step followed by MixColumns, as described in section "4.1.3
Decryption" of "The Design of Rijndael: AES - The Advanced Encryption Standard"
(J. Daemen, V. Rijmen / 2002).

In short, the MixColumns and InvMixColumns matrices have the following relation:

 | 0e 0b 0d 09 |   | 02 03 01 01 |   | 05 00 04 00 |
 | 09 0e 0b 0d | = | 01 02 03 01 | x | 00 05 00 04 |
 | 0d 09 0e 0b |   | 01 01 02 03 |   | 04 00 05 00 |
 | 0b 0d 09 0e |   | 03 01 01 02 |   | 00 04 00 05 |

A bit-sliced implementation of the 05-00-04-00 matrix is much shorter than one
of the 0e-0b-0d-09 matrix, so even when combined with MixColumns, the total
instruction count for InvMixColumns implemented this way should be nearly half
of the current one.

Check [1] for an implementation of this on the AVX instruction set.

-Jussi

[1] https://github.com/jkivilin/supercop-blockciphers/blob/beyond_master/crypto_stream/aes128ctr/avx/aes_asm_bitslice_avx.S#L234
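The matrix relation quoted above can be checked numerically. The following is a small self-contained sketch, not taken from the patch or from the referenced AVX code: gf_mul(), circulant(), gf_matmul() and inv_mixcolumns_factorizes() are names chosen here for illustration. It multiplies the MixColumns matrix by the 05-00-04-00 matrix over GF(2^8) with the AES reduction polynomial x^8 + x^4 + x^3 + x + 1 and compares the product with the InvMixColumns matrix.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two elements of GF(2^8) modulo the AES polynomial (0x11b). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
	uint8_t r = 0;

	while (b) {
		if (b & 1)
			r ^= a;
		/* multiply a by x, reducing when the top bit falls out */
		a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0));
		b >>= 1;
	}
	return r;
}

/* Build a 4x4 circulant matrix: each row is the previous one rotated right. */
static void circulant(const uint8_t first_row[4], uint8_t m[4][4])
{
	for (int r = 0; r < 4; r++)
		for (int c = 0; c < 4; c++)
			m[r][c] = first_row[(c - r + 4) % 4];
}

/* out = a x b over GF(2^8); addition in this field is xor. */
static void gf_matmul(uint8_t a[4][4], uint8_t b[4][4], uint8_t out[4][4])
{
	for (int r = 0; r < 4; r++)
		for (int c = 0; c < 4; c++) {
			uint8_t acc = 0;

			for (int k = 0; k < 4; k++)
				acc ^= gf_mul(a[r][k], b[k][c]);
			out[r][c] = acc;
		}
}

/* Returns 1 if InvMixColumns == MixColumns x circ(05, 00, 04, 00). */
static int inv_mixcolumns_factorizes(void)
{
	static const uint8_t mix_row[4] = { 0x02, 0x03, 0x01, 0x01 };
	static const uint8_t pre_row[4] = { 0x05, 0x00, 0x04, 0x00 };
	static const uint8_t inv_row[4] = { 0x0e, 0x0b, 0x0d, 0x09 };
	uint8_t mix[4][4], pre[4][4], inv[4][4], prod[4][4];

	/* sanity check against the worked example in FIPS-197 */
	assert(gf_mul(0x57, 0x83) == 0xc1);

	circulant(mix_row, mix);
	circulant(pre_row, pre);
	circulant(inv_row, inv);
	gf_matmul(mix, pre, prod);

	for (int r = 0; r < 4; r++)
		for (int c = 0; c < 4; c++)
			if (prod[r][c] != inv[r][c])
				return 0;
	return 1;
}
```

As a hand check of one entry: row 0 times column 0 is 02*05 ^ 01*04 = 0a ^ 04 = 0e, which matches the top-left element of the InvMixColumns matrix.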
Re: [PATCH] crypto: move ablk_helper out of arch/x86
On 14.09.2013 13:24, Ard Biesheuvel wrote: Move the ablk_helper code out of arch/x86 so it can be reused by other architectures. The only x86 specific dependency is a call to irq_fpu_usable(), in the generic case we use !in_interrupt() instead. Cc: jussi.kivili...@iki.fi Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- Any need to split this up between generic/crypto and x86? arch/x86/crypto/Makefile | 1 - arch/x86/crypto/ablk_helper.c | 149 arch/x86/crypto/aesni-intel_glue.c | 2 +- arch/x86/crypto/camellia_aesni_avx2_glue.c | 2 +- arch/x86/crypto/camellia_aesni_avx_glue.c | 2 +- arch/x86/crypto/cast5_avx_glue.c | 2 +- arch/x86/crypto/cast6_avx_glue.c | 2 +- arch/x86/crypto/serpent_avx2_glue.c| 2 +- arch/x86/crypto/serpent_avx_glue.c | 2 +- arch/x86/crypto/serpent_sse2_glue.c| 2 +- arch/x86/crypto/twofish_avx_glue.c | 2 +- arch/x86/include/asm/crypto/ablk_helper.h | 38 ++-- crypto/Kconfig | 23 +++-- crypto/Makefile| 1 + crypto/ablk_helper.c | 150 + include/asm-generic/crypto/ablk_helper.h | 13 +++ include/crypto/ablk_helper.h | 31 ++ 17 files changed, 224 insertions(+), 200 deletions(-) delete mode 100644 arch/x86/crypto/ablk_helper.c create mode 100644 crypto/ablk_helper.c create mode 100644 include/asm-generic/crypto/ablk_helper.h create mode 100644 include/crypto/ablk_helper.h diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile index 7d6ba9d..18fda50 100644 --- a/arch/x86/crypto/Makefile +++ b/arch/x86/crypto/Makefile @@ -4,7 +4,6 @@ avx_supported := $(call as-instr,vpxor %xmm0$(comma)%xmm0$(comma)%xmm0,yes,no) -obj-$(CONFIG_CRYPTO_ABLK_HELPER_X86) += ablk_helper.o obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o This part does not apply cleanly to cryptodev tree (git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git). 
[snip] diff --git a/include/asm-generic/crypto/ablk_helper.h b/include/asm-generic/crypto/ablk_helper.h new file mode 100644 index 000..ede807f --- /dev/null +++ b/include/asm-generic/crypto/ablk_helper.h @@ -0,0 +1,13 @@ + +#include linux/hardirq.h + +/* + * ablk_can_run_sync - used by crypto/ablk_helper to decide whether a request + * can be handled synchronously or needs to be queued up. + * + * Choose in_interrupt() as a reasonable default + */ Trailing whitespace in above comment block. ERROR: trailing whitespace #702: FILE: include/asm-generic/crypto/ablk_helper.h:7: + * $ Otherwise, Acked-by: Jussi Kivilinna jussi.kivili...@iki.fi -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 1/2] crypto: move ablk_helper out of arch/x86
On 13.09.2013 18:08, Ard Biesheuvel wrote:
> Move the ablk_helper code out of arch/x86 so it can be reused
> by other architectures. The only x86 specific dependency was
> a call to irq_fpu_usable(), this has been factored out and moved
> to crypto/ablk_helper_x86.c
>
> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
> ---
..snip..
> diff --git a/crypto/ablk_helper_generic.c b/crypto/ablk_helper_generic.c
> new file mode 100644
> index 000..b63b800
> --- /dev/null
> +++ b/crypto/ablk_helper_generic.c
> @@ -0,0 +1,155 @@
> +/*
> + * Shared async block cipher helpers
> + *
> + * Copyright (c) 2012 Jussi Kivilinna <jussi.kivili...@mbnet.fi>
> + *
> + * Based on aesni-intel_glue.c by:
> + *  Copyright (C) 2008, Intel Corp.
> + *    Author: Huang Ying <ying.hu...@intel.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307
> + * USA
> + *
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/crypto.h>
> +#include <linux/init.h>
> +#include <linux/module.h>
> +#include <linux/hardirq.h>
> +#include <crypto/algapi.h>
> +#include <crypto/cryptd.h>
> +#include <crypto/ablk_helper.h>
> +
> +/* can be overridden by the architecture if desired */
> +bool __weak ablk_can_run_sync(void)
> +{
> +	return !in_interrupt();
> +}

Why not have an architecture-specific header file that provides this function?
For architectures that use in_interrupt() for this, you would avoid the extra
function call.

-Jussi
Re: [RFC PATCH 2/2] arm64: add support for AES using ARMv8 Crypto Extensions
On 13.09.2013 18:08, Ard Biesheuvel wrote: This adds ARMv8 Crypto Extensions based implemenations of AES in CBC, CTR and XTS mode. Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- ..snip.. +static int xts_set_key(struct crypto_tfm *tfm, const u8 *in_key, +unsigned int key_len) +{ + struct crypto_aes_xts_ctx *ctx = crypto_tfm_ctx(tfm); + u32 *flags = tfm-crt_flags; + int ret; + + ret = crypto_aes_expand_key(ctx-key1, in_key, key_len/2); + if (!ret) + ret = crypto_aes_expand_key(ctx-key2, in_key[key_len/2], + key_len/2); Use checkpatch. + if (!ret) + return 0; + + *flags |= CRYPTO_TFM_RES_BAD_KEY_LEN; + return -EINVAL; +} + +static int cbc_encrypt(struct blkcipher_desc *desc, struct scatterlist *dst, +struct scatterlist *src, unsigned int nbytes) +{ + struct crypto_aes_ctx *ctx = crypto_blkcipher_ctx(desc-tfm); + int err, first, rounds = 6 + ctx-key_length/4; + struct blkcipher_walk walk; + unsigned int blocks; + + blkcipher_walk_init(walk, dst, src, nbytes); + err = blkcipher_walk_virt(desc, walk); + + kernel_neon_begin(); Is sleeping allowed within kernel_neon_begin/end block? If not, you need to clear CRYPTO_TFM_REQ_MAY_SLEEP on desc-flags. Otherwise blkcipher_walk_done might sleep. + for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { + aesce_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr, + (u8*)ctx-key_enc, rounds, blocks, walk.iv, + first); + + err = blkcipher_walk_done(desc, walk, blocks * AES_BLOCK_SIZE); + } + kernel_neon_end(); + + /* non-integral sizes are not supported in CBC */ + if (unlikely(walk.nbytes)) + err = -EINVAL; I think blkcipher_walk_done already does this check by comparing against alg.cra_blocksize. + + return err; +} ..snip.. 
+ +static int ctr_encrypt(struct blkcipher_desc *desc, struct scatterlist *dst, +struct scatterlist *src, unsigned int nbytes) +{ + struct crypto_aes_ctx *ctx = crypto_blkcipher_ctx(desc-tfm); + int err, first, rounds = 6 + ctx-key_length/4; + struct blkcipher_walk walk; + u8 ctr[AES_BLOCK_SIZE]; + + blkcipher_walk_init(walk, dst, src, nbytes); + err = blkcipher_walk_virt(desc, walk); + + memcpy(ctr, walk.iv, AES_BLOCK_SIZE); + + kernel_neon_begin(); + for (first = 1; (nbytes = walk.nbytes); first = 0) { + aesce_ctr_encrypt(walk.dst.virt.addr, walk.src.virt.addr, + (u8*)ctx-key_enc, rounds, nbytes, ctr, first); + + err = blkcipher_walk_done(desc, walk, 0); + + /* non-integral block *must* be the last one */ + if (unlikely(walk.nbytes (nbytes (AES_BLOCK_SIZE-1 { + err = -EINVAL; Other CTR implementations do not have this.. not needed? + break; + } + } ..snip.. +static struct crypto_alg aesce_cbc_algs[] = { { + .cra_name = __cbc-aes-aesce, + .cra_driver_name= __driver-cbc-aes-aesce, + .cra_priority = 0, + .cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER, + .cra_blocksize = AES_BLOCK_SIZE, + .cra_ctxsize= sizeof(struct crypto_aes_ctx), + .cra_alignmask = 0, + .cra_type = crypto_blkcipher_type, + .cra_module = THIS_MODULE, + .cra_u = { + .blkcipher = { + .min_keysize= AES_MIN_KEY_SIZE, + .max_keysize= AES_MAX_KEY_SIZE, + .ivsize = AES_BLOCK_SIZE, + .setkey = crypto_aes_set_key, + .encrypt= cbc_encrypt, + .decrypt= cbc_decrypt, + }, + }, +}, { + .cra_name = __ctr-aes-aesce, + .cra_driver_name= __driver-ctr-aes-aesce, + .cra_priority = 0, + .cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER, + .cra_blocksize = AES_BLOCK_SIZE, CTR mode is stream cipher, cra_blocksize must be set to 1. This should have been picked up by in-kernel run-time tests, check CONFIG_CRYPTO_MANAGER_DISABLE_TESTS (and CONFIG_CRYPTO_TEST/tcrypt module). 
+ .cra_ctxsize= sizeof(struct crypto_aes_ctx), + .cra_alignmask = 0, + .cra_type = crypto_blkcipher_type, + .cra_module = THIS_MODULE, + .cra_u = { + .blkcipher = { + .min_keysize= AES_MIN_KEY_SIZE, + .max_keysize= AES_MAX_KEY_SIZE, + .ivsize = AES_BLOCK_SIZE, +
Re: Mistake ?
On 03.09.2013 15:36, Pierre-Mayeul Badaire wrote:
> Good afternoon,
>
> Don't you have a mistake on the MODULE_ALIAS at the last line of the commit ?
> Shouldn't it be MODULE_ALIAS("sha224") here ?

Yes, that's correct, it should be ssh224 instead of sha384. I'll post patch soon.

-Jussi

> Reference:
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a710f761fc9ae5728765a5917f8beabb49f98483
>
> Best regards,
Re: Mistake ?
On 03.09.2013 16:01, Jussi Kivilinna wrote:
> On 03.09.2013 15:36, Pierre-Mayeul Badaire wrote:
>> Good afternoon,
>>
>> Don't you have a mistake on the MODULE_ALIAS at the last line of the commit ?
>> Shouldn't it be MODULE_ALIAS("sha224") here ?
>
> Yes, that's correct, it should be ssh224 instead of sha384. I'll post patch soon.

sha224.

-Jussi

> -Jussi
>
>> Reference:
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a710f761fc9ae5728765a5917f8beabb49f98483
>>
>> Best regards,
[PATCH] crypto: sha256_ssse3 - use correct module alias for sha224
Commit a710f761f ("crypto: sha256_ssse3 - add sha224 support") attempted to add
a MODULE_ALIAS for SHA-224, but it ended up being "sha384", probably because of
a mix-up with the previous commit 340991e30 ("crypto: sha512_ssse3 - add sha384
support"). This patch corrects the module alias to "sha224".

Reported-by: Pierre-Mayeul Badaire <pierre-mayeul.bada...@m4x.org>
Signed-off-by: Jussi Kivilinna <jussi.kivili...@iki.fi>
---
 arch/x86/crypto/sha256_ssse3_glue.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/crypto/sha256_ssse3_glue.c b/arch/x86/crypto/sha256_ssse3_glue.c
index 50226c4..85021a4 100644
--- a/arch/x86/crypto/sha256_ssse3_glue.c
+++ b/arch/x86/crypto/sha256_ssse3_glue.c
@@ -319,4 +319,4 @@ MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("SHA256 Secure Hash Algorithm, Supplemental SSE3 accelerated");

 MODULE_ALIAS("sha256");
-MODULE_ALIAS("sha384");
+MODULE_ALIAS("sha224");
[PATCH] crypto: x86: restore avx2_supported check
Commit 3d387ef08c4 (Revert "crypto: blowfish - add AVX2/x86_64 implementation of blowfish cipher") reverted too much as it removed the 'assembler supports AVX2' check and therefore disabled remaining AVX2 implementations of Camellia and Serpent. Patch restores the check and enables these implementations.

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/Makefile |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 7d6ba9d..75b08e1e 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -3,6 +3,8 @@
 #
 
 avx_supported := $(call as-instr,vpxor %xmm0$(comma)%xmm0$(comma)%xmm0,yes,no)
+avx2_supported := $(call as-instr,vpgatherdd %ymm0$(comma)(%eax$(comma)%ymm1\
+				$(comma)4)$(comma)%ymm2,yes,no)
 
 obj-$(CONFIG_CRYPTO_ABLK_HELPER_X86) += ablk_helper.o
 obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o
[PATCH 2/4] crypto: testmgr - test skciphers with unaligned buffers
This patch adds unaligned buffer tests for blkciphers.

The first new test is with one byte offset and the second test checks if cra_alignmask for driver is big enough; for example, for testing a case where cra_alignmask is set to 7, but driver really needs buffers to be aligned to 16 bytes.

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/testmgr.c |   33 +++++++++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index a81c154..8bd185f 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -820,7 +820,7 @@ out_nobuf:
 
 static int __test_skcipher(struct crypto_ablkcipher *tfm, int enc,
 			   struct cipher_testvec *template, unsigned int tcount,
-			   const bool diff_dst)
+			   const bool diff_dst, const int align_offset)
 {
 	const char *algo =
 		crypto_tfm_alg_driver_name(crypto_ablkcipher_tfm(tfm));
@@ -876,10 +876,12 @@ static int __test_skcipher(struct crypto_ablkcipher *tfm, int enc,
 			j++;
 
 			ret = -EINVAL;
-			if (WARN_ON(template[i].ilen > PAGE_SIZE))
+			if (WARN_ON(align_offset + template[i].ilen >
+				    PAGE_SIZE))
 				goto out;
 
 			data = xbuf[0];
+			data += align_offset;
 			memcpy(data, template[i].input, template[i].ilen);
 
 			crypto_ablkcipher_clear_flags(tfm, ~0);
@@ -900,6 +902,7 @@ static int __test_skcipher(struct crypto_ablkcipher *tfm, int enc,
 			sg_init_one(&sg[0], data, template[i].ilen);
 			if (diff_dst) {
 				data = xoutbuf[0];
+				data += align_offset;
 				sg_init_one(&sgout[0], data, template[i].ilen);
 			}
 
@@ -941,6 +944,9 @@ static int __test_skcipher(struct crypto_ablkcipher *tfm, int enc,
 
 	j = 0;
 	for (i = 0; i < tcount; i++) {
+		/* alignment tests are only done with continuous buffers */
+		if (align_offset != 0)
+			break;
 
 		if (template[i].iv)
 			memcpy(iv, template[i].iv, MAX_IVLEN);
@@ -1075,15 +1081,34 @@ out_nobuf:
 static int test_skcipher(struct crypto_ablkcipher *tfm, int enc,
 			 struct cipher_testvec *template, unsigned int tcount)
 {
+	unsigned int alignmask;
 	int ret;
 
 	/* test 'dst == src' case */
-	ret = __test_skcipher(tfm, enc, template, tcount, false);
+	ret = __test_skcipher(tfm, enc, template, tcount, false, 0);
 	if (ret)
 		return ret;
 
 	/* test 'dst != src' case */
-	return __test_skcipher(tfm, enc, template, tcount, true);
+	ret = __test_skcipher(tfm, enc, template, tcount, true, 0);
+	if (ret)
+		return ret;
+
+	/* test unaligned buffers, check with one byte offset */
+	ret = __test_skcipher(tfm, enc, template, tcount, true, 1);
+	if (ret)
+		return ret;
+
+	alignmask = crypto_tfm_alg_alignmask(&tfm->base);
+	if (alignmask) {
+		/* Check if alignment mask for tfm is correctly set. */
+		ret = __test_skcipher(tfm, enc, template, tcount, true,
+				      alignmask + 1);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
 }
 
 static int test_comp(struct crypto_comp *tfm, struct comp_testvec *ctemplate,
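The choice of the two offsets (1 and alignmask + 1) can be illustrated outside the kernel. What follows is only a minimal user-space sketch, assuming a page-aligned base address like the xbuf pages used by testmgr; satisfies_alignmask() and declared_mask_offset() are hypothetical helpers for illustration and are not part of the patch or of the kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if addr satisfies the given alignment mask,
 * where mask = alignment - 1 (the same convention as cra_alignmask).
 */
static bool satisfies_alignmask(uintptr_t addr, unsigned int alignmask)
{
	return (addr & alignmask) == 0;
}

/*
 * Hypothetical helper: offset used by the second new test. It is the
 * smallest non-zero offset from an aligned base that still satisfies
 * the alignment mask the driver declared.
 */
static unsigned int declared_mask_offset(unsigned int alignmask)
{
	return alignmask + 1;
}
```

With a declared mask of 7, an offset of 1 violates the mask, while an offset of 8 satisfies it but is still not 16-byte aligned. A driver that declares 7 yet really needs 16-byte alignment would therefore only be caught by the second test.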
[PATCH 4/4] crypto: testmgr - test hash implementations with unaligned buffers
This patch adds unaligned buffer tests for hashes.

The first new test is with one byte offset and the second test checks if cra_alignmask for driver is big enough; for example, for testing a case where cra_alignmask is set to 7, but driver really needs buffers to be aligned to 16 bytes.

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/testmgr.c |   41 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index f205386..2f00607 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -184,8 +184,9 @@ static int do_one_async_hash_op(struct ahash_request *req,
 	return ret;
 }
 
-static int test_hash(struct crypto_ahash *tfm, struct hash_testvec *template,
-		     unsigned int tcount, bool use_digest)
+static int __test_hash(struct crypto_ahash *tfm, struct hash_testvec *template,
+		       unsigned int tcount, bool use_digest,
+		       const int align_offset)
 {
 	const char *algo = crypto_tfm_alg_driver_name(crypto_ahash_tfm(tfm));
 	unsigned int i, j, k, temp;
@@ -216,10 +217,15 @@ static int test_hash(struct crypto_ahash *tfm, struct hash_testvec *template,
 		if (template[i].np)
 			continue;
 
+		ret = -EINVAL;
+		if (WARN_ON(align_offset + template[i].psize > PAGE_SIZE))
+			goto out;
+
 		j++;
 		memset(result, 0, 64);
 
 		hash_buff = xbuf[0];
+		hash_buff += align_offset;
 
 		memcpy(hash_buff, template[i].plaintext, template[i].psize);
 		sg_init_one(&sg[0], hash_buff, template[i].psize);
@@ -281,6 +287,10 @@ static int test_hash(struct crypto_ahash *tfm, struct hash_testvec *template,
 
 	j = 0;
 	for (i = 0; i < tcount; i++) {
+		/* alignment tests are only done with continuous buffers */
+		if (align_offset != 0)
+			break;
+
 		if (template[i].np) {
 			j++;
 			memset(result, 0, 64);
@@ -358,6 +368,33 @@ out_nobuf:
 	return ret;
 }
 
+static int test_hash(struct crypto_ahash *tfm, struct hash_testvec *template,
+		     unsigned int tcount, bool use_digest)
+{
+	unsigned int alignmask;
+	int ret;
+
+	ret = __test_hash(tfm, template, tcount, use_digest, 0);
+	if (ret)
+		return ret;
+
+	/* test unaligned buffers, check with one byte offset */
+	ret = __test_hash(tfm, template, tcount, use_digest, 1);
+	if (ret)
+		return ret;
+
+	alignmask = crypto_tfm_alg_alignmask(&tfm->base);
+	if (alignmask) {
+		/* Check if alignment mask for tfm is correctly set. */
+		ret = __test_hash(tfm, template, tcount, use_digest,
+				  alignmask + 1);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
 static int __test_aead(struct crypto_aead *tfm, int enc,
 		       struct aead_testvec *template, unsigned int tcount,
 		       const bool diff_dst, const int align_offset)
[PATCH 3/4] crypto: testmgr - test AEADs with unaligned buffers
This patch adds unaligned buffer tests for AEADs.

The first new test is with one byte offset and the second test checks if cra_alignmask for driver is big enough; for example, for testing a case where cra_alignmask is set to 7, but driver really needs buffers to be aligned to 16 bytes.

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/testmgr.c |   37 +++++++++++++++++++++++++++++++------
 1 file changed, 31 insertions(+), 6 deletions(-)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 8bd185f..f205386 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -360,7 +360,7 @@ out_nobuf:
 
 static int __test_aead(struct crypto_aead *tfm, int enc,
 		       struct aead_testvec *template, unsigned int tcount,
-		       const bool diff_dst)
+		       const bool diff_dst, const int align_offset)
 {
 	const char *algo = crypto_tfm_alg_driver_name(crypto_aead_tfm(tfm));
 	unsigned int i, j, k, n, temp;
@@ -423,15 +423,16 @@ static int __test_aead(struct crypto_aead *tfm, int enc,
 		if (!template[i].np) {
 			j++;
 
-			/* some tepmplates have no input data but they will
+			/* some templates have no input data but they will
 			 * touch input
 			 */
 			input = xbuf[0];
+			input += align_offset;
 			assoc = axbuf[0];
 
 			ret = -EINVAL;
-			if (WARN_ON(template[i].ilen > PAGE_SIZE ||
-				    template[i].alen > PAGE_SIZE))
+			if (WARN_ON(align_offset + template[i].ilen >
+				    PAGE_SIZE || template[i].alen > PAGE_SIZE))
 				goto out;
 
 			memcpy(input, template[i].input, template[i].ilen);
@@ -470,6 +471,7 @@ static int __test_aead(struct crypto_aead *tfm, int enc,
 
 			if (diff_dst) {
 				output = xoutbuf[0];
+				output += align_offset;
 				sg_init_one(&sgout[0], output,
 					    template[i].ilen +
 						(enc ? authsize : 0));
@@ -530,6 +532,10 @@ static int __test_aead(struct crypto_aead *tfm, int enc,
 	}
 
 	for (i = 0, j = 0; i < tcount; i++) {
+		/* alignment tests are only done with continuous buffers */
+		if (align_offset != 0)
+			break;
+
 		if (template[i].np) {
 			j++;
 
@@ -732,15 +738,34 @@ out_noxbuf:
 static int test_aead(struct crypto_aead *tfm, int enc,
 		     struct aead_testvec *template, unsigned int tcount)
 {
+	unsigned int alignmask;
 	int ret;
 
 	/* test 'dst == src' case */
-	ret = __test_aead(tfm, enc, template, tcount, false);
+	ret = __test_aead(tfm, enc, template, tcount, false, 0);
 	if (ret)
 		return ret;
 
 	/* test 'dst != src' case */
-	return __test_aead(tfm, enc, template, tcount, true);
+	ret = __test_aead(tfm, enc, template, tcount, true, 0);
+	if (ret)
+		return ret;
+
+	/* test unaligned buffers, check with one byte offset */
+	ret = __test_aead(tfm, enc, template, tcount, true, 1);
+	if (ret)
+		return ret;
+
+	alignmask = crypto_tfm_alg_alignmask(&tfm->base);
+	if (alignmask) {
+		/* Check if alignment mask for tfm is correctly set. */
+		ret = __test_aead(tfm, enc, template, tcount, true,
+				  alignmask + 1);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
 }
 
 static int test_cipher(struct crypto_cipher *tfm, int enc,
[PATCH 1/4] crypto: testmgr - check that entries in alg_test_descs are in correct order
Patch adds a check for alg_test_descs list order, so that accidentally misplaced entries are found quicker. Duplicate entries are also checked for.

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/testmgr.c |   31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index b2bc533..a81c154 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -3054,6 +3054,35 @@ static const struct alg_test_desc alg_test_descs[] = {
 	}
 };
 
+static bool alg_test_descs_checked;
+
+static void alg_test_descs_check_order(void)
+{
+	int i;
+
+	/* only check once */
+	if (alg_test_descs_checked)
+		return;
+
+	alg_test_descs_checked = true;
+
+	for (i = 1; i < ARRAY_SIZE(alg_test_descs); i++) {
+		int diff = strcmp(alg_test_descs[i - 1].alg,
+				  alg_test_descs[i].alg);
+
+		if (WARN_ON(diff > 0)) {
+			pr_warn("testmgr: alg_test_descs entries in wrong order: '%s' before '%s'\n",
+				alg_test_descs[i - 1].alg,
+				alg_test_descs[i].alg);
+		}
+
+		if (WARN_ON(diff == 0)) {
+			pr_warn("testmgr: duplicate alg_test_descs entry: '%s'\n",
+				alg_test_descs[i].alg);
+		}
+	}
+}
+
 static int alg_find_test(const char *alg)
 {
 	int start = 0;
@@ -3085,6 +3114,8 @@ int alg_test(const char *driver, const char *alg, u32 type, u32 mask)
 	int j;
 	int rc;
 
+	alg_test_descs_check_order();
+
 	if ((type & CRYPTO_ALG_TYPE_MASK) == CRYPTO_ALG_TYPE_CIPHER) {
 		char nalg[CRYPTO_MAX_ALG_NAME];
 
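The same adjacent-pair comparison can be tried in user space. This is only a sketch of the idea, not the kernel code: struct order_report and check_sorted_names() are hypothetical names, and the result is returned as counts instead of being reported through WARN_ON() and pr_warn(). The ordering matters because alg_find_test() looks entries up by name with a binary search, which presumably misses entries that sit out of order.

```c
#include <stddef.h>
#include <string.h>

/* Result of scanning a name table that is expected to be strictly sorted. */
struct order_report {
	size_t misordered;	/* adjacent pairs where the earlier name sorts later */
	size_t duplicates;	/* adjacent pairs with identical names */
};

/* Compare each entry with its predecessor, as the patch does with strcmp(). */
static struct order_report check_sorted_names(const char *const *names,
					      size_t count)
{
	struct order_report report = { 0, 0 };
	size_t i;

	for (i = 1; i < count; i++) {
		int diff = strcmp(names[i - 1], names[i]);

		if (diff > 0)
			report.misordered++;
		else if (diff == 0)
			report.duplicates++;
	}

	return report;
}
```

A single pass over adjacent pairs is enough to detect any violation of strict ordering, so the check costs one strcmp() per table entry and only needs to run once.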
Re: GPF in aesni_xts_crypt8 (3.10-rc5)
Hello, Does attached patch help? -Jussi On 11.06.2013 20:26, Dave Jones wrote: Just found that 3.10-rc doesn't boot on my laptop with encrypted disk. general protection fault: [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: xfs libcrc32c dm_crypt crc32c_intel ghash_clmulni_intel aesni_intel glue_helper ablk_helper i915 i2c_algo_bit drm_kms_helper drm i2c_core video CPU: 1 PID: 53 Comm: kworker/1:1 Not tainted 3.10.0-rc5+ #5 Hardware name: LENOVO 2356JK8/2356JK8, BIOS G7ET94WW (2.54 ) 04/30/2013 Workqueue: kcryptd kcryptd_crypt [dm_crypt] task: 880135c58000 ti: 880135c54000 task.ti: 880135c54000 RIP: 0010:[a01433a2] [a01433a2] aesni_xts_crypt8+0x42/0x1e0 [aesni_intel] RSP: 0018:880135c55b68 EFLAGS: 00010282 RAX: a0142eb8 RBX: 0080 RCX: 00f0 RDX: 8801316eeaa8 RSI: 8801316eeaa8 RDI: 88012fd84440 RBP: 880135c55b70 R08: 8801304fe118 R09: 0020 R10: 00f0 R11: a0142eb8 R12: 8801316eeb28 R13: 0080 R14: 8801316eeb28 R15: 0180 FS: () GS:88013940() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 0039e88bc720 CR3: 01c0b000 CR4: 001407e0 Stack: a0143683 880135c55c40 a00602fb 880135c55c70 a0146060 01ad0190 a0146060 ea0004c5bb80 8801316eeaa8 ea0004c5bb80 8801316eeaa8 8801304fe0c0 Call Trace: [a0143683] ? aesni_xts_dec8+0x13/0x20 [aesni_intel] [a00602fb] glue_xts_crypt_128bit+0x10b/0x1c0 [glue_helper] [a014358b] xts_decrypt+0x4b/0x50 [aesni_intel] [a000617f] ablk_decrypt+0x4f/0xd0 [ablk_helper] [a0067202] crypt_convert+0x352/0x3b0 [dm_crypt] [a00675b5] kcryptd_crypt+0x355/0x4e0 [dm_crypt] [81061b35] ? process_one_work+0x1a5/0x700 [81061ba1] process_one_work+0x211/0x700 [81061b35] ? process_one_work+0x1a5/0x700 [810621ab] worker_thread+0x11b/0x3a0 [81062090] ? process_one_work+0x700/0x700 [81069f4d] kthread+0xed/0x100 [81069e60] ? insert_kthread_work+0x80/0x80 [815fd41c] ret_from_fork+0x7c/0xb0 [81069e60] ? 
insert_kthread_work+0x80/0x80 Code: 8d 04 25 b8 2e 14 a0 41 0f 44 ca 4c 0f 44 d8 66 44 0f 6f 14 25 00 70 14 a0 41 0f 10 18 44 8b 8f e0 01 00 00 48 01 cf 66 0f 6f c3 66 0f ef 02 f3 0f 7f 1e 66 44 0f 70 db 13 66 0f d4 db 66 41 0f RIP [a01433a2] aesni_xts_crypt8+0x42/0x1e0 [aesni_intel] RSP 880135c55b68 0: 8d 04 25 b8 2e 14 a0lea0xa0142eb8,%eax 7: 41 0f 44 ca cmove %r10d,%ecx b: 4c 0f 44 d8 cmove %rax,%r11 f: 66 44 0f 6f 14 25 00movdqa 0xa0147000,%xmm10 16: 70 14 a0 19: 41 0f 10 18 movups (%r8),%xmm3 1d: 44 8b 8f e0 01 00 00mov0x1e0(%rdi),%r9d 24: 48 01 cfadd%rcx,%rdi 27: 66 0f 6f c3 movdqa %xmm3,%xmm0 2b:*66 0f ef 02 pxor (%rdx),%xmm0 -- trapping instruction 2f: f3 0f 7f 1e movdqu %xmm3,(%rsi) 33: 66 44 0f 70 db 13 pshufd $0x13,%xmm3,%xmm11 39: 66 0f d4 db paddq %xmm3,%xmm3 3d: 66 data16 3e: 41 rex.B 3f: crypto: aesni_intel - fix accessing of unaligned memory From: Jussi Kivilinna jussi.kivili...@iki.fi The new XTS code for aesni_intel uses input buffers directly as memory operands for pxor instructions, which causes crash if those buffers are not aligned to 16 bytes. Patch change XTS code to handle unaligned memory correctly, by loading memory with movdqu instead. 
Reported-by: Dave Jones da...@redhat.com Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi --- arch/x86/crypto/aesni-intel_asm.S | 48 + 1 file changed, 32 insertions(+), 16 deletions(-) diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S index 62fe22c..477e9d7 100644 --- a/arch/x86/crypto/aesni-intel_asm.S +++ b/arch/x86/crypto/aesni-intel_asm.S @@ -2681,56 +2681,68 @@ ENTRY(aesni_xts_crypt8) addq %rcx, KEYP movdqa IV, STATE1 - pxor 0x00(INP), STATE1 + movdqu 0x00(INP), INC + pxor INC, STATE1 movdqu IV, 0x00(OUTP) _aesni_gf128mul_x_ble() movdqa IV, STATE2 - pxor 0x10(INP), STATE2 + movdqu 0x10(INP), INC + pxor INC, STATE2 movdqu IV, 0x10(OUTP) _aesni_gf128mul_x_ble() movdqa IV, STATE3 - pxor 0x20(INP), STATE3 + movdqu 0x20(INP), INC + pxor INC, STATE3 movdqu IV, 0x20(OUTP) _aesni_gf128mul_x_ble() movdqa IV, STATE4 - pxor 0x30(INP), STATE4 + movdqu 0x30(INP), INC + pxor INC, STATE4 movdqu IV, 0x30(OUTP) call *%r11 - pxor 0x00(OUTP), STATE1 + movdqu 0x00(OUTP), INC + pxor INC, STATE1 movdqu STATE1, 0x00(OUTP) _aesni_gf128mul_x_ble() movdqa IV, STATE1 - pxor 0x40(INP
[PATCH] crypto: aesni_intel - fix accessing of unaligned memory
The new XTS code for aesni_intel uses input buffers directly as memory operands for pxor instructions, which causes crash if those buffers are not aligned to 16 bytes.

Patch changes XTS code to handle unaligned memory correctly, by loading memory with movdqu instead.

Reported-by: Dave Jones da...@redhat.com
Tested-by: Dave Jones da...@redhat.com
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/aesni-intel_asm.S |   48 ++++++++++++++++++++++++++++----------------
 1 file changed, 32 insertions(+), 16 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 62fe22c..477e9d7 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -2681,56 +2681,68 @@ ENTRY(aesni_xts_crypt8)
 	addq %rcx, KEYP
 
 	movdqa IV, STATE1
-	pxor 0x00(INP), STATE1
+	movdqu 0x00(INP), INC
+	pxor INC, STATE1
 	movdqu IV, 0x00(OUTP)
 
 	_aesni_gf128mul_x_ble()
 	movdqa IV, STATE2
-	pxor 0x10(INP), STATE2
+	movdqu 0x10(INP), INC
+	pxor INC, STATE2
 	movdqu IV, 0x10(OUTP)
 
 	_aesni_gf128mul_x_ble()
 	movdqa IV, STATE3
-	pxor 0x20(INP), STATE3
+	movdqu 0x20(INP), INC
+	pxor INC, STATE3
 	movdqu IV, 0x20(OUTP)
 
 	_aesni_gf128mul_x_ble()
 	movdqa IV, STATE4
-	pxor 0x30(INP), STATE4
+	movdqu 0x30(INP), INC
+	pxor INC, STATE4
 	movdqu IV, 0x30(OUTP)
 
 	call *%r11
 
-	pxor 0x00(OUTP), STATE1
+	movdqu 0x00(OUTP), INC
+	pxor INC, STATE1
 	movdqu STATE1, 0x00(OUTP)
 
 	_aesni_gf128mul_x_ble()
 	movdqa IV, STATE1
-	pxor 0x40(INP), STATE1
+	movdqu 0x40(INP), INC
+	pxor INC, STATE1
 	movdqu IV, 0x40(OUTP)
 
-	pxor 0x10(OUTP), STATE2
+	movdqu 0x10(OUTP), INC
+	pxor INC, STATE2
 	movdqu STATE2, 0x10(OUTP)
 
 	_aesni_gf128mul_x_ble()
 	movdqa IV, STATE2
-	pxor 0x50(INP), STATE2
+	movdqu 0x50(INP), INC
+	pxor INC, STATE2
 	movdqu IV, 0x50(OUTP)
 
-	pxor 0x20(OUTP), STATE3
+	movdqu 0x20(OUTP), INC
+	pxor INC, STATE3
 	movdqu STATE3, 0x20(OUTP)
 
 	_aesni_gf128mul_x_ble()
 	movdqa IV, STATE3
-	pxor 0x60(INP), STATE3
+	movdqu 0x60(INP), INC
+	pxor INC, STATE3
 	movdqu IV, 0x60(OUTP)
 
-	pxor 0x30(OUTP), STATE4
+	movdqu 0x30(OUTP), INC
+	pxor INC, STATE4
 	movdqu STATE4, 0x30(OUTP)
 
 	_aesni_gf128mul_x_ble()
 	movdqa IV, STATE4
-	pxor 0x70(INP), STATE4
+	movdqu 0x70(INP), INC
+	pxor INC, STATE4
 	movdqu IV, 0x70(OUTP)
 
 	_aesni_gf128mul_x_ble()
@@ -2738,16 +2750,20 @@ ENTRY(aesni_xts_crypt8)
 
 	call *%r11
 
-	pxor 0x40(OUTP), STATE1
+	movdqu 0x40(OUTP), INC
+	pxor INC, STATE1
 	movdqu STATE1, 0x40(OUTP)
 
-	pxor 0x50(OUTP), STATE2
+	movdqu 0x50(OUTP), INC
+	pxor INC, STATE2
 	movdqu STATE2, 0x50(OUTP)
 
-	pxor 0x60(OUTP), STATE3
+	movdqu 0x60(OUTP), INC
+	pxor INC, STATE3
 	movdqu STATE3, 0x60(OUTP)
 
-	pxor 0x70(OUTP), STATE4
+	movdqu 0x70(OUTP), INC
+	pxor INC, STATE4
 	movdqu STATE4, 0x70(OUTP)
 
 	ret
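The load-then-XOR pattern from the patch has a direct counterpart in C with SSE2 intrinsics. The following is a rough user-space sketch and not the kernel code: xor_block_unaligned() is a hypothetical name, the compiler and not the programmer picks the final instructions, and a plain C fallback is included on the assumption that the code may be built without SSE2.

```c
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * XOR the 16 bytes at src into state. Neither pointer has to be 16-byte
 * aligned: the data is fetched with an explicit unaligned load (which
 * typically becomes movdqu) and the XOR is then done on registers.
 */
static void xor_block_unaligned(uint8_t state[16], const uint8_t *src)
{
#if defined(__SSE2__)
	__m128i s = _mm_loadu_si128((const __m128i *)state);
	__m128i in = _mm_loadu_si128((const __m128i *)src);

	_mm_storeu_si128((__m128i *)state, _mm_xor_si128(s, in));
#else
	/* Portable fallback for builds without SSE2. */
	uint8_t tmp[16];
	int i;

	memcpy(tmp, src, sizeof(tmp));
	for (i = 0; i < 16; i++)
		state[i] ^= tmp[i];
#endif
}
```

The cost is one extra register per XOR, which is the role the INC register plays in the patch above.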
[PATCH] crypto: camellia-aesni-avx2 - tune assembly code for more performance
Add implementation tuned for more performance on real hardware. Changes are mostly around the part mixing 128-bit extract and insert instructions and AES-NI instructions. Also 'vpbroadcastb' instructions have been change to 'vpshufb with zero mask'. Tests on Intel Core i5-4570: tcrypt ECB results, old-AVX2 vs new-AVX2: size128bit key 256bit key enc dec enc dec 256 1.00x 1.00x 1.00x 1.00x 1k 1.08x 1.09x 1.05x 1.06x 8k 1.06x 1.06x 1.06x 1.06x tcrypt ECB results, AVX vs new-AVX2: size128bit key 256bit key enc dec enc dec 256 1.00x 1.00x 1.00x 1.00x 1k 1.51x 1.50x 1.52x 1.50x 8k 1.47x 1.48x 1.48x 1.48x Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi --- arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 160 ++ 1 file changed, 89 insertions(+), 71 deletions(-) diff --git a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S index 91a1878..0e0b886 100644 --- a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S +++ b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S @@ -51,16 +51,6 @@ #define ymm14_x xmm14 #define ymm15_x xmm15 -/* - * AES-NI instructions do not support ymmX registers, so we need splitting and - * merging. 
- */ -#define vaesenclast256(zero, yreg, tmp) \ - vextracti128 $1, yreg, tmp##_x; \ - vaesenclast zero##_x, yreg##_x, yreg##_x; \ - vaesenclast zero##_x, tmp##_x, tmp##_x; \ - vinserti128 $1, tmp##_x, yreg, yreg; - /** 32-way camellia **/ @@ -79,46 +69,70 @@ * S-function with AES subbytes \ */ \ vbroadcasti128 .Linv_shift_row, t4; \ - vpbroadcastb .L0f0f0f0f, t7; \ - vbroadcasti128 .Lpre_tf_lo_s1, t0; \ - vbroadcasti128 .Lpre_tf_hi_s1, t1; \ + vpbroadcastd .L0f0f0f0f, t7; \ + vbroadcasti128 .Lpre_tf_lo_s1, t5; \ + vbroadcasti128 .Lpre_tf_hi_s1, t6; \ + vbroadcasti128 .Lpre_tf_lo_s4, t2; \ + vbroadcasti128 .Lpre_tf_hi_s4, t3; \ \ /* AES inverse shift rows */ \ vpshufb t4, x0, x0; \ vpshufb t4, x7, x7; \ - vpshufb t4, x1, x1; \ - vpshufb t4, x4, x4; \ - vpshufb t4, x2, x2; \ - vpshufb t4, x5, x5; \ vpshufb t4, x3, x3; \ vpshufb t4, x6, x6; \ + vpshufb t4, x2, x2; \ + vpshufb t4, x5, x5; \ + vpshufb t4, x1, x1; \ + vpshufb t4, x4, x4; \ \ /* prefilter sboxes 1, 2 and 3 */ \ - vbroadcasti128 .Lpre_tf_lo_s4, t2; \ - vbroadcasti128 .Lpre_tf_hi_s4, t3; \ - filter_8bit(x0, t0, t1, t7, t6); \ - filter_8bit(x7, t0, t1, t7, t6); \ - filter_8bit(x1, t0, t1, t7, t6); \ - filter_8bit(x4, t0, t1, t7, t6); \ - filter_8bit(x2, t0, t1, t7, t6); \ - filter_8bit(x5, t0, t1, t7, t6); \ - \ /* prefilter sbox 4 */ \ + filter_8bit(x0, t5, t6, t7, t4); \ + filter_8bit(x7, t5, t6, t7, t4); \ + vextracti128 $1, x0, t0##_x; \ + vextracti128 $1, x7, t1##_x; \ + filter_8bit(x3, t2, t3, t7, t4); \ + filter_8bit(x6, t2, t3, t7, t4); \ + vextracti128 $1, x3, t3##_x; \ + vextracti128 $1, x6, t2##_x; \ + filter_8bit(x2, t5, t6, t7, t4); \ + filter_8bit(x5, t5, t6, t7, t4); \ + filter_8bit(x1, t5, t6, t7, t4); \ + filter_8bit(x4, t5, t6, t7, t4); \ + \ vpxor t4##_x, t4##_x, t4##_x; \ - filter_8bit(x3, t2, t3, t7, t6); \ - filter_8bit(x6, t2, t3, t7, t6); \ \ /* AES subbytes + AES shift rows */ \ + vextracti128 $1, x2, t6##_x; \ + vextracti128 $1, x5, t5##_x; \ + vaesenclast t4##_x, x0##_x, x0##_x; \ 
+ vaesenclast t4##_x, t0##_x, t0##_x; \ + vinserti128 $1, t0##_x, x0, x0; \ + vaesenclast t4##_x, x7##_x, x7##_x; \ + vaesenclast t4##_x, t1##_x, t1##_x; \ + vinserti128 $1, t1##_x, x7, x7; \ + vaesenclast t4##_x, x3##_x, x3##_x; \ + vaesenclast t4##_x, t3##_x, t3##_x; \ + vinserti128 $1, t3##_x, x3, x3; \ + vaesenclast t4##_x, x6##_x, x6##_x; \ + vaesenclast t4##_x, t2##_x, t2##_x; \ + vinserti128 $1, t2##_x, x6, x6; \ + vextracti128 $1, x1, t3##_x; \ + vextracti128 $1, x4, t2##_x; \ vbroadcasti128 .Lpost_tf_lo_s1, t0; \ vbroadcasti128 .Lpost_tf_hi_s1, t1; \ - vaesenclast256(t4, x0, t5); \ - vaesenclast256(t4, x7, t5); \ - vaesenclast256(t4, x1, t5); \ - vaesenclast256(t4, x4, t5); \ - vaesenclast256(t4, x2, t5); \ - vaesenclast256(t4, x5, t5); \ - vaesenclast256(t4, x3, t5); \ - vaesenclast256(t4, x6, t5); \ + vaesenclast t4##_x, x2##_x, x2##_x; \ + vaesenclast t4##_x, t6##_x, t6##_x; \ + vinserti128 $1, t6##_x, x2, x2; \ + vaesenclast t4##_x
[PATCH 1/2] Revert crypto: blowfish - add AVX2/x86_64 implementation of blowfish cipher
This reverts commit 604880107010a1e5794552d184cd5471ea31b973. Instruction (vpgatherdd) that this implementation relied on turned out to be slow performer on real hardware (i5-4570). The previous 4-way blowfish implementation is therefore faster and this implementation should be removed. Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi --- arch/x86/crypto/Makefile |4 arch/x86/crypto/blowfish-avx2-asm_64.S | 449 - arch/x86/crypto/blowfish_avx2_glue.c | 585 arch/x86/crypto/blowfish_glue.c| 32 +- arch/x86/include/asm/crypto/blowfish.h | 43 -- crypto/Kconfig | 18 - crypto/testmgr.c | 12 - 7 files changed, 24 insertions(+), 1119 deletions(-) delete mode 100644 arch/x86/crypto/blowfish-avx2-asm_64.S delete mode 100644 arch/x86/crypto/blowfish_avx2_glue.c delete mode 100644 arch/x86/include/asm/crypto/blowfish.h diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile index 94cb151..9ce3418 100644 --- a/arch/x86/crypto/Makefile +++ b/arch/x86/crypto/Makefile @@ -3,8 +3,6 @@ # avx_supported := $(call as-instr,vpxor %xmm0$(comma)%xmm0$(comma)%xmm0,yes,no) -avx2_supported := $(call as-instr,vpgatherdd %ymm0$(comma)(%eax$(comma)%ymm1\ - $(comma)4)$(comma)%ymm2,yes,no) obj-$(CONFIG_CRYPTO_ABLK_HELPER_X86) += ablk_helper.o obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o @@ -43,7 +41,6 @@ endif # These modules require assembler to support AVX2. 
ifeq ($(avx2_supported),yes) - obj-$(CONFIG_CRYPTO_BLOWFISH_AVX2_X86_64) += blowfish-avx2.o obj-$(CONFIG_CRYPTO_CAMELLIA_AESNI_AVX2_X86_64) += camellia-aesni-avx2.o obj-$(CONFIG_CRYPTO_SERPENT_AVX2_X86_64) += serpent-avx2.o obj-$(CONFIG_CRYPTO_TWOFISH_AVX2_X86_64) += twofish-avx2.o @@ -74,7 +71,6 @@ ifeq ($(avx_supported),yes) endif ifeq ($(avx2_supported),yes) - blowfish-avx2-y := blowfish-avx2-asm_64.o blowfish_avx2_glue.o camellia-aesni-avx2-y := camellia-aesni-avx2-asm_64.o camellia_aesni_avx2_glue.o serpent-avx2-y := serpent-avx2-asm_64.o serpent_avx2_glue.o twofish-avx2-y := twofish-avx2-asm_64.o twofish_avx2_glue.o diff --git a/arch/x86/crypto/blowfish-avx2-asm_64.S b/arch/x86/crypto/blowfish-avx2-asm_64.S deleted file mode 100644 index 784452e..000 --- a/arch/x86/crypto/blowfish-avx2-asm_64.S +++ /dev/null @@ -1,449 +0,0 @@ -/* - * x86_64/AVX2 assembler optimized version of Blowfish - * - * Copyright © 2012-2013 Jussi Kivilinna jussi.kivili...@iki.fi - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. 
- * - */ - -#include linux/linkage.h - -.file blowfish-avx2-asm_64.S - -.data -.align 32 - -.Lprefetch_mask: -.long 0*64 -.long 1*64 -.long 2*64 -.long 3*64 -.long 4*64 -.long 5*64 -.long 6*64 -.long 7*64 - -.Lbswap32_mask: -.long 0x00010203 -.long 0x04050607 -.long 0x08090a0b -.long 0x0c0d0e0f - -.Lbswap128_mask: - .byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 -.Lbswap_iv_mask: - .byte 7, 6, 5, 4, 3, 2, 1, 0, 7, 6, 5, 4, 3, 2, 1, 0 - -.text -/* structure of crypto context */ -#define p 0 -#define s0 ((16 + 2) * 4) -#define s1 ((16 + 2 + (1 * 256)) * 4) -#define s2 ((16 + 2 + (2 * 256)) * 4) -#define s3 ((16 + 2 + (3 * 256)) * 4) - -/* register macros */ -#define CTX%rdi -#define RIO %rdx - -#define RS0%rax -#define RS1%r8 -#define RS2%r9 -#define RS3%r10 - -#define RLOOP %r11 -#define RLOOPd %r11d - -#define RXr0 %ymm8 -#define RXr1 %ymm9 -#define RXr2 %ymm10 -#define RXr3 %ymm11 -#define RXl0 %ymm12 -#define RXl1 %ymm13 -#define RXl2 %ymm14 -#define RXl3 %ymm15 - -/* temp regs */ -#define RT0%ymm0 -#define RT0x %xmm0 -#define RT1%ymm1 -#define RT1x %xmm1 -#define RIDX0 %ymm2 -#define RIDX1 %ymm3 -#define RIDX1x %xmm3 -#define RIDX2 %ymm4 -#define RIDX3 %ymm5 - -/* vpgatherdd mask and '-1' */ -#define RNOT %ymm6 - -/* byte mask, (-1 24) */ -#define RBYTE %ymm7 - -/*** - * 32-way AVX2 blowfish - ***/ -#define F(xl, xr) \ - vpsrld $24, xl, RIDX0; \ - vpsrld $16, xl, RIDX1; \ - vpsrld $8, xl, RIDX2; \ - vpand RBYTE, RIDX1, RIDX1; \ - vpand RBYTE, RIDX2, RIDX2; \ - vpand RBYTE, xl, RIDX3; \ - \ - vpgatherdd RNOT, (RS0, RIDX0, 4), RT0; \ - vpcmpeqd RNOT, RNOT, RNOT; \ - vpcmpeqd RIDX0, RIDX0, RIDX0; \ - \ - vpgatherdd RNOT, (RS1, RIDX1, 4), RT1; \ - vpcmpeqd RIDX1, RIDX1, RIDX1; \ - vpaddd RT0, RT1, RT0; \ - \ - vpgatherdd RIDX0, (RS2, RIDX2, 4), RT1
[PATCH 2/2] Revert crypto: twofish - add AVX2/x86_64 assembler implementation of twofish cipher
This reverts commit cf1521a1a5e21fd1e79a458605c4282fbfbbeee2. Instruction (vpgatherdd) that this implementation relied on turned out to be slow performer on real hardware (i5-4570). The previous 8-way twofish/AVX implementation is therefore faster and this implementation should be removed. Converting this implementation to use the same method as in twofish/AVX for table look-ups would give additional ~3% speed up vs twofish/AVX, but would hardly be worth of the added code and binary size. Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi --- arch/x86/crypto/Makefile |2 arch/x86/crypto/twofish-avx2-asm_64.S | 600 - arch/x86/crypto/twofish_avx2_glue.c | 584 arch/x86/crypto/twofish_avx_glue.c| 14 - arch/x86/include/asm/crypto/twofish.h | 18 - crypto/Kconfig| 24 - crypto/testmgr.c | 12 - 7 files changed, 2 insertions(+), 1252 deletions(-) delete mode 100644 arch/x86/crypto/twofish-avx2-asm_64.S delete mode 100644 arch/x86/crypto/twofish_avx2_glue.c diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile index 9ce3418..7d6ba9d 100644 --- a/arch/x86/crypto/Makefile +++ b/arch/x86/crypto/Makefile @@ -43,7 +43,6 @@ endif ifeq ($(avx2_supported),yes) obj-$(CONFIG_CRYPTO_CAMELLIA_AESNI_AVX2_X86_64) += camellia-aesni-avx2.o obj-$(CONFIG_CRYPTO_SERPENT_AVX2_X86_64) += serpent-avx2.o - obj-$(CONFIG_CRYPTO_TWOFISH_AVX2_X86_64) += twofish-avx2.o endif aes-i586-y := aes-i586-asm_32.o aes_glue.o @@ -73,7 +72,6 @@ endif ifeq ($(avx2_supported),yes) camellia-aesni-avx2-y := camellia-aesni-avx2-asm_64.o camellia_aesni_avx2_glue.o serpent-avx2-y := serpent-avx2-asm_64.o serpent_avx2_glue.o - twofish-avx2-y := twofish-avx2-asm_64.o twofish_avx2_glue.o endif aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o diff --git a/arch/x86/crypto/twofish-avx2-asm_64.S b/arch/x86/crypto/twofish-avx2-asm_64.S deleted file mode 100644 index e1a83b9..000 --- a/arch/x86/crypto/twofish-avx2-asm_64.S +++ /dev/null @@ -1,600 +0,0 @@ -/* - * x86_64/AVX2 assembler optimized version of 
Twofish - * - * Copyright © 2012-2013 Jussi Kivilinna jussi.kivili...@iki.fi - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - */ - -#include linux/linkage.h -#include glue_helper-asm-avx2.S - -.file twofish-avx2-asm_64.S - -.data -.align 16 - -.Lvpshufb_mask0: -.long 0x80808000 -.long 0x80808004 -.long 0x80808008 -.long 0x8080800c - -.Lbswap128_mask: - .byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 -.Lxts_gf128mul_and_shl1_mask_0: - .byte 0x87, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0 -.Lxts_gf128mul_and_shl1_mask_1: - .byte 0x0e, 1, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0 - -.text - -/* structure of crypto context */ -#define s0 0 -#define s1 1024 -#define s2 2048 -#define s3 3072 -#define w 4096 -#definek 4128 - -/* register macros */ -#define CTX%rdi - -#define RS0CTX -#define RS1%r8 -#define RS2%r9 -#define RS3%r10 -#define RK %r11 -#define RW %rax -#define RROUND %r12 -#define RROUNDd %r12d - -#define RA0%ymm8 -#define RB0%ymm9 -#define RC0%ymm10 -#define RD0%ymm11 -#define RA1%ymm12 -#define RB1%ymm13 -#define RC1%ymm14 -#define RD1%ymm15 - -/* temp regs */ -#define RX0%ymm0 -#define RY0%ymm1 -#define RX1%ymm2 -#define RY1%ymm3 -#define RT0%ymm4 -#define RIDX %ymm5 - -#define RX0x %xmm0 -#define RY0x %xmm1 -#define RX1x %xmm2 -#define RY1x %xmm3 -#define RT0x %xmm4 - -/* vpgatherdd mask and '-1' */ -#define RNOT %ymm6 - -/* byte mask, (-1 24) */ -#define RBYTE %ymm7 - -/** - 16-way AVX2 twofish - **/ -#define init_round_constants() \ - vpcmpeqd RNOT, RNOT, RNOT; \ - vpsrld $24, RNOT, RBYTE; \ - leaq k(CTX), RK; \ - leaq w(CTX), RW; \ - leaq s1(CTX), RS1; \ - leaq s2(CTX), RS2; \ - leaq s3(CTX), RS3; \ - -#define g16(ab, rs0, rs1, rs2, rs3, xy) \ - vpand RBYTE, ab ## 0, RIDX; \ - vpgatherdd RNOT, (rs0, RIDX, 4), xy ## 0; \ - 
vpcmpeqd RNOT, RNOT, RNOT; \ - \ - vpand RBYTE, ab ## 1, RIDX; \ - vpgatherdd RNOT, (rs0, RIDX, 4), xy ## 1; \ - vpcmpeqd RNOT, RNOT, RNOT; \ - \ - vpsrld $8, ab ## 0, RIDX; \ - vpand RBYTE, RIDX, RIDX; \ - vpgatherdd RNOT, (rs1, RIDX, 4), RT0; \ - vpcmpeqd RNOT, RNOT, RNOT; \ - vpxor RT0, xy ## 0, xy ## 0
Re: [PATCH 2/2] crypto: blowfish - disable AVX2 implementation
On 05.06.2013 11:34, Herbert Xu wrote:
> On Sun, Jun 02, 2013 at 07:51:52PM +0300, Jussi Kivilinna wrote:
>> It appears that the performance of 'vpgatherdd' is suboptimal for this kind of
>> workload (tested on Core i5-4570) and causes blowfish-avx2 to be significantly
>> slower than blowfish-amd64. So disable the AVX2 implementation to avoid
>> performance regressions.
>>
>> Signed-off-by: Jussi Kivilinna <jussi.kivili...@iki.fi>
>
> Both patches applied to crypto. I presume you're working on a more
> permanent solution on this?

Yes, I've been looking for a solution. The problem is, well, that I assumed vgather to be quicker than emulating gather using vpextr/vpinsr instructions. But it appears that vgather has about the same speed as a group of vpextr/vpinsr instructions doing the gather manually. So doing

  asm volatile(
    "vpgatherdd %%xmm0, (%[ptr], %%xmm8, 4), %%xmm9;\n\t"
    "vpcmpeqd %%xmm0, %%xmm0, %%xmm0; /* reset mask */ \n\t"
    "vpgatherdd %%xmm0, (%[ptr], %%xmm9, 4), %%xmm8;\n\t"
    "vpcmpeqd %%xmm0, %%xmm0, %%xmm0; \n\t"
    :: [ptr] "r" (&mem[0]) : "memory"
  );

in a loop is slightly _slower_ than manually extracting & inserting values with

  asm volatile(
    "vmovd %%xmm8, %%eax; \n\t"
    "vpextrd $1, %%xmm8, %%edx; \n\t"
    "vmovd (%[ptr], %%rax, 4), %%xmm10; \n\t"
    "vpextrd $2, %%xmm8, %%eax; \n\t"
    "vpinsrd $1, (%[ptr], %%rdx, 4), %%xmm10, %%xmm10; \n\t"
    "vpextrd $3, %%xmm8, %%edx; \n\t"
    "vpinsrd $2, (%[ptr], %%rax, 4), %%xmm10, %%xmm10; \n\t"
    "vpinsrd $3, (%[ptr], %%rdx, 4), %%xmm10, %%xmm9; \n\t"
    "vmovd %%xmm9, %%eax; \n\t"
    "vpextrd $1, %%xmm9, %%edx; \n\t"
    "vmovd (%[ptr], %%rax, 4), %%xmm10; \n\t"
    "vpextrd $2, %%xmm9, %%eax; \n\t"
    "vpinsrd $1, (%[ptr], %%rdx, 4), %%xmm10, %%xmm10; \n\t"
    "vpextrd $3, %%xmm9, %%edx; \n\t"
    "vpinsrd $2, (%[ptr], %%rax, 4), %%xmm10, %%xmm10; \n\t"
    "vpinsrd $3, (%[ptr], %%rdx, 4), %%xmm10, %%xmm8; \n\t"
    :: [ptr] "r" (&mem[0]) : "memory", "eax", "edx"
  );

vpextr/vpinsr cannot be used with 256-bit wide ymm registers, so 'vinserti128/vextracti128' are needed, which makes the manual gather about the same speed as vpgatherdd.

Now, the block cipher implementations need to use all bytes of the vector register for table look-ups, and the way this is done in the AVX implementation of Twofish (move data from the vector register to general purpose registers, handle byte-extraction and table look-ups there, and move the processed data back to the vector register) is about two to three times faster than the current AVX2 implementation using vgather.

Blowfish does not do much processing in addition to table look-ups, so there is not much that can be done. With Twofish, the table look-ups are the most computationally heavy part and I don't think that the wider vector registers in the other parts are going to give much boost. So the permanent solution is likely to be a revert.

-Jussi

> Thanks,
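To make the comparison above easier to follow, here is a minimal scalar sketch in C of what both asm sequences compute for one vector of four 32-bit indices. It uses no SIMD instructions and says nothing about speed; the function names, `LANES` and `TABLE_SIZE` are made up for illustration and do not come from the thread.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4          /* one xmm register holds four 32-bit lanes */
#define TABLE_SIZE 256   /* illustrative table size, not from the thread */

/* Model of one vpgatherdd: every lane loads table[idx[lane]]. */
static void gather_model(const uint32_t *table, const uint32_t idx[LANES],
                         uint32_t out[LANES])
{
    for (int lane = 0; lane < LANES; lane++) {
        assert(idx[lane] < TABLE_SIZE);
        out[lane] = table[idx[lane]];
    }
}

/*
 * Model of the manual sequence: indices are moved to scalar registers two
 * at a time (eax/edx in the asm), used for ordinary loads, and the loaded
 * words are inserted into a temporary vector before the result is written.
 */
static void extract_insert_model(const uint32_t *table,
                                 const uint32_t idx[LANES],
                                 uint32_t out[LANES])
{
    uint32_t tmp[LANES];
    uint32_t a, d;

    a = idx[0];            /* vmovd   %xmm8, %eax           */
    d = idx[1];            /* vpextrd $1, %xmm8, %edx       */
    tmp[0] = table[a];     /* vmovd   (ptr,%rax,4), %xmm10  */
    a = idx[2];            /* vpextrd $2, %xmm8, %eax       */
    tmp[1] = table[d];     /* vpinsrd $1, (ptr,%rdx,4), ... */
    d = idx[3];            /* vpextrd $3, %xmm8, %edx       */
    tmp[2] = table[a];     /* vpinsrd $2, (ptr,%rax,4), ... */
    tmp[3] = table[d];     /* vpinsrd $3, (ptr,%rdx,4), ... */

    for (int lane = 0; lane < LANES; lane++)
        out[lane] = tmp[lane];
}
```

Both functions return the same four table words; the difference discussed in the mail is only in how many instructions and register moves the hardware needs to get there.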
[PATCH 2/2] crypto: blowfish - disable AVX2 implementation
It appears that the performance of 'vpgatherdd' is suboptimal for this kind of
workload (tested on Core i5-4570) and causes blowfish-avx2 to be significantly
slower than blowfish-amd64. So disable the AVX2 implementation to avoid
performance regressions.

Signed-off-by: Jussi Kivilinna <jussi.kivili...@iki.fi>
---
 crypto/Kconfig |    1 +
 1 file changed, 1 insertion(+)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 678a6ed..8ca52c5 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -842,6 +842,7 @@ config CRYPTO_BLOWFISH_X86_64
 config CRYPTO_BLOWFISH_AVX2_X86_64
 	tristate "Blowfish cipher algorithm (x86_64/AVX2)"
 	depends on X86 && 64BIT
+	depends on BROKEN
 	select CRYPTO_ALGAPI
 	select CRYPTO_CRYPTD
 	select CRYPTO_ABLK_HELPER_X86
[PATCH 1/2] crypto: twofish - disable AVX2 implementation
It appears that the performance of 'vpgatherdd' is suboptimal for this kind of
workload (tested on Core i5-4570) and causes twofish_avx2 to be significantly
slower than twofish_avx. So disable the AVX2 implementation to avoid
performance regressions.

Signed-off-by: Jussi Kivilinna <jussi.kivili...@iki.fi>
---
 crypto/Kconfig |    1 +
 1 file changed, 1 insertion(+)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index d1ca631..678a6ed 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1318,6 +1318,7 @@ config CRYPTO_TWOFISH_AVX_X86_64
 config CRYPTO_TWOFISH_AVX2_X86_64
 	tristate "Twofish cipher algorithm (x86_64/AVX2)"
 	depends on X86 && 64BIT
+	depends on BROKEN
 	select CRYPTO_ALGAPI
 	select CRYPTO_CRYPTD
 	select CRYPTO_ABLK_HELPER_X86
[PATCH] crypto: sha256_ssse3 - fix stack corruption with SSSE3 and AVX implementations
The _XFER stack element size was set too small, 8 bytes, when it needs to be
16 bytes. As _XFER is the last stack element used by these implementations,
the 16 byte stores with 'movdqa' corrupt the stack where the value of register
%r12 is temporarily stored. As these implementations align the stack pointer
to 16 bytes, this corruption did not happen every time.

Patch corrects this issue.

Reported-by: Julian Wollrath <jwollr...@web.de>
Cc: Tim Chen <tim.c.c...@linux.intel.com>
Signed-off-by: Jussi Kivilinna <jussi.kivili...@iki.fi>
---
 arch/x86/crypto/sha256-avx-asm.S   |    2 +-
 arch/x86/crypto/sha256-ssse3-asm.S |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/crypto/sha256-avx-asm.S b/arch/x86/crypto/sha256-avx-asm.S
index 56610c4..642f156 100644
--- a/arch/x86/crypto/sha256-avx-asm.S
+++ b/arch/x86/crypto/sha256-avx-asm.S
@@ -118,7 +118,7 @@ y2 = %r15d

 _INP_END_SIZE = 8
 _INP_SIZE = 8
-_XFER_SIZE = 8
+_XFER_SIZE = 16
 _XMM_SAVE_SIZE = 0

 _INP_END = 0
diff --git a/arch/x86/crypto/sha256-ssse3-asm.S b/arch/x86/crypto/sha256-ssse3-asm.S
index 98d3c39..f833b74 100644
--- a/arch/x86/crypto/sha256-ssse3-asm.S
+++ b/arch/x86/crypto/sha256-ssse3-asm.S
@@ -111,7 +111,7 @@ y2 = %r15d

 _INP_END_SIZE = 8
 _INP_SIZE = 8
-_XFER_SIZE = 8
+_XFER_SIZE = 16
 _XMM_SAVE_SIZE = 0

 _INP_END = 0
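Why the corruption only showed up some of the time can be worked through with a small model of the frame layout. The sketch below is an illustration only: it assumes the offsets are chained from the sizes (`_INP = _INP_END + _INP_END_SIZE`, `_XFER = _INP + _INP_SIZE`, frame size = sum of all sizes) and that the frame is set up by subtracting the frame size from the stack pointer and then masking it down to a 16-byte boundary. Neither detail is shown in the hunks above, and the function and parameter names are invented here.

```c
#include <assert.h>
#include <stdint.h>

/* Slot sizes as they appear in the patch context. */
enum { INP_END_SIZE = 8, INP_SIZE = 8, XMM_SAVE_SIZE = 0 };

/*
 * Returns how many bytes a 16-byte store at _XFER writes above the stack
 * pointer value that was current before the frame was allocated, i.e. into
 * whatever was pushed last (the saved %r12 according to the description).
 */
static unsigned xfer_overrun(uint64_t rsp_before, unsigned xfer_size)
{
    unsigned xfer_off = INP_END_SIZE + INP_SIZE;            /* _XFER offset */
    unsigned frame_size = xfer_off + xfer_size + XMM_SAVE_SIZE;
    /* assumed prologue: sub $frame_size, %rsp ; and $~15, %rsp */
    uint64_t frame = (rsp_before - frame_size) & ~(uint64_t)15;
    uint64_t store_end = frame + xfer_off + 16;             /* movdqa width */

    assert(frame <= rsp_before);
    return store_end > rsp_before ? (unsigned)(store_end - rsp_before) : 0;
}
```

With `xfer_size = 8` the store stays inside the frame when the incoming stack pointer is a multiple of 16 (the alignment step happens to add 8 spare bytes) and overruns by 8 bytes otherwise; with `xfer_size = 16` it never overruns in this model.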
[PATCH] crypto: sha512_generic - set cra_driver_name
'sha512_generic' should set driver name now that there is alternative sha512
provider (sha512_ssse3).

Signed-off-by: Jussi Kivilinna <jussi.kivili...@iki.fi>
---
 crypto/sha512_generic.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/crypto/sha512_generic.c b/crypto/sha512_generic.c
index 4c58620..6ed124f 100644
--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -251,6 +251,7 @@ static struct shash_alg sha512_algs[2] = { {
 	.descsize	=	sizeof(struct sha512_state),
 	.base		=	{
 		.cra_name	=	"sha512",
+		.cra_driver_name =	"sha512-generic",
 		.cra_flags	=	CRYPTO_ALG_TYPE_SHASH,
 		.cra_blocksize	=	SHA512_BLOCK_SIZE,
 		.cra_module	=	THIS_MODULE,
@@ -263,6 +264,7 @@ static struct shash_alg sha512_algs[2] = { {
 	.descsize	=	sizeof(struct sha512_state),
 	.base		=	{
 		.cra_name	=	"sha384",
+		.cra_driver_name =	"sha384-generic",
 		.cra_flags	=	CRYPTO_ALG_TYPE_SHASH,
 		.cra_blocksize	=	SHA384_BLOCK_SIZE,
 		.cra_module	=	THIS_MODULE,
[PATCH 1/2] crypto: sha512_ssse3 - add sha384 support
Add sha384 implementation to sha512_ssse3 module.

This also fixes sha512_ssse3 module autoloading issue when 'sha384' is used
before 'sha512'. Previously in such case, just sha512_generic was loaded and
not sha512_ssse3 (since it did not provide sha384). Now if 'sha512' was used
after 'sha384' usage, sha512_ssse3 would remain unloaded. For example, this
happens with tcrypt testing module since it tests 'sha384' before 'sha512'.

Cc: Tim Chen <tim.c.c...@linux.intel.com>
Signed-off-by: Jussi Kivilinna <jussi.kivili...@iki.fi>
---
 arch/x86/crypto/sha512_ssse3_glue.c |   58
 1 file changed, 53 insertions(+), 5 deletions(-)

diff --git a/arch/x86/crypto/sha512_ssse3_glue.c b/arch/x86/crypto/sha512_ssse3_glue.c
index 6cbd8df..f30cd10 100644
--- a/arch/x86/crypto/sha512_ssse3_glue.c
+++ b/arch/x86/crypto/sha512_ssse3_glue.c
@@ -194,7 +194,37 @@ static int sha512_ssse3_import(struct shash_desc *desc, const void *in)
 	return 0;
 }

-static struct shash_alg alg = {
+static int sha384_ssse3_init(struct shash_desc *desc)
+{
+	struct sha512_state *sctx = shash_desc_ctx(desc);
+
+	sctx->state[0] = SHA384_H0;
+	sctx->state[1] = SHA384_H1;
+	sctx->state[2] = SHA384_H2;
+	sctx->state[3] = SHA384_H3;
+	sctx->state[4] = SHA384_H4;
+	sctx->state[5] = SHA384_H5;
+	sctx->state[6] = SHA384_H6;
+	sctx->state[7] = SHA384_H7;
+
+	sctx->count[0] = sctx->count[1] = 0;
+
+	return 0;
+}
+
+static int sha384_ssse3_final(struct shash_desc *desc, u8 *hash)
+{
+	u8 D[SHA512_DIGEST_SIZE];
+
+	sha512_ssse3_final(desc, D);
+
+	memcpy(hash, D, SHA384_DIGEST_SIZE);
+	memset(D, 0, SHA512_DIGEST_SIZE);
+
+	return 0;
+}
+
+static struct shash_alg algs[] = { {
 	.digestsize	=	SHA512_DIGEST_SIZE,
 	.init		=	sha512_ssse3_init,
 	.update		=	sha512_ssse3_update,
@@ -211,7 +241,24 @@ static struct shash_alg alg = {
 		.cra_blocksize	=	SHA512_BLOCK_SIZE,
 		.cra_module	=	THIS_MODULE,
 	}
-};
+}, {
+	.digestsize	=	SHA384_DIGEST_SIZE,
+	.init		=	sha384_ssse3_init,
+	.update		=	sha512_ssse3_update,
+	.final		=	sha384_ssse3_final,
+	.export		=	sha512_ssse3_export,
+	.import		=	sha512_ssse3_import,
+	.descsize	=	sizeof(struct sha512_state),
+	.statesize	=	sizeof(struct sha512_state),
+	.base		=	{
+		.cra_name	=	"sha384",
+		.cra_driver_name =	"sha384-ssse3",
+		.cra_priority	=	150,
+		.cra_flags	=	CRYPTO_ALG_TYPE_SHASH,
+		.cra_blocksize	=	SHA384_BLOCK_SIZE,
+		.cra_module	=	THIS_MODULE,
+	}
+} };

 #ifdef CONFIG_AS_AVX
 static bool __init avx_usable(void)
@@ -234,7 +281,7 @@ static bool __init avx_usable(void)

 static int __init sha512_ssse3_mod_init(void)
 {
-	/* test for SSE3 first */
+	/* test for SSSE3 first */
 	if (cpu_has_ssse3)
 		sha512_transform_asm = sha512_transform_ssse3;

@@ -261,7 +308,7 @@ static int __init sha512_ssse3_mod_init(void)
 		else
 #endif
 			pr_info("Using SSSE3 optimized SHA-512 implementation\n");
-		return crypto_register_shash(&alg);
+		return crypto_register_shashes(algs, ARRAY_SIZE(algs));
 	}
 	pr_info("Neither AVX nor SSSE3 is available/usable.\n");

@@ -270,7 +317,7 @@ static int __init sha512_ssse3_mod_init(void)

 static void __exit sha512_ssse3_mod_fini(void)
 {
-	crypto_unregister_shash(&alg);
+	crypto_unregister_shashes(algs, ARRAY_SIZE(algs));
 }

 module_init(sha512_ssse3_mod_init);
@@ -280,3 +327,4 @@ MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("SHA512 Secure Hash Algorithm, Supplemental SSE3 accelerated");

 MODULE_ALIAS("sha512");
+MODULE_ALIAS("sha384");
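The sha384 final step in this patch follows a simple pattern: run the full-width finalization into a temporary buffer, copy out the leading bytes, then wipe the temporary. A minimal userspace sketch of that pattern is below. It is not kernel code; `full_final_fn`, `truncated_final` and `demo_final` are hypothetical names, and `demo_final` only fills the buffer with a counting pattern so the truncation can be observed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FULL_DIGEST_SIZE  64  /* SHA-512 output size in bytes */
#define TRUNC_DIGEST_SIZE 48  /* SHA-384 output size in bytes */

/* Hypothetical stand-in for a full-width final function. */
typedef void (*full_final_fn)(void *state, uint8_t out[FULL_DIGEST_SIZE]);

/* Finalize into a scratch buffer, hand out only the leading bytes. */
static int truncated_final(void *state, full_final_fn full_final,
                           uint8_t hash[TRUNC_DIGEST_SIZE])
{
    uint8_t d[FULL_DIGEST_SIZE];

    assert(full_final != NULL && hash != NULL);
    full_final(state, d);
    memcpy(hash, d, TRUNC_DIGEST_SIZE);
    /* Same wipe as in the patch; a compiler is allowed to drop a plain
     * memset on a dead buffer, so this is best effort only. */
    memset(d, 0, sizeof(d));
    return 0;
}

/* Demo only: writes 0, 1, 2, ... so the caller can see what was copied. */
static void demo_final(void *state, uint8_t out[FULL_DIGEST_SIZE])
{
    (void)state;
    for (size_t i = 0; i < FULL_DIGEST_SIZE; i++)
        out[i] = (uint8_t)i;
}
```

Calling `truncated_final(NULL, demo_final, buf)` leaves bytes 0 through 47 of the pattern in `buf`; the last 16 bytes of the full-width output never leave the scratch buffer.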
[PATCH 2/2] crypto: sha256_ssse3 - add sha224 support
Add sha224 implementation to sha256_ssse3 module.

This also fixes sha256_ssse3 module autoloading issue when 'sha224' is used
before 'sha256'. Previously in such case, just sha256_generic was loaded and
not sha256_ssse3 (since it did not provide sha224). Now if 'sha256' was used
after 'sha224' usage, sha256_ssse3 would remain unloaded.

Cc: Tim Chen <tim.c.c...@linux.intel.com>
Signed-off-by: Jussi Kivilinna <jussi.kivili...@iki.fi>
---
 arch/x86/crypto/sha256_ssse3_glue.c |   57
 1 file changed, 52 insertions(+), 5 deletions(-)

diff --git a/arch/x86/crypto/sha256_ssse3_glue.c b/arch/x86/crypto/sha256_ssse3_glue.c
index 597d4da..50226c4 100644
--- a/arch/x86/crypto/sha256_ssse3_glue.c
+++ b/arch/x86/crypto/sha256_ssse3_glue.c
@@ -187,7 +187,36 @@ static int sha256_ssse3_import(struct shash_desc *desc, const void *in)
 	return 0;
 }

-static struct shash_alg alg = {
+static int sha224_ssse3_init(struct shash_desc *desc)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+
+	sctx->state[0] = SHA224_H0;
+	sctx->state[1] = SHA224_H1;
+	sctx->state[2] = SHA224_H2;
+	sctx->state[3] = SHA224_H3;
+	sctx->state[4] = SHA224_H4;
+	sctx->state[5] = SHA224_H5;
+	sctx->state[6] = SHA224_H6;
+	sctx->state[7] = SHA224_H7;
+	sctx->count = 0;
+
+	return 0;
+}
+
+static int sha224_ssse3_final(struct shash_desc *desc, u8 *hash)
+{
+	u8 D[SHA256_DIGEST_SIZE];
+
+	sha256_ssse3_final(desc, D);
+
+	memcpy(hash, D, SHA224_DIGEST_SIZE);
+	memset(D, 0, SHA256_DIGEST_SIZE);
+
+	return 0;
+}
+
+static struct shash_alg algs[] = { {
 	.digestsize	=	SHA256_DIGEST_SIZE,
 	.init		=	sha256_ssse3_init,
 	.update		=	sha256_ssse3_update,
@@ -204,7 +233,24 @@ static struct shash_alg alg = {
 		.cra_blocksize	=	SHA256_BLOCK_SIZE,
 		.cra_module	=	THIS_MODULE,
 	}
-};
+}, {
+	.digestsize	=	SHA224_DIGEST_SIZE,
+	.init		=	sha224_ssse3_init,
+	.update		=	sha256_ssse3_update,
+	.final		=	sha224_ssse3_final,
+	.export		=	sha256_ssse3_export,
+	.import		=	sha256_ssse3_import,
+	.descsize	=	sizeof(struct sha256_state),
+	.statesize	=	sizeof(struct sha256_state),
+	.base		=	{
+		.cra_name	=	"sha224",
+		.cra_driver_name =	"sha224-ssse3",
+		.cra_priority	=	150,
+		.cra_flags	=	CRYPTO_ALG_TYPE_SHASH,
+		.cra_blocksize	=	SHA224_BLOCK_SIZE,
+		.cra_module	=	THIS_MODULE,
+	}
+} };

 #ifdef CONFIG_AS_AVX
 static bool __init avx_usable(void)
@@ -227,7 +273,7 @@ static bool __init avx_usable(void)

 static int __init sha256_ssse3_mod_init(void)
 {
-	/* test for SSE3 first */
+	/* test for SSSE3 first */
 	if (cpu_has_ssse3)
 		sha256_transform_asm = sha256_transform_ssse3;

@@ -254,7 +300,7 @@ static int __init sha256_ssse3_mod_init(void)
 		else
 #endif
 			pr_info("Using SSSE3 optimized SHA-256 implementation\n");
-		return crypto_register_shash(&alg);
+		return crypto_register_shashes(algs, ARRAY_SIZE(algs));
 	}
 	pr_info("Neither AVX nor SSSE3 is available/usable.\n");

@@ -263,7 +309,7 @@ static int __init sha256_ssse3_mod_init(void)

 static void __exit sha256_ssse3_mod_fini(void)
 {
-	crypto_unregister_shash(&alg);
+	crypto_unregister_shashes(algs, ARRAY_SIZE(algs));
 }

 module_init(sha256_ssse3_mod_init);
@@ -273,3 +319,4 @@ MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("SHA256 Secure Hash Algorithm, Supplemental SSE3 accelerated");

 MODULE_ALIAS("sha256");
+MODULE_ALIAS("sha224");
Re: Oops on 3.10-rc1 related to ssh256_ssse3
and AVX implementations

From: Jussi Kivilinna <jussi.kivili...@iki.fi>

The _XFER stack element size was set too small, 8 bytes, when it needs to be
16 bytes. As _XFER is the last stack element used by these implementations,
the 16 byte stores with 'movdqa' corrupt the stack where the value of register
%r12 is temporarily stored. As implementations align stack to 16 bytes, this
corruption did not happen every time.

Patch corrects this issue.
---
 arch/x86/crypto/sha256-avx-asm.S   |    2 +-
 arch/x86/crypto/sha256-ssse3-asm.S |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/crypto/sha256-avx-asm.S b/arch/x86/crypto/sha256-avx-asm.S
index 56610c4..642f156 100644
--- a/arch/x86/crypto/sha256-avx-asm.S
+++ b/arch/x86/crypto/sha256-avx-asm.S
@@ -118,7 +118,7 @@ y2 = %r15d

 _INP_END_SIZE = 8
 _INP_SIZE = 8
-_XFER_SIZE = 8
+_XFER_SIZE = 16
 _XMM_SAVE_SIZE = 0

 _INP_END = 0
diff --git a/arch/x86/crypto/sha256-ssse3-asm.S b/arch/x86/crypto/sha256-ssse3-asm.S
index 98d3c39..f833b74 100644
--- a/arch/x86/crypto/sha256-ssse3-asm.S
+++ b/arch/x86/crypto/sha256-ssse3-asm.S
@@ -111,7 +111,7 @@ y2 = %r15d

 _INP_END_SIZE = 8
 _INP_SIZE = 8
-_XFER_SIZE = 8
+_XFER_SIZE = 16
 _XMM_SAVE_SIZE = 0

 _INP_END = 0
Re: [PATCH 2/4] Accelerated CRC T10 DIF computation with PCLMULQDQ instruction
On 16.04.2013 19:20, Tim Chen wrote:
> This is the x86_64 CRC T10 DIF transform accelerated with the PCLMULQDQ
> instructions. Details discussing the implementation can be found in the
> paper:
>
> Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction
> URL: http://download.intel.com/design/intarch/papers/323102.pdf

URL does not work.

> Signed-off-by: Tim Chen <tim.c.c...@linux.intel.com>
> Tested-by: Keith Busch <keith.bu...@intel.com>
> ---
>  arch/x86/crypto/crct10dif-pcl-asm_64.S |  659
>  1 file changed, 659 insertions(+)
>  create mode 100644 arch/x86/crypto/crct10dif-pcl-asm_64.S

<snip>

> +
> +	# Allocate Stack Space
> +	mov	%rsp, %rcx
> +	sub	$16*10, %rsp
> +	and	$~(0x20 - 1), %rsp
> +
> +	# push the xmm registers into the stack to maintain
> +	movdqa	%xmm10, 16*2(%rsp)
> +	movdqa	%xmm11, 16*3(%rsp)
> +	movdqa	%xmm8 , 16*4(%rsp)
> +	movdqa	%xmm12, 16*5(%rsp)
> +	movdqa	%xmm13, 16*6(%rsp)
> +	movdqa	%xmm6, 16*7(%rsp)
> +	movdqa	%xmm7, 16*8(%rsp)
> +	movdqa	%xmm9, 16*9(%rsp)

You don't need to store (and restore) these, as 'crc_t10dif_pcl' is called between kernel_fpu_begin/_end.

> +
> +
> +	# check if smaller than 256
> +	cmp	$256, arg3
> +

<snip>

> +_cleanup:
> +	# scale the result back to 16 bits
> +	shr	$16, %eax
> +	movdqa	16*2(%rsp), %xmm10
> +	movdqa	16*3(%rsp), %xmm11
> +	movdqa	16*4(%rsp), %xmm8
> +	movdqa	16*5(%rsp), %xmm12
> +	movdqa	16*6(%rsp), %xmm13
> +	movdqa	16*7(%rsp), %xmm6
> +	movdqa	16*8(%rsp), %xmm7
> +	movdqa	16*9(%rsp), %xmm9

Registers are overwritten by kernel_fpu_end.

> +	mov	%rcx, %rsp
> +	ret
> +ENDPROC(crc_t10dif_pcl)
> +

You should move ENDPROC to the end of the full function.

> +
> +
> +.align 16
> +_less_than_128:
> +
> +	# check if there is enough buffer to be able to fold 16B at a time
> +	cmp	$32, arg3

<snip>

> +	movdqa	(%rsp), %xmm7
> +	pshufb	%xmm11, %xmm7
> +	pxor	%xmm0 , %xmm7	# xor the initial crc value
> +
> +	psrldq	$7, %xmm7
> +
> +	jmp	_barrett

Move ENDPROC here.

-Jussi
Re: [PATCH 4/4] Simple correctness and speed test for CRCT10DIF hash
On 16.04.2013 19:20, Tim Chen wrote:
> These are simple tests to do sanity check of CRC T10 DIF hash. The
> correctness of the transform can be checked with the command
> 	modprobe tcrypt mode=47
> The speed of the transform can be evaluated with the command
> 	modprobe tcrypt mode=320
>
> Set the cpu frequency to constant and turn turbo off when running the
> speed test so the frequency governor will not tweak the frequency and
> affects the measurements.
>
> Signed-off-by: Tim Chen <tim.c.c...@linux.intel.com>
> Tested-by: Keith Busch <keith.bu...@intel.com>

<snip>

> +#define CRCT10DIF_TEST_VECTORS	2
> +static struct hash_testvec crct10dif_tv_template[] = {
> +	{
> +		.plaintext = "abc",
> +		.psize  = 3,
> +#ifdef __LITTLE_ENDIAN
> +		.digest = "\x3b\x44",
> +#else
> +		.digest = "\x44\x3b",
> +#endif
> +	}, {
> +		.plaintext =
> +		abcd,
> +		.psize  = 56,
> +#ifdef __LITTLE_ENDIAN
> +		.digest = "\xe3\x9c",
> +#else
> +		.digest = "\x9c\xe3",
> +#endif
> +		.np     = 2,
> +		.tap    = { 28, 28 }
> +	}
> +};
> +

Are these large enough to test all code paths in the PCLMULQDQ implementation?

-Jussi
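One way to get longer test vectors for the question above is a slow bit-at-a-time reference that can be run over buffers of any length and compared against the accelerated transform. The sketch below is such a reference under stated assumptions: it uses the T10 DIF polynomial 0x8BB7 with a zero initial value and MSB-first bit order, which is not spelled out in this thread, and the function name `crc_t10dif_ref` is made up here. It reproduces the 0x443b value that the little-endian digest "\x3b\x44" for "abc" encodes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bit-at-a-time CRC16 T10 DIF reference (polynomial 0x8BB7, MSB first,
 * no final XOR). Pass 0 as crc for a new message, or the previous return
 * value to continue over the next chunk.
 */
static uint16_t crc_t10dif_ref(uint16_t crc, const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    while (len--) {
        crc ^= (uint16_t)(*buf++ << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}
```

Because the running CRC is the whole state, feeding a buffer in two pieces (as `.np = 2, .tap = { 28, 28 }` does in the template) gives the same result as one call. Reference values for inputs just below and above the 32, 128 and 256 byte thresholds seen in the assembly could be generated the same way.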
[RFC PATCH 0/6] Add AVX2 accelerated implementations for Blowfish, Twofish, Serpent and Camellia
The following series implements four block ciphers - Blowfish, Twofish, Serpent and Camellia - using the AVX2 instruction set. This work on AVX2 implementations started over a year ago and has been available at https://github.com/jkivilin/crypto-avx2

The Serpent and Camellia implementations are directly based on the word-sliced and byte-sliced AVX implementations and have been extended to use the 256-bit YMM registers. As such, the performance should be better than with the 128-bit wide AVX implementations. (The Camellia implementation needs some extra handling for AES-NI, as the AES instructions have remained only 128-bit wide.)

The Blowfish and Twofish implementations utilize the new vpgatherdd instruction to perform eight vectorized 8x32-bit table look-ups at once. This is different from the previous word-sliced AVX implementations, where table look-ups have to be performed through general purpose registers. The AVX2 implementations thus avoid additional moving of data between the SIMD and general purpose registers and therefore should be faster.

For obvious reasons, I have not tested these implementations on real hardware. Kernel tcrypt tests have been run under Bochs, which should contain a somewhat working AVX2 implementation. But I cannot be sure; even the Intel SDE emulator that I used for testing these implementations did not quite follow the specs (a past version of SDE that I initially used allowed the vector registers given to vgather to be the same, whereas the specs say that in such a case an exception should be raised). Because of this, the first versions of the patchset in the above repository are broken.

So since I'm unable to verify that these implementations work on real hardware and am unable to conduct a real performance evaluation, I'm sending this patchset as an RFC. Maybe someone can actually test these on real hardware and maybe give an acked-by in case these look ok(?). If that is not possible, I'll do the testing myself when those Haswell processors become available where I live.

-Jussi

---

Jussi Kivilinna (6):
      crypto: testmgr - extend camellia test-vectors for camellia-aesni/avx2
      crypto: tcrypt - add async cipher speed tests for blowfish
      crypto: blowfish - add AVX2/x86_64 implementation of blowfish cipher
      crypto: twofish - add AVX2/x86_64 assembler implementation of twofish cipher
      crypto: serpent - add AVX2/x86_64 assembler implementation of serpent cipher
      crypto: camellia - add AVX2/AES-NI/x86_64 assembler implementation of camellia cipher

 arch/x86/crypto/Makefile                     |   17
 arch/x86/crypto/blowfish-avx2-asm_64.S       |  449
 arch/x86/crypto/blowfish_avx2_glue.c         |  585
 arch/x86/crypto/blowfish_glue.c              |   32
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 1368
 arch/x86/crypto/camellia_aesni_avx2_glue.c   |  586
 arch/x86/crypto/camellia_aesni_avx_glue.c    |   17
 arch/x86/crypto/glue_helper-asm-avx2.S       |  180
 arch/x86/crypto/serpent-avx2-asm_64.S        |  800
 arch/x86/crypto/serpent_avx2_glue.c          |  562
 arch/x86/crypto/serpent_avx_glue.c           |   62
 arch/x86/crypto/twofish-avx2-asm_64.S        |  600
 arch/x86/crypto/twofish_avx2_glue.c          |  584
 arch/x86/crypto/twofish_avx_glue.c           |   14
 arch/x86/include/asm/cpufeature.h            |    1
 arch/x86/include/asm/crypto/blowfish.h       |   43
 arch/x86/include/asm/crypto/camellia.h       |   19
 arch/x86/include/asm/crypto/serpent-avx.h    |   24
 arch/x86/include/asm/crypto/twofish.h        |   18
 crypto/Kconfig                               |   88
 crypto/tcrypt.c                              |   15
 crypto/testmgr.c                             |   51
 crypto/testmgr.h                             | 1100
 23 files changed, 7128 insertions(+), 87 deletions(-)
 create mode 100644 arch/x86/crypto/blowfish-avx2-asm_64.S
 create mode 100644 arch/x86/crypto/blowfish_avx2_glue.c
 create mode 100644 arch/x86/crypto/camellia-aesni-avx2-asm_64.S
 create mode 100644 arch/x86/crypto/camellia_aesni_avx2_glue.c
 create mode 100644 arch/x86/crypto/glue_helper-asm-avx2.S
 create mode 100644 arch/x86/crypto/serpent-avx2-asm_64.S
 create mode 100644 arch/x86/crypto/serpent_avx2_glue.c
 create mode 100644 arch/x86/crypto/twofish-avx2-asm_64.S
 create mode 100644 arch/x86/crypto/twofish_avx2_glue.c
 create mode 100644 arch/x86/include/asm/crypto/blowfish.h
[RFC PATCH 1/6] crypto: testmgr - extend camellia test-vectors for camellia-aesni/avx2
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi --- crypto/testmgr.h | 1100 -- 1 file changed, 1062 insertions(+), 38 deletions(-) diff --git a/crypto/testmgr.h b/crypto/testmgr.h index d503660..dc2c054 100644 --- a/crypto/testmgr.h +++ b/crypto/testmgr.h @@ -20997,8 +20997,72 @@ static struct cipher_testvec camellia_enc_tv_template[] = { \x86\x1D\xB4\x28\xBF\x56\xED\x61 \xF8\x8F\x03\x9A\x31\xC8\x3C\xD3 \x6A\x01\x75\x0C\xA3\x17\xAE\x45 - \xDC\x50\xE7\x7E\x15\x89\x20\xB7, - .ilen = 496, + \xDC\x50\xE7\x7E\x15\x89\x20\xB7 + \x2B\xC2\x59\xF0\x64\xFB\x92\x06 + \x9D\x34\xCB\x3F\xD6\x6D\x04\x78 + \x0F\xA6\x1A\xB1\x48\xDF\x53\xEA + \x81\x18\x8C\x23\xBA\x2E\xC5\x5C + \xF3\x67\xFE\x95\x09\xA0\x37\xCE + \x42\xD9\x70\x07\x7B\x12\xA9\x1D + \xB4\x4B\xE2\x56\xED\x84\x1B\x8F + \x26\xBD\x31\xC8\x5F\xF6\x6A\x01 + \x98\x0C\xA3\x3A\xD1\x45\xDC\x73 + \x0A\x7E\x15\xAC\x20\xB7\x4E\xE5 + \x59\xF0\x87\x1E\x92\x29\xC0\x34 + \xCB\x62\xF9\x6D\x04\x9B\x0F\xA6 + \x3D\xD4\x48\xDF\x76\x0D\x81\x18 + \xAF\x23\xBA\x51\xE8\x5C\xF3\x8A + \x21\x95\x2C\xC3\x37\xCE\x65\xFC + \x70\x07\x9E\x12\xA9\x40\xD7\x4B + \xE2\x79\x10\x84\x1B\xB2\x26\xBD + \x54\xEB\x5F\xF6\x8D\x01\x98\x2F + \xC6\x3A\xD1\x68\xFF\x73\x0A\xA1 + \x15\xAC\x43\xDA\x4E\xE5\x7C\x13 + \x87\x1E\xB5\x29\xC0\x57\xEE\x62 + \xF9\x90\x04\x9B\x32\xC9\x3D\xD4 + \x6B\x02\x76\x0D\xA4\x18\xAF\x46 + \xDD\x51\xE8\x7F\x16\x8A\x21\xB8 + \x2C\xC3\x5A\xF1\x65\xFC\x93\x07 + \x9E\x35\xCC\x40\xD7\x6E\x05\x79 + \x10\xA7\x1B\xB2\x49\xE0\x54\xEB + \x82\x19\x8D\x24\xBB\x2F\xC6\x5D + \xF4\x68\xFF\x96\x0A\xA1\x38\xCF + \x43\xDA\x71\x08\x7C\x13\xAA\x1E + \xB5\x4C\xE3\x57\xEE\x85\x1C\x90 + \x27\xBE\x32\xC9\x60\xF7\x6B\x02 + \x99\x0D\xA4\x3B\xD2\x46\xDD\x74 + \x0B\x7F\x16\xAD\x21\xB8\x4F\xE6 + \x5A\xF1\x88\x1F\x93\x2A\xC1\x35 + \xCC\x63\xFA\x6E\x05\x9C\x10\xA7 + \x3E\xD5\x49\xE0\x77\x0E\x82\x19 + \xB0\x24\xBB\x52\xE9\x5D\xF4\x8B + \x22\x96\x2D\xC4\x38\xCF\x66\xFD + \x71\x08\x9F\x13\xAA\x41\xD8\x4C + \xE3\x7A\x11\x85\x1C\xB3\x27\xBE + 
\x55\xEC\x60\xF7\x8E\x02\x99\x30 + \xC7\x3B\xD2\x69\x00\x74\x0B\xA2 + \x16\xAD\x44\xDB\x4F\xE6\x7D\x14 + \x88\x1F\xB6\x2A\xC1\x58\xEF\x63 + \xFA\x91\x05\x9C\x33\xCA\x3E\xD5 + \x6C\x03\x77\x0E\xA5\x19\xB0\x47 + \xDE\x52\xE9\x80\x17\x8B\x22\xB9 + \x2D\xC4\x5B\xF2\x66\xFD\x94\x08 + \x9F\x36\xCD\x41\xD8\x6F\x06\x7A + \x11\xA8\x1C\xB3\x4A\xE1\x55\xEC + \x83\x1A\x8E\x25\xBC\x30\xC7\x5E + \xF5\x69\x00\x97\x0B\xA2\x39\xD0 + \x44\xDB\x72\x09\x7D\x14\xAB\x1F + \xB6\x4D\xE4\x58\xEF\x86\x1D\x91 + \x28\xBF\x33\xCA\x61\xF8\x6C\x03 + \x9A\x0E\xA5\x3C\xD3\x47\xDE\x75 + \x0C\x80\x17\xAE\x22\xB9\x50\xE7 + \x5B\xF2\x89\x20\x94\x2B\xC2\x36 + \xCD\x64\xFB\x6F\x06\x9D\x11\xA8 + \x3F\xD6\x4A\xE1\x78\x0F\x83\x1A + \xB1\x25\xBC\x53\xEA\x5E\xF5\x8C + \x00\x97\x2E\xC5\x39\xD0\x67\xFE + \x72\x09\xA0\x14\xAB\x42\xD9\x4D, + .ilen = 1008, .result = \xED\xCD\xDB\xB8\x68\xCE\xBD\xEA \x9D\x9D\xCD\x9F\x4F\xFC\x4D\xB7 \xA5\xFF\x6F\x43\x0F\xBA\x32\x04 @@ -21060,11 +21124,75 @@ static struct cipher_testvec camellia_enc_tv_template[] = { \x2C\x35\x1B\x38\x85\x7D\xE8\xF3 \x87\x4F\xDA\xD8\x5F\xFC\xB6\x44 \xD0\xE3\x9B\x8B\xBF\xD6\xB8\xC4
[RFC PATCH 3/6] crypto: blowfish - add AVX2/x86_64 implementation of blowfish cipher
Patch adds AVX2/x86-64 implementation of Blowfish cipher, requiring 32 parallel
blocks for input (256 bytes). Table look-ups are performed using vpgatherdd
instruction directly from vector registers and thus should be faster than
earlier implementations.

Signed-off-by: Jussi Kivilinna <jussi.kivili...@iki.fi>
---
 arch/x86/crypto/Makefile               |   11
 arch/x86/crypto/blowfish-avx2-asm_64.S |  449
 arch/x86/crypto/blowfish_avx2_glue.c   |  585
 arch/x86/crypto/blowfish_glue.c        |   32
 arch/x86/include/asm/cpufeature.h      |    1
 arch/x86/include/asm/crypto/blowfish.h |   43
 crypto/Kconfig                         |   18
 crypto/testmgr.c                       |   12
 8 files changed, 1127 insertions(+), 24 deletions(-)
 create mode 100644 arch/x86/crypto/blowfish-avx2-asm_64.S
 create mode 100644 arch/x86/crypto/blowfish_avx2_glue.c
 create mode 100644 arch/x86/include/asm/crypto/blowfish.h

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 03cd731..28464ef 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -3,6 +3,8 @@
 #

 avx_supported := $(call as-instr,vpxor %xmm0$(comma)%xmm0$(comma)%xmm0,yes,no)
+avx2_supported := $(call as-instr,vpgatherdd %ymm0$(comma)(%eax$(comma)%ymm1\
+				$(comma)4)$(comma)%ymm2,yes,no)

 obj-$(CONFIG_CRYPTO_ABLK_HELPER_X86) += ablk_helper.o
 obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o
@@ -38,6 +40,11 @@ ifeq ($(avx_supported),yes)
 	obj-$(CONFIG_CRYPTO_SERPENT_AVX_X86_64) += serpent-avx-x86_64.o
 endif

+# These modules require assembler to support AVX2.
+ifeq ($(avx2_supported),yes)
+	obj-$(CONFIG_CRYPTO_BLOWFISH_AVX2_X86_64) += blowfish-avx2.o
+endif
+
 aes-i586-y := aes-i586-asm_32.o aes_glue.o
 twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o
 salsa20-i586-y := salsa20-i586-asm_32.o salsa20_glue.o
@@ -62,6 +69,10 @@ ifeq ($(avx_supported),yes)
 				serpent_avx_glue.o
 endif

+ifeq ($(avx2_supported),yes)
+	blowfish-avx2-y := blowfish-avx2-asm_64.o blowfish_avx2_glue.o
+endif
+
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o
 ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o
 sha1-ssse3-y := sha1_ssse3_asm.o sha1_ssse3_glue.o
diff --git a/arch/x86/crypto/blowfish-avx2-asm_64.S b/arch/x86/crypto/blowfish-avx2-asm_64.S
new file mode 100644
index 000..784452e
--- /dev/null
+++ b/arch/x86/crypto/blowfish-avx2-asm_64.S
@@ -0,0 +1,449 @@
+/*
+ * x86_64/AVX2 assembler optimized version of Blowfish
+ *
+ * Copyright © 2012-2013 Jussi Kivilinna <jussi.kivili...@iki.fi>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ */
+
+#include <linux/linkage.h>
+
+.file "blowfish-avx2-asm_64.S"
+
+.data
+.align 32
+
+.Lprefetch_mask:
+.long 0*64
+.long 1*64
+.long 2*64
+.long 3*64
+.long 4*64
+.long 5*64
+.long 6*64
+.long 7*64
+
+.Lbswap32_mask:
+.long 0x00010203
+.long 0x04050607
+.long 0x08090a0b
+.long 0x0c0d0e0f
+
+.Lbswap128_mask:
+	.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
+.Lbswap_iv_mask:
+	.byte 7, 6, 5, 4, 3, 2, 1, 0, 7, 6, 5, 4, 3, 2, 1, 0
+
+.text
+/* structure of crypto context */
+#define p	0
+#define s0	((16 + 2) * 4)
+#define s1	((16 + 2 + (1 * 256)) * 4)
+#define s2	((16 + 2 + (2 * 256)) * 4)
+#define s3	((16 + 2 + (3 * 256)) * 4)
+
+/* register macros */
+#define CTX	%rdi
+#define RIO	%rdx
+
+#define RS0	%rax
+#define RS1	%r8
+#define RS2	%r9
+#define RS3	%r10
+
+#define RLOOP	%r11
+#define RLOOPd	%r11d
+
+#define RXr0	%ymm8
+#define RXr1	%ymm9
+#define RXr2	%ymm10
+#define RXr3	%ymm11
+#define RXl0	%ymm12
+#define RXl1	%ymm13
+#define RXl2	%ymm14
+#define RXl3	%ymm15
+
+/* temp regs */
+#define RT0	%ymm0
+#define RT0x	%xmm0
+#define RT1	%ymm1
+#define RT1x	%xmm1
+#define RIDX0	%ymm2
+#define RIDX1	%ymm3
+#define RIDX1x	%xmm3
+#define RIDX2	%ymm4
+#define RIDX3	%ymm5
+
+/* vpgatherdd mask and '-1' */
+#define RNOT	%ymm6
+
+/* byte mask, (-1 >> 24) */
+#define RBYTE	%ymm7
+
+/***
+ * 32-way AVX2 blowfish
+ ***/
+#define F(xl, xr) \
+	vpsrld $24, xl, RIDX0; \
+	vpsrld $16, xl, RIDX1; \
+	vpsrld $8, xl, RIDX2; \
+	vpand RBYTE, RIDX1, RIDX1; \
+	vpand RBYTE, RIDX2, RIDX2; \
+	vpand RBYTE, xl, RIDX3; \
+	\
+	vpgatherdd RNOT, (RS0, RIDX0, 4), RT0; \
+	vpcmpeqd RNOT, RNOT, RNOT; \
+	vpcmpeqd RIDX0, RIDX0, RIDX0; \
+	\
+	vpgatherdd RNOT, (RS1, RIDX1, 4), RT1; \
+	vpcmpeqd RIDX1, RIDX1, RIDX1
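For orientation, the part of the F macro visible here splits each 32-bit word into four byte indices and gathers one word from each of the four S-boxes. A scalar sketch of the same step for a single word is shown below. The index extraction mirrors the shifts and masks in the macro, assuming RBYTE holds 0xff in every lane; the final add/xor/add combination is the standard Blowfish round function and is not visible in the truncated macro. `struct bf_sboxes` and `blowfish_f` are hypothetical names, not the layout used by the kernel context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the four 256-entry S-boxes (s0..s3 in the asm). */
struct bf_sboxes {
    uint32_t s[4][256];
};

/* Scalar Blowfish F for one 32-bit word. */
static uint32_t blowfish_f(const struct bf_sboxes *ctx, uint32_t xl)
{
    uint32_t i0 = xl >> 24;            /* vpsrld $24, xl, RIDX0        */
    uint32_t i1 = (xl >> 16) & 0xff;   /* vpsrld $16, then vpand RBYTE */
    uint32_t i2 = (xl >> 8) & 0xff;    /* vpsrld $8, then vpand RBYTE  */
    uint32_t i3 = xl & 0xff;           /* vpand RBYTE, xl, RIDX3       */

    assert(ctx != NULL);
    /* Standard Blowfish combination of the four looked-up words. */
    return ((ctx->s[0][i0] + ctx->s[1][i1]) ^ ctx->s[2][i2]) + ctx->s[3][i3];
}
```

The AVX2 version does this for eight words per ymm register at once, with each `vpgatherdd` standing in for eight of the table reads above.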
[RFC PATCH 4/6] crypto: twofish - add AVX2/x86_64 assembler implementation of twofish cipher
Patch adds AVX2/x86-64 implementation of Twofish cipher, requiring 16 parallel blocks for input (256 bytes). Table look-ups are performed using vpgatherdd instruction directly from vector registers and thus should be faster than earlier implementations. Implementation also uses 256-bit wide YMM registers, which should give additional speed up compared to the AVX implementation. Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi --- arch/x86/crypto/Makefile |2 arch/x86/crypto/glue_helper-asm-avx2.S | 180 ++ arch/x86/crypto/twofish-avx2-asm_64.S | 600 arch/x86/crypto/twofish_avx2_glue.c| 584 +++ arch/x86/crypto/twofish_avx_glue.c | 14 + arch/x86/include/asm/crypto/twofish.h | 18 + crypto/Kconfig | 24 + crypto/testmgr.c | 12 + 8 files changed, 1432 insertions(+), 2 deletions(-) create mode 100644 arch/x86/crypto/glue_helper-asm-avx2.S create mode 100644 arch/x86/crypto/twofish-avx2-asm_64.S create mode 100644 arch/x86/crypto/twofish_avx2_glue.c diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile index 28464ef..1f6e0c2 100644 --- a/arch/x86/crypto/Makefile +++ b/arch/x86/crypto/Makefile @@ -43,6 +43,7 @@ endif # These modules require assembler to support AVX2. 
ifeq ($(avx2_supported),yes) obj-$(CONFIG_CRYPTO_BLOWFISH_AVX2_X86_64) += blowfish-avx2.o + obj-$(CONFIG_CRYPTO_TWOFISH_AVX2_X86_64) += twofish-avx2.o endif aes-i586-y := aes-i586-asm_32.o aes_glue.o @@ -71,6 +72,7 @@ endif ifeq ($(avx2_supported),yes) blowfish-avx2-y := blowfish-avx2-asm_64.o blowfish_avx2_glue.o + twofish-avx2-y := twofish-avx2-asm_64.o twofish_avx2_glue.o endif aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o diff --git a/arch/x86/crypto/glue_helper-asm-avx2.S b/arch/x86/crypto/glue_helper-asm-avx2.S new file mode 100644 index 000..a53ac11 --- /dev/null +++ b/arch/x86/crypto/glue_helper-asm-avx2.S @@ -0,0 +1,180 @@ +/* + * Shared glue code for 128bit block ciphers, AVX2 assembler macros + * + * Copyright © 2012-2013 Jussi Kivilinna jussi.kivili...@mbnet.fi + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. 
+ * + */ + +#define load_16way(src, x0, x1, x2, x3, x4, x5, x6, x7) \ + vmovdqu (0*32)(src), x0; \ + vmovdqu (1*32)(src), x1; \ + vmovdqu (2*32)(src), x2; \ + vmovdqu (3*32)(src), x3; \ + vmovdqu (4*32)(src), x4; \ + vmovdqu (5*32)(src), x5; \ + vmovdqu (6*32)(src), x6; \ + vmovdqu (7*32)(src), x7; + +#define store_16way(dst, x0, x1, x2, x3, x4, x5, x6, x7) \ + vmovdqu x0, (0*32)(dst); \ + vmovdqu x1, (1*32)(dst); \ + vmovdqu x2, (2*32)(dst); \ + vmovdqu x3, (3*32)(dst); \ + vmovdqu x4, (4*32)(dst); \ + vmovdqu x5, (5*32)(dst); \ + vmovdqu x6, (6*32)(dst); \ + vmovdqu x7, (7*32)(dst); + +#define store_cbc_16way(src, dst, x0, x1, x2, x3, x4, x5, x6, x7, t0) \ + vpxor t0, t0, t0; \ + vinserti128 $1, (src), t0, t0; \ + vpxor t0, x0, x0; \ + vpxor (0*32+16)(src), x1, x1; \ + vpxor (1*32+16)(src), x2, x2; \ + vpxor (2*32+16)(src), x3, x3; \ + vpxor (3*32+16)(src), x4, x4; \ + vpxor (4*32+16)(src), x5, x5; \ + vpxor (5*32+16)(src), x6, x6; \ + vpxor (6*32+16)(src), x7, x7; \ + store_16way(dst, x0, x1, x2, x3, x4, x5, x6, x7); + +#define inc_le128(x, minus_one, tmp) \ + vpcmpeqq minus_one, x, tmp; \ + vpsubq minus_one, x, x; \ + vpslldq $8, tmp, tmp; \ + vpsubq tmp, x, x; + +#define add2_le128(x, minus_one, minus_two, tmp1, tmp2) \ + vpcmpeqq minus_one, x, tmp1; \ + vpcmpeqq minus_two, x, tmp2; \ + vpsubq minus_two, x, x; \ + vpor tmp2, tmp1, tmp1; \ + vpslldq $8, tmp1, tmp1; \ + vpsubq tmp1, x, x; + +#define load_ctr_16way(iv, bswap, x0, x1, x2, x3, x4, x5, x6, x7, t0, t0x, t1, \ + t1x, t2, t2x, t3, t3x, t4, t5) \ + vpcmpeqd t0, t0, t0; \ + vpsrldq $8, t0, t0; /* ab: -1:0 ; cd: -1:0 */ \ + vpaddq t0, t0, t4; /* ab: -2:0 ; cd: -2:0 */\ + \ + /* load IV and byteswap */ \ + vmovdqu (iv), t2x; \ + vmovdqa t2x, t3x; \ + inc_le128(t2x, t0x, t1x); \ + vbroadcasti128 bswap, t1; \ + vinserti128 $1, t2x, t3, t2; /* ab: le0 ; cd: le1 */ \ + vpshufb t1, t2, x0; \ + \ + /* construct IVs */ \ + add2_le128(t2, t0, t4, t3, t5); /* ab: le2 ; cd: le3 */ \ + vpshufb t1, t2, x1; \ + 
add2_le128(t2, t0, t4, t3, t5); \ + vpshufb t1, t2, x2; \ + add2_le128(t2, t0, t4, t3, t5); \ + vpshufb t1, t2, x3; \ + add2_le128(t2, t0, t4, t3, t5); \ + vpshufb t1, t2
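For readers following the counter macros above, a scalar C model of inc_le128 and add2_le128 may help. This is only a sketch of what the macros do to a single 128-bit lane: the struct and function names are made up for illustration and are not part of the patch.

```c
#include <stdint.h>

/* Hypothetical model of one 128-bit little-endian counter lane. */
struct le128_model {
	uint64_t lo;	/* low qword */
	uint64_t hi;	/* high qword */
};

/* Scalar equivalent of inc_le128(x, minus_one, tmp). */
static void inc_le128_model(struct le128_model *x)
{
	/* vpcmpeqq: all-ones mask when the low qword is about to wrap */
	uint64_t carry = (x->lo == UINT64_MAX) ? UINT64_MAX : 0;

	x->lo -= UINT64_MAX;	/* vpsubq minus_one: subtracting -1 adds 1 */
	x->hi -= carry;		/* vpslldq $8 + vpsubq: add 1 to the high qword on carry */
}

/* Scalar equivalent of add2_le128(x, minus_one, minus_two, tmp1, tmp2). */
static void add2_le128_model(struct le128_model *x)
{
	/* The two vpcmpeqq results are OR-ed: low qword is -1 or -2 */
	uint64_t carry = (x->lo >= UINT64_MAX - 1) ? UINT64_MAX : 0;

	x->lo -= UINT64_MAX - 1;	/* vpsubq minus_two: subtracting -2 adds 2 */
	x->hi -= carry;
}
```

The assembler version never adds; it subtracts a constant of -1 or -2 so that the same register can also serve as the comparison operand for the carry check.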
[RFC PATCH 5/6] crypto: serpent - add AVX2/x86_64 assembler implementation of serpent cipher
Patch adds AVX2/x86-64 implementation of Serpent cipher, requiring 16 parallel blocks for input (256 bytes). Implementation is based on the AVX implementation and extends to use the 256-bit wide YMM registers. Since serpent does not use table look-ups, this implementation should be close to two times faster than the AVX implementation. Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi --- arch/x86/crypto/Makefile |2 arch/x86/crypto/serpent-avx2-asm_64.S | 800 + arch/x86/crypto/serpent_avx2_glue.c | 562 arch/x86/crypto/serpent_avx_glue.c| 62 ++ arch/x86/include/asm/crypto/serpent-avx.h | 24 + crypto/Kconfig| 23 + crypto/testmgr.c | 15 + 7 files changed, 1468 insertions(+), 20 deletions(-) create mode 100644 arch/x86/crypto/serpent-avx2-asm_64.S create mode 100644 arch/x86/crypto/serpent_avx2_glue.c diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile index 1f6e0c2..a21af59 100644 --- a/arch/x86/crypto/Makefile +++ b/arch/x86/crypto/Makefile @@ -43,6 +43,7 @@ endif # These modules require assembler to support AVX2. 
ifeq ($(avx2_supported),yes) obj-$(CONFIG_CRYPTO_BLOWFISH_AVX2_X86_64) += blowfish-avx2.o + obj-$(CONFIG_CRYPTO_SERPENT_AVX2_X86_64) += serpent-avx2.o obj-$(CONFIG_CRYPTO_TWOFISH_AVX2_X86_64) += twofish-avx2.o endif @@ -72,6 +73,7 @@ endif ifeq ($(avx2_supported),yes) blowfish-avx2-y := blowfish-avx2-asm_64.o blowfish_avx2_glue.o + serpent-avx2-y := serpent-avx2-asm_64.o serpent_avx2_glue.o twofish-avx2-y := twofish-avx2-asm_64.o twofish_avx2_glue.o endif diff --git a/arch/x86/crypto/serpent-avx2-asm_64.S b/arch/x86/crypto/serpent-avx2-asm_64.S new file mode 100644 index 000..b222085 --- /dev/null +++ b/arch/x86/crypto/serpent-avx2-asm_64.S @@ -0,0 +1,800 @@ +/* + * x86_64/AVX2 assembler optimized version of Serpent + * + * Copyright © 2012-2013 Jussi Kivilinna jussi.kivili...@mbnet.fi + * + * Based on AVX assembler implementation of Serpent by: + * Copyright © 2012 Johannes Goetzfried + * johannes.goetzfr...@informatik.stud.uni-erlangen.de + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. 
+ * + */ + +#include linux/linkage.h +#include glue_helper-asm-avx2.S + +.file serpent-avx2-asm_64.S + +.data +.align 16 + +.Lbswap128_mask: + .byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 +.Lxts_gf128mul_and_shl1_mask_0: + .byte 0x87, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0 +.Lxts_gf128mul_and_shl1_mask_1: + .byte 0x0e, 1, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0 + +.text + +#define CTX %rdi + +#define RNOT %ymm0 +#define tp %ymm1 + +#define RA1 %ymm2 +#define RA2 %ymm3 +#define RB1 %ymm4 +#define RB2 %ymm5 +#define RC1 %ymm6 +#define RC2 %ymm7 +#define RD1 %ymm8 +#define RD2 %ymm9 +#define RE1 %ymm10 +#define RE2 %ymm11 + +#define RK0 %ymm12 +#define RK1 %ymm13 +#define RK2 %ymm14 +#define RK3 %ymm15 + +#define RK0x %xmm12 +#define RK1x %xmm13 +#define RK2x %xmm14 +#define RK3x %xmm15 + +#define S0_1(x0, x1, x2, x3, x4) \ + vporx0, x3, tp; \ + vpxor x3, x0, x0; \ + vpxor x2, x3, x4; \ + vpxor RNOT, x4, x4; \ + vpxor x1, tp, x3; \ + vpand x0, x1, x1; \ + vpxor x4, x1, x1; \ + vpxor x0, x2, x2; +#define S0_2(x0, x1, x2, x3, x4) \ + vpxor x3, x0, x0; \ + vporx0, x4, x4; \ + vpxor x2, x0, x0; \ + vpand x1, x2, x2; \ + vpxor x2, x3, x3; \ + vpxor RNOT, x1, x1; \ + vpxor x4, x2, x2; \ + vpxor x2, x1, x1; + +#define S1_1(x0, x1, x2, x3, x4) \ + vpxor x0, x1, tp; \ + vpxor x3, x0, x0; \ + vpxor RNOT, x3, x3; \ + vpand tp, x1, x4; \ + vportp, x0, x0; \ + vpxor x2, x3, x3; \ + vpxor x3, x0, x0; \ + vpxor x3, tp, x1; +#define S1_2(x0, x1, x2, x3, x4) \ + vpxor x4, x3, x3; \ + vporx4, x1, x1; \ + vpxor x2, x4, x4; \ + vpand x0, x2, x2; \ + vpxor x1, x2, x2; \ + vporx0, x1, x1; \ + vpxor RNOT, x0, x0; \ + vpxor x2, x0, x0; \ + vpxor x1, x4, x4; + +#define S2_1(x0, x1, x2, x3, x4) \ + vpxor RNOT, x3, x3; \ + vpxor x0, x1, x1; \ + vpand x2, x0, tp; \ + vpxor x3
[RFC PATCH 2/6] crypto: tcrypt - add async cipher speed tests for blowfish
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/tcrypt.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index 24ea7df..66d254c 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -1768,6 +1768,21 @@ static int do_test(int m)
 				   speed_template_32_64);
 		break;
 
+	case 509:
+		test_acipher_speed("ecb(blowfish)", ENCRYPT, sec, NULL, 0,
+				   speed_template_8_32);
+		test_acipher_speed("ecb(blowfish)", DECRYPT, sec, NULL, 0,
+				   speed_template_8_32);
+		test_acipher_speed("cbc(blowfish)", ENCRYPT, sec, NULL, 0,
+				   speed_template_8_32);
+		test_acipher_speed("cbc(blowfish)", DECRYPT, sec, NULL, 0,
+				   speed_template_8_32);
+		test_acipher_speed("ctr(blowfish)", ENCRYPT, sec, NULL, 0,
+				   speed_template_8_32);
+		test_acipher_speed("ctr(blowfish)", DECRYPT, sec, NULL, 0,
+				   speed_template_8_32);
+		break;
+
 	case 1000:
 		test_available();
 		break;
[PATCH] crypto: aesni_intel - fix Kconfig problem with CRYPTO_GLUE_HELPER_X86
The Kconfig setting for the glue helper module is CRYPTO_GLUE_HELPER_X86, but a recent change to aesni_intel used CRYPTO_GLUE_HELPER instead. This patch corrects the issue.

Cc: kbuild-...@01.org
Reported-by: kbuild test robot fengguang...@intel.com
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/Kconfig |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 808ac37..0e7a237 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -678,7 +678,7 @@ config CRYPTO_AES_NI_INTEL
 	select CRYPTO_CRYPTD
 	select CRYPTO_ABLK_HELPER_X86
 	select CRYPTO_ALGAPI
-	select CRYPTO_GLUE_HELPER if 64BIT
+	select CRYPTO_GLUE_HELPER_X86 if 64BIT
 	select CRYPTO_LRW
 	select CRYPTO_XTS
 	help
[PATCH 2/2] xfrm: add rfc4494 AES-CMAC-96 support
Now that CryptoAPI has support for CMAC, we can add support for AES-CMAC-96 (rfc4494).

Cc: Tom St Denis tstde...@elliptictech.com
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 net/xfrm/xfrm_algo.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/net/xfrm/xfrm_algo.c b/net/xfrm/xfrm_algo.c
index 6fb9d00..ab4ef72 100644
--- a/net/xfrm/xfrm_algo.c
+++ b/net/xfrm/xfrm_algo.c
@@ -311,6 +311,19 @@ static struct xfrm_algo_desc aalg_list[] = {
 		.sadb_alg_maxbits = 128
 	}
 },
+{
+	/* rfc4494 */
+	.name = "cmac(aes)",
+
+	.uinfo = {
+		.auth = {
+			.icv_truncbits = 96,
+			.icv_fullbits = 128,
+		}
+	},
+
+	.pfkey_supported = 0,
+},
 };
 
 static struct xfrm_algo_desc ealg_list[] = {
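To illustrate what the two ICV fields in the new entry mean, here is a small stand-alone C sketch. It assumes, per RFC 4494, that AES-CMAC-96 is the leftmost 96 bits of the full 128-bit CMAC tag. The enum and the helper name are hypothetical and do not exist in the xfrm code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
	ICV_FULLBITS  = 128,	/* mirrors .icv_fullbits in the patch */
	ICV_TRUNCBITS = 96,	/* mirrors .icv_truncbits in the patch */
};

/*
 * Hypothetical helper: keep the leading ICV_TRUNCBITS of a full CMAC tag.
 * Returns the number of bytes written to out.
 */
static size_t cmac96_truncate(const uint8_t full[ICV_FULLBITS / 8],
			      uint8_t out[ICV_TRUNCBITS / 8])
{
	memcpy(out, full, ICV_TRUNCBITS / 8);
	return ICV_TRUNCBITS / 8;
}
```

With these values the full tag is 16 bytes and the truncated ICV carried on the wire is 12 bytes.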
Re: [PATCH 1/2] crypto: add CMAC support to CryptoAPI
On 08.04.2013 11:24, Steffen Klassert wrote:
> On Mon, Apr 08, 2013 at 10:48:44AM +0300, Jussi Kivilinna wrote:
>> Patch adds support for NIST recommended block cipher mode CMAC to CryptoAPI.
>>
>> This work is based on Tom St Denis' earlier patch,
>>  http://marc.info/?l=linux-crypto-vger&m=135877306305466&w=2
>>
>> Cc: Tom St Denis tstde...@elliptictech.com
>> Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
>
> This patch does not apply cleanly to the ipsec-next tree because of some
> crypto changes I don't have in ipsec-next. The IPsec part should apply to
> the cryptodev tree, so it's probably best if we route this patchset
> through the cryptodev tree.

I should have mentioned that the patchset is on top of the cryptodev tree and the previous crypto patches that I sent yesterday, which are likely to cause problems at least in tcrypt.c:
 http://marc.info/?l=linux-crypto-vger&m=136534223503368&w=2

-Jussi

> Herbert, are you going to take these patches?
[PATCH 1/5] crypto: x86 - add more optimized XTS-mode for serpent-avx
This patch adds AVX optimized XTS-mode helper functions/macros and converts serpent-avx to use the new facilities. Benefits are slightly improved speed and reduced stack usage as use of temporary IV-array is avoided.

tcrypt results, with Intel i5-2450M:
        enc     dec
16B     1.00x   1.00x
64B     1.00x   1.00x
256B    1.04x   1.06x
1024B   1.09x   1.09x
8192B   1.10x   1.09x

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/glue_helper-asm-avx.S       |   61 +
 arch/x86/crypto/glue_helper.c               |   97 +++
 arch/x86/crypto/serpent-avx-x86_64-asm_64.S |   45 -
 arch/x86/crypto/serpent_avx_glue.c          |   87 +---
 arch/x86/include/asm/crypto/glue_helper.h   |   24 +++
 arch/x86/include/asm/crypto/serpent-avx.h   |    5 +
 6 files changed, 273 insertions(+), 46 deletions(-)

diff --git a/arch/x86/crypto/glue_helper-asm-avx.S b/arch/x86/crypto/glue_helper-asm-avx.S
index f7b6ea2..02ee230 100644
--- a/arch/x86/crypto/glue_helper-asm-avx.S
+++ b/arch/x86/crypto/glue_helper-asm-avx.S
@@ -1,7 +1,7 @@
 /*
  * Shared glue code for 128bit block ciphers, AVX assembler macros
  *
- * Copyright (c) 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi
+ * Copyright © 2012-2013 Jussi Kivilinna jussi.kivili...@iki.fi
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -89,3 +89,62 @@
 	vpxor (6*16)(src), x6, x6; \
 	vpxor (7*16)(src), x7, x7; \
 	store_8way(dst, x0, x1, x2, x3, x4, x5, x6, x7);
+
+#define gf128mul_x_ble(iv, mask, tmp) \
+	vpsrad $31, iv, tmp; \
+	vpaddq iv, iv, iv; \
+	vpshufd $0x13, tmp, tmp; \
+	vpand mask, tmp, tmp; \
+	vpxor tmp, iv, iv;
+
+#define load_xts_8way(iv, src, dst, x0, x1, x2, x3, x4, x5, x6, x7, tiv, t0, \
+		      t1, xts_gf128mul_and_shl1_mask) \
+	vmovdqa xts_gf128mul_and_shl1_mask, t0; \
+	\
+	/* load IV */ \
+	vmovdqu (iv), tiv; \
+	vpxor (0*16)(src), tiv, x0; \
+	vmovdqu tiv, (0*16)(dst); \
+	\
+	/* construct and store IVs, also xor with source */ \
+	gf128mul_x_ble(tiv, t0, t1); \
+	vpxor (1*16)(src), tiv, x1; \
+	vmovdqu tiv, (1*16)(dst); \
+	\
+	gf128mul_x_ble(tiv, t0, t1); \
+	vpxor (2*16)(src), tiv, x2; \
+	vmovdqu tiv, (2*16)(dst); \
+	\
+	gf128mul_x_ble(tiv, t0, t1); \
+	vpxor (3*16)(src), tiv, x3; \
+	vmovdqu tiv, (3*16)(dst); \
+	\
+	gf128mul_x_ble(tiv, t0, t1); \
+	vpxor (4*16)(src), tiv, x4; \
+	vmovdqu tiv, (4*16)(dst); \
+	\
+	gf128mul_x_ble(tiv, t0, t1); \
+	vpxor (5*16)(src), tiv, x5; \
+	vmovdqu tiv, (5*16)(dst); \
+	\
+	gf128mul_x_ble(tiv, t0, t1); \
+	vpxor (6*16)(src), tiv, x6; \
+	vmovdqu tiv, (6*16)(dst); \
+	\
+	gf128mul_x_ble(tiv, t0, t1); \
+	vpxor (7*16)(src), tiv, x7; \
+	vmovdqu tiv, (7*16)(dst); \
+	\
+	gf128mul_x_ble(tiv, t0, t1); \
+	vmovdqu tiv, (iv);
+
+#define store_xts_8way(dst, x0, x1, x2, x3, x4, x5, x6, x7) \
+	vpxor (0*16)(dst), x0, x0; \
+	vpxor (1*16)(dst), x1, x1; \
+	vpxor (2*16)(dst), x2, x2; \
+	vpxor (3*16)(dst), x3, x3; \
+	vpxor (4*16)(dst), x4, x4; \
+	vpxor (5*16)(dst), x5, x5; \
+	vpxor (6*16)(dst), x6, x6; \
+	vpxor (7*16)(dst), x7, x7; \
+	store_8way(dst, x0, x1, x2, x3, x4, x5, x6, x7);
diff --git a/arch/x86/crypto/glue_helper.c b/arch/x86/crypto/glue_helper.c
index 22ce4f6..432f1d76 100644
--- a/arch/x86/crypto/glue_helper.c
+++ b/arch/x86/crypto/glue_helper.c
@@ -1,7 +1,7 @@
 /*
  * Shared glue code for 128bit block ciphers
  *
- * Copyright (c) 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi
+ * Copyright © 2012-2013 Jussi Kivilinna jussi.kivili...@iki.fi
  *
  * CBC & ECB parts based on code (crypto/cbc.c,ecb.c) by:
  * Copyright (c) 2006 Herbert Xu herb...@gondor.apana.org.au
@@ -304,4 +304,99 @@ int glue_ctr_crypt_128bit(const struct common_glue_ctx *gctx,
 }
 EXPORT_SYMBOL_GPL(glue_ctr_crypt_128bit);
 
+static unsigned int __glue_xts_crypt_128bit(const struct common_glue_ctx *gctx,
+					    void *ctx,
+					    struct blkcipher_desc *desc,
+					    struct blkcipher_walk *walk)
+{
+	const unsigned int bsize = 128 / 8;
+	unsigned int nbytes = walk->nbytes;
+	u128 *src = (u128 *)walk->src.virt.addr;
+	u128 *dst = (u128 *)walk->dst.virt.addr;
+	unsigned int num_blocks, func_bytes;
+	unsigned int i;
+
+	/* Process multi-block batch */
+	for (i = 0; i < gctx->num_funcs; i++) {
+		num_blocks = gctx->funcs[i].num_blocks;
+		func_bytes = bsize * num_blocks;
+
+		if (nbytes >= func_bytes
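As a reading aid for the gf128mul_x_ble macro in this patch, the following is a scalar C model of the same tweak update: multiply the 128-bit little-endian XTS tweak by x in GF(2^128) and fold the carry back in with 0x87. It is a sketch only, and the struct and function names are invented here, not taken from the patch.

```c
#include <stdint.h>

/* Hypothetical model of the 128-bit XTS tweak as two little-endian halves. */
struct xts_tweak_model {
	uint64_t lo;	/* bits 0..63 */
	uint64_t hi;	/* bits 64..127 */
};

/* Scalar equivalent of one gf128mul_x_ble(iv, mask, tmp) step. */
static void gf128mul_x_ble_model(struct xts_tweak_model *t)
{
	uint64_t carry_lo = t->lo >> 63;	/* bit 63 moves into the high half */
	uint64_t carry_hi = t->hi >> 63;	/* bit 127 triggers the reduction */

	/* vpaddq iv, iv, iv doubles each qword; the mask supplies the carries */
	t->hi = (t->hi << 1) | carry_lo;		/* mask byte 8 == 1 */
	t->lo = (t->lo << 1) ^ (carry_hi ? 0x87 : 0);	/* mask byte 0 == 0x87 */
}
```

In the assembler version both carries come from a single vpsrad/vpshufd pair: the shuffle moves the sign of the top dword into the lowest dword (selecting 0x87) and the sign of bit 63 into the low dword of the high qword (selecting 1), so one vpand with the mask constant covers both cases.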
[PATCH 3/5] crypto: cast6-avx: use new optimized XTS code
Change cast6-avx to use the new XTS code, for smaller stack usage and small boost to performance. tcrypt results, with Intel i5-2450M: enc dec 16B 1.01x 1.01x 64B 1.01x 1.00x 256B1.09x 1.02x 1024B 1.08x 1.06x 8192B 1.08x 1.07x Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi --- arch/x86/crypto/cast6-avx-x86_64-asm_64.S | 48 +++ arch/x86/crypto/cast6_avx_glue.c | 91 - 2 files changed, 98 insertions(+), 41 deletions(-) diff --git a/arch/x86/crypto/cast6-avx-x86_64-asm_64.S b/arch/x86/crypto/cast6-avx-x86_64-asm_64.S index f93b610..e3531f8 100644 --- a/arch/x86/crypto/cast6-avx-x86_64-asm_64.S +++ b/arch/x86/crypto/cast6-avx-x86_64-asm_64.S @@ -4,7 +4,7 @@ * Copyright (C) 2012 Johannes Goetzfried * johannes.goetzfr...@informatik.stud.uni-erlangen.de * - * Copyright © 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi + * Copyright © 2012-2013 Jussi Kivilinna jussi.kivili...@iki.fi * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by @@ -227,6 +227,8 @@ .data .align 16 +.Lxts_gf128mul_and_shl1_mask: + .byte 0x87, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0 .Lbswap_mask: .byte 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 .Lbswap128_mask: @@ -424,3 +426,47 @@ ENTRY(cast6_ctr_8way) ret; ENDPROC(cast6_ctr_8way) + +ENTRY(cast6_xts_enc_8way) + /* input: +* %rdi: ctx, CTX +* %rsi: dst +* %rdx: src +* %rcx: iv (t ⊕ αⁿ ∈ GF(2¹²⁸)) +*/ + + movq %rsi, %r11; + + /* regs = src, dst = IVs, regs = regs xor IVs */ + load_xts_8way(%rcx, %rdx, %rsi, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2, + RX, RKR, RKM, .Lxts_gf128mul_and_shl1_mask); + + call __cast6_enc_blk8; + + /* dst = regs xor IVs(in dst) */ + store_xts_8way(%r11, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2); + + ret; +ENDPROC(cast6_xts_enc_8way) + +ENTRY(cast6_xts_dec_8way) + /* input: +* %rdi: ctx, CTX +* %rsi: dst +* %rdx: src +* %rcx: iv (t ⊕ αⁿ ∈ GF(2¹²⁸)) +*/ + + movq %rsi, %r11; + + /* regs = src, dst = IVs, regs = regs xor IVs */ 
+ load_xts_8way(%rcx, %rdx, %rsi, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2, + RX, RKR, RKM, .Lxts_gf128mul_and_shl1_mask); + + call __cast6_dec_blk8; + + /* dst = regs xor IVs(in dst) */ + store_xts_8way(%r11, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2); + + ret; +ENDPROC(cast6_xts_dec_8way) diff --git a/arch/x86/crypto/cast6_avx_glue.c b/arch/x86/crypto/cast6_avx_glue.c index 92f7ca2..8d0dfb8 100644 --- a/arch/x86/crypto/cast6_avx_glue.c +++ b/arch/x86/crypto/cast6_avx_glue.c @@ -4,6 +4,8 @@ * Copyright (C) 2012 Johannes Goetzfried * johannes.goetzfr...@informatik.stud.uni-erlangen.de * + * Copyright © 2013 Jussi Kivilinna jussi.kivili...@iki.fi + * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or @@ -50,6 +52,23 @@ asmlinkage void cast6_cbc_dec_8way(struct cast6_ctx *ctx, u8 *dst, asmlinkage void cast6_ctr_8way(struct cast6_ctx *ctx, u8 *dst, const u8 *src, le128 *iv); +asmlinkage void cast6_xts_enc_8way(struct cast6_ctx *ctx, u8 *dst, + const u8 *src, le128 *iv); +asmlinkage void cast6_xts_dec_8way(struct cast6_ctx *ctx, u8 *dst, + const u8 *src, le128 *iv); + +static void cast6_xts_enc(void *ctx, u128 *dst, const u128 *src, le128 *iv) +{ + glue_xts_crypt_128bit_one(ctx, dst, src, iv, + GLUE_FUNC_CAST(__cast6_encrypt)); +} + +static void cast6_xts_dec(void *ctx, u128 *dst, const u128 *src, le128 *iv) +{ + glue_xts_crypt_128bit_one(ctx, dst, src, iv, + GLUE_FUNC_CAST(__cast6_decrypt)); +} + static void cast6_crypt_ctr(void *ctx, u128 *dst, const u128 *src, le128 *iv) { be128 ctrblk; @@ -87,6 +106,19 @@ static const struct common_glue_ctx cast6_ctr = { } } }; +static const struct common_glue_ctx cast6_enc_xts = { + .num_funcs = 2, + .fpu_blocks_limit = CAST6_PARALLEL_BLOCKS, + + .funcs = { { + .num_blocks = CAST6_PARALLEL_BLOCKS, + .fn_u = { .xts = GLUE_XTS_FUNC_CAST(cast6_xts_enc_8way) } + }, { + .num_blocks = 
1, + .fn_u = { .xts = GLUE_XTS_FUNC_CAST(cast6_xts_enc) } + } } +}; + static const struct common_glue_ctx cast6_dec = { .num_funcs = 2, .fpu_blocks_limit = CAST6_PARALLEL_BLOCKS, @@ -113,6 +145,19 @@ static const struct common_glue_ctx cast6_dec_cbc = { } } }; +static
[PATCH 2/5] crypto: x86/twofish-avx - use optimized XTS code
Change twofish-avx to use the new XTS code, for smaller stack usage and small boost to performance.

tcrypt results, with Intel i5-2450M:
        enc     dec
16B     1.03x   1.02x
64B     0.91x   0.91x
256B    1.10x   1.09x
1024B   1.12x   1.11x
8192B   1.12x   1.11x

Since XTS is practically always used with data blocks of size 512 bytes or more, I chose not to make use of twofish-3way for block sizes smaller than 128 bytes. This causes slower results in tcrypt for 64 bytes.

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/twofish-avx-x86_64-asm_64.S |   48 ++
 arch/x86/crypto/twofish_avx_glue.c          |   91 +++
 2 files changed, 98 insertions(+), 41 deletions(-)

diff --git a/arch/x86/crypto/twofish-avx-x86_64-asm_64.S b/arch/x86/crypto/twofish-avx-x86_64-asm_64.S
index 8d3e113..0505813 100644
--- a/arch/x86/crypto/twofish-avx-x86_64-asm_64.S
+++ b/arch/x86/crypto/twofish-avx-x86_64-asm_64.S
@@ -4,7 +4,7 @@
  * Copyright (C) 2012 Johannes Goetzfried
  *     johannes.goetzfr...@informatik.stud.uni-erlangen.de
  *
- * Copyright © 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi
+ * Copyright © 2012-2013 Jussi Kivilinna jussi.kivili...@iki.fi
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -33,6 +33,8 @@
 
 .Lbswap128_mask:
 	.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
+.Lxts_gf128mul_and_shl1_mask:
+	.byte 0x87, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0
 
 .text
 
@@ -408,3 +410,47 @@ ENTRY(twofish_ctr_8way)
 
 	ret;
 ENDPROC(twofish_ctr_8way)
+
+ENTRY(twofish_xts_enc_8way)
+	/* input:
+	 *	%rdi: ctx, CTX
+	 *	%rsi: dst
+	 *	%rdx: src
+	 *	%rcx: iv (t ⊕ αⁿ ∈ GF(2¹²⁸))
+	 */
+
+	movq %rsi, %r11;
+
+	/* regs <= src, dst <= IVs, regs <= regs xor IVs */
+	load_xts_8way(%rcx, %rdx, %rsi, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2,
+		      RX0, RX1, RY0, .Lxts_gf128mul_and_shl1_mask);
+
+	call __twofish_enc_blk8;
+
+	/* dst <= regs xor IVs(in dst) */
+	store_xts_8way(%r11, RC1, RD1, RA1, RB1, RC2, RD2, RA2, RB2);
+
+	ret;
+ENDPROC(twofish_xts_enc_8way) + +ENTRY(twofish_xts_dec_8way) + /* input: +* %rdi: ctx, CTX +* %rsi: dst +* %rdx: src +* %rcx: iv (t ⊕ αⁿ ∈ GF(2¹²⁸)) +*/ + + movq %rsi, %r11; + + /* regs = src, dst = IVs, regs = regs xor IVs */ + load_xts_8way(%rcx, %rdx, %rsi, RC1, RD1, RA1, RB1, RC2, RD2, RA2, RB2, + RX0, RX1, RY0, .Lxts_gf128mul_and_shl1_mask); + + call __twofish_dec_blk8; + + /* dst = regs xor IVs(in dst) */ + store_xts_8way(%r11, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2); + + ret; +ENDPROC(twofish_xts_dec_8way) diff --git a/arch/x86/crypto/twofish_avx_glue.c b/arch/x86/crypto/twofish_avx_glue.c index 94ac91d..a62ba54 100644 --- a/arch/x86/crypto/twofish_avx_glue.c +++ b/arch/x86/crypto/twofish_avx_glue.c @@ -4,6 +4,8 @@ * Copyright (C) 2012 Johannes Goetzfried * johannes.goetzfr...@informatik.stud.uni-erlangen.de * + * Copyright © 2013 Jussi Kivilinna jussi.kivili...@iki.fi + * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or @@ -56,12 +58,29 @@ asmlinkage void twofish_cbc_dec_8way(struct twofish_ctx *ctx, u8 *dst, asmlinkage void twofish_ctr_8way(struct twofish_ctx *ctx, u8 *dst, const u8 *src, le128 *iv); +asmlinkage void twofish_xts_enc_8way(struct twofish_ctx *ctx, u8 *dst, +const u8 *src, le128 *iv); +asmlinkage void twofish_xts_dec_8way(struct twofish_ctx *ctx, u8 *dst, +const u8 *src, le128 *iv); + static inline void twofish_enc_blk_3way(struct twofish_ctx *ctx, u8 *dst, const u8 *src) { __twofish_enc_blk_3way(ctx, dst, src, false); } +static void twofish_xts_enc(void *ctx, u128 *dst, const u128 *src, le128 *iv) +{ + glue_xts_crypt_128bit_one(ctx, dst, src, iv, + GLUE_FUNC_CAST(twofish_enc_blk)); +} + +static void twofish_xts_dec(void *ctx, u128 *dst, const u128 *src, le128 *iv) +{ + glue_xts_crypt_128bit_one(ctx, dst, src, iv, + GLUE_FUNC_CAST(twofish_dec_blk)); +} + static const struct 
common_glue_ctx twofish_enc = { .num_funcs = 3, @@ -95,6 +114,19 @@ static const struct common_glue_ctx twofish_ctr = { } } }; +static const struct common_glue_ctx twofish_enc_xts = { + .num_funcs = 2, + .fpu_blocks_limit = TWOFISH_PARALLEL_BLOCKS, + + .funcs = { { + .num_blocks = TWOFISH_PARALLEL_BLOCKS
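The 64-byte slowdown mentioned in the commit message follows from how the glue code walks the function table. Below is a simplified stand-alone C model of that walk. It is a sketch under assumptions: 16-byte blocks, an 8-block entry (taken from the 8way function names) and a single-block fallback as the second entry (assumed by analogy with the cast6 table earlier in this series, since the twofish table is cut off here). All names in it are hypothetical.

```c
#include <stddef.h>

#define MODEL_BLOCK_SIZE 16u	/* 128-bit cipher block, in bytes */

/* Hypothetical stand-in for one entry of a .funcs table. */
struct func_model {
	unsigned int num_blocks;
};

/*
 * Count how many calls each table entry would receive for nbytes of input,
 * trying the widest function first. Returns the bytes left unprocessed.
 */
static unsigned int dispatch_model(const struct func_model *funcs,
				   size_t num_funcs, unsigned int nbytes,
				   unsigned int *calls)
{
	for (size_t i = 0; i < num_funcs; i++) {
		unsigned int func_bytes = MODEL_BLOCK_SIZE * funcs[i].num_blocks;

		calls[i] = 0;
		while (nbytes >= func_bytes) {
			calls[i]++;
			nbytes -= func_bytes;
		}
	}
	return nbytes;
}
```

With a table of { 8 }, { 1 }, a 64-byte request never reaches the 8-way routine and is served by four single-block calls, whereas a table that also had a 3-block entry would use one 3-way call plus one single-block call. A 512-byte request takes four 8-way calls either way.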
[PATCH 4/5] crypto: x86/camellia-aesni-avx - add more optimized XTS code
Add more optimized XTS code for camellia-aesni-avx, for smaller stack usage and small boost for speed.

tcrypt results, with Intel i5-2450M:
        enc     dec
16B     1.10x   1.01x
64B     0.82x   0.77x
256B    1.14x   1.10x
1024B   1.17x   1.16x
8192B   1.10x   1.11x

Since XTS is practically always used with data blocks of size 512 bytes or more, I chose not to make use of camellia-2way for block sizes smaller than 256 bytes. This causes slower results in tcrypt for 64 bytes.

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/camellia-aesni-avx-asm_64.S |  180 +++
 arch/x86/crypto/camellia_aesni_avx_glue.c   |   91 --
 2 files changed, 229 insertions(+), 42 deletions(-)

diff --git a/arch/x86/crypto/camellia-aesni-avx-asm_64.S b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
index cfc1634..ce71f92 100644
--- a/arch/x86/crypto/camellia-aesni-avx-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
@@ -1,7 +1,7 @@
 /*
  * x86_64/AVX/AES-NI assembler implementation of Camellia
  *
- * Copyright © 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi
+ * Copyright © 2012-2013 Jussi Kivilinna jussi.kivili...@iki.fi
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -589,6 +589,10 @@ ENDPROC(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
 .Lbswap128_mask:
 	.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
 
+/* For XTS mode IV generation */
+.Lxts_gf128mul_and_shl1_mask:
+	.byte 0x87, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0
+
 /*
  * pre-SubByte transform
  *
@@ -1090,3 +1094,177 @@ ENTRY(camellia_ctr_16way)
 
 	ret;
 ENDPROC(camellia_ctr_16way)
+
+#define gf128mul_x_ble(iv, mask, tmp) \
+	vpsrad $31, iv, tmp; \
+	vpaddq iv, iv, iv; \
+	vpshufd $0x13, tmp, tmp; \
+	vpand mask, tmp, tmp; \
+	vpxor tmp, iv, iv;
+
+.align 8
+camellia_xts_crypt_16way:
+	/* input:
+	 *	%rdi: ctx, CTX
+	 *	%rsi: dst (16 blocks)
+	 *	%rdx: src (16 blocks)
+	 *	%rcx: iv (t ⊕ αⁿ ∈ GF(2¹²⁸))
+	 *	%r8: index for input whitening key
+* %r9: pointer to __camellia_enc_blk16 or __camellia_dec_blk16 +*/ + + subq $(16 * 16), %rsp; + movq %rsp, %rax; + + vmovdqa .Lxts_gf128mul_and_shl1_mask, %xmm14; + + /* load IV */ + vmovdqu (%rcx), %xmm0; + vpxor 0 * 16(%rdx), %xmm0, %xmm15; + vmovdqu %xmm15, 15 * 16(%rax); + vmovdqu %xmm0, 0 * 16(%rsi); + + /* construct IVs */ + gf128mul_x_ble(%xmm0, %xmm14, %xmm15); + vpxor 1 * 16(%rdx), %xmm0, %xmm15; + vmovdqu %xmm15, 14 * 16(%rax); + vmovdqu %xmm0, 1 * 16(%rsi); + + gf128mul_x_ble(%xmm0, %xmm14, %xmm15); + vpxor 2 * 16(%rdx), %xmm0, %xmm13; + vmovdqu %xmm0, 2 * 16(%rsi); + + gf128mul_x_ble(%xmm0, %xmm14, %xmm15); + vpxor 3 * 16(%rdx), %xmm0, %xmm12; + vmovdqu %xmm0, 3 * 16(%rsi); + + gf128mul_x_ble(%xmm0, %xmm14, %xmm15); + vpxor 4 * 16(%rdx), %xmm0, %xmm11; + vmovdqu %xmm0, 4 * 16(%rsi); + + gf128mul_x_ble(%xmm0, %xmm14, %xmm15); + vpxor 5 * 16(%rdx), %xmm0, %xmm10; + vmovdqu %xmm0, 5 * 16(%rsi); + + gf128mul_x_ble(%xmm0, %xmm14, %xmm15); + vpxor 6 * 16(%rdx), %xmm0, %xmm9; + vmovdqu %xmm0, 6 * 16(%rsi); + + gf128mul_x_ble(%xmm0, %xmm14, %xmm15); + vpxor 7 * 16(%rdx), %xmm0, %xmm8; + vmovdqu %xmm0, 7 * 16(%rsi); + + gf128mul_x_ble(%xmm0, %xmm14, %xmm15); + vpxor 8 * 16(%rdx), %xmm0, %xmm7; + vmovdqu %xmm0, 8 * 16(%rsi); + + gf128mul_x_ble(%xmm0, %xmm14, %xmm15); + vpxor 9 * 16(%rdx), %xmm0, %xmm6; + vmovdqu %xmm0, 9 * 16(%rsi); + + gf128mul_x_ble(%xmm0, %xmm14, %xmm15); + vpxor 10 * 16(%rdx), %xmm0, %xmm5; + vmovdqu %xmm0, 10 * 16(%rsi); + + gf128mul_x_ble(%xmm0, %xmm14, %xmm15); + vpxor 11 * 16(%rdx), %xmm0, %xmm4; + vmovdqu %xmm0, 11 * 16(%rsi); + + gf128mul_x_ble(%xmm0, %xmm14, %xmm15); + vpxor 12 * 16(%rdx), %xmm0, %xmm3; + vmovdqu %xmm0, 12 * 16(%rsi); + + gf128mul_x_ble(%xmm0, %xmm14, %xmm15); + vpxor 13 * 16(%rdx), %xmm0, %xmm2; + vmovdqu %xmm0, 13 * 16(%rsi); + + gf128mul_x_ble(%xmm0, %xmm14, %xmm15); + vpxor 14 * 16(%rdx), %xmm0, %xmm1; + vmovdqu %xmm0, 14 * 16(%rsi); + + gf128mul_x_ble(%xmm0, %xmm14, %xmm15); + vpxor 15 * 16(%rdx), %xmm0, %xmm15; 
+ vmovdqu %xmm15, 0 * 16(%rax); + vmovdqu %xmm0, 15 * 16(%rsi); + + gf128mul_x_ble(%xmm0, %xmm14, %xmm15); + vmovdqu %xmm0, (%rcx); + + /* inpack16_pre: */ + vmovq (key_table)(CTX, %r8, 8), %xmm15; + vpshufb .Lpack_bswap, %xmm15, %xmm15; + vpxor 0 * 16(%rax), %xmm15, %xmm0; + vpxor %xmm1, %xmm15, %xmm1; + vpxor %xmm2, %xmm15, %xmm2; + vpxor %xmm3
[PATCH 5/5] crypto: aesni_intel - add more optimized XTS mode for x86-64
Add more optimized XTS code for aesni_intel in 64-bit mode, for smaller stack usage and boost for speed. tcrypt results, with Intel i5-2450M: 256-bit key enc dec 16B 0.98x 0.99x 64B 0.64x 0.63x 256B1.29x 1.32x 1024B 1.54x 1.58x 8192B 1.57x 1.60x 512-bit key enc dec 16B 0.98x 0.99x 64B 0.60x 0.59x 256B1.24x 1.25x 1024B 1.39x 1.42x 8192B 1.38x 1.42x I chose not to optimize smaller than block size of 256 bytes, since XTS is practically always used with data blocks of size 512 bytes. This is why performance is reduced in tcrypt for 64 byte long blocks. Cc: Huang Ying ying.hu...@intel.com Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi --- arch/x86/crypto/aesni-intel_asm.S | 117 arch/x86/crypto/aesni-intel_glue.c | 80 + crypto/Kconfig |1 3 files changed, 198 insertions(+) diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S index 04b7977..62fe22c 100644 --- a/arch/x86/crypto/aesni-intel_asm.S +++ b/arch/x86/crypto/aesni-intel_asm.S @@ -34,6 +34,10 @@ #ifdef __x86_64__ .data +.align 16 +.Lgf128mul_x_ble_mask: + .octa 0x00010087 + POLY: .octa 0xC201 TWOONE: .octa 0x00010001 @@ -105,6 +109,8 @@ enc:.octa 0x2 #define CTR%xmm11 #define INC%xmm12 +#define GF128MUL_MASK %xmm10 + #ifdef __x86_64__ #define AREG %rax #define KEYP %rdi @@ -2636,4 +2642,115 @@ ENTRY(aesni_ctr_enc) .Lctr_enc_just_ret: ret ENDPROC(aesni_ctr_enc) + +/* + * _aesni_gf128mul_x_ble: internal ABI + * Multiply in GF(2^128) for XTS IVs + * input: + * IV: current IV + * GF128MUL_MASK == mask with 0x87 and 0x01 + * output: + * IV: next IV + * changed: + * CTR:== temporary value + */ +#define _aesni_gf128mul_x_ble() \ + pshufd $0x13, IV, CTR; \ + paddq IV, IV; \ + psrad $31, CTR; \ + pand GF128MUL_MASK, CTR; \ + pxor CTR, IV; + +/* + * void aesni_xts_crypt8(struct crypto_aes_ctx *ctx, const u8 *dst, u8 *src, + * bool enc, u8 *iv) + */ +ENTRY(aesni_xts_crypt8) + cmpb $0, %cl + movl $0, %ecx + movl $240, %r10d + leaq _aesni_enc4, %r11 + leaq _aesni_dec4, %rax + cmovel %r10d, 
%ecx + cmoveq %rax, %r11 + + movdqa .Lgf128mul_x_ble_mask, GF128MUL_MASK + movups (IVP), IV + + mov 480(KEYP), KLEN + addq %rcx, KEYP + + movdqa IV, STATE1 + pxor 0x00(INP), STATE1 + movdqu IV, 0x00(OUTP) + + _aesni_gf128mul_x_ble() + movdqa IV, STATE2 + pxor 0x10(INP), STATE2 + movdqu IV, 0x10(OUTP) + + _aesni_gf128mul_x_ble() + movdqa IV, STATE3 + pxor 0x20(INP), STATE3 + movdqu IV, 0x20(OUTP) + + _aesni_gf128mul_x_ble() + movdqa IV, STATE4 + pxor 0x30(INP), STATE4 + movdqu IV, 0x30(OUTP) + + call *%r11 + + pxor 0x00(OUTP), STATE1 + movdqu STATE1, 0x00(OUTP) + + _aesni_gf128mul_x_ble() + movdqa IV, STATE1 + pxor 0x40(INP), STATE1 + movdqu IV, 0x40(OUTP) + + pxor 0x10(OUTP), STATE2 + movdqu STATE2, 0x10(OUTP) + + _aesni_gf128mul_x_ble() + movdqa IV, STATE2 + pxor 0x50(INP), STATE2 + movdqu IV, 0x50(OUTP) + + pxor 0x20(OUTP), STATE3 + movdqu STATE3, 0x20(OUTP) + + _aesni_gf128mul_x_ble() + movdqa IV, STATE3 + pxor 0x60(INP), STATE3 + movdqu IV, 0x60(OUTP) + + pxor 0x30(OUTP), STATE4 + movdqu STATE4, 0x30(OUTP) + + _aesni_gf128mul_x_ble() + movdqa IV, STATE4 + pxor 0x70(INP), STATE4 + movdqu IV, 0x70(OUTP) + + _aesni_gf128mul_x_ble() + movups IV, (IVP) + + call *%r11 + + pxor 0x40(OUTP), STATE1 + movdqu STATE1, 0x40(OUTP) + + pxor 0x50(OUTP), STATE2 + movdqu STATE2, 0x50(OUTP) + + pxor 0x60(OUTP), STATE3 + movdqu STATE3, 0x60(OUTP) + + pxor 0x70(OUTP), STATE4 + movdqu STATE4, 0x70(OUTP) + + ret +ENDPROC(aesni_xts_crypt8) + #endif diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c index a0795da..f80e668 100644 --- a/arch/x86/crypto/aesni-intel_glue.c +++ b/arch/x86/crypto/aesni-intel_glue.c @@ -39,6 +39,9 @@ #include crypto/internal/aead.h #include linux/workqueue.h #include linux/spinlock.h +#ifdef CONFIG_X86_64 +#include asm/crypto/glue_helper.h +#endif #if defined(CONFIG_CRYPTO_PCBC) || defined(CONFIG_CRYPTO_PCBC_MODULE) #define HAS_PCBC @@ -102,6 +105,9 @@ void crypto_fpu_exit(void); asmlinkage void aesni_ctr_enc(struct 
crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len, u8 *iv); +asmlinkage void aesni_xts_crypt8(struct crypto_aes_ctx *ctx, u8 *out, +const u8
[PATCH 1/4] crypto: gcm - make GMAC work when dst and src are different
The GMAC code assumes that dst == src, which causes problems when trying to
add rfc4543(gcm(aes)) test vectors. Fix the code so that it works when the
source and destination buffers are different.

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/Kconfig |    1 +
 crypto/gcm.c   |   97 ++--
 2 files changed, 81 insertions(+), 17 deletions(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index a654b13..6cc27f1 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -198,6 +198,7 @@ config CRYPTO_GCM
 	select CRYPTO_CTR
 	select CRYPTO_AEAD
 	select CRYPTO_GHASH
+	select CRYPTO_NULL
 	help
 	  Support for Galois/Counter Mode (GCM) and Galois Message
 	  Authentication Code (GMAC). Required for IPSec.
diff --git a/crypto/gcm.c b/crypto/gcm.c
index 137ad1e..4ff2139 100644
--- a/crypto/gcm.c
+++ b/crypto/gcm.c
@@ -37,8 +37,14 @@ struct crypto_rfc4106_ctx {
 	u8 nonce[4];
 };
 
+struct crypto_rfc4543_instance_ctx {
+	struct crypto_aead_spawn aead;
+	struct crypto_skcipher_spawn null;
+};
+
 struct crypto_rfc4543_ctx {
 	struct crypto_aead *child;
+	struct crypto_blkcipher *null;
 	u8 nonce[4];
 };
 
@@ -1094,20 +1100,20 @@ static int crypto_rfc4543_setauthsize(struct crypto_aead *parent,
 }
 
 static struct aead_request *crypto_rfc4543_crypt(struct aead_request *req,
-						 int enc)
+						 bool enc)
 {
 	struct crypto_aead *aead = crypto_aead_reqtfm(req);
 	struct crypto_rfc4543_ctx *ctx = crypto_aead_ctx(aead);
 	struct crypto_rfc4543_req_ctx *rctx = crypto_rfc4543_reqctx(req);
 	struct aead_request *subreq = &rctx->subreq;
-	struct scatterlist *dst = req->dst;
+	struct scatterlist *src = req->src;
 	struct scatterlist *cipher = rctx->cipher;
 	struct scatterlist *payload = rctx->payload;
 	struct scatterlist *assoc = rctx->assoc;
 	unsigned int authsize = crypto_aead_authsize(aead);
 	unsigned int assoclen = req->assoclen;
-	struct page *dstp;
-	u8 *vdst;
+	struct page *srcp;
+	u8 *vsrc;
 	u8 *iv = PTR_ALIGN((u8 *)(rctx + 1) + crypto_aead_reqsize(ctx->child),
 			   crypto_aead_alignmask(ctx->child) + 1);
 
@@ -1118,19 +1124,19 @@ static struct aead_request *crypto_rfc4543_crypt(struct aead_request *req,
 	if (enc)
 		memset(rctx->auth_tag, 0, authsize);
 	else
-		scatterwalk_map_and_copy(rctx->auth_tag, dst,
+		scatterwalk_map_and_copy(rctx->auth_tag, src,
 					 req->cryptlen - authsize,
 					 authsize, 0);
 
 	sg_init_one(cipher, rctx->auth_tag, authsize);
 
 	/* construct the aad */
-	dstp = sg_page(dst);
-	vdst = PageHighMem(dstp) ? NULL : page_address(dstp) + dst->offset;
+	srcp = sg_page(src);
+	vsrc = PageHighMem(srcp) ? NULL : page_address(srcp) + src->offset;
 
 	sg_init_table(payload, 2);
 	sg_set_buf(payload, req->iv, 8);
-	scatterwalk_crypto_chain(payload, dst, vdst == req->iv + 8, 2);
+	scatterwalk_crypto_chain(payload, src, vsrc == req->iv + 8, 2);
 	assoclen += 8 + req->cryptlen - (enc ? 0 : authsize);
 
 	sg_init_table(assoc, 2);
@@ -1147,6 +1153,19 @@ static struct aead_request *crypto_rfc4543_crypt(struct aead_request *req,
 	return subreq;
 }
 
+static int crypto_rfc4543_copy_src_to_dst(struct aead_request *req, bool enc)
+{
+	struct crypto_aead *aead = crypto_aead_reqtfm(req);
+	struct crypto_rfc4543_ctx *ctx = crypto_aead_ctx(aead);
+	unsigned int authsize = crypto_aead_authsize(aead);
+	unsigned int nbytes = req->cryptlen - (enc ? 0 : authsize);
+	struct blkcipher_desc desc = {
+		.tfm = ctx->null,
+	};
+
+	return crypto_blkcipher_encrypt(&desc, req->dst, req->src, nbytes);
+}
+
 static int crypto_rfc4543_encrypt(struct aead_request *req)
 {
 	struct crypto_aead *aead = crypto_aead_reqtfm(req);
@@ -1154,7 +1173,13 @@ static int crypto_rfc4543_encrypt(struct aead_request *req)
 	struct aead_request *subreq;
 	int err;
 
-	subreq = crypto_rfc4543_crypt(req, 1);
+	if (req->src != req->dst) {
+		err = crypto_rfc4543_copy_src_to_dst(req, true);
+		if (err)
+			return err;
+	}
+
+	subreq = crypto_rfc4543_crypt(req, true);
 	err = crypto_aead_encrypt(subreq);
 	if (err)
 		return err;
@@ -1167,7 +1192,15 @@ static int crypto_rfc4543_encrypt(struct aead_request *req)
 
 static int crypto_rfc4543_decrypt(struct aead_request *req)
 {
-	req = crypto_rfc4543_crypt(req, 0);
+	int err;
+
+	if (req->src != req->dst) {
+		err = crypto_rfc4543_copy_src_to_dst(req, false);
+		if (err
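The idea behind the copy step in this patch can be sketched in plain userspace C. This is only a minimal model under stated assumptions: flat byte buffers stand in for the kernel's scatterlists, memcpy stands in for the null cipher, and the names gmac_payload_len and gmac_copy_src_to_dst are hypothetical, not kernel functions. GMAC only authenticates, so the payload leaves unchanged; when the caller supplies a separate destination, the payload has to be copied there before the tag is computed over it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: number of payload bytes to copy.
 * On decryption, cryptlen includes the trailing authentication tag,
 * which is not part of the payload.
 */
static size_t gmac_payload_len(size_t cryptlen, size_t authsize, int enc)
{
	assert(enc || cryptlen >= authsize);
	return enc ? cryptlen : cryptlen - authsize;
}

/*
 * Hypothetical model of the copy step: do nothing for in-place
 * requests, otherwise copy the payload from src to dst.
 * The buffers are assumed not to overlap when they differ.
 */
static void gmac_copy_src_to_dst(unsigned char *dst, const unsigned char *src,
				 size_t cryptlen, size_t authsize, int enc)
{
	if (dst == src)
		return;	/* in-place operation: nothing to copy */
	memcpy(dst, src, gmac_payload_len(cryptlen, authsize, enc));
}
```

The kernel code cannot simply call memcpy because both buffers are scatterlists that may span several pages, which is presumably why the patch routes the copy through the null cipher instead.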
[PATCH 2/4] crypto: gcm - fix rfc4543 to handle async crypto correctly
If the gcm cipher used by rfc4543 does not complete the request immediately,
the authentication tag is not copied to the destination buffer. This patch
adds the correct async logic for this case.

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/gcm.c |   19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/crypto/gcm.c b/crypto/gcm.c
index 4ff2139..b0d3cb1 100644
--- a/crypto/gcm.c
+++ b/crypto/gcm.c
@@ -1099,6 +1099,21 @@ static int crypto_rfc4543_setauthsize(struct crypto_aead *parent,
 	return crypto_aead_setauthsize(ctx->child, authsize);
 }
 
+static void crypto_rfc4543_done(struct crypto_async_request *areq, int err)
+{
+	struct aead_request *req = areq->data;
+	struct crypto_aead *aead = crypto_aead_reqtfm(req);
+	struct crypto_rfc4543_req_ctx *rctx = crypto_rfc4543_reqctx(req);
+
+	if (!err) {
+		scatterwalk_map_and_copy(rctx->auth_tag, req->dst,
+					 req->cryptlen,
+					 crypto_aead_authsize(aead), 1);
+	}
+
+	aead_request_complete(req, err);
+}
+
 static struct aead_request *crypto_rfc4543_crypt(struct aead_request *req,
 						 bool enc)
 {
@@ -1145,8 +1160,8 @@ static struct aead_request *crypto_rfc4543_crypt(struct aead_request *req,
 	scatterwalk_crypto_chain(assoc, payload, 0, 2);
 
 	aead_request_set_tfm(subreq, ctx->child);
-	aead_request_set_callback(subreq, req->base.flags, req->base.complete,
-				  req->base.data);
+	aead_request_set_callback(subreq, req->base.flags, crypto_rfc4543_done,
+				  req);
 	aead_request_set_crypt(subreq, cipher, cipher, enc ? 0 : authsize, iv);
 	aead_request_set_assoc(subreq, assoc, assoclen);
 
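The callback wrapping used here can be illustrated with a small userspace sketch. Everything in it is an assumption made for illustration: struct request, complete_fn, wrapper_done and caller_done are hypothetical names, and a flat buffer replaces the scatterlist. The point it shows is that the inner request completes into a wrapper, which appends the tag on success and only then notifies the original caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TAG_LEN 16

struct request;
typedef void (*complete_fn)(struct request *req, int err);

/* Hypothetical stand-in for an AEAD request and its per-request context. */
struct request {
	unsigned char *dst;		/* output: payload followed by the tag */
	size_t cryptlen;		/* payload length in bytes */
	unsigned char tag[TAG_LEN];	/* tag produced by the inner cipher */
	complete_fn complete;		/* the caller's completion callback */
	int result;			/* recorded by caller_done in this sketch */
};

/*
 * Wrapper completion, installed on the inner request instead of the
 * caller's callback. On success it appends the tag to the destination
 * and then forwards the completion to the caller.
 */
static void wrapper_done(struct request *req, int err)
{
	assert(req->complete != NULL);
	if (!err)
		memcpy(req->dst + req->cryptlen, req->tag, TAG_LEN);
	req->complete(req, err);
}

/* Example caller callback: just records the status. */
static void caller_done(struct request *req, int err)
{
	req->result = err;
}
```

If the caller's callback were installed directly on the inner request, as before the patch, the asynchronous path would complete without the tag ever reaching the destination buffer.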
[PATCH 3/4] crypto: testmgr - add AES GMAC test vectors
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/tcrypt.c  |    4 ++
 crypto/testmgr.c |   17 +-
 crypto/testmgr.h |   89 ++
 3 files changed, 108 insertions(+), 2 deletions(-)

diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index 87ef7d6..6b911ef 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -1225,6 +1225,10 @@ static int do_test(int m)
 		ret += tcrypt_test("rfc4106(gcm(aes))");
 		break;
 
+	case 152:
+		ret += tcrypt_test("rfc4543(gcm(aes))");
+		break;
+
 	case 200:
 		test_cipher_speed("ecb(aes)", ENCRYPT, sec, NULL, 0,
 				speed_template_16_24_32);
diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index efd8b20..442ddb4 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -2696,8 +2696,6 @@ static const struct alg_test_desc alg_test_descs[] = {
 			}
 		}
 	}, {
-
-
 		.alg = "rfc4309(ccm(aes))",
 		.test = alg_test_aead,
 		.fips_allowed = 1,
@@ -2714,6 +2712,21 @@ static const struct alg_test_desc alg_test_descs[] = {
 			}
 		}
 	}, {
+		.alg = "rfc4543(gcm(aes))",
+		.test = alg_test_aead,
+		.suite = {
+			.aead = {
+				.enc = {
+					.vecs = aes_gcm_rfc4543_enc_tv_template,
+					.count = AES_GCM_4543_ENC_TEST_VECTORS
+				},
+				.dec = {
+					.vecs = aes_gcm_rfc4543_dec_tv_template,
+					.count = AES_GCM_4543_DEC_TEST_VECTORS
+				},
+			}
+		}
+	}, {
 		.alg = "rmd128",
 		.test = alg_test_hash,
 		.suite = {
diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index b5721e0..92db37d 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -12680,6 +12680,8 @@ static struct cipher_testvec cast6_xts_dec_tv_template[] = {
 #define AES_GCM_DEC_TEST_VECTORS 8
 #define AES_GCM_4106_ENC_TEST_VECTORS 7
 #define AES_GCM_4106_DEC_TEST_VECTORS 7
+#define AES_GCM_4543_ENC_TEST_VECTORS 1
+#define AES_GCM_4543_DEC_TEST_VECTORS 2
 #define AES_CCM_ENC_TEST_VECTORS 7
 #define AES_CCM_DEC_TEST_VECTORS 7
 #define AES_CCM_4309_ENC_TEST_VECTORS 7
@@ -18193,6 +18195,93 @@ static struct aead_testvec aes_gcm_rfc4106_dec_tv_template[] = {
 	}
 };
 
+static struct aead_testvec aes_gcm_rfc4543_enc_tv_template[] = {
+	{ /* From draft-mcgrew-gcm-test-01 */
+		.key	= "\x4c\x80\xcd\xef\xbb\x5d\x10\xda"
+			  "\x90\x6a\xc7\x3c\x36\x13\xa6\x34"
+			  "\x22\x43\x3c\x64",
+		.klen	= 20,
+		.iv	= zeroed_string,
+		.assoc	= "\x00\x00\x43\x21\x00\x00\x00\x07",
+		.alen	= 8,
+		.input	= "\x45\x00\x00\x30\xda\x3a\x00\x00"
+			  "\x80\x01\xdf\x3b\xc0\xa8\x00\x05"
+			  "\xc0\xa8\x00\x01\x08\x00\xc6\xcd"
+			  "\x02\x00\x07\x00\x61\x62\x63\x64"
+			  "\x65\x66\x67\x68\x69\x6a\x6b\x6c"
+			  "\x6d\x6e\x6f\x70\x71\x72\x73\x74"
+			  "\x01\x02\x02\x01",
+		.ilen	= 52,
+		.result	= "\x45\x00\x00\x30\xda\x3a\x00\x00"
+			  "\x80\x01\xdf\x3b\xc0\xa8\x00\x05"
+			  "\xc0\xa8\x00\x01\x08\x00\xc6\xcd"
+			  "\x02\x00\x07\x00\x61\x62\x63\x64"
+			  "\x65\x66\x67\x68\x69\x6a\x6b\x6c"
+			  "\x6d\x6e\x6f\x70\x71\x72\x73\x74"
+			  "\x01\x02\x02\x01\xf2\xa9\xa8\x36"
+			  "\xe1\x55\x10\x6a\xa8\xdc\xd6\x18"
+			  "\xe4\x09\x9a\xaa",
+		.rlen	= 68,
+	}
+};
+
+static struct aead_testvec aes_gcm_rfc4543_dec_tv_template[] = {
+	{ /* From draft-mcgrew-gcm-test-01 */
+		.key	= "\x4c\x80\xcd\xef\xbb\x5d\x10\xda"
+			  "\x90\x6a\xc7\x3c\x36\x13\xa6\x34"
+			  "\x22\x43\x3c\x64",
+		.klen	= 20,
+		.iv	= zeroed_string,
+		.assoc	= "\x00\x00\x43\x21\x00\x00\x00\x07",
+		.alen	= 8,
+		.input	= "\x45\x00\x00\x30\xda\x3a\x00\x00"
+			  "\x80\x01\xdf\x3b\xc0\xa8\x00\x05"
+			  "\xc0\xa8\x00\x01\x08\x00\xc6\xcd"
+			  "\x02\x00\x07\x00\x61\x62\x63\x64"
+			  "\x65\x66\x67\x68\x69\x6a\x6b\x6c"
+			  "\x6d\x6e\x6f\x70\x71\x72\x73\x74"
+			  "\x01\x02\x02\x01\xf2\xa9\xa8\x36"
+			  "\xe1\x55\x10\x6a\xa8\xdc\xd6\x18
[PATCH 4/4] crypto: testmgr - add empty test vectors for null ciphers
Without these, the kernel log shows:

[5.984881] alg: No test for cipher_null (cipher_null-generic)
[5.985096] alg: No test for ecb(cipher_null) (ecb-cipher_null)
[5.985170] alg: No test for compress_null (compress_null-generic)
[5.985297] alg: No test for digest_null (digest_null-generic)

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/testmgr.c |    9 +
 1 file changed, 9 insertions(+)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 442ddb4..f37e544 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -1913,6 +1913,9 @@ static const struct alg_test_desc alg_test_descs[] = {
 			}
 		}
 	}, {
+		.alg = "compress_null",
+		.test = alg_test_null,
+	}, {
 		.alg = "crc32c",
 		.test = alg_test_crc32c,
 		.fips_allowed = 1,
@@ -2127,6 +2130,9 @@ static const struct alg_test_desc alg_test_descs[] = {
 			}
 		}
 	}, {
+		.alg = "digest_null",
+		.test = alg_test_null,
+	}, {
 		.alg = "ecb(__aes-aesni)",
 		.test = alg_test_null,
 		.fips_allowed = 1,
@@ -2237,6 +2243,9 @@ static const struct alg_test_desc alg_test_descs[] = {
 			}
 		}
 	}, {
+		.alg = "ecb(cipher_null)",
+		.test = alg_test_null,
+	}, {
 		.alg = "ecb(des)",
 		.test = alg_test_skcipher,
 		.fips_allowed = 1,
[PATCH] crypto: gcm - fix assumption that assoc has one segment
The rfc4543(gcm(*)) code for GMAC assumes that the assoc scatterlist always
contains only one segment and only makes use of this first segment. However,
ipsec passes assoc with three segments when using 'extended sequence number',
so in this case rfc4543(gcm(*)) fails to function correctly. This patch fixes
the issue.

Reported-by: Chaoxing Lin chaoxing@ultra-3eti.com
Tested-by: Chaoxing Lin chaoxing@ultra-3eti.com
Cc: sta...@vger.kernel.org
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/gcm.c |   17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/crypto/gcm.c b/crypto/gcm.c
index 137ad1e..13ccbda 100644
--- a/crypto/gcm.c
+++ b/crypto/gcm.c
@@ -44,6 +44,7 @@ struct crypto_rfc4543_ctx {
 
 struct crypto_rfc4543_req_ctx {
 	u8 auth_tag[16];
+	u8 assocbuf[32];
 	struct scatterlist cipher[1];
 	struct scatterlist payload[2];
 	struct scatterlist assoc[2];
@@ -1133,9 +1134,19 @@ static struct aead_request *crypto_rfc4543_crypt(struct aead_request *req,
 	scatterwalk_crypto_chain(payload, dst, vdst == req->iv + 8, 2);
 	assoclen += 8 + req->cryptlen - (enc ? 0 : authsize);
 
-	sg_init_table(assoc, 2);
-	sg_set_page(assoc, sg_page(req->assoc), req->assoc->length,
-		    req->assoc->offset);
+	if (req->assoc->length == req->assoclen) {
+		sg_init_table(assoc, 2);
+		sg_set_page(assoc, sg_page(req->assoc), req->assoc->length,
+			    req->assoc->offset);
+	} else {
+		BUG_ON(req->assoclen > sizeof(rctx->assocbuf));
+
+		scatterwalk_map_and_copy(rctx->assocbuf, req->assoc, 0,
+					 req->assoclen, 0);
+
+		sg_init_table(assoc, 2);
+		sg_set_buf(assoc, rctx->assocbuf, req->assoclen);
+	}
 	scatterwalk_crypto_chain(assoc, payload, 0, 2);
 
 	aead_request_set_tfm(subreq, ctx->child);
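The fallback path of this fix, linearizing a multi-segment assoc into one small buffer, can be sketched in userspace C as follows. The names struct segment and gather_segments are hypothetical, and plain pointers replace scatterlist entries; the sketch returns an error where the patch uses BUG_ON.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one scatterlist entry. */
struct segment {
	const unsigned char *data;
	size_t len;
};

/*
 * Copy all segments back to back into buf.
 * Returns the total number of bytes copied, or -1 if they do not fit
 * (the patch treats that case as a bug and uses BUG_ON instead).
 */
static long gather_segments(unsigned char *buf, size_t buflen,
			    const struct segment *segs, size_t nsegs)
{
	size_t off = 0;

	assert(buf != NULL && segs != NULL);
	for (size_t i = 0; i < nsegs; i++) {
		if (segs[i].len > buflen - off)
			return -1;	/* would overflow the bounce buffer */
		memcpy(buf + off, segs[i].data, segs[i].len);
		off += segs[i].len;
	}
	return (long)off;
}
```

With ESN the three segments reported in the thread (SPI, SN-hi, SN-low) would, assuming 4 bytes each, add up to 12 bytes, which fits comfortably in the 32-byte assocbuf added by the patch.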
Re: potential bug in GMAC implementation. not work in ESN mode
On 25.03.2013 18:12, Chaoxing Lin wrote:
> 2nd ping
>
> Nobody is maintaining crypto/gcm.c?
>
> -----Original Message-----
> From: Chaoxing Lin
> Sent: Friday, March 08, 2013 11:38 AM
> To: 'linux-crypto@vger.kernel.org'
> Subject: potential bug in GMAC implementation. not work in ESN mode
>
> I was testing ipsec with GMAC and found that the rfc4543 GMAC
> implementation in kernel software crypto works in
> esp=aes256gmac-noesn! mode.
> It does not work in esp=aes256gmac-esn! mode. The tunnel was
> established but no data traffic is possible.
>
> Looking at source code, I found this piece of code is suspicious.
> Line 1146~1147 tries to put req->assoc to assoc[1]. But I think this
> way only works when req->assoc has only one segment. In ESN mode,
> req->assoc contains 3 segments (SPI, SN-hi, SN-low). Line 1146~1147
> will only attach SPI segment (with total length) in assoc.
>
> Please let me know whether I understand it right.

Your analysis seems correct. Does the attached patch fix the problem?
(I've only compile tested it.)

-Jussi

> Thanks,
>
> Chaoxing
>
> Source from kernel 3.8.2
> path: root/crypto/gcm.c
>
> 1136:	/* construct the aad */
> 1137:	dstp = sg_page(dst);
> 	vdst = PageHighMem(dstp) ? NULL : page_address(dstp) + dst->offset;
>
> 	sg_init_table(payload, 2);
> 	sg_set_buf(payload, req->iv, 8);
> 	scatterwalk_crypto_chain(payload, dst, vdst == req->iv + 8, 2);
> 	assoclen += 8 + req->cryptlen - (enc ? 0 : authsize);
>
> 	sg_init_table(assoc, 2);
> 1146:	sg_set_page(assoc, sg_page(req->assoc), req->assoc->length,
> 1147:		    req->assoc->offset);
> 	scatterwalk_crypto_chain(assoc, payload, 0, 2);
>
> 	aead_request_set_tfm(subreq, ctx->child);
> 	aead_request_set_callback(subreq, req->base.flags, req->base.complete,
> 				  req->base.data);
> 	aead_request_set_crypt(subreq, cipher, cipher, enc ? 0 : authsize, iv);
> 1154:	aead_request_set_assoc(subreq, assoc, assoclen);

crypto: gcm - fix assumption that assoc has one segment

From: Jussi Kivilinna jussi.kivili...@iki.fi

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/gcm.c    |   17 ++---
 crypto/tcrypt.c |    4 
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/crypto/gcm.c b/crypto/gcm.c
index 137ad1e..13ccbda 100644
--- a/crypto/gcm.c
+++ b/crypto/gcm.c
@@ -44,6 +44,7 @@ struct crypto_rfc4543_ctx {
 
 struct crypto_rfc4543_req_ctx {
 	u8 auth_tag[16];
+	u8 assocbuf[32];
 	struct scatterlist cipher[1];
 	struct scatterlist payload[2];
 	struct scatterlist assoc[2];
@@ -1133,9 +1134,19 @@ static struct aead_request *crypto_rfc4543_crypt(struct aead_request *req,
 	scatterwalk_crypto_chain(payload, dst, vdst == req->iv + 8, 2);
 	assoclen += 8 + req->cryptlen - (enc ? 0 : authsize);
 
-	sg_init_table(assoc, 2);
-	sg_set_page(assoc, sg_page(req->assoc), req->assoc->length,
-		    req->assoc->offset);
+	if (req->assoc->length == req->assoclen) {
+		sg_init_table(assoc, 2);
+		sg_set_page(assoc, sg_page(req->assoc), req->assoc->length,
+			    req->assoc->offset);
+	} else {
+		BUG_ON(req->assoclen > sizeof(rctx->assocbuf));
+
+		scatterwalk_map_and_copy(rctx->assocbuf, req->assoc, 0,
+					 req->assoclen, 0);
+
+		sg_init_table(assoc, 2);
+		sg_set_buf(assoc, rctx->assocbuf, req->assoclen);
+	}
 	scatterwalk_crypto_chain(assoc, payload, 0, 2);
 
 	aead_request_set_tfm(subreq, ctx->child);
diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index 87ef7d6..6b911ef 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -1225,6 +1225,10 @@ static int do_test(int m)
 		ret += tcrypt_test("rfc4106(gcm(aes))");
 		break;
 
+	case 152:
+		ret += tcrypt_test("rfc4543(gcm(aes))");
+		break;
+
 	case 200:
 		test_cipher_speed("ecb(aes)", ENCRYPT, sec, NULL, 0,
 				speed_template_16_24_32);
[PATCH 2/2] crypto: cast_common - change email address for Jussi Kivilinna
Change my email address from @mbnet.fi to @iki.fi in crypto/*

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/cast_common.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/cast_common.c b/crypto/cast_common.c
index a15f523..8924925 100644
--- a/crypto/cast_common.c
+++ b/crypto/cast_common.c
@@ -3,7 +3,7 @@
  *
  * Copyright © 1998, 1999, 2000, 2001 Free Software Foundation, Inc.
  * Copyright © 2003 Kartikey Mahendra Bhatt kartik...@hotmail.com
- * Copyright © 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi
+ * Copyright © 2012 Jussi Kivilinna jussi.kivili...@iki.fi
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of GNU General Public License as published by the Free
[PATCH 1/2] crypto: x86 - change email address for Jussi Kivilinna
Change my email address from @mbnet.fi to @iki.fi in arch/x86/crypto/*. Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi --- arch/x86/crypto/ablk_helper.c|2 +- arch/x86/crypto/blowfish-x86_64-asm_64.S |2 +- arch/x86/crypto/blowfish_glue.c |2 +- arch/x86/crypto/camellia-aesni-avx-asm_64.S |2 +- arch/x86/crypto/camellia-x86_64-asm_64.S |2 +- arch/x86/crypto/camellia_aesni_avx_glue.c|2 +- arch/x86/crypto/camellia_glue.c |2 +- arch/x86/crypto/cast5-avx-x86_64-asm_64.S|2 +- arch/x86/crypto/cast6-avx-x86_64-asm_64.S|2 +- arch/x86/crypto/glue_helper-asm-avx.S|2 +- arch/x86/crypto/glue_helper.c|2 +- arch/x86/crypto/serpent-avx-x86_64-asm_64.S |2 +- arch/x86/crypto/serpent-sse2-i586-asm_32.S |2 +- arch/x86/crypto/serpent-sse2-x86_64-asm_64.S |2 +- arch/x86/crypto/serpent_avx_glue.c |2 +- arch/x86/crypto/serpent_sse2_glue.c |2 +- arch/x86/crypto/twofish-avx-x86_64-asm_64.S |2 +- arch/x86/crypto/twofish-x86_64-asm_64-3way.S |2 +- arch/x86/crypto/twofish_glue_3way.c |2 +- 19 files changed, 19 insertions(+), 19 deletions(-) diff --git a/arch/x86/crypto/ablk_helper.c b/arch/x86/crypto/ablk_helper.c index 43282fe..08d4186 100644 --- a/arch/x86/crypto/ablk_helper.c +++ b/arch/x86/crypto/ablk_helper.c @@ -1,7 +1,7 @@ /* * Shared async block cipher helpers * - * Copyright (c) 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi + * Copyright (c) 2012 Jussi Kivilinna jussi.kivili...@iki.fi * * Based on aesni-intel_glue.c by: * Copyright (C) 2008, Intel Corp. 
diff --git a/arch/x86/crypto/blowfish-x86_64-asm_64.S b/arch/x86/crypto/blowfish-x86_64-asm_64.S index 246c670..4e97088 100644 --- a/arch/x86/crypto/blowfish-x86_64-asm_64.S +++ b/arch/x86/crypto/blowfish-x86_64-asm_64.S @@ -1,7 +1,7 @@ /* * Blowfish Cipher Algorithm (x86_64) * - * Copyright (C) 2011 Jussi Kivilinna jussi.kivili...@mbnet.fi + * Copyright (C) 2011 Jussi Kivilinna jussi.kivili...@iki.fi * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by diff --git a/arch/x86/crypto/blowfish_glue.c b/arch/x86/crypto/blowfish_glue.c index 50ec333..eb1e2b5 100644 --- a/arch/x86/crypto/blowfish_glue.c +++ b/arch/x86/crypto/blowfish_glue.c @@ -1,7 +1,7 @@ /* * Glue Code for assembler optimized version of Blowfish * - * Copyright (c) 2011 Jussi Kivilinna jussi.kivili...@mbnet.fi + * Copyright (c) 2011 Jussi Kivilinna jussi.kivili...@iki.fi * * CBC ECB parts based on code (crypto/cbc.c,ecb.c) by: * Copyright (c) 2006 Herbert Xu herb...@gondor.apana.org.au diff --git a/arch/x86/crypto/camellia-aesni-avx-asm_64.S b/arch/x86/crypto/camellia-aesni-avx-asm_64.S index cfc1634..879a736 100644 --- a/arch/x86/crypto/camellia-aesni-avx-asm_64.S +++ b/arch/x86/crypto/camellia-aesni-avx-asm_64.S @@ -1,7 +1,7 @@ /* * x86_64/AVX/AES-NI assembler implementation of Camellia * - * Copyright © 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi + * Copyright © 2012 Jussi Kivilinna jussi.kivili...@iki.fi * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by diff --git a/arch/x86/crypto/camellia-x86_64-asm_64.S b/arch/x86/crypto/camellia-x86_64-asm_64.S index 310319c..f2b52f9 100644 --- a/arch/x86/crypto/camellia-x86_64-asm_64.S +++ b/arch/x86/crypto/camellia-x86_64-asm_64.S @@ -1,7 +1,7 @@ /* * Camellia Cipher Algorithm (x86_64) * - * Copyright (C) 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi + * Copyright (C) 2012 
Jussi Kivilinna jussi.kivili...@iki.fi * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by diff --git a/arch/x86/crypto/camellia_aesni_avx_glue.c b/arch/x86/crypto/camellia_aesni_avx_glue.c index 96cbb60..321e9f4 100644 --- a/arch/x86/crypto/camellia_aesni_avx_glue.c +++ b/arch/x86/crypto/camellia_aesni_avx_glue.c @@ -1,7 +1,7 @@ /* * Glue Code for x86_64/AVX/AES-NI assembler optimized version of Camellia * - * Copyright © 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi + * Copyright © 2012 Jussi Kivilinna jussi.kivili...@iki.fi * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by diff --git a/arch/x86/crypto/camellia_glue.c b/arch/x86/crypto/camellia_glue.c index 5cb86cc..3de9391 100644 --- a/arch/x86/crypto/camellia_glue.c +++ b/arch/x86/crypto/camellia_glue.c @@ -1,7 +1,7 @@ /* * Glue Code for assembler optimized version of Camellia * - * Copyright (c) 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi + * Copyright (c) 2012 Jussi Kivilinna jussi.kivili...@iki.fi * * Camellia parts based on code by: * Copyright (C) 2006 NTT (Nippon Telegraph
Re: [PATCH] CMAC support for CryptoAPI, fixed patch issues, indent, and testmgr build issues
Quoting YOSHIFUJI Hideaki yoshf...@linux-ipv6.org:

> YOSHIFUJI Hideaki wrote:
>> Jussi Kivilinna wrote:
>>>>>> diff --git a/include/uapi/linux/pfkeyv2.h b/include/uapi/linux/pfkeyv2.h
>>>>>> index 0b80c80..d61898e 100644
>>>>>> --- a/include/uapi/linux/pfkeyv2.h
>>>>>> +++ b/include/uapi/linux/pfkeyv2.h
>>>>>> @@ -296,6 +296,7 @@ struct sadb_x_kmaddress {
>>>>>>  #define SADB_X_AALG_SHA2_512HMAC	7
>>>>>>  #define SADB_X_AALG_RIPEMD160HMAC	8
>>>>>>  #define SADB_X_AALG_AES_XCBC_MAC	9
>>>>>> +#define SADB_X_AALG_AES_CMAC_MAC	10
>>>>>>  #define SADB_X_AALG_NULL		251	/* kame */
>>>>>>  #define SADB_AALG_MAX			251
>>>>>
>>>>> Should these values be based on IANA assigned IPSEC AH transform
>>>>> identifiers?
>>>>>
>>>>> https://www.iana.org/assignments/isakmp-registry/isakmp-registry.xml#isakmp-registry-6
>>>>
>>>> There is no CMAC entry apparently ... despite the fact that CMAC
>>>> is a proposed RFC standard for IPsec.
>>>>
>>>> It might be safer to move that to 14 since it's currently
>>>> unassigned and then go through whatever channels are required to
>>>> allocate it. Mostly this affects key setting. So this means my
>>>> patch would break AH_RSA setkey calls (which the kernel doesn't
>>>> support anyways).
>>>
>>> Problem seems to be that PFKEYv2 does not quite work with IKEv2,
>>> and XFRM API should be used instead. There are new numbers assigned
>>> for IKEv2:
>>> https://www.iana.org/assignments/ikev2-parameters/ikev2-parameters.xml#ikev2-parameters-7
>>>
>>> For new SADB_X_AALG_*, I'd think you should use a value from the
>>> "Reserved for private use" range. Maybe 250?
>>
>> We can choose any value unless we do not break existing
>> binaries. When IKE is used, the daemon is responsible
>> for translation.
>
> I meant, we can choose any values if we do not break ...

Ok, so giving '10' to AES-CMAC is fine after all?

And if I'd want to add Camellia-CTR and Camellia-CCM support, I can choose
the next free numbers from SADB_X_EALG_*?

-Jussi
Re: [PATCH] CMAC support for CryptoAPI, fixed patch issues, indent, and testmgr build issues
Quoting Steffen Klassert steffen.klass...@secunet.com:

> On Wed, Jan 23, 2013 at 05:35:10PM +0200, Jussi Kivilinna wrote:
>>
>> Problem seems to be that PFKEYv2 does not quite work with IKEv2, and
>> XFRM API should be used instead. There are new numbers assigned for
>> IKEv2:
>> https://www.iana.org/assignments/ikev2-parameters/ikev2-parameters.xml#ikev2-parameters-7
>>
>> For new SADB_X_AALG_*, I'd think you should use a value from the
>> "Reserved for private use" range. Maybe 250?
>
> This would be an option, but we have just a few slots for private
> algorithms.
>
>>
>> But maybe a better solution might be to not make AES-CMAC (or other
>> new algorithms) available through the PFKEY API at all, just XFRM?
>>
>
> It is probably best to make new algorithms unavailable for pfkey
> as long as they have no official ikev1 iana transform identifier.
>
> But how to do that? Perhaps we can assign SADB_X_AALG_NOPFKEY to
> the private value 255 and return -EINVAL if pfkey tries to register
> such an algorithm. The netlink interface does not use these
> identifiers, everything should work as expected. So it should be
> possible to use these algorithms with iproute2 and most modern
> ike daemons.

Maybe it would be cleaner to not mess with pfkeyv2.h at all, but instead mark
algorithms that do not support pfkey with a flag. See patch below.

Then I started looking up whether sadb_alg_id is being used somewhere outside
pfkey. It seems that its value is just being copied around, but at
http://lxr.linux.no/linux+v3.7/net/xfrm/xfrm_policy.c#L1991 it's used as a
bit-index. So do values larger than 31 break some stuff? Can multiple
algorithms have the same sadb_alg_id value? Also in af_key.c, sadb_alg_id is
being used as a bit-index.

-Jussi

---
ONLY COMPILE TESTED!
---
 include/net/xfrm.h   |    5 +++--
 net/key/af_key.c     |   39 +++
 net/xfrm/xfrm_algo.c |   12 ++--
 3 files changed, 40 insertions(+), 16 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 421f764..5d5eec2 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1320,6 +1320,7 @@ struct xfrm_algo_desc {
 	char *name;
 	char *compat;
 	u8 available:1;
+	u8 sadb_disabled:1;
 	union {
 		struct xfrm_algo_aead_info aead;
 		struct xfrm_algo_auth_info auth;
@@ -1561,8 +1562,8 @@ extern void xfrm_input_init(void);
 extern int xfrm_parse_spi(struct sk_buff *skb, u8 nexthdr, __be32 *spi, __be32 *seq);
 
 extern void xfrm_probe_algs(void);
-extern int xfrm_count_auth_supported(void);
-extern int xfrm_count_enc_supported(void);
+extern int xfrm_count_sadb_auth_supported(void);
+extern int xfrm_count_sadb_enc_supported(void);
 extern struct xfrm_algo_desc *xfrm_aalg_get_byidx(unsigned int idx);
 extern struct xfrm_algo_desc *xfrm_ealg_get_byidx(unsigned int idx);
 extern struct xfrm_algo_desc *xfrm_aalg_get_byid(int alg_id);
diff --git a/net/key/af_key.c b/net/key/af_key.c
index 5b426a6..307cf1d 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -816,18 +816,21 @@ static struct sk_buff *__pfkey_xfrm_state2msg(const struct xfrm_state *x,
 	sa->sadb_sa_auth = 0;
 	if (x->aalg) {
 		struct xfrm_algo_desc *a = xfrm_aalg_get_byname(x->aalg->alg_name, 0);
-		sa->sadb_sa_auth = a ? a->desc.sadb_alg_id : 0;
+		sa->sadb_sa_auth = (a && !a->sadb_disabled) ?
+					a->desc.sadb_alg_id : 0;
 	}
 	sa->sadb_sa_encrypt = 0;
 	BUG_ON(x->ealg && x->calg);
 	if (x->ealg) {
 		struct xfrm_algo_desc *a = xfrm_ealg_get_byname(x->ealg->alg_name, 0);
-		sa->sadb_sa_encrypt = a ? a->desc.sadb_alg_id : 0;
+		sa->sadb_sa_encrypt = (a && !a->sadb_disabled) ?
+					a->desc.sadb_alg_id : 0;
 	}
 	/* KAME compatible: sadb_sa_encrypt is overloaded with calg id */
 	if (x->calg) {
 		struct xfrm_algo_desc *a = xfrm_calg_get_byname(x->calg->alg_name, 0);
-		sa->sadb_sa_encrypt = a ? a->desc.sadb_alg_id : 0;
+		sa->sadb_sa_encrypt = (a && !a->sadb_disabled) ?
+					a->desc.sadb_alg_id : 0;
 	}
 	sa->sadb_sa_flags = 0;
 
@@ -1138,7 +1141,7 @@ static struct xfrm_state * pfkey_msg2xfrm_state(struct net *net,
 	if (sa->sadb_sa_auth) {
 		int keysize = 0;
 		struct xfrm_algo_desc *a = xfrm_aalg_get_byid(sa->sadb_sa_auth);
-		if (!a) {
+		if (!a || a->sadb_disabled) {
 			err = -ENOSYS;
 			goto out;
 		}
@@ -1160,7 +1163,7 @@ static struct xfrm_state * pfkey_msg2xfrm_state(struct net *net,
 	if (sa->sadb_sa_encrypt) {
 		if (hdr->sadb_msg_satype == SADB_X_SATYPE_IPCOMP) {
 			struct xfrm_algo_desc *a = xfrm_calg_get_byid(sa->sadb_sa_encrypt);
-			if (!a) {
+			if (!a || a->sadb_disabled
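The bit-index concern raised in this message can be made concrete with a short sketch. It assumes, as the question about values larger than 31 implies, that the identifier indexes a 32-bit mask; the helper names alg_id_fits_mask, mask_set and mask_test are hypothetical and do not come from the kernel.

```c
#include <assert.h>	/* for the checks in the usage note below */
#include <stdint.h>

/* An id can only be a bit index into a 32-bit mask if it is below 32. */
static int alg_id_fits_mask(unsigned int alg_id)
{
	return alg_id < 32;
}

/*
 * Set the bit for alg_id. Shifting a 32-bit value by 32 or more is
 * undefined behaviour in C, so out-of-range ids are rejected instead.
 */
static int mask_set(uint32_t *mask, unsigned int alg_id)
{
	if (!alg_id_fits_mask(alg_id))
		return -1;
	*mask |= UINT32_C(1) << alg_id;
	return 0;
}

/* Test the bit for alg_id; out-of-range ids are never reported as set. */
static int mask_test(uint32_t mask, unsigned int alg_id)
{
	return alg_id_fits_mask(alg_id) && ((mask >> alg_id) & 1u);
}
```

Under that assumption an id such as 10 maps to a distinct bit, while a private-range value such as 250 cannot be represented at all, and two algorithms sharing one id would be indistinguishable in the mask.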
Re: [PATCH] CMAC support for CryptoAPI, fixed patch issues, indent, and testmgr build issues
Quoting Tom St Denis tstde...@elliptictech.com:

> ----- Original Message -----
>> From: Jussi Kivilinna jussi.kivili...@mbnet.fi
>> To: Tom St Denis tstde...@elliptictech.com
>> Cc: linux-ker...@vger.kernel.org, Herbert Xu herb...@gondor.apana.org.au, David Miller da...@davemloft.net, linux-crypto@vger.kernel.org, Steffen Klassert steffen.klass...@secunet.com, net...@vger.kernel.org
>> Sent: Wednesday, 23 January, 2013 9:36:44 AM
>> Subject: Re: [PATCH] CMAC support for CryptoAPI, fixed patch issues, indent, and testmgr build issues
>>
>> Quoting Tom St Denis tstde...@elliptictech.com:
>>
>>> Hey all,
>>>
>>> Here's an updated patch which addresses a couple of build issues and
>>> coding style complaints.
>>>
>>> I still can't get it to run via testmgr I get
>>>
>>> [ 162.407807] alg: No test for cmac(aes) (cmac(aes-generic))
>>>
>>> Despite the fact I have an entry for cmac(aes) (much like
>>> xcbc(aes)...).
>>>
>>> Here's the patch to bring 3.8-rc4 up with CMAC ...
>>>
>>> Signed-off-by: Tom St Denis tstde...@elliptictech.com
>>>
>> <snip>
>>> diff --git a/include/uapi/linux/pfkeyv2.h b/include/uapi/linux/pfkeyv2.h
>>> index 0b80c80..d61898e 100644
>>> --- a/include/uapi/linux/pfkeyv2.h
>>> +++ b/include/uapi/linux/pfkeyv2.h
>>> @@ -296,6 +296,7 @@ struct sadb_x_kmaddress {
>>>  #define SADB_X_AALG_SHA2_512HMAC	7
>>>  #define SADB_X_AALG_RIPEMD160HMAC	8
>>>  #define SADB_X_AALG_AES_XCBC_MAC	9
>>> +#define SADB_X_AALG_AES_CMAC_MAC	10
>>>  #define SADB_X_AALG_NULL		251	/* kame */
>>>  #define SADB_AALG_MAX			251
>>
>> Should these values be based on IANA assigned IPSEC AH transform
>> identifiers?
>>
>> https://www.iana.org/assignments/isakmp-registry/isakmp-registry.xml#isakmp-registry-6
>
> There is no CMAC entry apparently ... despite the fact that CMAC is a
> proposed RFC standard for IPsec.
>
> It might be safer to move that to 14 since it's currently unassigned
> and then go through whatever channels are required to allocate it.
> Mostly this affects key setting. So this means my patch would break
> AH_RSA setkey calls (which the kernel doesn't support anyways).

Problem seems to be that PFKEYv2 does not quite work with IKEv2, and XFRM API
should be used instead. There are new numbers assigned for IKEv2:
https://www.iana.org/assignments/ikev2-parameters/ikev2-parameters.xml#ikev2-parameters-7

For new SADB_X_AALG_*, I'd think you should use a value from the "Reserved for
private use" range. Maybe 250?

But maybe a better solution might be to not make AES-CMAC (or other new
algorithms) available through the PFKEY API at all, just XFRM?

-Jussi
[PATCH] crypto: testmgr - add test vector for fcrypt
fcrypt is used only as pcbc(fcrypt), but testmgr does not know this. Use the
zero key, zero plaintext pcbc(fcrypt) test vector for testing plain 'fcrypt'
to hide the "no test for fcrypt" warnings.

Signed-off-by: Jussi Kivilinna jussi.kivili...@mbnet.fi
---
 crypto/testmgr.c |   15 +++
 1 file changed, 15 insertions(+)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index edf4a08..efd8b20 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -2269,6 +2269,21 @@ static const struct alg_test_desc alg_test_descs[] = {
 			}
 		}
 	}, {
+		.alg = "ecb(fcrypt)",
+		.test = alg_test_skcipher,
+		.suite = {
+			.cipher = {
+				.enc = {
+					.vecs = fcrypt_pcbc_enc_tv_template,
+					.count = 1
+				},
+				.dec = {
+					.vecs = fcrypt_pcbc_dec_tv_template,
+					.count = 1
+				}
+			}
+		}
+	}, {
 		.alg = "ecb(khazad)",
 		.test = alg_test_skcipher,
 		.suite = {
[PATCH 01/12] crypto: x86/aes - assembler clean-ups: use ENTRY/ENDPROC, localize jump targets
Signed-off-by: Jussi Kivilinna jussi.kivili...@mbnet.fi --- arch/x86/crypto/aes-i586-asm_32.S | 15 +-- arch/x86/crypto/aes-x86_64-asm_64.S | 30 +++--- 2 files changed, 20 insertions(+), 25 deletions(-) diff --git a/arch/x86/crypto/aes-i586-asm_32.S b/arch/x86/crypto/aes-i586-asm_32.S index b949ec2..2849dbc 100644 --- a/arch/x86/crypto/aes-i586-asm_32.S +++ b/arch/x86/crypto/aes-i586-asm_32.S @@ -36,6 +36,7 @@ .file aes-i586-asm.S .text +#include linux/linkage.h #include asm/asm-offsets.h #define tlen 1024 // length of each of 4 'xor' arrays (256 32-bit words) @@ -219,14 +220,10 @@ // AES (Rijndael) Encryption Subroutine /* void aes_enc_blk(struct crypto_aes_ctx *ctx, u8 *out_blk, const u8 *in_blk) */ -.global aes_enc_blk - .extern crypto_ft_tab .extern crypto_fl_tab -.align 4 - -aes_enc_blk: +ENTRY(aes_enc_blk) push%ebp mov ctx(%esp),%ebp @@ -290,18 +287,15 @@ aes_enc_blk: mov %r0,(%ebp) pop %ebp ret +ENDPROC(aes_enc_blk) // AES (Rijndael) Decryption Subroutine /* void aes_dec_blk(struct crypto_aes_ctx *ctx, u8 *out_blk, const u8 *in_blk) */ -.global aes_dec_blk - .extern crypto_it_tab .extern crypto_il_tab -.align 4 - -aes_dec_blk: +ENTRY(aes_dec_blk) push%ebp mov ctx(%esp),%ebp @@ -365,3 +359,4 @@ aes_dec_blk: mov %r0,(%ebp) pop %ebp ret +ENDPROC(aes_dec_blk) diff --git a/arch/x86/crypto/aes-x86_64-asm_64.S b/arch/x86/crypto/aes-x86_64-asm_64.S index 5b577d5..9105655 100644 --- a/arch/x86/crypto/aes-x86_64-asm_64.S +++ b/arch/x86/crypto/aes-x86_64-asm_64.S @@ -15,6 +15,7 @@ .text +#include linux/linkage.h #include asm/asm-offsets.h #define R1 %rax @@ -49,10 +50,8 @@ #define R11%r11 #define prologue(FUNC,KEY,B128,B192,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,r11) \ - .global FUNC; \ - .type FUNC,@function; \ - .align 8; \ -FUNC: movqr1,r2; \ + ENTRY(FUNC);\ + movqr1,r2; \ movqr3,r4; \ leaqKEY+48(r8),r9; \ movqr10,r11;\ @@ -71,14 +70,15 @@ FUNC: movqr1,r2; \ je B192; \ leaq32(r9),r9; -#define epilogue(r1,r2,r3,r4,r5,r6,r7,r8,r9) \ +#define 
epilogue(FUNC,r1,r2,r3,r4,r5,r6,r7,r8,r9) \ movqr1,r2; \ movqr3,r4; \ movlr5 ## E,(r9); \ movlr6 ## E,4(r9); \ movlr7 ## E,8(r9); \ movlr8 ## E,12(r9); \ - ret; + ret;\ + ENDPROC(FUNC); #define round(TAB,OFFSET,r1,r2,r3,r4,r5,r6,r7,r8,ra,rb,rc,rd) \ movzbl r2 ## H,r5 ## E;\ @@ -133,7 +133,7 @@ FUNC: movqr1,r2; \ #define entry(FUNC,KEY,B128,B192) \ prologue(FUNC,KEY,B128,B192,R2,R8,R7,R9,R1,R3,R4,R6,R10,R5,R11) -#define return epilogue(R8,R2,R9,R7,R5,R6,R3,R4,R11) +#define return(FUNC) epilogue(FUNC,R8,R2,R9,R7,R5,R6,R3,R4,R11) #define encrypt_round(TAB,OFFSET) \ round(TAB,OFFSET,R1,R2,R3,R4,R5,R6,R7,R10,R5,R6,R3,R4) \ @@ -151,12 +151,12 @@ FUNC: movqr1,r2; \ /* void aes_enc_blk(stuct crypto_tfm *tfm, u8 *out, const u8 *in) */ - entry(aes_enc_blk,0,enc128,enc192) + entry(aes_enc_blk,0,.Le128,.Le192) encrypt_round(crypto_ft_tab,-96) encrypt_round(crypto_ft_tab,-80) -enc192:encrypt_round(crypto_ft_tab,-64) +.Le192:encrypt_round(crypto_ft_tab,-64) encrypt_round(crypto_ft_tab,-48) -enc128:encrypt_round(crypto_ft_tab,-32) +.Le128:encrypt_round(crypto_ft_tab,-32) encrypt_round(crypto_ft_tab,-16) encrypt_round(crypto_ft_tab, 0) encrypt_round(crypto_ft_tab, 16) @@ -166,16 +166,16 @@ enc128: encrypt_round(crypto_ft_tab,-32) encrypt_round(crypto_ft_tab, 80) encrypt_round(crypto_ft_tab, 96) encrypt_final(crypto_fl_tab,112) - return + return(aes_enc_blk) /* void aes_dec_blk(struct crypto_tfm *tfm, u8 *out, const u8 *in) */ - entry(aes_dec_blk,240,dec128,dec192) + entry(aes_dec_blk,240,.Ld128,.Ld192) decrypt_round(crypto_it_tab,-96) decrypt_round(crypto_it_tab,-80) -dec192:decrypt_round(crypto_it_tab,-64) +.Ld192:decrypt_round(crypto_it_tab,-64) decrypt_round(crypto_it_tab,-48) -dec128:decrypt_round(crypto_it_tab,-32) +.Ld128:decrypt_round(crypto_it_tab,-32) decrypt_round(crypto_it_tab,-16) decrypt_round(crypto_it_tab, 0) decrypt_round(crypto_it_tab, 16) @@ -185,4 +185,4 @@ dec128: decrypt_round(crypto_it_tab,-32) decrypt_round(crypto_it_tab, 80) decrypt_round
[PATCH 07/12] crypto: x86/crc32c - assembler clean-up: use ENTRY/ENDPROC
Signed-off-by: Jussi Kivilinna jussi.kivili...@mbnet.fi
---
 arch/x86/crypto/crc32c-pcl-intel-asm_64.S |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
index 93c6d39..cf1a7ec 100644
--- a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
+++ b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
@@ -42,6 +42,8 @@
  * SOFTWARE.
  */

+#include <linux/linkage.h>
+
 ## ISCSI CRC 32 Implementation with crc32 and pclmulqdq Instruction

 .macro LABEL prefix n
@@ -68,8 +70,7 @@

 # unsigned int crc_pcl(u8 *buffer, int len, unsigned int crc_init);

-.global crc_pcl
-crc_pcl:
+ENTRY(crc_pcl)
 #define    bufp		%rdi
 #define    bufp_dw	%edi
 #define    bufp_w	%di
@@ -323,6 +324,9 @@
 	JMPTBL_ENTRY %i
 	.noaltmacro
 	i=i+1
 .endr
+
+ENDPROC(crc_pcl)
+
 ## PCLMULQDQ tables
 ## Table is 128 entries x 2 quad words each
--
To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH][RFC] crypto: tcrypt - Ahash tests changed to run in parallel.
Quoting Garg Vakul-B16394 b16...@freescale.com:

>> Does not this change make tcrypt give inconsistent results?
>
> Based on kernel scheduling of threads, this change can make tcrypt give varying results in different runs.
> For consistent results, we can use existing synchronous mode crypto sessions.

But one cannot get consistent results for asynchronous software implementations after this patch.

-Jussi
Re: [PATCH][RFC] crypto: tcrypt - Ahash tests changed to run in parallel.
Quoting Vakul Garg va...@freescale.com:

> This allows test-running multiple parallel crypto ahash contexts.
> Each test vector under the ahash speed test template is started under a separate kthread.

Why do you want to do this? Doesn't this change make tcrypt give inconsistent results?
[RFC PATCH 0/3] Make rfc3686 template work with asynchronous block ciphers
I'm not sure how this patchset should be dealt with (should the 1st patch go through a different tree than the 2nd and 3rd?), so it is sent as an RFC.

The second patch makes the rfc3686 template work with asynchronous block ciphers, and the third patch changes aesni-intel to use this template. The first patch fixes a problem in xfrm_algo that was found with the help of the 2nd and 3rd patches; without the 1st patch, the 2nd patch breaks aes-ctr with IPSEC.

---

Jussi Kivilinna (3):
      xfrm_algo: probe asynchronous block ciphers instead of synchronous
      crypto: ctr - make rfc3686 asynchronous block cipher
      crypto: aesni-intel - remove rfc3686(ctr(aes)), utilize rfc3686 from ctr-module instead

 arch/x86/crypto/aesni-intel_glue.c |   37
 crypto/ctr.c                       |  173 +++-
 crypto/tcrypt.c                    |    4 +
 crypto/tcrypt.h                    |    1
 net/xfrm/xfrm_algo.c               |    3 -
 5 files changed, 116 insertions(+), 102 deletions(-)
[RFC PATCH 1/3] xfrm_algo: probe asynchronous block ciphers instead of synchronous
IPSEC uses block ciphers asynchronously, but probes only for synchronous block ciphers and makes ealg entries available only if a synchronous block cipher is found. So in a setup where a hardware crypto driver registers asynchronous block ciphers and the software crypto module is not built, the ealg is not marked as available.

Use crypto_has_ablkcipher instead and remove the ASYNC mask.

Signed-off-by: Jussi Kivilinna jussi.kivili...@mbnet.fi
---
 net/xfrm/xfrm_algo.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_algo.c b/net/xfrm/xfrm_algo.c
index 4ce2d93..f9a5495 100644
--- a/net/xfrm/xfrm_algo.c
+++ b/net/xfrm/xfrm_algo.c
@@ -700,8 +700,7 @@ void xfrm_probe_algs(void)
 	}

 	for (i = 0; i < ealg_entries(); i++) {
-		status = crypto_has_blkcipher(ealg_list[i].name, 0,
-					      CRYPTO_ALG_ASYNC);
+		status = crypto_has_ablkcipher(ealg_list[i].name, 0, 0);
 		if (ealg_list[i].available != status)
 			ealg_list[i].available = status;
 	}
[RFC PATCH 2/3] crypto: ctr - make rfc3686 asynchronous block cipher
Some hardware crypto drivers register asynchronous ctr(aes), which is left unused in IPSEC because rfc3686 template only supports synchronous block ciphers. Some other drivers register rfc3686(ctr(aes)) to workaround this limitation but not all. This patch changes rfc3686 to use asynchronous block ciphers, to allow async ctr(aes) algorithms to be utilized automatically by IPSEC. Signed-off-by: Jussi Kivilinna jussi.kivili...@mbnet.fi --- crypto/ctr.c| 173 +++ crypto/tcrypt.c |4 + crypto/tcrypt.h |1 3 files changed, 115 insertions(+), 63 deletions(-) diff --git a/crypto/ctr.c b/crypto/ctr.c index 4ca7222..1f2997c 100644 --- a/crypto/ctr.c +++ b/crypto/ctr.c @@ -12,6 +12,7 @@ #include crypto/algapi.h #include crypto/ctr.h +#include crypto/internal/skcipher.h #include linux/err.h #include linux/init.h #include linux/kernel.h @@ -25,10 +26,15 @@ struct crypto_ctr_ctx { }; struct crypto_rfc3686_ctx { - struct crypto_blkcipher *child; + struct crypto_ablkcipher *child; u8 nonce[CTR_RFC3686_NONCE_SIZE]; }; +struct crypto_rfc3686_req_ctx { + u8 iv[CTR_RFC3686_BLOCK_SIZE]; + struct ablkcipher_request subreq CRYPTO_MINALIGN_ATTR; +}; + static int crypto_ctr_setkey(struct crypto_tfm *parent, const u8 *key, unsigned int keylen) { @@ -243,11 +249,11 @@ static struct crypto_template crypto_ctr_tmpl = { .module = THIS_MODULE, }; -static int crypto_rfc3686_setkey(struct crypto_tfm *parent, const u8 *key, -unsigned int keylen) +static int crypto_rfc3686_setkey(struct crypto_ablkcipher *parent, +const u8 *key, unsigned int keylen) { - struct crypto_rfc3686_ctx *ctx = crypto_tfm_ctx(parent); - struct crypto_blkcipher *child = ctx-child; + struct crypto_rfc3686_ctx *ctx = crypto_ablkcipher_ctx(parent); + struct crypto_ablkcipher *child = ctx-child; int err; /* the nonce is stored in bytes at end of key */ @@ -259,59 +265,64 @@ static int crypto_rfc3686_setkey(struct crypto_tfm *parent, const u8 *key, keylen -= CTR_RFC3686_NONCE_SIZE; - crypto_blkcipher_clear_flags(child, 
CRYPTO_TFM_REQ_MASK); - crypto_blkcipher_set_flags(child, crypto_tfm_get_flags(parent) - CRYPTO_TFM_REQ_MASK); - err = crypto_blkcipher_setkey(child, key, keylen); - crypto_tfm_set_flags(parent, crypto_blkcipher_get_flags(child) -CRYPTO_TFM_RES_MASK); + crypto_ablkcipher_clear_flags(child, CRYPTO_TFM_REQ_MASK); + crypto_ablkcipher_set_flags(child, crypto_ablkcipher_get_flags(parent) + CRYPTO_TFM_REQ_MASK); + err = crypto_ablkcipher_setkey(child, key, keylen); + crypto_ablkcipher_set_flags(parent, crypto_ablkcipher_get_flags(child) + CRYPTO_TFM_RES_MASK); return err; } -static int crypto_rfc3686_crypt(struct blkcipher_desc *desc, - struct scatterlist *dst, - struct scatterlist *src, unsigned int nbytes) +static int crypto_rfc3686_crypt(struct ablkcipher_request *req) { - struct crypto_blkcipher *tfm = desc-tfm; - struct crypto_rfc3686_ctx *ctx = crypto_blkcipher_ctx(tfm); - struct crypto_blkcipher *child = ctx-child; - unsigned long alignmask = crypto_blkcipher_alignmask(tfm); - u8 ivblk[CTR_RFC3686_BLOCK_SIZE + alignmask]; - u8 *iv = PTR_ALIGN(ivblk + 0, alignmask + 1); - u8 *info = desc-info; - int err; + struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(req); + struct crypto_rfc3686_ctx *ctx = crypto_ablkcipher_ctx(tfm); + struct crypto_ablkcipher *child = ctx-child; + unsigned long align = crypto_ablkcipher_alignmask(tfm); + struct crypto_rfc3686_req_ctx *rctx = + (void *)PTR_ALIGN((u8 *)ablkcipher_request_ctx(req), align + 1); + struct ablkcipher_request *subreq = rctx-subreq; + u8 *iv = rctx-iv; /* set up counter block */ memcpy(iv, ctx-nonce, CTR_RFC3686_NONCE_SIZE); - memcpy(iv + CTR_RFC3686_NONCE_SIZE, info, CTR_RFC3686_IV_SIZE); + memcpy(iv + CTR_RFC3686_NONCE_SIZE, req-info, CTR_RFC3686_IV_SIZE); /* initialize counter portion of counter block */ *(__be32 *)(iv + CTR_RFC3686_NONCE_SIZE + CTR_RFC3686_IV_SIZE) = cpu_to_be32(1); - desc-tfm = child; - desc-info = iv; - err = crypto_blkcipher_encrypt_iv(desc, dst, src, nbytes); - desc-tfm = tfm; - 
desc-info = info; + ablkcipher_request_set_tfm(subreq, child); + ablkcipher_request_set_callback(subreq, req-base.flags, + req-base.complete, req-base.data); + ablkcipher_request_set_crypt(subreq, req-src, req
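The counter block set up in crypto_rfc3686_crypt() above (nonce, then per-request IV, then a big-endian 32-bit counter starting at 1) can be sketched in userspace C as follows. This is only an illustration, not the kernel code: the function name and the local size macros are made up for the sketch, with the sizes assumed to match CTR_RFC3686_NONCE_SIZE (4), CTR_RFC3686_IV_SIZE (8) and CTR_RFC3686_BLOCK_SIZE (16).

```c
#include <stdint.h>
#include <string.h>

/* Sizes assumed from RFC 3686; hypothetical local names for this sketch. */
#define SKETCH_NONCE_SIZE 4
#define SKETCH_IV_SIZE    8
#define SKETCH_BLOCK_SIZE 16

/*
 * Build the initial counter block: nonce || IV || counter.
 * The 32-bit counter is stored big-endian and starts at 1.
 */
void rfc3686_build_counter_block(uint8_t out[SKETCH_BLOCK_SIZE],
				 const uint8_t nonce[SKETCH_NONCE_SIZE],
				 const uint8_t iv[SKETCH_IV_SIZE])
{
	memcpy(out, nonce, SKETCH_NONCE_SIZE);
	memcpy(out + SKETCH_NONCE_SIZE, iv, SKETCH_IV_SIZE);

	/* Big-endian 1, written byte by byte so host endianness does not matter. */
	out[12] = 0;
	out[13] = 0;
	out[14] = 0;
	out[15] = 1;
}
```

Writing the counter bytes explicitly plays the role of the cpu_to_be32(1) store in the patch.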
Re: Workaround for tcrypt bug?
Quoting Sandra Schlichting littlesandr...@gmail.com:

>> Why do you want to work around this? It's safe to ignore the hmac(crc32) warning.
>
> Because it stops from proceeding. I would have expected that
>
>   modprobe tcrypt sec=1 type=1000
>
> would have executed all test cases. Even if I just want to test one
>
>   [root@amd ~]# modprobe tcrypt sec=2 type=402
>   ERROR: could not insert 'tcrypt': No such file or directory
>
> I get an error.

I think you are using the wrong module argument, type= instead of mode=. Try 'modprobe tcrypt sec=2 mode=402' instead.

-Jussi
Re: Workaround for tcrypt bug?
Quoting Sandra Schlichting littlesandr...@gmail.com:

>> I think you are using wrong module argument, type= instead of mode=. Try 'modprobe tcrypt sec=2 mode=402' instead.
>
> Thanks. I would never have thought of that =)
>
> Now it performs the test, but gives this interesting error:
>
>   [root@amd ~]# modprobe tcrypt sec=2 mode=402
>
>   Message from syslogd@amd at Dec 28 14:01:05 ...
>    kernel:[ 5508.698788] BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:3416]

Tcrypt does all its work in module init, which can take a long time and therefore triggers the 'soft lockup' warning. Possible solutions are:
 1. Build the kernel with the CONFIG_LOCKUP_DETECTOR option disabled,
 2. Boot the kernel with the 'nowatchdog' argument,
 3. Ignore the warning.

> ERROR: could not insert 'tcrypt': Resource temporarily unavailable

Tcrypt fails to load after running the tests; that's expected.

-Jussi
Re: Crypto causes panic in scatterwalk_done with large/multiple buffers
Quoting Jorgen Lundman lund...@lundman.net:

>> Apparently this patch only fixed my debug printk loop that used sg_next from scatterlist API instead of scatterwalk_sg_next from scatterwalk API. Sorry for the noise.
>
> Thanks for looking at this. I think I am dealing with 2 problems, one is that occasionally my buffers are from vmalloc, and needs to have some logic using vmalloc_to_page(). But I don't know if ciphers should handle that internally, blkcipher.c certainly seems to have several modes, although I do not see how to *set* them.

From what I have now researched, you must not pass vmalloc'd memory to sg_set_buf(), as it internally uses virt_to_page() to get the page of the buffer address. You most likely need to walk through your vmalloc'd buffer and pass all individual pages to the scatterlist with sg_set_page().

> Second problem is most likely what you were looking at. It is quite easy to make the crypto code die.
>
> For example, if I use ccm(aes) which can take the dst buffer, plus a hmac buffer;
>
>   cipher = kmalloc( ciphersize, ...
>   hmac = kmalloc( 16, ...
>   sg_set_buf( &sg[0], cipher, ciphersize);
>   sg_set_buf( &sg[1], hmac, 16);
>   aead_encrypt()...
>
> and all is well, but if you shift hmac address away from PAGE boundary, like:
>
>   hmac = kmalloc( 16 + 32, ...
>   hmac += 32;
>   sg_set_buf( &sg[1], hmac, 16);
>
> ie, allocate a larger buffer, and put the pointer into the page a bit. And it will die in scatterwalk very often. +32 isnt magical, any non-zero number works.

This is strange, as the crypto subsystem's internal test mechanism uses such offset buffers.

-Jussi
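The advice above about walking a vmalloc'd buffer page by page can be sketched in plain userspace C. This is a sketch under stated assumptions, not kernel code: struct page_seg, split_into_pages() and the fixed 4096-byte page size are made up for the illustration. In the kernel, each resulting segment would roughly correspond to one vmalloc_to_page() lookup followed by sg_set_page(sg, page, len, offset).

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* Hypothetical descriptor for the part of a buffer inside one page. */
struct page_seg {
	uintptr_t page_base;	/* address of the start of the page */
	size_t offset;		/* offset of the data within that page */
	size_t len;		/* bytes of the buffer that live in this page */
};

/*
 * Split [addr, addr + len) into per-page segments.
 * Returns the number of segments written, or 0 if 'max' is too small.
 */
size_t split_into_pages(uintptr_t addr, size_t len,
			struct page_seg *segs, size_t max)
{
	size_t n = 0;

	while (len > 0) {
		size_t offset = addr % SKETCH_PAGE_SIZE;
		size_t chunk = SKETCH_PAGE_SIZE - offset;

		if (chunk > len)
			chunk = len;
		if (n == max)
			return 0;

		segs[n].page_base = addr - offset;
		segs[n].offset = offset;
		segs[n].len = chunk;
		n++;

		addr += chunk;
		len -= chunk;
	}
	return n;
}
```

A 4096-byte buffer that starts 32 bytes into a page comes out as two segments (4064 bytes at offset 32, then 32 bytes at offset 0), which is the same kind of shifted layout as the 'hmac += 32' case quoted above.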
Re: Crypto causes panic in scatterwalk_done with large/multiple buffers
Quoting Jussi Kivilinna jussi.kivili...@mbnet.fi:

> Hello,
>
> I managed to reproduce something similar with small buffers...
>
> Does attached patch help in your case?

Apparently this patch only fixed my debug printk loop, which used sg_next from the scatterlist API instead of scatterwalk_sg_next from the scatterwalk API.

Sorry for the noise.

-Jussi
Re: Crypto causes panic in scatterwalk_done with large/multiple buffers
Hello,

I managed to reproduce something similar with small buffers...

Does attached patch help in your case?

-Jussi

Quoting Jorgen Lundman lund...@lundman.net:

> I have a situation where I setup scatterlists as:
>
>   input scatterlist of 1, address c90003627000 len 0x2.
>
>   output scatterlist of 2,
>     address 0 c90002d45000 len 0x2
>     address 1 88003b079d98 len 0x000c
>
> When I call
>
>   crypto_aead_encrypt(req);
>
> it will die with:
>
>   kernel: [ 925.151113] BUG: unable to handle kernel paging request at eb04000b5140
>   kernel: [ 925.151253] IP: [812f4880] scatterwalk_done+0x50/0x60
>   kernel: [ 925.151325] PGD 0
>   kernel: [ 925.151381] Oops: [#1] SMP
>   kernel: [ 925.151442] CPU 1
>   kernel: [ 925.154255] [812f7640] blkcipher_walk_done+0xb0/0x230
>   kernel: [ 925.154255] [a02e9169] crypto_ctr_crypt+0x129/0x2b0 [ctr]
>   kernel: [ 925.154255] [812fe580] ? crypto_aes_set_key+0x40/0x40
>   kernel: [ 925.154255] [812f6cbd] async_encrypt+0x3d/0x40
>   kernel: [ 925.154255] [a0149326] crypto_ccm_encrypt+0x246/0x290 [ccm]
>   kernel: [ 925.154255] [a01633bd] crypto_encrypt+0x26d/0x2d0
>
> What is interesting about that is, if I allocate a linear buffer instead:
>
>   dst = kmalloc(cryptlen, GFP_KERNEL); // 0x2 + 0x000c
>   sg_init_table(sg, 1 );
>   sg_set_buf(&sg[0], dst, cryptlen);
>
>   crypto_aead_encrypt(req);
>
> will no longer panic. However, when I try to copy the linear buffer back to scatterlist;
>
>   scatterwalk_map_and_copy(dst, sg, 0, cryptlen, 1);
>
> then it will panic there instead.
>
> However, if I replace it with the call:
>
>   sg_copy_from_buffer(sg, sg_nents(sg), dst, cryptlen);
>
> everything works!
>
> So, what am I doing wrong that makes scatterwalk_map_and_copy() fail, and sg_copy_from_buffer() work fine? It would be nice if I could fix it, so I did not need to copy to a temporary buffer.
> Lund
>
> --
> Jorgen Lundman       | lund...@lundman.net
> Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
> Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
> Japan                | +81 (0)3 -3375-1767 (home)

crypto: scatterwalk - fix broken scatterlist manipulation

From: Jussi Kivilinna jussi.kivili...@mbnet.fi

scatterwalk_sg_chain() manipulates scatterlist structures directly in the wrong way, chaining without marking the 'chain' bit 0x01. This can in some cases lead to problems, such as triggering BUG_ON(!sg->length) in scatterwalk_start().

So instead of reinventing the wheel, change scatterwalk to use the existing functions from the scatterlist API.
---
 include/crypto/scatterwalk.h |    8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/include/crypto/scatterwalk.h b/include/crypto/scatterwalk.h
index 3744d2a..d31870c 100644
--- a/include/crypto/scatterwalk.h
+++ b/include/crypto/scatterwalk.h
@@ -34,16 +34,12 @@ static inline void crypto_yield(u32 flags)
 static inline void scatterwalk_sg_chain(struct scatterlist *sg1, int num,
 					struct scatterlist *sg2)
 {
-	sg_set_page(&sg1[num - 1], (void *)sg2, 0, 0);
-	sg1[num - 1].page_link &= ~0x02;
+	sg_chain(sg1, num, sg2);
 }

 static inline struct scatterlist *scatterwalk_sg_next(struct scatterlist *sg)
 {
-	if (sg_is_last(sg))
-		return NULL;
-
-	return (++sg)->length ? sg : (void *)sg_page(sg);
+	return sg_next(sg);
 }

 static inline void scatterwalk_crypto_chain(struct scatterlist *head,
[PATCH] crypto: cast5/cast6 - move lookup tables to shared module
CAST5 and CAST6 both use same lookup tables, which can be moved shared module 'cast_common'. Signed-off-by: Jussi Kivilinna jussi.kivili...@mbnet.fi --- arch/x86/crypto/cast5-avx-x86_64-asm_64.S | 16 +- arch/x86/crypto/cast6-avx-x86_64-asm_64.S | 16 +- crypto/Kconfig| 10 + crypto/Makefile |1 crypto/cast5_generic.c| 277 crypto/cast6_generic.c| 280 crypto/cast_common.c | 290 + include/crypto/cast5.h|6 - include/crypto/cast6.h|6 - include/crypto/cast_common.h |9 + 10 files changed, 336 insertions(+), 575 deletions(-) create mode 100644 crypto/cast_common.c create mode 100644 include/crypto/cast_common.h diff --git a/arch/x86/crypto/cast5-avx-x86_64-asm_64.S b/arch/x86/crypto/cast5-avx-x86_64-asm_64.S index 12478e4..15b00ac 100644 --- a/arch/x86/crypto/cast5-avx-x86_64-asm_64.S +++ b/arch/x86/crypto/cast5-avx-x86_64-asm_64.S @@ -25,10 +25,10 @@ .file cast5-avx-x86_64-asm_64.S -.extern cast5_s1 -.extern cast5_s2 -.extern cast5_s3 -.extern cast5_s4 +.extern cast_s1 +.extern cast_s2 +.extern cast_s3 +.extern cast_s4 /* structure of crypto context */ #define km 0 @@ -36,10 +36,10 @@ #define rr ((16*4)+16) /* s-boxes */ -#define s1 cast5_s1 -#define s2 cast5_s2 -#define s3 cast5_s3 -#define s4 cast5_s4 +#define s1 cast_s1 +#define s2 cast_s2 +#define s3 cast_s3 +#define s4 cast_s4 /** 16-way AVX cast5 diff --git a/arch/x86/crypto/cast6-avx-x86_64-asm_64.S b/arch/x86/crypto/cast6-avx-x86_64-asm_64.S index 83a5381..2569d0d 100644 --- a/arch/x86/crypto/cast6-avx-x86_64-asm_64.S +++ b/arch/x86/crypto/cast6-avx-x86_64-asm_64.S @@ -27,20 +27,20 @@ .file cast6-avx-x86_64-asm_64.S -.extern cast6_s1 -.extern cast6_s2 -.extern cast6_s3 -.extern cast6_s4 +.extern cast_s1 +.extern cast_s2 +.extern cast_s3 +.extern cast_s4 /* structure of crypto context */ #define km 0 #define kr (12*4*4) /* s-boxes */ -#define s1 cast6_s1 -#define s2 cast6_s2 -#define s3 cast6_s3 -#define s4 cast6_s4 +#define s1 cast_s1 +#define s2 cast_s2 +#define s3 cast_s3 +#define s4 cast_s4 /** 8-way AVX cast6 
diff --git a/crypto/Kconfig b/crypto/Kconfig index c226b2c..4641d95 100644 --- a/crypto/Kconfig +++ b/crypto/Kconfig @@ -841,9 +841,16 @@ config CRYPTO_CAMELLIA_SPARC64 See also: https://info.isl.ntt.co.jp/crypt/eng/camellia/index_s.html +config CRYPTO_CAST_COMMON + tristate + help + Common parts of the CAST cipher algorithms shared by the + generic c and the assembler implementations. + config CRYPTO_CAST5 tristate CAST5 (CAST-128) cipher algorithm select CRYPTO_ALGAPI + select CRYPTO_CAST_COMMON help The CAST5 encryption algorithm (synonymous with CAST-128) is described in RFC2144. @@ -854,6 +861,7 @@ config CRYPTO_CAST5_AVX_X86_64 select CRYPTO_ALGAPI select CRYPTO_CRYPTD select CRYPTO_ABLK_HELPER_X86 + select CRYPTO_CAST_COMMON select CRYPTO_CAST5 help The CAST5 encryption algorithm (synonymous with CAST-128) is @@ -865,6 +873,7 @@ config CRYPTO_CAST5_AVX_X86_64 config CRYPTO_CAST6 tristate CAST6 (CAST-256) cipher algorithm select CRYPTO_ALGAPI + select CRYPTO_CAST_COMMON help The CAST6 encryption algorithm (synonymous with CAST-256) is described in RFC2612. 
@@ -876,6 +885,7 @@ config CRYPTO_CAST6_AVX_X86_64 select CRYPTO_CRYPTD select CRYPTO_ABLK_HELPER_X86 select CRYPTO_GLUE_HELPER_X86 + select CRYPTO_CAST_COMMON select CRYPTO_CAST6 select CRYPTO_LRW select CRYPTO_XTS diff --git a/crypto/Makefile b/crypto/Makefile index 8cf61ff..d59dec7 100644 --- a/crypto/Makefile +++ b/crypto/Makefile @@ -68,6 +68,7 @@ obj-$(CONFIG_CRYPTO_TWOFISH_COMMON) += twofish_common.o obj-$(CONFIG_CRYPTO_SERPENT) += serpent_generic.o obj-$(CONFIG_CRYPTO_AES) += aes_generic.o obj-$(CONFIG_CRYPTO_CAMELLIA) += camellia_generic.o +obj-$(CONFIG_CRYPTO_CAST_COMMON) += cast_common.o obj-$(CONFIG_CRYPTO_CAST5) += cast5_generic.o obj-$(CONFIG_CRYPTO_CAST6) += cast6_generic.o obj-$(CONFIG_CRYPTO_ARC4) += arc4.o diff --git a/crypto/cast5_generic.c b/crypto/cast5_generic.c index bc525db..5558f63 100644 --- a/crypto/cast5_generic.c +++ b/crypto/cast5_generic.c @@ -30,275 +30,6 @@ #include linux/types.h #include crypto/cast5.h - -const u32 cast5_s1[256] = { - 0x30fb40d4, 0x9fa0ff0b, 0x6beccd2f, 0x3f258c7a, 0x1e213f2f, - 0x9c004dd3, 0x6003e540, 0xcf9fc949
[PATCH 1/2] crypto: testmgr - remove fips_allowed flag from camellia-aesni null-tests
Remove the incorrect fips_allowed flag from the camellia null-test entries. It was caused by an incorrect copy-paste of the aes-aesni null-tests into the camellia-aesni null-tests.

Signed-off-by: Jussi Kivilinna jussi.kivili...@mbnet.fi
---
 crypto/testmgr.c |    2 --
 1 file changed, 2 deletions(-)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 3933241..b8695bf 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -2175,7 +2175,6 @@ static const struct alg_test_desc alg_test_descs[] = {
 	}, {
 		.alg = "cryptd(__driver-cbc-camellia-aesni)",
 		.test = alg_test_null,
-		.fips_allowed = 1,
 		.suite = {
 			.cipher = {
 				.enc = {
@@ -2207,7 +2206,6 @@ static const struct alg_test_desc alg_test_descs[] = {
 	}, {
 		.alg = "cryptd(__driver-ecb-camellia-aesni)",
 		.test = alg_test_null,
-		.fips_allowed = 1,
 		.suite = {
 			.cipher = {
 				.enc = {
[PATCH] crypto: testmgr - add larger crc32c test vector to test FPU path in crc32c_intel
Signed-off-by: Jussi Kivilinna jussi.kivili...@mbnet.fi --- crypto/testmgr.h | 267 +- 1 file changed, 264 insertions(+), 3 deletions(-) diff --git a/crypto/testmgr.h b/crypto/testmgr.h index 17db4a9..189aeb6 100644 --- a/crypto/testmgr.h +++ b/crypto/testmgr.h @@ -41,7 +41,7 @@ struct hash_testvec { char *plaintext; char *digest; unsigned char tap[MAX_TAP]; - unsigned char psize; + unsigned short psize; unsigned char np; unsigned char ksize; }; @@ -25214,7 +25214,7 @@ static struct hash_testvec michael_mic_tv_template[] = { /* * CRC32C test vectors */ -#define CRC32C_TEST_VECTORS 14 +#define CRC32C_TEST_VECTORS 15 static struct hash_testvec crc32c_tv_template[] = { { @@ -25385,7 +25385,268 @@ static struct hash_testvec crc32c_tv_template[] = { .digest = \x75\xd3\xc5\x24, .np = 2, .tap = { 31, 209 } - }, + }, { + .key = \xff\xff\xff\xff, + .ksize = 4, + .plaintext =\x6e\x05\x79\x10\xa7\x1b\xb2\x49 + \xe0\x54\xeb\x82\x19\x8d\x24\xbb + \x2f\xc6\x5d\xf4\x68\xff\x96\x0a + \xa1\x38\xcf\x43\xda\x71\x08\x7c + \x13\xaa\x1e\xb5\x4c\xe3\x57\xee + \x85\x1c\x90\x27\xbe\x32\xc9\x60 + \xf7\x6b\x02\x99\x0d\xa4\x3b\xd2 + \x46\xdd\x74\x0b\x7f\x16\xad\x21 + \xb8\x4f\xe6\x5a\xf1\x88\x1f\x93 + \x2a\xc1\x35\xcc\x63\xfa\x6e\x05 + \x9c\x10\xa7\x3e\xd5\x49\xe0\x77 + \x0e\x82\x19\xb0\x24\xbb\x52\xe9 + \x5d\xf4\x8b\x22\x96\x2d\xc4\x38 + \xcf\x66\xfd\x71\x08\x9f\x13\xaa + \x41\xd8\x4c\xe3\x7a\x11\x85\x1c + \xb3\x27\xbe\x55\xec\x60\xf7\x8e + \x02\x99\x30\xc7\x3b\xd2\x69\x00 + \x74\x0b\xa2\x16\xad\x44\xdb\x4f + \xe6\x7d\x14\x88\x1f\xb6\x2a\xc1 + \x58\xef\x63\xfa\x91\x05\x9c\x33 + \xca\x3e\xd5\x6c\x03\x77\x0e\xa5 + \x19\xb0\x47\xde\x52\xe9\x80\x17 + \x8b\x22\xb9\x2d\xc4\x5b\xf2\x66 + \xfd\x94\x08\x9f\x36\xcd\x41\xd8 + \x6f\x06\x7a\x11\xa8\x1c\xb3\x4a + \xe1\x55\xec\x83\x1a\x8e\x25\xbc + \x30\xc7\x5e\xf5\x69\x00\x97\x0b + \xa2\x39\xd0\x44\xdb\x72\x09\x7d + \x14\xab\x1f\xb6\x4d\xe4\x58\xef + \x86\x1d\x91\x28\xbf\x33\xca\x61 + \xf8\x6c\x03\x9a\x0e\xa5\x3c\xd3 + \x47\xde\x75\x0c\x80\x17\xae\x22 + 
\xb9\x50\xe7\x5b\xf2\x89\x20\x94 + \x2b\xc2\x36\xcd\x64\xfb\x6f\x06 + \x9d\x11\xa8\x3f\xd6\x4a\xe1\x78 + \x0f\x83\x1a\xb1\x25\xbc\x53\xea + \x5e\xf5\x8c\x00\x97\x2e\xc5\x39 + \xd0\x67\xfe\x72\x09\xa0\x14\xab + \x42\xd9\x4d\xe4\x7b\x12\x86\x1d + \xb4\x28\xbf\x56\xed\x61\xf8\x8f + \x03\x9a\x31\xc8\x3c\xd3\x6a\x01 + \x75\x0c\xa3\x17\xae\x45\xdc\x50 + \xe7\x7e\x15\x89\x20\xb7\x2b\xc2 + \x59\xf0\x64\xfb\x92\x06\x9d\x34 + \xcb\x3f\xd6\x6d\x04\x78\x0f\xa6 + \x1a\xb1\x48\xdf\x53\xea\x81\x18 + \x8c\x23\xba\x2e\xc5\x5c\xf3\x67 + \xfe\x95\x09\xa0\x37\xce\x42\xd9 + \x70\x07\x7b\x12\xa9\x1d\xb4\x4b + \xe2\x56\xed\x84\x1b\x8f\x26\xbd + \x31\xc8\x5f\xf6\x6a\x01\x98\x0c + \xa3\x3a\xd1\x45\xdc\x73\x0a\x7e + \x15\xac\x20\xb7\x4e\xe5\x59\xf0 + \x87\x1e\x92\x29\xc0\x34\xcb\x62 + \xf9\x6d\x04\x9b\x0f\xa6\x3d\xd4 + \x48\xdf\x76\x0d\x81\x18\xaf\x23 + \xba\x51\xe8\x5c\xf3\x8a\x21\x95 + \x2c\xc3\x37\xce\x65\xfc\x70\x07 + \x9e\x12\xa9\x40\xd7\x4b\xe2\x79 + \x10\x84\x1b\xb2\x26\xbd\x54\xeb
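When adding a long vector like the one above, an independent reference implementation is handy for double-checking the expected digest. The following is a minimal bitwise CRC-32C sketch in userspace C; it is not the kernel's crc32c code and not the crc32c-intel path, and the function name is made up here. It assumes the usual CRC-32C conventions (reflected polynomial 0x82F63B78, initial value and final XOR of 0xFFFFFFFF).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Slow, bit-at-a-time CRC-32C (Castagnoli).
 * Assumes init = 0xFFFFFFFF and a final inversion of the result.
 */
uint32_t crc32c_bitwise(const uint8_t *data, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++) {
			/* mask is all ones when the low bit is set, else zero */
			uint32_t mask = 0u - (crc & 1u);

			crc = (crc >> 1) ^ (0x82F63B78u & mask);
		}
	}
	return ~crc;
}
```

For the standard check string "123456789" this returns 0xE3069283. How the result maps onto the byte order of a .digest string in testmgr.h is not shown here and would need checking against the existing vectors.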
Re: [PATCH 2/2] Remove VLAIS usage from crypto/testmgr.c
Quoting Behan Webster beh...@converseincode.com:

> From: Jan-Simon Möller dl...@gmx.de
>
> The use of variable length arrays in structs (VLAIS) in the Linux Kernel code precludes the use of compilers which don't implement VLAIS (for instance the Clang compiler). This patch instead allocates the appropriate amount of memory using a char array.
>
> Patch from series at
> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120507/142707.html
> by PaX Team.
>
> Signed-off-by: Jan-Simon Möller dl...@gmx.de
> Cc: pagee...@freemail.hu
> Signed-off-by: Behan Webster beh...@converseincode.com
> ---
>  crypto/testmgr.c | 23 +++++++++++++----------
>  1 file changed, 13 insertions(+), 10 deletions(-)
>
> diff --git a/crypto/testmgr.c b/crypto/testmgr.c
> index 941d75c..5b7b3a6 100644
> --- a/crypto/testmgr.c
> +++ b/crypto/testmgr.c
> @@ -1578,16 +1578,19 @@ static int alg_test_crc32c(const struct alg_test_desc *desc,
>  	}
>
>  	do {
> -		struct {
> -			struct shash_desc shash;
> -			char ctx[crypto_shash_descsize(tfm)];
> -		} sdesc;
> -
> -		sdesc.shash.tfm = tfm;
> -		sdesc.shash.flags = 0;
> -
> -		*(u32 *)sdesc.ctx = le32_to_cpu(420553207);
> -		err = crypto_shash_final(&sdesc.shash, (u8 *)&val);
> +		char sdesc[sizeof(struct shash_desc)
> +			+ crypto_shash_descsize(tfm)
> +			+ CRYPTO_MINALIGN] CRYPTO_MINALIGN_ATTR;
> +		struct shash_desc *shash = (struct shash_desc *)sdesc;
> +		u32 *ctx = (u32 *)((unsigned long)(sdesc
> +			+ sizeof(struct shash_desc) + CRYPTO_MINALIGN - 1)
> +			& ~(CRYPTO_MINALIGN - 1));

I think you should use '(u32 *)shash_desc_ctx(shash)' instead of getting ctx pointer manually.

> +
> +		shash->tfm = tfm;
> +		shash->flags = 0;
> +
> +		*ctx = le32_to_cpu(420553207);
> +		err = crypto_shash_final(shash, (u8 *)&val);
>  		if (err) {
>  			printk(KERN_ERR "alg: crc32c: Operation failed for "
>  			       "%s: %d\n", driver, err);
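For readers unfamiliar with the manual ctx computation in the quoted hunk: it is the usual round-up-to-alignment idiom. A minimal userspace sketch follows; align_up() is a made-up helper name, and the alignment is assumed to be a power of two, as CRYPTO_MINALIGN is.

```c
#include <stdint.h>

/* Round 'addr' up to the next multiple of 'align' (must be a power of two). */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
	return (addr + align - 1) & ~(align - 1);
}
```

For example, align_up(13, 8) gives 16 and align_up(16, 8) stays 16. Using shash_desc_ctx(), as suggested above, avoids open-coding this arithmetic in the test.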
sha1-arm assembler and CONFIG_THUMB2_KERNEL = build error
Hello, I tested cryptodev-2.6 tree with ARCH=arm, and get following error with CONFIG_THUMB2_KERNEL=y + CONFIG_CRYPTO_SHA1_ARM=y combination. Config based on 'vexpress_defconfig' (config attached). AS arch/arm/crypto/sha1-armv4-large.o arch/arm/crypto/sha1-armv4-large.S: Assembler messages: arch/arm/crypto/sha1-armv4-large.S:197: Error: r13 not allowed here -- `teq r14,sp' arch/arm/crypto/sha1-armv4-large.S:377: Error: r13 not allowed here -- `teq r14,sp' arch/arm/crypto/sha1-armv4-large.S:469: Error: r13 not allowed here -- `teq r14,sp' -Jussi # # Automatically generated file; DO NOT EDIT. # Linux/arm 3.7.0-rc1 Kernel Configuration # CONFIG_ARM=y CONFIG_SYS_SUPPORTS_APM_EMULATION=y CONFIG_HAVE_PROC_CPU=y CONFIG_NO_IOPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_TRACE_IRQFLAGS_SUPPORT=y CONFIG_RWSEM_GENERIC_SPINLOCK=y CONFIG_GENERIC_HWEIGHT=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_NEED_DMA_MAP_STATE=y CONFIG_VECTORS_BASE=0x CONFIG_ARM_PATCH_PHYS_VIRT=y CONFIG_GENERIC_BUG=y CONFIG_DEFCONFIG_LIST=/lib/modules/$UNAME_RELEASE/.config CONFIG_HAVE_IRQ_WORK=y CONFIG_IRQ_WORK=y # # General setup # CONFIG_EXPERIMENTAL=y CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_CROSS_COMPILE= CONFIG_LOCALVERSION= # CONFIG_LOCALVERSION_AUTO is not set CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_LZMA=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_HAVE_KERNEL_LZO=y CONFIG_KERNEL_GZIP=y # CONFIG_KERNEL_LZMA is not set # CONFIG_KERNEL_XZ is not set # CONFIG_KERNEL_LZO is not set CONFIG_DEFAULT_HOSTNAME=(none) CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y # CONFIG_POSIX_MQUEUE is not set # CONFIG_FHANDLE is not set # CONFIG_AUDIT is not set CONFIG_HAVE_GENERIC_HARDIRQS=y # # IRQ subsystem # CONFIG_GENERIC_HARDIRQS=y CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_IRQ_SHOW=y CONFIG_HARDIRQS_SW_RESEND=y CONFIG_IRQ_DOMAIN=y # CONFIG_IRQ_DOMAIN_DEBUG is not set CONFIG_SPARSE_IRQ=y CONFIG_KTIME_SCALAR=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BUILD=y 
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y # # Timers subsystem # # CONFIG_NO_HZ is not set # CONFIG_HIGH_RES_TIMERS is not set # # CPU/Task time and stats accounting # CONFIG_TICK_CPU_ACCOUNTING=y # CONFIG_BSD_PROCESS_ACCT is not set # CONFIG_TASKSTATS is not set # # RCU Subsystem # CONFIG_TREE_RCU=y # CONFIG_PREEMPT_RCU is not set CONFIG_RCU_FANOUT=32 CONFIG_RCU_FANOUT_LEAF=16 # CONFIG_RCU_FANOUT_EXACT is not set # CONFIG_TREE_RCU_TRACE is not set CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y CONFIG_LOG_BUF_SHIFT=14 CONFIG_CGROUPS=y # CONFIG_CGROUP_DEBUG is not set # CONFIG_CGROUP_FREEZER is not set # CONFIG_CGROUP_DEVICE is not set CONFIG_CPUSETS=y CONFIG_PROC_PID_CPUSET=y # CONFIG_CGROUP_CPUACCT is not set # CONFIG_RESOURCE_COUNTERS is not set # CONFIG_CGROUP_PERF is not set # CONFIG_CGROUP_SCHED is not set # CONFIG_BLK_CGROUP is not set # CONFIG_CHECKPOINT_RESTORE is not set CONFIG_NAMESPACES=y # CONFIG_UTS_NS is not set # CONFIG_IPC_NS is not set # CONFIG_PID_NS is not set # CONFIG_NET_NS is not set # CONFIG_SCHED_AUTOGROUP is not set # CONFIG_SYSFS_DEPRECATED is not set # CONFIG_RELAY is not set CONFIG_BLK_DEV_INITRD=y CONFIG_INITRAMFS_SOURCE= CONFIG_RD_GZIP=y CONFIG_RD_BZIP2=y CONFIG_RD_LZMA=y CONFIG_RD_XZ=y CONFIG_RD_LZO=y CONFIG_CC_OPTIMIZE_FOR_SIZE=y CONFIG_SYSCTL=y CONFIG_ANON_INODES=y # CONFIG_EXPERT is not set CONFIG_HAVE_UID16=y CONFIG_UID16=y # CONFIG_SYSCTL_SYSCALL is not set CONFIG_KALLSYMS=y # CONFIG_KALLSYMS_ALL is not set CONFIG_HOTPLUG=y CONFIG_PRINTK=y CONFIG_BUG=y CONFIG_ELF_CORE=y CONFIG_BASE_FULL=y CONFIG_FUTEX=y CONFIG_EPOLL=y CONFIG_SIGNALFD=y CONFIG_TIMERFD=y CONFIG_EVENTFD=y CONFIG_SHMEM=y CONFIG_AIO=y # CONFIG_EMBEDDED is not set CONFIG_HAVE_PERF_EVENTS=y CONFIG_PERF_USE_VMALLOC=y # # Kernel Performance Events And Counters # CONFIG_PERF_EVENTS=y # CONFIG_DEBUG_PERF_USE_VMALLOC is not set CONFIG_VM_EVENT_COUNTERS=y CONFIG_SLUB_DEBUG=y CONFIG_COMPAT_BRK=y # CONFIG_SLAB is not set CONFIG_SLUB=y CONFIG_PROFILING=y CONFIG_OPROFILE=y 
CONFIG_HAVE_OPROFILE=y
# CONFIG_KPROBES is not set
CONFIG_JUMP_LABEL=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_ATTRS=y
CONFIG_HAVE_DMA_CONTIGUOUS=y
CONFIG_USE_GENERIC_SMP_HELPERS=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_CLK=y
CONFIG_HAVE_DMA_API_DEBUG=y
CONFIG_HAVE_HW_BREAKPOINT=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y
CONFIG_ARCH_WANT_IPC_PARSE_VERSION=y
CONFIG_GENERIC_KERNEL_THREAD=y
CONFIG_GENERIC_KERNEL_EXECVE=y
CONFIG_HAVE_MOD_ARCH_SPECIFIC=y
CONFIG_MODULES_USE_ELF_REL=y

#
# GCOV-based kernel profiling
#
# CONFIG_GCOV_KERNEL is not set
CONFIG_HAVE_GENERIC_DMA_COHERENT=y
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
# CONFIG_MODULE_FORCE_LOAD is not set
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
# CONFIG_MODULE_SIG is not set
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
CONFIG_LBDAF=y
#
[PATCH 0/3] [v2] AES-NI/AVX implementation of Camellia cipher
This patchset adds an AES-NI/AVX assembler implementation of the Camellia cipher for x86-64.

[v2]:
 - No missing patches
 - No missing files

---

Jussi Kivilinna (3):
      [v2] crypto: tcrypt - add async speed test for camellia cipher
      [v2] crypto: camellia-x86_64 - share common functions and move structures and function definitions to header file
      [v2] crypto: camellia - add AES-NI/AVX/x86_64 assembler implementation of camellia cipher

 arch/x86/crypto/Makefile                    |    3
 arch/x86/crypto/camellia-aesni-avx-asm_64.S | 1102 +++
 arch/x86/crypto/camellia_aesni_avx_glue.c   |  558 ++
 arch/x86/crypto/camellia_glue.c             |   80 +-
 arch/x86/include/asm/crypto/camellia.h      |   82 ++
 crypto/Kconfig                              |   22 +
 crypto/tcrypt.c                             |   23 +
 crypto/testmgr.c                            |   62 ++
 8 files changed, 1875 insertions(+), 57 deletions(-)
 create mode 100644 arch/x86/crypto/camellia-aesni-avx-asm_64.S
 create mode 100644 arch/x86/crypto/camellia_aesni_avx_glue.c
 create mode 100644 arch/x86/include/asm/crypto/camellia.h