Re: [PATCH] crypto/arm: accelerated SHA-512 using ARM generic ASM and NEON

2015-03-28 Thread Jussi Kivilinna
On 28.03.2015 09:28, Ard Biesheuvel wrote:
 This updates the SHA-512 NEON module with the faster and more
 versatile implementation from the OpenSSL project. It consists
 of both a NEON and a generic ASM version of the core SHA-512
 transform, where the NEON version reverts to the ASM version
 when invoked in non-process context.
 
 Performance relative to the generic implementation (measured
 using tcrypt.ko mode=306 sec=1 running on a Cortex-A57 under
 KVM):
 
   input size  block size   asm   neon   old neon

           16          16  1.39   2.54   2.21
           64          16  1.32   2.33   2.09
           64          64  1.38   2.53   2.19
          256          16  1.31   2.28   2.06
          256          64  1.38   2.54   2.25
          256         256  1.40   2.77   2.39
         1024          16  1.29   2.22   2.01
         1024         256  1.40   2.82   2.45
         1024        1024  1.41   2.93   2.53
         2048          16  1.33   2.21   2.00
         2048         256  1.40   2.84   2.46
         2048        1024  1.41   2.96   2.55
         2048        2048  1.41   2.98   2.56
         4096          16  1.34   2.20   1.99
         4096         256  1.40   2.84   2.46
         4096        1024  1.41   2.97   2.56
         4096        4096  1.41   3.01   2.58
         8192          16  1.34   2.19   1.99
         8192         256  1.40   2.85   2.47
         8192        1024  1.41   2.98   2.56
         8192        4096  1.41   2.71   2.59
         8192        8192  1.51   3.51   2.69
 
 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
 ---
 
 This should get the same treatment as Sami's SHA-256 version: I would like
 to wait until the OpenSSL source file hits the upstream repository so that
 I can refer to its sha1 hash in the commit log.
 
  arch/arm/crypto/Kconfig   |2 -
  arch/arm/crypto/Makefile  |8 +-
  arch/arm/crypto/sha512-armv4.pl   |  656 
  arch/arm/crypto/sha512-armv7-neon.S   |  455 -
 arch/arm/crypto/sha512-core.S_shipped | 1814 +
  arch/arm/crypto/sha512.h  |   14 +
  arch/arm/crypto/sha512_glue.c |  255 +
  arch/arm/crypto/sha512_neon_glue.c|  155 +--
  8 files changed, 2762 insertions(+), 597 deletions(-)
  create mode 100644 arch/arm/crypto/sha512-armv4.pl
  delete mode 100644 arch/arm/crypto/sha512-armv7-neon.S

Acked-by: Jussi Kivilinna jussi.kivili...@iki.fi

  create mode 100644 arch/arm/crypto/sha512-core.S_shipped
  create mode 100644 arch/arm/crypto/sha512.h
  create mode 100644 arch/arm/crypto/sha512_glue.c
 


Re: rfc4543 testvectors in testmgr.h kernel

2015-02-10 Thread Jussi Kivilinna
On 10.02.2015 18:22, Marcus Meissner wrote:
 Hi Jussi,
 
 We were trying to use rfc4543(gcm(aes)) in the kernel for FIPS mode,
 but the testvectors seem to fail.

You probably need to add '.fips_allowed = 1,' in testmgr.c for
rfc4543(gcm(aes)) to enable the algorithm in FIPS mode.
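
For reference, a minimal sketch of such an entry in the alg_test_descs[]
table of crypto/testmgr.c; the test-vector array names here are illustrative
and have to match whatever rfc4543 vectors are actually present in testmgr.h:

	}, {
		.alg = "rfc4543(gcm(aes))",
		.test = alg_test_aead,
		.fips_allowed = 1,
		.suite = {
			.aead = {
				.enc = {
					.vecs = aes_gcm_rfc4543_enc_tv_template,
					.count = AES_GCM_4543_ENC_TEST_VECTORS
				},
				.dec = {
					.vecs = aes_gcm_rfc4543_dec_tv_template,
					.count = AES_GCM_4543_DEC_TEST_VECTORS
				}
			}
		}
	}, {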

 
 Did you verify that they work? Are these the ones from page 18 of
 https://tools.ietf.org/html/draft-mcgrew-gcm-test-01? There the plaintext
 and AAD seem to be switched.

The rfc4543() wrapper constructs the AAD from '.assoc' and '.input'.
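
In other words, for a given test vector the data that GCM ends up
authenticating is roughly the following concatenation (a simplified sketch;
the real code works on scatterlists):

	/* GMAC/rfc4543: nothing is encrypted, everything is authenticated */
	memcpy(aad, vec->assoc, vec->alen);
	memcpy(aad + vec->alen, vec->input, vec->ilen);

so the draft's plaintext ends up inside the AAD that the wrapper builds.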

-Jussi

 
 Ciao, Marcus
 






Re: Kernel crypto API: cryptoperf performance measurement

2014-08-21 Thread Jussi Kivilinna

On 2014-08-20 21:14, Milan Broz wrote:
 On 08/20/2014 03:25 PM, Jussi Kivilinna wrote:
 One to four GB per second for XTS? 12 GB per second for AES CBC? Somehow
 that does not sound right.

 Agreed, those do not look correct... I wonder what happened there. On
 new run, I got more sane results:
 
 Which cryptsetup version are you using?
 
 There was a bug in that test on fast machines (fixed in 1.6.3, I hope :)

I had version 1.6.1 at hand.

 
 But anyway, it is not intended as rigorous speed test,
 it was intended for comparison of ciphers speed on particular machine.


True, but it's a nice, easy test compared to parsing the results from
tcrypt speed tests.

-Jussi

 The test basically tries to encrypt a 1MB block (or a multiple of this
 if the machine is too fast). It all runs through the kernel userspace
 crypto API interface.
 (Real FDE is always slower because it runs over 512-byte blocks.)
 
 Milan
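
For reference, the userspace crypto API interface mentioned above is AF_ALG.
A minimal sketch of that data path follows; the algorithm name, key and
buffer size are illustrative, and real code such as the cryptsetup benchmark
also passes the IV and the encrypt/decrypt direction as sendmsg() control
messages:

	#include <unistd.h>
	#include <sys/socket.h>
	#include <linux/if_alg.h>

	#ifndef SOL_ALG
	#define SOL_ALG 279
	#endif

	int main(void)
	{
		struct sockaddr_alg sa = {
			.salg_family = AF_ALG,
			.salg_type   = "skcipher",
			.salg_name   = "cbc(aes)",
		};
		char key[16] = { 0 };		/* dummy 128-bit key */
		static char buf[4096];		/* the benchmark pushes ~1MB through in chunks */
		int tfmfd, opfd;

		tfmfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
		bind(tfmfd, (struct sockaddr *)&sa, sizeof(sa));
		setsockopt(tfmfd, SOL_ALG, ALG_SET_KEY, key, sizeof(key));
		opfd = accept(tfmfd, NULL, 0);

		/* error handling and the ALG_SET_OP/ALG_SET_IV cmsg setup omitted */
		send(opfd, buf, sizeof(buf), 0);
		read(opfd, buf, sizeof(buf));

		close(opfd);
		close(tfmfd);
		return 0;
	}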
 


Re: Kernel crypto API: cryptoperf performance measurement

2014-08-20 Thread Jussi Kivilinna
Hello,

On 2014-08-19 21:23, Stephan Mueller wrote:
 Am Dienstag, 19. August 2014, 10:17:36 schrieb Jussi Kivilinna:
 
 Hi Jussi,
 
 Hello,

 On 2014-08-17 18:55, Stephan Mueller wrote:
 Hi,

 during playing around with the kernel crypto API, I implemented a
 performance measurement tool kit for the various kernel crypto API cipher
 types. The cryptoperf tool kit is provided in [1].

 Comments are welcome.

 Your results are quite slow compared to, for example cryptsetup
 benchmark, which uses kernel crypto from userspace.

 With Intel i5-2450M (turbo enabled), I get:

 #  Algorithm | Key |  Encryption |  Decryption
  aes-cbc   128b   524,0 MiB/s  11909,1 MiB/s
  serpent-cbc   128b60,9 MiB/s   219,4 MiB/s
  twofish-cbc   128b   143,4 MiB/s   240,3 MiB/s
  aes-cbc   256b   330,4 MiB/s  1242,8 MiB/s
  serpent-cbc   256b66,1 MiB/s   220,3 MiB/s
  twofish-cbc   256b   143,5 MiB/s   221,8 MiB/s
  aes-xts   256b  1268,7 MiB/s  4193,0 MiB/s
  serpent-xts   256b   234,8 MiB/s   224,6 MiB/s
  twofish-xts   256b   253,5 MiB/s   254,7 MiB/s
  aes-xts   512b  2535,0 MiB/s  2945,0 MiB/s
  serpent-xts   512b   274,2 MiB/s   242,3 MiB/s
  twofish-xts   512b   250,0 MiB/s   245,8 MiB/s
 
 One to four GB per second for XTS? 12 GB per second for AES CBC? Somehow that 
 does not sound right.

Agreed, those do not look correct... I wonder what happened there. On
new run, I got more sane results:

#  Algorithm | Key |  Encryption |  Decryption
 aes-cbc   128b   139,1 MiB/s  1713,6 MiB/s
 serpent-cbc   128b62,2 MiB/s   232,9 MiB/s
 twofish-cbc   128b   116,3 MiB/s   243,7 MiB/s
 aes-cbc   256b   375,1 MiB/s  1159,4 MiB/s
 serpent-cbc   256b62,1 MiB/s   214,9 MiB/s
 twofish-cbc   256b   139,3 MiB/s   217,5 MiB/s
 aes-xts   256b  1296,4 MiB/s  1272,5 MiB/s
 serpent-xts   256b   283,3 MiB/s   275,6 MiB/s
 twofish-xts   256b   294,8 MiB/s   299,3 MiB/s
 aes-xts   512b   984,3 MiB/s   991,1 MiB/s
 serpent-xts   512b   227,7 MiB/s   220,6 MiB/s
 twofish-xts   512b   220,6 MiB/s   220,2 MiB/s

-Jussi


Re: Kernel crypto API: cryptoperf performance measurement

2014-08-19 Thread Jussi Kivilinna
Hello,

On 2014-08-17 18:55, Stephan Mueller wrote:
 Hi,
 
 during playing around with the kernel crypto API, I implemented a performance 
 measurement tool kit for the various kernel crypto API cipher types. The 
 cryptoperf tool kit is provided in [1].
 
 Comments are welcome.

Your results are quite slow compared to, for example cryptsetup
benchmark, which uses kernel crypto from userspace.

With Intel i5-2450M (turbo enabled), I get:

#  Algorithm | Key |  Encryption |  Decryption
 aes-cbc   128b   524,0 MiB/s  11909,1 MiB/s
 serpent-cbc   128b60,9 MiB/s   219,4 MiB/s
 twofish-cbc   128b   143,4 MiB/s   240,3 MiB/s
 aes-cbc   256b   330,4 MiB/s  1242,8 MiB/s
 serpent-cbc   256b66,1 MiB/s   220,3 MiB/s
 twofish-cbc   256b   143,5 MiB/s   221,8 MiB/s
 aes-xts   256b  1268,7 MiB/s  4193,0 MiB/s
 serpent-xts   256b   234,8 MiB/s   224,6 MiB/s
 twofish-xts   256b   253,5 MiB/s   254,7 MiB/s
 aes-xts   512b  2535,0 MiB/s  2945,0 MiB/s
 serpent-xts   512b   274,2 MiB/s   242,3 MiB/s
 twofish-xts   512b   250,0 MiB/s   245,8 MiB/s

 
 In general, the results are as expected, i.e. the assembler implementations 
 are faster than the pure C implementations. However, there are curious 
 results 
 which probably should be checked by the maintainers of the respective ciphers 
 (hoping that my tool works correctly ;-) ):
 
 ablkcipher
 --
 
 - cryptd is slower by factor 10 across the board
 
 blkcipher
 -
 
 - Blowfish x86_64 assembler together with the generic C block chaining modes 
 is significantly slower than Blowfish implemented in generic C
 
 - Blowfish x86_64 assembler in ECB is significantly slower than generic C 
 Blowfish ECB
 
 - Serpent assembler implementations are not significantly faster than generic 
 C implementations
 
 - AES-NI ECB, LRW, CTR is significantly slower than AES i586 assembler.
 
 - AES-NI ECB, LRW, CTR is not significantly faster than AES generic C
 

Quite a few assembly implementations get their speed-up from processing
several block-cipher blocks in parallel, which only some modes of operation
allow (CTR, XTS, LRW, CBC decryption). For small buffer sizes, these
implementations fall back to the non-parallel implementation of the cipher.
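
A toy C illustration of the CBC-decryption case (this is not kernel code and
the block cipher is a dummy stand-in); the point is that the per-block cipher
calls are independent, so an assembly implementation can run several of them
interleaved or in SIMD registers:

	/* CBC decryption parallelizes because every plaintext block depends
	 * only on ciphertext that is already in hand; CTR, XTS and LRW have
	 * the same property, while CBC encryption is a serial chain. */
	#include <stddef.h>

	#define BLK 16

	/* placeholder for the real single-block cipher */
	static void dummy_decrypt_block(const unsigned char *in, unsigned char *out)
	{
		for (int j = 0; j < BLK; j++)
			out[j] = in[j] ^ 0xAA;
	}

	void cbc_decrypt(const unsigned char *ct, unsigned char *pt,
			 size_t nblocks, const unsigned char *iv)
	{
		unsigned char tmp[BLK];

		for (size_t i = 0; i < nblocks; i++) {
			const unsigned char *prev = (i == 0) ? iv : ct + (i - 1) * BLK;

			/* independent across i: this is the call that asm code
			 * batches, e.g. 8 blocks at a time, falling back to
			 * one-at-a-time for short inputs */
			dummy_decrypt_block(ct + i * BLK, tmp);

			for (int j = 0; j < BLK; j++)
				pt[i * BLK + j] = tmp[j] ^ prev[j];
		}
	}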

-Jussi

 rng
 ---
 
 - The ANSI X9.31 RNG seems to work massively faster than the underlying AES 
 cipher (by about a factor of 5). I am unsure about the cause of this.
 
 
 Caveat
 --
 
 Please note that there is one small error which I am unsure how to fix it as 
 documented in the TODO file.
 
 [1] http://www.chronox.de/cryptoperf.html
 


Re: [PATCH] [v3] crypto: sha512: add ARM NEON implementation

2014-07-29 Thread Jussi Kivilinna
On 29.07.2014 15:35, Ard Biesheuvel wrote:
 On 30 June 2014 18:39, Jussi Kivilinna jussi.kivili...@iki.fi wrote:
 This patch adds ARM NEON assembly implementation of SHA-512 and SHA-384
 algorithms.

 tcrypt benchmark results on Cortex-A8, sha512-generic vs sha512-neon-asm:

 block-size  bytes/update  old-vs-new
         16            16       2.99x
         64            16       2.67x
         64            64       3.00x
        256            16       2.64x
        256            64       3.06x
        256           256       3.33x
       1024            16       2.53x
       1024           256       3.39x
       1024          1024       3.52x
       2048            16       2.50x
       2048           256       3.41x
       2048          1024       3.54x
       2048          2048       3.57x
       4096            16       2.49x
       4096           256       3.42x
       4096          1024       3.56x
       4096          4096       3.59x
       8192            16       2.48x
       8192           256       3.42x
       8192          1024       3.56x
       8192          4096       3.60x
       8192          8192       3.60x

 Acked-by: Ard Biesheuvel ard.biesheu...@linaro.org
 Tested-by: Ard Biesheuvel ard.biesheu...@linaro.org
 Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi

 ---

 Changes in v2:
  - Use ENTRY/ENDPROC
  - Don't provide Thumb2 version

 v3:
  - Changelog moved below '---'
 
 Hi Jussi,
 
 What is the status of these patches?
 Have you sent them to Russell's patch tracker?


I sent them to the patch tracker a moment ago. Thanks for the reminder.

-Jussi



[PATCH 2/2] [v3] crypto: sha1: add ARM NEON implementation

2014-06-30 Thread Jussi Kivilinna
This patch adds ARM NEON assembly implementation of SHA-1 algorithm.

tcrypt benchmark results on Cortex-A8, sha1-arm-asm vs sha1-neon-asm:

block-size  bytes/update  old-vs-new
        16            16       1.04x
        64            16       1.02x
        64            64       1.05x
       256            16       1.03x
       256            64       1.04x
       256           256       1.30x
      1024            16       1.03x
      1024           256       1.36x
      1024          1024       1.52x
      2048            16       1.03x
      2048           256       1.39x
      2048          1024       1.55x
      2048          2048       1.59x
      4096            16       1.03x
      4096           256       1.40x
      4096          1024       1.57x
      4096          4096       1.62x
      8192            16       1.03x
      8192           256       1.40x
      8192          1024       1.58x
      8192          4096       1.63x
      8192          8192       1.63x

Acked-by: Ard Biesheuvel ard.biesheu...@linaro.org
Tested-by: Ard Biesheuvel ard.biesheu...@linaro.org
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi

---

Changes in v2:
 - Use ENTRY/ENDPROC
 - Don't provide Thumb2 version
 - Move constants to .text section
 - Further tweaks to implementation for ~10% speed-up.

v3:
 - Changelog moved below '---'
---
 arch/arm/crypto/Makefile   |2 
 arch/arm/crypto/sha1-armv7-neon.S  |  634 
 arch/arm/crypto/sha1_glue.c|8 
 arch/arm/crypto/sha1_neon_glue.c   |  197 +++
 arch/arm/include/asm/crypto/sha1.h |   10 +
 crypto/Kconfig |   11 +
 6 files changed, 859 insertions(+), 3 deletions(-)
 create mode 100644 arch/arm/crypto/sha1-armv7-neon.S
 create mode 100644 arch/arm/crypto/sha1_neon_glue.c
 create mode 100644 arch/arm/include/asm/crypto/sha1.h

diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index 81cda39..374956d 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -5,10 +5,12 @@
 obj-$(CONFIG_CRYPTO_AES_ARM) += aes-arm.o
 obj-$(CONFIG_CRYPTO_AES_ARM_BS) += aes-arm-bs.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
+obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
 
 aes-arm-y  := aes-armv4.o aes_glue.o
 aes-arm-bs-y   := aesbs-core.o aesbs-glue.o
 sha1-arm-y := sha1-armv4-large.o sha1_glue.o
+sha1-arm-neon-y:= sha1-armv7-neon.o sha1_neon_glue.o
 
quiet_cmd_perl = PERL    $@
      cmd_perl = $(PERL) $(<) > $(@)
diff --git a/arch/arm/crypto/sha1-armv7-neon.S 
b/arch/arm/crypto/sha1-armv7-neon.S
new file mode 100644
index 000..50013c0
--- /dev/null
+++ b/arch/arm/crypto/sha1-armv7-neon.S
@@ -0,0 +1,634 @@
+/* sha1-armv7-neon.S - ARM/NEON accelerated SHA-1 transform function
+ *
+ * Copyright © 2013-2014 Jussi Kivilinna jussi.kivili...@iki.fi
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+#include <linux/linkage.h>
+
+
+.syntax unified
+.code   32
+.fpu neon
+
+.text
+
+
+/* Context structure */
+
+#define state_h0 0
+#define state_h1 4
+#define state_h2 8
+#define state_h3 12
+#define state_h4 16
+
+
+/* Constants */
+
+#define K1  0x5A827999
+#define K2  0x6ED9EBA1
+#define K3  0x8F1BBCDC
+#define K4  0xCA62C1D6
+.align 4
+.LK_VEC:
+.LK1:  .long K1, K1, K1, K1
+.LK2:  .long K2, K2, K2, K2
+.LK3:  .long K3, K3, K3, K3
+.LK4:  .long K4, K4, K4, K4
+
+
+/* Register macros */
+
+#define RSTATE r0
+#define RDATA r1
+#define RNBLKS r2
+#define ROLDSTACK r3
+#define RWK lr
+
+#define _a r4
+#define _b r5
+#define _c r6
+#define _d r7
+#define _e r8
+
+#define RT0 r9
+#define RT1 r10
+#define RT2 r11
+#define RT3 r12
+
+#define W0 q0
+#define W1 q1
+#define W2 q2
+#define W3 q3
+#define W4 q4
+#define W5 q5
+#define W6 q6
+#define W7 q7
+
+#define tmp0 q8
+#define tmp1 q9
+#define tmp2 q10
+#define tmp3 q11
+
+#define qK1 q12
+#define qK2 q13
+#define qK3 q14
+#define qK4 q15
+
+
+/* Round function macros. */
+
+#define WK_offs(i) (((i) & 15) * 4)
+
+#define _R_F1(a,b,c,d,e,i,pre1,pre2,pre3,i16,\
+ W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28) \
+   ldr RT3, [sp, WK_offs(i)]; \
+   pre1(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+   bic RT0, d, b; \
+   add e, e, a, ror #(32 - 5); \
+   and RT1, c, b; \
+   pre2(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+   add RT0, RT0, RT3; \
+   add e, e, RT1; \
+   ror b, #(32 - 30); \
+   pre3(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+   add e, e, RT0;
+
+#define _R_F2(a,b,c,d,e,i,pre1,pre2,pre3,i16,\
+ W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28) \
+   ldr RT3, [sp, WK_offs(i)]; \
+   pre1(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24

[PATCH] [v3] crypto: sha512: add ARM NEON implementation

2014-06-30 Thread Jussi Kivilinna
This patch adds ARM NEON assembly implementation of SHA-512 and SHA-384
algorithms.

tcrypt benchmark results on Cortex-A8, sha512-generic vs sha512-neon-asm:

block-size  bytes/update  old-vs-new
        16            16       2.99x
        64            16       2.67x
        64            64       3.00x
       256            16       2.64x
       256            64       3.06x
       256           256       3.33x
      1024            16       2.53x
      1024           256       3.39x
      1024          1024       3.52x
      2048            16       2.50x
      2048           256       3.41x
      2048          1024       3.54x
      2048          2048       3.57x
      4096            16       2.49x
      4096           256       3.42x
      4096          1024       3.56x
      4096          4096       3.59x
      8192            16       2.48x
      8192           256       3.42x
      8192          1024       3.56x
      8192          4096       3.60x
      8192          8192       3.60x

Acked-by: Ard Biesheuvel ard.biesheu...@linaro.org
Tested-by: Ard Biesheuvel ard.biesheu...@linaro.org
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi

---

Changes in v2:
 - Use ENTRY/ENDPROC
 - Don't provide Thumb2 version

v3:
 - Changelog moved below '---'
---
 arch/arm/crypto/Makefile|2 
 arch/arm/crypto/sha512-armv7-neon.S |  455 +++
 arch/arm/crypto/sha512_neon_glue.c  |  305 +++
 crypto/Kconfig  |   15 +
 4 files changed, 777 insertions(+)
 create mode 100644 arch/arm/crypto/sha512-armv7-neon.S
 create mode 100644 arch/arm/crypto/sha512_neon_glue.c

diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index 374956d..b48fa34 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -6,11 +6,13 @@ obj-$(CONFIG_CRYPTO_AES_ARM) += aes-arm.o
 obj-$(CONFIG_CRYPTO_AES_ARM_BS) += aes-arm-bs.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
+obj-$(CONFIG_CRYPTO_SHA512_ARM_NEON) += sha512-arm-neon.o
 
 aes-arm-y  := aes-armv4.o aes_glue.o
 aes-arm-bs-y   := aesbs-core.o aesbs-glue.o
 sha1-arm-y := sha1-armv4-large.o sha1_glue.o
 sha1-arm-neon-y:= sha1-armv7-neon.o sha1_neon_glue.o
+sha512-arm-neon-y := sha512-armv7-neon.o sha512_neon_glue.o
 
quiet_cmd_perl = PERL    $@
      cmd_perl = $(PERL) $(<) > $(@)
diff --git a/arch/arm/crypto/sha512-armv7-neon.S 
b/arch/arm/crypto/sha512-armv7-neon.S
new file mode 100644
index 000..fe99472
--- /dev/null
+++ b/arch/arm/crypto/sha512-armv7-neon.S
@@ -0,0 +1,455 @@
+/* sha512-armv7-neon.S  -  ARM/NEON assembly implementation of SHA-512 
transform
+ *
+ * Copyright © 2013-2014 Jussi Kivilinna jussi.kivili...@iki.fi
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+#include <linux/linkage.h>
+
+
+.syntax unified
+.code   32
+.fpu neon
+
+.text
+
+/* structure of SHA512_CONTEXT */
+#define hd_a 0
+#define hd_b ((hd_a) + 8)
+#define hd_c ((hd_b) + 8)
+#define hd_d ((hd_c) + 8)
+#define hd_e ((hd_d) + 8)
+#define hd_f ((hd_e) + 8)
+#define hd_g ((hd_f) + 8)
+
+/* register macros */
+#define RK %r2
+
+#define RA d0
+#define RB d1
+#define RC d2
+#define RD d3
+#define RE d4
+#define RF d5
+#define RG d6
+#define RH d7
+
+#define RT0 d8
+#define RT1 d9
+#define RT2 d10
+#define RT3 d11
+#define RT4 d12
+#define RT5 d13
+#define RT6 d14
+#define RT7 d15
+
+#define RT01q q4
+#define RT23q q5
+#define RT45q q6
+#define RT67q q7
+
+#define RW0 d16
+#define RW1 d17
+#define RW2 d18
+#define RW3 d19
+#define RW4 d20
+#define RW5 d21
+#define RW6 d22
+#define RW7 d23
+#define RW8 d24
+#define RW9 d25
+#define RW10 d26
+#define RW11 d27
+#define RW12 d28
+#define RW13 d29
+#define RW14 d30
+#define RW15 d31
+
+#define RW01q q8
+#define RW23q q9
+#define RW45q q10
+#define RW67q q11
+#define RW89q q12
+#define RW1011q q13
+#define RW1213q q14
+#define RW1415q q15
+
+/***
+ * ARM assembly implementation of sha512 transform
+ ***/
+#define rounds2_0_63(ra, rb, rc, rd, re, rf, rg, rh, rw0, rw1, rw01q, rw2, \
+ rw23q, rw1415q, rw9, rw10, interleave_op, arg1) \
+   /* t1 = h + Sum1 (e) + Ch (e, f, g) + k[t] + w[t]; */ \
+   vshr.u64 RT2, re, #14; \
+   vshl.u64 RT3, re, #64 - 14; \
+   interleave_op(arg1); \
+   vshr.u64 RT4, re, #18; \
+   vshl.u64 RT5, re, #64 - 18; \
+   vld1.64 {RT0}, [RK]!; \
+   veor.64 RT23q, RT23q, RT45q; \
+   vshr.u64 RT4, re, #41; \
+   vshl.u64 RT5, re, #64 - 41; \
+   vadd.u64 RT0, RT0, rw0; \
+   veor.64 RT23q, RT23q, RT45q

[PATCH 1/2] [v3] crypto: sha1/ARM: make use of common SHA-1 structures

2014-06-30 Thread Jussi Kivilinna
Common SHA-1 structures are defined in crypto/sha.h for code sharing.

This patch changes SHA-1/ARM glue code to use these structures.

Acked-by: Ard Biesheuvel ard.biesheu...@linaro.org
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/arm/crypto/sha1_glue.c |   50 +++
 1 file changed, 22 insertions(+), 28 deletions(-)

diff --git a/arch/arm/crypto/sha1_glue.c b/arch/arm/crypto/sha1_glue.c
index 76cd976..c494e57 100644
--- a/arch/arm/crypto/sha1_glue.c
+++ b/arch/arm/crypto/sha1_glue.c
@@ -24,31 +24,25 @@
 #include <crypto/sha.h>
 #include <asm/byteorder.h>
 
-struct SHA1_CTX {
-   uint32_t h0,h1,h2,h3,h4;
-   u64 count;
-   u8 data[SHA1_BLOCK_SIZE];
-};
 
-asmlinkage void sha1_block_data_order(struct SHA1_CTX *digest,
+asmlinkage void sha1_block_data_order(u32 *digest,
const unsigned char *data, unsigned int rounds);
 
 
 static int sha1_init(struct shash_desc *desc)
 {
-   struct SHA1_CTX *sctx = shash_desc_ctx(desc);
-   memset(sctx, 0, sizeof(*sctx));
-   sctx->h0 = SHA1_H0;
-   sctx->h1 = SHA1_H1;
-   sctx->h2 = SHA1_H2;
-   sctx->h3 = SHA1_H3;
-   sctx->h4 = SHA1_H4;
+   struct sha1_state *sctx = shash_desc_ctx(desc);
+
+   *sctx = (struct sha1_state){
+   .state = { SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4 },
+   };
+
return 0;
 }
 
 
-static int __sha1_update(struct SHA1_CTX *sctx, const u8 *data,
-  unsigned int len, unsigned int partial)
+static int __sha1_update(struct sha1_state *sctx, const u8 *data,
+unsigned int len, unsigned int partial)
 {
unsigned int done = 0;
 
@@ -56,17 +50,17 @@ static int __sha1_update(struct SHA1_CTX *sctx, const u8 
*data,
 
if (partial) {
done = SHA1_BLOCK_SIZE - partial;
-   memcpy(sctx->data + partial, data, done);
-   sha1_block_data_order(sctx, sctx->data, 1);
+   memcpy(sctx->buffer + partial, data, done);
+   sha1_block_data_order(sctx->state, sctx->buffer, 1);
 }
 
 if (len - done >= SHA1_BLOCK_SIZE) {
 const unsigned int rounds = (len - done) / SHA1_BLOCK_SIZE;
-   sha1_block_data_order(sctx, data + done, rounds);
+   sha1_block_data_order(sctx->state, data + done, rounds);
 done += rounds * SHA1_BLOCK_SIZE;
 }
 
-   memcpy(sctx->data, data + done, len - done);
+   memcpy(sctx->buffer, data + done, len - done);
return 0;
 }
 
@@ -74,14 +68,14 @@ static int __sha1_update(struct SHA1_CTX *sctx, const u8 
*data,
 static int sha1_update(struct shash_desc *desc, const u8 *data,
 unsigned int len)
 {
-   struct SHA1_CTX *sctx = shash_desc_ctx(desc);
+   struct sha1_state *sctx = shash_desc_ctx(desc);
 unsigned int partial = sctx->count % SHA1_BLOCK_SIZE;
 int res;
 
 /* Handle the fast case right here */
 if (partial + len < SHA1_BLOCK_SIZE) {
 sctx->count += len;
-   memcpy(sctx->data + partial, data, len);
+   memcpy(sctx->buffer + partial, data, len);
return 0;
}
res = __sha1_update(sctx, data, len, partial);
@@ -92,7 +86,7 @@ static int sha1_update(struct shash_desc *desc, const u8 
*data,
 /* Add padding and return the message digest. */
 static int sha1_final(struct shash_desc *desc, u8 *out)
 {
-   struct SHA1_CTX *sctx = shash_desc_ctx(desc);
+   struct sha1_state *sctx = shash_desc_ctx(desc);
unsigned int i, index, padlen;
__be32 *dst = (__be32 *)out;
__be64 bits;
@@ -106,7 +100,7 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
/* We need to fill a whole block for __sha1_update() */
 if (padlen >= 56) {
 sctx->count += padlen;
-   memcpy(sctx->data + index, padding, padlen);
+   memcpy(sctx->buffer + index, padding, padlen);
 } else {
 __sha1_update(sctx, padding, padlen, index);
 }
@@ -114,7 +108,7 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
 
 /* Store state in digest */
 for (i = 0; i < 5; i++)
-   dst[i] = cpu_to_be32(((u32 *)sctx)[i]);
+   dst[i] = cpu_to_be32(sctx->state[i]);
 
/* Wipe context */
memset(sctx, 0, sizeof(*sctx));
@@ -124,7 +118,7 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
 
 static int sha1_export(struct shash_desc *desc, void *out)
 {
-   struct SHA1_CTX *sctx = shash_desc_ctx(desc);
+   struct sha1_state *sctx = shash_desc_ctx(desc);
memcpy(out, sctx, sizeof(*sctx));
return 0;
 }
@@ -132,7 +126,7 @@ static int sha1_export(struct shash_desc *desc, void *out)
 
 static int sha1_import(struct shash_desc *desc, const void *in)
 {
-   struct SHA1_CTX *sctx = shash_desc_ctx(desc);
+   struct sha1_state *sctx = shash_desc_ctx(desc

Re: [PATCH] [v3] crypto: sha512: add ARM NEON implementation

2014-06-30 Thread Jussi Kivilinna
On 30.06.2014 21:13, Ard Biesheuvel wrote:
 On 30 June 2014 18:39, Jussi Kivilinna jussi.kivili...@iki.fi wrote:
 This patch adds ARM NEON assembly implementation of SHA-512 and SHA-384
 algorithms.

 tcrypt benchmark results on Cortex-A8, sha512-generic vs sha512-neon-asm:

 block-size  bytes/update  old-vs-new
         16            16       2.99x
         64            16       2.67x
         64            64       3.00x
        256            16       2.64x
        256            64       3.06x
        256           256       3.33x
       1024            16       2.53x
       1024           256       3.39x
       1024          1024       3.52x
       2048            16       2.50x
       2048           256       3.41x
       2048          1024       3.54x
       2048          2048       3.57x
       4096            16       2.49x
       4096           256       3.42x
       4096          1024       3.56x
       4096          4096       3.59x
       8192            16       2.48x
       8192           256       3.42x
       8192          1024       3.56x
       8192          4096       3.60x
       8192          8192       3.60x

 Acked-by: Ard Biesheuvel ard.biesheu...@linaro.org
 Tested-by: Ard Biesheuvel ard.biesheu...@linaro.org
 Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi

 
 Likewise for this one: if nobody has any more comments, this should go
 into the patch system.
 
 One remaining question though: is this code (and the SHA1 code) known
 to be broken for big endian or just untested?
 

Untested and probably broken, so I've disabled it when CPU_BIG_ENDIAN=y.

-Jussi

 Thanks,
 Ard.
 
 ---

 Changes in v2:
  - Use ENTRY/ENDPROC
  - Don't provide Thumb2 version

 v3:
  - Changelog moved below '---'
 ---
  arch/arm/crypto/Makefile|2
 arch/arm/crypto/sha512-armv7-neon.S |  455 +++
  arch/arm/crypto/sha512_neon_glue.c  |  305 +++
  crypto/Kconfig  |   15 +
  4 files changed, 777 insertions(+)
  create mode 100644 arch/arm/crypto/sha512-armv7-neon.S
  create mode 100644 arch/arm/crypto/sha512_neon_glue.c

 diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
 index 374956d..b48fa34 100644
 --- a/arch/arm/crypto/Makefile
 +++ b/arch/arm/crypto/Makefile
 @@ -6,11 +6,13 @@ obj-$(CONFIG_CRYPTO_AES_ARM) += aes-arm.o
  obj-$(CONFIG_CRYPTO_AES_ARM_BS) += aes-arm-bs.o
  obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
  obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
 +obj-$(CONFIG_CRYPTO_SHA512_ARM_NEON) += sha512-arm-neon.o

  aes-arm-y  := aes-armv4.o aes_glue.o
  aes-arm-bs-y   := aesbs-core.o aesbs-glue.o
  sha1-arm-y := sha1-armv4-large.o sha1_glue.o
  sha1-arm-neon-y:= sha1-armv7-neon.o sha1_neon_glue.o
 +sha512-arm-neon-y := sha512-armv7-neon.o sha512_neon_glue.o

 quiet_cmd_perl = PERL    $@
       cmd_perl = $(PERL) $(<) > $(@)
 diff --git a/arch/arm/crypto/sha512-armv7-neon.S 
 b/arch/arm/crypto/sha512-armv7-neon.S
 new file mode 100644
 index 000..fe99472
 --- /dev/null
 +++ b/arch/arm/crypto/sha512-armv7-neon.S
 @@ -0,0 +1,455 @@
 +/* sha512-armv7-neon.S  -  ARM/NEON assembly implementation of SHA-512 
 transform
 + *
 + * Copyright © 2013-2014 Jussi Kivilinna jussi.kivili...@iki.fi
 + *
 + * This program is free software; you can redistribute it and/or modify it
 + * under the terms of the GNU General Public License as published by the 
 Free
 + * Software Foundation; either version 2 of the License, or (at your option)
 + * any later version.
 + */
 +
 +#include <linux/linkage.h>
 +
 +
 +.syntax unified
 +.code   32
 +.fpu neon
 +
 +.text
 +
 +/* structure of SHA512_CONTEXT */
 +#define hd_a 0
 +#define hd_b ((hd_a) + 8)
 +#define hd_c ((hd_b) + 8)
 +#define hd_d ((hd_c) + 8)
 +#define hd_e ((hd_d) + 8)
 +#define hd_f ((hd_e) + 8)
 +#define hd_g ((hd_f) + 8)
 +
 +/* register macros */
 +#define RK %r2
 +
 +#define RA d0
 +#define RB d1
 +#define RC d2
 +#define RD d3
 +#define RE d4
 +#define RF d5
 +#define RG d6
 +#define RH d7
 +
 +#define RT0 d8
 +#define RT1 d9
 +#define RT2 d10
 +#define RT3 d11
 +#define RT4 d12
 +#define RT5 d13
 +#define RT6 d14
 +#define RT7 d15
 +
 +#define RT01q q4
 +#define RT23q q5
 +#define RT45q q6
 +#define RT67q q7
 +
 +#define RW0 d16
 +#define RW1 d17
 +#define RW2 d18
 +#define RW3 d19
 +#define RW4 d20
 +#define RW5 d21
 +#define RW6 d22
 +#define RW7 d23
 +#define RW8 d24
 +#define RW9 d25
 +#define RW10 d26
 +#define RW11 d27
 +#define RW12 d28
 +#define RW13 d29
 +#define RW14 d30
 +#define RW15 d31
 +
 +#define RW01q q8
 +#define RW23q q9
 +#define RW45q q10
 +#define RW67q q11
 +#define RW89q q12
 +#define RW1011q q13
 +#define RW1213q q14
 +#define RW1415q q15
 +
 +/***
 + * ARM assembly implementation of sha512 transform
 + ***/
 +#define rounds2_0_63

Re: [PATCH 2/2] crypto: sha1: add ARM NEON implementation

2014-06-29 Thread Jussi Kivilinna
On 28.06.2014 23:07, Ard Biesheuvel wrote:
 Hi Jussi,
 
 On 28 June 2014 12:40, Jussi Kivilinna jussi.kivili...@iki.fi wrote:
 This patch adds ARM NEON assembly implementation of SHA-1 algorithm.

 tcrypt benchmark results on Cortex-A8, sha1-arm-asm vs sha1-neon-asm:

 block-size  bytes/update  old-vs-new
         16            16       1.06x
         64            16       1.05x
         64            64       1.09x
        256            16       1.04x
        256            64       1.11x
        256           256       1.28x
       1024            16       1.04x
       1024           256       1.34x
       1024          1024       1.42x
       2048            16       1.04x
       2048           256       1.35x
       2048          1024       1.44x
       2048          2048       1.46x
       4096            16       1.04x
       4096           256       1.36x
       4096          1024       1.45x
       4096          4096       1.48x
       8192            16       1.04x
       8192           256       1.36x
       8192          1024       1.46x
       8192          4096       1.49x
       8192          8192       1.49x

 
 This is a nice result: about the same speedup as OpenSSL when
 comparing the ALU asm implementation with the NEON.
 
 Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
 ---
  arch/arm/crypto/Makefile   |2
  arch/arm/crypto/sha1-armv7-neon.S  |  635 
 
  arch/arm/crypto/sha1_glue.c|8
  arch/arm/crypto/sha1_neon_glue.c   |  197 +++
  arch/arm/include/asm/crypto/sha1.h |   10 +
  crypto/Kconfig |   11 +
  6 files changed, 860 insertions(+), 3 deletions(-)
  create mode 100644 arch/arm/crypto/sha1-armv7-neon.S
  create mode 100644 arch/arm/crypto/sha1_neon_glue.c
  create mode 100644 arch/arm/include/asm/crypto/sha1.h

 diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
 index 81cda39..374956d 100644
 --- a/arch/arm/crypto/Makefile
 +++ b/arch/arm/crypto/Makefile
 @@ -5,10 +5,12 @@
  obj-$(CONFIG_CRYPTO_AES_ARM) += aes-arm.o
  obj-$(CONFIG_CRYPTO_AES_ARM_BS) += aes-arm-bs.o
  obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
 +obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o

  aes-arm-y  := aes-armv4.o aes_glue.o
  aes-arm-bs-y   := aesbs-core.o aesbs-glue.o
  sha1-arm-y := sha1-armv4-large.o sha1_glue.o
 +sha1-arm-neon-y:= sha1-armv7-neon.o sha1_neon_glue.o

 quiet_cmd_perl = PERL    $@
       cmd_perl = $(PERL) $(<) > $(@)
 diff --git a/arch/arm/crypto/sha1-armv7-neon.S 
 b/arch/arm/crypto/sha1-armv7-neon.S
 new file mode 100644
 index 000..beb1ed1
 --- /dev/null
 +++ b/arch/arm/crypto/sha1-armv7-neon.S
 @@ -0,0 +1,635 @@
 +/* sha1-armv7-neon.S - ARM/NEON accelerated SHA-1 transform function
 + *
 + * Copyright © 2013-2014 Jussi Kivilinna jussi.kivili...@iki.fi
 + *
 + * This program is free software; you can redistribute it and/or modify it
 + * under the terms of the GNU General Public License as published by the 
 Free
 + * Software Foundation; either version 2 of the License, or (at your option)
 + * any later version.
 + */
 +
 +.syntax unified
 +#ifdef __thumb2__
 +.thumb
 +#else
 +.code   32
 +#endif
 
 This is all NEON code, which has no size benefit from being assembled
 as Thumb-2. (NEON instructions are 4 bytes in either case)
 If we drop the Thumb-2 versions, there's one less version to test.
 

Ok, I'll drop the .thumb part for both SHA1 and SHA512.

 +.fpu neon
 +
 +.data
 +
 +#define GET_DATA_POINTER(reg, name, rtmp) ldr reg, =name
 +
 [...]
 +.align 4
 +.LK_VEC:
 +.LK1:  .long K1, K1, K1, K1
 +.LK2:  .long K2, K2, K2, K2
 +.LK3:  .long K3, K3, K3, K3
 +.LK4:  .long K4, K4, K4, K4
 
 If you are going to put these constants in a different section, they
 belong in .rodata not .data.
 But why not just keep them in .text? In that case, you can replace the
 above 'ldr reg, =name' with 'adr reg, name' (or adrl if required) and
 get rid of the .ltorg and the literal pool.
 

Ok, I'll move these to .text.

Actually, I realized that these values can be loaded into still-free NEON
registers for an additional speed-up.

 +/*
 + * Transform nblks*64 bytes (nblks*16 32-bit words) at DATA.
 + *
 + * unsigned int
 + * sha1_transform_neon (void *ctx, const unsigned char *data,
 + *  unsigned int nblks)
 + */
 +.align 3
 +.globl sha1_transform_neon
 +.type  sha1_transform_neon,%function;
 +
 +sha1_transform_neon:
 
 ENTRY(sha1_transform_neon) [and matching ENDPROC() below]

Sure.

-Jussi



[PATCH 2/2] [v2] crypto: sha1: add ARM NEON implementation

2014-06-29 Thread Jussi Kivilinna
This patch adds ARM NEON assembly implementation of SHA-1 algorithm.

tcrypt benchmark results on Cortex-A8, sha1-arm-asm vs sha1-neon-asm:

block-size  bytes/update  old-vs-new
        16            16       1.04x
        64            16       1.02x
        64            64       1.05x
       256            16       1.03x
       256            64       1.04x
       256           256       1.30x
      1024            16       1.03x
      1024           256       1.36x
      1024          1024       1.52x
      2048            16       1.03x
      2048           256       1.39x
      2048          1024       1.55x
      2048          2048       1.59x
      4096            16       1.03x
      4096           256       1.40x
      4096          1024       1.57x
      4096          4096       1.62x
      8192            16       1.03x
      8192           256       1.40x
      8192          1024       1.58x
      8192          4096       1.63x
      8192          8192       1.63x

Changes in v2:
 - Use ENTRY/ENDPROC
 - Don't provide Thumb2 version
 - Move constants to .text section
 - Further tweaks to implementation for ~10% speed-up.

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/arm/crypto/Makefile   |2 
 arch/arm/crypto/sha1-armv7-neon.S  |  634 
 arch/arm/crypto/sha1_glue.c|8 
 arch/arm/crypto/sha1_neon_glue.c   |  197 +++
 arch/arm/include/asm/crypto/sha1.h |   10 +
 crypto/Kconfig |   11 +
 6 files changed, 859 insertions(+), 3 deletions(-)
 create mode 100644 arch/arm/crypto/sha1-armv7-neon.S
 create mode 100644 arch/arm/crypto/sha1_neon_glue.c
 create mode 100644 arch/arm/include/asm/crypto/sha1.h

diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index 81cda39..374956d 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -5,10 +5,12 @@
 obj-$(CONFIG_CRYPTO_AES_ARM) += aes-arm.o
 obj-$(CONFIG_CRYPTO_AES_ARM_BS) += aes-arm-bs.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
+obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
 
 aes-arm-y  := aes-armv4.o aes_glue.o
 aes-arm-bs-y   := aesbs-core.o aesbs-glue.o
 sha1-arm-y := sha1-armv4-large.o sha1_glue.o
+sha1-arm-neon-y:= sha1-armv7-neon.o sha1_neon_glue.o
 
quiet_cmd_perl = PERL    $@
      cmd_perl = $(PERL) $(<) > $(@)
diff --git a/arch/arm/crypto/sha1-armv7-neon.S 
b/arch/arm/crypto/sha1-armv7-neon.S
new file mode 100644
index 000..50013c0
--- /dev/null
+++ b/arch/arm/crypto/sha1-armv7-neon.S
@@ -0,0 +1,634 @@
+/* sha1-armv7-neon.S - ARM/NEON accelerated SHA-1 transform function
+ *
+ * Copyright © 2013-2014 Jussi Kivilinna jussi.kivili...@iki.fi
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+#include <linux/linkage.h>
+
+
+.syntax unified
+.code   32
+.fpu neon
+
+.text
+
+
+/* Context structure */
+
+#define state_h0 0
+#define state_h1 4
+#define state_h2 8
+#define state_h3 12
+#define state_h4 16
+
+
+/* Constants */
+
+#define K1  0x5A827999
+#define K2  0x6ED9EBA1
+#define K3  0x8F1BBCDC
+#define K4  0xCA62C1D6
+.align 4
+.LK_VEC:
+.LK1:  .long K1, K1, K1, K1
+.LK2:  .long K2, K2, K2, K2
+.LK3:  .long K3, K3, K3, K3
+.LK4:  .long K4, K4, K4, K4
+
+
+/* Register macros */
+
+#define RSTATE r0
+#define RDATA r1
+#define RNBLKS r2
+#define ROLDSTACK r3
+#define RWK lr
+
+#define _a r4
+#define _b r5
+#define _c r6
+#define _d r7
+#define _e r8
+
+#define RT0 r9
+#define RT1 r10
+#define RT2 r11
+#define RT3 r12
+
+#define W0 q0
+#define W1 q1
+#define W2 q2
+#define W3 q3
+#define W4 q4
+#define W5 q5
+#define W6 q6
+#define W7 q7
+
+#define tmp0 q8
+#define tmp1 q9
+#define tmp2 q10
+#define tmp3 q11
+
+#define qK1 q12
+#define qK2 q13
+#define qK3 q14
+#define qK4 q15
+
+
+/* Round function macros. */
+
+#define WK_offs(i) (((i) & 15) * 4)
+
+#define _R_F1(a,b,c,d,e,i,pre1,pre2,pre3,i16,\
+ W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28) \
+   ldr RT3, [sp, WK_offs(i)]; \
+   pre1(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+   bic RT0, d, b; \
+   add e, e, a, ror #(32 - 5); \
+   and RT1, c, b; \
+   pre2(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+   add RT0, RT0, RT3; \
+   add e, e, RT1; \
+   ror b, #(32 - 30); \
+   pre3(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+   add e, e, RT0;
+
+#define _R_F2(a,b,c,d,e,i,pre1,pre2,pre3,i16,\
+ W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28) \
+   ldr RT3, [sp, WK_offs(i)]; \
+   pre1(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+   eor RT0, d, b; \
+   add e, e, a, ror #(32 - 5); \
+   eor RT0, RT0, c; \
+   pre2(i16,W,W_m04,W_m08,W_m12

[PATCH 1/2] [v2] crypto: sha1/ARM: make use of common SHA-1 structures

2014-06-29 Thread Jussi Kivilinna
Common SHA-1 structures are defined in crypto/sha.h for code sharing.

This patch changes SHA-1/ARM glue code to use these structures.

Acked-by: Ard Biesheuvel ard.biesheu...@linaro.org
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/arm/crypto/sha1_glue.c |   50 +++
 1 file changed, 22 insertions(+), 28 deletions(-)

diff --git a/arch/arm/crypto/sha1_glue.c b/arch/arm/crypto/sha1_glue.c
index 76cd976..c494e57 100644
--- a/arch/arm/crypto/sha1_glue.c
+++ b/arch/arm/crypto/sha1_glue.c
@@ -24,31 +24,25 @@
 #include <crypto/sha.h>
 #include <asm/byteorder.h>
 
-struct SHA1_CTX {
-   uint32_t h0,h1,h2,h3,h4;
-   u64 count;
-   u8 data[SHA1_BLOCK_SIZE];
-};
 
-asmlinkage void sha1_block_data_order(struct SHA1_CTX *digest,
+asmlinkage void sha1_block_data_order(u32 *digest,
const unsigned char *data, unsigned int rounds);
 
 
 static int sha1_init(struct shash_desc *desc)
 {
-   struct SHA1_CTX *sctx = shash_desc_ctx(desc);
-   memset(sctx, 0, sizeof(*sctx));
-   sctx->h0 = SHA1_H0;
-   sctx->h1 = SHA1_H1;
-   sctx->h2 = SHA1_H2;
-   sctx->h3 = SHA1_H3;
-   sctx->h4 = SHA1_H4;
+   struct sha1_state *sctx = shash_desc_ctx(desc);
+
+   *sctx = (struct sha1_state){
+   .state = { SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4 },
+   };
+
return 0;
 }
 
 
-static int __sha1_update(struct SHA1_CTX *sctx, const u8 *data,
-  unsigned int len, unsigned int partial)
+static int __sha1_update(struct sha1_state *sctx, const u8 *data,
+unsigned int len, unsigned int partial)
 {
unsigned int done = 0;
 
@@ -56,17 +50,17 @@ static int __sha1_update(struct SHA1_CTX *sctx, const u8 
*data,
 
if (partial) {
done = SHA1_BLOCK_SIZE - partial;
-   memcpy(sctx->data + partial, data, done);
-   sha1_block_data_order(sctx, sctx->data, 1);
+   memcpy(sctx->buffer + partial, data, done);
+   sha1_block_data_order(sctx->state, sctx->buffer, 1);
 }
 
 if (len - done >= SHA1_BLOCK_SIZE) {
 const unsigned int rounds = (len - done) / SHA1_BLOCK_SIZE;
-   sha1_block_data_order(sctx, data + done, rounds);
+   sha1_block_data_order(sctx->state, data + done, rounds);
 done += rounds * SHA1_BLOCK_SIZE;
 }
 
-   memcpy(sctx->data, data + done, len - done);
+   memcpy(sctx->buffer, data + done, len - done);
return 0;
 }
 
@@ -74,14 +68,14 @@ static int __sha1_update(struct SHA1_CTX *sctx, const u8 
*data,
 static int sha1_update(struct shash_desc *desc, const u8 *data,
 unsigned int len)
 {
-   struct SHA1_CTX *sctx = shash_desc_ctx(desc);
+   struct sha1_state *sctx = shash_desc_ctx(desc);
 unsigned int partial = sctx->count % SHA1_BLOCK_SIZE;
 int res;
 
 /* Handle the fast case right here */
 if (partial + len < SHA1_BLOCK_SIZE) {
 sctx->count += len;
-   memcpy(sctx->data + partial, data, len);
+   memcpy(sctx->buffer + partial, data, len);
return 0;
}
res = __sha1_update(sctx, data, len, partial);
@@ -92,7 +86,7 @@ static int sha1_update(struct shash_desc *desc, const u8 
*data,
 /* Add padding and return the message digest. */
 static int sha1_final(struct shash_desc *desc, u8 *out)
 {
-   struct SHA1_CTX *sctx = shash_desc_ctx(desc);
+   struct sha1_state *sctx = shash_desc_ctx(desc);
unsigned int i, index, padlen;
__be32 *dst = (__be32 *)out;
__be64 bits;
@@ -106,7 +100,7 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
/* We need to fill a whole block for __sha1_update() */
 if (padlen >= 56) {
 sctx->count += padlen;
-   memcpy(sctx->data + index, padding, padlen);
+   memcpy(sctx->buffer + index, padding, padlen);
 } else {
 __sha1_update(sctx, padding, padlen, index);
 }
@@ -114,7 +108,7 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
 
 /* Store state in digest */
 for (i = 0; i < 5; i++)
-   dst[i] = cpu_to_be32(((u32 *)sctx)[i]);
+   dst[i] = cpu_to_be32(sctx->state[i]);
 
/* Wipe context */
memset(sctx, 0, sizeof(*sctx));
@@ -124,7 +118,7 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
 
 static int sha1_export(struct shash_desc *desc, void *out)
 {
-   struct SHA1_CTX *sctx = shash_desc_ctx(desc);
+   struct sha1_state *sctx = shash_desc_ctx(desc);
memcpy(out, sctx, sizeof(*sctx));
return 0;
 }
@@ -132,7 +126,7 @@ static int sha1_export(struct shash_desc *desc, void *out)
 
 static int sha1_import(struct shash_desc *desc, const void *in)
 {
-   struct SHA1_CTX *sctx = shash_desc_ctx(desc);
+   struct sha1_state *sctx = shash_desc_ctx(desc

[PATCH] [v2] crypto: sha512: add ARM NEON implementation

2014-06-29 Thread Jussi Kivilinna
This patch adds ARM NEON assembly implementation of SHA-512 and SHA-384
algorithms.

tcrypt benchmark results on Cortex-A8, sha512-generic vs sha512-neon-asm:

block-size  bytes/update  old-vs-new
        16            16       2.99x
        64            16       2.67x
        64            64       3.00x
       256            16       2.64x
       256            64       3.06x
       256           256       3.33x
      1024            16       2.53x
      1024           256       3.39x
      1024          1024       3.52x
      2048            16       2.50x
      2048           256       3.41x
      2048          1024       3.54x
      2048          2048       3.57x
      4096            16       2.49x
      4096           256       3.42x
      4096          1024       3.56x
      4096          4096       3.59x
      8192            16       2.48x
      8192           256       3.42x
      8192          1024       3.56x
      8192          4096       3.60x
      8192          8192       3.60x

Changes in v2:
 - Use ENTRY/ENDPROC
 - Don't provide Thumb2 version

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/arm/crypto/Makefile|2 
 arch/arm/crypto/sha512-armv7-neon.S |  455 +++
 arch/arm/crypto/sha512_neon_glue.c  |  305 +++
 crypto/Kconfig  |   15 +
 4 files changed, 777 insertions(+)
 create mode 100644 arch/arm/crypto/sha512-armv7-neon.S
 create mode 100644 arch/arm/crypto/sha512_neon_glue.c

diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index 374956d..b48fa34 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -6,11 +6,13 @@ obj-$(CONFIG_CRYPTO_AES_ARM) += aes-arm.o
 obj-$(CONFIG_CRYPTO_AES_ARM_BS) += aes-arm-bs.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
+obj-$(CONFIG_CRYPTO_SHA512_ARM_NEON) += sha512-arm-neon.o
 
 aes-arm-y  := aes-armv4.o aes_glue.o
 aes-arm-bs-y   := aesbs-core.o aesbs-glue.o
 sha1-arm-y := sha1-armv4-large.o sha1_glue.o
 sha1-arm-neon-y:= sha1-armv7-neon.o sha1_neon_glue.o
+sha512-arm-neon-y := sha512-armv7-neon.o sha512_neon_glue.o
 
quiet_cmd_perl = PERL    $@
      cmd_perl = $(PERL) $(<) > $(@)
diff --git a/arch/arm/crypto/sha512-armv7-neon.S 
b/arch/arm/crypto/sha512-armv7-neon.S
new file mode 100644
index 000..fe99472
--- /dev/null
+++ b/arch/arm/crypto/sha512-armv7-neon.S
@@ -0,0 +1,455 @@
+/* sha512-armv7-neon.S  -  ARM/NEON assembly implementation of SHA-512 
transform
+ *
+ * Copyright © 2013-2014 Jussi Kivilinna jussi.kivili...@iki.fi
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+#include <linux/linkage.h>
+
+
+.syntax unified
+.code   32
+.fpu neon
+
+.text
+
+/* structure of SHA512_CONTEXT */
+#define hd_a 0
+#define hd_b ((hd_a) + 8)
+#define hd_c ((hd_b) + 8)
+#define hd_d ((hd_c) + 8)
+#define hd_e ((hd_d) + 8)
+#define hd_f ((hd_e) + 8)
+#define hd_g ((hd_f) + 8)
+
+/* register macros */
+#define RK %r2
+
+#define RA d0
+#define RB d1
+#define RC d2
+#define RD d3
+#define RE d4
+#define RF d5
+#define RG d6
+#define RH d7
+
+#define RT0 d8
+#define RT1 d9
+#define RT2 d10
+#define RT3 d11
+#define RT4 d12
+#define RT5 d13
+#define RT6 d14
+#define RT7 d15
+
+#define RT01q q4
+#define RT23q q5
+#define RT45q q6
+#define RT67q q7
+
+#define RW0 d16
+#define RW1 d17
+#define RW2 d18
+#define RW3 d19
+#define RW4 d20
+#define RW5 d21
+#define RW6 d22
+#define RW7 d23
+#define RW8 d24
+#define RW9 d25
+#define RW10 d26
+#define RW11 d27
+#define RW12 d28
+#define RW13 d29
+#define RW14 d30
+#define RW15 d31
+
+#define RW01q q8
+#define RW23q q9
+#define RW45q q10
+#define RW67q q11
+#define RW89q q12
+#define RW1011q q13
+#define RW1213q q14
+#define RW1415q q15
+
+/***
+ * ARM assembly implementation of sha512 transform
+ ***/
+#define rounds2_0_63(ra, rb, rc, rd, re, rf, rg, rh, rw0, rw1, rw01q, rw2, \
+ rw23q, rw1415q, rw9, rw10, interleave_op, arg1) \
+   /* t1 = h + Sum1 (e) + Ch (e, f, g) + k[t] + w[t]; */ \
+   vshr.u64 RT2, re, #14; \
+   vshl.u64 RT3, re, #64 - 14; \
+   interleave_op(arg1); \
+   vshr.u64 RT4, re, #18; \
+   vshl.u64 RT5, re, #64 - 18; \
+   vld1.64 {RT0}, [RK]!; \
+   veor.64 RT23q, RT23q, RT45q; \
+   vshr.u64 RT4, re, #41; \
+   vshl.u64 RT5, re, #64 - 41; \
+   vadd.u64 RT0, RT0, rw0; \
+   veor.64 RT23q, RT23q, RT45q; \
+   vmov.64 RT7, re; \
+   veor.64 RT1, RT2, RT3; \
+   vbsl.64 RT7, rf, rg; \
+   \
+   vadd.u64 RT1, RT1, rh; \
+   vshr.u64 RT2

[PATCH 1/2] crypto: sha1/ARM: make use of common SHA-1 structures

2014-06-28 Thread Jussi Kivilinna
Common SHA-1 structures are defined in crypto/sha.h for code sharing.

This patch changes SHA-1/ARM glue code to use these structures.

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/arm/crypto/sha1_glue.c |   50 +++
 1 file changed, 22 insertions(+), 28 deletions(-)

diff --git a/arch/arm/crypto/sha1_glue.c b/arch/arm/crypto/sha1_glue.c
index 76cd976..c494e57 100644
--- a/arch/arm/crypto/sha1_glue.c
+++ b/arch/arm/crypto/sha1_glue.c
@@ -24,31 +24,25 @@
 #include <crypto/sha.h>
 #include <asm/byteorder.h>
 
-struct SHA1_CTX {
-   uint32_t h0,h1,h2,h3,h4;
-   u64 count;
-   u8 data[SHA1_BLOCK_SIZE];
-};
 
-asmlinkage void sha1_block_data_order(struct SHA1_CTX *digest,
+asmlinkage void sha1_block_data_order(u32 *digest,
const unsigned char *data, unsigned int rounds);
 
 
 static int sha1_init(struct shash_desc *desc)
 {
-   struct SHA1_CTX *sctx = shash_desc_ctx(desc);
-   memset(sctx, 0, sizeof(*sctx));
-   sctx->h0 = SHA1_H0;
-   sctx->h1 = SHA1_H1;
-   sctx->h2 = SHA1_H2;
-   sctx->h3 = SHA1_H3;
-   sctx->h4 = SHA1_H4;
+   struct sha1_state *sctx = shash_desc_ctx(desc);
+
+   *sctx = (struct sha1_state){
+   .state = { SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4 },
+   };
+
return 0;
 }
 
 
-static int __sha1_update(struct SHA1_CTX *sctx, const u8 *data,
-  unsigned int len, unsigned int partial)
+static int __sha1_update(struct sha1_state *sctx, const u8 *data,
+unsigned int len, unsigned int partial)
 {
unsigned int done = 0;
 
@@ -56,17 +50,17 @@ static int __sha1_update(struct SHA1_CTX *sctx, const u8 
*data,
 
if (partial) {
done = SHA1_BLOCK_SIZE - partial;
-   memcpy(sctx->data + partial, data, done);
-   sha1_block_data_order(sctx, sctx->data, 1);
+   memcpy(sctx->buffer + partial, data, done);
+   sha1_block_data_order(sctx->state, sctx->buffer, 1);
 }
 
 if (len - done >= SHA1_BLOCK_SIZE) {
 const unsigned int rounds = (len - done) / SHA1_BLOCK_SIZE;
-   sha1_block_data_order(sctx, data + done, rounds);
+   sha1_block_data_order(sctx->state, data + done, rounds);
 done += rounds * SHA1_BLOCK_SIZE;
 }
 
-   memcpy(sctx->data, data + done, len - done);
+   memcpy(sctx->buffer, data + done, len - done);
return 0;
 }
 
@@ -74,14 +68,14 @@ static int __sha1_update(struct SHA1_CTX *sctx, const u8 
*data,
 static int sha1_update(struct shash_desc *desc, const u8 *data,
 unsigned int len)
 {
-   struct SHA1_CTX *sctx = shash_desc_ctx(desc);
+   struct sha1_state *sctx = shash_desc_ctx(desc);
 unsigned int partial = sctx->count % SHA1_BLOCK_SIZE;
 int res;
 
 /* Handle the fast case right here */
 if (partial + len < SHA1_BLOCK_SIZE) {
 sctx->count += len;
-   memcpy(sctx->data + partial, data, len);
+   memcpy(sctx->buffer + partial, data, len);
return 0;
}
res = __sha1_update(sctx, data, len, partial);
@@ -92,7 +86,7 @@ static int sha1_update(struct shash_desc *desc, const u8 
*data,
 /* Add padding and return the message digest. */
 static int sha1_final(struct shash_desc *desc, u8 *out)
 {
-   struct SHA1_CTX *sctx = shash_desc_ctx(desc);
+   struct sha1_state *sctx = shash_desc_ctx(desc);
unsigned int i, index, padlen;
__be32 *dst = (__be32 *)out;
__be64 bits;
@@ -106,7 +100,7 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
/* We need to fill a whole block for __sha1_update() */
 if (padlen >= 56) {
 sctx->count += padlen;
-   memcpy(sctx->data + index, padding, padlen);
+   memcpy(sctx->buffer + index, padding, padlen);
 } else {
 __sha1_update(sctx, padding, padlen, index);
 }
@@ -114,7 +108,7 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
 
 /* Store state in digest */
 for (i = 0; i < 5; i++)
-   dst[i] = cpu_to_be32(((u32 *)sctx)[i]);
+   dst[i] = cpu_to_be32(sctx->state[i]);
 
/* Wipe context */
memset(sctx, 0, sizeof(*sctx));
@@ -124,7 +118,7 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
 
 static int sha1_export(struct shash_desc *desc, void *out)
 {
-   struct SHA1_CTX *sctx = shash_desc_ctx(desc);
+   struct sha1_state *sctx = shash_desc_ctx(desc);
memcpy(out, sctx, sizeof(*sctx));
return 0;
 }
@@ -132,7 +126,7 @@ static int sha1_export(struct shash_desc *desc, void *out)
 
 static int sha1_import(struct shash_desc *desc, const void *in)
 {
-   struct SHA1_CTX *sctx = shash_desc_ctx(desc);
+   struct sha1_state *sctx = shash_desc_ctx(desc);
memcpy(sctx, in, sizeof(*sctx

[PATCH 2/2] crypto: sha1: add ARM NEON implementation

2014-06-28 Thread Jussi Kivilinna
This patch adds ARM NEON assembly implementation of SHA-1 algorithm.

tcrypt benchmark results on Cortex-A8, sha1-arm-asm vs sha1-neon-asm:

block-size  bytes/update  old-vs-new
        16            16       1.06x
        64            16       1.05x
        64            64       1.09x
       256            16       1.04x
       256            64       1.11x
       256           256       1.28x
      1024            16       1.04x
      1024           256       1.34x
      1024          1024       1.42x
      2048            16       1.04x
      2048           256       1.35x
      2048          1024       1.44x
      2048          2048       1.46x
      4096            16       1.04x
      4096           256       1.36x
      4096          1024       1.45x
      4096          4096       1.48x
      8192            16       1.04x
      8192           256       1.36x
      8192          1024       1.46x
      8192          4096       1.49x
      8192          8192       1.49x

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/arm/crypto/Makefile   |2 
 arch/arm/crypto/sha1-armv7-neon.S  |  635 
 arch/arm/crypto/sha1_glue.c|8 
 arch/arm/crypto/sha1_neon_glue.c   |  197 +++
 arch/arm/include/asm/crypto/sha1.h |   10 +
 crypto/Kconfig |   11 +
 6 files changed, 860 insertions(+), 3 deletions(-)
 create mode 100644 arch/arm/crypto/sha1-armv7-neon.S
 create mode 100644 arch/arm/crypto/sha1_neon_glue.c
 create mode 100644 arch/arm/include/asm/crypto/sha1.h

diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index 81cda39..374956d 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -5,10 +5,12 @@
 obj-$(CONFIG_CRYPTO_AES_ARM) += aes-arm.o
 obj-$(CONFIG_CRYPTO_AES_ARM_BS) += aes-arm-bs.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
+obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
 
 aes-arm-y  := aes-armv4.o aes_glue.o
 aes-arm-bs-y   := aesbs-core.o aesbs-glue.o
 sha1-arm-y := sha1-armv4-large.o sha1_glue.o
+sha1-arm-neon-y:= sha1-armv7-neon.o sha1_neon_glue.o
 
quiet_cmd_perl = PERL    $@
      cmd_perl = $(PERL) $(<) > $(@)
diff --git a/arch/arm/crypto/sha1-armv7-neon.S 
b/arch/arm/crypto/sha1-armv7-neon.S
new file mode 100644
index 000..beb1ed1
--- /dev/null
+++ b/arch/arm/crypto/sha1-armv7-neon.S
@@ -0,0 +1,635 @@
+/* sha1-armv7-neon.S - ARM/NEON accelerated SHA-1 transform function
+ *
+ * Copyright © 2013-2014 Jussi Kivilinna jussi.kivili...@iki.fi
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+.syntax unified
+#ifdef __thumb2__
+.thumb
+#else
+.code   32
+#endif
+.fpu neon
+
+.data
+
+#define GET_DATA_POINTER(reg, name, rtmp) ldr reg, =name
+
+/* Context structure */
+
+#define state_h0 0
+#define state_h1 4
+#define state_h2 8
+#define state_h3 12
+#define state_h4 16
+
+
+/* Constants */
+
+#define K1  0x5A827999
+#define K2  0x6ED9EBA1
+#define K3  0x8F1BBCDC
+#define K4  0xCA62C1D6
+.align 4
+.LK_VEC:
+.LK1:  .long K1, K1, K1, K1
+.LK2:  .long K2, K2, K2, K2
+.LK3:  .long K3, K3, K3, K3
+.LK4:  .long K4, K4, K4, K4
+
+
+.text
+
+/* Register macros */
+
+#define RSTATE r0
+#define RDATA r1
+#define RNBLKS r2
+#define ROLDSTACK r3
+#define RK lr
+#define RWK r12
+
+#define _a r4
+#define _b r5
+#define _c r6
+#define _d r7
+#define _e r8
+
+#define RT0 r9
+#define RT1 r10
+#define RT2 r11
+
+#define W0 q0
+#define W1 q1
+#define W2 q2
+#define W3 q3
+#define W4 q4
+#define W5 q5
+#define W6 q6
+#define W7 q7
+
+#define tmp0 q8
+#define tmp1 q9
+#define tmp2 q10
+#define tmp3 q11
+
+#define curK q12
+
+
+/* Round function macros. */
+
+#define WK_offs(i) (((i) & 15) * 4)
+
+#define _R_F1(a,b,c,d,e,i,pre1,pre2,pre3,i16,\
+ W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28) \
+   and RT0, c, b; \
+   pre1(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+   add e, e, a, ror #(32 - 5); \
+   ldr RT2, [sp, WK_offs(i)]; \
+   bic RT1, d, b; \
+   add e, RT2; \
+   pre2(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+   ror b, #(32 - 30); \
+   eor RT0, RT1; \
+   pre3(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+   add e, RT0;
+
+#define _R_F2(a,b,c,d,e,i,pre1,pre2,pre3,i16,\
+ W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28) \
+   eor RT0, c, b; \
+   pre1(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+   add e, e, a, ror #(32 - 5); \
+   ldr RT2, [sp, WK_offs(i)]; \
+   eor RT0, d; \
+   pre2(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+   add e, RT2; \
+   ror b, #(32 - 30); \
+   pre3(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28

[PATCH] crypto: sha512: add ARM NEON implementation

2014-06-28 Thread Jussi Kivilinna
This patch adds ARM NEON assembly implementation of SHA-512 and SHA-384
algorithms.

tcrypt benchmark results on Cortex-A8, sha512-generic vs sha512-neon-asm:

block-size  bytes/update  old-vs-new
        16            16       2.99x
        64            16       2.67x
        64            64       3.00x
       256            16       2.64x
       256            64       3.06x
       256           256       3.33x
      1024            16       2.53x
      1024           256       3.39x
      1024          1024       3.52x
      2048            16       2.50x
      2048           256       3.41x
      2048          1024       3.54x
      2048          2048       3.57x
      4096            16       2.49x
      4096           256       3.42x
      4096          1024       3.56x
      4096          4096       3.59x
      8192            16       2.48x
      8192           256       3.42x
      8192          1024       3.56x
      8192          4096       3.60x
      8192          8192       3.60x

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/arm/crypto/Makefile|2 
 arch/arm/crypto/sha512-armv7-neon.S |  461 +++
 arch/arm/crypto/sha512_neon_glue.c  |  305 +++
 crypto/Kconfig  |   15 +
 4 files changed, 783 insertions(+)
 create mode 100644 arch/arm/crypto/sha512-armv7-neon.S
 create mode 100644 arch/arm/crypto/sha512_neon_glue.c

diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index 374956d..b48fa34 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -6,11 +6,13 @@ obj-$(CONFIG_CRYPTO_AES_ARM) += aes-arm.o
 obj-$(CONFIG_CRYPTO_AES_ARM_BS) += aes-arm-bs.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
+obj-$(CONFIG_CRYPTO_SHA512_ARM_NEON) += sha512-arm-neon.o
 
 aes-arm-y  := aes-armv4.o aes_glue.o
 aes-arm-bs-y   := aesbs-core.o aesbs-glue.o
 sha1-arm-y := sha1-armv4-large.o sha1_glue.o
 sha1-arm-neon-y:= sha1-armv7-neon.o sha1_neon_glue.o
+sha512-arm-neon-y := sha512-armv7-neon.o sha512_neon_glue.o
 
quiet_cmd_perl = PERL    $@
      cmd_perl = $(PERL) $(<) > $(@)
diff --git a/arch/arm/crypto/sha512-armv7-neon.S 
b/arch/arm/crypto/sha512-armv7-neon.S
new file mode 100644
index 000..cdc6385
--- /dev/null
+++ b/arch/arm/crypto/sha512-armv7-neon.S
@@ -0,0 +1,461 @@
+/* sha512-armv7-neon.S  -  ARM/NEON assembly implementation of SHA-512 
transform
+ *
+ * Copyright © 2013-2014 Jussi Kivilinna jussi.kivili...@iki.fi
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+.syntax unified
+#ifdef __thumb2__
+.thumb
+#else
+.code   32
+#endif
+.fpu neon
+
+.text
+
+/* structure of SHA512_CONTEXT */
+#define hd_a 0
+#define hd_b ((hd_a) + 8)
+#define hd_c ((hd_b) + 8)
+#define hd_d ((hd_c) + 8)
+#define hd_e ((hd_d) + 8)
+#define hd_f ((hd_e) + 8)
+#define hd_g ((hd_f) + 8)
+
+/* register macros */
+#define RK %r2
+
+#define RA d0
+#define RB d1
+#define RC d2
+#define RD d3
+#define RE d4
+#define RF d5
+#define RG d6
+#define RH d7
+
+#define RT0 d8
+#define RT1 d9
+#define RT2 d10
+#define RT3 d11
+#define RT4 d12
+#define RT5 d13
+#define RT6 d14
+#define RT7 d15
+
+#define RT01q q4
+#define RT23q q5
+#define RT45q q6
+#define RT67q q7
+
+#define RW0 d16
+#define RW1 d17
+#define RW2 d18
+#define RW3 d19
+#define RW4 d20
+#define RW5 d21
+#define RW6 d22
+#define RW7 d23
+#define RW8 d24
+#define RW9 d25
+#define RW10 d26
+#define RW11 d27
+#define RW12 d28
+#define RW13 d29
+#define RW14 d30
+#define RW15 d31
+
+#define RW01q q8
+#define RW23q q9
+#define RW45q q10
+#define RW67q q11
+#define RW89q q12
+#define RW1011q q13
+#define RW1213q q14
+#define RW1415q q15
+
+/***
+ * ARM assembly implementation of sha512 transform
+ ***/
+#define rounds2_0_63(ra, rb, rc, rd, re, rf, rg, rh, rw0, rw1, rw01q, rw2, \
+ rw23q, rw1415q, rw9, rw10, interleave_op, arg1) \
+   /* t1 = h + Sum1 (e) + Ch (e, f, g) + k[t] + w[t]; */ \
+   vshr.u64 RT2, re, #14; \
+   vshl.u64 RT3, re, #64 - 14; \
+   interleave_op(arg1); \
+   vshr.u64 RT4, re, #18; \
+   vshl.u64 RT5, re, #64 - 18; \
+   vld1.64 {RT0}, [RK]!; \
+   veor.64 RT23q, RT23q, RT45q; \
+   vshr.u64 RT4, re, #41; \
+   vshl.u64 RT5, re, #64 - 41; \
+   vadd.u64 RT0, RT0, rw0; \
+   veor.64 RT23q, RT23q, RT45q; \
+   vmov.64 RT7, re; \
+   veor.64 RT1, RT2, RT3; \
+   vbsl.64 RT7, rf, rg; \
+   \
+   vadd.u64 RT1, RT1, rh; \
+   vshr.u64 RT2, ra, #28; \
+   vshl.u64 RT3, ra, #64 - 28

[PATCH] crypto: des3_ede/x86-64: fix parse warning

2014-06-23 Thread Jussi Kivilinna
Patch fixes the following sparse warnings:

  CHECK   arch/x86/crypto/des3_ede_glue.c
arch/x86/crypto/des3_ede_glue.c:308:52: warning: restricted __be64 degrades to 
integer
arch/x86/crypto/des3_ede_glue.c:309:52: warning: restricted __be64 degrades to 
integer
arch/x86/crypto/des3_ede_glue.c:310:52: warning: restricted __be64 degrades to 
integer
arch/x86/crypto/des3_ede_glue.c:326:44: warning: restricted __be64 degrades to 
integer

Reported-by: kbuild test robot fengguang...@intel.com
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/des3_ede_glue.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/crypto/des3_ede_glue.c b/arch/x86/crypto/des3_ede_glue.c
index ebc4215..0e9c066 100644
--- a/arch/x86/crypto/des3_ede_glue.c
+++ b/arch/x86/crypto/des3_ede_glue.c
@@ -289,8 +289,8 @@ static unsigned int __ctr_crypt(struct blkcipher_desc *desc,
struct des3_ede_x86_ctx *ctx = crypto_blkcipher_ctx(desc-tfm);
unsigned int bsize = DES3_EDE_BLOCK_SIZE;
unsigned int nbytes = walk-nbytes;
-   u64 *src = (u64 *)walk-src.virt.addr;
-   u64 *dst = (u64 *)walk-dst.virt.addr;
+   __be64 *src = (__be64 *)walk-src.virt.addr;
+   __be64 *dst = (__be64 *)walk-dst.virt.addr;
u64 ctrblk = be64_to_cpu(*(__be64 *)walk-iv);
__be64 ctrblocks[3];
 



[PATCH] crypto: sha512_ssse3: fix byte count to bit count conversion

2014-06-23 Thread Jussi Kivilinna
Byte-to-bit-count computation is only partly converted to big-endian and is
mixing in CPU-endian values. Problem was noticed by sparse with warning:

  CHECK   arch/x86/crypto/sha512_ssse3_glue.c
arch/x86/crypto/sha512_ssse3_glue.c:144:19: warning: restricted __be64 degrades 
to integer
arch/x86/crypto/sha512_ssse3_glue.c:144:17: warning: incorrect type in 
assignment (different base types)
arch/x86/crypto/sha512_ssse3_glue.c:144:17:expected restricted __be64 
noident
arch/x86/crypto/sha512_ssse3_glue.c:144:17:got unsigned long long

Cc: Tim Chen tim.c.c...@linux.intel.com
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/sha512_ssse3_glue.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/crypto/sha512_ssse3_glue.c 
b/arch/x86/crypto/sha512_ssse3_glue.c
index f30cd10..8626b03 100644
--- a/arch/x86/crypto/sha512_ssse3_glue.c
+++ b/arch/x86/crypto/sha512_ssse3_glue.c
@@ -141,7 +141,7 @@ static int sha512_ssse3_final(struct shash_desc *desc, u8 
*out)
 
/* save number of bits */
bits[1] = cpu_to_be64(sctx-count[0]  3);
-   bits[0] = cpu_to_be64(sctx-count[1]  3) | sctx-count[0]  61;
+   bits[0] = cpu_to_be64(sctx-count[1]  3 | sctx-count[0]  61);
 
/* Pad out to 112 mod 128 and append length */
index = sctx-count[0]  0x7f;



[PATCH 2/2] crypto: des_3des - add x86-64 assembly implementation

2014-06-09 Thread Jussi Kivilinna
Patch adds an x86_64 assembly implementation of the Triple DES EDE cipher algorithm.
Two assembly implementations are provided. The first is a regular 'one block at a
time' encrypt/decrypt function; the second is a 'three blocks at a time' function
that gains a performance increase on out-of-order CPUs.

tcrypt test results:

Intel Core i5-4570:

des3_ede-asm vs des3_ede-generic:
size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec
16B 1.21x   1.22x   1.27x   1.36x   1.25x   1.25x
64B 1.98x   1.96x   1.23x   2.04x   2.01x   2.00x
256B2.34x   2.37x   1.21x   2.40x   2.38x   2.39x
1024B   2.50x   2.47x   1.22x   2.51x   2.52x   2.51x
8192B   2.51x   2.53x   1.21x   2.56x   2.54x   2.55x

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/Makefile  |2 
 arch/x86/crypto/des3_ede-asm_64.S |  805 +
 arch/x86/crypto/des3_ede_glue.c   |  509 +++
 crypto/Kconfig|   13 +
 crypto/des_generic.c  |   22 +
 include/crypto/des.h  |3 
 6 files changed, 1349 insertions(+), 5 deletions(-)
 create mode 100644 arch/x86/crypto/des3_ede-asm_64.S
 create mode 100644 arch/x86/crypto/des3_ede_glue.c

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 61d6e28..a470de2 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_CRYPTO_SALSA20_586) += salsa20-i586.o
 obj-$(CONFIG_CRYPTO_SERPENT_SSE2_586) += serpent-sse2-i586.o
 
 obj-$(CONFIG_CRYPTO_AES_X86_64) += aes-x86_64.o
+obj-$(CONFIG_CRYPTO_DES3_EDE_X86_64) += des3_ede-x86_64.o
 obj-$(CONFIG_CRYPTO_CAMELLIA_X86_64) += camellia-x86_64.o
 obj-$(CONFIG_CRYPTO_BLOWFISH_X86_64) += blowfish-x86_64.o
 obj-$(CONFIG_CRYPTO_TWOFISH_X86_64) += twofish-x86_64.o
@@ -52,6 +53,7 @@ salsa20-i586-y := salsa20-i586-asm_32.o salsa20_glue.o
 serpent-sse2-i586-y := serpent-sse2-i586-asm_32.o serpent_sse2_glue.o
 
 aes-x86_64-y := aes-x86_64-asm_64.o aes_glue.o
+des3_ede-x86_64-y := des3_ede-asm_64.o des3_ede_glue.o
 camellia-x86_64-y := camellia-x86_64-asm_64.o camellia_glue.o
 blowfish-x86_64-y := blowfish-x86_64-asm_64.o blowfish_glue.o
 twofish-x86_64-y := twofish-x86_64-asm_64.o twofish_glue.o
diff --git a/arch/x86/crypto/des3_ede-asm_64.S 
b/arch/x86/crypto/des3_ede-asm_64.S
new file mode 100644
index 000..038f6ae
--- /dev/null
+++ b/arch/x86/crypto/des3_ede-asm_64.S
@@ -0,0 +1,805 @@
+/*
+ * des3_ede-asm_64.S  -  x86-64 assembly implementation of 3DES cipher
+ *
+ * Copyright © 2014 Jussi Kivilinna jussi.kivili...@iki.fi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/linkage.h>
+
+.file "des3_ede-asm_64.S"
+.text
+
+#define s1 .L_s1
+#define s2 ((s1) + (64*8))
+#define s3 ((s2) + (64*8))
+#define s4 ((s3) + (64*8))
+#define s5 ((s4) + (64*8))
+#define s6 ((s5) + (64*8))
+#define s7 ((s6) + (64*8))
+#define s8 ((s7) + (64*8))
+
+/* register macros */
+#define CTX %rdi
+
+#define RL0 %r8
+#define RL1 %r9
+#define RL2 %r10
+
+#define RL0d %r8d
+#define RL1d %r9d
+#define RL2d %r10d
+
+#define RR0 %r11
+#define RR1 %r12
+#define RR2 %r13
+
+#define RR0d %r11d
+#define RR1d %r12d
+#define RR2d %r13d
+
+#define RW0 %rax
+#define RW1 %rbx
+#define RW2 %rcx
+
+#define RW0d %eax
+#define RW1d %ebx
+#define RW2d %ecx
+
+#define RW0bl %al
+#define RW1bl %bl
+#define RW2bl %cl
+
+#define RW0bh %ah
+#define RW1bh %bh
+#define RW2bh %ch
+
+#define RT0 %r15
+#define RT1 %rbp
+#define RT2 %r14
+#define RT3 %rdx
+
+#define RT0d %r15d
+#define RT1d %ebp
+#define RT2d %r14d
+#define RT3d %edx
+
+/***
+ * 1-way 3DES
+ ***/
+#define do_permutation(a, b, offset, mask) \
+   movl a, RT0d; \
+   shrl $(offset), RT0d; \
+   xorl b, RT0d; \
+   andl $(mask), RT0d; \
+   xorl RT0d, b; \
+   shll $(offset), RT0d; \
+   xorl RT0d, a;
+
+#define expand_to_64bits(val, mask) \
+   movl val##d, RT0d; \
+   rorl $4, RT0d; \
+   shlq $32, RT0; \
+   orq RT0, val; \
+   andq mask, val;
+
+#define compress_to_64bits(val) \
+   movq val, RT0; \
+   shrq $32, RT0; \
+   roll $4, RT0d; \
+   orl RT0d, val##d;
+
+#define initial_permutation(left, right) \
+   do_permutation(left##d, right##d,  4, 0x0f0f0f0f); \
+   do_permutation(left##d, right##d, 16, 0x); \
+   do_permutation(right##d, left##d,  2, 0x); \
+   do_permutation(right##d

[PATCH 1/2] crypto: tcrypt - add ctr(des3_ede) sync speed test

2014-06-09 Thread Jussi Kivilinna
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/tcrypt.c |6 ++
 1 file changed, 6 insertions(+)

diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index ba247cf..164ec0e 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -1585,6 +1585,12 @@ static int do_test(int m)
test_cipher_speed(cbc(des3_ede), DECRYPT, sec,
des3_speed_template, DES3_SPEED_VECTORS,
speed_template_24);
+   test_cipher_speed(ctr(des3_ede), ENCRYPT, sec,
+   des3_speed_template, DES3_SPEED_VECTORS,
+   speed_template_24);
+   test_cipher_speed(ctr(des3_ede), DECRYPT, sec,
+   des3_speed_template, DES3_SPEED_VECTORS,
+   speed_template_24);
break;
 
case 202:



Re: [PATCH resend 13/15] arm64/crypto: add voluntary preemption to Crypto Extensions SHA1

2014-05-13 Thread Jussi Kivilinna
On 01.05.2014 18:51, Ard Biesheuvel wrote:
 The Crypto Extensions based SHA1 implementation uses the NEON register file,
 and hence runs with preemption disabled. This patch adds a TIF_NEED_RESCHED
 check to its inner loop so we at least give up the CPU voluntarily when we
 are running in process context and have been tagged for preemption by the
 scheduler.
 
 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
 ---
snip
 @@ -42,6 +42,7 @@ static int sha1_update(struct shash_desc *desc, const u8 
 *data,
   sctx-count += len;
  
   if ((partial + len) = SHA1_BLOCK_SIZE) {
 + struct thread_info *ti = NULL;
   int blocks;
  
   if (partial) {
 @@ -52,16 +53,30 @@ static int sha1_update(struct shash_desc *desc, const u8 
 *data,
   len -= p;
   }
  
 + /*
 +  * Pass current's thread info pointer to sha1_ce_transform()
 +  * below if we want it to play nice under preemption.
 +  */
 + if ((IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY) ||
 +  IS_ENABLED(CONFIG_PREEMPT))  !in_interrupt())
 + ti = current_thread_info();
 +
   blocks = len / SHA1_BLOCK_SIZE;
   len %= SHA1_BLOCK_SIZE;
  
 - kernel_neon_begin_partial(16);
 - sha1_ce_transform(blocks, data, sctx-state,
 -   partial ? sctx-buffer : NULL, 0);
 - kernel_neon_end();
 + do {
 + int rem;
 +
 + kernel_neon_begin_partial(16);
 + rem = sha1_ce_transform(blocks, data, sctx-state,
 + partial ? sctx-buffer : NULL,
 + 0, ti);
 + kernel_neon_end();
  
 - data += blocks * SHA1_BLOCK_SIZE;
 - partial = 0;
 + data += (blocks - rem) * SHA1_BLOCK_SIZE;
 + blocks = rem;
 + partial = 0;
 + } while (unlikely(ti  blocks  0));
   }
   if (len)
   memcpy(sctx-buffer + partial, data, len);
 @@ -94,6 +109,7 @@ static int sha1_finup(struct shash_desc *desc, const u8 
 *data,
 unsigned int len, u8 *out)
  {
   struct sha1_state *sctx = shash_desc_ctx(desc);
 + struct thread_info *ti = NULL;
   __be32 *dst = (__be32 *)out;
   int blocks;
   int i;
 @@ -111,9 +127,20 @@ static int sha1_finup(struct shash_desc *desc, const u8 
 *data,
*/
   blocks = len / SHA1_BLOCK_SIZE;
  
 - kernel_neon_begin_partial(16);
 - sha1_ce_transform(blocks, data, sctx-state, NULL, len);
 - kernel_neon_end();
 + if ((IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY) ||
 +  IS_ENABLED(CONFIG_PREEMPT))  !in_interrupt())
 + ti = current_thread_info();
 +
 + do {
 + int rem;
 +
 + kernel_neon_begin_partial(16);
 + rem = sha1_ce_transform(blocks, data, sctx-state,
 + NULL, len, ti);
 + kernel_neon_end();
 + data += (blocks - rem) * SHA1_BLOCK_SIZE;
 + blocks = rem;
 + } while (unlikely(ti  blocks  0));
  

These seem to be similar; how about renaming the assembly function to
__sha1_ce_transform and moving this loop into a new C sha1_ce_transform wrapper?
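Roughly like this (untested sketch; the parameter list and the ti argument are
assumed from the quoted patch):

static void sha1_ce_transform(int blocks, u8 const *src, u32 *state,
			      u8 const *head, long bytes)
{
	struct thread_info *ti = NULL;

	if ((IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY) ||
	     IS_ENABLED(CONFIG_PREEMPT)) && !in_interrupt())
		ti = current_thread_info();

	do {
		int rem;

		kernel_neon_begin_partial(16);
		/* the assembly routine, renamed to __sha1_ce_transform */
		rem = __sha1_ce_transform(blocks, src, state, head, bytes, ti);
		kernel_neon_end();

		src += (blocks - rem) * SHA1_BLOCK_SIZE;
		blocks = rem;
		head = NULL;	/* partial buffer only applies to the first pass */
	} while (unlikely(ti && blocks > 0));
}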

Otherwise, the patches look good.

-Jussi


[PATCH] crypto: testmgr: add empty and large test vectors for SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512

2014-04-12 Thread Jussi Kivilinna
Patch adds large test-vectors for SHA algorithms for better code coverage in
optimized assembly implementations. Empty test-vectors are also added, as some
crypto drivers appear to have special case handling for empty input.

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---

This patch depends on the crypto: add test cases for SHA-1, SHA-224, SHA-256
and AES-CCM patch from Ard Biesheuvel.
---
 crypto/testmgr.h |  728 +-
 1 file changed, 721 insertions(+), 7 deletions(-)

diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index 84ac0f0..7d1438e 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -487,10 +487,15 @@ static struct hash_testvec crct10dif_tv_template[] = {
  * SHA1 test vectors  from from FIPS PUB 180-1
  * Long vector from CAVS 5.0
  */
-#define SHA1_TEST_VECTORS  4
+#define SHA1_TEST_VECTORS  6
 
 static struct hash_testvec sha1_tv_template[] = {
{
+   .plaintext = ,
+   .psize  = 0,
+   .digest = \xda\x39\xa3\xee\x5e\x6b\x4b\x0d\x32\x55
+ \xbf\xef\x95\x60\x18\x90\xaf\xd8\x07\x09,
+   }, {
.plaintext = abc,
.psize  = 3,
.digest = \xa9\x99\x3e\x36\x47\x06\x81\x6a\xba\x3e
@@ -534,6 +539,139 @@ static struct hash_testvec sha1_tv_template[] = {
.psize  = 64,
.digest = \xc8\x71\xf6\x9a\x63\xcc\xa9\x84\x84\x82
  \x64\xe7\x79\x95\x5d\xd7\x19\x41\x7c\x91,
+   }, {
+   .plaintext = \x08\x9f\x13\xaa\x41\xd8\x4c\xe3
+\x7a\x11\x85\x1c\xb3\x27\xbe\x55
+\xec\x60\xf7\x8e\x02\x99\x30\xc7
+\x3b\xd2\x69\x00\x74\x0b\xa2\x16
+\xad\x44\xdb\x4f\xe6\x7d\x14\x88
+\x1f\xb6\x2a\xc1\x58\xef\x63\xfa
+\x91\x05\x9c\x33\xca\x3e\xd5\x6c
+\x03\x77\x0e\xa5\x19\xb0\x47\xde
+\x52\xe9\x80\x17\x8b\x22\xb9\x2d
+\xc4\x5b\xf2\x66\xfd\x94\x08\x9f
+\x36\xcd\x41\xd8\x6f\x06\x7a\x11
+\xa8\x1c\xb3\x4a\xe1\x55\xec\x83
+\x1a\x8e\x25\xbc\x30\xc7\x5e\xf5
+\x69\x00\x97\x0b\xa2\x39\xd0\x44
+\xdb\x72\x09\x7d\x14\xab\x1f\xb6
+\x4d\xe4\x58\xef\x86\x1d\x91\x28
+\xbf\x33\xca\x61\xf8\x6c\x03\x9a
+\x0e\xa5\x3c\xd3\x47\xde\x75\x0c
+\x80\x17\xae\x22\xb9\x50\xe7\x5b
+\xf2\x89\x20\x94\x2b\xc2\x36\xcd
+\x64\xfb\x6f\x06\x9d\x11\xa8\x3f
+\xd6\x4a\xe1\x78\x0f\x83\x1a\xb1
+\x25\xbc\x53\xea\x5e\xf5\x8c\x00
+\x97\x2e\xc5\x39\xd0\x67\xfe\x72
+\x09\xa0\x14\xab\x42\xd9\x4d\xe4
+\x7b\x12\x86\x1d\xb4\x28\xbf\x56
+\xed\x61\xf8\x8f\x03\x9a\x31\xc8
+\x3c\xd3\x6a\x01\x75\x0c\xa3\x17
+\xae\x45\xdc\x50\xe7\x7e\x15\x89
+\x20\xb7\x2b\xc2\x59\xf0\x64\xfb
+\x92\x06\x9d\x34\xcb\x3f\xd6\x6d
+\x04\x78\x0f\xa6\x1a\xb1\x48\xdf
+\x53\xea\x81\x18\x8c\x23\xba\x2e
+\xc5\x5c\xf3\x67\xfe\x95\x09\xa0
+\x37\xce\x42\xd9\x70\x07\x7b\x12
+\xa9\x1d\xb4\x4b\xe2\x56\xed\x84
+\x1b\x8f\x26\xbd\x31\xc8\x5f\xf6
+\x6a\x01\x98\x0c\xa3\x3a\xd1\x45
+\xdc\x73\x0a\x7e\x15\xac\x20\xb7
+\x4e\xe5\x59\xf0\x87\x1e\x92\x29
+\xc0\x34\xcb\x62\xf9\x6d\x04\x9b
+\x0f\xa6\x3d\xd4\x48\xdf\x76\x0d
+\x81\x18\xaf\x23\xba\x51\xe8\x5c
+\xf3\x8a\x21\x95\x2c\xc3\x37\xce
+\x65\xfc\x70\x07\x9e\x12\xa9\x40
+\xd7\x4b\xe2\x79\x10\x84\x1b\xb2
+\x26\xbd\x54\xeb\x5f\xf6\x8d\x01
+\x98\x2f\xc6\x3a\xd1\x68\xff\x73
+\x0a\xa1\x15\xac\x43\xda\x4e\xe5
+\x7c\x13\x87\x1e\xb5\x29\xc0\x57
+\xee\x62\xf9\x90\x04\x9b\x32\xc9
+\x3d\xd4\x6b\x02\x76\x0d\xa4\x18
+\xaf\x46\xdd\x51\xe8\x7f\x16\x8a
+\x21\xb8\x2c\xc3\x5a\xf1\x65\xfc
+\x93\x07\x9e\x35\xcc\x40\xd7\x6e
+\x05\x79\x10\xa7

Re: [PATCH 2/2] SHA1 transform: x86_64 AVX2 optimization - glue build - resend with email correction

2014-02-27 Thread Jussi Kivilinna
On 27.02.2014 19:42, chandramouli narayanan wrote:
 This git patch adds the glue, build and configuration changes
 to include x86_64 AVX2 optimization of SHA1 transform to
 crypto support. The patch has been tested with 3.14.0-rc1
 kernel.
 
 On a Haswell desktop, with turbo disabled and all cpus running
 at maximum frequency, tcrypt shows AVX2 performance improvement
 from 3% for 256 bytes update to 16% for 1024 bytes update over
 AVX implementation. 
 
 Signed-off-by: Chandramouli Narayanan mo...@linux.intel.com
 
..snip..
  static int __init sha1_ssse3_mod_init(void)
  {
 + char *algo_name;
   /* test for SSSE3 first */
 - if (cpu_has_ssse3)
 + if (cpu_has_ssse3) {
   sha1_transform_asm = sha1_transform_ssse3;
 + algo_name = SSSE3;
 + }
  
  #ifdef CONFIG_AS_AVX
   /* allow AVX to override SSSE3, it's a little faster */
 - if (avx_usable())
 - sha1_transform_asm = sha1_transform_avx;
 + if (avx_usable()) {
 + if (cpu_has_avx) {
 + sha1_transform_asm = sha1_transform_avx;
 + algo_name = AVX;
 + }
 +#ifdef CONFIG_AS_AVX2
 + if (cpu_has_avx2) {

Wouldn't you also need to check for BMI2, as __sha1_transform_avx2 uses 'rorx'?

For example, commit 16c0c4e1656c14ef9deac189a4240b5ca19c6919 added a BMI2 check
for SHA-256.
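I.e. something along these lines (untested sketch, mirroring what the sha256
glue does):

#ifdef CONFIG_AS_AVX2
		if (cpu_has_avx2 && boot_cpu_has(X86_FEATURE_BMI2)) {
			/* allow AVX2 to override AVX, it's a little faster */
			sha1_transform_asm = __sha1_transform_avx2;
			algo_name = "AVX2";
		}
#endif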

-Jussi

 + /* allow AVX2 to override AVX, it's a little faster */
 + sha1_transform_asm = __sha1_transform_avx2;
 + algo_name = AVX2;
 + }
 +#endif
 + }
  #endif


Re: [patch] crypto: remove a duplicate checks in __cbc_decrypt()

2014-02-14 Thread Jussi Kivilinna
On 13.02.2014 16:58, Dan Carpenter wrote:
 We checked nbytes  bsize before so it can't happen here.
 
 Signed-off-by: Dan Carpenter dan.carpen...@oracle.com

Acked-by: Jussi Kivilinna jussi.kivili...@iki.fi

 ---
 This doesn't change how the code works, but maybe their is a bug in the
 original code.  Please review?
 
 diff --git a/arch/x86/crypto/cast5_avx_glue.c 
 b/arch/x86/crypto/cast5_avx_glue.c
 index e6a3700489b9..e57e20ab5e0b 100644
 --- a/arch/x86/crypto/cast5_avx_glue.c
 +++ b/arch/x86/crypto/cast5_avx_glue.c
 @@ -203,9 +203,6 @@ static unsigned int __cbc_decrypt(struct blkcipher_desc 
 *desc,
   src -= 1;
   dst -= 1;
   } while (nbytes = bsize * CAST5_PARALLEL_BLOCKS);
 -
 - if (nbytes  bsize)
 - goto done;
   }
  
   /* Handle leftovers */
 diff --git a/arch/x86/crypto/blowfish_glue.c b/arch/x86/crypto/blowfish_glue.c
 index 50ec333b70e6..8af519ed73d1 100644
 --- a/arch/x86/crypto/blowfish_glue.c
 +++ b/arch/x86/crypto/blowfish_glue.c
 @@ -223,9 +223,6 @@ static unsigned int __cbc_decrypt(struct blkcipher_desc 
 *desc,
   src -= 1;
   dst -= 1;
   } while (nbytes = bsize * 4);
 -
 - if (nbytes  bsize)
 - goto done;
   }
  
   /* Handle leftovers */


Re: [RFC] Unaligned CTR mode tests in crypto/testmgr.h

2013-10-31 Thread Jussi Kivilinna
On 30.10.2013 23:06, Joel Fernandes wrote:
 On 10/30/2013 06:09 AM, Jussi Kivilinna wrote:
 On 30.10.2013 02:11, Joel Fernandes wrote:
 Hi,

 Some tests such as test 5 in AES CTR mode in crypto/testmgr.h have a 
 unaligned
 input buffer size such as 499 which is not aligned to any  0 power of 2.

 Due to this, omap-aes driver, and I think atmel-aes too error out when
 encryption is requested for these buffers.

 pr_err(request size is not exact amount of AES blocks\n) or a similar 
 message.

 Is this failure considered a bug? How do we fix it?

 Counter mode turns block cipher into stream cipher and implementation must 
 handle
 buffer lengths that do not match the block size of underlying block cipher.


 How were the result output vectors generated, did you use 0 padding? Do we 
 0 pad
 the inputs to align in these cases to get correct results?

 See crypto/ctr.c:crypto_ctr_crypt_final() how to handle trailing bytes when
 'buflen % AES_BLOCK_SIZE != 0'.

 Basically, you encrypt the last counter block to generate the last keystream
 block and xor only the 'buflen % AES_BLOCK_SIZE' bytes of last keystream 
 block
 with the tail bytes of source buffer:

  key_last[0..15] = ENC(K, counter[0..15]);
  dst_last[0..trailbytes-1] = src_last[0..trailbytes-1] ^ 
 key_last[0..trailbytes-1];
  /* key_last[trailbytes..15] discarded. */

 Or if you want to use hardware that only does block-size aligned CTR 
 encryption,
 you can pad input to block size aligned length, do encryption, and then 
 discard
 those padding bytes after encryption:

  src_padded[0..trailbytes-1] = src_last[0..trailbytes-1]
  src_padded[trailbytes..15] = /* don't care, can be anything/uninitialized */
  src_padded[0..15] = ENC_HW_CTR(src_padded[0..15]);
  dst_last[0..trailbytes-1] = src_padded[0..trailbytes-1];
  /* src_padded[trailbytes..15] discarded. */

 Here, ENC_HW_CTR(in) internally does:
  keystream[0..15] = ENC(K, counter[0..15]); INC_CTR(counter);
  out[0..15] = in[0..15] ^ keystream[0..15];

 
 Thanks, I'll try that. Just one question- is it safe to assume the output 
 buffer
 (req-dst) is capable of holding those many bytes?
 
 In your algorithm above, we're assuming here without allocating explicitly 
 that
 the output buffer passed to the driver has trailbytes..15 available. Because
 otherwise we are in danger of introducing a memory leak, if we just assume 
 they
 are available in the output buffer.

In the above example, I meant src_padded to be a temporary block-sized buffer for
handling the last trailing bytes. I don't think you can assume that req->dst would
have this extra space.

 
 That said, I don't want to allocate new buffer in the driver and then do 
 copying
 of encrypted data back into the output buffer. Because I did lot of hard work 
 to
 get rid of such code as it is slower.
 

Could you handle the first 'buflen - buflen % blocksize' bytes as is done currently,
without extra copies, and then handle the trailing bytes separately?
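Sketch of the idea (the function names here are made up, just to show the split):

	unsigned int aligned = nbytes & ~(AES_BLOCK_SIZE - 1);
	unsigned int tail = nbytes & (AES_BLOCK_SIZE - 1);

	if (aligned)
		hw_ctr_crypt(dd, src, dst, aligned);	/* existing block-aligned path */
	if (tail)
		ctr_crypt_final(dd, src + aligned, dst + aligned, tail);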

-Jussi

 thanks,
 
 -Joel
 


Re: [RFC] Unaligned CTR mode tests in crypto/testmgr.h

2013-10-30 Thread Jussi Kivilinna
On 30.10.2013 02:11, Joel Fernandes wrote:
 Hi,
 
 Some tests such as test 5 in AES CTR mode in crypto/testmgr.h have a unaligned
 input buffer size such as 499 which is not aligned to any  0 power of 2.
 
 Due to this, omap-aes driver, and I think atmel-aes too error out when
 encryption is requested for these buffers.
 
 pr_err(request size is not exact amount of AES blocks\n) or a similar 
 message.
 
 Is this failure considered a bug? How do we fix it?

Counter mode turns a block cipher into a stream cipher, and the implementation
must handle buffer lengths that do not match the block size of the underlying
block cipher.

 
 How were the result output vectors generated, did you use 0 padding? Do we 0 
 pad
 the inputs to align in these cases to get correct results?

See crypto/ctr.c:crypto_ctr_crypt_final() how to handle trailing bytes when
'buflen % AES_BLOCK_SIZE != 0'.

Basically, you encrypt the last counter block to generate the last keystream
block and xor only the 'buflen % AES_BLOCK_SIZE' bytes of last keystream block
with the tail bytes of source buffer:

 key_last[0..15] = ENC(K, counter[0..15]);
 dst_last[0..trailbytes-1] = src_last[0..trailbytes-1] ^ 
key_last[0..trailbytes-1];
 /* key_last[trailbytes..15] discarded. */

Or if you want to use hardware that only does block-size aligned CTR encryption,
you can pad input to block size aligned length, do encryption, and then discard
those padding bytes after encryption:

 src_padded[0..trailbytes-1] = src_last[0..trailbytes-1]
 src_padded[trailbytes..15] = /* don't care, can be anything/uninitialized */
 src_padded[0..15] = ENC_HW_CTR(src_padded[0..15]);
 dst_last[0..trailbytes-1] = src_padded[0..trailbytes-1];
 /* src_padded[trailbytes..15] discarded. */

Here, ENC_HW_CTR(in) internally does:
 keystream[0..15] = ENC(K, counter[0..15]); INC_CTR(counter);
 out[0..15] = in[0..15] ^ keystream[0..15];
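
In C the tail handling boils down to roughly this (sketch modelled on crypto/ctr.c,
not on any particular driver; 'tfm' is the underlying AES cipher handle):

/* needs <crypto/aes.h> for AES_BLOCK_SIZE and <crypto/algapi.h> for crypto_xor */
static void ctr_encrypt_final(struct crypto_cipher *tfm, u8 *ctrblk,
			      const u8 *src, u8 *dst, unsigned int trailbytes)
{
	u8 keystream[AES_BLOCK_SIZE];

	/* key_last = ENC(K, counter) */
	crypto_cipher_encrypt_one(tfm, keystream, ctrblk);

	/* dst_last = src_last ^ key_last, only for the trailing bytes */
	crypto_xor(keystream, src, trailbytes);
	memcpy(dst, keystream, trailbytes);
}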

-Jussi

 
 thanks,
 
 -Joel


Re: [PATCH] Documentation: kerneli typo in description for Serpent cipher algorithm Bug #60848

2013-10-02 Thread Jussi Kivilinna
On 02.10.2013 21:12, Rob Landley wrote:
 On 10/02/2013 11:10:37 AM, Kevin Mulvey wrote:
 change kerneli to kernel as well as kerneli.org to kernel.org

 Signed-off-by: Kevin Mulvey ke...@kevinmulvey.net
 
 There's a bug number for this?
 
 Acked, queued. (Although I'm not sure the value of pointing to www.kernel.org 
 for this.)

I think kerneli.org is correct.. see old website at 
http://web.archive.org/web/20010201085500/http://www.kerneli.org/

-Jussi

 
 Thanks,
 
 Rob
 


Re: [PATCH 1/4] crypto: create generic version of ablk_helper

2013-09-22 Thread Jussi Kivilinna
On 20.09.2013 21:46, Ard Biesheuvel wrote:
 Create a generic version of ablk_helper so it can be reused
 by other architectures.
 
 Acked-by: Jussi Kivilinna jussi.kivili...@iki.fi
 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org

Why resend this patch here when it was in the earlier patchset?
 http://marc.info/?l=linux-crypto-vger&m=137966378813818&w=2

-Jussi

 ---
  crypto/Kconfig   |   4 ++
  crypto/Makefile  |   1 +
  crypto/ablk_helper.c | 150 
 +++
  include/asm-generic/simd.h   |  14 
  include/crypto/ablk_helper.h |  31 +
  5 files changed, 200 insertions(+)
  create mode 100644 crypto/ablk_helper.c
  create mode 100644 include/asm-generic/simd.h
  create mode 100644 include/crypto/ablk_helper.h
 
 diff --git a/crypto/Kconfig b/crypto/Kconfig
 index 69ce573..8179ae6 100644
 --- a/crypto/Kconfig
 +++ b/crypto/Kconfig
 @@ -179,6 +179,10 @@ config CRYPTO_ABLK_HELPER_X86
   depends on X86
   select CRYPTO_CRYPTD
  
 +config CRYPTO_ABLK_HELPER
 + tristate
 + select CRYPTO_CRYPTD
 +
  config CRYPTO_GLUE_HELPER_X86
   tristate
   depends on X86
 diff --git a/crypto/Makefile b/crypto/Makefile
 index 80019ba..5e1bdb1 100644
 --- a/crypto/Makefile
 +++ b/crypto/Makefile
 @@ -104,3 +104,4 @@ obj-$(CONFIG_CRYPTO_USER_API_SKCIPHER) += algif_skcipher.o
  obj-$(CONFIG_XOR_BLOCKS) += xor.o
  obj-$(CONFIG_ASYNC_CORE) += async_tx/
  obj-$(CONFIG_ASYMMETRIC_KEY_TYPE) += asymmetric_keys/
 +obj-$(CONFIG_CRYPTO_ABLK_HELPER) += ablk_helper.o
 diff --git a/crypto/ablk_helper.c b/crypto/ablk_helper.c
 new file mode 100644
 index 000..62568b1
 --- /dev/null
 +++ b/crypto/ablk_helper.c
 @@ -0,0 +1,150 @@
 +/*
 + * Shared async block cipher helpers
 + *
 + * Copyright (c) 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi
 + *
 + * Based on aesni-intel_glue.c by:
 + *  Copyright (C) 2008, Intel Corp.
 + *Author: Huang Ying ying.hu...@intel.com
 + *
 + * This program is free software; you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License as published by
 + * the Free Software Foundation; either version 2 of the License, or
 + * (at your option) any later version.
 + *
 + * This program is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 + * GNU General Public License for more details.
 + *
 + * You should have received a copy of the GNU General Public License
 + * along with this program; if not, write to the Free Software
 + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307
 + * USA
 + *
 + */
 +
 +#include <linux/kernel.h>
 +#include <linux/crypto.h>
 +#include <linux/init.h>
 +#include <linux/module.h>
 +#include <linux/hardirq.h>
 +#include <crypto/algapi.h>
 +#include <crypto/cryptd.h>
 +#include <crypto/ablk_helper.h>
 +#include <asm/simd.h>
 +
 +int ablk_set_key(struct crypto_ablkcipher *tfm, const u8 *key,
 +  unsigned int key_len)
 +{
 + struct async_helper_ctx *ctx = crypto_ablkcipher_ctx(tfm);
 + struct crypto_ablkcipher *child = ctx-cryptd_tfm-base;
 + int err;
 +
 + crypto_ablkcipher_clear_flags(child, CRYPTO_TFM_REQ_MASK);
 + crypto_ablkcipher_set_flags(child, crypto_ablkcipher_get_flags(tfm)
 +  CRYPTO_TFM_REQ_MASK);
 + err = crypto_ablkcipher_setkey(child, key, key_len);
 + crypto_ablkcipher_set_flags(tfm, crypto_ablkcipher_get_flags(child)
 +  CRYPTO_TFM_RES_MASK);
 + return err;
 +}
 +EXPORT_SYMBOL_GPL(ablk_set_key);
 +
 +int __ablk_encrypt(struct ablkcipher_request *req)
 +{
 + struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(req);
 + struct async_helper_ctx *ctx = crypto_ablkcipher_ctx(tfm);
 + struct blkcipher_desc desc;
 +
 + desc.tfm = cryptd_ablkcipher_child(ctx-cryptd_tfm);
 + desc.info = req-info;
 + desc.flags = 0;
 +
 + return crypto_blkcipher_crt(desc.tfm)-encrypt(
 + desc, req-dst, req-src, req-nbytes);
 +}
 +EXPORT_SYMBOL_GPL(__ablk_encrypt);
 +
 +int ablk_encrypt(struct ablkcipher_request *req)
 +{
 + struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(req);
 + struct async_helper_ctx *ctx = crypto_ablkcipher_ctx(tfm);
 +
 + if (!may_use_simd()) {
 + struct ablkcipher_request *cryptd_req =
 + ablkcipher_request_ctx(req);
 +
 + memcpy(cryptd_req, req, sizeof(*req));
 + ablkcipher_request_set_tfm(cryptd_req, ctx-cryptd_tfm-base);
 +
 + return crypto_ablkcipher_encrypt(cryptd_req);
 + } else {
 + return __ablk_encrypt(req);
 + }
 +}
 +EXPORT_SYMBOL_GPL(ablk_encrypt);
 +
 +int ablk_decrypt(struct ablkcipher_request *req)
 +{
 + struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(req);
 + struct async_helper_ctx *ctx

Re: [PATCH 4/4] ARM: add support for bit sliced AES using NEON instructions

2013-09-22 Thread Jussi Kivilinna
On 20.09.2013 21:46, Ard Biesheuvel wrote:
 This implementation of the AES algorithm gives around 45% speedup on 
 Cortex-A15
 for CTR mode and for XTS in encryption mode. Both CBC and XTS in decryption 
 mode
 are slightly faster (5 - 10% on Cortex-A15). [As CBC in encryption mode can 
 only
 be performed sequentially, there is no speedup in this case.]
 
 Unlike the core AES cipher (on which this module also depends), this algorithm
 uses bit slicing to process up to 8 blocks in parallel in constant time. This
 algorithm does not rely on any lookup tables so it is believed to be
 invulnerable to cache timing attacks.
 
 The core code has been adopted from the OpenSSL project (in collaboration
 with the original author, on cc). For ease of maintenance, this version is
 identical to the upstream OpenSSL code, i.e., all modifications that were
 required to make it suitable for inclusion into the kernel have already been
 merged upstream.
 
 Cc: Andy Polyakov ap...@openssl.org
 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
 ---
[..snip..]
 + bcc .Ldec_done
 + @ multiplication by 0x0e

Decryption can probably be made faster by implementing InvMixColumns slightly
differently. Instead of implementing the inverse MixColumns matrix directly, use a
preprocessing step followed by MixColumns, as described in section 4.1.3
'Decryption' of The Design of Rijndael: AES - The Advanced Encryption Standard
(J. Daemen, V. Rijmen, 2002).

In short, the MixColumns and InvMixColumns matrixes have following relation:
 | 0e 0b 0d 09 |   | 02 03 01 01 |   | 05 00 04 00 |
 | 09 0e 0b 0d | = | 01 02 03 01 | x | 00 05 00 04 |
 | 0d 09 0e 0b |   | 01 01 02 03 |   | 04 00 05 00 |
 | 0b 0d 09 0e |   | 03 01 01 02 |   | 00 04 00 05 |

A bit-sliced implementation of the 05-00-04-00 matrix is much shorter than that of
the 0e-0b-0d-09 matrix, so even when combined with MixColumns, the total instruction
count for InvMixColumns implemented this way should be nearly half of the current
one.
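
If you want to double-check the relation, a small standalone C program (plain
GF(2^8) arithmetic, nothing kernel specific; just a quick check, not part of the
patch) verifies it:

#include <stdio.h>
#include <stdint.h>

/* GF(2^8) multiplication modulo the AES polynomial x^8 + x^4 + x^3 + x + 1 */
static uint8_t gmul(uint8_t a, uint8_t b)
{
	uint8_t r = 0;

	while (b) {
		if (b & 1)
			r ^= a;
		a = (a << 1) ^ ((a & 0x80) ? 0x1b : 0);
		b >>= 1;
	}
	return r;
}

int main(void)
{
	static const uint8_t mix[4] = { 0x02, 0x03, 0x01, 0x01 };
	static const uint8_t pre[4] = { 0x05, 0x00, 0x04, 0x00 };
	static const uint8_t inv[4] = { 0x0e, 0x0b, 0x0d, 0x09 };
	int i, j, k;

	/* all three matrices are circulant: row i, column j is row0[(j - i) mod 4] */
	for (i = 0; i < 4; i++) {
		for (j = 0; j < 4; j++) {
			uint8_t acc = 0;

			for (k = 0; k < 4; k++)
				acc ^= gmul(mix[(k - i + 4) & 3],
					    pre[(j - k + 4) & 3]);
			if (acc != inv[(j - i + 4) & 3]) {
				printf("mismatch at %d,%d\n", i, j);
				return 1;
			}
		}
	}
	printf("InvMixColumns == MixColumns x circ(05,00,04,00): OK\n");
	return 0;
}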

Check [1] for implementation of this on AVX instruction set.

-Jussi

[1] 
https://github.com/jkivilin/supercop-blockciphers/blob/beyond_master/crypto_stream/aes128ctr/avx/aes_asm_bitslice_avx.S#L234



Re: [PATCH] crypto: move ablk_helper out of arch/x86

2013-09-16 Thread Jussi Kivilinna
On 14.09.2013 13:24, Ard Biesheuvel wrote:
 Move the ablk_helper code out of arch/x86 so it can be reused
 by other architectures. The only x86 specific dependency is
 a call to irq_fpu_usable(), in the generic case we use
 !in_interrupt() instead.
 
 Cc: jussi.kivili...@iki.fi
 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
 ---
 
 Any need to split this up between generic/crypto and x86?
 
 
  arch/x86/crypto/Makefile   |   1 -
  arch/x86/crypto/ablk_helper.c  | 149 
  arch/x86/crypto/aesni-intel_glue.c |   2 +-
  arch/x86/crypto/camellia_aesni_avx2_glue.c |   2 +-
  arch/x86/crypto/camellia_aesni_avx_glue.c  |   2 +-
  arch/x86/crypto/cast5_avx_glue.c   |   2 +-
  arch/x86/crypto/cast6_avx_glue.c   |   2 +-
  arch/x86/crypto/serpent_avx2_glue.c|   2 +-
  arch/x86/crypto/serpent_avx_glue.c |   2 +-
  arch/x86/crypto/serpent_sse2_glue.c|   2 +-
  arch/x86/crypto/twofish_avx_glue.c |   2 +-
  arch/x86/include/asm/crypto/ablk_helper.h  |  38 ++--
  crypto/Kconfig |  23 +++--
  crypto/Makefile|   1 +
  crypto/ablk_helper.c   | 150 
 +
  include/asm-generic/crypto/ablk_helper.h   |  13 +++
  include/crypto/ablk_helper.h   |  31 ++
  17 files changed, 224 insertions(+), 200 deletions(-)
  delete mode 100644 arch/x86/crypto/ablk_helper.c
  create mode 100644 crypto/ablk_helper.c
  create mode 100644 include/asm-generic/crypto/ablk_helper.h
  create mode 100644 include/crypto/ablk_helper.h
 
 diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
 index 7d6ba9d..18fda50 100644
 --- a/arch/x86/crypto/Makefile
 +++ b/arch/x86/crypto/Makefile
 @@ -4,7 +4,6 @@
  
  avx_supported := $(call as-instr,vpxor 
 %xmm0$(comma)%xmm0$(comma)%xmm0,yes,no)
  
 -obj-$(CONFIG_CRYPTO_ABLK_HELPER_X86) += ablk_helper.o
  obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o

This part does not apply cleanly to cryptodev tree 
(git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git).

 
[snip]
 diff --git a/include/asm-generic/crypto/ablk_helper.h 
 b/include/asm-generic/crypto/ablk_helper.h
 new file mode 100644
 index 000..ede807f
 --- /dev/null
 +++ b/include/asm-generic/crypto/ablk_helper.h
 @@ -0,0 +1,13 @@
 +
 +#include <linux/hardirq.h>
 +
 +/*
 + * ablk_can_run_sync - used by crypto/ablk_helper to decide whether a request
 + * can be handled synchronously or needs to be queued up.
 + * 
 + * Choose in_interrupt() as a reasonable default
 + */

Trailing whitespace in above comment block.

ERROR: trailing whitespace
#702: FILE: include/asm-generic/crypto/ablk_helper.h:7:
+ * $

Otherwise,
Acked-by: Jussi Kivilinna jussi.kivili...@iki.fi



Re: [RFC PATCH 1/2] crypto: move ablk_helper out of arch/x86

2013-09-14 Thread Jussi Kivilinna
On 13.09.2013 18:08, Ard Biesheuvel wrote:
 Move the ablk_helper code out of arch/x86 so it can be reused
 by other architectures. The only x86 specific dependency was
 a call to irq_fpu_usable(), this has been factored out and moved
 to crypto/ablk_helper_x86.c
 
 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
 ---
..snip..
 diff --git a/crypto/ablk_helper_generic.c b/crypto/ablk_helper_generic.c
 new file mode 100644
 index 000..b63b800
 --- /dev/null
 +++ b/crypto/ablk_helper_generic.c
 @@ -0,0 +1,155 @@
 +/*
 + * Shared async block cipher helpers
 + *
 + * Copyright (c) 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi
 + *
 + * Based on aesni-intel_glue.c by:
 + *  Copyright (C) 2008, Intel Corp.
 + *Author: Huang Ying ying.hu...@intel.com
 + *
 + * This program is free software; you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License as published by
 + * the Free Software Foundation; either version 2 of the License, or
 + * (at your option) any later version.
 + *
 + * This program is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 + * GNU General Public License for more details.
 + *
 + * You should have received a copy of the GNU General Public License
 + * along with this program; if not, write to the Free Software
 + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307
 + * USA
 + *
 + */
 +
 +#include <linux/kernel.h>
 +#include <linux/crypto.h>
 +#include <linux/init.h>
 +#include <linux/module.h>
 +#include <linux/hardirq.h>
 +#include <crypto/algapi.h>
 +#include <crypto/cryptd.h>
 +#include <crypto/ablk_helper.h>
 +
 +/* can be overridden by the architecture if desired */
 +bool __weak ablk_can_run_sync(void)
 +{
 + return !in_interrupt();
 +}

Why not have an architecture-specific header file that provides this function?
For architectures that just use in_interrupt() for this, you would avoid the extra
function call.
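E.g. something like this in an asm-generic header that architectures can override
(sketch; the header path is just a suggestion):

/* include/asm-generic/crypto/ablk_helper.h */
#include <linux/hardirq.h>

static inline bool ablk_can_run_sync(void)
{
	return !in_interrupt();
}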

-Jussi


Re: [RFC PATCH 2/2] arm64: add support for AES using ARMv8 Crypto Extensions

2013-09-14 Thread Jussi Kivilinna
On 13.09.2013 18:08, Ard Biesheuvel wrote:
 This adds ARMv8 Crypto Extensions based implemenations of
 AES in CBC, CTR and XTS mode.
 
 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
 ---
..snip..
 +static int xts_set_key(struct crypto_tfm *tfm, const u8 *in_key,
 +unsigned int key_len)
 +{
 + struct crypto_aes_xts_ctx *ctx = crypto_tfm_ctx(tfm);
 + u32 *flags = tfm-crt_flags;
 + int ret;
 +
 + ret = crypto_aes_expand_key(ctx-key1, in_key, key_len/2);
 + if (!ret)
 + ret = crypto_aes_expand_key(ctx-key2, in_key[key_len/2],
 + key_len/2);

Use checkpatch.

 + if (!ret)
 + return 0;
 +
 + *flags |= CRYPTO_TFM_RES_BAD_KEY_LEN;
 + return -EINVAL;
 +}
 +
 +static int cbc_encrypt(struct blkcipher_desc *desc, struct scatterlist *dst,
 +struct scatterlist *src, unsigned int nbytes)
 +{
 + struct crypto_aes_ctx *ctx = crypto_blkcipher_ctx(desc-tfm);
 + int err, first, rounds = 6 + ctx-key_length/4;
 + struct blkcipher_walk walk;
 + unsigned int blocks;
 +
 + blkcipher_walk_init(walk, dst, src, nbytes);
 + err = blkcipher_walk_virt(desc, walk);
 +
 + kernel_neon_begin();

Is sleeping allowed within a kernel_neon_begin/end block? If not, you need to
clear CRYPTO_TFM_REQ_MAY_SLEEP in desc->flags; otherwise blkcipher_walk_done
might sleep.
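I.e. before the walk loop, something like:

	desc->flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP;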

 + for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) {
 + aesce_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
 +   (u8*)ctx-key_enc, rounds, blocks, walk.iv,
 +   first);
 +
 + err = blkcipher_walk_done(desc, walk, blocks * AES_BLOCK_SIZE);
 + }
 + kernel_neon_end();
 +
 + /* non-integral sizes are not supported in CBC */
 + if (unlikely(walk.nbytes))
 + err = -EINVAL;

I think blkcipher_walk_done already does this check by comparing against
alg.cra_blocksize.

 +
 + return err;
 +}
..snip..
 +
 +static int ctr_encrypt(struct blkcipher_desc *desc, struct scatterlist *dst,
 +struct scatterlist *src, unsigned int nbytes)
 +{
 + struct crypto_aes_ctx *ctx = crypto_blkcipher_ctx(desc-tfm);
 + int err, first, rounds = 6 + ctx-key_length/4;
 + struct blkcipher_walk walk;
 + u8 ctr[AES_BLOCK_SIZE];
 +
 + blkcipher_walk_init(walk, dst, src, nbytes);
 + err = blkcipher_walk_virt(desc, walk);
 +
 + memcpy(ctr, walk.iv, AES_BLOCK_SIZE);
 +
 + kernel_neon_begin();
 + for (first = 1; (nbytes = walk.nbytes); first = 0) {
 + aesce_ctr_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
 +   (u8*)ctx-key_enc, rounds, nbytes, ctr, 
 first);
 +
 + err = blkcipher_walk_done(desc, walk, 0);
 +
 + /* non-integral block *must* be the last one */
 + if (unlikely(walk.nbytes  (nbytes  (AES_BLOCK_SIZE-1 {
 + err = -EINVAL;

Other CTR implementations do not have this.. not needed?

 + break;
 + }
 + }
..snip..
 +static struct crypto_alg aesce_cbc_algs[] = { {
 + .cra_name   = __cbc-aes-aesce,
 + .cra_driver_name= __driver-cbc-aes-aesce,
 + .cra_priority   = 0,
 + .cra_flags  = CRYPTO_ALG_TYPE_BLKCIPHER,
 + .cra_blocksize  = AES_BLOCK_SIZE,
 + .cra_ctxsize= sizeof(struct crypto_aes_ctx),
 + .cra_alignmask  = 0,
 + .cra_type   = crypto_blkcipher_type,
 + .cra_module = THIS_MODULE,
 + .cra_u = {
 + .blkcipher = {
 + .min_keysize= AES_MIN_KEY_SIZE,
 + .max_keysize= AES_MAX_KEY_SIZE,
 + .ivsize = AES_BLOCK_SIZE,
 + .setkey = crypto_aes_set_key,
 + .encrypt= cbc_encrypt,
 + .decrypt= cbc_decrypt,
 + },
 + },
 +}, {
 + .cra_name   = __ctr-aes-aesce,
 + .cra_driver_name= __driver-ctr-aes-aesce,
 + .cra_priority   = 0,
 + .cra_flags  = CRYPTO_ALG_TYPE_BLKCIPHER,
 + .cra_blocksize  = AES_BLOCK_SIZE,

CTR mode is a stream cipher, so cra_blocksize must be set to 1.

This should have been picked up by in-kernel run-time tests, check
CONFIG_CRYPTO_MANAGER_DISABLE_TESTS (and CONFIG_CRYPTO_TEST/tcrypt
module).
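That is, the ctr algorithm definition should have:

	.cra_blocksize		= 1,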

 + .cra_ctxsize= sizeof(struct crypto_aes_ctx),
 + .cra_alignmask  = 0,
 + .cra_type   = crypto_blkcipher_type,
 + .cra_module = THIS_MODULE,
 + .cra_u = {
 + .blkcipher = {
 + .min_keysize= AES_MIN_KEY_SIZE,
 + .max_keysize= AES_MAX_KEY_SIZE,
 + .ivsize = AES_BLOCK_SIZE,
 + 

Re: Mistake ?

2013-09-03 Thread Jussi Kivilinna
On 03.09.2013 15:36, Pierre-Mayeul Badaire wrote:
 Good afternoon,
 
 Don't you have a mistake on the MODULE_ALIAS at the last line of the commit ? 
 Shouldn't it be MODULE_ALIAS(sha224) here ?

Yes, that's correct, it should be ssh224 instead of sha384. I'll post patch 
soon.

-Jussi

 
 Reference:
 http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a710f761fc9ae5728765a5917f8beabb49f98483
 
 Best regards,
 



Re: Mistake ?

2013-09-03 Thread Jussi Kivilinna
On 03.09.2013 16:01, Jussi Kivilinna wrote:
 On 03.09.2013 15:36, Pierre-Mayeul Badaire wrote:
 Good afternoon,

 Don't you have a mistake on the MODULE_ALIAS at the last line of the commit 
 ? Shouldn't it be MODULE_ALIAS(sha224) here ?
 
 Yes, that's correct, it should be ssh224 instead of sha384. I'll post 
 patch soon.

sha224.

-Jussi

 
 -Jussi
 

 Reference:
 http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a710f761fc9ae5728765a5917f8beabb49f98483

 Best regards,

 


[PATCH] crypto: sha256_ssse3 - use correct module alias for sha224

2013-09-03 Thread Jussi Kivilinna
Commit a710f761f (crypto: sha256_ssse3 - add sha224 support) attempted to add a
MODULE_ALIAS for SHA-224, but it ended up being sha384, probably because of a
mix-up with the previous commit 340991e30 (crypto: sha512_ssse3 - add sha384
support). Patch corrects the module alias to sha224.

Reported-by: Pierre-Mayeul Badaire pierre-mayeul.bada...@m4x.org
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/sha256_ssse3_glue.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/crypto/sha256_ssse3_glue.c 
b/arch/x86/crypto/sha256_ssse3_glue.c
index 50226c4..85021a4 100644
--- a/arch/x86/crypto/sha256_ssse3_glue.c
+++ b/arch/x86/crypto/sha256_ssse3_glue.c
@@ -319,4 +319,4 @@ MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("SHA256 Secure Hash Algorithm, Supplemental SSE3 accelerated");
 
 MODULE_ALIAS("sha256");
-MODULE_ALIAS("sha384");
+MODULE_ALIAS("sha224");



[PATCH] crypto: x86: restore avx2_supported check

2013-09-03 Thread Jussi Kivilinna
Commit 3d387ef08c4 (Revert crypto: blowfish - add AVX2/x86_64 implementation
of blowfish cipher) reverted too much, as it removed the 'assembler supports
AVX2' check and therefore disabled the remaining AVX2 implementations of Camellia
and Serpent. Patch restores the check and re-enables these implementations.

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/Makefile |2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 7d6ba9d..75b08e1e 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -3,6 +3,8 @@
 #
 
 avx_supported := $(call as-instr,vpxor %xmm0$(comma)%xmm0$(comma)%xmm0,yes,no)
+avx2_supported := $(call as-instr,vpgatherdd %ymm0$(comma)(%eax$(comma)%ymm1\
+   $(comma)4)$(comma)%ymm2,yes,no)
 
 obj-$(CONFIG_CRYPTO_ABLK_HELPER_X86) += ablk_helper.o
 obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o



[PATCH 2/4] crypto: testmgr - test skciphers with unaligned buffers

2013-06-13 Thread Jussi Kivilinna
This patch adds unaligned buffer tests for blkciphers.

The first new test uses a one-byte offset and the second test checks whether the
driver's cra_alignmask is big enough; for example, it catches the case where
cra_alignmask is set to 7 but the driver really needs buffers to be aligned to
16 bytes.

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/testmgr.c |   33 +
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index a81c154..8bd185f 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -820,7 +820,7 @@ out_nobuf:
 
 static int __test_skcipher(struct crypto_ablkcipher *tfm, int enc,
   struct cipher_testvec *template, unsigned int tcount,
-  const bool diff_dst)
+  const bool diff_dst, const int align_offset)
 {
const char *algo =
crypto_tfm_alg_driver_name(crypto_ablkcipher_tfm(tfm));
@@ -876,10 +876,12 @@ static int __test_skcipher(struct crypto_ablkcipher *tfm, 
int enc,
j++;
 
ret = -EINVAL;
-   if (WARN_ON(template[i].ilen  PAGE_SIZE))
+   if (WARN_ON(align_offset + template[i].ilen 
+   PAGE_SIZE))
goto out;
 
data = xbuf[0];
+   data += align_offset;
memcpy(data, template[i].input, template[i].ilen);
 
crypto_ablkcipher_clear_flags(tfm, ~0);
@@ -900,6 +902,7 @@ static int __test_skcipher(struct crypto_ablkcipher *tfm, 
int enc,
sg_init_one(sg[0], data, template[i].ilen);
if (diff_dst) {
data = xoutbuf[0];
+   data += align_offset;
sg_init_one(sgout[0], data, template[i].ilen);
}
 
@@ -941,6 +944,9 @@ static int __test_skcipher(struct crypto_ablkcipher *tfm, 
int enc,
 
j = 0;
for (i = 0; i  tcount; i++) {
+   /* alignment tests are only done with continuous buffers */
+   if (align_offset != 0)
+   break;
 
if (template[i].iv)
memcpy(iv, template[i].iv, MAX_IVLEN);
@@ -1075,15 +1081,34 @@ out_nobuf:
 static int test_skcipher(struct crypto_ablkcipher *tfm, int enc,
 struct cipher_testvec *template, unsigned int tcount)
 {
+   unsigned int alignmask;
int ret;
 
/* test 'dst == src' case */
-   ret = __test_skcipher(tfm, enc, template, tcount, false);
+   ret = __test_skcipher(tfm, enc, template, tcount, false, 0);
if (ret)
return ret;
 
/* test 'dst != src' case */
-   return __test_skcipher(tfm, enc, template, tcount, true);
+   ret = __test_skcipher(tfm, enc, template, tcount, true, 0);
+   if (ret)
+   return ret;
+
+   /* test unaligned buffers, check with one byte offset */
+   ret = __test_skcipher(tfm, enc, template, tcount, true, 1);
+   if (ret)
+   return ret;
+
+   alignmask = crypto_tfm_alg_alignmask(tfm-base);
+   if (alignmask) {
+   /* Check if alignment mask for tfm is correctly set. */
+   ret = __test_skcipher(tfm, enc, template, tcount, true,
+ alignmask + 1);
+   if (ret)
+   return ret;
+   }
+
+   return 0;
 }
 
 static int test_comp(struct crypto_comp *tfm, struct comp_testvec *ctemplate,



[PATCH 4/4] crypto: testmgr - test hash implementations with unaligned buffers

2013-06-13 Thread Jussi Kivilinna
This patch adds unaligned buffer tests for hashes.

The first new test uses a one-byte offset and the second test checks whether the
driver's cra_alignmask is big enough; for example, it catches the case where
cra_alignmask is set to 7 but the driver really needs buffers to be aligned to
16 bytes.

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/testmgr.c |   41 +++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index f205386..2f00607 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -184,8 +184,9 @@ static int do_one_async_hash_op(struct ahash_request *req,
return ret;
 }
 
-static int test_hash(struct crypto_ahash *tfm, struct hash_testvec *template,
-unsigned int tcount, bool use_digest)
+static int __test_hash(struct crypto_ahash *tfm, struct hash_testvec *template,
+  unsigned int tcount, bool use_digest,
+  const int align_offset)
 {
const char *algo = crypto_tfm_alg_driver_name(crypto_ahash_tfm(tfm));
unsigned int i, j, k, temp;
@@ -216,10 +217,15 @@ static int test_hash(struct crypto_ahash *tfm, struct 
hash_testvec *template,
if (template[i].np)
continue;
 
+   ret = -EINVAL;
+   if (WARN_ON(align_offset + template[i].psize  PAGE_SIZE))
+   goto out;
+
j++;
memset(result, 0, 64);
 
hash_buff = xbuf[0];
+   hash_buff += align_offset;
 
memcpy(hash_buff, template[i].plaintext, template[i].psize);
sg_init_one(sg[0], hash_buff, template[i].psize);
@@ -281,6 +287,10 @@ static int test_hash(struct crypto_ahash *tfm, struct 
hash_testvec *template,
 
j = 0;
for (i = 0; i  tcount; i++) {
+   /* alignment tests are only done with continuous buffers */
+   if (align_offset != 0)
+   break;
+
if (template[i].np) {
j++;
memset(result, 0, 64);
@@ -358,6 +368,33 @@ out_nobuf:
return ret;
 }
 
+static int test_hash(struct crypto_ahash *tfm, struct hash_testvec *template,
+unsigned int tcount, bool use_digest)
+{
+   unsigned int alignmask;
+   int ret;
+
+   ret = __test_hash(tfm, template, tcount, use_digest, 0);
+   if (ret)
+   return ret;
+
+   /* test unaligned buffers, check with one byte offset */
+   ret = __test_hash(tfm, template, tcount, use_digest, 1);
+   if (ret)
+   return ret;
+
+   alignmask = crypto_tfm_alg_alignmask(tfm-base);
+   if (alignmask) {
+   /* Check if alignment mask for tfm is correctly set. */
+   ret = __test_hash(tfm, template, tcount, use_digest,
+ alignmask + 1);
+   if (ret)
+   return ret;
+   }
+
+   return 0;
+}
+
 static int __test_aead(struct crypto_aead *tfm, int enc,
   struct aead_testvec *template, unsigned int tcount,
   const bool diff_dst, const int align_offset)



[PATCH 3/4] crypto: testmgr - test AEADs with unaligned buffers

2013-06-13 Thread Jussi Kivilinna
This patch adds unaligned buffer tests for AEADs.

The first new test uses a one-byte offset and the second test checks whether the
driver's cra_alignmask is big enough; for example, it catches the case where
cra_alignmask is set to 7 but the driver really needs buffers to be aligned to
16 bytes.

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/testmgr.c |   37 +++--
 1 file changed, 31 insertions(+), 6 deletions(-)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 8bd185f..f205386 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -360,7 +360,7 @@ out_nobuf:
 
 static int __test_aead(struct crypto_aead *tfm, int enc,
   struct aead_testvec *template, unsigned int tcount,
-  const bool diff_dst)
+  const bool diff_dst, const int align_offset)
 {
const char *algo = crypto_tfm_alg_driver_name(crypto_aead_tfm(tfm));
unsigned int i, j, k, n, temp;
@@ -423,15 +423,16 @@ static int __test_aead(struct crypto_aead *tfm, int enc,
if (!template[i].np) {
j++;
 
-   /* some tepmplates have no input data but they will
+   /* some templates have no input data but they will
 * touch input
 */
input = xbuf[0];
+   input += align_offset;
assoc = axbuf[0];
 
ret = -EINVAL;
-   if (WARN_ON(template[i].ilen  PAGE_SIZE ||
-   template[i].alen  PAGE_SIZE))
+   if (WARN_ON(align_offset + template[i].ilen 
+   PAGE_SIZE || template[i].alen  PAGE_SIZE))
goto out;
 
memcpy(input, template[i].input, template[i].ilen);
@@ -470,6 +471,7 @@ static int __test_aead(struct crypto_aead *tfm, int enc,
 
if (diff_dst) {
output = xoutbuf[0];
+   output += align_offset;
sg_init_one(sgout[0], output,
template[i].ilen +
(enc ? authsize : 0));
@@ -530,6 +532,10 @@ static int __test_aead(struct crypto_aead *tfm, int enc,
}
 
for (i = 0, j = 0; i  tcount; i++) {
+   /* alignment tests are only done with continuous buffers */
+   if (align_offset != 0)
+   break;
+
if (template[i].np) {
j++;
 
@@ -732,15 +738,34 @@ out_noxbuf:
 static int test_aead(struct crypto_aead *tfm, int enc,
 struct aead_testvec *template, unsigned int tcount)
 {
+   unsigned int alignmask;
int ret;
 
/* test 'dst == src' case */
-   ret = __test_aead(tfm, enc, template, tcount, false);
+   ret = __test_aead(tfm, enc, template, tcount, false, 0);
if (ret)
return ret;
 
/* test 'dst != src' case */
-   return __test_aead(tfm, enc, template, tcount, true);
+   ret = __test_aead(tfm, enc, template, tcount, true, 0);
+   if (ret)
+   return ret;
+
+   /* test unaligned buffers, check with one byte offset */
+   ret = __test_aead(tfm, enc, template, tcount, true, 1);
+   if (ret)
+   return ret;
+
+   alignmask = crypto_tfm_alg_alignmask(tfm-base);
+   if (alignmask) {
+   /* Check if alignment mask for tfm is correctly set. */
+   ret = __test_aead(tfm, enc, template, tcount, true,
+ alignmask + 1);
+   if (ret)
+   return ret;
+   }
+
+   return 0;
 }
 
 static int test_cipher(struct crypto_cipher *tfm, int enc,



[PATCH 1/4] crypto: testmgr - check that entries in alg_test_descs are in correct order

2013-06-13 Thread Jussi Kivilinna
Patch adds a check for the alg_test_descs list order, so that accidentally
misplaced entries are found more quickly. Duplicate entries are also checked for.

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/testmgr.c |   31 +++
 1 file changed, 31 insertions(+)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index b2bc533..a81c154 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -3054,6 +3054,35 @@ static const struct alg_test_desc alg_test_descs[] = {
}
 };
 
+static bool alg_test_descs_checked;
+
+static void alg_test_descs_check_order(void)
+{
+   int i;
+
+   /* only check once */
+   if (alg_test_descs_checked)
+   return;
+
+   alg_test_descs_checked = true;
+
+   for (i = 1; i  ARRAY_SIZE(alg_test_descs); i++) {
+   int diff = strcmp(alg_test_descs[i - 1].alg,
+ alg_test_descs[i].alg);
+
+   if (WARN_ON(diff > 0)) {
+   pr_warn(testmgr: alg_test_descs entries in wrong 
order: '%s' before '%s'\n,
+   alg_test_descs[i - 1].alg,
+   alg_test_descs[i].alg);
+   }
+
+   if (WARN_ON(diff == 0)) {
+   pr_warn(testmgr: duplicate alg_test_descs entry: 
'%s'\n,
+   alg_test_descs[i].alg);
+   }
+   }
+}
+
 static int alg_find_test(const char *alg)
 {
int start = 0;
@@ -3085,6 +3114,8 @@ int alg_test(const char *driver, const char *alg, u32 
type, u32 mask)
int j;
int rc;
 
+   alg_test_descs_check_order();
+
if ((type & CRYPTO_ALG_TYPE_MASK) == CRYPTO_ALG_TYPE_CIPHER) {
char nalg[CRYPTO_MAX_ALG_NAME];
 



Re: GPF in aesni_xts_crypt8 (3.10-rc5)

2013-06-11 Thread Jussi Kivilinna
Hello,

Does attached patch help?

-Jussi

On 11.06.2013 20:26, Dave Jones wrote:
 Just found that 3.10-rc doesn't boot on my laptop with encrypted disk.
 
 
 general protection fault:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
 Modules linked in: xfs libcrc32c dm_crypt crc32c_intel ghash_clmulni_intel 
 aesni_intel glue_helper ablk_helper i915 i2c_algo_bit drm_kms_helper drm 
 i2c_core video
 CPU: 1 PID: 53 Comm: kworker/1:1 Not tainted 3.10.0-rc5+ #5 
 Hardware name: LENOVO 2356JK8/2356JK8, BIOS G7ET94WW (2.54 ) 04/30/2013
 Workqueue: kcryptd kcryptd_crypt [dm_crypt]
 task: 880135c58000 ti: 880135c54000 task.ti: 880135c54000
 RIP: 0010:[a01433a2]  [a01433a2] 
 aesni_xts_crypt8+0x42/0x1e0 [aesni_intel]
 RSP: 0018:880135c55b68  EFLAGS: 00010282
 RAX: a0142eb8 RBX: 0080 RCX: 00f0
 RDX: 8801316eeaa8 RSI: 8801316eeaa8 RDI: 88012fd84440
 RBP: 880135c55b70 R08: 8801304fe118 R09: 0020
 R10: 00f0 R11: a0142eb8 R12: 8801316eeb28
 R13: 0080 R14: 8801316eeb28 R15: 0180
 FS:  () GS:88013940() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 0039e88bc720 CR3: 01c0b000 CR4: 001407e0
 Stack:
  a0143683 880135c55c40 a00602fb 880135c55c70
  a0146060 01ad0190 a0146060 ea0004c5bb80
  8801316eeaa8 ea0004c5bb80 8801316eeaa8 8801304fe0c0
 Call Trace:
  [a0143683] ? aesni_xts_dec8+0x13/0x20 [aesni_intel]
  [a00602fb] glue_xts_crypt_128bit+0x10b/0x1c0 [glue_helper]
  [a014358b] xts_decrypt+0x4b/0x50 [aesni_intel]
  [a000617f] ablk_decrypt+0x4f/0xd0 [ablk_helper]
  [a0067202] crypt_convert+0x352/0x3b0 [dm_crypt]
  [a00675b5] kcryptd_crypt+0x355/0x4e0 [dm_crypt]
  [81061b35] ? process_one_work+0x1a5/0x700
  [81061ba1] process_one_work+0x211/0x700
  [81061b35] ? process_one_work+0x1a5/0x700
  [810621ab] worker_thread+0x11b/0x3a0
  [81062090] ? process_one_work+0x700/0x700
  [81069f4d] kthread+0xed/0x100
  [81069e60] ? insert_kthread_work+0x80/0x80
  [815fd41c] ret_from_fork+0x7c/0xb0
  [81069e60] ? insert_kthread_work+0x80/0x80
 Code: 8d 04 25 b8 2e 14 a0 41 0f 44 ca 4c 0f 44 d8 66 44 0f 6f 14 25 00 70 14 
 a0 41 0f 10 18 44 8b 8f e0 01 00 00 48 01 cf 66 0f 6f c3 66 0f ef 02 f3 0f 
 7f 1e 66 44 0f 70 db 13 66 0f d4 db 66 41 0f 
 RIP  [a01433a2] aesni_xts_crypt8+0x42/0x1e0 [aesni_intel]
  RSP 880135c55b68
 
0: 8d 04 25 b8 2e 14 a0lea0xa0142eb8,%eax
7: 41 0f 44 ca cmove  %r10d,%ecx
b: 4c 0f 44 d8 cmove  %rax,%r11
f: 66 44 0f 6f 14 25 00movdqa 0xa0147000,%xmm10
   16: 70 14 a0 
   19: 41 0f 10 18 movups (%r8),%xmm3
   1d: 44 8b 8f e0 01 00 00mov0x1e0(%rdi),%r9d
   24: 48 01 cfadd%rcx,%rdi
   27: 66 0f 6f c3 movdqa %xmm3,%xmm0
   2b:*66 0f ef 02 pxor   (%rdx),%xmm0 -- trapping 
 instruction
   2f: f3 0f 7f 1e movdqu %xmm3,(%rsi)
   33: 66 44 0f 70 db 13   pshufd $0x13,%xmm3,%xmm11
   39: 66 0f d4 db paddq  %xmm3,%xmm3
   3d: 66  data16
   3e: 41  rex.B
   3f: 
 
 

crypto: aesni_intel - fix accessing of unaligned memory

From: Jussi Kivilinna jussi.kivili...@iki.fi

The new XTS code for aesni_intel uses input buffers directly as memory operands
for pxor instructions, which causes a crash if those buffers are not aligned to
16 bytes.

Patch changes the XTS code to handle unaligned memory correctly, by loading
memory with movdqu instead.
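
The difference, expressed with intrinsics purely for illustration (the fix
itself is plain assembly): pxor with a memory operand has the same 16-byte
alignment requirement as movdqa, while movdqu has none.

	#include <immintrin.h>

	__m128i load_aligned(const void *p)
	{
		/* #GP fault unless p is 16-byte aligned */
		return _mm_load_si128((const __m128i *)p);
	}

	__m128i load_unaligned(const void *p)
	{
		/* no alignment requirement */
		return _mm_loadu_si128((const __m128i *)p);
	}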

Reported-by: Dave Jones da...@redhat.com
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/aesni-intel_asm.S |   48 +
 1 file changed, 32 insertions(+), 16 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 62fe22c..477e9d7 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -2681,56 +2681,68 @@ ENTRY(aesni_xts_crypt8)
 	addq %rcx, KEYP
 
 	movdqa IV, STATE1
-	pxor 0x00(INP), STATE1
+	movdqu 0x00(INP), INC
+	pxor INC, STATE1
 	movdqu IV, 0x00(OUTP)
 
 	_aesni_gf128mul_x_ble()
 	movdqa IV, STATE2
-	pxor 0x10(INP), STATE2
+	movdqu 0x10(INP), INC
+	pxor INC, STATE2
 	movdqu IV, 0x10(OUTP)
 
 	_aesni_gf128mul_x_ble()
 	movdqa IV, STATE3
-	pxor 0x20(INP), STATE3
+	movdqu 0x20(INP), INC
+	pxor INC, STATE3
 	movdqu IV, 0x20(OUTP)
 
 	_aesni_gf128mul_x_ble()
 	movdqa IV, STATE4
-	pxor 0x30(INP), STATE4
+	movdqu 0x30(INP), INC
+	pxor INC, STATE4
 	movdqu IV, 0x30(OUTP)
 
 	call *%r11
 
-	pxor 0x00(OUTP), STATE1
+	movdqu 0x00(OUTP), INC
+	pxor INC, STATE1
 	movdqu STATE1, 0x00(OUTP)
 
 	_aesni_gf128mul_x_ble()
 	movdqa IV, STATE1
-	pxor 0x40(INP

[PATCH] crypto: aesni_intel - fix accessing of unaligned memory

2013-06-11 Thread Jussi Kivilinna
The new XTS code for aesni_intel uses input buffers directly as memory operands
for pxor instructions, which causes a crash if those buffers are not aligned to
16 bytes.

Patch changes the XTS code to handle unaligned memory correctly, by loading
memory with movdqu instead.

Reported-by: Dave Jones da...@redhat.com
Tested-by: Dave Jones da...@redhat.com
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/aesni-intel_asm.S |   48 +
 1 file changed, 32 insertions(+), 16 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index 62fe22c..477e9d7 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -2681,56 +2681,68 @@ ENTRY(aesni_xts_crypt8)
addq %rcx, KEYP
 
movdqa IV, STATE1
-   pxor 0x00(INP), STATE1
+   movdqu 0x00(INP), INC
+   pxor INC, STATE1
movdqu IV, 0x00(OUTP)
 
_aesni_gf128mul_x_ble()
movdqa IV, STATE2
-   pxor 0x10(INP), STATE2
+   movdqu 0x10(INP), INC
+   pxor INC, STATE2
movdqu IV, 0x10(OUTP)
 
_aesni_gf128mul_x_ble()
movdqa IV, STATE3
-   pxor 0x20(INP), STATE3
+   movdqu 0x20(INP), INC
+   pxor INC, STATE3
movdqu IV, 0x20(OUTP)
 
_aesni_gf128mul_x_ble()
movdqa IV, STATE4
-   pxor 0x30(INP), STATE4
+   movdqu 0x30(INP), INC
+   pxor INC, STATE4
movdqu IV, 0x30(OUTP)
 
call *%r11
 
-   pxor 0x00(OUTP), STATE1
+   movdqu 0x00(OUTP), INC
+   pxor INC, STATE1
movdqu STATE1, 0x00(OUTP)
 
_aesni_gf128mul_x_ble()
movdqa IV, STATE1
-   pxor 0x40(INP), STATE1
+   movdqu 0x40(INP), INC
+   pxor INC, STATE1
movdqu IV, 0x40(OUTP)
 
-   pxor 0x10(OUTP), STATE2
+   movdqu 0x10(OUTP), INC
+   pxor INC, STATE2
movdqu STATE2, 0x10(OUTP)
 
_aesni_gf128mul_x_ble()
movdqa IV, STATE2
-   pxor 0x50(INP), STATE2
+   movdqu 0x50(INP), INC
+   pxor INC, STATE2
movdqu IV, 0x50(OUTP)
 
-   pxor 0x20(OUTP), STATE3
+   movdqu 0x20(OUTP), INC
+   pxor INC, STATE3
movdqu STATE3, 0x20(OUTP)
 
_aesni_gf128mul_x_ble()
movdqa IV, STATE3
-   pxor 0x60(INP), STATE3
+   movdqu 0x60(INP), INC
+   pxor INC, STATE3
movdqu IV, 0x60(OUTP)
 
-   pxor 0x30(OUTP), STATE4
+   movdqu 0x30(OUTP), INC
+   pxor INC, STATE4
movdqu STATE4, 0x30(OUTP)
 
_aesni_gf128mul_x_ble()
movdqa IV, STATE4
-   pxor 0x70(INP), STATE4
+   movdqu 0x70(INP), INC
+   pxor INC, STATE4
movdqu IV, 0x70(OUTP)
 
_aesni_gf128mul_x_ble()
@@ -2738,16 +2750,20 @@ ENTRY(aesni_xts_crypt8)
 
call *%r11
 
-   pxor 0x40(OUTP), STATE1
+   movdqu 0x40(OUTP), INC
+   pxor INC, STATE1
movdqu STATE1, 0x40(OUTP)
 
-   pxor 0x50(OUTP), STATE2
+   movdqu 0x50(OUTP), INC
+   pxor INC, STATE2
movdqu STATE2, 0x50(OUTP)
 
-   pxor 0x60(OUTP), STATE3
+   movdqu 0x60(OUTP), INC
+   pxor INC, STATE3
movdqu STATE3, 0x60(OUTP)
 
-   pxor 0x70(OUTP), STATE4
+   movdqu 0x70(OUTP), INC
+   pxor INC, STATE4
movdqu STATE4, 0x70(OUTP)
 
ret



[PATCH] crypto: camellia-aesni-avx2 - tune assembly code for more performance

2013-06-08 Thread Jussi Kivilinna
Add an implementation tuned for more performance on real hardware. Changes are
mostly around the part mixing 128-bit extract and insert instructions and
AES-NI instructions. Also, 'vpbroadcastb' instructions have been changed to
'vpshufb with a zero mask'.
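
For reference, the split/encrypt/merge pattern that the removed vaesenclast256
macro implements (and that is now inlined and interleaved with other work),
expressed as intrinsics purely for illustration:

	#include <immintrin.h>

	/* AES-NI works on 128-bit registers only, so process each half of
	 * the 256-bit register separately and merge the results back */
	static __m256i aesenclast_256(__m256i x, __m128i zero)
	{
		__m128i lo = _mm256_castsi256_si128(x);
		__m128i hi = _mm256_extracti128_si256(x, 1);

		lo = _mm_aesenclast_si128(lo, zero);
		hi = _mm_aesenclast_si128(hi, zero);

		return _mm256_inserti128_si256(_mm256_castsi128_si256(lo), hi, 1);
	}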

Tests on Intel Core i5-4570:

tcrypt ECB results, old-AVX2 vs new-AVX2:

size128bit key  256bit key
enc dec enc dec
256 1.00x   1.00x   1.00x   1.00x
1k  1.08x   1.09x   1.05x   1.06x
8k  1.06x   1.06x   1.06x   1.06x

tcrypt ECB results, AVX vs new-AVX2:

size128bit key  256bit key
enc dec enc dec
256 1.00x   1.00x   1.00x   1.00x
1k  1.51x   1.50x   1.52x   1.50x
8k  1.47x   1.48x   1.48x   1.48x

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S |  160 ++
 1 file changed, 89 insertions(+), 71 deletions(-)

diff --git a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S 
b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
index 91a1878..0e0b886 100644
--- a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
@@ -51,16 +51,6 @@
 #define ymm14_x xmm14
 #define ymm15_x xmm15
 
-/*
- * AES-NI instructions do not support ymmX registers, so we need splitting and
- * merging.
- */
-#define vaesenclast256(zero, yreg, tmp) \
-   vextracti128 $1, yreg, tmp##_x; \
-   vaesenclast zero##_x, yreg##_x, yreg##_x; \
-   vaesenclast zero##_x, tmp##_x, tmp##_x; \
-   vinserti128 $1, tmp##_x, yreg, yreg;
-
 /**
   32-way camellia
  **/
@@ -79,46 +69,70 @@
 * S-function with AES subbytes \
 */ \
vbroadcasti128 .Linv_shift_row, t4; \
-   vpbroadcastb .L0f0f0f0f, t7; \
-   vbroadcasti128 .Lpre_tf_lo_s1, t0; \
-   vbroadcasti128 .Lpre_tf_hi_s1, t1; \
+   vpbroadcastd .L0f0f0f0f, t7; \
+   vbroadcasti128 .Lpre_tf_lo_s1, t5; \
+   vbroadcasti128 .Lpre_tf_hi_s1, t6; \
+   vbroadcasti128 .Lpre_tf_lo_s4, t2; \
+   vbroadcasti128 .Lpre_tf_hi_s4, t3; \
\
/* AES inverse shift rows */ \
vpshufb t4, x0, x0; \
vpshufb t4, x7, x7; \
-   vpshufb t4, x1, x1; \
-   vpshufb t4, x4, x4; \
-   vpshufb t4, x2, x2; \
-   vpshufb t4, x5, x5; \
vpshufb t4, x3, x3; \
vpshufb t4, x6, x6; \
+   vpshufb t4, x2, x2; \
+   vpshufb t4, x5, x5; \
+   vpshufb t4, x1, x1; \
+   vpshufb t4, x4, x4; \
\
/* prefilter sboxes 1, 2 and 3 */ \
-   vbroadcasti128 .Lpre_tf_lo_s4, t2; \
-   vbroadcasti128 .Lpre_tf_hi_s4, t3; \
-   filter_8bit(x0, t0, t1, t7, t6); \
-   filter_8bit(x7, t0, t1, t7, t6); \
-   filter_8bit(x1, t0, t1, t7, t6); \
-   filter_8bit(x4, t0, t1, t7, t6); \
-   filter_8bit(x2, t0, t1, t7, t6); \
-   filter_8bit(x5, t0, t1, t7, t6); \
-   \
/* prefilter sbox 4 */ \
+   filter_8bit(x0, t5, t6, t7, t4); \
+   filter_8bit(x7, t5, t6, t7, t4); \
+   vextracti128 $1, x0, t0##_x; \
+   vextracti128 $1, x7, t1##_x; \
+   filter_8bit(x3, t2, t3, t7, t4); \
+   filter_8bit(x6, t2, t3, t7, t4); \
+   vextracti128 $1, x3, t3##_x; \
+   vextracti128 $1, x6, t2##_x; \
+   filter_8bit(x2, t5, t6, t7, t4); \
+   filter_8bit(x5, t5, t6, t7, t4); \
+   filter_8bit(x1, t5, t6, t7, t4); \
+   filter_8bit(x4, t5, t6, t7, t4); \
+   \
vpxor t4##_x, t4##_x, t4##_x; \
-   filter_8bit(x3, t2, t3, t7, t6); \
-   filter_8bit(x6, t2, t3, t7, t6); \
\
/* AES subbytes + AES shift rows */ \
+   vextracti128 $1, x2, t6##_x; \
+   vextracti128 $1, x5, t5##_x; \
+   vaesenclast t4##_x, x0##_x, x0##_x; \
+   vaesenclast t4##_x, t0##_x, t0##_x; \
+   vinserti128 $1, t0##_x, x0, x0; \
+   vaesenclast t4##_x, x7##_x, x7##_x; \
+   vaesenclast t4##_x, t1##_x, t1##_x; \
+   vinserti128 $1, t1##_x, x7, x7; \
+   vaesenclast t4##_x, x3##_x, x3##_x; \
+   vaesenclast t4##_x, t3##_x, t3##_x; \
+   vinserti128 $1, t3##_x, x3, x3; \
+   vaesenclast t4##_x, x6##_x, x6##_x; \
+   vaesenclast t4##_x, t2##_x, t2##_x; \
+   vinserti128 $1, t2##_x, x6, x6; \
+   vextracti128 $1, x1, t3##_x; \
+   vextracti128 $1, x4, t2##_x; \
vbroadcasti128 .Lpost_tf_lo_s1, t0; \
vbroadcasti128 .Lpost_tf_hi_s1, t1; \
-   vaesenclast256(t4, x0, t5); \
-   vaesenclast256(t4, x7, t5); \
-   vaesenclast256(t4, x1, t5); \
-   vaesenclast256(t4, x4, t5); \
-   vaesenclast256(t4, x2, t5); \
-   vaesenclast256(t4, x5, t5); \
-   vaesenclast256(t4, x3, t5); \
-   vaesenclast256(t4, x6, t5); \
+   vaesenclast t4##_x, x2##_x, x2##_x; \
+   vaesenclast t4##_x, t6##_x, t6##_x; \
+   vinserti128 $1, t6##_x, x2, x2; \
+   vaesenclast t4##_x

[PATCH 1/2] Revert crypto: blowfish - add AVX2/x86_64 implementation of blowfish cipher

2013-06-08 Thread Jussi Kivilinna
This reverts commit 604880107010a1e5794552d184cd5471ea31b973.

The instruction (vpgatherdd) that this implementation relied on turned out to
be a slow performer on real hardware (i5-4570). The previous 4-way blowfish
implementation is therefore faster, and this implementation should be removed.

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/Makefile   |4 
 arch/x86/crypto/blowfish-avx2-asm_64.S |  449 -
 arch/x86/crypto/blowfish_avx2_glue.c   |  585 
 arch/x86/crypto/blowfish_glue.c|   32 +-
 arch/x86/include/asm/crypto/blowfish.h |   43 --
 crypto/Kconfig |   18 -
 crypto/testmgr.c   |   12 -
 7 files changed, 24 insertions(+), 1119 deletions(-)
 delete mode 100644 arch/x86/crypto/blowfish-avx2-asm_64.S
 delete mode 100644 arch/x86/crypto/blowfish_avx2_glue.c
 delete mode 100644 arch/x86/include/asm/crypto/blowfish.h

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 94cb151..9ce3418 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -3,8 +3,6 @@
 #
 
 avx_supported := $(call as-instr,vpxor %xmm0$(comma)%xmm0$(comma)%xmm0,yes,no)
-avx2_supported := $(call as-instr,vpgatherdd %ymm0$(comma)(%eax$(comma)%ymm1\
-   $(comma)4)$(comma)%ymm2,yes,no)
 
 obj-$(CONFIG_CRYPTO_ABLK_HELPER_X86) += ablk_helper.o
 obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o
@@ -43,7 +41,6 @@ endif
 
 # These modules require assembler to support AVX2.
 ifeq ($(avx2_supported),yes)
-   obj-$(CONFIG_CRYPTO_BLOWFISH_AVX2_X86_64) += blowfish-avx2.o
obj-$(CONFIG_CRYPTO_CAMELLIA_AESNI_AVX2_X86_64) += camellia-aesni-avx2.o
obj-$(CONFIG_CRYPTO_SERPENT_AVX2_X86_64) += serpent-avx2.o
obj-$(CONFIG_CRYPTO_TWOFISH_AVX2_X86_64) += twofish-avx2.o
@@ -74,7 +71,6 @@ ifeq ($(avx_supported),yes)
 endif
 
 ifeq ($(avx2_supported),yes)
-   blowfish-avx2-y := blowfish-avx2-asm_64.o blowfish_avx2_glue.o
camellia-aesni-avx2-y := camellia-aesni-avx2-asm_64.o 
camellia_aesni_avx2_glue.o
serpent-avx2-y := serpent-avx2-asm_64.o serpent_avx2_glue.o
twofish-avx2-y := twofish-avx2-asm_64.o twofish_avx2_glue.o
diff --git a/arch/x86/crypto/blowfish-avx2-asm_64.S 
b/arch/x86/crypto/blowfish-avx2-asm_64.S
deleted file mode 100644
index 784452e..000
--- a/arch/x86/crypto/blowfish-avx2-asm_64.S
+++ /dev/null
@@ -1,449 +0,0 @@
-/*
- * x86_64/AVX2 assembler optimized version of Blowfish
- *
- * Copyright © 2012-2013 Jussi Kivilinna jussi.kivili...@iki.fi
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- */
-
-#include linux/linkage.h
-
-.file blowfish-avx2-asm_64.S
-
-.data
-.align 32
-
-.Lprefetch_mask:
-.long 0*64
-.long 1*64
-.long 2*64
-.long 3*64
-.long 4*64
-.long 5*64
-.long 6*64
-.long 7*64
-
-.Lbswap32_mask:
-.long 0x00010203
-.long 0x04050607
-.long 0x08090a0b
-.long 0x0c0d0e0f
-
-.Lbswap128_mask:
-   .byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
-.Lbswap_iv_mask:
-   .byte 7, 6, 5, 4, 3, 2, 1, 0, 7, 6, 5, 4, 3, 2, 1, 0
-
-.text
-/* structure of crypto context */
-#define p  0
-#define s0 ((16 + 2) * 4)
-#define s1 ((16 + 2 + (1 * 256)) * 4)
-#define s2 ((16 + 2 + (2 * 256)) * 4)
-#define s3 ((16 + 2 + (3 * 256)) * 4)
-
-/* register macros */
-#define CTX%rdi
-#define RIO %rdx
-
-#define RS0%rax
-#define RS1%r8
-#define RS2%r9
-#define RS3%r10
-
-#define RLOOP  %r11
-#define RLOOPd %r11d
-
-#define RXr0   %ymm8
-#define RXr1   %ymm9
-#define RXr2   %ymm10
-#define RXr3   %ymm11
-#define RXl0   %ymm12
-#define RXl1   %ymm13
-#define RXl2   %ymm14
-#define RXl3   %ymm15
-
-/* temp regs */
-#define RT0%ymm0
-#define RT0x   %xmm0
-#define RT1%ymm1
-#define RT1x   %xmm1
-#define RIDX0  %ymm2
-#define RIDX1  %ymm3
-#define RIDX1x %xmm3
-#define RIDX2  %ymm4
-#define RIDX3  %ymm5
-
-/* vpgatherdd mask and '-1' */
-#define RNOT   %ymm6
-
-/* byte mask, (-1  24) */
-#define RBYTE  %ymm7
-
-/***
- * 32-way AVX2 blowfish
- ***/
-#define F(xl, xr) \
-   vpsrld $24, xl, RIDX0; \
-   vpsrld $16, xl, RIDX1; \
-   vpsrld $8, xl, RIDX2; \
-   vpand RBYTE, RIDX1, RIDX1; \
-   vpand RBYTE, RIDX2, RIDX2; \
-   vpand RBYTE, xl, RIDX3; \
-   \
-   vpgatherdd RNOT, (RS0, RIDX0, 4), RT0; \
-   vpcmpeqd RNOT, RNOT, RNOT; \
-   vpcmpeqd RIDX0, RIDX0, RIDX0; \
-   \
-   vpgatherdd RNOT, (RS1, RIDX1, 4), RT1; \
-   vpcmpeqd RIDX1, RIDX1, RIDX1; \
-   vpaddd RT0, RT1, RT0; \
-   \
-   vpgatherdd RIDX0, (RS2, RIDX2, 4), RT1

[PATCH 2/2] Revert crypto: twofish - add AVX2/x86_64 assembler implementation of twofish cipher

2013-06-08 Thread Jussi Kivilinna
This reverts commit cf1521a1a5e21fd1e79a458605c4282fbfbbeee2.

The instruction (vpgatherdd) that this implementation relied on turned out to
be a slow performer on real hardware (i5-4570). The previous 8-way twofish/AVX
implementation is therefore faster, and this implementation should be removed.

Converting this implementation to use the same method as in twofish/AVX for
table look-ups would give an additional ~3% speed-up vs twofish/AVX, but would
hardly be worth the added code and binary size.

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/Makefile  |2 
 arch/x86/crypto/twofish-avx2-asm_64.S |  600 -
 arch/x86/crypto/twofish_avx2_glue.c   |  584 
 arch/x86/crypto/twofish_avx_glue.c|   14 -
 arch/x86/include/asm/crypto/twofish.h |   18 -
 crypto/Kconfig|   24 -
 crypto/testmgr.c  |   12 -
 7 files changed, 2 insertions(+), 1252 deletions(-)
 delete mode 100644 arch/x86/crypto/twofish-avx2-asm_64.S
 delete mode 100644 arch/x86/crypto/twofish_avx2_glue.c

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 9ce3418..7d6ba9d 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -43,7 +43,6 @@ endif
 ifeq ($(avx2_supported),yes)
obj-$(CONFIG_CRYPTO_CAMELLIA_AESNI_AVX2_X86_64) += camellia-aesni-avx2.o
obj-$(CONFIG_CRYPTO_SERPENT_AVX2_X86_64) += serpent-avx2.o
-   obj-$(CONFIG_CRYPTO_TWOFISH_AVX2_X86_64) += twofish-avx2.o
 endif
 
 aes-i586-y := aes-i586-asm_32.o aes_glue.o
@@ -73,7 +72,6 @@ endif
 ifeq ($(avx2_supported),yes)
camellia-aesni-avx2-y := camellia-aesni-avx2-asm_64.o 
camellia_aesni_avx2_glue.o
serpent-avx2-y := serpent-avx2-asm_64.o serpent_avx2_glue.o
-   twofish-avx2-y := twofish-avx2-asm_64.o twofish_avx2_glue.o
 endif
 
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o
diff --git a/arch/x86/crypto/twofish-avx2-asm_64.S 
b/arch/x86/crypto/twofish-avx2-asm_64.S
deleted file mode 100644
index e1a83b9..000
--- a/arch/x86/crypto/twofish-avx2-asm_64.S
+++ /dev/null
@@ -1,600 +0,0 @@
-/*
- * x86_64/AVX2 assembler optimized version of Twofish
- *
- * Copyright © 2012-2013 Jussi Kivilinna jussi.kivili...@iki.fi
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- */
-
-#include linux/linkage.h
-#include glue_helper-asm-avx2.S
-
-.file twofish-avx2-asm_64.S
-
-.data
-.align 16
-
-.Lvpshufb_mask0:
-.long 0x80808000
-.long 0x80808004
-.long 0x80808008
-.long 0x8080800c
-
-.Lbswap128_mask:
-   .byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
-.Lxts_gf128mul_and_shl1_mask_0:
-   .byte 0x87, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0
-.Lxts_gf128mul_and_shl1_mask_1:
-   .byte 0x0e, 1, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0
-
-.text
-
-/* structure of crypto context */
-#define s0 0
-#define s1 1024
-#define s2 2048
-#define s3 3072
-#define w  4096
-#definek   4128
-
-/* register macros */
-#define CTX%rdi
-
-#define RS0CTX
-#define RS1%r8
-#define RS2%r9
-#define RS3%r10
-#define RK %r11
-#define RW %rax
-#define RROUND  %r12
-#define RROUNDd %r12d
-
-#define RA0%ymm8
-#define RB0%ymm9
-#define RC0%ymm10
-#define RD0%ymm11
-#define RA1%ymm12
-#define RB1%ymm13
-#define RC1%ymm14
-#define RD1%ymm15
-
-/* temp regs */
-#define RX0%ymm0
-#define RY0%ymm1
-#define RX1%ymm2
-#define RY1%ymm3
-#define RT0%ymm4
-#define RIDX   %ymm5
-
-#define RX0x   %xmm0
-#define RY0x   %xmm1
-#define RX1x   %xmm2
-#define RY1x   %xmm3
-#define RT0x   %xmm4
-
-/* vpgatherdd mask and '-1' */
-#define RNOT   %ymm6
-
-/* byte mask, (-1  24) */
-#define RBYTE  %ymm7
-
-/**
-  16-way AVX2 twofish
- **/
-#define init_round_constants() \
-   vpcmpeqd RNOT, RNOT, RNOT; \
-   vpsrld $24, RNOT, RBYTE; \
-   leaq k(CTX), RK; \
-   leaq w(CTX), RW; \
-   leaq s1(CTX), RS1; \
-   leaq s2(CTX), RS2; \
-   leaq s3(CTX), RS3; \
-
-#define g16(ab, rs0, rs1, rs2, rs3, xy) \
-   vpand RBYTE, ab ## 0, RIDX; \
-   vpgatherdd RNOT, (rs0, RIDX, 4), xy ## 0; \
-   vpcmpeqd RNOT, RNOT, RNOT; \
-   \
-   vpand RBYTE, ab ## 1, RIDX; \
-   vpgatherdd RNOT, (rs0, RIDX, 4), xy ## 1; \
-   vpcmpeqd RNOT, RNOT, RNOT; \
-   \
-   vpsrld $8, ab ## 0, RIDX; \
-   vpand RBYTE, RIDX, RIDX; \
-   vpgatherdd RNOT, (rs1, RIDX, 4), RT0; \
-   vpcmpeqd RNOT, RNOT, RNOT; \
-   vpxor RT0, xy ## 0, xy ## 0

Re: [PATCH 2/2] crypto: blowfish - disable AVX2 implementation

2013-06-05 Thread Jussi Kivilinna
On 05.06.2013 11:34, Herbert Xu wrote:
 On Sun, Jun 02, 2013 at 07:51:52PM +0300, Jussi Kivilinna wrote:
 It appears that the performance of 'vpgatherdd' is suboptimal for this kind 
 of
 workload (tested on Core i5-4570) and causes blowfish-avx2 to be 
 significantly
 slower than blowfish-amd64. So disable the AVX2 implementation to avoid
 performance regressions.

 Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
 
 Both patches applied to crypto.  I presume you're working on
 a more permanent solution on this?

Yes, I've been looking for a solution. The problem is that I assumed vgather
would be quicker than emulating a gather using vpextr/vpinsr instructions. But
it appears that vgather has about the same speed as a group of vpextr/vpinsr
instructions doing the gather manually. So doing

asm volatile(
vpgatherdd %%xmm0, (%[ptr], %%xmm8, 4), %%xmm9;\n\t
vpcmpeqd %%xmm0, %%xmm0, %%xmm0; /* reset mask */  \n\t
vpgatherdd %%xmm0, (%[ptr], %%xmm9, 4), %%xmm8;\n\t
vpcmpeqd %%xmm0, %%xmm0, %%xmm0;   \n\t
:: [ptr] r (mem[0]) : memory
);

in loop is slightly _slower_ than manually extractinginserting values with

asm volatile(
vmovd   %%xmm8, %%eax; \n\t
vpextrd $1, %%xmm8, %%edx; \n\t
vmovd   (%[ptr], %%rax, 4), %%xmm10;   \n\t
vpextrd $2, %%xmm8, %%eax; \n\t
vpinsrd $1, (%[ptr], %%rdx, 4), %%xmm10, %%xmm10;  \n\t
vpextrd $3, %%xmm8, %%edx; \n\t
vpinsrd $2, (%[ptr], %%rax, 4), %%xmm10, %%xmm10;  \n\t
vpinsrd $3, (%[ptr], %%rdx, 4), %%xmm10, %%xmm9;   \n\t

vmovd   %%xmm9, %%eax; \n\t
vpextrd $1, %%xmm9, %%edx; \n\t
vmovd   (%[ptr], %%rax, 4), %%xmm10;   \n\t
vpextrd $2, %%xmm9, %%eax; \n\t
vpinsrd $1, (%[ptr], %%rdx, 4), %%xmm10, %%xmm10;  \n\t
vpextrd $3, %%xmm9, %%edx; \n\t
vpinsrd $2, (%[ptr], %%rax, 4), %%xmm10, %%xmm10;  \n\t
vpinsrd $3, (%[ptr], %%rdx, 4), %%xmm10, %%xmm8;   \n\t
:: [ptr] r (mem[0]) : memory, eax, edx
);

vpextr/vpinsr cannot be used with 256-bit wide ymm registers, so
'vinserti128/vextracti128' are needed as well, which makes the manual gather
about the same speed as vpgatherdd.

Now, the block cipher implementations need to use all bytes of a vector register
for table look-ups, and the way this is done in the AVX implementation of
Twofish (move data from the vector register to general-purpose registers, handle
byte-extraction and table look-ups there, and move the processed data back to
the vector register) is about two to three times faster than the approach of the
current AVX2 implementation using vgather.

Blowfish does not do much processing in addition to the table look-ups, so there
is not much that can be done. With Twofish, the table look-ups are the most
computationally heavy part and I don't think that the wider vector registers in
the other parts are going to give much of a boost. So the permanent solution is
likely to be a revert.

-Jussi

 
 Thanks,
 



[PATCH 2/2] crypto: blowfish - disable AVX2 implementation

2013-06-02 Thread Jussi Kivilinna
It appears that the performance of 'vpgatherdd' is suboptimal for this kind of
workload (tested on Core i5-4570) and causes blowfish-avx2 to be significantly
slower than blowfish-amd64. So disable the AVX2 implementation to avoid
performance regressions.

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/Kconfig |1 +
 1 file changed, 1 insertion(+)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 678a6ed..8ca52c5 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -842,6 +842,7 @@ config CRYPTO_BLOWFISH_X86_64
 config CRYPTO_BLOWFISH_AVX2_X86_64
tristate Blowfish cipher algorithm (x86_64/AVX2)
depends on X86  64BIT
+   depends on BROKEN
select CRYPTO_ALGAPI
select CRYPTO_CRYPTD
select CRYPTO_ABLK_HELPER_X86



[PATCH 1/2] crypto: twofish - disable AVX2 implementation

2013-06-02 Thread Jussi Kivilinna
It appears that the performance of 'vpgatherdd' is suboptimal for this kind of
workload (tested on Core i5-4570) and causes twofish_avx2 to be significantly
slower than twofish_avx. So disable the AVX2 implementation to avoid
performance regressions.

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/Kconfig |1 +
 1 file changed, 1 insertion(+)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index d1ca631..678a6ed 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1318,6 +1318,7 @@ config CRYPTO_TWOFISH_AVX_X86_64
 config CRYPTO_TWOFISH_AVX2_X86_64
tristate Twofish cipher algorithm (x86_64/AVX2)
depends on X86  64BIT
+   depends on BROKEN
select CRYPTO_ALGAPI
select CRYPTO_CRYPTD
select CRYPTO_ABLK_HELPER_X86



[PATCH] crypto: sha256_ssse3 - fix stack corruption with SSSE3 and AVX implementations

2013-05-21 Thread Jussi Kivilinna
The _XFER stack element size was set too small, 8 bytes, when it needs to be
16 bytes. As _XFER is the last stack element used by these implementations,
the 16 byte stores with 'movdqa' corrupt the stack where the value of register
%r12 is temporarily stored. As these implementations align the stack pointer
to 16 bytes, this corruption did not happen every time.

Patch corrects this issue.
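
For reference, the old frame layout expressed as C constants (conceptual
sketch only, derived from the *_SIZE definitions visible in the diff below):

	enum {
		_INP_END   = 0,			/* 8 bytes		*/
		_INP	   = _INP_END + 8,	/* 8 bytes		*/
		_XFER	   = _INP + 8,		/* old _XFER_SIZE was 8	*/
		STACK_SIZE = _XFER + 8,		/* = 24			*/
	};
	/* 'movdqa %xmmN, _XFER(%rsp)' writes bytes 16..31 while the frame
	 * only covers 0..23; depending on how the 16-byte stack alignment
	 * falls, the overflowing 8 bytes can land on the saved %r12. */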

Reported-by: Julian Wollrath jwollr...@web.de
Cc: Tim Chen tim.c.c...@linux.intel.com
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/sha256-avx-asm.S   |2 +-
 arch/x86/crypto/sha256-ssse3-asm.S |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/crypto/sha256-avx-asm.S b/arch/x86/crypto/sha256-avx-asm.S
index 56610c4..642f156 100644
--- a/arch/x86/crypto/sha256-avx-asm.S
+++ b/arch/x86/crypto/sha256-avx-asm.S
@@ -118,7 +118,7 @@ y2 = %r15d
 
 _INP_END_SIZE = 8
 _INP_SIZE = 8
-_XFER_SIZE = 8
+_XFER_SIZE = 16
 _XMM_SAVE_SIZE = 0
 
 _INP_END = 0
diff --git a/arch/x86/crypto/sha256-ssse3-asm.S 
b/arch/x86/crypto/sha256-ssse3-asm.S
index 98d3c39..f833b74 100644
--- a/arch/x86/crypto/sha256-ssse3-asm.S
+++ b/arch/x86/crypto/sha256-ssse3-asm.S
@@ -111,7 +111,7 @@ y2 = %r15d
 
 _INP_END_SIZE = 8
 _INP_SIZE = 8
-_XFER_SIZE = 8
+_XFER_SIZE = 16
 _XMM_SAVE_SIZE = 0
 
 _INP_END = 0



[PATCH] crypto: sha512_generic - set cra_driver_name

2013-05-21 Thread Jussi Kivilinna
'sha512_generic' should set a driver name now that there is an alternative
sha512 provider (sha512_ssse3).
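
With distinct driver names, either provider can also be requested explicitly
(illustrative fragment, not part of the patch):

	struct crypto_shash *c_impl = crypto_alloc_shash("sha512-generic", 0, 0);
	struct crypto_shash *simd   = crypto_alloc_shash("sha512-ssse3", 0, 0);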

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/sha512_generic.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/crypto/sha512_generic.c b/crypto/sha512_generic.c
index 4c58620..6ed124f 100644
--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -251,6 +251,7 @@ static struct shash_alg sha512_algs[2] = { {
.descsize   =   sizeof(struct sha512_state),
.base   =   {
.cra_name   =   sha512,
+   .cra_driver_name =  sha512-generic,
.cra_flags  =   CRYPTO_ALG_TYPE_SHASH,
.cra_blocksize  =   SHA512_BLOCK_SIZE,
.cra_module =   THIS_MODULE,
@@ -263,6 +264,7 @@ static struct shash_alg sha512_algs[2] = { {
.descsize   =   sizeof(struct sha512_state),
.base   =   {
.cra_name   =   sha384,
+   .cra_driver_name =  sha384-generic,
.cra_flags  =   CRYPTO_ALG_TYPE_SHASH,
.cra_blocksize  =   SHA384_BLOCK_SIZE,
.cra_module =   THIS_MODULE,



[PATCH 1/2] crypto: sha512_ssse3 - add sha384 support

2013-05-21 Thread Jussi Kivilinna
Add sha384 implementation to sha512_ssse3 module.

This also fixes a sha512_ssse3 module autoloading issue when 'sha384' is used
before 'sha512'. Previously, in such a case only sha512_generic was loaded, not
sha512_ssse3 (since it did not provide sha384), so even if 'sha512' was used
after 'sha384', sha512_ssse3 would remain unloaded. For example, this happens
with the tcrypt testing module, since it tests 'sha384' before 'sha512'.
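
In other words, after this patch a request like the following (roughly what
tcrypt does first) can be satisfied by sha512_ssse3 as well, and the added
MODULE_ALIAS("sha384") lets modprobe find the module for it (illustrative
fragment only):

	struct crypto_shash *tfm = crypto_alloc_shash("sha384", 0, 0);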

Cc: Tim Chen tim.c.c...@linux.intel.com
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/sha512_ssse3_glue.c |   58 ---
 1 file changed, 53 insertions(+), 5 deletions(-)

diff --git a/arch/x86/crypto/sha512_ssse3_glue.c 
b/arch/x86/crypto/sha512_ssse3_glue.c
index 6cbd8df..f30cd10 100644
--- a/arch/x86/crypto/sha512_ssse3_glue.c
+++ b/arch/x86/crypto/sha512_ssse3_glue.c
@@ -194,7 +194,37 @@ static int sha512_ssse3_import(struct shash_desc *desc, 
const void *in)
return 0;
 }
 
-static struct shash_alg alg = {
+static int sha384_ssse3_init(struct shash_desc *desc)
+{
+   struct sha512_state *sctx = shash_desc_ctx(desc);
+
+   sctx-state[0] = SHA384_H0;
+   sctx-state[1] = SHA384_H1;
+   sctx-state[2] = SHA384_H2;
+   sctx-state[3] = SHA384_H3;
+   sctx-state[4] = SHA384_H4;
+   sctx-state[5] = SHA384_H5;
+   sctx-state[6] = SHA384_H6;
+   sctx-state[7] = SHA384_H7;
+
+   sctx-count[0] = sctx-count[1] = 0;
+
+   return 0;
+}
+
+static int sha384_ssse3_final(struct shash_desc *desc, u8 *hash)
+{
+   u8 D[SHA512_DIGEST_SIZE];
+
+   sha512_ssse3_final(desc, D);
+
+   memcpy(hash, D, SHA384_DIGEST_SIZE);
+   memset(D, 0, SHA512_DIGEST_SIZE);
+
+   return 0;
+}
+
+static struct shash_alg algs[] = { {
.digestsize =   SHA512_DIGEST_SIZE,
.init   =   sha512_ssse3_init,
.update =   sha512_ssse3_update,
@@ -211,7 +241,24 @@ static struct shash_alg alg = {
.cra_blocksize  =   SHA512_BLOCK_SIZE,
.cra_module =   THIS_MODULE,
}
-};
+},  {
+   .digestsize =   SHA384_DIGEST_SIZE,
+   .init   =   sha384_ssse3_init,
+   .update =   sha512_ssse3_update,
+   .final  =   sha384_ssse3_final,
+   .export =   sha512_ssse3_export,
+   .import =   sha512_ssse3_import,
+   .descsize   =   sizeof(struct sha512_state),
+   .statesize  =   sizeof(struct sha512_state),
+   .base   =   {
+   .cra_name   =   sha384,
+   .cra_driver_name =  sha384-ssse3,
+   .cra_priority   =   150,
+   .cra_flags  =   CRYPTO_ALG_TYPE_SHASH,
+   .cra_blocksize  =   SHA384_BLOCK_SIZE,
+   .cra_module =   THIS_MODULE,
+   }
+} };
 
 #ifdef CONFIG_AS_AVX
 static bool __init avx_usable(void)
@@ -234,7 +281,7 @@ static bool __init avx_usable(void)
 
 static int __init sha512_ssse3_mod_init(void)
 {
-   /* test for SSE3 first */
+   /* test for SSSE3 first */
if (cpu_has_ssse3)
sha512_transform_asm = sha512_transform_ssse3;
 
@@ -261,7 +308,7 @@ static int __init sha512_ssse3_mod_init(void)
else
 #endif
pr_info(Using SSSE3 optimized SHA-512 
implementation\n);
-   return crypto_register_shash(alg);
+   return crypto_register_shashes(algs, ARRAY_SIZE(algs));
}
pr_info(Neither AVX nor SSSE3 is available/usable.\n);
 
@@ -270,7 +317,7 @@ static int __init sha512_ssse3_mod_init(void)
 
 static void __exit sha512_ssse3_mod_fini(void)
 {
-   crypto_unregister_shash(alg);
+   crypto_unregister_shashes(algs, ARRAY_SIZE(algs));
 }
 
 module_init(sha512_ssse3_mod_init);
@@ -280,3 +327,4 @@ MODULE_LICENSE(GPL);
 MODULE_DESCRIPTION(SHA512 Secure Hash Algorithm, Supplemental SSE3 
accelerated);
 
 MODULE_ALIAS(sha512);
+MODULE_ALIAS(sha384);



[PATCH 2/2] crypto: sha256_ssse3 - add sha224 support

2013-05-21 Thread Jussi Kivilinna
Add sha224 implementation to sha256_ssse3 module.

This also fixes a sha256_ssse3 module autoloading issue when 'sha224' is used
before 'sha256'. Previously, in such a case only sha256_generic was loaded, not
sha256_ssse3 (since it did not provide sha224), so even if 'sha256' was used
after 'sha224', sha256_ssse3 would remain unloaded.

Cc: Tim Chen tim.c.c...@linux.intel.com
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/sha256_ssse3_glue.c |   57 ---
 1 file changed, 52 insertions(+), 5 deletions(-)

diff --git a/arch/x86/crypto/sha256_ssse3_glue.c 
b/arch/x86/crypto/sha256_ssse3_glue.c
index 597d4da..50226c4 100644
--- a/arch/x86/crypto/sha256_ssse3_glue.c
+++ b/arch/x86/crypto/sha256_ssse3_glue.c
@@ -187,7 +187,36 @@ static int sha256_ssse3_import(struct shash_desc *desc, 
const void *in)
return 0;
 }
 
-static struct shash_alg alg = {
+static int sha224_ssse3_init(struct shash_desc *desc)
+{
+   struct sha256_state *sctx = shash_desc_ctx(desc);
+
+   sctx-state[0] = SHA224_H0;
+   sctx-state[1] = SHA224_H1;
+   sctx-state[2] = SHA224_H2;
+   sctx-state[3] = SHA224_H3;
+   sctx-state[4] = SHA224_H4;
+   sctx-state[5] = SHA224_H5;
+   sctx-state[6] = SHA224_H6;
+   sctx-state[7] = SHA224_H7;
+   sctx-count = 0;
+
+   return 0;
+}
+
+static int sha224_ssse3_final(struct shash_desc *desc, u8 *hash)
+{
+   u8 D[SHA256_DIGEST_SIZE];
+
+   sha256_ssse3_final(desc, D);
+
+   memcpy(hash, D, SHA224_DIGEST_SIZE);
+   memset(D, 0, SHA256_DIGEST_SIZE);
+
+   return 0;
+}
+
+static struct shash_alg algs[] = { {
.digestsize =   SHA256_DIGEST_SIZE,
.init   =   sha256_ssse3_init,
.update =   sha256_ssse3_update,
@@ -204,7 +233,24 @@ static struct shash_alg alg = {
.cra_blocksize  =   SHA256_BLOCK_SIZE,
.cra_module =   THIS_MODULE,
}
-};
+}, {
+   .digestsize =   SHA224_DIGEST_SIZE,
+   .init   =   sha224_ssse3_init,
+   .update =   sha256_ssse3_update,
+   .final  =   sha224_ssse3_final,
+   .export =   sha256_ssse3_export,
+   .import =   sha256_ssse3_import,
+   .descsize   =   sizeof(struct sha256_state),
+   .statesize  =   sizeof(struct sha256_state),
+   .base   =   {
+   .cra_name   =   sha224,
+   .cra_driver_name =  sha224-ssse3,
+   .cra_priority   =   150,
+   .cra_flags  =   CRYPTO_ALG_TYPE_SHASH,
+   .cra_blocksize  =   SHA224_BLOCK_SIZE,
+   .cra_module =   THIS_MODULE,
+   }
+} };
 
 #ifdef CONFIG_AS_AVX
 static bool __init avx_usable(void)
@@ -227,7 +273,7 @@ static bool __init avx_usable(void)
 
 static int __init sha256_ssse3_mod_init(void)
 {
-   /* test for SSE3 first */
+   /* test for SSSE3 first */
if (cpu_has_ssse3)
sha256_transform_asm = sha256_transform_ssse3;
 
@@ -254,7 +300,7 @@ static int __init sha256_ssse3_mod_init(void)
else
 #endif
pr_info(Using SSSE3 optimized SHA-256 
implementation\n);
-   return crypto_register_shash(alg);
+   return crypto_register_shashes(algs, ARRAY_SIZE(algs));
}
pr_info(Neither AVX nor SSSE3 is available/usable.\n);
 
@@ -263,7 +309,7 @@ static int __init sha256_ssse3_mod_init(void)
 
 static void __exit sha256_ssse3_mod_fini(void)
 {
-   crypto_unregister_shash(alg);
+   crypto_unregister_shashes(algs, ARRAY_SIZE(algs));
 }
 
 module_init(sha256_ssse3_mod_init);
@@ -273,3 +319,4 @@ MODULE_LICENSE(GPL);
 MODULE_DESCRIPTION(SHA256 Secure Hash Algorithm, Supplemental SSE3 
accelerated);
 
 MODULE_ALIAS(sha256);
+MODULE_ALIAS(sha224);



Re: Oops on 3.10-rc1 related to sha256_ssse3

2013-05-20 Thread Jussi Kivilinna
crypto: sha256_ssse3 - fix stack corruption with SSSE3 and AVX implementations

From: Jussi Kivilinna jussi.kivili...@iki.fi

The _XFER stack element size was set too small, 8 bytes, when it needs to be
16 bytes. As _XFER is the last stack element used by these implementations,
the 16 byte stores with 'movdqa' corrupt the stack where the value of register
%r12 is temporarily stored. As these implementations align the stack pointer to
16 bytes, this corruption did not happen every time.

Patch corrects this issue.
---
 arch/x86/crypto/sha256-avx-asm.S   |2 +-
 arch/x86/crypto/sha256-ssse3-asm.S |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/crypto/sha256-avx-asm.S b/arch/x86/crypto/sha256-avx-asm.S
index 56610c4..642f156 100644
--- a/arch/x86/crypto/sha256-avx-asm.S
+++ b/arch/x86/crypto/sha256-avx-asm.S
@@ -118,7 +118,7 @@ y2 = %r15d
 
 _INP_END_SIZE = 8
 _INP_SIZE = 8
-_XFER_SIZE = 8
+_XFER_SIZE = 16
 _XMM_SAVE_SIZE = 0
 
 _INP_END = 0
diff --git a/arch/x86/crypto/sha256-ssse3-asm.S b/arch/x86/crypto/sha256-ssse3-asm.S
index 98d3c39..f833b74 100644
--- a/arch/x86/crypto/sha256-ssse3-asm.S
+++ b/arch/x86/crypto/sha256-ssse3-asm.S
@@ -111,7 +111,7 @@ y2 = %r15d
 
 _INP_END_SIZE = 8
 _INP_SIZE = 8
-_XFER_SIZE = 8
+_XFER_SIZE = 16
 _XMM_SAVE_SIZE = 0
 
 _INP_END = 0


Re: [PATCH 2/4] Accelerated CRC T10 DIF computation with PCLMULQDQ instruction

2013-04-17 Thread Jussi Kivilinna
On 16.04.2013 19:20, Tim Chen wrote:
 This is the x86_64 CRC T10 DIF transform accelerated with the PCLMULQDQ
 instructions.  Details discussing the implementation can be found in the
 paper:
 
 Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction
 URL: http://download.intel.com/design/intarch/papers/323102.pdf

URL does not work.

 
 Signed-off-by: Tim Chen tim.c.c...@linux.intel.com
 Tested-by: Keith Busch keith.bu...@intel.com
 ---
  arch/x86/crypto/crct10dif-pcl-asm_64.S | 659 
 +
  1 file changed, 659 insertions(+)
  create mode 100644 arch/x86/crypto/crct10dif-pcl-asm_64.S
snip
 +
 + # Allocate Stack Space
 + mov %rsp, %rcx
 + sub $16*10, %rsp
 + and $~(0x20 - 1), %rsp
 +
 + # push the xmm registers into the stack to maintain
 + movdqa %xmm10, 16*2(%rsp)
 + movdqa %xmm11, 16*3(%rsp)
 + movdqa %xmm8 , 16*4(%rsp)
 + movdqa %xmm12, 16*5(%rsp)
 + movdqa %xmm13, 16*6(%rsp)
 + movdqa %xmm6,  16*7(%rsp)
 + movdqa %xmm7,  16*8(%rsp)
 + movdqa %xmm9,  16*9(%rsp)

You don't need to store (and restore) these, as 'crc_t10dif_pcl' is called 
between kernel_fpu_begin/_end.
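
I.e. the glue code is expected to wrap the call roughly like this (sketch,
exact prototype aside), so the SIMD registers are already being saved and
restored for you:

	kernel_fpu_begin();
	crc = crc_t10dif_pcl(crc, buf, len);
	kernel_fpu_end();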

 +
 +
 + # check if smaller than 256
 + cmp $256, arg3
 +
snip
 +_cleanup:
 + # scale the result back to 16 bits
 + shr $16, %eax
 + movdqa  16*2(%rsp), %xmm10
 + movdqa  16*3(%rsp), %xmm11
 + movdqa  16*4(%rsp), %xmm8
 + movdqa  16*5(%rsp), %xmm12
 + movdqa  16*6(%rsp), %xmm13
 + movdqa  16*7(%rsp), %xmm6
 + movdqa  16*8(%rsp), %xmm7
 + movdqa  16*9(%rsp), %xmm9

Registers are overwritten by kernel_fpu_end.

 + mov %rcx, %rsp
 + ret
 +ENDPROC(crc_t10dif_pcl)
 +

You should move ENDPROC to the end of the full function.

 +
 +
 +.align 16
 +_less_than_128:
 +
 + # check if there is enough buffer to be able to fold 16B at a time
 + cmp $32, arg3
snip
 + movdqa  (%rsp), %xmm7
 + pshufb  %xmm11, %xmm7
 + pxor%xmm0 , %xmm7   # xor the initial crc value
 +
 + psrldq  $7, %xmm7
 +
 + jmp _barrett

Move ENDPROC here.


 -Jussi


Re: [PATCH 4/4] Simple correctness and speed test for CRCT10DIF hash

2013-04-17 Thread Jussi Kivilinna
On 16.04.2013 19:20, Tim Chen wrote:
 These are simple tests to do sanity check of CRC T10 DIF hash.  The
 correctness of the transform can be checked with the command
   modprobe tcrypt mode=47
 The speed of the transform can be evaluated with the command
   modprobe tcrypt mode=320
 
 Set the cpu frequency to constant and turn turbo off when running the
 speed test so the frequency governor will not tweak the frequency and
 affects the measurements.
 
 Signed-off-by: Tim Chen tim.c.c...@linux.intel.com
 Tested-by: Keith Busch keith.bu...@intel.com
snip
  
 +#define CRCT10DIF_TEST_VECTORS   2
 +static struct hash_testvec crct10dif_tv_template[] = {
 + {
 + .plaintext = abc,
 + .psize  = 3,
 +#ifdef __LITTLE_ENDIAN
 + .digest = \x3b\x44,
 +#else
 + .digest = \x44\x3b,
 +#endif
 + }, {
 + .plaintext =
 + abcd,
 + .psize  = 56,
 +#ifdef __LITTLE_ENDIAN
 + .digest = \xe3\x9c,
 +#else
 + .digest = \x9c\xe3,
 +#endif
 + .np = 2,
 + .tap= { 28, 28 }
 + }
 +};
 +

Are these large enough to test all code paths in the PCLMULQDQ implementation?

-Jussi



[RFC PATCH 0/6] Add AVX2 accelerated implementations for Blowfish, Twofish, Serpent and Camellia

2013-04-13 Thread Jussi Kivilinna
The following series implements four block ciphers - Blowfish, Twofish, Serpent
and Camellia - using the AVX2 instruction set. This work on AVX2 implementations
started over a year ago and has been available at
https://github.com/jkivilin/crypto-avx2

The Serpent and Camellia implementations are directly based on the word-sliced
and byte-sliced AVX implementations and have been extended to use the 256-bit
YMM registers. As such the performance should be better than with the 128-bit
wide AVX implementations. (The Camellia implementation needs some extra handling
for AES-NI, as the AES instructions have remained only 128-bit wide.)

The Blowfish and Twofish implementations utilize the new vpgatherdd instruction
to perform eight vectorized 8x32-bit table look-ups at once. This is different
from the previous word-sliced AVX implementations, where table look-ups have to
be performed through general-purpose registers. The AVX2 implementations thus
avoid the additional moving of data between the SIMD and general-purpose
registers and should therefore be faster.
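
Roughly the operation that a single vpgatherdd-based S-box layer performs,
shown as intrinsics purely for illustration (not part of the patchset):

	#include <immintrin.h>
	#include <stdint.h>

	/* eight independent 8->32-bit table look-ups from one YMM register */
	static __m256i sbox_lookup8(const uint32_t *table, __m256i idx)
	{
		idx = _mm256_and_si256(idx, _mm256_set1_epi32(0xff));
		return _mm256_i32gather_epi32((const int *)table, idx, 4);
	}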

For obvious reasons, I have not tested these implementations on real hardware.
Kernel tcrypt tests have been run under Bochs, which should contain a somewhat
working AVX2 implementation. But I cannot be sure; even the Intel SDE emulator
that I used for testing these implementations did not quite follow the specs
(a past version of SDE that I initially used allowed the vector registers passed
to vgather to be the same, whereas the specs say that an exception should be
raised in such a case). Because of this, the first versions of the patchset in
the above repository are broken.

So, since I'm unable to verify that these implementations work on real hardware
and am unable to conduct a real performance evaluation, I'm sending this
patchset as RFC. Maybe someone can actually test these on real hardware and
maybe give an Acked-by in case they look OK(?). If that is not possible, I'll
do the testing myself when those Haswell processors become available where I
live.

-Jussi

---

Jussi Kivilinna (6):
  crypto: testmgr - extend camellia test-vectors for camellia-aesni/avx2
  crypto: tcrypt - add async cipher speed tests for blowfish
  crypto: blowfish - add AVX2/x86_64 implementation of blowfish cipher
  crypto: twofish - add AVX2/x86_64 assembler implementation of twofish 
cipher
  crypto: serpent - add AVX2/x86_64 assembler implementation of serpent 
cipher
  crypto: camellia - add AVX2/AES-NI/x86_64 assembler implementation of 
camellia cipher


 arch/x86/crypto/Makefile |   17 
 arch/x86/crypto/blowfish-avx2-asm_64.S   |  449 +
 arch/x86/crypto/blowfish_avx2_glue.c |  585 +++
 arch/x86/crypto/blowfish_glue.c  |   32 -
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 1368 ++
 arch/x86/crypto/camellia_aesni_avx2_glue.c   |  586 +++
 arch/x86/crypto/camellia_aesni_avx_glue.c|   17 
 arch/x86/crypto/glue_helper-asm-avx2.S   |  180 +++
 arch/x86/crypto/serpent-avx2-asm_64.S|  800 +++
 arch/x86/crypto/serpent_avx2_glue.c  |  562 +++
 arch/x86/crypto/serpent_avx_glue.c   |   62 +
 arch/x86/crypto/twofish-avx2-asm_64.S|  600 +++
 arch/x86/crypto/twofish_avx2_glue.c  |  584 +++
 arch/x86/crypto/twofish_avx_glue.c   |   14 
 arch/x86/include/asm/cpufeature.h|1 
 arch/x86/include/asm/crypto/blowfish.h   |   43 +
 arch/x86/include/asm/crypto/camellia.h   |   19 
 arch/x86/include/asm/crypto/serpent-avx.h|   24 
 arch/x86/include/asm/crypto/twofish.h|   18 
 crypto/Kconfig   |   88 ++
 crypto/tcrypt.c  |   15 
 crypto/testmgr.c |   51 +
 crypto/testmgr.h | 1100 -
 23 files changed, 7128 insertions(+), 87 deletions(-)
 create mode 100644 arch/x86/crypto/blowfish-avx2-asm_64.S
 create mode 100644 arch/x86/crypto/blowfish_avx2_glue.c
 create mode 100644 arch/x86/crypto/camellia-aesni-avx2-asm_64.S
 create mode 100644 arch/x86/crypto/camellia_aesni_avx2_glue.c
 create mode 100644 arch/x86/crypto/glue_helper-asm-avx2.S
 create mode 100644 arch/x86/crypto/serpent-avx2-asm_64.S
 create mode 100644 arch/x86/crypto/serpent_avx2_glue.c
 create mode 100644 arch/x86/crypto/twofish-avx2-asm_64.S
 create mode 100644 arch/x86/crypto/twofish_avx2_glue.c
 create mode 100644 arch/x86/include/asm/crypto/blowfish.h

-- 



[RFC PATCH 1/6] crypto: testmgr - extend camellia test-vectors for camellia-aesni/avx2

2013-04-13 Thread Jussi Kivilinna
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/testmgr.h | 1100 --
 1 file changed, 1062 insertions(+), 38 deletions(-)

diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index d503660..dc2c054 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -20997,8 +20997,72 @@ static struct cipher_testvec 
camellia_enc_tv_template[] = {
  \x86\x1D\xB4\x28\xBF\x56\xED\x61
  \xF8\x8F\x03\x9A\x31\xC8\x3C\xD3
  \x6A\x01\x75\x0C\xA3\x17\xAE\x45
- \xDC\x50\xE7\x7E\x15\x89\x20\xB7,
-   .ilen   = 496,
+ \xDC\x50\xE7\x7E\x15\x89\x20\xB7
+ \x2B\xC2\x59\xF0\x64\xFB\x92\x06
+ \x9D\x34\xCB\x3F\xD6\x6D\x04\x78
+ \x0F\xA6\x1A\xB1\x48\xDF\x53\xEA
+ \x81\x18\x8C\x23\xBA\x2E\xC5\x5C
+ \xF3\x67\xFE\x95\x09\xA0\x37\xCE
+ \x42\xD9\x70\x07\x7B\x12\xA9\x1D
+ \xB4\x4B\xE2\x56\xED\x84\x1B\x8F
+ \x26\xBD\x31\xC8\x5F\xF6\x6A\x01
+ \x98\x0C\xA3\x3A\xD1\x45\xDC\x73
+ \x0A\x7E\x15\xAC\x20\xB7\x4E\xE5
+ \x59\xF0\x87\x1E\x92\x29\xC0\x34
+ \xCB\x62\xF9\x6D\x04\x9B\x0F\xA6
+ \x3D\xD4\x48\xDF\x76\x0D\x81\x18
+ \xAF\x23\xBA\x51\xE8\x5C\xF3\x8A
+ \x21\x95\x2C\xC3\x37\xCE\x65\xFC
+ \x70\x07\x9E\x12\xA9\x40\xD7\x4B
+ \xE2\x79\x10\x84\x1B\xB2\x26\xBD
+ \x54\xEB\x5F\xF6\x8D\x01\x98\x2F
+ \xC6\x3A\xD1\x68\xFF\x73\x0A\xA1
+ \x15\xAC\x43\xDA\x4E\xE5\x7C\x13
+ \x87\x1E\xB5\x29\xC0\x57\xEE\x62
+ \xF9\x90\x04\x9B\x32\xC9\x3D\xD4
+ \x6B\x02\x76\x0D\xA4\x18\xAF\x46
+ \xDD\x51\xE8\x7F\x16\x8A\x21\xB8
+ \x2C\xC3\x5A\xF1\x65\xFC\x93\x07
+ \x9E\x35\xCC\x40\xD7\x6E\x05\x79
+ \x10\xA7\x1B\xB2\x49\xE0\x54\xEB
+ \x82\x19\x8D\x24\xBB\x2F\xC6\x5D
+ \xF4\x68\xFF\x96\x0A\xA1\x38\xCF
+ \x43\xDA\x71\x08\x7C\x13\xAA\x1E
+ \xB5\x4C\xE3\x57\xEE\x85\x1C\x90
+ \x27\xBE\x32\xC9\x60\xF7\x6B\x02
+ \x99\x0D\xA4\x3B\xD2\x46\xDD\x74
+ \x0B\x7F\x16\xAD\x21\xB8\x4F\xE6
+ \x5A\xF1\x88\x1F\x93\x2A\xC1\x35
+ \xCC\x63\xFA\x6E\x05\x9C\x10\xA7
+ \x3E\xD5\x49\xE0\x77\x0E\x82\x19
+ \xB0\x24\xBB\x52\xE9\x5D\xF4\x8B
+ \x22\x96\x2D\xC4\x38\xCF\x66\xFD
+ \x71\x08\x9F\x13\xAA\x41\xD8\x4C
+ \xE3\x7A\x11\x85\x1C\xB3\x27\xBE
+ \x55\xEC\x60\xF7\x8E\x02\x99\x30
+ \xC7\x3B\xD2\x69\x00\x74\x0B\xA2
+ \x16\xAD\x44\xDB\x4F\xE6\x7D\x14
+ \x88\x1F\xB6\x2A\xC1\x58\xEF\x63
+ \xFA\x91\x05\x9C\x33\xCA\x3E\xD5
+ \x6C\x03\x77\x0E\xA5\x19\xB0\x47
+ \xDE\x52\xE9\x80\x17\x8B\x22\xB9
+ \x2D\xC4\x5B\xF2\x66\xFD\x94\x08
+ \x9F\x36\xCD\x41\xD8\x6F\x06\x7A
+ \x11\xA8\x1C\xB3\x4A\xE1\x55\xEC
+ \x83\x1A\x8E\x25\xBC\x30\xC7\x5E
+ \xF5\x69\x00\x97\x0B\xA2\x39\xD0
+ \x44\xDB\x72\x09\x7D\x14\xAB\x1F
+ \xB6\x4D\xE4\x58\xEF\x86\x1D\x91
+ \x28\xBF\x33\xCA\x61\xF8\x6C\x03
+ \x9A\x0E\xA5\x3C\xD3\x47\xDE\x75
+ \x0C\x80\x17\xAE\x22\xB9\x50\xE7
+ \x5B\xF2\x89\x20\x94\x2B\xC2\x36
+ \xCD\x64\xFB\x6F\x06\x9D\x11\xA8
+ \x3F\xD6\x4A\xE1\x78\x0F\x83\x1A
+ \xB1\x25\xBC\x53\xEA\x5E\xF5\x8C
+ \x00\x97\x2E\xC5\x39\xD0\x67\xFE
+ \x72\x09\xA0\x14\xAB\x42\xD9\x4D,
+   .ilen   = 1008,
.result = \xED\xCD\xDB\xB8\x68\xCE\xBD\xEA
  \x9D\x9D\xCD\x9F\x4F\xFC\x4D\xB7
  \xA5\xFF\x6F\x43\x0F\xBA\x32\x04
@@ -21060,11 +21124,75 @@ static struct cipher_testvec 
camellia_enc_tv_template[] = {
  \x2C\x35\x1B\x38\x85\x7D\xE8\xF3
  \x87\x4F\xDA\xD8\x5F\xFC\xB6\x44
  \xD0\xE3\x9B\x8B\xBF\xD6\xB8\xC4

[RFC PATCH 3/6] crypto: blowfish - add AVX2/x86_64 implementation of blowfish cipher

2013-04-13 Thread Jussi Kivilinna
Patch adds an AVX2/x86-64 implementation of the Blowfish cipher, requiring 32
parallel blocks of input (256 bytes). Table look-ups are performed with the
vpgatherdd instruction directly from vector registers and thus should be faster
than in earlier implementations.

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/Makefile   |   11 +
 arch/x86/crypto/blowfish-avx2-asm_64.S |  449 +
 arch/x86/crypto/blowfish_avx2_glue.c   |  585 
 arch/x86/crypto/blowfish_glue.c|   32 --
 arch/x86/include/asm/cpufeature.h  |1 
 arch/x86/include/asm/crypto/blowfish.h |   43 ++
 crypto/Kconfig |   18 +
 crypto/testmgr.c   |   12 +
 8 files changed, 1127 insertions(+), 24 deletions(-)
 create mode 100644 arch/x86/crypto/blowfish-avx2-asm_64.S
 create mode 100644 arch/x86/crypto/blowfish_avx2_glue.c
 create mode 100644 arch/x86/include/asm/crypto/blowfish.h

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 03cd731..28464ef 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -3,6 +3,8 @@
 #
 
 avx_supported := $(call as-instr,vpxor %xmm0$(comma)%xmm0$(comma)%xmm0,yes,no)
+avx2_supported := $(call as-instr,vpgatherdd %ymm0$(comma)(%eax$(comma)%ymm1\
+   $(comma)4)$(comma)%ymm2,yes,no)
 
 obj-$(CONFIG_CRYPTO_ABLK_HELPER_X86) += ablk_helper.o
 obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o
@@ -38,6 +40,11 @@ ifeq ($(avx_supported),yes)
obj-$(CONFIG_CRYPTO_SERPENT_AVX_X86_64) += serpent-avx-x86_64.o
 endif
 
+# These modules require assembler to support AVX2.
+ifeq ($(avx2_supported),yes)
+   obj-$(CONFIG_CRYPTO_BLOWFISH_AVX2_X86_64) += blowfish-avx2.o
+endif
+
 aes-i586-y := aes-i586-asm_32.o aes_glue.o
 twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o
 salsa20-i586-y := salsa20-i586-asm_32.o salsa20_glue.o
@@ -62,6 +69,10 @@ ifeq ($(avx_supported),yes)
serpent_avx_glue.o
 endif
 
+ifeq ($(avx2_supported),yes)
+   blowfish-avx2-y := blowfish-avx2-asm_64.o blowfish_avx2_glue.o
+endif
+
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o
 ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o
 sha1-ssse3-y := sha1_ssse3_asm.o sha1_ssse3_glue.o
diff --git a/arch/x86/crypto/blowfish-avx2-asm_64.S 
b/arch/x86/crypto/blowfish-avx2-asm_64.S
new file mode 100644
index 000..784452e
--- /dev/null
+++ b/arch/x86/crypto/blowfish-avx2-asm_64.S
@@ -0,0 +1,449 @@
+/*
+ * x86_64/AVX2 assembler optimized version of Blowfish
+ *
+ * Copyright © 2012-2013 Jussi Kivilinna jussi.kivili...@iki.fi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ */
+
+#include linux/linkage.h
+
+.file blowfish-avx2-asm_64.S
+
+.data
+.align 32
+
+.Lprefetch_mask:
+.long 0*64
+.long 1*64
+.long 2*64
+.long 3*64
+.long 4*64
+.long 5*64
+.long 6*64
+.long 7*64
+
+.Lbswap32_mask:
+.long 0x00010203
+.long 0x04050607
+.long 0x08090a0b
+.long 0x0c0d0e0f
+
+.Lbswap128_mask:
+   .byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
+.Lbswap_iv_mask:
+   .byte 7, 6, 5, 4, 3, 2, 1, 0, 7, 6, 5, 4, 3, 2, 1, 0
+
+.text
+/* structure of crypto context */
+#define p  0
+#define s0 ((16 + 2) * 4)
+#define s1 ((16 + 2 + (1 * 256)) * 4)
+#define s2 ((16 + 2 + (2 * 256)) * 4)
+#define s3 ((16 + 2 + (3 * 256)) * 4)
+
+/* register macros */
+#define CTX%rdi
+#define RIO %rdx
+
+#define RS0%rax
+#define RS1%r8
+#define RS2%r9
+#define RS3%r10
+
+#define RLOOP  %r11
+#define RLOOPd %r11d
+
+#define RXr0   %ymm8
+#define RXr1   %ymm9
+#define RXr2   %ymm10
+#define RXr3   %ymm11
+#define RXl0   %ymm12
+#define RXl1   %ymm13
+#define RXl2   %ymm14
+#define RXl3   %ymm15
+
+/* temp regs */
+#define RT0%ymm0
+#define RT0x   %xmm0
+#define RT1%ymm1
+#define RT1x   %xmm1
+#define RIDX0  %ymm2
+#define RIDX1  %ymm3
+#define RIDX1x %xmm3
+#define RIDX2  %ymm4
+#define RIDX3  %ymm5
+
+/* vpgatherdd mask and '-1' */
+#define RNOT   %ymm6
+
+/* byte mask, (-1  24) */
+#define RBYTE  %ymm7
+
+/***
+ * 32-way AVX2 blowfish
+ ***/
+#define F(xl, xr) \
+   vpsrld $24, xl, RIDX0; \
+   vpsrld $16, xl, RIDX1; \
+   vpsrld $8, xl, RIDX2; \
+   vpand RBYTE, RIDX1, RIDX1; \
+   vpand RBYTE, RIDX2, RIDX2; \
+   vpand RBYTE, xl, RIDX3; \
+   \
+   vpgatherdd RNOT, (RS0, RIDX0, 4), RT0; \
+   vpcmpeqd RNOT, RNOT, RNOT; \
+   vpcmpeqd RIDX0, RIDX0, RIDX0; \
+   \
+   vpgatherdd RNOT, (RS1, RIDX1, 4), RT1; \
+   vpcmpeqd RIDX1, RIDX1, RIDX1

[RFC PATCH 4/6] crypto: twofish - add AVX2/x86_64 assembler implementation of twofish cipher

2013-04-13 Thread Jussi Kivilinna
Patch adds an AVX2/x86-64 implementation of the Twofish cipher, requiring 16
parallel blocks of input (256 bytes). Table look-ups are performed with the
vpgatherdd instruction directly from vector registers and thus should be faster
than in earlier implementations. The implementation also uses 256-bit wide YMM
registers, which should give an additional speed-up compared to the AVX
implementation.

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/Makefile   |2 
 arch/x86/crypto/glue_helper-asm-avx2.S |  180 ++
 arch/x86/crypto/twofish-avx2-asm_64.S  |  600 
 arch/x86/crypto/twofish_avx2_glue.c|  584 +++
 arch/x86/crypto/twofish_avx_glue.c |   14 +
 arch/x86/include/asm/crypto/twofish.h  |   18 +
 crypto/Kconfig |   24 +
 crypto/testmgr.c   |   12 +
 8 files changed, 1432 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/crypto/glue_helper-asm-avx2.S
 create mode 100644 arch/x86/crypto/twofish-avx2-asm_64.S
 create mode 100644 arch/x86/crypto/twofish_avx2_glue.c

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 28464ef..1f6e0c2 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -43,6 +43,7 @@ endif
 # These modules require assembler to support AVX2.
 ifeq ($(avx2_supported),yes)
obj-$(CONFIG_CRYPTO_BLOWFISH_AVX2_X86_64) += blowfish-avx2.o
+   obj-$(CONFIG_CRYPTO_TWOFISH_AVX2_X86_64) += twofish-avx2.o
 endif
 
 aes-i586-y := aes-i586-asm_32.o aes_glue.o
@@ -71,6 +72,7 @@ endif
 
 ifeq ($(avx2_supported),yes)
blowfish-avx2-y := blowfish-avx2-asm_64.o blowfish_avx2_glue.o
+   twofish-avx2-y := twofish-avx2-asm_64.o twofish_avx2_glue.o
 endif
 
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o
diff --git a/arch/x86/crypto/glue_helper-asm-avx2.S 
b/arch/x86/crypto/glue_helper-asm-avx2.S
new file mode 100644
index 000..a53ac11
--- /dev/null
+++ b/arch/x86/crypto/glue_helper-asm-avx2.S
@@ -0,0 +1,180 @@
+/*
+ * Shared glue code for 128bit block ciphers, AVX2 assembler macros
+ *
+ * Copyright © 2012-2013 Jussi Kivilinna jussi.kivili...@mbnet.fi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ */
+
+#define load_16way(src, x0, x1, x2, x3, x4, x5, x6, x7) \
+   vmovdqu (0*32)(src), x0; \
+   vmovdqu (1*32)(src), x1; \
+   vmovdqu (2*32)(src), x2; \
+   vmovdqu (3*32)(src), x3; \
+   vmovdqu (4*32)(src), x4; \
+   vmovdqu (5*32)(src), x5; \
+   vmovdqu (6*32)(src), x6; \
+   vmovdqu (7*32)(src), x7;
+
+#define store_16way(dst, x0, x1, x2, x3, x4, x5, x6, x7) \
+   vmovdqu x0, (0*32)(dst); \
+   vmovdqu x1, (1*32)(dst); \
+   vmovdqu x2, (2*32)(dst); \
+   vmovdqu x3, (3*32)(dst); \
+   vmovdqu x4, (4*32)(dst); \
+   vmovdqu x5, (5*32)(dst); \
+   vmovdqu x6, (6*32)(dst); \
+   vmovdqu x7, (7*32)(dst);
+
+#define store_cbc_16way(src, dst, x0, x1, x2, x3, x4, x5, x6, x7, t0) \
+   vpxor t0, t0, t0; \
+   vinserti128 $1, (src), t0, t0; \
+   vpxor t0, x0, x0; \
+   vpxor (0*32+16)(src), x1, x1; \
+   vpxor (1*32+16)(src), x2, x2; \
+   vpxor (2*32+16)(src), x3, x3; \
+   vpxor (3*32+16)(src), x4, x4; \
+   vpxor (4*32+16)(src), x5, x5; \
+   vpxor (5*32+16)(src), x6, x6; \
+   vpxor (6*32+16)(src), x7, x7; \
+   store_16way(dst, x0, x1, x2, x3, x4, x5, x6, x7);
+
+#define inc_le128(x, minus_one, tmp) \
+   vpcmpeqq minus_one, x, tmp; \
+   vpsubq minus_one, x, x; \
+   vpslldq $8, tmp, tmp; \
+   vpsubq tmp, x, x;
+
+#define add2_le128(x, minus_one, minus_two, tmp1, tmp2) \
+   vpcmpeqq minus_one, x, tmp1; \
+   vpcmpeqq minus_two, x, tmp2; \
+   vpsubq minus_two, x, x; \
+   vpor tmp2, tmp1, tmp1; \
+   vpslldq $8, tmp1, tmp1; \
+   vpsubq tmp1, x, x;
+
+#define load_ctr_16way(iv, bswap, x0, x1, x2, x3, x4, x5, x6, x7, t0, t0x, t1, 
\
+  t1x, t2, t2x, t3, t3x, t4, t5) \
+   vpcmpeqd t0, t0, t0; \
+   vpsrldq $8, t0, t0; /* ab: -1:0 ; cd: -1:0 */ \
+   vpaddq t0, t0, t4; /* ab: -2:0 ; cd: -2:0 */\
+   \
+   /* load IV and byteswap */ \
+   vmovdqu (iv), t2x; \
+   vmovdqa t2x, t3x; \
+   inc_le128(t2x, t0x, t1x); \
+   vbroadcasti128 bswap, t1; \
+   vinserti128 $1, t2x, t3, t2; /* ab: le0 ; cd: le1 */ \
+   vpshufb t1, t2, x0; \
+   \
+   /* construct IVs */ \
+   add2_le128(t2, t0, t4, t3, t5); /* ab: le2 ; cd: le3 */ \
+   vpshufb t1, t2, x1; \
+   add2_le128(t2, t0, t4, t3, t5); \
+   vpshufb t1, t2, x2; \
+   add2_le128(t2, t0, t4, t3, t5); \
+   vpshufb t1, t2, x3; \
+   add2_le128(t2, t0, t4, t3, t5); \
+   vpshufb t1, t2

[RFC PATCH 5/6] crypto: serpent - add AVX2/x86_64 assembler implementation of serpent cipher

2013-04-13 Thread Jussi Kivilinna
Patch adds an AVX2/x86-64 implementation of the Serpent cipher, requiring 16 parallel
blocks for input (256 bytes). The implementation is based on the AVX implementation
and extends it to use the 256-bit wide YMM registers. Since Serpent does not use
table look-ups, this implementation should be close to two times faster than
the AVX implementation.

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/Makefile  |2 
 arch/x86/crypto/serpent-avx2-asm_64.S |  800 +
 arch/x86/crypto/serpent_avx2_glue.c   |  562 
 arch/x86/crypto/serpent_avx_glue.c|   62 ++
 arch/x86/include/asm/crypto/serpent-avx.h |   24 +
 crypto/Kconfig|   23 +
 crypto/testmgr.c  |   15 +
 7 files changed, 1468 insertions(+), 20 deletions(-)
 create mode 100644 arch/x86/crypto/serpent-avx2-asm_64.S
 create mode 100644 arch/x86/crypto/serpent_avx2_glue.c

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 1f6e0c2..a21af59 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -43,6 +43,7 @@ endif
 # These modules require assembler to support AVX2.
 ifeq ($(avx2_supported),yes)
obj-$(CONFIG_CRYPTO_BLOWFISH_AVX2_X86_64) += blowfish-avx2.o
+   obj-$(CONFIG_CRYPTO_SERPENT_AVX2_X86_64) += serpent-avx2.o
obj-$(CONFIG_CRYPTO_TWOFISH_AVX2_X86_64) += twofish-avx2.o
 endif
 
@@ -72,6 +73,7 @@ endif
 
 ifeq ($(avx2_supported),yes)
blowfish-avx2-y := blowfish-avx2-asm_64.o blowfish_avx2_glue.o
+   serpent-avx2-y := serpent-avx2-asm_64.o serpent_avx2_glue.o
twofish-avx2-y := twofish-avx2-asm_64.o twofish_avx2_glue.o
 endif
 
diff --git a/arch/x86/crypto/serpent-avx2-asm_64.S 
b/arch/x86/crypto/serpent-avx2-asm_64.S
new file mode 100644
index 000..b222085
--- /dev/null
+++ b/arch/x86/crypto/serpent-avx2-asm_64.S
@@ -0,0 +1,800 @@
+/*
+ * x86_64/AVX2 assembler optimized version of Serpent
+ *
+ * Copyright © 2012-2013 Jussi Kivilinna jussi.kivili...@mbnet.fi
+ *
+ * Based on AVX assembler implementation of Serpent by:
+ *  Copyright © 2012 Johannes Goetzfried
+ *  johannes.goetzfr...@informatik.stud.uni-erlangen.de
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ */
+
+#include linux/linkage.h
+#include glue_helper-asm-avx2.S
+
+.file serpent-avx2-asm_64.S
+
+.data
+.align 16
+
+.Lbswap128_mask:
+   .byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
+.Lxts_gf128mul_and_shl1_mask_0:
+   .byte 0x87, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0
+.Lxts_gf128mul_and_shl1_mask_1:
+   .byte 0x0e, 1, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0
+
+.text
+
+#define CTX %rdi
+
+#define RNOT %ymm0
+#define tp  %ymm1
+
+#define RA1 %ymm2
+#define RA2 %ymm3
+#define RB1 %ymm4
+#define RB2 %ymm5
+#define RC1 %ymm6
+#define RC2 %ymm7
+#define RD1 %ymm8
+#define RD2 %ymm9
+#define RE1 %ymm10
+#define RE2 %ymm11
+
+#define RK0 %ymm12
+#define RK1 %ymm13
+#define RK2 %ymm14
+#define RK3 %ymm15
+
+#define RK0x %xmm12
+#define RK1x %xmm13
+#define RK2x %xmm14
+#define RK3x %xmm15
+
+#define S0_1(x0, x1, x2, x3, x4)  \
+   vporx0,   x3, tp; \
+   vpxor   x3,   x0, x0; \
+   vpxor   x2,   x3, x4; \
+   vpxor   RNOT, x4, x4; \
+   vpxor   x1,   tp, x3; \
+   vpand   x0,   x1, x1; \
+   vpxor   x4,   x1, x1; \
+   vpxor   x0,   x2, x2;
+#define S0_2(x0, x1, x2, x3, x4)  \
+   vpxor   x3,   x0, x0; \
+   vporx0,   x4, x4; \
+   vpxor   x2,   x0, x0; \
+   vpand   x1,   x2, x2; \
+   vpxor   x2,   x3, x3; \
+   vpxor   RNOT, x1, x1; \
+   vpxor   x4,   x2, x2; \
+   vpxor   x2,   x1, x1;
+
+#define S1_1(x0, x1, x2, x3, x4)  \
+   vpxor   x0,   x1, tp; \
+   vpxor   x3,   x0, x0; \
+   vpxor   RNOT, x3, x3; \
+   vpand   tp,   x1, x4; \
+   vportp,   x0, x0; \
+   vpxor   x2,   x3, x3; \
+   vpxor   x3,   x0, x0; \
+   vpxor   x3,   tp, x1;
+#define S1_2(x0, x1, x2, x3, x4)  \
+   vpxor   x4,   x3, x3; \
+   vporx4,   x1, x1; \
+   vpxor   x2,   x4, x4; \
+   vpand   x0,   x2, x2; \
+   vpxor   x1,   x2, x2; \
+   vporx0,   x1, x1; \
+   vpxor   RNOT, x0, x0; \
+   vpxor   x2,   x0, x0; \
+   vpxor   x1,   x4, x4;
+
+#define S2_1(x0, x1, x2, x3, x4)  \
+   vpxor   RNOT, x3, x3; \
+   vpxor   x0,   x1, x1; \
+   vpand   x2,   x0, tp; \
+   vpxor   x3

[RFC PATCH 2/6] crypto: tcrypt - add async cipher speed tests for blowfish

2013-04-13 Thread Jussi Kivilinna
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/tcrypt.c |   15 +++
 1 file changed, 15 insertions(+)

diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index 24ea7df..66d254c 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -1768,6 +1768,21 @@ static int do_test(int m)
   speed_template_32_64);
break;
 
+   case 509:
+   test_acipher_speed(ecb(blowfish), ENCRYPT, sec, NULL, 0,
+  speed_template_8_32);
+   test_acipher_speed(ecb(blowfish), DECRYPT, sec, NULL, 0,
+  speed_template_8_32);
+   test_acipher_speed(cbc(blowfish), ENCRYPT, sec, NULL, 0,
+  speed_template_8_32);
+   test_acipher_speed(cbc(blowfish), DECRYPT, sec, NULL, 0,
+  speed_template_8_32);
+   test_acipher_speed(ctr(blowfish), ENCRYPT, sec, NULL, 0,
+  speed_template_8_32);
+   test_acipher_speed(ctr(blowfish), DECRYPT, sec, NULL, 0,
+  speed_template_8_32);
+   break;
+
case 1000:
test_available();
break;



[PATCH] crypto: aesni_intel - fix Kconfig problem with CRYPTO_GLUE_HELPER_X86

2013-04-10 Thread Jussi Kivilinna
The Kconfig setting for the glue helper module is CRYPTO_GLUE_HELPER_X86, but a
recent change for aesni_intel used CRYPTO_GLUE_HELPER instead. Patch corrects
this issue.

Cc: kbuild-...@01.org
Reported-by: kbuild test robot fengguang...@intel.com
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/Kconfig |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 808ac37..0e7a237 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -678,7 +678,7 @@ config CRYPTO_AES_NI_INTEL
select CRYPTO_CRYPTD
select CRYPTO_ABLK_HELPER_X86
select CRYPTO_ALGAPI
-   select CRYPTO_GLUE_HELPER if 64BIT
+   select CRYPTO_GLUE_HELPER_X86 if 64BIT
select CRYPTO_LRW
select CRYPTO_XTS
help



[PATCH 2/2] xfrm: add rfc4494 AES-CMAC-96 support

2013-04-08 Thread Jussi Kivilinna
Now that CryptoAPI has support for CMAC, we can add support for AES-CMAC-96
(rfc4494).
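
As a usage sketch (hypothetical helper, not part of this patch), computing a full
16-byte AES-CMAC tag through the synchronous hash interface looks roughly like the
following; the xfrm entry below then truncates the tag to 96 bits:

#include <crypto/hash.h>
#include <linux/err.h>
#include <linux/slab.h>

/* Hypothetical example: full AES-CMAC tag over a linear buffer. */
static int example_cmac_aes(const u8 *key, unsigned int keylen,
                            const u8 *data, unsigned int len, u8 out[16])
{
        struct crypto_shash *tfm;
        struct shash_desc *desc;
        int err;

        tfm = crypto_alloc_shash("cmac(aes)", 0, 0);
        if (IS_ERR(tfm))
                return PTR_ERR(tfm);

        err = crypto_shash_setkey(tfm, key, keylen);
        if (err)
                goto out_free;

        desc = kzalloc(sizeof(*desc) + crypto_shash_descsize(tfm), GFP_KERNEL);
        if (!desc) {
                err = -ENOMEM;
                goto out_free;
        }
        desc->tfm = tfm;

        err = crypto_shash_digest(desc, data, len, out);
        kfree(desc);
out_free:
        crypto_free_shash(tfm);
        return err;
}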

Cc: Tom St Denis tstde...@elliptictech.com
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 net/xfrm/xfrm_algo.c |   13 +
 1 file changed, 13 insertions(+)

diff --git a/net/xfrm/xfrm_algo.c b/net/xfrm/xfrm_algo.c
index 6fb9d00..ab4ef72 100644
--- a/net/xfrm/xfrm_algo.c
+++ b/net/xfrm/xfrm_algo.c
@@ -311,6 +311,19 @@ static struct xfrm_algo_desc aalg_list[] = {
.sadb_alg_maxbits = 128
}
 },
+{
+   /* rfc4494 */
+   .name = cmac(aes),
+
+   .uinfo = {
+   .auth = {
+   .icv_truncbits = 96,
+   .icv_fullbits = 128,
+   }
+   },
+
+   .pfkey_supported = 0,
+},
 };
 
 static struct xfrm_algo_desc ealg_list[] = {



Re: [PATCH 1/2] crypto: add CMAC support to CryptoAPI

2013-04-08 Thread Jussi Kivilinna
On 08.04.2013 11:24, Steffen Klassert wrote:
 On Mon, Apr 08, 2013 at 10:48:44AM +0300, Jussi Kivilinna wrote:
 Patch adds support for NIST recommended block cipher mode CMAC to CryptoAPI.

 This work is based on Tom St Denis' earlier patch,
  http://marc.info/?l=linux-crypto-vgerm=135877306305466w=2

 Cc: Tom St Denis tstde...@elliptictech.com
 Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
 
 This patch does not apply clean to the ipsec-next tree
 because of some crypto changes I don't have in ipsec-next.
 The IPsec part should apply to the cryptodev tree,
 so it's probaply the best if we route this patchset
 through the cryptodev tree.

I should have mentioned that the patchset is on top of the cryptodev tree and the
previous crypto patches that I sent yesterday, which are likely to cause problems
at least in tcrypt.c:

http://marc.info/?l=linux-crypto-vgerm=136534223503368w=2

-Jussi

 
 Herbert,
 
 are you going to take these patches?
 



[PATCH 1/5] crypto: x86 - add more optimized XTS-mode for serpent-avx

2013-04-08 Thread Jussi Kivilinna
This patch adds AVX optimized XTS-mode helper functions/macros and converts
serpent-avx to use the new facilities. Benefits are slightly improved speed
and reduced stack usage, as the use of a temporary IV array is avoided.
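
Conceptually, each per-block tweak used by XTS is the previous tweak multiplied by x
in GF(2^128), little-endian block convention; the new gf128mul_x_ble assembler macro
computes this in vector registers, so no IV array has to be spilled to the stack. A
byte-wise C model of the same update, for reference only:

#include <stdint.h>

/* Reference model: shift the 128-bit tweak left by one bit (byte 0 is the
 * least significant byte) and reduce with 0x87 if a carry falls out. */
static void xts_next_tweak(uint8_t t[16])
{
        unsigned int carry = 0;
        int i;

        for (i = 0; i < 16; i++) {
                unsigned int msb = t[i] >> 7;

                t[i] = (uint8_t)((t[i] << 1) | carry);
                carry = msb;
        }
        if (carry)
                t[0] ^= 0x87;
}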

tcrypt results, with Intel i5-2450M:
enc dec
16B 1.00x   1.00x
64B 1.00x   1.00x
256B1.04x   1.06x
1024B   1.09x   1.09x
8192B   1.10x   1.09x

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/glue_helper-asm-avx.S   |   61 +
 arch/x86/crypto/glue_helper.c   |   97 +++
 arch/x86/crypto/serpent-avx-x86_64-asm_64.S |   45 -
 arch/x86/crypto/serpent_avx_glue.c  |   87 +---
 arch/x86/include/asm/crypto/glue_helper.h   |   24 +++
 arch/x86/include/asm/crypto/serpent-avx.h   |5 +
 6 files changed, 273 insertions(+), 46 deletions(-)

diff --git a/arch/x86/crypto/glue_helper-asm-avx.S 
b/arch/x86/crypto/glue_helper-asm-avx.S
index f7b6ea2..02ee230 100644
--- a/arch/x86/crypto/glue_helper-asm-avx.S
+++ b/arch/x86/crypto/glue_helper-asm-avx.S
@@ -1,7 +1,7 @@
 /*
  * Shared glue code for 128bit block ciphers, AVX assembler macros
  *
- * Copyright (c) 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi
+ * Copyright © 2012-2013 Jussi Kivilinna jussi.kivili...@iki.fi
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -89,3 +89,62 @@
vpxor (6*16)(src), x6, x6; \
vpxor (7*16)(src), x7, x7; \
store_8way(dst, x0, x1, x2, x3, x4, x5, x6, x7);
+
+#define gf128mul_x_ble(iv, mask, tmp) \
+   vpsrad $31, iv, tmp; \
+   vpaddq iv, iv, iv; \
+   vpshufd $0x13, tmp, tmp; \
+   vpand mask, tmp, tmp; \
+   vpxor tmp, iv, iv;
+
+#define load_xts_8way(iv, src, dst, x0, x1, x2, x3, x4, x5, x6, x7, tiv, t0, \
+ t1, xts_gf128mul_and_shl1_mask) \
+   vmovdqa xts_gf128mul_and_shl1_mask, t0; \
+   \
+   /* load IV */ \
+   vmovdqu (iv), tiv; \
+   vpxor (0*16)(src), tiv, x0; \
+   vmovdqu tiv, (0*16)(dst); \
+   \
+   /* construct and store IVs, also xor with source */ \
+   gf128mul_x_ble(tiv, t0, t1); \
+   vpxor (1*16)(src), tiv, x1; \
+   vmovdqu tiv, (1*16)(dst); \
+   \
+   gf128mul_x_ble(tiv, t0, t1); \
+   vpxor (2*16)(src), tiv, x2; \
+   vmovdqu tiv, (2*16)(dst); \
+   \
+   gf128mul_x_ble(tiv, t0, t1); \
+   vpxor (3*16)(src), tiv, x3; \
+   vmovdqu tiv, (3*16)(dst); \
+   \
+   gf128mul_x_ble(tiv, t0, t1); \
+   vpxor (4*16)(src), tiv, x4; \
+   vmovdqu tiv, (4*16)(dst); \
+   \
+   gf128mul_x_ble(tiv, t0, t1); \
+   vpxor (5*16)(src), tiv, x5; \
+   vmovdqu tiv, (5*16)(dst); \
+   \
+   gf128mul_x_ble(tiv, t0, t1); \
+   vpxor (6*16)(src), tiv, x6; \
+   vmovdqu tiv, (6*16)(dst); \
+   \
+   gf128mul_x_ble(tiv, t0, t1); \
+   vpxor (7*16)(src), tiv, x7; \
+   vmovdqu tiv, (7*16)(dst); \
+   \
+   gf128mul_x_ble(tiv, t0, t1); \
+   vmovdqu tiv, (iv);
+
+#define store_xts_8way(dst, x0, x1, x2, x3, x4, x5, x6, x7) \
+   vpxor (0*16)(dst), x0, x0; \
+   vpxor (1*16)(dst), x1, x1; \
+   vpxor (2*16)(dst), x2, x2; \
+   vpxor (3*16)(dst), x3, x3; \
+   vpxor (4*16)(dst), x4, x4; \
+   vpxor (5*16)(dst), x5, x5; \
+   vpxor (6*16)(dst), x6, x6; \
+   vpxor (7*16)(dst), x7, x7; \
+   store_8way(dst, x0, x1, x2, x3, x4, x5, x6, x7);
diff --git a/arch/x86/crypto/glue_helper.c b/arch/x86/crypto/glue_helper.c
index 22ce4f6..432f1d76 100644
--- a/arch/x86/crypto/glue_helper.c
+++ b/arch/x86/crypto/glue_helper.c
@@ -1,7 +1,7 @@
 /*
  * Shared glue code for 128bit block ciphers
  *
- * Copyright (c) 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi
+ * Copyright © 2012-2013 Jussi Kivilinna jussi.kivili...@iki.fi
  *
  * CBC  ECB parts based on code (crypto/cbc.c,ecb.c) by:
  *   Copyright (c) 2006 Herbert Xu herb...@gondor.apana.org.au
@@ -304,4 +304,99 @@ int glue_ctr_crypt_128bit(const struct common_glue_ctx 
*gctx,
 }
 EXPORT_SYMBOL_GPL(glue_ctr_crypt_128bit);
 
+static unsigned int __glue_xts_crypt_128bit(const struct common_glue_ctx *gctx,
+   void *ctx,
+   struct blkcipher_desc *desc,
+   struct blkcipher_walk *walk)
+{
+   const unsigned int bsize = 128 / 8;
+   unsigned int nbytes = walk-nbytes;
+   u128 *src = (u128 *)walk-src.virt.addr;
+   u128 *dst = (u128 *)walk-dst.virt.addr;
+   unsigned int num_blocks, func_bytes;
+   unsigned int i;
+
+   /* Process multi-block batch */
+   for (i = 0; i  gctx-num_funcs; i++) {
+   num_blocks = gctx-funcs[i].num_blocks;
+   func_bytes = bsize * num_blocks;
+
+   if (nbytes = func_bytes

[PATCH 3/5] crypto: cast6-avx: use new optimized XTS code

2013-04-08 Thread Jussi Kivilinna
Change cast6-avx to use the new XTS code, for smaller stack usage and a small
boost to performance.

tcrypt results, with Intel i5-2450M:
enc dec
16B 1.01x   1.01x
64B 1.01x   1.00x
256B1.09x   1.02x
1024B   1.08x   1.06x
8192B   1.08x   1.07x

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/cast6-avx-x86_64-asm_64.S |   48 +++
 arch/x86/crypto/cast6_avx_glue.c  |   91 -
 2 files changed, 98 insertions(+), 41 deletions(-)

diff --git a/arch/x86/crypto/cast6-avx-x86_64-asm_64.S 
b/arch/x86/crypto/cast6-avx-x86_64-asm_64.S
index f93b610..e3531f8 100644
--- a/arch/x86/crypto/cast6-avx-x86_64-asm_64.S
+++ b/arch/x86/crypto/cast6-avx-x86_64-asm_64.S
@@ -4,7 +4,7 @@
  * Copyright (C) 2012 Johannes Goetzfried
  * johannes.goetzfr...@informatik.stud.uni-erlangen.de
  *
- * Copyright © 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi
+ * Copyright © 2012-2013 Jussi Kivilinna jussi.kivili...@iki.fi
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -227,6 +227,8 @@
 .data
 
 .align 16
+.Lxts_gf128mul_and_shl1_mask:
+   .byte 0x87, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0
 .Lbswap_mask:
.byte 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12
 .Lbswap128_mask:
@@ -424,3 +426,47 @@ ENTRY(cast6_ctr_8way)
 
ret;
 ENDPROC(cast6_ctr_8way)
+
+ENTRY(cast6_xts_enc_8way)
+   /* input:
+*  %rdi: ctx, CTX
+*  %rsi: dst
+*  %rdx: src
+*  %rcx: iv (t ⊕ αⁿ ∈ GF(2¹²⁸))
+*/
+
+   movq %rsi, %r11;
+
+   /* regs = src, dst = IVs, regs = regs xor IVs */
+   load_xts_8way(%rcx, %rdx, %rsi, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2,
+ RX, RKR, RKM, .Lxts_gf128mul_and_shl1_mask);
+
+   call __cast6_enc_blk8;
+
+   /* dst = regs xor IVs(in dst) */
+   store_xts_8way(%r11, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2);
+
+   ret;
+ENDPROC(cast6_xts_enc_8way)
+
+ENTRY(cast6_xts_dec_8way)
+   /* input:
+*  %rdi: ctx, CTX
+*  %rsi: dst
+*  %rdx: src
+*  %rcx: iv (t ⊕ αⁿ ∈ GF(2¹²⁸))
+*/
+
+   movq %rsi, %r11;
+
+   /* regs = src, dst = IVs, regs = regs xor IVs */
+   load_xts_8way(%rcx, %rdx, %rsi, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2,
+ RX, RKR, RKM, .Lxts_gf128mul_and_shl1_mask);
+
+   call __cast6_dec_blk8;
+
+   /* dst = regs xor IVs(in dst) */
+   store_xts_8way(%r11, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2);
+
+   ret;
+ENDPROC(cast6_xts_dec_8way)
diff --git a/arch/x86/crypto/cast6_avx_glue.c b/arch/x86/crypto/cast6_avx_glue.c
index 92f7ca2..8d0dfb8 100644
--- a/arch/x86/crypto/cast6_avx_glue.c
+++ b/arch/x86/crypto/cast6_avx_glue.c
@@ -4,6 +4,8 @@
  * Copyright (C) 2012 Johannes Goetzfried
  * johannes.goetzfr...@informatik.stud.uni-erlangen.de
  *
+ * Copyright © 2013 Jussi Kivilinna jussi.kivili...@iki.fi
+ *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; either version 2 of the License, or
@@ -50,6 +52,23 @@ asmlinkage void cast6_cbc_dec_8way(struct cast6_ctx *ctx, u8 
*dst,
 asmlinkage void cast6_ctr_8way(struct cast6_ctx *ctx, u8 *dst, const u8 *src,
   le128 *iv);
 
+asmlinkage void cast6_xts_enc_8way(struct cast6_ctx *ctx, u8 *dst,
+  const u8 *src, le128 *iv);
+asmlinkage void cast6_xts_dec_8way(struct cast6_ctx *ctx, u8 *dst,
+  const u8 *src, le128 *iv);
+
+static void cast6_xts_enc(void *ctx, u128 *dst, const u128 *src, le128 *iv)
+{
+   glue_xts_crypt_128bit_one(ctx, dst, src, iv,
+ GLUE_FUNC_CAST(__cast6_encrypt));
+}
+
+static void cast6_xts_dec(void *ctx, u128 *dst, const u128 *src, le128 *iv)
+{
+   glue_xts_crypt_128bit_one(ctx, dst, src, iv,
+ GLUE_FUNC_CAST(__cast6_decrypt));
+}
+
 static void cast6_crypt_ctr(void *ctx, u128 *dst, const u128 *src, le128 *iv)
 {
be128 ctrblk;
@@ -87,6 +106,19 @@ static const struct common_glue_ctx cast6_ctr = {
} }
 };
 
+static const struct common_glue_ctx cast6_enc_xts = {
+   .num_funcs = 2,
+   .fpu_blocks_limit = CAST6_PARALLEL_BLOCKS,
+
+   .funcs = { {
+   .num_blocks = CAST6_PARALLEL_BLOCKS,
+   .fn_u = { .xts = GLUE_XTS_FUNC_CAST(cast6_xts_enc_8way) }
+   }, {
+   .num_blocks = 1,
+   .fn_u = { .xts = GLUE_XTS_FUNC_CAST(cast6_xts_enc) }
+   } }
+};
+
 static const struct common_glue_ctx cast6_dec = {
.num_funcs = 2,
.fpu_blocks_limit = CAST6_PARALLEL_BLOCKS,
@@ -113,6 +145,19 @@ static const struct common_glue_ctx cast6_dec_cbc = {
} }
 };
 
+static

[PATCH 2/5] crypto: x86/twofish-avx - use optimized XTS code

2013-04-08 Thread Jussi Kivilinna
Change twofish-avx to use the new XTS code, for smaller stack usage and a small
boost to performance.

tcrypt results, with Intel i5-2450M:
enc dec
16B 1.03x   1.02x
64B 0.91x   0.91x
256B1.10x   1.09x
1024B   1.12x   1.11x
8192B   1.12x   1.11x

Since XTS is practically always used with data blocks of size 512 bytes or
more, I chose not to make use of twofish-3way for block sizes smaller than
128 bytes. This causes a slower result in tcrypt for 64 bytes.

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/twofish-avx-x86_64-asm_64.S |   48 ++
 arch/x86/crypto/twofish_avx_glue.c  |   91 +++
 2 files changed, 98 insertions(+), 41 deletions(-)

diff --git a/arch/x86/crypto/twofish-avx-x86_64-asm_64.S 
b/arch/x86/crypto/twofish-avx-x86_64-asm_64.S
index 8d3e113..0505813 100644
--- a/arch/x86/crypto/twofish-avx-x86_64-asm_64.S
+++ b/arch/x86/crypto/twofish-avx-x86_64-asm_64.S
@@ -4,7 +4,7 @@
  * Copyright (C) 2012 Johannes Goetzfried
  * johannes.goetzfr...@informatik.stud.uni-erlangen.de
  *
- * Copyright © 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi
+ * Copyright © 2012-2013 Jussi Kivilinna jussi.kivili...@iki.fi
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -33,6 +33,8 @@
 
 .Lbswap128_mask:
.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
+.Lxts_gf128mul_and_shl1_mask:
+   .byte 0x87, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0
 
 .text
 
@@ -408,3 +410,47 @@ ENTRY(twofish_ctr_8way)
 
ret;
 ENDPROC(twofish_ctr_8way)
+
+ENTRY(twofish_xts_enc_8way)
+   /* input:
+*  %rdi: ctx, CTX
+*  %rsi: dst
+*  %rdx: src
+*  %rcx: iv (t ⊕ αⁿ ∈ GF(2¹²⁸))
+*/
+
+   movq %rsi, %r11;
+
+   /* regs = src, dst = IVs, regs = regs xor IVs */
+   load_xts_8way(%rcx, %rdx, %rsi, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2,
+ RX0, RX1, RY0, .Lxts_gf128mul_and_shl1_mask);
+
+   call __twofish_enc_blk8;
+
+   /* dst = regs xor IVs(in dst) */
+   store_xts_8way(%r11, RC1, RD1, RA1, RB1, RC2, RD2, RA2, RB2);
+
+   ret;
+ENDPROC(twofish_xts_enc_8way)
+
+ENTRY(twofish_xts_dec_8way)
+   /* input:
+*  %rdi: ctx, CTX
+*  %rsi: dst
+*  %rdx: src
+*  %rcx: iv (t ⊕ αⁿ ∈ GF(2¹²⁸))
+*/
+
+   movq %rsi, %r11;
+
+   /* regs = src, dst = IVs, regs = regs xor IVs */
+   load_xts_8way(%rcx, %rdx, %rsi, RC1, RD1, RA1, RB1, RC2, RD2, RA2, RB2,
+ RX0, RX1, RY0, .Lxts_gf128mul_and_shl1_mask);
+
+   call __twofish_dec_blk8;
+
+   /* dst = regs xor IVs(in dst) */
+   store_xts_8way(%r11, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2);
+
+   ret;
+ENDPROC(twofish_xts_dec_8way)
diff --git a/arch/x86/crypto/twofish_avx_glue.c 
b/arch/x86/crypto/twofish_avx_glue.c
index 94ac91d..a62ba54 100644
--- a/arch/x86/crypto/twofish_avx_glue.c
+++ b/arch/x86/crypto/twofish_avx_glue.c
@@ -4,6 +4,8 @@
  * Copyright (C) 2012 Johannes Goetzfried
  * johannes.goetzfr...@informatik.stud.uni-erlangen.de
  *
+ * Copyright © 2013 Jussi Kivilinna jussi.kivili...@iki.fi
+ *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; either version 2 of the License, or
@@ -56,12 +58,29 @@ asmlinkage void twofish_cbc_dec_8way(struct twofish_ctx 
*ctx, u8 *dst,
 asmlinkage void twofish_ctr_8way(struct twofish_ctx *ctx, u8 *dst,
 const u8 *src, le128 *iv);
 
+asmlinkage void twofish_xts_enc_8way(struct twofish_ctx *ctx, u8 *dst,
+const u8 *src, le128 *iv);
+asmlinkage void twofish_xts_dec_8way(struct twofish_ctx *ctx, u8 *dst,
+const u8 *src, le128 *iv);
+
 static inline void twofish_enc_blk_3way(struct twofish_ctx *ctx, u8 *dst,
const u8 *src)
 {
__twofish_enc_blk_3way(ctx, dst, src, false);
 }
 
+static void twofish_xts_enc(void *ctx, u128 *dst, const u128 *src, le128 *iv)
+{
+   glue_xts_crypt_128bit_one(ctx, dst, src, iv,
+ GLUE_FUNC_CAST(twofish_enc_blk));
+}
+
+static void twofish_xts_dec(void *ctx, u128 *dst, const u128 *src, le128 *iv)
+{
+   glue_xts_crypt_128bit_one(ctx, dst, src, iv,
+ GLUE_FUNC_CAST(twofish_dec_blk));
+}
+
 
 static const struct common_glue_ctx twofish_enc = {
.num_funcs = 3,
@@ -95,6 +114,19 @@ static const struct common_glue_ctx twofish_ctr = {
} }
 };
 
+static const struct common_glue_ctx twofish_enc_xts = {
+   .num_funcs = 2,
+   .fpu_blocks_limit = TWOFISH_PARALLEL_BLOCKS,
+
+   .funcs = { {
+   .num_blocks = TWOFISH_PARALLEL_BLOCKS

[PATCH 4/5] crypto: x86/camellia-aesni-avx - add more optimized XTS code

2013-04-08 Thread Jussi Kivilinna
Add more optimized XTS code for camellia-aesni-avx, for smaller stack usage
and a small boost in speed.

tcrypt results, with Intel i5-2450M:
enc dec
16B 1.10x   1.01x
64B 0.82x   0.77x
256B1.14x   1.10x
1024B   1.17x   1.16x
8192B   1.10x   1.11x

Since XTS is practically always used with data blocks of size 512 bytes or
more, I chose not to make use of camellia-2way for block sizes smaller than
256 bytes. This causes a slower result in tcrypt for 64 bytes.

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/camellia-aesni-avx-asm_64.S |  180 +++
 arch/x86/crypto/camellia_aesni_avx_glue.c   |   91 --
 2 files changed, 229 insertions(+), 42 deletions(-)

diff --git a/arch/x86/crypto/camellia-aesni-avx-asm_64.S 
b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
index cfc1634..ce71f92 100644
--- a/arch/x86/crypto/camellia-aesni-avx-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
@@ -1,7 +1,7 @@
 /*
  * x86_64/AVX/AES-NI assembler implementation of Camellia
  *
- * Copyright © 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi
+ * Copyright © 2012-2013 Jussi Kivilinna jussi.kivili...@iki.fi
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -589,6 +589,10 @@ 
ENDPROC(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
 .Lbswap128_mask:
.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
 
+/* For XTS mode IV generation */
+.Lxts_gf128mul_and_shl1_mask:
+   .byte 0x87, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0
+
 /*
  * pre-SubByte transform
  *
@@ -1090,3 +1094,177 @@ ENTRY(camellia_ctr_16way)
 
ret;
 ENDPROC(camellia_ctr_16way)
+
+#define gf128mul_x_ble(iv, mask, tmp) \
+   vpsrad $31, iv, tmp; \
+   vpaddq iv, iv, iv; \
+   vpshufd $0x13, tmp, tmp; \
+   vpand mask, tmp, tmp; \
+   vpxor tmp, iv, iv;
+
+.align 8
+camellia_xts_crypt_16way:
+   /* input:
+*  %rdi: ctx, CTX
+*  %rsi: dst (16 blocks)
+*  %rdx: src (16 blocks)
+*  %rcx: iv (t ⊕ αⁿ ∈ GF(2¹²⁸))
+*  %r8: index for input whitening key
+*  %r9: pointer to  __camellia_enc_blk16 or __camellia_dec_blk16
+*/
+
+   subq $(16 * 16), %rsp;
+   movq %rsp, %rax;
+
+   vmovdqa .Lxts_gf128mul_and_shl1_mask, %xmm14;
+
+   /* load IV */
+   vmovdqu (%rcx), %xmm0;
+   vpxor 0 * 16(%rdx), %xmm0, %xmm15;
+   vmovdqu %xmm15, 15 * 16(%rax);
+   vmovdqu %xmm0, 0 * 16(%rsi);
+
+   /* construct IVs */
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 1 * 16(%rdx), %xmm0, %xmm15;
+   vmovdqu %xmm15, 14 * 16(%rax);
+   vmovdqu %xmm0, 1 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 2 * 16(%rdx), %xmm0, %xmm13;
+   vmovdqu %xmm0, 2 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 3 * 16(%rdx), %xmm0, %xmm12;
+   vmovdqu %xmm0, 3 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 4 * 16(%rdx), %xmm0, %xmm11;
+   vmovdqu %xmm0, 4 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 5 * 16(%rdx), %xmm0, %xmm10;
+   vmovdqu %xmm0, 5 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 6 * 16(%rdx), %xmm0, %xmm9;
+   vmovdqu %xmm0, 6 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 7 * 16(%rdx), %xmm0, %xmm8;
+   vmovdqu %xmm0, 7 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 8 * 16(%rdx), %xmm0, %xmm7;
+   vmovdqu %xmm0, 8 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 9 * 16(%rdx), %xmm0, %xmm6;
+   vmovdqu %xmm0, 9 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 10 * 16(%rdx), %xmm0, %xmm5;
+   vmovdqu %xmm0, 10 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 11 * 16(%rdx), %xmm0, %xmm4;
+   vmovdqu %xmm0, 11 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 12 * 16(%rdx), %xmm0, %xmm3;
+   vmovdqu %xmm0, 12 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 13 * 16(%rdx), %xmm0, %xmm2;
+   vmovdqu %xmm0, 13 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 14 * 16(%rdx), %xmm0, %xmm1;
+   vmovdqu %xmm0, 14 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 15 * 16(%rdx), %xmm0, %xmm15;
+   vmovdqu %xmm15, 0 * 16(%rax);
+   vmovdqu %xmm0, 15 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vmovdqu %xmm0, (%rcx);
+
+   /* inpack16_pre: */
+   vmovq (key_table)(CTX, %r8, 8), %xmm15;
+   vpshufb .Lpack_bswap, %xmm15, %xmm15;
+   vpxor 0 * 16(%rax), %xmm15, %xmm0;
+   vpxor %xmm1, %xmm15, %xmm1;
+   vpxor %xmm2, %xmm15, %xmm2;
+   vpxor %xmm3

[PATCH 5/5] crypto: aesni_intel - add more optimized XTS mode for x86-64

2013-04-08 Thread Jussi Kivilinna
Add more optimized XTS code for aesni_intel in 64-bit mode, for smaller stack
usage and a boost in speed.

tcrypt results, with Intel i5-2450M:
256-bit key
enc dec
16B 0.98x   0.99x
64B 0.64x   0.63x
256B1.29x   1.32x
1024B   1.54x   1.58x
8192B   1.57x   1.60x

512-bit key
enc dec
16B 0.98x   0.99x
64B 0.60x   0.59x
256B1.24x   1.25x
1024B   1.39x   1.42x
8192B   1.38x   1.42x

I chose not to optimize block sizes smaller than 256 bytes, since XTS is
practically always used with data blocks of size 512 bytes or more. This is why
performance is reduced in tcrypt for 64-byte blocks.

Cc: Huang Ying ying.hu...@intel.com
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/aesni-intel_asm.S  |  117 
 arch/x86/crypto/aesni-intel_glue.c |   80 +
 crypto/Kconfig |1 
 3 files changed, 198 insertions(+)

diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index 04b7977..62fe22c 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -34,6 +34,10 @@
 
 #ifdef __x86_64__
 .data
+.align 16
+.Lgf128mul_x_ble_mask:
+   .octa 0x00010087
+
 POLY:   .octa 0xC201
 TWOONE: .octa 0x00010001
 
@@ -105,6 +109,8 @@ enc:.octa 0x2
 #define CTR%xmm11
 #define INC%xmm12
 
+#define GF128MUL_MASK %xmm10
+
 #ifdef __x86_64__
 #define AREG   %rax
 #define KEYP   %rdi
@@ -2636,4 +2642,115 @@ ENTRY(aesni_ctr_enc)
 .Lctr_enc_just_ret:
ret
 ENDPROC(aesni_ctr_enc)
+
+/*
+ * _aesni_gf128mul_x_ble:  internal ABI
+ * Multiply in GF(2^128) for XTS IVs
+ * input:
+ * IV: current IV
+ * GF128MUL_MASK == mask with 0x87 and 0x01
+ * output:
+ * IV: next IV
+ * changed:
+ * CTR:== temporary value
+ */
+#define _aesni_gf128mul_x_ble() \
+   pshufd $0x13, IV, CTR; \
+   paddq IV, IV; \
+   psrad $31, CTR; \
+   pand GF128MUL_MASK, CTR; \
+   pxor CTR, IV;
+
+/*
+ * void aesni_xts_crypt8(struct crypto_aes_ctx *ctx, const u8 *dst, u8 *src,
+ *  bool enc, u8 *iv)
+ */
+ENTRY(aesni_xts_crypt8)
+   cmpb $0, %cl
+   movl $0, %ecx
+   movl $240, %r10d
+   leaq _aesni_enc4, %r11
+   leaq _aesni_dec4, %rax
+   cmovel %r10d, %ecx
+   cmoveq %rax, %r11
+
+   movdqa .Lgf128mul_x_ble_mask, GF128MUL_MASK
+   movups (IVP), IV
+
+   mov 480(KEYP), KLEN
+   addq %rcx, KEYP
+
+   movdqa IV, STATE1
+   pxor 0x00(INP), STATE1
+   movdqu IV, 0x00(OUTP)
+
+   _aesni_gf128mul_x_ble()
+   movdqa IV, STATE2
+   pxor 0x10(INP), STATE2
+   movdqu IV, 0x10(OUTP)
+
+   _aesni_gf128mul_x_ble()
+   movdqa IV, STATE3
+   pxor 0x20(INP), STATE3
+   movdqu IV, 0x20(OUTP)
+
+   _aesni_gf128mul_x_ble()
+   movdqa IV, STATE4
+   pxor 0x30(INP), STATE4
+   movdqu IV, 0x30(OUTP)
+
+   call *%r11
+
+   pxor 0x00(OUTP), STATE1
+   movdqu STATE1, 0x00(OUTP)
+
+   _aesni_gf128mul_x_ble()
+   movdqa IV, STATE1
+   pxor 0x40(INP), STATE1
+   movdqu IV, 0x40(OUTP)
+
+   pxor 0x10(OUTP), STATE2
+   movdqu STATE2, 0x10(OUTP)
+
+   _aesni_gf128mul_x_ble()
+   movdqa IV, STATE2
+   pxor 0x50(INP), STATE2
+   movdqu IV, 0x50(OUTP)
+
+   pxor 0x20(OUTP), STATE3
+   movdqu STATE3, 0x20(OUTP)
+
+   _aesni_gf128mul_x_ble()
+   movdqa IV, STATE3
+   pxor 0x60(INP), STATE3
+   movdqu IV, 0x60(OUTP)
+
+   pxor 0x30(OUTP), STATE4
+   movdqu STATE4, 0x30(OUTP)
+
+   _aesni_gf128mul_x_ble()
+   movdqa IV, STATE4
+   pxor 0x70(INP), STATE4
+   movdqu IV, 0x70(OUTP)
+
+   _aesni_gf128mul_x_ble()
+   movups IV, (IVP)
+
+   call *%r11
+
+   pxor 0x40(OUTP), STATE1
+   movdqu STATE1, 0x40(OUTP)
+
+   pxor 0x50(OUTP), STATE2
+   movdqu STATE2, 0x50(OUTP)
+
+   pxor 0x60(OUTP), STATE3
+   movdqu STATE3, 0x60(OUTP)
+
+   pxor 0x70(OUTP), STATE4
+   movdqu STATE4, 0x70(OUTP)
+
+   ret
+ENDPROC(aesni_xts_crypt8)
+
 #endif
diff --git a/arch/x86/crypto/aesni-intel_glue.c 
b/arch/x86/crypto/aesni-intel_glue.c
index a0795da..f80e668 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -39,6 +39,9 @@
 #include crypto/internal/aead.h
 #include linux/workqueue.h
 #include linux/spinlock.h
+#ifdef CONFIG_X86_64
+#include asm/crypto/glue_helper.h
+#endif
 
 #if defined(CONFIG_CRYPTO_PCBC) || defined(CONFIG_CRYPTO_PCBC_MODULE)
 #define HAS_PCBC
@@ -102,6 +105,9 @@ void crypto_fpu_exit(void);
 asmlinkage void aesni_ctr_enc(struct crypto_aes_ctx *ctx, u8 *out,
  const u8 *in, unsigned int len, u8 *iv);
 
+asmlinkage void aesni_xts_crypt8(struct crypto_aes_ctx *ctx, u8 *out,
+const u8

[PATCH 1/4] crypto: gcm - make GMAC work when dst and src are different

2013-04-07 Thread Jussi Kivilinna
The GMAC code assumes that dst==src, which causes problems when trying to add
rfc4543(gcm(aes)) test vectors.

So fix this code to work when the source and destination buffers are different.
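
The approach taken below is to first copy the plaintext from the source to the
destination scatterlist and then run GMAC over the destination. The copy uses
ecb(cipher_null) (hence the new CRYPTO_NULL select), whose "encryption" is a plain
copy; a stand-alone sketch of that idiom, with per-call allocation for brevity:

#include <linux/crypto.h>
#include <linux/err.h>
#include <linux/scatterlist.h>

/* Sketch only: use the null blkcipher as a scatterlist-to-scatterlist copy. */
static int sg_copy_via_null_cipher(struct scatterlist *dst,
                                   struct scatterlist *src,
                                   unsigned int nbytes)
{
        struct crypto_blkcipher *null;
        struct blkcipher_desc desc;
        int err;

        null = crypto_alloc_blkcipher("ecb(cipher_null)", 0, 0);
        if (IS_ERR(null))
                return PTR_ERR(null);

        desc.tfm = null;
        desc.flags = 0;
        err = crypto_blkcipher_encrypt(&desc, dst, src, nbytes);

        crypto_free_blkcipher(null);
        return err;
}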

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/Kconfig |1 +
 crypto/gcm.c   |   97 ++--
 2 files changed, 81 insertions(+), 17 deletions(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index a654b13..6cc27f1 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -198,6 +198,7 @@ config CRYPTO_GCM
select CRYPTO_CTR
select CRYPTO_AEAD
select CRYPTO_GHASH
+   select CRYPTO_NULL
help
  Support for Galois/Counter Mode (GCM) and Galois Message
  Authentication Code (GMAC). Required for IPSec.
diff --git a/crypto/gcm.c b/crypto/gcm.c
index 137ad1e..4ff2139 100644
--- a/crypto/gcm.c
+++ b/crypto/gcm.c
@@ -37,8 +37,14 @@ struct crypto_rfc4106_ctx {
u8 nonce[4];
 };
 
+struct crypto_rfc4543_instance_ctx {
+   struct crypto_aead_spawn aead;
+   struct crypto_skcipher_spawn null;
+};
+
 struct crypto_rfc4543_ctx {
struct crypto_aead *child;
+   struct crypto_blkcipher *null;
u8 nonce[4];
 };
 
@@ -1094,20 +1100,20 @@ static int crypto_rfc4543_setauthsize(struct 
crypto_aead *parent,
 }
 
 static struct aead_request *crypto_rfc4543_crypt(struct aead_request *req,
-int enc)
+bool enc)
 {
struct crypto_aead *aead = crypto_aead_reqtfm(req);
struct crypto_rfc4543_ctx *ctx = crypto_aead_ctx(aead);
struct crypto_rfc4543_req_ctx *rctx = crypto_rfc4543_reqctx(req);
struct aead_request *subreq = rctx-subreq;
-   struct scatterlist *dst = req-dst;
+   struct scatterlist *src = req-src;
struct scatterlist *cipher = rctx-cipher;
struct scatterlist *payload = rctx-payload;
struct scatterlist *assoc = rctx-assoc;
unsigned int authsize = crypto_aead_authsize(aead);
unsigned int assoclen = req-assoclen;
-   struct page *dstp;
-   u8 *vdst;
+   struct page *srcp;
+   u8 *vsrc;
u8 *iv = PTR_ALIGN((u8 *)(rctx + 1) + crypto_aead_reqsize(ctx-child),
   crypto_aead_alignmask(ctx-child) + 1);
 
@@ -1118,19 +1124,19 @@ static struct aead_request *crypto_rfc4543_crypt(struct 
aead_request *req,
if (enc)
memset(rctx-auth_tag, 0, authsize);
else
-   scatterwalk_map_and_copy(rctx-auth_tag, dst,
+   scatterwalk_map_and_copy(rctx-auth_tag, src,
 req-cryptlen - authsize,
 authsize, 0);
 
sg_init_one(cipher, rctx-auth_tag, authsize);
 
/* construct the aad */
-   dstp = sg_page(dst);
-   vdst = PageHighMem(dstp) ? NULL : page_address(dstp) + dst-offset;
+   srcp = sg_page(src);
+   vsrc = PageHighMem(srcp) ? NULL : page_address(srcp) + src-offset;
 
sg_init_table(payload, 2);
sg_set_buf(payload, req-iv, 8);
-   scatterwalk_crypto_chain(payload, dst, vdst == req-iv + 8, 2);
+   scatterwalk_crypto_chain(payload, src, vsrc == req-iv + 8, 2);
assoclen += 8 + req-cryptlen - (enc ? 0 : authsize);
 
sg_init_table(assoc, 2);
@@ -1147,6 +1153,19 @@ static struct aead_request *crypto_rfc4543_crypt(struct 
aead_request *req,
return subreq;
 }
 
+static int crypto_rfc4543_copy_src_to_dst(struct aead_request *req, bool enc)
+{
+   struct crypto_aead *aead = crypto_aead_reqtfm(req);
+   struct crypto_rfc4543_ctx *ctx = crypto_aead_ctx(aead);
+   unsigned int authsize = crypto_aead_authsize(aead);
+   unsigned int nbytes = req-cryptlen - (enc ? 0 : authsize);
+   struct blkcipher_desc desc = {
+   .tfm = ctx-null,
+   };
+
+   return crypto_blkcipher_encrypt(desc, req-dst, req-src, nbytes);
+}
+
 static int crypto_rfc4543_encrypt(struct aead_request *req)
 {
struct crypto_aead *aead = crypto_aead_reqtfm(req);
@@ -1154,7 +1173,13 @@ static int crypto_rfc4543_encrypt(struct aead_request 
*req)
struct aead_request *subreq;
int err;
 
-   subreq = crypto_rfc4543_crypt(req, 1);
+   if (req-src != req-dst) {
+   err = crypto_rfc4543_copy_src_to_dst(req, true);
+   if (err)
+   return err;
+   }
+
+   subreq = crypto_rfc4543_crypt(req, true);
err = crypto_aead_encrypt(subreq);
if (err)
return err;
@@ -1167,7 +1192,15 @@ static int crypto_rfc4543_encrypt(struct aead_request 
*req)
 
 static int crypto_rfc4543_decrypt(struct aead_request *req)
 {
-   req = crypto_rfc4543_crypt(req, 0);
+   int err;
+
+   if (req-src != req-dst) {
+   err = crypto_rfc4543_copy_src_to_dst(req, false);
+   if (err

[PATCH 2/4] crypto: gcm - fix rfc4543 to handle async crypto correctly

2013-04-07 Thread Jussi Kivilinna
If the gcm cipher used by rfc4543 does not complete the request immediately,
the authentication tag is not copied to the destination buffer. Patch adds the
correct async logic for this case.
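
The pattern is the usual asynchronous completion flow of the crypto API: when the
inner request returns -EINPROGRESS or -EBUSY, any post-processing (here, copying the
tag) has to happen in the completion callback rather than in the caller. A
stripped-down sketch with hypothetical names, using the same internal aead header
that crypto/gcm.c already includes:

#include <crypto/internal/aead.h>
#include <linux/crypto.h>

/* Runs once the inner cipher has finished, synchronously or not. */
static void outer_done(struct crypto_async_request *areq, int err)
{
        struct aead_request *req = areq->data;

        if (!err) {
                /* work that must wait for the inner cipher to finish,
                 * e.g. copying the authentication tag to req->dst */
        }
        aead_request_complete(req, err);
}

static int outer_encrypt(struct aead_request *req, struct aead_request *subreq)
{
        int err;

        aead_request_set_callback(subreq, req->base.flags, outer_done, req);

        err = crypto_aead_encrypt(subreq);
        if (err)
                return err;     /* includes -EINPROGRESS: outer_done() finishes up */

        /* synchronous completion: finish up inline (e.g. copy the tag) */
        return 0;
}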

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/gcm.c |   19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/crypto/gcm.c b/crypto/gcm.c
index 4ff2139..b0d3cb1 100644
--- a/crypto/gcm.c
+++ b/crypto/gcm.c
@@ -1099,6 +1099,21 @@ static int crypto_rfc4543_setauthsize(struct crypto_aead 
*parent,
return crypto_aead_setauthsize(ctx-child, authsize);
 }
 
+static void crypto_rfc4543_done(struct crypto_async_request *areq, int err)
+{
+   struct aead_request *req = areq-data;
+   struct crypto_aead *aead = crypto_aead_reqtfm(req);
+   struct crypto_rfc4543_req_ctx *rctx = crypto_rfc4543_reqctx(req);
+
+   if (!err) {
+   scatterwalk_map_and_copy(rctx-auth_tag, req-dst,
+req-cryptlen,
+crypto_aead_authsize(aead), 1);
+   }
+
+   aead_request_complete(req, err);
+}
+
 static struct aead_request *crypto_rfc4543_crypt(struct aead_request *req,
 bool enc)
 {
@@ -1145,8 +1160,8 @@ static struct aead_request *crypto_rfc4543_crypt(struct 
aead_request *req,
scatterwalk_crypto_chain(assoc, payload, 0, 2);
 
aead_request_set_tfm(subreq, ctx-child);
-   aead_request_set_callback(subreq, req-base.flags, req-base.complete,
- req-base.data);
+   aead_request_set_callback(subreq, req-base.flags, crypto_rfc4543_done,
+ req);
aead_request_set_crypt(subreq, cipher, cipher, enc ? 0 : authsize, iv);
aead_request_set_assoc(subreq, assoc, assoclen);
 



[PATCH 3/4] crypto: testmgr - add AES GMAC test vectors

2013-04-07 Thread Jussi Kivilinna
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/tcrypt.c  |4 ++
 crypto/testmgr.c |   17 +-
 crypto/testmgr.h |   89 ++
 3 files changed, 108 insertions(+), 2 deletions(-)

diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index 87ef7d6..6b911ef 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -1225,6 +1225,10 @@ static int do_test(int m)
ret += tcrypt_test(rfc4106(gcm(aes)));
break;
 
+   case 152:
+   ret += tcrypt_test(rfc4543(gcm(aes)));
+   break;
+
case 200:
test_cipher_speed(ecb(aes), ENCRYPT, sec, NULL, 0,
speed_template_16_24_32);
diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index efd8b20..442ddb4 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -2696,8 +2696,6 @@ static const struct alg_test_desc alg_test_descs[] = {
}
}
}, {
-
-
.alg = rfc4309(ccm(aes)),
.test = alg_test_aead,
.fips_allowed = 1,
@@ -2714,6 +2712,21 @@ static const struct alg_test_desc alg_test_descs[] = {
}
}
}, {
+   .alg = rfc4543(gcm(aes)),
+   .test = alg_test_aead,
+   .suite = {
+   .aead = {
+   .enc = {
+   .vecs = aes_gcm_rfc4543_enc_tv_template,
+   .count = AES_GCM_4543_ENC_TEST_VECTORS
+   },
+   .dec = {
+   .vecs = aes_gcm_rfc4543_dec_tv_template,
+   .count = AES_GCM_4543_DEC_TEST_VECTORS
+   },
+   }
+   }
+   }, {
.alg = rmd128,
.test = alg_test_hash,
.suite = {
diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index b5721e0..92db37d 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -12680,6 +12680,8 @@ static struct cipher_testvec 
cast6_xts_dec_tv_template[] = {
 #define AES_GCM_DEC_TEST_VECTORS 8
 #define AES_GCM_4106_ENC_TEST_VECTORS 7
 #define AES_GCM_4106_DEC_TEST_VECTORS 7
+#define AES_GCM_4543_ENC_TEST_VECTORS 1
+#define AES_GCM_4543_DEC_TEST_VECTORS 2
 #define AES_CCM_ENC_TEST_VECTORS 7
 #define AES_CCM_DEC_TEST_VECTORS 7
 #define AES_CCM_4309_ENC_TEST_VECTORS 7
@@ -18193,6 +18195,93 @@ static struct aead_testvec 
aes_gcm_rfc4106_dec_tv_template[] = {
}
 };
 
+static struct aead_testvec aes_gcm_rfc4543_enc_tv_template[] = {
+   { /* From draft-mcgrew-gcm-test-01 */
+   .key= \x4c\x80\xcd\xef\xbb\x5d\x10\xda
+ \x90\x6a\xc7\x3c\x36\x13\xa6\x34
+ \x22\x43\x3c\x64,
+   .klen   = 20,
+   .iv = zeroed_string,
+   .assoc  = \x00\x00\x43\x21\x00\x00\x00\x07,
+   .alen   = 8,
+   .input  = \x45\x00\x00\x30\xda\x3a\x00\x00
+ \x80\x01\xdf\x3b\xc0\xa8\x00\x05
+ \xc0\xa8\x00\x01\x08\x00\xc6\xcd
+ \x02\x00\x07\x00\x61\x62\x63\x64
+ \x65\x66\x67\x68\x69\x6a\x6b\x6c
+ \x6d\x6e\x6f\x70\x71\x72\x73\x74
+ \x01\x02\x02\x01,
+   .ilen   = 52,
+   .result = \x45\x00\x00\x30\xda\x3a\x00\x00
+ \x80\x01\xdf\x3b\xc0\xa8\x00\x05
+ \xc0\xa8\x00\x01\x08\x00\xc6\xcd
+ \x02\x00\x07\x00\x61\x62\x63\x64
+ \x65\x66\x67\x68\x69\x6a\x6b\x6c
+ \x6d\x6e\x6f\x70\x71\x72\x73\x74
+ \x01\x02\x02\x01\xf2\xa9\xa8\x36
+ \xe1\x55\x10\x6a\xa8\xdc\xd6\x18
+ \xe4\x09\x9a\xaa,
+   .rlen   = 68,
+   }
+};
+
+static struct aead_testvec aes_gcm_rfc4543_dec_tv_template[] = {
+   { /* From draft-mcgrew-gcm-test-01 */
+   .key= \x4c\x80\xcd\xef\xbb\x5d\x10\xda
+ \x90\x6a\xc7\x3c\x36\x13\xa6\x34
+ \x22\x43\x3c\x64,
+   .klen   = 20,
+   .iv = zeroed_string,
+   .assoc  = \x00\x00\x43\x21\x00\x00\x00\x07,
+   .alen   = 8,
+   .input  = \x45\x00\x00\x30\xda\x3a\x00\x00
+ \x80\x01\xdf\x3b\xc0\xa8\x00\x05
+ \xc0\xa8\x00\x01\x08\x00\xc6\xcd
+ \x02\x00\x07\x00\x61\x62\x63\x64
+ \x65\x66\x67\x68\x69\x6a\x6b\x6c
+ \x6d\x6e\x6f\x70\x71\x72\x73\x74
+ \x01\x02\x02\x01\xf2\xa9\xa8\x36
+ \xe1\x55\x10\x6a\xa8\xdc\xd6\x18

[PATCH 4/4] crypto: testmgr - add empty test vectors for null ciphers

2013-04-07 Thread Jussi Kivilinna
Without these, the kernel log shows:
[5.984881] alg: No test for cipher_null (cipher_null-generic)
[5.985096] alg: No test for ecb(cipher_null) (ecb-cipher_null)
[5.985170] alg: No test for compress_null (compress_null-generic)
[5.985297] alg: No test for digest_null (digest_null-generic)

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/testmgr.c |9 +
 1 file changed, 9 insertions(+)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 442ddb4..f37e544 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -1913,6 +1913,9 @@ static const struct alg_test_desc alg_test_descs[] = {
}
}
}, {
+   .alg = compress_null,
+   .test = alg_test_null,
+   }, {
.alg = crc32c,
.test = alg_test_crc32c,
.fips_allowed = 1,
@@ -2127,6 +2130,9 @@ static const struct alg_test_desc alg_test_descs[] = {
}
}
}, {
+   .alg = digest_null,
+   .test = alg_test_null,
+   }, {
.alg = ecb(__aes-aesni),
.test = alg_test_null,
.fips_allowed = 1,
@@ -2237,6 +2243,9 @@ static const struct alg_test_desc alg_test_descs[] = {
}
}
}, {
+   .alg = ecb(cipher_null),
+   .test = alg_test_null,
+   }, {
.alg = ecb(des),
.test = alg_test_skcipher,
.fips_allowed = 1,



[PATCH] crypto: gcm - fix assumption that assoc has one segment

2013-03-28 Thread Jussi Kivilinna
rfc4543(gcm(*)) code for GMAC assumes that the assoc scatterlist always contains
only one segment and only makes use of this first segment. However, IPsec passes
assoc with three segments when using 'extended sequence numbers', so in this
case rfc4543(gcm(*)) fails to function correctly. Patch fixes this issue.
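
For context, with extended sequence numbers the association data is not a single
flat buffer: IPsec hands the AEAD a three-entry scatterlist carrying the SPI and the
two halves of the 64-bit sequence number. A hypothetical illustration of that layout
(names invented for clarity, not kernel code):

#include <linux/scatterlist.h>
#include <linux/types.h>

/* ESN case: assoc = { SPI, SN-hi, SN-low }, three separate segments. */
static void example_esn_assoc(struct scatterlist assoc[3], __be32 *spi,
                              __be32 *seq_hi, __be32 *seq_lo)
{
        sg_init_table(assoc, 3);
        sg_set_buf(&assoc[0], spi, sizeof(*spi));
        sg_set_buf(&assoc[1], seq_hi, sizeof(*seq_hi));
        sg_set_buf(&assoc[2], seq_lo, sizeof(*seq_lo));
}

Taking only sg_page(req->assoc) therefore covers just the SPI segment, which is why
GMAC verification fails for ESN SAs; the fix linearizes the whole chain into a small
bounce buffer first.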

Reported-by: Chaoxing Lin chaoxing@ultra-3eti.com
Tested-by: Chaoxing Lin chaoxing@ultra-3eti.com
Cc: sta...@vger.kernel.org
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/gcm.c |   17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/crypto/gcm.c b/crypto/gcm.c
index 137ad1e..13ccbda 100644
--- a/crypto/gcm.c
+++ b/crypto/gcm.c
@@ -44,6 +44,7 @@ struct crypto_rfc4543_ctx {
 
 struct crypto_rfc4543_req_ctx {
u8 auth_tag[16];
+   u8 assocbuf[32];
struct scatterlist cipher[1];
struct scatterlist payload[2];
struct scatterlist assoc[2];
@@ -1133,9 +1134,19 @@ static struct aead_request *crypto_rfc4543_crypt(struct 
aead_request *req,
scatterwalk_crypto_chain(payload, dst, vdst == req-iv + 8, 2);
assoclen += 8 + req-cryptlen - (enc ? 0 : authsize);
 
-   sg_init_table(assoc, 2);
-   sg_set_page(assoc, sg_page(req-assoc), req-assoc-length,
-   req-assoc-offset);
+   if (req-assoc-length == req-assoclen) {
+   sg_init_table(assoc, 2);
+   sg_set_page(assoc, sg_page(req-assoc), req-assoc-length,
+   req-assoc-offset);
+   } else {
+   BUG_ON(req-assoclen  sizeof(rctx-assocbuf));
+
+   scatterwalk_map_and_copy(rctx-assocbuf, req-assoc, 0,
+req-assoclen, 0);
+
+   sg_init_table(assoc, 2);
+   sg_set_buf(assoc, rctx-assocbuf, req-assoclen);
+   }
scatterwalk_crypto_chain(assoc, payload, 0, 2);
 
aead_request_set_tfm(subreq, ctx-child);



Re: potential bug in GMAC implementation. not work in ESN mode

2013-03-26 Thread Jussi Kivilinna
On 25.03.2013 18:12, Chaoxing Lin wrote:
 2nd ping
 
 Nobody is maintaining crypto/gcm.c?
 
 
 
 -Original Message-
 From: Chaoxing Lin 
 Sent: Friday, March 08, 2013 11:38 AM
 To: 'linux-crypto@vger.kernel.org'
 Subject: potential bug in GMAC implementation. not work in ESN mode
 
 I was testing ipsec with GMAC and found that the rfc4543 GMAC implementation
 in kernel software crypto works in esp=aes256gmac-noesn! mode.
 It does not work in in esp=aes256gmac-esn! mode. The tunnel was established 
 but no data traffic is possible.
 
 Looking at source code, I found this piece of code is suspicious.
 Line 1146~1147 tries to put req-assoc to assoc[1]. But I think this way only 
 works when req-assoc has only one segment. In ESN mode, req-assoc contains 
 3 segments (SPI, SN-hi, SN-low). Line 1146~1147 will only attach SPI 
 segment(with total length) in assoc.
 
 Please let me know whether I understand it right.

Your analysis seems correct. Does the attached patch fix the problem? (I've
only compile-tested it.)

-Jussi

 Thanks,
 
 Chaoxing
 
 
 Source from kernel 3.8.2
 path: root/crypto/gcm.c
 
 1136: /* construct the aad */
 1137: dstp = sg_page(dst);
   vdst = PageHighMem(dstp) ? NULL : page_address(dstp) + dst-offset;
 
   sg_init_table(payload, 2);
   sg_set_buf(payload, req-iv, 8);
   scatterwalk_crypto_chain(payload, dst, vdst == req-iv + 8, 2);
   assoclen += 8 + req-cryptlen - (enc ? 0 : authsize);
 
   sg_init_table(assoc, 2);
 1146: sg_set_page(assoc, sg_page(req-assoc), req-assoc-length,
 1147: req-assoc-offset);
   scatterwalk_crypto_chain(assoc, payload, 0, 2);
 
   aead_request_set_tfm(subreq, ctx-child);
   aead_request_set_callback(subreq, req-base.flags, req-base.complete,
 req-base.data);
   aead_request_set_crypt(subreq, cipher, cipher, enc ? 0 : authsize, iv);
 1154: aead_request_set_assoc(subreq, assoc, assoclen);
 

crypto: gcm - fix assumption that assoc has one segment

From: Jussi Kivilinna jussi.kivili...@iki.fi

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/gcm.c|   17 ++---
 crypto/tcrypt.c |4 
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/crypto/gcm.c b/crypto/gcm.c
index 137ad1e..13ccbda 100644
--- a/crypto/gcm.c
+++ b/crypto/gcm.c
@@ -44,6 +44,7 @@ struct crypto_rfc4543_ctx {
 
 struct crypto_rfc4543_req_ctx {
 	u8 auth_tag[16];
+	u8 assocbuf[32];
 	struct scatterlist cipher[1];
 	struct scatterlist payload[2];
 	struct scatterlist assoc[2];
@@ -1133,9 +1134,19 @@ static struct aead_request *crypto_rfc4543_crypt(struct aead_request *req,
 	scatterwalk_crypto_chain(payload, dst, vdst == req-iv + 8, 2);
 	assoclen += 8 + req-cryptlen - (enc ? 0 : authsize);
 
-	sg_init_table(assoc, 2);
-	sg_set_page(assoc, sg_page(req-assoc), req-assoc-length,
-		req-assoc-offset);
+	if (req-assoc-length == req-assoclen) {
+		sg_init_table(assoc, 2);
+		sg_set_page(assoc, sg_page(req-assoc), req-assoc-length,
+			req-assoc-offset);
+	} else {
+		BUG_ON(req-assoclen  sizeof(rctx-assocbuf));
+
+		scatterwalk_map_and_copy(rctx-assocbuf, req-assoc, 0,
+	 req-assoclen, 0);
+
+		sg_init_table(assoc, 2);
+		sg_set_buf(assoc, rctx-assocbuf, req-assoclen);
+	}
 	scatterwalk_crypto_chain(assoc, payload, 0, 2);
 
 	aead_request_set_tfm(subreq, ctx-child);
diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index 87ef7d6..6b911ef 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -1225,6 +1225,10 @@ static int do_test(int m)
 		ret += tcrypt_test(rfc4106(gcm(aes)));
 		break;
 
+	case 152:
+		ret += tcrypt_test(rfc4543(gcm(aes)));
+		break;
+
 	case 200:
 		test_cipher_speed(ecb(aes), ENCRYPT, sec, NULL, 0,
 speed_template_16_24_32);




[PATCH 2/2] crypto: cast_common - change email address for Jussi Kivilinna

2013-03-07 Thread Jussi Kivilinna
Change my email address from @mbnet.fi to @iki.fi in crypto/*

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 crypto/cast_common.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/cast_common.c b/crypto/cast_common.c
index a15f523..8924925 100644
--- a/crypto/cast_common.c
+++ b/crypto/cast_common.c
@@ -3,7 +3,7 @@
  *
  * Copyright © 1998, 1999, 2000, 2001 Free Software Foundation, Inc.
  * Copyright © 2003 Kartikey Mahendra Bhatt kartik...@hotmail.com
- * Copyright © 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi
+ * Copyright © 2012 Jussi Kivilinna jussi.kivili...@iki.fi
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of GNU General Public License as published by the Free



[PATCH 1/2] crypto: x86 - change email address for Jussi Kivilinna

2013-03-07 Thread Jussi Kivilinna
Change my email address from @mbnet.fi to @iki.fi in arch/x86/crypto/*.

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/ablk_helper.c|2 +-
 arch/x86/crypto/blowfish-x86_64-asm_64.S |2 +-
 arch/x86/crypto/blowfish_glue.c  |2 +-
 arch/x86/crypto/camellia-aesni-avx-asm_64.S  |2 +-
 arch/x86/crypto/camellia-x86_64-asm_64.S |2 +-
 arch/x86/crypto/camellia_aesni_avx_glue.c|2 +-
 arch/x86/crypto/camellia_glue.c  |2 +-
 arch/x86/crypto/cast5-avx-x86_64-asm_64.S|2 +-
 arch/x86/crypto/cast6-avx-x86_64-asm_64.S|2 +-
 arch/x86/crypto/glue_helper-asm-avx.S|2 +-
 arch/x86/crypto/glue_helper.c|2 +-
 arch/x86/crypto/serpent-avx-x86_64-asm_64.S  |2 +-
 arch/x86/crypto/serpent-sse2-i586-asm_32.S   |2 +-
 arch/x86/crypto/serpent-sse2-x86_64-asm_64.S |2 +-
 arch/x86/crypto/serpent_avx_glue.c   |2 +-
 arch/x86/crypto/serpent_sse2_glue.c  |2 +-
 arch/x86/crypto/twofish-avx-x86_64-asm_64.S  |2 +-
 arch/x86/crypto/twofish-x86_64-asm_64-3way.S |2 +-
 arch/x86/crypto/twofish_glue_3way.c  |2 +-
 19 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/arch/x86/crypto/ablk_helper.c b/arch/x86/crypto/ablk_helper.c
index 43282fe..08d4186 100644
--- a/arch/x86/crypto/ablk_helper.c
+++ b/arch/x86/crypto/ablk_helper.c
@@ -1,7 +1,7 @@
 /*
  * Shared async block cipher helpers
  *
- * Copyright (c) 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi
+ * Copyright (c) 2012 Jussi Kivilinna jussi.kivili...@iki.fi
  *
  * Based on aesni-intel_glue.c by:
  *  Copyright (C) 2008, Intel Corp.
diff --git a/arch/x86/crypto/blowfish-x86_64-asm_64.S 
b/arch/x86/crypto/blowfish-x86_64-asm_64.S
index 246c670..4e97088 100644
--- a/arch/x86/crypto/blowfish-x86_64-asm_64.S
+++ b/arch/x86/crypto/blowfish-x86_64-asm_64.S
@@ -1,7 +1,7 @@
 /*
  * Blowfish Cipher Algorithm (x86_64)
  *
- * Copyright (C) 2011 Jussi Kivilinna jussi.kivili...@mbnet.fi
+ * Copyright (C) 2011 Jussi Kivilinna jussi.kivili...@iki.fi
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
diff --git a/arch/x86/crypto/blowfish_glue.c b/arch/x86/crypto/blowfish_glue.c
index 50ec333..eb1e2b5 100644
--- a/arch/x86/crypto/blowfish_glue.c
+++ b/arch/x86/crypto/blowfish_glue.c
@@ -1,7 +1,7 @@
 /*
  * Glue Code for assembler optimized version of Blowfish
  *
- * Copyright (c) 2011 Jussi Kivilinna jussi.kivili...@mbnet.fi
+ * Copyright (c) 2011 Jussi Kivilinna jussi.kivili...@iki.fi
  *
  * CBC  ECB parts based on code (crypto/cbc.c,ecb.c) by:
  *   Copyright (c) 2006 Herbert Xu herb...@gondor.apana.org.au
diff --git a/arch/x86/crypto/camellia-aesni-avx-asm_64.S 
b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
index cfc1634..879a736 100644
--- a/arch/x86/crypto/camellia-aesni-avx-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
@@ -1,7 +1,7 @@
 /*
  * x86_64/AVX/AES-NI assembler implementation of Camellia
  *
- * Copyright © 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi
+ * Copyright © 2012 Jussi Kivilinna jussi.kivili...@iki.fi
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
diff --git a/arch/x86/crypto/camellia-x86_64-asm_64.S 
b/arch/x86/crypto/camellia-x86_64-asm_64.S
index 310319c..f2b52f9 100644
--- a/arch/x86/crypto/camellia-x86_64-asm_64.S
+++ b/arch/x86/crypto/camellia-x86_64-asm_64.S
@@ -1,7 +1,7 @@
 /*
  * Camellia Cipher Algorithm (x86_64)
  *
- * Copyright (C) 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi
+ * Copyright (C) 2012 Jussi Kivilinna jussi.kivili...@iki.fi
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
diff --git a/arch/x86/crypto/camellia_aesni_avx_glue.c 
b/arch/x86/crypto/camellia_aesni_avx_glue.c
index 96cbb60..321e9f4 100644
--- a/arch/x86/crypto/camellia_aesni_avx_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx_glue.c
@@ -1,7 +1,7 @@
 /*
  * Glue Code for x86_64/AVX/AES-NI assembler optimized version of Camellia
  *
- * Copyright © 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi
+ * Copyright © 2012 Jussi Kivilinna jussi.kivili...@iki.fi
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
diff --git a/arch/x86/crypto/camellia_glue.c b/arch/x86/crypto/camellia_glue.c
index 5cb86cc..3de9391 100644
--- a/arch/x86/crypto/camellia_glue.c
+++ b/arch/x86/crypto/camellia_glue.c
@@ -1,7 +1,7 @@
 /*
  * Glue Code for assembler optimized version of Camellia
  *
- * Copyright (c) 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi
+ * Copyright (c) 2012 Jussi Kivilinna jussi.kivili...@iki.fi
  *
  * Camellia parts based on code by:
  *  Copyright (C) 2006 NTT (Nippon Telegraph

Re: [PATCH] CMAC support for CryptoAPI, fixed patch issues, indent, and testmgr build issues

2013-01-24 Thread Jussi Kivilinna

Quoting YOSHIFUJI Hideaki yoshf...@linux-ipv6.org:


YOSHIFUJI Hideaki wrote:

Jussi Kivilinna wrote:


diff --git a/include/uapi/linux/pfkeyv2.h
b/include/uapi/linux/pfkeyv2.h
index 0b80c80..d61898e 100644
--- a/include/uapi/linux/pfkeyv2.h
+++ b/include/uapi/linux/pfkeyv2.h
@@ -296,6 +296,7 @@ struct sadb_x_kmaddress {
 #define SADB_X_AALG_SHA2_512HMAC7
 #define SADB_X_AALG_RIPEMD160HMAC8
 #define SADB_X_AALG_AES_XCBC_MAC9
+#define SADB_X_AALG_AES_CMAC_MAC10
 #define SADB_X_AALG_NULL251/* kame */
 #define SADB_AALG_MAX251


Should these values be based on IANA assigned IPSEC AH transform
identifiers?

https://www.iana.org/assignments/isakmp-registry/isakmp-registry.xml#isakmp-registry-6


There is no CMAC entry apparently ... despite the fact that CMAC  
is a proposed RFC standard for IPsec.


It might be safer to move that to 14 since it's currently  
unassigned and then go through whatever channels are required to  
allocate it.  Mostly this affects key setting.  So this means my  
patch would break AH_RSA setkey calls (which the kernel doesn't  
support anyways).




The problem seems to be that PFKEYv2 does not quite work with IKEv2,  
and the XFRM API should be used instead. There are new numbers assigned  
for IKEv2:  
https://www.iana.org/assignments/ikev2-parameters/ikev2-parameters.xml#ikev2-parameters-7


For new SADB_X_AALG_*, I'd think you should use a value from the  
"Reserved for private use" range. Maybe 250?


We can choose any value unless we do not break existing
binaries.  When IKE used, the daemon is responsible
for translation.


I meant, we can choose any values if we do not break ...



Ok, so giving '10' to AES-CMAC is fine after all?

And if I'd want to add Camellia-CTR and Camellia-CCM support, I can  
choose next free numbers from SADB_X_EALG_*?


-Jussi




Re: [PATCH] CMAC support for CryptoAPI, fixed patch issues, indent, and testmgr build issues

2013-01-24 Thread Jussi Kivilinna
Quoting Steffen Klassert steffen.klass...@secunet.com:

 On Wed, Jan 23, 2013 at 05:35:10PM +0200, Jussi Kivilinna wrote:

 The problem seems to be that PFKEYv2 does not quite work with IKEv2, and
 the XFRM API should be used instead. There are new numbers assigned for
 IKEv2: 
 https://www.iana.org/assignments/ikev2-parameters/ikev2-parameters.xml#ikev2-parameters-7

 For new SADB_X_AALG_*, I'd think you should use a value from the "Reserved
 for private use" range. Maybe 250?

 This would be an option, but we have just a few slots for private
 algorithms.


 But maybe a better solution might be to not make AES-CMAC (or other
 new algorithms) available through the PFKEY API at all, just XFRM?


 It is probably the best to make new algorithms unavailable for pfkey
 as long as they have no official ikev1 iana transform identifier.

 But how to do that? Perhaps we can assign SADB_X_AALG_NOPFKEY to
 the private value 255 and return -EINVAL if pfkey tries to register
 such an algorithm. The netlink interface does not use these
 identifiers, everything should work as expected. So it should be
 possible to use these algoritms with iproute2 and the most modern
 ike deamons.

Maybe it would be cleaner to not mess with pfkeyv2.h at all, but instead mark 
algorithms that do not support pfkey with a flag. See the patch below.

Then I started looking into whether sadb_alg_id is used anywhere outside pfkey. 
It seems that its value is mostly just copied around, but at 
http://lxr.linux.no/linux+v3.7/net/xfrm/xfrm_policy.c#L1991 it's used as a 
bit index. So do values larger than 31 break anything? Can multiple 
algorithms have the same sadb_alg_id value? sadb_alg_id is also 
used as a bit index in af_key.c.
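
To make the 32-bit concern concrete, here is a purely illustrative user-space 
sketch (my own example, not kernel code) of why an ID above 31 cannot be 
tracked in a 32-bit algorithm mask:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t allowed_algs = 0;
	unsigned int sadb_alg_id = 250;	/* e.g. a "private use" value */

	if (sadb_alg_id >= 32) {
		/* the shift would be undefined; the bit cannot be stored */
		printf("alg id %u does not fit a 32-bit algorithm mask\n",
		       sadb_alg_id);
		return 1;
	}

	allowed_algs |= UINT32_C(1) << sadb_alg_id;
	printf("mask now 0x%08x\n", (unsigned int)allowed_algs);
	return 0;
}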

-Jussi

---
ONLY COMPILE TESTED!
---
 include/net/xfrm.h   |5 +++--
 net/key/af_key.c |   39 +++
 net/xfrm/xfrm_algo.c |   12 ++--
 3 files changed, 40 insertions(+), 16 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 421f764..5d5eec2 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1320,6 +1320,7 @@ struct xfrm_algo_desc {
char *name;
char *compat;
u8 available:1;
+   u8 sadb_disabled:1;
union {
struct xfrm_algo_aead_info aead;
struct xfrm_algo_auth_info auth;
@@ -1561,8 +1562,8 @@ extern void xfrm_input_init(void);
 extern int xfrm_parse_spi(struct sk_buff *skb, u8 nexthdr, __be32 *spi, __be32 
*seq);
 
 extern void xfrm_probe_algs(void);
-extern int xfrm_count_auth_supported(void);
-extern int xfrm_count_enc_supported(void);
+extern int xfrm_count_sadb_auth_supported(void);
+extern int xfrm_count_sadb_enc_supported(void);
 extern struct xfrm_algo_desc *xfrm_aalg_get_byidx(unsigned int idx);
 extern struct xfrm_algo_desc *xfrm_ealg_get_byidx(unsigned int idx);
 extern struct xfrm_algo_desc *xfrm_aalg_get_byid(int alg_id);
diff --git a/net/key/af_key.c b/net/key/af_key.c
index 5b426a6..307cf1d 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -816,18 +816,21 @@ static struct sk_buff *__pfkey_xfrm_state2msg(const 
struct xfrm_state *x,
sa->sadb_sa_auth = 0;
if (x->aalg) {
struct xfrm_algo_desc *a = 
xfrm_aalg_get_byname(x->aalg->alg_name, 0);
-   sa->sadb_sa_auth = a ? a->desc.sadb_alg_id : 0;
+   sa->sadb_sa_auth = (a && !a->sadb_disabled) ?
+   a->desc.sadb_alg_id : 0;
}
sa->sadb_sa_encrypt = 0;
BUG_ON(x->ealg && x->calg);
if (x->ealg) {
struct xfrm_algo_desc *a = 
xfrm_ealg_get_byname(x->ealg->alg_name, 0);
-   sa->sadb_sa_encrypt = a ? a->desc.sadb_alg_id : 0;
+   sa->sadb_sa_encrypt = (a && !a->sadb_disabled) ?
+   a->desc.sadb_alg_id : 0;
}
/* KAME compatible: sadb_sa_encrypt is overloaded with calg id */
if (x->calg) {
struct xfrm_algo_desc *a = 
xfrm_calg_get_byname(x->calg->alg_name, 0);
-   sa->sadb_sa_encrypt = a ? a->desc.sadb_alg_id : 0;
+   sa->sadb_sa_encrypt = (a && !a->sadb_disabled) ?
+   a->desc.sadb_alg_id : 0;
}
 
sa->sadb_sa_flags = 0;
@@ -1138,7 +1141,7 @@ static struct xfrm_state * pfkey_msg2xfrm_state(struct 
net *net,
if (sa->sadb_sa_auth) {
int keysize = 0;
struct xfrm_algo_desc *a = xfrm_aalg_get_byid(sa->sadb_sa_auth);
-   if (!a) {
+   if (!a || a->sadb_disabled) {
err = -ENOSYS;
goto out;
}
@@ -1160,7 +1163,7 @@ static struct xfrm_state * pfkey_msg2xfrm_state(struct 
net *net,
if (sa->sadb_sa_encrypt) {
if (hdr->sadb_msg_satype == SADB_X_SATYPE_IPCOMP) {
struct xfrm_algo_desc *a = 
xfrm_calg_get_byid(sa->sadb_sa_encrypt);
-   if (!a) {
+   if (!a || a->sadb_disabled

Re: [PATCH] CMAC support for CryptoAPI, fixed patch issues, indent, and testmgr build issues

2013-01-23 Thread Jussi Kivilinna

Quoting Tom St Denis tstde...@elliptictech.com:


- Original Message -

From: Jussi Kivilinna jussi.kivili...@mbnet.fi
To: Tom St Denis tstde...@elliptictech.com
Cc: linux-ker...@vger.kernel.org, Herbert Xu  
herb...@gondor.apana.org.au, David Miller da...@davemloft.net,
linux-crypto@vger.kernel.org, Steffen Klassert  
steffen.klass...@secunet.com, net...@vger.kernel.org

Sent: Wednesday, 23 January, 2013 9:36:44 AM
Subject: Re: [PATCH] CMAC support for CryptoAPI, fixed patch  
issues, indent, and testmgr build issues


Quoting Tom St Denis tstde...@elliptictech.com:

 Hey all,

 Here's an updated patch which addresses a couple of build issues
 and
 coding style complaints.

 I still can't get it to run via testmgr I get

 [  162.407807] alg: No test for cmac(aes) (cmac(aes-generic))

 Despite the fact I have an entry for cmac(aes) (much like
 xcbc(aes)...).

 Here's the patch to bring 3.8-rc4 up with CMAC ...

 Signed-off-by: Tom St Denis tstde...@elliptictech.com

snip
 diff --git a/include/uapi/linux/pfkeyv2.h
 b/include/uapi/linux/pfkeyv2.h
 index 0b80c80..d61898e 100644
 --- a/include/uapi/linux/pfkeyv2.h
 +++ b/include/uapi/linux/pfkeyv2.h
 @@ -296,6 +296,7 @@ struct sadb_x_kmaddress {
  #define SADB_X_AALG_SHA2_512HMAC  7
  #define SADB_X_AALG_RIPEMD160HMAC 8
  #define SADB_X_AALG_AES_XCBC_MAC  9
 +#define SADB_X_AALG_AES_CMAC_MAC  10
  #define SADB_X_AALG_NULL  251 /* kame */
  #define SADB_AALG_MAX 251

Should these values be based on IANA assigned IPSEC AH transform
identifiers?

https://www.iana.org/assignments/isakmp-registry/isakmp-registry.xml#isakmp-registry-6


There is no CMAC entry apparently ... despite the fact that CMAC is  
a proposed RFC standard for IPsec.


It might be safer to move that to 14 since it's currently unassigned  
and then go through whatever channels are required to allocate it.   
Mostly this affects key setting.  So this means my patch would break  
AH_RSA setkey calls (which the kernel doesn't support anyways).




Problem seems to be that PFKEYv2 does not quite work with IKEv2, and  
XFRM API should be used instead. There is new numbers assigned for  
IKEv2:  
https://www.iana.org/assignments/ikev2-parameters/ikev2-parameters.xml#ikev2-parameters-7


For new SADB_X_AALG_*, I'd think you should use value from Reserved  
for private use range. Maybe 250?


But maybe a better solution might be to not make AES-CMAC (or other new  
algorithms) available through the PFKEY API at all, just XFRM?


-Jussi




[PATCH] crypto: testmgr - add test vector for fcrypt

2013-01-19 Thread Jussi Kivilinna
fcrypt is used only as pcbc(fcrypt), but testmgr does not know this.
Use the zero key, zero plaintext pcbc(fcrypt) test vector for
testing plain 'fcrypt' to silence the "no test for fcrypt" warnings.

Signed-off-by: Jussi Kivilinna jussi.kivili...@mbnet.fi
---
 crypto/testmgr.c |   15 +++
 1 file changed, 15 insertions(+)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index edf4a08..efd8b20 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -2269,6 +2269,21 @@ static const struct alg_test_desc alg_test_descs[] = {
}
}
}, {
+   .alg = "ecb(fcrypt)",
+   .test = alg_test_skcipher,
+   .suite = {
+   .cipher = {
+   .enc = {
+   .vecs = fcrypt_pcbc_enc_tv_template,
+   .count = 1
+   },
+   .dec = {
+   .vecs = fcrypt_pcbc_dec_tv_template,
+   .count = 1
+   }
+   }
+   }
+   }, {
.alg = "ecb(khazad)",
.test = alg_test_skcipher,
.suite = {



[PATCH 01/12] crypto: x86/aes - assembler clean-ups: use ENTRY/ENDPROC, localize jump targets

2013-01-19 Thread Jussi Kivilinna
Signed-off-by: Jussi Kivilinna jussi.kivili...@mbnet.fi
---
 arch/x86/crypto/aes-i586-asm_32.S   |   15 +--
 arch/x86/crypto/aes-x86_64-asm_64.S |   30 +++---
 2 files changed, 20 insertions(+), 25 deletions(-)

diff --git a/arch/x86/crypto/aes-i586-asm_32.S 
b/arch/x86/crypto/aes-i586-asm_32.S
index b949ec2..2849dbc 100644
--- a/arch/x86/crypto/aes-i586-asm_32.S
+++ b/arch/x86/crypto/aes-i586-asm_32.S
@@ -36,6 +36,7 @@
 .file "aes-i586-asm.S"
 .text
 
+#include <linux/linkage.h>
 #include <asm/asm-offsets.h>
 
 #define tlen 1024   // length of each of 4 'xor' arrays (256 32-bit words)
@@ -219,14 +220,10 @@
 // AES (Rijndael) Encryption Subroutine
 /* void aes_enc_blk(struct crypto_aes_ctx *ctx, u8 *out_blk, const u8 *in_blk) 
*/
 
-.global  aes_enc_blk
-
 .extern  crypto_ft_tab
 .extern  crypto_fl_tab
 
-.align 4
-
-aes_enc_blk:
+ENTRY(aes_enc_blk)
push%ebp
mov ctx(%esp),%ebp
 
@@ -290,18 +287,15 @@ aes_enc_blk:
mov %r0,(%ebp)
pop %ebp
ret
+ENDPROC(aes_enc_blk)
 
 // AES (Rijndael) Decryption Subroutine
 /* void aes_dec_blk(struct crypto_aes_ctx *ctx, u8 *out_blk, const u8 *in_blk) 
*/
 
-.global  aes_dec_blk
-
 .extern  crypto_it_tab
 .extern  crypto_il_tab
 
-.align 4
-
-aes_dec_blk:
+ENTRY(aes_dec_blk)
push%ebp
mov ctx(%esp),%ebp
 
@@ -365,3 +359,4 @@ aes_dec_blk:
mov %r0,(%ebp)
pop %ebp
ret
+ENDPROC(aes_dec_blk)
diff --git a/arch/x86/crypto/aes-x86_64-asm_64.S 
b/arch/x86/crypto/aes-x86_64-asm_64.S
index 5b577d5..9105655 100644
--- a/arch/x86/crypto/aes-x86_64-asm_64.S
+++ b/arch/x86/crypto/aes-x86_64-asm_64.S
@@ -15,6 +15,7 @@
 
 .text
 
+#include <linux/linkage.h>
 #include <asm/asm-offsets.h>
 
 #define R1 %rax
@@ -49,10 +50,8 @@
 #define R11%r11
 
 #define prologue(FUNC,KEY,B128,B192,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,r11) \
-   .global FUNC;   \
-   .type   FUNC,@function; \
-   .align  8;  \
-FUNC:  movqr1,r2;  \
+   ENTRY(FUNC);\
+   movqr1,r2;  \
movqr3,r4;  \
leaqKEY+48(r8),r9;  \
movqr10,r11;\
@@ -71,14 +70,15 @@ FUNC:   movqr1,r2;  \
je  B192;   \
leaq32(r9),r9;
 
-#define epilogue(r1,r2,r3,r4,r5,r6,r7,r8,r9) \
+#define epilogue(FUNC,r1,r2,r3,r4,r5,r6,r7,r8,r9) \
movqr1,r2;  \
movqr3,r4;  \
movlr5 ## E,(r9);   \
movlr6 ## E,4(r9);  \
movlr7 ## E,8(r9);  \
movlr8 ## E,12(r9); \
-   ret;
+   ret;\
+   ENDPROC(FUNC);
 
 #define round(TAB,OFFSET,r1,r2,r3,r4,r5,r6,r7,r8,ra,rb,rc,rd) \
movzbl  r2 ## H,r5 ## E;\
@@ -133,7 +133,7 @@ FUNC:   movqr1,r2;  \
 #define entry(FUNC,KEY,B128,B192) \
prologue(FUNC,KEY,B128,B192,R2,R8,R7,R9,R1,R3,R4,R6,R10,R5,R11)
 
-#define return epilogue(R8,R2,R9,R7,R5,R6,R3,R4,R11)
+#define return(FUNC) epilogue(FUNC,R8,R2,R9,R7,R5,R6,R3,R4,R11)
 
 #define encrypt_round(TAB,OFFSET) \
round(TAB,OFFSET,R1,R2,R3,R4,R5,R6,R7,R10,R5,R6,R3,R4) \
@@ -151,12 +151,12 @@ FUNC: movqr1,r2;  \
 
 /* void aes_enc_blk(stuct crypto_tfm *tfm, u8 *out, const u8 *in) */
 
-   entry(aes_enc_blk,0,enc128,enc192)
+   entry(aes_enc_blk,0,.Le128,.Le192)
encrypt_round(crypto_ft_tab,-96)
encrypt_round(crypto_ft_tab,-80)
-enc192:encrypt_round(crypto_ft_tab,-64)
+.Le192:encrypt_round(crypto_ft_tab,-64)
encrypt_round(crypto_ft_tab,-48)
-enc128:encrypt_round(crypto_ft_tab,-32)
+.Le128:encrypt_round(crypto_ft_tab,-32)
encrypt_round(crypto_ft_tab,-16)
encrypt_round(crypto_ft_tab,  0)
encrypt_round(crypto_ft_tab, 16)
@@ -166,16 +166,16 @@ enc128:   encrypt_round(crypto_ft_tab,-32)
encrypt_round(crypto_ft_tab, 80)
encrypt_round(crypto_ft_tab, 96)
encrypt_final(crypto_fl_tab,112)
-   return
+   return(aes_enc_blk)
 
 /* void aes_dec_blk(struct crypto_tfm *tfm, u8 *out, const u8 *in) */
 
-   entry(aes_dec_blk,240,dec128,dec192)
+   entry(aes_dec_blk,240,.Ld128,.Ld192)
decrypt_round(crypto_it_tab,-96)
decrypt_round(crypto_it_tab,-80)
-dec192:decrypt_round(crypto_it_tab,-64)
+.Ld192:decrypt_round(crypto_it_tab,-64)
decrypt_round(crypto_it_tab,-48)
-dec128:decrypt_round(crypto_it_tab,-32)
+.Ld128:decrypt_round(crypto_it_tab,-32)
decrypt_round(crypto_it_tab,-16)
decrypt_round(crypto_it_tab,  0)
decrypt_round(crypto_it_tab, 16)
@@ -185,4 +185,4 @@ dec128: decrypt_round(crypto_it_tab,-32)
decrypt_round(crypto_it_tab, 80)
decrypt_round

[PATCH 07/12] crypto: x86/crc32c - assembler clean-up: use ENTRY/ENDPROC

2013-01-19 Thread Jussi Kivilinna
Signed-off-by: Jussi Kivilinna jussi.kivili...@mbnet.fi
---
 arch/x86/crypto/crc32c-pcl-intel-asm_64.S |8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S 
b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
index 93c6d39..cf1a7ec 100644
--- a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
+++ b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
@@ -42,6 +42,8 @@
  * SOFTWARE.
  */
 
+#include <linux/linkage.h>
+
 ## ISCSI CRC 32 Implementation with crc32 and pclmulqdq Instruction
 
 .macro LABEL prefix n
@@ -68,8 +70,7 @@
 
 # unsigned int crc_pcl(u8 *buffer, int len, unsigned int crc_init);
 
-.global crc_pcl
-crc_pcl:
+ENTRY(crc_pcl)
 #definebufp%rdi
 #definebufp_dw %edi
 #definebufp_w  %di
@@ -323,6 +324,9 @@ JMPTBL_ENTRY %i
 .noaltmacro
i=i+1
 .endr
+
+ENDPROC(crc_pcl)
+

## PCLMULQDQ tables
## Table is 128 entries x 2 quad words each



RE: [PATCH][RFC] crypto: tcrypt - Ahash tests changed to run in parallel.

2013-01-18 Thread Jussi Kivilinna

Quoting Garg Vakul-B16394 b16...@freescale.com:




Does not this change make tcrypt give
inconsistent results?



Based on kernel scheduling of threads, this change can make tcrypt  
give varying results in different runs.

For consistent results, we can use existing synchronous mode crypto sessions.


But one cannot get consistent results for asynchronous software  
implementations after this patch.


-Jussi




Re: [PATCH][RFC] crypto: tcrypt - Ahash tests changed to run in parallel.

2013-01-05 Thread Jussi Kivilinna

Quoting Vakul Garg va...@freescale.com:


This allows testing & running multiple parallel crypto ahash contexts.
Each of the test vectors under the ahash speed test template is started
under a separate kthread.


Why do you want to do this? Does not this change make tcrypt give  
inconsistent results?





[RFC PATCH 0/3] Make rfc3686 template work with asynchronous block ciphers

2012-12-28 Thread Jussi Kivilinna
I'm not sure how this patchset should be dealt with (should the 1st patch go
through a different tree than the 2nd and 3rd?), so it's an RFC.

The second patch makes the rfc3686 template work with asynchronous block ciphers
and the third patch changes aesni-intel to use this template. The first patch
fixes a problem in xfrm_algo that was found with the help of the 2nd and 3rd
patches; without the 1st patch, the 2nd patch breaks aes-ctr with IPSEC.


---

Jussi Kivilinna (3):
  xfrm_algo: probe asynchronous block ciphers instead of synchronous
  crypto: ctr - make rfc3686 asynchronous block cipher
  crypto: aesni-intel - remove rfc3686(ctr(aes)), utilize rfc3686 from 
ctr-module instead


 arch/x86/crypto/aesni-intel_glue.c |   37 
 crypto/ctr.c   |  173 +++-
 crypto/tcrypt.c|4 +
 crypto/tcrypt.h|1 
 net/xfrm/xfrm_algo.c   |3 -
 5 files changed, 116 insertions(+), 102 deletions(-)


[RFC PATCH 1/3] xfrm_algo: probe asynchronous block ciphers instead of synchronous

2012-12-28 Thread Jussi Kivilinna
IPSEC uses block ciphers asynchronously, but probes only for synchronous block
ciphers and makes ealg entries available only if a synchronous block cipher is
found. So with a setup where a hardware crypto driver registers asynchronous
block ciphers and the software crypto module is not built, the ealg is not marked
as being available.

Use crypto_has_ablkcipher instead and remove ASYNC mask.

Signed-off-by: Jussi Kivilinna jussi.kivili...@mbnet.fi
---
 net/xfrm/xfrm_algo.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_algo.c b/net/xfrm/xfrm_algo.c
index 4ce2d93..f9a5495 100644
--- a/net/xfrm/xfrm_algo.c
+++ b/net/xfrm/xfrm_algo.c
@@ -700,8 +700,7 @@ void xfrm_probe_algs(void)
}
 
for (i = 0; i < ealg_entries(); i++) {
-   status = crypto_has_blkcipher(ealg_list[i].name, 0,
- CRYPTO_ALG_ASYNC);
+   status = crypto_has_ablkcipher(ealg_list[i].name, 0, 0);
if (ealg_list[i].available != status)
ealg_list[i].available = status;
}



[RFC PATCH 2/3] crypto: ctr - make rfc3686 asynchronous block cipher

2012-12-28 Thread Jussi Kivilinna
Some hardware crypto drivers register asynchronous ctr(aes), which is left
unused in IPSEC because the rfc3686 template only supports synchronous block
ciphers. Some other drivers register rfc3686(ctr(aes)) to work around this
limitation, but not all of them do.

This patch changes rfc3686 to use asynchronous block ciphers, to allow async
ctr(aes) algorithms to be utilized automatically by IPSEC.

Signed-off-by: Jussi Kivilinna jussi.kivili...@mbnet.fi
---
 crypto/ctr.c|  173 +++
 crypto/tcrypt.c |4 +
 crypto/tcrypt.h |1 
 3 files changed, 115 insertions(+), 63 deletions(-)

diff --git a/crypto/ctr.c b/crypto/ctr.c
index 4ca7222..1f2997c 100644
--- a/crypto/ctr.c
+++ b/crypto/ctr.c
@@ -12,6 +12,7 @@
 
 #include <crypto/algapi.h>
 #include <crypto/ctr.h>
+#include <crypto/internal/skcipher.h>
 #include <linux/err.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
@@ -25,10 +26,15 @@ struct crypto_ctr_ctx {
 };
 
 struct crypto_rfc3686_ctx {
-   struct crypto_blkcipher *child;
+   struct crypto_ablkcipher *child;
u8 nonce[CTR_RFC3686_NONCE_SIZE];
 };
 
+struct crypto_rfc3686_req_ctx {
+   u8 iv[CTR_RFC3686_BLOCK_SIZE];
+   struct ablkcipher_request subreq CRYPTO_MINALIGN_ATTR;
+};
+
 static int crypto_ctr_setkey(struct crypto_tfm *parent, const u8 *key,
 unsigned int keylen)
 {
@@ -243,11 +249,11 @@ static struct crypto_template crypto_ctr_tmpl = {
.module = THIS_MODULE,
 };
 
-static int crypto_rfc3686_setkey(struct crypto_tfm *parent, const u8 *key,
-unsigned int keylen)
+static int crypto_rfc3686_setkey(struct crypto_ablkcipher *parent,
+const u8 *key, unsigned int keylen)
 {
-   struct crypto_rfc3686_ctx *ctx = crypto_tfm_ctx(parent);
-   struct crypto_blkcipher *child = ctx->child;
+   struct crypto_rfc3686_ctx *ctx = crypto_ablkcipher_ctx(parent);
+   struct crypto_ablkcipher *child = ctx->child;
int err;
 
/* the nonce is stored in bytes at end of key */
@@ -259,59 +265,64 @@ static int crypto_rfc3686_setkey(struct crypto_tfm 
*parent, const u8 *key,
 
keylen -= CTR_RFC3686_NONCE_SIZE;
 
-   crypto_blkcipher_clear_flags(child, CRYPTO_TFM_REQ_MASK);
-   crypto_blkcipher_set_flags(child, crypto_tfm_get_flags(parent) &
- CRYPTO_TFM_REQ_MASK);
-   err = crypto_blkcipher_setkey(child, key, keylen);
-   crypto_tfm_set_flags(parent, crypto_blkcipher_get_flags(child) &
-CRYPTO_TFM_RES_MASK);
+   crypto_ablkcipher_clear_flags(child, CRYPTO_TFM_REQ_MASK);
+   crypto_ablkcipher_set_flags(child, crypto_ablkcipher_get_flags(parent) &
+   CRYPTO_TFM_REQ_MASK);
+   err = crypto_ablkcipher_setkey(child, key, keylen);
+   crypto_ablkcipher_set_flags(parent, crypto_ablkcipher_get_flags(child) &
+   CRYPTO_TFM_RES_MASK);
 
return err;
 }
 
-static int crypto_rfc3686_crypt(struct blkcipher_desc *desc,
-   struct scatterlist *dst,
-   struct scatterlist *src, unsigned int nbytes)
+static int crypto_rfc3686_crypt(struct ablkcipher_request *req)
 {
-   struct crypto_blkcipher *tfm = desc->tfm;
-   struct crypto_rfc3686_ctx *ctx = crypto_blkcipher_ctx(tfm);
-   struct crypto_blkcipher *child = ctx->child;
-   unsigned long alignmask = crypto_blkcipher_alignmask(tfm);
-   u8 ivblk[CTR_RFC3686_BLOCK_SIZE + alignmask];
-   u8 *iv = PTR_ALIGN(ivblk + 0, alignmask + 1);
-   u8 *info = desc->info;
-   int err;
+   struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(req);
+   struct crypto_rfc3686_ctx *ctx = crypto_ablkcipher_ctx(tfm);
+   struct crypto_ablkcipher *child = ctx->child;
+   unsigned long align = crypto_ablkcipher_alignmask(tfm);
+   struct crypto_rfc3686_req_ctx *rctx =
+   (void *)PTR_ALIGN((u8 *)ablkcipher_request_ctx(req), align + 1);
+   struct ablkcipher_request *subreq = &rctx->subreq;
+   u8 *iv = rctx->iv;
 
/* set up counter block */
memcpy(iv, ctx->nonce, CTR_RFC3686_NONCE_SIZE);
-   memcpy(iv + CTR_RFC3686_NONCE_SIZE, info, CTR_RFC3686_IV_SIZE);
+   memcpy(iv + CTR_RFC3686_NONCE_SIZE, req->info, CTR_RFC3686_IV_SIZE);
 
/* initialize counter portion of counter block */
*(__be32 *)(iv + CTR_RFC3686_NONCE_SIZE + CTR_RFC3686_IV_SIZE) =
cpu_to_be32(1);
 
-   desc->tfm = child;
-   desc->info = iv;
-   err = crypto_blkcipher_encrypt_iv(desc, dst, src, nbytes);
-   desc->tfm = tfm;
-   desc->info = info;
+   ablkcipher_request_set_tfm(subreq, child);
+   ablkcipher_request_set_callback(subreq, req->base.flags,
+   req->base.complete, req->base.data);
+   ablkcipher_request_set_crypt(subreq, req->src, req

Re: Workaround for tcrypt bug?

2012-12-28 Thread Jussi Kivilinna

Quoting Sandra Schlichting littlesandr...@gmail.com:


Why you want to workaround this? It's safe to ignore hmac(crc32) warning.


Because it stops from proceeding. I would have expected that

modprobe tcrypt sec=1 type=1000

would have executed all test cases.

Even if I just want to test one

[root@amd ~]# modprobe tcrypt sec=2 type=402
ERROR: could not insert 'tcrypt': No such file or directory

I get an error.


I think you are using wrong module argument, type= instead of mode=.  
Try 'modprobe tcrypt sec=2 mode=402' instead.
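
For reference, tcrypt selects the test case with the 'mode' parameter and the  
measurement time with 'sec'; the relevant parameter declarations in tcrypt.c  
look roughly like this (a from-memory sketch, not the exact upstream lines):

static int mode;		/* selects which test/benchmark to run, e.g. 402 */
static unsigned int sec;	/* seconds to run each speed measurement for */

module_param(mode, int, 0);
module_param(sec, uint, 0);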


-Jussi






Re: Workaround for tcrypt bug?

2012-12-28 Thread Jussi Kivilinna

Quoting Sandra Schlichting littlesandr...@gmail.com:


I think you are using wrong module argument, type= instead of mode=. Try
'modprobe tcrypt sec=2 mode=402' instead.


Thanks. I would never have thought of that =)

Now it preforms the test, but gives this interesting error:

[root@amd ~]# modprobe tcrypt sec=2 mode=402

Message from syslogd@amd at Dec 28 14:01:05 ...
 kernel:[ 5508.698788] BUG: soft lockup - CPU#0 stuck for 22s!  
[modprobe:3416]


Tcrypt does all of its work in module init, which can take a long time and  
therefore triggers the 'soft lockup' warning.


Possible solutions are:
 1. Build kernel with CONFIG_LOCKUP_DETECTOR option disabled,
 2. Boot kernel with 'nowatchdog' argument,
 3. Ignore warning.


ERROR: could not insert 'tcrypt': Resource temporarily unavailable


Tcrypt fails to load after running the tests; that's expected.
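
The pattern is roughly the following (an illustrative sketch, not the exact  
upstream code):

static int __init tcrypt_mod_init(void)
{
	int err;

	/* All selected tests run right here, which may take a long time. */
	err = do_test(mode);
	if (err)
		return err;

	/*
	 * On success, intentionally return an error so the module is never
	 * left loaded; there is nothing for it to do after the tests.
	 */
	return -EAGAIN;
}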

-Jussi



Re: Crypto causes panic in scatterwalk_done with large/multiple buffers

2012-11-18 Thread Jussi Kivilinna

Quoting Jorgen Lundman lund...@lundman.net:



Apparently this patch only fixed my debug printk loop, which used sg_next
from the scatterlist API instead of scatterwalk_sg_next from the scatterwalk API.
Sorry for the noise.



Thanks for looking at this. I think I am dealing with 2 problems, one is
that occasionally my buffers are from vmalloc, and needs to have some logic
using vmalloc_to_page().  But I don't know if ciphers should handle that
internally, blkcipher.c certainly seems to have several modes, although I
do not see how to *set* them.


From what I have now researched, you must not pass vmalloc'd memory to  
sg_set_buf(), as it internally uses virt_to_page() to get the page of the  
buffer address. You most likely need to walk through your vmalloc'd  
buffer and pass the individual pages to the scatterlist with sg_set_page().
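
A rough sketch of what I mean (untested; it assumes the buffer and the length  
are page aligned and skips error handling):

#include <linux/mm.h>
#include <linux/scatterlist.h>
#include <linux/vmalloc.h>

/* Map a page-aligned vmalloc'd buffer into 'sg', one entry per page. */
static void sg_init_from_vmalloc(struct scatterlist *sg, void *buf,
				 unsigned int len)
{
	unsigned int nents = len >> PAGE_SHIFT;
	unsigned int i;

	sg_init_table(sg, nents);
	for (i = 0; i < nents; i++) {
		struct page *page = vmalloc_to_page((u8 *)buf + i * PAGE_SIZE);

		sg_set_page(&sg[i], page, PAGE_SIZE, 0);
	}
}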




Second problem is most likely what you were looking at. It is quite easy to
make the crypto code die.

For example, if I use ccm(aes) which can take the dst buffer, plus a hmac
buffer;

  cipher = kmalloc( ciphersize, ...
  hmac = kmalloc( 16, ...
  sg_set_buf( &sg[0], cipher, ciphersize);
  sg_set_buf( &sg[1], hmac, 16);
  aead_encrypt()...

and all is well, but if you shift hmac address away from PAGE boundary, like:

  hmac = kmalloc( 16 + 32, ...
  hmac += 32;
  sg_set_buf( &sg[1], hmac, 16);

i.e., allocate a larger buffer and move the pointer a bit into the page. It
will then die in scatterwalk very often. +32 isn't magical; any non-zero
number works.


This is strange, as the crypto subsystem's internal test mechanism uses  
such offset buffers.
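
For example, the self-tests place data at odd offsets inside a buffer before  
adding it to the scatterlist, along these lines (illustrative only, not  
verbatim testmgr code; 'input' and 'ilen' stand for a test vector's data and  
length):

	void *buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
	u8 *data = (u8 *)buf + 37;	/* deliberately unaligned offset */
	struct scatterlist sg[1];

	memcpy(data, input, ilen);
	sg_init_one(&sg[0], data, ilen);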


-Jussi




Re: Crypto causes panic in scatterwalk_done with large/multiple buffers

2012-11-17 Thread Jussi Kivilinna

Quoting Jussi Kivilinna jussi.kivili...@mbnet.fi:


Hello,

I managed to reproduce something similar with small buffers...

Does attached patch help in your case?



Apparently this patch only fixed my debug printk loop, which used  
sg_next from the scatterlist API instead of scatterwalk_sg_next from the  
scatterwalk API. Sorry for the noise.


-Jussi



Re: Crypto causes panic in scatterwalk_done with large/multiple buffers

2012-11-16 Thread Jussi Kivilinna

Hello,

I managed to reproduce something similar with small buffers...

Does attached patch help in your case?

-Jussi

Quoting Jorgen Lundman lund...@lundman.net:



I have a situation where I setup scatterlists as:

 input scatterlist of 1, address c90003627000 len 0x2.

output scatterlist of 2, address 0 c90002d45000 len 0x2
 address 1 88003b079d98 len 0x000c

When I call crypto_aead_encrypt(req); it will die with:

kernel: [  925.151113] BUG: unable to handle kernel paging request at
eb04000b5140
kernel: [  925.151253] IP: [812f4880] scatterwalk_done+0x50/0x60
kernel: [  925.151325] PGD 0
kernel: [  925.151381] Oops:  [#1] SMP
kernel: [  925.151442] CPU 1
kernel: [  925.154255]  [812f7640] blkcipher_walk_done+0xb0/0x230
kernel: [  925.154255]  [a02e9169]  
crypto_ctr_crypt+0x129/0x2b0 [ctr]

kernel: [  925.154255]  [812fe580] ? crypto_aes_set_key+0x40/0x40
kernel: [  925.154255]  [812f6cbd] async_encrypt+0x3d/0x40
kernel: [  925.154255]  [a0149326] crypto_ccm_encrypt+0x246/0x290
[ccm]
kernel: [  925.154255]  [a01633bd] crypto_encrypt+0x26d/0x2d0



What is interesting about that is, if I allocate a linear buffer instead:

  dst = kmalloc(cryptlen, GFP_KERNEL); // 0x2 + 0x000c
  sg_init_table(sg, 1 );
  sg_set_buf(sg[0], dst, cryptlen);

  crypto_aead_encrypt(req);

will no longer panic. However, when I try to copy the linear buffer back to
scatterlist;


  scatterwalk_map_and_copy(dst, sg, 0, cryptlen, 1);


then it will panic there instead.


However, if I replace it with the call:

sg_copy_from_buffer(sg, sg_nents(sg),
dst, cryptlen);

everything works! -

So, what am I doing wrong that makes scatterwalk_map_and_copy() fail, and
sg_copy_from_buffer() work fine? It would be nice if I could fix it, so I
did not need to copy to a temporary buffer.

Lund

--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)





crypto: scatterwalk - fix broken scatterlist manipulation

From: Jussi Kivilinna jussi.kivili...@mbnet.fi

scatterwalk_sg_chain() manipulates scatterlist structures directly in the wrong
way, chaining without marking the 'chain' bit 0x01. This can in some cases lead
to problems, such as triggering BUG_ON(!sg->length) in scatterwalk_start().

So instead of reinventing the wheel, change scatterwalk to use the existing
functions from the scatterlist API.
---
 include/crypto/scatterwalk.h |8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/include/crypto/scatterwalk.h b/include/crypto/scatterwalk.h
index 3744d2a..d31870c 100644
--- a/include/crypto/scatterwalk.h
+++ b/include/crypto/scatterwalk.h
@@ -34,16 +34,12 @@ static inline void crypto_yield(u32 flags)
 static inline void scatterwalk_sg_chain(struct scatterlist *sg1, int num,
 	struct scatterlist *sg2)
 {
-	sg_set_page(&sg1[num - 1], (void *)sg2, 0, 0);
-	sg1[num - 1].page_link &= ~0x02;
+	sg_chain(sg1, num, sg2);
 }
 
 static inline struct scatterlist *scatterwalk_sg_next(struct scatterlist *sg)
 {
-	if (sg_is_last(sg))
-		return NULL;
-
-	return (++sg)->length ? sg : (void *)sg_page(sg);
+	return sg_next(sg);
 }
 
 static inline void scatterwalk_crypto_chain(struct scatterlist *head,


[PATCH] crypto: cast5/cast6 - move lookup tables to shared module

2012-11-13 Thread Jussi Kivilinna
CAST5 and CAST6 both use the same lookup tables, which can be moved to a shared
module, 'cast_common'.

Signed-off-by: Jussi Kivilinna jussi.kivili...@mbnet.fi
---
 arch/x86/crypto/cast5-avx-x86_64-asm_64.S |   16 +-
 arch/x86/crypto/cast6-avx-x86_64-asm_64.S |   16 +-
 crypto/Kconfig|   10 +
 crypto/Makefile   |1 
 crypto/cast5_generic.c|  277 
 crypto/cast6_generic.c|  280 
 crypto/cast_common.c  |  290 +
 include/crypto/cast5.h|6 -
 include/crypto/cast6.h|6 -
 include/crypto/cast_common.h  |9 +
 10 files changed, 336 insertions(+), 575 deletions(-)
 create mode 100644 crypto/cast_common.c
 create mode 100644 include/crypto/cast_common.h

diff --git a/arch/x86/crypto/cast5-avx-x86_64-asm_64.S 
b/arch/x86/crypto/cast5-avx-x86_64-asm_64.S
index 12478e4..15b00ac 100644
--- a/arch/x86/crypto/cast5-avx-x86_64-asm_64.S
+++ b/arch/x86/crypto/cast5-avx-x86_64-asm_64.S
@@ -25,10 +25,10 @@
 
.file "cast5-avx-x86_64-asm_64.S"
 
-.extern cast5_s1
-.extern cast5_s2
-.extern cast5_s3
-.extern cast5_s4
+.extern cast_s1
+.extern cast_s2
+.extern cast_s3
+.extern cast_s4
 
 /* structure of crypto context */
 #define km 0
@@ -36,10 +36,10 @@
 #define rr ((16*4)+16)
 
 /* s-boxes */
-#define s1 cast5_s1
-#define s2 cast5_s2
-#define s3 cast5_s3
-#define s4 cast5_s4
+#define s1 cast_s1
+#define s2 cast_s2
+#define s3 cast_s3
+#define s4 cast_s4
 
 /**
   16-way AVX cast5
diff --git a/arch/x86/crypto/cast6-avx-x86_64-asm_64.S 
b/arch/x86/crypto/cast6-avx-x86_64-asm_64.S
index 83a5381..2569d0d 100644
--- a/arch/x86/crypto/cast6-avx-x86_64-asm_64.S
+++ b/arch/x86/crypto/cast6-avx-x86_64-asm_64.S
@@ -27,20 +27,20 @@
 
.file "cast6-avx-x86_64-asm_64.S"
 
-.extern cast6_s1
-.extern cast6_s2
-.extern cast6_s3
-.extern cast6_s4
+.extern cast_s1
+.extern cast_s2
+.extern cast_s3
+.extern cast_s4
 
 /* structure of crypto context */
 #define km 0
 #define kr (12*4*4)
 
 /* s-boxes */
-#define s1 cast6_s1
-#define s2 cast6_s2
-#define s3 cast6_s3
-#define s4 cast6_s4
+#define s1 cast_s1
+#define s2 cast_s2
+#define s3 cast_s3
+#define s4 cast_s4
 
 /**
   8-way AVX cast6
diff --git a/crypto/Kconfig b/crypto/Kconfig
index c226b2c..4641d95 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -841,9 +841,16 @@ config CRYPTO_CAMELLIA_SPARC64
  See also:
  https://info.isl.ntt.co.jp/crypt/eng/camellia/index_s.html
 
+config CRYPTO_CAST_COMMON
+   tristate
+   help
+ Common parts of the CAST cipher algorithms shared by the
+ generic c and the assembler implementations.
+
 config CRYPTO_CAST5
tristate CAST5 (CAST-128) cipher algorithm
select CRYPTO_ALGAPI
+   select CRYPTO_CAST_COMMON
help
  The CAST5 encryption algorithm (synonymous with CAST-128) is
  described in RFC2144.
@@ -854,6 +861,7 @@ config CRYPTO_CAST5_AVX_X86_64
select CRYPTO_ALGAPI
select CRYPTO_CRYPTD
select CRYPTO_ABLK_HELPER_X86
+   select CRYPTO_CAST_COMMON
select CRYPTO_CAST5
help
  The CAST5 encryption algorithm (synonymous with CAST-128) is
@@ -865,6 +873,7 @@ config CRYPTO_CAST5_AVX_X86_64
 config CRYPTO_CAST6
tristate CAST6 (CAST-256) cipher algorithm
select CRYPTO_ALGAPI
+   select CRYPTO_CAST_COMMON
help
  The CAST6 encryption algorithm (synonymous with CAST-256) is
  described in RFC2612.
@@ -876,6 +885,7 @@ config CRYPTO_CAST6_AVX_X86_64
select CRYPTO_CRYPTD
select CRYPTO_ABLK_HELPER_X86
select CRYPTO_GLUE_HELPER_X86
+   select CRYPTO_CAST_COMMON
select CRYPTO_CAST6
select CRYPTO_LRW
select CRYPTO_XTS
diff --git a/crypto/Makefile b/crypto/Makefile
index 8cf61ff..d59dec7 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -68,6 +68,7 @@ obj-$(CONFIG_CRYPTO_TWOFISH_COMMON) += twofish_common.o
 obj-$(CONFIG_CRYPTO_SERPENT) += serpent_generic.o
 obj-$(CONFIG_CRYPTO_AES) += aes_generic.o
 obj-$(CONFIG_CRYPTO_CAMELLIA) += camellia_generic.o
+obj-$(CONFIG_CRYPTO_CAST_COMMON) += cast_common.o
 obj-$(CONFIG_CRYPTO_CAST5) += cast5_generic.o
 obj-$(CONFIG_CRYPTO_CAST6) += cast6_generic.o
 obj-$(CONFIG_CRYPTO_ARC4) += arc4.o
diff --git a/crypto/cast5_generic.c b/crypto/cast5_generic.c
index bc525db..5558f63 100644
--- a/crypto/cast5_generic.c
+++ b/crypto/cast5_generic.c
@@ -30,275 +30,6 @@
 #include <linux/types.h>
 #include <crypto/cast5.h>
 
-
-const u32 cast5_s1[256] = {
-   0x30fb40d4, 0x9fa0ff0b, 0x6beccd2f, 0x3f258c7a, 0x1e213f2f,
-   0x9c004dd3, 0x6003e540, 0xcf9fc949

[PATCH 1/2] crypto: testmgr - remove fips_allowed flag from camellia-aesni null-tests

2012-11-13 Thread Jussi Kivilinna
Remove the incorrect fips_allowed flag from the camellia null-test entries. It was
caused by an incorrect copy-paste of the aes-aesni null-tests into the
camellia-aesni null-tests.

Signed-off-by: Jussi Kivilinna jussi.kivili...@mbnet.fi
---
 crypto/testmgr.c |2 --
 1 file changed, 2 deletions(-)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 3933241..b8695bf 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -2175,7 +2175,6 @@ static const struct alg_test_desc alg_test_descs[] = {
}, {
.alg = "cryptd(__driver-cbc-camellia-aesni)",
.test = alg_test_null,
-   .fips_allowed = 1,
.suite = {
.cipher = {
.enc = {
@@ -2207,7 +2206,6 @@ static const struct alg_test_desc alg_test_descs[] = {
}, {
.alg = "cryptd(__driver-ecb-camellia-aesni)",
.test = alg_test_null,
-   .fips_allowed = 1,
.suite = {
.cipher = {
.enc = {



[PATCH] crypto: testmgr - add larger crc32c test vector to test FPU path in crc32c_intel

2012-11-13 Thread Jussi Kivilinna
Signed-off-by: Jussi Kivilinna jussi.kivili...@mbnet.fi
---
 crypto/testmgr.h |  267 +-
 1 file changed, 264 insertions(+), 3 deletions(-)

diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index 17db4a9..189aeb6 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -41,7 +41,7 @@ struct hash_testvec {
char *plaintext;
char *digest;
unsigned char tap[MAX_TAP];
-   unsigned char psize;
+   unsigned short psize;
unsigned char np;
unsigned char ksize;
 };
@@ -25214,7 +25214,7 @@ static struct hash_testvec michael_mic_tv_template[] = {
 /*
  * CRC32C test vectors
  */
-#define CRC32C_TEST_VECTORS 14
+#define CRC32C_TEST_VECTORS 15
 
 static struct hash_testvec crc32c_tv_template[] = {
{
@@ -25385,7 +25385,268 @@ static struct hash_testvec crc32c_tv_template[] = {
.digest = \x75\xd3\xc5\x24,
.np = 2,
.tap = { 31, 209 }
-   },
+   }, {
+   .key = \xff\xff\xff\xff,
+   .ksize = 4,
+   .plaintext =\x6e\x05\x79\x10\xa7\x1b\xb2\x49
+   \xe0\x54\xeb\x82\x19\x8d\x24\xbb
+   \x2f\xc6\x5d\xf4\x68\xff\x96\x0a
+   \xa1\x38\xcf\x43\xda\x71\x08\x7c
+   \x13\xaa\x1e\xb5\x4c\xe3\x57\xee
+   \x85\x1c\x90\x27\xbe\x32\xc9\x60
+   \xf7\x6b\x02\x99\x0d\xa4\x3b\xd2
+   \x46\xdd\x74\x0b\x7f\x16\xad\x21
+   \xb8\x4f\xe6\x5a\xf1\x88\x1f\x93
+   \x2a\xc1\x35\xcc\x63\xfa\x6e\x05
+   \x9c\x10\xa7\x3e\xd5\x49\xe0\x77
+   \x0e\x82\x19\xb0\x24\xbb\x52\xe9
+   \x5d\xf4\x8b\x22\x96\x2d\xc4\x38
+   \xcf\x66\xfd\x71\x08\x9f\x13\xaa
+   \x41\xd8\x4c\xe3\x7a\x11\x85\x1c
+   \xb3\x27\xbe\x55\xec\x60\xf7\x8e
+   \x02\x99\x30\xc7\x3b\xd2\x69\x00
+   \x74\x0b\xa2\x16\xad\x44\xdb\x4f
+   \xe6\x7d\x14\x88\x1f\xb6\x2a\xc1
+   \x58\xef\x63\xfa\x91\x05\x9c\x33
+   \xca\x3e\xd5\x6c\x03\x77\x0e\xa5
+   \x19\xb0\x47\xde\x52\xe9\x80\x17
+   \x8b\x22\xb9\x2d\xc4\x5b\xf2\x66
+   \xfd\x94\x08\x9f\x36\xcd\x41\xd8
+   \x6f\x06\x7a\x11\xa8\x1c\xb3\x4a
+   \xe1\x55\xec\x83\x1a\x8e\x25\xbc
+   \x30\xc7\x5e\xf5\x69\x00\x97\x0b
+   \xa2\x39\xd0\x44\xdb\x72\x09\x7d
+   \x14\xab\x1f\xb6\x4d\xe4\x58\xef
+   \x86\x1d\x91\x28\xbf\x33\xca\x61
+   \xf8\x6c\x03\x9a\x0e\xa5\x3c\xd3
+   \x47\xde\x75\x0c\x80\x17\xae\x22
+   \xb9\x50\xe7\x5b\xf2\x89\x20\x94
+   \x2b\xc2\x36\xcd\x64\xfb\x6f\x06
+   \x9d\x11\xa8\x3f\xd6\x4a\xe1\x78
+   \x0f\x83\x1a\xb1\x25\xbc\x53\xea
+   \x5e\xf5\x8c\x00\x97\x2e\xc5\x39
+   \xd0\x67\xfe\x72\x09\xa0\x14\xab
+   \x42\xd9\x4d\xe4\x7b\x12\x86\x1d
+   \xb4\x28\xbf\x56\xed\x61\xf8\x8f
+   \x03\x9a\x31\xc8\x3c\xd3\x6a\x01
+   \x75\x0c\xa3\x17\xae\x45\xdc\x50
+   \xe7\x7e\x15\x89\x20\xb7\x2b\xc2
+   \x59\xf0\x64\xfb\x92\x06\x9d\x34
+   \xcb\x3f\xd6\x6d\x04\x78\x0f\xa6
+   \x1a\xb1\x48\xdf\x53\xea\x81\x18
+   \x8c\x23\xba\x2e\xc5\x5c\xf3\x67
+   \xfe\x95\x09\xa0\x37\xce\x42\xd9
+   \x70\x07\x7b\x12\xa9\x1d\xb4\x4b
+   \xe2\x56\xed\x84\x1b\x8f\x26\xbd
+   \x31\xc8\x5f\xf6\x6a\x01\x98\x0c
+   \xa3\x3a\xd1\x45\xdc\x73\x0a\x7e
+   \x15\xac\x20\xb7\x4e\xe5\x59\xf0
+   \x87\x1e\x92\x29\xc0\x34\xcb\x62
+   \xf9\x6d\x04\x9b\x0f\xa6\x3d\xd4
+   \x48\xdf\x76\x0d\x81\x18\xaf\x23
+   \xba\x51\xe8\x5c\xf3\x8a\x21\x95
+   \x2c\xc3\x37\xce\x65\xfc\x70\x07
+   \x9e\x12\xa9\x40\xd7\x4b\xe2\x79
+   \x10\x84\x1b\xb2\x26\xbd\x54\xeb

Re: [PATCH 2/2] Remove VLAIS usage from crypto/testmgr.c

2012-10-31 Thread Jussi Kivilinna

Quoting Behan Webster beh...@converseincode.com:


From: Jan-Simon Möller dl...@gmx.de

The use of variable length arrays in structs (VLAIS) in the Linux Kernel code
precludes the use of compilers which don't implement VLAIS (for instance the
Clang compiler). This patch instead allocates the appropriate amount  
of memory

using an char array.

Patch from series at
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120507/142707.html
by PaX Team.

Signed-off-by: Jan-Simon Möller dl...@gmx.de
Cc: pagee...@freemail.hu
Signed-off-by: Behan Webster beh...@converseincode.com
---
 crypto/testmgr.c |   23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 941d75c..5b7b3a6 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -1578,16 +1578,19 @@ static int alg_test_crc32c(const struct  
alg_test_desc *desc,

}

do {
-   struct {
-   struct shash_desc shash;
-   char ctx[crypto_shash_descsize(tfm)];
-   } sdesc;
-
-   sdesc.shash.tfm = tfm;
-   sdesc.shash.flags = 0;
-
-   *(u32 *)sdesc.ctx = le32_to_cpu(420553207);
-   err = crypto_shash_final(&sdesc.shash, (u8 *)val);
+   char sdesc[sizeof(struct shash_desc)
+   + crypto_shash_descsize(tfm)
+   + CRYPTO_MINALIGN] CRYPTO_MINALIGN_ATTR;
+   struct shash_desc *shash = (struct shash_desc *)sdesc;
+   u32 *ctx = (u32 *)((unsigned long)(sdesc
+   + sizeof(struct shash_desc) + CRYPTO_MINALIGN - 1)
+   & ~(CRYPTO_MINALIGN - 1));


I think you should use '(u32 *)shash_desc_ctx(shash)' instead of  
getting ctx pointer manually.



+
+   shash->tfm = tfm;
+   shash->flags = 0;
+
+   *ctx = le32_to_cpu(420553207);
+   err = crypto_shash_final(shash, (u8 *)val);
if (err) {
printk(KERN_ERR "alg: crc32c: Operation failed for "
       "%s: %d\n", driver, err);
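
For reference, a rough sketch of the form I mean (untested; 'tfm', 'val' and  
'err' as in the surrounding alg_test_crc32c() code):

	char sdesc[sizeof(struct shash_desc) + crypto_shash_descsize(tfm)
		   + CRYPTO_MINALIGN] CRYPTO_MINALIGN_ATTR;
	struct shash_desc *shash = (struct shash_desc *)sdesc;
	u32 *ctx = (u32 *)shash_desc_ctx(shash);	/* already aligned */

	shash->tfm = tfm;
	shash->flags = 0;

	*ctx = le32_to_cpu(420553207);
	err = crypto_shash_final(shash, (u8 *)val);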






sha1-arm assembler and CONFIG_THUMB2_KERNEL = build error

2012-10-28 Thread Jussi Kivilinna

Hello,

I tested the cryptodev-2.6 tree with ARCH=arm, and got the following error  
with the CONFIG_THUMB2_KERNEL=y + CONFIG_CRYPTO_SHA1_ARM=y combination.  
The config is based on 'vexpress_defconfig' (config attached).


  AS  arch/arm/crypto/sha1-armv4-large.o
arch/arm/crypto/sha1-armv4-large.S: Assembler messages:
arch/arm/crypto/sha1-armv4-large.S:197: Error: r13 not allowed here --  
`teq r14,sp'
arch/arm/crypto/sha1-armv4-large.S:377: Error: r13 not allowed here --  
`teq r14,sp'
arch/arm/crypto/sha1-armv4-large.S:469: Error: r13 not allowed here --  
`teq r14,sp'


-Jussi
#
# Automatically generated file; DO NOT EDIT.
# Linux/arm 3.7.0-rc1 Kernel Configuration
#
CONFIG_ARM=y
CONFIG_SYS_SUPPORTS_APM_EMULATION=y
CONFIG_HAVE_PROC_CPU=y
CONFIG_NO_IOPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_VECTORS_BASE=0x
CONFIG_ARM_PATCH_PHYS_VIRT=y
CONFIG_GENERIC_BUG=y
CONFIG_DEFCONFIG_LIST=/lib/modules/$UNAME_RELEASE/.config
CONFIG_HAVE_IRQ_WORK=y
CONFIG_IRQ_WORK=y

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=
CONFIG_LOCALVERSION=
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
CONFIG_DEFAULT_HOSTNAME=(none)
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_POSIX_MQUEUE is not set
# CONFIG_FHANDLE is not set
# CONFIG_AUDIT is not set
CONFIG_HAVE_GENERIC_HARDIRQS=y

#
# IRQ subsystem
#
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_HARDIRQS_SW_RESEND=y
CONFIG_IRQ_DOMAIN=y
# CONFIG_IRQ_DOMAIN_DEBUG is not set
CONFIG_SPARSE_IRQ=y
CONFIG_KTIME_SCALAR=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y

#
# Timers subsystem
#
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_PREEMPT_RCU is not set
CONFIG_RCU_FANOUT=32
CONFIG_RCU_FANOUT_LEAF=16
# CONFIG_RCU_FANOUT_EXACT is not set
# CONFIG_TREE_RCU_TRACE is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CGROUP_DEVICE is not set
CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
# CONFIG_CGROUP_CPUACCT is not set
# CONFIG_RESOURCE_COUNTERS is not set
# CONFIG_CGROUP_PERF is not set
# CONFIG_CGROUP_SCHED is not set
# CONFIG_BLK_CGROUP is not set
# CONFIG_CHECKPOINT_RESTORE is not set
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# CONFIG_IPC_NS is not set
# CONFIG_PID_NS is not set
# CONFIG_NET_NS is not set
# CONFIG_SCHED_AUTOGROUP is not set
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_XZ=y
CONFIG_RD_LZO=y
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
# CONFIG_EXPERT is not set
CONFIG_HAVE_UID16=y
CONFIG_UID16=y
# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
# CONFIG_EMBEDDED is not set
CONFIG_HAVE_PERF_EVENTS=y
CONFIG_PERF_USE_VMALLOC=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
CONFIG_COMPAT_BRK=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
CONFIG_PROFILING=y
CONFIG_OPROFILE=y
CONFIG_HAVE_OPROFILE=y
# CONFIG_KPROBES is not set
CONFIG_JUMP_LABEL=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_ATTRS=y
CONFIG_HAVE_DMA_CONTIGUOUS=y
CONFIG_USE_GENERIC_SMP_HELPERS=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_CLK=y
CONFIG_HAVE_DMA_API_DEBUG=y
CONFIG_HAVE_HW_BREAKPOINT=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y
CONFIG_ARCH_WANT_IPC_PARSE_VERSION=y
CONFIG_GENERIC_KERNEL_THREAD=y
CONFIG_GENERIC_KERNEL_EXECVE=y
CONFIG_HAVE_MOD_ARCH_SPECIFIC=y
CONFIG_MODULES_USE_ELF_REL=y

#
# GCOV-based kernel profiling
#
# CONFIG_GCOV_KERNEL is not set
CONFIG_HAVE_GENERIC_DMA_COHERENT=y
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
# CONFIG_MODULE_FORCE_LOAD is not set
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
# CONFIG_MODULE_SIG is not set
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
CONFIG_LBDAF=y
# 

[PATCH 0/3] [v2] AES-NI/AVX implementation of Camellia cipher

2012-10-26 Thread Jussi Kivilinna
This patchset adds AES-NI/AVX assembler implementation of Camellia cipher
for x86-64.

[v2]:
 - No missing patches
 - No missing files

---

Jussi Kivilinna (3):
  [v2] crypto: tcrypt - add async speed test for camellia cipher
  [v2] crypto: camellia-x86_64 - share common functions and move structures 
and function definitions to header file
  [v2] crypto: camellia - add AES-NI/AVX/x86_64 assembler implementation of 
camellia cipher


 arch/x86/crypto/Makefile|3 
 arch/x86/crypto/camellia-aesni-avx-asm_64.S | 1102 +++
 arch/x86/crypto/camellia_aesni_avx_glue.c   |  558 ++
 arch/x86/crypto/camellia_glue.c |   80 +-
 arch/x86/include/asm/crypto/camellia.h  |   82 ++
 crypto/Kconfig  |   22 +
 crypto/tcrypt.c |   23 +
 crypto/testmgr.c|   62 ++
 8 files changed, 1875 insertions(+), 57 deletions(-)
 create mode 100644 arch/x86/crypto/camellia-aesni-avx-asm_64.S
 create mode 100644 arch/x86/crypto/camellia_aesni_avx_glue.c
 create mode 100644 arch/x86/include/asm/crypto/camellia.h

