[PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-07-24 Thread Mathias Krause
This is an assembler implementation of the SHA1 algorithm using the
Supplemental SSE3 (SSSE3) instructions or, when available, the
Advanced Vector Extensions (AVX).

Testing with the tcrypt module shows the raw hash performance is up to
2.3 times faster than the C implementation, using 8k data blocks on a
Core 2 Duo T5500. For the smallest data set (16 bytes) it is still 25%
faster.

Since this implementation uses SSE/YMM registers, it cannot safely be
used in every situation, e.g. while an IRQ interrupts a kernel thread.
The implementation falls back to the generic SHA-1 variant if using
the SSE/YMM registers is not possible.
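The fallback decision can be sketched as a plain userspace model; irq_fpu_usable() and kernel_fpu_begin()/kernel_fpu_end() are the kernel-side names, and all functions below are illustrative stand-ins, not the actual glue code:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins: in the kernel, irq_fpu_usable() reports whether
 * the SSE/YMM state may be touched, and kernel_fpu_begin()/kernel_fpu_end()
 * bracket the SIMD code. */
static bool fpu_usable;                  /* models irq_fpu_usable() */
static int ssse3_used, generic_used;

static void sha1_transform_ssse3(void)   { ssse3_used++;   }
static void sha1_transform_generic(void) { generic_used++; }

static void sha1_update_arch(void)
{
	if (fpu_usable) {
		/* kernel_fpu_begin(); */
		sha1_transform_ssse3();
		/* kernel_fpu_end(); */
	} else {
		/* e.g. an IRQ interrupted a kernel thread: SSE/YMM is off
		 * limits, so take the generic C path instead */
		sha1_transform_generic();
	}
}
```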

With this algorithm I was able to increase the throughput of a single
IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using
the SSSE3 variant -- a speedup of +34.8%.

Saving and restoring the SSE/YMM state might make the actual throughput
fluctuate when FPU-intensive userland applications are running. For
example, measuring the performance with iperf2 directly on the machine
under test gives fluctuating numbers because iperf2 uses the FPU for
each packet to check whether the reporting interval has expired (in the
above test I got min/max/avg: 402/484/464 MBit/s).

Using this algorithm on an IPsec gateway gives much more reasonable and
stable numbers, albeit not as high as in the directly connected case.
Here is the result from an RFC 2544 test run with an EXFO Packet Blazer
FTB-8510:

 frame size   sha1-generic   sha1-ssse3     delta
    64 byte    37.5 MBit/s    37.5 MBit/s    0.0%
   128 byte    56.3 MBit/s    62.5 MBit/s  +11.0%
   256 byte    87.5 MBit/s   100.0 MBit/s  +14.3%
   512 byte   131.3 MBit/s   150.0 MBit/s  +14.2%
  1024 byte   162.5 MBit/s   193.8 MBit/s  +19.3%
  1280 byte   175.0 MBit/s   212.5 MBit/s  +21.4%
  1420 byte   175.0 MBit/s   218.7 MBit/s  +25.0%
  1518 byte   150.0 MBit/s   181.2 MBit/s  +20.8%

The throughput for the largest frame size is lower than for the
previous one because the IP packets must be fragmented in this case
to make their way through the IPsec tunnel.

Signed-off-by: Mathias Krause 
Cc: Maxim Locktyukhin 
---
 arch/x86/crypto/Makefile  |8 +
 arch/x86/crypto/sha1_ssse3_asm.S  |  558 +
 arch/x86/crypto/sha1_ssse3_glue.c |  240 
 arch/x86/include/asm/cpufeature.h |3 +
 crypto/Kconfig|   10 +
 5 files changed, 819 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/crypto/sha1_ssse3_asm.S
 create mode 100644 arch/x86/crypto/sha1_ssse3_glue.c

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index c04f1b7..57c7f7b 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -13,6 +13,7 @@ obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
 obj-$(CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL) += ghash-clmulni-intel.o
 
 obj-$(CONFIG_CRYPTO_CRC32C_INTEL) += crc32c-intel.o
+obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
 
 aes-i586-y := aes-i586-asm_32.o aes_glue.o
 twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o
@@ -25,3 +26,10 @@ salsa20-x86_64-y := salsa20-x86_64-asm_64.o salsa20_glue.o
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o
 
 ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o
+
+# enable AVX support only when $(AS) can actually assemble the instructions
+ifeq ($(call as-instr,vpxor %xmm0$(comma)%xmm1$(comma)%xmm2,yes,no),yes)
+AFLAGS_sha1_ssse3_asm.o += -DSHA1_ENABLE_AVX_SUPPORT
+CFLAGS_sha1_ssse3_glue.o += -DSHA1_ENABLE_AVX_SUPPORT
+endif
+sha1-ssse3-y := sha1_ssse3_asm.o sha1_ssse3_glue.o
diff --git a/arch/x86/crypto/sha1_ssse3_asm.S b/arch/x86/crypto/sha1_ssse3_asm.S
new file mode 100644
index 000..b2c2f57
--- /dev/null
+++ b/arch/x86/crypto/sha1_ssse3_asm.S
@@ -0,0 +1,558 @@
+/*
+ * This is a SIMD SHA-1 implementation. It requires the Intel(R) Supplemental
+ * SSE3 instruction set extensions introduced in Intel Core Microarchitecture
+ * processors. CPUs supporting Intel(R) AVX extensions will get an additional
+ * boost.
+ *
+ * This work was inspired by the vectorized implementation of Dean Gaudet.
+ * Additional information on it can be found at:
+ *   http://www.arctic.org/~dean/crypto/sha1.html
+ *
+ * It was improved upon with more efficient vectorization of the message
+ * scheduling. This implementation has also been optimized for all current and
+ * several future generations of Intel CPUs.
+ *
+ * See this article for more information about the implementation details:
+ *   http://software.intel.com/en-us/articles/improving-the-performance-of-the-secure-hash-algorithm-1/
+ *
+ * Copyright (C) 2010, Intel Corp.
+ *   Authors: Maxim Locktyukhin 
+ *Ronen Zohar 
+ *
+ * Converted to AT&T syntax and adapted for inclusion in the Linux kernel:
+ *   Author: Mathias Krause 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by

[PATCH v2 1/2] crypto, sha1: export sha1_update for reuse

2011-07-24 Thread Mathias Krause
Export the update function as crypto_sha1_update() to not have the need
to reimplement the same algorithm for each SHA-1 implementation. This
way the generic SHA-1 implementation can be used as fallback for other
implementations that fail to run under certain circumstances, like the
need for an FPU context while executing in IRQ context.
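The reuse this export enables can be modelled in plain userspace C; the names below mirror the kernel ones, but the bodies are illustrative stand-ins, not the actual crypto code:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the exported crypto_sha1_update(): the generic update
 * logic exists exactly once, and everything else delegates to it. */
static int generic_calls;

static int crypto_sha1_update(void *desc, const unsigned char *data,
			      unsigned int len)
{
	(void)desc; (void)data; (void)len;
	generic_calls++;	/* real code buffers data and runs sha1_transform() */
	return 0;
}

/* An accelerated .update hook that cannot use the FPU (e.g. in IRQ
 * context) falls back to the generic routine instead of duplicating
 * the algorithm. */
static int sha1_ssse3_update(void *desc, const unsigned char *data,
			     unsigned int len, int fpu_ok)
{
	if (!fpu_ok)
		return crypto_sha1_update(desc, data, len);
	/* SIMD path elided in this sketch */
	return 0;
}
```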

Signed-off-by: Mathias Krause 
---
 crypto/sha1_generic.c |9 +
 include/crypto/sha.h  |5 +
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/crypto/sha1_generic.c b/crypto/sha1_generic.c
index 0416091..0b6d907 100644
--- a/crypto/sha1_generic.c
+++ b/crypto/sha1_generic.c
@@ -36,7 +36,7 @@ static int sha1_init(struct shash_desc *desc)
return 0;
 }
 
-static int sha1_update(struct shash_desc *desc, const u8 *data,
+int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
 {
struct sha1_state *sctx = shash_desc_ctx(desc);
@@ -70,6 +70,7 @@ static int sha1_update(struct shash_desc *desc, const u8 *data,
 
return 0;
 }
+EXPORT_SYMBOL(crypto_sha1_update);
 
 
 /* Add padding and return the message digest. */
@@ -86,10 +87,10 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
/* Pad out to 56 mod 64 */
index = sctx->count & 0x3f;
padlen = (index < 56) ? (56 - index) : ((64+56) - index);
-   sha1_update(desc, padding, padlen);
+   crypto_sha1_update(desc, padding, padlen);
 
/* Append length */
-   sha1_update(desc, (const u8 *)&bits, sizeof(bits));
+   crypto_sha1_update(desc, (const u8 *)&bits, sizeof(bits));
 
/* Store state in digest */
for (i = 0; i < 5; i++)
@@ -120,7 +121,7 @@ static int sha1_import(struct shash_desc *desc, const void *in)
 static struct shash_alg alg = {
.digestsize =   SHA1_DIGEST_SIZE,
.init   =   sha1_init,
-   .update =   sha1_update,
+   .update =   crypto_sha1_update,
.final  =   sha1_final,
.export =   sha1_export,
.import =   sha1_import,
diff --git a/include/crypto/sha.h b/include/crypto/sha.h
index 069e85b..7c46d0c 100644
--- a/include/crypto/sha.h
+++ b/include/crypto/sha.h
@@ -82,4 +82,9 @@ struct sha512_state {
u8 buf[SHA512_BLOCK_SIZE];
 };
 
+#if defined(CONFIG_CRYPTO_SHA1) || defined (CONFIG_CRYPTO_SHA1_MODULE)
+extern int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len);
+#endif
+
 #endif
-- 
1.5.6.5

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/2] crypto, x86: assembler implementation of SHA1

2011-07-24 Thread Mathias Krause
This patch series adds an assembler implementation for the SHA1 hash algorithm
for the x86-64 architecture. Its raw hash performance can be more than 2 times
faster than the generic C implementation. This gives a real world benefit for
IPsec with a throughput increase of up to +35%. For concrete numbers, have a
look at the second patch.

This implementation is currently x86-64 only but might be ported to 32-bit
with some effort in a follow-up patch. (I had no time to do this yet.)

Note: SSSE3 is not a typo; it stands for "Supplemental SSE3".

v2 changes:
- fixed typo in Makefile making AVX version unusable
- whitespace fixes for the .S file

Regards,
Mathias

Mathias Krause (2):
  crypto, sha1: export sha1_update for reuse
  crypto, x86: SSSE3 based SHA1 implementation for x86-64

 arch/x86/crypto/Makefile  |8 +
 arch/x86/crypto/sha1_ssse3_asm.S  |  558 +
 arch/x86/crypto/sha1_ssse3_glue.c |  240 
 arch/x86/include/asm/cpufeature.h |3 +
 crypto/Kconfig|   10 +
 crypto/sha1_generic.c |9 +-
 include/crypto/sha.h  |5 +
 7 files changed, 829 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/crypto/sha1_ssse3_asm.S
 create mode 100644 arch/x86/crypto/sha1_ssse3_glue.c



Re: [PATCH 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-07-24 Thread Mathias Krause
On Sat, Jul 16, 2011 at 2:44 PM, Mathias Krause  wrote:
> diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> index c04f1b7..a80be92 100644
> --- a/arch/x86/crypto/Makefile
> +++ b/arch/x86/crypto/Makefile
> @@ -13,6 +13,7 @@ obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
>  obj-$(CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL) += ghash-clmulni-intel.o
>
>  obj-$(CONFIG_CRYPTO_CRC32C_INTEL) += crc32c-intel.o
> +obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
>
>  aes-i586-y := aes-i586-asm_32.o aes_glue.o
>  twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o
> @@ -25,3 +26,10 @@ salsa20-x86_64-y := salsa20-x86_64-asm_64.o salsa20_glue.o
>  aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o
>
>  ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o
> +
> +# enable AVX support only when $(AS) can actually assemble the instructions
> +ifeq ($(call as-instr,vpxor %xmm0$(comma)%xmm1$(comma)%xmm2,yes,no),yes)
> +AFLAGS_sha1_ssse3.o += -DSHA1_ENABLE_AVX_SUPPORT

This should have been

AFLAGS_sha1_ssse3_asm.o += -DSHA1_ENABLE_AVX_SUPPORT

instead. Sorry, a missing adjustment for a "last-minute file rename".
I'll post a new version of the series to a wider target audience,
since there has been no reply so far for a week.

> +CFLAGS_sha1_ssse3_glue.o += -DSHA1_ENABLE_AVX_SUPPORT
> +endif
> +sha1-ssse3-y := sha1_ssse3_asm.o sha1_ssse3_glue.o
> diff --git a/arch/x86/crypto/sha1_ssse3_asm.S b/arch/x86/crypto/sha1_ssse3_asm.S
> new file mode 100644
> index 000..8fb0ba6


Thanks,
Mathias