Re: [PATCH] lib/sha1: use the git implementation of SHA-1

2011-08-07 Thread Nicolas Pitre
On Sun, 7 Aug 2011, Linus Torvalds wrote:

> On Sun, Aug 7, 2011 at 1:48 PM, Joachim  Eastwood  wrote:
> >
> > yes, this works. At least my board boots as normal.
> 
> Ok, I'll remove it for -rc1, just to have a working ARM setup. Maybe
> we can re-introduce it later (either together with some arm-specific
> hack for SHA_WORKSPACE_WORDS or by having an arm-optimized version of
> the *good* sha1 routine).
> 
> But I doubt it: there used to be an ARM-optimized thing in git too. It
> was removed two years ago with the commit message:
> 
> remove ARM and Mozilla SHA1 implementations
> 
> They are both slower than the new BLK_SHA1 implementation, so it is
> pointless to keep them around.
> 
> and quite frankly, that removed code seems to be the same as the
> in-kernel one. So I bet the ARM "optimized" SHA1 is simply not worth
> keeping around.

Indeed.  The ARM code in the kernel is mine:

|commit c09f98271f685af349d3f0199360f1c0e85550e0
|Author: Nicolas Pitre 
|Date:   Fri Oct 28 15:26:40 2005 +0100
|
|[ARM] 2930/1: optimized sha1 implementation for ARM
|
|Patch from Nicolas Pitre
|
|Here's an ARM assembly SHA1 implementation to replace the default C
|version. It is approximately 50% faster than the generic C version. On
|an XScale processor running at 400MHz:
|generic C version:  9.8 MB/s
|my version: 14.5 MB/s
[...]

I eventually added it to Git as well.  After a while I worked on the C 
version in Git:

|$ git log --reverse --oneline --author="Nicolas Pitre" git/block-sha1/
|30ba0de block-sha1: move code around
|dc52fd2 block-sha1: split the different "hacks" to be individually selected
|660231a block-sha1: support for architectures with memory alignment 
restrictions
|ee7dc31 block-sha1: more good unaligned memory access candidates
|d5f6a96 block-sha1: make the size member first in the context struct
|51ea551 make sure byte swapping is optimal for git
|e9c5dcd block-sha1: guard gcc extensions with __GNUC__
|30ae47b remove ARM and Mozilla SHA1 implementations

I therefore think that the ARM version in the kernel should also be 
removed if the Git C version goes in.


Nicolas
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] lib/sha1: use the git implementation of SHA-1

2011-08-07 Thread Linus Torvalds
On Sun, Aug 7, 2011 at 1:48 PM, Joachim  Eastwood  wrote:
>
> yes, this works. At least my board boots as normal.

Ok, I'll remove it for -rc1, just to have a working ARM setup. Maybe
we can re-introduce it later (either together with some arm-specific
hack for SHA_WORKSPACE_WORDS or by having an arm-optimized version of
the *good* sha1 routine).

But I doubt it: there used to be an ARM-optimized thing in git too. It
was removed two years ago with the commit message:

remove ARM and Mozilla SHA1 implementations

They are both slower than the new BLK_SHA1 implementation, so it is
pointless to keep them around.

and quite frankly, that removed code seems to be the same as the
in-kernel one. So I bet the ARM "optimized" SHA1 is simply not worth
keeping around.

  Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] lib/sha1: use the git implementation of SHA-1

2011-08-07 Thread Joachim Eastwood
On Sun, Aug 7, 2011 at 10:17 PM, Linus Torvalds
 wrote:
> On Sun, Aug 7, 2011 at 12:47 PM, Andreas Schwab  wrote:
>>
>> ARM has its own implementation of sha_transform in arch/arm/lib/sha1.S,
>> which assumes SHA_WORKSPACE_WORDS is 80.
>
> Well, that certainly explains it.
>
> I wonder if that thing is worth it. It seems to be based on the bad
> slow version of sha1, so I suspect that the biggest advantage of it
> may the byte-swapping being done more efficiently. The ARM version of
> "get_unaligned_be32()" is potentially pretty bad.
>
> Joachim, does it all work for you if you just remove 'sha1.o' from
> lib-y in arch/arm/lib/Makefile?

yes, this works. At least my board boots as normal.

regards
Joachim Eastwood

> Nico (now with corrected email address): is that ARM-optimized asm
> really worth it? Compared to the git C implementation?
>
>                        Linus
>
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] lib/sha1: use the git implementation of SHA-1

2011-08-07 Thread Linus Torvalds
On Sun, Aug 7, 2011 at 12:47 PM, Andreas Schwab  wrote:
>
> ARM has its own implementation of sha_transform in arch/arm/lib/sha1.S,
> which assumes SHA_WORKSPACE_WORDS is 80.

Well, that certainly explains it.

I wonder if that thing is worth it. It seems to be based on the bad
slow version of sha1, so I suspect that the biggest advantage of it
may the byte-swapping being done more efficiently. The ARM version of
"get_unaligned_be32()" is potentially pretty bad.

Joachim, does it all work for you if you just remove 'sha1.o' from
lib-y in arch/arm/lib/Makefile?

Nico (now with corrected email address): is that ARM-optimized asm
really worth it? Compared to the git C implementation?

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] lib/sha1: use the git implementation of SHA-1

2011-08-07 Thread Andreas Schwab
Joachim Eastwood  writes:

> On Sun, Aug 7, 2011 at 8:38 PM, Linus Torvalds
>  wrote:
>> There aren't many users of that define, could you just turn it back to the 
>> proper 16, and then try changing it to 80 in each place that uses it?
>>
>> That way we'd see exactly *which* use is the buggy one..
>
> Its drivers/char/random.c.

ARM has its own implementation of sha_transform in arch/arm/lib/sha1.S,
which assumes SHA_WORKSPACE_WORDS is 80.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] lib/sha1: use the git implementation of SHA-1

2011-08-07 Thread Joachim Eastwood
On Sun, Aug 7, 2011 at 8:38 PM, Linus Torvalds
 wrote:
> There aren't many users of that define, could you just turn it back to the 
> proper 16, and then try changing it to 80 in each place that uses it?
>
> That way we'd see exactly *which* use is the buggy one..

Its drivers/char/random.c.

Boots fine when forcing workspace array in extract_buf to 80.

diff --git a/drivers/char/random.c b/drivers/char/random.c
index c35a785..0584bb0 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -816,7 +816,7 @@ static size_t account(struct entropy_store *r,
size_t nbytes, int min,
 static void extract_buf(struct entropy_store *r, __u8 *out)
 {
int i;
-   __u32 hash[5], workspace[SHA_WORKSPACE_WORDS];
+   __u32 hash[5], workspace[80 /*SHA_WORKSPACE_WORDS*/];
__u8 extract[64];

/* Generate a hash across the pool, 16 words (512 bits) at a time */

regards
Joachim Eastwood


>        Linus
>
> Joachim  Eastwood  wrote:
>
>>On Sun, Aug 7, 2011 at 7:44 PM, Linus Torvalds
>> wrote:
>>> On Sun, Aug 7, 2011 at 10:36 AM, Joachim  Eastwood
>> wrote:

 These printk's come from drivers/char/random.c
 So it doesn't seem like it hangs in any of the sha_* funtions.
>>>
>>> The only other change is to SHA_WORKSPACE_WORDS - I wonder if some
>>> code depends on the old (much bigger) workspace for some reason?
>>>
>>> The git SHA1 routines are way smarter than the old SHA1, and will
>>> re-use the workspace area, so they need only a fraction of the old
>>> area.
>>>
>>> Try changing SHA_WORKSPACE_WORDS back to 80 (in
>>> include/linux/cryptohash.h). The git sha1 only needs 16 words, but ..
>>
>>yup, setting it to 80 makes my kernel boot again :-)
>>
>>> If that fixes it for you, then it's almost certainly some buggy user
>>> that uses the SHA1 workspace array for its own odd case, and
>>> incorrectly "knows" that it's that old wasteful 320 bytes. There's a
>>> few places in networking that uses SHA_WORKSPACE_WORDS.
>>
>>Guess more architectures than ARM are affected by this then.
>>
>>regards
>>Joachim Eastwood
>>
>>>                                     Linus
>>>
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] lib/sha1: use the git implementation of SHA-1

2011-08-07 Thread Linus Torvalds
There aren't many users of that define, could you just turn it back to the 
proper 16, and then try changing it to 80 in each place that uses it?

That way we'd see exactly *which* use is the buggy one..

Linus

Joachim  Eastwood  wrote:

>On Sun, Aug 7, 2011 at 7:44 PM, Linus Torvalds
> wrote:
>> On Sun, Aug 7, 2011 at 10:36 AM, Joachim  Eastwood
> wrote:
>>>
>>> These printk's come from drivers/char/random.c
>>> So it doesn't seem like it hangs in any of the sha_* funtions.
>>
>> The only other change is to SHA_WORKSPACE_WORDS - I wonder if some
>> code depends on the old (much bigger) workspace for some reason?
>>
>> The git SHA1 routines are way smarter than the old SHA1, and will
>> re-use the workspace area, so they need only a fraction of the old
>> area.
>>
>> Try changing SHA_WORKSPACE_WORDS back to 80 (in
>> include/linux/cryptohash.h). The git sha1 only needs 16 words, but ..
>
>yup, setting it to 80 makes my kernel boot again :-)
>
>> If that fixes it for you, then it's almost certainly some buggy user
>> that uses the SHA1 workspace array for its own odd case, and
>> incorrectly "knows" that it's that old wasteful 320 bytes. There's a
>> few places in networking that uses SHA_WORKSPACE_WORDS.
>
>Guess more architectures than ARM are affected by this then.
>
>regards
>Joachim Eastwood
>
>>                                     Linus
>>

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] lib/sha1: use the git implementation of SHA-1

2011-08-07 Thread Joachim Eastwood
On Sun, Aug 7, 2011 at 7:44 PM, Linus Torvalds
 wrote:
> On Sun, Aug 7, 2011 at 10:36 AM, Joachim  Eastwood  wrote:
>>
>> These printk's come from drivers/char/random.c
>> So it doesn't seem like it hangs in any of the sha_* funtions.
>
> The only other change is to SHA_WORKSPACE_WORDS - I wonder if some
> code depends on the old (much bigger) workspace for some reason?
>
> The git SHA1 routines are way smarter than the old SHA1, and will
> re-use the workspace area, so they need only a fraction of the old
> area.
>
> Try changing SHA_WORKSPACE_WORDS back to 80 (in
> include/linux/cryptohash.h). The git sha1 only needs 16 words, but ..

yup, setting it to 80 makes my kernel boot again :-)

> If that fixes it for you, then it's almost certainly some buggy user
> that uses the SHA1 workspace array for its own odd case, and
> incorrectly "knows" that it's that old wasteful 320 bytes. There's a
> few places in networking that uses SHA_WORKSPACE_WORDS.

Guess more architectures than ARM are affected by this then.

regards
Joachim Eastwood

>                                     Linus
>
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] lib/sha1: use the git implementation of SHA-1

2011-08-07 Thread Linus Torvalds
On Sun, Aug 7, 2011 at 10:36 AM, Joachim  Eastwood  wrote:
>
> These printk's come from drivers/char/random.c
> So it doesn't seem like it hangs in any of the sha_* funtions.

The only other change is to SHA_WORKSPACE_WORDS - I wonder if some
code depends on the old (much bigger) workspace for some reason?

The git SHA1 routines are way smarter than the old SHA1, and will
re-use the workspace area, so they need only a fraction of the old
area.

Try changing SHA_WORKSPACE_WORDS back to 80 (in
include/linux/cryptohash.h). The git sha1 only needs 16 words, but ..

If that fixes it for you, then it's almost certainly some buggy user
that uses the SHA1 workspace array for its own odd case, and
incorrectly "knows" that it's that old wasteful 320 bytes. There's a
few places in networking that uses SHA_WORKSPACE_WORDS.

 Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] lib/sha1: use the git implementation of SHA-1

2011-08-07 Thread Joachim Eastwood
On Sun, Aug 7, 2011 at 6:52 PM, Linus Torvalds
 wrote:
> On Sun, Aug 7, 2011 at 4:54 AM, Joachim  Eastwood  wrote:
>>
>> I see some ARM asm in your patch, maybe this is the cause?
>
> No, it's just a barrier to make sure the compiler doesn't do crazy
> things, no actual asm instructions involved.
>
> That code is quite well tested in git, so I'm surprised it has any
> problems on arm. It also has zero loops, a hang sounds odd. Can you
> get some more debug information out of it (for example, where it hangs
> - maybe "initcall_debug=1" on the kernel command line?

initcall_debug=1 didn't do anything to the boot log.

But I add some printk's around the calls to sha_init and sha_transform.
...
NET: Registered protocol family 2
extract_buf: call sha_init
extract_buf: call sha_init done
extract_buf: call sha_transform
extract_buf: call sha_transform done
extract_buf: call sha_transform
extract_buf: call sha_transform done
extract_buf: function exit

These printk's come from drivers/char/random.c
So it doesn't seem like it hangs in any of the sha_* funtions.

But I never see any of the printk's I added to net/ipv4/syncookies.c
or net/ipv4/tcp_output.c.

btw, my compiler is: arm-angstrom-linux-gnueabi-gcc (GCC) 4.3.3

regards
Joachim Eastwood

> The biggest difference with the git sources is the slightly different
> calling conventions (passing the workspace array as an argument is bad
> for code generation, btw - since now gcc doesn't see that the
> workspace accesses are dead) and the fact that the kernel version uses
> kernel macros like "get_unaligned_be32()" rather than it's own
> implementation.
>
>                                      Linus
>
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] lib/sha1: use the git implementation of SHA-1

2011-08-07 Thread Linus Torvalds
On Sun, Aug 7, 2011 at 4:54 AM, Joachim  Eastwood  wrote:
>
> I see some ARM asm in your patch, maybe this is the cause?

No, it's just a barrier to make sure the compiler doesn't do crazy
things, no actual asm instructions involved.

That code is quite well tested in git, so I'm surprised it has any
problems on arm. It also has zero loops, a hang sounds odd. Can you
get some more debug information out of it (for example, where it hangs
- maybe "initcall_debug=1" on the kernel command line?

The biggest difference with the git sources is the slightly different
calling conventions (passing the workspace array as an argument is bad
for code generation, btw - since now gcc doesn't see that the
workspace accesses are dead) and the fact that the kernel version uses
kernel macros like "get_unaligned_be32()" rather than it's own
implementation.

  Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] lib/sha1: use the git implementation of SHA-1

2011-08-07 Thread Joachim Eastwood
On Sat, Aug 6, 2011 at 3:46 AM, Mandeep Singh Baines  wrote:
> For ChromiumOS, we use SHA-1 to verify the integrity of the root
> filesystem. The speed of the kernel sha-1 implementation has a major
> impact on our boot performance.
>
> To improve boot performance, we investigated using the heavily
> optimized sha-1 implementation used in git. With the git
> sha-1 implementation, we see a 11.7% improvement in boot time.
>
> 10 reboots, remove slowest/fastest.
>
> Before:
>
> Mean: 6.58 seconds Stdev: 0.14
>
> After (with git sha-1, this patch):
>
> Mean: 5.89 seconds Stdev: 0.07
>
> The other cool thing about the git SHA-1 implementation is that
> it only needs 64 bytes of stack for the workspace while the original
> kernel implementation needed 320 bytes.
>
> Signed-off-by: Mandeep Singh Baines 
> CC: Ramsay Jones 
> CC: Nicolas Pitre 
> CC: Linus Torvalds 
> CC: Herbert Xu 
> CC: David S. Miller 
> CC: linux-crypto@vger.kernel.org
> ---
>  include/linux/cryptohash.h |    2 +-
>  lib/sha1.c                 |  212 ---
>  2 files changed, 159 insertions(+), 55 deletions(-)
>
> diff --git a/include/linux/cryptohash.h b/include/linux/cryptohash.h
> index ec78a4b..f945218 100644
> --- a/include/linux/cryptohash.h
> +++ b/include/linux/cryptohash.h
> @@ -3,7 +3,7 @@
>
>  #define SHA_DIGEST_WORDS 5
>  #define SHA_MESSAGE_BYTES (512 /*bits*/ / 8)
> -#define SHA_WORKSPACE_WORDS 80
> +#define SHA_WORKSPACE_WORDS 16
>
>  void sha_init(__u32 *buf);
>  void sha_transform(__u32 *digest, const char *data, __u32 *W);
> diff --git a/lib/sha1.c b/lib/sha1.c
> index 4c45fd5..f33271d 100644
> --- a/lib/sha1.c
> +++ b/lib/sha1.c
> @@ -1,31 +1,72 @@
>  /*
> - * SHA transform algorithm, originally taken from code written by
> - * Peter Gutmann, and placed in the public domain.
> + * SHA1 routine optimized to do word accesses rather than byte accesses,
> + * and to avoid unnecessary copies into the context array.
> + *
> + * This was based on the git SHA1 implementation.
>  */
>
>  #include 
>  #include 
> -#include 
> +#include 
> +#include 
>
> -/* The SHA f()-functions.  */
> +/*
> + * If you have 32 registers or more, the compiler can (and should)
> + * try to change the array[] accesses into registers. However, on
> + * machines with less than ~25 registers, that won't really work,
> + * and at least gcc will make an unholy mess of it.
> + *
> + * So to avoid that mess which just slows things down, we force
> + * the stores to memory to actually happen (we might be better off
> + * with a 'W(t)=(val);asm("":"+m" (W(t))' there instead, as
> + * suggested by Artur Skawina - that will also make gcc unable to
> + * try to do the silly "optimize away loads" part because it won't
> + * see what the value will be).
> + *
> + * Ben Herrenschmidt reports that on PPC, the C version comes close
> + * to the optimized asm with this (ie on PPC you don't want that
> + * 'volatile', since there are lots of registers).
> + *
> + * On ARM we get the best code generation by forcing a full memory barrier
> + * between each SHA_ROUND, otherwise gcc happily get wild with spilling and
> + * the stack frame size simply explode and performance goes down the drain.
> + */
>
> -#define f1(x,y,z)   (z ^ (x & (y ^ z)))                /* x ? y : z */
> -#define f2(x,y,z)   (x ^ y ^ z)                        /* XOR */
> -#define f3(x,y,z)   ((x & y) + (z & (x ^ y)))  /* majority */
> +#ifdef CONFIG_X86
> +  #define setW(x, val) (*(volatile __u32 *)&W(x) = (val))
> +#elif defined(CONFIG_ARM)
> +  #define setW(x, val) do { W(x) = (val); __asm__("":::"memory"); } while (0)
> +#else
> +  #define setW(x, val) (W(x) = (val))
> +#endif
>
> -/* The SHA Mysterious Constants */
> +/* This "rolls" over the 512-bit array */
> +#define W(x) (array[(x)&15])
>
> -#define K1  0x5A827999L                        /* Rounds  0-19: sqrt(2) * 
> 2^30 */
> -#define K2  0x6ED9EBA1L                        /* Rounds 20-39: sqrt(3) * 
> 2^30 */
> -#define K3  0x8F1BBCDCL                        /* Rounds 40-59: sqrt(5) * 
> 2^30 */
> -#define K4  0xCA62C1D6L                        /* Rounds 60-79: sqrt(10) * 
> 2^30 */
> +/*
> + * Where do we get the source from? The first 16 iterations get it from
> + * the input data, the next mix it from the 512-bit array.
> + */
> +#define SHA_SRC(t) get_unaligned_be32((__u32 *)data + t)
> +#define SHA_MIX(t) rol32(W(t+13) ^ W(t+8) ^ W(t+2) ^ W(t), 1)
> +
> +#define SHA_ROUND(t, input, fn, constant, A, B, C, D, E) do { \
> +       __u32 TEMP = input(t); setW(t, TEMP); \
> +       E += TEMP + rol32(A,5) + (fn) + (constant); \
> +       B = ror32(B, 2); } while (0)
> +
> +#define T_0_15(t, A, B, C, D, E)  SHA_ROUND(t, SHA_SRC, (((C^D)&B)^D) , 
> 0x5a827999, A, B, C, D, E )
> +#define T_16_19(t, A, B, C, D, E) SHA_ROUND(t, SHA_MIX, (((C^D)&B)^D) , 
> 0x5a827999, A, B, C, D, E )
> +#define T_20_39(t, A, B, C, D, E) SHA_ROUND(t, SHA_MIX, (B^C^D) , 
> 0x6ed9eba1, A, B, C, D, E )
> +#define T_40_59(t, A, B, C,