Re: [PATCH 4/3] sha512: reduce stack usage even on i386

2012-01-30 Thread Alexey Dobriyan
On 1/28/12, Herbert Xu herb...@gondor.apana.org.au wrote:
 On Fri, Jan 27, 2012 at 08:51:30PM +0300, Alexey Dobriyan wrote:

 I think this is because your tree contained %16 code instead of &15.
 Now that it contains &15 it should become applicable.

 OK.

 --
 [PATCH] sha512: reduce stack usage even on i386

 Can you try the approach that git takes with using asm to read
 and write W (see previous email from Linus in response to my push
 request)? As it stands your patch is simply relying on gcc's
 ability to optimise.  At least with asm volatile we know that
 gcc will leave it alone.

For some reason it doesn't. :-( I've also tried full barriers.

With this patch, stack usage is still ~900 bytes.

diff --git a/crypto/sha512_generic.c b/crypto/sha512_generic.c
index dd0439d..35e7ae7 100644
--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -66,16 +66,6 @@ static const u64 sha512_K[80] = {
 #define s0(x)       (ror64(x, 1) ^ ror64(x, 8) ^ (x >> 7))
 #define s1(x)       (ror64(x,19) ^ ror64(x,61) ^ (x >> 6))

-static inline void LOAD_OP(int I, u64 *W, const u8 *input)
-{
-   W[I] = __be64_to_cpu( ((__be64*)(input))[I] );
-}
-
-static inline void BLEND_OP(int I, u64 *W)
-{
-   W[I & 15] += s1(W[(I-2) & 15]) + W[(I-7) & 15] + s0(W[(I-15) & 15]);
-}
-
 static void
 sha512_transform(u64 *state, const u8 *input)
 {
@@ -84,26 +74,29 @@ sha512_transform(u64 *state, const u8 *input)
 	int i;
 	u64 W[16];
 
-	/* load the input */
-	for (i = 0; i < 16; i++)
-		LOAD_OP(i, W, input);
-
 	/* load the state into our registers */
 	a=state[0];   b=state[1];   c=state[2];   d=state[3];
 	e=state[4];   f=state[5];   g=state[6];   h=state[7];
 
 #define SHA512_0_15(i, a, b, c, d, e, f, g, h)			\
-	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i];	\
+	{							\
+	u64 tmp = be64_to_cpu(*((__be64 *)input + (i)));	\
+	*(volatile u64 *)&W[i] = tmp;				\
+	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + tmp;	\
 	t2 = e0(a) + Maj(a, b, c);				\
 	d += t1;						\
-	h = t1 + t2
+	h = t1 + t2;						\
+	}
 
 #define SHA512_16_79(i, a, b, c, d, e, f, g, h)			\
-	BLEND_OP(i, W);						\
-	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[(i)&15];	\
+	{							\
+	u64 tmp = W[(i) & 15] + s1(W[(i-2) & 15]) + W[(i-7) & 15] + s0(W[(i-15) & 15]);\
+	*(volatile u64 *)&W[(i) & 15] = tmp;			\
+	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + tmp;	\
 	t2 = e0(a) + Maj(a, b, c);				\
 	d += t1;						\
-	h = t1 + t2
+	h = t1 + t2;						\
+	}
 
 	for (i = 0; i < 16; i += 8) {
 		SHA512_0_15(i, a, b, c, d, e, f, g, h);
--
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/3] sha512: reduce stack usage even on i386

2012-01-27 Thread Alexey Dobriyan
On Thu, Jan 26, 2012 at 01:35:02PM +1100, Herbert Xu wrote:
 On Wed, Jan 18, 2012 at 09:02:10PM +0300, Alexey Dobriyan wrote:
  Fix still excessive stack usage on i386.
  
  There is too much loop unrolling going on, despite W[16] being used,
  gcc screws up this for some reason. So, don't be smart, use simple code
  from SHA-512 definition, this keeps code size _and_ stack usage back
  under control even on i386:
  
  -14b:   81 ec 9c 03 00 00       sub    $0x39c,%esp
  +149:   81 ec 64 01 00 00       sub    $0x164,%esp
  
  $ size ../sha512_generic-i386-00*
     text    data     bss     dec     hex filename
    15521     712       0   16233    3f69 ../sha512_generic-i386-000.o
     4225     712       0    4937    1349 ../sha512_generic-i386-001.o
  
  Signed-off-by: Alexey Dobriyan adobri...@gmail.com
  Cc: sta...@vger.kernel.org
 
 Hmm, your patch doesn't apply against my crypto tree.  Please
 regenerate.

I think this is because your tree contained %16 code instead of &15.
Now that it contains &15 it should become applicable.
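
For readers wondering why the %16 and &15 forms are interchangeable here,
the following is a minimal user-space sketch, not the kernel code. It
assumes <stdint.h> types in place of u64, and the function names
schedule_full() and schedule_ring() are made up for illustration. It
expands the SHA-512 message schedule once with all 80 words kept and once
with a 16-word ring indexed by t & 15, so the two results can be compared.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rotate right; callers only pass n in 1..63, so both shifts are defined. */
static inline uint64_t ror64(uint64_t x, unsigned int n)
{
	return (x >> n) | (x << (64 - n));
}

/* SHA-512 small sigma functions, same shape as s0()/s1() in the patch. */
static inline uint64_t s0(uint64_t x)
{
	return ror64(x, 1) ^ ror64(x, 8) ^ (x >> 7);
}

static inline uint64_t s1(uint64_t x)
{
	return ror64(x, 19) ^ ror64(x, 61) ^ (x >> 6);
}

/* Reference form: keep all 80 schedule words (640 bytes). */
static void schedule_full(const uint64_t in[16], uint64_t out[80])
{
	int t;

	memcpy(out, in, 16 * sizeof(uint64_t));
	for (t = 16; t < 80; t++)
		out[t] = s1(out[t - 2]) + out[t - 7] + s0(out[t - 15]) + out[t - 16];
}

/*
 * Ring form: only 16 words are live at once. Before the update, slot
 * t & 15 still holds word t - 16, which is why "+=" is enough.
 * Each word is copied to out[] purely so the two forms can be compared.
 */
static void schedule_ring(const uint64_t in[16], uint64_t out[80])
{
	uint64_t W[16];
	int t;

	memcpy(W, in, sizeof(W));
	memcpy(out, in, sizeof(W));
	for (t = 16; t < 80; t++) {
		W[t & 15] += s1(W[(t - 2) & 15]) + W[(t - 7) & 15] + s0(W[(t - 15) & 15]);
		out[t] = W[t & 15];
	}
}
```

Feeding both functions the same 16 input words should give identical
80-word outputs. For the non-negative indices used here, t % 16 and t & 15
select the same slot, so the two spellings differ only textually, which is
enough to make a patch fail to apply.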

Anyway.
--
[PATCH] sha512: reduce stack usage even on i386

Fix still-excessive stack usage on i386.

There is too much loop unrolling going on: despite W[16] being used,
gcc still handles this badly for some reason. So don't be smart; use the
simple code from the SHA-512 definition. This brings both code size _and_
stack usage back under control even on i386:

-14b:   81 ec 9c 03 00 00       sub    $0x39c,%esp
+149:   81 ec 64 01 00 00       sub    $0x164,%esp

$ size ../sha512_generic-i386-00*
   text    data     bss     dec     hex filename
  15521     712       0   16233    3f69 ../sha512_generic-i386-000.o
   4225     712       0    4937    1349 ../sha512_generic-i386-001.o

Signed-off-by: Alexey Dobriyan adobri...@gmail.com
Cc: sta...@vger.kernel.org
---

 crypto/sha512_generic.c |   42 ++++++++++++++++++++----------------------
 1 file changed, 20 insertions(+), 22 deletions(-)
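
The change swaps the 8-way unrolled rounds, which rename a..h through
permuted macro arguments, for a plain loop that rotates the variables
explicitly. The sketch below is a user-space illustration under stated
assumptions: <stdint.h> types stand in for u64, the round constants and
message words are passed in as arbitrary arrays k[] and w[], and the names
rounds8_unrolled() and rounds8_rolled() are hypothetical. It is meant to
show that eight rounds in either form leave the same state.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rotate right; n is always in 1..63 here. */
static inline uint64_t ror64(uint64_t x, unsigned int n)
{
	return (x >> n) | (x << (64 - n));
}

static inline uint64_t Ch(uint64_t x, uint64_t y, uint64_t z)
{
	return z ^ (x & (y ^ z));
}

static inline uint64_t Maj(uint64_t x, uint64_t y, uint64_t z)
{
	return (x & y) | (z & (x | y));
}

/* SHA-512 big sigma functions. */
static inline uint64_t e0(uint64_t x)
{
	return ror64(x, 28) ^ ror64(x, 34) ^ ror64(x, 39);
}

static inline uint64_t e1(uint64_t x)
{
	return ror64(x, 14) ^ ror64(x, 18) ^ ror64(x, 41);
}

/* One round in the unrolled style: only d and h are written. */
#define ROUND(i, a, b, c, d, e, f, g, h) do {			\
	uint64_t t1 = h + e1(e) + Ch(e, f, g) + k[i] + w[i];	\
	uint64_t t2 = e0(a) + Maj(a, b, c);			\
	d += t1;						\
	h = t1 + t2;						\
} while (0)

/* Eight rounds, renaming the variables by permuting macro arguments. */
static void rounds8_unrolled(uint64_t s[8], const uint64_t k[8], const uint64_t w[8])
{
	uint64_t a = s[0], b = s[1], c = s[2], d = s[3];
	uint64_t e = s[4], f = s[5], g = s[6], h = s[7];

	ROUND(0, a, b, c, d, e, f, g, h);
	ROUND(1, h, a, b, c, d, e, f, g);
	ROUND(2, g, h, a, b, c, d, e, f);
	ROUND(3, f, g, h, a, b, c, d, e);
	ROUND(4, e, f, g, h, a, b, c, d);
	ROUND(5, d, e, f, g, h, a, b, c);
	ROUND(6, c, d, e, f, g, h, a, b);
	ROUND(7, b, c, d, e, f, g, h, a);

	s[0] = a; s[1] = b; s[2] = c; s[3] = d;
	s[4] = e; s[5] = f; s[6] = g; s[7] = h;
}

/* Eight rounds, rotating the variables explicitly as in the definition. */
static void rounds8_rolled(uint64_t s[8], const uint64_t k[8], const uint64_t w[8])
{
	uint64_t a = s[0], b = s[1], c = s[2], d = s[3];
	uint64_t e = s[4], f = s[5], g = s[6], h = s[7];
	uint64_t t1, t2;
	int i;

	for (i = 0; i < 8; i++) {
		t1 = h + e1(e) + Ch(e, f, g) + k[i] + w[i];
		t2 = e0(a) + Maj(a, b, c);
		h = g; g = f; f = e; e = d + t1;
		d = c; c = b; b = a; a = t1 + t2;
	}

	s[0] = a; s[1] = b; s[2] = c; s[3] = d;
	s[4] = e; s[5] = f; s[6] = g; s[7] = h;
}
```

The rolled form performs eight plain assignments per round where the
unrolled form performs two, which is the price paid for giving gcc a small
loop body. Whether that trade is worth it depends on the compiler and the
target; the numbers in the changelog are for i386.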

--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -100,35 +100,33 @@ sha512_transform(u64 *state, const u8 *input)
 #define SHA512_0_15(i, a, b, c, d, e, f, g, h)			\
 	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i];	\
 	t2 = e0(a) + Maj(a, b, c);				\
-	d += t1;						\
-	h = t1 + t2
+	h = g;							\
+	g = f;							\
+	f = e;							\
+	e = d + t1;						\
+	d = c;							\
+	c = b;							\
+	b = a;							\
+	a = t1 + t2
 
 #define SHA512_16_79(i, a, b, c, d, e, f, g, h)			\
 	BLEND_OP(i, W);						\
-	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[(i)&15];	\
+	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i & 15];	\
 	t2 = e0(a) + Maj(a, b, c);				\
-	d += t1;						\
-	h = t1 + t2
-
-	for (i = 0; i < 16; i += 8) {
+	h = g;							\
+	g = f;							\
+	f = e;							\
+	e = d + t1;						\
+	d = c;							\
+	c = b;							\
+	b = a;							\
+	a = t1 + t2
+
+	for (i = 0; i < 16; i++) {
 		SHA512_0_15(i, a, b, c, d, e, f, g, h);
-		SHA512_0_15(i + 1, h, a, b, c, d, e, f, g);
-		SHA512_0_15(i + 2, g, h, a, b, c, d, e, f);
-		SHA512_0_15(i + 3, f, g, h, a, b, c, d, e);
-		SHA512_0_15(i + 4, e, f, g, h, a, b, c, d);
-		SHA512_0_15(i + 5, d, e, f, g, h, a, b, c);
-		SHA512_0_15(i + 6, c, d, e, f, g, h, a, b);
-		SHA512_0_15(i + 7, b, c, d, e, f, g, h, a);
 	}
-	for (i = 16; i < 80; i += 8) {
+	for (i = 16; i < 80; i++) {
 		SHA512_16_79(i, a, b, c, d, e, f, g, h);
-		SHA512_16_79(i + 1, h, a, b, c, d, e, f, g);
-		SHA512_16_79(i + 2, g, h, a, b, c, d, e, f);
-		SHA512_16_79(i + 3, f, g, h, a, b, c, d, e);
-		SHA512_16_79(i + 4, e, f, g, h, a, b, c, d);
-		SHA512_16_79(i + 5, d, e, f, g, h, a, b, c);
-		SHA512_16_79(i + 6, c, d, e, f, g, h, a, b);
-		SHA512_16_79(i + 7, b, c, d, e, f, g, h, a);
 	}
 

Re: [PATCH 4/3] sha512: reduce stack usage even on i386

2012-01-27 Thread Herbert Xu
On Fri, Jan 27, 2012 at 08:51:30PM +0300, Alexey Dobriyan wrote:

 I think this is because your tree contained %16 code instead of &15.
 Now that it contains &15 it should become applicable.

OK.

 --
 [PATCH] sha512: reduce stack usage even on i386

Can you try the approach that git takes with using asm to read
and write W (see previous email from Linus in response to my push
request)? As it stands your patch is simply relying on gcc's
ability to optimise.  At least with asm volatile we know that
gcc will leave it alone.
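
As a rough illustration of what pinning an access to W can look like, here
is a minimal sketch of two common idioms: a store through a
volatile-qualified lvalue, and a plain store followed by an empty asm
statement that names the slot as a memory operand. The helper names
store_w_volatile() and store_w_asm() are hypothetical, GCC-style extended
asm is assumed, and this is not claimed to be the code git uses. Whether
either idiom actually reduces stack usage depends on the compiler; the
follow-up in this thread reports that it did not help on i386.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: write through a volatile-qualified lvalue, so the
 * compiler has to perform this store to W[] and cannot fold it away.
 */
static inline void store_w_volatile(uint64_t *W, unsigned int i, uint64_t val)
{
	*(volatile uint64_t *)&W[i & 15] = val;
}

/*
 * Hypothetical helper: plain store, then an empty asm statement that
 * claims to read and write the slot in memory ("+m"). The asm emits no
 * instructions; it only constrains the optimiser. Compilers without
 * GCC-style extended asm fall back to the plain store.
 */
static inline void store_w_asm(uint64_t *W, unsigned int i, uint64_t val)
{
	W[i & 15] = val;
#if defined(__GNUC__)
	__asm__ volatile("" : "+m" (W[i & 15]));
#endif
}
```

Both helpers behave like an ordinary assignment as far as the stored value
goes; the difference is only in how much freedom the optimiser keeps to
cache W in registers or spill slots.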

Thanks,
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 4/3] sha512: reduce stack usage even on i386

2012-01-25 Thread Herbert Xu
On Wed, Jan 18, 2012 at 09:02:10PM +0300, Alexey Dobriyan wrote:
 Fix still excessive stack usage on i386.
 
 There is too much loop unrolling going on, despite W[16] being used,
 gcc screws up this for some reason. So, don't be smart, use simple code
 from SHA-512 definition, this keeps code size _and_ stack usage back
 under control even on i386:
 
   -14b:   81 ec 9c 03 00 00       sub    $0x39c,%esp
   +149:   81 ec 64 01 00 00       sub    $0x164,%esp
 
   $ size ../sha512_generic-i386-00*
      text    data     bss     dec     hex filename
     15521     712       0   16233    3f69 ../sha512_generic-i386-000.o
      4225     712       0    4937    1349 ../sha512_generic-i386-001.o
 
 Signed-off-by: Alexey Dobriyan adobri...@gmail.com
 Cc: sta...@vger.kernel.org

Hmm, your patch doesn't apply against my crypto tree.  Please
regenerate.

Thanks,
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt