Re: [PATCH 2/3] sha512: reduce stack usage to safe number
On 1/16/12, Eric Dumazet eric.duma...@gmail.com wrote: Le lundi 16 janvier 2012 à 09:56 +, David Laight a écrit : Doesn't this badly overflow W[] .. +#define SHA512_0_15(i, a, b, c, d, e, f, g, h) \ + t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i]; \ ... + for (i = 0; i 16; i += 8) { ... + SHA512_0_15(i + 7, b, c, d, e, f, g, h, a); + } David No overflow since loop is done for only i=0 and i=8 By the way, I suspect previous code was chosen years ago because this version uses less stack but adds much more code bloat. I think W[80] was use because it's the most straightforward way to write this code by following spec. All SHA definitions have full message schedule pseudocoded before hash computation. size crypto/sha512_generic.o crypto/sha512_generic_old.o text data bss dec hex filename 17369 704 0 180734699 crypto/sha512_generic.o 8249 704 0895322f9 crypto/sha512_generic_old.o This is because SHA-512 is fundamentally 64-bit algorithm multiplied by excessive unrolling. Surprisingly, doing variable renaming by hand like in spec: t1 = ... t2 = ... h = g; g = f; f = e; e = d + T1; d = c; c = b; b = a; a = t1 + t2; bring stack space on i386 under control too. -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 2/3] sha512: reduce stack usage to safe number
Doesn't this badly overflow W[] .. +#define SHA512_0_15(i, a, b, c, d, e, f, g, h) \ + t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i]; \ ... + for (i = 0; i 16; i += 8) { ... + SHA512_0_15(i + 7, b, c, d, e, f, g, h, a); + } David -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] sha512: reduce stack usage to safe number
On 1/16/12, David Laight david.lai...@aculab.com wrote: Doesn't this badly overflow W[] .. +#define SHA512_0_15(i, a, b, c, d, e, f, g, h) \ +t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i]; \ ... +for (i = 0; i 16; i += 8) { ... +SHA512_0_15(i + 7, b, c, d, e, f, g, h, a); +} No, why should it? i can be only 0 and 8. -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 2/3] sha512: reduce stack usage to safe number
Le lundi 16 janvier 2012 à 09:56 +, David Laight a écrit : Doesn't this badly overflow W[] .. +#define SHA512_0_15(i, a, b, c, d, e, f, g, h) \ + t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i]; \ ... + for (i = 0; i 16; i += 8) { ... + SHA512_0_15(i + 7, b, c, d, e, f, g, h, a); + } David No overflow since loop is done for only i=0 and i=8 By the way, I suspect previous code was chosen years ago because this version uses less stack but adds much more code bloat. size crypto/sha512_generic.o crypto/sha512_generic_old.o textdata bss dec hex filename 17369 704 0 180734699 crypto/sha512_generic.o 8249 704 0895322f9 crypto/sha512_generic_old.o -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] sha512: reduce stack usage to safe number
On Sat, Jan 14, 2012 at 10:40 AM, Alexey Dobriyan adobri...@gmail.com wrote: Line by line explanation: * BLEND_OP array is circular now, all indexes have to be modulo 16. Round number is positive, so remainder operation should be without surprises. Don't use % except on unsigned values. Even if it's positive, if it's a signed number and the compiler doesn't *see* that it is absolutely positive, division is nontrivial. Even when you divide by a constant. For example, % 16 on an 'int' on x86-64 will generate movl%edi, %edx sarl$31, %edx shrl$28, %edx leal(%rdi,%rdx), %eax andl$15, %eax subl%edx, %eax in order to get the signed case right. The fact that the end result is correct for unsigned numbers is irrelevant: it's still stupid and slow. With an unsigned int, '% 16' will generate the obvious andl$15, %eax instead. Quite frankly, stop using division in the first place. Dividing by powers-of-two and expecting the compiler to fix things up is just stupid, *exactly* because of issues like these: you either have to think about it carefully, or the compiler may end up creating crap code. So just use 15 instead. That doesn't have these kinds of issues. It is a *good* thing when the C code is close to the end result you want to generate. It is *not* a good thing to write code that looks nothing like the end result and just expect the compiler to do the right thing. Even if the compiler does do the right thing, what was the advantage? Linus -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] sha512: reduce stack usage to safe number
On Sat, Jan 14, 2012 at 11:08:45AM -0800, Linus Torvalds wrote: On Sat, Jan 14, 2012 at 10:40 AM, Alexey Dobriyan adobri...@gmail.com wrote: Line by line explanation: * BLEND_OP array is circular now, all indexes have to be modulo 16. Round number is positive, so remainder operation should be without surprises. Don't use % except on unsigned values. Even if it's positive, if it's a signed number and the compiler doesn't *see* that it is absolutely positive, division is nontrivial. Even when you divide by a constant. For example, % 16 on an 'int' on x86-64 will generate movl%edi, %edx sarl$31, %edx shrl$28, %edx leal(%rdi,%rdx), %eax andl$15, %eax subl%edx, %eax in order to get the signed case right. The fact that the end result is correct for unsigned numbers is irrelevant: it's still stupid and slow. With an unsigned int, '% 16' will generate the obvious andl$15, %eax instead. Quite frankly, stop using division in the first place. Dividing by powers-of-two and expecting the compiler to fix things up is just stupid, *exactly* because of issues like these: you either have to think about it carefully, or the compiler may end up creating crap code. For the record, it generates andl $15 here. So just use 15 instead. That doesn't have these kinds of issues. It is a *good* thing when the C code is close to the end result you want to generate. It is *not* a good thing to write code that looks nothing like the end result and just expect the compiler to do the right thing. Even if the compiler does do the right thing, what was the advantage? Here is updated patch which explicitly uses (equally tested): --- For rounds 16--79, W[i] only depends on W[i - 2], W[i - 7], W[i - 15] and W[i - 16]. Consequently, keeping all W[80] array on stack is unnecessary, only 16 values are really needed. Using W[16] instead of W[80] greatly reduces stack usage (~750 bytes to ~340 bytes on x86_64). Line by line explanation: * BLEND_OP array is circular now, all indexes have to be modulo 16. * initial full message scheduling is trimmed to first 16 values which come from data block, the rest is calculated right before it's needed. * original loop body is unrolled version of new SHA512_0_15 and SHA512_16_79 macros, unrolling was done to not do explicit variable renaming. Otherwise it's the very same code after preprocessing. See sha1_transform() code which does the same trick. Patch survives in-tree crypto test and original bugreport test (ping flood with hmac(sha512). See FIPS 180-2 for SHA-512 definition http://csrc.nist.gov/publications/fips/fips180-2/fips180-2withchangenotice.pdf Signed-off-by: Alexey Dobriyan adobri...@gmail.com Cc: sta...@vger.kernel.org --- This is patch is for stable if 750 byte stack usage is not considered safe. crypto/sha512_generic.c | 58 1 file changed, 34 insertions(+), 24 deletions(-) --- a/crypto/sha512_generic.c +++ b/crypto/sha512_generic.c @@ -78,7 +78,7 @@ static inline void LOAD_OP(int I, u64 *W, const u8 *input) static inline void BLEND_OP(int I, u64 *W) { - W[I] = s1(W[I-2]) + W[I-7] + s0(W[I-15]) + W[I-16]; + W[I 15] += s1(W[(I-2) 15]) + W[(I-7) 15] + s0(W[(I-15) 15]); } static void @@ -87,38 +87,48 @@ sha512_transform(u64 *state, const u8 *input) u64 a, b, c, d, e, f, g, h, t1, t2; int i; - u64 W[80]; + u64 W[16]; /* load the input */ for (i = 0; i 16; i++) LOAD_OP(i, W, input); -for (i = 16; i 80; i++) { -BLEND_OP(i, W); -} - /* load the state into our registers */ a=state[0]; b=state[1]; c=state[2]; d=state[3]; e=state[4]; f=state[5]; g=state[6]; h=state[7]; - /* now iterate */ - for (i=0; i80; i+=8) { - t1 = h + e1(e) + Ch(e,f,g) + sha512_K[i ] + W[i ]; - t2 = e0(a) + Maj(a,b,c);d+=t1;h=t1+t2; - t1 = g + e1(d) + Ch(d,e,f) + sha512_K[i+1] + W[i+1]; - t2 = e0(h) + Maj(h,a,b);c+=t1;g=t1+t2; - t1 = f + e1(c) + Ch(c,d,e) + sha512_K[i+2] + W[i+2]; - t2 = e0(g) + Maj(g,h,a);b+=t1;f=t1+t2; - t1 = e + e1(b) + Ch(b,c,d) + sha512_K[i+3] + W[i+3]; - t2 = e0(f) + Maj(f,g,h);a+=t1;e=t1+t2; - t1 = d + e1(a) + Ch(a,b,c) + sha512_K[i+4] + W[i+4]; - t2 = e0(e) + Maj(e,f,g);h+=t1;d=t1+t2; - t1 = c + e1(h) + Ch(h,a,b) + sha512_K[i+5] + W[i+5]; - t2 = e0(d) + Maj(d,e,f);g+=t1;c=t1+t2; - t1 = b + e1(g) + Ch(g,h,a) + sha512_K[i+6] + W[i+6]; - t2 = e0(c) + Maj(c,d,e);f+=t1;b=t1+t2; - t1 = a + e1(f) + Ch(f,g,h) +
Re: [PATCH 2/3] sha512: reduce stack usage to safe number
On Sat, Jan 14, 2012 at 12:41 PM, Alexey Dobriyan adobri...@gmail.com wrote: For the record, it generates andl $15 here. Ok. That means that gcc was able to prove that it never had any signed values (which is certainly reasonable when you do things like for (i=0; iX;i++)). But it's better to simply not rely on gcc always getting details like this right. It's also better to use a model that simply doesn't even require you as a programmer to have to even *think* about signed values. It's easy to get % wrong by mistake (signed integer modulus didn't even use to have well-defined semantics in traditional C), and there is almost never any excuse for using it for powers-of-two. Here is updated patch which explicitly uses (equally tested): Thanks, Linus -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html