Re: sha-512...
On Wed, Feb 15, 2012 at 12:23:52AM -0500, David Miller wrote: From: Herbert Xu herb...@gondor.hengli.com.au Date: Wed, 15 Feb 2012 16:16:08 +1100 OK, so we grew by 1136 - 888 = 248. Keep in mind that 128 of that is expected since we moved W onto the stack. Right. I guess we could go back to the percpu solution, what do you think? I'm not entirely sure, we might have to. sha512 is notorious for generating terrible code with gcc on 32-bit targets, so... The sha512 test in the glibc testsuite tends to timeout on 32-bit sparc. :-) Cherrypicking ror64() commit largely fixes the issue (on sparc-defconfig): sha512_transform: 0: 9d e3 bc 78 save %sp, -904, %sp git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git b85a088f15f2070b7180735a231012843a5ac96c crypto: sha512 - use standard ror64() -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: sha-512...
From: Alexey Dobriyan adobri...@gmail.com Date: Wed, 15 Feb 2012 22:27:52 +0300 On Wed, Feb 15, 2012 at 12:23:52AM -0500, David Miller wrote: From: Herbert Xu herb...@gondor.hengli.com.au Date: Wed, 15 Feb 2012 16:16:08 +1100 OK, so we grew by 1136 - 888 = 248. Keep in mind that 128 of that is expected since we moved W onto the stack. Right. I guess we could go back to the percpu solution, what do you think? I'm not entirely sure, we might have to. sha512 is notorious for generating terrible code with gcc on 32-bit targets, so... The sha512 test in the glibc testsuite tends to timeout on 32-bit sparc. :-) Cherrypicking ror64() commit largely fixes the issue (on sparc-defconfig): sha512_transform: 0: 9d e3 bc 78 save %sp, -904, %sp git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git b85a088f15f2070b7180735a231012843a5ac96c crypto: sha512 - use standard ror64() I'm happy with a solution that involves pushing this change to Linus's tree, it's pretty clear why it helps so much although I'm disappointed that gcc can't se that the u64 shift argument passed in is always a constant and therefore way within the range of a 32-bit value, ho hum :-) In fact, in my tree, this change brings the stack allocation instruction down to: save%sp, -824, %sp ! which is actually BETTER than what the old per-cpu code got: save%sp, -984, %sp ! Therefore I highly recommend we apply that ror() change to Linus's tree now. :-) -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: sha-512...
On Wed, Feb 15, 2012 at 04:00:10PM -0500, David Miller wrote: In fact, in my tree, this change brings the stack allocation instruction down to: save%sp, -824, %sp ! which is actually BETTER than what the old per-cpu code got: save%sp, -984, %sp ! Therefore I highly recommend we apply that ror() change to Linus's tree now. :-) Great, I'll push that out today. Thanks,! -- Email: Herbert Xu herb...@gondor.apana.org.au Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: sha-512...
On Tue, Feb 14, 2012 at 10:58:33PM -0500, David Miller wrote: FYI, I just started seeing this on sparc32 after all those sha512 optimizations: crypto/sha512_generic.c: In function 'sha512_transform': crypto/sha512_generic.c:135:1: warning: the frame size of 1136 bytes is larger than 1024 bytes [-Wframe-larger-than=] Is that with the latest patch applied? crypto: sha512 - Avoid stack bloat on i386 If so then this is not good. What was the original stack usage, i.e., if you revert to the original percpu code? Thanks, -- Email: Herbert Xu herb...@gondor.apana.org.au Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: sha-512...
From: Herbert Xu herb...@gondor.hengli.com.au Date: Wed, 15 Feb 2012 15:01:28 +1100 On Tue, Feb 14, 2012 at 10:58:33PM -0500, David Miller wrote: FYI, I just started seeing this on sparc32 after all those sha512 optimizations: crypto/sha512_generic.c: In function 'sha512_transform': crypto/sha512_generic.c:135:1: warning: the frame size of 1136 bytes is larger than 1024 bytes [-Wframe-larger-than=] Is that with the latest patch applied? crypto: sha512 - Avoid stack bloat on i386 If so then this is not good. Yes. And, of course, with that commit reverted it's even worse. Reverting it makes the stack frame twice as large. What was the original stack usage, i.e., if you revert to the original percpu code? If I revert: commit 3a92d687c8015860a19213e3c102cad6b722f83c commit 58d7d18b5268febb8b1391c6dffc8e2aaa751fcd commit 51fc6dc8f948047364f7d42a4ed89b416c6cc0a3 commit 84e31fdb7c797a7303e0cc295cb9bc8b73fb872d the stackframe goes down to 888 bytes. More detailed, the progression is: master 1136 revert 3a92d687c8015860a19213e3c102cad6b722f83c 2408 revert 58d7d18b5268febb8b1391c6dffc8e2aaa751fcd 2408 revert 51fc6dc8f948047364f7d42a4ed89b416c6cc0a3 1520 revert 84e31fdb7c797a7303e0cc295cb9bc8b73fb872d 888 -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: sha-512...
On Wed, Feb 15, 2012 at 12:11:13AM -0500, David Miller wrote: On Tue, Feb 14, 2012 at 10:58:33PM -0500, David Miller wrote: FYI, I just started seeing this on sparc32 after all those sha512 optimizations: crypto/sha512_generic.c: In function 'sha512_transform': crypto/sha512_generic.c:135:1: warning: the frame size of 1136 bytes is larger than 1024 bytes [-Wframe-larger-than=] Is that with the latest patch applied? crypto: sha512 - Avoid stack bloat on i386 If so then this is not good. Yes. And, of course, with that commit reverted it's even worse. Reverting it makes the stack frame twice as large. What was the original stack usage, i.e., if you revert to the original percpu code? If I revert: commit 3a92d687c8015860a19213e3c102cad6b722f83c commit 58d7d18b5268febb8b1391c6dffc8e2aaa751fcd commit 51fc6dc8f948047364f7d42a4ed89b416c6cc0a3 commit 84e31fdb7c797a7303e0cc295cb9bc8b73fb872d the stackframe goes down to 888 bytes. More detailed, the progression is: master1136 revert 3a92d687c8015860a19213e3c102cad6b722f83c 2408 revert 58d7d18b5268febb8b1391c6dffc8e2aaa751fcd 2408 revert 51fc6dc8f948047364f7d42a4ed89b416c6cc0a3 1520 revert 84e31fdb7c797a7303e0cc295cb9bc8b73fb872d 888 OK, so we grew by 1136 - 888 = 248. Keep in mind that 128 of that is expected since we moved W onto the stack. I guess we could go back to the percpu solution, what do you think? Cheers, -- Email: Herbert Xu herb...@gondor.apana.org.au Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: sha-512...
From: Herbert Xu herb...@gondor.hengli.com.au Date: Wed, 15 Feb 2012 16:16:08 +1100 OK, so we grew by 1136 - 888 = 248. Keep in mind that 128 of that is expected since we moved W onto the stack. Right. I guess we could go back to the percpu solution, what do you think? I'm not entirely sure, we might have to. sha512 is notorious for generating terrible code with gcc on 32-bit targets, so... The sha512 test in the glibc testsuite tends to timeout on 32-bit sparc. :-) -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html