Re: sha-512...

2012-02-15 Thread Alexey Dobriyan
On Wed, Feb 15, 2012 at 12:23:52AM -0500, David Miller wrote:
 From: Herbert Xu herb...@gondor.hengli.com.au
 Date: Wed, 15 Feb 2012 16:16:08 +1100
 
  OK, so we grew by 1136 - 888 = 248.  Keep in mind that 128 of
  that is expected since we moved W onto the stack.
 
 Right.
 
  I guess we could go back to the percpu solution, what do you
  think?
 
 I'm not entirely sure, we might have to.
 
 sha512 is notorious for generating terrible code with gcc on 32-bit
 targets, so...  The sha512 test in the glibc testsuite tends to
 timeout on 32-bit sparc. :-)

Cherrypicking ror64() commit largely fixes the issue (on sparc-defconfig):

 sha512_transform:
   0:   9d e3 bc 78 save  %sp, -904, %sp

git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
b85a088f15f2070b7180735a231012843a5ac96c
crypto: sha512 - use standard ror64()
--
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: sha-512...

2012-02-15 Thread David Miller
From: Alexey Dobriyan adobri...@gmail.com
Date: Wed, 15 Feb 2012 22:27:52 +0300

 On Wed, Feb 15, 2012 at 12:23:52AM -0500, David Miller wrote:
 From: Herbert Xu herb...@gondor.hengli.com.au
 Date: Wed, 15 Feb 2012 16:16:08 +1100
 
  OK, so we grew by 1136 - 888 = 248.  Keep in mind that 128 of
  that is expected since we moved W onto the stack.
 
 Right.
 
  I guess we could go back to the percpu solution, what do you
  think?
 
 I'm not entirely sure, we might have to.
 
 sha512 is notorious for generating terrible code with gcc on 32-bit
 targets, so...  The sha512 test in the glibc testsuite tends to
 timeout on 32-bit sparc. :-)
 
 Cherrypicking ror64() commit largely fixes the issue (on sparc-defconfig):
 
    sha512_transform:
  0:   9d e3 bc 78 save  %sp, -904, %sp
 
 git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
 b85a088f15f2070b7180735a231012843a5ac96c
 crypto: sha512 - use standard ror64()

I'm happy with a solution that involves pushing this change to Linus's
tree, it's pretty clear why it helps so much although I'm disappointed
that gcc can't se that the u64 shift argument passed in is always a
constant and therefore way within the range of a 32-bit value, ho hum
:-)

In fact, in my tree, this change brings the stack allocation instruction
down to:

save%sp, -824, %sp  !

which is actually BETTER than what the old per-cpu code got:

save%sp, -984, %sp  !

Therefore I highly recommend we apply that ror() change to Linus's
tree now. :-)

--
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: sha-512...

2012-02-15 Thread Herbert Xu
On Wed, Feb 15, 2012 at 04:00:10PM -0500, David Miller wrote:

 In fact, in my tree, this change brings the stack allocation instruction
 down to:
 
 save%sp, -824, %sp  !
 
 which is actually BETTER than what the old per-cpu code got:
 
 save%sp, -984, %sp  !
 
 Therefore I highly recommend we apply that ror() change to Linus's
 tree now. :-)

Great, I'll push that out today.

Thanks,!
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: sha-512...

2012-02-14 Thread Herbert Xu
On Tue, Feb 14, 2012 at 10:58:33PM -0500, David Miller wrote:
 
 FYI, I just started seeing this on sparc32 after all those
 sha512 optimizations:
 
 crypto/sha512_generic.c: In function 'sha512_transform':
 crypto/sha512_generic.c:135:1: warning: the frame size of 1136 bytes is 
 larger than 1024 bytes [-Wframe-larger-than=]

Is that with the latest patch applied?

crypto: sha512 - Avoid stack bloat on i386

If so then this is not good.  What was the original stack usage,
i.e., if you revert to the original percpu code?

Thanks,
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: sha-512...

2012-02-14 Thread David Miller
From: Herbert Xu herb...@gondor.hengli.com.au
Date: Wed, 15 Feb 2012 15:01:28 +1100

 On Tue, Feb 14, 2012 at 10:58:33PM -0500, David Miller wrote:
 
 FYI, I just started seeing this on sparc32 after all those
 sha512 optimizations:
 
 crypto/sha512_generic.c: In function 'sha512_transform':
 crypto/sha512_generic.c:135:1: warning: the frame size of 1136 bytes is 
 larger than 1024 bytes [-Wframe-larger-than=]
 
 Is that with the latest patch applied?
 
   crypto: sha512 - Avoid stack bloat on i386
 
 If so then this is not good.

Yes.  And, of course, with that commit reverted it's even worse.
Reverting it makes the stack frame twice as large.

 What was the original stack usage, i.e., if you revert to the
 original percpu code?

If I revert:

commit 3a92d687c8015860a19213e3c102cad6b722f83c
commit 58d7d18b5268febb8b1391c6dffc8e2aaa751fcd
commit 51fc6dc8f948047364f7d42a4ed89b416c6cc0a3
commit 84e31fdb7c797a7303e0cc295cb9bc8b73fb872d

the stackframe goes down to 888 bytes.

More detailed, the progression is:

master  1136
revert 3a92d687c8015860a19213e3c102cad6b722f83c 2408
revert 58d7d18b5268febb8b1391c6dffc8e2aaa751fcd 2408
revert 51fc6dc8f948047364f7d42a4ed89b416c6cc0a3 1520
revert 84e31fdb7c797a7303e0cc295cb9bc8b73fb872d 888
--
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: sha-512...

2012-02-14 Thread Herbert Xu
On Wed, Feb 15, 2012 at 12:11:13AM -0500, David Miller wrote:
 
  On Tue, Feb 14, 2012 at 10:58:33PM -0500, David Miller wrote:
  
  FYI, I just started seeing this on sparc32 after all those
  sha512 optimizations:
  
  crypto/sha512_generic.c: In function 'sha512_transform':
  crypto/sha512_generic.c:135:1: warning: the frame size of 1136 bytes is 
  larger than 1024 bytes [-Wframe-larger-than=]
  
  Is that with the latest patch applied?
  
  crypto: sha512 - Avoid stack bloat on i386
  
  If so then this is not good.
 
 Yes.  And, of course, with that commit reverted it's even worse.
 Reverting it makes the stack frame twice as large.
 
  What was the original stack usage, i.e., if you revert to the
  original percpu code?
 
 If I revert:
 
 commit 3a92d687c8015860a19213e3c102cad6b722f83c
 commit 58d7d18b5268febb8b1391c6dffc8e2aaa751fcd
 commit 51fc6dc8f948047364f7d42a4ed89b416c6cc0a3
 commit 84e31fdb7c797a7303e0cc295cb9bc8b73fb872d
 
 the stackframe goes down to 888 bytes.
 
 More detailed, the progression is:
 
 master1136
 revert 3a92d687c8015860a19213e3c102cad6b722f83c   2408
 revert 58d7d18b5268febb8b1391c6dffc8e2aaa751fcd   2408
 revert 51fc6dc8f948047364f7d42a4ed89b416c6cc0a3   1520
 revert 84e31fdb7c797a7303e0cc295cb9bc8b73fb872d   888

OK, so we grew by 1136 - 888 = 248.  Keep in mind that 128 of
that is expected since we moved W onto the stack.

I guess we could go back to the percpu solution, what do you
think?

Cheers,
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: sha-512...

2012-02-14 Thread David Miller
From: Herbert Xu herb...@gondor.hengli.com.au
Date: Wed, 15 Feb 2012 16:16:08 +1100

 OK, so we grew by 1136 - 888 = 248.  Keep in mind that 128 of
 that is expected since we moved W onto the stack.

Right.

 I guess we could go back to the percpu solution, what do you
 think?

I'm not entirely sure, we might have to.

sha512 is notorious for generating terrible code with gcc on 32-bit
targets, so...  The sha512 test in the glibc testsuite tends to
timeout on 32-bit sparc. :-)
--
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html