Re: SHA1 21% speedup

2013-02-21 Thread Darryl Miles

Rosen Penev wrote:

One downside of this is that in the case of iterated hashes (PBKDF2),
this only speeds up the first iteration. I do not believe that TLS uses
iterated HMAC-SHA1.


And that is a good thing!  Anything that speeds up or otherwise reduces
the computation effort of PBKDF2 only weakens its purpose.


One of the primary goals of PBKDF2 is to have a fixed computation cost
that can be adjusted by setting an appropriate iteration count.  This is
used to store a tiny (think 1KB) amount of fixed data for long periods
of time (think days or years).
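
For illustration only, a minimal sketch of that tuning knob using
OpenSSL's PKCS5_PBKDF2_HMAC_SHA1(); the password, salt and iteration
count below are made-up example values:

    #include <stdio.h>
    #include <string.h>
    #include <openssl/evp.h>

    int main(void)
    {
        const char *pass = "example password";    /* made-up example */
        const unsigned char salt[] = "example salt";
        unsigned char key[20];                    /* SHA-1 output size */
        /* The iteration count is the cost knob: raising it makes every
         * derivation, the attacker's guesses included, proportionally
         * more expensive. */
        int iter = 100000;

        if (PKCS5_PBKDF2_HMAC_SHA1(pass, (int)strlen(pass),
                                   salt, sizeof(salt) - 1,
                                   iter, sizeof(key), key) != 1)
            return 1;

        for (size_t i = 0; i < sizeof(key); i++)
            printf("%02x", key[i]);
        printf("\n");
        return 0;
    }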


It is good to hear that a bunch of GPUs cannot usefully speed up
PBKDF2; it is working as intended, then.




On the subject of precomputing hash digests...

OpenSSL used to allow the standard C API to be used in such a way that
the internal state of a digest could be snapshotted/saved by using
memcpy().
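
For anyone unfamiliar with the pattern, a minimal sketch of it using the
low-level SHA1_* API, where SHA_CTX is still a flat struct that memcpy()
can snapshot; the function name and arguments are mine, for illustration:

    #include <string.h>
    #include <openssl/sha.h>

    /* Absorb a fixed prefix once, then snapshot the context and finish
     * two different digests from that common state. */
    void digests_with_shared_prefix(const unsigned char *prefix, size_t plen,
                                    const unsigned char *a, size_t alen,
                                    const unsigned char *b, size_t blen,
                                    unsigned char md_a[SHA_DIGEST_LENGTH],
                                    unsigned char md_b[SHA_DIGEST_LENGTH])
    {
        SHA_CTX base, work;

        SHA1_Init(&base);
        SHA1_Update(&base, prefix, plen);   /* pay for the prefix once */

        memcpy(&work, &base, sizeof(work)); /* snapshot the flat struct */
        SHA1_Update(&work, a, alen);
        SHA1_Final(md_a, &work);

        memcpy(&work, &base, sizeof(work)); /* restore and reuse */
        SHA1_Update(&work, b, blen);
        SHA1_Final(md_b, &work);
    }

The prefix's compression work is paid once; each snapshot after that
costs only a structure copy.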


At some point (maybe around 0.9.7 to 0.9.8) the ENGINE API got its teeth
into the main C APIs and broke this by introducing indirection into the
internal data structures.
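
The supported way to duplicate an EVP digest context through that
indirection is EVP_MD_CTX_copy_ex(), which performs a deep copy. A
minimal sketch, with a wrapper name of my own:

    #include <openssl/evp.h>

    /* memcpy() is unsafe on an EVP_MD_CTX because it holds pointers to
     * implementation data; EVP_MD_CTX_copy_ex() duplicates that state
     * properly. */
    int finish_from_snapshot(const EVP_MD_CTX *base,
                             const unsigned char *tail, size_t tlen,
                             unsigned char *md, unsigned int *mdlen)
    {
        EVP_MD_CTX *work = EVP_MD_CTX_create();
        int ok = 0;

        if (work != NULL) {
            ok = EVP_MD_CTX_copy_ex(work, base)
              && EVP_DigestUpdate(work, tail, tlen)
              && EVP_DigestFinal_ex(work, md, mdlen);
            EVP_MD_CTX_destroy(work);
        }
        return ok;
    }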


The scenarios from the late 1990s that did this (like RADIUS) did not
use much input data, and on a modern CPU one might as well run all
~64 bytes of data through the digest each time, as we are no longer
CPU-bound in this area.



Darryl


Re: SHA1 21% speedup

2013-01-24 Thread Andy Polyakov
 Will this ever get implemented?: https://hashcat.net/p12/
 
 My understanding is that this precomputes several portions of the key
 expansion stage to speed up performance. As modern websites using TLS
 use HMAC-SHA1, this seems like a nice addition.
 
 One downside of this is that in the case of iterated hashes (PBKDF2),
 this only speeds up the first iteration. I do not believe that TLS uses
 iterated HMAC-SHA1.
 
 One thing to mention is that I am not a developer, hence the lack of a
 patch.

First of all one should recognize that this is about GPU, *graphics*
processing unit, computing. Then one should recognize that it pays off
to bet on a GPU only under very specific circumstances, and those
circumstances are a *guaranteed* *massively* *parallel* data flow. And
on top of that, the discussed optimization implies a specific and
strong correlation between the pieces of data. Already the first
condition is problematic in practical web server scenarios, not to
mention the second. In other words, no, it wouldn't help OpenSSL.

On a side note, I can't help pointing out several contradictions. I fail
to see how the tables on pages 4 and 15 relate to each other. And the
numbers in either don't make sense, because F1 and F3 *are* more
computationally intensive than F2 and F4, which should be reflected in
the numbers. Most notably, the critical paths, i.e. the theoretical
minimum number of cycles, are 3 cycles for F1/F3 and 2 cycles for F2/F4.
This is the number of cycles a processor with an unlimited amount of
computational resources would have to spend per F-operation no matter
what, because of the algorithm. Of course a parallel GPU can spend its
clock cycles processing multiple inputs at a time and thus reduce the
*effective* number of *cycles* spent per single F-operation (the table
on page 4?), but then why does the author speak about *instruction*
count? If it's an effective count, not an actual one, then how come the
word expansion is accounted for differently? I mean, if you manage to
parallelize the instructions performing F-operations, I can't see any
reason why you'd fail to do the same to the word expansion...
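
For reference, here is one common way the four round functions are
written in C, annotated with the critical-path depths discussed above;
these particular formulations are illustrative, not necessarily the
ones the paper uses:

    #include <stdint.h>

    /* F1 (Ch), rounds 0-19: xor, then and, then xor --
     * 3 sequential operations deep */
    static uint32_t f1(uint32_t b, uint32_t c, uint32_t d)
    {
        return d ^ (b & (c ^ d));
    }

    /* F2/F4 (Parity), rounds 20-39 and 60-79: two xors --
     * 2 operations deep */
    static uint32_t f2(uint32_t b, uint32_t c, uint32_t d)
    {
        return b ^ c ^ d;
    }

    /* F3 (Maj), rounds 40-59: (b & c) and (b | c) can run in
     * parallel, then and, then or -- still 3 operations deep */
    static uint32_t f3(uint32_t b, uint32_t c, uint32_t d)
    {
        return (b & c) | (d & (b | c));
    }

Unlimited hardware parallelism lets a GPU overlap many independent
inputs, but it cannot shorten these per-input depths.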
