RE: Openssl Engine Performance Benchmarks

2009-03-31 Thread David Schwartz

 Is it 
 openssl speed -evp aes-128-cbc -engine xx -elapsed 
 or
 openssl speed -evp aes-128-cbc -engine xx

It depends what you want to measure.

 I have seen examples with both of them on the internet and I get
 different results with each of them. What exactly does elapsed
 option add here?

-elapsed: measure time in real (wall-clock) time instead of CPU user time.

So, do you want to know which one is faster or which one uses less CPU?

DS


__
OpenSSL Project http://www.openssl.org
User Support Mailing List    openssl-users@openssl.org
Automated List Manager   majord...@openssl.org


Re: Openssl Engine Performance Benchmarks

2009-03-31 Thread Geoff Thorpe
On Tuesday 31 March 2009 23:16:10 Shasi Thati wrote:
 Hi,

 I have a question regarding the openssl speed command. When I use this
 command to test the crypto offload engine performance, what is the
 right command to use?

 Is it

 openssl speed -evp aes-128-cbc -engine xx -elapsed

 or

 openssl speed -evp aes-128-cbc -engine xx

 I have seen examples with both of them on the internet and I get
 different results with each of them. What exactly does elapsed
 option add here?

It means elapsed. :-) I.e. how much time elapsed during the benchmark.
The normal measurement is CPU usage, which is less than or equal to
the elapsed time: if the benchmark used half the available CPU cycles
during the elapsed period (according to scheduler stats, accurate or
otherwise), the time given would be half the elapsed time.
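To make the distinction concrete, here is a minimal Python sketch (not how `openssl speed` itself is implemented) that times a simulated offload loop both ways. `simulated_op` is a hypothetical stand-in for one crypto request: a burst of CPU work followed by `time.sleep()` standing in for the wait on the accelerator. `time.process_time()` counts this process's CPU time (user plus system, so slightly broader than the CPU user time `openssl speed` reports by default), while `time.perf_counter()` counts wall-clock time, which is what `-elapsed` measures.

```python
import time


def simulated_op(busy_iters: int, wait_s: float) -> int:
    """Hypothetical stand-in for one offloaded crypto operation:
    some CPU work (e.g. preparing/submitting the request) followed by an
    idle wait (the accelerator doing the actual work)."""
    acc = 0
    for i in range(busy_iters):
        acc = (acc * 31 + i) & 0xFFFFFFFF  # CPU-bound busywork
    time.sleep(wait_s)  # idle: consumes wall-clock time, not CPU time
    return acc


def benchmark(n_ops: int, busy_iters: int = 20_000, wait_s: float = 0.01) -> dict:
    cpu_start = time.process_time()   # CPU time of this process (user + system)
    wall_start = time.perf_counter()  # wall-clock ("elapsed") time
    for _ in range(n_ops):
        simulated_op(busy_iters, wait_s)
    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    return {
        "cpu_s": cpu,
        "elapsed_s": elapsed,
        # Ops/sec as the default (CPU-time) measurement would report it.
        "ops_per_s_cpu": n_ops / cpu if cpu > 0 else float("inf"),
        # Ops/sec as -elapsed would report it.
        "ops_per_s_elapsed": n_ops / elapsed,
    }


if __name__ == "__main__":
    for name, value in benchmark(20).items():
        print(f"{name:>18}: {value:.4f}")
```

Because most of each simulated operation is spent sleeping, the CPU-time figure comes out well below the elapsed figure, and the ops/sec derived from it is correspondingly higher.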

The usefulness of measuring CPU time (instead of using -elapsed) is that it eliminates:
(a) skewed statistics due to the system running other tasks while the
benchmark was in progress (i.e. you're only billed for what you use), and
(b) time the s/w (and driver) spent waiting for the crypto
accelerator to respond to crypto operations.
The value of (b) is to extrapolate certain theoretical limits. I.e. if 80%
of the time is spent waiting on the accelerator, the CPU time for the
benchmark run would be 1/5 of the elapsed time, and so the calculated
number of crypto ops per second would be 5 times what actually happened
in real/elapsed time. If the latency of the accelerator is roughly
constant but it can process multiple requests at once because it has
multiple execution units, then this inflated number is a useful estimate
of how much you could theoretically process if you had multiple
threads/processes keeping the CPU busy rather than waiting. In this
example you'd need at least 5 threads to reach that performance level
(which also assumes the accelerator's performance would continue to
scale up that far).
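The arithmetic in the 80% example can be sketched as follows. The run of 1000 operations over 10 s of elapsed time is a hypothetical figure chosen for illustration, not a measurement, and the helper name `extrapolate` is made up. It compares the two throughput figures from one run and derives the minimum concurrency that the CPU-time figure implies, under the same assumption as above (roughly constant accelerator latency, with requests able to overlap).

```python
import math


def extrapolate(ops: int, cpu_s: float, elapsed_s: float) -> dict:
    """Compare CPU-time and elapsed-time throughput for one benchmark run.

    Assumes the accelerator's latency is roughly constant and that it can
    overlap requests, so the CPU-time figure is an upper bound reachable
    only with enough concurrent submitters."""
    if not 0 < cpu_s <= elapsed_s:
        raise ValueError("expected 0 < cpu_s <= elapsed_s")
    ops_per_s_elapsed = ops / elapsed_s  # what actually happened
    ops_per_s_cpu = ops / cpu_s          # inflated, "theoretical" figure
    inflation = elapsed_s / cpu_s        # e.g. 80% waiting -> 5x
    # Minimum number of threads/processes needed to keep the CPU busy;
    # the small epsilon guards against float noise such as 5.000000000000001.
    min_threads = math.ceil(inflation - 1e-9)
    return {
        "ops_per_s_elapsed": ops_per_s_elapsed,
        "ops_per_s_cpu": ops_per_s_cpu,
        "inflation": inflation,
        "min_threads": min_threads,
    }


# Hypothetical run: 80% of 10 s spent waiting on the accelerator
# leaves 2 s of CPU time.
r = extrapolate(ops=1000, cpu_s=2.0, elapsed_s=10.0)
print(r)  # {'ops_per_s_elapsed': 100.0, 'ops_per_s_cpu': 500.0, 'inflation': 5.0, 'min_threads': 5}
```

The `min_threads` value is only a lower bound under the stated scaling assumption; whether the accelerator actually keeps up with that many concurrent submitters still has to be measured.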

Cheers,
Geoff

-- 
An Earthling is a monkey with car keys...