Re: AW: via padlock support much slower in 0.9.8e than in 0.9.8d, why?

2007-09-26 Thread Harald Latzko

Hi!

Perhaps the benchmark differences come from the different CPU types
(I have an Esther CPU at 1.2 GHz). Here is the content of
/proc/cpuinfo on Linux:


processor       : 0
vendor_id       : CentaurHauls
cpu family      : 6
model           : 10
model name      : VIA Esther processor 1200MHz
stepping        : 9
cpu MHz         : 1200.052
cache size      : 128 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce apic sep mtrr pge cmov pat clflush acpi mmx fxsr sse sse2 tm nx up pni est tm2 rng rng_en ace ace_en ace2 ace2_en phe phe_en pmm pmm_en
bogomips        : 2401.87
clflush size    : 64
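
To compare two machines quickly, the flags line of such a listing can be checked for the PadLock-related entries. The following is only a rough sketch: the flag names (rng, ace, ace2, phe, pmm and their _en variants) are taken from the listing above, reading the _en suffix as "enabled" is an assumption, and the helper names parse_flags and padlock_status are made up for illustration.

```python
# Flag names as they appear in the /proc/cpuinfo listing above.
PADLOCK_FLAGS = ("rng", "ace", "ace2", "phe", "pmm")


def parse_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line found."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def padlock_status(flags):
    """Map each PadLock unit name to a (present, enabled) pair.

    Assumption: '<name>_en' means the unit is enabled.
    """
    return {name: (name in flags, name + "_en" in flags)
            for name in PADLOCK_FLAGS}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not on Linux, or file not readable
    for name, (present, enabled) in padlock_status(parse_flags(text)).items():
        print(f"{name:5s} present={present} enabled={enabled}")
```

Running this on both the C5 and the C7 would show whether the two boards report the same set of units.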



via padlock support much slower in 0.9.8e than in 0.9.8d, why?

2007-09-25 Thread Buddy Butterfly

With a VIA C5 board I get a huge difference in speed with padlock engine
support (same machine, same OS, etc.).
Where does the difference come from? Were there any changes regarding
buffering or block sizes? Look at these results:

0.9.8e:

# ./openssl speed -evp aes-256-cbc -engine padlock
engine padlock set.
Doing aes-256-cbc for 3s on 16 size blocks: 9477714 aes-256-cbc's in 2.99s
Doing aes-256-cbc for 3s on 64 size blocks: 5371202 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 2058449 aes-256-cbc's in 2.99s
Doing aes-256-cbc for 3s on 1024 size blocks: 645381 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 93456 aes-256-cbc's in 3.00s
OpenSSL 0.9.8e 23 Feb 2007
built on: Wed Aug 22 17:00:48 CEST 2007
options:bn(64,32) md2(int) rc4(idx,int) des(ptr,risc1,16,long) aes(partial) idea(int) blowfish(idx)
compiler: gcc -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -O3 -fomit-frame-pointer -Wall -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DSHA1_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM
available timing options: TIMES TIMEB HZ=100 [sysconf value]
timing function used: times
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc     50716.86k   114585.64k   176241.79k   220290.05k   255197.18k
#

0.9.8d:

# ./openssl speed -evp aes-256-cbc -engine padlock
engine padlock set.
Doing aes-256-cbc for 3s on 16 size blocks: 13856973 aes-256-cbc's in 2.99s
Doing aes-256-cbc for 3s on 64 size blocks: 10520959 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 5370328 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 1807981 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 251498 aes-256-cbc's in 3.00s
OpenSSL 0.9.8d 28 Sep 2006
built on: Fri Nov 10 20:44:47 CET 2006
options:bn(64,32) md2(int) rc4(idx,int) des(ptr,risc1,16,long) aes(partial) idea(int) blowfish(idx)
compiler: gcc -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -O3 -fomit-frame-pointer -Wall -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DSHA1_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM
available timing options: TIMES TIMEB HZ=100 [sysconf value]
timing function used: times
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc     74151.03k   224447.13k   458267.99k   617124.18k   686757.21k
#
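
For reference, the summary row can be recomputed from the "Doing ..." lines: operations times block size, divided by the elapsed seconds, in thousands of bytes per second. The sketch below does that for the two C5 runs above and prints the 0.9.8e/0.9.8d ratio per block size. The function name throughput_k and the tuple layout are illustrative choices, not part of openssl.

```python
def throughput_k(count, block_size, seconds):
    """Thousands of bytes per second, as in the openssl speed summary row."""
    return count * block_size / seconds / 1000.0


# (block size, operations, elapsed seconds), copied from the runs above
RUN_098E = [(16, 9477714, 2.99), (64, 5371202, 3.00), (256, 2058449, 2.99),
            (1024, 645381, 3.00), (8192, 93456, 3.00)]
RUN_098D = [(16, 13856973, 2.99), (64, 10520959, 3.00), (256, 5370328, 3.00),
            (1024, 1807981, 3.00), (8192, 251498, 3.00)]


def ratios(new_run, old_run):
    """Per block size, throughput of new_run divided by that of old_run."""
    result = {}
    for (size, n_new, t_new), (_, n_old, t_old) in zip(new_run, old_run):
        result[size] = (throughput_k(n_new, size, t_new)
                        / throughput_k(n_old, size, t_old))
    return result


if __name__ == "__main__":
    for size, ratio in ratios(RUN_098E, RUN_098D).items():
        print(f"{size:5d} bytes: 0.9.8e runs at {ratio:.2f}x of 0.9.8d")
```

On these numbers the gap widens with the block size, which is why the 8192-byte column shows the largest difference.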

__
OpenSSL Project                                 http://www.openssl.org
User Support Mailing List                    openssl-users@openssl.org
Automated List Manager                               [EMAIL PROTECTED]


Re: via padlock support much slower in 0.9.8e than in 0.9.8d, why?

2007-09-25 Thread Harald Latzko

Hi!

I cannot confirm these performance differences between 0.9.8d and
0.9.8e. My results on a VIA CPU are:


0.9.8d
======
engine padlock set.
Doing aes-256-cbc for 3s on 16 size blocks: 11906104 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 9088256 aes-256-cbc's in 2.99s
Doing aes-256-cbc for 3s on 256 size blocks: 4744283 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 1024 size blocks: 1624804 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 226672 aes-256-cbc's in 2.99s

OpenSSL 0.9.8d 28 Sep 2006
built on: Tue Sep 25 20:13:59 GMT 2007
options:bn(64,32) md2(int) rc4(idx,int) des(ptr,risc1,16,long) aes(partial) idea(int) blowfish(idx)
compiler: gcc -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -O3 -fomit-frame-pointer -Wall -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DSHA1_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM

available timing options: TIMES TIMEB HZ=100 [sysconf value]
timing function used: times
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc     63499.22k   194531.23k   407562.57k   554599.77k   621035.79k


0.9.8e
======
engine padlock set.
Doing aes-256-cbc for 3s on 16 size blocks: 11597661 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 64 size blocks: 8927779 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 4708369 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 1024 size blocks: 1622241 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 227275 aes-256-cbc's in 2.97s

OpenSSL 0.9.8e 23 Feb 2007
built on: Tue Sep 25 20:21:15 GMT 2007
options:bn(64,32) md2(int) rc4(idx,int) des(ptr,risc1,16,long) aes(partial) idea(int) blowfish(idx)
compiler: gcc -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -O3 -fomit-frame-pointer -Wall -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DSHA1_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM

available timing options: TIMES TIMEB HZ=100 [sysconf value]
timing function used: times
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc     61648.70k   190459.29k   400446.00k   553724.93k   626881.08k
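
To put a number on "cannot confirm", the relative change between the two summary rows above can be computed per block size. This is a small sketch using only the figures from this message; the names RESULTS_098D, RESULTS_098E and percent_change are illustrative.

```python
BLOCK_SIZES = (16, 64, 256, 1024, 8192)

# Summary rows from the two runs above, in 1000s of bytes per second.
RESULTS_098D = (63499.22, 194531.23, 407562.57, 554599.77, 621035.79)
RESULTS_098E = (61648.70, 190459.29, 400446.00, 553724.93, 626881.08)


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def changes(old_row, new_row):
    """Map each block size to the percent change between the two rows."""
    return {size: percent_change(old, new)
            for size, old, new in zip(BLOCK_SIZES, old_row, new_row)}


if __name__ == "__main__":
    for size, pct in changes(RESULTS_098D, RESULTS_098E).items():
        print(f"{size:5d} bytes: {pct:+.2f}% (0.9.8e vs 0.9.8d)")
```

On this machine every block size stays within roughly 3 percent between the two versions, and the 8192-byte case is slightly faster with 0.9.8e.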



Regards,
Harald



AW: via padlock support much slower in 0.9.8e than in 0.9.8d, why?

2007-09-25 Thread Buddy Butterfly

Hi,

Strange. What could be the reason, then? I have two systems available
for testing: a C5 and a C7. The C5 runs SuSE 9.3 (kernel 2.6.11) and
shows the difference I posted below. The C7 runs Debian etch (kernel
2.6.18, i686). On the C7 I see no difference between OpenSSL 0.9.8d and
0.9.8e, but both versions are much slower (292 MB/s at most) and never
reach the 600+ MB/s you posted. Any clues?
