Hi all,

I was asked to forward this to the list.



I've been working on an OpenSSL engine to support the Cell processor's
(Playstation 3 etc.) vector processors (SPUs).

I've (finally!) got a rough version glued together using the IBM
multi-precision library from the Cell SDK.

You may be interested in the results of version 0.001.


Bottom line is 47 4096-bit RSA signs/sec with the engine, as opposed to
11 per sec without.


 >./apps/openssl speed rsa4096 -engine cellspumpm -elapsed -multi 15

(see [2] below for openssl build options)


This is with 15 parallel processes (-multi 15) and elapsed time. The
choice of 15 is arbitrary.

----------------------------------------------------
with SPU engine : -
                  sign    verify    sign/s verify/s
rsa 4096 bits 0.020915s 0.000546s     47.8   1832.2

----------------------------------------------------
'raw' OpenSSL  (same build)
                  sign    verify    sign/s verify/s
rsa 4096 bits 0.091103s 0.001213s     11.0    824.7

----------------------------------------------------


Without the -multi option.

----------------------------------------------------
with SPU engine : -
                  sign    verify    sign/s verify/s
rsa 4096 bits 0.098480s 0.001725s     10.2    579.7

----------------------------------------------------
'raw' OpenSSL  (same build)
                  sign    verify    sign/s verify/s
rsa 4096 bits 0.108516s 0.001742s      9.2    574.1

----------------------------------------------------
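For anyone who wants the ratios rather than the raw numbers, here is a quick sketch (plain Python, not part of the engine) that works out the speedups from the figures in the tables above. The dictionary layout and the function name are just illustrative.

```python
# Operations per second, copied from the "openssl speed" tables above.
# Each entry is (sign/s, verify/s).
results = {
    "multi 15": {"engine": (47.8, 1832.2), "raw": (11.0, 824.7)},
    "single":   {"engine": (10.2, 579.7),  "raw": (9.2, 574.1)},
}

def speedup(run):
    """Return (sign, verify) speedup of the SPU engine over raw OpenSSL."""
    eng_sign, eng_verify = results[run]["engine"]
    raw_sign, raw_verify = results[run]["raw"]
    return eng_sign / raw_sign, eng_verify / raw_verify

for run in results:
    sign_x, verify_x = speedup(run)
    print(f"{run:9s} sign x{sign_x:.2f}  verify x{verify_x:.2f}")
```

With -multi 15 this comes out at roughly 4.3x for sign and 2.2x for verify. Without -multi the difference is only about 10% for sign and about 1% for verify.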



Note:


- Results are from a 3.2 GHz Playstation 3 with 7 SPUs running Yellow
Dog Linux 5.0. [1]
A server/blade Cell system would have up to 16 SPUs.

- I'm using elapsed time on a relatively quiet machine. It still has SSH
and X server connections running. I'll be able to get this down later.
The multi-threaded nature of the system messes up the CPU timing option.

- This is a first cut with a basic mod_exp(). Further optimisations
may be possible with pre-computation and different window sizes.

- There are overheads with the current DMA transfer of parameters that
will be removed as I optimise the big number conversion code and
introduce double buffering techniques. But overheads are pretty small
compared to the mod_exp(). I'm also pretty hopeful that a MIRACL version
I am working on will be even faster.

- I still have intermittent PKCS#1 padding problems. (arrraggghhhh)

- See http://en.wikipedia.org/wiki/Cell_microprocessor for more details
on the Cell

- The Cell PPU is a PowerPC G5.  OpenSSL is configured for PPC/G5 ASM at
64-bit.

- The Cell SPU can be viewed as a co-processor with 256K of local RAM with
restricted I/O but enhancements for accelerated multimedia or number
crunching. It does not directly interact with the main system memory.

- The engine is based upon the GMP engine shipped with OpenSSL but uses
the vector optimised IBM MPM multi-precision library on the SPUs for the
big number operations.
The speed gains are attributed to the SPU's 128-bit registers and
specialist vector operations, which allow multiple 32-bit integer
operations in fewer clock cycles.
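On the note above about pre-computation and window sizes: for readers unfamiliar with the idea, here is a minimal sketch of fixed-window modular exponentiation. It is plain Python for illustration only, and is not the code used in the engine or in the IBM MPM library. The function name and the default window size w=4 are my own assumptions.

```python
def mod_exp_window(base, exp, mod, w=4):
    """Left-to-right fixed-window (2^w-ary) modular exponentiation.

    Illustrative only; assumes exp >= 0, mod >= 1 and w >= 1.
    """
    if mod == 1:
        return 0
    # Pre-computation: table[i] = base^i mod mod for 0 <= i < 2^w.
    table = [1] * (1 << w)
    for i in range(1, 1 << w):
        table[i] = (table[i - 1] * base) % mod
    # Split the exponent into w-bit digits, least significant first.
    digits = []
    e = exp
    while e:
        digits.append(e & ((1 << w) - 1))
        e >>= w
    result = 1
    for d in reversed(digits):           # most significant digit first
        for _ in range(w):               # w squarings per digit
            result = (result * result) % mod
        if d:                            # one multiply per non-zero digit
            result = (result * table[d]) % mod
    return result

# Small check against Python's built-in modular pow().
print(mod_exp_window(65537, 2**127 - 1, 2**89 - 1) == pow(65537, 2**127 - 1, 2**89 - 1))
```

A larger window means fewer multiplications during the main loop but a bigger pre-computed table, which matters when the working memory is as small as the SPU's local store.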


I'd like to thank Augusto Jun Devegili (DCU, Ireland & Unicamp,
Brazil), Dr Mike Scott (DCU, Ireland), and Dr Stephen Henson (OpenSSL)
for their assistance and patience.

I hope to announce a public/release version later this month.

Any comments, questions,  etc. to neil.costigan[at]computing.dcu.ie

Regards,


Neil Costigan

School of Computing,

Dublin City University,

Dublin, IRELAND.

http://www.computing.dcu.ie/~ncostiga


______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
User Support Mailing List                    openssl-users@openssl.org
Automated List Manager                           [EMAIL PROTECTED]
