Re: Open SSL and CUDA

Andy Polyakov Thu, 15 Nov 2012 08:37:31 -0800

As I wrote in 2007, "I don't mean to discourage anybody from looking for
answer." It implies that I don't actually provide answers either. My
objective is discussion, not debunking or anything like that. I merely
attempt to provide additional perspective on the problem.


> Some time ago I came up with a proof of concept multi-threaded
> implementation.
> The GPU is not used if the system load (measured through getloadavg under
> Linux) is below a certain threshold.
> Otherwise each thread puts its message (on which the private key
> operation has to be performed) into a shared buffer.
> If the buffer is full after inserting the message, the current thread
> runs the private key operation batch on the GPU.
> If the buffer is not full it sleeps for some time.
> The first thread to wake up runs the batch on the GPU even if the buffer
> is not full.
> The thread running the batch wakes up the others afterwards.
> What do you think about this approach?
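A minimal C sketch of the batching scheme quoted above, using POSIX threads. Everything here is hypothetical and illustrative: `batcher_t`, `batcher_submit`, `load_above` and the `batch_fn` hook are not OpenSSL APIs, and `demo_batch` merely doubles integers in place to stand in for a GPU private-key batch. The "sleep for some time" step is `pthread_cond_timedwait`, so the first waiter whose deadline expires flushes a partial batch, and the flushing thread wakes the rest with a broadcast.

```c
#define _DEFAULT_SOURCE            /* for getloadavg()/clock_gettime() on glibc */
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

#define BATCH_CAP 2048             /* "up to 2048" operations per GPU batch */

/* Hypothetical hook standing in for a GPU private-key batch. */
typedef void (*batch_fn)(int **msgs, int n);

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  flushed;       /* broadcast after every batch */
    int            *slots[BATCH_CAP];
    int             count;
    unsigned long   generation;    /* bumped once per batch */
    batch_fn        run_batch;
} batcher_t;

void batcher_init(batcher_t *b, batch_fn fn) {
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->flushed, NULL);
    b->count = 0;
    b->generation = 0;
    b->run_batch = fn;
}

/* GPU is only considered when the 1-minute load average is high enough. */
bool load_above(double threshold) {
    double load;
    return getloadavg(&load, 1) == 1 && load >= threshold;
}

/* Enqueue msg; returns after a batch containing it has been run. */
void batcher_submit(batcher_t *b, int *msg, long wait_ms) {
    pthread_mutex_lock(&b->lock);
    unsigned long my_gen = b->generation;
    b->slots[b->count++] = msg;

    if (b->count < BATCH_CAP) {
        /* Buffer not full: sleep until someone flushes or we time out. */
        struct timespec deadline;
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec  += wait_ms / 1000;
        deadline.tv_nsec += (wait_ms % 1000) * 1000000L;
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec  += 1;
            deadline.tv_nsec -= 1000000000L;
        }
        int rc = 0;
        /* Loop guards against spurious wakeups. */
        while (b->generation == my_gen && rc != ETIMEDOUT)
            rc = pthread_cond_timedwait(&b->flushed, &b->lock, &deadline);
    }

    /* Full buffer, or first waiter to time out: run the batch. */
    if (b->generation == my_gen) {
        b->run_batch(b->slots, b->count);
        b->count = 0;
        b->generation++;
        pthread_cond_broadcast(&b->flushed);   /* wake the other waiters */
    }
    pthread_mutex_unlock(&b->lock);
}

/* Demo stand-in for the GPU kernel: "processes" each message in place. */
void demo_batch(int **msgs, int n) {
    for (int i = 0; i < n; i++)
        *msgs[i] *= 2;
}
```

A caller would check `load_above(threshold)` first and run the operation directly on the CPU when it returns false. For simplicity the batch runs while the mutex is held, so new submitters block until it finishes. Note that a lone request always waits the full `wait_ms` before anything happens.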

The problem here is that you're likely to end up in a time domain that
contradicts users' expectations. Well, you didn't put a number on your
latency, but in the previous thread ~200ms was mentioned for [up to] 2048
operations. Note that on the CPU it takes ~1ms to perform one operation.
As you have no way of knowing whether or not you'll have 200 additional
requests (per core) within the next millisecond, you would probably
prefer to go for the 1ms operation, because you don't want to sleep for,
say, 100ms just to find out that the number of requests is not high
enough and you'd have done better running on the CPU. I mean, in your
example, let's say the thread that woke up found that it's still the
only one. Should it go for the GPU and let the user wait an additional
200ms, or perform the operation in 1ms? But of course, as you mentioned,
this doesn't account for load. But even then you would probably bet on
the GPU only when the load is ... at least 200 (per core). But then the
question is whether users perceive a system under such load as usable.
You don't spend most of the time on encryption, you spend it generating
content, and a load of 200 would mean that a particular user would
experience the system as 200 times slower than "normal." Well, normal
load might be 50...
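To put the arithmetic above in code: a minimal sketch, assuming the ~1ms per CPU operation and ~200ms per GPU batch figures quoted in this thread, and that one batch (at most 2048 operations) is enough. `prefer_gpu` and `worst_latency_ms` are hypothetical helpers, not part of OpenSSL. The GPU only wins once a core already has enough queued work to keep it busy for as long as one batch takes, i.e. 200 requests per core. The sketch ignores the time spent waiting for a batch to fill, which would only make the GPU look worse.

```c
#include <stdbool.h>

/* Figures quoted in the thread (assumed typical, not measured here). */
#define CPU_MS_PER_OP   1.0     /* ~1 ms per private-key op on one core */
#define GPU_BATCH_MS  200.0     /* ~200 ms per batch of up to 2048 ops  */

/* With `pending` requests queued on one core, draining them on the CPU
 * takes pending * CPU_MS_PER_OP; a GPU batch takes GPU_BATCH_MS no matter
 * how few requests it carries. The GPU only wins when the CPU would be
 * busy at least that long anyway. */
bool prefer_gpu(int pending_per_core) {
    return pending_per_core * CPU_MS_PER_OP >= GPU_BATCH_MS;
}

/* Latency seen by the last queued request, in milliseconds.
 * Assumes pending_per_core <= 2048, i.e. a single GPU batch. */
double worst_latency_ms(int pending_per_core) {
    return prefer_gpu(pending_per_core) ? GPU_BATCH_MS
                                        : pending_per_core * CPU_MS_PER_OP;
}
```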

> Can you give more insight about developing the idea for DNSSEC?
> How can I go about that?  

Once again, I'm not claiming that I possess answers. DNSSEC is simply
the case where you know the number of operations to be performed in
*advance*. I.e. unlike SSL, you don't have to guess whether or not there
are 200 additional requests coming in the next moment.
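Because the number of records to sign is known before signing starts, the CPU/GPU choice can be made once, up front, instead of being guessed per request. A rough sketch, assuming the same per-operation and per-batch figures from this thread and that CPU signing parallelizes perfectly across `cores`; `plan_signing`, `cpu_time_ms` and `gpu_time_ms` are hypothetical names, not any DNSSEC tool's API.

```c
/* Figures quoted in this thread, assumed to apply to signing too. */
#define CPU_MS_PER_OP    1.0    /* per signature, one core          */
#define GPU_BATCH_MS   200.0    /* per GPU batch                    */
#define GPU_BATCH_CAP   2048    /* max operations per GPU batch     */

typedef enum { USE_CPU, USE_GPU } backend_t;

/* Estimated wall time to sign n records on `cores` CPU cores. */
double cpu_time_ms(long n, int cores) {
    long per_core = (n + cores - 1) / cores;          /* ceiling division */
    return per_core * CPU_MS_PER_OP;
}

/* Estimated wall time to sign n records in full-or-partial GPU batches. */
double gpu_time_ms(long n) {
    long batches = (n + GPU_BATCH_CAP - 1) / GPU_BATCH_CAP;
    return batches * GPU_BATCH_MS;
}

/* n is known in advance, so no guessing about future requests. */
backend_t plan_signing(long n, int cores) {
    return gpu_time_ms(n) < cpu_time_ms(n, cores) ? USE_GPU : USE_CPU;
}
```

For example, 100 records on 4 cores come out at about 25ms on the CPU versus one 200ms GPU batch, while 100000 records come out at about 25000ms versus 49 batches (9800ms).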
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           majord...@openssl.org
