Re: OpenSSL performance woes with ubsec crypto engine (Broadcom BCM5820/BCM5823/BMC5825/BMC582x)

2008-01-30 Thread Dr. Stephen Henson
On Thu, Jan 31, 2008, Peter Waltenberg wrote:

> OPENSSL_cleanse() doesn't zero memory regions, it fills them with
> pseudo-random data.
> Edit crypto/mem_clr.c and replace that code with memset(ptr,'\0',len); and
> just clear the region - you'll see a significant performance boost if
> that's your major bottleneck.
> 
> Just be aware that some hypothetical compiler could decide to skip the
> memset - I can't remember which compiler that is, but it's the one that
> comes with the free tinfoil hats.
> 

Note also that there is an assembly language version of OPENSSL_cleanse() in
0.9.9-dev which is significantly faster than the C version.
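
For readers weighing the memset() replacement discussed above, the
following is a minimal sketch of a zeroing routine that an optimizer is
less likely to drop as a dead store. It is not the OpenSSL implementation:
zero_region and memset_ptr are hypothetical names, and calling memset
through a volatile function pointer is one common approach assumed here.

```c
#include <stddef.h>
#include <string.h>

/* The pointer is volatile, so the compiler has to load it at run time and
 * cannot prove that the call is a removable dead store.
 * Hypothetical helper, not part of OpenSSL. */
static void *(*const volatile memset_ptr)(void *, int, size_t) = memset;

/* Overwrite len bytes at ptr with zeros. */
void zero_region(void *ptr, size_t len)
{
    if (ptr != NULL && len > 0)
        memset_ptr(ptr, 0, len);
}
```

The cost is one indirect call per wipe, which should be far cheaper than
filling the region with pseudo-random data.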

Steve.
--
Dr Stephen N. Henson. Email, S/MIME and PGP keys: see homepage
OpenSSL project core developer and freelance consultant.
Homepage: http://www.drh-consultancy.demon.co.uk
__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   [EMAIL PROTECTED]


Re: OpenSSL performance woes with ubsec crypto engine (Broadcom BCM5820/BCM5823/BMC5825/BMC582x)

2008-01-30 Thread Peter Waltenberg
OPENSSL_cleanse() doesn't zero memory regions, it fills them with
pseudo-random data.
Edit crypto/mem_clr.c and replace that code with memset(ptr,'\0',len); and
just clear the region - you'll see a significant performance boost if
that's your major bottleneck.

Just be aware that some hypothetical compiler could decide to skip the
memset - I can't remember which compiler that is, but it's the one that
comes with the free tinfoil hats.

Peter




   
From:    Thor Lancelot Simon <[EMAIL PROTECTED]>
To:      openssl-dev@openssl.org
Date:    31/01/2008 06:19
Subject: Re: OpenSSL performance woes with ubsec crypto engine (Broadcom BCM5820/BCM5823/BMC5825/BMC582x)

   





On Wed, Jan 30, 2008 at 09:32:34PM +0200, Paul Sheer wrote:
> Hi,
>
> I have a BMC5825 card from Silicom that is supposed to do over
> 10'000 rsa per second.

Never going to happen.  The context switches to talk to the
accelerator are too expensive, and OpenSSL doesn't support (nor
has any way to support) modern accelerators' SSL-handshake or
SSL-record operations.

2000/sec is a good place to be, on a client.  Expect less on a
server, unfortunately.

> I replaced OPENSSL_cleanse() {...} with { memset(); } already
> - IT WAS THE TOP FUNCTION IN MY FIRST GPROF RUN!

Yes.  This is the OPENSSL_cleanse() of a maximum-sized SSL record,
right at the outset of any session.  It is amazingly expensive, but
I have had trouble ascertaining whether it can be safely arranged
for it to zero less data.

If any of the OpenSSL developers are listening, I would really love
some feedback on this.

> The card supports hardware SHA1 and MD5 - but it's not used
> because OpenSSL divides each md operation into an init(),
> update() and final() stage. But the card wants a one shot.
> So the crypto card API does not fit the software API

Which version of OpenSSL are you using?  It appears that in -current
an engine can provide an HMAC method, and the Broadcom hardware does
directly support HMAC.  The old way, where engines saw only raw hash
operations (MD5 or SHA) even though SSLv3 or TLS was doing HMAC, was
completely insane.

I wish there were a way for an engine to provide an SSL-record-encryption
or -decryption method.  Most modern accelerators do those, too.

Thor


Re: OpenSSL performance woes with ubsec crypto engine (Broadcom BCM5820/BCM5823/BMC5825/BMC582x)

2008-01-30 Thread Paul Sheer
No, I meant that I am already getting 2000/sec on the *server*.

By my calculations I should be able to get 3000/sec on the server
with the optimizations I want to do.


> 2000/sec is a good place to be, on a client.  Expect less on a
> server, unfortunately.
>
> > I replaced OPENSSL_cleanse() {...} with { memset(); } already
> > - IT WAS THE TOP FUNCTION IN MY FIRST GPROF RUN!
>
> Yes.  This is the OPENSSL_cleanse() of a maximum-sized SSL record,
> right at the outset of any session.  It is amazingly expensive, but
> I have had trouble ascertaining whether it can be safely arranged
> for it to zero less data.


It's the algorithm that's expensive. What's wrong with memset? *duck*
I memset the full packet length and it drops off the first page of gprof
output. I mean, how paranoid do you need to be here?


> Which version of OpenSSL are you using?


openssl-0.9.8g


> It appears that in -current
> an engine can provide an HMAC method,


oh? ok thanks


> and the Broadcom hardware does
> directly support HMAC.  The old way, where engines saw only raw hash
> operations (MD5 or SHA) even though SSLv3 or TLS was doing HMAC, was
> completely insane.
>
> I wish there were a way for an engine to provide an SSL-record-encryption
> or -decryption method.  Most modern accelerators do those, too.
>
>
Yep

-paul


Re: OpenSSL performance woes with ubsec crypto engine (Broadcom BCM5820/BCM5823/BMC5825/BMC582x)

2008-01-30 Thread Thor Lancelot Simon
On Wed, Jan 30, 2008 at 09:32:34PM +0200, Paul Sheer wrote:
> Hi,
> 
> I have a BMC5825 card from Silicom that is supposed to do over
> 10'000 rsa per second.

Never going to happen.  The context switches to talk to the
accelerator are too expensive, and OpenSSL doesn't support (nor
has any way to support) modern accelerators' SSL-handshake or
SSL-record operations.

2000/sec is a good place to be, on a client.  Expect less on a
server, unfortunately.

> I replaced OPENSSL_cleanse() {...} with { memset(); } already
> - IT WAS THE TOP FUNCTION IN MY FIRST GPROF RUN!

Yes.  This is the OPENSSL_cleanse() of a maximum-sized SSL record,
right at the outset of any session.  It is amazingly expensive, but
I have had trouble ascertaining whether it can be safely arranged
for it to zero less data.

If any of the OpenSSL developers are listening, I would really love
some feedback on this.
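
One way the "zero less data" idea could be approached, purely as a sketch:
remember how far into the record buffer anything was ever written and wipe
only that prefix. The struct, the function names and the buffer size below
are hypothetical, and whether this is safe inside OpenSSL's record layer is
exactly the open question raised above.

```c
#include <stddef.h>
#include <string.h>

#define REC_BUF_MAX 16384  /* illustrative size, not taken from OpenSSL */

/* Hypothetical record buffer that remembers the highest offset written.
 * Must start zero-initialised so that high_water is 0. */
struct rec_buf {
    unsigned char data[REC_BUF_MAX];
    size_t high_water;     /* bytes [0, high_water) may hold secrets */
};

/* Copy len bytes to offset off; returns 0 on success, -1 if out of range. */
int rec_buf_write(struct rec_buf *b, size_t off, const void *src, size_t len)
{
    if (off > REC_BUF_MAX || len > REC_BUF_MAX - off)
        return -1;
    memcpy(b->data + off, src, len);
    if (off + len > b->high_water)
        b->high_water = off + len;
    return 0;
}

/* Wipe only the part that was ever written; returns the bytes wiped. */
size_t rec_buf_cleanse(struct rec_buf *b)
{
    volatile unsigned char *p = b->data;  /* volatile: keep the stores */
    size_t n = b->high_water;
    for (size_t i = 0; i < n; i++)
        p[i] = 0;
    b->high_water = 0;
    return n;
}
```

For a session that only ever handles small records, the wipe then touches
a few bytes instead of the whole maximum-sized buffer.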

> The card supports hardware SHA1 and MD5 - but it's not used
> because OpenSSL divides each md operation into an init(),
> update() and final() stage. But the card wants a one shot.
> So the crypto card API does not fit the software API

Which version of OpenSSL are you using?  It appears that in -current
an engine can provide an HMAC method, and the Broadcom hardware does
directly support HMAC.  The old way, where engines saw only raw hash
operations (MD5 or SHA) even though SSLv3 or TLS was doing HMAC, was
completely insane.

I wish there were a way for an engine to provide an SSL-record-encryption
or -decryption method.  Most modern accelerators do those, too.

Thor


OpenSSL performance woes with ubsec crypto engine (Broadcom BCM5820/BCM5823/BMC5825/BMC582x)

2008-01-30 Thread Paul Sheer
Hi,

I have a BMC5825 card from Silicom that is supposed to do over
10'000 rsa per second.

In practice Proto Balance can do about 1900 fresh SSL connections
per second on an Intel Core2 Duo at 2.2 GHz. But I think more work
can vastly improve this.

(Without the card I get about 700 per second, so the card brings
throughput to about 270% of the software-only figure, roughly a
2.7x speedup.)

I compiled with -O1 -g -pg and the gprof output is below.

I replaced OPENSSL_cleanse() {...} with { memset(); } already
- IT WAS THE TOP FUNCTION IN MY FIRST GPROF RUN!

My test does not use sessions. It downloads a minimal web
page, "", with 200 clients concurrently.

The malloc at the top is surprisingly expensive: it is called
mostly from EVP_DigestInit_ex(). Refactoring to eliminate
this malloc would be worthwhile I think.
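
As a rough illustration of what such a refactoring might look like, the
sketch below keeps a digest context's state buffer across
re-initialisations and only allocates when the existing buffer is too
small. The struct and functions are hypothetical and are not the EVP API;
a real implementation would also have to wipe old state before reuse.

```c
#include <stdlib.h>

/* Hypothetical context that keeps its state buffer between uses. */
struct md_ctx {
    void  *state;      /* algorithm-specific state */
    size_t state_cap;  /* bytes currently allocated for state */
};

/* Prepare the context for a digest needing state_size (> 0) bytes.
 * Returns 1 if an allocation was needed, 0 if the buffer was reused,
 * -1 on allocation failure. */
int md_ctx_init(struct md_ctx *c, size_t state_size)
{
    if (c->state != NULL && c->state_cap >= state_size)
        return 0;                    /* reuse: no malloc on the hot path */
    void *p = realloc(c->state, state_size);
    if (p == NULL)
        return -1;
    c->state = p;
    c->state_cap = state_size;
    return 1;
}

/* Release the buffer and reset the context. */
void md_ctx_free(struct md_ctx *c)
{
    free(c->state);
    c->state = NULL;
    c->state_cap = 0;
}
```

With a context that lives for the whole connection, repeated init calls
for the same digest would then allocate once instead of every time.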

The card supports hardware SHA1 and MD5 - but it's not used
because OpenSSL divides each md operation into an init(),
update() and final() stage. But the card wants a one shot.
So the crypto card API does not fit the software API

:-(

OpenSSL *really* needs to be fixed to properly support
hardware md's
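
To make the API mismatch concrete, here is a minimal sketch of an adapter
that accepts init()/update()/final() calls, buffers the data, and hands
the whole message to a one-shot routine at final(). Everything in it is an
assumption for illustration: the one_shot_fn signature, the bmd_* names,
and demo_one_shot, which is a software stand-in (a byte sum) rather than
anything the card or the ubsec engine provides.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed shape of a one-shot device call (hypothetical). */
typedef int (*one_shot_fn)(const unsigned char *msg, size_t len,
                           unsigned char *out);

struct buffered_md {
    unsigned char *buf;
    size_t len, cap;
    one_shot_fn hw;
};

/* Stand-in for the hardware: writes the byte sum of msg to out[0]. */
int demo_one_shot(const unsigned char *msg, size_t len, unsigned char *out)
{
    unsigned char sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (unsigned char)(sum + msg[i]);
    out[0] = sum;
    return 1;
}

int bmd_init(struct buffered_md *m, one_shot_fn hw)
{
    m->buf = NULL; m->len = 0; m->cap = 0; m->hw = hw;
    return 1;
}

/* update(): only accumulate; nothing is sent to the device yet. */
int bmd_update(struct buffered_md *m, const void *data, size_t n)
{
    size_t needed = m->len + n;
    if (needed < m->len)
        return 0;                         /* size_t overflow */
    if (needed > m->cap) {
        size_t cap = m->cap ? m->cap : 256;
        while (cap < needed) {
            if (cap > SIZE_MAX / 2) { cap = needed; break; }
            cap *= 2;
        }
        unsigned char *p = realloc(m->buf, cap);
        if (p == NULL)
            return 0;
        m->buf = p;
        m->cap = cap;
    }
    if (n > 0)
        memcpy(m->buf + m->len, data, n);
    m->len = needed;
    return 1;
}

/* final(): hand the whole message to the device in one call. */
int bmd_final(struct buffered_md *m, unsigned char *out)
{
    int ok = m->hw(m->buf, m->len, out);
    free(m->buf);
    m->buf = NULL; m->len = 0; m->cap = 0;
    return ok;
}
```

The price of this approach is that the entire message is copied and held
in memory until final(), which may cancel part of the hardware gain.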

I see Silicom's BMC586x/BMC5861/BMC5862 OpenSSL patch
plugs in code everywhere to directly call their card's
SSL signing function - a sorry solution indeed.

By eliminating the top 6 functions listed below, another 30%
cpu can be saved at least.

Kind regards

-paul


--=--


Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 15.22      0.28     0.28   861099     0.00     0.00  malloc  << !!
  8.15      0.43     0.15                             md5_block_asm_host_order
  6.52      0.55     0.12   461101     0.00     0.00  sha1_block_host_order
  4.89      0.64     0.09   234451     0.00     0.00  sha1_block_data_order
  3.80      0.71     0.07   340275     0.00     0.00  sslconnection_thread_bas
  2.72      0.76     0.05   725772     0.00     0.00  SHA1_Update
  2.72      0.81     0.05   673096     0.00     0.00  asn1_i2d_ex_primitive
  2.17      0.85     0.04        1     0.04     1.51  _thread_os_thread
  2.17      0.89     0.04                             RC4
  1.63      0.92     0.03   818060     0.00     0.00  asn1_ex_i2c
  1.63      0.95     0.03   438186     0.00     0.00  HMAC_Init_ex
  1.63      0.98     0.03   355077     0.00     0.00  SHA1_Final
  1.63      1.01     0.03    82992     0.00     0.00  ssl3_read_bytes
  1.09      1.03     0.02  1361354     0.00     0.00  EVP_MD_CTX_cleanup


RE: Static global - bug? (Re: Two valgrind warnings in OpenSSL -possible bug???)

2008-01-30 Thread David Schwartz

> >  3) You cannot link to the pthreads library and still use fork, and

> David, you absolutely cannot link with pthreads and still use fork()

> It doesn't work except in a few very simplistic scenarios.

> -paul

What you are saying just doesn't make any sense. I agree that it is
difficult to use fork properly in a process that creates multiple threads.
But this has nothing whatsoever to do with linking with the pthreads library
nor with compiling your code multi-threaded.

You can write code that has multiple internal models, say one that uses
'fork' the way unthreaded processes normally do and one that uses
multiple-threads, and compile it using your platform's options for compiling
a multi-threaded process. You can then select at run time whether to create
threads or to call 'fork'.

On no platform that claims POSIX compliance will you have any problems at
all. The problem you are complaining about simply does not exist.

Yes, it's difficult to use 'fork' in a process that actually creates
multiple threads. But whether or not a process *creates* multiple threads is
not a compilation issue, it's a run time issue. So it can't possibly cause
you to need to compile two versions.

I defy you to show me any platform where 'fork' breaks just because you
specify multi-threaded compiler options or link to the threading library.
You can most certainly do:

int multi_threaded;

if(multi_threaded)
{
 // some code that calls pthread_create
}
else
{
 // some code that calls fork
}

And compile it multi-threaded and link it to the pthreads library and you
will have *NO* issues with 'fork'.

The issues you are talking about with 'fork' are all run-time issues. None
of them require compiling two copies of a library.
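
The fragment above can be fleshed out into something compilable. The
sketch below is only an illustration of choosing between fork and a
thread at run time in code built with threading support; run_job and
thread_body are hypothetical names, and it says nothing about calling
fork after other threads have been created.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Thread entry point: hands the result back through shared memory. */
static void *thread_body(void *arg)
{
    *(int *)arg = 42;
    return NULL;
}

/* Runs the job in a thread or in a forked child, chosen at run time.
 * Returns the job's result (42 here), or -1 on error. */
int run_job(int multi_threaded)
{
    if (multi_threaded) {
        pthread_t t;
        int result = -1;
        if (pthread_create(&t, NULL, thread_body, &result) != 0)
            return -1;
        if (pthread_join(t, NULL) != 0)
            return -1;
        return result;
    } else {
        int status;
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(42);          /* child: report result via exit status */
        if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
            return -1;
        return WEXITSTATUS(status);
    }
}
```

The same binary serves both models; only the value of the flag decides
which branch runs.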

DS




Re: memory corruption after usin BN_mod_inverse

2008-01-30 Thread Евгений Ломовский
Hi, Yair Elharrar!

To me it looks bad. :-/ BN_sub does not handle this situation (r == b):
 1) BN_sub calls BN_uadd(r,a,b), but r == b; then
 2) BN_sub changes r->neg, but r == b; then
 3) BN_sub calls BN_expand(r); then
 4) BN_sub calls BN_ucmp(a,b), but b here is no longer the b that was
passed in at the start of BN_sub; then
 5) BN_sub calls BN_usub(r,a,b) or BN_usub(r,b,a), but ...

Maybe I chose the wrong words, but my point was that calling
BN_sub(Y,n,Y) from BN_mod_inverse leads to unpredictable behavior. This
is not a question about the C standard itself but about how it is used.
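
The kind of hazard described here can be shown on plain ints. This is a
toy illustration only, with hypothetical function names; it does not
demonstrate that BN_sub itself behaves this way.

```c
/* Two-step subtraction that writes r before it has finished reading b. */
void sub_two_step(int *r, const int *a, const int *b)
{
    *r = *a;      /* if r == b, this overwrites b's value ... */
    *r -= *b;     /* ... so this subtracts a from itself */
}

/* Alias-safe variant: read both inputs before touching the output. */
void sub_alias_safe(int *r, const int *a, const int *b)
{
    int av = *a, bv = *b;
    *r = av - bv;
}
```

With n = 10 and y = 3, sub_two_step(&y, &n, &y) leaves y at 0, while
sub_alias_safe(&y, &n, &y) leaves y at 7. Whether a function tolerates
r == b therefore depends on the order of its reads and writes.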


RE: memory corruption after usin BN_mod_inverse

2008-01-30 Thread Yair Elharrar
Hi Eugene,
ISO/IEC 9899 doesn't discuss this directly, but says in section 6.7.5.1:

"...const int *ptr_to_constant;
int *const constant_ptr;
The contents of any object pointed to by ptr_to_constant shall not be modified 
through that pointer..."

in BN_sub, "b" is a const BIGNUM *, hence the content referenced by it may not 
be modified _through b_.
The content (*b) cannot be placed in read-only storage as it is referenced, not 
created, by this declaration.
This implies that it's OK to modify it _through r_.

If you were to create a const BIGNUM Z, then attempt to BN_sub(&Z, n, &Z) then 
you would be violating constness by passing Z as the first (non-const) 
argument. As it stands, however, the code looks fine to me.
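
A small sketch of the const point, using a stand-in struct rather than
BIGNUM (struct num and num_sub are hypothetical): the const qualifier on b
restricts writes made through b, not writes to the same non-const object
made through r.

```c
struct num { int v; };

/* b is a pointer to const: this function may not write through b,
 * but the object b points to need not itself be const. */
int num_sub(struct num *r, const struct num *a, const struct num *b)
{
    /* Both operands are read before the single store, so this is
     * well defined even when r and b point to the same object. */
    r->v = a->v - b->v;
    return 1;
}
```

Calling num_sub(&y, &n, &y) with non-const y compiles without a
diagnostic and stores n.v - y.v into y. This covers only the const
question, not whether a multi-step routine copes with aliasing.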

-Yair


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: Wednesday, January 30, 2008 5:16 PM
To: openssl-dev@openssl.org
Subject: Re: memory corruption after usin BN_mod_inverse


Hi, Yair Elharrar!

> Sorry, I don't think that breaks any const rules.
> See explanation and example in ISO/IEC 14882 section 7.1.5.1.

First of all, OpenSSL is written in C, so ISO/IEC 14882 is not the
standard to refer to (it is the C++ standard).

Let's look at ISO/IEC 9899 section 6.7.3:
"The implementation may place a const object that is not volatile in a
read-only region of storage." That's enough.

Then, if you look at BN_sub you'll easily see that the behavior will
be undefined if r and b point to the same object.

--
 Eugene.

This email and any files transmitted with it are confidential material. They 
are intended solely for the use of the designated individual or entity to whom 
they are addressed. If the reader of this message is not the intended 
recipient, you are hereby notified that any dissemination, use, distribution or 
copying of this communication is strictly prohibited and may be unlawful.

If you have received this email in error please immediately notify the sender 
and delete or destroy any copy of this message


Re: memory corruption after usin BN_mod_inverse

2008-01-30 Thread Евгений Ломовский
Hi, Yair Elharrar!

> Sorry, I don't think that breaks any const rules.
> See explanation and example in ISO/IEC 14882 section 7.1.5.1.

First of all, OpenSSL is written in C, so ISO/IEC 14882 is not the
standard to refer to (it is the C++ standard).

Let's look at ISO/IEC 9899 section 6.7.3:
"The implementation may place a const object that is not volatile in a
read-only region of storage." That's enough.

Then, if you look at BN_sub you'll easily see that the behavior will
be undefined if r and b point to the same object.

-- 
 Eugene.


RE: memory corruption after usin BN_mod_inverse

2008-01-30 Thread Yair Elharrar
Sorry, I don't think that breaks any const rules.
See explanation and example in ISO/IEC 14882 section 7.1.5.1.


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: Wednesday, January 30, 2008 3:59 PM
To: openssl-dev@openssl.org
Subject: memory corruption after usin BN_mod_inverse


Hello!

While investigating the OpenSSL source I found a strange call in
function BN_mod_inverse:
...
    if (sign < 0)
        {
        if (!BN_sub(Y,n,Y)) goto err;
        }
...

But the declaration of BN_sub looks like this:
int BN_sub(BIGNUM *r, const BIGNUM *a, const BIGNUM *b)

In some circumstances r will be expanded in BN_sub, so the original call
"BN_sub(Y,n,Y)" breaks the const rule.

--
 Eugene.


memory corruption after usin BN_mod_inverse

2008-01-30 Thread Евгений Ломовский
Hello!

While investigating the OpenSSL source I found a strange call in
function BN_mod_inverse:
...
    if (sign < 0)
        {
        if (!BN_sub(Y,n,Y)) goto err;
        }
...

But the declaration of BN_sub looks like this:
int BN_sub(BIGNUM *r, const BIGNUM *a, const BIGNUM *b)

In some circumstances r will be expanded in BN_sub, so the original call
"BN_sub(Y,n,Y)" breaks the const rule.

-- 
 Eugene.


[openssl.org #1637] Memory leak in SSL_set_tlsext_host_name

2008-01-30 Thread Paul Stewart via RT
Memory allocated in SSL_set_tlsext_host_name() isn't freed in  
SSL_free().  As a workaround one can do
SSL_set_tlsext_host_name(ssl, NULL) before SSL_free(), but I don't  
imagine this was what was meant to be implemented.  The bug is easy  
to replicate using the code below, using valgrind or your favorite  
memory profiler.  This was not a problem back in 0.9.8b, but 0.9.8f  
and 0.9.8g have this problem.


#include <openssl/ssl.h>

int main(int argc, char **argv) {
 SSL *ssl;
 SSL_CTX *ctx;
 SSL_library_init();

 ctx = SSL_CTX_new(SSLv23_client_method());
 ssl = SSL_new(ctx);
 SSL_set_tlsext_host_name(ssl, "hostname");
 SSL_free(ssl);
 SSL_CTX_free(ctx);

 CRYPTO_cleanup_all_ex_data();

 return 0;
}



Re: Static global - bug? (Re: Two valgrind warnings in OpenSSL -possible bug???)

2008-01-30 Thread Paul Sheer
>  So you had a bug in your code. So what?

No bug - read this:

http://www.unix.org/version2/whatsnew/threadspaper.ps :



Registration of fork handlers (pthread_atfork()). The fork handlers are
routines that are to be executed in association with calls to the fork()
function. There are three classes of fork handlers: prepare, parent, and
child. Prepare fork handlers are executed prior to fork() processing, in
the context of the calling thread. Parent fork handlers are executed upon
completion of fork() processing in the parent, again in the context of the
calling thread. Child fork handlers are executed upon completion of fork()
processing in the child, in the context of the single thread initially
existing in the child process.

Fork handlers are envisioned as a mechanism for dealing with the problem of
orphaned mutexes that can occur when a multi-threaded process calls fork().
The problem arises when threads other than the calling thread own mutexes
at the time of the call to fork(). Since the non-calling threads are not
replicated in the child process, the child process is created with mutexes
locked by non-existent threads. These mutexes can therefore never be
unlocked.

Fork handlers are intended to resolve the problem of orphaned mutexes in the
following way. Prepare fork handlers can be written to lock all mutexes. In
this way, orphaned mutexes are avoided, and the resources protected by the
mutexes are not left in inconsistent states. This is due to the fact that
the calling thread itself, which is replicated in the child process, has
locked all mutexes. Thus, both the parent and child processes have all
mutexes locked upon completion of fork() processing, at which time the
parent and child fork handlers execute. The parent and child fork handlers
unlock mutexes locked by the prepare fork handler.

Fork handlers are especially useful in enabling independently-developed
libraries and application programs to protect themselves from one another.
A multi-threaded library can protect itself from application programs that
issue fork() operations, possibly without even knowing that the library is
multi-threaded, by providing fork handlers. Similarly, an application
program can protect itself from fork() operations issued by library
functions
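
The prepare/parent/child scheme quoted above might be sketched as follows
for a library with a single internal mutex. The names lib_lock,
lib_install_fork_handlers and fork_and_check_lock are hypothetical; the
second function exists only to show that the child can take the mutex
after fork().

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t lib_lock = PTHREAD_MUTEX_INITIALIZER;

/* prepare: take the lock so no other thread holds it across fork(). */
static void prepare(void)   { pthread_mutex_lock(&lib_lock); }
/* parent and child: release the lock taken by prepare. */
static void in_parent(void) { pthread_mutex_unlock(&lib_lock); }
static void in_child(void)  { pthread_mutex_unlock(&lib_lock); }

/* Register the handlers once, e.g. from library initialisation.
 * Returns 0 on success, like pthread_atfork(). */
int lib_install_fork_handlers(void)
{
    return pthread_atfork(prepare, in_parent, in_child);
}

/* Forks and checks that the child can take lib_lock.
 * Returns 0 if the child locked and unlocked it, -1 otherwise. */
int fork_and_check_lock(void)
{
    int status;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int rc = pthread_mutex_trylock(&lib_lock);
        if (rc == 0)
            pthread_mutex_unlock(&lib_lock);
        _exit(rc == 0 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

A library with several mutexes would lock them all in prepare, in a fixed
order, and unlock them in the other two handlers.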


>  3) You cannot link to the pthreads library and still use fork, and
David, you absolutely cannot link with pthreads and still use fork()

It doesn't work except in a few very simplistic scenarios.

-paul