Re: major ssl read/ write performance improvement

Deng Michael Sat, 03 Dec 2011 15:56:12 -0800

Hi Andrey,

Thanks for trying it out. I did not try this version with many engines, so I am 
very interested in your setup. Could you run it without the patch under gdb and 
see how it behaves?


What is the value of "ctx->pctx->pmeth->signctx" when the function is entered, 
and what is the value of "tmp_ctx.pctx->pmeth->signctx" after the ctx copy?


Also, I am not sure how you use engines. The patch should work if a digest engine 
is used (an engine that provides a digest such as sha1 or md5). I am not sure 
what happens if there is a "signing" engine.

It would be great if you could send me the engine code and show how your code uses 
the engine; then we could figure out how to avoid the crash. I am not sure how the 
pointer is set up by OpenSSL (I'll do some digging there), but the value "0x08" 
likely comes from a member of a structure reached through a NULL pointer (the 
member happens to be at offset 8). This is a guess.
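
To illustrate the guess above, here is a minimal sketch of how an address of 0x08 can arise from a NULL base pointer plus a member offset. The structure demo_method and the function member_address_with_null_base are hypothetical names made up for this illustration; they are not OpenSSL's EVP_PKEY_METHOD or any real OpenSSL code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration only; this is NOT OpenSSL's
 * EVP_PKEY_METHOD. Two 4-byte fields precede the member of interest,
 * so that member sits at offset 8. */
struct demo_method {
    uint32_t id;
    uint32_t flags;
    uint32_t third_member;
};

/* Address that code would compute for third_member if the structure
 * pointer were NULL (address 0): base 0 plus the member offset.
 * offsetof is used so that no NULL pointer is actually dereferenced. */
uintptr_t member_address_with_null_base(void)
{
    return (uintptr_t)0 + offsetof(struct demo_method, third_member);
}
```

With this layout the function returns 8, i.e. 0x08. Whether this is what actually happens inside the patched code is still only a guess until the gdb values are known.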

Regards,

Michael Deng

[email protected]


----- Original Message -----
From: Andrey Kulikov <[email protected]>
To: [email protected]
Cc: 
Sent: Saturday, December 3, 2011 5:15 PM
Subject: Re: major ssl read/ write performance improvement

Hello,

Thanks for the interesting contribution!

Unfortunately, when I apply the patch, s_server fails with a SEGFAULT
when using the ccgost engine (and possibly others) here:

EVP_DigestSignFinal
        if (sctx)
>>>>>      r = md_ctx_ptr->pctx->pmeth->signctx(md_ctx_ptr->pctx,
                               sigret, siglen, md_ctx_ptr);
        else

because of
pmeth->signctx == 0x08
(or something like this)

When I use an RSA certificate the segfault does not occur, as pmeth->signctx
points to some valid place.

Stacktrace is:

EVP_DigestSignFinal (ctx=0x87802b0, sigret=0xbfd5f6dc "\b\002x\b\001", siglen=0xbfd5f698)
tls1_mac (ssl=0x877a088, md=0xbfd5f6dc "\b\002x\b\001", send=0)
ssl3_get_record (s=0x877a088)
ssl3_read_bytes (s=0x877a088, type=22, buf=0x8788d50 "\020", len=4, peek=0)
ssl3_get_message (s=0x877a088, st1=8608, stn=8609, mt=-1, max=514, ok=0xbfd5f8b8)
ssl3_get_cert_verify (s=0x877a088)
ssl3_accept (s=0x877a088)
ssl3_read_bytes (s=0x877a088, type=23, buf=0x877e7e8 "", len=4096, peek=0)
ssl3_read_internal (s=0x877a088, buf=0x877e7e8, len=4096, peek=0)
ssl3_read (s=0x877a088, buf=0x877e7e8, len=4096)
SSL_read (s=0x877a088, buf=0x877e7e8, num=4096)
ssl_read (b=0x8779370, out=0x877e7e8 "", outl=4096)
BIO_read (b=0x8779370, out=0x877e7e8, outl=4096)
buffer_gets (b=0x8777e00, buf=0x877a7e0 "", size=16382)
BIO_gets (b=0x8777e00, in=0x877a7e0 "", inl=16383)
www_body (hostname=0x0, s=6, context=0x0)
do_server (port=443, type=1, ret=0x8248ac8, cb=0x8072d24 <www_body>, context=0x0)
s_server_main (argc=0, argv=0xbfd602b8)
do_cmd (prog=0x8770868, argc=16, argv=0xbfd60278)
main (Argc=16, Argv=0xbfd60278)


Could you please advise what is going wrong with your code?

To reproduce it you need to:
1. Adjust your openssl.cnf file by adding the following:

openssl_conf = openssl_def

[openssl_def]
engines = engine_section

[engine_section]
gost = gost_section

[gost_section]
engine_id = gost
default_algorithms = ALL

somewhere before "[ new_oids ]" (if we are talking about the sample config
file from the OpenSSL distribution).

2. Generate a private key:
./apps/openssl genpkey -engine gost -algorithm gost2001 -pkeyopt paramset:A -out botkey.p8

3. Create a self-signed certificate:
./apps/openssl req -x509 -days 1095 -subj '/C=US/CN=ccgost_srv/[email protected]' -engine gost -new -key botkey.p8 -out botcert.cer

4. Run s_server:
./apps/openssl s_server -engine gost -tls1 -www -accept 443 -state -cert botcert.cer -key botkey.p8 -cipher "aGOST01"

5. Run s_client:
./apps/openssl s_client -tls1 -connect 192.168.10.103:443 -msg

Here s_client will crash with a segfault. But if you connect via a
browser, s_server will crash instead.


Please let me know if you have any questions.

Andrey.


On 30 November 2011 05:56, Deng Michael <[email protected]> wrote:
> Thanks Steve for the comment.
>
>
> I guess there are other ways to do similar things. Since I was not sure about 
> the intentions of the original code, I tried to make the change in such a way 
> that when checkpoint is not called it behaves as before. Adding a new field 
> seemed to me less likely to interfere with other code. It seems to me that 
> the three evp_md_ctxs contained within the hmac_md_ctx have the data for 
> restoring the state, but I was not sure. Also, the new field serves as a flag 
> to tell whether it has checkpoint data (I could have used an existing flag). I 
> would think my patch also contains some hacking.
>
>
> Anyway, the real saving comes from redoing the state preservation of the 
> evp_md_ctx, which contains an evp_pkey_ctx, which in turn contains an hmac_ctx, 
> which again contains three evp_md_ctx's. The dups of these are called in
>
> tls1_mac() (and the similar place for ssl3)
> and
> EVP_DigestSignFinal()
>
> These two are the really expensive ones.
>
> The copy of ctx in
> HMAC_Final()         --- this one is not too bad
>
> can be simplified.
>
>
> I would think the saving is large enough to be worth making the change, maybe 
> in future releases.
>
> regards,
> Michael
>
>
>
> ----- Original Message -----
> From: Dr. Stephen Henson <[email protected]>
> To: [email protected]
> Cc:
> Sent: Tuesday, November 29, 2011 1:21 PM
> Subject: Re: major ssl read/ write performance improvement
>
> On Mon, Nov 28, 2011, Deng Michael wrote:
>
>> Hi,
>>
>> I have changed the MAC code, which gives a substantial improvement for both 
>> read and write (not handshake).
>>
>> The saving is fairly major: on a CPU with crypto acceleration, the change can 
>> more than double the overall SSL read/write speed for 1K records, excluding 
>> OS I/O time. This implies the change removes the majority of the code overhead 
>> for read and write.
>>
>> The basic idea is to remove all the EVP_MD_CTX duplications (which are very 
>> CPU intensive) during read and write. The original code involves numerous 
>> memory allocations and frees for each read or write, all due to the ctx's 
>> deep copy.
>>
>> The new way of keeping the ctx is to make it do a state checkpoint and restore 
>> instead of a deep copy. After this change there is NO memory allocation for 
>> read and write. The changes are not too big either.
>>
>> One catch (it should not really be a catch) is that at the application level NO 
>> MORE than one thread can work on the SAME SSL/TLS connection for read or for 
>> write (a read and a write can still be done at the same time). But I would 
>> think most apps would NEVER allow more than one thread to read or write on the 
>> same connection (I don't think it would work if you did that anyway, even 
>> without my change).
>>
>> The patch file I attached is made from the 1.0.0e version.
>>
>
> Thanks for the patch. It should really go to the request tracker RT though.
>
> There are a few problems with the patch as it stands.
>
> Firstly, new features will never be added to 1.0.0x, only security and bug
> fixes.
>
> Your patch adds a field in the middle of an EVP_MD_CTX, which will result in
> binary compatibility issues with existing applications, and that makes
> including it in 1.0.1 problematic as well. Adding the field on the end would
> result in fewer problems but it would still increase the size of EVP_MD_CTX.
>
> However I wonder if the same savings could be achieved in a different way. If
> the destination EVP_MD_CTX is the same digest as the existing one no new
> memory is allocated and it should simply memcpy the result across which should
> be a far less expensive operation.
>
> So perhaps if instead of having a temporary EVP_MD_CTX which is created and
> destroyed regularly we could have a more persistent one tied to the SSL
> structure: so the initial copy would allocate memory but subsequent ones would
> only be a memcpy? Adding fields at the end of an SSL structure is likely to
> cause far fewer problems because SSL structures are allocated using SSL_new().
>
> Steve.
> --
> Dr Stephen N. Henson. OpenSSL project core developer.
> Commercial tech support now available see: http://www.openssl.org
> ______________________________________________________________________
> OpenSSL Project                                http://www.openssl.org
> Development Mailing List                      [email protected]
> Automated List Manager                          [email protected]
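
Steve's suggestion quoted above (keep one persistent destination context so that only the first copy allocates and later copies reduce to a memcpy) can be modelled with a small toy. This is only a sketch under stated assumptions: struct toy_md_ctx, toy_ctx_copy, toy_allocs_for_records and the 64-byte state size are hypothetical and stand in for EVP_MD_CTX and its copy routine; none of this is OpenSSL code.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a digest context; NOT OpenSSL's EVP_MD_CTX. */
struct toy_md_ctx {
    int digest_id;        /* which digest the state belongs to */
    size_t state_len;     /* size of the digest-specific state */
    unsigned char *state; /* heap-allocated digest state */
};

static unsigned long toy_alloc_count = 0; /* counts successful mallocs */

/* Copy src into dst. If dst already holds state for the same digest,
 * reuse its buffer (plain memcpy); otherwise allocate a new buffer.
 * Returns 1 on success, 0 on allocation failure. */
int toy_ctx_copy(struct toy_md_ctx *dst, const struct toy_md_ctx *src)
{
    if (dst->state == NULL || dst->digest_id != src->digest_id ||
        dst->state_len != src->state_len) {
        unsigned char *buf = malloc(src->state_len);
        if (buf == NULL)
            return 0;
        toy_alloc_count++;
        free(dst->state);
        dst->state = buf;
        dst->digest_id = src->digest_id;
        dst->state_len = src->state_len;
    }
    memcpy(dst->state, src->state, src->state_len);
    return 1;
}

/* Copy the live context into one persistent destination once per
 * record and return how many allocations that took in total. */
unsigned long toy_allocs_for_records(unsigned records)
{
    unsigned char running_state[64] = {0};
    struct toy_md_ctx live = { 1, sizeof running_state, running_state };
    struct toy_md_ctx persistent = { 0, 0, NULL };
    unsigned long before = toy_alloc_count;
    unsigned i;

    for (i = 0; i < records; i++) {
        running_state[0] = (unsigned char)i; /* pretend a record was hashed */
        if (!toy_ctx_copy(&persistent, &live))
            break;
    }
    free(persistent.state);
    return toy_alloc_count - before;
}
```

In this toy, toy_allocs_for_records(1000) performs a single allocation, whereas a temporary destination created and destroyed per record would allocate once per record. How closely the real EVP_MD_CTX copy path matches this model would need to be checked against the OpenSSL source.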

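The checkpoint-and-restore idea from the original post quoted above can be sketched in the same spirit. This is a toy model, not the patch itself: struct toy_mac_ctx, its field names, the 32-byte state size and the XOR "final" step are all assumptions made for illustration. The point it shows is that the save slot lives inside the context, so saving and restoring the running state needs no allocation.

```c
#include <string.h>

#define TOY_STATE_LEN 32 /* arbitrary size chosen for the illustration */

/* Toy running-MAC context; NOT the structure used by the patch. */
struct toy_mac_ctx {
    unsigned char state[TOY_STATE_LEN];      /* live running state */
    unsigned char checkpoint[TOY_STATE_LEN]; /* preallocated save slot */
    int has_checkpoint;                      /* flag: save slot is valid */
};

/* Save the live state into the slot inside the context (no malloc). */
void toy_checkpoint(struct toy_mac_ctx *c)
{
    memcpy(c->checkpoint, c->state, TOY_STATE_LEN);
    c->has_checkpoint = 1;
}

/* Restore the saved state. Returns 1 if restored, 0 if nothing saved. */
int toy_restore(struct toy_mac_ctx *c)
{
    if (!c->has_checkpoint)
        return 0;
    memcpy(c->state, c->checkpoint, TOY_STATE_LEN);
    return 1;
}

/* Stand-in for a destructive "final" step: it produces an output and
 * wipes the live state, as finalising a real digest would consume it. */
void toy_final(struct toy_mac_ctx *c, unsigned char out[TOY_STATE_LEN])
{
    size_t i;
    for (i = 0; i < TOY_STATE_LEN; i++) {
        out[i] = (unsigned char)(c->state[i] ^ 0x5c);
        c->state[i] = 0;
    }
}

/* Per-record flow: checkpoint, finalise, restore.
 * Returns 1 if the running state survived unchanged, 0 otherwise. */
int toy_mac_record(struct toy_mac_ctx *c, unsigned char out[TOY_STATE_LEN])
{
    unsigned char before[TOY_STATE_LEN]; /* kept only to verify the restore */
    memcpy(before, c->state, TOY_STATE_LEN);
    toy_checkpoint(c);
    toy_final(c, out);
    if (!toy_restore(c))
        return 0;
    return memcmp(before, c->state, TOY_STATE_LEN) == 0;
}
```

Because the single save slot is shared by every caller of the context, two threads running toy_mac_record on the same context at once would overwrite each other's checkpoint, which corresponds to the one-thread-per-direction catch mentioned in the original post.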