Hi,

I have a Windows multi-threaded SSL server application which handles each
client request in a new thread. The server handles different types of
requests. One of the request types is “send file”, where the server thread
has to read a file from the local filesystem and send its content to the
client.

Server configuration:
FIPS: Enabled
SSL Protocol: TLSv1.2
Cipher: AES256-SHA

It was observed that as the number of parallel threads increases, the
throughput decreases.

To profile the server, I recompiled the OpenSSL and FIPS sources with debug
symbol information. When run under the statistical profiler “verysleepy”
(http://www.codersnotes.com/sleepy), it points to the stack below as the
hotspot which was consuming most of the
time.

###################################
Function               Module     Source file                                                 Line  Address
WaitForSingleObjectEx  KERNELBASE [unknown]                                                   0     0x7fefd2610dc
CRYPTO_lock            LIBEAY64   c:\openssl_src\openssl-1.0.2f\crypto\cryptlib.c             597   0xfb0bb26
FIPS_lock              LIBEAY64   c:\fips_src\openssl-fips-2.0.10\fips\utl\fips_lck.c         69    0xfceb291
fips_drbg_bytes        LIBEAY64   c:\fips_src\openssl-fips-2.0.10\fips\rand\fips_drbg_rand.c  86    0xfcfe868
RAND_bytes             LIBEAY64   c:\openssl_src\openssl-1.0.2f\crypto\rand\rand_lib.c        159   0xfc0dbe5
tls1_enc               SSLEAY64   c:\openssl_src\openssl-1.0.2f\ssl\t1_enc.c                  786   0x3b6675c
do_ssl3_write          SSLEAY64   c:\openssl_src\openssl-1.0.2f\ssl\s3_pkt.c                  1042  0x3b4c336
ssl3_write_bytes       SSLEAY64   c:\openssl_src\openssl-1.0.2f\ssl\s3_pkt.c                  830   0x3b4badd
ssl3_write             SSLEAY64   c:\openssl_src\openssl-1.0.2f\ssl\s3_lib.c                  4404  0x3b4796c
SSL_write              SSLEAY64   c:\openssl_src\openssl-1.0.2f\ssl\ssl_lib.c                 1047  0x3b7a3e4
###################################

To check if this behavior can
be seen outside of our code, I wrote a standalone multi-threaded SSL server
which performs the same task as “send file”. Profiling the standalone server
points to a similar stack, so I was able to reproduce this behavior in a
standalone program.

File size used: 340 MB

To
find out how the bottleneck varies as the parallel thread count increases
in the standalone SSL server program, I analyzed the behavior of one thread
at different levels of parallelism. Here are the results:

######################
“Parallel thread count” -> “% of time spent waiting for the global lock”
1 -> 1 %
2 -> 2 %
5 -> 5 %
10 -> 40 %
15 -> 46 %
20 -> 65 %
25 -> 68 %
30 -> 70 %
######################

After digging into the FIPS code, I found
that there is a global lock around the random number generation code, which
is causing the bottleneck when multiple threads want to perform SSL_write
operations in parallel.

Code snippet from
fips/rand/fips_drbg_rand.c:

######################
/* Since we only have one global PRNG used at any time in OpenSSL use a global
 * variable to store context.
 */
static DRBG_CTX ossl_dctx;
….
….
static int fips_drbg_bytes(unsigned char *out, int count)
{
    DRBG_CTX *dctx = &ossl_dctx;
    int rv = 0;
    unsigned char *adin = NULL;
    size_t adinlen = 0;
    CRYPTO_w_lock(CRYPTO_LOCK_RAND);
    ….
    ….
    CRYPTO_w_unlock(CRYPTO_LOCK_RAND);
######################

As the comment from
fips_drbg_rand.c says, do we really need to have only one global PRNG at any
time in OpenSSL?

Does anyone have any suggestions on how the starvation of parallel SSL_write
calls (due to the global lock) can be reduced?

Any suggestions are welcome :)

Thanks,
Dipak
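
P.S. To make the question concrete, here is a minimal sketch of the locking pattern and of one possible way to take the lock less often. It is not OpenSSL or FIPS code. Every name in it (global_rand_bytes, pooled_rand_bytes, simulate_send, POOL_SIZE) is hypothetical, and the "generator" is a placeholder counter rather than a DRBG. It assumes that each TLS record triggers one random request of 16 bytes (presumably the explicit IV of the CBC cipher, going by the tls1_enc -> RAND_bytes frames in the stack), that records carry the maximum 16384 bytes of plaintext, and that 340 MB means 340 * 1024 * 1024 bytes. Whether buffering generator output per thread is acceptable under the FIPS security policy is an open question that the sketch does not answer.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define RECORD_SIZE 16384 /* maximum TLS plaintext fragment size */
#define IV_LEN      16    /* AES block size; assumed per-record request */
#define POOL_SIZE   4096  /* hypothetical per-thread buffer size */

/* Hypothetical stand-in for the single global generator behind one lock.
 * It only counts lock acquisitions; its output is NOT random. */
static pthread_mutex_t global_rand_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long global_lock_acquisitions = 0;
static unsigned char global_counter = 0;

static void global_rand_bytes(unsigned char *out, size_t count)
{
    pthread_mutex_lock(&global_rand_lock);
    global_lock_acquisitions++;
    for (size_t i = 0; i < count; i++)
        out[i] = global_counter++; /* placeholder output */
    pthread_mutex_unlock(&global_rand_lock);
}

/* Reads the acquisition counter under the lock, without counting itself. */
static unsigned long lock_acquisitions(void)
{
    pthread_mutex_lock(&global_rand_lock);
    unsigned long n = global_lock_acquisitions;
    pthread_mutex_unlock(&global_rand_lock);
    return n;
}

/* Per-thread buffer: used == POOL_SIZE means the buffer is empty. */
typedef struct {
    unsigned char buf[POOL_SIZE];
    size_t used;
} rand_pool;

static _Thread_local rand_pool tls_pool = { .used = POOL_SIZE };

/* Serves requests from the calling thread's buffer and takes the global
 * lock only when the buffer has to be refilled. */
static void pooled_rand_bytes(unsigned char *out, size_t count)
{
    while (count > 0) {
        if (tls_pool.used == POOL_SIZE) {
            global_rand_bytes(tls_pool.buf, POOL_SIZE);
            tls_pool.used = 0;
        }
        assert(tls_pool.used < POOL_SIZE);
        size_t avail = POOL_SIZE - tls_pool.used;
        size_t n = count < avail ? count : avail;
        memcpy(out, tls_pool.buf + tls_pool.used, n);
        /* Wipe bytes that were handed out so they are never reused. */
        memset(tls_pool.buf + tls_pool.used, 0, n);
        tls_pool.used += n;
        out += n;
        count -= n;
    }
}

/* Models one thread sending file_size bytes: one IV request per record.
 * Returns the number of records "sent". */
static unsigned long simulate_send(size_t file_size, int use_pool)
{
    unsigned char iv[IV_LEN];
    unsigned long records = 0;
    for (size_t sent = 0; sent < file_size; sent += RECORD_SIZE) {
        if (use_pool)
            pooled_rand_bytes(iv, sizeof iv);
        else
            global_rand_bytes(iv, sizeof iv);
        records++;
    }
    return records;
}
```

Under these assumptions, calling simulate_send((size_t)340 * 1024 * 1024, 0) from one thread takes the global lock 21760 times (once per record), while the pooled variant takes it 85 times for the same file. Every thread sending a file pays the first cost, which would fit the growing wait times in the table above.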
--
openssl-users mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-users