> In my case, handshake rate drops down to 5-6% on the same hardware in 1.0.1c in comparison to 1.0.0i.

I was wrong. The handshake performance degradation is about 10%.
The first guilty function is EVP_DigestSignFinal(), which copies the supplied context. When I replaced EVP_DigestSignFinal() in tls1_P_hash() with a plain equivalent that does not copy the context, I got these numbers:

OpenSSL 1.0.1c - EVP_DigestSignFinalNoCopy
Digest init called 105 times.
Digest copy called 70 times.
Digest cleanup called 174 times.

That gives a 5% performance improvement over the original OpenSSL 1.0.1c (still 5% worse than 1.0.0).

The second suspect is the re-initialization of the MAC context in the tls1_P_hash() loop. With HMAC_Init_ex() it was a real re-initialization: the internal i_ctx and o_ctx were initialized only when necessary. With EVP_DigestSignInit() it is a full re-creation, and the internal i_ctx and o_ctx are always initialized. This is why we see 3 times more 'digest init' calls.

There could also be other reasons that are still hidden from me.

Eliminating the EVP_DigestSignFinal() overhead in tls1_P_hash() by replacing it with calls that do not copy the context is trivial. But how can we properly perform a true re-initialization of the MAC instead of creating it from the very beginning?

> As a drawback, keyblock setup for ciphersuites with 256-bit encryption
> and MAC key requires about 3 times more intensive usage of hash objects.
> For example, in order to perform one handshake,
> in OpenSSL 1.0.0i
> Digest init called 30 times.
> Digest copy called 69 times.
> Digest cleanup called 98 times.
>
> OpenSSL 1.0.1c
> Digest init called 105 times.
> Digest copy called 160 times.
> Digest cleanup called 264 times.
>
> ~3 times more intensive hash object usage is definitely not good for
> performance.
>
> In my case, handshake rate drops down to 5-6% on the same hardware in 1.0.1c
> in comparison to 1.0.0i.
>
> Is there any way to reduce hash objects usage, while keeping TLS 1.1/1.2
> features?
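
To make the re-initialization point concrete, here is a toy model of it. This is a sketch only and not OpenSSL code: the hash is an FNV-1a stand-in (not cryptographic), and every `toy_*` and `mac_rounds_*` name, as well as the `n_init`/`n_copy` counters, is hypothetical and exists only for this illustration. It compares re-keying an HMAC-shaped context from scratch for every MAC against keying it once and restoring the cached inner state by copy, which is roughly what a "true re-initialization" would amount to.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Counters standing in for "digest init/copy called N times". */
unsigned long n_init, n_copy;

/* Stand-in digest state: FNV-1a, NOT a cryptographic hash. */
typedef struct { uint64_t h; } toy_md;

static void toy_init(toy_md *m) { m->h = 14695981039346656037ULL; n_init++; }
static void toy_copy(toy_md *dst, const toy_md *src) { *dst = *src; n_copy++; }
static void toy_update(toy_md *m, const void *p, size_t n) {
    const unsigned char *b = p;
    while (n--) { m->h ^= *b++; m->h *= 1099511628211ULL; }
}

/* HMAC-shaped context: keyed inner/outer states plus a working state. */
typedef struct { toy_md i_ctx, o_ctx, md; } toy_hmac;

#define TOY_BLOCK 64

/* Full keying: absorbs the padded key into fresh inner and outer states. */
static void toy_hmac_setkey(toy_hmac *c, const unsigned char *key, size_t klen) {
    unsigned char pad[TOY_BLOCK];
    size_t i;
    if (klen > TOY_BLOCK) klen = TOY_BLOCK;   /* toy simplification: truncate */
    for (i = 0; i < TOY_BLOCK; i++)
        pad[i] = (unsigned char)((i < klen ? key[i] : 0) ^ 0x36);
    toy_init(&c->i_ctx);
    toy_update(&c->i_ctx, pad, TOY_BLOCK);
    for (i = 0; i < TOY_BLOCK; i++)
        pad[i] = (unsigned char)(pad[i] ^ 0x36 ^ 0x5c);   /* ipad -> opad */
    toy_init(&c->o_ctx);
    toy_update(&c->o_ctx, pad, TOY_BLOCK);
    toy_copy(&c->md, &c->i_ctx);
}

/* Cheap restart: the key is unchanged, so restore the cached inner state. */
static void toy_hmac_restart(toy_hmac *c) { toy_copy(&c->md, &c->i_ctx); }

static void toy_hmac_update(toy_hmac *c, const void *p, size_t n) {
    toy_update(&c->md, p, n);
}

/* Finish: feed the inner result into a copy of the keyed outer state. */
static uint64_t toy_hmac_final(toy_hmac *c) {
    uint64_t inner = c->md.h;
    toy_md out;
    toy_copy(&out, &c->o_ctx);
    toy_update(&out, &inner, sizeof inner);
    return out.h;
}

/* Strategy A: re-create the keyed context for every MAC. */
uint64_t mac_rounds_rekey(const unsigned char *key, size_t klen,
                          const char *msg, int rounds) {
    toy_hmac c;
    uint64_t tag = 0;
    int i;
    for (i = 0; i < rounds; i++) {
        toy_hmac_setkey(&c, key, klen);
        toy_hmac_update(&c, msg, strlen(msg));
        tag = toy_hmac_final(&c);
    }
    return tag;
}

/* Strategy B: key once, then only restart between MACs. */
uint64_t mac_rounds_restart(const unsigned char *key, size_t klen,
                            const char *msg, int rounds) {
    toy_hmac c;
    uint64_t tag = 0;
    int i;
    toy_hmac_setkey(&c, key, klen);
    for (i = 0; i < rounds; i++) {
        toy_hmac_restart(&c);
        toy_hmac_update(&c, msg, strlen(msg));
        tag = toy_hmac_final(&c);
    }
    return tag;
}
```

In this model, five MACs under the same key cost 10 digest inits with strategy A and 2 with strategy B, and both produce the same tag; the copy counts stay close (10 versus 11). For comparison, HMAC_Init_ex() is documented to reuse the existing key when it is called with a NULL key and NULL md, which is the real-API counterpart of `toy_hmac_restart()`. The numbers here come from the toy only and say nothing about actual OpenSSL timings.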