Jean-Daniel Cryans has submitted this change and it was merged.

Change subject: KUDU-1964. security: avoid calling ERR_clear_error() defensively
......................................................................
KUDU-1964. security: avoid calling ERR_clear_error() defensively

This changes our various code which wraps OpenSSL to avoid calling
ERR_clear_error() defensively. Instead, we now make sure to call
ERR_clear_error() after any case where we receive an error return value
from an OpenSSL library call. For cases where we use the existing
wrapper macros like OPENSSL_RET_NOT_OK, this is already ensured, since
the GetOpenSSLErrors() call clears the error queue.

This provides a performance boost, since ERR_clear_error() ends up
taking several library-wide mutexes in OpenSSL 1.0.x (apparently
improved in OpenSSL 1.1, but that's not available on current OSes).

To ensure that we don't accidentally leave an OpenSSL error lying
around after any function, I added a scoped helper which is active in
debug builds. It performs a DCHECK that the error queue is empty on
scope entry and exit.

To benchmark, I tested rpc-bench with encryption enabled using OpenSSL
1.0.1e (RHEL6):

Master, SSL off:
----------------
I0404 12:49:04.811158  7250 rpc-bench.cc:84] Mode: Async
I0404 12:49:04.811172  7250 rpc-bench.cc:88] Client reactors: 16
I0404 12:49:04.811175  7250 rpc-bench.cc:89] Call concurrency: 60
I0404 12:49:04.811178  7250 rpc-bench.cc:92] Worker threads: 1
I0404 12:49:04.811182  7250 rpc-bench.cc:93] Server reactors: 4
I0404 12:49:04.811183  7250 rpc-bench.cc:94] ----------------------------------
I0404 12:49:04.811187  7250 rpc-bench.cc:95] Reqs/sec: 446998
I0404 12:49:04.811193  7250 rpc-bench.cc:96] User CPU per req: 9.97575us
I0404 12:49:04.811197  7250 rpc-bench.cc:97] Sys CPU per req: 12.2342us
I0404 12:49:04.811202  7250 rpc-bench.cc:98] Ctx Sw. per req: 0.604032

Master, SSL on:
--------------
I0404 12:57:10.241720  9949 rpc-bench.cc:86] Mode: Async
I0404 12:57:10.241736  9949 rpc-bench.cc:90] Client reactors: 16
I0404 12:57:10.241739  9949 rpc-bench.cc:91] Call concurrency: 60
I0404 12:57:10.241742  9949 rpc-bench.cc:94] Worker threads: 1
I0404 12:57:10.241745  9949 rpc-bench.cc:95] Server reactors: 4
I0404 12:57:10.241747  9949 rpc-bench.cc:96] Encryption: 1
I0404 12:57:10.241760  9949 rpc-bench.cc:97] ----------------------------------
I0404 12:57:10.241762  9949 rpc-bench.cc:98] Reqs/sec: 56761.3
I0404 12:57:10.241778  9949 rpc-bench.cc:99] User CPU per req: 39.7792us
I0404 12:57:10.241783  9949 rpc-bench.cc:100] Sys CPU per req: 106.916us
I0404 12:57:10.241786  9949 rpc-bench.cc:101] Ctx Sw. per req: 5.98631

Note the high number of context switches and system CPU. I gathered
stack traces of context switches using "perf record -g -e cs":

Overhead  Samples  Command    Shared Object      Symbol
........  .......  .........  .................  .............................
100.00%   180979   rpc-bench  [kernel.kallsyms]  [k] perf_event_task_sched_out
          |
          --- perf_event_task_sched_out
              schedule
              |
              |--83.17%-- futex_wait_queue_me
              |          futex_wait
              |          do_futex
              |          sys_futex
              |          system_call_fastpath
              |          |
              |          |--83.11%-- __lll_lock_wait
              |          |          |
              |          |          |--42.33%-- int_thread_get
              |          |          |
              |          |          |--29.36%-- CRYPTO_add_lock
              |          |          |
              |          |          |--28.28%-- int_thread_get_item
              |          |           --0.03%-- [...]
              |          |
              |          |--16.89%-- pthread_cond_wait@@GLIBC_2.3.2
              |          |          |
              |          |          |--100.00%-- kudu::rpc::ServicePool::RunThread()
              |          |          |           kudu::Thread::SuperviseThread(void*)
              |          |          |           start_thread
              |          |           --0.00%-- [...]
              |           --0.01%-- [...]
              |
              |--16.80%-- schedule_hrtimeout_range
              |          |
              |          |--100.00%-- ep_poll
              |          |           sys_epoll_wait
              |          |           system_call_fastpath
              |          |           epoll_wait
              |          |           |
              |          |            --100.00%-- ev_run
              |          |                       kudu::rpc::ReactorThread::RunThread()
              |          |                       kudu::Thread::SuperviseThread(void*)
              |          |                       start_thread
              |           --0.00%-- [...]
               --0.03%-- [...]
With this patch
---------------
I0404 14:13:59.529119 27892 rpc-bench.cc:86] Mode: Async
I0404 14:13:59.529134 27892 rpc-bench.cc:90] Client reactors: 16
I0404 14:13:59.529137 27892 rpc-bench.cc:91] Call concurrency: 60
I0404 14:13:59.529140 27892 rpc-bench.cc:94] Worker threads: 1
I0404 14:13:59.529142 27892 rpc-bench.cc:95] Server reactors: 4
I0404 14:13:59.529145 27892 rpc-bench.cc:96] Encryption: 1
I0404 14:13:59.529155 27892 rpc-bench.cc:97] ----------------------------------
I0404 14:13:59.529158 27892 rpc-bench.cc:98] Reqs/sec: 280819
I0404 14:13:59.529173 27892 rpc-bench.cc:99] User CPU per req: 18.2016us
I0404 14:13:59.529177 27892 rpc-bench.cc:100] Sys CPU per req: 19.9601us
I0404 14:13:59.529181 27892 rpc-bench.cc:101] Ctx Sw. per req: 1.20247

(about 5x improved throughput)

The SSL-related context switches no longer show up:

100.00%   186653   rpc-bench  [kernel.kallsyms]  [k] perf_event_task_sched_out
          |
          --- perf_event_task_sched_out
              schedule
              |
              |--81.87%-- schedule_hrtimeout_range
              |          |
              |          |--100.00%-- ep_poll
              |          |           sys_epoll_wait
              |          |           system_call_fastpath
              |          |           epoll_wait
              |          |           |
              |          |            --100.00%-- ev_run
              |          |                       kudu::rpc::ReactorThread::RunThread()
              |          |                       kudu::Thread::SuperviseThread(void*)
              |          |                       start_thread
              |           --0.00%-- [...]
              |
              |--18.11%-- futex_wait_queue_me
              |          futex_wait
              |          do_futex
              |          sys_futex
              |          system_call_fastpath
              |          |
              |          |--98.68%-- pthread_cond_wait@@GLIBC_2.3.2
              |          |          |
              |          |          |--100.00%-- kudu::rpc::ServicePool::RunThread()
              |          |          |           kudu::Thread::SuperviseThread(void*)
              |          |          |           start_thread
              |          |           --0.00%-- [...]
              |          |
              |          |--1.02%-- base::internal::SpinLockDelay(int volatile*, int, int)
              |          |          |
              |          |          |--99.82%-- base::SpinLock::SlowLock()
              |          |          |          |
              |          |          |          |--98.09%-- kudu::rpc::LifoServiceQueue::BlockingGet(std::unique_ptr<kudu::rpc::InboundCall, std::default_
              |          |          |          |          kudu::rpc::ServicePool::RunThread()
              |          |          |          |          kudu::Thread::SuperviseThread(void*)
              |          |          |          |          start_thread
              |          |          |          |
              |          |          |           --1.91%-- kudu::rpc::LifoServiceQueue::Put(kudu::rpc::InboundCall*, boost::optional<kudu::rpc::InboundCal
              |          |          |                     kudu::rpc::ServicePool::QueueInboundCall(gscoped_ptr<kudu::rpc::InboundCall, kudu::DefaultDelet
              |          |          |                     kudu::rpc::Messenger::QueueInboundCall(gscoped_ptr<kudu::rpc::InboundCall, kudu::DefaultDeleter
              |          |          |                     kudu::rpc::Connection::HandleIncomingCall(gscoped_ptr<kudu::rpc::InboundTransfer, kudu::Default
              |          |          |                     void ev::base<ev_io, ev::io>::method_thunk<kudu::rpc::Connection, &(kudu::rpc::Connection::Read
              |          |          |                     ev_invoke_pending
              |          |          |                     ev_run
              |          |          |                     kudu::rpc::ReactorThread::RunThread()
              |          |          |                     kudu::Thread::SuperviseThread(void*)
              |          |          |                     start_thread
              |          |           --0.18%-- [...]
              |           --0.29%-- [...]
               --0.03%-- [...]
Change-Id: I3b4421f4aae4d0e5a2d938881f9eea4e07ff2b10
Reviewed-on: http://gerrit.cloudera.org:8080/6552
Reviewed-by: Alexey Serbin <aser...@cloudera.com>
Tested-by: Kudu Jenkins
(cherry picked from commit 5f1ca4f3948a61b22946255e4ada895c77bc6adf)
Reviewed-on: http://gerrit.cloudera.org:8080/6801
Reviewed-by: Jean-Daniel Cryans <jdcry...@apache.org>
---
M src/kudu/security/ca/cert_management.cc
M src/kudu/security/crypto.cc
M src/kudu/security/openssl_util.cc
M src/kudu/security/openssl_util.h
M src/kudu/security/tls_context.cc
M src/kudu/security/tls_handshake.cc
M src/kudu/security/tls_socket.cc
7 files changed, 111 insertions(+), 19 deletions(-)

Approvals:
  Jean-Daniel Cryans: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/6801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I3b4421f4aae4d0e5a2d938881f9eea4e07ff2b10
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: branch-1.3.x
Gerrit-Owner: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jdcry...@apache.org>
Gerrit-Reviewer: Kudu Jenkins