Hello Henry Robinson, Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7116

to look at the new patch set (#4).

Change subject: IMPALA-5221: Avoid re-use of stale SASL contexts.
......................................................................

IMPALA-5221: Avoid re-use of stale SASL contexts.

The TSaslTransport is written as a thrift extension that is a wrapper
around the Cyrus-SASL APIs. This transport is then used by Impala's
RPC layer.

On RHEL7 systems that use newer versions of the Cyrus-SASL library,
we noticed that we sometimes crash inside the Cyrus-SASL thirdparty
while trying to lock an internal mutex. During my investigation, I
found that we needed to fix the order of negotiation that happens in
an edge case.

The steps to use the Cyrus-SASL APIs for SASL negitiation are the
following (Replace '_client_' with '_server_' for server calls):
sasl_client_new()
sasl_client_start()
sasl_client_step()
sasl_dispose()   < --- When we're done with the connection.

sasl_client_new() was being called in the constructor TSaslClient()
which is invoked from SaslAuthProvider::WrapClientTransport().

sasl_client_start() and sasl_client_step() were being called under
TSaslTransport::open(). If for some reason we hit an error during
SASL negotiation, the TSaslTransport::open() call would fail. When
we fail, the ThriftClientImpl::OpenWithRetry() does a retry, which
directly retries the negotiation from sasl_client_start(). This
caused the use of already freed resources from the first negotiation
failure, hence causing the crash.

To fix this, we make sure that on a negotiation failure, we dispose
of all the resources properly by calling sasl_dispose() and retry
the negotiation from the start by calling sasl_client_new() first, and
then the remaining steps. This is done by moving the sasl_client_new()
and sasl_server_new() calls out of the TSaslClient/TSaslServer
constructors and into a new call called TSasl*::setupSaslContext(),
which is called under TSaslTransport::open().

The patch is fairly large for the above mentioned change, however,
most of it is just plumbing.

Testing: Tested on systems with older SASL versions to make sure
we don't regress. Also tested on systems with newer SASL versions
where we previously saw the crash and verified that we don't see
them anymore.
Also, tested with GSSAPI and LDAP mechanisms.

Change-Id: I7d509e14dfe46fa28d0626cf84daf6de82955d80
---
M be/src/transport/TSasl.cpp
M be/src/transport/TSasl.h
M be/src/transport/TSaslClientTransport.cpp
M be/src/transport/TSaslClientTransport.h
M be/src/transport/TSaslServerTransport.cpp
M be/src/transport/TSaslServerTransport.h
M be/src/transport/TSaslTransport.cpp
M be/src/transport/TSaslTransport.h
8 files changed, 234 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/7116/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7d509e14dfe46fa28d0626cf84daf6de82955d80
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <sail...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com>

Reply via email to