GitHub user ivmaykov opened a pull request:
https://github.com/apache/zookeeper/pull/627
ZOOKEEPER-236: SSL Support for Atomic Broadcast protocol [part 2]
# Overview
These are fixes and improvements to #184 that we've coded up at Facebook
while testing that PR internally. This is meant to be in addition to #184, and
should be stacked on top of it (and in fact includes the commits from that pull
request). Most notable changes:
## Fixed networking issues/bugs in UnifiedServerSocket
- don't crash the accept() thread if the client closes the connection
without sending any data
- don't corrupt the connection if the client sends smaller than 5 bytes for
the initial read
- delay the detection of TLS vs. plaintext mode until a socket stream is
read from or written to, for example
- prepending 5 bytes to `PrependableSocket` and then trying to read >5
bytes would only return the first 5 bytes, even if more bytes were available.
This is fixed.
## Added support for PEM formatted key stores and trust stores
- key store and trust store files can now be in PEM format as well as JKS.
- Added config properties to tell ZK what type of trust/key store to load:
- `zookeeper.ssl.keyStore.type` and `zookeeper.ssl.trustStore.type` for
ClientX509Util
- `zookeeper.ssl.quorum.keyStore.type` and
`zookeeper.ssl.quorum.trustStore.type` for QuorumX509Util
- store type properties could have the values "JKS", "PEM", or not set
- leaving the type properties unset will cause auto-detection of the store
type based on the file extension (".jks" or ".pem")
## Added support for reloading key/trust stores when the file on disk
changes
- new property `sslQuorumReloadCertFiles` which controls the behavior for
reloading the key and trust store files for `QuorumX509Util`. Reloading of key
and trust store for `ClientX509Util` is not in this PR but could be added easily
- this allows a ZK server to keep running on a machine that uses
short-lived certs that refresh frequently without having to restart the ZK
process.
## Added test utilities for easily creating X509 certs and using them in
unit tests
- added new class `X509TestContext` and its friend, `X509TestHelpers`
- rewrote some existing unit tests to use these classes, and added new
tests that use them
- some existing tests (i.e. `QuorumSSLTest`) should probably be ported to
use this as well, haven't got around to it yet
## Added more options for ssl settings to X509Util and encapsulate them
better
- previously, some SSL settings came from a ZKConfig and others came from
global `System.getProperties()`. This made it hard to isolate certain settings
in tests.
- now all SSL-related settings come from the ZKConfig object used to create
the context
- new settings added:
- `zookeeper.ssl(.quorum).enabledProtocols` - list of enabled protocols. If
not set, defaults to a single-entry list with the value of
`zookeeper.ssl(.quorum).protocol`.
- `zookeeper.ssl(.quorum).clientAuth` - can be "NONE", "WANT", or "NEED".
This controls whether the server doesn't want / allows / requires the client to
present an X509 certificate.
- `zookeeper.ssl(.quorum).handshakeDetectionTimeoutMillis` - timeout for
the first read of 5 bytes to detect the transport mode (TLS or plaintext) of a
client connection made to a `UnifiedServerSocket`
## Fixed odd plaintext perf regression caused by new dependency on
`org.apache.httpcomponents`
- internal regression testing showed a >10% perf degradation in plaintext
mode that was tracked down to the addition of the `org.apache.httpcomponents`
dependency.
- Removing the dependency and implementing the hostname verification by
mostly copy-pasting the hostname verification code from `httpcomponents` fixed
the regression.
There may be some other changes that I'm forgetting, but those are the main
ones.
We believe that #184 should not be accepted without this PR coming close
behind, mainly because of the issues with UnifiedServerSocket. We were able to
put clusters into a bad state pretty easily simply by having a participant
disconnect immediately after connecting to the quorum port. This crashed the
accept thread in the Leader and prevented any other participants from
connecting. This could happen due to network hick-ups, etc. Ideally, both PRs
get +1s around the same time and can land with a small delay one after another.
We have started testing internal builds with #184 and the code in this PR.
Quorums with TLS disabled have shown no perf or stability regressions, so we
believe it's safe to merge this code to master as it's purely opt-in at this
point, gated behind config options. We've done correctness testing of quorums
with TLS enabled, but have not done perf testing to measure the impact of
enabling quorum TLS yet. That should happen in the near future, and I will
update this PR with our perf test results once we have them.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ivmaykov/zookeeper fb-tls
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/zookeeper/pull/627.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #627
----
commit 88b61716d26f6b3c11325367a075bb062dc4147c
Author: Andor Molnar <andor@...>
Date: 2018-04-03T19:24:49Z
ZOOKEEPER-236: SSL Support for Atomic Broadcast protocol
commit c452d1b036bb3c9cd53478254129474ad4e387ea
Author: Andor Molnar <andor@...>
Date: 2018-05-18T00:09:38Z
ZOOKEEPER-236. Fixed unit test + added some extra debug logging
commit ed10e88de70aa97f12f586d0c52b63a7d9a4cbae
Author: Andor Molnar <andor@...>
Date: 2018-05-18T16:58:40Z
ZOOKEEPER-236. Added cipher suite to test to run on CentOS. Timeout in
constant value. Null checks
commit 9ab476a7a4d345a94f174bcd831a350ec92c94f9
Author: Andor Molnar <andor@...>
Date: 2018-05-20T20:09:09Z
ZOOKEEPER-236. Trying to fix cipher suites test by changing the default
protocol to TLSv1.2 and filter suitable cipher suites
commit d64eb26f75574f118d29cffec11d4116b469c36a
Author: Andor Molnar <andor@...>
Date: 2018-06-11T15:09:58Z
ZOOKEEPER-236. Code review related changes:
- server & client hostname verification can be set independently,
- refactor defaultSSLContext to use AtomicReference,
- some minor nitpicks
commit e8a1729794c8044f8fbde5c8f304cd1bd3648836
Author: Andor Molnar <andor@...>
Date: 2018-06-11T15:58:12Z
ZOOKEEPER-236. Reverted to use single property for hostname verification
commit 777f31ac2dba38f2315dfdbfa47cdcb3832e8a6f
Author: Andor Molnar <andor@...>
Date: 2018-06-14T15:37:58Z
ZOOKEEPER-236. Added Java8/Java9 default cipher suites
commit a9fa698131d36670b11973b8823efd11afa1acb5
Author: Andor Molnar <andor@...>
Date: 2018-06-15T11:57:46Z
ZOOKEEPER-236. Code review fixes:
- Added comment to clarify default enabled cipher suites,
- Use java.specification.version to determine JDK version,
- Use port unification only when SSL quorum is enabled,
- Unit test fixes.
commit 1f8aab05bf45a5c6c40eb30c84c8a7f95a7fa27e
Author: Andor Molnar <andor@...>
Date: 2018-06-15T13:04:31Z
ZOOKEEPER-236. Revert portUnification/sslQuorum logic
commit 209fbca78c083a1c06bee8d463ba77a86a3ee1ee
Author: Andor Molnar <andor@...>
Date: 2018-06-22T14:38:17Z
ZOOKEEPER-236. Added new JMX properties to expose SSL quorum related
settings
commit d78e1146a8a35f6d2f9d75fd9b769ff8a423540c
Author: Ilya Maykov <ilyam@...>
Date: 2018-08-20T23:22:23Z
[rebase] [Quorum TLS] Fix bugs in PrependableSocket and UnifiedServerSocket
Summary:
Fixed some bugs in PrependableSocket and UnifiedServerSocket
- using SequenceInputStream in PrependableSocket broke the read() behavior
when trying to read > 5 bytes
- changed it to use PushbackInputStream which behaves as expected
- changed UnifiedSocketTest to read more than 5 bytes from the unified
socket to make sure the fix works
- call setNeedClientAuth(true) in UnifiedServerSocket.accept() when
connection is TLS. This is to have parity with
X509Util.configureSSLServerSocket() which sets the same behavior for strict
server-side SSL sockets.
- Use the Java8 API for creating a secure socket when some bytes have
already been read. I think this allows us to avoid the overhead of
PushbackInputStream in the SSL case.
- added "localhost" to the connect address of client socket in
UnifiedServerSocket test. The test didn't work for me otherwise.
Test Plan: `ant -Dtestcase=UnifiedServerSocketTest test-core-java`
Reviewers: nixon, bunn, nedelchev
Reviewed By: nixon
Subscribers: nixon, dongw, [email protected]
Differential Revision: https://phabricator.intern.facebook.com/D9234843
Tasks: T31626216
Signature: 9234843:1533776414:92d829ba49f997931345e7d90b4a2183b666f294
commit ef2db825516bf0a20d2a62a6f1c3449f543a15ca
Author: Ilya Maykov <ilyam@...>
Date: 2018-08-20T23:22:24Z
[rebase] [Quorum TLS] get SSL protocol from ZKConfig instead of System
properties
Summary: See title. This makes this property consistent with all the others
that we already get through the ZKConfig.
Test Plan: `ant -Dtestcase=X509UtilTest test-core-java`
Reviewers: nixon, bunn, nedelchev
Reviewed By: nixon
Subscribers: nixon, dongw, [email protected], [email protected]
Differential Revision: https://phabricator.intern.facebook.com/D9236333
Tasks: T31626216
Signature: 9236333:1533836898:5d1c734d96b807a5f6a39f863661669abefe11aa
commit 8682c3b48880831f70e918ddf53ea4275fa08cb2
Author: Ilya Maykov <ilyam@...>
Date: 2018-08-20T23:22:25Z
[rebase] [Quorum TLS] added X509TestContext and supporting code
Summary:
Added a few new classes that will make testing security related code much
easier.
- X509TestContext: easy way to manage certs and private keys in unit tests,
to be used with SSL sockets
- X509TestHelpers: static functions for creating certs and key pairs, and
saving them to JKS or PEM files on disk
- X509KeyType: enum for asymmetric key types
Test Plan: Bunch of tests in future diffs will rely on it
Reviewers: nixon, bunn, nedelchev
Reviewed By: nixon
Subscribers: nixon, dongw, [email protected]
Differential Revision: https://phabricator.intern.facebook.com/D9174703
Tasks: T31626216
Signature: 9174703:1533597616:330f1ea9db736c50825d6eb084e20af91b8fd782
commit fa08a806c59f7cc6e0f46686b6b388bfe07884e9
Author: Ilya Maykov <ilyam@...>
Date: 2018-08-20T23:22:26Z
[rebase] [Quorum TLS] update X509UtilTest to use X509TestContext and
parameterize the test
Summary: See title. Now X509UtilTest has better coverage of different
combinations of key types (RSA, EC) and password protected / unprotected key
store. Also added some new test cases and removed redundant ones.
Test Plan: ran X509UtilTest from IntelliJ
Reviewers: nixon, bunn, nedelchev
Reviewed By: nixon
Subscribers: nixon, dongw, [email protected]
Differential Revision: https://phabricator.intern.facebook.com/D9174711
Tasks: T31626216
Signature: 9174711:1533766713:efa901b625d079cf6f2fab26270faf9a39591d2e
commit ec1b1467f6934c36b3fd03b01928c1654c57b4e4
Author: Ilya Maykov <ilyam@...>
Date: 2018-08-20T23:22:27Z
[rebase] [Quorum TLS] clean up UnifiedServerSocketTest and make it use
X509TestContext
Summary: Use X509TestContext to create the cert and key files in
UnifiedSocketTest, plus other minor cleanup.
Test Plan: ran UnifiedSocketTest in IntelliJ
Reviewers: nixon, bunn, nedelchev
Reviewed By: nixon
Subscribers: nixon, dongw, [email protected]
Differential Revision: https://phabricator.intern.facebook.com/D9174713
Tasks: T31626216
Signature: 9174713:1534362831:b526048e672c9893a4046d0ff5e9df1d8cde4adc
commit d11d85cf2d3b883e5ade7029336521c77832650c
Author: Ilya Maykov <ilyam@...>
Date: 2018-08-20T23:22:28Z
[rebase] [Quorum TLS] Add support for parsing PEM encoded keys/certs
Summary: See title.
Test Plan: Tested it manually. Will add automated tests later.
Reviewers: nixon, bunn, nedelchev
Reviewed By: nixon
Subscribers: nixon, bunn, dain, dongw, nedelchev, [email protected]
Differential Revision: https://phabricator.intern.facebook.com/D8906740
Tasks: T31626216
Signature: 8906740:1534362912:b52885e9c0f9957b0a911697e976bf153adf6197
commit 210bfe7b75429668331fc14c383f0d1c9ae151b4
Author: Ilya Maykov <ilyam@...>
Date: 2018-08-20T23:22:29Z
[rebase] [Quorum TLS] Added FileChangeWatcher class and unit test
Summary: The FileChangeWatcher class provides a simple API to watch a local
directory for file changes, backed by WatchService from the Java standard
library. This will allow us to implement cert store / trust store reloading on
changes in a later diff.
Test Plan: added a new unit test, made sure it passes
Reviewers: nixon, bunn, nedelchev
Reviewed By: nixon
Subscribers: nixon, bunn, dongw, nedelchev, [email protected]
Differential Revision: https://phabricator.intern.facebook.com/D8906830
Tasks: T31626216
Signature: 8906830:1533162316:679eceab34d30955a582ef2dde8c23822af9555d
commit b2c8593b3c9a7adca79600b1c6f9bb8946f862a0
Author: Ilya Maykov <ilyam@...>
Date: 2018-08-20T23:22:30Z
[rebase] [Quorum TLS] enable cert store / trust store reloading
Summary: Enable the reloading of cert store / trust store files.
Test Plan: TBD.
Reviewers: nixon, bunn, nedelchev
Reviewed By: nixon
Subscribers: nixon, bunn, dongw, nedelchev, [email protected]
Differential Revision: https://phabricator.intern.facebook.com/D8906914
Tasks: T31626216
Signature: 8906914:1533600372:b254376732d94e74555160d25054a563a5378bc2
commit 9c70cc93c24d2bc7ed6af9ebd8464262aab625e3
Author: Ilya Maykov <ilyam@...>
Date: 2018-08-20T23:22:31Z
[rebase] [Quorum TLS] use UnifiedServerSocket even w/o port unification to
allow cert reloading
Summary: Added an option to UnifiedServerSocket to block insecure
connections. This allows us to implement cert store / trust store reloading
even when port unification is off, because the conversion to SSL socket is
delayed until after the connection is made. This lets us use an updated SSL
context if the file on disk changed.
Test Plan: Modified UnifiedServerSocketTest
Reviewers: nixon, bunn, nedelchev
Reviewed By: nixon
Subscribers: nixon, bunn, dongw, nedelchev, [email protected]
Differential Revision: https://phabricator.intern.facebook.com/D8906965
Tasks: T31626216
Signature: 8906965:1533770631:379211c3714ef8fe958012ed9584b64ecf0c4355
commit f5f54afce39bcc0f4df346bc64e4fca0916b53dd
Author: Ilya Maykov <ilyam@...>
Date: 2018-08-20T23:22:33Z
[rebase] [Quorum TLS] WIP: add denial-of-service-attack resistance to
UnifiedServerSocket
Summary:
Added resistance to simple denial-of-service attacks to UnifiedServerSocket.
The attack in question is simple: a client connects to the
UnifiedServerSocket and does nothing.
Because the old code did a blocking read() in the accept() method, this
would hang the accept thread
and prevent any other clients from connecting.
The solution is to write a new socket type
(`UnifiedServerSocket.UnifiedSocket`), which doesn't know
if it's a SSLSocket or plaintext socket at the time of creation. Trying to
read or write any data on
the socket (by calling `getInputStream()`, `getOutputStream()`, or
`sendUrgentData()`) will cause
the socket to peek at the first 5 bytes of the input stream to try and
determine if it
should switch to TLS or stay as plaintext, using the previous logic.
Changed `UnifiedServerSocket.accept()`
to return the new socket type.
Added new test cases to `UnifiedServerSocketTest` which check for DoS
resistance. Had to make changes
to the test to make `UnifiedServerThread` more closely resemble the
behavior in Leader.java (`accept()` on one
thread, do all reads and writes on another thread). I verified that the old
version of the code (which blocked in
`UnifiedServerSocket.accept()`) fails the new tests.
Test Plan: `ant -Dtestcase=UnifiedServerSocketTest test-core-java`
Reviewers: nixon, bunn, nedelchev
Reviewed By: nixon
Subscribers: nixon, dongw, [email protected]
Differential Revision: https://phabricator.intern.facebook.com/D9327931
Tasks: T31626216
Signature: 9327931:1534364537:832d8f589a70f0ceb8f4ae8faa1c3cbb4b42fd40
commit 359546a321cff748c5bd40887b45baa6ecc3e9ed
Author: Ilya Maykov <ilyam@...>
Date: 2018-08-29T16:02:01Z
[Quorum TLS] allow keystore/truststore password properties to be omitted if
the password is empty
Summary: Currently, the keystore password property must be specified if the
keystore location is. This is clunky, and I bet the common case will be key
stores not protected by a password anyway (if your key password has to be in
plaintext in a zoo.cfg file, it's not adding any security to your key store
file anyway). This change allows the keystore password property to be omitted
when there is no password on the keystore.
Test Plan: `ant -Dtestcase=X509UtilTest test-core-java`
Reviewers: nixon, bunn, nedelchev
Reviewed By: nixon
Subscribers: nixon, dongw, [email protected]
Differential Revision: https://phabricator.intern.facebook.com/D9374038
Tasks: T31626216
Signature: 9374038:1534532478:12253e0252ef69b95b2e8d7e86e96f7f988771a3
commit 33e9538f14e3264612c27e1e39b13ce748d63981
Author: Ilya Maykov <ilyam@...>
Date: 2018-08-29T16:02:01Z
[Quorum TLS] add config options for more TLS settings and preserve them
alongside SSLContext in X509Util
Summary:
Added a wrapper for the SSL Context + options that have to be set on the
socket and not on the context, and made all such options configurable via
properties. The options are:
- default TLS protocol to use
- enabled protocols to potentially downgrade to if a client doesn't support
our protocol of choice
- enabled ciper suites
- client auth mode (NONE / WANT / NEED)
Test Plan: sandcastle
Reviewers: nixon, bunn, nedelchev
Reviewed By: nixon
Subscribers: nixon, dongw, [email protected], [email protected]
Differential Revision: https://phabricator.intern.facebook.com/D9526137
Tasks: T31626216
Signature: 9526137:1535491578:4624f336d24c0cd737697e95b08d5a157e4abf5a
commit f2cb5c39ffdff91791e0f22680bc7bb9b889bfbd
Author: Brian Nixon <nixon@...>
Date: 2018-09-01T02:57:53Z
remove the dependency on httpclient of org.apache.httpcomponents
Summary:
This dependency is causing a performance regression when the
perf_regression_version_write test is run against zeus.regional.atn.10001.
https://fb.facebook.com/groups/1837468356490057/permalink/2184210251815864/
has more context.
Test Plan: Built the jar and am running perf_regression_version_write
against it on zeus.regional.atn.10001
Reviewers: ilyam, allenlyu
Reviewed By: ilyam, allenlyu
Subscribers: nixon, dongw, #zeus, [email protected]
Differential Revision: https://phabricator.intern.facebook.com/D9622408
Tasks: T31626216, T33493338
Signature: 9622408:1535761773:ae82b939c8cdbf9b4ab0de3eb6e3b7a598a0a6ab
commit 956639dfceb2cb4ce67cac7b93abf668da4b89ce
Author: Ilya Maykov <ilyam@...>
Date: 2018-09-08T01:04:23Z
[Quorum TLS] fix bugs in UnifiedServerSocket accept() flow
Summary:
There were a few issues in UnifiedServerSocket accept() flow that were
exposed during testing:
- if a client closed the connection without sending any data, the accept
thread would crash because a negative return from read() was not properly
detected
- a client connects and doesn't send any data but keeps the connection
open, the accept thread will not make progress. Fixed by adding a new timeout
property for the initial read that determines the client mode, defaults to
5000ms.
- Exceptions in Leader.LearnerCnxAcceptor.run() could leak an open socket
if the error happened before the socket was given to the LearnerHandler.
Test Plan: compiles, passes a couple unit tests, will let sandcastle run
the full test suite
Reviewers: nixon, bunn, allenlyu
Reviewed By: bunn
Subscribers: nixon, dongw, [email protected]
Differential Revision: https://phabricator.intern.facebook.com/D9694977
Tasks: T31626216
Signature: 9694977:1536281695:fe2ece2be1277513de727fb3130e9d952e33f2f4
commit 0ed15965f03794c95f04d30e7d93b0a531ee3ebe
Author: Ilya Maykov <ilyam@...>
Date: 2018-09-08T01:04:24Z
[Quorum TLS] delay unified socket mode detection to the first
read()/write() operation
Summary: Instead of reading from the underlying socket and detecting the
mode as soon as an input/output stream is requested from a unified socket,
delay the blocking mode detection until the first actual read()/write() on the
requested stream. This means an accept() thread that gets the input stream but
doesn't actually read from it and passes the stream to a worker thread will not
get blocked.
Test Plan: Modified UnifiedServerSocketTest to grab the input stream in the
main accept() thread, make sure it passes with the changes. The modified tests
fail as expected if the logic in UnifiedServerSocket.java is reverted.
Reviewers: nixon, bunn, allenlyu
Reviewed By: nixon
Subscribers: nixon, dongw, [email protected]
Differential Revision: https://phabricator.intern.facebook.com/D9696594
Tasks: T31626216
Signature: 9696594:1536343729:00605b4393e61f340d2d079ce00027e2e11489d0
----
---