Todd Lipcon has posted comments on this change. Change subject: KuduRPC integration with OpenSSL ......................................................................
Patch Set 2: (44 comments) http://gerrit.cloudera.org:8080/#/c/4789/2//COMMIT_MSG Commit Message: Line 34: - Make SSL methods (SSLv23, TLS1, etc.) configurable and OpenSSL yea, I think it's very important that we disallow SSLv2/v3. Isn't that just a matter of one call: SSL_CTX_set_options(ctx->ssl_ctx, SSL_OP_NO_SSLv2 | SSL_OP_NO_SSLv3); http://gerrit.cloudera.org:8080/#/c/4789/2/src/kudu/rpc/connection.cc File src/kudu/rpc/connection.cc: PS2, Line 74: bool ssl_enabled = reactor_thread_->reactor()->messenger()->ssl_enabled(); : if (ssl_enabled) { : socket_.reset(reactor_thread_->reactor()->messenger()->ssl_factory()->CreateSocket( : socket, direction_ == SERVER)); : } else { : socket_.reset(new Socket(socket)); : } : rather than doing this work in the ctor, does it make sense to do this at the point just before you create the connection, and then pass in the constructed Socket to the Connection ctor? seems like looser coupling (you don't need to call back into the messenger to find if SSL is enabled), and then you don't need this two-step initialization of the sasl server/client http://gerrit.cloudera.org:8080/#/c/4789/2/src/kudu/rpc/messenger.cc File src/kudu/rpc/messenger.cc: PS2, Line 59: : DEFINE_string(ssl_server_certificate, "", "Path to the SSL certificate to be used."); : DEFINE_string(ssl_private_key, "", : "Path to the private key to be used to complement the public key present in " : "--ssl_server_certificate"); : DEFINE_string(ssl_certificate_authority, "", : can we tag these as experimental for now? I don't think we want to sign up for maintaining them until we get a little farther into actually operating this, etc, and I think the settings may end up needing to be passed in from elsewhere. For example, the client CA definitely needs to be exposed as an API, not a gflag, since gflags aren't exposed to clients, and because different clients in the same process may need to choose different CAs. On the server side, we expect that we'll be using a cert generated on the fly, in which case we also wouldn't want these as gflags. Also, the descriptions and names should more specifically reference RPC. Otherwise I think most users would read these and think it's talking about HTTPS Line 167: if (ssl_enabled()) { how about checking if (ssl_factory_) since even if ssl is enabled, maybe it didn't actually get constructed http://gerrit.cloudera.org:8080/#/c/4789/2/src/kudu/rpc/messenger.h File src/kudu/rpc/messenger.h: Line 220: bool ssl_enabled_; maybe I'm being pedantic, but shouldn't all of these variables and classes be called 'TLS' rather than 'SSL' since we'll probably disable SSLv2 and SSLv3? http://gerrit.cloudera.org:8080/#/c/4789/2/src/kudu/rpc/rpc-test.cc File src/kudu/rpc/rpc-test.cc: Line 52: class TestRpc : public RpcTestBase, public ::testing::WithParamInterface<bool> { just a thought: if you made the RpcTestBase extend WithParamInterface, would that avoid having to propagate the flag in all the superclass method calls? or is that more of a mess because of other tests depending on it? PS2, Line 55: parametrized nit: parameterized PS2, Line 56: ,Te nit: missing space http://gerrit.cloudera.org:8080/#/c/4789/2/src/kudu/rpc/sasl_client.h File src/kudu/rpc/sasl_client.h: PS2, Line 70: Required for some mechanisms. this bit of the comment seems out of place http://gerrit.cloudera.org:8080/#/c/4789/2/src/kudu/rpc/sasl_rpc-test.cc File src/kudu/rpc/sasl_rpc-test.cc: PS2, Line 53: NUL nit: prefer nullptr here and elsewhere http://gerrit.cloudera.org:8080/#/c/4789/2/src/kudu/rpc/sasl_server.h File src/kudu/rpc/sasl_server.h: Line 69: // Set the underlying socket to be used. nit: trailing whitesapce http://gerrit.cloudera.org:8080/#/c/4789/2/src/kudu/testutil/server-cert.pem File src/kudu/testutil/server-cert.pem: > Consider having these in code of the test and writing into a file under uni agreed. Otherwise you'll need to at least add this to the RAT exclude list, and to the list of files to always ship to slaves in the build-support/dist_test.py script http://gerrit.cloudera.org:8080/#/c/4789/2/src/kudu/util/net/ssl_factory.cc File src/kudu/util/net/ssl_factory.cc: Line 55: // Callbacks used by OpenSSL required in a multi-threaded setting. Are these going to play nice if you're also using OpenSSL from the squeasel library? It also sets the locking and id callbacks. I'm afraid it's likely that both initializations will run, and there will be some mutexes locked using one set of callbacks, and then attempt to unlock (a different mutex) using the other set. Or, switching thread IDs, etc. Do we need some option to tell squeasel "SSL is already initialized, don't do your own init"? Line 66: Status SSLFactory::Init() { CHECK(!ctx_) here to make sure we don't double-init and leak Line 67: // TODO(sailesh): Make method configurable. not sure it needs to be configurable. Perhaps we should be doing something like checking: #if OPENSSL_VERSION_NUMBER >= 0x10100000L SSL_CTX_new(TLS_method()) #else SSL_CTX_new(SSLv23_method()) #endif since it seems like SSLv23_method has been depreacted in 1.1.0 PS2, Line 74: SSL_VERIFY_FAIL_IF_NO_PEER_CERT we don't expect the client to have a cert in the Kudu use case, so we'll need some flexibility here to set it on a per-server basis. PS2, Line 116: if (!ssl_socket) this condition is unnecessary since 'new' will always return non-NULL http://gerrit.cloudera.org:8080/#/c/4789/2/src/kudu/util/net/ssl_factory.h File src/kudu/util/net/ssl_factory.h: Line 34: // Setup the OpenSSL library. specify when this must be called and how many times. Why not keep this internal and call it from the first SSLFactory ctor, considering it doesn't return Status? Line 38: public: please add a ctor and dtor (and define them in the .cc file) even if they're empty. PS2, Line 42: // Teardown any context set up by this class. : void Teardown(); do you expect that this will block? Given it doesn't return any error, why not just do this work in the dtor? PS2, Line 56: bool is_server); we usually prefer enums to bools, eg: enum Direction { CLIENT, SERVER }; just to make call sites more obvious PS2, Line 62: SSL error queue please document that this is a thread-local concept. Also, note that this doesn't actually modify the error queue since it's using peek. Is that the behavior we want? http://stackoverflow.com/questions/18179128/how-to-manage-the-error-queue-in-openssl-ssl-get-error-and-err-get-error also seems to suggest that if you ever plan to call SSLGetError, you should probably be using ERR_clear_error before every call. PS2, Line 63: ssp typo Line 64: static std::string GetLastError(int errno_copy) { seems better to define this in the .cc file in an anonymous namespace, since it's private anyway Line 67: if (error_code != 0) { I think this would be clearer to invert this condition so the shorter thing is indented: if (error_code == 0) return ErrnoToString(errno_copy) PS2, Line 72: "SSL error " + error_code; that's not what you want. I think you mean Substitute("SSL error $0", error_code). (otherwise this is incrementing the char* pointer) http://gerrit.cloudera.org:8080/#/c/4789/2/src/kudu/util/net/ssl_socket.cc File src/kudu/util/net/ssl_socket.cc: PS2, Line 29: ssl_ = ssl; : SSL_set_fd(ssl_, fd); : is_server_ = is_server; use an initializer list for the assignments of ssl_ and is_server_ Line 37: std::string GetX509CommonName(X509* cert, int* cn_size) { this function and a few below are only used within this compilation unit, so they should be put in an anonymous namespace PS2, Line 38: std::string() here and below, you can just return ""; Line 53: *cn_size = ASN1_STRING_to_UTF8(&utf8_cn, cn_asn1); The example on https://wiki.openssl.org/index.php/Hostname_validation shows them checking that there are no embedded \0s in the resulting string. That's probably a good idea. Line 56: std::string common_name(reinterpret_cast<char*>(utf8_cn)); why not pass *cn_size as the second argument to this constructor, and then you don't need the out-param of GetX509CommonName? Line 63: // Return 'true' if successful and 'false' otherwise. I think it's worth adding a comment here that references https://wiki.openssl.org/index.php/Hostname_validation I was surprised when reading the code that you had to implement this yourself, but seems necessary since we need to support openssl 1.0.1 on el6 Line 64: bool VerifyHost(const std::string& hostname, const std::string& subject_name, int size) { would be nice to have some specific unit tests for this one, since it's got a few edge cases Line 68: // RFC-6125 Section 6.4.3, item 1: Only the complete leftmost label may have a wildcard. wondering if this would be easier to read doing something like: vector<StringPiece> components = strings::Split(subject_name, '*'); for (int i = 1; i < components.size(); i++) { if (components[i].find('*') != npos) ... } (I always find substr stuff hard to reason about and prone to off-by-one errors) Line 73: int i = 0, j = 0; why not define these inside the for loop? PS2, Line 76: while (j < hostname_size && hostname[j] != '.') { : ++j; : } is this greedy match always right? the RFC 6125 seems to indicate that you should be able to match things like "*baz.example.com" and I'm not sure this algorithm works. Maybe using MatchPattern would be better? Line 125: peer_addr.LookupHostname(&peer_hostname); need to check result (eg if reverse DNS fails) Line 136: if (ssl_ == NULL) return Status::NetworkError("SSLSocket::Write(): SSL context unavailable"); this can probably just be a CHECK, since it would be programmer error. (same below in a few places) Line 155: for (int i = 0; i < iov_len; ++i) { We typically set O_NODELAY on our sockets, so each of the underlying Write calls is going to cause a separate TCP packet. I think we should do something whereby we set TCP_CORK before the loop, and unset it at the end, so that we still end up consolidating packets for vectored writes. PS2, Line 157: int32_t bytes_written = SSL_write(ssl_, iov[i].iov_base, frame_size); : if (bytes_written <= 0) { : if (SSL_get_error(ssl_, bytes_written) == SSL_ERROR_WANT_WRITE) { : // Socket not ready to write yet. : *nwritten = 0; : return Status::OK(); : } : return Status::NetworkError("SSL_write", SSLFactory::GetLastError(errno)); : } : total_written += bytes_written; can you refactor out this common code with Write above? or just call through to Write even? Line 196: if (ret == 0) ret = SSL_shutdown(ssl_); this doesn't seem right, since this seems like it's a blocking call. Given we plan to shut down the socket anyway, isn't a unidirectional one enough? PS2, Line 207: Socket::Close() this is calling it again which seems wrong http://gerrit.cloudera.org:8080/#/c/4789/2/src/kudu/util/net/ssl_socket.h File src/kudu/util/net/ssl_socket.h: PS2, Line 40: Status Write(const uint8_t *buf, int32_t amt, int32_t *nwritten); : : Status Writev(const struct ::iovec *iov, int iov_len, int32_t *nwritten); : : Status Recv(uint8_t *buf, int32_t amt, int32_t *nread); these should all have the 'override' keyword on them PS2, Line 49: Inherits better to say 'Owned'. Inherits sounds like it's something coming from a superclass -- To view, visit http://gerrit.cloudera.org:8080/4789 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I27167faa4e6a78e59b46093055b16682c93af0ea Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil <sail...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Dan Burkert <danburk...@apache.org> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com> Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes