Hi Andor,

You are totally correct: the server works after adding this auth provider. Thanks a lot!
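
For reference, a minimal sketch of what "adding this auth provider" amounts to in an embedded setup. The property key and provider class name are the ones given in Andor's reply; the class and method names (EmbeddedZkAuthSetup, registerX509AuthProvider) are hypothetical, and the assumption is that the call runs before the embedded server starts accepting connections (for example, from initialize()).

```java
/**
 * Sketch only: registers the x509 auth provider for an embedded ZooKeeper
 * server by setting the system property mentioned in the thread.
 * EmbeddedZkAuthSetup and registerX509AuthProvider are hypothetical names.
 */
final class EmbeddedZkAuthSetup {
    // Property key and provider class as quoted in the thread.
    static final String X509_PROVIDER_KEY = "zookeeper.authProvider.x509";
    static final String X509_PROVIDER_CLASS =
            "org.apache.zookeeper.server.auth.X509AuthenticationProvider";

    private EmbeddedZkAuthSetup() { }

    /**
     * Sets the system property. Assumption: this is called before the
     * embedded server and its connection factory are created.
     */
    static void registerX509AuthProvider() {
        System.setProperty(X509_PROVIDER_KEY, X509_PROVIDER_CLASS);
    }
}
```

When ZooKeeper is started through zkServer.sh, this registration is done by QuorumPeerConfig (configureSSLAuth()), which is why only the embedded server needs the explicit call.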
I did a cursory comparison between ZooKeeper versions 3.5.4-beta and 3.5.5, and I couldn't find a change that explains this difference in behavior. In any case, the Pravega build has passed with zookeeper-3.5.5, which is great news. I will run some more tests and cast my vote on the release candidate, if you feel that this could be useful.

Thanks a lot,
Raúl.

-----Original Message-----
From: Andor Molnar <an...@cloudera.com.INVALID>
Sent: Thursday, May 16, 2019 6:43 PM
To: DevZooKeeper
Subject: Re: Question about security configuration (was: Re: [VOTE] Apache ZooKeeper release 3.5.5 candidate 6)

[EXTERNAL EMAIL]

Hi Raul,

X509AuthenticationProvider is not registered in the embedded ZK. In server logs it says:

"[epollEventLoopGroup-4-1] ERROR org.apache.zookeeper.server.NettyServerCnxnFactory - Auth provider not found: x509"

It's done by QuorumPeerConfig.java:436 (configureSSLAuth()) when you run ZooKeeper in standalone mode, but your code doesn't use this configuration class at all. If you add this:

System.setProperty("zookeeper.authProvider.x509", "org.apache.zookeeper.server.auth.X509AuthenticationProvider");

to your initialize() method, client SSL works:

[nioEventLoopGroup-4-2] INFO org.apache.zookeeper.server.NettyServerCnxnFactory - SSL handler added for channel: [id: 0x698604a3, L:/127.0.0.1:2281 - R:/127.0.0.1:56750]
[nioEventLoopGroup-4-2] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=server.pravegastack.io' for Scheme 'x509'

TBH I haven't diffed the code with 3.5.4-beta, so not sure why it worked previously and I don't have experience with embedded ZK, but I believe QuorumPeerConfig class has to be involved somehow.

Regards,
Andor

On Thu, May 16, 2019 at 5:10 PM Gracia, Raul <raul.gra...@dell.com> wrote:

> Thanks Andor for your quick reply.
> Let me answer your questions:
>
> 1) Yes, the problem is related to client/server communication using
> SSL, not related to Quorum SSL (we use a single Zookeeper process in our
> tests).
> I would like your feedback first to conclude if this is a problem in
> our config/code or a regression/change in the behavior of Zookeeper 3.5.5.
>
> 2) Yes, with the external Zookeeper server running separately (e.g.,
> zkServer.sh start) all the tests are passing (SSL/non-SSL). With the
> Zookeeper server process we instantiate in our tests, the non-SSL
> tests are also passing, but not the SSL ones.
>
> 3) Correct. Just to give more detail here, we are instantiating the
> Zookeeper server process using the ZooKeeperServer class jointly with
> NettyServerCnxnFactory.
>
> 4) I have done 2 types of tests: with Zookeeper started as a separate
> service ("zkServer.sh") and using the Zookeeper server process we
> instantiate in Pravega standalone tests (namely, "zk-pravega-tests"):
> - zkServer.sh: Works well with the regular Zookeeper client (zkCli.sh) and
> the Pravega standalone tests pass using it with/without SSL.
> - zk-pravega-tests: Without SSL, zkCli.sh can connect to that
> process and the non-SSL Pravega tests pass. With SSL configured,
> neither zkCli.sh nor the Pravega tests with SSL are able to connect to
> the server (KeeperErrorCode = ConnectionLoss).
>
> 5) No, I haven't tested this scenario yet. I have tested a standalone
> Zookeeper server (zkServer.sh) and a client (zkCli.sh) with SSL
> enabled on the same machine, and it works well. Apart from that, I
> have also performed distributed tests with a Zookeeper server
> (3.5.4-beta) and Pravega (using Curator 4.0.1 + zookeeper-3.5.5) in
> Kubernetes and it worked fine.
>
> 6) Yes, in fact I have done a little more than that and I have created
> a repository to investigate this issue in isolation:
> https://github.com/RaulGracia/zookeeper-test
> Apart from providing logs (see logs folder), in this repo I extracted
> the piece of code from the Pravega repository that is used to start
> the Zookeeper standalone process, making it easier to configure the
> SSL properties via executable. I think that this will make it easier
> for anyone to reproduce the problem I'm experiencing. Moreover, I have
> provided instructions in the README file on how to reproduce the issue.
>
> Thanks a lot,
> Raúl.
>
>
> -----Original Message-----
> From: Andor Molnar <an...@cloudera.com.INVALID>
> Sent: Thursday, May 16, 2019 11:18 AM
> To: DevZooKeeper
> Subject: Re: Question about security configuration (was: Re: [VOTE]
> Apache ZooKeeper release 3.5.5 candidate 6)
>
>
> [EXTERNAL EMAIL]
>
> Hi Raul,
>
> Thanks for the analysis. Let me ask a few questions, because I see
> some things that need to be clarified first.
>
> 1. This issue is only about server-client SSL scenario (not Quorum
> TLS), so it's possibly a regression in 3.5. Is that correct?
> 2. When running all Pravega tests against an external ZooKeeper
> standalone server, all tests passed including SSL/nonSSL. Is that correct?
> 3. SSL tests are failing when ZooKeeper is running inside the test process?
> 4. You verified it by running ZooKeeper in standalone mode,
> SSL-enabled and according to the log snippet, your client has
> connected successfully, but later timed out. Is that right?
> 5. Have you verified client-server SSL config with real (3-node)
> cluster with zkCli.sh?
> 6. Would you please provide the server side logs as well, maybe it
> sheds some light why the client timed out?
>
> Thanks,
> Andor
>
>
> On Thu, May 16, 2019 at 10:25 AM Gracia, Raul <raul.gra...@dell.com>
> wrote:
>
> > Hi all,
> >
> > My name is Raúl Gracia and I work on the Pravega project
> > (open-source project for data stream storage): http://pravega.io/.
> >
> > I'm currently working on a Pravega branch using
> > "zookeeper-3.5.5-rc6", as we are interested in allowing Curator
> > (4.0.1) to use a Zookeeper version with the bugfix proposed in
> > ZOOKEEPER-2184 <https://issues.apache.org/jira/browse/ZOOKEEPER-2184>.
> > The integration has been pretty smooth and 99% of tests are successful
> > in a Pravega build, and the original issue that motivated the
> > upgrade to zookeeper-3.5.5 also seems solved.
> >
> > However, there are failures related to a specific type of test in
> > Pravega in which we instantiate a Zookeeper server process (for
> > testing Pravega in standalone mode). Such failures only occur when
> > running the standalone tests with SSL enabled, which includes
> > configuring the Zookeeper server process with SSL as well.
> >
> > To constrain the scope of the problem, I have built
> > zookeeper-3.5.5-rc6 ("mvn package") and executed the server (e.g.,
> > "./bin/zkServer.sh start") with the appropriate security
> > configuration to enable SSL:
> > export SERVER_JVMFLAGS="
> > -Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
> > -Dzookeeper.ssl.keyStore.location=.../server.keystore.jks
> > -Dzookeeper.ssl.keyStore.password=password
> > -Dzookeeper.ssl.trustStore.location=.../client.truststore.jks
> > -Dzookeeper.ssl.trustStore.password=password"
> > (I have also added secureClientPort=2281 in zoo.cfg as indicated in
> > the admin instructions)
> >
> > With the Zookeeper server running separately, I executed all the
> > Pravega standalone tests (with and without SSL) pointing to that
> > external Zookeeper server (and disabling the Zookeeper server
> > process that was created as part of the test workflow). Regarding
> > configuration, in our tests the clients are configured with the
> > recommended security settings in the administration guide:
> > System.setProperty("zookeeper.client.secure", "true");
> > System.setProperty("zookeeper.clientCnxnSocket",
> > "org.apache.zookeeper.ClientCnxnSocketNetty");
> > System.setProperty("zookeeper.ssl.trustStore.location",
> > ".../client.truststore.jks");
> > System.setProperty("zookeeper.ssl.trustStore.password", "password");
> > System.setProperty("zookeeper.ssl.keyStore.location",
> > ".../server.keystore.jks");
> > System.setProperty("zookeeper.ssl.keyStore.password", "password");
> >
> > In this case, all the Pravega standalone tests succeeded.
> >
> > This leaves the way we are configuring SSL in the Zookeeper
> > server process in Pravega standalone as the most plausible cause for
> > the problem.
> > This is intriguing, as the security settings used are the same in
> > both scenarios (zkServer.sh / Zookeeper server process started in
> > the test code).
> >
> > I have also confirmed this by running the Zookeeper server process
> > used in standalone with/without SSL and connecting to it via the
> > zkCli. Without SSL configured I can connect properly to it, whereas
> > with SSL enabled I get the following error in the client:
> >
> > 2019-05-15 19:59:40,479 [myid:] - INFO [main:ZooKeeper@868] - Initiating client connection, connectString=localhost:2281 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@621be5d1
> > 2019-05-15 19:59:40,507 [myid:] - INFO [main:X509Util@79] - Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
> > 2019-05-15 19:59:40,791 [myid:] - INFO [main:ClientCnxnSocket@237] - jute.maxbuffer value is 4194304 Bytes
> > 2019-05-15 19:59:40,798 [myid:] - INFO [main:ClientCnxn@1653] - zookeeper.request.timeout value is 0. feature enabled=
> > 2019-05-15 19:59:40,817 [myid:localhost:2281] - INFO [main-SendThread(localhost:2281):ClientCnxn$SendThread@1112] - Opening socket connection to server localhost/127.0.0.1:2281. Will not attempt to authenticate using SASL (unknown error)
> > Welcome to ZooKeeper!
> > JLine support is enabled
> > [zk: localhost:2281(CONNECTING) 0] 2019-05-15 19:59:41,168 [myid:localhost:2281] - INFO [epollEventLoopGroup-2-1:ClientCnxnSocketNetty$ZKClientPipelineFactory@460] - SSL handler added for channel: [id: 0x7bf11dfa]
> > 2019-05-15 19:59:41,176 [myid:localhost:2281] - INFO [epollEventLoopGroup-2-1:ClientCnxn$SendThread@959] - Socket connection established, initiating session, client: /127.0.0.1:52652, server: localhost/127.0.0.1:2281
> > 2019-05-15 19:59:41,178 [myid:localhost:2281] - INFO [epollEventLoopGroup-2-1:ClientCnxnSocketNetty$1@188] - channel is connected: [id: 0x7bf11dfa, L:/127.0.0.1:52652 - R:localhost/127.0.0.1:2281]
> > 2019-05-15 19:59:41,614 [myid:localhost:2281] - INFO [epollEventLoopGroup-2-1:ClientCnxn$SendThread@1394] - Session establishment complete on server localhost/127.0.0.1:2281, sessionid = 0x10002239ae10000, negotiated timeout = 30000
> > WATCHER::
> > WatchedEvent state:SyncConnected type:None path:null
> > [zk: localhost:2281(CONNECTED) 0] ls /
> > 2019-05-15 20:00:01,616 [myid:localhost:2281] - WARN [main-SendThread(localhost:2281):ClientCnxn$SendThread@1190] - Client session timed out, have not heard from server in 20004ms for sessionid 0x10002239ae10000
> > 2019-05-15 20:00:01,618 [myid:localhost:2281] - INFO [main-SendThread(localhost:2281):ClientCnxn$SendThread@1238] - Client session timed out, have not heard from server in 20004ms for sessionid 0x10002239ae10000, closing socket connection and attempting reconnect
> > 2019-05-15 20:00:01,630 [myid:localhost:2281] - INFO [epollEventLoopGroup-2-1:ClientCnxnSocketNetty$ZKClientHandler@473] - channel is disconnected: [id: 0x7bf11dfa, L:/127.0.0.1:52652 !
> > R:localhost/127.0.0.1:2281]
> > 2019-05-15 20:00:01,631 [myid:localhost:2281] - INFO [epollEventLoopGroup-2-1:ClientCnxnSocketNetty@253] - channel is told closing
> > KeeperErrorCode = ConnectionLoss for /
> > [zk: localhost:2281(CONNECTED) 1]
> >
> > I see some suspicious messages in these logs that I will need to
> > investigate further. But as a general observation, it looks like the
> > way we instantiate the Zookeeper server process for Pravega
> > standalone is not valid in zookeeper-3.5.5-rc6 (to inspect how we
> > create the Zookeeper server process, please see methods initialize()
> > and start() in this file
> > <https://github.com/pravega/pravega/blob/master/segmentstore/storage/impl/src/main/java/io/pravega/segmentstore/storage/impl/bookkeeper/ZooKeeperServiceRunner.java>).
> >
> > In summary, if the error I'm getting is related to changes in the
> > SSL configuration introduced in zookeeper-3.5.5, it would be great
> > to get feedback from you if I'm missing something. On the other
> > hand, if the way we are creating a Zookeeper server process is not
> > the recommended one, I'm also open to suggestions here.
> >
> > Thanks in advance and sorry for the long email, Raúl.
> >
> > PS: I have also tried to run the Zookeeper server process with SSL
> > forcing to only use the netty and boringSSL library versions that
> > are used either in Pravega (netty*:4.1.30.Final,
> > netty-tcnative-boringssl-static:2.0.17) or Zookeeper
> > 3.5.5 (netty*:4.1.29.Final, netty-tcnative-boringssl-static:2.0.7),
> > but none of these combinations made any difference in the behavior
> > of the Zookeeper server process.
> >
> > PS2: The JDK version I use is: openjdk version "1.8.0_212".
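
For reference, the client-side SSL settings quoted in the thread can be collected in one place. This is only a sketch: the property keys and the ClientCnxnSocketNetty class name are the ones quoted in the thread, while the class ZkClientSslSettings, its methods, and the idea of passing paths and passwords as parameters are hypothetical. The store paths are elided in the thread, so they are left to the caller.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch only: gathers the client-side SSL system properties quoted in the
 * thread. ZkClientSslSettings, build and apply are hypothetical names.
 */
final class ZkClientSslSettings {
    private ZkClientSslSettings() { }

    /** Builds the property map; paths and passwords are supplied by the caller. */
    static Map<String, String> build(String trustStorePath, String trustStorePassword,
                                     String keyStorePath, String keyStorePassword) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("zookeeper.client.secure", "true");
        props.put("zookeeper.clientCnxnSocket", "org.apache.zookeeper.ClientCnxnSocketNetty");
        props.put("zookeeper.ssl.trustStore.location", trustStorePath);
        props.put("zookeeper.ssl.trustStore.password", trustStorePassword);
        props.put("zookeeper.ssl.keyStore.location", keyStorePath);
        props.put("zookeeper.ssl.keyStore.password", keyStorePassword);
        return props;
    }

    /** Copies every entry into the JVM system properties. */
    static void apply(Map<String, String> props) {
        props.forEach(System::setProperty);
    }
}
```

Passing the values as parameters keeps stray whitespace out of the passwords, which is easy to introduce when the settings are copied from a wrapped email.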