On Fri, May 17, 2019, 01:18 Gracia, Raul <[email protected]> wrote:
> Hi Andor,
>
> You are totally correct, the server works after adding this auth provider.
> Thanks a lot!
>
> I did a cursory comparison between ZooKeeper versions 3.5.4-beta and 3.5.5
> and I couldn't find a change that justifies this behavior change.
> In any case, the Pravega build has passed with zookeeper-3.5.5, which is
> great news.
>
> I will execute some more tests and leave my vote on the release candidate,
> if you feel that this could be useful.
>

Raul,
it's great to see that you solved your problem.
It is also interesting that you are testing BoringSSL, as we have not yet
included it in the release tarball.

Yes, please cast your vote.

Enrico

> Thanks a lot,
> Raúl.
>
> -----Original Message-----
> From: Andor Molnar <[email protected]>
> Sent: Thursday, May 16, 2019 6:43 PM
> To: DevZooKeeper
> Subject: Re: Question about security configuration (was: Re: [VOTE] Apache ZooKeeper release 3.5.5 candidate 6)
>
> [EXTERNAL EMAIL]
>
> Hi Raul,
>
> X509AuthenticationProvider is not registered in the embedded ZK. In the server
> logs it says:
> "[epollEventLoopGroup-4-1] ERROR org.apache.zookeeper.server.NettyServerCnxnFactory - Auth provider not found: x509"
>
> It's done by QuorumPeerConfig.java:436 (configureSSLAuth()) when you run
> ZooKeeper in standalone mode, but your code doesn't use this configuration
> class at all.
>
> If you add this:
>
> System.setProperty("zookeeper.authProvider.x509",
>     "org.apache.zookeeper.server.auth.X509AuthenticationProvider");
>
> to your initialize() method, client SSL works:
>
> [nioEventLoopGroup-4-2] INFO org.apache.zookeeper.server.NettyServerCnxnFactory - SSL handler added for channel: [id: 0x698604a3, L:/127.0.0.1:2281 - R:/127.0.0.1:56750]
> [nioEventLoopGroup-4-2] INFO org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 'CN=server.pravegastack.io' for Scheme 'x509'
>
> TBH I haven't diffed the code with 3.5.4-beta, so I'm not sure why it worked
> previously, and I don't have experience with embedded ZK, but I believe the
> QuorumPeerConfig class has to be involved somehow.
>
> Regards,
> Andor
>
>
> On Thu, May 16, 2019 at 5:10 PM Gracia, Raul <[email protected]> wrote:
>
> > Thanks Andor for your quick reply. Let me answer your questions:
> >
> > 1) Yes, the problem is related to client/server communication using
> > SSL, not to Quorum SSL (we use a single ZooKeeper process in our tests).
> > I would like your feedback first to conclude whether this is a problem in
> > our config/code or a regression/change in the behavior of ZooKeeper 3.5.5.
> >
> > 2) Yes, with the external ZooKeeper server running separately (e.g.,
> > zkServer.sh start) all the tests pass (SSL/non-SSL). With the
> > ZooKeeper server process we instantiate in our tests, the non-SSL
> > tests also pass, but not the SSL ones.
> >
> > 3) Correct. To give more detail here, we instantiate the
> > ZooKeeper server process using the ZooKeeperServer class together with
> > NettyServerCnxnFactory.
> >
> > 4) I have done 2 types of tests: with ZooKeeper started as a separate
> > service ("zkServer.sh") and with the ZooKeeper server process we
> > instantiate in the Pravega standalone tests (namely, "zk-pravega-tests"):
> > - zkServer.sh: works well with the regular ZooKeeper client (zkCli.sh), and
> >   the Pravega standalone tests pass using it with/without SSL.
> > - zk-pravega-tests: without SSL, zkCli.sh can connect to that
> >   process and the non-SSL Pravega tests pass. With SSL configured,
> >   neither zkCli.sh nor the Pravega SSL tests are able to connect to
> >   the server (KeeperErrorCode = ConnectionLoss).
> >
> > 5) No, I haven't tested this scenario yet. I have tested a standalone
> > ZooKeeper server (zkServer.sh) and a client (zkCli.sh) with SSL
> > enabled on the same machine, and it works well. Apart from that, I
> > have also performed distributed tests with a ZooKeeper server
> > (3.5.4-beta) and Pravega (using Curator 4.0.1 + zookeeper-3.5.5) in
> > Kubernetes, and it worked fine.
> >
> > 6) Yes, in fact I have done a little more than that: I have created
> > a repository to investigate this issue in isolation:
> > https://github.com/RaulGracia/zookeeper-test
> > Apart from providing logs (see the logs folder), in this repo I extracted
> > the piece of code from the Pravega repository that is used to start
> > the ZooKeeper standalone process, making it easier to configure the
> > SSL properties via the executable. I think this will make it easier
> > for anyone to reproduce the problem I'm experiencing. Moreover, I have
> > provided instructions in the README file on how to reproduce the issue.
> >
> > Thanks a lot,
> > Raúl.
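Andor's one-line fix can be wrapped in a small helper that an embedded server's initialize() method calls early, as he suggests. The sketch below is hypothetical (EmbeddedZkSslConfig and its methods are our names, not Pravega's or ZooKeeper's) and uses only JDK calls. It applies the same server-side TLS properties that the thread passes to zkServer.sh as -D flags, then registers X509AuthenticationProvider under zookeeper.authProvider.x509 only if no provider is configured for that key yet. That "set it unless already set" behavior is our assumption about what a zkServer.sh start gets through QuorumPeerConfig.configureSSLAuth(); check the 3.5.5 source before relying on it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of ZooKeeper or Pravega) that sets the JVM
 * system properties an embedded, Netty-based ZooKeeper server reads for
 * client TLS.
 */
public final class EmbeddedZkSslConfig {

    public static final String X509_PROVIDER_KEY = "zookeeper.authProvider.x509";
    public static final String X509_PROVIDER_CLASS =
            "org.apache.zookeeper.server.auth.X509AuthenticationProvider";

    private EmbeddedZkSslConfig() {
    }

    /**
     * Registers X509AuthenticationProvider under "zookeeper.authProvider.x509"
     * unless a provider class is already configured for that key.
     *
     * @return true if this call set the property, false if it was already set
     */
    public static boolean registerX509AuthProviderIfAbsent() {
        if (System.getProperty(X509_PROVIDER_KEY) != null) {
            return false; // keep an explicit choice made elsewhere
        }
        System.setProperty(X509_PROVIDER_KEY, X509_PROVIDER_CLASS);
        return true;
    }

    /**
     * Applies the server-side settings the thread passes to zkServer.sh as
     * -D flags, then registers the x509 auth provider, which an embedded
     * server does not get from QuorumPeerConfig.
     */
    public static void applyServerSsl(String keyStore, String keyStorePassword,
                                      String trustStore, String trustStorePassword) {
        Map<String, String> props = new LinkedHashMap<>();
        // Mirrors -Dzookeeper.serverCnxnFactory; code that constructs
        // NettyServerCnxnFactory directly does not depend on it.
        props.put("zookeeper.serverCnxnFactory",
                "org.apache.zookeeper.server.NettyServerCnxnFactory");
        props.put("zookeeper.ssl.keyStore.location", keyStore);
        props.put("zookeeper.ssl.keyStore.password", keyStorePassword);
        props.put("zookeeper.ssl.trustStore.location", trustStore);
        props.put("zookeeper.ssl.trustStore.password", trustStorePassword);
        props.forEach(System::setProperty);
        registerX509AuthProviderIfAbsent();
    }

    public static void main(String[] args) {
        // Placeholder paths and passwords, as in the thread.
        applyServerSsl("server.keystore.jks", "password",
                "client.truststore.jks", "password");
        System.out.println(System.getProperty(X509_PROVIDER_KEY));
    }
}
```

Leaving an existing zookeeper.authProvider.x509 value untouched means a deployment that plugs in its own X509 provider class is not silently overridden.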
> >
> > -----Original Message-----
> > From: Andor Molnar <[email protected]>
> > Sent: Thursday, May 16, 2019 11:18 AM
> > To: DevZooKeeper
> > Subject: Re: Question about security configuration (was: Re: [VOTE] Apache ZooKeeper release 3.5.5 candidate 6)
> >
> > [EXTERNAL EMAIL]
> >
> > Hi Raul,
> >
> > Thanks for the analysis. Let me ask a few questions, because I see
> > some things that need to be clarified first.
> >
> > 1. This issue is only about the server-client SSL scenario (not Quorum
> > TLS), so it's possibly a regression in 3.5. Is that correct?
> > 2. When running all Pravega tests against an external ZooKeeper
> > standalone server, all tests passed, including SSL/non-SSL. Is that correct?
> > 3. SSL tests are failing when ZooKeeper is running inside the test process?
> > 4. You verified it by running ZooKeeper in standalone mode,
> > SSL-enabled, and according to the log snippet, your client
> > connected successfully but later timed out. Is that right?
> > 5. Have you verified the client-server SSL config with a real (3-node)
> > cluster with zkCli.sh?
> > 6. Would you please provide the server-side logs as well? Maybe they
> > shed some light on why the client timed out.
> >
> > Thanks,
> > Andor
> >
> >
> > On Thu, May 16, 2019 at 10:25 AM Gracia, Raul <[email protected]>
> > wrote:
> >
> > > Hi all,
> > >
> > > My name is Raúl Gracia and I work on the Pravega project
> > > (an open-source project for data stream storage): http://pravega.io/.
> > >
> > > I'm currently working on a Pravega branch using
> > > "zookeeper-3.5.5-rc6", as we are interested in allowing Curator
> > > (4.0.1) to use a ZooKeeper version with the bugfix proposed in
> > > ZOOKEEPER-2184 <https://issues.apache.org/jira/browse/ZOOKEEPER-2184>.
> > > The integration has been pretty smooth and 99% of tests are successful
> > > in a Pravega build, and the original issue that motivated the
> > > upgrade to zookeeper-3.5.5 also seems solved.
> > >
> > > However, there are failures related to a specific type of test in
> > > Pravega in which we instantiate a ZooKeeper server process (for
> > > testing Pravega in standalone mode). Such failures only occur when
> > > running the standalone tests with SSL enabled, which includes
> > > configuring the ZooKeeper server process with SSL as well.
> > >
> > > To narrow down the scope of the problem, I have built
> > > zookeeper-3.5.5-rc6 ("mvn package") and executed the server (e.g.,
> > > "./bin/zkServer.sh start") with the appropriate security
> > > configuration to enable SSL:
> > > export SERVER_JVMFLAGS="
> > > -Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
> > > -Dzookeeper.ssl.keyStore.location=.../server.keystore.jks
> > > -Dzookeeper.ssl.keyStore.password=password
> > > -Dzookeeper.ssl.trustStore.location=.../client.truststore.jks
> > > -Dzookeeper.ssl.trustStore.password=password"
> > > (I have also added secureClientPort=2281 in zoo.cfg as indicated in
> > > the admin instructions.)
> > >
> > > With the ZooKeeper server running separately, I executed all the
> > > Pravega standalone tests (with and without SSL) pointing to that
> > > external ZooKeeper server (and disabling the ZooKeeper server
> > > process that was created as part of the test workflow).
> > > Regarding configuration, in our tests the clients are configured with
> > > the recommended security settings in the administration guide:
> > > System.setProperty("zookeeper.client.secure", "true");
> > > System.setProperty("zookeeper.clientCnxnSocket",
> > >     "org.apache.zookeeper.ClientCnxnSocketNetty");
> > > System.setProperty("zookeeper.ssl.trustStore.location",
> > >     ".../client.truststore.jks");
> > > System.setProperty("zookeeper.ssl.trustStore.password", "password");
> > > System.setProperty("zookeeper.ssl.keyStore.location",
> > >     ".../server.keystore.jks");
> > > System.setProperty("zookeeper.ssl.keyStore.password", "password");
> > >
> > > In this case, all the Pravega standalone tests succeeded.
> > >
> > > This leaves the way we configure SSL in the ZooKeeper server process
> > > in Pravega standalone as the most plausible cause of the problem.
> > > This is intriguing, as the security settings used are the same in
> > > both scenarios (zkServer.sh / ZooKeeper server process started in
> > > the test code).
> > >
> > > I have also confirmed this by running the ZooKeeper server process
> > > used in standalone with/without SSL and connecting to it via zkCli.
> > > Without SSL configured I can connect properly to it, whereas
> > > with SSL enabled I get the following error in the client:
> > >
> > > 2019-05-15 19:59:40,479 [myid:] - INFO [main:ZooKeeper@868] - Initiating client connection, connectString=localhost:2281 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@621be5d1
> > > 2019-05-15 19:59:40,507 [myid:] - INFO [main:X509Util@79] - Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
> > > 2019-05-15 19:59:40,791 [myid:] - INFO [main:ClientCnxnSocket@237] - jute.maxbuffer value is 4194304 Bytes
> > > 2019-05-15 19:59:40,798 [myid:] - INFO [main:ClientCnxn@1653] - zookeeper.request.timeout value is 0. feature enabled=
> > > 2019-05-15 19:59:40,817 [myid:localhost:2281] - INFO [main-SendThread(localhost:2281):ClientCnxn$SendThread@1112] - Opening socket connection to server localhost/127.0.0.1:2281. Will not attempt to authenticate using SASL (unknown error)
> > > Welcome to ZooKeeper!
> > > JLine support is enabled
> > > [zk: localhost:2281(CONNECTING) 0] 2019-05-15 19:59:41,168 [myid:localhost:2281] - INFO [epollEventLoopGroup-2-1:ClientCnxnSocketNetty$ZKClientPipelineFactory@460] - SSL handler added for channel: [id: 0x7bf11dfa]
> > > 2019-05-15 19:59:41,176 [myid:localhost:2281] - INFO [epollEventLoopGroup-2-1:ClientCnxn$SendThread@959] - Socket connection established, initiating session, client: /127.0.0.1:52652, server: localhost/127.0.0.1:2281
> > > 2019-05-15 19:59:41,178 [myid:localhost:2281] - INFO [epollEventLoopGroup-2-1:ClientCnxnSocketNetty$1@188] - channel is connected: [id: 0x7bf11dfa, L:/127.0.0.1:52652 - R:localhost/127.0.0.1:2281]
> > > 2019-05-15 19:59:41,614 [myid:localhost:2281] - INFO [epollEventLoopGroup-2-1:ClientCnxn$SendThread@1394] - Session establishment complete on server localhost/127.0.0.1:2281, sessionid = 0x10002239ae10000, negotiated timeout = 30000
> > > WATCHER::
> > > WatchedEvent state:SyncConnected type:None path:null
> > > [zk: localhost:2281(CONNECTED) 0] ls /
> > > 2019-05-15 20:00:01,616 [myid:localhost:2281] - WARN [main-SendThread(localhost:2281):ClientCnxn$SendThread@1190] - Client session timed out, have not heard from server in 20004ms for sessionid 0x10002239ae10000
> > > 2019-05-15 20:00:01,618 [myid:localhost:2281] - INFO [main-SendThread(localhost:2281):ClientCnxn$SendThread@1238] - Client session timed out, have not heard from server in 20004ms for sessionid 0x10002239ae10000, closing socket connection and attempting reconnect
> > > 2019-05-15 20:00:01,630 [myid:localhost:2281] - INFO [epollEventLoopGroup-2-1:ClientCnxnSocketNetty$ZKClientHandler@473] - channel is disconnected: [id: 0x7bf11dfa, L:/127.0.0.1:52652 ! R:localhost/127.0.0.1:2281]
> > > 2019-05-15 20:00:01,631 [myid:localhost:2281] - INFO [epollEventLoopGroup-2-1:ClientCnxnSocketNetty@253] - channel is told closing
> > > KeeperErrorCode = ConnectionLoss for /
> > > [zk: localhost:2281(CONNECTED) 1]
> > >
> > > I see some suspicious messages in these logs that I will need to
> > > investigate further. But as a general observation, it looks like the
> > > way we instantiate the ZooKeeper server process for Pravega
> > > standalone is not valid in zookeeper-3.5.5-rc6 (to inspect how we
> > > create the ZooKeeper server process, please see the methods initialize()
> > > and start() in this file:
> > > <https://github.com/pravega/pravega/blob/master/segmentstore/storage/impl/src/main/java/io/pravega/segmentstore/storage/impl/bookkeeper/ZooKeeperServiceRunner.java>).
> > >
> > > In summary, if the error I'm getting is related to changes in the
> > > SSL configuration introduced in zookeeper-3.5.5, it would be great
> > > to get feedback from you on whether I'm missing something. On the other
> > > hand, if the way we are creating a ZooKeeper server process is not
> > > the recommended one, I'm also open to suggestions here.
> > >
> > > Thanks in advance and sorry for the long email,
> > > Raúl.
> > >
> > > PS: I have also tried to run the ZooKeeper server process with SSL,
> > > forcing it to use only the Netty and BoringSSL library versions that
> > > are used either in Pravega (netty*:4.1.30.Final,
> > > netty-tcnative-boringssl-static:2.0.17) or in ZooKeeper
> > > 3.5.5 (netty*:4.1.29.Final, netty-tcnative-boringssl-static:2.0.7),
> > > but none of these combinations made any difference in the behavior
> > > of the ZooKeeper server process.
> > >
> > > PS2: The JDK version I use is: openjdk version "1.8.0_212".
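The "have not heard from server in 20004ms" warning in the client log above follows from simple arithmetic. As far as we can tell from the 3.5.x client source (ClientCnxn), the client sets its read timeout to two thirds of the negotiated session timeout and drops the connection once nothing has been received from the server for that long. With the negotiated 30000 ms from the log, that is 20000 ms, which matches the warning logged about 20 seconds after the session was established (19:59:41,614 to 20:00:01,616). Because an idle client also pings the server well inside that window, the timeout suggests the server sent nothing back on that channel after the session was established, which is consistent with the server-side x509 problem Andor found. The class and method names below are hypothetical; only the arithmetic is meant to match.

```java
/** Hypothetical helper reproducing the ZooKeeper client's read-timeout arithmetic. */
public final class SessionTimeouts {

    private SessionTimeouts() {
    }

    /** Read timeout: two thirds of the negotiated session timeout, in integer ms. */
    public static int readTimeoutMs(int negotiatedSessionTimeoutMs) {
        return negotiatedSessionTimeoutMs * 2 / 3;
    }

    /** True once nothing has been received from the server for the whole read timeout. */
    public static boolean clientWouldTimeOut(int negotiatedSessionTimeoutMs, long idleRecvMs) {
        return idleRecvMs >= readTimeoutMs(negotiatedSessionTimeoutMs);
    }

    public static void main(String[] args) {
        int negotiated = 30000; // "negotiated timeout = 30000" in the client log
        System.out.println(readTimeoutMs(negotiated));              // 20000
        System.out.println(clientWouldTimeOut(negotiated, 20004L)); // true
    }
}
```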
