Hi Raul,

Thanks for the analysis. Let me ask a few questions, because I see some
things that need to be clarified first.

1. This issue is only about the server-client SSL scenario (not Quorum TLS),
so it's possibly a regression in 3.5. Is that correct?
2. When running all Pravega tests against an external standalone ZooKeeper
server, all tests passed, both SSL and non-SSL. Is that correct?
3. SSL tests are failing when ZooKeeper is running inside the test process?
4. You verified this by running ZooKeeper in standalone mode with SSL
enabled, and according to the log snippet your client connected
successfully but later timed out. Is that right?
5. Have you verified the client-server SSL configuration with a real
(3-node) cluster using zkCli.sh?
6. Would you please provide the server-side logs as well? They may shed
some light on why the client timed out.

Thanks,
Andor

On Thu, May 16, 2019 at 10:25 AM Gracia, Raul <raul.gra...@dell.com> wrote:

> Hi all,
>
> My name is Raúl Gracia and I work in the Pravega project (open-source
> project for data stream storage): http://pravega.io/.
>
> I'm currently working on a Pravega branch using "zookeeper-3.5.5-rc6", as
> we are interested in allowing Curator (4.0.1) to use a Zookeeper version
> with the bugfix proposed in ZOOKEEPER-2184
> (https://issues.apache.org/jira/browse/ZOOKEEPER-2184). The integration
> has been pretty smooth: 99% of tests pass in a Pravega build, and the
> original issue that motivated the upgrade to zookeeper-3.5.5 also seems
> to be solved.
>
> However, there are failures related to a specific type of test in Pravega
> in which we instantiate a Zookeeper server process (for testing Pravega in
> standalone mode). Such failures only occur when running the standalone
> tests with SSL enabled, which includes configuring the Zookeeper server
> process with SSL as well.
>
> To narrow down the problem, I built zookeeper-3.5.5-rc6 ("mvn package")
> and started the server (e.g., "./bin/zkServer.sh start") with the
> following security configuration to enable SSL:
>
>   export SERVER_JVMFLAGS="
>     -Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
>     -Dzookeeper.ssl.keyStore.location=.../server.keystore.jks
>     -Dzookeeper.ssl.keyStore.password=password
>     -Dzookeeper.ssl.trustStore.location=.../client.truststore.jks
>     -Dzookeeper.ssl.trustStore.password=password"
>
> (I also added secureClientPort=2281 to zoo.cfg, as indicated in the
> Administrator's Guide.)
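
For the in-process server, the same settings would presumably have to be present as JVM system properties before the server connection factory is created. The sketch below assumes that, and shows one way to turn a SERVER_JVMFLAGS-style string into a map and apply it with System.setProperty. The class JvmFlags and its methods are hypothetical illustrations, not part of ZooKeeper or Pravega.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: parses "-Dkey=value" flags (as passed via SERVER_JVMFLAGS)
// so the same settings can be applied inside a test JVM.
final class JvmFlags {
    private JvmFlags() {}

    static Map<String, String> parse(String flags) {
        Map<String, String> props = new LinkedHashMap<>();
        // Split on whitespace, much like the shell does for an unquoted variable.
        for (String token : flags.trim().split("\\s+")) {
            if (!token.startsWith("-D")) {
                continue; // not a system property flag
            }
            int eq = token.indexOf('=');
            if (eq < 0) {
                props.put(token.substring(2), ""); // "-Dflag" without a value
            } else {
                props.put(token.substring(2, eq), token.substring(eq + 1));
            }
        }
        return props;
    }

    // Applies every parsed flag as a JVM system property.
    static void apply(Map<String, String> props) {
        props.forEach(System::setProperty);
    }
}
```

Because tokens are split on whitespace, a flag written as "-D...password= password" would yield an empty value, which is why the "=" and the value should stay adjacent.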
>
> With the Zookeeper server running separately, I executed all the Pravega
> standalone tests (with and without SSL) pointing to that external Zookeeper
> server (and disabling the Zookeeper server process that is normally
> created as part of the test workflow). In our tests, the clients are
> configured with the security settings recommended in the Administrator's
> Guide:
>
>   System.setProperty("zookeeper.client.secure", "true");
>   System.setProperty("zookeeper.clientCnxnSocket",
>       "org.apache.zookeeper.ClientCnxnSocketNetty");
>   System.setProperty("zookeeper.ssl.trustStore.location",
>       ".../client.truststore.jks");
>   System.setProperty("zookeeper.ssl.trustStore.password", "password");
>   System.setProperty("zookeeper.ssl.keyStore.location",
>       ".../server.keystore.jks");
>   System.setProperty("zookeeper.ssl.keyStore.password", "password");
>
> In this case, all the Pravega standalone tests succeeded.
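
The client settings above can be wrapped in a small helper that also rejects passwords with stray surrounding whitespace. This is a sketch: ZkClientTls is a hypothetical name, the store paths are supplied by the caller, and it assumes the properties are set before the ZooKeeper (or Curator) client is constructed.

```java
// Hypothetical helper that applies the client-side TLS settings listed above.
final class ZkClientTls {
    private ZkClientTls() {}

    static void configure(String trustStorePath, String trustStorePassword,
                          String keyStorePath, String keyStorePassword) {
        requireNoSurroundingSpace("trustStore password", trustStorePassword);
        requireNoSurroundingSpace("keyStore password", keyStorePassword);
        System.setProperty("zookeeper.client.secure", "true");
        System.setProperty("zookeeper.clientCnxnSocket",
                "org.apache.zookeeper.ClientCnxnSocketNetty");
        System.setProperty("zookeeper.ssl.trustStore.location", trustStorePath);
        System.setProperty("zookeeper.ssl.trustStore.password", trustStorePassword);
        System.setProperty("zookeeper.ssl.keyStore.location", keyStorePath);
        System.setProperty("zookeeper.ssl.keyStore.password", keyStorePassword);
    }

    // A leading or trailing space would become part of the password and most
    // likely make loading the store fail, so reject it up front.
    private static void requireNoSurroundingSpace(String what, String value) {
        if (!value.equals(value.trim())) {
            throw new IllegalArgumentException(what + " has leading or trailing whitespace");
        }
    }
}
```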
>
> This leaves the way we configure SSL for the Zookeeper server process in
> Pravega standalone as the most plausible cause of the problem. This is
> intriguing, as the security settings are the same in both scenarios
> (zkServer.sh vs. the Zookeeper server process started from the test code).
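
One way to check the claim that the settings are identical is to compare the "zookeeper.*" properties actually visible in each scenario. The sketch below is hypothetical (ZkConfigDiff is an illustrative name); it assumes each snapshot is a plain map, for example filled from System.getProperties().stringPropertyNames() and System.getProperty(key) in each JVM.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical check: lists every "zookeeper.*" key whose value differs
// between two configuration snapshots (missing keys count as different).
final class ZkConfigDiff {
    private ZkConfigDiff() {}

    static Set<String> differingKeys(Map<String, String> a, Map<String, String> b) {
        Set<String> keys = new TreeSet<>(a.keySet());
        keys.addAll(b.keySet());
        Set<String> diff = new TreeSet<>();
        for (String key : keys) {
            if (key.startsWith("zookeeper.") && !Objects.equals(a.get(key), b.get(key))) {
                diff.add(key);
            }
        }
        return diff;
    }
}
```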
>
> I have also confirmed this by running the Zookeeper server process used in
> standalone mode with and without SSL and connecting to it via zkCli.
> Without SSL I can connect to it properly, whereas with SSL enabled I get
> the following error in the client:
>
> 2019-05-15 19:59:40,479 [myid:] - INFO  [main:ZooKeeper@868] - Initiating client connection, connectString=localhost:2281 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@621be5d1
> 2019-05-15 19:59:40,507 [myid:] - INFO  [main:X509Util@79] - Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
> 2019-05-15 19:59:40,791 [myid:] - INFO  [main:ClientCnxnSocket@237] - jute.maxbuffer value is 4194304 Bytes
> 2019-05-15 19:59:40,798 [myid:] - INFO  [main:ClientCnxn@1653] - zookeeper.request.timeout value is 0. feature enabled=
> 2019-05-15 19:59:40,817 [myid:localhost:2281] - INFO  [main-SendThread(localhost:2281):ClientCnxn$SendThread@1112] - Opening socket connection to server localhost/127.0.0.1:2281. Will not attempt to authenticate using SASL (unknown error)
> Welcome to ZooKeeper!
> JLine support is enabled
> [zk: localhost:2281(CONNECTING) 0] 2019-05-15 19:59:41,168 [myid:localhost:2281] - INFO  [epollEventLoopGroup-2-1:ClientCnxnSocketNetty$ZKClientPipelineFactory@460] - SSL handler added for channel: [id: 0x7bf11dfa]
> 2019-05-15 19:59:41,176 [myid:localhost:2281] - INFO  [epollEventLoopGroup-2-1:ClientCnxn$SendThread@959] - Socket connection established, initiating session, client: /127.0.0.1:52652, server: localhost/127.0.0.1:2281
> 2019-05-15 19:59:41,178 [myid:localhost:2281] - INFO  [epollEventLoopGroup-2-1:ClientCnxnSocketNetty$1@188] - channel is connected: [id: 0x7bf11dfa, L:/127.0.0.1:52652 - R:localhost/127.0.0.1:2281]
> 2019-05-15 19:59:41,614 [myid:localhost:2281] - INFO  [epollEventLoopGroup-2-1:ClientCnxn$SendThread@1394] - Session establishment complete on server localhost/127.0.0.1:2281, sessionid = 0x10002239ae10000, negotiated timeout = 30000
> WATCHER::
> WatchedEvent state:SyncConnected type:None path:null
> [zk: localhost:2281(CONNECTED) 0] ls /
> 2019-05-15 20:00:01,616 [myid:localhost:2281] - WARN  [main-SendThread(localhost:2281):ClientCnxn$SendThread@1190] - Client session timed out, have not heard from server in 20004ms for sessionid 0x10002239ae10000
> 2019-05-15 20:00:01,618 [myid:localhost:2281] - INFO  [main-SendThread(localhost:2281):ClientCnxn$SendThread@1238] - Client session timed out, have not heard from server in 20004ms for sessionid 0x10002239ae10000, closing socket connection and attempting reconnect
> 2019-05-15 20:00:01,630 [myid:localhost:2281] - INFO  [epollEventLoopGroup-2-1:ClientCnxnSocketNetty$ZKClientHandler@473] - channel is disconnected: [id: 0x7bf11dfa, L:/127.0.0.1:52652 ! R:localhost/127.0.0.1:2281]
> 2019-05-15 20:00:01,631 [myid:localhost:2281] - INFO  [epollEventLoopGroup-2-1:ClientCnxnSocketNetty@253] - channel is told closing
> KeeperErrorCode = ConnectionLoss for /
> [zk: localhost:2281(CONNECTED) 1]
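
The 20004ms in the WARN line can be cross-checked with a little arithmetic. This sketch assumes the 3.5 client sets its read timeout to 2/3 of the negotiated session timeout and reports "Client session timed out" once nothing has been received for that long; ZkClientTimeouts is a hypothetical name used only for illustration.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Worked arithmetic for the log above (assumption: readTimeout = 2/3 of the
// negotiated session timeout, as in the 3.5 client).
final class ZkClientTimeouts {
    private ZkClientTimeouts() {}

    // Same timestamp layout as the log lines, e.g. "2019-05-15 19:59:41,614".
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    static int readTimeoutMs(int negotiatedSessionTimeoutMs) {
        return negotiatedSessionTimeoutMs * 2 / 3;
    }

    // True once the time since the last received packet reaches the read timeout.
    static boolean wouldTimeOut(int negotiatedSessionTimeoutMs, long idleRecvMs) {
        return idleRecvMs >= readTimeoutMs(negotiatedSessionTimeoutMs);
    }

    // Milliseconds between two log timestamps.
    static long elapsedMs(String from, String to) {
        return Duration.between(LocalDateTime.parse(from, LOG_TS),
                                LocalDateTime.parse(to, LOG_TS)).toMillis();
    }
}
```

With a negotiated timeout of 30000 ms, readTimeoutMs gives 20000 ms, and elapsedMs between "Session establishment complete" (19:59:41,614) and the WARN (20:00:01,616) is 20002 ms. That suggests the client received nothing at all after the session was established, neither the reply to "ls /" nor any ping response.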
>
> I see some suspicious messages in these logs that I will need to
> investigate further. But as a general observation, it looks like the way we
> instantiate the Zookeeper server process for Pravega standalone is not
> valid in zookeeper-3.5.5-rc6 (to inspect how we create the Zookeeper server
> process, please see the methods initialize() and start() in
> https://github.com/pravega/pravega/blob/master/segmentstore/storage/impl/src/main/java/io/pravega/segmentstore/storage/impl/bookkeeper/ZooKeeperServiceRunner.java).
>
> In summary, if the error I'm getting is related to changes in the SSL
> configuration introduced in zookeeper-3.5.5, it would be great to hear from
> you whether I'm missing something. On the other hand, if the way we create
> the Zookeeper server process is not the recommended one, I'm also open to
> suggestions.
>
> Thanks in advance and sorry for the long email,
> Raúl.
>
> PS: I have also tried running the Zookeeper server process with SSL while
> forcing it to use only the Netty and BoringSSL library versions used either
> in Pravega (netty*:4.1.30.Final, netty-tcnative-boringssl-static:2.0.17) or
> in Zookeeper 3.5.5 (netty*:4.1.29.Final,
> netty-tcnative-boringssl-static:2.0.7), but none of these combinations made
> any difference in the behavior of the Zookeeper server process.
>
> PS2: The JDK version I use is: openjdk version "1.8.0_212".
>
>
