Hi all,

My name is Raúl Gracia and I work on the Pravega project (an open-source 
project for data stream storage): http://pravega.io/.

I'm currently working on a Pravega branch using "zookeeper-3.5.5-rc6", as we 
are interested in allowing Curator (4.0.1) to use a Zookeeper version with the 
bugfix proposed in 
ZOOKEEPER-2184<https://issues.apache.org/jira/browse/ZOOKEEPER-2184>. The 
integration has been pretty smooth: 99% of tests succeed in a Pravega build, 
and the original issue that motivated the upgrade to zookeeper-3.5.5 also 
seems to be solved.

However, there are failures in a specific type of Pravega test in which we 
instantiate a Zookeeper server process (for testing Pravega in standalone 
mode). These failures only occur when running the standalone tests with SSL 
enabled, which includes configuring the Zookeeper server process with SSL as 
well.

To constrain the scope of the problem, I have built zookeeper-3.5.5-rc6 ("mvn 
package") and executed the server (e.g., "./bin/zkServer.sh start") with the 
appropriate security configuration to enable SSL:
export SERVER_JVMFLAGS="
-Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
-Dzookeeper.ssl.keyStore.location=.../server.keystore.jks
-Dzookeeper.ssl.keyStore.password=password
-Dzookeeper.ssl.trustStore.location=.../client.truststore.jks
-Dzookeeper.ssl.trustStore.password=password"
(I have also added secureClientPort=2281 in zoo.cfg as indicated in the admin 
instructions)

With the Zookeeper server running separately, I executed all the Pravega 
standalone tests (with and without SSL) pointing to that external Zookeeper 
server (and disabling the Zookeeper server process that is created as part of 
the test workflow). Regarding configuration, in our tests the clients are 
configured with the security settings recommended in the administration guide:
System.setProperty("zookeeper.client.secure", "true");
System.setProperty("zookeeper.clientCnxnSocket", 
"org.apache.zookeeper.ClientCnxnSocketNetty");
System.setProperty("zookeeper.ssl.trustStore.location", 
".../client.truststore.jks");
System.setProperty("zookeeper.ssl.trustStore.password", "password");
System.setProperty("zookeeper.ssl.keyStore.location", 
".../server.keystore.jks");
System.setProperty("zookeeper.ssl.keyStore.password", "password");

In this case, all the Pravega standalone tests succeeded.
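
For reference, a minimal sketch of how the client settings above could be 
applied from a single place. The class and method names (SecureClientProps, 
build, apply) are hypothetical and not part of Pravega or Zookeeper; the 
property names are the ones listed above, and the store paths and password are 
supplied by the caller. The whitespace check is an assumption that none of 
these values is meant to start or end with a space.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class SecureClientProps {

    /** Collects the client-side SSL settings; paths and password come from the caller. */
    static Map<String, String> build(String trustStore, String keyStore, String password) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("zookeeper.client.secure", "true");
        props.put("zookeeper.clientCnxnSocket", "org.apache.zookeeper.ClientCnxnSocketNetty");
        props.put("zookeeper.ssl.trustStore.location", trustStore);
        props.put("zookeeper.ssl.trustStore.password", password);
        props.put("zookeeper.ssl.keyStore.location", keyStore);
        props.put("zookeeper.ssl.keyStore.password", password);
        return props;
    }

    /**
     * Validates every value first, then sets them all as JVM system properties.
     * Assumption: no value should carry leading or trailing whitespace.
     */
    static void apply(Map<String, String> props) {
        for (Map.Entry<String, String> e : props.entrySet()) {
            String value = e.getValue();
            if (value == null || !value.equals(value.trim())) {
                throw new IllegalArgumentException("Suspicious value for " + e.getKey());
            }
        }
        props.forEach(System::setProperty);
    }
}
```

Calling SecureClientProps.apply(SecureClientProps.build(...)) before creating 
the client would set all six properties, or fail without setting any of them.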

This leaves the way we configure SSL in the Zookeeper server process in 
Pravega standalone as the most plausible cause of the problem. This is 
intriguing, as the security settings used are the same in both scenarios 
(zkServer.sh / Zookeeper server process started in the test code).
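
One way to compare the two scenarios is to check which of the server-side 
properties are actually visible in the JVM right before the in-process server 
is started. The sketch below is only an illustration: the class name 
ServerSslPropsCheck is hypothetical, and the list of keys is taken from the 
SERVER_JVMFLAGS shown earlier. It does not cover secureClientPort, which in 
the zkServer.sh case comes from zoo.cfg rather than from a system property.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical diagnostic helper, for illustration only.
final class ServerSslPropsCheck {

    // Keys taken from the SERVER_JVMFLAGS passed to zkServer.sh.
    static final String[] REQUIRED = {
        "zookeeper.serverCnxnFactory",
        "zookeeper.ssl.keyStore.location",
        "zookeeper.ssl.keyStore.password",
        "zookeeper.ssl.trustStore.location",
        "zookeeper.ssl.trustStore.password"
    };

    /** Returns the required keys that are absent or blank in the given properties. */
    static List<String> missing(Properties props) {
        List<String> result = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Run inside the JVM that will host the server to see what it inherits.
        System.out.println("Missing server SSL properties: " + missing(System.getProperties()));
    }
}
```

An empty list would suggest the in-process server sees the same flags as the 
zkServer.sh one, which would shift suspicion to how the secure port itself is 
set up.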

I have also confirmed this by running the Zookeeper server process used in 
standalone with and without SSL and connecting to it via zkCli. Without SSL 
configured I can connect to it properly, whereas with SSL enabled I get the 
following error in the client:

2019-05-15 19:59:40,479 [myid:] - INFO  [main:ZooKeeper@868] - Initiating 
client connection, connectString=localhost:2281 sessionTimeout=30000 
watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@621be5d1
2019-05-15 19:59:40,507 [myid:] - INFO  [main:X509Util@79] - Setting -D 
jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS 
renegotiation
2019-05-15 19:59:40,791 [myid:] - INFO  [main:ClientCnxnSocket@237] - 
jute.maxbuffer value is 4194304 Bytes
2019-05-15 19:59:40,798 [myid:] - INFO  [main:ClientCnxn@1653] - 
zookeeper.request.timeout value is 0. feature enabled=
2019-05-15 19:59:40,817 [myid:localhost:2281] - INFO  
[main-SendThread(localhost:2281):ClientCnxn$SendThread@1112] - Opening socket 
connection to server localhost/127.0.0.1:2281. Will not attempt to authenticate 
using SASL (unknown error)
Welcome to ZooKeeper!
JLine support is enabled
[zk: localhost:2281(CONNECTING) 0] 2019-05-15 19:59:41,168 
[myid:localhost:2281] - INFO  
[epollEventLoopGroup-2-1:ClientCnxnSocketNetty$ZKClientPipelineFactory@460] - 
SSL handler added for channel: [id: 0x7bf11dfa]
2019-05-15 19:59:41,176 [myid:localhost:2281] - INFO  
[epollEventLoopGroup-2-1:ClientCnxn$SendThread@959] - Socket connection 
established, initiating session, client: /127.0.0.1:52652, server: 
localhost/127.0.0.1:2281
2019-05-15 19:59:41,178 [myid:localhost:2281] - INFO  
[epollEventLoopGroup-2-1:ClientCnxnSocketNetty$1@188] - channel is connected: 
[id: 0x7bf11dfa, L:/127.0.0.1:52652 - R:localhost/127.0.0.1:2281]
2019-05-15 19:59:41,614 [myid:localhost:2281] - INFO  
[epollEventLoopGroup-2-1:ClientCnxn$SendThread@1394] - Session establishment 
complete on server localhost/127.0.0.1:2281, sessionid = 0x10002239ae10000, 
negotiated timeout = 30000
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2281(CONNECTED) 0] ls /
2019-05-15 20:00:01,616 [myid:localhost:2281] - WARN  
[main-SendThread(localhost:2281):ClientCnxn$SendThread@1190] - Client session 
timed out, have not heard from server in 20004ms for sessionid 0x10002239ae10000
2019-05-15 20:00:01,618 [myid:localhost:2281] - INFO  
[main-SendThread(localhost:2281):ClientCnxn$SendThread@1238] - Client session 
timed out, have not heard from server in 20004ms for sessionid 
0x10002239ae10000, closing socket connection and attempting reconnect
2019-05-15 20:00:01,630 [myid:localhost:2281] - INFO  
[epollEventLoopGroup-2-1:ClientCnxnSocketNetty$ZKClientHandler@473] - channel 
is disconnected: [id: 0x7bf11dfa, L:/127.0.0.1:52652 ! 
R:localhost/127.0.0.1:2281]
2019-05-15 20:00:01,631 [myid:localhost:2281] - INFO  
[epollEventLoopGroup-2-1:ClientCnxnSocketNetty@253] - channel is told closing
KeeperErrorCode = ConnectionLoss for /
[zk: localhost:2281(CONNECTED) 1]
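
As a reading aid for the log above, the following sketch computes the gap 
between the "Session establishment complete" line and the first timeout 
warning, and compares it with the client's read timeout. It assumes the read 
timeout is two thirds of the negotiated session timeout, which is my 
understanding of the client and is consistent with the "20004ms" reported, but 
it is not stated in the log itself. The class name LogGap is hypothetical.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, for illustration only.
final class LogGap {

    // Time-of-day format used in the log lines, e.g. "19:59:41,614".
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    /** Milliseconds elapsed between two log timestamps on the same day. */
    static long gapMillis(String from, String to) {
        return Duration.between(LocalTime.parse(from, FMT), LocalTime.parse(to, FMT)).toMillis();
    }

    /** Assumed client read timeout: two thirds of the negotiated session timeout. */
    static long readTimeoutMillis(long negotiatedSessionTimeoutMillis) {
        return negotiatedSessionTimeoutMillis * 2 / 3;
    }

    public static void main(String[] args) {
        System.out.println("gap=" + gapMillis("19:59:41,614", "20:00:01,616") + "ms"
                + " readTimeout=" + readTimeoutMillis(30000) + "ms");
    }
}
```

With the timestamps from the log, the gap is 20002 ms against an assumed read 
timeout of 20000 ms, which suggests the client received nothing at all from 
the server after the session was established.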

I see some suspicious messages in these logs that I will need to investigate 
further. As a general observation, though, it looks like the way we instantiate 
the Zookeeper server process for Pravega standalone is not valid in 
zookeeper-3.5.5-rc6 (to see how we create the Zookeeper server process, please 
look at the initialize() and start() methods in this 
file<https://github.com/pravega/pravega/blob/master/segmentstore/storage/impl/src/main/java/io/pravega/segmentstore/storage/impl/bookkeeper/ZooKeeperServiceRunner.java>).

In summary, if the error I'm getting is related to changes in the SSL 
configuration introduced in zookeeper-3.5.5, it would be great to get your 
feedback on whether I'm missing something. On the other hand, if the way we 
create the Zookeeper server process is not the recommended one, I'm also open 
to suggestions.

Thanks in advance and sorry for the long email,
Raúl.

PS: I have also tried running the Zookeeper server process with SSL while 
forcing it to use only the netty and boringSSL library versions used in either 
Pravega (netty*:4.1.30.Final, netty-tcnative-boringssl-static:2.0.17) or 
Zookeeper 3.5.5 (netty*:4.1.29.Final, netty-tcnative-boringssl-static:2.0.7), 
but none of these combinations made any difference in the behavior of the 
Zookeeper server process.

PS2: The JDK version I use is: openjdk version "1.8.0_212".
