RE: [VOTE] Apache ZooKeeper release 3.5.5 candidate 6

2019-05-17 Thread Gracia, Raul
+1

- Built ZooKeeper 3.5.5-rc6 from source successfully.
- Exercised ZooKeeper standalone + zkCli with and without SSL configured.
- A Pravega branch with ZooKeeper 3.5.5 (used by Curator 4.0.1) passed the build 
tests.
- All distributed tests of Pravega in a Kubernetes cluster with ZooKeeper 3.5.5 
(used by Curator 4.0.1) passed (running ZooKeeper 3.5.4-beta on the server 
side).

Thanks to all of you for your efforts in making ZooKeeper such a great project!

Thanks,
Raúl.

-Original Message-
From: Rakesh Radhakrishnan  
Sent: Friday, May 17, 2019 5:23 PM
To: DevZooKeeper
Subject: Re: [VOTE] Apache ZooKeeper release 3.5.5 candidate 6


+1

- built ZooKeeper jar from source
- tested using a 3-node cluster, ran smoke test scenarios, ran the command-line 
client, checked JMX beans; LGTM.
- verified sig/xsum; release notes look fine.
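For the "verified sig/xsum" step, one way to check a downloaded artifact's checksum from Java is sketched below. It assumes the release publishes a SHA-512 digest and that you pass only the hex digits (without the file name some checksum tools prepend); the class and method names are hypothetical. PGP signature verification against the KEYS file is a separate step and is not shown.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

public class ReleaseChecksumSketch {

    /** Streams the file through SHA-512 and returns the digest as lowercase hex. */
    static String sha512Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-512");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b)); // %x on a byte prints its unsigned value
        }
        return hex.toString();
    }

    /** Compares digests, ignoring case and whitespace in the expected value. */
    static boolean matches(String computedHex, String expectedHex) {
        String normalized = expectedHex.replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
        return computedHex.equals(normalized);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: ReleaseChecksumSketch <file> <expected-sha512-hex>");
            System.exit(2);
        }
        String actual = sha512Hex(Paths.get(args[0]));
        System.out.println(matches(actual, args[1]) ? "checksum OK" : "checksum MISMATCH: " + actual);
    }
}
```

Streaming the file keeps memory use flat even for large source tarballs.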

Thanks Andor and others who have worked on making this release happen!

Rakesh

On Fri, May 3, 2019 at 6:04 PM Andor Molnar  wrote:

> This is the first stable release of the 3.5 branch: 3.5.5. It resolves 117 
> issues, including Maven migration, Quorum TLS, TTL nodes and lots of 
> other performance and stability improvements.
>
> The full release notes are available at:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310801&version=12343268
>
> *** Please download, test and vote by May 10th 2019, 23:59 UTC+0. ***
>
> Source files:
> https://dist.apache.org/repos/dist/dev/zookeeper/zookeeper-3.5.5-rc6/
>
> Maven staging repos:
>
> https://repository.apache.org/content/groups/staging/org/apache/zookeeper/parent/3.5.5/
>
> https://repository.apache.org/content/groups/staging/org/apache/zookeeper/zookeeper-jute/3.5.5/
>
> https://repository.apache.org/content/groups/staging/org/apache/zookeeper/zookeeper/3.5.5/
>
> The release candidate tag in git to be voted upon: release-3.5.5-rc6
>
> ZooKeeper's KEYS file containing PGP keys we use to sign the release:
> http://www.apache.org/dist/zookeeper/KEYS
>
> Should we release this candidate?
>
>


RE: Question about security configuration (was: Re: [VOTE] Apache ZooKeeper release 3.5.5 candidate 6)

2019-05-16 Thread Gracia, Raul
Hi Andor,

You are totally correct, the server works after adding this auth provider. Thanks a 
lot!

I did a cursory comparison between ZooKeeper versions 3.5.4-beta and 3.5.5 and 
couldn't find a change that explains this difference in behavior. 
In any case, the Pravega build has passed with zookeeper-3.5.5, which is great 
news. 

I will run some more tests and cast my vote on the release candidate, if 
you think that would be useful.

Thanks a lot,
Raúl.

-Original Message-
From: Andor Molnar  
Sent: Thursday, May 16, 2019 6:43 PM
To: DevZooKeeper
Subject: Re: Question about security configuration (was: Re: [VOTE] Apache 
ZooKeeper release 3.5.5 candidate 6)


Hi Raul,

X509AuthenticationProvider is not registered in the embedded ZK. In server logs 
it says:
"[epollEventLoopGroup-4-1] ERROR
org.apache.zookeeper.server.NettyServerCnxnFactory - Auth provider not
found: x509"

It's done by QuorumPeerConfig.java:436 (configureSSLAuth()) when you run 
ZooKeeper in standalone mode, but your code doesn't use this configuration 
class at all.
If you add this:

System.setProperty("zookeeper.authProvider.x509",
"org.apache.zookeeper.server.auth.X509AuthenticationProvider");

to your initialize() method, client SSL works:

[nioEventLoopGroup-4-2] INFO
org.apache.zookeeper.server.NettyServerCnxnFactory - SSL handler added for
channel: [id: 0x698604a3, L:/127.0.0.1:2281 - R:/127.0.0.1:56750] 
[nioEventLoopGroup-4-2] INFO 
org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated Id 
'CN=server.pravegastack.io' for Scheme 'x509'
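A minimal sketch of this suggestion for an embedded test server: set, inside the test JVM, the same server-side properties that SERVER_JVMFLAGS passes to zkServer.sh, plus the zookeeper.authProvider.x509 property, before the ZooKeeperServer and NettyServerCnxnFactory are created. The property names and values come from this thread; the class and method names and the keystore arguments are hypothetical. We assume, from our reading of ZooKeeper's provider registry rather than from documentation, that auth providers are picked up from system properties starting with "zookeeper.authProvider." when the registry first initializes, which is why the sketch sets everything up front.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class EmbeddedServerSslProps {

    // Property prefix for extra auth providers (assumed from our reading of ZooKeeper).
    static final String AUTH_PROVIDER_PREFIX = "zookeeper.authProvider.";

    /** Server-side SSL settings mirroring the SERVER_JVMFLAGS used with zkServer.sh. */
    static Map<String, String> serverSslProperties(String keyStore, String keyStorePassword,
                                                   String trustStore, String trustStorePassword) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("zookeeper.serverCnxnFactory", "org.apache.zookeeper.server.NettyServerCnxnFactory");
        p.put("zookeeper.ssl.keyStore.location", keyStore);
        p.put("zookeeper.ssl.keyStore.password", keyStorePassword);
        p.put("zookeeper.ssl.trustStore.location", trustStore);
        p.put("zookeeper.ssl.trustStore.password", trustStorePassword);
        // Standalone zkServer.sh gets this via QuorumPeerConfig; an embedded server does not.
        p.put(AUTH_PROVIDER_PREFIX + "x509",
              "org.apache.zookeeper.server.auth.X509AuthenticationProvider");
        return p;
    }

    /** Copies the settings into JVM system properties; call before starting the server. */
    static void apply(Map<String, String> props) {
        props.forEach(System::setProperty);
    }

    /** Lists auth-provider properties currently set, keyed by the suffix after the prefix. */
    static Map<String, String> configuredAuthProviders(Properties sys) {
        Map<String, String> out = new TreeMap<>();
        for (String name : sys.stringPropertyNames()) {
            if (name.startsWith(AUTH_PROVIDER_PREFIX)) {
                out.put(name.substring(AUTH_PROVIDER_PREFIX.length()), sys.getProperty(name));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        if (args.length != 4) {
            System.err.println("usage: EmbeddedServerSslProps <keystore> <ks-pass> <truststore> <ts-pass>");
            System.exit(2);
        }
        apply(serverSslProperties(args[0], args[1], args[2], args[3]));
        // ...then create ZooKeeperServer + NettyServerCnxnFactory as the test code already does.
        System.out.println("auth providers: " + configuredAuthProviders(System.getProperties()));
    }
}
```

Listing the configured providers is a cheap check when the server log reports "Auth provider not found: x509".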

TBH I haven't diffed the code with 3.5.4-beta, so I'm not sure why it worked 
previously, and I don't have experience with embedded ZK, but I believe the 
QuorumPeerConfig class has to be involved somehow.

Regards,
Andor



On Thu, May 16, 2019 at 5:10 PM Gracia, Raul  wrote:


RE: Question about security configuration (was: Re: [VOTE] Apache ZooKeeper release 3.5.5 candidate 6)

2019-05-16 Thread Gracia, Raul
Thanks Andor for your quick reply. Let me answer your questions:

1) Yes, the problem is related to client/server communication using SSL, not 
to Quorum SSL (we use a single ZooKeeper process in our tests). I would 
like your feedback first to determine whether this is a problem in our config/code or 
a regression/change in the behavior of ZooKeeper 3.5.5. 

2) Yes, with the external ZooKeeper server running separately (e.g., 
zkServer.sh start) all the tests pass (SSL and non-SSL). With the ZooKeeper 
server process we instantiate in our tests, the non-SSL tests also pass, 
but the SSL ones do not.

3) Correct. To give more detail here, we instantiate the ZooKeeper 
server process using the ZooKeeperServer class together with 
NettyServerCnxnFactory.

4) I have done 2 types of tests: with ZooKeeper started as a separate service 
("zkServer.sh") and with the ZooKeeper server process we instantiate in 
Pravega standalone tests (namely, "zk-pravega-tests"):
- zkServer.sh: Works well with the regular ZooKeeper client (zkCli.sh), and the 
Pravega standalone tests pass against it with and without SSL.
- zk-pravega-tests: Without SSL, zkCli.sh can connect to that process and 
the non-SSL Pravega tests pass. With SSL configured, neither zkCli.sh nor 
the Pravega SSL tests are able to connect to the server (KeeperErrorCode = 
ConnectionLoss).

5) No, I haven't tested this scenario yet. I have tested a standalone ZooKeeper 
server (zkServer.sh) and a client (zkCli.sh) with SSL enabled on the same 
machine, and it works well. Apart from that, I have also performed distributed 
tests with a ZooKeeper server (3.5.4-beta) and Pravega (using Curator 4.0.1 + 
zookeeper-3.5.5) in Kubernetes, and they worked fine.

6) Yes, in fact I have done a little more than that: I have created a 
repository to investigate this issue in isolation: 
https://github.com/RaulGracia/zookeeper-test
Apart from providing logs (see the logs folder), in this repo I extracted the piece 
of code from the Pravega repository that is used to start the ZooKeeper 
standalone process, making it easier to configure the SSL properties when running 
the executable. I think this will make it easier for anyone to reproduce the 
problem I'm experiencing. Moreover, I have provided instructions in the README 
file on how to reproduce the issue.

Thanks a lot,
Raúl.


-Original Message-
From: Andor Molnar  
Sent: Thursday, May 16, 2019 11:18 AM
To: DevZooKeeper
Subject: Re: Question about security configuration (was: Re: [VOTE] Apache 
ZooKeeper release 3.5.5 candidate 6)


Hi Raul,

Thanks for the analysis. Let me ask a few questions, because I see some things 
that need to be clarified first.

1. This issue is only about server-client SSL scenario (not Quorum TLS), so 
it's possibly a regression in 3.5. Is that correct?
2. When running all Pravega tests against an external ZooKeeper standalone 
server, all tests passed, both SSL and non-SSL. Is that correct?
3. SSL tests are failing when ZooKeeper is running inside the test process?
4. You verified it by running ZooKeeper in standalone mode with SSL enabled, and 
according to the log snippet, your client connected successfully but later 
timed out. Is that right?
5. Have you verified the client-server SSL config with a real (3-node) cluster 
using zkCli.sh?
6. Would you please provide the server-side logs as well? Maybe they shed some 
light on why the client timed out.

Thanks,
Andor




On Thu, May 16, 2019 at 10:25 AM Gracia, Raul  wrote:


Question about security configuration (was: Re: [VOTE] Apache ZooKeeper release 3.5.5 candidate 6)

2019-05-16 Thread Gracia, Raul
Hi all,

My name is Raúl Gracia and I work in the Pravega project (open-source project 
for data stream storage): http://pravega.io/.

I'm currently working on a Pravega branch using "zookeeper-3.5.5-rc6", as we 
are interested in allowing Curator (4.0.1) to use a ZooKeeper version with the 
bugfix proposed in 
ZOOKEEPER-2184 (https://issues.apache.org/jira/browse/ZOOKEEPER-2184). The 
integration has been pretty smooth and 99% of tests are successful in a Pravega 
build, and the original issue that motivated the upgrade to zookeeper-3.5.5 
also seems to be solved.

However, there are failures in a specific type of Pravega test in 
which we instantiate a ZooKeeper server process (for testing Pravega in 
standalone mode). These failures only occur when running the standalone tests 
with SSL enabled, which includes configuring the ZooKeeper server process with 
SSL as well.

To narrow down the scope of the problem, I built zookeeper-3.5.5-rc6 ("mvn 
package") and executed the server (e.g., "./bin/zkServer.sh start") with the 
appropriate security configuration to enable SSL:
export SERVER_JVMFLAGS="
-Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
-Dzookeeper.ssl.keyStore.location=.../server.keystore.jks
-Dzookeeper.ssl.keyStore.password=password
-Dzookeeper.ssl.trustStore.location=.../client.truststore.jks
-Dzookeeper.ssl.trustStore.password=password"
(I also added secureClientPort=2281 to zoo.cfg, as described in the 
Administrator's Guide.)
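Because zkServer.sh appears to expand SERVER_JVMFLAGS unquoted, the shell would split it on whitespace, so a value containing a space (for example "-Dzookeeper.ssl.trustStore.password= password") would become two separate arguments. A hedged sketch that builds the flag string from a map and refuses such values; the class name and the /path/to/... placeholders are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class ServerJvmFlagsSketch {

    /** Renders properties as space-separated -Dkey=value flags, rejecting unsafe values. */
    static String toJvmFlags(Map<String, String> props) {
        StringJoiner flags = new StringJoiner(" ");
        for (Map.Entry<String, String> e : props.entrySet()) {
            String value = e.getValue();
            // An unquoted shell expansion would split a value containing whitespace
            // into separate arguments, so refuse empty or whitespace-bearing values.
            if (value.isEmpty() || value.chars().anyMatch(Character::isWhitespace)) {
                throw new IllegalArgumentException(
                    "value for " + e.getKey() + " is empty or contains whitespace: '" + value + "'");
            }
            flags.add("-D" + e.getKey() + "=" + value);
        }
        return flags.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("zookeeper.serverCnxnFactory", "org.apache.zookeeper.server.NettyServerCnxnFactory");
        // Hypothetical placeholder paths and passwords; substitute your own.
        props.put("zookeeper.ssl.keyStore.location", "/path/to/server.keystore.jks");
        props.put("zookeeper.ssl.keyStore.password", "password");
        props.put("zookeeper.ssl.trustStore.location", "/path/to/client.truststore.jks");
        props.put("zookeeper.ssl.trustStore.password", "password");
        System.out.println("export SERVER_JVMFLAGS=\"" + toJvmFlags(props) + "\"");
    }
}
```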

With the ZooKeeper server running separately, I executed all the Pravega 
standalone tests (with and without SSL) against that external ZooKeeper server 
(disabling the ZooKeeper server process that is created as part of the 
test workflow). Regarding configuration, in our tests the clients are 
configured with the security settings recommended in the Administrator's Guide:
System.setProperty("zookeeper.client.secure", "true");
System.setProperty("zookeeper.clientCnxnSocket", 
"org.apache.zookeeper.ClientCnxnSocketNetty");
System.setProperty("zookeeper.ssl.trustStore.location", 
".../client.truststore.jks");
System.setProperty("zookeeper.ssl.trustStore.password", "password");
System.setProperty("zookeeper.ssl.keyStore.location", 
".../server.keystore.jks");
System.setProperty("zookeeper.ssl.keyStore.password", "password");

In this case, all the Pravega standalone tests succeeded.
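System.setProperty stores values verbatim, so a password typed as "password " (with a trailing space) reaches ZooKeeper with the space included; we do not assume ZooKeeper trims it. The sketch below applies the client settings listed above and flags any zookeeper.* property whose value has leading or trailing whitespace. The property names and values are those from the message; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class ClientSslPropsSketch {

    /** Client-side SSL settings as listed in the message; paths and passwords are arguments. */
    static Map<String, String> clientSslProperties(String trustStore, String trustStorePassword,
                                                   String keyStore, String keyStorePassword) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("zookeeper.client.secure", "true");
        p.put("zookeeper.clientCnxnSocket", "org.apache.zookeeper.ClientCnxnSocketNetty");
        p.put("zookeeper.ssl.trustStore.location", trustStore);
        p.put("zookeeper.ssl.trustStore.password", trustStorePassword);
        p.put("zookeeper.ssl.keyStore.location", keyStore);
        p.put("zookeeper.ssl.keyStore.password", keyStorePassword);
        return p;
    }

    /** Sorted names of zookeeper.* properties whose values have leading/trailing whitespace. */
    static List<String> suspiciousValues(Properties props) {
        List<String> bad = new ArrayList<>();
        for (String name : props.stringPropertyNames()) {
            String value = props.getProperty(name);
            if (name.startsWith("zookeeper.") && !value.equals(value.trim())) {
                bad.add(name);
            }
        }
        Collections.sort(bad);
        return bad;
    }

    public static void main(String[] args) {
        if (args.length != 4) {
            System.err.println("usage: ClientSslPropsSketch <truststore> <ts-pass> <keystore> <ks-pass>");
            System.exit(2);
        }
        clientSslProperties(args[0], args[1], args[2], args[3]).forEach(System::setProperty);
        List<String> bad = suspiciousValues(System.getProperties());
        if (bad.isEmpty()) {
            System.out.println("no whitespace problems in zookeeper.* properties");
        } else {
            System.out.println("check these properties for stray whitespace: " + bad);
        }
    }
}
```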

This leaves the way we configure SSL in the ZooKeeper server process 
in Pravega standalone as the most plausible cause of the problem. This is 
intriguing, as the security settings used are the same in both scenarios 
(zkServer.sh / ZooKeeper server process started in the test code).

I have also confirmed this by running the ZooKeeper server process used in 
standalone mode with and without SSL and connecting to it via zkCli. Without SSL 
configured I can connect to it properly, whereas with SSL enabled I get the 
following error in the client:

2019-05-15 19:59:40,479 [myid:] - INFO  [main:ZooKeeper@868] - Initiating 
client connection, connectString=localhost:2281 sessionTimeout=3 
watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@621be5d1
2019-05-15 19:59:40,507 [myid:] - INFO  [main:X509Util@79] - Setting -D 
jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS 
renegotiation
2019-05-15 19:59:40,791 [myid:] - INFO  [main:ClientCnxnSocket@237] - 
jute.maxbuffer value is 4194304 Bytes
2019-05-15 19:59:40,798 [myid:] - INFO  [main:ClientCnxn@1653] - 
zookeeper.request.timeout value is 0. feature enabled=
2019-05-15 19:59:40,817 [myid:localhost:2281] - INFO  
[main-SendThread(localhost:2281):ClientCnxn$SendThread@1112] - Opening socket 
connection to server localhost/127.0.0.1:2281. Will not attempt to authenticate 
using SASL (unknown error)
Welcome to ZooKeeper!
JLine support is enabled
[zk: localhost:2281(CONNECTING) 0] 2019-05-15 19:59:41,168 
[myid:localhost:2281] - INFO  
[epollEventLoopGroup-2-1:ClientCnxnSocketNetty$ZKClientPipelineFactory@460] - 
SSL handler added for channel: [id: 0x7bf11dfa]
2019-05-15 19:59:41,176 [myid:localhost:2281] - INFO  
[epollEventLoopGroup-2-1:ClientCnxn$SendThread@959] - Socket connection 
established, initiating session, client: /127.0.0.1:52652, server: 
localhost/127.0.0.1:2281
2019-05-15 19:59:41,178 [myid:localhost:2281] - INFO  
[epollEventLoopGroup-2-1:ClientCnxnSocketNetty$1@188] - channel is connected: 
[id: 0x7bf11dfa, L:/127.0.0.1:52652 - R:localhost/127.0.0.1:2281]
2019-05-15 19:59:41,614 [myid:localhost:2281] - INFO  
[epollEventLoopGroup-2-1:ClientCnxn$SendThread@1394] - Session establishment 
complete on server localhost/127.0.0.1:2281, sessionid = 0x10002239ae1, 
negotiated timeout = 3
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2281(CONNECTED) 0] ls /
2019-05-15 20:00:01,616 [myid:localhost:2281] - WARN