[jira] [Resolved] (ZOOKEEPER-2650) Test Improvement by adding more QuorumPeer Auth related test cases

2017-01-03 Thread Raul Gutierrez Segales (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales resolved ZOOKEEPER-2650.
---
Resolution: Fixed

Merged:

https://git-wip-us.apache.org/repos/asf?p=zookeeper.git;a=commitdiff;h=8b75543a5375832333468345ca68d3ae6aa88f64

Thanks [~rakeshr]!

> Test Improvement by adding more QuorumPeer Auth related test cases
> --
>
> Key: ZOOKEEPER-2650
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2650
> Project: ZooKeeper
>  Issue Type: Test
>Reporter: Rakesh R
>Assignee: Rakesh R
> Fix For: 3.4.10
>
>
> This jira is to add more test cases for the ZOOKEEPER-1045 feature.
> Cases:
> 1) Ensemble with auth enabled Observer.
> 2) Connecting a non-auth Observer to an auth enabled quorum.
> 3) Quorum re-election with auth enabled servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (ZOOKEEPER-2652) Fix HierarchicalQuorumTest.java

2016-12-21 Thread Raul Gutierrez Segales (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales resolved ZOOKEEPER-2652.
---
Resolution: Fixed

Issue resolved by pull request 132
[https://github.com/apache/zookeeper/pull/132]

> Fix HierarchicalQuorumTest.java
> ---
>
> Key: ZOOKEEPER-2652
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2652
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.10
>Reporter: Rakesh R
>Assignee: Rakesh R
> Fix For: 3.4.10
>
>
> The commit of ZOOKEEPER-2479 introduced a compilation error (due to 
> diamond operator usage) in {{branch-3.4}}, which uses {{JDK 1.6}}.
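As an illustration (a generic sketch, not the actual offending code from ZOOKEEPER-2479): the diamond operator requires JDK 7+, so on {{branch-3.4}} the type arguments must be repeated on the right-hand side:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DiamondDemo {
    // JDK 7+ only: the diamond operator infers the type arguments.
    //   Map<String, List<Integer>> m = new HashMap<>();   // rejected by javac -source 1.6
    //
    // JDK 1.6-compatible form (what branch-3.4 requires): repeat the arguments.
    static Map<String, List<Integer>> build() {
        Map<String, List<Integer>> m = new HashMap<String, List<Integer>>();
        m.put("a", new ArrayList<Integer>());
        m.get("a").add(1);
        return m;
    }

    public static void main(String[] args) {
        System.out.println(build()); // prints {a=[1]}
    }
}
```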





[jira] [Commented] (ZOOKEEPER-2202) Cluster crashes when reconfig adds an unreachable observer

2016-12-07 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15730773#comment-15730773
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2202:
---

[~hanm], [~phunt], [~shralex]: this is still hurting us in production, could we 
get it reviewed for 3.5.3 pls? Thanks!

> Cluster crashes when reconfig adds an unreachable observer
> --
>
> Key: ZOOKEEPER-2202
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2202
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.0, 3.6.0
>Reporter: Raul Gutierrez Segales
>Assignee: Raul Gutierrez Segales
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2202.patch
>
>
> While adding support for reconfig() in Kazoo 
> (https://github.com/python-zk/kazoo/pull/333) I found that the cluster can be 
> crashed if you add an observer whose election port isn't reachable (i.e.: 
> packets for that destination are dropped, not rejected). This will raise a 
> SocketTimeoutException which will bring down the PrepRequestProcessor:
> {code}
> 2015-06-02 14:37:16,473 [myid:3] - WARN  [ProcessThread(sid:3 
> cport:-1)::QuorumCnxManager@384] - Cannot open channel to 100 at election 
> address /8.8.8.8:38703
> java.net.SocketTimeoutException: connect timed out
> at java.net.PlainSocketImpl.socketConnect(Native Method)
> at 
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
> at 
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
> at 
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
> at java.net.Socket.connect(Socket.java:589)
> at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:369)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.connectNewPeers(QuorumPeer.java:1288)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.setLastSeenQuorumVerifier(QuorumPeer.java:1315)
> at org.apache.zookeeper.server.quorum.Leader.propose(Leader.java:1056)
> at 
> org.apache.zookeeper.server.quorum.ProposalRequestProcessor.processRequest(ProposalRequestProcessor.java:78)
> at 
> org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:877)
> at 
> org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:143)
> {code}
> A simple repro can be obtained by using the code in the referenced pull 
> request above, using 8.8.8.8:3888 (for example) instead of a free (but 
> closed) port on the loopback. 
> I think that adding an Observer (or a Participant) that isn't currently 
> reachable is a valid use case (i.e.: you are provisioning the machine and 
> it's not currently needed), so perhaps we could handle this with lower 
> connect timeouts; not sure.
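One possible shape for that mitigation (a hypothetical helper, not ZooKeeper's actual QuorumCnxManager code): do the election-port connect with a short, bounded timeout and report failure instead of letting SocketTimeoutException escape into the request pipeline:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class BoundedConnect {
    /**
     * Tries to open a channel to a peer's election address.
     * Returns false instead of throwing on failure, so an unreachable
     * observer cannot take down the caller's processing thread.
     */
    public static boolean tryConnect(InetSocketAddress addr, int timeoutMs) {
        Socket sock = new Socket();
        try {
            sock.connect(addr, timeoutMs); // bounded: fails fast, never blocks indefinitely
            return true;
        } catch (IOException e) {
            return false; // timeout or refusal: skip and retry later rather than crash
        } finally {
            try { sock.close(); } catch (IOException ignore) { }
        }
    }

    public static void main(String[] args) {
        // A closed local port refuses quickly; the helper reports failure gracefully.
        System.out.println(tryConnect(new InetSocketAddress("127.0.0.1", 1), 500));
    }
}
```

With a helper like this, an unreachable observer would merely be skipped rather than bringing down PrepRequestProcessor.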





[jira] [Resolved] (ZOOKEEPER-2631) Make issue extraction in the git pull request script more robust

2016-11-12 Thread Raul Gutierrez Segales (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales resolved ZOOKEEPER-2631.
---
Resolution: Fixed

Merged:

https://git-wip-us.apache.org/repos/asf?p=zookeeper.git;a=commitdiff;h=881256ea97a19e51b1c6e9a114e6e61ad83bd4ec;hp=440e0923dd9e3be533a196fdd6ada960860ca7f6

Thanks [~fpj]!

> Make issue extraction in the git pull request script more robust
> 
>
> Key: ZOOKEEPER-2631
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2631
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.6.0
>Reporter: Flavio Junqueira
>Assignee: Flavio Junqueira
>
> The QA build is failing for some pull requests because the issue title isn't 
> following the expected format. The issue extraction right now is a bit 
> fragile, so this ticket is to make it more robust.
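For illustration, robust extraction usually boils down to matching the issue key anywhere in the title with a case-insensitive pattern instead of assuming a fixed {{ZOOKEEPER-NNNN:}} prefix. A sketch of the idea (shown in Java, not necessarily the script's actual language):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IssueKeyExtractor {
    // Match the key anywhere in the title, case-insensitively,
    // instead of assuming the title starts with "ZOOKEEPER-NNNN:".
    private static final Pattern KEY =
        Pattern.compile("(ZOOKEEPER-\\d+)", Pattern.CASE_INSENSITIVE);

    static String extract(String title) {
        Matcher m = KEY.matcher(title);
        return m.find() ? m.group(1).toUpperCase() : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("ZOOKEEPER-2631: make extraction robust")); // ZOOKEEPER-2631
        System.out.println(extract("[zookeeper-2631] fix script"));            // ZOOKEEPER-2631
        System.out.println(extract("no key here"));                            // null
    }
}
```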





[jira] [Resolved] (ZOOKEEPER-2624) Add test script for pull requests

2016-11-06 Thread Raul Gutierrez Segales (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales resolved ZOOKEEPER-2624.
---
Resolution: Fixed

> Add test script for pull requests
> -
>
> Key: ZOOKEEPER-2624
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2624
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: scripts
>Reporter: Flavio Junqueira
>Assignee: Flavio Junqueira
>
> We need a script similar to {{test-patch.sh}} to handle QA builds for pull 
> requests.





[jira] [Commented] (ZOOKEEPER-2624) Add test script for pull requests

2016-11-06 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15642378#comment-15642378
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2624:
---

Merged:

https://git-wip-us.apache.org/repos/asf?p=zookeeper.git;a=commitdiff;h=440e0923dd9e3be533a196fdd6ada960860ca7f6;hp=bcb07a09b06c91243ed244f04a71b8daf629e286

Thanks Flavio & Ben!

> Add test script for pull requests
> -
>
> Key: ZOOKEEPER-2624
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2624
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: scripts
>Reporter: Flavio Junqueira
>Assignee: Flavio Junqueira
>
> We need a script similar to {{test-patch.sh}} to handle QA builds for pull 
> requests.





[jira] [Commented] (ZOOKEEPER-2627) Remove ZRWSERVERFOUND from C client and replace handle_error with something more semantically explicit for r/w server reconnect.

2016-11-05 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15640591#comment-15640591
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2627:
---

Thanks for cleaning this up [~hanm]! Wonder what others think of the ABI 
breakage; other than that, lgtm.

> Remove ZRWSERVERFOUND from C client and replace handle_error with something 
> more semantically explicit for r/w server reconnect.
> 
>
> Key: ZOOKEEPER-2627
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2627
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.5.2
>Reporter: Michael Han
>Assignee: Michael Han
> Fix For: 3.5.3
>
> Attachments: ZOOKEEPER-2627.patch
>
>
> While working on ZOOKEEPER-2014, I noticed a discrepancy between Java and C 
> client regarding the error codes definition. There is a 
> {noformat}ZRWSERVERFOUND = -122{noformat} definition in C client which is not 
> present in Java client's KeeperException.Code definitions. 
> This discrepancy was introduced by ZOOKEEPER-827, where the C client logic 
> was simulating the Java client's logic when doing a read/write server search 
> while client is in read only mode. Once client finds a valid read/write 
> server, client will try to disconnect and reconnect with this read/write 
> server, as we always prefer r/w server in ro mode. The way Java client is 
> doing this disconnect/reconnect process is by throwing a 
> RWServerFoundException (instead of a KeeperException) to set the client in 
> disconnected state, then wait for client reconnect with r/w server address 
> set before throwing the exception. The C client did something similar, but 
> instead of having an explicit disconnect/clean-up routine, the client was 
> relying on handle_error to do the job, which is where ZRWSERVERFOUND was introduced.
> I propose we remove ZRWSERVERFOUND error code from C client and use an 
> explicit routine instead of handle_error when we do r/w server search in C 
> client for two reasons:
> * ZRWSERVERFOUND is not something ZK client users would need to know. It's a 
> pure implementation detail that's used to alter the connection state of the 
> client, and ZK client users have no desire nor need to handle such errors, as 
> R/W server scanning and connect is handled transparently by ZK client library.
> * To maintain consistency between Java and C client regarding error codes 
> definition. Without removing this from C client, we would need to replace 
> RWServerFoundException in Java client with a new KeeperException, and again 
> with the reason mentioned above, we don't need a KeeperException for this 
> because such implementation detail does not have to be exposed to end users 
> (unless we provided an alternative for users to opt out of automatic R/W server 
> switching when in read-only mode, which we don't).





[jira] [Commented] (ZOOKEEPER-2549) As NettyServerCnxn.sendResponse() allows all the exception to bubble up it can stop main ZK requests processing thread

2016-11-05 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15640557#comment-15640557
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2549:
---

[~yufeldman]: thanks for addressing the comments. Added a few more comments on 
GH. Thanks!

> As NettyServerCnxn.sendResponse() allows all the exception to bubble up it 
> can stop main ZK requests processing thread
> --
>
> Key: ZOOKEEPER-2549
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2549
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.1
>Reporter: Yuliya Feldman
>Assignee: Yuliya Feldman
> Attachments: ZOOKEEPER-2549-2.patch, ZOOKEEPER-2549-3.patch, 
> ZOOKEEPER-2549-3.patch, ZOOKEEPER-2549.patch, ZOOKEEPER-2549.patch, 
> zookeeper-2549-1.patch
>
>
> Because NettyServerCnxn.sendResponse() allows all exceptions to bubble up, it 
> can stop the main ZK request-processing thread and make the ZooKeeper server 
> look like it is hanging, while it simply cannot process any requests anymore.
> The idea is to catch all exceptions in NettyServerCnxn.sendResponse(), 
> convert them to IOException, and let that propagate up.
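The proposed wrap-and-rethrow can be sketched as follows (a simplified stand-in: the real NettyServerCnxn.sendResponse() writes to a Netty channel, and the Channel interface here is hypothetical):

```java
import java.io.IOException;

public class SendResponseSketch {
    /** Hypothetical stand-in for the Netty channel write path. */
    interface Channel {
        void write(byte[] payload) throws Exception;
    }

    private final Channel channel;

    public SendResponseSketch(Channel channel) {
        this.channel = channel;
    }

    /**
     * Catch everything thrown while writing and rethrow as IOException,
     * so only one, expected exception type reaches the request pipeline.
     */
    public void sendResponse(byte[] payload) throws IOException {
        try {
            channel.write(payload);
        } catch (Exception e) {
            throw new IOException(e); // original cause preserved for diagnostics
        }
    }
}
```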





[jira] [Commented] (ZOOKEEPER-2549) As NettyServerCnxn.sendResponse() allows all the exception to bubble up it can stop main ZK requests processing thread

2016-11-05 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15640362#comment-15640362
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2549:
---

we can ignore the findbugs warnings (see ZOOKEEPER-2628). 

> As NettyServerCnxn.sendResponse() allows all the exception to bubble up it 
> can stop main ZK requests processing thread
> --
>
> Key: ZOOKEEPER-2549
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2549
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.1
>Reporter: Yuliya Feldman
>Assignee: Yuliya Feldman
> Attachments: ZOOKEEPER-2549-2.patch, ZOOKEEPER-2549-3.patch, 
> ZOOKEEPER-2549-3.patch, ZOOKEEPER-2549.patch, ZOOKEEPER-2549.patch, 
> zookeeper-2549-1.patch
>
>
> Because NettyServerCnxn.sendResponse() allows all exceptions to bubble up, it 
> can stop the main ZK request-processing thread and make the ZooKeeper server 
> look like it is hanging, while it simply cannot process any requests anymore.
> The idea is to catch all exceptions in NettyServerCnxn.sendResponse(), 
> convert them to IOException, and let that propagate up.





[jira] [Commented] (ZOOKEEPER-2549) As NettyServerCnxn.sendResponse() allows all the exception to bubble up it can stop main ZK requests processing thread

2016-11-04 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15636811#comment-15636811
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2549:
---

[~yufeldman]: a few things:

In:

{code}
  } catch(Exception e) {
 LOG.warn("Unexpected exception. Destruction averted.", e);
+throw new IOException(e);
  }
 }
{code}

can you remove the LOG.warn()? I don't think it's relevant anymore, given it 
will be handled by the caller. 


Nit in:

{code}
+if ( serverCnxnClassName != null ) {
{code}

extra spaces around the condition.

Ditto for:

{code}
+if ( serverCnxnClassCtr != null ) {
{code}


Looks like you are doing extra work (allocations) here:

{code}
+NIOServerCnxn cnxn = new NIOServerCnxn(zkServer, sock, sk, this, 
selectorThread);
+
+if ( serverCnxnClassCtr != null ) {
+try {
+cnxn = serverCnxnClassCtr.newInstance(zkServer, sock, sk, 
this, selectorThread);
+} catch (InstantiationException e1) {
+LOG.debug("Can not instantiate class for " + 
serverCnxnClassCtr.getName() + ". Using NIOServerCnxn");
+} catch (IllegalAccessException e1) {
+LOG.debug("IllegalAccessException for " + 
serverCnxnClassCtr.getName() + ". Using NIOServerCnxn");
+} catch (InvocationTargetException e1) {
+LOG.debug("InvocationTargetException for " + 
serverCnxnClassCtr.getName() + ". Using NIOServerCnxn");
+} catch (Throwable t) {
+LOG.debug("Unknown Exception while dealing with: {} . Using 
NIOServerCnxn", serverCnxnClassCtr.getName());
+}
+}
{code}

Sounds like we should try this first (if possible):

{code}
cnxn = serverCnxnClassCtr.newInstance(zkServer, sock, sk, this, 
selectorThread);
{code}

And only fall back to this:

{code}
   cnxn = new NIOServerCnxn(zkServer, sock, sk, this, selectorThread);
{code}

if that failed.
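That suggested ordering could look like this sketch (DefaultCnxn and CustomCnxn are hypothetical stand-ins for NIOServerCnxn and the configured subclass; the real constructor takes zkServer, sock, sk, the factory, and selectorThread):

```java
import java.lang.reflect.Constructor;

public class CnxnFactorySketch {
    public static class DefaultCnxn { }                  // stands in for NIOServerCnxn
    public static class CustomCnxn extends DefaultCnxn { }

    /**
     * Try the configured constructor first; only if that fails (or none is
     * configured), fall back to the default implementation. This avoids the
     * wasted allocation of creating the default instance up front.
     */
    static DefaultCnxn createCnxn(Constructor<? extends DefaultCnxn> ctor) {
        if (ctor != null) {
            try {
                return ctor.newInstance();
            } catch (Exception e) {
                // reflective construction failed: fall through to the default
            }
        }
        return new DefaultCnxn();
    }

    public static void main(String[] args) throws Exception {
        Constructor<CustomCnxn> ctor = CustomCnxn.class.getConstructor();
        System.out.println(createCnxn(ctor).getClass().getSimpleName()); // CustomCnxn
        System.out.println(createCnxn(null).getClass().getSimpleName()); // DefaultCnxn
    }
}
```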


In:

{code}
+} catch (Exception e) {
+LOG.warn("Unexpected exception. Converting to IOException.", e);
+throw new IOException(e);
 }
{code}

I'd drop the warning, it's common enough...

Extra whitespaces:

{code}
+  if ( stats != null ) {
+int length = stats.getDataLength();
+  }
{code}

Other than that, I think it's looking good. Thanks [~yufeldman]!

> As NettyServerCnxn.sendResponse() allows all the exception to bubble up it 
> can stop main ZK requests processing thread
> --
>
> Key: ZOOKEEPER-2549
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2549
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.1
>Reporter: Yuliya Feldman
>Assignee: Yuliya Feldman
> Attachments: ZOOKEEPER-2549-2.patch, ZOOKEEPER-2549.patch, 
> ZOOKEEPER-2549.patch, zookeeper-2549-1.patch
>
>
> Because NettyServerCnxn.sendResponse() allows all exceptions to bubble up, it 
> can stop the main ZK request-processing thread and make the ZooKeeper server 
> look like it is hanging, while it simply cannot process any requests anymore.
> The idea is to catch all exceptions in NettyServerCnxn.sendResponse(), 
> convert them to IOException, and let that propagate up.





[jira] [Commented] (ZOOKEEPER-2549) As NettyServerCnxn.sendResponse() allows all the exception to bubble up it can stop main ZK requests processing thread

2016-11-04 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15636749#comment-15636749
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2549:
---

[~yufeldman]: oops, sorry for dropping the ball. reviewing it now. 

> As NettyServerCnxn.sendResponse() allows all the exception to bubble up it 
> can stop main ZK requests processing thread
> --
>
> Key: ZOOKEEPER-2549
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2549
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.1
>Reporter: Yuliya Feldman
>Assignee: Yuliya Feldman
> Attachments: ZOOKEEPER-2549-2.patch, ZOOKEEPER-2549.patch, 
> ZOOKEEPER-2549.patch, zookeeper-2549-1.patch
>
>
> Because NettyServerCnxn.sendResponse() allows all exceptions to bubble up, it 
> can stop the main ZK request-processing thread and make the ZooKeeper server 
> look like it is hanging, while it simply cannot process any requests anymore.
> The idea is to catch all exceptions in NettyServerCnxn.sendResponse(), 
> convert them to IOException, and let that propagate up.





[jira] [Commented] (ZOOKEEPER-761) Remove *synchronous* calls from the *single-threaded* C client API, since they are documented not to work

2016-10-22 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15598295#comment-15598295
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-761:
--

Commented on GH — overall lgtm as well. Wrt the lack of type safety in 
assuming {code}const void *data{code} will be {code}struct 
sync_completion*{code}, I think that's probably work for another ticket. All 
other async calls operate under the same assumption.

> Remove *synchronous* calls from the *single-threaded* C client API, since 
> they are documented not to work
> --
>
> Key: ZOOKEEPER-761
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-761
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: c client
>Affects Versions: 3.1.1, 3.2.2
> Environment: RHEL 4u8 (Linux).  The issue is not OS-specific though.
>Reporter: Jozef Hatala
>Assignee: Benjamin Reed
>Priority: Minor
> Fix For: 3.5.3, 3.6.0
>
> Attachments: fix-sync-apis-in-st-adaptor.patch, 
> fix-sync-apis-in-st-adaptor.v2.patch
>
>
> Since the synchronous calls are 
> [known|http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#Using+the+C+Client]
>  to be unimplemented in the single threaded version of the client library 
> libzookeeper_st.so, I believe that it would be helpful to users of the 
> library if that information was also obvious from the header file.
> Anecdotally more than one of us here made the mistake of starting by using 
> the synchronous calls with the single-threaded library, and we found 
> ourselves debugging it.  An early warning would have been greatly appreciated.
> 1. Could you please add warnings to the doxygen blocks of all synchronous 
> calls saying that they are not available in the single-threaded API.  This 
> cannot be safely done with {{#ifdef THREADED}}, obviously, because the same 
> header file is included whichever client library implementation one is 
> compiling for.
> 2. Could you please bracket the implementation of all synchronous calls in 
> zookeeper.c with {{#ifdef THREADED}} and {{#endif}}, so that those symbols 
> are not present in libzookeeper_st.so?





[jira] [Commented] (ZOOKEEPER-2619) Client library reconnecting breaks FIFO client order

2016-10-21 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15596822#comment-15596822
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2619:
---

[~ongardie]: thanks for reporting this. In the example given though:

{code}
  zk = new ZooKeeper(...)
  // The library establishes a TCP connection.
  zk.createAsync("/data-23857", "...", callback)
  // The library/kernel closes the TCP connection because it times out, and
  // the create of /data-23857 is doomed to fail with ConnectionLoss. Suppose
  // that it never reaches the server.
  // The library establishes a new TCP connection.
  zk.createSync("/pointer", "/data-23857")
  // The create of /pointer succeeds.
{code}

The callback should be called with ConnectionLossException before createSync() is 
sent to the server, because internally all requests — sync or async — are 
serialized through the same queue.

It sounds like the assumption here is that a createSync() should fail if a 
previous createAsync() call failed? That should be left to the application, no? 
Internally, all replies/events are delivered in order so ordering shouldn't be 
broken — no?
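In other words, if the application needs the dependency, it can gate the second create on the first one's outcome, e.g. (hypothetical AsyncClient interface standing in for the real ZooKeeper client API):

```java
import java.util.concurrent.CompletableFuture;

public class DependentWrites {
    /** Hypothetical stand-in for an async ZooKeeper-style client. */
    interface AsyncClient {
        CompletableFuture<Void> createAsync(String path, String data);
    }

    /**
     * Issue the pointer write only after the data write is known to have
     * succeeded; a ConnectionLoss on the first create fails the whole chain,
     * so /pointer can never exist without /data-23857.
     */
    static CompletableFuture<Void> createDataThenPointer(AsyncClient zk) {
        return zk.createAsync("/data-23857", "...")
                 .thenCompose(ignored -> zk.createAsync("/pointer", "/data-23857"));
    }
}
```

The trade-off is that the two creates are no longer pipelined, which is exactly the tension the ticket describes.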

> Client library reconnecting breaks FIFO client order
> 
>
> Key: ZOOKEEPER-2619
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2619
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Diego Ongaro
>
> According to the USENIX ATC 2010 
> [paper|https://www.usenix.org/conference/usenix-atc-10/zookeeper-wait-free-coordination-internet-scale-systems],
>  ZooKeeper provides "FIFO client order: all requests from a given client are 
> executed in the order that they were sent by the client." I believe 
> applications written using the Java client library are unable to rely on this 
> guarantee, and any current application that does so is broken. Other client 
> libraries are also likely to be affected.
> Consider this application, which is simplified from the algorithm described 
> on Page 4 (right column) of the paper:
> {code}
>   zk = new ZooKeeper(...)
>   zk.createAsync("/data-23857", "...", callback)
>   zk.createSync("/pointer", "/data-23857")
> {code}
> Assume an empty ZooKeeper database to begin with and no other writers. 
> Applying the above definition, if the ZooKeeper database contains /pointer, 
> it must also contain /data-23857.
> Now consider this series of unfortunate events:
> {code}
>   zk = new ZooKeeper(...)
>   // The library establishes a TCP connection.
>   zk.createAsync("/data-23857", "...", callback)
>   // The library/kernel closes the TCP connection because it times out, and
>   // the create of /data-23857 is doomed to fail with ConnectionLoss. Suppose
>   // that it never reaches the server.
>   // The library establishes a new TCP connection.
>   zk.createSync("/pointer", "/data-23857")
>   // The create of /pointer succeeds.
> {code}
> That's the problem: subsequent operations get assigned to the new connection 
> and succeed, while earlier operations fail.
> In general, I believe it's impossible to have a system with the following 
> three properties:
>  # FIFO client order for asynchronous operations,
>  # Failing operations when connections are lost, AND
>  # Transparently reconnecting when connections are lost.
> To argue this, consider an application that issues a series of pipelined 
> operations, then upon noticing a connection loss, issues a series of recovery 
> operations, repeating the recovery procedure as necessary. If a pipelined 
> operation fails, all subsequent operations in the pipeline must also fail. 
> Yet the client must also carry on eventually: the recovery operations cannot 
> be trivially failed forever. Unfortunately, the client library does not know 
> where the pipelined operations end and the recovery operations begin. At the 
> time of a connection loss, subsequent pipelined operations may or may not be 
> queued in the library; others might be upcoming in the application thread. If 
> the library re-establishes a connection too early, it will send pipelined 
> operations out of FIFO client order.
> I considered a possible workaround of having the client diligently check its 
> callbacks and watchers for connection loss events, and do its best to stop 
> the subsequent pipelined operations at the first sign of a connection loss. 
> In addition to being a large burden for the application, this does not solve 
> the problem all the time. In particular, if the callback thread is delayed 
> significantly (as can happen due to excessive computation or scheduling 
> hiccups), the application may not learn about the connection loss event until 
> after the connection has been re-established and after dependent pipelined 
> operations have already been transmitted over the new connection.
> I suggest the following API 

[jira] [Resolved] (ZOOKEEPER-2611) zoo_remove_watchers - can remove the wrong watch

2016-10-09 Thread Raul Gutierrez Segales (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales resolved ZOOKEEPER-2611.
---
Resolution: Fixed

> zoo_remove_watchers - can remove the wrong watch 
> -
>
> Key: ZOOKEEPER-2611
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2611
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Reporter: Eyal leshem
>Priority: Critical
> Attachments: ZOOKEEPER-2611.patch
>
>
> The actual problem is in the function "removeWatcherFromList": when we 
> check whether we need to delete a watch, we compare the WatcherCtx 
> against the node before the one we want to delete.





[jira] [Commented] (ZOOKEEPER-2611) zoo_remove_watchers - can remove the wrong watch

2016-10-09 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15560597#comment-15560597
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2611:
---

Merged for master & 3.5:

https://git-wip-us.apache.org/repos/asf?p=zookeeper.git;a=commitdiff;h=f78061aafb19b102c37cb6d744ec6258d5f5b66e
https://git-wip-us.apache.org/repos/asf?p=zookeeper.git;a=commitdiff;h=d53501b090906a89bfa30cce582d18a06123e49c

Thanks [~Eyal.leshem]!

> zoo_remove_watchers - can remove the wrong watch 
> -
>
> Key: ZOOKEEPER-2611
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2611
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Reporter: Eyal leshem
>Priority: Critical
> Attachments: ZOOKEEPER-2611.patch
>
>
> The actual problem is in the function "removeWatcherFromList": when we 
> check whether we need to delete a watch, we compare the WatcherCtx 
> against the node before the one we want to delete.





[jira] [Commented] (ZOOKEEPER-2611) zoo_remove_watchers - can remove the wrong watch

2016-10-09 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15560509#comment-15560509
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2611:
---

[~Eyal.leshem]: see my comment above, the ctxt parameter is in the wrong place. 
The test logs are here:

https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3475//console

> zoo_remove_watchers - can remove the wrong watch 
> -
>
> Key: ZOOKEEPER-2611
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2611
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Reporter: Eyal leshem
>Priority: Critical
> Attachments: ZOOKEEPER-2611.patch
>
>
> The actual problem is in the function "removeWatcherFromList": when we 
> check whether we need to delete a watch, we compare the WatcherCtx 
> against the node before the one we want to delete.





[jira] [Commented] (ZOOKEEPER-2611) zoo_remove_watchers - can remove the wrong watch

2016-10-09 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15560506#comment-15560506
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2611:
---

The context parameter is in the wrong place (the last parameter is struct Stat 
\*stat, whereas the 4th param is void\* watcherCtx):

{code}
int zoo_wget(zhandle_t *zh, const char *path,
watcher_fn watcher, void* watcherCtx,
char *buffer, int* buffer_len, struct Stat *stat)
{code}

> zoo_remove_watchers - can remove the wrong watch 
> -
>
> Key: ZOOKEEPER-2611
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2611
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Reporter: Eyal leshem
>Priority: Critical
> Attachments: ZOOKEEPER-2611.patch
>
>
> The actual problem is in the function "removeWatcherFromList": when we 
> check whether we need to delete a watch, we compare the WatcherCtx 
> against the node before the one we want to delete.





[jira] [Commented] (ZOOKEEPER-2608) Create CLI option for TTL ephemerals

2016-10-09 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15560489#comment-15560489
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2608:
---

quick pass and some nits:

{code}
+if ( hasT && hasE ) {
+throw new MalformedCommandException("TTLs cannot be used with 
Ephemeral znodes");
+}
+if ( hasT && hasC ) {
+throw new MalformedCommandException("TTLs cannot be used with 
Container znodes");
+}
+
{code}

extra whitespaces around the conditions (doesn't match that file's coding 
style).

{code}
+if ( hasT ) {
+try {
+EphemeralType.ttlToEphemeralOwner(ttl);
+} catch (IllegalArgumentException e) {
+throw new MalformedCommandException(e.getMessage());
+}
{code}

ditto.
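Aside from the whitespace, the checks themselves boil down to this shape (a simplified sketch; MalformedCommandException here is a hypothetical stand-in for the CLI's exception class):

```java
public class CreateFlagsCheck {
    /** Hypothetical stand-in for the CLI's MalformedCommandException. */
    public static class MalformedCommandException extends Exception {
        public MalformedCommandException(String msg) { super(msg); }
    }

    /** TTLs are only valid on plain persistent znodes: reject -t with -e or -c. */
    static void validate(boolean hasTtl, boolean ephemeral, boolean container)
            throws MalformedCommandException {
        if (hasTtl && ephemeral) {
            throw new MalformedCommandException("TTLs cannot be used with Ephemeral znodes");
        }
        if (hasTtl && container) {
            throw new MalformedCommandException("TTLs cannot be used with Container znodes");
        }
    }
}
```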

> Create CLI option for TTL ephemerals
> 
>
> Key: ZOOKEEPER-2608
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2608
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: c client, java client, jute, server
>Reporter: Camille Fournier
>Assignee: Jordan Zimmerman
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2608-2.patch, ZOOKEEPER-2608.patch
>
>
> Need to update CreateCommand to have the TTL node option





[jira] [Commented] (ZOOKEEPER-2611) zoo_remove_watchers - can remove the wrong watch

2016-10-09 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15560222#comment-15560222
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2611:
---

[~Eyal.leshem] thanks for the patch! LGTM, do you think you could also add a 
test case with > 1 watch?

> zoo_remove_watchers - can remove the wrong watch 
> -
>
> Key: ZOOKEEPER-2611
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2611
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Reporter: Eyal leshem
>Priority: Critical
> Attachments: ZOOKEEPER-2611.patch
>
>
> The actual problem is in the function "removeWatcherFromList": when we 
> check whether we need to delete the watch, we compare the WatcherCtx to the 
> node one before the one we want to delete.





[jira] [Commented] (ZOOKEEPER-2576) After svn to git migration ZooKeeper Precommit jenkins job is failing.

2016-09-12 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484676#comment-15484676
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2576:
---

(waiting for reply before closing)

> After svn to git migration ZooKeeper Precommit jenkins job is failing.
> --
>
> Key: ZOOKEEPER-2576
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2576
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: build
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
>Priority: Blocker
> Attachments: ZOOKEEPER-2576.patch
>
>
> After moving from svn to git the precommit job is failing. I've disabled it 
> temporarily.
> https://builds.apache.org/view/S-Z/view/ZooKeeper/job/PreCommit-ZOOKEEPER-Build/





[jira] [Commented] (ZOOKEEPER-2576) After svn to git migration ZooKeeper Precommit jenkins job is failing.

2016-09-12 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484678#comment-15484678
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2576:
---

Oops -- committed with the wrong credentials. Updating my git config now. Sorry 
about that. 

> After svn to git migration ZooKeeper Precommit jenkins job is failing.
> --
>
> Key: ZOOKEEPER-2576
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2576
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: build
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
>Priority: Blocker
> Attachments: ZOOKEEPER-2576.patch
>
>
> After moving from svn to git the precommit job is failing. I've disabled it 
> temporarily.
> https://builds.apache.org/view/S-Z/view/ZooKeeper/job/PreCommit-ZOOKEEPER-Build/





[jira] [Commented] (ZOOKEEPER-2576) After svn to git migration ZooKeeper Precommit jenkins job is failing.

2016-09-12 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484671#comment-15484671
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2576:
---

Merged:

https://git-wip-us.apache.org/repos/asf?p=zookeeper.git;a=commitdiff;h=8c4082647f89b0a92fa00a2af8de84b3c7314e23

Do we need this in any other branch besides trunk, or do we only kick things 
off from trunk?

Thanks [~phunt]!

> After svn to git migration ZooKeeper Precommit jenkins job is failing.
> --
>
> Key: ZOOKEEPER-2576
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2576
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: build
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
>Priority: Blocker
> Attachments: ZOOKEEPER-2576.patch
>
>
> After moving from svn to git the precommit job is failing. I've disabled it 
> temporarily.
> https://builds.apache.org/view/S-Z/view/ZooKeeper/job/PreCommit-ZOOKEEPER-Build/





[jira] [Updated] (ZOOKEEPER-1927) zkServer.sh fails to read dataDir (and others) from zoo.cfg on Solaris 10 (grep issue, manifests as FAILED TO WRITE PID).

2016-09-07 Thread Raul Gutierrez Segales (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-1927:
--
Fix Version/s: (was: 3.5.2)
   3.5.3

> zkServer.sh fails to read dataDir (and others) from zoo.cfg on Solaris 10 
> (grep issue, manifests as FAILED TO WRITE PID).  
> ---
>
> Key: ZOOKEEPER-1927
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1927
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 3.4.6
> Environment: Solaris 5.10 
>Reporter: Ed Schmed
>Assignee: Chris Nauroth
> Fix For: 3.4.7, 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-1927-branch-3.4.002.patch, 
> ZOOKEEPER-1927.001.patch, ZOOKEEPER-1927.002.patch
>
>
> Fails to write PID file with a permissions error, because the startup script 
> fails to read the dataDir variable from zoo.cfg, and then tries to use the 
> drive root ( / ) as the data dir.
> Tracked the problem down to line 84 of zkServer.sh:
> ZOO_DATADIR="$(grep "^[[:space:]]*dataDir" "$ZOOCFG" | sed -e 's/.*=//')"
> If I run just that line and point it right at the config file, ZOO_DATADIR is 
> empty.
> If I remove [[:space:]]* from the grep:
> ZOO_DATADIR="$(grep "^dataDir" "$ZOOCFG" | sed -e 's/.*=//')"
> Then it works fine. (If I also make the same change on line 164 and 169)
> My regex skills are pretty bad, so I'm afraid to comment on why [[:space:]]* 
> needs to be in there?
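
For illustration, the difference can be reproduced against a throwaway config (a sketch: on Solaris 10 the stock /usr/bin/grep does not support POSIX character classes such as [[:space:]], which is why the first extraction comes back empty there; the class is in the pattern to tolerate leading whitespace before dataDir):

```shell
# Minimal repro of the zkServer.sh extraction against a sample config.
ZOOCFG=$(mktemp)
printf 'tickTime=2000\ndataDir=/var/lib/zookeeper\n' > "$ZOOCFG"

# Line 84 of zkServer.sh: empty on Solaris 10, whose grep lacks [[:space:]].
ZOO_DATADIR="$(grep "^[[:space:]]*dataDir" "$ZOOCFG" | sed -e 's/.*=//')"
echo "with class:    $ZOO_DATADIR"

# The reporter's workaround: portable, but it no longer tolerates leading
# whitespace before dataDir (which is what [[:space:]]* was there for).
ZOO_DATADIR="$(grep "^dataDir" "$ZOOCFG" | sed -e 's/.*=//')"
echo "without class: $ZOO_DATADIR"

rm -f "$ZOOCFG"
```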





[jira] [Commented] (ZOOKEEPER-1927) zkServer.sh fails to read dataDir (and others) from zoo.cfg on Solaris 10 (grep issue, manifests as FAILED TO WRITE PID).

2016-09-07 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472652#comment-15472652
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1927:
---

Oops -- forgot to merge for 3.5 (literally a year ago!):

https://github.com/apache/zookeeper/commit/ac26b96b61e116937239a15fb4dbcc4f17a4f818

(thanks [~phunt] for the heads-up) 


> zkServer.sh fails to read dataDir (and others) from zoo.cfg on Solaris 10 
> (grep issue, manifests as FAILED TO WRITE PID).  
> ---
>
> Key: ZOOKEEPER-1927
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1927
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 3.4.6
> Environment: Solaris 5.10 
>Reporter: Ed Schmed
>Assignee: Chris Nauroth
> Fix For: 3.4.7, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-1927-branch-3.4.002.patch, 
> ZOOKEEPER-1927.001.patch, ZOOKEEPER-1927.002.patch
>
>
> Fails to write PID file with a permissions error, because the startup script 
> fails to read the dataDir variable from zoo.cfg, and then tries to use the 
> drive root ( / ) as the data dir.
> Tracked the problem down to line 84 of zkServer.sh:
> ZOO_DATADIR="$(grep "^[[:space:]]*dataDir" "$ZOOCFG" | sed -e 's/.*=//')"
> If I run just that line and point it right at the config file, ZOO_DATADIR is 
> empty.
> If I remove [[:space:]]* from the grep:
> ZOO_DATADIR="$(grep "^dataDir" "$ZOOCFG" | sed -e 's/.*=//')"
> Then it works fine. (If I also make the same change on line 164 and 169)
> My regex skills are pretty bad, so I'm afraid to comment on why [[:space:]]* 
> needs to be in there?





[jira] [Commented] (ZOOKEEPER-2355) Ephemeral node is never deleted if follower fails while reading the proposal packet

2016-08-09 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414746#comment-15414746
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2355:
---

One small nit, these two methods:

{code}
+private QuorumPeer getLeader(MainThread[] mt) {
+for (int i = mt.length - 1; i >= 0; i--) {
+QuorumPeer quorumPeer = mt[i].getQuorumPeer();
+if (null != quorumPeer && ServerState.LEADING == quorumPeer.getPeerState()) {
+return quorumPeer;
+}
+}
+return null;
+}
+
+private QuorumPeer getFollower(MainThread[] mt) {
+for (int i = mt.length - 1; i >= 0; i--) {
+QuorumPeer quorumPeer = mt[i].getQuorumPeer();
+if (null != quorumPeer && ServerState.FOLLOWING == quorumPeer.getPeerState()) {
+return quorumPeer;
+}
+}
+return null;
+}
{code}

Can probably be reduced to one more general method:

{code}
+private QuorumPeer getByServerState(MainThread[] mt, ServerState state) {
+for (int i = mt.length - 1; i >= 0; i--) {
+QuorumPeer quorumPeer = mt[i].getQuorumPeer();
+if (null != quorumPeer && state == quorumPeer.getPeerState()) {
+return quorumPeer;
+}
+}
+return null;
+}
{code}
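
As a self-contained illustration of the consolidation (the stub types below stand in for ZooKeeper's MainThread/QuorumPeer; the real patch would operate on those):

```java
// Demonstrates the suggested consolidation: one state-parameterized lookup
// replaces the near-identical getLeader()/getFollower() helpers.
public class PeerLookup {
    enum ServerState { LEADING, FOLLOWING, LOOKING }

    // Stub standing in for QuorumPeer in the real test.
    static class Peer {
        private final ServerState state;
        Peer(ServerState state) { this.state = state; }
        ServerState getPeerState() { return state; }
    }

    static Peer getByServerState(Peer[] peers, ServerState state) {
        for (int i = peers.length - 1; i >= 0; i--) {
            Peer peer = peers[i];
            if (null != peer && state == peer.getPeerState()) {
                return peer;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Peer[] quorum = {
            new Peer(ServerState.FOLLOWING),
            new Peer(ServerState.LEADING),
            new Peer(ServerState.FOLLOWING),
        };
        // The two former helpers become two call sites:
        System.out.println(getByServerState(quorum, ServerState.LEADING).getPeerState());
        System.out.println(getByServerState(quorum, ServerState.LOOKING)); // null
    }
}
```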



> Ephemeral node is never deleted if follower fails while reading the proposal 
> packet
> ---
>
> Key: ZOOKEEPER-2355
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2355
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum, server
>Reporter: Arshad Mohammad
>Assignee: Martin Kuchta
>Priority: Critical
> Fix For: 3.4.9
>
> Attachments: ZOOKEEPER-2355-01.patch, ZOOKEEPER-2355-02.patch, 
> ZOOKEEPER-2355-03.patch
>
>
> ZooKeeper ephemeral node is never deleted if follower fail while reading the 
> proposal packet
> The scenario is as follows:
> # Configure three node ZooKeeper cluster, lets say nodes are A, B and C, 
> start all, assume A is leader, B and C are follower
> # Connect to any of the server and create ephemeral node /e1
> # Close the session, ephemeral node /e1 will go for deletion
> # While receiving delete proposal make Follower B to fail with 
> {{SocketTimeoutException}}. This we need to do to reproduce the scenario 
> otherwise in production environment it happens because of network fault.
> # Remove the fault, just check that faulted Follower is now connected with 
> quorum
> # Connect to any of the server, create the same ephemeral node /e1, created 
> is success.
> # Close the session,  ephemeral node /e1 will go for deletion
> # {color:red}/e1 is not deleted from the faulted Follower B, It should have 
> been deleted as it was again created with another session{color}
> # {color:green}/e1 is deleted from Leader A and other Follower C{color}





[jira] [Commented] (ZOOKEEPER-2508) Many ZooKeeper tests are flaky because they proceed with zk operation without connecting to ZooKeeper server.

2016-08-09 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414733#comment-15414733
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2508:
---

Sweet patch, thanks [~arshad.mohammad]!

A few nits:

In:

{code}
+public class PurgeTxnTest extends ZKTestCase{
{code}

a space is missing before the {

In:

{code}
+zk[i] =  ClientBase.createZKClient("127.0.0.1:" + clientPorts[i]);
{code}

extra whitespace after the = 

Ditto in:

{code}
+ZooKeeper zk =  ClientBase.createZKClient(HOSTPORT);
{code}

Missing whitespace before the { in:

{code}
+public class LoadFromLogTest extends ZKTestCase{
{code}

Ditto for:

{code}
+public class SledgeHammer extends Thread{
{code}

Extra whitespace in:

{code}
+zk =  ClientBase.createZKClient("127.0.0.1:" + port2, 15000);
+
{code}

Other than that, it looks great. +1.


> Many ZooKeeper tests are flaky because they proceed with zk operation without 
> connecting to ZooKeeper server.
> -
>
> Key: ZOOKEEPER-2508
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2508
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Reporter: Arshad Mohammad
>Assignee: Arshad Mohammad
> Fix For: 3.5.3
>
> Attachments: ZOOKEEPER-2508-01.patch, ZOOKEEPER-2508-02.patch, 
> ZOOKEEPER-2508-03.patch, ZOOKEEPER-2508-04.patch, ZOOKEEPER-2508-05.patch
>
>
> Many ZooKeeper tests are flaky because they proceed with zk operation without 
> connecting to ZooKeeper server.
> Recently in our build 
> {{org.apache.zookeeper.server.ZooKeeperServerMainTest.testStandalone()}} 
> failed.
> After analyzing it, we found that it failed because it does not wait for the 
> ZooKeeper client to connect to the server. Normally the ZooKeeper client 
> connects immediately, but if it does not, the test case is bound to fail.
> Not only ZooKeeperServerMainTest but many other classes have such test 
> cases. This jira addresses all those test cases.





[jira] [Commented] (ZOOKEEPER-2509) Secure mode leaks memory

2016-08-09 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414713#comment-15414713
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2509:
---

[~tdunning]: thanks for reporting! Could we not reuse the test infra that was 
added with the patch that added secure mode?

> Secure mode leaks memory
> 
>
> Key: ZOOKEEPER-2509
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2509
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.1
>Reporter: Ted Dunning
>
> The Netty connection handling logic fails to clean up watches on connection 
> close. This causes memory to leak.
> I will have a repro script available soon and a fix. I am not sure how to 
> build a unit test since we would need to build an entire server and generate 
> keys and such. Advice on that appreciated.





[jira] [Commented] (ZOOKEEPER-2383) Startup race in ZooKeeperServer

2016-08-04 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15408826#comment-15408826
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2383:
---

[~rakeshr]: looks good to me, one nit though. In 
testClientConnectionRequestDuringStartup:

{code}
+CountdownWatcher watcher = new CountdownWatcher();
+new ZooKeeper(HOSTPORT, ClientBase.CONNECTION_TIMEOUT, watcher);
{code}

let's assign the ZooKeeper object to a var and explicitly clean it up when done. 
Other than that, +1.

Happy to merge it after that. Thanks [~rakeshr]!

> Startup race in ZooKeeperServer
> ---
>
> Key: ZOOKEEPER-2383
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2383
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: jmx, server
>Affects Versions: 3.4.8
>Reporter: Steve Rowe
>Assignee: Rakesh R
>Priority: Blocker
> Fix For: 3.4.9, 3.5.3, 3.6.0
>
> Attachments: TestZkStandaloneJMXRegistrationRaceConcurrent.java, 
> ZOOKEEPER-2383-br-3-4.patch, ZOOKEEPER-2383.patch, 
> release-3.4.8-extra-logging.patch, zk-3.4.8-MBeanRegistry.log, 
> zk-3.4.8-NPE.log
>
>
> In attempting to upgrade Solr's ZooKeeper dependency from 3.4.6 to 3.4.8 
> (SOLR-8724) I ran into test failures where attempts to create a node in a 
> newly started standalone ZooKeeperServer were failing because of an assertion 
> in MBeanRegistry.
> ZooKeeperServer.startup() first sets up its request processor chain then 
> registers itself in JMX, but if a connection comes in before the server's JMX 
> registration happens, registration of the connection will fail because it 
> trips the assertion that (effectively) its parent (the server) has already 
> registered itself.
> {code:java|title=ZooKeeperServer.java}
> public synchronized void startup() {
> if (sessionTracker == null) {
> createSessionTracker();
> }
> startSessionTracker();
> setupRequestProcessors();
> registerJMX();
> state = State.RUNNING;
> notifyAll();
> }
> {code}
> {code:java|title=MBeanRegistry.java}
> public void register(ZKMBeanInfo bean, ZKMBeanInfo parent)
> throws JMException
> {
> assert bean != null;
> String path = null;
> if (parent != null) {
> path = mapBean2Path.get(parent);
> assert path != null;
> }
> {code}
> This problem appears to be new with ZK 3.4.8 - AFAIK Solr never had this 
> issue with ZK 3.4.6. 
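
The failure mode can be modeled in a few lines: a map-backed registry where a child registration arriving before its parent's path exists trips the null check, just as the assert in MBeanRegistry does (a simplified sketch, not the actual ZooKeeper classes):

```java
import java.util.HashMap;
import java.util.Map;

// Miniature model of the startup race: registering a connection bean before
// the server bean mirrors the "assert path != null" failure in MBeanRegistry.
public class RegistryRace {
    private final Map<String, String> bean2Path = new HashMap<>();

    void register(String bean, String parent) {
        String path = "";
        if (parent != null) {
            path = bean2Path.get(parent);
            if (path == null) {
                // MBeanRegistry uses an assert here; modeled as an exception.
                throw new IllegalStateException("parent not registered: " + parent);
            }
        }
        bean2Path.put(bean, path + "/" + bean);
    }

    public static void main(String[] args) {
        RegistryRace registry = new RegistryRace();
        try {
            // A connection arrives before startup() reached registerJMX().
            registry.register("connection-1", "server");
        } catch (IllegalStateException e) {
            System.out.println("race hit: " + e.getMessage());
        }
        // The intended order succeeds.
        registry.register("server", null);
        registry.register("connection-1", "server");
        System.out.println("ok");
    }
}
```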





[jira] [Resolved] (ZOOKEEPER-2498) Potential resource leak in C client when processing unexpected / out of order response

2016-08-03 Thread Raul Gutierrez Segales (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales resolved ZOOKEEPER-2498.
---
Resolution: Fixed

> Potential resource leak in C client when processing unexpected / out of order 
> response
> --
>
> Key: ZOOKEEPER-2498
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2498
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.4.8, 3.5.2
>Reporter: Michael Han
>Assignee: Michael Han
> Fix For: 3.4.9, 3.5.3
>
> Attachments: ZOOKEEPER-2498.patch
>
>
> In the C client, we use reference counting to decide whether a given zh handle 
> can be destroyed or not. This requires that we always call api_prolog (which 
> increments the counter) and api_epilog (which decrements the counter) in 
> pairs, for a given call context. 
> In zookeeper_process, there is a place where the code returns without 
> invoking api_epilog, which would lead to a potential zh resource leak.





[jira] [Commented] (ZOOKEEPER-2498) Potential resource leak in C client when processing unexpected / out of order response

2016-08-03 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406352#comment-15406352
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2498:
---

[~hanm]: thanks! Merged:

https://github.com/apache/zookeeper/commit/21b0152ed0e0841b961ff3bca0b0d3c8567ce1a4
https://github.com/apache/zookeeper/commit/9518d9cb69cf1aaa05194795df3bfd6950b64a04
https://github.com/apache/zookeeper/commit/c00937a34b124fb705febab3422827be4891eb39


> Potential resource leak in C client when processing unexpected / out of order 
> response
> --
>
> Key: ZOOKEEPER-2498
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2498
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.4.8, 3.5.2
>Reporter: Michael Han
>Assignee: Michael Han
> Fix For: 3.4.9, 3.5.3
>
> Attachments: ZOOKEEPER-2498.patch
>
>
> In the C client, we use reference counting to decide whether a given zh handle 
> can be destroyed or not. This requires that we always call api_prolog (which 
> increments the counter) and api_epilog (which decrements the counter) in 
> pairs, for a given call context. 
> In zookeeper_process, there is a place where the code returns without 
> invoking api_epilog, which would lead to a potential zh resource leak.
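
The pairing invariant can be sketched in Java terms (the C client enforces it manually; here try/finally plays the role of guaranteeing that every api_prolog has a matching api_epilog on every exit path):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Models the api_prolog/api_epilog pairing: every path that increments the
// handle's reference count must decrement it again, or the handle can never
// be destroyed (the leak described above).
public class HandleRefCount {
    private final AtomicInteger refs = new AtomicInteger();

    void prolog() { refs.incrementAndGet(); }  // like api_prolog
    void epilog() { refs.decrementAndGet(); }  // like api_epilog

    // Correct shape: epilog in finally covers early returns too, which is
    // exactly the kind of path zookeeper_process missed.
    boolean process(boolean unexpectedResponse) {
        prolog();
        try {
            if (unexpectedResponse) {
                return false; // early return; finally still runs epilog
            }
            return true;
        } finally {
            epilog();
        }
    }

    int outstanding() { return refs.get(); }

    public static void main(String[] args) {
        HandleRefCount handle = new HandleRefCount();
        handle.process(true);   // unexpected / out-of-order response path
        handle.process(false);  // normal path
        System.out.println(handle.outstanding()); // 0: nothing leaked
    }
}
```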





[jira] [Commented] (ZOOKEEPER-2080) ReconfigRecoveryTest fails intermittently

2016-08-02 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403431#comment-15403431
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2080:
---

[~hanm]: thanks for the updated patch! I think the failure with 
org.apache.zookeeper.test.QuorumTest.testMultipleWatcherObjs is unrelated; have 
you seen it fail before? It passes for me locally:

{code}
~/src/zookeeper-svn (master) ✔ function runt() { ant -Dtestcase=$1 test-core-java; }
~/src/zookeeper-svn (master) ✔ runt QuorumTest

()

junit.run-concurrent:
 [echo] Running 1 concurrent JUnit processes.
[junit] WARNING: multiple versions of ant detected in path for junit
[junit]  jar:file:/usr/share/java/ant/ant.jar!/org/apache/tools/ant/Project.class
[junit]  and jar:file:/usr/share/ant/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit] Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=utf8
[junit] Running org.apache.zookeeper.test.QuorumTest
[junit] Tests run: 15, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 125.227 sec

junit.run:

test-core-java:

BUILD SUCCESSFUL
Total time: 2 minutes 52 seconds

{code}

Other than that, lgtm -- +1. 

[~shralex]: what do you think?

Happy to merge this after Alex gets another look. Thanks!

> ReconfigRecoveryTest fails intermittently
> -
>
> Key: ZOOKEEPER-2080
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2080
> Project: ZooKeeper
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Michael Han
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2080.patch, ZOOKEEPER-2080.patch, 
> ZOOKEEPER-2080.patch, jacoco-ZOOKEEPER-2080.unzip-grows-to-70MB.7z, 
> repro-20150816.log, threaddump.log
>
>
> I got the following test failure on MacBook with trunk code:
> {code}
> Testcase: testCurrentObserverIsParticipantInNewConfig took 93.628 sec
>   FAILED
> waiting for server 2 being up
> junit.framework.AssertionFailedError: waiting for server 2 being up
>   at 
> org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentObserverIsParticipantInNewConfig(ReconfigRecoveryTest.java:529)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> {code}





[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log

2016-08-01 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403413#comment-15403413
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2247:
---

[~rakeshr]: sorry for dropping the ball here. Lgtm, +1. One nit though:

{code}
+if ((state == State.ERROR) || (state == State.SHUTDOWN)) {
{code}

Drop the extra ()s around the state checks, it's readable enough without them.

I can merge this once we have a +1 from Flavio as well. Thanks!

> Zookeeper service becomes unavailable when leader fails to write transaction 
> log
> 
>
> Key: ZOOKEEPER-2247
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Arshad Mohammad
>Assignee: Rakesh R
>Priority: Critical
> Fix For: 3.4.9, 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, 
> ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, 
> ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch, 
> ZOOKEEPER-2247-10.patch, ZOOKEEPER-2247-11.patch, ZOOKEEPER-2247-12.patch, 
> ZOOKEEPER-2247-13.patch, ZOOKEEPER-2247-14.patch, ZOOKEEPER-2247-15.patch, 
> ZOOKEEPER-2247-16.patch, ZOOKEEPER-2247-17.patch, ZOOKEEPER-2247-18.patch, 
> ZOOKEEPER-2247-19.patch, ZOOKEEPER-2247-b3.5.patch, 
> ZOOKEEPER-2247-br-3.4.patch
>
>
> Zookeeper service becomes unavailable when leader fails to write transaction 
> log. Below are the exceptions:
> {code}
> 2015-08-14 15:41:18,556 [myid:100] - ERROR 
> [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, 
> from thread : SyncThread:100
> java.io.IOException: Input/output error
>   at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
>   at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
>   at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380)
>   at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113)
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread 
> SyncThread:100 exits, error code 1
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer@523] - shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:SessionTrackerImpl@232] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:LeaderRequestProcessor@77] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:PrepRequestProcessor@1035] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:ProposalRequestProcessor@88] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [SyncThread:100:CommitProcessor@356] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop!
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor 
> complete
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:SyncRequestProcessor@191] - Shutting down
> 2015-08-14 15:41:18,563 [myid:100] - INFO  [ProcessThread(sid:100 
> cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop!
> {code}
> After this exception the leader server still remains the leader. After such a 
> non-recoverable exception the leader should go down and let another follower 
> become the leader.





[jira] [Comment Edited] (ZOOKEEPER-2169) Enable creation of nodes with TTLs

2016-07-31 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401429#comment-15401429
 ] 

Raul Gutierrez Segales edited comment on ZOOKEEPER-2169 at 8/1/16 1:31 AM:
---

[~fpj]: it's here: https://reviews.apache.org/r/46983/.

cc: [~randgalt]


was (Author: rgs):
[~fpj]: it's here.

cc: [~randgalt]

> Enable creation of nodes with TTLs
> --
>
> Key: ZOOKEEPER-2169
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2169
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, jute, server
>Affects Versions: 3.6.0
>Reporter: Camille Fournier
>Assignee: Jordan Zimmerman
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2169-2.patch, ZOOKEEPER-2169-3.patch, 
> ZOOKEEPER-2169-4.patch, ZOOKEEPER-2169-5.patch, ZOOKEEPER-2169.patch
>
>
> As a user, I would like to be able to create a node that is NOT tied to a 
> session but that WILL expire automatically if action is not taken by some 
> client within a time window.
> I propose this to enable clients interacting with ZK via http or other "thin 
> clients" to create ephemeral-like nodes.
> Some ideas for the design, up for discussion:
> The node should support all normal ZK node operations including ACLs, 
> sequential key generation, etc, however, it should not support the ephemeral 
> flag. The node will be created with a TTL that is updated via a refresh 
> operation. 
> The ZK quorum will watch this node similarly to the way that it watches for 
> session liveness; if the node is not refreshed within the TTL, it will expire.
> QUESTIONS:
> 1) Should we let the refresh operation set the TTL to a different base value?
> 2) If so, should the setting of the TTL to a new base value cause a watch to 
> fire?
> 3) Do we want to allow these nodes to have children or prevent this similar 
> to ephemeral nodes?





[jira] [Commented] (ZOOKEEPER-2080) ReconfigRecoveryTest fails intermittently

2016-07-31 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401425#comment-15401425
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2080:
---

[~hanm]: thanks for tracking this down and for the patch! A few questions/asks, 
looking at the code:

{code}
Election election = null;
synchronized(self) {
    try {
        rqv = self.configFromString(new String(b));
        QuorumVerifier curQV = self.getQuorumVerifier();
        if (rqv.getVersion() > curQV.getVersion()) {
            LOG.info("{} Received version: {} my version: {}", self.getId(),
                    Long.toHexString(rqv.getVersion()),
                    Long.toHexString(self.getQuorumVerifier().getVersion()));
            if (self.getPeerState() == ServerState.LOOKING) {
                LOG.debug("Invoking processReconfig(), state: {}", self.getServerState());
                self.processReconfig(rqv, null, null, false);
                if (!rqv.equals(curQV)) {
                    LOG.info("restarting leader election");
                    // Signaling quorum peer to restart leader election.
                    self.shuttingDownLE = true;
                    // Get a hold of current leader election object of quorum peer,
                    // so we can clean it up later without holding the lock of quorum
                    // peer. If we shutdown current leader election we will run into
                    // potential deadlock. See ZOOKEEPER-2080 for more details.
                    election = self.getElectionAlg();
                }
            } else {
                LOG.debug("Skip processReconfig(), state: {}", self.getServerState());
            }
        }
    } catch (IOException e) {
        LOG.error("Something went wrong while processing config received from {}", response.sid);
    } catch (ConfigException e) {
        LOG.error("Something went wrong while processing config received from {}", response.sid);
    }
}
{code}

Do we really need to synchronize around self for the first part:

{code}
rqv = self.configFromString(new String(b));
QuorumVerifier curQV = self.getQuorumVerifier();
if (rqv.getVersion() > curQV.getVersion()) {

{code}

? Sounds like that can be done without synchronizing... no? 

Also, given you've spent a good amount of cycles untangling the dependencies 
around locking QuorumPeer, could you maybe add a comment before the 
synchronized(self) block noting why it is needed and who else might be 
contending for this lock? Thanks so much!

I think unit testing these things is a bit tricky; we might get a better return 
by just keeping good comments around synchronized regions and generally 
keeping them well maintained (imho). So I am happy to +1 without tests. 
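
The "capture under the lock, clean up outside it" pattern the patch uses can be sketched generically (stub Election type below; this is not the actual QuorumPeer code):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the deadlock-avoidance pattern described above: grab a reference
// to the object needing cleanup while holding the lock, release the lock,
// and only then run the (potentially blocking) shutdown.
public class CleanupOutsideLock {
    interface Election { void shutdown(); }

    private final Object self = new Object();
    private Election current;
    final AtomicBoolean shutdownDone = new AtomicBoolean(false);

    CleanupOutsideLock() {
        current = () -> {
            // A real shutdown may block or re-acquire peer locks, which is
            // why it must not run inside synchronized (self).
            shutdownDone.set(true);
        };
    }

    void restartElection() {
        Election toShutdown;
        synchronized (self) {
            // Only capture the reference here; shutting down under the lock
            // could deadlock with threads that lock self inside shutdown().
            toShutdown = current;
            current = null;
        }
        if (toShutdown != null) {
            toShutdown.shutdown(); // safe: lock already released
        }
    }

    public static void main(String[] args) {
        CleanupOutsideLock peer = new CleanupOutsideLock();
        peer.restartElection();
        System.out.println("shutdown done: " + peer.shutdownDone.get());
    }
}
```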

> ReconfigRecoveryTest fails intermittently
> -
>
> Key: ZOOKEEPER-2080
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2080
> Project: ZooKeeper
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Michael Han
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2080.patch, ZOOKEEPER-2080.patch, 
> jacoco-ZOOKEEPER-2080.unzip-grows-to-70MB.7z, repro-20150816.log, 
> threaddump.log
>
>
> I got the following test failure on MacBook with trunk code:
> {code}
> Testcase: testCurrentObserverIsParticipantInNewConfig took 93.628 sec
>   FAILED
> waiting for server 2 being up
> junit.framework.AssertionFailedError: waiting for server 2 being up
>   at 
> org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentObserverIsParticipantInNewConfig(ReconfigRecoveryTest.java:529)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> {code}





[jira] [Commented] (ZOOKEEPER-2495) Cluster unavailable on disk full(ENOSPC), disk quota(EDQUOT), disk write error(EIO) errors

2016-07-31 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401406#comment-15401406
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2495:
---

[~ramanala]: out of curiosity, which filesystem did this happen on? 

> Cluster unavailable on disk full(ENOSPC), disk quota(EDQUOT), disk write 
> error(EIO) errors
> --
>
> Key: ZOOKEEPER-2495
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2495
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, server
>Affects Versions: 3.4.8
> Environment: Normal ZooKeeper cluster with 3 Linux nodes.
>Reporter: Ramnatthan Alagappan
>
> ZooKeeper cluster completely stalls with *no* transactions making progress 
> when a storage-related error (such as *ENOSPC, EDQUOT, EIO*) is encountered 
> by the current *leader*. 
> Surprisingly, the same errors in some circumstances cause the node to 
> completely crash, thereby allowing other nodes in the cluster to become the 
> leader and make progress with transactions. Interestingly, the same errors, 
> if encountered while initializing a new log file, cause the current leader to 
> go into a weird state (but not crash) where it thinks it is the leader (and 
> so does not allow others to become the leader). *This causes the entire 
> cluster to freeze.*
> Here is the stacktrace of the leader:
> 
> 2016-07-11 15:42:27,502 [myid:3] - INFO  [SyncThread:3:FileTxnLog@199] - 
> Creating new log file: log.20001
> 2016-07-11 15:42:27,505 [myid:3] - ERROR 
> [SyncThread:3:ZooKeeperCriticalThread@49] - Severe unrecoverable error, from 
> thread : SyncThread:3
> java.io.IOException: Disk quota exceeded
>   at java.io.FileOutputStream.writeBytes(Native Method)
>   at java.io.FileOutputStream.write(FileOutputStream.java:345)
>   at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>   at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:211)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:314)
>   at org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:476)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:140)
> 
> From the trace and the code, it looks like the problem happens only when a 
> new log file is initialized and only when there are errors in two cases:
> 1. Error during the append of *log header*.
> 2. Error during *padding zero bytes to the end of the log*.
>  
> If similar errors happen when writing some other blocks of data, then the 
> node just completely crashes allowing others to be elected as a new leader. 
> These two blocks of the newly created log file are special as they take a 
> different error recovery code path -- the node does not completely crash; 
> rather, certain threads are killed, but the quorum-holding thread supposedly 
> stays up, thereby preventing others from becoming the new leader. This causes 
> the other nodes to think that there is no problem with the leader, but the 
> cluster just becomes unavailable for any subsequent operations such as 
> read/write. 





[jira] [Commented] (ZOOKEEPER-2169) Enable creation of nodes with TTLs

2016-07-19 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384821#comment-15384821
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2169:
---

[~randgalt]: sorry for the lag - will look at this tonight/tomorrow. 

> Enable creation of nodes with TTLs
> --
>
> Key: ZOOKEEPER-2169
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2169
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, jute, server
>Affects Versions: 3.6.0
>Reporter: Camille Fournier
>Assignee: Jordan Zimmerman
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2169-2.patch, ZOOKEEPER-2169-3.patch, 
> ZOOKEEPER-2169-4.patch, ZOOKEEPER-2169-5.patch, ZOOKEEPER-2169.patch
>
>
> As a user, I would like to be able to create a node that is NOT tied to a 
> session but that WILL expire automatically if action is not taken by some 
> client within a time window.
> I propose this to enable clients interacting with ZK via http or other "thin 
> clients" to create ephemeral-like nodes.
> Some ideas for the design, up for discussion:
> The node should support all normal ZK node operations including ACLs, 
> sequential key generation, etc, however, it should not support the ephemeral 
> flag. The node will be created with a TTL that is updated via a refresh 
> operation. 
> The ZK quorum will watch this node similarly to the way that it watches for 
> session liveness; if the node is not refreshed within the TTL, it will expire.
> QUESTIONS:
> 1) Should we let the refresh operation set the TTL to a different base value?
> 2) If so, should the setting of the TTL to a new base value cause a watch to 
> fire?
> 3) Do we want to allow these nodes to have children or prevent this similar 
> to ephemeral nodes?





[jira] [Commented] (ZOOKEEPER-2152) Intermittent failure in TestReconfig.cc

2016-07-12 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374126#comment-15374126
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2152:
---

Sorry for the drive-by review [~hanm] (though I'll take a deeper look later 
tonight), but I wanted to second [~shralex]'s comments: better to avoid, if at 
all possible, specific logic just to work around this test case...

> Intermittent failure in TestReconfig.cc
> ---
>
> Key: ZOOKEEPER-2152
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2152
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: c client
>Reporter: Michi Mutsuzaki
>Assignee: Michael Han
>  Labels: reconfiguration
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2152.patch
>
>
> I'm seeing this failure in the c client test once in a while:
> {noformat}
> [exec] 
> /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/trunk/src/c/tests/TestReconfig.cc:474:
>  Assertion: assertion failed [Expression: found != string::npos, 
> 10.10.10.4:2004 not in newComing list]
> {noformat}
> https://builds.apache.org/job/ZooKeeper-trunk/2640/console





[jira] [Commented] (ZOOKEEPER-2453) Cannot compile on ARM: "Error: bad instruction `lock xaddl"

2016-06-28 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353980#comment-15353980
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2453:
---

[~mthies]: hmm, what compilers do we have to support? This could be useful:

https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html
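
Roughly what that would look like — a minimal sketch, not ZooKeeper's actual 
code (the function name here is illustrative): the x86-only {{lock xaddl}} 
inline assembly can be replaced with GCC's {{__sync}} builtins (available 
since GCC 4.1), which the compiler lowers to the correct atomic sequence on 
ARM as well.

```c
/* Sketch: portable atomic increment via GCC builtins instead of
 * x86 inline assembly. Assumes GCC >= 4.1 (or clang). */

/* Atomically adds incr to *operand and returns the previous value,
 * matching the semantics of the xaddl-based original. */
static inline int fetch_and_add(volatile int *operand, int incr)
{
    return __sync_fetch_and_add(operand, incr);
}
```

Older or non-GNU compilers would still need a fallback (e.g. a mutex-guarded 
increment), which is presumably why the supported-compiler question matters.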

> Cannot compile on ARM: "Error: bad instruction `lock xaddl"
> ---
>
> Key: ZOOKEEPER-2453
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2453
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.4.5
> Environment: Jessie, Raspberry
>Reporter: Markus Thies
>Priority: Minor
>
> It seems that this is a bug equivalent to the issue ZOOKEEPER-1374.
> make[5]: Entering directory 
> '/home/pi/Downloads/mesos-0.28.2/build/3rdparty/zookeeper-3.4.5/src/c'
> if /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. 
> -I.  -I./include -I./tests -I./generated  -DTHREADED -g -O2 -D_GNU_SOURCE -MT 
> libzkmt_la-mt_adaptor.lo -MD -MP -MF ".deps/libzkmt_la-mt_adaptor.Tpo" -c -o 
> libzkmt_la-mt_adaptor.lo `test -f 'src/mt_adaptor.c' || echo 
> './'`src/mt_adaptor.c; \
> then mv -f ".deps/libzkmt_la-mt_adaptor.Tpo" 
> ".deps/libzkmt_la-mt_adaptor.Plo"; else rm -f 
> ".deps/libzkmt_la-mt_adaptor.Tpo"; exit 1; fi
>  gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated 
> -DTHREADED -g -O2 -D_GNU_SOURCE -MT libzkmt_la-mt_adaptor.lo -MD -MP -MF 
> .deps/libzkmt_la-mt_adaptor.Tpo -c src/mt_adaptor.c  -fPIC -DPIC -o 
> libzkmt_la-mt_adaptor.o
> /tmp/ccs0G1lb.s: Assembler messages:
> /tmp/ccs0G1lb.s:1589: Error: bad instruction `lock xaddl r1,[r0]'
> Makefile:743: recipe for target 'libzkmt_la-mt_adaptor.lo' failed





[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log

2016-06-23 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15347203#comment-15347203
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2247:
---

Minor nit:

{code}
+/**
+ * This can be used while shutting down the server to see whether the 
server
+ * is already shutdown or not.
+ *
+ * @return true if the server is running or server hits an error, false
+ * otherwise.
+ */
+protected boolean needsShutdown() {
+return state == State.RUNNING || state == State.ERROR;
+}
{code}

should probably be canShutdown(), given that if you are in State.RUNNING it's 
not like you need a shutdown. 
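
Something like this sketch (names and the enum are illustrative, condensed 
from the snippet above, not the actual patch):

```java
// Sketch of the suggested rename: the predicate answers "can this server
// be shut down?", not "does it need a shutdown?" -- a RUNNING server can
// be shut down even though nothing forces it to.
class ServerStateCheck {
    enum State { INITIAL, RUNNING, ERROR, SHUTDOWN }

    private State state = State.RUNNING;

    /** True when shutdown() would actually have work to do. */
    protected boolean canShutdown() {
        return state == State.RUNNING || state == State.ERROR;
    }
}
```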

> Zookeeper service becomes unavailable when leader fails to write transaction 
> log
> 
>
> Key: ZOOKEEPER-2247
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Arshad Mohammad
>Assignee: Rakesh R
>Priority: Critical
> Fix For: 3.4.9, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, 
> ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, 
> ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch, 
> ZOOKEEPER-2247-10.patch, ZOOKEEPER-2247-11.patch, ZOOKEEPER-2247-12.patch, 
> ZOOKEEPER-2247-13.patch, ZOOKEEPER-2247-14.patch, ZOOKEEPER-2247-15.patch, 
> ZOOKEEPER-2247-16.patch, ZOOKEEPER-2247-17.patch, ZOOKEEPER-2247-18.patch, 
> ZOOKEEPER-2247-b3.5.patch, ZOOKEEPER-2247-br-3.4.patch
>
>
> Zookeeper service becomes unavailable when leader fails to write transaction 
> log. Below are the exceptions:
> {code}
> 2015-08-14 15:41:18,556 [myid:100] - ERROR 
> [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, 
> from thread : SyncThread:100
> java.io.IOException: Input/output error
>   at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
>   at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
>   at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380)
>   at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113)
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread 
> SyncThread:100 exits, error code 1
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer@523] - shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:SessionTrackerImpl@232] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:LeaderRequestProcessor@77] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:PrepRequestProcessor@1035] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:ProposalRequestProcessor@88] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [SyncThread:100:CommitProcessor@356] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop!
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor 
> complete
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:SyncRequestProcessor@191] - Shutting down
> 2015-08-14 15:41:18,563 [myid:100] - INFO  [ProcessThread(sid:100 
> cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop!
> {code}
> After this exception the leader server still remains the leader. After this 
> non-recoverable exception the leader should go down and let one of the 
> followers become the leader.





[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log

2016-06-23 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15347184#comment-15347184
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2247:
---

[~fpj], [~rakeshr]: sorry for the late comments. So, for the test case in 
ZooKeeperServerMainTest.java:

{code}
 /**
+ * Test case for https://issues.apache.org/jira/browse/ZOOKEEPER-2247.
+ * Test to verify that even after non recoverable error (error while
+ * writing transaction log) on ZooKeeper service will be available
+ */
+@Test(timeout = 3)
+public void testNonRecoverableError() throws Exception {
{code}

That's really not what's happening, given that we don't wait for the quorum to 
come back. We only wait for the injected failure to happen. Does this test case 
actually provide anything new beyond what we already have in 
NonRecoverableErrorTest.java? Am I missing some context?

> Zookeeper service becomes unavailable when leader fails to write transaction 
> log
> 
>
> Key: ZOOKEEPER-2247
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Arshad Mohammad
>Assignee: Rakesh R
>Priority: Critical
> Fix For: 3.4.9, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, 
> ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, 
> ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch, 
> ZOOKEEPER-2247-10.patch, ZOOKEEPER-2247-11.patch, ZOOKEEPER-2247-12.patch, 
> ZOOKEEPER-2247-13.patch, ZOOKEEPER-2247-14.patch, ZOOKEEPER-2247-15.patch, 
> ZOOKEEPER-2247-16.patch, ZOOKEEPER-2247-17.patch, ZOOKEEPER-2247-18.patch, 
> ZOOKEEPER-2247-b3.5.patch, ZOOKEEPER-2247-br-3.4.patch
>
>
> Zookeeper service becomes unavailable when leader fails to write transaction 
> log. Below are the exceptions:
> {code}
> 2015-08-14 15:41:18,556 [myid:100] - ERROR 
> [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, 
> from thread : SyncThread:100
> java.io.IOException: Input/output error
>   at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
>   at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
>   at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380)
>   at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113)
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread 
> SyncThread:100 exits, error code 1
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer@523] - shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:SessionTrackerImpl@232] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:LeaderRequestProcessor@77] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:PrepRequestProcessor@1035] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:ProposalRequestProcessor@88] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [SyncThread:100:CommitProcessor@356] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop!
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor 
> complete
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:SyncRequestProcessor@191] - Shutting down
> 2015-08-14 15:41:18,563 [myid:100] - INFO  [ProcessThread(sid:100 
> cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop!
> {code}
> After this exception the leader server still remains the leader. After this 
> non-recoverable exception the leader should go down and let one of the 
> followers become the leader.





[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log

2016-06-23 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15347195#comment-15347195
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2247:
---

Another thing, this feels weird, in ZooKeeperServerStateListener.java:

{code}
+class ZooKeeperServerStateListener {
+private final CountDownLatch shutdownLatch;
+
+ZooKeeperServerStateListener(CountDownLatch shutdownLatch) {
+this.shutdownLatch = shutdownLatch;
+}
+
+/**
+ * This will be invoked when the server transition to a new server state.
+ *
+ * @param state new server state
+ */
+void stateChanged(State state) {
+if (state != State.RUNNING) {
+shutdownLatch.countDown();
+}
+}
+}
{code}

I think the name is misleading, since it's only used to watch/count shutdown 
events. Should we name it appropriately then?

Also, what happens if someone calls stateChanged(State.INITIAL)? We'd still 
call shutdownLatch.countDown(). Shouldn't we assert that that doesn't happen?
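
Concretely, something along these lines — a sketch only, with illustrative 
names and an illustrative State enum, not the actual patch:

```java
import java.util.concurrent.CountDownLatch;

// Sketch: rename the listener to say what it actually does (watch for
// shutdown), and only trip the latch on transitions that really mean the
// server is going down -- not on State.INITIAL.
class ZooKeeperServerShutdownHandler {
    enum State { INITIAL, RUNNING, ERROR, SHUTDOWN }

    private final CountDownLatch shutdownLatch;

    ZooKeeperServerShutdownHandler(CountDownLatch shutdownLatch) {
        this.shutdownLatch = shutdownLatch;
    }

    /** Invoked when the server transitions to a new state. */
    void stateChanged(State state) {
        if (state == State.ERROR || state == State.SHUTDOWN) {
            shutdownLatch.countDown();
        }
        // INITIAL and RUNNING deliberately do not count as shutdown events.
    }
}
```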

> Zookeeper service becomes unavailable when leader fails to write transaction 
> log
> 
>
> Key: ZOOKEEPER-2247
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Arshad Mohammad
>Assignee: Rakesh R
>Priority: Critical
> Fix For: 3.4.9, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, 
> ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, 
> ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch, 
> ZOOKEEPER-2247-10.patch, ZOOKEEPER-2247-11.patch, ZOOKEEPER-2247-12.patch, 
> ZOOKEEPER-2247-13.patch, ZOOKEEPER-2247-14.patch, ZOOKEEPER-2247-15.patch, 
> ZOOKEEPER-2247-16.patch, ZOOKEEPER-2247-17.patch, ZOOKEEPER-2247-18.patch, 
> ZOOKEEPER-2247-b3.5.patch, ZOOKEEPER-2247-br-3.4.patch
>
>
> Zookeeper service becomes unavailable when leader fails to write transaction 
> log. Below are the exceptions:
> {code}
> 2015-08-14 15:41:18,556 [myid:100] - ERROR 
> [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, 
> from thread : SyncThread:100
> java.io.IOException: Input/output error
>   at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
>   at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
>   at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380)
>   at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113)
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread 
> SyncThread:100 exits, error code 1
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer@523] - shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:SessionTrackerImpl@232] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:LeaderRequestProcessor@77] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:PrepRequestProcessor@1035] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:ProposalRequestProcessor@88] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [SyncThread:100:CommitProcessor@356] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop!
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor 
> complete
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:SyncRequestProcessor@191] - Shutting down
> 2015-08-14 15:41:18,563 [myid:100] - INFO  [ProcessThread(sid:100 
> cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop!
> {code}
> After this exception the leader server still remains the leader. After this 
> non-recoverable exception the leader should go down and let one of the 
> followers become the leader.





[jira] [Commented] (ZOOKEEPER-2366) Reconfiguration of client port causes a socket leak

2016-06-23 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346989#comment-15346989
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2366:
---

[~cnauroth]: actually, can't merge from here, I am on the wrong laptop, sorry 
:-( 

> Reconfiguration of client port causes a socket leak
> ---
>
> Key: ZOOKEEPER-2366
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2366
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.5.0
>Reporter: Timothy Ward
>Assignee: Flavio Junqueira
>Priority: Blocker
> Fix For: 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2366.patch, ZOOKEEPER-2366.patch, 
> ZOOKEEPER-2366.patch, ZOOKEEPER-2366.patch, ZOOKEEPER-2366.patch, 
> ZOOKEEPER-2366.patch, ZOOKEEPER-2366.patch, zookeeper.patch
>
>
> The NIOServerCnxnFactory reconfigure method can leak server sockets, and 
> hence make ports unusable until the JVM restarts:
> The first line of the method takes a reference to the current 
> ServerSocketChannel and then the next line replaces it. The subsequent 
> interactions with the server socket can fail (for example if the 
> reconfiguration tries to bind to an in-use port). If they fail *before* the  
> call to oldSS.close() then oldSS is *never* closed. This holds that port open 
> forever, and prevents the user from rolling back to the previous port!
> The code from reconfigure is shown below:
>  ServerSocketChannel oldSS = ss;
> try {
>this.ss = ServerSocketChannel.open();
>ss.socket().setReuseAddress(true);
>LOG.info("binding to port " + addr);
>ss.socket().bind(addr);
>ss.configureBlocking(false);
>acceptThread.setReconfiguring();
>oldSS.close();   
>acceptThread.wakeupSelector();
>try {
> acceptThread.join();
>  } catch (InterruptedException e) {
>  LOG.error("Error joining old acceptThread when 
> reconfiguring client port " + e.getMessage());
>  }
>acceptThread = new AcceptThread(ss, addr, selectorThreads);
>acceptThread.start();
> } catch(IOException e) {
>LOG.error("Error reconfiguring client port to " + addr + " " + 
> e.getMessage());
> }





[jira] [Commented] (ZOOKEEPER-2366) Reconfiguration of client port causes a socket leak

2016-06-23 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346986#comment-15346986
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2366:
---

+1, thanks [~fpj].

[~cnauroth]: i'll go ahead and merge this. 

> Reconfiguration of client port causes a socket leak
> ---
>
> Key: ZOOKEEPER-2366
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2366
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.5.0
>Reporter: Timothy Ward
>Assignee: Flavio Junqueira
>Priority: Blocker
> Fix For: 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2366.patch, ZOOKEEPER-2366.patch, 
> ZOOKEEPER-2366.patch, ZOOKEEPER-2366.patch, ZOOKEEPER-2366.patch, 
> ZOOKEEPER-2366.patch, ZOOKEEPER-2366.patch, zookeeper.patch
>
>
> The NIOServerCnxnFactory reconfigure method can leak server sockets, and 
> hence make ports unusable until the JVM restarts:
> The first line of the method takes a reference to the current 
> ServerSocketChannel and then the next line replaces it. The subsequent 
> interactions with the server socket can fail (for example if the 
> reconfiguration tries to bind to an in-use port). If they fail *before* the  
> call to oldSS.close() then oldSS is *never* closed. This holds that port open 
> forever, and prevents the user from rolling back to the previous port!
> The code from reconfigure is shown below:
>  ServerSocketChannel oldSS = ss;
> try {
>this.ss = ServerSocketChannel.open();
>ss.socket().setReuseAddress(true);
>LOG.info("binding to port " + addr);
>ss.socket().bind(addr);
>ss.configureBlocking(false);
>acceptThread.setReconfiguring();
>oldSS.close();   
>acceptThread.wakeupSelector();
>try {
> acceptThread.join();
>  } catch (InterruptedException e) {
>  LOG.error("Error joining old acceptThread when 
> reconfiguring client port " + e.getMessage());
>  }
>acceptThread = new AcceptThread(ss, addr, selectorThreads);
>acceptThread.start();
> } catch(IOException e) {
>LOG.error("Error reconfiguring client port to " + addr + " " + 
> e.getMessage());
> }





[jira] [Commented] (ZOOKEEPER-2366) Reconfiguration of client port causes a socket leak

2016-06-23 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346833#comment-15346833
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2366:
---

[~fpj]: what I am saying is that closing the (old) socket should always be best 
effort; the outer try/catch means that acceptThread.wakeupSelector() (and what 
comes after) would be skipped, which doesn't sound right.

I think it should be like this:

{code}
private void tryClose(ServerSocketChannel s) {
  try {
s.close();
  } catch (IOException sse) {
LOG.error("Error while closing server socket.", sse);
  }
}

public void reconfigure(InetSocketAddress addr) {
 ServerSocketChannel oldSS = ss;
 try {
this.ss = ServerSocketChannel.open();
ss.socket().setReuseAddress(true);
LOG.info("binding to port " + addr);
ss.socket().bind(addr);
ss.configureBlocking(false);
acceptThread.setReconfiguring();
tryClose(oldSS);
acceptThread.wakeupSelector();
try {
acceptThread.join();
} catch (InterruptedException e) {
LOG.error("Error joining old acceptThread when reconfiguring 
client port {}",
e.getMessage());
Thread.currentThread().interrupt();
}
acceptThread = new AcceptThread(ss, addr, selectorThreads);
acceptThread.start();
 } catch(IOException e) {
LOG.error("Error reconfiguring client port to {} {}", addr, 
e.getMessage());
tryClose(oldSS);
 }
{code}

Thus, the outer try/catch covers the rest of the work we are doing, and 
closing the old socket is only a best-effort thing. 

> Reconfiguration of client port causes a socket leak
> ---
>
> Key: ZOOKEEPER-2366
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2366
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.5.0
>Reporter: Timothy Ward
>Assignee: Flavio Junqueira
>Priority: Blocker
> Fix For: 3.5.2
>
> Attachments: ZOOKEEPER-2366.patch, ZOOKEEPER-2366.patch, 
> ZOOKEEPER-2366.patch, ZOOKEEPER-2366.patch, ZOOKEEPER-2366.patch, 
> ZOOKEEPER-2366.patch, zookeeper.patch
>
>
> The NIOServerCnxnFactory reconfigure method can leak server sockets, and 
> hence make ports unusable until the JVM restarts:
> The first line of the method takes a reference to the current 
> ServerSocketChannel and then the next line replaces it. The subsequent 
> interactions with the server socket can fail (for example if the 
> reconfiguration tries to bind to an in-use port). If they fail *before* the  
> call to oldSS.close() then oldSS is *never* closed. This holds that port open 
> forever, and prevents the user from rolling back to the previous port!
> The code from reconfigure is shown below:
>  ServerSocketChannel oldSS = ss;
> try {
>this.ss = ServerSocketChannel.open();
>ss.socket().setReuseAddress(true);
>LOG.info("binding to port " + addr);
>ss.socket().bind(addr);
>ss.configureBlocking(false);
>acceptThread.setReconfiguring();
>oldSS.close();   
>acceptThread.wakeupSelector();
>try {
> acceptThread.join();
>  } catch (InterruptedException e) {
>  LOG.error("Error joining old acceptThread when 
> reconfiguring client port " + e.getMessage());
>  }
>acceptThread = new AcceptThread(ss, addr, selectorThreads);
>acceptThread.start();
> } catch(IOException e) {
>LOG.error("Error reconfiguring client port to " + addr + " " + 
> e.getMessage());
> }





[jira] [Commented] (ZOOKEEPER-2366) Reconfiguration of client port causes a socket leak

2016-06-23 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346631#comment-15346631
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2366:
---

Hey all, sorry for joining the party late. One quick note is that in 
NIOServerCnxnFactory.reconfigure():

{code}

+acceptThread.setReconfiguring();
+oldSS.close();
+acceptThread.wakeupSelector();
...
{code}

sounds like that first oldSS.close() call should be wrapped in a try/catch for 
IOException, given it's best effort too (it could be gone by that time, no?).

Other than that, +1.

> Reconfiguration of client port causes a socket leak
> ---
>
> Key: ZOOKEEPER-2366
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2366
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.5.0
>Reporter: Timothy Ward
>Assignee: Flavio Junqueira
>Priority: Blocker
> Fix For: 3.5.2
>
> Attachments: ZOOKEEPER-2366.patch, ZOOKEEPER-2366.patch, 
> ZOOKEEPER-2366.patch, ZOOKEEPER-2366.patch, ZOOKEEPER-2366.patch, 
> ZOOKEEPER-2366.patch, zookeeper.patch
>
>
> The NIOServerCnxnFactory reconfigure method can leak server sockets, and 
> hence make ports unusable until the JVM restarts:
> The first line of the method takes a reference to the current 
> ServerSocketChannel and then the next line replaces it. The subsequent 
> interactions with the server socket can fail (for example if the 
> reconfiguration tries to bind to an in-use port). If they fail *before* the  
> call to oldSS.close() then oldSS is *never* closed. This holds that port open 
> forever, and prevents the user from rolling back to the previous port!
> The code from reconfigure is shown below:
>  ServerSocketChannel oldSS = ss;
> try {
>this.ss = ServerSocketChannel.open();
>ss.socket().setReuseAddress(true);
>LOG.info("binding to port " + addr);
>ss.socket().bind(addr);
>ss.configureBlocking(false);
>acceptThread.setReconfiguring();
>oldSS.close();   
>acceptThread.wakeupSelector();
>try {
> acceptThread.join();
>  } catch (InterruptedException e) {
>  LOG.error("Error joining old acceptThread when 
> reconfiguring client port " + e.getMessage());
>  }
>acceptThread = new AcceptThread(ss, addr, selectorThreads);
>acceptThread.start();
> } catch(IOException e) {
>LOG.error("Error reconfiguring client port to " + addr + " " + 
> e.getMessage());
> }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2320) C-client crashes when removing watcher asynchronously in "local" mode

2016-06-17 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15336639#comment-15336639
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2320:
---

Nice catch [~abrahamfine]! Do you think you can provide a patch for this? I 
wrote that code, so I'm more than happy to review it.

> C-client crashes when removing watcher asynchronously in "local" mode
> -
>
> Key: ZOOKEEPER-2320
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2320
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.5.1
>Reporter: Hadriel Kaplan
>Assignee: Abraham Fine
>
> The C-client library will crash when invoking the asynchronous 
> {{zoo_aremove_watchers()}} API function with the '{{local}}' argument set to 
> 1.
> The reason is: if the local argument is 1/true, then the code does 
> '{{notify_sync_completion((struct sync_completion *)data);}}' But casting the 
> '{{data}}' variable to a {{sync_completion}} struct pointer is bogus/invalid, 
> and when it's later handled as that struct pointer it's accessing invalid 
> memory.
> As a side note: it will work ok when called _synchronously_ through 
> {{zoo_remove_watchers()}}, because that function creates a 
> {{sync_completion}} struct and passes it to the asynch 
> {{zoo_aremove_watchers()}}, but it will not work ok when the asynch function 
> is used directly for the reason stated previously.
> Another side note: the docs state that setting the 'local' flag makes the 
> C-client remove the watcher "even if there is no server connection" - but 
> really it makes the C-client remove the watcher without notifying the server 
> at *all*, even if the connection to a server is up. (well... that's what it 
> would do if it didn't just crash instead ;)





[jira] [Commented] (ZOOKEEPER-1045) Quorum Peer mutual authentication

2016-06-10 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324009#comment-15324009
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1045:
---

+1, great work [~rakeshr]!

> Quorum Peer mutual authentication
> -
>
> Key: ZOOKEEPER-1045
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1045
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: server
>Reporter: Eugene Koontz
>Assignee: Rakesh R
>Priority: Critical
> Fix For: 3.4.9, 3.5.3
>
> Attachments: 0001-ZOOKEEPER-1045-br-3-4.patch, 
> 1045_failing_phunt.tar.gz, ZK-1045-test-case-failure-logs.zip, 
> ZOOKEEPER-1045-00.patch, ZOOKEEPER-1045-Rolling Upgrade Design Proposal.pdf, 
> ZOOKEEPER-1045-br-3-4.patch, ZOOKEEPER-1045-br-3-4.patch, 
> ZOOKEEPER-1045-br-3-4.patch, ZOOKEEPER-1045-br-3-4.patch
>
>
> ZOOKEEPER-938 addresses mutual authentication between clients and servers. 
> This bug, on the other hand, is for authentication among quorum peers. 
> Hopefully much of the work done on SASL integration with Zookeeper for 
> ZOOKEEPER-938 can be used as a foundation for this enhancement.





[jira] [Comment Edited] (ZOOKEEPER-2137) Make testPortChange() less flaky

2016-06-09 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321965#comment-15321965
 ] 

Raul Gutierrez Segales edited comment on ZOOKEEPER-2137 at 6/9/16 6:06 AM:
---

lgtm, +1.

[~shralex]: any other thoughts?


was (Author: rgs):
lgtm, +1.

[~shralex]: any other thoughts?

> Make testPortChange() less flaky
> 
>
> Key: ZOOKEEPER-2137
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2137
> Project: ZooKeeper
>  Issue Type: Sub-task
>Reporter: Hongchao Deng
>Assignee: Michael Han
> Fix For: 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2137-cb.patch, ZOOKEEPER-2137.patch, 
> ZOOKEEPER-2137.patch, ZOOKEEPER-2137.patch, ZOOKEEPER-2137.patch
>
>
> The cause of flaky failure of testPortChange() is a race in sync().
> I figured out it could take some time to fix sync(). Meanwhile, we can make 
> testPortChange() less flaky by doing reconfig on the leader. We can change 
> this back in the fix of ZOOKEEPER-2136.





[jira] [Commented] (ZOOKEEPER-2137) Make testPortChange() less flaky

2016-06-09 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321965#comment-15321965
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2137:
---

lgtm, +1.

[~shralex]: any other thoughts?

> Make testPortChange() less flaky
> 
>
> Key: ZOOKEEPER-2137
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2137
> Project: ZooKeeper
>  Issue Type: Sub-task
>Reporter: Hongchao Deng
>Assignee: Michael Han
> Fix For: 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2137-cb.patch, ZOOKEEPER-2137.patch, 
> ZOOKEEPER-2137.patch, ZOOKEEPER-2137.patch, ZOOKEEPER-2137.patch
>
>
> The cause of flaky failure of testPortChange() is a race in sync().
> I figured out it could take some time to fix sync(). Meanwhile, we can make 
> testPortChange() less flaky by doing reconfig on the leader. We can change 
> this back in the fix of ZOOKEEPER-2136.





[jira] [Commented] (ZOOKEEPER-2080) ReconfigRecoveryTest fails intermittently

2016-06-09 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321959#comment-15321959
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2080:
---

[~hanm]: sure — go for it. Thanks!

> ReconfigRecoveryTest fails intermittently
> -
>
> Key: ZOOKEEPER-2080
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2080
> Project: ZooKeeper
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Raul Gutierrez Segales
> Attachments: jacoco-ZOOKEEPER-2080.unzip-grows-to-70MB.7z, 
> repro-20150816.log
>
>
> I got the following test failure on MacBook with trunk code:
> {code}
> Testcase: testCurrentObserverIsParticipantInNewConfig took 93.628 sec
>   FAILED
> waiting for server 2 being up
> junit.framework.AssertionFailedError: waiting for server 2 being up
>   at 
> org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentObserverIsParticipantInNewConfig(ReconfigRecoveryTest.java:529)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> {code}





[jira] [Commented] (ZOOKEEPER-2442) Socket leak in QuorumCnxManager connectOne

2016-06-08 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320773#comment-15320773
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2442:
---

Merged to trunk:

https://github.com/apache/zookeeper/commit/3c37184e83a3e68b73544cebccf9388eea26f523

Waiting on [~cnauroth] for merging to the 3.5 branch.

> Socket leak in QuorumCnxManager connectOne
> --
>
> Key: ZOOKEEPER-2442
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2442
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum, server
>Affects Versions: 3.5.1
>Reporter: Michael Han
>Assignee: Michael Han
> Attachments: ZOOKEEPER-2442.patch
>
>
> The function connectOne() in QuorumCnxManager.java sometimes fails to release 
> a socket allocated by Socket():
> {code}
>  try {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Opening channel to server " + sid);
> }
> Socket sock = new Socket();
> setSockOpts(sock);
> sock.connect(self.getView().get(sid).electionAddr, cnxTO);
> if (LOG.isDebugEnabled()) {
> LOG.debug("Connected to server " + sid);
> }
> initiateConnection(sock, sid);
> } catch (UnresolvedAddressException e) {
> // Sun doesn't include the address that causes this
> // exception to be thrown, also UAE cannot be wrapped cleanly
> // so we log the exception in order to capture this critical
> // detail.
> LOG.warn("Cannot open channel to " + sid
> + " at election address " + electionAddr, e);
> throw e;
> } catch (IOException e) {
> LOG.warn("Cannot open channel to " + sid
> + " at election address " + electionAddr,
> e);
> }
> {code}
> Another place in Listener.run() where the client socket is not explicitly 
> closed.





[jira] [Commented] (ZOOKEEPER-2442) Socket leak in QuorumCnxManager connectOne

2016-06-08 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320766#comment-15320766
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2442:
---

Oh, forgot to mention, a super small nit for next time. In:

{code}
-if (LOG.isDebugEnabled()) {
-LOG.debug("Opening channel to server " + sid);
-}
+LOG.debug("Opening channel to server " + sid);
{code}

I think that for LOG statements, using {} (interpolation) instead of + 
(concatenation) reads nicer (especially for statements with multiple vars in 
them).
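To make the nit concrete, here is a tiny hypothetical stand-in for the "{}" substitution (not the SLF4J implementation — real code would simply pass the template and args to LOG.debug and let the logger assemble the message):

```java
public class LogStyle {
    // Hypothetical stand-in for SLF4J-style "{}" substitution; real code
    // would call LOG.debug("Opening channel to server {}", sid) and the
    // message would only be assembled when debug logging is enabled.
    static String format(String msg, Object... args) {
        StringBuilder out = new StringBuilder();
        int argIdx = 0, from = 0, at;
        while ((at = msg.indexOf("{}", from)) >= 0 && argIdx < args.length) {
            out.append(msg, from, at).append(args[argIdx++]);
            from = at + 2;
        }
        return out.append(msg.substring(from)).toString();
    }

    public static void main(String[] args) {
        long sid = 3;
        // One template, several vars: no "+" chains to eyeball.
        System.out.println(format("Cannot open channel to {} at election address {}",
                sid, "10.0.0.3:3888"));
        // prints: Cannot open channel to 3 at election address 10.0.0.3:3888
    }
}
```

Beyond readability, the parameterized form also skips the string concatenation entirely when the log level is disabled, which is why the isDebugEnabled() guard becomes unnecessary for simple messages.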

> Socket leak in QuorumCnxManager connectOne
> --
>
> Key: ZOOKEEPER-2442
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2442
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum, server
>Affects Versions: 3.5.1
>Reporter: Michael Han
>Assignee: Michael Han
> Attachments: ZOOKEEPER-2442.patch
>
>
> The function connectOne() in QuorumCnxManager.java sometimes fails to release 
> a socket allocated by Socket():
> {code}
>  try {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Opening channel to server " + sid);
> }
> Socket sock = new Socket();
> setSockOpts(sock);
> sock.connect(self.getView().get(sid).electionAddr, cnxTO);
> if (LOG.isDebugEnabled()) {
> LOG.debug("Connected to server " + sid);
> }
> initiateConnection(sock, sid);
> } catch (UnresolvedAddressException e) {
> // Sun doesn't include the address that causes this
> // exception to be thrown, also UAE cannot be wrapped cleanly
> // so we log the exception in order to capture this critical
> // detail.
> LOG.warn("Cannot open channel to " + sid
> + " at election address " + electionAddr, e);
> throw e;
> } catch (IOException e) {
> LOG.warn("Cannot open channel to " + sid
> + " at election address " + electionAddr,
> e);
> }
> {code}
> Another place in Listener.run() where the client socket is not explicitly 
> closed.





[jira] [Commented] (ZOOKEEPER-2442) Socket leak in QuorumCnxManager connectOne

2016-06-08 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320763#comment-15320763
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2442:
---

lgtm, +1 - thanks [~hanm]! 

Presumably we want this for trunk and 3.5.x, right? I'll merge to trunk first. 

cc: [~cnauroth] who is getting the next 3.5.x release ready. 

> Socket leak in QuorumCnxManager connectOne
> --
>
> Key: ZOOKEEPER-2442
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2442
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum, server
>Affects Versions: 3.5.1
>Reporter: Michael Han
>Assignee: Michael Han
> Attachments: ZOOKEEPER-2442.patch
>
>
> The function connectOne() in QuorumCnxManager.java sometimes fails to release 
> a socket allocated by Socket():
> {code}
>  try {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Opening channel to server " + sid);
> }
> Socket sock = new Socket();
> setSockOpts(sock);
> sock.connect(self.getView().get(sid).electionAddr, cnxTO);
> if (LOG.isDebugEnabled()) {
> LOG.debug("Connected to server " + sid);
> }
> initiateConnection(sock, sid);
> } catch (UnresolvedAddressException e) {
> // Sun doesn't include the address that causes this
> // exception to be thrown, also UAE cannot be wrapped cleanly
> // so we log the exception in order to capture this critical
> // detail.
> LOG.warn("Cannot open channel to " + sid
> + " at election address " + electionAddr, e);
> throw e;
> } catch (IOException e) {
> LOG.warn("Cannot open channel to " + sid
> + " at election address " + electionAddr,
> e);
> }
> {code}
> Another place in Listener.run() where the client socket is not explicitly 
> closed.





[jira] [Commented] (ZOOKEEPER-2137) Make testPortChange() less flaky

2016-06-08 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320742#comment-15320742
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2137:
---

In:

{code}
public static void testNormalOperation(ZooKeeper writer, ZooKeeper reader)
    try {
        writer.create("/test", "test".getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
+       reader.create("/dummy", "dummy".getBytes(),
+                     ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    } catch (KeeperException.NodeExistsException e) {

    }
{code}

don't you want *each* create() call to be independently wrapped within a 
try/catch NodeExistsException to ensure both writer and reader have an updated 
view of the world?
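The concern can be sketched without a live ensemble — the Store class below is a hypothetical stand-in for a ZooKeeper handle (its NodeExistsException mimics KeeperException.NodeExistsException), showing why a single shared try block would skip the second create when the first path already exists:

```java
import java.util.HashSet;
import java.util.Set;

public class IndependentCreates {
    static class NodeExistsException extends Exception {}

    // Hypothetical stand-in for a ZooKeeper handle: create() fails if the
    // path already exists, like KeeperException.NodeExistsException.
    static class Store {
        final Set<String> nodes = new HashSet<>();
        void create(String path) throws NodeExistsException {
            if (!nodes.add(path)) throw new NodeExistsException();
        }
    }

    // Each create() gets its own try/catch, so an existing /test cannot
    // cause /dummy to be skipped (which one shared try block would allow).
    static void ensureBoth(Store writer, Store reader) {
        try {
            writer.create("/test");
        } catch (NodeExistsException e) {
            // fine: node already there
        }
        try {
            reader.create("/dummy");
        } catch (NodeExistsException e) {
            // fine: node already there
        }
    }

    public static void main(String[] args) {
        Store s = new Store();
        s.nodes.add("/test");      // pre-existing node from an earlier run
        ensureBoth(s, s);
        System.out.println(s.nodes.contains("/dummy"));  // prints "true"
    }
}
```

With the combined try block, the NodeExistsException thrown for /test would transfer control past the /dummy create, so only the per-call wrapping guarantees both nodes end up present.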

> Make testPortChange() less flaky
> 
>
> Key: ZOOKEEPER-2137
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2137
> Project: ZooKeeper
>  Issue Type: Sub-task
>Reporter: Hongchao Deng
>Assignee: Michael Han
> Fix For: 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2137-cb.patch, ZOOKEEPER-2137.patch, 
> ZOOKEEPER-2137.patch, ZOOKEEPER-2137.patch
>
>
> The cause of flaky failure of testPortChange() is a race in sync().
> I figured out it could take some time to fix sync(). Meanwhile, we can make 
> testPortChange() less flaky by doing reconfig on the leader. We can change 
> this back in the fix of ZOOKEEPER-2136.





[jira] [Commented] (ZOOKEEPER-2169) Enable creation of nodes with TTLs

2016-05-15 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15283998#comment-15283998
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2169:
---

[~randgalt]: I did another pass. No show stoppers, generally looking pretty 
clean & nice. Let me know if you agree with the feedback, and let's see where we 
go from there. I am thinking that this might not be the last time we add a new 
znode type, so we might as well think a bit about the future... Thanks!

cc: [~fpj], [~cnauroth]

> Enable creation of nodes with TTLs
> --
>
> Key: ZOOKEEPER-2169
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2169
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, jute, server
>Affects Versions: 3.6.0
>Reporter: Camille Fournier
>Assignee: Jordan Zimmerman
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2169-2.patch, ZOOKEEPER-2169-3.patch, 
> ZOOKEEPER-2169-4.patch, ZOOKEEPER-2169-5.patch, ZOOKEEPER-2169.patch
>
>
> As a user, I would like to be able to create a node that is NOT tied to a 
> session but that WILL expire automatically if action is not taken by some 
> client within a time window.
> I propose this to enable clients interacting with ZK via http or other "thin 
> clients" to create ephemeral-like nodes.
> Some ideas for the design, up for discussion:
> The node should support all normal ZK node operations including ACLs, 
> sequential key generation, etc, however, it should not support the ephemeral 
> flag. The node will be created with a TTL that is updated via a refresh 
> operation. 
> The ZK quorum will watch this node similarly to the way that it watches for 
> session liveness; if the node is not refreshed within the TTL, it will expire.
> QUESTIONS:
> 1) Should we let the refresh operation set the TTL to a different base value?
> 2) If so, should the setting of the TTL to a new base value cause a watch to 
> fire?
> 3) Do we want to allow these nodes to have children or prevent this similar 
> to ephemeral nodes?





[jira] [Commented] (ZOOKEEPER-2024) Major throughput improvement with mixed workloads

2016-05-15 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15283995#comment-15283995
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2024:
---

Cool [~shralex]! Thanks for the hard work. It will be good to know how this 
behaves in prod as early adopters step forward :-) 

> Major throughput improvement with mixed workloads
> -
>
> Key: ZOOKEEPER-2024
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2024
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: quorum, server
>Reporter: Kfir Lev-Ari
>Assignee: Kfir Lev-Ari
> Fix For: 3.5.3
>
> Attachments: ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, 
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, 
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, 
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, 
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch
>
>
> The patch is applied to the commit processor, and solves two problems:
> 1. Stalling - once the commit processor encounters a local write request, it 
> stalls local processing of all sessions until it receives a commit of that 
> request from the leader. 
> In mixed workloads, this severely hampers performance as it does not allow 
> read-only sessions to proceed at faster speed than read-write ones.
> 2. Starvation - as long as there are read requests to process, older remote 
> committed write requests are starved. 
> This occurs due to a bug fix 
> (https://issues.apache.org/jira/browse/ZOOKEEPER-1505) that forces processing 
> of local read requests before handling any committed write. The problem is 
> only manifested under high local read load. 
> Our solution solves these two problems. It improves throughput in mixed 
> workloads (in our tests, by up to 8x), and reduces latency, especially higher 
> percentiles (i.e., slowest requests). 
> The main idea is to separate sessions that inherently need to stall in order 
> to enforce order semantics, from ones that do not need to stall. To this end, 
> we add data structures for buffering and managing pending requests of stalled 
> sessions; these requests are moved out of the critical path to these data 
> structures, allowing continued processing of unaffected sessions. 
> Please see the docs:  
> 1) https://goo.gl/m1cINJ - includes a detailed description of the new commit 
> processor algorithm.
> 2) The attached patch implements our solution, and a collection of related 
> unit tests (https://reviews.apache.org/r/25160)
> 3) https://goo.gl/W0xDUP - performance results. 
> (See https://issues.apache.org/jira/browse/ZOOKEEPER-2023 for the 
> corresponding new system test that produced these performance measurements)
>  
> See also https://issues.apache.org/jira/browse/ZOOKEEPER-1609





[jira] [Commented] (ZOOKEEPER-2024) Major throughput improvement with mixed workloads

2016-05-15 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15283979#comment-15283979
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2024:
---

Well, that was too late I guess... Did Flavio +1 it offline?

> Major throughput improvement with mixed workloads
> -
>
> Key: ZOOKEEPER-2024
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2024
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: quorum, server
>Reporter: Kfir Lev-Ari
>Assignee: Kfir Lev-Ari
> Fix For: 3.5.3
>
> Attachments: ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, 
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, 
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, 
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, 
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch
>
>
> The patch is applied to the commit processor, and solves two problems:
> 1. Stalling - once the commit processor encounters a local write request, it 
> stalls local processing of all sessions until it receives a commit of that 
> request from the leader. 
> In mixed workloads, this severely hampers performance as it does not allow 
> read-only sessions to proceed at faster speed than read-write ones.
> 2. Starvation - as long as there are read requests to process, older remote 
> committed write requests are starved. 
> This occurs due to a bug fix 
> (https://issues.apache.org/jira/browse/ZOOKEEPER-1505) that forces processing 
> of local read requests before handling any committed write. The problem is 
> only manifested under high local read load. 
> Our solution solves these two problems. It improves throughput in mixed 
> workloads (in our tests, by up to 8x), and reduces latency, especially higher 
> percentiles (i.e., slowest requests). 
> The main idea is to separate sessions that inherently need to stall in order 
> to enforce order semantics, from ones that do not need to stall. To this end, 
> we add data structures for buffering and managing pending requests of stalled 
> sessions; these requests are moved out of the critical path to these data 
> structures, allowing continued processing of unaffected sessions. 
> Please see the docs:  
> 1) https://goo.gl/m1cINJ - includes a detailed description of the new commit 
> processor algorithm.
> 2) The attached patch implements our solution, and a collection of related 
> unit tests (https://reviews.apache.org/r/25160)
> 3) https://goo.gl/W0xDUP - performance results. 
> (See https://issues.apache.org/jira/browse/ZOOKEEPER-2023 for the 
> corresponding new system test that produced these performance measurements)
>  
> See also https://issues.apache.org/jira/browse/ZOOKEEPER-1609





[jira] [Commented] (ZOOKEEPER-2024) Major throughput improvement with mixed workloads

2016-05-15 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15283976#comment-15283976
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2024:
---

[~shralex]: thanks for the follow-up Alex. I am doing another pass now.

A small nit, though: the tabs and extra whitespace are still there, which makes 
the patch hard to read via git diff and git show (and possibly in many 
editors).

If you have a git checkout of the repo, try git diff if your patch is not 
committed to a branch, otherwise git show. And you'll see something like:

http://imgur.com/a/8rclv

> Major throughput improvement with mixed workloads
> -
>
> Key: ZOOKEEPER-2024
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2024
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: quorum, server
>Reporter: Kfir Lev-Ari
>Assignee: Kfir Lev-Ari
> Fix For: 3.5.3
>
> Attachments: ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, 
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, 
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, 
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, 
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch
>
>
> The patch is applied to the commit processor, and solves two problems:
> 1. Stalling - once the commit processor encounters a local write request, it 
> stalls local processing of all sessions until it receives a commit of that 
> request from the leader. 
> In mixed workloads, this severely hampers performance as it does not allow 
> read-only sessions to proceed at faster speed than read-write ones.
> 2. Starvation - as long as there are read requests to process, older remote 
> committed write requests are starved. 
> This occurs due to a bug fix 
> (https://issues.apache.org/jira/browse/ZOOKEEPER-1505) that forces processing 
> of local read requests before handling any committed write. The problem is 
> only manifested under high local read load. 
> Our solution solves these two problems. It improves throughput in mixed 
> workloads (in our tests, by up to 8x), and reduces latency, especially higher 
> percentiles (i.e., slowest requests). 
> The main idea is to separate sessions that inherently need to stall in order 
> to enforce order semantics, from ones that do not need to stall. To this end, 
> we add data structures for buffering and managing pending requests of stalled 
> sessions; these requests are moved out of the critical path to these data 
> structures, allowing continued processing of unaffected sessions. 
> Please see the docs:  
> 1) https://goo.gl/m1cINJ - includes a detailed description of the new commit 
> processor algorithm.
> 2) The attached patch implements our solution, and a collection of related 
> unit tests (https://reviews.apache.org/r/25160)
> 3) https://goo.gl/W0xDUP - performance results. 
> (See https://issues.apache.org/jira/browse/ZOOKEEPER-2023 for the 
> corresponding new system test that produced these performance measurements)
>  
> See also https://issues.apache.org/jira/browse/ZOOKEEPER-1609





[jira] [Commented] (ZOOKEEPER-2169) Enable creation of nodes with TTLs

2016-05-08 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275984#comment-15275984
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2169:
---

[~randgalt]: thanks for the patch! I did a first pass on reviewboard, though I 
don't think you've updated reviewboard with the latest version of the patch (to 
include tests, docs, ...). 

> Enable creation of nodes with TTLs
> --
>
> Key: ZOOKEEPER-2169
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2169
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, jute, server
>Affects Versions: 3.6.0
>Reporter: Camille Fournier
>Assignee: Jordan Zimmerman
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2169-2.patch, ZOOKEEPER-2169-3.patch, 
> ZOOKEEPER-2169-4.patch, ZOOKEEPER-2169-5.patch, ZOOKEEPER-2169.patch
>
>
> As a user, I would like to be able to create a node that is NOT tied to a 
> session but that WILL expire automatically if action is not taken by some 
> client within a time window.
> I propose this to enable clients interacting with ZK via http or other "thin 
> clients" to create ephemeral-like nodes.
> Some ideas for the design, up for discussion:
> The node should support all normal ZK node operations including ACLs, 
> sequential key generation, etc, however, it should not support the ephemeral 
> flag. The node will be created with a TTL that is updated via a refresh 
> operation. 
> The ZK quorum will watch this node similarly to the way that it watches for 
> session liveness; if the node is not refreshed within the TTL, it will expire.
> QUESTIONS:
> 1) Should we let the refresh operation set the TTL to a different base value?
> 2) If so, should the setting of the TTL to a new base value cause a watch to 
> fire?
> 3) Do we want to allow these nodes to have children or prevent this similar 
> to ephemeral nodes?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (ZOOKEEPER-2024) Major throughput improvement with mixed workloads

2016-03-27 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213812#comment-15213812
 ] 

Raul Gutierrez Segales edited comment on ZOOKEEPER-2024 at 3/28/16 5:03 AM:


I pinged [~shralex] privately about this, but posting here as well: the 
indentation for comments (along with some extra tabs) went crazy on the last 
version of the patch (see the attached screenshot).

Could we please fix the indentation and remove the tabs/whitespaces? Thanks!


was (Author: rgs):
I pinged [~shralex] privately about this, but posting here as well: the 
indentation for comments (along with some extra tabs) went crazy on the last 
version of the patch.

Could we please fix the indentation and remove the tabs/whitespaces? Thanks!

> Major throughput improvement with mixed workloads
> -
>
> Key: ZOOKEEPER-2024
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2024
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: quorum, server
>Reporter: Kfir Lev-Ari
>Assignee: Kfir Lev-Ari
> Fix For: 3.5.3
>
> Attachments: ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, 
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, 
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, 
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, 
> ZOOKEEPER-2024.patch, indentation-tabs.png
>
>
> The patch is applied to the commit processor, and solves two problems:
> 1. Stalling - once the commit processor encounters a local write request, it 
> stalls local processing of all sessions until it receives a commit of that 
> request from the leader. 
> In mixed workloads, this severely hampers performance as it does not allow 
> read-only sessions to proceed at faster speed than read-write ones.
> 2. Starvation - as long as there are read requests to process, older remote 
> committed write requests are starved. 
> This occurs due to a bug fix 
> (https://issues.apache.org/jira/browse/ZOOKEEPER-1505) that forces processing 
> of local read requests before handling any committed write. The problem is 
> only manifested under high local read load. 
> Our solution solves these two problems. It improves throughput in mixed 
> workloads (in our tests, by up to 8x), and reduces latency, especially higher 
> percentiles (i.e., slowest requests). 
> The main idea is to separate sessions that inherently need to stall in order 
> to enforce order semantics, from ones that do not need to stall. To this end, 
> we add data structures for buffering and managing pending requests of stalled 
> sessions; these requests are moved out of the critical path to these data 
> structures, allowing continued processing of unaffected sessions. 
> Please see the docs:  
> 1) https://goo.gl/m1cINJ - includes a detailed description of the new commit 
> processor algorithm.
> 2) The attached patch implements our solution, and a collection of related 
> unit tests (https://reviews.apache.org/r/25160)
> 3) https://goo.gl/W0xDUP - performance results. 
> See also https://issues.apache.org/jira/browse/ZOOKEEPER-1609





[jira] [Updated] (ZOOKEEPER-2024) Major throughput improvement with mixed workloads

2016-03-27 Thread Raul Gutierrez Segales (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-2024:
--
Attachment: indentation-tabs.png

I pinged [~shralex] privately about this, but posting here as well: the 
indentation for comments (along with some extra tabs) went crazy on the last 
version of the patch.

Could we please fix the indentation and remove the tabs/whitespaces? Thanks!

> Major throughput improvement with mixed workloads
> -
>
> Key: ZOOKEEPER-2024
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2024
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: quorum, server
>Reporter: Kfir Lev-Ari
>Assignee: Kfir Lev-Ari
> Fix For: 3.5.3
>
> Attachments: ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, 
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, 
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, 
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, 
> ZOOKEEPER-2024.patch, indentation-tabs.png
>
>
> The patch is applied to the commit processor, and solves two problems:
> 1. Stalling - once the commit processor encounters a local write request, it 
> stalls local processing of all sessions until it receives a commit of that 
> request from the leader. 
> In mixed workloads, this severely hampers performance as it does not allow 
> read-only sessions to proceed at faster speed than read-write ones.
> 2. Starvation - as long as there are read requests to process, older remote 
> committed write requests are starved. 
> This occurs due to a bug fix 
> (https://issues.apache.org/jira/browse/ZOOKEEPER-1505) that forces processing 
> of local read requests before handling any committed write. The problem is 
> only manifested under high local read load. 
> Our solution solves these two problems. It improves throughput in mixed 
> workloads (in our tests, by up to 8x), and reduces latency, especially higher 
> percentiles (i.e., slowest requests). 
> The main idea is to separate sessions that inherently need to stall in order 
> to enforce order semantics, from ones that do not need to stall. To this end, 
> we add data structures for buffering and managing pending requests of stalled 
> sessions; these requests are moved out of the critical path to these data 
> structures, allowing continued processing of unaffected sessions. 
> Please see the docs:  
> 1) https://goo.gl/m1cINJ - includes a detailed description of the new commit 
> processor algorithm.
> 2) The attached patch implements our solution, and a collection of related 
> unit tests (https://reviews.apache.org/r/25160)
> 3) https://goo.gl/W0xDUP - performance results. 
> See also https://issues.apache.org/jira/browse/ZOOKEEPER-1609





[jira] [Commented] (ZOOKEEPER-2405) getTGT() in Login.java mishandles confidential information

2016-03-26 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213191#comment-15213191
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2405:
---

I am leaning towards removing the LOG.debug call; it's probably just a 
leftover from when this feature was being added. 

> getTGT() in Login.java mishandles confidential information
> --
>
> Key: ZOOKEEPER-2405
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2405
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: kerberos, security, server
>Affects Versions: 3.4.8, 3.5.1, 3.6.0
>Reporter: Patrick Hunt
>Priority: Blocker
> Fix For: 3.4.9, 3.5.2, 3.6.0
>
>
> We're logging the kerberos ticket when in debug mode, probably not the best 
> idea. This was identified as a "critical" issue by Fortify.
> {noformat}
> for(KerberosTicket ticket: tickets) {
> KerberosPrincipal server = ticket.getServer();
> if (server.getName().equals("krbtgt/" + server.getRealm() + "@" + 
> server.getRealm())) {
> LOG.debug("Found tgt " + ticket + ".");
> return ticket;
> }
> }
> {noformat}
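If any logging is kept rather than removing the call outright, a hedged alternative (helper name hypothetical) is to log only non-sensitive ticket metadata, such as the server principal and expiry, and never format the KerberosTicket object itself, whose string form can expose confidential ticket material:

```java
import java.util.Date;

// Hypothetical helper: build a debug message from non-sensitive TGT metadata
// only. Formatting the KerberosTicket object into a log line risks exposing
// confidential ticket material; the principal name and expiry time are safe.
public class SafeTgtLog {
    public static String describe(String serverPrincipal, Date endTime) {
        return "Found tgt for " + serverPrincipal + ", expires " + endTime;
    }
}
```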





[jira] [Commented] (ZOOKEEPER-2024) Major throughput improvement with mixed workloads

2016-03-26 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213147#comment-15213147
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2024:
---

[~kfirlevari]: I am doing another round of reviews and I see that a lot of 
whitespace and tabs sneaked in... do you mind removing them please? They are 
pretty easy to spot via reviewboard or (if you have a git-svn checkout) with 
git diff. Thanks!

> Major throughput improvement with mixed workloads
> -
>
> Key: ZOOKEEPER-2024
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2024
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: quorum, server
>Reporter: Kfir Lev-Ari
>Assignee: Kfir Lev-Ari
> Fix For: 3.5.3
>
> Attachments: ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, 
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, 
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, 
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch
>
>
> The patch is applied to the commit processor, and solves two problems:
> 1. Stalling - once the commit processor encounters a local write request, it 
> stalls local processing of all sessions until it receives a commit of that 
> request from the leader. 
> In mixed workloads, this severely hampers performance as it does not allow 
> read-only sessions to proceed at faster speed than read-write ones.
> 2. Starvation - as long as there are read requests to process, older remote 
> committed write requests are starved. 
> This occurs due to a bug fix 
> (https://issues.apache.org/jira/browse/ZOOKEEPER-1505) that forces processing 
> of local read requests before handling any committed write. The problem is 
> only manifested under high local read load. 
> Our solution solves these two problems. It improves throughput in mixed 
> workloads (in our tests, by up to 8x), and reduces latency, especially higher 
> percentiles (i.e., slowest requests). 
> The main idea is to separate sessions that inherently need to stall in order 
> to enforce order semantics, from ones that do not need to stall. To this end, 
> we add data structures for buffering and managing pending requests of stalled 
> sessions; these requests are moved out of the critical path to these data 
> structures, allowing continued processing of unaffected sessions. 
> Please see the docs:  
> 1) https://goo.gl/m1cINJ - includes a detailed description of the new commit 
> processor algorithm.
> 2) The attached patch implements our solution, and a collection of related 
> unit tests (https://reviews.apache.org/r/25160)
> 3) https://goo.gl/W0xDUP - performance results. 
> See also https://issues.apache.org/jira/browse/ZOOKEEPER-1609





[jira] [Commented] (ZOOKEEPER-2403) zookeeper.skipACL should be a boolean in addition to a yes or no

2016-03-25 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212748#comment-15212748
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2403:
---

[~Ryan P]: I generate my patches with git because I access svn over git. So to 
generate a patch you'd do something like:

{code}
git diff --no-prefix HEAD~1.. > ZOOKEEPER-2403.patch
{code}

once you've committed to your branch.

We could probably rename this ticket to something like 'Consistently handle 
boolean properties' and then handle all the cases pointed out by Pat...

Thanks Ryan!

[~phunt]: makes sense?

> zookeeper.skipACL should be a boolean in addition to a yes or no
> 
>
> Key: ZOOKEEPER-2403
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2403
> Project: ZooKeeper
>  Issue Type: Wish
>Reporter: Ryan P
>Priority: Trivial
>
> Currently zookeeper.skipACL is evaluated to be either yes or no. This is less 
> than intuitive; most developers would expect this to accept true or false. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java#L96-Lundefined





[jira] [Commented] (ZOOKEEPER-2388) Unit tests failing on Solaris

2016-03-18 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198074#comment-15198074
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2388:
---

Thanks for the quick fix [~arshad.mohammad]! +1

cc: [~cnauroth], [~phunt] in case neither of you gets to merge this, I'll do 
so later tonight. Thanks!

> Unit tests failing on Solaris
> -
>
> Key: ZOOKEEPER-2388
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2388
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.5.2
>Reporter: Patrick Hunt
>Assignee: Arshad Mohammad
>Priority: Blocker
> Fix For: 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2388-01.patch
>
>
> The same two tests are failing consistently on Solaris in 3.5/trunk (I don't 
> see similar failures in 3.4, jenkins is mostly green there)
> org.apache.zookeeper.server.quorum.LocalPeerBeanTest.testClientAddress
> org.apache.zookeeper.server.quorum.QuorumPeerTest.testQuorumPeerListendOnSpecifiedClientIP





[jira] [Commented] (ZOOKEEPER-2388) Unit tests failing on Solaris

2016-03-14 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194156#comment-15194156
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2388:
---

[~arshad.mohammad]: FYI, the first failure comes from ZOOKEEPER-2299 and the 
second one from ZOOKEEPER-2301. I _think_ it might be related to binding 
127.0.0.2, but I need to get hold of a solaris box to check this out. 

> Unit tests failing on Solaris
> -
>
> Key: ZOOKEEPER-2388
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2388
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.5.2
>Reporter: Patrick Hunt
>Priority: Blocker
> Fix For: 3.5.2, 3.6.0
>
>
> The same two tests are failing consistently on Solaris in 3.5/trunk (I don't 
> see similar failures in 3.4, jenkins is mostly green there)
> org.apache.zookeeper.server.quorum.LocalPeerBeanTest.testClientAddress
> org.apache.zookeeper.server.quorum.QuorumPeerTest.testQuorumPeerListendOnSpecifiedClientIP





[jira] [Commented] (ZOOKEEPER-2388) Unit tests failing on Solaris

2016-03-14 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194141#comment-15194141
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2388:
---

Thanks for the heads-up [~phunt] - looking!



> Unit tests failing on Solaris
> -
>
> Key: ZOOKEEPER-2388
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2388
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.5.2
>Reporter: Patrick Hunt
>Priority: Blocker
> Fix For: 3.5.2, 3.6.0
>
>
> The same two tests are failing consistently on Solaris in 3.5/trunk (I don't 
> see similar failures in 3.4, jenkins is mostly green there)
> org.apache.zookeeper.server.quorum.LocalPeerBeanTest.testClientAddress
> org.apache.zookeeper.server.quorum.QuorumPeerTest.testQuorumPeerListendOnSpecifiedClientIP





[jira] [Commented] (ZOOKEEPER-2379) recent commit broke findbugs qabot check

2016-03-04 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15180729#comment-15180729
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2379:
---

Tested locally, that doesn't work (it errors with DC_PARTIALLY_CONSTRUCTED). 
So let's go with this patch, +1. Thanks [~rakeshr]!

[~cnauroth]: can you merge it? Otherwise I'll get to it later today, thanks 
Chris!

> recent commit broke findbugs qabot check
> 
>
> Key: ZOOKEEPER-2379
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2379
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.4.9, 3.5.2, 3.6.0
>Reporter: Patrick Hunt
>Assignee: Rakesh R
>Priority: Blocker
> Fix For: 3.4.9, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2379.patch
>
>
> A recent commit seems to have broken findbugs, looks like it's in 
> ZooKeeperSaslClient
> see:
> https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3075//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html





[jira] [Commented] (ZOOKEEPER-2379) recent commit broke findbugs qabot check

2016-03-04 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15180711#comment-15180711
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2379:
---

[~rakeshr], [~cnauroth]: would marking:

{code}
 private static Login login = null;
{code}

as volatile be enough to make findbugs happy?

> recent commit broke findbugs qabot check
> 
>
> Key: ZOOKEEPER-2379
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2379
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.4.9, 3.5.2, 3.6.0
>Reporter: Patrick Hunt
>Assignee: Rakesh R
>Priority: Blocker
> Fix For: 3.4.9, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2379.patch
>
>
> A recent commit seems to have broken findbugs, looks like it's in 
> ZooKeeperSaslClient
> see:
> https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3075//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html





[jira] [Commented] (ZOOKEEPER-2379) recent commit broke findbugs qabot check

2016-03-04 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15180703#comment-15180703
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2379:
---

[~rakeshr]: [~cnauroth]: looking! thanks for the heads-up!

> recent commit broke findbugs qabot check
> 
>
> Key: ZOOKEEPER-2379
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2379
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.4.9, 3.5.2, 3.6.0
>Reporter: Patrick Hunt
>Assignee: Rakesh R
>Priority: Blocker
> Fix For: 3.4.9, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2379.patch
>
>
> A recent commit seems to have broken findbugs, looks like it's in 
> ZooKeeperSaslClient
> see:
> https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3075//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html





[jira] [Commented] (ZOOKEEPER-2366) Reconfiguration of client port causes a socket leak

2016-02-28 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171425#comment-15171425
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2366:
---

[~timothyjward]: thanks for writing the test case! Mind generating the 
patch with something like:

{code}
git diff --no-prefix HEAD~1.. > ZOOKEEPER-2366.patch
{code}

so that it applies cleanly and CI can run.. Thanks!

> Reconfiguration of client port causes a socket leak
> ---
>
> Key: ZOOKEEPER-2366
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2366
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.5.0
>Reporter: Timothy Ward
>Priority: Critical
> Fix For: 3.5.2
>
> Attachments: zookeeper.patch
>
>
> The NIOServerCnxnFactory reconfigure method can leak server sockets, and 
> hence make ports unusable until the JVM restarts:
> The first line of the method takes a reference to the current 
> ServerSocketChannel and then the next line replaces it. The subsequent 
> interactions with the server socket can fail (for example if the 
> reconfiguration tries to bind to an in-use port). If they fail *before* the  
> call to oldSS.close() then oldSS is *never* closed. This holds that port open 
> forever, and prevents the user from rolling back to the previous port!
> The code from reconfigure is shown below:
> {noformat}
> ServerSocketChannel oldSS = ss;
> try {
>     this.ss = ServerSocketChannel.open();
>     ss.socket().setReuseAddress(true);
>     LOG.info("binding to port " + addr);
>     ss.socket().bind(addr);
>     ss.configureBlocking(false);
>     acceptThread.setReconfiguring();
>     oldSS.close();
>     acceptThread.wakeupSelector();
>     try {
>         acceptThread.join();
>     } catch (InterruptedException e) {
>         LOG.error("Error joining old acceptThread when reconfiguring client port " + e.getMessage());
>     }
>     acceptThread = new AcceptThread(ss, addr, selectorThreads);
>     acceptThread.start();
> } catch (IOException e) {
>     LOG.error("Error reconfiguring client port to " + addr + " " + e.getMessage());
> }
> {noformat}
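A hedged sketch of a leak-free shape (an illustration, not the committed fix): closing the old channel in a finally block ensures a bind failure on the new port cannot leave the previous ServerSocketChannel, and hence the old port, held until the JVM restarts.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

// Sketch: the old channel is closed whether or not re-binding succeeds, and a
// half-configured new channel is closed on failure, so a failed
// reconfiguration no longer pins any port until JVM restart.
public class ReconfigureSketch {
    ServerSocketChannel ss;

    public void reconfigure(InetSocketAddress addr) throws IOException {
        ServerSocketChannel oldSS = ss;
        ServerSocketChannel newSS = ServerSocketChannel.open();
        try {
            newSS.socket().setReuseAddress(true);
            newSS.socket().bind(addr);   // may throw, e.g. port already in use
            newSS.configureBlocking(false);
            ss = newSS;
        } catch (IOException e) {
            newSS.close();               // don't leak the half-configured channel
            throw e;
        } finally {
            if (oldSS != null) {
                oldSS.close();           // the old port is released either way
            }
        }
    }
}
```

On failure the old port is already released, so the caller can retry or reconfigure back to it, which is exactly what the reported leak prevented.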





[jira] [Commented] (ZOOKEEPER-2375) The synchronize method of createSaslClient in ZooKeeperSaslClient can't be synchronize

2016-02-28 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171422#comment-15171422
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2375:
---

Thanks [~yuemeng] for the patch! +1.

[~rakeshr], [~arshad.mohammad]: mind giving it a look? I'll merge after I get a 
2nd +1.

> The synchronize method of createSaslClient in ZooKeeperSaslClient can't be 
> synchronize
> --
>
> Key: ZOOKEEPER-2375
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2375
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.6, 3.4.8, 3.5.0, 3.5.1
>Reporter: yuemeng
>Priority: Blocker
> Fix For: 3.5.2
>
> Attachments: ZOOKEEPER-2375.patch, ZOOKEEPER-2375_01.patch
>
>
> If many ZooKeeperSaslClient instances exist in one process, each instance 
> calls the synchronized method createSaslClient(). But synchronized on an 
> instance method only locks that instance's own object, while every instance 
> can access the static variable login; the synchronization therefore cannot 
> prevent other threads from accessing the static login object. As a result, 
> more than one ZooKeeperSaslClient instance may end up using the same login 
> object, and login.startThreadIfNeeded() may be called more than once for 
> the same login object.
> This causes the following problem:
>  ERROR | [Executor task launch worker-1-SendThread(fi1:24002)] | Exception 
> while trying to create SASL client: java.lang.IllegalThreadStateException | 
> org.apache.zookeeper.client.ZooKeeperSaslClient.createSaslClient(ZooKeeperSaslClient.java:305)
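The hazard described above, an instance-level synchronized method guarding a static field, can be sketched minimally (names are illustrative, not the actual Login/ZooKeeperSaslClient API): one class-level lock shared by all instances makes the check-and-create atomic.

```java
// Minimal sketch of the fix direction: a synchronized *instance* method cannot
// guard a *static* field, because each instance locks its own monitor. A
// single class-level lock makes the check-and-create of the static object
// atomic, so the equivalent of login.startThreadIfNeeded() runs at most once.
public class LoginHolder {
    private static volatile Object login;            // stands in for the static Login
    private static final Object LOCK = new Object(); // one lock for all instances
    static int creations = 0;                        // visible for testing only

    public Object getOrCreateLogin() {
        Object l = login;
        if (l == null) {
            synchronized (LOCK) {                    // class-level, not 'this'
                l = login;
                if (l == null) {
                    l = new Object();                // new Login(...) in reality
                    creations++;
                    login = l;
                }
            }
        }
        return l;
    }
}
```

Note the field is volatile so the double-checked read outside the lock is safe under the Java memory model.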





[jira] [Commented] (ZOOKEEPER-2377) zkServer.sh should resolve canonical path from symlinks

2016-02-28 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171408#comment-15171408
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2377:
---

Thanks for the patch [~sdh]! Lgtm, +1.

[~cnauroth], [~eribeiro]: could I get some additional eyes? Would this work for 
all platforms?

> zkServer.sh should resolve canonical path from symlinks
> ---
>
> Key: ZOOKEEPER-2377
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2377
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: scripts
>Affects Versions: 3.4.8
> Environment: Centos 6
>Reporter: Siddhartha
>Priority: Minor
> Attachments: ZOOKEEPER-2377.patch
>
>
> If zkServer.sh is started from a symlink, it is not able to correctly source 
> the other scripts because it looks in the wrong path.
> Attached patch fixes this by first resolving absolute path to the script.





[jira] [Commented] (ZOOKEEPER-2024) Major throughput improvement with mixed workloads

2016-02-23 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159543#comment-15159543
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2024:
---

+1 on aiming for trunk first.

[~shralex]: one remaining concern (will re-read the code today) is with losing 
watches. We had a very similar patch at Twitter but we ended up with some weird 
corner cases in which watches were lost.

cc: [~svoutil], [~tnarg]

> Major throughput improvement with mixed workloads
> -
>
> Key: ZOOKEEPER-2024
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2024
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: quorum, server
>Reporter: Kfir Lev-Ari
>Assignee: Kfir Lev-Ari
> Fix For: 3.5.2
>
> Attachments: ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, 
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, 
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, 
> ZOOKEEPER-2024.patch
>
>
> The patch is applied to the commit processor, and solves two problems:
> 1. Stalling - once the commit processor encounters a local write request, it 
> stalls local processing of all sessions until it receives a commit of that 
> request from the leader. 
> In mixed workloads, this severely hampers performance as it does not allow 
> read-only sessions to proceed at faster speed than read-write ones.
> 2. Starvation - as long as there are read requests to process, older remote 
> committed write requests are starved. 
> This occurs due to a bug fix 
> (https://issues.apache.org/jira/browse/ZOOKEEPER-1505) that forces processing 
> of local read requests before handling any committed write. The problem is 
> only manifested under high local read load. 
> Our solution solves these two problems. It improves throughput in mixed 
> workloads (in our tests, by up to 8x), and reduces latency, especially higher 
> percentiles (i.e., slowest requests). 
> The main idea is to separate sessions that inherently need to stall in order 
> to enforce order semantics, from ones that do not need to stall. To this end, 
> we add data structures for buffering and managing pending requests of stalled 
> sessions; these requests are moved out of the critical path to these data 
> structures, allowing continued processing of unaffected sessions. 
> Please see the docs:  
> 1) https://goo.gl/m1cINJ - includes a detailed description of the new commit 
> processor algorithm.
> 2) The attached patch implements our solution, and a collection of related 
> unit tests (https://reviews.apache.org/r/25160)
> 3) https://goo.gl/W0xDUP - performance results. 
> See also https://issues.apache.org/jira/browse/ZOOKEEPER-1609





[jira] [Updated] (ZOOKEEPER-2314) Improve SASL documentation

2016-02-05 Thread Raul Gutierrez Segales (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-2314:
--
Fix Version/s: (was: 3.4.8)
   3.4.9

> Improve SASL documentation
> --
>
> Key: ZOOKEEPER-2314
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2314
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 3.4.6, 3.5.1
>Reporter: Flavio Junqueira
>Assignee: Flavio Junqueira
>Priority: Blocker
> Fix For: 3.4.9, 3.5.2, 3.6.0
>
>
> Points that occur to me right now:
> # The login object in ZooKeeperSaslClient is static, which means that if you 
> try to create another client for tests, the login object will be the first 
> one you've set for all runs. I've experienced this with 3.4.6.
> # There are a number of properties spread across the code that do not appear 
> in the docs. For example, zookeeper.allowSaslFailedClients isn't documented 
> afaict.





[jira] [Commented] (ZOOKEEPER-2314) Improve SASL documentation

2016-02-05 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135175#comment-15135175
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2314:
---

Let's move it to 3.4.9; the spirit of 3.4.8 was to fix the shutdown issue in 
3.4.7. 

> Improve SASL documentation
> --
>
> Key: ZOOKEEPER-2314
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2314
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 3.4.6, 3.5.1
>Reporter: Flavio Junqueira
>Assignee: Flavio Junqueira
>Priority: Blocker
> Fix For: 3.4.8, 3.5.2, 3.6.0
>
>
> Points that occur to me right now:
> # The login object in ZooKeeperSaslClient is static, which means that if you 
> try to create another client for tests, the login object will be the first 
> one you've set for all runs. I've experienced this with 3.4.6.
> # There are a number of properties spread across the code that do not appear 
> in the docs. For example, zookeeper.allowSaslFailedClients isn't documented 
> afaict.





[jira] [Commented] (ZOOKEEPER-2358) NettyServerCnxn leaks watches upon close

2016-02-05 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135484#comment-15135484
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2358:
---

[~fpj], [~iandi]: punting this for 3.4.9 (but do ping me when this is ready for 
review). 

> NettyServerCnxn leaks watches upon close
> 
>
> Key: ZOOKEEPER-2358
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2358
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.7, 3.5.1
>Reporter: Ian Dimayuga
>Assignee: Ian Dimayuga
> Fix For: 3.4.9, 3.5.2
>
> Attachments: ZOOKEEPER-2358-3.4.patch, ZOOKEEPER-2358.patch
>
>
> NettyServerCnxn.close() neglects to call zkServer.removeCnxn the way 
> NIOServerCnxn.close() does. Also, WatchLeakTest does not test watch leaks in 
> Netty.





[jira] [Updated] (ZOOKEEPER-2358) NettyServerCnxn leaks watches upon close

2016-02-05 Thread Raul Gutierrez Segales (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-2358:
--
Fix Version/s: (was: 3.4.8)
   3.4.9

> NettyServerCnxn leaks watches upon close
> 
>
> Key: ZOOKEEPER-2358
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2358
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.7, 3.5.1
>Reporter: Ian Dimayuga
>Assignee: Ian Dimayuga
> Fix For: 3.4.9, 3.5.2
>
> Attachments: ZOOKEEPER-2358-3.4.patch, ZOOKEEPER-2358.patch
>
>
> NettyServerCnxn.close() neglects to call zkServer.removeCnxn the way 
> NIOServerCnxn.close() does. Also, WatchLeakTest does not test watch leaks in 
> Netty.





[jira] [Commented] (ZOOKEEPER-2344) Provide more diagnostics/stack traces on SASL Auth failure

2016-02-05 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135488#comment-15135488
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2344:
---

[~cnauroth], [~steve_l]: lets target this for 3.4.9 - thanks!

> Provide more diagnostics/stack traces on SASL Auth failure
> --
>
> Key: ZOOKEEPER-2344
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2344
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: java client, server
>Affects Versions: 3.4.7, 3.5.1
>Reporter: Steve Loughran
> Fix For: 3.4.9, 3.5.2, 3.6.0
>
>
> When Kerberos decides it doesn't want to work, the JRE libraries provide some 
> terse and unhelpful error messages.
> The only way to debug the problem is (a) to have complete stack traces and 
> (b) as much related information as possible.
> Zookeeper could do more here. Currently too much of the code loses stack 
> traces; sometimes auth errors aren't reported back to the client (the 
> connection is closed) +others
> Everyone who has tried to diagnose kerberos problems will appreciate 
> improvements here





[jira] [Updated] (ZOOKEEPER-1460) IPv6 literal address not supported for quorum members

2016-02-05 Thread Raul Gutierrez Segales (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-1460:
--
Fix Version/s: (was: 3.4.8)
   3.4.9

> IPv6 literal address not supported for quorum members
> -
>
> Key: ZOOKEEPER-1460
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1460
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.3
>Reporter: Chris Dolan
>Assignee: Joseph Walton
> Fix For: 3.4.9, 3.5.2, 3.6.0
>
> Attachments: 
> ZOOKEEPER-1460-accept-square-bracket-delimited-IPv6-literals.2.diff, 
> ZOOKEEPER-1460-accept-square-bracket-delimited-IPv6-literals.diff, 
> ZOOKEEPER-1460-for-3.5.0.patch, ZOOKEEPER-1460.003.patch
>
>
> Via code inspection, I see that the "server.nnn" configuration key does not 
> support literal IPv6 addresses because the property value is split on ":". In 
> v3.4.3, the problem is in QuorumPeerConfig:
> {noformat}
> String parts[] = value.split(":");
> InetSocketAddress addr = new InetSocketAddress(parts[0],
> Integer.parseInt(parts[1]));
> {noformat}
> In the current trunk 
> (http://svn.apache.org/viewvc/zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java?view=markup)
>  this code has been refactored into QuorumPeer.QuorumServer, but the bug 
> remains:
> {noformat}
> String serverClientParts[] = addressStr.split(";");
> String serverParts[] = serverClientParts[0].split(":");
> addr = new InetSocketAddress(serverParts[0],
> Integer.parseInt(serverParts[1]));
> {noformat}
> This bug probably affects very few users because most will naturally use a 
> hostname rather than a literal IP address. But given that IPv6 addresses are 
> supported for clients via ZOOKEEPER-667 it seems that server support should 
> be fixed too.
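The bracketed-literal convention the attached patches aim for can be sketched as follows. This is an illustration with a hypothetical class name, not ZooKeeper's actual parser, and it splits off only the first port field:

```java
// Illustrative parser for host:port values that also accepts square-bracketed
// IPv6 literals; splitting blindly on ":" breaks addresses like [::1]:2888.
// Real server.nnn entries carry more fields (host:port:port[;clientPort]);
// this sketch handles only a host and one port.
public class QuorumAddrParser {
    public static String[] parse(String value) {
        String host;
        String port;
        if (value.startsWith("[")) {
            int end = value.indexOf(']');
            if (end < 0) {
                throw new IllegalArgumentException("unterminated IPv6 literal: " + value);
            }
            host = value.substring(1, end);          // address between brackets
            String rest = value.substring(end + 1);
            if (!rest.startsWith(":")) {
                throw new IllegalArgumentException("missing port: " + value);
            }
            port = rest.substring(1);
        } else {
            int colon = value.indexOf(':');          // safe: no colons in host
            if (colon < 0) {
                throw new IllegalArgumentException("missing port: " + value);
            }
            host = value.substring(0, colon);
            port = value.substring(colon + 1);
        }
        return new String[] { host, port };
    }
}
```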





[jira] [Commented] (ZOOKEEPER-1460) IPv6 literal address not supported for quorum members

2016-02-05 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135489#comment-15135489
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1460:
---

[~cnauroth]: lets target this in 3.4.9 - thanks!

> IPv6 literal address not supported for quorum members
> -
>
> Key: ZOOKEEPER-1460
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1460
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.3
>Reporter: Chris Dolan
>Assignee: Joseph Walton
> Fix For: 3.4.9, 3.5.2, 3.6.0
>
> Attachments: 
> ZOOKEEPER-1460-accept-square-bracket-delimited-IPv6-literals.2.diff, 
> ZOOKEEPER-1460-accept-square-bracket-delimited-IPv6-literals.diff, 
> ZOOKEEPER-1460-for-3.5.0.patch, ZOOKEEPER-1460.003.patch
>
>
> Via code inspection, I see that the "server.nnn" configuration key does not 
> support literal IPv6 addresses because the property value is split on ":". In 
> v3.4.3, the problem is in QuorumPeerConfig:
> {noformat}
> String parts[] = value.split(":");
> InetSocketAddress addr = new InetSocketAddress(parts[0],
> Integer.parseInt(parts[1]));
> {noformat}
> In the current trunk 
> (http://svn.apache.org/viewvc/zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java?view=markup)
>  this code has been refactored into QuorumPeer.QuorumServer, but the bug 
> remains:
> {noformat}
> String serverClientParts[] = addressStr.split(";");
> String serverParts[] = serverClientParts[0].split(":");
> addr = new InetSocketAddress(serverParts[0],
> Integer.parseInt(serverParts[1]));
> {noformat}
> This bug probably affects very few users because most will naturally use a 
> hostname rather than a literal IP address. But given that IPv6 addresses are 
> supported for clients via ZOOKEEPER-667 it seems that server support should 
> be fixed too.
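The split-on-colon bug described above can be sketched in a few lines of Java. This is an illustrative sketch only (`HostPortParser` is a hypothetical name, not ZooKeeper's actual parsing code): it accepts square-bracketed IPv6 literals such as `[2001:db8::1]:2888` as well as plain `host:port` strings, whereas splitting on every `:` mis-parses the IPv6 form.

```java
// Hypothetical sketch of bracket-aware host:port parsing. Splitting the
// whole string on ':' (as the quoted QuorumPeer code does) would treat
// "2001" as the host and "db8" as the port for an IPv6 literal.
public class HostPortParser {

    /** Returns {host, port} for "host:port" or "[ipv6-literal]:port". */
    static String[] parseHostPort(String s) {
        if (s.startsWith("[")) {
            int close = s.indexOf(']');
            if (close < 0 || close + 1 >= s.length() || s.charAt(close + 1) != ':') {
                throw new IllegalArgumentException("Bad server address: " + s);
            }
            // Strip the brackets; the port follows the "]:" delimiter.
            return new String[] { s.substring(1, close), s.substring(close + 2) };
        }
        // For host:port, split on the last ':' only.
        int colon = s.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Bad server address: " + s);
        }
        return new String[] { s.substring(0, colon), s.substring(colon + 1) };
    }

    public static void main(String[] args) {
        String[] v6 = parseHostPort("[2001:db8::1]:2888");
        if (!v6[0].equals("2001:db8::1") || !v6[1].equals("2888")) {
            throw new AssertionError(java.util.Arrays.toString(v6));
        }
        String[] v4 = parseHostPort("10.0.0.1:2888");
        if (!v4[0].equals("10.0.0.1") || !v4[1].equals("2888")) {
            throw new AssertionError(java.util.Arrays.toString(v4));
        }
    }
}
```

The actual patches attached to this issue handle the full `server.N` syntax (election port, `;`-separated client port); the sketch covers only the bracket-delimiting idea.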





[jira] [Commented] (ZOOKEEPER-2323) ZooKeeper client enters into infinite AuthFailedException cycle if its unable to recreate Kerberos ticket

2016-02-05 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135492#comment-15135492
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2323:
---

[~fpj], [~Jobo], [~arshad.mohammad]: punting for 3.4.9. 

> ZooKeeper client enters into infinite AuthFailedException cycle if its unable 
> to recreate Kerberos ticket
> -
>
> Key: ZOOKEEPER-2323
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2323
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.7, 3.5.1
>Reporter: Arshad Mohammad
>Assignee: Arshad Mohammad
> Fix For: 3.4.8, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2323-01.patch
>
>
> ZooKeeper client enters an infinite AuthFailedException cycle: it throws 
> AuthFailedException for every operation.
> Here is the exception from a create operation:
> {code}
> org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = 
> AuthFailed for /continuousRunningZKClient
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>   at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1753)
> {code}
> This can be reproduced easily with the following steps:
> # Reduce the ZooKeeper client principal's maximum ticket life, for example to 
> 2 minutes, using the command {color:blue} modprinc -maxlife 2min zkcli {color} 
> in kadmin. (This shortens the time needed to reproduce the issue.)
> # Connect the client to the ZooKeeper quorum, let it get connected, and 
> perform some operations successfully.
> # Disconnect the client's network, for example by pulling out the Ethernet 
> cable. The client is now in the disconnected state, no operations are 
> expected, and it tries to reconnect to the different servers in the 
> ZooKeeper quorum.
> # After two minutes the client tries to get a new Kerberos ticket and fails.
> # Reconnect the client to the network. The client returns to the connected 
> state but gets AuthFailedException for every operation.





[jira] [Updated] (ZOOKEEPER-832) Invalid session id causes infinite loop during automatic reconnect

2016-02-05 Thread Raul Gutierrez Segales (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-832:
-
Fix Version/s: (was: 3.4.8)
   3.4.9

> Invalid session id causes infinite loop during automatic reconnect
> --
>
> Key: ZOOKEEPER-832
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-832
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.5, 3.5.0
> Environment: All
>Reporter: Ryan Holmes
>Assignee: Arshad Mohammad
>Priority: Critical
> Fix For: 3.4.9, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, 
> ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, 
> ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, 
> ZOOKEEPER-832.patch, ZOOKEEPER-832.patch
>
>
> Steps to reproduce:
> 1.) Connect to a standalone server using the Java client.
> 2.) Stop the server.
> 3.) Delete the contents of the data directory (i.e. the persisted session 
> data).
> 4.) Start the server.
> The client now automatically tries to reconnect but the server refuses the 
> connection because the session id is invalid. The client and server are now 
> in an infinite loop of attempted and rejected connections. While this 
> situation represents a catastrophic failure and the current behavior is not 
> incorrect, it appears that there is no way to detect this situation on the 
> client and therefore no way to recover.
> The suggested improvement is to send an event to the default watcher 
> indicating that the current state is "session invalid", similar to how the 
> "session expired" state is handled.
> Server log output (repeats indefinitely):
> 2010-08-05 11:48:08,283 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@250] - 
> Accepted socket connection from /127.0.0.1:63292
> 2010-08-05 11:48:08,284 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@751] - Refusing 
> session request for client /127.0.0.1:63292 as it has seen zxid 0x44 our last 
> zxid is 0x0 client must try another server
> 2010-08-05 11:48:08,284 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1434] - Closed 
> socket connection for client /127.0.0.1:63292 (no session established for 
> client)
> Client log output (repeats indefinitely):
> 11:47:17 org.apache.zookeeper.ClientCnxn startConnect INFO line 1000 - 
> Opening socket connection to server localhost/127.0.0.1:2181
> 11:47:17 org.apache.zookeeper.ClientCnxn run WARN line 1120 - Session 
> 0x12a3ae4e893000a for server null, unexpected error, closing socket 
> connection and attempting reconnect
> java.net.ConnectException: Connection refused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
> 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1167 - Ignoring 
> exception during shutdown input
> java.nio.channels.ClosedChannelException
>   at 
> sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
>   at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1164)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129)
> 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1174 - Ignoring 
> exception during shutdown output
> java.nio.channels.ClosedChannelException
>   at 
> sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
>   at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1171)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129)





[jira] [Commented] (ZOOKEEPER-2044) CancelledKeyException in zookeeper 3.4.5

2016-02-05 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135491#comment-15135491
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2044:
---

[~abranzyck] - thanks for the patch and sorry for the lack of review. Given we 
are running late with 3.4.8, let's get to this in 3.4.9. 

> CancelledKeyException in zookeeper 3.4.5
> 
>
> Key: ZOOKEEPER-2044
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2044
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
> Environment: Red Hat Enterprise Linux Server release 6.2
>Reporter: shamjith antholi
>Assignee: Germán Blanco
>Priority: Minor
> Fix For: 3.4.9
>
> Attachments: ZOOKEEPER-2044.patch
>
>
> I am getting a CancelledKeyException in ZooKeeper (version 3.4.5). Please see 
> the log below. When this error is thrown, the connected Solr shard goes down 
> with the error "Failed to index metadata in 
> Solr,StackTrace=SolrError: HTTP status 503.Reason: 
> {"responseHeader":{"status":503,"QTime":204},"error":{"msg":"ClusterState 
> says we are the leader, but locally we don't think so","code":503"  and 
> ultimately the current activity goes down. Could you please give a solution 
> for this?
> ZooKeeper log 
> --
> 2014-09-16 02:58:47,799 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client 
> attempting to renew session 0x24868e7ca980003 at /172.22.0.5:58587
> 2014-09-16 02:58:47,800 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@107] - Revalidating 
> client: 0x24868e7ca980003
> 2014-09-16 02:58:47,802 [myid:1] - INFO  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@588] - Invalid 
> session 0x24868e7ca980003 for client /172.22.0.5:58587, probably expired
> 2014-09-16 02:58:47,803 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed 
> socket connection for client /172.22.0.5:58587 which had sessionid 
> 0x24868e7ca980003
> 2014-09-16 02:58:47,810 [myid:1] - ERROR 
> [CommitProcessor:1:NIOServerCnxn@180] - Unexpected Exception:
> java.nio.channels.CancelledKeyException
> at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
> at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
> at 
> org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
> at 
> org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1076)
> at 
> org.apache.zookeeper.server.NIOServerCnxn.process(NIOServerCnxn.java:1113)
> at org.apache.zookeeper.server.DataTree.setWatches(DataTree.java:1327)
> at 
> org.apache.zookeeper.server.ZKDatabase.setWatches(ZKDatabase.java:384)
> at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:304)
> at 
> org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
>  
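For context, the stack trace above reflects a race between a connection being closed (which cancels its selection key) and another thread touching that key. The usual NIO idiom, checking `SelectionKey.isValid()` before mutating interest ops, can be illustrated with a self-contained sketch; this shows the general shape of such a guard, not the exact patch attached to this issue:

```java
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public class CancelledKeyGuard {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        Pipe pipe = Pipe.open();
        pipe.source().configureBlocking(false);
        SelectionKey key = pipe.source().register(selector, SelectionKey.OP_READ);

        // Simulate the connection being closed concurrently: cancelling the
        // key invalidates it immediately.
        key.cancel();

        // Without this guard, key.interestOps(...) would now throw
        // java.nio.channels.CancelledKeyException, as in the trace above.
        // Checking validity first turns the race into a harmless no-op.
        if (key.isValid()) {
            key.interestOps(SelectionKey.OP_WRITE);
        }

        pipe.source().close();
        pipe.sink().close();
        selector.close();
    }
}
```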





[jira] [Updated] (ZOOKEEPER-2344) Provide more diagnostics/stack traces on SASL Auth failure

2016-02-05 Thread Raul Gutierrez Segales (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-2344:
--
Fix Version/s: (was: 3.4.8)
   3.4.9

> Provide more diagnostics/stack traces on SASL Auth failure
> --
>
> Key: ZOOKEEPER-2344
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2344
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: java client, server
>Affects Versions: 3.4.7, 3.5.1
>Reporter: Steve Loughran
> Fix For: 3.4.9, 3.5.2, 3.6.0
>
>
> When Kerberos decides it doesn't want to work, the JRE libraries provide some 
> terse and unhelpful error messages.
> The only way to debug the problem is (a) to have complete stack traces and 
> (b) as much related information as possible.
> Zookeeper could do more here. Currently too much of the code loses stack 
> traces; sometimes auth errors aren't reported back to the client (the 
> connection is closed) +others
> Everyone who has tried to diagnose kerberos problems will appreciate 
> improvements here





[jira] [Updated] (ZOOKEEPER-2044) CancelledKeyException in zookeeper 3.4.5

2016-02-05 Thread Raul Gutierrez Segales (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-2044:
--
Fix Version/s: (was: 3.4.8)
   3.4.9

> CancelledKeyException in zookeeper 3.4.5
> 
>
> Key: ZOOKEEPER-2044
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2044
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
> Environment: Red Hat Enterprise Linux Server release 6.2
>Reporter: shamjith antholi
>Assignee: Germán Blanco
>Priority: Minor
> Fix For: 3.4.9
>
> Attachments: ZOOKEEPER-2044.patch
>
>
> I am getting a CancelledKeyException in ZooKeeper (version 3.4.5). Please see 
> the log below. When this error is thrown, the connected Solr shard goes down 
> with the error "Failed to index metadata in 
> Solr,StackTrace=SolrError: HTTP status 503.Reason: 
> {"responseHeader":{"status":503,"QTime":204},"error":{"msg":"ClusterState 
> says we are the leader, but locally we don't think so","code":503"  and 
> ultimately the current activity goes down. Could you please give a solution 
> for this?
> ZooKeeper log 
> --
> 2014-09-16 02:58:47,799 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client 
> attempting to renew session 0x24868e7ca980003 at /172.22.0.5:58587
> 2014-09-16 02:58:47,800 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@107] - Revalidating 
> client: 0x24868e7ca980003
> 2014-09-16 02:58:47,802 [myid:1] - INFO  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@588] - Invalid 
> session 0x24868e7ca980003 for client /172.22.0.5:58587, probably expired
> 2014-09-16 02:58:47,803 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed 
> socket connection for client /172.22.0.5:58587 which had sessionid 
> 0x24868e7ca980003
> 2014-09-16 02:58:47,810 [myid:1] - ERROR 
> [CommitProcessor:1:NIOServerCnxn@180] - Unexpected Exception:
> java.nio.channels.CancelledKeyException
> at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
> at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
> at 
> org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
> at 
> org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1076)
> at 
> org.apache.zookeeper.server.NIOServerCnxn.process(NIOServerCnxn.java:1113)
> at org.apache.zookeeper.server.DataTree.setWatches(DataTree.java:1327)
> at 
> org.apache.zookeeper.server.ZKDatabase.setWatches(ZKDatabase.java:384)
> at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:304)
> at 
> org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
>  





[jira] [Updated] (ZOOKEEPER-2192) Port "Introduce new ZNode type: container" to 3.4.x

2016-02-05 Thread Raul Gutierrez Segales (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-2192:
--
Fix Version/s: (was: 3.4.8)
   3.4.9

> Port "Introduce new ZNode type: container" to 3.4.x
> ---
>
> Key: ZOOKEEPER-2192
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2192
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: c client, java client, server
>Affects Versions: 3.4.6
>Reporter: Jordan Zimmerman
>Assignee: Jordan Zimmerman
> Fix For: 3.4.9
>
> Attachments: ZOOKEEPER-2192.patch, ZOOKEEPER-2192.patch
>
>
> ZOOKEEPER-2163 applies to the trunk branch. This feature is too needed to 
> wait for 3.5.x. So, port the feature to the 3.4.x branch so it can be 
> released ahead of 3.5.x.





[jira] [Updated] (ZOOKEEPER-1884) zkCli silently ignores commands with missing parameters

2016-02-05 Thread Raul Gutierrez Segales (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-1884:
--
Fix Version/s: (was: 3.4.8)
   3.4.9

> zkCli silently ignores commands with missing parameters
> ---
>
> Key: ZOOKEEPER-1884
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1884
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.6
>Reporter: Flavio Junqueira
>Assignee: Raul Gutierrez Segales
>Priority: Minor
> Fix For: 3.4.9
>
> Attachments: ZOOKEEPER-1884.patch
>
>
> Apparently, we have fixed this in trunk, but not in the 3.4 branch. When we 
> pass only the path to create, the command is not executed because it expects 
> an additional parameter and there is no error message because the create 
> command exists.





[jira] [Updated] (ZOOKEEPER-2184) Zookeeper Client should re-resolve hosts when connection attempts fail

2016-02-05 Thread Raul Gutierrez Segales (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-2184:
--
Fix Version/s: (was: 3.4.8)
   3.4.9

> Zookeeper Client should re-resolve hosts when connection attempts fail
> --
>
> Key: ZOOKEEPER-2184
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2184
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.6, 3.5.0
> Environment: Ubuntu 14.04 host, Docker containers for Zookeeper & 
> Kafka
>Reporter: Robert P. Thille
>Assignee: Robert P. Thille
>  Labels: easyfix, patch
> Fix For: 3.4.9, 3.5.2
>
> Attachments: ZOOKEEPER-2184.patch
>
>
> Testing in a Docker environment with a single Kafka instance using a single 
> Zookeeper instance. Restarting the Zookeeper container will cause it to 
> receive a new IP address. Kafka will never be able to reconnect to Zookeeper 
> and will hang indefinitely. Updating DNS or /etc/hosts with the new IP 
> address will not help the client to reconnect as the 
> zookeeper/client/StaticHostProvider resolves the connection string hosts at 
> creation time and never re-resolves.
> A solution would be for the client to notice that connection attempts fail 
> and attempt to re-resolve the hostnames in the connectString.
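The proposed behaviour can be sketched in a few lines; the class and method names below are hypothetical and do not mirror `StaticHostProvider`'s real API. The key point is that constructing a fresh `InetSocketAddress` performs a new DNS lookup, whereas caching the resolved address at client-creation time pins the client to a possibly stale IP:

```java
import java.net.InetSocketAddress;

// Hypothetical host provider that re-resolves on every connection attempt,
// instead of resolving once at construction as described in the report.
public class ReResolvingHostProvider {
    private final String host;
    private final int port;

    public ReResolvingHostProvider(String host, int port) {
        this.host = host;
        this.port = port;
    }

    /**
     * Called before each connection attempt. Building a new
     * InetSocketAddress triggers a fresh hostname resolution rather than
     * reusing an address cached when the client was created.
     */
    public InetSocketAddress next() {
        return new InetSocketAddress(host, port);
    }

    public static void main(String[] args) {
        ReResolvingHostProvider p = new ReResolvingHostProvider("localhost", 2181);
        InetSocketAddress a = p.next();
        if (a.getPort() != 2181 || !"localhost".equals(a.getHostString())) {
            throw new AssertionError("unexpected address: " + a);
        }
    }
}
```

Note that the JVM's own DNS cache (`networkaddress.cache.ttl`) can still pin lookup results unless configured, so re-resolving in the client is necessary but not always sufficient.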





[jira] [Commented] (ZOOKEEPER-2192) Port "Introduce new ZNode type: container" to 3.4.x

2016-02-05 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135495#comment-15135495
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2192:
---

Given the issue we had with 3.4.7, I am now leaning towards not back-porting 
this into 3.4 and directing our efforts towards making 3.5 ready for wide 
consumption. 

> Port "Introduce new ZNode type: container" to 3.4.x
> ---
>
> Key: ZOOKEEPER-2192
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2192
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: c client, java client, server
>Affects Versions: 3.4.6
>Reporter: Jordan Zimmerman
>Assignee: Jordan Zimmerman
> Fix For: 3.4.9
>
> Attachments: ZOOKEEPER-2192.patch, ZOOKEEPER-2192.patch
>
>
> ZOOKEEPER-2163 applies to the trunk branch. This feature is too needed to 
> wait for 3.5.x. So, port the feature to the 3.4.x branch so it can be 
> released ahead of 3.5.x.





[jira] [Updated] (ZOOKEEPER-2243) Supported platforms is completely out of date

2016-02-05 Thread Raul Gutierrez Segales (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-2243:
--
Fix Version/s: (was: 3.4.8)
   3.4.9

> Supported platforms is completely out of date
> -
>
> Key: ZOOKEEPER-2243
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2243
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ivan Kelly
>Assignee: Chris Nauroth
> Fix For: 3.4.9, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2243-branch-3.4.001.patch, 
> ZOOKEEPER-2243.001.patch
>
>
> http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html#sc_supportedPlatforms
> It refers to Solaris as Sun Solaris so it's at least 5 years out of date.
> We should "support" the platforms that we are running ZooKeeper on regularly, 
> so I suggest paring it down to Linux and Windows (Mac OS doesn't really count 
> because people don't run it on servers anymore). Everything else should be 
> "may work, not supported, but will fix obvious bugs".





[jira] [Updated] (ZOOKEEPER-2323) ZooKeeper client enters into infinite AuthFailedException cycle if its unable to recreate Kerberos ticket

2016-02-05 Thread Raul Gutierrez Segales (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-2323:
--
Fix Version/s: (was: 3.4.8)
   3.4.9

> ZooKeeper client enters into infinite AuthFailedException cycle if its unable 
> to recreate Kerberos ticket
> -
>
> Key: ZOOKEEPER-2323
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2323
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.7, 3.5.1
>Reporter: Arshad Mohammad
>Assignee: Arshad Mohammad
> Fix For: 3.4.9, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2323-01.patch
>
>
> ZooKeeper client enters an infinite AuthFailedException cycle: it throws 
> AuthFailedException for every operation.
> Here is the exception from a create operation:
> {code}
> org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = 
> AuthFailed for /continuousRunningZKClient
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>   at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1753)
> {code}
> This can be reproduced easily with the following steps:
> # Reduce the ZooKeeper client principal's maximum ticket life, for example to 
> 2 minutes, using the command {color:blue} modprinc -maxlife 2min zkcli {color} 
> in kadmin. (This shortens the time needed to reproduce the issue.)
> # Connect the client to the ZooKeeper quorum, let it get connected, and 
> perform some operations successfully.
> # Disconnect the client's network, for example by pulling out the Ethernet 
> cable. The client is now in the disconnected state, no operations are 
> expected, and it tries to reconnect to the different servers in the 
> ZooKeeper quorum.
> # After two minutes the client tries to get a new Kerberos ticket and fails.
> # Reconnect the client to the network. The client returns to the connected 
> state but gets AuthFailedException for every operation.





[jira] [Updated] (ZOOKEEPER-2154) NPE in KeeperException

2016-02-05 Thread Raul Gutierrez Segales (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-2154:
--
Fix Version/s: (was: 3.4.8)
   3.4.9

> NPE in KeeperException
> --
>
> Key: ZOOKEEPER-2154
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2154
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.6
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
> Fix For: 3.4.9, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2154.patch
>
>
> KeeperException should handle exception is code is null...





[jira] [Commented] (ZOOKEEPER-2360) Update commons collections version used by tests/releaseaudit

2016-02-04 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133212#comment-15133212
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2360:
---

[~cnauroth], [~phunt]: I am planning on cutting an RC for 3.4.8 tonight - is 
this a must-have? Can we delay it for 3.4.9?

> Update commons collections version used by tests/releaseaudit
> -
>
> Key: ZOOKEEPER-2360
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2360
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.4.7, 3.5.1
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
>Priority: Blocker
> Fix For: 3.4.8, 3.5.2
>
> Attachments: ZOOKEEPER-2360-branch34.patch, ZOOKEEPER-2360.patch, 
> ZOOKEEPER-2360.patch
>
>
> I don't believe this affects us from a security perspective directly; 
> however, it's something we should clean up in our next release.
> Afaict the only commons we use for shipping/production code is commons-cli. 
> Our two release branches, 3.4 and 3.5, neither of them use 
> commons-collections. I looked at the binary release artifact and it doesn't 
> include the commons collections jar.
> We do have a test that uses CollectionsUtils, but no shipping code. I 
> downloaded our 3.4 and 3.5 artifacts, this is all I see:
> phunt:~/Downloads/zd/5/zookeeper-3.5.1-alpha$ grep -R 
> "org.apache.commons.collections" .
> ./src/java/test/org/apache/zookeeper/RemoveWatchesTest.java:import 
> org.apache.commons.collections.CollectionUtils;
> phunt:~/Downloads/zd/5/zookeeper-3.5.1-alpha$
> Also in our ivy file we have
>  rev="0.10" conf="releaseaudit->default"/>
>  rev="2.6" conf="releaseaudit->default"/>
>  rev="3.2.1" conf="releaseaudit->default"/>
> So commons-collections is pulled in - but only for the release audit, which 
> is something we do as a build verification activity but not part of the 
> product itself.





[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log

2016-02-02 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128705#comment-15128705
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2247:
---

[~rakeshr], [~fpj]: could we wrap this today please? Thanks!

> Zookeeper service becomes unavailable when leader fails to write transaction 
> log
> 
>
> Key: ZOOKEEPER-2247
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Arshad Mohammad
>Assignee: Arshad Mohammad
>Priority: Critical
> Fix For: 3.4.9, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, 
> ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, 
> ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch, 
> ZOOKEEPER-2247-10.patch, ZOOKEEPER-2247-11.patch, ZOOKEEPER-2247-b3.5.patch
>
>
> ZooKeeper service becomes unavailable when the leader fails to write the 
> transaction log. Below are the exceptions:
> {code}
> 2015-08-14 15:41:18,556 [myid:100] - ERROR 
> [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, 
> from thread : SyncThread:100
> java.io.IOException: Input/output error
>   at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
>   at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
>   at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380)
>   at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113)
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread 
> SyncThread:100 exits, error code 1
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer@523] - shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:SessionTrackerImpl@232] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:LeaderRequestProcessor@77] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:PrepRequestProcessor@1035] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:ProposalRequestProcessor@88] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [SyncThread:100:CommitProcessor@356] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop!
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor 
> complete
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:SyncRequestProcessor@191] - Shutting down
> 2015-08-14 15:41:18,563 [myid:100] - INFO  [ProcessThread(sid:100 
> cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop!
> {code}
> After this exception the leader server still remains the leader. After such a 
> non-recoverable exception the leader should go down and let another follower 
> become the leader.
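The expected behaviour, a fatal error in a critical server thread escalating into a leader step-down rather than a half-dead leader, can be sketched as follows. All names here (`ServerListener`, `QuorumPeerStub`) are illustrative stand-ins, not the classes touched by the attached patches:

```java
public class CriticalErrorDemo {
    /** Listener notified when a critical server thread dies. */
    interface ServerListener {
        void notifyStopping(String threadName, int errorCode);
    }

    /** Stand-in for the quorum peer: on a fatal error it abandons leadership. */
    static class QuorumPeerStub implements ServerListener {
        boolean leadershipAbandoned = false;

        @Override
        public void notifyStopping(String threadName, int errorCode) {
            // Instead of logging and limping on as a zombie leader, step down
            // and rejoin election so the ensemble can pick a healthy leader.
            leadershipAbandoned = true;
        }
    }

    public static void main(String[] args) {
        QuorumPeerStub peer = new QuorumPeerStub();
        try {
            // Simulated fsync failure in the transaction-log sync thread.
            throw new java.io.IOException("Input/output error");
        } catch (java.io.IOException e) {
            peer.notifyStopping("SyncThread:100", 1);
        }
        if (!peer.leadershipAbandoned) {
            throw new AssertionError("leader kept leadership after fatal txnlog error");
        }
    }
}
```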





[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log

2016-02-01 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126714#comment-15126714
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2247:
---

Thanks [~rakeshr]!

Mind taking one more look [~fpj]?

> Zookeeper service becomes unavailable when leader fails to write transaction 
> log
> 
>
> Key: ZOOKEEPER-2247
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Arshad Mohammad
>Assignee: Arshad Mohammad
>Priority: Critical
> Fix For: 3.4.9, 3.5.2
>
> Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, 
> ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, 
> ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch, 
> ZOOKEEPER-2247-10.patch
>
>
> ZooKeeper service becomes unavailable when the leader fails to write the 
> transaction log. Below are the exceptions:
> {code}
> 2015-08-14 15:41:18,556 [myid:100] - ERROR 
> [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, 
> from thread : SyncThread:100
> java.io.IOException: Input/output error
>   at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
>   at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
>   at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380)
>   at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113)
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread 
> SyncThread:100 exits, error code 1
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer@523] - shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:SessionTrackerImpl@232] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:LeaderRequestProcessor@77] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:PrepRequestProcessor@1035] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:ProposalRequestProcessor@88] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [SyncThread:100:CommitProcessor@356] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop!
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor 
> complete
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:SyncRequestProcessor@191] - Shutting down
> 2015-08-14 15:41:18,563 [myid:100] - INFO  [ProcessThread(sid:100 
> cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop!
> {code}
> After this exception the leader server still remains the leader. After this 
> non-recoverable exception the leader should go down and let the other 
> followers elect a new leader.
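The failure mode quoted above (a fatal fsync error kills SyncThread while the process keeps its leadership) can be sketched in miniature. Everything below is a hypothetical stand-in, not ZooKeeper's real API: `Server`, `notifyCriticalError`, and `syncThread` only illustrate the behaviour the reporter asks for, namely that a severe unrecoverable error should force the server out of the LEADING state so the remaining followers can elect a new leader.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

public class CriticalErrorSketch {
    enum State { LEADING, SHUTDOWN }

    // Hypothetical stand-in for the server plus its critical-error listener.
    static final class Server {
        final AtomicReference<State> state = new AtomicReference<>(State.LEADING);

        void notifyCriticalError(String thread, Throwable t) {
            // The requested behaviour: a severe unrecoverable error forces a
            // shutdown (allowing a new leader election) instead of merely
            // logging while the dead SyncThread leaves the leader wedged.
            System.err.println("Severe unrecoverable error from " + thread + ": " + t);
            state.set(State.SHUTDOWN);
        }
    }

    // Stand-in for the sync thread's flush path, where the transaction-log
    // commit (an fsync) can fail with an IOException.
    static void syncThread(Server server) {
        try {
            throw new IOException("Input/output error"); // simulated fsync failure
        } catch (IOException e) {
            server.notifyCriticalError("SyncThread:100", e);
        }
    }

    public static void main(String[] args) {
        Server server = new Server();
        syncThread(server);
        // After the critical error the server must no longer be leading.
        if (server.state.get() != State.SHUTDOWN) throw new AssertionError();
        System.out.println("state=" + server.state.get());
    }
}
```

The actual patch on this issue may differ in mechanism; this sketch only mirrors the state transition the description calls for.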



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log

2016-01-29 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15123914#comment-15123914
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2247:
---

Per discussion in the mailing list, let's punt this to 3.4.9.

cc: [~fpj]

> Zookeeper service becomes unavailable when leader fails to write transaction 
> log
> 
>
> Key: ZOOKEEPER-2247
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Arshad Mohammad
>Assignee: Arshad Mohammad
>Priority: Critical
> Fix For: 3.4.9, 3.5.2
>
> Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, 
> ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, 
> ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch
>
>
> ZooKeeper service becomes unavailable when the leader fails to write the 
> transaction log. Below are the exceptions:
> {code}
> 2015-08-14 15:41:18,556 [myid:100] - ERROR 
> [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, 
> from thread : SyncThread:100
> java.io.IOException: Input/output error
>   at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
>   at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
>   at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380)
>   at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113)
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread 
> SyncThread:100 exits, error code 1
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer@523] - shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:SessionTrackerImpl@232] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:LeaderRequestProcessor@77] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:PrepRequestProcessor@1035] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:ProposalRequestProcessor@88] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [SyncThread:100:CommitProcessor@356] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop!
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor 
> complete
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:SyncRequestProcessor@191] - Shutting down
> 2015-08-14 15:41:18,563 [myid:100] - INFO  [ProcessThread(sid:100 
> cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop!
> {code}
> After this exception the leader server still remains the leader. After this 
> non-recoverable exception the leader should go down and let the other 
> followers elect a new leader.





[jira] [Updated] (ZOOKEEPER-2355) Ephemeral node is never deleted if follower fails while reading the proposal packet

2016-01-29 Thread Raul Gutierrez Segales (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-2355:
--
Fix Version/s: 3.4.9

> Ephemeral node is never deleted if follower fails while reading the proposal 
> packet
> ---
>
> Key: ZOOKEEPER-2355
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2355
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum, server
>Reporter: Arshad Mohammad
>Assignee: Arshad Mohammad
>Priority: Critical
> Fix For: 3.4.9
>
> Attachments: ZOOKEEPER-2355-01.patch
>
>
> A ZooKeeper ephemeral node is never deleted if a follower fails while reading 
> the proposal packet.
> The scenario is as follows:
> # Configure a three-node ZooKeeper cluster, say nodes A, B and C; start all, 
> and assume A is the leader while B and C are followers
> # Connect to any of the servers and create ephemeral node /e1
> # Close the session; ephemeral node /e1 will go for deletion
> # While it is receiving the delete proposal, make Follower B fail with 
> {{SocketTimeoutException}}. We need to do this to reproduce the scenario; in 
> a production environment it happens because of a network fault.
> # Remove the fault, and check that the faulted follower is now connected to 
> the quorum
> # Connect to any of the servers and create the same ephemeral node /e1; the 
> create succeeds
> # Close the session; ephemeral node /e1 will go for deletion
> # {color:red}/e1 is not deleted from the faulted Follower B. It should have 
> been deleted, as it was created again with another session{color}
> # {color:green}/e1 is deleted from Leader A and the other Follower C{color}
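The steps above can be modelled with a toy in-memory sketch. Class and method names are illustrative only (not ZooKeeper's real DataTree API): each replica maps an ephemeral path to its owning session id, an ephemeral create is a no-op when the path already exists, and closing a session deletes only the nodes that session owns. Dropping one close-session proposal on follower B then leaves /e1 stale there forever:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StaleEphemeralSketch {
    // Each replica's view of the ephemeral namespace: path -> owning session id.
    static Map<String, Integer> leaderA = new HashMap<>();
    static Map<String, Integer> followerB = new HashMap<>(); // the faulted follower
    static Map<String, Integer> followerC = new HashMap<>();

    // An ephemeral create is skipped if the path already exists (NodeExists).
    static void create(String path, int session, Map<String, Integer> replica) {
        replica.putIfAbsent(path, session);
    }

    // Closing a session deletes only the ephemerals that session owns.
    static void closeSession(int session, Map<String, Integer> replica) {
        replica.values().removeIf(owner -> owner == session);
    }

    public static void main(String[] args) {
        // Steps 2-3: session 1 creates /e1; all three replicas apply it.
        for (Map<String, Integer> r : List.of(leaderA, followerB, followerC))
            create("/e1", 1, r);

        // Step 4: follower B misses the close-session proposal (the simulated
        // SocketTimeoutException), so only A and C delete session 1's node.
        closeSession(1, leaderA);
        closeSession(1, followerC);

        // Steps 6-7: session 2 recreates /e1, then closes. On B the create is
        // a no-op (the node still exists, owned by session 1), so the later
        // close of session 2 deletes nothing there.
        for (Map<String, Integer> r : List.of(leaderA, followerB, followerC))
            create("/e1", 2, r);
        closeSession(2, leaderA);
        closeSession(2, followerB);
        closeSession(2, followerC);

        System.out.println("A has /e1: " + leaderA.containsKey("/e1"));   // false
        System.out.println("B has /e1: " + followerB.containsKey("/e1")); // true -- stale
        System.out.println("C has /e1: " + followerC.containsKey("/e1")); // false
    }
}
```

This only illustrates why the stale node survives repeated create/delete cycles; the real divergence involves zxids and proposal replay, which this toy model does not capture.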





[jira] [Updated] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log

2016-01-29 Thread Raul Gutierrez Segales (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-2247:
--
Fix Version/s: (was: 3.4.8)
   3.4.9

> Zookeeper service becomes unavailable when leader fails to write transaction 
> log
> 
>
> Key: ZOOKEEPER-2247
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Arshad Mohammad
>Assignee: Arshad Mohammad
>Priority: Critical
> Fix For: 3.4.9, 3.5.2
>
> Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, 
> ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, 
> ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch
>
>
> ZooKeeper service becomes unavailable when the leader fails to write the 
> transaction log. Below are the exceptions:
> {code}
> 2015-08-14 15:41:18,556 [myid:100] - ERROR 
> [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, 
> from thread : SyncThread:100
> java.io.IOException: Input/output error
>   at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
>   at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
>   at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380)
>   at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113)
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread 
> SyncThread:100 exits, error code 1
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer@523] - shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:SessionTrackerImpl@232] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:LeaderRequestProcessor@77] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:PrepRequestProcessor@1035] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:ProposalRequestProcessor@88] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [SyncThread:100:CommitProcessor@356] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop!
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor 
> complete
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:SyncRequestProcessor@191] - Shutting down
> 2015-08-14 15:41:18,563 [myid:100] - INFO  [ProcessThread(sid:100 
> cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop!
> {code}
> After this exception the leader server still remains the leader. After this 
> non-recoverable exception the leader should go down and let the other 
> followers elect a new leader.




