[jira] [Comment Edited] (ZOOKEEPER-2792) [QP MutualAuth]: Port ZOOKEEPER-1045 implementation from branch-3.4 to branch-3.5

2017-08-22 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137729#comment-16137729
 ] 

Camille Fournier edited comment on ZOOKEEPER-2792 at 8/23/17 1:21 AM:
--

I'm confused, why isn't this in HEAD? [~hanm] I tried to merge something from 
head back to 3.5 and it failed because this major change is missing there, 
which seems weird.


was (Author: fournc):
I'm confused, why isn't this in HEAD?

> [QP MutualAuth]: Port ZOOKEEPER-1045 implementation from branch-3.4 to 
> branch-3.5
> -
>
> Key: ZOOKEEPER-2792
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2792
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: quorum, security
>Reporter: Rakesh R
>Assignee: Michael Han
> Fix For: 3.5.4
>
> Attachments: ZOOKEEPER-2792.patch
>
>
> This jira is to merge the basic working patch covering the authentication and 
> authorization of static (zoo.cfg) ZooKeeper servers from the {{branch-3.4}} 
> code base.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2792) [QP MutualAuth]: Port ZOOKEEPER-1045 implementation from branch-3.4 to branch-3.5

2017-08-22 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137729#comment-16137729
 ] 

Camille Fournier commented on ZOOKEEPER-2792:
-

I'm confused, why isn't this in HEAD?

> [QP MutualAuth]: Port ZOOKEEPER-1045 implementation from branch-3.4 to 
> branch-3.5
> -
>
> Key: ZOOKEEPER-2792
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2792
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: quorum, security
>Reporter: Rakesh R
>Assignee: Michael Han
> Fix For: 3.5.4
>
> Attachments: ZOOKEEPER-2792.patch
>
>
> This jira is to merge the basic working patch covering the authentication and 
> authorization of static (zoo.cfg) ZooKeeper servers from the {{branch-3.4}} 
> code base.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-1442) Uncaught exception handler should exit on a java.lang.Error

2017-07-28 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104966#comment-16104966
 ] 

Camille Fournier commented on ZOOKEEPER-1442:
-

Oh 2012, how we missed you. Apparently this is still an issue! We should 
probably fix it. 

> Uncaught exception handler should exit on a java.lang.Error
> ---
>
> Key: ZOOKEEPER-1442
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1442
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client, server
>Affects Versions: 3.4.3, 3.3.5
>Reporter: Jeremy Stribling
>Assignee: Jeremy Stribling
>Priority: Minor
> Attachments: ZOOKEEPER-1442.patch, ZOOKEEPER-1442.patch, 
> ZOOKEEPER-1442.patch
>
>
> The uncaught exception handler registered in NIOServerCnxnFactory and 
> ClientCnxn simply logs exceptions and lets the rest of ZooKeeper go on its 
> merry way.  However, errors such as OutOfMemoryErrors should really crash the 
> program, as they represent unrecoverable errors.  If the exception that gets 
> to the uncaught exception handler is an instanceof a java.lang.Error, ZK 
> should exit with an error code (in addition to logging the error).
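
For illustration only, a minimal sketch of the kind of handler being described; the class and exit-code names are hypothetical, not the attached patch:

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ExitOnErrorHandler implements Thread.UncaughtExceptionHandler {
    private static final Logger LOG = LoggerFactory.getLogger(ExitOnErrorHandler.class);

    // Hypothetical exit code; the real patch may pick a different value.
    private static final int FATAL_EXIT_CODE = 1;

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        LOG.error("Uncaught exception in thread " + t.getName(), e);
        if (e instanceof Error) {
            // Errors such as OutOfMemoryError are unrecoverable, so exit
            // instead of letting the rest of the server limp along.
            System.exit(FATAL_EXIT_CODE);
        }
    }
}
{code}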



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-07-25 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100640#comment-16100640
 ] 

Camille Fournier commented on ZOOKEEPER-2770:
-

Are there really 10s-long slow requests? It's defaults like this that make me 
skeptical about the usefulness of this particular implementation. If a request 
through ZK takes 10s to process, your whole system is completely effed.

I don't think we should add complexity to the code base without suitable 
justification for the value of the new feature. With that in mind, I'd like to 
understand specifically what circumstances we're trying to measure. It looks 
like processing time for a request through the ZK quorum alone, correct? The 
only network time that might be captured would be, in the case of a write, the 
quorum voting time.

I'm all for making ZK more operable and exposing metrics but I don't think 
exposing low-value metrics is worth the additional code complexity without 
justification.

> ZooKeeper slow operation log
> 
>
> Key: ZOOKEEPER-2770
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Assignee: Karan Mehta
> Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why 
> any given read or write operation may become slow: a software bug, a protocol 
> problem, a hardware issue with the commit log(s), a network issue. If the 
> problem is constant it is trivial to come to an understanding of the cause. 
> However in order to diagnose intermittent problems we often don't know where, 
> or when, to begin looking. We need some sort of timestamped indication of the 
> problem. Although ZooKeeper is not a datastore, it does persist data, and can 
> suffer intermittent performance degradation, and should consider implementing 
> a 'slow query' log, a feature very common to services which persist 
> information on behalf of clients which may be sensitive to latency while 
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 
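
This is not the attached patch, just a hedged sketch of the threshold check the description asks for; all names (SlowRequestLogger, slowThresholdMs, arrivalTimeMs) are placeholders:

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public final class SlowRequestLogger {
    private static final Logger LOG = LoggerFactory.getLogger(SlowRequestLogger.class);

    // Configurable threshold in milliseconds (placeholder, not a real ZK property).
    private final long slowThresholdMs;

    public SlowRequestLogger(long slowThresholdMs) {
        this.slowThresholdMs = slowThresholdMs;
    }

    /** Log client and request details if the request waited longer than the threshold. */
    public void maybeLogSlow(String clientAddr, String requestSummary, long arrivalTimeMs) {
        long elapsedMs = System.currentTimeMillis() - arrivalTimeMs;
        if (elapsedMs > slowThresholdMs) {
            LOG.warn("responseTooSlow: client={} request={} elapsedMs={}",
                    clientAddr, requestSummary, elapsedMs);
        }
    }
}
{code}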



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-07-25 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100551#comment-16100551
 ] 

Camille Fournier commented on ZOOKEEPER-2770:
-

I completely agree with [~tdunning]; I don't understand the motivation for this. 
Are we just timing the internal processing time for the request? ZK is not the 
same type of system as HBase, so I'm not sure it's an apples-to-apples 
comparison to cross-implement this feature.

> ZooKeeper slow operation log
> 
>
> Key: ZOOKEEPER-2770
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Assignee: Karan Mehta
> Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why 
> any given read or write operation may become slow: a software bug, a protocol 
> problem, a hardware issue with the commit log(s), a network issue. If the 
> problem is constant it is trivial to come to an understanding of the cause. 
> However in order to diagnose intermittent problems we often don't know where, 
> or when, to begin looking. We need some sort of timestamped indication of the 
> problem. Although ZooKeeper is not a datastore, it does persist data, and can 
> suffer intermittent performance degradation, and should consider implementing 
> a 'slow query' log, a feature very common to services which persist 
> information on behalf of clients which may be sensitive to latency while 
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2733) Cleanup findbug warnings in branch-3.4: Dodgy code Warnings

2017-06-21 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058409#comment-16058409
 ] 

Camille Fournier commented on ZOOKEEPER-2733:
-

Did this cause a javadoc regression in the branch? We've got build failures on 
outstanding PRs in this branch because of that [~rakeshr]

> Cleanup findbug warnings in branch-3.4: Dodgy code Warnings
> ---
>
> Key: ZOOKEEPER-2733
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2733
> Project: ZooKeeper
>  Issue Type: Sub-task
>Affects Versions: 3.4.10
>Reporter: Rakesh R
>Assignee: Abraham Fine
> Fix For: 3.4.11
>
>
> Please refer to the attached sheet in the parent jira. Below are the details 
> of the findbugs warnings.
> {code}
> DB    org.apache.zookeeper.server.quorum.auth.SaslQuorumAuthLearner.send(DataOutputStream, byte[]) uses the same code for two branches
> DLS   Dead store to txn in org.apache.zookeeper.server.quorum.LearnerHandler.packetToString(QuorumPacket)
> NP    Load of known null value in org.apache.zookeeper.server.PrepRequestProcessor.pRequest(Request)
> NP    Possible null pointer dereference in org.apache.zookeeper.server.PurgeTxnLog.purgeOlderSnapshots(FileTxnSnapLog, File) due to return value of called method
> NP    Possible null pointer dereference in org.apache.zookeeper.server.PurgeTxnLog.purgeOlderSnapshots(FileTxnSnapLog, File) due to return value of called method
> NP    Load of known null value in org.apache.zookeeper.server.quorum.auth.SaslQuorumAuthLearner.send(DataOutputStream, byte[])
> NP    Load of known null value in org.apache.zookeeper.server.quorum.auth.SaslQuorumAuthServer.send(DataOutputStream, byte[], QuorumAuth$Status)
> NP    Possible null pointer dereference in org.apache.zookeeper.server.upgrade.UpgradeMain.copyFiles(File, File, String) due to return value of called method
> RCN   Redundant nullcheck of bytes, which is known to be non-null in org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next()
> SF    Switch statement found in org.apache.zookeeper.server.PrepRequestProcessor.pRequest(Request) where default case is missing
> SF    Switch statement found in org.apache.zookeeper.server.PrepRequestProcessor.pRequest2Txn(int, long, Request, Record, boolean) where default case is missing
> SF    Switch statement found in org.apache.zookeeper.server.quorum.AuthFastLeaderElection$Messenger$WorkerReceiver.run() where default case is missing
> SF    Switch statement found in org.apache.zookeeper.server.quorum.AuthFastLeaderElection$Messenger$WorkerSender.process(AuthFastLeaderElection$ToSend) where default case is missing
> SF    Switch statement found in org.apache.zookeeper.server.quorum.Follower.processPacket(QuorumPacket) where default case is missing
> SF    Switch statement found in org.apache.zookeeper.server.quorum.Observer.processPacket(QuorumPacket) where default case is missing
> ST    Write to static field org.apache.zookeeper.server.SyncRequestProcessor.randRoll from instance method org.apache.zookeeper.server.SyncRequestProcessor.run()
> UrF   Unread public/protected field: org.apache.zookeeper.server.upgrade.DataTreeV1$ProcessTxnResult.err
> UrF   Unread public/protected field: org.apache.zookeeper.server.upgrade.DataTreeV1$ProcessTxnResult.path
> UrF   Unread public/protected field: org.apache.zookeeper.server.upgrade.DataTreeV1$ProcessTxnResult.stat
> UrF   Unread public/protected field: org.apache.zookeeper.server.upgrade.DataTreeV1$ProcessTxnResult.type
> {code}
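
Purely as an illustration of what fixes for the SF ("switch statement where default case is missing") class of warning usually look like, here is a self-contained sketch; the enum and class names below are made up for the example, not ZooKeeper code:

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SwitchDefaultExample {
    private static final Logger LOG = LoggerFactory.getLogger(SwitchDefaultExample.class);

    // Hypothetical packet type, standing in for the real constants.
    enum PacketType { PING, PROPOSAL, ACK }

    // Adding an explicit default branch is what clears the SF warning: an
    // unexpected value is now logged instead of being silently ignored.
    static void process(PacketType type) {
        switch (type) {
            case PING:
                LOG.info("ping");
                break;
            case PROPOSAL:
                LOG.info("proposal");
                break;
            default:
                LOG.warn("Unexpected packet type: {}", type);
                break;
        }
    }
}
{code}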



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (ZOOKEEPER-2719) Port ZOOKEEPER-2169 to 3.5 branch

2017-03-17 Thread Camille Fournier (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Camille Fournier resolved ZOOKEEPER-2719.
-
Resolution: Fixed

Issue resolved by pull request 192
[https://github.com/apache/zookeeper/pull/192]

> Port ZOOKEEPER-2169 to 3.5 branch
> -
>
> Key: ZOOKEEPER-2719
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2719
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: java client, server
>Reporter: Jordan Zimmerman
>Assignee: Jordan Zimmerman
> Fix For: 3.5.3
>
>
> ZOOKEEPER-2169 is a useful feature that should be deployed sooner than later. 
> Take the work done in the master branch and port it to the 3.5 branch



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (ZOOKEEPER-2724) Skip cert files for releaseaudit target.

2017-03-17 Thread Camille Fournier (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Camille Fournier resolved ZOOKEEPER-2724.
-
Resolution: Fixed

> Skip cert files for releaseaudit target.
> 
>
> Key: ZOOKEEPER-2724
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2724
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.5.2
>Reporter: Michael Han
>Assignee: Michael Han
>Priority: Blocker
>  Labels: build
> Fix For: 3.5.3
>
>
> In branch-3.5, release auditing generates warnings against cert files because 
> these files don't contain the Apache License (AL) header. I don't think these 
> files should be checked, because they are not source files, and we already 
> skip them in the master branch. We should do the same for branch-3.5 by 
> skipping these cert files as well. This should be fixed before the 3.5.3 
> release.
> Attaching a snippet of the warnings for reference:
> {noformat}
> [rat:report]  !? 
> /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build/zookeeper-3.5.3-alpha-SNAPSHOT/contrib/rest/conf/keys/rest.cer
> [rat:report]  !? 
> /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build/zookeeper-3.5.3-alpha-SNAPSHOT/src/contrib/rest/conf/keys/rest.cer
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ZOOKEEPER-2608) Create CLI option for TTL ephemerals

2017-03-14 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15924837#comment-15924837
 ] 

Camille Fournier commented on ZOOKEEPER-2608:
-

Is there a PR or just the patches here [~randgalt]?

> Create CLI option for TTL ephemerals
> 
>
> Key: ZOOKEEPER-2608
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2608
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: c client, java client, jute, server
>Reporter: Camille Fournier
>Assignee: Jordan Zimmerman
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2608-2.patch, ZOOKEEPER-2608-3.patch, 
> ZOOKEEPER-2608.patch
>
>
> Need to update CreateCommand to have the TTL node option



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ZOOKEEPER-2608) Create CLI option for TTL ephemerals

2017-03-14 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15924824#comment-15924824
 ] 

Camille Fournier commented on ZOOKEEPER-2608:
-

looking

> Create CLI option for TTL ephemerals
> 
>
> Key: ZOOKEEPER-2608
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2608
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: c client, java client, jute, server
>Reporter: Camille Fournier
>Assignee: Jordan Zimmerman
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2608-2.patch, ZOOKEEPER-2608-3.patch, 
> ZOOKEEPER-2608.patch
>
>
> Need to update CreateCommand to have the TTL node option



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ZOOKEEPER-2615) Zookeeper server holds onto dead/expired session ids in the watch data structures

2016-10-28 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15615399#comment-15615399
 ] 

Camille Fournier commented on ZOOKEEPER-2615:
-

Because watches are one-time triggers, you could try triggering the watch by 
removing, adding, or changing the data in the watched nodes. I believe that 
should remove all set watches. But there's no easy fix that I can see.
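
For example, a hedged sketch of that workaround with the plain Java client; the path is a placeholder, and this only fires data/exists watches on that node, not child watches:

{code}
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class FireWatchesExample {
    // Rewriting the node's data in place triggers NodeDataChanged, which fires
    // (and therefore clears) any data/exists watches registered on the path.
    public static void fireWatches(ZooKeeper zk, String path) throws Exception {
        Stat stat = new Stat();
        byte[] data = zk.getData(path, false, stat);
        zk.setData(path, data, stat.getVersion());
    }
}
{code}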

> Zookeeper server holds onto dead/expired session ids in the watch data 
> structures
> -
>
> Key: ZOOKEEPER-2615
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2615
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: guoping.gp
>Assignee: Camille Fournier
> Fix For: 3.4.10, 3.5.3, 3.6.0
>
>
> The same issue (https://issues.apache.org/jira/browse/ZOOKEEPER-1382) can 
> still be found even with ZooKeeper 3.4.6.
> This issue caused our production ZooKeeper cluster to leak about 1 million 
> watches; after restarting the servers one by one, the watch count decreased 
> to only about 4.
> I can reproduce the issue on my Mac, here it is:
> 
> pguodeMacBook-Air:bin pguo$ echo srvr | nc localhost 6181
> Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09 GMT
> Latency min/avg/max: 0/1156/128513
> Received: 539
> Sent: 531
> Connections: 1
> Outstanding: 0
> Zxid: 0x10037
> Mode: follower
> Node count: 5
> 
> pguodeMacBook-Air:bin pguo$ echo cons | nc localhost 6181
>  
> /127.0.0.1:55759[1](queued=0,recved=5,sent=5,sid=0x157be2732de,lop=PING,est=1476372631116,to=15000,lcxid=0x1,lzxid=0x,lresp=1476372646260,llat=8,minlat=0,avglat=6,maxlat=17)
>  /0:0:0:0:0:0:0:1:55767[0](queued=0,recved=1,sent=0)
> 
> pguodeMacBook-Air:bin pguo$ echo wchp | nc localhost 6181
> /curator_exists_watch
>   0x357be48e4d90007
>   0x357be48e4d90009
>   0x157be2732de
> As the 4-letter-word output above shows, 0x357be48e4d90007 and 0x357be48e4d90009 
> are leaked after the two sessions expired.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2615) Zookeeper server holds onto dead/expired session ids in the watch data structures

2016-10-16 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15580096#comment-15580096
 ] 

Camille Fournier commented on ZOOKEEPER-2615:
-

I noticed that as well. Does anyone actually use the Netty implementations in 
production?

> Zookeeper server holds onto dead/expired session ids in the watch data 
> structures
> -
>
> Key: ZOOKEEPER-2615
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2615
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: guoping.gp
>Assignee: Camille Fournier
> Fix For: 3.4.10, 3.5.3, 3.6.0
>
>
> The same issue (https://issues.apache.org/jira/browse/ZOOKEEPER-1382) can 
> still be found even with ZooKeeper 3.4.6.
> This issue caused our production ZooKeeper cluster to leak about 1 million 
> watches; after restarting the servers one by one, the watch count decreased 
> to only about 4.
> I can reproduce the issue on my Mac, here it is:
> 
> pguodeMacBook-Air:bin pguo$ echo srvr | nc localhost 6181
> Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09 GMT
> Latency min/avg/max: 0/1156/128513
> Received: 539
> Sent: 531
> Connections: 1
> Outstanding: 0
> Zxid: 0x10037
> Mode: follower
> Node count: 5
> 
> pguodeMacBook-Air:bin pguo$ echo cons | nc localhost 6181
>  
> /127.0.0.1:55759[1](queued=0,recved=5,sent=5,sid=0x157be2732de,lop=PING,est=1476372631116,to=15000,lcxid=0x1,lzxid=0x,lresp=1476372646260,llat=8,minlat=0,avglat=6,maxlat=17)
>  /0:0:0:0:0:0:0:1:55767[0](queued=0,recved=1,sent=0)
> 
> pguodeMacBook-Air:bin pguo$ echo wchp | nc localhost 6181
> /curator_exists_watch
>   0x357be48e4d90007
>   0x357be48e4d90009
>   0x157be2732de
> As the 4-letter-word output above shows, 0x357be48e4d90007 and 0x357be48e4d90009 
> are leaked after the two sessions expired.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2615) Zookeeper server holds onto dead/expired session ids in the watch data structures

2016-10-14 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15576271#comment-15576271
 ] 

Camille Fournier commented on ZOOKEEPER-2615:
-

So it looks like all of the watches (not just the watches resent by the client) 
can suffer from a race between closing the client connection (which clears the 
watches for that connection) and actually storing the watches in the data 
structure. This is also a bug in trunk. I'm playing around with it (and have the 
world's ugliest hacks to repro in tests + judicious thread.sleeps) but heads-up 
to folks ([~phunt] [~fpj] etc).

> Zookeeper server holds onto dead/expired session ids in the watch data 
> structures
> -
>
> Key: ZOOKEEPER-2615
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2615
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: guoping.gp
>
> The same issue (https://issues.apache.org/jira/browse/ZOOKEEPER-1382) can 
> still be found even with ZooKeeper 3.4.6.
> This issue caused our production ZooKeeper cluster to leak about 1 million 
> watches; after restarting the servers one by one, the watch count decreased 
> to only about 4.
> I can reproduce the issue on my Mac, here it is:
> 
> pguodeMacBook-Air:bin pguo$ echo srvr | nc localhost 6181
> Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09 GMT
> Latency min/avg/max: 0/1156/128513
> Received: 539
> Sent: 531
> Connections: 1
> Outstanding: 0
> Zxid: 0x10037
> Mode: follower
> Node count: 5
> 
> pguodeMacBook-Air:bin pguo$ echo cons | nc localhost 6181
>  
> /127.0.0.1:55759[1](queued=0,recved=5,sent=5,sid=0x157be2732de,lop=PING,est=1476372631116,to=15000,lcxid=0x1,lzxid=0x,lresp=1476372646260,llat=8,minlat=0,avglat=6,maxlat=17)
>  /0:0:0:0:0:0:0:1:55767[0](queued=0,recved=1,sent=0)
> 
> pguodeMacBook-Air:bin pguo$ echo wchp | nc localhost 6181
> /curator_exists_watch
>   0x357be48e4d90007
>   0x357be48e4d90009
>   0x157be2732de
> As the 4-letter-word output above shows, 0x357be48e4d90007 and 0x357be48e4d90009 
> are leaked after the two sessions expired.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2615) Zookeeper server holds onto dead/expired session ids in the watch data structures

2016-10-14 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15575936#comment-15575936
 ] 

Camille Fournier commented on ZOOKEEPER-2615:
-

OK yup. This is a different race than what we fixed in ZOOKEEPER-1382. Because 
we don't check whether the connection is still live when we set a watch, 
setting a watch can totally race with deleting the watches for that connection.

Thinking.

> Zookeeper server holds onto dead/expired session ids in the watch data 
> structures
> -
>
> Key: ZOOKEEPER-2615
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2615
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: guoping.gp
>
> The same issue (https://issues.apache.org/jira/browse/ZOOKEEPER-1382) can 
> still be found even with ZooKeeper 3.4.6.
> This issue caused our production ZooKeeper cluster to leak about 1 million 
> watches; after restarting the servers one by one, the watch count decreased 
> to only about 4.
> I can reproduce the issue on my Mac, here it is:
> 
> pguodeMacBook-Air:bin pguo$ echo srvr | nc localhost 6181
> Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09 GMT
> Latency min/avg/max: 0/1156/128513
> Received: 539
> Sent: 531
> Connections: 1
> Outstanding: 0
> Zxid: 0x10037
> Mode: follower
> Node count: 5
> 
> pguodeMacBook-Air:bin pguo$ echo cons | nc localhost 6181
>  
> /127.0.0.1:55759[1](queued=0,recved=5,sent=5,sid=0x157be2732de,lop=PING,est=1476372631116,to=15000,lcxid=0x1,lzxid=0x,lresp=1476372646260,llat=8,minlat=0,avglat=6,maxlat=17)
>  /0:0:0:0:0:0:0:1:55767[0](queued=0,recved=1,sent=0)
> 
> pguodeMacBook-Air:bin pguo$ echo wchp | nc localhost 6181
> /curator_exists_watch
>   0x357be48e4d90007
>   0x357be48e4d90009
>   0x157be2732de
> As the 4-letter-word output above shows, 0x357be48e4d90007 and 0x357be48e4d90009 
> are leaked after the two sessions expired.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2615) Zookeeper server holds onto dead/expired session ids in the watch data structures

2016-10-14 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15575840#comment-15575840
 ] 

Camille Fournier commented on ZOOKEEPER-2615:
-

Looking

> Zookeeper server holds onto dead/expired session ids in the watch data 
> structures
> -
>
> Key: ZOOKEEPER-2615
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2615
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: guoping.gp
>
> The same issue (https://issues.apache.org/jira/browse/ZOOKEEPER-1382) can 
> still be found even with ZooKeeper 3.4.6.
> This issue caused our production ZooKeeper cluster to leak about 1 million 
> watches; after restarting the servers one by one, the watch count decreased 
> to only about 4.
> I can reproduce the issue on my Mac, here it is:
> 
> pguodeMacBook-Air:bin pguo$ echo srvr | nc localhost 6181
> Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09 GMT
> Latency min/avg/max: 0/1156/128513
> Received: 539
> Sent: 531
> Connections: 1
> Outstanding: 0
> Zxid: 0x10037
> Mode: follower
> Node count: 5
> 
> pguodeMacBook-Air:bin pguo$ echo cons | nc localhost 6181
>  
> /127.0.0.1:55759[1](queued=0,recved=5,sent=5,sid=0x157be2732de,lop=PING,est=1476372631116,to=15000,lcxid=0x1,lzxid=0x,lresp=1476372646260,llat=8,minlat=0,avglat=6,maxlat=17)
>  /0:0:0:0:0:0:0:1:55767[0](queued=0,recved=1,sent=0)
> 
> pguodeMacBook-Air:bin pguo$ echo wchp | nc localhost 6181
> /curator_exists_watch
>   0x357be48e4d90007
>   0x357be48e4d90009
>   0x157be2732de
> As the 4-letter-word output above shows, 0x357be48e4d90007 and 0x357be48e4d90009 
> are leaked after the two sessions expired.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2169) Enable creation of nodes with TTLs

2016-10-09 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15560125#comment-15560125
 ] 

Camille Fournier commented on ZOOKEEPER-2169:
-

[~randgalt] I wanted to leave it until we got the CLI change, since I don't 
really remember how JIRA subtasks work. And I took the liberty of removing the 
extra import in my push, so no need to worry about that.

> Enable creation of nodes with TTLs
> --
>
> Key: ZOOKEEPER-2169
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2169
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, jute, server
>Affects Versions: 3.6.0
>Reporter: Camille Fournier
>Assignee: Jordan Zimmerman
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2169-2.patch, ZOOKEEPER-2169-3.patch, 
> ZOOKEEPER-2169-4.patch, ZOOKEEPER-2169-5.patch, ZOOKEEPER-2169-6.patch, 
> ZOOKEEPER-2169-7.patch, ZOOKEEPER-2169-8.patch, ZOOKEEPER-2169-9.patch, 
> ZOOKEEPER-2169.patch
>
>
> As a user, I would like to be able to create a node that is NOT tied to a 
> session but that WILL expire automatically if action is not taken by some 
> client within a time window.
> I propose this to enable clients interacting with ZK via http or other "thin 
> clients" to create ephemeral-like nodes.
> Some ideas for the design, up for discussion:
> The node should support all normal ZK node operations including ACLs, 
> sequential key generation, etc, however, it should not support the ephemeral 
> flag. The node will be created with a TTL that is updated via a refresh 
> operation. 
> The ZK quorum will watch this node similarly to the way that it watches for 
> session liveness; if the node is not refreshed within the TTL, it will expire.
> QUESTIONS:
> 1) Should we let the refresh operation set the TTL to a different base value?
> 2) If so, should the setting of the TTL to a new base value cause a watch to 
> fire?
> 3) Do we want to allow these nodes to have children or prevent this similar 
> to ephemeral nodes?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2169) Enable creation of nodes with TTLs

2016-10-08 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15558442#comment-15558442
 ] 

Camille Fournier commented on ZOOKEEPER-2169:
-

This error doesn't reproduce locally and has nothing to do with this patch. I'll 
wait for the subtask to finish, then close this completely. Thanks [~randgalt]

> Enable creation of nodes with TTLs
> --
>
> Key: ZOOKEEPER-2169
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2169
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, jute, server
>Affects Versions: 3.6.0
>Reporter: Camille Fournier
>Assignee: Jordan Zimmerman
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2169-2.patch, ZOOKEEPER-2169-3.patch, 
> ZOOKEEPER-2169-4.patch, ZOOKEEPER-2169-5.patch, ZOOKEEPER-2169-6.patch, 
> ZOOKEEPER-2169-7.patch, ZOOKEEPER-2169.patch
>
>
> As a user, I would like to be able to create a node that is NOT tied to a 
> session but that WILL expire automatically if action is not taken by some 
> client within a time window.
> I propose this to enable clients interacting with ZK via http or other "thin 
> clients" to create ephemeral-like nodes.
> Some ideas for the design, up for discussion:
> The node should support all normal ZK node operations including ACLs, 
> sequential key generation, etc, however, it should not support the ephemeral 
> flag. The node will be created with a TTL that is updated via a refresh 
> operation. 
> The ZK quorum will watch this node similarly to the way that it watches for 
> session liveness; if the node is not refreshed within the TTL, it will expire.
> QUESTIONS:
> 1) Should we let the refresh operation set the TTL to a different base value?
> 2) If so, should the setting of the TTL to a new base value cause a watch to 
> fire?
> 3) Do we want to allow these nodes to have children or prevent this similar 
> to ephemeral nodes?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZOOKEEPER-2608) Create CLI option for TTL ephemerals

2016-10-07 Thread Camille Fournier (JIRA)
Camille Fournier created ZOOKEEPER-2608:
---

 Summary: Create CLI option for TTL ephemerals
 Key: ZOOKEEPER-2608
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2608
 Project: ZooKeeper
  Issue Type: Sub-task
Reporter: Camille Fournier
Assignee: Jordan Zimmerman


Need to update CreateCommand to have the TTL node option
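
For context, a hedged sketch of the kind of zkCli usage this asks for; the flag name and argument order are illustrative only, check the eventual CreateCommand for the real syntax:

{noformat}
# create a TTL node that is removed if it is not updated within 5000 ms
# (no client session keeps it alive)
create -t 5000 /my/ttl-node somedata
{noformat}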



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (ZOOKEEPER-2169) Enable creation of nodes with TTLs

2016-10-07 Thread Camille Fournier (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Camille Fournier updated ZOOKEEPER-2169:

Comment: was deleted

(was: One thing we're missing is a zkCli option for creating these. 
Specifically, CreateCommand needs to be updated. [~randgalt] can you create a 
child ticket for that? I don't think it's a blocker but we don't want to lose 
it.)

> Enable creation of nodes with TTLs
> --
>
> Key: ZOOKEEPER-2169
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2169
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, jute, server
>Affects Versions: 3.6.0
>Reporter: Camille Fournier
>Assignee: Jordan Zimmerman
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2169-2.patch, ZOOKEEPER-2169-3.patch, 
> ZOOKEEPER-2169-4.patch, ZOOKEEPER-2169-5.patch, ZOOKEEPER-2169-6.patch, 
> ZOOKEEPER-2169-7.patch, ZOOKEEPER-2169.patch
>
>
> As a user, I would like to be able to create a node that is NOT tied to a 
> session but that WILL expire automatically if action is not taken by some 
> client within a time window.
> I propose this to enable clients interacting with ZK via http or other "thin 
> clients" to create ephemeral-like nodes.
> Some ideas for the design, up for discussion:
> The node should support all normal ZK node operations including ACLs, 
> sequential key generation, etc, however, it should not support the ephemeral 
> flag. The node will be created with a TTL that is updated via a refresh 
> operation. 
> The ZK quorum will watch this node similarly to the way that it watches for 
> session liveness; if the node is not refreshed within the TTL, it will expire.
> QUESTIONS:
> 1) Should we let the refresh operation set the TTL to a different base value?
> 2) If so, should the setting of the TTL to a new base value cause a watch to 
> fire?
> 3) Do we want to allow these nodes to have children or prevent this similar 
> to ephemeral nodes?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2169) Enable creation of nodes with TTLs

2016-10-07 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556059#comment-15556059
 ] 

Camille Fournier commented on ZOOKEEPER-2169:
-

OK so with the exception of removing an extraneous import statement this looks 
good to go I think. I can remove the import myself, assuming I can figure out 
how to actually push the changes in the new git workflow.

> Enable creation of nodes with TTLs
> --
>
> Key: ZOOKEEPER-2169
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2169
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, jute, server
>Affects Versions: 3.6.0
>Reporter: Camille Fournier
>Assignee: Jordan Zimmerman
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2169-2.patch, ZOOKEEPER-2169-3.patch, 
> ZOOKEEPER-2169-4.patch, ZOOKEEPER-2169-5.patch, ZOOKEEPER-2169-6.patch, 
> ZOOKEEPER-2169-7.patch, ZOOKEEPER-2169.patch
>
>
> As a user, I would like to be able to create a node that is NOT tied to a 
> session but that WILL expire automatically if action is not taken by some 
> client within a time window.
> I propose this to enable clients interacting with ZK via http or other "thin 
> clients" to create ephemeral-like nodes.
> Some ideas for the design, up for discussion:
> The node should support all normal ZK node operations including ACLs, 
> sequential key generation, etc, however, it should not support the ephemeral 
> flag. The node will be created with a TTL that is updated via a refresh 
> operation. 
> The ZK quorum will watch this node similarly to the way that it watches for 
> session liveness; if the node is not refreshed within the TTL, it will expire.
> QUESTIONS:
> 1) Should we let the refresh operation set the TTL to a different base value?
> 2) If so, should the setting of the TTL to a new base value cause a watch to 
> fire?
> 3) Do we want to allow these nodes to have children or prevent this similar 
> to ephemeral nodes?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2169) Enable creation of nodes with TTLs

2016-10-07 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556039#comment-15556039
 ] 

Camille Fournier commented on ZOOKEEPER-2169:
-

One thing we're missing is a zkCli option for creating these. Specifically, 
CreateCommand needs to be updated. [~randgalt] can you create a child ticket 
for that? I don't think it's a blocker but we don't want to lose it.

> Enable creation of nodes with TTLs
> --
>
> Key: ZOOKEEPER-2169
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2169
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, jute, server
>Affects Versions: 3.6.0
>Reporter: Camille Fournier
>Assignee: Jordan Zimmerman
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2169-2.patch, ZOOKEEPER-2169-3.patch, 
> ZOOKEEPER-2169-4.patch, ZOOKEEPER-2169-5.patch, ZOOKEEPER-2169-6.patch, 
> ZOOKEEPER-2169-7.patch, ZOOKEEPER-2169.patch
>
>
> As a user, I would like to be able to create a node that is NOT tied to a 
> session but that WILL expire automatically if action is not taken by some 
> client within a time window.
> I propose this to enable clients interacting with ZK via http or other "thin 
> clients" to create ephemeral-like nodes.
> Some ideas for the design, up for discussion:
> The node should support all normal ZK node operations including ACLs, 
> sequential key generation, etc, however, it should not support the ephemeral 
> flag. The node will be created with a TTL that is updated via a refresh 
> operation. 
> The ZK quorum will watch this node similarly to the way that it watches for 
> session liveness; if the node is not refreshed within the TTL, it will expire.
> QUESTIONS:
> 1) Should we let the refresh operation set the TTL to a different base value?
> 2) If so, should the setting of the TTL to a new base value cause a watch to 
> fire?
> 3) Do we want to allow these nodes to have children or prevent this similar 
> to ephemeral nodes?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2169) Enable creation of nodes with TTLs

2016-10-07 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1502#comment-1502
 ] 

Camille Fournier commented on ZOOKEEPER-2169:
-

Looks like we need you to generate a patch that buildbot can use, [~randgalt]

> Enable creation of nodes with TTLs
> --
>
> Key: ZOOKEEPER-2169
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2169
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, jute, server
>Affects Versions: 3.6.0
>Reporter: Camille Fournier
>Assignee: Jordan Zimmerman
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2169-2.patch, ZOOKEEPER-2169-3.patch, 
> ZOOKEEPER-2169-4.patch, ZOOKEEPER-2169-5.patch, ZOOKEEPER-2169-6.patch, 
> ZOOKEEPER-2169.patch
>
>
> As a user, I would like to be able to create a node that is NOT tied to a 
> session but that WILL expire automatically if action is not taken by some 
> client within a time window.
> I propose this to enable clients interacting with ZK via http or other "thin 
> clients" to create ephemeral-like nodes.
> Some ideas for the design, up for discussion:
> The node should support all normal ZK node operations including ACLs, 
> sequential key generation, etc, however, it should not support the ephemeral 
> flag. The node will be created with a TTL that is updated via a refresh 
> operation. 
> The ZK quorum will watch this node similarly to the way that it watches for 
> session liveness; if the node is not refreshed within the TTL, it will expire.
> QUESTIONS:
> 1) Should we let the refresh operation set the TTL to a different base value?
> 2) If so, should the setting of the TTL to a new base value cause a watch to 
> fire?
> 3) Do we want to allow these nodes to have children or prevent this similar 
> to ephemeral nodes?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2169) Enable creation of nodes with TTLs

2016-10-07 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1485#comment-1485
 ] 

Camille Fournier commented on ZOOKEEPER-2169:
-

I just did it, let's see if it works

> Enable creation of nodes with TTLs
> --
>
> Key: ZOOKEEPER-2169
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2169
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, jute, server
>Affects Versions: 3.6.0
>Reporter: Camille Fournier
>Assignee: Jordan Zimmerman
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2169-2.patch, ZOOKEEPER-2169-3.patch, 
> ZOOKEEPER-2169-4.patch, ZOOKEEPER-2169-5.patch, ZOOKEEPER-2169-6.patch, 
> ZOOKEEPER-2169.patch
>
>
> As a user, I would like to be able to create a node that is NOT tied to a 
> session but that WILL expire automatically if action is not taken by some 
> client within a time window.
> I propose this to enable clients interacting with ZK via http or other "thin 
> clients" to create ephemeral-like nodes.
> Some ideas for the design, up for discussion:
> The node should support all normal ZK node operations including ACLs, 
> sequential key generation, etc, however, it should not support the ephemeral 
> flag. The node will be created with a TTL that is updated via a refresh 
> operation. 
> The ZK quorum will watch this node similarly to the way that it watches for 
> session liveness; if the node is not refreshed within the TTL, it will expire.
> QUESTIONS:
> 1) Should we let the refresh operation set the TTL to a different base value?
> 2) If so, should the setting of the TTL to a new base value cause a watch to 
> fire?
> 3) Do we want to allow these nodes to have children or prevent this similar 
> to ephemeral nodes?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2169) Enable creation of nodes with TTLs

2016-10-07 Thread Camille Fournier (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Camille Fournier updated ZOOKEEPER-2169:

Attachment: ZOOKEEPER-2169-6.patch

> Enable creation of nodes with TTLs
> --
>
> Key: ZOOKEEPER-2169
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2169
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, jute, server
>Affects Versions: 3.6.0
>Reporter: Camille Fournier
>Assignee: Jordan Zimmerman
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2169-2.patch, ZOOKEEPER-2169-3.patch, 
> ZOOKEEPER-2169-4.patch, ZOOKEEPER-2169-5.patch, ZOOKEEPER-2169-6.patch, 
> ZOOKEEPER-2169.patch
>
>
> As a user, I would like to be able to create a node that is NOT tied to a 
> session but that WILL expire automatically if action is not taken by some 
> client within a time window.
> I propose this to enable clients interacting with ZK via http or other "thin 
> clients" to create ephemeral-like nodes.
> Some ideas for the design, up for discussion:
> The node should support all normal ZK node operations including ACLs, 
> sequential key generation, etc, however, it should not support the ephemeral 
> flag. The node will be created with a TTL that is updated via a refresh 
> operation. 
> The ZK quorum will watch this node similarly to the way that it watches for 
> session liveness; if the node is not refreshed within the TTL, it will expire.
> QUESTIONS:
> 1) Should we let the refresh operation set the TTL to a different base value?
> 2) If so, should the setting of the TTL to a new base value cause a watch to 
> fire?
> 3) Do we want to allow these nodes to have children or prevent this similar 
> to ephemeral nodes?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2169) Enable creation of nodes with TTLs

2016-10-06 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15552865#comment-15552865
 ] 

Camille Fournier commented on ZOOKEEPER-2169:
-

Sorry [~randgalt], the asynchronicity of all of this is brutal. Looking.

> Enable creation of nodes with TTLs
> --
>
> Key: ZOOKEEPER-2169
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2169
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, jute, server
>Affects Versions: 3.6.0
>Reporter: Camille Fournier
>Assignee: Jordan Zimmerman
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2169-2.patch, ZOOKEEPER-2169-3.patch, 
> ZOOKEEPER-2169-4.patch, ZOOKEEPER-2169-5.patch, ZOOKEEPER-2169.patch
>
>
> As a user, I would like to be able to create a node that is NOT tied to a 
> session but that WILL expire automatically if action is not taken by some 
> client within a time window.
> I propose this to enable clients interacting with ZK via http or other "thin 
> clients" to create ephemeral-like nodes.
> Some ideas for the design, up for discussion:
> The node should support all normal ZK node operations including ACLs, 
> sequential key generation, etc, however, it should not support the ephemeral 
> flag. The node will be created with a TTL that is updated via a refresh 
> operation. 
> The ZK quorum will watch this node similarly to the way that it watches for 
> session liveness; if the node is not refreshed within the TTL, it will expire.
> QUESTIONS:
> 1) Should we let the refresh operation set the TTL to a different base value?
> 2) If so, should the setting of the TTL to a new base value cause a watch to 
> fire?
> 3) Do we want to allow these nodes to have children or prevent this similar 
> to ephemeral nodes?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2169) Enable creation of nodes with TTLs

2016-09-03 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15461319#comment-15461319
 ] 

Camille Fournier commented on ZOOKEEPER-2169:
-

My instinct is that we get this one out, make a separate ticket for Touch that 
you can start working on now, but not try to add additional complexity to the 
initial patch

> Enable creation of nodes with TTLs
> --
>
> Key: ZOOKEEPER-2169
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2169
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, jute, server
>Affects Versions: 3.6.0
>Reporter: Camille Fournier
>Assignee: Jordan Zimmerman
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2169-2.patch, ZOOKEEPER-2169-3.patch, 
> ZOOKEEPER-2169-4.patch, ZOOKEEPER-2169-5.patch, ZOOKEEPER-2169.patch
>
>
> As a user, I would like to be able to create a node that is NOT tied to a 
> session but that WILL expire automatically if action is not taken by some 
> client within a time window.
> I propose this to enable clients interacting with ZK via http or other "thin 
> clients" to create ephemeral-like nodes.
> Some ideas for the design, up for discussion:
> The node should support all normal ZK node operations including ACLs, 
> sequential key generation, etc, however, it should not support the ephemeral 
> flag. The node will be created with a TTL that is updated via a refresh 
> operation. 
> The ZK quorum will watch this node similarly to the way that it watches for 
> session liveness; if the node is not refreshed within the TTL, it will expire.
> QUESTIONS:
> 1) Should we let the refresh operation set the TTL to a different base value?
> 2) If so, should the setting of the TTL to a new base value cause a watch to 
> fire?
> 3) Do we want to allow these nodes to have children or prevent this similar 
> to ephemeral nodes?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2169) Enable creation of nodes with TTLs

2016-09-03 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15461301#comment-15461301
 ] 

Camille Fournier commented on ZOOKEEPER-2169:
-

Yeah I think we probably will want that as part of the 3.6 release but I don't 
know that we should block getting this patch done just to do it. Or does adding 
to the API require 4.0? I can never remember how these versioning schemes work 
:)

> Enable creation of nodes with TTLs
> --
>
> Key: ZOOKEEPER-2169
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2169
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, jute, server
>Affects Versions: 3.6.0
>Reporter: Camille Fournier
>Assignee: Jordan Zimmerman
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2169-2.patch, ZOOKEEPER-2169-3.patch, 
> ZOOKEEPER-2169-4.patch, ZOOKEEPER-2169-5.patch, ZOOKEEPER-2169.patch
>
>
> As a user, I would like to be able to create a node that is NOT tied to a 
> session but that WILL expire automatically if action is not taken by some 
> client within a time window.
> I propose this to enable clients interacting with ZK via http or other "thin 
> clients" to create ephemeral-like nodes.
> Some ideas for the design, up for discussion:
> The node should support all normal ZK node operations including ACLs, 
> sequential key generation, etc, however, it should not support the ephemeral 
> flag. The node will be created with a TTL that is updated via a refresh 
> operation. 
> The ZK quorum will watch this node similarly to the way that it watches for 
> session liveness; if the node is not refreshed within the TTL, it will expire.
> QUESTIONS:
> 1) Should we let the refresh operation set the TTL to a different base value?
> 2) If so, should the setting of the TTL to a new base value cause a watch to 
> fire?
> 3) Do we want to allow these nodes to have children or prevent this similar 
> to ephemeral nodes?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2169) Enable creation of nodes with TTLs

2016-09-03 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15461288#comment-15461288
 ] 

Camille Fournier commented on ZOOKEEPER-2169:
-

OK, finally looking at this in detail. Overall it seems pretty good. I have 
mixed feelings about calling the create type PERSISTENT_WITH_TTL since these 
nodes aren't exactly persistent, hence the TTL, but I'm willing to let that go 
unless others feel it would be confusing.

One thing I was expecting to see was the ability to "touch" a node to reset the 
TTL without having to change other data. By using mTime to indicate liveness 
we're basically saying the only way to reset the TTL is to change the data in 
the node (other operations such as setting ACLs, adding children, or deleting 
children don't reset the mTime; only initial creation and updating of node data 
do). I think we can add some sort of "touch" later if we decide we want to. 

I still want to look at the tests a bit more but I'm getting comfortable. I'm 
going to add any minor comments I have to the reviewboard. 

Finally, I know several others have looked at this code. If you have ([~breed] 
[~fpj] [~rgs] [~hanm]), it would help get this finished to have an ack that 
someone has
a) Gone through the doc changes and made sure they look good
b) Gone through the comment changes to the client-facing methods that have 
changed in ZooKeeper and made sure they look good

These tend to be the two areas where we discover after the fact that we missed 
something obvious (vs. small bugs), so if one of you has already done a 
thorough validation of either of these, please LMK.

Thanks!
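
As a concrete illustration of the client-facing shape discussed above, here is a hedged sketch using the create overload that takes a TTL; the CreateMode name comes from the patch, while the extendedTypesEnabled requirement is as I recall it, so double-check against the merged code:

{code}
import java.util.concurrent.TimeUnit;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class TtlNodeExample {
    public static void main(String[] args) throws Exception {
        // Assumes the server runs with -Dzookeeper.extendedTypesEnabled=true,
        // otherwise TTL creates are rejected.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> { });

        Stat stat = new Stat();
        long ttlMs = TimeUnit.MINUTES.toMillis(5);

        // Not tied to the session: the node is removed once it goes ttlMs
        // without a data update (mtime is the liveness signal).
        zk.create("/app/lease", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
                CreateMode.PERSISTENT_WITH_TTL, stat, ttlMs);

        zk.close();
    }
}
{code}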

> Enable creation of nodes with TTLs
> --
>
> Key: ZOOKEEPER-2169
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2169
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, jute, server
>Affects Versions: 3.6.0
>Reporter: Camille Fournier
>Assignee: Jordan Zimmerman
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2169-2.patch, ZOOKEEPER-2169-3.patch, 
> ZOOKEEPER-2169-4.patch, ZOOKEEPER-2169-5.patch, ZOOKEEPER-2169.patch
>
>
> As a user, I would like to be able to create a node that is NOT tied to a 
> session but that WILL expire automatically if action is not taken by some 
> client within a time window.
> I propose this to enable clients interacting with ZK via http or other "thin 
> clients" to create ephemeral-like nodes.
> Some ideas for the design, up for discussion:
> The node should support all normal ZK node operations including ACLs, 
> sequential key generation, etc, however, it should not support the ephemeral 
> flag. The node will be created with a TTL that is updated via a refresh 
> operation. 
> The ZK quorum will watch this node similarly to the way that it watches for 
> session liveness; if the node is not refreshed within the TTL, it will expire.
> QUESTIONS:
> 1) Should we let the refresh operation set the TTL to a different base value?
> 2) If so, should the setting of the TTL to a new base value cause a watch to 
> fire?
> 3) Do we want to allow these nodes to have children or prevent this similar 
> to ephemeral nodes?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-1256) ClientPortBindTest is failing on Mac OS X

2016-07-27 Thread Camille Fournier (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Camille Fournier updated ZOOKEEPER-1256:

Attachment: ZOOKEEPER-1256v3.patch

all JDK 1.4 and respects loopback check

> ClientPortBindTest is failing on Mac OS X
> -
>
> Key: ZOOKEEPER-1256
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1256
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
> Environment: Mac OS X
>Reporter: Daniel Gómez Ferro
>Assignee: Flavio Junqueira
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ClientPortBindTest.log, ZOOKEEPER-1256.patch, 
> ZOOKEEPER-1256.patch, ZOOKEEPER-1256.patch, ZOOKEEPER-1256v2.patch, 
> ZOOKEEPER-1256v3.patch
>
>
> ClientPortBindTest is failing consistently on Mac OS X.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2104) Sudden crash of all nodes in the cluster

2016-07-27 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15396234#comment-15396234
 ] 

Camille Fournier commented on ZOOKEEPER-2104:
-

Yeah, your initLimit needs to be longer. They're not getting into quorum 
because it takes longer than 20s to sync. Dunno why the original node crashed, 
but if you increase initLimit that should solve this problem.
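
For example, a hedged zoo.cfg sketch; the values are illustrative, and initLimit is counted in ticks, so tickTime=2000 with the common initLimit=10 is exactly the 20s window mentioned above:

{noformat}
# zoo.cfg (illustrative values)
tickTime=2000
# initLimit * tickTime is how long a follower may take to connect and sync
# with the leader; raise it if snapshots are large or disks are slow.
initLimit=30
syncLimit=5
{noformat}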

> Sudden crash of all nodes in the cluster
> 
>
> Key: ZOOKEEPER-2104
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2104
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Benjamin Jaton
> Attachments: zookeeper-errors.txt, zookeeper-warns.txt
>
>
> In a 3-node ensemble, all the nodes suddenly seem to fail, displaying 
> "ZooKeeper is not running" messages.
> No retry seems to be happening after that.
> This is a request to understand what happened and probably to improve the 
> logs when it does.
> See logs below:
> NODE1:
> -- no log for several days before this --
> 2015-01-04 16:18:22,259 [myid:1] - WARN  [SyncThread:1:FileTxnLog@321] - 
> fsync-ing the write ahead log in SyncThread:1 took 11024ms which will 
> adversely effect operation latency. See the ZooKeeper troubleshooting guide
> 2015-01-04 16:18:22,380 [myid:1] - WARN  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when 
> following the leader
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:392)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
> at 
> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
> at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
> 2015-01-04 16:18:23,384 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2015-01-04 16:18:23,492 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2015-01-04 16:18:24,060 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> NODE2:
> -- no log for several days before this --
> 2015-01-04 16:18:21,899 [myid:3] - WARN  
> [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when 
> following the leader
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:392)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
> at 
> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
> at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
> 2015-01-04 16:18:22,760 [myid:3] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2015-01-04 16:18:22,801 [myid:3] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2015-01-04 16:18:22,886 [myid:3] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> NODE3 (leader):
> -- no log for several days before this --
> 2015-01-04 16:18:21,897 [myid:2] - WARN  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:LearnerHandler@687] - Closing 
> connection to peer due to transaction timeout.
> 2015-01-04 16:18:21,898 [myid:2] - WARN  
> [LearnerHandler-/204.53.107.249:43402:LearnerHandler@646] - *** GOODBYE 
> /204.53.107.249:43402 
> 2015-01-04 16:18:21,905 [myid:2] - WARN  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:LearnerHandler@687] - Closing 
> connection to peer due to transaction timeout.
> 2015-01-04 16:18:21,907 [myid:2] - WARN  
> [LearnerHandler-/204.53.107.247:45953:LearnerHandler@646] -

[jira] [Commented] (ZOOKEEPER-2104) Sudden crash of all nodes in the cluster

2016-07-27 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15396172#comment-15396172
 ] 

Camille Fournier commented on ZOOKEEPER-2104:
-

It's hard to tell if this is just that the logs were grabbed at different times 
or if it is clock drift but I would check for clock drift.
I'm also seeing this error though:
2016-07-27 11:47:05,709 [myid:2] - WARN  [SyncThread:2:FileTxnLog@321] - 
fsync-ing the write ahead 
log in SyncThread:2 took ms which will adversely effect operation latency. 
See the ZooKeeper troubleshooting guide

It's also taking over 10 seconds to read the snapshot on startup, which is not 
a good sign. Flavio's advice to increase the initLimit is probably good.

> Sudden crash of all nodes in the cluster
> 
>
> Key: ZOOKEEPER-2104
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2104
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Benjamin Jaton
> Attachments: zookeeper-errors.txt, zookeeper-warns.txt
>
>
> In a 3-node ensemble, suddenly all the nodes seem to fail, displaying 
> "ZooKeeper is not running" messages.
> No retry seems to be happening after that.
> This is a request to understand what happened and probably to improve the logs 
> when it does.
> See logs below:
> NODE1:
> -- no log for several days before this --
> 2015-01-04 16:18:22,259 [myid:1] - WARN  [SyncThread:1:FileTxnLog@321] - 
> fsync-ing the write ahead log in SyncThread:1 took 11024ms which will 
> adversely effect operation latency. See the ZooKeeper troubleshooting guide
> 2015-01-04 16:18:22,380 [myid:1] - WARN  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when 
> following the leader
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:392)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
> at 
> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
> at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
> 2015-01-04 16:18:23,384 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2015-01-04 16:18:23,492 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2015-01-04 16:18:24,060 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> NODE2:
> -- no log for several days before this --
> 2015-01-04 16:18:21,899 [myid:3] - WARN  
> [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when 
> following the leader
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:392)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
> at 
> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
> at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
> 2015-01-04 16:18:22,760 [myid:3] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2015-01-04 16:18:22,801 [myid:3] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2015-01-04 16:18:22,886 [myid:3] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> NODE3 (leader):
> -- no log for several days before this --
> 2015-01-04 16:18:21,897 [myid:2] - WARN  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:LearnerHandler@687] - Closing 
> connection to peer due to transaction timeout.
> 2015-01-04 16:18:21,898 [myid:2] - WARN  
> [LearnerHandler-/204.53.107.249:43402:LearnerHan

[jira] [Commented] (ZOOKEEPER-2104) Sudden crash of all nodes in the cluster

2016-07-27 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15396130#comment-15396130
 ] 

Camille Fournier commented on ZOOKEEPER-2104:
-

Is it possible this is a clock drift problem? The logs you've provided end at 
12:13:35 for node1, 12:18:31 for node 2, and 12:14:11 for node3. I can't 
remember if this degree of clock drift causes issues or not, [~fpj] do you 
recall?

> Sudden crash of all nodes in the cluster
> 
>
> Key: ZOOKEEPER-2104
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2104
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Benjamin Jaton
> Attachments: zookeeper-errors.txt, zookeeper-warns.txt
>
>
> In a 3-node ensemble, suddenly all the nodes seem to fail, displaying 
> "ZooKeeper is not running" messages.
> No retry seems to be happening after that.
> This is a request to understand what happened and probably to improve the logs 
> when it does.
> See logs below:
> NODE1:
> -- no log for several days before this --
> 2015-01-04 16:18:22,259 [myid:1] - WARN  [SyncThread:1:FileTxnLog@321] - 
> fsync-ing the write ahead log in SyncThread:1 took 11024ms which will 
> adversely effect operation latency. See the ZooKeeper troubleshooting guide
> 2015-01-04 16:18:22,380 [myid:1] - WARN  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when 
> following the leader
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:392)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
> at 
> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
> at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
> 2015-01-04 16:18:23,384 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2015-01-04 16:18:23,492 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2015-01-04 16:18:24,060 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> NODE2:
> -- no log for several days before this --
> 2015-01-04 16:18:21,899 [myid:3] - WARN  
> [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when 
> following the leader
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:392)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
> at 
> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
> at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
> 2015-01-04 16:18:22,760 [myid:3] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2015-01-04 16:18:22,801 [myid:3] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2015-01-04 16:18:22,886 [myid:3] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> NODE3 (leader):
> -- no log for several days before this --
> 2015-01-04 16:18:21,897 [myid:2] - WARN  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:LearnerHandler@687] - Closing 
> connection to peer due to transaction timeout.
> 2015-01-04 16:18:21,898 [myid:2] - WARN  
> [LearnerHandler-/204.53.107.249:43402:LearnerHandler@646] - *** GOODBYE 
> /204.53.107.249:43402 
> 2015-01-04 16:18:21,905 [myid:2] - WARN  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:LearnerHandler@687] - Closing 
> connection to peer due to transaction timeout.
> 2015-01-04 16:18:21,907 [myid:2] - WARN  
> [LearnerHandler-/204.53.107.247:45953:

[jira] [Commented] (ZOOKEEPER-1256) ClientPortBindTest is failing on Mac OS X

2016-07-27 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15396115#comment-15396115
 ] 

Camille Fournier commented on ZOOKEEPER-1256:
-

So, the test says in comments that "if we have a loopback, and it has an 
address use it" but we do not request the loopback address when getting the 
bindAddress set. This patch does nothing but specifically request the loopback 
address. It passes on my machine.

> ClientPortBindTest is failing on Mac OS X
> -
>
> Key: ZOOKEEPER-1256
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1256
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
> Environment: Mac OS X
>Reporter: Daniel Gómez Ferro
>Assignee: Flavio Junqueira
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ClientPortBindTest.log, ZOOKEEPER-1256.patch, 
> ZOOKEEPER-1256.patch, ZOOKEEPER-1256.patch, ZOOKEEPER-1256v2.patch
>
>
> ClientPortBindTest is failing consistently on Mac OS X.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-1256) ClientPortBindTest is failing on Mac OS X

2016-07-27 Thread Camille Fournier (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Camille Fournier updated ZOOKEEPER-1256:

Attachment: ZOOKEEPER-1256v2.patch

patch that specifies that we want the loopback address

> ClientPortBindTest is failing on Mac OS X
> -
>
> Key: ZOOKEEPER-1256
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1256
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
> Environment: Mac OS X
>Reporter: Daniel Gómez Ferro
>Assignee: Flavio Junqueira
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ClientPortBindTest.log, ZOOKEEPER-1256.patch, 
> ZOOKEEPER-1256.patch, ZOOKEEPER-1256.patch, ZOOKEEPER-1256v2.patch
>
>
> ClientPortBindTest is failing consistently on Mac OS X.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1256) ClientPortBindTest is failing on Mac OS X

2016-07-27 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15396106#comment-15396106
 ] 

Camille Fournier commented on ZOOKEEPER-1256:
-

I don't absolutely love the patch fix by setting that property. Let me look at 
this for a minute.

> ClientPortBindTest is failing on Mac OS X
> -
>
> Key: ZOOKEEPER-1256
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1256
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
> Environment: Mac OS X
>Reporter: Daniel Gómez Ferro
>Assignee: Flavio Junqueira
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ClientPortBindTest.log, ZOOKEEPER-1256.patch, 
> ZOOKEEPER-1256.patch, ZOOKEEPER-1256.patch
>
>
> ClientPortBindTest is failing consistently on Mac OS X.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2368) Client watches are not disconnected on close

2016-07-12 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15373389#comment-15373389
 ] 

Camille Fournier commented on ZOOKEEPER-2368:
-

Yeah, I am interested in the perspective of "is this the right thing to do for 
clients, what will it do to existing client libraries like Curator"

> Client watches are not disconnected on close
> 
>
> Key: ZOOKEEPER-2368
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2368
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Timothy Ward
> Fix For: 3.5.2
>
> Attachments: ZOOKEEPER-2368.patch
>
>
> If I have a ZooKeeper client connected to an ensemble then obviously I can 
> register watches. 
> If the client is disconnected (for example by a failing ensemble member) then 
> I get a disconnection event for all of my watches. If, on the other hand, my 
> client is closed then I *do not* get a disconnection event. This asymmetry 
> makes it really hard to clear up properly when using the asynchronous API, as 
> there is no way to "fail" data reads/updates when the client is closed.
> I believe that the correct behaviour should be for all watchers to receive a 
> disconnection event when the client is closed. The watchers can then respond 
> as appropriate, and can differentiate between a "server disconnect" and a 
> "client disconnect" by checking the ZooKeeper#getState() method. 
> This would not be a breaking behaviour change as Watchers are already 
> required to handle disconnection events.
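To make the proposed distinction concrete, here is a minimal sketch (editor's illustration, not code from the attached patch) of a watcher that separates the two cases via ZooKeeper#getState(), assuming the client delivers a Disconnected event on close as proposed:

{code}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class CloseAwareWatcher implements Watcher {
    private final ZooKeeper zk;

    public CloseAwareWatcher(ZooKeeper zk) {
        this.zk = zk;
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getState() == Event.KeeperState.Disconnected) {
            if (zk.getState() == ZooKeeper.States.CLOSED) {
                // Client-side close: fail any pending async reads/updates permanently.
            } else {
                // Server-side disconnect: the session may still be re-established.
            }
        }
    }
}
{code}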



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2368) Client watches are not disconnected on close

2016-07-12 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15373118#comment-15373118
 ] 

Camille Fournier commented on ZOOKEEPER-2368:
-

Hey [~randgalt] (or others) can you talk through whether this makes sense from 
a client impl perspective? I'm not sure and would appreciate a set of eyes from 
someone deeper in client logic.

> Client watches are not disconnected on close
> 
>
> Key: ZOOKEEPER-2368
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2368
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Timothy Ward
> Fix For: 3.5.2
>
> Attachments: ZOOKEEPER-2368.patch
>
>
> If I have a ZooKeeper client connected to an ensemble then obviously I can 
> register watches. 
> If the client is disconnected (for example by a failing ensemble member) then 
> I get a disconnection event for all of my watches. If, on the other hand, my 
> client is closed then I *do not* get a disconnection event. This asymmetry 
> makes it really hard to clear up properly when using the asynchronous API, as 
> there is no way to "fail" data reads/updates when the client is closed.
> I believe that the correct behaviour should be for all watchers to receive a 
> disconnection event when the client is closed. The watchers can then respond 
> as appropriate, and can differentiate between a "server disconnect" and a 
> "client disconnect" by checking the ZooKeeper#getState() method. 
> This would not be a breaking behaviour change as Watchers are already 
> required to handle disconnection events.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2368) Client watches are not disconnected on close

2016-07-12 Thread Camille Fournier (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Camille Fournier updated ZOOKEEPER-2368:

Issue Type: Improvement  (was: Bug)

> Client watches are not disconnected on close
> 
>
> Key: ZOOKEEPER-2368
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2368
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Timothy Ward
> Fix For: 3.5.2
>
> Attachments: ZOOKEEPER-2368.patch
>
>
> If I have a ZooKeeper client connected to an ensemble then obviously I can 
> register watches. 
> If the client is disconnected (for example by a failing ensemble member) then 
> I get a disconnection event for all of my watches. If, on the other hand, my 
> client is closed then I *do not* get a disconnection event. This asymmetry 
> makes it really hard to clear up properly when using the asynchronous API, as 
> there is no way to "fail" data reads/updates when the client is closed.
> I believe that the correct behaviour should be for all watchers to receive a 
> disconnection event when the client is closed. The watchers can then respond 
> as appropriate, and can differentiate between a "server disconnect" and a 
> "client disconnect" by checking the ZooKeeper#getState() method. 
> This would not be a breaking behaviour change as Watchers are already 
> required to handle disconnection events.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1256) ClientPortBindTest is failing on Mac OS X

2016-07-08 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367962#comment-15367962
 ] 

Camille Fournier commented on ZOOKEEPER-1256:
-

Hey [~fpj], so we're looking for the loopback host address here? I don't really 
understand what this test is trying to do exactly, but it looks like on line 
64, if I change that from 
bindAddress = addrs.nextElement().getHostAddress(); to
bindAddress = addrs.nextElement().getLoopbackAddress().getHostAddress();
all is fine. So I'm just not sure, if we're looking for the loopback address, 
why we aren't specifically asking for that in the test? But this is not a part 
of the networking code that is super familiar to me. Anyway, that change fixes 
it for me locally.
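For context, a small stand-alone sketch of the idea (editor's illustration, not the test code itself): InetAddress.getLoopbackAddress() is a static JDK method, so the change above effectively bypasses the interface enumeration and simply asks for the loopback address directly.

{code}
import java.net.InetAddress;

public class LoopbackBindExample {
    public static void main(String[] args) {
        // Request the loopback address explicitly instead of walking the
        // loopback interface's address list and taking whatever comes first.
        String bindAddress = InetAddress.getLoopbackAddress().getHostAddress();
        System.out.println("binding client port to " + bindAddress); // typically 127.0.0.1
    }
}
{code}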

> ClientPortBindTest is failing on Mac OS X
> -
>
> Key: ZOOKEEPER-1256
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1256
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
> Environment: Mac OS X
>Reporter: Daniel Gómez Ferro
>Assignee: Flavio Junqueira
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ClientPortBindTest.log, ZOOKEEPER-1256.patch, 
> ZOOKEEPER-1256.patch, ZOOKEEPER-1256.patch
>
>
> ClientPortBindTest is failing consistently on Mac OS X.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2141) ACL cache in DataTree never removes entries

2016-04-01 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15222172#comment-15222172
 ] 

Camille Fournier commented on ZOOKEEPER-2141:
-

OK I'm looking at this patch more carefully this time. It looks OK to me but 
would love for [~eribeiro] to take a look as well since he caught some details 
last time. If you +1 I'll check it in [~eribeiro]

> ACL cache in DataTree never removes entries
> ---
>
> Key: ZOOKEEPER-2141
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2141
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.6
>Reporter: Karol Dudzinski
>Assignee: Adam Milne-Smith
>Priority: Blocker
> Fix For: 3.4.9, 3.5.2
>
> Attachments: ZOOKEEPER-2141-3.4.patch, ZOOKEEPER-2141.patch, 
> ZOOKEEPER-2141.patch, ZOOKEEPER-2141.patch, ZOOKEEPER-2141.patch, 
> ZOOKEEPER-2141.patch, ZOOKEEPER-2141.patch, ZOOKEEPER-2141.patch
>
>
> The problem and potential solutions are discussed in 
> http://mail-archives.apache.org/mod_mbox/zookeeper-user/201502.mbox/browser
> I will attach a proposed patch in due course.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (ZOOKEEPER-2141) ACL cache in DataTree never removes entries

2016-03-29 Thread Camille Fournier (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Camille Fournier reopened ZOOKEEPER-2141:
-

> ACL cache in DataTree never removes entries
> ---
>
> Key: ZOOKEEPER-2141
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2141
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.6
>Reporter: Karol Dudzinski
>Assignee: Adam Milne-Smith
> Fix For: 3.4.9, 3.5.2
>
> Attachments: ZOOKEEPER-2141-3.4.patch, ZOOKEEPER-2141.patch, 
> ZOOKEEPER-2141.patch, ZOOKEEPER-2141.patch, ZOOKEEPER-2141.patch, 
> ZOOKEEPER-2141.patch
>
>
> The problem and potential solutions are discussed in 
> http://mail-archives.apache.org/mod_mbox/zookeeper-user/201502.mbox/browser
> I will attach a proposed patch in due course.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2141) ACL cache in DataTree never removes entries

2016-03-29 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15217052#comment-15217052
 ] 

Camille Fournier commented on ZOOKEEPER-2141:
-

Nice catch Eddie. Issue #1 is worth addressing, I'm less sure about the rest. 
[~adammilnesmith] why did you synchronize that value in the equals?

> ACL cache in DataTree never removes entries
> ---
>
> Key: ZOOKEEPER-2141
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2141
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.6
>Reporter: Karol Dudzinski
>Assignee: Adam Milne-Smith
> Fix For: 3.4.9, 3.5.2
>
> Attachments: ZOOKEEPER-2141-3.4.patch, ZOOKEEPER-2141.patch, 
> ZOOKEEPER-2141.patch, ZOOKEEPER-2141.patch, ZOOKEEPER-2141.patch, 
> ZOOKEEPER-2141.patch
>
>
> The problem and potential solutions are discussed in 
> http://mail-archives.apache.org/mod_mbox/zookeeper-user/201502.mbox/browser
> I will attach a proposed patch in due course.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2141) ACL cache in DataTree never removes entries

2016-03-29 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216359#comment-15216359
 ] 

Camille Fournier commented on ZOOKEEPER-2141:
-

This is great, +1 on all of it, I've applied to trunk, 3.5 and 3.4. Thank you 
all!

> ACL cache in DataTree never removes entries
> ---
>
> Key: ZOOKEEPER-2141
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2141
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.6
>Reporter: Karol Dudzinski
>Assignee: Adam Milne-Smith
> Attachments: ZOOKEEPER-2141-3.4.patch, ZOOKEEPER-2141.patch, 
> ZOOKEEPER-2141.patch, ZOOKEEPER-2141.patch, ZOOKEEPER-2141.patch, 
> ZOOKEEPER-2141.patch
>
>
> The problem and potential solutions are discussed in 
> http://mail-archives.apache.org/mod_mbox/zookeeper-user/201502.mbox/browser
> I will attach a proposed patch in due course.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2141) ACL cache in DataTree never removes entries

2016-03-22 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207534#comment-15207534
 ] 

Camille Fournier commented on ZOOKEEPER-2141:
-

Into 3.5 as well. Just waiting on 3.4.

> ACL cache in DataTree never removes entries
> ---
>
> Key: ZOOKEEPER-2141
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2141
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.6
>Reporter: Karol Dudzinski
>Assignee: Adam Milne-Smith
> Attachments: ZOOKEEPER-2141.patch, ZOOKEEPER-2141.patch, 
> ZOOKEEPER-2141.patch, ZOOKEEPER-2141.patch, ZOOKEEPER-2141.patch
>
>
> The problem and potential solutions are discussed in 
> http://mail-archives.apache.org/mod_mbox/zookeeper-user/201502.mbox/browser
> I will attach a proposed patch in due course.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2141) ACL cache in DataTree never removes entries

2016-03-22 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207522#comment-15207522
 ] 

Camille Fournier commented on ZOOKEEPER-2141:
-

Reviewed the patch against trunk, +1 on that and checked in. Waiting on patch 
against 3.4. Checking existing patch against 3.5.

> ACL cache in DataTree never removes entries
> ---
>
> Key: ZOOKEEPER-2141
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2141
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.6
>Reporter: Karol Dudzinski
>Assignee: Adam Milne-Smith
> Attachments: ZOOKEEPER-2141.patch, ZOOKEEPER-2141.patch, 
> ZOOKEEPER-2141.patch, ZOOKEEPER-2141.patch, ZOOKEEPER-2141.patch
>
>
> The problem and potential solutions are discussed in 
> http://mail-archives.apache.org/mod_mbox/zookeeper-user/201502.mbox/browser
> I will attach a proposed patch in due course.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2141) ACL cache in DataTree never removes entries

2016-03-21 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15204821#comment-15204821
 ] 

Camille Fournier commented on ZOOKEEPER-2141:
-

Yes please just make a patch that applies to the 3.4 branch and attach it here 
with a naming indication that it's for 3.4. Thanks!

> ACL cache in DataTree never removes entries
> ---
>
> Key: ZOOKEEPER-2141
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2141
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.6
>Reporter: Karol Dudzinski
>Assignee: Adam Milne-Smith
> Attachments: ZOOKEEPER-2141.patch, ZOOKEEPER-2141.patch, 
> ZOOKEEPER-2141.patch, ZOOKEEPER-2141.patch, ZOOKEEPER-2141.patch
>
>
> The problem and potential solutions are discussed in 
> http://mail-archives.apache.org/mod_mbox/zookeeper-user/201502.mbox/browser
> I will attach a proposed patch in due course.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2141) ACL cache in DataTree never removes entries

2016-03-21 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15204790#comment-15204790
 ] 

Camille Fournier commented on ZOOKEEPER-2141:
-

Do you all want this fixed on the 3.4 branch? The patch will only apply to 
trunk currently. LMK.

> ACL cache in DataTree never removes entries
> ---
>
> Key: ZOOKEEPER-2141
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2141
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.6
>Reporter: Karol Dudzinski
>Assignee: Adam Milne-Smith
> Attachments: ZOOKEEPER-2141.patch, ZOOKEEPER-2141.patch, 
> ZOOKEEPER-2141.patch, ZOOKEEPER-2141.patch, ZOOKEEPER-2141.patch
>
>
> The problem and potential solutions are discussed in 
> http://mail-archives.apache.org/mod_mbox/zookeeper-user/201502.mbox/browser
> I will attach a proposed patch in due course.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2342) ZooKeeper cannot write logs, because there is no SLF4J binding available on the runtime classpath.

2016-03-19 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15197330#comment-15197330
 ] 

Camille Fournier commented on ZOOKEEPER-2342:
-

I'm +1 for reverting the breaking patch. I don't honestly see the value in that 
change that outweighs the cost of trying to make it not backward-breaking. If 
someone wants to advocate for it with some concrete examples of the pain it is 
causing, I am all ears, otherwise, let's revert.

> ZooKeeper cannot write logs, because there is no SLF4J binding available on 
> the runtime classpath.
> --
>
> Key: ZOOKEEPER-2342
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2342
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Blocker
> Fix For: 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2342.001.patch
>
>
> ZOOKEEPER-1371 removed our source code dependency on Log4J.  It appears that 
> this also removed the Log4J SLF4J binding jar from the runtime classpath.  
> Without any SLF4J binding jar available on the runtime classpath, it is 
> impossible to write logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2141) ACL cache in DataTree never removes entries

2016-03-19 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15197318#comment-15197318
 ] 

Camille Fournier commented on ZOOKEEPER-2141:
-

It's failing in AsyncHammerTest. Not sure if this is a flaky test or not. 
[~adammilnesmith_gs] if you'll run that test locally and see if you think it 
should pass, I'll review this and get it committed. Thanks for this, nice to 
see you folks contributing!

> ACL cache in DataTree never removes entries
> ---
>
> Key: ZOOKEEPER-2141
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2141
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.6
>Reporter: Karol Dudzinski
>Assignee: Adam Milne-Smith
> Attachments: ZOOKEEPER-2141.patch, ZOOKEEPER-2141.patch, 
> ZOOKEEPER-2141.patch, ZOOKEEPER-2141.patch, ZOOKEEPER-2141.patch
>
>
> The problem and potential solutions are discussed in 
> http://mail-archives.apache.org/mod_mbox/zookeeper-user/201502.mbox/browser
> I will attach a proposed patch in due course.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2141) ACL cache in DataTree never removes entries

2016-03-19 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200102#comment-15200102
 ] 

Camille Fournier commented on ZOOKEEPER-2141:
-

This one failed on a cpp test
 TestReconfigServer::testRemoveConnectedFollowerStarting zookeeper ... FAILED 
TO START

Is this test flaky? Does anyone know? I don't think I can run the cpp tests 
from my computer.

> ACL cache in DataTree never removes entries
> ---
>
> Key: ZOOKEEPER-2141
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2141
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.6
>Reporter: Karol Dudzinski
>Assignee: Adam Milne-Smith
> Attachments: ZOOKEEPER-2141.patch, ZOOKEEPER-2141.patch, 
> ZOOKEEPER-2141.patch, ZOOKEEPER-2141.patch, ZOOKEEPER-2141.patch
>
>
> The problem and potential solutions are discussed in 
> http://mail-archives.apache.org/mod_mbox/zookeeper-user/201502.mbox/browser
> I will attach a proposed patch in due course.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2348) Data between leader and followers are not synchronized.

2015-12-18 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064166#comment-15064166
 ] 

Camille Fournier commented on ZOOKEEPER-2348:
-

In the first scenario, you say that the session expired, and the removal 
happened, then fault, then it wasn't removed. Does the new leader eventually 
detect the session expiration, or does it never detect it?

In the second scenario, is the delete returned as succeeded to the client?

> Data between leader and followers are not synchronized.
> ---
>
> Key: ZOOKEEPER-2348
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2348
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.1
>Reporter: Chen Ningning
>
> When a client session expired, the leader tried to remove it from the session map 
> and remove its EPHEMERAL znode, for example /test_znode. This operation succeeded 
> on the leader, but at the very same moment a network fault happened, the change was 
> not synced to the followers, and a new leader election was launched. After the 
> election finished, the new leader was not the old leader. We found that the znode 
> /test_znode still existed on the followers but not on the leader.
>  *Scenario :* 
> 1) Create znode E.g.  
> {{/rmstore/ZKRMStateRoot/RMAppRoot/application_1449644945944_0001/appattempt_1449644945944_0001_01}}
> 2) Delete Znode. 
> 3) Network fault b/w follower and leader machines
> 4) leader election again and follower became leader.
> Now data is not synced with new leader..After this client is not able to same 
> znode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2159) Pluggable SASL Authentication

2015-10-02 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941213#comment-14941213
 ] 

Camille Fournier commented on ZOOKEEPER-2159:
-

Do we have any updates on this? [~ekoontz]?

> Pluggable SASL Authentication
> -
>
> Key: ZOOKEEPER-2159
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2159
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: java client, server
>Reporter: Yuliya Feldman
>Assignee: Yuliya Feldman
> Attachments: PluggableZookeeperAuthentication (1).pdf, 
> PluggableZookeeperAuthentication.pdf
>
>
> Today SASLAuthenticationProvider is used for all SASL based authentications 
> which creates some "if/else" statements in ZookeeperSaslClient and 
> ZookeeperSaslServer code with just Kerberos and Digest.
> We want to use yet another different SASL based authentication and adding one 
> more "if/else" with some code specific just to that new way does not make 
> much sense.
> Proposal is to allow to plug custom SASL Authentication mechanism(s) without  
> further changes in Zookeeper code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable

2015-08-26 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715508#comment-14715508
 ] 

Camille Fournier commented on ZOOKEEPER-2101:
-

Awesome thanks [~hdeng]!

> Transaction larger than max buffer of jute makes zookeeper unavailable
> --
>
> Key: ZOOKEEPER-2101
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: jute
>Affects Versions: 3.4.4
>Reporter: Liu Shaohui
> Fix For: 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, 
> ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, ZOOKEEPER-2101-v5.diff, 
> ZOOKEEPER-2101-v6.diff, ZOOKEEPER-2101-v7.diff, test.diff
>
>
> *Problem*
> For a multi operation, PrepRequestProcessor may produce a transaction whose size 
> is larger than the max buffer size of jute. There is a buffer size check in the 
> readBuffer method of BinaryInputArchive, but no check in the writeBuffer method 
> of BinaryOutputArchive, which causes the following: 
> 1, The leader can sync the transaction to its txn log and send the large transaction to 
> the followers, but the followers fail to read the transaction and cannot 
> sync with the leader.
> {code}
> 2015-01-04,12:42:26,474 WARN org.apache.zookeeper.server.quorum.Learner: 
> [myid:2] Exception when following the leader
> java.io.IOException: Unreasonable length = 2054758
> at 
> org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
> at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
> at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
> at 
> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
> at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
> 2015-01-04,12:42:26,475 INFO org.apache.zookeeper.server.quorum.Learner: 
> [myid:2] shutdown called
> java.lang.Exception: shutdown Follower
> at 
> org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
> {code}
> 2, The leader loses all followers, which triggers a leader election. The old 
> leader becomes leader again because it has the most up-to-date data.
> {code}
> 2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: 
> [myid:3] Shutting down
> 2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: 
> [myid:3] Shutdown called
> java.lang.Exception: shutdown Leader! reason: Only 1 followers, need 2
> at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:496)
> at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:471)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:753)
> {code}
> 3, The leader cannot load the transaction from the txn log because the length 
> of the data is larger than the max buffer of jute.
> {code}
> 2015-01-04,12:42:31,282 ERROR org.apache.zookeeper.server.quorum.QuorumPeer: 
> [myid:3] Unable to load database on disk
> java.io.IOException: Unreasonable length = 2054758
> at 
> org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
> at 
> org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:233)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:602)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:157)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.getLastLoggedZxid(QuorumPeer.java:546)
> at 
> org.apache.zookeeper.server.quorum.FastLeaderElection.getInitLastLoggedZxid(FastLeaderElection.java:690)
> at 
> org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:737)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)
> {code}
> The ZooKeeper service will be unavailable until we enlarge jute.maxbuffer 
> and restart the ZooKeeper/HBase cluster.
> *Solution*
> Add a buffer size check in BinaryOutputArchive to avoid large transactions being 
> written to the log and sent to followers.
> But I am not sure if there are side effects of throwing an IOException in 
> BinaryOutputArchive and RequestProcessors.
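As an editor's sketch only (this is not the attached patch; the helper name is made up), the kind of write-side guard being proposed would mirror the existing read-side check, e.g.:

{code}
import java.io.IOException;

// Hypothetical helper illustrating the proposed check for
// org.apache.jute.BinaryOutputArchive.writeBuffer(); the 0xfffff (~1 MB)
// default mirrors ZooKeeper's documented jute.maxbuffer default.
public final class JuteWriteGuard {
    private static final int MAX_BUFFER = Integer.getInteger("jute.maxbuffer", 0xfffff);

    public static void checkLength(byte[] barr) throws IOException {
        if (barr != null && barr.length > MAX_BUFFER) {
            // Fail fast on the writer instead of logging/replicating a record
            // that readers will later refuse to deserialize.
            throw new IOException("Unreasonable length = " + barr.length);
        }
    }

    private JuteWriteGuard() {}
}
{code}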



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2145) Node can be seen but not deleted

2015-08-26 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715496#comment-14715496
 ] 

Camille Fournier commented on ZOOKEEPER-2145:
-

Would love to look into this but it will be hard to reproduce without 
transaction logs and snapshots. Any chance anyone has an instance with more 
detailed information they can share?

> Node can be seen but not deleted
> 
>
> Key: ZOOKEEPER-2145
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2145
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.6
>Reporter: Frans Lawaetz
>
> I have a three-server ensemble that appears to be working fine in every 
> respect but for the fact that I can ls or get a znode but can not rmr it.
> >[zk: localhost:2181(CONNECTED) 0] get 
> >/accumulo/9354e975-7e2a-4207-8c7b-5d36c0e7765d/masters/goal_state
> CLEAN_STOP
> cZxid = 0x15
> ctime = Fri Feb 20 13:37:59 CST 2015
> mZxid = 0x72
> mtime = Fri Feb 20 13:38:05 CST 2015
> pZxid = 0x15
> cversion = 0
> dataVersion = 2
> aclVersion = 0
> ephemeralOwner = 0x0
> dataLength = 10
> numChildren = 0
> [zk: localhost:2181(CONNECTED) 1] rmr 
> /accumulo/9354e975-7e2a-4207-8c7b-5d36c0e7765d/masters/goal_state
> Node does not exist: 
> /accumulo/9354e975-7e2a-4207-8c7b-5d36c0e7765d/masters/goal_state
> I have run a 'stat' against all three servers and they seem properly 
> structured with a leader and two followers.  An md5sum of all zoo.cfg shows 
> them to be identical.  
> The problem seems localized to the accumulo/935 directory as I can create 
> and delete znodes outside of that path fine but not inside of it.
> For example:
> [zk: localhost:2181(CONNECTED) 12] create 
> /accumulo/9354e975-7e2a-4207-8c7b-5d36c0e7765d/fubar asdf
> Node does not exist: /accumulo/9354e975-7e2a-4207-8c7b-5d36c0e7765d/fubar
> [zk: localhost:2181(CONNECTED) 13] create /accumulo/fubar asdf
> Created /accumulo/fubar
> [zk: localhost:2181(CONNECTED) 14] ls /accumulo/fubar
> []
> [zk: localhost:2181(CONNECTED) 15] rmr /accumulo/fubar
> [zk: localhost:2181(CONNECTED) 16]
> Here is my zoo.cfg:
> tickTime=2000
> initLimit=10
> syncLimit=15
> dataDir=/data/extera/zkeeper/data
> clientPort=2181
>  maxClientCnxns=300
> autopurge.snapRetainCount=10
> autopurge.purgeInterval=1
> server.1=cdf61:2888:3888
> server.2=cdf62:2888:3888
> server.3=cdf63:2888:3888



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable

2015-08-26 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715491#comment-14715491
 ] 

Camille Fournier commented on ZOOKEEPER-2101:
-

Can we get this finished up folks?

> Transaction larger than max buffer of jute makes zookeeper unavailable
> --
>
> Key: ZOOKEEPER-2101
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: jute
>Affects Versions: 3.4.4
>Reporter: Liu Shaohui
> Fix For: 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, 
> ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, ZOOKEEPER-2101-v5.diff, 
> ZOOKEEPER-2101-v6.diff, ZOOKEEPER-2101-v7.diff, test.diff
>
>
> *Problem*
> For a multi operation, PrepRequestProcessor may produce a transaction whose size 
> is larger than the max buffer size of jute. There is a buffer size check in the 
> readBuffer method of BinaryInputArchive, but no check in the writeBuffer method 
> of BinaryOutputArchive, which causes the following: 
> 1, The leader can sync the transaction to its txn log and send the large transaction to 
> the followers, but the followers fail to read the transaction and cannot 
> sync with the leader.
> {code}
> 2015-01-04,12:42:26,474 WARN org.apache.zookeeper.server.quorum.Learner: 
> [myid:2] Exception when following the leader
> java.io.IOException: Unreasonable length = 2054758
> at 
> org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
> at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
> at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
> at 
> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
> at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
> 2015-01-04,12:42:26,475 INFO org.apache.zookeeper.server.quorum.Learner: 
> [myid:2] shutdown called
> java.lang.Exception: shutdown Follower
> at 
> org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
> {code}
> 2, The leader loses all followers, which triggers a leader election. The old 
> leader becomes leader again because it has the most up-to-date data.
> {code}
> 2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: 
> [myid:3] Shutting down
> 2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: 
> [myid:3] Shutdown called
> java.lang.Exception: shutdown Leader! reason: Only 1 followers, need 2
> at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:496)
> at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:471)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:753)
> {code}
> 3, The leader cannot load the transaction from the txn log because the length 
> of the data is larger than the max buffer of jute.
> {code}
> 2015-01-04,12:42:31,282 ERROR org.apache.zookeeper.server.quorum.QuorumPeer: 
> [myid:3] Unable to load database on disk
> java.io.IOException: Unreasonable length = 2054758
> at 
> org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
> at 
> org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:233)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:602)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:157)
> at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.getLastLoggedZxid(QuorumPeer.java:546)
> at 
> org.apache.zookeeper.server.quorum.FastLeaderElection.getInitLastLoggedZxid(FastLeaderElection.java:690)
> at 
> org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:737)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)
> {code}
> The ZooKeeper service will be unavailable until we enlarge jute.maxbuffer 
> and restart the ZooKeeper/HBase cluster.
> *Solution*
> Add a buffer size check in BinaryOutputArchive to avoid large transactions being 
> written to the log and sent to followers.
> But I am not sure if there are side effects of throwing an IOException in 
> BinaryOutputArchive and RequestProcessors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (ZOOKEEPER-2084) Document local session parameters

2015-08-26 Thread Camille Fournier (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Camille Fournier updated ZOOKEEPER-2084:

Comment: was deleted

(was: very good, see here : http://wfshare.com/)

> Document local session parameters
> -
>
> Key: ZOOKEEPER-2084
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2084
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.5.0
>Reporter: Flavio Junqueira
> Fix For: 3.5.2, 3.6.0
>
>
> Document the options introduced in ZOOKEEPER-1147.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (ZOOKEEPER-2059) Use command like this "./zkCli.sh -server host:port cmd args" but it doesn't work, 3.4.5 version is work fine

2015-08-26 Thread Camille Fournier (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Camille Fournier resolved ZOOKEEPER-2059.
-
Resolution: Cannot Reproduce

Looks like this was fixed in later releases

> Use command like this "./zkCli.sh -server host:port cmd args" but it doesn't 
> work, 3.4.5 version is work fine
> -
>
> Key: ZOOKEEPER-2059
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2059
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.6
>Reporter: huanghaijun
>
> Use a command like [./zkCli.sh -server host:port cmd args], such as 
> [./zkCli.sh -server localhost:2181 create /test ""]  to create a node. 3.4.5 
> works fine, but in 3.4.6 it doesn't work.
> for 3.4.5 it is ok
> zookeeper-3.4.5/bin> ./zkCli.sh -server localhost:34096 create /test ""
> Connecting to localhost:34096
> WATCHER::
> WatchedEvent state:SyncConnected type:None path:null
> Created /test
> for 3.4.6 it's not ok
> zookeeper-3.4.6/bin> ./zkCli.sh -server localhost:43096 crate /test1 ""
> Connecting to localhost:43096
> 
> 2014-10-10 01:24:44,517 [myid:] - INFO  [main:ZooKeeper@438] - Initiating 
> client connection, connectString=localhost:43096 sessionTimeout=3 
> watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@48b8f82d



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2005) Failure to setCurrentEpoch on lead

2015-08-26 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715484#comment-14715484
 ] 

Camille Fournier commented on ZOOKEEPER-2005:
-

Do we believe this issue is still valid?

> Failure to setCurrentEpoch on lead
> --
>
> Key: ZOOKEEPER-2005
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2005
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.4.6
>Reporter: Ioannis Canellos
> Attachments: ZOOKEEPER-2005-Test.patch
>
>
> We are embedding the ZooKeeper server in our container and every now and then 
> I see the exception below when running our integration test suite.
> This is something that never bothered us before when using 3.4.5, but we do 
> see it in 3.4.6. 
> When this occurs, the ensemble is not formed.
> java.io.IOException: Could not rename temporary file 
> /data/zookeeper/0001/version-2/currentEpoch.tmp to 
> /data/zookeeper/0001/version-2/currentEpoch
> at 
> org.apache.zookeeper.common.AtomicFileOutputStream.close(AtomicFileOutputStream.java:82)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.writeLongToFile(QuorumPeer.java:1202)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.setCurrentEpoch(QuorumPeer.java:1223)
> at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:395)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (ZOOKEEPER-2155) network is not good, the watcher in observer env will clear

2015-08-26 Thread Camille Fournier (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Camille Fournier resolved ZOOKEEPER-2155.
-
Resolution: Invalid

Please reopen if there is more information here

> network is not good, the watcher in observer env will clear
> ---
>
> Key: ZOOKEEPER-2155
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2155
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.6
>Reporter: linking12
>Priority: Critical
>  Labels: moreinfo
> Fix For: 3.5.0
>
>
> When I set up a ZooKeeper ensemble that uses Observers and the network is not 
> very good, I find that all of the watchers disappear.
> I read the source code and found that when the observer connects to the leader, 
> it dumps the DataTree from the leader and rebuilds it on the observer, 
> but the dataWatches and childWatches are cleared by this operation.
> After I change the code like this:
> WatchManager dataWatchers = zk.getZKDatabase().getDataTree()
>.getDataWatches();
> WatchManager childWatchers = zk.getZKDatabase().getDataTree()
>.getChildWatches();
> zk.getZKDatabase().clear();
> zk.getZKDatabase().deserializeSnapshot(leaderIs);
> zk.getZKDatabase().getDataTree().setDataWatches(dataWatchers);
> zk.getZKDatabase().getDataTree().setChildWatches(childWatchers);
> the watchers no longer disappear.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2238) Support limiting the maximum number of connections/clients to a zookeeper server.

2015-08-04 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653808#comment-14653808
 ] 

Camille Fournier commented on ZOOKEEPER-2238:
-

Sounds like a fine feature. Please provide a patch!

> Support limiting the maximum number of connections/clients to a zookeeper 
> server.
> -
>
> Key: ZOOKEEPER-2238
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2238
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: nijel
>
> Currently ZooKeeper has the feature of limiting the maximum number of 
> connections/clients per IP or host (maxClientCnxns).
> But to safeguard a ZooKeeper server from DoS attacks caused by many clients from 
> different IPs, it would be better to also have a limit on the total number of 
> connections/clients to a single member of the ZooKeeper ensemble.
> So the idea is to introduce a new configuration option to limit the maximum 
> number of total connections/clients.
> Please share your thoughts.
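For reference, the existing per-IP limit the reporter mentions is a zoo.cfg setting; the sketch below (editor's illustration) shows it alongside a purely hypothetical name for the proposed total limit, which is only an idea at this point:

{code}
# zoo.cfg
# Existing setting: at most 60 concurrent connections from any single client IP.
maxClientCnxns=60

# Hypothetical setting sketching this proposal (name and semantics illustrative only):
# cap the total number of client connections accepted by this server.
#maxTotalClientCnxns=10000
{code}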



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2217) event might lost before re-watch

2015-06-26 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603922#comment-14603922
 ] 

Camille Fournier commented on ZOOKEEPER-2217:
-

[~caspian] I am closing this jira because this was a fundamental design 
decision of the system and there seems to be some confusion about the intended 
usage and behavior. We're happy to discuss this in more depth on the users or 
dev mailing lists if you are interested in feedback on what you are trying to 
do. Thanks!

> event might lost before re-watch
> 
>
> Key: ZOOKEEPER-2217
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2217
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: c client, java client
>Affects Versions: 3.4.5, 3.4.6
> Environment: jdk1.7_45 on centos6.5 and ubuntu14.4 
>Reporter: Caspian
>
> I use ZK to monitor the child nodes under a path, e.g. /servers.
> When the client is told that the children have changed, I have to re-watch the 
> path again; during that period it's possible that some children go down, or some 
> come up, and those events will be missed.
> For now, my temporary solution is not to use getChildren(path, true...) to 
> get the children and re-watch the path, but to re-watch the path first and then 
> get the children. That way no events are missed, but I don't know how the ZK 
> server will behave if there are too many clients that act like this.
> What do you think of this problem? Are there any other solutions?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (ZOOKEEPER-2217) event might lost before re-watch

2015-06-26 Thread Camille Fournier (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Camille Fournier resolved ZOOKEEPER-2217.
-
Resolution: Not A Problem

> event might lost before re-watch
> 
>
> Key: ZOOKEEPER-2217
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2217
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: c client, java client
>Affects Versions: 3.4.5, 3.4.6
> Environment: jdk1.7_45 on centos6.5 and ubuntu14.4 
>Reporter: Caspian
>
> I use ZK to monitor the child nodes under a path, e.g. /servers.
> When the client is told that the children have changed, I have to re-watch the 
> path again; during that period it's possible that some children go down, or some 
> come up, and those events will be missed.
> For now, my temporary solution is not to use getChildren(path, true...) to 
> get the children and re-watch the path, but to re-watch the path first and then 
> get the children. That way no events are missed, but I don't know how the ZK 
> server will behave if there are too many clients that act like this.
> What do you think of this problem? Are there any other solutions?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2217) event might lost before re-watch

2015-06-24 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600549#comment-14600549
 ] 

Camille Fournier commented on ZOOKEEPER-2217:
-

This is a fundamental aspect of the design of ZK watches: they do not guarantee 
that events will be caught between the watch being fired and the new data 
being read. See the documentation here:
http://zookeeper.apache.org/doc/r3.4.6/zookeeperProgrammers.html#ch_zkWatches

In particular, they are not designed to be a messaging bus where you get every 
change in order.
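
For reference, the usual pattern is to treat a watch event purely as a hint to re-read, letting the same getChildren() call both return the current children and register the next watch. A minimal sketch, with illustrative class and helper names that are not from any patch here:

{code}
import java.util.List;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ChildrenMonitor implements Watcher {
    private final ZooKeeper zk;
    private final String path;

    public ChildrenMonitor(ZooKeeper zk, String path) throws Exception {
        this.zk = zk;
        this.path = path;
        refresh();
    }

    // Reads the children and re-registers this watcher in a single call, so
    // there is no separate "re-watch, then read" window on the client side.
    private void refresh() throws Exception {
        List<String> children = zk.getChildren(path, this);
        handle(children);
    }

    @Override
    public void process(WatchedEvent event) {
        // The event only says "something changed"; the current state comes
        // from the follow-up read, which also sets the next watch.
        try {
            refresh();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Application-specific handling of the current child list (assumption).
    private void handle(List<String> children) {
        System.out.println("current children: " + children);
    }
}
{code}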

> event might lost before re-watch
> 
>
> Key: ZOOKEEPER-2217
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2217
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: c client, java client
>Affects Versions: 3.4.5, 3.4.6
> Environment: jdk1.7_45 on centos6.5 and ubuntu14.4 
>Reporter: Caspian
>
> I use zk to  monitor the children nodes under a path, eg: /servers. 
> when the client is told that children changes,  I have to re-watch the path 
> again, during the peroid, it's possible that some children down, or some up. 
> And those events will be missed.
> For now, my temporary solution is not to use getChildren(path, true...) to 
> get children and re-watch this path, but re-watch this path first, then get 
> the children. Thus non events can be ignored, but I don't know what will the 
> zk server be like if there are too much clients that act like this.
> How do you think of this problem? Is there any other solutions?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2163) Introduce new ZNode type: container

2015-04-30 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521955#comment-14521955
 ] 

Camille Fournier commented on ZOOKEEPER-2163:
-

I can't think of a better way so I'm OK with it

> Introduce new ZNode type: container
> ---
>
> Key: ZOOKEEPER-2163
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2163
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, server
>Affects Versions: 3.5.0
>Reporter: Jordan Zimmerman
>Assignee: Jordan Zimmerman
> Attachments: zookeeper-2163.3.patch
>
>
> BACKGROUND
> 
> A recurring problem for ZooKeeper users is garbage collection of parent 
> nodes. Many recipes (e.g. locks, leaders, etc.) call for the creation of a 
> parent node under which participants create sequential nodes. When the 
> participant is done, it deletes its node. In practice, the ZooKeeper tree 
> begins to fill up with orphaned parent nodes that are no longer needed. The 
> ZooKeeper APIs don’t provide a way to clean these. Over time, ZooKeeper can 
> become unstable due to the number of these nodes.
> CURRENT SOLUTIONS
> ===
> Apache Curator has a workaround solution for this by providing the Reaper 
> class which runs in the background looking for orphaned parent nodes and 
> deleting them. This isn’t ideal and it would be better if ZooKeeper supported 
> this directly.
> PROPOSAL
> =
> ZOOKEEPER-723 and ZOOKEEPER-834 have been proposed to allow EPHEMERAL nodes 
> to contain child nodes. This is not optimum as EPHEMERALs are tied to a 
> session and the general use case of parent nodes is for PERSISTENT nodes. 
> This proposal adds a new node type, CONTAINER. A CONTAINER node is the same 
> as a PERSISTENT node with the additional property that when its last child is 
> deleted, it is deleted (and CONTAINER nodes recursively up the tree are 
> deleted if empty).
> CANONICAL USAGE
> 
> {code}
> while (true) { // or some reasonable limit
>     try {
>         zk.create(path, ...);
>         break;
>     } catch (KeeperException.NoNodeException e) {
>         try {
>             zk.createContainer(containerPath, ...);
>         } catch (KeeperException.NodeExistsException ignore) {
>         }
>     }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2163) Introduce new ZNode type: container

2015-04-30 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521943#comment-14521943
 ] 

Camille Fournier commented on ZOOKEEPER-2163:
-

Haven't done a super detailed code review but I am +1 with the high-level 
implementation, api, etc.

> Introduce new ZNode type: container
> ---
>
> Key: ZOOKEEPER-2163
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2163
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, server
>Affects Versions: 3.5.0
>Reporter: Jordan Zimmerman
>Assignee: Jordan Zimmerman
> Attachments: zookeeper-2163.3.patch
>
>
> BACKGROUND
> 
> A recurring problem for ZooKeeper users is garbage collection of parent 
> nodes. Many recipes (e.g. locks, leaders, etc.) call for the creation of a 
> parent node under which participants create sequential nodes. When the 
> participant is done, it deletes its node. In practice, the ZooKeeper tree 
> begins to fill up with orphaned parent nodes that are no longer needed. The 
> ZooKeeper APIs don’t provide a way to clean these. Over time, ZooKeeper can 
> become unstable due to the number of these nodes.
> CURRENT SOLUTIONS
> ===
> Apache Curator has a workaround solution for this by providing the Reaper 
> class which runs in the background looking for orphaned parent nodes and 
> deleting them. This isn’t ideal and it would be better if ZooKeeper supported 
> this directly.
> PROPOSAL
> =
> ZOOKEEPER-723 and ZOOKEEPER-834 have been proposed to allow EPHEMERAL nodes 
> to contain child nodes. This is not optimum as EPHEMERALs are tied to a 
> session and the general use case of parent nodes is for PERSISTENT nodes. 
> This proposal adds a new node type, CONTAINER. A CONTAINER node is the same 
> as a PERSISTENT node with the additional property that when its last child is 
> deleted, it is deleted (and CONTAINER nodes recursively up the tree are 
> deleted if empty).
> CANONICAL USAGE
> 
> {code}
> while (true) { // or some reasonable limit
>     try {
>         zk.create(path, ...);
>         break;
>     } catch (KeeperException.NoNodeException e) {
>         try {
>             zk.createContainer(containerPath, ...);
>         } catch (KeeperException.NodeExistsException ignore) {
>         }
>     }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2163) Introduce new ZNode type: container

2015-04-28 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517154#comment-14517154
 ] 

Camille Fournier commented on ZOOKEEPER-2163:
-

Note the findbugs error introduced.

> Introduce new ZNode type: container
> ---
>
> Key: ZOOKEEPER-2163
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2163
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, server
>Affects Versions: 3.5.0
>Reporter: Jordan Zimmerman
>Assignee: Jordan Zimmerman
> Attachments: zookeeper-2163.patch
>
>
> BACKGROUND
> 
> A recurring problem for ZooKeeper users is garbage collection of parent 
> nodes. Many recipes (e.g. locks, leaders, etc.) call for the creation of a 
> parent node under which participants create sequential nodes. When the 
> participant is done, it deletes its node. In practice, the ZooKeeper tree 
> begins to fill up with orphaned parent nodes that are no longer needed. The 
> ZooKeeper APIs don’t provide a way to clean these. Over time, ZooKeeper can 
> become unstable due to the number of these nodes.
> CURRENT SOLUTIONS
> ===
> Apache Curator has a workaround solution for this by providing the Reaper 
> class which runs in the background looking for orphaned parent nodes and 
> deleting them. This isn’t ideal and it would be better if ZooKeeper supported 
> this directly.
> PROPOSAL
> =
> ZOOKEEPER-723 and ZOOKEEPER-834 have been proposed to allow EPHEMERAL nodes 
> to contain child nodes. This is not optimum as EPHEMERALs are tied to a 
> session and the general use case of parent nodes is for PERSISTENT nodes. 
> This proposal adds a new node type, CONTAINER. A CONTAINER node is the same 
> as a PERSISTENT node with the additional property that when its last child is 
> deleted, it is deleted (and CONTAINER nodes recursively up the tree are 
> deleted if empty).
> CANONICAL USAGE
> 
> {code}
> while (true) { // or some reasonable limit
>     try {
>         zk.create(path, ...);
>         break;
>     } catch (KeeperException.NoNodeException e) {
>         try {
>             zk.createContainer(containerPath, ...);
>         } catch (KeeperException.NodeExistsException ignore) {
>         }
>     }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2173) ZK startup failure should be handled with proper error message

2015-04-27 Thread Camille Fournier (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Camille Fournier updated ZOOKEEPER-2173:

Fix Version/s: 3.6.0
   3.5.1

> ZK startup failure should be handled with proper error message
> --
>
> Key: ZOOKEEPER-2173
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2173
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: J.Andreina
>Assignee: J.Andreina
> Fix For: 3.5.1, 3.6.0
>
> Attachments: ZOOKEEPER-2173.1.patch
>
>
> If any failure occurs during ZK startup (e.g. the myid file does not exist), 
> startup still reports success (STARTED).
> ZK startup failure should be handled with a proper error message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2173) ZK startup failure should be handled with proper error message

2015-04-27 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514648#comment-14514648
 ] 

Camille Fournier commented on ZOOKEEPER-2173:
-

This is great, +1. I will check it in.

> ZK startup failure should be handled with proper error message
> --
>
> Key: ZOOKEEPER-2173
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2173
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: J.Andreina
>Assignee: J.Andreina
> Attachments: ZOOKEEPER-2173.1.patch
>
>
> If any failure occurs during ZK startup (e.g. the myid file does not exist), 
> startup still reports success (STARTED).
> ZK startup failure should be handled with a proper error message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2171) avoid reverse lookups in QuorumCnxManager

2015-04-27 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514612#comment-14514612
 ] 

Camille Fournier commented on ZOOKEEPER-2171:
-

I see a bunch of references throughout the project beyond QCM. Is it just QCM 
that needs updating?

> avoid reverse lookups in QuorumCnxManager
> -
>
> Key: ZOOKEEPER-2171
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2171
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Reporter: Raul Gutierrez Segales
>Assignee: Raul Gutierrez Segales
> Fix For: 3.5.1, 3.6.0
>
>
> Apparently, ZOOKEEPER-107 (via a quick git-blame look) introduced a bunch of 
> getHostName() calls in QCM. Besides the overhead, these can cause problems 
> when mixed with failing/mis-configured DNS servers.
> It would be nice to reduce them, if that doesn't affect operational 
> correctness. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2062) RemoveWatchesTest takes forever to run

2015-04-27 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514596#comment-14514596
 ] 

Camille Fournier commented on ZOOKEEPER-2062:
-

I dig this a bunch.
[~rakeshr] if you wanna upload an updated patch with your changes I will review 
it and pending my +1 you can check it in.

> RemoveWatchesTest takes forever to run
> --
>
> Key: ZOOKEEPER-2062
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2062
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.5.0
>Reporter: Flavio Junqueira
>Assignee: Chris Nauroth
> Attachments: ZOOKEEPER-2062.001.patch, ZOOKEEPER-2062.002.patch
>
>
> [junit] Running org.apache.zookeeper.RemoveWatchesTest
> [junit] Tests run: 46, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
> 306.188 sec



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZOOKEEPER-2169) Enable creation of nodes with TTLs

2015-04-16 Thread Camille Fournier (JIRA)
Camille Fournier created ZOOKEEPER-2169:
---

 Summary: Enable creation of nodes with TTLs
 Key: ZOOKEEPER-2169
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2169
 Project: ZooKeeper
  Issue Type: New Feature
  Components: c client, java client, jute, server
Affects Versions: 3.6.0
Reporter: Camille Fournier
 Fix For: 3.6.0


As a user, I would like to be able to create a node that is NOT tied to a 
session but that WILL expire automatically if action is not taken by some 
client within a time window.

I propose this to enable clients interacting with ZK via http or other "thin 
clients" to create ephemeral-like nodes.

Some ideas for the design, up for discussion:

The node should support all normal ZK node operations including ACLs, 
sequential key generation, etc, however, it should not support the ephemeral 
flag. The node will be created with a TTL that is updated via a refresh 
operation. 

The ZK quorum will watch this node similarly to the way that it watches for 
session liveness; if the node is not refreshed within the TTL, it will expire.

QUESTIONS:

1) Should we let the refresh operation set the TTL to a different base value?
2) If so, should the setting of the TTL to a new base value cause a watch to 
fire?
3) Do we want to allow these nodes to have children or prevent this similar to 
ephemeral nodes?
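
For discussion, a rough sketch of what the client-facing call could look like under this proposal. The TTL-specific create mode, the extra TTL argument, and the refresh operation are hypothetical, not an existing API; it assumes a connected ZooKeeper handle {{zk}} and a byte[] {{data}}.

{code}
// Hypothetical API sketch for discussion only; none of these TTL-specific
// methods or modes exist today.
String path = zk.create("/services/worker-", data,
        ZooDefs.Ids.OPEN_ACL_UNSAFE,
        CreateMode.PERSISTENT_SEQUENTIAL_WITH_TTL,   // hypothetical create mode
        30000L);                                      // hypothetical TTL argument, in ms

// A thin client (e.g. talking over HTTP) would periodically refresh the node
// instead of holding a session open:
zk.refreshTTL(path);                                  // hypothetical operation
{code}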





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2156) If JAVA_HOME is not set zk startup and fetching status command execution result misleads user.

2015-04-15 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496151#comment-14496151
 ] 

Camille Fournier commented on ZOOKEEPER-2156:
-

I think this patch is fine, but isn't the real problem that we hit an error we 
didn't detect and then reported success in starting the server, rather than that 
we didn't check JAVA_HOME etc. properly? I am not a bash shell script expert, but 
it seems like we also need to fix the script so that it doesn't claim to have 
started the process when in fact it couldn't start due to some failure. 

> If JAVA_HOME is not set zk startup and fetching status command execution 
> result misleads user.
> --
>
> Key: ZOOKEEPER-2156
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2156
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: J.Andreina
>Assignee: J.Andreina
> Fix For: 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2156.1.patch, ZOOKEEPER-2156.2.patch, 
> ZOOKEEPER-2156.3.patch, ZOOKEEPER-2156.4.patch
>
>
> If JAVA_HOME is not set, the results of the ZK startup and status commands 
> mislead the user.
> 1. Even though ZK startup has failed because JAVA_HOME is not set, the CLI 
> displays that ZK STARTED.
> {noformat}
> #:~/Apr3rd/zookeeper-3.4.6/bin> ./zkServer.sh start
> JMX enabled by default
> Using config: /home/REX/Apr3rd/zookeeper-3.4.6/bin/../conf/zoo.cfg
> Starting zookeeper ... STARTED
> {noformat}
> 2. Fetching ZK status when JAVA_HOME is not set reports that the process is 
> not running.
> {noformat}
> #:~/Apr3rd/zookeeper-3.4.6/bin> ./zkServer.sh status
> JMX enabled by default
> Using config: /home/REX/Apr3rd/zookeeper-3.4.6/bin/../conf/zoo.cfg
> Error contacting service. It is probably not running.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-04-09 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487759#comment-14487759
 ] 

Camille Fournier commented on ZOOKEEPER-1506:
-

+1

> Re-try DNS hostname -> IP resolution if node connection fails
> -
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.4.5
> Environment: Ubuntu 11.04 64-bit
>Reporter: Mike Heffner
>Assignee: Michael Lasevich
>Priority: Critical
>  Labels: patch
> Fix For: 3.4.7, 3.5.1, 3.6.0
>
> Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch
>
>
> In our zoo.cfg we use hostnames to identify the ZK servers that are part of 
> an ensemble. These hostnames are configured with a low (<= 60s) TTL and the 
> IP address they map to can and does change. Our procedure for 
> replacing/upgrading a ZK node is to boot an entirely new instance and remap 
> the hostname to the new instance's IP address. Our expectation is that when 
> the original ZK node is terminated/shutdown, the remaining nodes in the 
> ensemble would reconnect to the new instance.
> However, what we are noticing is that the remaining ZK nodes do not attempt 
> to re-resolve the hostname->IP mapping for the new server. Once the original 
> ZK node is terminated, the existing servers continue to attempt contacting it 
> at the old IP address. It would be great if the ZK servers could try to 
> re-resolve the hostname when attempting to connect to a lost ZK server, 
> instead of caching the lookup indefinitely. Currently we must do a rolling 
> restart of the ZK ensemble after swapping a node -- which at three nodes 
> means we periodically lose quorum.
> The exact method we are following is to boot new instances in EC2 and attach 
> one, of a set of three, Elastic IP address. External to EC2 this IP address 
> remains the same and maps to whatever instance it is attached to. Internal to 
> EC2, the elastic IP hostname has a TTL of about 45-60 seconds and is remapped 
> to the internal (10.x.y.z) address of the instance it is attached to. 
> Therefore, in our case we would like ZK to pick up the new 10.x.y.z address 
> that the elastic IP hostname gets mapped to and reconnect appropriately.
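
For illustration, the core of the requested behavior is to resolve the hostname at connection time instead of reusing a cached address. A minimal sketch; the surrounding quorum/retry plumbing is omitted, and the JVM's own DNS cache settings still apply:

{code}
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

public final class FreshResolveConnector {
    private FreshResolveConnector() {}

    // Re-resolve the peer hostname on every attempt so a remapped DNS entry
    // (e.g. an EC2 elastic IP moved to a new instance) is picked up, rather
    // than reusing a stale InetSocketAddress captured at startup.
    public static Socket connect(String hostname, int port, int timeoutMs) throws IOException {
        InetAddress resolved = InetAddress.getByName(hostname); // fresh lookup, subject to JVM DNS cache TTL
        Socket sock = new Socket();
        sock.connect(new InetSocketAddress(resolved, port), timeoutMs);
        return sock;
    }
}
{code}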



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2144) Provide a way to update the auth info on a connection

2015-03-16 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364235#comment-14364235
 ] 

Camille Fournier commented on ZOOKEEPER-2144:
-

Can you explain the problem in a bit more detail? The question in my mind is 
whether there would be a security risk in allowing auth info to be updated in the 
manner you suggest. What are your thoughts on that?

> Provide a way to update the auth info on a connection
> -
>
> Key: ZOOKEEPER-2144
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2144
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karol Dudzinski
>
> The current auth info implementation makes it very difficult to work with 
> expiring auth info.  If a client fails over between servers, it resends its 
> list of auth info in a FIFO order.  Therefore, if any of the info has 
> expired, it'll cause the session to be lost.  There is currently no way to 
> update or remove any existing info, only add.  Any objections to adding an 
> update or remove auth info method?
> An alternate solution would be for ClientCnxn.AuthData to implement an equals 
> method that only checks the scheme field.  As the AuthData is stored in a 
> set, this would have the same effect as an update operation.  However, I'm 
> not sure if there is a use case for supplying multiple bits of AuthData for 
> the same scheme?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2142) JMX ObjectName is incorrect for observers

2015-03-16 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364232#comment-14364232
 ] 

Camille Fournier commented on ZOOKEEPER-2142:
-

Seems like a reasonable thing to fix if it is causing you issues; please submit 
a patch!

> JMX ObjectName is incorrect for observers
> -
>
> Key: ZOOKEEPER-2142
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2142
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Karol Dudzinski
>Priority: Trivial
>
> Observers show up in JMX as StandaloneServer rather than Observer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1865) Fix retry logic in Learner.connectToLeader()

2015-03-15 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362592#comment-14362592
 ] 

Camille Fournier commented on ZOOKEEPER-1865:
-

[~michim] is this failing on trunk or precommit builds? Can you point to the 
build that you saw fail besides your local?

> Fix retry logic in Learner.connectToLeader() 
> -
>
> Key: ZOOKEEPER-1865
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1865
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Reporter: Thawan Kooburat
>Assignee: Edward Carter
> Fix For: 3.5.1, 3.6.0
>
> Attachments: ZOOKEEPER-1865-nanoTime.patch, 
> ZOOKEEPER-1865-testfix.patch, ZOOKEEPER-1865.patch
>
>
> We discovered a long leader election time today in one of our prod ensembles.
> Here is a description of the event. 
> Before the old leader goes down, it is able to announce its notification message, 
> so 3 out of 5 machines (including the old leader) elected the old leader to be the 
> new leader for the next epoch. While the old leader is being rebooted, 2 other 
> machines keep trying to connect to the old leader, so the quorum couldn't 
> form until those 2 machines gave up and moved to the next round of leader 
> election.
> This is because Learner.connectToLeader() uses simple retry logic. The 
> contract for this method is that it should never spend longer than initLimit 
> trying to connect to the leader. In our outage, each sock.connect() was 
> probably blocked for initLimit and it is called 5 times.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1865) Fix retry logic in Learner.connectToLeader()

2015-03-15 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362585#comment-14362585
 ] 

Camille Fournier commented on ZOOKEEPER-1865:
-

Not sure, I can't get it to fail but let me look.

> Fix retry logic in Learner.connectToLeader() 
> -
>
> Key: ZOOKEEPER-1865
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1865
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Reporter: Thawan Kooburat
>Assignee: Edward Carter
> Fix For: 3.5.1, 3.6.0
>
> Attachments: ZOOKEEPER-1865-nanoTime.patch, 
> ZOOKEEPER-1865-testfix.patch, ZOOKEEPER-1865.patch
>
>
> We discovered a long leader election time today in one of our prod ensembles.
> Here is a description of the event. 
> Before the old leader goes down, it is able to announce its notification message, 
> so 3 out of 5 machines (including the old leader) elected the old leader to be the 
> new leader for the next epoch. While the old leader is being rebooted, 2 other 
> machines keep trying to connect to the old leader, so the quorum couldn't 
> form until those 2 machines gave up and moved to the next round of leader 
> election.
> This is because Learner.connectToLeader() uses simple retry logic. The 
> contract for this method is that it should never spend longer than initLimit 
> trying to connect to the leader. In our outage, each sock.connect() was 
> probably blocked for initLimit and it is called 5 times.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-03-14 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362115#comment-14362115
 ] 

Camille Fournier commented on ZOOKEEPER-1506:
-

The only concern I have is removing the retries. I think it's probably right to 
do but does it imply documentation changes anywhere that we need to make? 
[~michim]?

> Re-try DNS hostname -> IP resolution if node connection fails
> -
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.4.5
> Environment: Ubuntu 11.04 64-bit
>Reporter: Mike Heffner
>Assignee: Michael Lasevich
>Priority: Critical
>  Labels: patch
> Fix For: 3.4.7, 3.5.1, 3.6.0
>
> Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch
>
>
> In our zoo.cfg we use hostnames to identify the ZK servers that are part of 
> an ensemble. These hostnames are configured with a low (<= 60s) TTL and the 
> IP address they map to can and does change. Our procedure for 
> replacing/upgrading a ZK node is to boot an entirely new instance and remap 
> the hostname to the new instance's IP address. Our expectation is that when 
> the original ZK node is terminated/shutdown, the remaining nodes in the 
> ensemble would reconnect to the new instance.
> However, what we are noticing is that the remaining ZK nodes do not attempt 
> to re-resolve the hostname->IP mapping for the new server. Once the original 
> ZK node is terminated, the existing servers continue to attempt contacting it 
> at the old IP address. It would be great if the ZK servers could try to 
> re-resolve the hostname when attempting to connect to a lost ZK server, 
> instead of caching the lookup indefinitely. Currently we must do a rolling 
> restart of the ZK ensemble after swapping a node -- which at three nodes 
> means we periodically lose quorum.
> The exact method we are following is to boot new instances in EC2 and attach 
> one, of a set of three, Elastic IP address. External to EC2 this IP address 
> remains the same and maps to whatever instance it is attached to. Internal to 
> EC2, the elastic IP hostname has a TTL of about 45-60 seconds and is remapped 
> to the internal (10.x.y.z) address of the instance it is attached to. 
> Therefore, in our case we would like ZK to pick up the new 10.x.y.z address 
> that the elastic IP hostname gets mapped to and reconnect appropriately.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2125) SSL on Netty client-server communication

2015-03-14 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362113#comment-14362113
 ] 

Camille Fournier commented on ZOOKEEPER-2125:
-

[~michim] tbh not sure there's much I can add unless very specifically 
directed. Any areas you folks are really concerned with having another set of 
eyes on?

> SSL on Netty client-server communication
> 
>
> Key: ZOOKEEPER-2125
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2125
> Project: ZooKeeper
>  Issue Type: Sub-task
>Reporter: Hongchao Deng
>Assignee: Hongchao Deng
> Fix For: 3.5.1
>
> Attachments: ZOOKEEPER-2125-build.patch, ZOOKEEPER-2125.patch, 
> ZOOKEEPER-2125.patch, ZOOKEEPER-2125.patch, ZOOKEEPER-2125.patch, 
> ZOOKEEPER-2125.patch, ZOOKEEPER-2125.patch, ZOOKEEPER-2125.patch, 
> ZOOKEEPER-2125.patch, ZOOKEEPER-2125.patch, ZOOKEEPER-2125.patch, 
> ZOOKEEPER-2125.patch, ZOOKEEPER-2125.patch, ZOOKEEPER-2125.patch, 
> ZOOKEEPER-2125.patch, ZOOKEEPER-2125.patch, ZOOKEEPER-2125.patch, 
> ZOOKEEPER-2125.patch, testKeyStore.jks, testTrustStore.jks
>
>
> Supporting SSL on Netty client-server communication. 
> 1. It supports keystore and truststore usage. 
> 2. It adds an additional ZK server port which supports SSL. This would be 
> useful for rolling upgrade.
> RB: https://reviews.apache.org/r/31277/
> The patch includes three files: 
> * testing purpose keystore and truststore under 
> "$(ZK_REPO_HOME)/src/java/test/data/ssl". Might need to create "ssl/".
> * latest ZOOKEEPER-2125.patch
> h2. How to use it
> You need to set some parameters on both ZK server and client.
> h3. Server
> You need to specify a listening SSL port in "zoo.cfg":
> {code}
> secureClientPort=2281
> {code}
> Just like what you did with "clientPort". And then set some jvm flags:
> {code}
> export 
> SERVER_JVMFLAGS="-Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
>  -Dzookeeper.ssl.keyStore.location=/root/zookeeper/ssl/testKeyStore.jks 
> -Dzookeeper.ssl.keyStore.password=testpass 
> -Dzookeeper.ssl.trustStore.location=/root/zookeeper/ssl/testTrustStore.jks 
> -Dzookeeper.ssl.trustStore.password=testpass"
> {code}
> Please change keystore and truststore parameters accordingly.
> h3. Client
> You need to set jvm flags:
> {code}
> export 
> CLIENT_JVMFLAGS="-Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
>  -Dzookeeper.client.secure=true 
> -Dzookeeper.ssl.keyStore.location=/root/zookeeper/ssl/testKeyStore.jks 
> -Dzookeeper.ssl.keyStore.password=testpass 
> -Dzookeeper.ssl.trustStore.location=/root/zookeeper/ssl/testTrustStore.jks 
> -Dzookeeper.ssl.trustStore.password=testpass"
> {code}
> change keystore and truststore parameters accordingly.
> And then connect to the server's SSL port, in this case:
> {code}
> bin/zkCli.sh -server 127.0.0.1:2281
> {code}
> If you have any feedback, you are more than welcome to discuss it here!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1893) automake: use serial-tests option

2015-03-14 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362108#comment-14362108
 ] 

Camille Fournier commented on ZOOKEEPER-1893:
-

Awesome I will do it [~hdeng]

> automake: use serial-tests option
> -
>
> Key: ZOOKEEPER-1893
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1893
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Reporter: Michi Mutsuzaki
>Assignee: Michi Mutsuzaki
>Priority: Minor
> Fix For: 3.5.1, 3.6.0
>
> Attachments: ZOOKEEPER-1893.patch
>
>
> automake switched to run tests in parallel by default in 1.13, but zktest-st 
> and zktest-mt can't run in parallel. We can use the serial-tests option to 
> run tests serially but this option was introduced in automake 1.12. I don't 
> know which version of automake buidbot has. I'll upload the patch and see.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-1865) Fix retry logic in Learner.connectToLeader()

2015-03-14 Thread Camille Fournier (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Camille Fournier updated ZOOKEEPER-1865:

Attachment: ZOOKEEPER-1865-testfix.patch

fix test, leaves Learner changes as-is

> Fix retry logic in Learner.connectToLeader() 
> -
>
> Key: ZOOKEEPER-1865
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1865
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Reporter: Thawan Kooburat
>Assignee: Edward Carter
> Fix For: 3.5.1
>
> Attachments: ZOOKEEPER-1865-nanoTime.patch, 
> ZOOKEEPER-1865-testfix.patch, ZOOKEEPER-1865.patch
>
>
> We discovered a long leader election time today in one of our prod ensembles.
> Here is a description of the event. 
> Before the old leader goes down, it is able to announce its notification message, 
> so 3 out of 5 machines (including the old leader) elected the old leader to be the 
> new leader for the next epoch. While the old leader is being rebooted, 2 other 
> machines keep trying to connect to the old leader, so the quorum couldn't 
> form until those 2 machines gave up and moved to the next round of leader 
> election.
> This is because Learner.connectToLeader() uses simple retry logic. The 
> contract for this method is that it should never spend longer than initLimit 
> trying to connect to the leader. In our outage, each sock.connect() was 
> probably blocked for initLimit and it is called 5 times.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1865) Fix retry logic in Learner.connectToLeader()

2015-03-14 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362102#comment-14362102
 ] 

Camille Fournier commented on ZOOKEEPER-1865:
-

So, the test as written does not actually exhibit the error we're fixing; if we 
revert the meaningful change you've proposed to Learner it will still pass. I 
updated it a bit to get it to fail with the (mostly) old code (modulo some 
helper methods you wrote), and pass with the new code. Have attached. [~michim] 
if you have a chance to look at this quickly it would be nice to get this into 
3.5.1

> Fix retry logic in Learner.connectToLeader() 
> -
>
> Key: ZOOKEEPER-1865
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1865
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Reporter: Thawan Kooburat
>Assignee: Edward Carter
> Fix For: 3.5.1
>
> Attachments: ZOOKEEPER-1865-nanoTime.patch, ZOOKEEPER-1865.patch
>
>
> We discovered a long leader election time today in one of our prod ensembles.
> Here is a description of the event. 
> Before the old leader goes down, it is able to announce its notification message, 
> so 3 out of 5 machines (including the old leader) elected the old leader to be the 
> new leader for the next epoch. While the old leader is being rebooted, 2 other 
> machines keep trying to connect to the old leader, so the quorum couldn't 
> form until those 2 machines gave up and moved to the next round of leader 
> election.
> This is because Learner.connectToLeader() uses simple retry logic. The 
> contract for this method is that it should never spend longer than initLimit 
> trying to connect to the leader. In our outage, each sock.connect() was 
> probably blocked for initLimit and it is called 5 times.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1865) Fix retry logic in Learner.connectToLeader()

2015-03-14 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362071#comment-14362071
 ] 

Camille Fournier commented on ZOOKEEPER-1865:
-

I'm going to retrigger a build. Can't believe this patch has been open for a 
year...

> Fix retry logic in Learner.connectToLeader() 
> -
>
> Key: ZOOKEEPER-1865
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1865
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Reporter: Thawan Kooburat
>Assignee: Edward Carter
> Fix For: 3.5.1
>
> Attachments: ZOOKEEPER-1865-nanoTime.patch, ZOOKEEPER-1865.patch
>
>
> We discovered a long leader election time today in one of our prod ensembles.
> Here is a description of the event. 
> Before the old leader goes down, it is able to announce its notification message, 
> so 3 out of 5 machines (including the old leader) elected the old leader to be the 
> new leader for the next epoch. While the old leader is being rebooted, 2 other 
> machines keep trying to connect to the old leader, so the quorum couldn't 
> form until those 2 machines gave up and moved to the next round of leader 
> election.
> This is because Learner.connectToLeader() uses simple retry logic. The 
> contract for this method is that it should never spend longer than initLimit 
> trying to connect to the leader. In our outage, each sock.connect() was 
> probably blocked for initLimit and it is called 5 times.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2074) Incorrect exit codes for "./zkCli.sh cmd arg"

2015-03-14 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362066#comment-14362066
 ] 

Camille Fournier commented on ZOOKEEPER-2074:
-

[~michim] did you revert the change? Do we want to revert or fix?

> Incorrect exit codes for "./zkCli.sh cmd arg"
> -
>
> Key: ZOOKEEPER-2074
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2074
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: surendra singh lilhore
>Assignee: surendra singh lilhore
>Priority: Minor
> Fix For: 3.5.1, 3.6.0
>
> Attachments: ZOOKEEPER-2074_1.patch, ZOOKEEPER-2074_2.patch
>
>
> Linux@hghoulaslx406:/> $ZOOKEEPER_HOME/bin/zkCli.sh create /test "test"
> Created /test1
> Linux@hghoulaslx406:/> echo $?
> 0
> Linux@hghoulaslx406:/> $ZOOKEEPER_HOME/bin/zkCli.sh create /test "test"
> Node already exists: /test1
> Linux@hghoulaslx406:/> echo $?
> 0
> Linux@hghoulaslx406:/> $ZOOKEEPER_HOME/bin/zkCli.sh delete /test
> Linux@hghoulaslx406:/> echo $?
> 0
> Linux@hghoulaslx406:/> $ZOOKEEPER_HOME/bin/zkCli.sh delete /test
> Node does not exist: /test1
> Linux@hghoulaslx406:/> echo $?
> 0
> Here, a failed command should return exit code 1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2137) Make testPortChange() less flaky

2015-03-14 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362065#comment-14362065
 ] 

Camille Fournier commented on ZOOKEEPER-2137:
-

I guess my preference [~michim] would be to put in this fix unless we think it 
compromises the usefulness of the test. I'd rather have the test running than 
not if possible. Otherwise I agree with you; I'm not in the loop enough to know 
if the test with this change is still useful.

> Make testPortChange() less flaky
> 
>
> Key: ZOOKEEPER-2137
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2137
> Project: ZooKeeper
>  Issue Type: Sub-task
>Reporter: Hongchao Deng
>Assignee: Hongchao Deng
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2137.patch
>
>
> The cause of flaky failure of testPortChange() is a race in sync().
> I figured out it could take some time to fix sync(). Meanwhile, we can make 
> testPortChange() less flaky by doing reconfig on the leader. We can change 
> this back in the fix of ZOOKEEPER-2136.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1865) Fix retry logic in Learner.connectToLeader()

2015-01-11 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273122#comment-14273122
 ] 

Camille Fournier commented on ZOOKEEPER-1865:
-

I'm not super crazy jazzed with all the inline calls to 
System.currentTimeMillis tbh. Feels like it will be a nightmare to test. Why 
not make it an overridable method to check this invariant?
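
A minimal sketch of that suggestion: hoist the clock behind a protected hook so a test can substitute a fake time source. Class and field names here are illustrative, not the actual Learner code:

{code}
// Illustrative only: an overridable time source instead of scattered
// inline System.currentTimeMillis() calls, so a test can control time.
public class RetryingConnector {
    protected long initLimitMillis = 10000;   // stand-in for the real initLimit

    protected long nowMillis() {
        return System.currentTimeMillis();
    }

    public void connectWithRetries() {
        long deadline = nowMillis() + initLimitMillis;
        while (nowMillis() < deadline) {
            // attempt the socket connect with the remaining time budget,
            // break on success
            break;
        }
    }
}

// A test can then drive time deterministically:
class FakeClockConnector extends RetryingConnector {
    long fakeNow = 0;
    @Override protected long nowMillis() { return fakeNow; }
}
{code}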

> Fix retry logic in Learner.connectToLeader() 
> -
>
> Key: ZOOKEEPER-1865
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1865
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Reporter: Thawan Kooburat
>Assignee: Edward Carter
> Fix For: 3.5.1
>
> Attachments: ZOOKEEPER-1865.patch
>
>
> We discovered a long leader election time today in one of our prod ensembles.
> Here is a description of the event. 
> Before the old leader goes down, it is able to announce its notification message, 
> so 3 out of 5 machines (including the old leader) elected the old leader to be the 
> new leader for the next epoch. While the old leader is being rebooted, 2 other 
> machines keep trying to connect to the old leader, so the quorum couldn't 
> form until those 2 machines gave up and moved to the next round of leader 
> election.
> This is because Learner.connectToLeader() uses simple retry logic. The 
> contract for this method is that it should never spend longer than initLimit 
> trying to connect to the leader. In our outage, each sock.connect() was 
> probably blocked for initLimit and it is called 5 times.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2091) Possible logic error in ClientCnxnSocketNIO

2015-01-11 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273116#comment-14273116
 ] 

Camille Fournier commented on ZOOKEEPER-2091:
-

Yes, [~rakeshr], as [~hdeng] says, the doIO is written to accommodate packets 
that are not completely sent; changing that will change the semantics of that 
method and make it blocking for the duration of a packet. You're right that in 
SendPacket we need to handle that and worrying about blocking is not as 
important, but I don't think this is actually an error in doIO.

> Possible logic error in ClientCnxnSocketNIO
> ---
>
> Key: ZOOKEEPER-2091
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2091
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.6
>Reporter: Cheng
>Assignee: Rakesh R
> Fix For: 3.5.1
>
> Attachments: ZOOKEEPER-2091.patch
>
>
> When SASL authentication is enabled, the ZooKeeper client will finally call 
> ClientCnxnSocketNIO#sendPacket(Packet p) to send a packet to server:
> @Override
> void sendPacket(Packet p) throws IOException {
>     SocketChannel sock = (SocketChannel) sockKey.channel();
>     if (sock == null) {
>         throw new IOException("Socket is null!");
>     }
>     p.createBB();
>     ByteBuffer pbb = p.bb;
>     sock.write(pbb);
> }
> One problem I can see is that the sock is non-blocking, so when the sock's 
> output buffer is full (theoretically), only part of the Packet is sent out and 
> the communication will break.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2099) Using txnlog to sync a learner can corrupt the learner's datatree

2014-12-30 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261533#comment-14261533
 ] 

Camille Fournier commented on ZOOKEEPER-2099:
-

Changing status to get the build to execute with the patch

> Using txnlog to sync a learner can corrupt the learner's datatree
> -
>
> Key: ZOOKEEPER-2099
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2099
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.0, 3.6.0
>Reporter: Santeri (Santtu) Voutilainen
> Attachments: ZOOKEEPER-2099-repro.patch
>
>
> When a learner syncs with the leader, it is possible for the leader to send 
> the learner a DIFF that does NOT contain all the transactions between the 
> learner's zxid and the leader's zxid, thus resulting in a corrupted 
> datatree on the learner.
> For this to occur, the leader must have sync'd with a previous leader using a 
> SNAP and the zxid requested by the learner must still exist in the current 
> leader's txnlog files.
> This issue was introduced by ZOOKEEPER-1413.
> *Scenario*
> A sample sequence in which this issue occurs:
> # Hosts H1 and H2 disconnect from the current leader H3 (crash, network 
> partition, etc).  The last zxid on these hosts is Z1.
> # Additional transactions occur on the cluster resulting in the latest zxid 
> being Z2.
> # Host H1 recovers and connects to H3 to sync and sends Z1 as part of its 
> FOLLOWERINFO or OBSERVERINFO packet.
> # The leader, H3, decides to send a SNAP because a) it does not have the 
> necessary records in the in-mem committed log, AND b) the size of the 
> required txnlog to send is larger than the limit.
> # Host H1 successfully sync's with the leader (H3). At this point H1's 
> txnlogs have records up to and including Z1 as well as Z2 and up.  It does 
> NOT have records between Z1 and Z2.
> # Host H3 fails; a leader election occurs and H1 is chosen as the leader
> # Host H2 recovers and connects to H1 to sync and sends Z1 in its 
> FOLLOWERINFO/OBSERVERINFO packet
> # The leader, H1, determines it can send a DIFF.  It concludes this because 
> although it does not have the necessary records in its in-memory commit log, 
> it does have Z1 in its txnlog and the size of the log is less than the limit. 
>  H1 ends up with a different size calculation than H3 because H1 is missing 
> all the records between Z1 and Z2 so it has less log to send.
> # H2 receives the DIFF and applies the records to its data tree. Depending on 
> the type of transactions that occurred between Z1 and Z2 it may not hit any 
> errors when applying these records.
> H2 now has a corrupted view of the data tree because it is missing all the 
> changes made by the transactions between Z1 and Z2.
> *Recovery*
> The way to recover from this situation is to delete the data/snap directory 
> contents from the affected hosts and have them resync with the leader at 
> which point they will receive a SNAP since they will appear as empty hosts.
> *Workaround*
> A quick workaround for anyone concerned about this issue is to disable sync 
> from the txnlog by changing the database size limit to 0.  This is a code 
> change as it is not a configurable setting.
> *Potential fixes*
> There are several ways of fixing this.  A few of options:
> * Delete all snaps and txnlog files on a host when it receives a SNAP from 
> the leader
> * Invalidate sync from txnlog after receiving a SNAP. This state must also be 
> persisted on-disk so that the txnlogs with the gap cannot be used to provide 
> a DIFF even after restart.  A couple ways in which the state could be 
> persisted:
> ** Write a file (for example: loggap.) in the data dir indicating that 
> the host was sync'd with a SNAP and thus txnlogs might be missing. Presence 
> of these files would be checked when reading txnlogs.
> ** Write a new record into the txnlog file as "sync'd-by-snap-from-leader" 
> marker. Readers of the txnlog would then check for presence of this record 
> when iterating through it and act appropriately.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2098) QuorumCnxManager: use BufferedOutputStream for initial msg

2014-12-30 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261529#comment-14261529
 ] 

Camille Fournier commented on ZOOKEEPER-2098:
-

We've got a test failure; is it related to this patch?

> QuorumCnxManager: use BufferedOutputStream for initial msg
> --
>
> Key: ZOOKEEPER-2098
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2098
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: quorum, server
>Affects Versions: 3.5.0
>Reporter: Raul Gutierrez Segales
>Assignee: Raul Gutierrez Segales
> Fix For: 3.5.1
>
> Attachments: ZOOKEEPER-2098.patch
>
>
> Whilst writing fle-dump (a tool like 
> [zk-dump|https://github.com/twitter/zktraffic/], but to dump 
> FastLeaderElection messages), I noticed that QCM is using DataOutputStream 
> (which doesn't buffer) directly.
> So all calls to write() go immediately to the network, which means 
> simple messages like two participants exchanging Votes can take a couple of 
> RTTs! This is especially terrible for global clusters (i.e. cross-country RTTs).
> The solution is to use BufferedOutputStream for the initial negotiation 
> between members of the cluster. Note that there are other places where 
> suboptimal (but not entirely unbuffered) writes to the network still exist. 
> I'll get to those in separate tickets.
> After using BufferedOutputStream we get only 1 RTT for the initial message, 
> so elections and the time for participants to join a cluster are reduced.
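
For illustration, the general shape of that change: wrap the initial-message stream in a BufferedOutputStream and flush once, so the handshake fields leave in a single segment. A minimal sketch; the field names in the usage comment are assumptions, not the actual QCM protocol layout:

{code}
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.Socket;

public final class InitialMessageStreams {
    private InitialMessageStreams() {}

    // Buffer the many small write() calls of the initial negotiation so they
    // are sent in one buffered write on flush(), instead of hitting the
    // network immediately on every call.
    public static DataOutputStream buffered(Socket sock) throws IOException {
        return new DataOutputStream(new BufferedOutputStream(sock.getOutputStream()));
    }
}

// Usage sketch (field names are illustrative):
//   DataOutputStream dout = InitialMessageStreams.buffered(sock);
//   dout.writeLong(protocolVersion);
//   dout.writeLong(serverId);
//   dout.flush();   // the buffered bytes go out here, in one go
{code}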



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2091) Possible logic error in ClientCnxnSocketNIO

2014-12-30 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261525#comment-14261525
 ] 

Camille Fournier commented on ZOOKEEPER-2091:
-

Any more thoughts on this?

> Possible logic error in ClientCnxnSocketNIO
> ---
>
> Key: ZOOKEEPER-2091
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2091
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.6
>Reporter: Cheng
>Assignee: Rakesh R
> Fix For: 3.5.1
>
> Attachments: ZOOKEEPER-2091.patch
>
>
> When SASL authentication is enabled, the ZooKeeper client will finally call 
> ClientCnxnSocketNIO#sendPacket(Packet p) to send a packet to server:
> @Override
> void sendPacket(Packet p) throws IOException {
>     SocketChannel sock = (SocketChannel) sockKey.channel();
>     if (sock == null) {
>         throw new IOException("Socket is null!");
>     }
>     p.createBB();
>     ByteBuffer pbb = p.bb;
>     sock.write(pbb);
> }
> One problem I can see is that the sock is non-blocking, so when the sock's 
> output buffer is full (theoretically), only part of the Packet is sent out and 
> the communication will break.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2091) Possible logic error in ClientCnxnSocketNIO

2014-12-12 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244892#comment-14244892
 ] 

Camille Fournier commented on ZOOKEEPER-2091:
-

I agree [~hdeng] but I'm not sure what conclusion you're driving at. From my 
observation, we can do one of two obvious things:
1) Only change SendPacket, not doIO, which I *think* I agree should solve the 
observed problem with less impact than the current patch

2) Something more drastic to actually fix the hack that is our current Sasl 
hacks, which I don't have a patch to do and I'm not sure if we have any other 
open tickets that will resolve this.

If we think solution 1 solves the issue, I think it is a simple enough fix to 
go ahead and use while we look into Netty etc.

> Possible logic error in ClientCnxnSocketNIO
> ---
>
> Key: ZOOKEEPER-2091
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2091
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.6
>Reporter: Cheng
>Assignee: Rakesh R
> Fix For: 3.5.1
>
> Attachments: ZOOKEEPER-2091.patch
>
>
> When SASL authentication is enabled, the ZooKeeper client will finally call 
> ClientCnxnSocketNIO#sendPacket(Packet p) to send a packet to server:
> @Override
> void sendPacket(Packet p) throws IOException {
>     SocketChannel sock = (SocketChannel) sockKey.channel();
>     if (sock == null) {
>         throw new IOException("Socket is null!");
>     }
>     p.createBB();
>     ByteBuffer pbb = p.bb;
>     sock.write(pbb);
> }
> One problem I can see is that the sock is non-blocking, so when the sock's 
> output buffer is full (theoretically), only part of the Packet is sent out and 
> the communication will break.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2091) Possible logic error in ClientCnxnSocketNIO

2014-12-12 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244861#comment-14244861
 ] 

Camille Fournier commented on ZOOKEEPER-2091:
-

I'm all the way in the bottom of this code and don't even remember the 
invariants. Why don't we wait till all the bytes are flushed to the buffer if 
we don't read anything until that's done?

FWIW, I looked into the depths of Netty's NIO socket handling, in particular 
to verify that Netty sockets will sometimes switch to reading before all of the 
outgoing bytes are written, which they will. I thought it was interesting 
though that they also configure a "writeSpinCount" that lets them try to flush 
all bytes a configurable number of times before going back to the selector to 
do other IO. Might be something worth considering. Or, you know, not doing this 
socket stuff ourselves at all and just using netty :)
https://github.com/netty/netty/blob/0eb059bf58642f3c06144e1ea4a9d6c7632eb4d5/transport/src/main/java/io/netty/channel/nio/AbstractNioByteChannel.java#L198

In conclusion, I think that it may not matter much either way but probably we 
should only add this to the SendPacket method.

> Possible logic error in ClientCnxnSocketNIO
> ---
>
> Key: ZOOKEEPER-2091
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2091
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.6
>Reporter: Cheng
>Assignee: Rakesh R
> Fix For: 3.5.1
>
> Attachments: ZOOKEEPER-2091.patch
>
>
> When SASL authentication is enabled, the ZooKeeper client will finally call 
> ClientCnxnSocketNIO#sendPacket(Packet p) to send a packet to server:
> @Override
> void sendPacket(Packet p) throws IOException {
>     SocketChannel sock = (SocketChannel) sockKey.channel();
>     if (sock == null) {
>         throw new IOException("Socket is null!");
>     }
>     p.createBB();
>     ByteBuffer pbb = p.bb;
>     sock.write(pbb);
> }
> One problem I can see is that the sock is non-blocking, so when the sock's 
> output buffer is full (theoretically), only part of the Packet is sent out and 
> the communication will break.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2091) Possible logic error in ClientCnxnSocketNIO

2014-12-12 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244659#comment-14244659
 ] 

Camille Fournier commented on ZOOKEEPER-2091:
-

I agree that it seems like only sending the full packet in 
ClientCnxnSocketNIO#sendPacket() would fix this issue. We don't do this in doIO 
presumably because we might send only part of the buffer and then allow another 
read to come in before we send the rest of the packet, so as not to block on 
the complete send of the pending outgoing packet? Is that the correct reasoning 
for the nonblocking socket implementation? Can someone verify?
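
For the sake of discussion, the usual non-blocking pattern I have in mind looks 
roughly like this (placeholder names, not our actual doIO code):
{noformat}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.SocketChannel;

// Hypothetical illustration: write whatever fits now, keep watching
// OP_WRITE for the remainder, and let reads interleave between partial
// writes instead of blocking behind a slow send.
final class PartialWriteExample {
    static void handleIO(SelectionKey key, SocketChannel sock,
                         ByteBuffer pending, ByteBuffer readBuffer)
            throws IOException {
        if (key.isWritable() && pending != null && pending.hasRemaining()) {
            sock.write(pending); // may send only part of the packet
            if (!pending.hasRemaining()) {
                // packet fully sent; stop asking the selector about writes
                key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
            }
        }
        if (key.isReadable()) {
            sock.read(readBuffer); // a read can happen between partial writes
        }
    }
}
{noformat}
If that is in fact the invariant doIO is protecting, then only changing 
sendPacket (the path the SASL handshake goes through, per the description 
below) shouldn't disturb it.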

> Possible logic error in ClientCnxnSocketNIO
> ---
>
> Key: ZOOKEEPER-2091
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2091
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.6
>Reporter: Cheng
>Assignee: Rakesh R
> Fix For: 3.5.1
>
> Attachments: ZOOKEEPER-2091.patch
>
>
> When SASL authentication is enabled, the ZooKeeper client will finally call 
> ClientCnxnSocketNIO#sendPacket(Packet p) to send a packet to the server:
> @Override
> void sendPacket(Packet p) throws IOException {
>     SocketChannel sock = (SocketChannel) sockKey.channel();
>     if (sock == null) {
>         throw new IOException("Socket is null!");
>     }
>     p.createBB();
>     ByteBuffer pbb = p.bb;
>     sock.write(pbb);
> }
> One problem I can see is that the sock is non-blocking, so when the sock's 
> output buffer is full (theoretically), only part of the Packet is sent out and 
> the communication will break.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2091) Possible logic error in ClientCnxnSocketNIO

2014-12-10 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241475#comment-14241475
 ] 

Camille Fournier commented on ZOOKEEPER-2091:
-

This seems right... can someone seeing this error try this patch and see if it 
fixes it?

> Possible logic error in ClientCnxnSocketNIO
> ---
>
> Key: ZOOKEEPER-2091
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2091
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.6
>Reporter: Cheng
>Assignee: Rakesh R
> Fix For: 3.5.1
>
> Attachments: ZOOKEEPER-2091.patch
>
>
> When SASL authentication is enabled, the ZooKeeper client will finally call 
> ClientCnxnSocketNIO#sendPacket(Packet p) to send a packet to the server:
> @Override
> void sendPacket(Packet p) throws IOException {
>     SocketChannel sock = (SocketChannel) sockKey.channel();
>     if (sock == null) {
>         throw new IOException("Socket is null!");
>     }
>     p.createBB();
>     ByteBuffer pbb = p.bb;
>     sock.write(pbb);
> }
> One problem I can see is that the sock is non-blocking, so when the sock's 
> output buffer is full (theoretically), only part of the Packet is sent out and 
> the communication will break.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2066) Updates to README.txt

2014-10-24 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183065#comment-14183065
 ] 

Camille Fournier commented on ZOOKEEPER-2066:
-

I think it's part of the build server magic. Fortunately, the committers 
generally actually read what the patch is about and don't decide purely based 
on what Jenkins tells us.

> Updates to README.txt
> -
>
> Key: ZOOKEEPER-2066
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2066
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Wendy Smoak
>Priority: Minor
> Attachments: ZOOKEEPER-2066.diff
>
>
> Updates to README.txt
>  - first reference should be to Apache ZooKeeper
>  - fix obsolete ibiblio-rsync-repository url
>  - better describe the release process
>  - minor grammar and punctuation changes



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2066) Updates to README.txt

2014-10-24 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183003#comment-14183003
 ] 

Camille Fournier commented on ZOOKEEPER-2066:
-

+1 thanks for contributing [~wsmoak]


> Updates to README.txt
> -
>
> Key: ZOOKEEPER-2066
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2066
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Wendy Smoak
>Priority: Minor
> Attachments: ZOOKEEPER-2066.diff
>
>
> Updates to README.txt
>  - first reference should be to Apache ZooKeeper
>  - fix obsolete ibiblio-rsync-repository url
>  - better describe the release process
>  - minor grammar and punctuation changes



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1863) Race condition in commit processor leading to out of order request completion, xid mismatch on client.

2014-07-15 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062724#comment-14062724
 ] 

Camille Fournier commented on ZOOKEEPER-1863:
-

Awesome. Checked this in to trunk. Thanks [~fpj] and [~dutch] and [~rgs] and 
everyone else who helped!

> Race condition in commit processor leading to out of order request 
> completion, xid mismatch on client.
> --
>
> Key: ZOOKEEPER-1863
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1863
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.0
>Reporter: Dutch T. Meyer
>Assignee: Dutch T. Meyer
>Priority: Blocker
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1863.patch, ZOOKEEPER-1863.patch, 
> ZOOKEEPER-1863.patch, ZOOKEEPER-1863.patch, ZOOKEEPER-1863.patch, 
> ZOOKEEPER-1863.patch, ZOOKEEPER-1863.patch, stack.17512
>
>
> In CommitProcessor.java processor, if we are at the primary request handler 
> on line 167:
> {noformat}
> while (!stopped && !isWaitingForCommit() &&
>        !isProcessingCommit() &&
>        (request = queuedRequests.poll()) != null) {
>     if (needCommit(request)) {
>         nextPending.set(request);
>     } else {
>         sendToNextProcessor(request);
>     }
> }
> {noformat}
> A request can be handled in this block and be quickly processed and completed 
> on another thread. If queuedRequests is empty, we then exit the block. Next, 
> before this thread makes any more progress, we can get 2 more requests, one 
> get_children(say), and a sync placed on queuedRequests for the processor. 
> Then, if we are very unlucky, the sync request can complete and this object's 
> commit() routine is called (from FollowerZookeeperServer), which places the 
> sync request on the previously empty committedRequests queue. At that point, 
> this thread continues.
> We reach line 182, which is a check on sync requests.
> {noformat}
> if (!stopped && !isProcessingRequest() &&
>         (request = committedRequests.poll()) != null) {
> {noformat}
> Here we are not processing any requests, because the original request has 
> completed. We haven't dequeued either the read or the sync request in this 
> processor. Next, the poll above will pull the sync request off the queue, and 
> in the following block, the sync will get forwarded to the next processor.
> This is a problem because the read request hasn't been forwarded yet, so 
> requests are now out of order.
> I've been able to reproduce this bug reliably by injecting a 
> Thread.sleep(5000) between the two blocks above to make the race condition 
> far more likely, and then running the following in a client program:
> {noformat}
> zoo_aget_children(zh, "/", 0, getchildren_cb, NULL);
> //Wait long enough for queuedRequests to drain
> sleep(1);
> zoo_aget_children(zh, "/", 0, getchildren_cb, &th_ctx[0]);
> zoo_async(zh, "/", sync_cb, &th_ctx[0]);
> {noformat}
> When this bug is triggered, 3 things can happen:
> 1) Clients will see requests complete out of order and fail on xid mismatches.
> 2) Kazoo in particular doesn't handle this runtime exception well, and can 
> orphan outstanding requests.
> 3) I've seen zookeeper servers deadlock, likely because the commit cannot be 
> completed, which can wedge the commit processor.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (ZOOKEEPER-1863) Race condition in commit processor leading to out of order request completion, xid mismatch on client.

2014-07-15 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062589#comment-14062589
 ] 

Camille Fournier commented on ZOOKEEPER-1863:
-

Just to clarify [~fpj] this is just a patch so we can write a test, not a fix 
for the issue?

> Race condition in commit processor leading to out of order request 
> completion, xid mismatch on client.
> --
>
> Key: ZOOKEEPER-1863
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1863
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.0
>Reporter: Dutch T. Meyer
>Assignee: Dutch T. Meyer
>Priority: Blocker
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1863.patch, ZOOKEEPER-1863.patch, 
> ZOOKEEPER-1863.patch, ZOOKEEPER-1863.patch, ZOOKEEPER-1863.patch, 
> ZOOKEEPER-1863.patch, ZOOKEEPER-1863.patch, stack.17512
>
>
> In CommitProcessor.java processor, if we are at the primary request handler 
> on line 167:
> {noformat}
> while (!stopped && !isWaitingForCommit() &&
>        !isProcessingCommit() &&
>        (request = queuedRequests.poll()) != null) {
>     if (needCommit(request)) {
>         nextPending.set(request);
>     } else {
>         sendToNextProcessor(request);
>     }
> }
> {noformat}
> A request can be handled in this block and be quickly processed and completed 
> on another thread. If queuedRequests is empty, we then exit the block. Next, 
> before this thread makes any more progress, we can get 2 more requests, one 
> get_children(say), and a sync placed on queuedRequests for the processor. 
> Then, if we are very unlucky, the sync request can complete and this object's 
> commit() routine is called (from FollowerZookeeperServer), which places the 
> sync request on the previously empty committedRequests queue. At that point, 
> this thread continues.
> We reach line 182, which is a check on sync requests.
> {noformat}
> if (!stopped && !isProcessingRequest() &&
>         (request = committedRequests.poll()) != null) {
> {noformat}
> Here we are not processing any requests, because the original request has 
> completed. We haven't dequeued either the read or the sync request in this 
> processor. Next, the poll above will pull the sync request off the queue, and 
> in the following block, the sync will get forwarded to the next processor.
> This is a problem because the read request hasn't been forwarded yet, so 
> requests are now out of order.
> I've been able to reproduce this bug reliably by injecting a 
> Thread.sleep(5000) between the two blocks above to make the race condition 
> far more likely, and then running the following in a client program:
> {noformat}
> zoo_aget_children(zh, "/", 0, getchildren_cb, NULL);
> //Wait long enough for queuedRequests to drain
> sleep(1);
> zoo_aget_children(zh, "/", 0, getchildren_cb, &th_ctx[0]);
> zoo_async(zh, "/", sync_cb, &th_ctx[0]);
> {noformat}
> When this bug is triggered, 3 things can happen:
> 1) Clients will see requests complete out of order and fail on xid mismatches.
> 2) Kazoo in particular doesn't handle this runtime exception well, and can 
> orphan outstanding requests.
> 3) I've seen zookeeper servers deadlock, likely because the commit cannot be 
> completed, which can wedge the commit processor.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (ZOOKEEPER-1955) EOFException on Reading Snapshot

2014-07-04 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052747#comment-14052747
 ] 

Camille Fournier commented on ZOOKEEPER-1955:
-

It looks like the problem may actually be in the log file, from your report:
2014-07-04 12:58:52,896 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@575] 
- Created new input stream /var/lib/zookeeper/version-2/log.30021
2014-07-04 12:58:52,915 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@578] 
- Created new input archive /var/lib/zookeeper/version-2/log.30021
2014-07-04 12:59:25,870 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@618] 
- EOF excepton java.io.EOFException: Failed to read 
/var/lib/zookeeper/version-2/log.30021

I can load a ZK fine with the snapshot you've provided. Do you have the log 
file or other data?

> EOFException on Reading Snapshot
> 
>
> Key: ZOOKEEPER-1955
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1955
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.5
>Reporter: Aaron Zimmerman
> Attachments: snapshot
>
>
> We have a 5 node zookeeper cluster that has been operating normally for 
> several months.  Starting a few days ago, the entire cluster crashes a few 
> times per day, all nodes at the exact same time.  We can't track down the 
> exact issue, but deleting the snapshots and logs and restarting allows the 
> cluster to come back up.  
> We are running exhibitor to monitor the cluster.  
> It appears that something bad gets into the logs, causing an EOFException and 
> this cascades through the entire cluster:
> 2014-07-04 12:55:26,328 [myid:1] - WARN  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when 
> following the leader
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
> at 
> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
> at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
> 2014-07-04 12:55:26,328 [myid:1] - INFO  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
> java.lang.Exception: shutdown Follower
> at 
> org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
> Then the server dies, exhibitor tries to restart each node, and they all get 
> stuck trying to replay the bad transaction, logging things like:
>  
> 2014-07-04 12:58:52,734 [myid:1] - INFO  [main:FileSnap@83] - Reading 
> snapshot /var/lib/zookeeper/version-2/snapshot.300011fc0
> 2014-07-04 12:58:52,896 [myid:1] - DEBUG 
> [main:FileTxnLog$FileTxnIterator@575] - Created new input stream 
> /var/lib/zookeeper/version-2/log.30021
> 2014-07-04 12:58:52,915 [myid:1] - DEBUG 
> [main:FileTxnLog$FileTxnIterator@578] - Created new input archive 
> /var/lib/zookeeper/version-2/log.30021
> 2014-07-04 12:59:25,870 [myid:1] - DEBUG 
> [main:FileTxnLog$FileTxnIterator@618] - EOF excepton java.io.EOFException: 
> Failed to read /var/lib/zookeeper/version-2/log.30021
> 2014-07-04 12:59:25,871 [myid:1] - DEBUG 
> [main:FileTxnLog$FileTxnIterator@575] - Created new input stream 
> /var/lib/zookeeper/version-2/log.300011fc2
> 2014-07-04 12:59:25,872 [myid:1] - DEBUG 
> [main:FileTxnLog$FileTxnIterator@578] - Created new input archive 
> /var/lib/zookeeper/version-2/log.300011fc2
> 2014-07-04 12:59:48,722 [myid:1] - DEBUG 
> [main:FileTxnLog$FileTxnIterator@618] - EOF excepton java.io.EOFException: 
> Failed to read /var/lib/zookeeper/version-2/log.300011fc2
> And the cluster is dead.  The only way we have found to recover is to delete 
> all of the data and restart.
> [~fournc] Appreciate any assistance you can offer.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (ZOOKEEPER-1900) NullPointerException in truncate

2014-06-30 Thread Camille Fournier (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Camille Fournier updated ZOOKEEPER-1900:


Attachment: ZOOKEEPER-190034v2.patch

>  NullPointerException in truncate
> -
>
> Key: ZOOKEEPER-1900
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1900
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.5, 3.4.6
> Environment: linux java 1.6
>Reporter: Steven Bower
>Assignee: Camille Fournier
>Priority: Blocker
> Fix For: 3.4.7, 3.5.0
>
> Attachments: ZOOKEEPER-1900-34.patch, ZOOKEEPER-1900.patch, 
> ZOOKEEPER-190034v2.patch, ZOOKEEPER-1900v2.patch
>
>
> The other day we started up a ZK instance that had been down for a bit (1day) 
> and started getting NPEs all over the place...
> {noformat}
> 2014-20-03 11:15:42.320 INFO  QuorumPeerConfig [main] - Reading configuration 
> from: /xxx/bin/zk/etc/zk.cfg
> 2014-20-03 11:15:42.350 INFO  QuorumPeerConfig [main] - Defaulting to 
> majority quorums
> 2014-20-03 11:15:42.353 INFO  DatadirCleanupManager [main] - 
> autopurge.snapRetainCount set to 3
> 2014-20-03 11:15:42.353 INFO  DatadirCleanupManager [main] - 
> autopurge.purgeInterval set to 0
> 2014-20-03 11:15:42.353 INFO  DatadirCleanupManager [main] - Purge task is 
> not scheduled.
> 2014-20-03 11:15:42.385 INFO  QuorumPeerMain [main] - Starting quorum peer
> 2014-20-03 11:15:42.399 INFO  NIOServerCnxnFactory [main] - binding to port 
> 0.0.0.0/0.0.0.0:
> 2014-20-03 11:15:42.413 INFO  QuorumPeer [main] - tickTime set to 2000
> 2014-20-03 11:15:42.413 INFO  QuorumPeer [main] - minSessionTimeout set to -1
> 2014-20-03 11:15:42.413 INFO  QuorumPeer [main] - maxSessionTimeout set to -1
> 2014-20-03 11:15:42.413 INFO  QuorumPeer [main] - initLimit set to 10
> 2014-20-03 11:15:42.456 INFO  FileSnap [main] - Reading snapshot 
> /xxx/zk_data/version-2/snapshot.2c
> 2014-20-03 11:15:42.463 INFO  QuorumCnxManager [Thread-3] - My election bind 
> port: 0.0.0.0/0.0.0.0:7555
> 2014-20-03 11:15:42.470 INFO  QuorumPeer 
> [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:] - LOOKING
> 2014-20-03 11:15:42.471 INFO  FastLeaderElection 
> [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:] - New election. My id =  3, 
> proposed zxid=0x8000
> 2014-20-03 11:15:42.479 INFO  FastLeaderElection [WorkerReceiver[myid=3]] - 
> Notification: 2 (n.leader), 0x2b0002 (n.zxid), 0x2c (n.round), FOLLOWING 
> (n.state), 1 (n.sid), 0x2b (n.peerEPoch), LOOKING (my state)
> 2014-20-03 11:15:42.479 INFO  FastLeaderElection [WorkerReceiver[myid=3]] - 
> Notification: 2 (n.leader), 0x2b0002 (n.zxid), 0x2c (n.round), FOLLOWING 
> (n.state), 1 (n.sid), 0x2b (n.peerEPoch), LOOKING (my state)
> 2014-20-03 11:15:42.482 INFO  QuorumCnxManager [WorkerSender[myid=3]] - Have 
> smaller server identifier, so dropping the connection: (5, 3)
> 2014-20-03 11:15:42.482 INFO  FastLeaderElection [WorkerReceiver[myid=3]] - 
> Notification: 2 (n.leader), 0x2b0002 (n.zxid), 0x2c (n.round), LEADING 
> (n.state), 2 (n.sid), 0x2b (n.peerEPoch), LOOKING (my state)
> 2014-20-03 11:15:42.482 INFO  FastLeaderElection [WorkerReceiver[myid=3]] - 
> Notification: 2 (n.leader), 0x2b0002 (n.zxid), 0x2c (n.round), LEADING 
> (n.state), 2 (n.sid), 0x2b (n.peerEPoch), LOOKING (my state)
> 2014-20-03 11:15:42.482 INFO  QuorumPeer 
> [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:] - OBSERVING
> 2014-20-03 11:15:42.486 INFO  Learner 
> [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:] - TCP NoDelay set to: true
> 2014-20-03 11:15:42.488 INFO  QuorumCnxManager [host1/###.###.###.###:7555] - 
> Received connection request /###.###.###.###:64528
> 2014-20-03 11:15:42.490 INFO  ZooKeeperServer 
> [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:] - Server 
> environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
> 2014-20-03 11:15:42.490 INFO  ZooKeeperServer 
> [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:] - Server environment:host.name=host1
> 2014-20-03 11:15:42.490 INFO  ZooKeeperServer 
> [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:] - Server 
> environment:java.version=1.6.0_20
> 2014-20-03 11:15:42.490 INFO  ZooKeeperServer 
> [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:] - Server 
> environment:java.vendor=Sun Microsystems Inc.
> 2014-20-03 11:15:42.490 INFO  ZooKeeperServer 
> [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:] - Server 
> environment:java.home=/xxx/util/common/jdk1.6.0_20_64bit/jre
> 2014-20-03 11:15:42.490 INFO  ZooKeeperServer 
> [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:] - Server 
> environment:java.class.path=/xxx/bin/zk/etc:/xxx/bin/zk/lib/slf4j-log4j12-1.7.2.jar:/xxx/bin/zk/lib/jline-0.9.94.jar:/xxx/bin/zk/lib/jul-to-slf4j-1.7.2.jar:/xxx/bin/zk/lib/ZooInspector-3.4.5.jar:/xxx/bin/zk/lib/jcl-over-slf4j-1.7.2.jar:/xxx/bin/zk/lib/log4j-1.2.17.jar:/xxx/bin/zk/lib/zookeeper-3.4
