[jira] [Commented] (ZOOKEEPER-1633) Introduce a protocol version to connection initiation message

2013-02-26 Thread Alexander Shraer (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586920#comment-13586920
 ] 

Alexander Shraer commented on ZOOKEEPER-1633:
-

Sure, I can add this, but just to clarify: this is 3.4.6-only code (it 
will only run during an upgrade to 3.5.0). The test is 3.4.6-only too. I can 
add a test that just makes sure that the receiving server finds the connecting 
server's id in the stream and ignores the rest. If you were thinking of 
something different, please let me know.

> Introduce a protocol version to connection initiation message
> -
>
> Key: ZOOKEEPER-1633
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1633
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Reporter: Alexander Shraer
>Assignee: Alexander Shraer
> Fix For: 3.4.6
>
> Attachments: ZOOKEEPER-1633.patch
>
>
> Currently the first message a server sends to another server includes just 
> one field - the server's id (long). This is in QuorumCnxManager.java. This 
> makes changes to the information passed during this initial connection very 
> difficult. This patch will change the first field of the message to be a 
> protocol version (a negative number that can't be a server id). The second 
> field will be the server id. The third field is the number of bytes in the 
> remainder of the message. A 3.4 server will read the first field as before, 
> but if this is a negative number it will read the second field to find the 
> server id, and then remove the remainder of the message from the stream. This 
> will not affect 3.4 since 3.4 and earlier servers send just the server id (so 
> the code in the patch will not run unless there is a server > 3.4 trying to 
> connect). This will, however, provide the necessary flexibility for future 
> releases as well as an upgrade path from 3.4.
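
The receive path described above can be sketched roughly as follows. This is 
only an illustration, not the code in the attached patch: the class name, the 
PROTOCOL_VERSION value, and the choice of a 4-byte int for the length field 
are assumptions, and the sketch works on an in-memory ByteBuffer rather than 
on a socket.

```java
import java.nio.ByteBuffer;

final class HandshakeSketch {
    // Hypothetical version marker: any negative value works, since server ids are non-negative.
    static final long PROTOCOL_VERSION = -65536L;

    /** New-style initial message: version, server id, length of remainder, remainder. */
    static ByteBuffer encode(long serverId, byte[] remainder) {
        ByteBuffer buf = ByteBuffer.allocate(8 + 8 + 4 + remainder.length);
        buf.putLong(PROTOCOL_VERSION);
        buf.putLong(serverId);
        buf.putInt(remainder.length); // assumed here to be a 4-byte int
        buf.put(remainder);
        buf.flip();
        return buf;
    }

    /** What a 3.4 receiver would do: find the server id and discard the rest. */
    static long readServerId(ByteBuffer in) {
        long first = in.getLong();
        if (first >= 0) {
            return first; // old format: the only field is the server id
        }
        long serverId = in.getLong();
        int remaining = in.getInt();
        if (remaining < 0 || remaining > in.remaining()) {
            throw new IllegalArgumentException("bad remainder length: " + remaining);
        }
        in.position(in.position() + remaining); // skip the part 3.4 does not understand
        return serverId;
    }

    public static void main(String[] args) {
        ByteBuffer oldStyle = ByteBuffer.allocate(8);
        oldStyle.putLong(3L);
        oldStyle.flip();
        System.out.println(readServerId(oldStyle));                         // prints 3
        System.out.println(readServerId(encode(7L, new byte[] {1, 2, 3}))); // prints 7
    }
}
```

Because the first field is negative only for newer senders, a message from a 
3.4 or earlier server takes the early return and is handled exactly as before.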

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1618) Disconnected event when stopping leader process

2013-02-26 Thread Peter Nerg (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586925#comment-13586925
 ] 

Peter Nerg commented on ZOOKEEPER-1618:
---

{quote}The client doesn't need to know that a member of the ensemble is gone. A 
client needs to know that it needs to find another server that is either 
following or leading before its session expires, otherwise it might lose 
ephemerals and such. The client learns it through the disconnected event and it 
is not important to the client the precise reason.{quote}

So what you're saying is that killing a leader requires more action on behalf 
of the client hence the client needs to be notified via a disconnected event.
I'm starting to feel slightly daft but I don't see the difference with the 
scenario that you kill a follower. Any clients attached to the killed instance 
will also have to migrate to a new alive ZK instance (leader or follower).

Though I guess your answer lies in:
{quote}I'm pointing out that a server cannot distinguish a situation in which 
servers are partitioned away from each other for hours, and therefore there is 
no leader, from one in which a single server is partitioned away and the rest 
of the ensemble is making progress.{quote}
As I gather then the key point is that the client has no way to see the 
difference between a cluster partition and a temporary loss of the leader.
Now we're getting somewhere, perhaps even my thick skull starts to get the 
picture...:)

So if this is how it behaves due to the explanation above then I got the 
answers I wanted.
Though I then expect this to be appropriately documented to avoid future 
confusion.
Do you want me to create a new documentation bug or will you just re-use this 
one?


> Disconnected event when stopping leader process
> ---
>
> Key: ZOOKEEPER-1618
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1618
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.4, 3.4.5
> Environment: Linux SLES
> java version "1.6.0_31"
>Reporter: Peter Nerg
>Priority: Minor
>
> Running a three node ZK cluster I stop/kill the leader node.
> Immediately all connected clients will receive a Disconnected event, a second 
> or so later an event with SyncConnected is received.
> Killing a follower will not produce the same issue/event.
> The application/clients have been implemented to manage Disconnected events 
> so they survive.
> I however expected the ZK client to manage the hiccup during the election 
> process. 
> This produces quite a lot of logging in large clusters that have many 
> services relying on ZK.
> In some cases we may lose a few requests as we need a working ZK cluster to 
> execute those requests.
> IMHO it's not really full high availability if the ZK cluster momentarily 
> takes a dive because the leader goes away.
> No matter how much redundancy one uses in form of ZK instances one still may 
> get processing errors during leader election.
> I've verified this behavior in both 3.4.4 and 3.4.5
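
For readers following the thread, the client-side handling discussed here can 
be sketched along these lines. It is a simplified illustration that does not 
use the ZooKeeper client library: the State enum is a hypothetical stand-in 
for the states a watcher is told about, and the class and method names are 
made up for this example.

```java
final class ConnectionStateSketch {
    /** Hypothetical stand-in for the connection states a ZooKeeper watcher reports. */
    enum State { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

    private boolean connected = false;
    private boolean sessionLost = false;

    void onEvent(State state) {
        switch (state) {
            case SYNC_CONNECTED:
                connected = true;   // attached to a serving ensemble member again
                break;
            case DISCONNECTED:
                connected = false;  // hold requests; the session may still be alive
                break;
            case EXPIRED:
                connected = false;
                sessionLost = true; // ephemerals are gone; a new session is needed
                break;
        }
    }

    boolean canSendRequests() { return connected && !sessionLost; }

    boolean sessionLost() { return sessionLost; }

    public static void main(String[] args) {
        ConnectionStateSketch client = new ConnectionStateSketch();
        client.onEvent(State.SYNC_CONNECTED);
        client.onEvent(State.DISCONNECTED);           // e.g. the leader was stopped
        System.out.println(client.canSendRequests()); // prints false
        client.onEvent(State.SYNC_CONNECTED);         // reconnected before the session expired
        System.out.println(client.canSendRequests()); // prints true
    }
}
```

The point of the sketch is that a Disconnected event is a signal to pause, not 
a failure in itself; only session expiry means ephemerals have been lost.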



[jira] [Commented] (ZOOKEEPER-1633) Introduce a protocol version to connection initiation message

2013-02-26 Thread Alexander Shraer (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586939#comment-13586939
 ] 

Alexander Shraer commented on ZOOKEEPER-1633:
-

What I'm trying to say is that a unit test like this will test something very 
limited (that the very beginning of an upgrade process works). I don't mind 
adding it; I just think this is better tested by the separate upgrade testing 
that's done before releases.



[jira] [Commented] (ZOOKEEPER-1640) dynamically load command objects in zk

2013-02-26 Thread Tian Hong Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586943#comment-13586943
 ] 

Tian Hong Wang commented on ZOOKEEPER-1640:
---

Ted, maybe you are correct. So close this issue.

> dynamically load command objects in zk
> --
>
> Key: ZOOKEEPER-1640
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1640
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: java client
>Reporter: Tian Hong Wang
>Assignee: Tian Hong Wang
>Priority: Minor
>  Labels: patch
> Fix For: 3.4.5
>
> Attachments: zookeeper.patch
>
>
> In class org.apache.zookeeper.ZooKeeperMain.java,
> new CloseCommand().addToMap(commandMapCli);
> new CreateCommand().addToMap(commandMapCli);
> new DeleteCommand().addToMap(commandMapCli);
> new DeleteAllCommand().addToMap(commandMapCli);
> // Deprecated: rmr
> new DeleteAllCommand("rmr").addToMap(commandMapCli);
> new SetCommand().addToMap(commandMapCli);
> new GetCommand().addToMap(commandMapCli);
> new LsCommand().addToMap(commandMapCli);
> new Ls2Command().addToMap(commandMapCli);
> new GetAclCommand().addToMap(commandMapCli);
> new SetAclCommand().addToMap(commandMapCli);
> new StatCommand().addToMap(commandMapCli);
> new SyncCommand().addToMap(commandMapCli);
> new SetQuotaCommand().addToMap(commandMapCli);
> new ListQuotaCommand().addToMap(commandMapCli);
> new DelQuotaCommand().addToMap(commandMapCli);
> new AddAuthCommand().addToMap(commandMapCli);
> The code above does not scale well as commands are added. It would be better 
> to load and create the command objects dynamically.
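
One way the proposal could look is sketched below. It is not taken from the 
attached patch: the Command interface, the two nested command classes, and 
the load() helper are hypothetical stand-ins, and the list of class names 
would in practice come from some configuration source.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

final class CommandLoaderSketch {
    /** Hypothetical stand-in for the CLI command base type. */
    interface Command { String name(); }

    static final class CloseCommand implements Command {
        public String name() { return "close"; }
    }

    static final class CreateCommand implements Command {
        public String name() { return "create"; }
    }

    /** Instantiates each named class reflectively and indexes it by command name. */
    static Map<String, Command> load(Iterable<String> classNames) {
        Map<String, Command> commands = new LinkedHashMap<>();
        for (String className : classNames) {
            try {
                Command c = Class.forName(className)
                        .asSubclass(Command.class)
                        .getDeclaredConstructor()
                        .newInstance();
                commands.put(c.name(), c);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot load command " + className, e);
            }
        }
        return commands;
    }

    public static void main(String[] args) {
        Map<String, Command> commands = load(Arrays.asList(
                CloseCommand.class.getName(), CreateCommand.class.getName()));
        System.out.println(commands.keySet()); // prints [close, create]
    }
}
```

The trade-off is that a misspelled class name now fails at startup rather 
than at compile time, which may be part of why the explicit list was kept.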



[jira] [Commented] (ZOOKEEPER-1638) Redundant zk.getZKDatabase().clear();

2013-02-26 Thread Alexander Shraer (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586951#comment-13586951
 ] 

Alexander Shraer commented on ZOOKEEPER-1638:
-

LGTM. [~fpj], if I remember correctly you were looking at related issues; does 
this seem OK to you?

> Redundant zk.getZKDatabase().clear();
> -
>
> Key: ZOOKEEPER-1638
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1638
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Alexander Shraer
>Priority: Trivial
>  Labels: newbie
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1638.patch
>
>
> Learner.syncWithLeader calls zk.getZKDatabase().clear() right before 
> zk.getZKDatabase().deserializeSnapshot(leaderIs); Then the first thing 
> deserializeSnapshot does is another clear(). 
> I suggest removing the clear() in syncWithLeader.



[jira] [Commented] (ZOOKEEPER-1638) Redundant zk.getZKDatabase().clear();

2013-02-26 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586988#comment-13586988
 ] 

Flavio Junqueira commented on ZOOKEEPER-1638:
-

Weird, both calls to clear() were added in the same patch, ZOOKEEPER-596. 
It was probably overlooked in the review. I don't see a problem with removing 
this one; it doesn't make sense to call it twice.

I'm looking at related issues, but this one seems to be independent. Good catch.



ZooKeeper-trunk-solaris - Build # 481 - Still Failing

2013-02-26 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk-solaris/481/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 162655 lines...]
[junit] 2013-02-26 10:21:02,239 [myid:] - INFO  [NIOServerCxnFactory.SelectorThread-0:NIOServerCnxnFactory$SelectorThread@413] - selector thread exitted run method
[junit] 2013-02-26 10:21:02,240 [myid:] - INFO  [main:ZooKeeperServer@398] - shutting down
[junit] 2013-02-26 10:21:02,240 [myid:] - INFO  [main:SessionTrackerImpl@180] - Shutting down
[junit] 2013-02-26 10:21:02,240 [myid:] - INFO  [main:PrepRequestProcessor@804] - Shutting down
[junit] 2013-02-26 10:21:02,240 [myid:] - INFO  [main:SyncRequestProcessor@175] - Shutting down
[junit] 2013-02-26 10:21:02,240 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@144] - PrepRequestProcessor exited loop!
[junit] 2013-02-26 10:21:02,240 [myid:] - INFO  [SyncThread:0:SyncRequestProcessor@155] - SyncRequestProcessor exited!
[junit] 2013-02-26 10:21:02,241 [myid:] - INFO  [main:FinalRequestProcessor@421] - shutdown of request processor complete
[junit] 2013-02-26 10:21:02,241 [myid:] - INFO  [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-02-26 10:21:02,242 [myid:] - INFO  [main:JMXEnv@133] - ensureOnly:[]
[junit] 2013-02-26 10:21:02,243 [myid:] - INFO  [main:ClientBase@414] - STARTING server
[junit] 2013-02-26 10:21:02,243 [myid:] - INFO  [main:ZooKeeperServer@149] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test5151089989509480343.junit.dir/version-2 snapdir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test5151089989509480343.junit.dir/version-2
[junit] 2013-02-26 10:21:02,243 [myid:] - INFO  [main:NIOServerCnxnFactory@663] - Configuring NIO connection handler with 10s sessionless connection timeout, 2 selector thread(s), 16 worker threads, and 64 kB direct buffers.
[junit] 2013-02-26 10:21:02,244 [myid:] - INFO  [main:NIOServerCnxnFactory@676] - binding to port 0.0.0.0/0.0.0.0:11221
[junit] 2013-02-26 10:21:02,245 [myid:] - INFO  [main:FileSnap@83] - Reading snapshot /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test5151089989509480343.junit.dir/version-2/snapshot.b
[junit] 2013-02-26 10:21:02,247 [myid:] - INFO  [main:FileTxnSnapLog@270] - Snapshotting: 0xb to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test5151089989509480343.junit.dir/version-2/snapshot.b
[junit] 2013-02-26 10:21:02,249 [myid:] - INFO  [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-02-26 10:21:02,249 [myid:] - INFO  [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@289] - Accepted socket connection from /127.0.0.1:54292
[junit] 2013-02-26 10:21:02,250 [myid:] - INFO  [NIOWorkerThread-1:NIOServerCnxn@829] - Processing stat command from /127.0.0.1:54292
[junit] 2013-02-26 10:21:02,250 [myid:] - INFO  [NIOWorkerThread-1:NIOServerCnxn$StatCommand@678] - Stat command output
[junit] 2013-02-26 10:21:02,250 [myid:] - INFO  [NIOWorkerThread-1:NIOServerCnxn@1000] - Closed socket connection for client /127.0.0.1:54292 (no session established for client)
[junit] 2013-02-26 10:21:02,250 [myid:] - INFO  [main:JMXEnv@133] - ensureOnly:[InMemoryDataTree, StandaloneServer_port]
[junit] 2013-02-26 10:21:02,252 [myid:] - INFO  [main:JMXEnv@105] - expect:InMemoryDataTree
[junit] 2013-02-26 10:21:02,252 [myid:] - INFO  [main:JMXEnv@108] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
[junit] 2013-02-26 10:21:02,252 [myid:] - INFO  [main:JMXEnv@105] - expect:StandaloneServer_port
[junit] 2013-02-26 10:21:02,252 [myid:] - INFO  [main:JMXEnv@108] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1
[junit] 2013-02-26 10:21:02,252 [myid:] - INFO  [main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD testQuota
[junit] 2013-02-26 10:21:02,252 [myid:] - INFO  [main:ClientBase@451] - tearDown starting
[junit] 2013-02-26 10:21:02,321 [myid:] - INFO  [main:ZooKeeper@744] - Session: 0x13d16050807 closed
[junit] 2013-02-26 10:21:02,322 [myid:] - INFO  [main-EventThread:ClientCnxn$EventThread@513] - EventThread shut down
[junit] 2013-02-26 10:21:02,322 [myid:] - INFO  [main:ClientBase@421] - STOPPING server
[junit] 2013-02-26 10:21:02,322 [myid:] - INFO  [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$

[jira] [Commented] (ZOOKEEPER-1618) Disconnected event when stopping leader process

2013-02-26 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586995#comment-13586995
 ] 

Flavio Junqueira commented on ZOOKEEPER-1618:
-

I'm glad we have been able to make some progress here, Peter.

bq. Do you want me to create a new documentation bug or will you just re-use 
this one?

I would say that it is sufficient to just edit this one. It contains the 
discussion we've had and explains why we agreed upon changes to the 
documentation.

I'll edit the jira fields and please feel free to edit the summary if you think 
it is necessary. Also, please feel free to say how you think this should be 
documented and where if you have an opinion. It would be great to have your 
contribution.





[jira] [Updated] (ZOOKEEPER-1618) Disconnected event when stopping leader process

2013-02-26 Thread Flavio Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-1618:


Component/s: documentation



[jira] [Updated] (ZOOKEEPER-1618) Disconnected event when stopping leader process

2013-02-26 Thread Flavio Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-1618:


Issue Type: Improvement  (was: Bug)



Re: [VOTE] BookKeeper 4.2.1 Release Candidate 0

2013-02-26 Thread Benjamin Reed
+1 thanx for putting this together ivan!

On Tue, Feb 19, 2013 at 2:24 PM, Flavio Junqueira  wrote:
> +1, I have run tests, rat, and checked the various project root files.
>
> -Flavio
>
> On Feb 19, 2013, at 6:52 PM, Ivan Kelly  wrote:
>
>> This is the first release candidate for Apache BookKeeper, version 4.2.1.
>>
>> This is a bugfix release to address the performance issues caused by 
>> BOOKKEEPER-569
>> https://issues.apache.org/jira/browse/BOOKKEEPER-569
>>
>> The full release notes are available at:
>>
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12323840&styleName=Html&projectId=12311293
>>
>> *** Please download, test and vote by February 26th 2013, 10:00 UTC+0. ***
>>
>> Note that we are voting upon the source (tag), binaries are provided for
>> convenience.
>>
>> Source and binary files:
>> http://people.apache.org/~ivank/bookkeeper-4.2.1-candidate-0/
>>
>> Maven staging repo:
>> https://repository.apache.org/content/repositories/orgapachebookkeeper-271/
>>
>> The tag to be voted upon:
>> https://svn.apache.org/repos/asf/zookeeper/bookkeeper/tags/release-4.2.1
>>
>> BookKeeper's KEYS file containing PGP keys we use to sign the release:
>> http://svn.apache.org/repos/asf/zookeeper/bookkeeper/dist/KEYS
>>
>> Please download the source package, and follow the README to build
>> and run a bookkeeper and hedwig service.
>


Re: [VOTE] BookKeeper 4.2.1 Release Candidate 0

2013-02-26 Thread Uma Maheswara Rao G
+1, I have run the tests and they are passing with NN. Thanks a lot,
Ivan for your efforts in making this release.

Regards,
Uma



Re: [VOTE] BookKeeper 4.2.1 Release Candidate 0

2013-02-26 Thread Ivan Kelly
As we have 5 +1 (Uma, Sijie, Ben, Flavio, mine) including the
required 3 PMC +1, this vote has passed. I will copy the artifacts
into place and make the release once the mirrors have synced.

Thanks for taking a look at this guys.

-Ivan



[jira] [Commented] (ZOOKEEPER-1519) Zookeeper Async calls can reference free()'d memory

2013-02-26 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587340#comment-13587340
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1519:
---

Does sizeof *(void *) work?

> Zookeeper Async calls can reference free()'d memory
> ---
>
> Key: ZOOKEEPER-1519
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1519
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.3.3, 3.3.6
> Environment: Ubuntu 11.10, Ubuntu packaged Zookeeper 3.3.3 with some 
> backported fixes.
>Reporter: Mark Gius
> Attachments: zookeeper-1519.patch
>
>
> zoo_acreate() and zoo_aset() take a char * argument for data and prepare a 
> call to zookeeper.  This char * doesn't seem to be duplicated at any point, 
> making it possible that the caller of the asynchronous function might 
> potentially free() the char * argument before the zookeeper library completes 
> its request.  This is unlikely to present a real problem unless the freed 
> memory is re-used before zookeeper consumes it.  I've been unable to 
> reproduce this issue using pure C as a result.
> However, ZKPython is a whole different story.  Consider this snippet:
>   ok = zookeeper.acreate(handle, path, json.dumps(value), 
>  acl, flags, callback)
>   assert ok == zookeeper.OK
> In this snippet, json.dumps() allocates a string which is passed into the 
> acreate().  When acreate() returns, the zookeeper request has been 
> constructed with a pointer to the string allocated by json.dumps().  Also 
> when acreate() returns, that string is now referenced by 0 things (ZKPython 
> doesn't bump the refcount) and the string is eligible for garbage collection 
> and re-use.  The Zookeeper request now has a pointer to dangerous freed 
> memory.
> I've been seeing odd behavior in our development environments for some time 
> now, where it appeared as though two separate JSON payloads had been joined 
> together.  Python has been allocating a new JSON string in the middle of the 
> old string that an incomplete zookeeper async call had not yet processed.
> I am not sure if this is a behavior that should be documented, or if the C 
> binding implementation needs to be updated to create copies of the data 
> payload provided for aset and acreate.
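
The "create copies of the data payload" option can be illustrated with a 
Java analogue. Java has no free(), so this sketch models the same hazard as a 
caller reusing a buffer while a request is still pending; it is not the C 
client's code, and the class, the queue, and the method names are all 
hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

final class AsyncPayloadSketch {
    /** Hypothetical pending-request queue standing in for the client's outgoing buffer. */
    private final Queue<byte[]> pending = new ArrayDeque<>();

    /** Unsafe: keeps the caller's array, so later changes by the caller leak into the request. */
    void enqueueShared(byte[] data) { pending.add(data); }

    /** Safe: the request owns a private copy for as long as it is pending. */
    void enqueueCopy(byte[] data) { pending.add(Arrays.copyOf(data, data.length)); }

    /** Simulates the library finally consuming the oldest pending payload. */
    String sendNext() { return new String(pending.remove(), StandardCharsets.UTF_8); }

    public static void main(String[] args) {
        AsyncPayloadSketch client = new AsyncPayloadSketch();
        byte[] payload = "{\"a\":1}".getBytes(StandardCharsets.UTF_8);
        client.enqueueShared(payload);
        client.enqueueCopy(payload);
        Arrays.fill(payload, (byte) 'X');      // caller reuses the buffer too early
        System.out.println(client.sendNext()); // prints XXXXXXX
        System.out.println(client.sendNext()); // prints {"a":1}
    }
}
```

Copying costs one allocation per request, but it removes any requirement on 
how long the caller must keep its own buffer alive.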



[jira] [Commented] (ZOOKEEPER-1519) Zookeeper Async calls can reference free()'d memory

2013-02-26 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587351#comment-13587351
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1519:
---

Don't think so: http://fpaste.org/iwjf/

> Zookeeper Async calls can reference free()'d memory
> ---
>
> Key: ZOOKEEPER-1519
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1519
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.3.3, 3.3.6
> Environment: Ubuntu 11.10, Ubuntu packaged Zookeeper 3.3.3 with some 
> backported fixes.
>Reporter: Mark Gius
> Attachments: zookeeper-1519.patch
>
>
> zoo_acreate() and zoo_aset() take a char * argument for data and prepare a 
> call to zookeeper.  This char * doesn't seem to be duplicated at any point, 
> making it possible that the caller of the asynchronous function might 
> potentially free() the char * argument before the zookeeper library completes 
> its request.  This is unlikely to present a real problem unless the freed 
> memory is re-used before zookeeper consumes it.  I've been unable to 
> reproduce this issue using pure C as a result.
> However, ZKPython is a whole different story.  Consider this snippet:
>   ok = zookeeper.acreate(handle, path, json.dumps(value), 
>  acl, flags, callback)
>   assert ok == zookeeper.OK
> In this snippet, json.dumps() allocates a string which is passed into the 
> acreate().  When acreate() returns, the zookeeper request has been 
> constructed with a pointer to the string allocated by json.dumps().  Also 
> when acreate() returns, that string is now referenced by 0 things (ZKPython 
> doesn't bump the refcount) and the string is eligible for garbage collection 
> and re-use.  The Zookeeper request now has a pointer to dangerous freed 
> memory.
> I've been seeing odd behavior in our development environments for some time 
> now, where it appeared as though two separate JSON payloads had been joined 
> together.  Python has been allocating a new JSON string in the middle of the 
> old string that an incomplete zookeeper async call had not yet processed.
> I am not sure if this is a behavior that should be documented, or if the C 
> binding implementation needs to be updated to create copies of the data 
> payload provided for aset and acreate.



[jira] [Created] (ZOOKEEPER-1652) zookeeper java client does a reverse dns lookup when connecting

2013-02-26 Thread Sean Bridges (JIRA)
Sean Bridges created ZOOKEEPER-1652:
---

 Summary: zookeeper java client does a reverse dns lookup when 
connecting
 Key: ZOOKEEPER-1652
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1652
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.5
Reporter: Sean Bridges
Priority: Critical


When connecting to ZooKeeper, the client does a reverse DNS lookup on the 
hostname.  In our environment, the reverse DNS lookup takes 5 seconds to fail, 
causing ZooKeeper clients to connect slowly.

The reverse DNS lookup occurs in ClientCnxn, in the calls to addr.getHostName():

{code}
setName(getName().replaceAll("\\(.*\\)",
        "(" + addr.getHostName() + ":" + addr.getPort() + ")"));
try {
    zooKeeperSaslClient = new ZooKeeperSaslClient("zookeeper/"+addr.getHostName());
} catch (LoginException e) {
{code}



[jira] [Updated] (ZOOKEEPER-1652) zookeeper java client does a reverse dns lookup when connecting

2013-02-26 Thread Sean Bridges (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Bridges updated ZOOKEEPER-1652:


Attachment: ZOOKEEPER-1652.patch

> zookeeper java client does a reverse dns lookup when connecting
> ---
>
> Key: ZOOKEEPER-1652
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1652
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.5
>Reporter: Sean Bridges
>Priority: Critical
> Attachments: ZOOKEEPER-1652.patch
>
>
> When connecting to zookeeper, the client does a reverse dns lookup on the 
> hostname.  In our environment, the reverse dns lookup takes 5 seconds to 
> fail, causing zookeeper clients to connect slowly.
> The reverse DNS lookup occurs in ClientCnxn, in the calls to addr.getHostName()
> {code}
> setName(getName().replaceAll("\\(.*\\)",
> "(" + addr.getHostName() + ":" + addr.getPort() + ")"));
> try {
> zooKeeperSaslClient = new 
> ZooKeeperSaslClient("zookeeper/"+addr.getHostName());
> } catch (LoginException e) {
> {code}



Failed: ZOOKEEPER-1652 PreCommit Build #1410

2013-02-26 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1652
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1410/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 271926 lines...]
 [exec] 
 [exec] -1 overall.  Here are the results of testing the latest attachment 
 [exec]   http://issues.apache.org/jira/secure/attachment/12571025/ZOOKEEPER-1652.patch
 [exec]   against trunk revision 1448007.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] -1 tests included.  The patch doesn't appear to include any new or modified tests.
 [exec] Please justify why no new tests are needed for this patch.
 [exec] Also please list what manual steps were performed to verify this patch.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1410//testReport/
 [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1410//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1410//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] Adding comment to Jira.
 [exec] Comment added.
 [exec] 16b5c4f66adadb2c0465c2eefa0e9a2a4f69f5bd logged out
 [exec] Finished build.
 [exec] 

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1623:
 exec returned: 1

Total time: 28 minutes 21 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Description set: ZOOKEEPER-1652
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (ZOOKEEPER-1652) zookeeper java client does a reverse dns lookup when connecting

2013-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587433#comment-13587433
 ] 

Hadoop QA commented on ZOOKEEPER-1652:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12571025/ZOOKEEPER-1652.patch
  against trunk revision 1448007.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1410//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1410//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1410//console

This message is automatically generated.




[jira] [Commented] (ZOOKEEPER-1519) Zookeeper Async calls can reference free()'d memory

2013-02-26 Thread Daniel Lescohier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587520#comment-13587520
 ] 

Daniel Lescohier commented on ZOOKEEPER-1519:
-

You're correct. I didn't have time to test the patch, but I wanted to get it 
out there for discussion.

I looked further, and the callers get void* data from the public API with no 
length parameter.  So, the public API does not allow us to copy the data.

In order to fix it, it looks like a public API change is required.  Either:

 1. Document in the API that the caller cannot free that memory until the 
zookeeper library is done with it (which also means it can't be a pointer to 
memory on the stack).  I assume that the library is done with it once it calls 
the completion callback? So the program can free it once it gets the same 
pointer back in a callback (or when the zookeeper connection is closed). I 
think this would make it hard to integrate with scripting languages like 
Python, because the scripting language C interface would have to copy the 
memory, account for it in some global structure, and free it once it sees that 
pointer again in a callback or when the zookeeper connection is closed.

 2. Document in the API that the void * must be malloc'ed memory, and the 
ownership is passed to the library (which means the caller copies it, and the 
library frees it). That's also a difficult API.

 3. Add a data length parameter to the API, so the library can copy it.

 4. Don't use a void * for the 'data' parameter, use something else.
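
The bookkeeping that option 1 would push onto a scripting-language binding can be sketched roughly as follows. This is only an illustration under stated assumptions: `PendingPayloads`, `submit`, `release_all` and `fake_acreate` are hypothetical names, not part of zkpython or the ZooKeeper C client, and the asynchronous call is simulated by a plain Python function. The idea is simply to hold a reference to the payload until the completion callback fires or the connection is closed.

```python
import itertools


class PendingPayloads:
    """Keeps payloads alive until their completion callback has run.

    Hypothetical helper: not part of zkpython or the ZooKeeper C client.
    """

    def __init__(self):
        self._pending = {}                  # token -> payload still in flight
        self._tokens = itertools.count(1)

    def submit(self, async_call, payload, callback):
        """Start async_call(payload, completion) and pin payload until done."""
        token = next(self._tokens)
        self._pending[token] = payload      # extra reference keeps the payload alive

        def completion(*args):
            # Release our reference only once the library reports completion.
            self._pending.pop(token, None)
            callback(*args)

        async_call(payload, completion)
        return token

    def release_all(self):
        """Drop every pinned payload, e.g. when the connection is closed."""
        self._pending.clear()

    def in_flight(self):
        return len(self._pending)


# Simulated asynchronous API: it stores the completion and fires it later.
queued = []


def fake_acreate(payload, completion):
    queued.append((payload, completion))


results = []
registry = PendingPayloads()
registry.submit(fake_acreate, '{"k": 1}', results.append)
pinned_before = registry.in_flight()        # payload is pinned while the call is pending

payload, completion = queued.pop()
completion(0)                               # the simulated library finishes the request
pinned_after = registry.in_flight()
```

A real binding would have to do the equivalent in C (copying the buffer or incrementing the Python refcount), which is the extra complexity the comment above is worried about.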





[jira] [Commented] (ZOOKEEPER-1519) Zookeeper Async calls can reference free()'d memory

2013-02-26 Thread Daniel Lescohier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587562#comment-13587562
 ] 

Daniel Lescohier commented on ZOOKEEPER-1519:
-

Never mind my comments about 'data'. I was confusing 'value' with 'data.'  
'value' is copied when it is serialized. So, I don't see where the submitter's 
original problem can occur.

'data' is all right, because it should be passed by reference.  In the zkpython 
bindings, it's used to hold a reference to the Python callable object that 
should be called on the callback.  That's a Python object, so it's refcounted.  
'completion' is the zkpython C function that is called; it casts 'data' back to 
a Python object and calls it through the Python API.





RFC: Behavior of QuotaExceededException

2013-02-26 Thread Thawan Kooburat
Hi,
I am currently working on ZOOKEEPER-1383. One of the main features introduced in 
this change is to allow ZooKeeper to enforce a hard limit (e.g. transactions per 
second) per folder.

With a hard limit, we need to introduce a new exception/error code 
(QuotaExceeded) for ZooKeeper operations that modify the DataTree.  If a client 
gets this error, it means that the particular operation has definitely failed.

From our internal discussion, this may make it harder for a user to write an 
application.  The concern is that this can introduce a hole in the 
sequence of operations that the client application performs, since some 
operations may succeed while others fail.  One idea is to also 
trigger session expiration (or at least a disconnect) on the server side in 
addition to the QuotaExceeded error.  This will cause all subsequent operations from 
that client to fail and allow the application to use its existing error-handling 
logic to recover from QuotaExceeded.  Typically, an application that exceeds 
its quota is already doing something wrong from the administrator's perspective, 
but we also want it to fail gracefully and be able to recover when the problem is 
fixed or the quota is increased.
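
The "hole" concern can be illustrated with a toy simulation. Everything in it is hypothetical (`FakeServer`, `QuotaExceeded`, `SessionClosed`, the set of over-quota paths); it does not use the real ZooKeeper client API or the ZOOKEEPER-1383 patch. It only contrasts rejecting a single operation with rejecting it and also closing the session.

```python
class QuotaExceeded(Exception):
    """Hypothetical error for a write rejected by a hard quota."""


class SessionClosed(Exception):
    """Hypothetical error for any operation on a closed session."""


class FakeServer:
    def __init__(self, blocked_paths, close_on_quota):
        self.blocked_paths = set(blocked_paths)   # paths whose quota is exhausted
        self.close_on_quota = close_on_quota
        self.closed = False
        self.applied = []

    def create(self, path):
        if self.closed:
            raise SessionClosed(path)
        if path in self.blocked_paths:
            if self.close_on_quota:
                self.closed = True
            raise QuotaExceeded(path)
        self.applied.append(path)


def run(server, paths):
    """Issue the writes in order, ignoring errors the way a naive client might."""
    for path in paths:
        try:
            server.create(path)
        except (QuotaExceeded, SessionClosed):
            pass
    return server.applied


ops = ["/app/step1", "/limited/step2", "/app/step3"]

# Error only: step2 is rejected but step3 still succeeds, leaving a hole.
with_hole = run(FakeServer({"/limited/step2"}, close_on_quota=False), ops)

# Error plus session close: nothing after the rejected write is applied.
prefix_only = run(FakeServer({"/limited/step2"}, close_on_quota=True), ops)
```

In the second case the client sees a clean prefix of its operations applied, which is the situation its existing disconnect handling is presumably already written for.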

Let me know if you have any suggestions.

--
Thawan Kooburat


[jira] [Created] (ZOOKEEPER-1653) zookeeper fails to start because of inconsistent epoch

2013-02-26 Thread Michi Mutsuzaki (JIRA)
Michi Mutsuzaki created ZOOKEEPER-1653:
--

 Summary: zookeeper fails to start because of inconsistent epoch
 Key: ZOOKEEPER-1653
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1653
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.4.5
Reporter: Michi Mutsuzaki


It looks like QuorumPeer.loadDataBase() could fail if the server was restarted 
after zk.takeSnapshot() but before finishing self.setCurrentEpoch(newEpoch).

{code:java}
case Leader.NEWLEADER: // it will be NEWLEADER in v1.0
    zk.takeSnapshot();
    self.setCurrentEpoch(newEpoch); // <<< got restarted here
    snapshotTaken = true;
    writePacket(new QuorumPacket(Leader.ACK, newLeaderZxid, null, null), true);
    break;
{code}

The server fails to start because currentEpoch is still 1 but the last 
processed zxid from the snapshot has been updated to epoch 2.

{noformat}
2013-02-20 13:45:02,733 5543 [pool-1-thread-1] ERROR org.apache.zookeeper.server.quorum.QuorumPeer  - Unable to load database on disk
java.io.IOException: The current epoch, 1, is older than the last zxid, 8589934592
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:439)
    at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:413)
...
{noformat}

{noformat}
$ find datadir 
datadir
datadir/version-2
datadir/version-2/currentEpoch.tmp
datadir/version-2/acceptedEpoch
datadir/version-2/snapshot.0
datadir/version-2/currentEpoch
datadir/version-2/snapshot.2

$ cat datadir/version-2/currentEpoch.tmp
2%
$ cat datadir/version-2/acceptedEpoch
2%
$ cat datadir/version-2/currentEpoch
1%
{noformat}
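
The numbers in the log can be checked by hand. A zxid is a 64-bit value whose high 32 bits hold the epoch and whose low 32 bits hold a counter, so the epoch carried by 8589934592 can be extracted as in the sketch below. The helper names are illustrative rather than ZooKeeper APIs, and the final comparison only approximates the check that the exception message describes.

```python
def epoch_of(zxid):
    """High 32 bits of a zxid: the epoch."""
    return zxid >> 32


def counter_of(zxid):
    """Low 32 bits of a zxid: the per-epoch counter."""
    return zxid & 0xFFFFFFFF


last_zxid = 8589934592      # value reported in the IOException
current_epoch = 1           # contents of datadir/version-2/currentEpoch

snapshot_epoch = epoch_of(last_zxid)
inconsistent = current_epoch < snapshot_epoch
```

Here `snapshot_epoch` works out to 2 with a counter of 0, which matches the `snapshot.2` file and the value 2 left behind in `currentEpoch.tmp`, while `currentEpoch` itself still holds 1.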




[jira] [Updated] (ZOOKEEPER-1653) zookeeper fails to start because of inconsistent epoch

2013-02-26 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-1653:
---

Description: 
It looks like QuorumPeer.loadDataBase() could fail if the server was restarted 
after zk.takeSnapshot() but before finishing self.setCurrentEpoch(newEpoch).

{code:java}
case Leader.NEWLEADER: // it will be NEWLEADER in v1.0
    zk.takeSnapshot();
    self.setCurrentEpoch(newEpoch); // <<< got restarted here
    snapshotTaken = true;
    writePacket(new QuorumPacket(Leader.ACK, newLeaderZxid, null, null), true);
    break;
{code}

The server fails to start because currentEpoch is still 1 but the last 
processed zxid from the snapshot has been updated.

{noformat}
2013-02-20 13:45:02,733 5543 [pool-1-thread-1] ERROR org.apache.zookeeper.server.quorum.QuorumPeer  - Unable to load database on disk
java.io.IOException: The current epoch, 1, is older than the last zxid, 8589934592
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:439)
    at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:413)
...
{noformat}

{noformat}
$ find datadir 
datadir
datadir/version-2
datadir/version-2/currentEpoch.tmp
datadir/version-2/acceptedEpoch
datadir/version-2/snapshot.0
datadir/version-2/currentEpoch
datadir/version-2/snapshot.2

$ cat datadir/version-2/currentEpoch.tmp
2%
$ cat datadir/version-2/acceptedEpoch
2%
$ cat datadir/version-2/currentEpoch
1%
{noformat}


  was:
It looks like QuorumPeer.loadDataBase() could fail if the server was restarted 
after zk.takeSnapshot() but before finishing self.setCurrentEpoch(newEpoch).

{code:java}
case Leader.NEWLEADER: // it will be NEWLEADER in v1.0
zk.takeSnapshot();
self.setCurrentEpoch(newEpoch); // <<< got restarted here
snapshotTaken = true;
writePacket(new QuorumPacket(Leader.ACK, newLeaderZxid, null, null), true);
break;
{code}

The server fails to start because currentEpoch is still 1 but the last 
processed zxid from the snapshot has been updated to epoch 2.

{noformat}
2013-02-20 13:45:02,733 5543 [pool-1-thread-1] ERROR 
org.apache.zookeeper.server.quorum.QuorumPeer  - Unable to load database on disk
java.io.IOException: The current epoch, 1, is older than the last zxid, 
8589934592
at 
org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:439)
at 
org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:413)
...
{noformat}

{noformat}
$ find datadir 
datadir
datadir/version-2
datadir/version-2/currentEpoch.tmp
datadir/version-2/acceptedEpoch
datadir/version-2/snapshot.0
datadir/version-2/currentEpoch
datadir/version-2/snapshot.2

$ cat datadir/version-2/currentEpoch.tmp
2%
$ cat datadir/version-2/acceptedEpoch
2%
$ cat datadir/version-2/currentEpoch
1%
{noformat}





[jira] [Updated] (ZOOKEEPER-1653) zookeeper fails to start because of inconsistent epoch

2013-02-26 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-1653:
---

Description: 
It looks like QuorumPeer.loadDataBase() could fail if the server was restarted 
after zk.takeSnapshot() but before finishing self.setCurrentEpoch(newEpoch) in 
Learner.java.

{code:java}
case Leader.NEWLEADER: // it will be NEWLEADER in v1.0
    zk.takeSnapshot();
    self.setCurrentEpoch(newEpoch); // <<< got restarted here
    snapshotTaken = true;
    writePacket(new QuorumPacket(Leader.ACK, newLeaderZxid, null, null), true);
    break;
{code}

The server fails to start because currentEpoch is still 1 but the last 
processed zxid from the snapshot has been updated.

{noformat}
2013-02-20 13:45:02,733 5543 [pool-1-thread-1] ERROR org.apache.zookeeper.server.quorum.QuorumPeer  - Unable to load database on disk
java.io.IOException: The current epoch, 1, is older than the last zxid, 8589934592
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:439)
    at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:413)
...
{noformat}

{noformat}
$ find datadir 
datadir
datadir/version-2
datadir/version-2/currentEpoch.tmp
datadir/version-2/acceptedEpoch
datadir/version-2/snapshot.0
datadir/version-2/currentEpoch
datadir/version-2/snapshot.2

$ cat datadir/version-2/currentEpoch.tmp
2%
$ cat datadir/version-2/acceptedEpoch
2%
$ cat datadir/version-2/currentEpoch
1%
{noformat}


  was:
It looks like QuorumPeer.loadDataBase() could fail if the server was restarted 
after zk.takeSnapshot() but before finishing self.setCurrentEpoch(newEpoch).

{code:java}
case Leader.NEWLEADER: // it will be NEWLEADER in v1.0
zk.takeSnapshot();
self.setCurrentEpoch(newEpoch); // <<< got restarted here
snapshotTaken = true;
writePacket(new QuorumPacket(Leader.ACK, newLeaderZxid, null, null), true);
break;
{code}

The server fails to start because currentEpoch is still 1 but the last 
processed zxid from the snapshot has been updated.

{noformat}
2013-02-20 13:45:02,733 5543 [pool-1-thread-1] ERROR 
org.apache.zookeeper.server.quorum.QuorumPeer  - Unable to load database on disk
java.io.IOException: The current epoch, 1, is older than the last zxid, 
8589934592
at 
org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:439)
at 
org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:413)
...
{noformat}

{noformat}
$ find datadir 
datadir
datadir/version-2
datadir/version-2/currentEpoch.tmp
datadir/version-2/acceptedEpoch
datadir/version-2/snapshot.0
datadir/version-2/currentEpoch
datadir/version-2/snapshot.2

$ cat datadir/version-2/currentEpoch.tmp
2%
$ cat datadir/version-2/acceptedEpoch
2%
$ cat datadir/version-2/currentEpoch
1%
{noformat}





[jira] [Updated] (ZOOKEEPER-1633) Introduce a protocol version to connection initiation message

2013-02-26 Thread Alexander Shraer (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Shraer updated ZOOKEEPER-1633:


Attachment: ZOOKEEPER-1633-ver2.patch

The attached patch includes a test. I verified that the test fails without the 
patch code. [~fpj] can you please take a look and commit if everything's fine?

> Introduce a protocol version to connection initiation message
> -
>
> Key: ZOOKEEPER-1633
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1633
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Reporter: Alexander Shraer
>Assignee: Alexander Shraer
> Fix For: 3.4.6
>
> Attachments: ZOOKEEPER-1633.patch, ZOOKEEPER-1633-ver2.patch
>
>
> Currently the first message a server sends to another server includes just 
> one field - the server's id (long). This is in QuorumCnxManager.java. This 
> makes changes to the information passed during this initial connection very 
> difficult. This patch will change the first field of the message to be a 
> protocol version (a negative number that can't be a server id). The second 
> field will be the server id. The third field is number of bytes in the 
> remainder of the message. A 3.4 server will read the first field as before, 
> but if this is a negative number it will read the second field to find the 
> server id, and then remove the remainder of the message from the stream. This 
> will not affect 3.4 since 3.4 and earlier servers send just the server id (so 
> the code in the patch will not run unless there is a server > 3.4 trying to 
> connect). This will, however, provide the necessary flexibility for future 
> releases as well as an upgrade path from 3.4
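
A rough sketch of the receive path described above, written in Python for brevity rather than taken from the patch. The field widths are assumptions: the description only says the server id is a long, so the sketch treats the version and the id as 8-byte big-endian signed integers and the remainder length as a 4-byte integer. The function name, the version value and the sample payload are made up for illustration.

```python
import io
import struct


def read_server_id(stream):
    """Return the connecting server's id from the initial message."""
    (first,) = struct.unpack(">q", stream.read(8))
    if first >= 0:
        # Old-style message: the single field is the server id itself.
        return first
    # New-style message: `first` is a negative protocol version.
    (server_id,) = struct.unpack(">q", stream.read(8))
    (remaining,) = struct.unpack(">i", stream.read(4))
    stream.read(remaining)      # discard fields this server does not understand
    return server_id


# Old format: just the id.
old_message = struct.pack(">q", 3)

# New format: version (arbitrary negative number), id, length, extra bytes.
extra = b"host:2888"
new_message = struct.pack(">qqi", -65536, 3, len(extra)) + extra + b"NEXT"

old_id = read_server_id(io.BytesIO(old_message))
new_stream = io.BytesIO(new_message)
new_id = read_server_id(new_stream)
leftover = new_stream.read()    # bytes that follow the initial message
```

Because the length field lets the receiver skip the remainder without interpreting it, later releases could append fields to the initial message without breaking a server that only needs the id.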



Success: ZOOKEEPER-1633 PreCommit Build #1411

2013-02-26 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1633
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1411/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 268270 lines...]
 [exec] BUILD SUCCESSFUL
 [exec] Total time: 0 seconds
 [exec] 
 [exec] 
 [exec] 
 [exec] 
 [exec] +1 overall.  Here are the results of testing the latest attachment 
 [exec]   http://issues.apache.org/jira/secure/attachment/12571138/ZOOKEEPER-1633-ver2.patch
 [exec]   against trunk revision 1448007.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1411//testReport/
 [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1411//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1411//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment added.
 [exec] 2f3587519f4904ddda08bac9b937d13fc747a7c6 logged out
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 

BUILD SUCCESSFUL
Total time: 28 minutes 28 seconds
Archiving artifacts
Recording test results
Description set: ZOOKEEPER-1633
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (ZOOKEEPER-1633) Introduce a protocol version to connection initiation message

2013-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588040#comment-13588040
 ] 

Hadoop QA commented on ZOOKEEPER-1633:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12571138/ZOOKEEPER-1633-ver2.patch
  against trunk revision 1448007.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1411//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1411//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1411//console

This message is automatically generated.

> Introduce a protocol version to connection initiation message
> -
>
> Key: ZOOKEEPER-1633
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1633
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Reporter: Alexander Shraer
> Assignee: Alexander Shraer
> Fix For: 3.4.6
>
> Attachments: ZOOKEEPER-1633.patch, ZOOKEEPER-1633-ver2.patch
>
>
> Currently the first message a server sends to another server includes just 
> one field - the server's id (long). This is in QuorumCnxManager.java. This 
> makes changes to the information passed during this initial connection very 
> difficult. This patch will change the first field of the message to be a 
> protocol version (a negative number that can't be a server id). The second 
> field will be the server id. The third field is the number of bytes in the 
> remainder of the message. A 3.4 server will read the first field as before, 
> but if this is a negative number it will read the second field to find the 
> server id, and then remove the remainder of the message from the stream. This 
> will not affect 3.4 since 3.4 and earlier servers send just the server id (so 
> the code in the patch will not run unless there is a server > 3.4 trying to 
> connect). This will, however, provide the necessary flexibility for future 
> releases as well as an upgrade path from 3.4.
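
The receiving side of the handshake described in the issue can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the attached patch: the class name InitialMessageSketch, the version value -1, the use of a 4-byte int for the "number of bytes" field, and the MAX_REMAINDER cap are all invented for the example. The sketch reads the first long, treats a non-negative value as an old-style server id, and otherwise reads the server id from the second field and discards the remainder of the message.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class InitialMessageSketch {
    // Assumption: any negative long can serve as the version marker here.
    static final long PROTOCOL_VERSION = -1L;
    // Assumption: a sanity cap on the remainder length, not taken from the patch.
    static final int MAX_REMAINDER = 2048;

    /** Old format: the message is just the server id. */
    static byte[] encodeLegacy(long sid) {
        return ByteBuffer.allocate(Long.BYTES).putLong(sid).array();
    }

    /** New format: version, server id, remainder length, remainder bytes. */
    static byte[] encodeVersioned(long sid, byte[] remainder) {
        return ByteBuffer.allocate(2 * Long.BYTES + Integer.BYTES + remainder.length)
                .putLong(PROTOCOL_VERSION)
                .putLong(sid)
                .putInt(remainder.length)
                .put(remainder)
                .array();
    }

    /** Reads one initial message and returns the sender's server id. */
    static long readServerId(DataInputStream in) throws IOException {
        long first = in.readLong();
        if (first >= 0) {
            return first; // old-style message: the first field is the server id
        }
        // Negative first field: it is a protocol version, the id comes next.
        long sid = in.readLong();
        int remaining = in.readInt();
        if (remaining < 0 || remaining > MAX_REMAINDER) {
            throw new IOException("Unreasonable remainder length: " + remaining);
        }
        // Consume and ignore the rest of the message so the stream stays aligned.
        in.readFully(new byte[remaining]);
        return sid;
    }

    /** Test helper: parses back-to-back messages from a byte array. */
    static List<Long> parseAll(byte[] stream) {
        List<Long> ids = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream))) {
            while (in.available() > 0) {
                ids.add(readServerId(in));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(parseAll(encodeLegacy(3L)));
        System.out.println(parseAll(encodeVersioned(7L, new byte[] {1, 2, 3})));
    }
}
```

Because a server id is never negative, the sign of the first field is enough to tell the two formats apart, and the explicit length field lets an older reader skip content it does not understand.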
