[jira] [Updated] (ZOOKEEPER-1162) consistent handling of jute.maxbuffer when attempting to read large zk directories

2011-08-25 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-1162:


  Component/s: server
 Priority: Critical  (was: Major)
Fix Version/s: 3.5.0

It would be nice to address this; it comes up somewhat frequently - I believe 
watch re-registration and the like are affected by this as well.

Perhaps we should enforce this when setting data for a znode, but otherwise 
allow it to exceed the max when reading?

 consistent handling of jute.maxbuffer when attempting to read large zk 
 directories
 

 Key: ZOOKEEPER-1162
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1162
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.3.3
Reporter: Jonathan Hsieh
Priority: Critical
 Fix For: 3.5.0


 Recently we encountered a situation where a zk directory got successfully 
 populated with 250k elements.  When our system attempted to read the znode 
 dir, it failed because the contents of the dir exceeded the default 1MB 
 jute.maxbuffer limit.  There were a few odd things:
 1) It seems odd that we could populate the directory until it was very large 
 but could not read the listing.
 2) The workaround was bumping up the jute.maxbuffer setting on the client side.
 Would it make more sense to have it reject adding new znodes if the directory 
 listing would exceed jute.maxbuffer?
 Alternatively, would it make sense to have the zk dir listing ignore the 
 jute.maxbuffer setting?
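 The asymmetry comes from where the limit is applied: each create() carries a 
 single small znode name and passes the per-request check, but getChildren() 
 returns every child name in one length-prefixed response, and it is that 
 aggregate frame the reader checks against jute.maxbuffer. A minimal C sketch 
 of the idea (illustrative only, not the ZooKeeper source; MAX_BUFFER stands 
 in for jute.maxbuffer):

 #include <stdint.h>
 #include <stddef.h>

 #define MAX_BUFFER (1024 * 1024)  /* stand-in for the default jute.maxbuffer (1MB) */

 /* Write path: only the single request being added is checked, so a directory
  * can keep growing one small child at a time without ever tripping the limit. */
 int check_write(size_t request_len)
 {
     return request_len <= MAX_BUFFER ? 0 : -1;
 }

 /* Read path: the 4-byte length prefix covers the entire serialized child
  * list, so a 250k-entry listing exceeds the same limit even though every
  * individual write was tiny, and the whole response is rejected. */
 int check_read_frame(const unsigned char hdr[4])
 {
     uint32_t frame_len = ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16) |
                          ((uint32_t)hdr[2] << 8)  |  (uint32_t)hdr[3];
     return frame_len <= MAX_BUFFER ? 0 : -1;
 }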

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (ZOOKEEPER-1163) Memory leak in zk_hashtable.c:do_insert_watcher_object()

2011-08-25 Thread Anupam Chanda (JIRA)
Memory leak in zk_hashtable.c:do_insert_watcher_object()


 Key: ZOOKEEPER-1163
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1163
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.3.3
Reporter: Anupam Chanda


zk_hashtable.c:do_insert_watcher_object(), at line 193, calls add_to_list() 
with the clone flag set to 1.  This leaks memory, since the original watcher 
object was already allocated on the heap by activateWatcher() at line 330.

I will upload a patch shortly.  The fix is to set the clone flag to 0 in the 
call to add_to_list().
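For context on the clone flag, here is a simplified sketch of the ownership 
rule involved (illustrative only, not the actual zk_hashtable.c code):

#include <stdlib.h>
#include <string.h>

typedef struct watcher_object {
    void *watcher;                 /* placeholder fields for illustration */
    void *context;
    struct watcher_object *next;
} watcher_object_t;

/* clone == 1: the list stores its own private copy, so the caller keeps
 * ownership of (and must eventually free) the object it passed in.
 * clone == 0: the list takes ownership of the object as-is. */
static int add_to_list(watcher_object_t **head, watcher_object_t *obj, int clone)
{
    watcher_object_t *e = obj;
    if (clone) {
        e = malloc(sizeof(*e));
        if (!e) return -1;
        memcpy(e, obj, sizeof(*e));
    }
    e->next = *head;
    *head = e;
    return 0;
}

/* The leak above: the watcher object is already heap-allocated by the caller,
 * add_to_list(..., 1) duplicates it, and nothing ever frees the original.
 * Passing clone == 0 hands the existing allocation to the list instead. */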

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (ZOOKEEPER-1163) Memory leak in zk_hashtable.c:do_insert_watcher_object()

2011-08-25 Thread Anupam Chanda (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anupam Chanda updated ZOOKEEPER-1163:
-

Attachment: zookeeper-1163.patch

 Memory leak in zk_hashtable.c:do_insert_watcher_object()
 

 Key: ZOOKEEPER-1163
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1163
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.3.3
Reporter: Anupam Chanda
 Attachments: zookeeper-1163.patch


 zk_hashtable.c:do_insert_watcher_object(), at line 193, calls add_to_list() 
 with the clone flag set to 1.  This leaks memory, since the original watcher 
 object was already allocated on the heap by activateWatcher() at line 330.
 I will upload a patch shortly.  The fix is to set the clone flag to 0 in the 
 call to add_to_list().

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




ZooKeeper-trunk - Build # 1280 - Failure

2011-08-25 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk/1280/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 262963 lines...]
at 
hudson.plugins.findbugs.FindBugsPublisher.perform(FindBugsPublisher.java:160)
at 
hudson.plugins.analysis.core.HealthAwarePublisher.perform(HealthAwarePublisher.java:310)
at hudson.tasks.BuildStepMonitor$2.perform(BuildStepMonitor.java:27)
at 
hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:682)
at 
hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:657)
at 
hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:635)
at hudson.model.Build$RunnerImpl.post2(Build.java:162)
at 
hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:604)
at hudson.model.Run.run(Run.java:1401)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:230)
Caused by: java.io.IOException: Remote call on hadoop8 failed
at hudson.remoting.Channel.call(Channel.java:673)
at hudson.FilePath.act(FilePath.java:747)
... 13 more
Caused by: java.lang.ClassNotFoundException: Failed to deserialize the Callable 
object. Perhaps you needed to implement DelegatingCallable?
at hudson.remoting.UserRequest.perform(UserRequest.java:100)
at hudson.remoting.UserRequest.perform(UserRequest.java:48)
at hudson.remoting.Request$2.run(Request.java:270)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ClassNotFoundException: 
hudson.plugins.findbugs.parser.FindBugsParser
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:603)
at 
hudson.remoting.ObjectInputStreamEx.resolveClass(ObjectInputStreamEx.java:50)
at 
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1574)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1495)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1731)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
at hudson.remoting.UserRequest.deserialize(UserRequest.java:182)
at hudson.remoting.UserRequest.perform(UserRequest.java:98)
... 8 more
Archiving artifacts
Recording fingerprints
Recording test results
Publishing Javadoc
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed


Re: what happens when AuthenticationProvider throws an exception

2011-08-25 Thread Patrick Hunt
Probably should have caught up with all my email first... did you find
a resolution for this?

On Fri, Aug 12, 2011 at 11:00 AM, Fournier, Camille F.
camille.fourn...@gs.com wrote:
 Hi guys,

 So debugging some fun issues in my dev cluster, I discovered that due to some 
 bad user data, my AuthenticationProvider was throwing a null pointer 
 exception inside the handleAuthentication call. This call is made inside of 
 NIOServerCnxn.readRequest, and there is no try catch block. So it bubbles all 
 the way up to the NIOServerCnxn run method, which only logs it. Eventually I 
 end up with the corrupted request buffer I sent earlier:
 2011-08-12 08:01:16,602 - ERROR [CommitProcessor:4:FinalRequestProcessor@347] 
 - Failed to process sessionid:0x5319dd2bf3403f4 type:exists cxid:0x0 
 zxid:0xfffe txntype:unknown reqpath:n/a
 java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:180)
        at java.io.DataInputStream.readFully(DataInputStream.java:152)
        at 
 org.apache.jute.BinaryInputArchive.readString(BinaryInputArchive.java:82)
        at 
 org.apache.zookeeper.proto.ExistsRequest.deserialize(ExistsRequest.java:55)
        at 
 org.apache.zookeeper.server.ZooKeeperServer.byteBuffer2Record(ZooKeeperServer.java:599)
        at 
 org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:227)
        at 
 org.apache.zookeeper.server.quorum.Leader$ToBeAppliedRequestProcessor.processRequest(Leader.java:540)
        at 
 org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
 2011-08-12 08:01:16,602 - ERROR [CommitProcessor:4:FinalRequestProcessor@354] 
 - Dumping request buffer: 0x504150

 I suspect this is due to the fact that, in the readPayload method, we don't 
 call clear on lenBuffer when an exception is thrown by readRequest.

 Question:
 Obviously, I need to fix the exception that is being thrown by my 
 AuthenticationProvider, but do we want to put some try/catch logic around 
 that call? It seems like the error there is probably contributing to my 
 corrupted buffer problem.

 C
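 The framing hazard is easier to see in miniature. A conceptual sketch, in C 
 rather than the actual NIOServerCnxn Java code, of why length/payload state 
 that is not reset on a failed request leaves the connection parsing leftover 
 payload bytes as the next request:

 #include <stddef.h>

 /* Each request on the wire is a 4-byte length followed by a payload of that
  * many bytes.  'have_len' records whether the next bytes are a length header
  * or payload -- the moral equivalent of lenBuffer in the report above. */
 struct conn {
     int      have_len;   /* 1: length already parsed, payload expected next */
     unsigned len;        /* payload length parsed from the header */
 };

 /* Stand-in for readRequest()/handleAuthentication(); may fail on bad input. */
 extern int handle_request(struct conn *c, const unsigned char *payload, unsigned len);

 int process_payload(struct conn *c, const unsigned char *payload)
 {
     int rc = handle_request(c, payload, c->len);
     /* Reset the framing state even when the handler failed.  If this reset
      * only happens on the success path, the next read reinterprets leftover
      * payload bytes as a length header and the stream never recovers --
      * which is how one bad auth request turns into the EOF/garbage requests
      * seen in the log above. */
     c->have_len = 0;
     c->len = 0;
     return rc;
 }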




[jira] [Updated] (ZOOKEEPER-1108) Various bugs in zoo_add_auth in C

2011-08-25 Thread Dheeraj Agrawal (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dheeraj Agrawal updated ZOOKEEPER-1108:
---

Attachment: ZOOKEEPER-1108.patch

Adding java changes to the patch

 Various bugs in zoo_add_auth in C
 -

 Key: ZOOKEEPER-1108
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1108
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.3.3
Reporter: Dheeraj Agrawal
Assignee: Dheeraj Agrawal
Priority: Blocker
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-1108.patch, ZOOKEEPER-1108.patch, 
 ZOOKEEPER-1108.patch, ZOOKEEPER-1108.patch, ZOOKEEPER-1108.patch


 3 issues:
 1st issue: in zoo_add_auth there is a race condition:
2940 // [ZOOKEEPER-800] zoo_add_auth should return ZINVALIDSTATE if
2941 // the connection is closed.
2942 if (zoo_state(zh) == 0) {
2943 return ZINVALIDSTATE;
2944 }
 When we do zookeeper_init the state is initialized to 0, and the check above 
 returns an error whenever the state is 0. There is a race where the doIo 
 thread is slow and has not yet changed the state to CONNECTING, so you end up 
 getting ZINVALIDSTATE back. The problem is that 0 is used for both the CLOSED 
 state and the UNINITIALIZED state; in the uninitialized case the call should 
 be allowed to go through.
 2nd issue: in send_auth_info the loop condition is not correct:
 while (auth->next != NULL) { // BUG: if there is only one auth in the list, 
 this will never send that auth, since its next is NULL
rc = send_info_packet(zh, auth);
auth = auth->next;
 }
 FIX IS:
 do {
   rc = send_info_packet(zh, auth);
   auth = auth->next;
 } while (auth != NULL); // this makes sure that even a single auth entry 
 gets sent.
 3rd issue:
2965 add_last_auth(zh->auth_h, authinfo);
2966 zoo_unlock_auth(zh);
2967
2968 if (zh->state == ZOO_CONNECTED_STATE || zh->state == 
 ZOO_ASSOCIATING_STATE)
2969 return send_last_auth_info(zh);
 If it is connected, we only send the last_auth_info, which may be different 
 from the one we added, since we unlocked before sending it.
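 The second issue is easy to verify in isolation. Below is a standalone sketch 
 (a hypothetical auth list type, not the zookeeper.c source) showing that the 
 while (auth->next != NULL) form never sends the last entry, and therefore 
 sends nothing at all for a single-entry list, while the do/while form visits 
 every entry:

 #include <stdio.h>

 struct auth { const char *scheme; struct auth *next; };

 static int send_info_packet(const struct auth *a)
 {
     printf("sending auth: %s\n", a->scheme);
     return 0;
 }

 int main(void)
 {
     struct auth only = { "digest", NULL };   /* a single-entry auth list */
     struct auth *a;

     /* Buggy form: the condition looks at a->next, so the last entry is always
      * skipped -- with one entry, nothing is sent at all. */
     for (a = &only; a->next != NULL; a = a->next)
         send_info_packet(a);

     /* Fixed form: do/while visits every entry, including a lone one. */
     a = &only;
     do {
         send_info_packet(a);
         a = a->next;
     } while (a != NULL);

     return 0;
 }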

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Success: ZOOKEEPER-1108 PreCommit Build #474

2011-08-25 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1108
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/474/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 247560 lines...]
 [exec] BUILD SUCCESSFUL
 [exec] Total time: 0 seconds
 [exec] 
 [exec] 
 [exec] 
 [exec] 
 [exec] +1 overall.  Here are the results of testing the latest attachment 
 [exec]   
http://issues.apache.org/jira/secure/attachment/12491680/ZOOKEEPER-1108.patch
 [exec]   against trunk revision 1159929.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 6 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 1.3.9) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/474//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/474//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/474//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment added.
 [exec] o66cG7Ys86 logged out
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 

BUILD SUCCESSFUL
Total time: 22 minutes 36 seconds
Archiving artifacts
Recording test results
Description set: ZOOKEEPER-1108
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed


[jira] [Commented] (ZOOKEEPER-1108) Various bugs in zoo_add_auth in C

2011-08-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091256#comment-13091256
 ] 

Hadoop QA commented on ZOOKEEPER-1108:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12491680/ZOOKEEPER-1108.patch
  against trunk revision 1159929.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/474//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/474//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/474//console

This message is automatically generated.

 Various bugs in zoo_add_auth in C
 -

 Key: ZOOKEEPER-1108
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1108
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.3.3
Reporter: Dheeraj Agrawal
Assignee: Dheeraj Agrawal
Priority: Blocker
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-1108.patch, ZOOKEEPER-1108.patch, 
 ZOOKEEPER-1108.patch, ZOOKEEPER-1108.patch, ZOOKEEPER-1108.patch


 3 issues:
 1st issue: in zoo_add_auth there is a race condition:
2940 // [ZOOKEEPER-800] zoo_add_auth should return ZINVALIDSTATE if
2941 // the connection is closed.
2942 if (zoo_state(zh) == 0) {
2943 return ZINVALIDSTATE;
2944 }
 When we do zookeeper_init the state is initialized to 0, and the check above 
 returns an error whenever the state is 0. There is a race where the doIo 
 thread is slow and has not yet changed the state to CONNECTING, so you end up 
 getting ZINVALIDSTATE back. The problem is that 0 is used for both the CLOSED 
 state and the UNINITIALIZED state; in the uninitialized case the call should 
 be allowed to go through.
 2nd issue: in send_auth_info the loop condition is not correct:
 while (auth->next != NULL) { // BUG: if there is only one auth in the list, 
 this will never send that auth, since its next is NULL
rc = send_info_packet(zh, auth);
auth = auth->next;
 }
 FIX IS:
 do {
   rc = send_info_packet(zh, auth);
   auth = auth->next;
 } while (auth != NULL); // this makes sure that even a single auth entry 
 gets sent.
 3rd issue:
2965 add_last_auth(zh->auth_h, authinfo);
2966 zoo_unlock_auth(zh);
2967
2968 if (zh->state == ZOO_CONNECTED_STATE || zh->state == 
 ZOO_ASSOCIATING_STATE)
2969 return send_last_auth_info(zh);
 If it is connected, we only send the last_auth_info, which may be different 
 from the one we added, since we unlocked before sending it.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




RE: what happens when AuthenticationProvider throws an exception

2011-08-25 Thread Fournier, Camille F.
Welcome back from vacation :) Yeah, I checked in a fix and Ben committed it to 
trunk last week so we're good to go.

C

-Original Message-
From: Patrick Hunt [mailto:ph...@apache.org] 
Sent: Thursday, August 25, 2011 3:16 PM
To: dev@zookeeper.apache.org
Subject: Re: what happens when AuthenticationProvider throws an exception

Probably should have caught up with all my email first... did you find
a resolution for this?

On Fri, Aug 12, 2011 at 11:00 AM, Fournier, Camille F.
camille.fourn...@gs.com wrote:
 Hi guys,

 So debugging some fun issues in my dev cluster, I discovered that due to some 
 bad user data, my AuthenticationProvider was throwing a null pointer 
 exception inside the handleAuthentication call. This call is made inside of 
 NIOServerCnxn.readRequest, and there is no try catch block. So it bubbles all 
 the way up to the NIOServerCnxn run method, which only logs it. Eventually I 
 end up with the corrupted request buffer I sent earlier:
 2011-08-12 08:01:16,602 - ERROR [CommitProcessor:4:FinalRequestProcessor@347] 
 - Failed to process sessionid:0x5319dd2bf3403f4 type:exists cxid:0x0 
 zxid:0xfffe txntype:unknown reqpath:n/a
 java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:180)
        at java.io.DataInputStream.readFully(DataInputStream.java:152)
        at 
 org.apache.jute.BinaryInputArchive.readString(BinaryInputArchive.java:82)
        at 
 org.apache.zookeeper.proto.ExistsRequest.deserialize(ExistsRequest.java:55)
        at 
 org.apache.zookeeper.server.ZooKeeperServer.byteBuffer2Record(ZooKeeperServer.java:599)
        at 
 org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:227)
        at 
 org.apache.zookeeper.server.quorum.Leader$ToBeAppliedRequestProcessor.processRequest(Leader.java:540)
        at 
 org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
 2011-08-12 08:01:16,602 - ERROR [CommitProcessor:4:FinalRequestProcessor@354] 
 - Dumping request buffer: 0x504150

 I suspect this is due to the fact that, in the readPayload method, we don't 
 call clear on lenBuffer when an exception is thrown by readRequest.

 Question:
 Obviously, I need to fix the exception that is being thrown by my 
 AuthenticationProvider, but do we want to put some try/catch logic around 
 that call? It seems like the error there is probably contributing to my 
 corrupted buffer problem.

 C




[jira] [Commented] (ZOOKEEPER-1162) consistent handling of jute.maxbuffer when attempting to read large zk directories

2011-08-25 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091323#comment-13091323
 ] 

Jonathan Hsieh commented on ZOOKEEPER-1162:
---

Basically, I feel that to be consistent behavior-wise it should do one of the 
following:

1) reject the write when the dir becomes too big, keeping the current read 
constraint (ideally in zk, as opposed to the client)
2) accept the write as it does now, but allow the read to then succeed in this 
particular case
3) warn on write when the dir gets too big, and then allow reads to succeed 
even if it is too big

 consistent handling of jute.maxbuffer when attempting to read large zk 
 directories
 

 Key: ZOOKEEPER-1162
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1162
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.3.3
Reporter: Jonathan Hsieh
Priority: Critical
 Fix For: 3.5.0


 Recently we encountered a situation where a zk directory got successfully 
 populated with 250k elements.  When our system attempted to read the znode 
 dir, it failed because the contents of the dir exceeded the default 1MB 
 jute.maxbuffer limit.  There were a few odd things:
 1) It seems odd that we could populate the directory until it was very large 
 but could not read the listing.
 2) The workaround was bumping up the jute.maxbuffer setting on the client side.
 Would it make more sense to have it reject adding new znodes if the directory 
 listing would exceed jute.maxbuffer?
 Alternatively, would it make sense to have the zk dir listing ignore the 
 jute.maxbuffer setting?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (ZOOKEEPER-1156) Log truncation truncating log too much - can cause data loss

2011-08-25 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-1156:


Fix Version/s: 3.4.0

 Log truncation truncating log too much - can cause data loss
 

 Key: ZOOKEEPER-1156
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1156
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum, server
Affects Versions: 3.3.3
Reporter: Vishal Kathuria
Assignee: Vishal Kathuria
Priority: Blocker
 Fix For: 3.3.4, 3.4.0

 Attachments: ZOOKEEPER-1156.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 Log truncation relies on calculating the position of a particular zxid to 
 figure out the new size of the log file. There is a bug in the 
 PositionInputStream implementation which skips counting bytes in the log 
 that have the value 0. This can lead to underestimating the actual log size: 
 log records that should be kept can get truncated, leading to data loss on 
 the participant executing the trunc.
 Clients can then see different values depending on whether they connect to 
 the node on which the trunc was executed.
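 The invariant the bug violates is simple: the recorded position must advance 
 by exactly the number of bytes consumed, regardless of their value. A sketch 
 of that invariant, written in C for illustration (the actual class is the 
 Java PositionInputStream wrapper around the log stream):

 #include <stdio.h>

 /* A reader that tracks its offset in the file.  The point: 'position' must
  * advance by exactly the number of bytes returned, no matter what their
  * values are.  Skipping zero-valued bytes in this accounting (the bug
  * described above) makes the computed truncation point land short of the
  * real one, cutting off valid log records. */
 struct counting_reader {
     FILE *fp;
     long  position;
 };

 static size_t counted_read(struct counting_reader *r, void *buf, size_t len)
 {
     size_t n = fread(buf, 1, len, r->fp);
     r->position += (long)n;   /* count bytes consumed, not bytes "interesting" */
     return n;
 }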

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (ZOOKEEPER-1156) Log truncation truncating log too much - can cause data loss

2011-08-25 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt reassigned ZOOKEEPER-1156:
---

Assignee: Vishal Kathuria

 Log truncation truncating log too much - can cause data loss
 

 Key: ZOOKEEPER-1156
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1156
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum, server
Affects Versions: 3.3.3
Reporter: Vishal Kathuria
Assignee: Vishal Kathuria
Priority: Blocker
 Fix For: 3.3.4, 3.4.0

 Attachments: ZOOKEEPER-1156.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 Log truncation relies on calculating the position of a particular zxid to 
 figure out the new size of the log file. There is a bug in the 
 PositionInputStream implementation which skips counting bytes in the log 
 that have the value 0. This can lead to underestimating the actual log size: 
 log records that should be kept can get truncated, leading to data loss on 
 the participant executing the trunc.
 Clients can then see different values depending on whether they connect to 
 the node on which the trunc was executed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (ZOOKEEPER-1154) Data inconsistency when the node(s) with the highest zxid is not present at the time of leader election

2011-08-25 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt reassigned ZOOKEEPER-1154:
---

Assignee: Vishal Kathuria

 Data inconsistency when the node(s) with the highest zxid is not present at 
 the time of leader election
 ---

 Key: ZOOKEEPER-1154
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1154
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.3.3
Reporter: Vishal Kathuria
Assignee: Vishal Kathuria
Priority: Blocker
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-1154.patch, ZOOKEEPER-1154.patch

   Original Estimate: 504h
  Remaining Estimate: 504h

 If the participant with the highest zxid (let's call it A) isn't present 
 during leader election, a participant with a lower zxid (say B) might be 
 chosen as leader. When A comes up, it will replay the log containing that 
 higher zxid. The change carried by that higher zxid will then only be visible 
 to clients connecting to participant A, not to the other participants.
 I was able to reproduce this problem by:
 1. Connect a debugger to B and C and suspend them, so they don't write anything
 2. Issue an update to the leader A.
 3. After a few seconds, crash all servers (A, B, C)
 4. Start B and C, let the leader election take place
 5. Start A.
 6. You will find that the update done in step 2 is visible on A but not on 
 B and C, hence the inconsistency.
 Below is a more detailed analysis of what is happening in the code.
 Initial condition:
 1. Let's say there are three nodes in the ensemble, A, B and C, with A being 
 the leader
 2. The current epoch is 7.
 3. For simplicity of the example, let's say a zxid is a two-digit number, 
 with the epoch being the first digit.
 4. The zxid is 73
 5. All the nodes have seen the change 73 and have persistently logged it.
 Step 1
 A request with zxid 74 is issued. The leader A writes it to the log, but there 
 is a crash of the entire ensemble and B and C never write the change 74 to 
 their log.
 Step 3
 B and C restart; A is still down
 B and C form the quorum
 B is the new leader. Let's say B's minCommitLog is 71 and maxCommitLog is 73
 The epoch is now 8, the zxid is 80
 A request with zxid 81 succeeds. On B, minCommitLog is now 71, 
 maxCommitLog is 81
 Step 4
 A starts up. It applies the change from the request with zxid 74 to its 
 in-memory data tree
 A contacts B to registerAsFollower and provides 74 as its zxid
 Since 71 <= 74 <= 81, B decides to send A a diff. B will send A the proposal 
 81.
 Problem:
 The problem with the above sequence is that A's data tree has the update from 
 request 74, which is not correct. Before getting proposal 81, A should have 
 received a trunc to 73. I don't see that in the code. If the maxCommitLog on 
 B hadn't bumped to 81 but had stayed at 73, that case seems to be fine.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (ZOOKEEPER-1160) test timeouts are too small

2011-08-25 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt reassigned ZOOKEEPER-1160:
---

Assignee: Benjamin Reed

 test timeouts are too small
 ---

 Key: ZOOKEEPER-1160
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1160
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: tests
Reporter: Benjamin Reed
Assignee: Benjamin Reed
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-1160.patch


 In reviewing some tests that weren't passing I noticed that the tick time was 
 2ms rather than the normal 2000ms. I think this is causing tests to fail on 
 some slow/overloaded machines.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (ZOOKEEPER-1154) Data inconsistency when the node(s) with the highest zxid is not present at the time of leader election

2011-08-25 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-1154:


Fix Version/s: 3.3.4

 Data inconsistency when the node(s) with the highest zxid is not present at 
 the time of leader election
 ---

 Key: ZOOKEEPER-1154
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1154
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.3.3
Reporter: Vishal Kathuria
Assignee: Vishal Kathuria
Priority: Blocker
 Fix For: 3.3.4, 3.4.0

 Attachments: ZOOKEEPER-1154.patch, ZOOKEEPER-1154.patch

   Original Estimate: 504h
  Remaining Estimate: 504h

 If the participant with the highest zxid (let's call it A) isn't present 
 during leader election, a participant with a lower zxid (say B) might be 
 chosen as leader. When A comes up, it will replay the log containing that 
 higher zxid. The change carried by that higher zxid will then only be visible 
 to clients connecting to participant A, not to the other participants.
 I was able to reproduce this problem by:
 1. Connect a debugger to B and C and suspend them, so they don't write anything
 2. Issue an update to the leader A.
 3. After a few seconds, crash all servers (A, B, C)
 4. Start B and C, let the leader election take place
 5. Start A.
 6. You will find that the update done in step 2 is visible on A but not on 
 B and C, hence the inconsistency.
 Below is a more detailed analysis of what is happening in the code.
 Initial condition:
 1. Let's say there are three nodes in the ensemble, A, B and C, with A being 
 the leader
 2. The current epoch is 7.
 3. For simplicity of the example, let's say a zxid is a two-digit number, 
 with the epoch being the first digit.
 4. The zxid is 73
 5. All the nodes have seen the change 73 and have persistently logged it.
 Step 1
 A request with zxid 74 is issued. The leader A writes it to the log, but there 
 is a crash of the entire ensemble and B and C never write the change 74 to 
 their log.
 Step 3
 B and C restart; A is still down
 B and C form the quorum
 B is the new leader. Let's say B's minCommitLog is 71 and maxCommitLog is 73
 The epoch is now 8, the zxid is 80
 A request with zxid 81 succeeds. On B, minCommitLog is now 71, 
 maxCommitLog is 81
 Step 4
 A starts up. It applies the change from the request with zxid 74 to its 
 in-memory data tree
 A contacts B to registerAsFollower and provides 74 as its zxid
 Since 71 <= 74 <= 81, B decides to send A a diff. B will send A the proposal 
 81.
 Problem:
 The problem with the above sequence is that A's data tree has the update from 
 request 74, which is not correct. Before getting proposal 81, A should have 
 received a trunc to 73. I don't see that in the code. If the maxCommitLog on 
 B hadn't bumped to 81 but had stayed at 73, that case seems to be fine.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1161) Provide an option for disabling auto-creation of the data directory

2011-08-25 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091591#comment-13091591
 ] 

Mahadev konar commented on ZOOKEEPER-1161:
--

Roman,
 Can you please explain the motivation behind this jira?

 Provide an option for disabling auto-creation of the data directory
 ---

 Key: ZOOKEEPER-1161
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1161
 Project: ZooKeeper
  Issue Type: New Feature
Reporter: Roman Shaposhnik

 Currently if ZK starts and doesn't see an existing dataDir it tries to 
 create it. There should be an option to tweak this behavior. As for the 
 default, my personal opinion is to NOT allow autocreate.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (ZOOKEEPER-1051) SIGPIPE in Zookeeper 0.3.* when send'ing after cluster disconnection

2011-08-25 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1051:
-

Fix Version/s: 3.4.0

I think this should go into 3.4.0.

 SIGPIPE in Zookeeper 0.3.* when send'ing after cluster disconnection
 

 Key: ZOOKEEPER-1051
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1051
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.3.2, 3.3.3, 3.4.0
Reporter: Stephen Tyree
Assignee: Stephen Tyree
Priority: Minor
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-1051.patch, ZOOKEEPER-1051.patch

   Original Estimate: 2h
  Remaining Estimate: 2h

 In libzookeeper_mt, if your process is going rather slowly (such as when 
 running it in Valgrind's Memcheck) or you are using gdb with breakpoints, you 
 can occasionally get SIGPIPE when trying to send a message to the cluster. 
 For example:
 ==12788==
 ==12788== Process terminating with default action of signal 13 (SIGPIPE)
 ==12788==at 0x3F5180DE91: send (in /lib64/libpthread-2.5.so)
 ==12788==by 0x7F060AA: ??? (in /usr/lib64/libzookeeper_mt.so.2.0.0)
 ==12788==by 0x7F06E5B: zookeeper_process (in 
 /usr/lib64/libzookeeper_mt.so.2.0.0)
 ==12788==by 0x7F0D38E: ??? (in /usr/lib64/libzookeeper_mt.so.2.0.0)
 ==12788==by 0x3F5180673C: start_thread (in /lib64/libpthread-2.5.so)
 ==12788==by 0x3F50CD3F6C: clone (in /lib64/libc-2.5.so)
 ==12788==
 This is probably not the behavior we would like, since we handle server 
 disconnections after a failed call to send. To fix this, there are a few 
 options we could use. For BSD environments, we can tell a socket to never 
 raise SIGPIPE from send by using setsockopt:
 setsockopt(sd, SOL_SOCKET, SO_NOSIGPIPE, (void *)&set, sizeof(int));
 For Linux environments, we can add the MSG_NOSIGNAL flag to every send call, 
 which tells send not to raise SIGPIPE when the connection is broken.
 For more information, see: 
 http://stackoverflow.com/questions/108183/how-to-prevent-sigpipes-or-handle-them-properly
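 Both approaches fit in a few lines. A sketch of the two options described 
 above (the #ifdef guards are an assumption for portability, not part of the 
 original report):

 #include <sys/types.h>
 #include <sys/socket.h>

 /* BSD-style systems: mark the socket once so send() never raises SIGPIPE. */
 static int disable_sigpipe(int sd)
 {
 #ifdef SO_NOSIGPIPE
     int set = 1;
     return setsockopt(sd, SOL_SOCKET, SO_NOSIGPIPE, (void *)&set, sizeof(set));
 #else
     (void)sd;
     return 0;
 #endif
 }

 /* Linux: suppress SIGPIPE per call by passing MSG_NOSIGNAL to send(). */
 static ssize_t send_nosignal(int sd, const void *buf, size_t len)
 {
 #ifdef MSG_NOSIGNAL
     return send(sd, buf, len, MSG_NOSIGNAL);
 #else
     return send(sd, buf, len, 0);
 #endif
 }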

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira