[jira] [Commented] (ZOOKEEPER-2469) infinite loop in ZK re-login

2016-07-07 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367064#comment-15367064
 ] 

Mahadev konar commented on ZOOKEEPER-2469:
--

[~sershe] done.

> infinite loop in ZK re-login
> 
>
> Key: ZOOKEEPER-2469
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2469
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> {noformat}
> int retry = 1;
> while (retry >= 0) {
> try {
> reLogin();
> break;
> } catch (LoginException le) {
> if (retry > 0) {
> --retry;
> // sleep for 10 seconds.
> try {
> Thread.sleep(10 * 1000);
> } catch (InterruptedException e) {
> LOG.error("Interrupted during login retry after LoginException:", le);
> throw le;
> }
> } else {
> LOG.error("Could not refresh TGT for principal: " + principal + ".", le);
> }
> }
> }
> {noformat}
> will retry forever. It should return, like the one above.
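A minimal runnable sketch of the fix (hypothetical: `reLogin` is simulated here, this is not ZooKeeper's actual Login class): the else branch must break out (or return) instead of letting the loop spin forever with retry == 0.

```java
public class ReLoginRetry {
    static int attempts = 0;

    // Stand-in for the JAAS re-login call in the report; always fails here
    // so the retry path is exercised.
    static void reLogin() throws Exception {
        attempts++;
        throw new Exception("simulated LoginException");
    }

    // One initial attempt plus one retry; the break in the else branch is
    // what the loop quoted above is missing.
    public static void retryLogin() {
        int retry = 1;
        while (retry >= 0) {
            try {
                reLogin();
                break;
            } catch (Exception le) {
                if (retry > 0) {
                    --retry; // the 10-second sleep is omitted in this sketch
                } else {
                    // log the failure and give up instead of looping forever
                    break;
                }
            }
        }
    }

    public static void main(String[] args) {
        retryLogin();
        System.out.println("attempts = " + attempts); // initial try + one retry
    }
}
```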



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2469) infinite loop in ZK re-login

2016-07-07 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-2469:
-
Assignee: Sergey Shelukhin

> infinite loop in ZK re-login
> 
>
> Key: ZOOKEEPER-2469
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2469
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> {noformat}
> int retry = 1;
> while (retry >= 0) {
> try {
> reLogin();
> break;
> } catch (LoginException le) {
> if (retry > 0) {
> --retry;
> // sleep for 10 seconds.
> try {
> Thread.sleep(10 * 1000);
> } catch (InterruptedException e) {
> LOG.error("Interrupted during login retry after LoginException:", le);
> throw le;
> }
> } else {
> LOG.error("Could not refresh TGT for principal: " + principal + ".", le);
> }
> }
> }
> {noformat}
> will retry forever. It should return, like the one above.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1848) [WINDOWS] Java NIO socket channels does not work with Windows ipv6 on JDK6

2014-03-30 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954961#comment-13954961
 ] 

Mahadev konar commented on ZOOKEEPER-1848:
--

+1 for the patch. Rerunning it through jenkins again.

 [WINDOWS] Java NIO socket channels does not work with Windows ipv6 on JDK6
 --

 Key: ZOOKEEPER-1848
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1848
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 3.5.0

 Attachments: zookeeper-1848_v1.patch, zookeeper-1848_v2.patch


 ZK uses Java NIO to create ServerSockets from ServerSocketChannels. Under 
 Windows, ipv4 and ipv6 are implemented independently, and Java apparently 
 cannot reuse the same socket channel for both ipv4 and ipv6 sockets. We 
 are getting java.net.SocketException: Address family not supported by 
 protocol family exceptions. When the ZK client resolves localhost, it gets 
 both the v4 127.0.0.1 and v6 ::1 addresses, but the socket channel cannot 
 bind to both v4 and v6.
 The problem is reported as:
 http://bugs.sun.com/view_bug.do?bug_id=6230761
 http://stackoverflow.com/questions/1357091/binding-an-ipv6-server-socket-on-windows
 Although the JDK bug is reported as resolved, I have tested with jdk1.6.0_33 
 without success. JDK7, however, does seem to have fixed this problem. 
 See HBASE-6825 for reference. 
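A hedged illustration of one workaround described above: bind the ServerSocketChannel explicitly to a single-family address (here the IPv4 loopback) rather than letting the resolver hand back both 127.0.0.1 and ::1. This is a sketch of the idea, not the ZOOKEEPER-1848 patch itself.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

public class BindV4 {
    // Bind to the IPv4 loopback only, so the channel never has to serve both
    // address families on one socket; returns the ephemeral port, or -1 on error.
    static int bindLoopbackV4() {
        try (ServerSocketChannel ch = ServerSocketChannel.open()) {
            ch.socket().bind(new InetSocketAddress("127.0.0.1", 0));
            return ch.socket().getLocalPort();
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println("bound to port " + bindLoopbackV4());
    }
}
```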



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (ZOOKEEPER-1667) Watch event isn't handled correctly when a client reestablish to a server

2013-10-22 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13801581#comment-13801581
 ] 

Mahadev konar commented on ZOOKEEPER-1667:
--

+1 - the patch looks good to me.

 Watch event isn't handled correctly when a client reestablish to a server
 -

 Key: ZOOKEEPER-1667
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1667
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.3.6, 3.4.5
Reporter: Jacky007
Assignee: Flavio Junqueira
Priority: Blocker
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1667-b3.4.patch, ZOOKEEPER-1667-b3.4.patch, 
 ZOOKEEPER-1667.patch, ZOOKEEPER-1667-r34.patch, ZOOKEEPER-1667-trunk.patch


 When a client reestablishes its connection to a server, it will send the watches which have 
 not been triggered. But the code in DataTree does not handle this correctly.
 It is obvious; we just did not notice it :)
 Scenario:
 1) Client A sets a data watch on /d, then disconnects; client B deletes /d and 
 creates it again. When client A reconnects to ZK, it will receive a 
 NodeCreated rather than a NodeDataChanged.
 2) Client A sets an exists watch on /e (which does not exist), then disconnects; client B 
 creates /e. When client A reconnects to ZK, it will receive a NodeDataChanged 
 rather than a NodeCreated.
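The expected semantics behind the two scenarios can be written down as a small decision table (a hypothetical helper, not ZooKeeper's DataTree code): the replayed event should depend on what the client last observed, not on whether the node happens to have been recreated meanwhile.

```java
public class WatchReplay {
    // Hypothetical helper: choose the event to deliver when replaying a watch
    // after reconnect, per the scenarios in the report.
    static String replayEvent(boolean existedAtDisconnect, boolean existsNow) {
        if (!existedAtDisconnect && existsNow) return "NodeCreated";    // scenario 2
        if (existedAtDisconnect && !existsNow) return "NodeDeleted";
        if (existedAtDisconnect && existsNow)  return "NodeDataChanged"; // scenario 1,
                                               // even if deleted and recreated meanwhile
        return "None";
    }

    public static void main(String[] args) {
        System.out.println(replayEvent(true, true));   // data watch, node recreated
        System.out.println(replayEvent(false, true));  // exists watch, node created
    }
}
```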



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (ZOOKEEPER-1646) mt c client tests fail on Ubuntu Raring

2013-10-17 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798082#comment-13798082
 ] 

Mahadev konar commented on ZOOKEEPER-1646:
--

+1 for the patch. Nice catch Pat!

 mt c client tests fail on Ubuntu Raring
 ---

 Key: ZOOKEEPER-1646
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1646
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.4.5, 3.5.0
 Environment: Ubuntu 13.04 (raring), glibc 2.17
Reporter: James Page
Assignee: Patrick Hunt
Priority: Blocker
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1646.patch


 Misc tests fail in the c client binding under the current Ubuntu development 
 release:
 ./zktest-mt 
  ZooKeeper server startedRunning 
 Zookeeper_clientretry::testRetry ZooKeeper server started ZooKeeper server 
 started : elapsed 9315 : OK
 Zookeeper_operations::testAsyncWatcher1 : assertion : elapsed 1054
 Zookeeper_operations::testAsyncGetOperation : assertion : elapsed 1055
 Zookeeper_operations::testOperationsAndDisconnectConcurrently1 : assertion : 
 elapsed 1066
 Zookeeper_operations::testOperationsAndDisconnectConcurrently2 : elapsed 0 : 
 OK
 Zookeeper_operations::testConcurrentOperations1 : assertion : elapsed 1055
 Zookeeper_init::testBasic : elapsed 1 : OK
 Zookeeper_init::testAddressResolution : elapsed 0 : OK
 Zookeeper_init::testMultipleAddressResolution : elapsed 0 : OK
 Zookeeper_init::testNullAddressString : elapsed 0 : OK
 Zookeeper_init::testEmptyAddressString : elapsed 0 : OK
 Zookeeper_init::testOneSpaceAddressString : elapsed 0 : OK
 Zookeeper_init::testTwoSpacesAddressString : elapsed 0 : OK
 Zookeeper_init::testInvalidAddressString1 : elapsed 0 : OK
 Zookeeper_init::testInvalidAddressString2 : elapsed 175 : OK
 Zookeeper_init::testNonexistentHost : elapsed 92 : OK
 Zookeeper_init::testOutOfMemory_init : elapsed 0 : OK
 Zookeeper_init::testOutOfMemory_getaddrs1 : elapsed 0 : OK
 Zookeeper_init::testOutOfMemory_getaddrs2 : elapsed 1 : OK
 Zookeeper_init::testPermuteAddrsList : elapsed 0 : OK
 Zookeeper_close::testIOThreadStoppedOnExpire : assertion : elapsed 1056
 Zookeeper_close::testCloseUnconnected : elapsed 0 : OK
 Zookeeper_close::testCloseUnconnected1 : elapsed 91 : OK
 Zookeeper_close::testCloseConnected1 : assertion : elapsed 1056
 Zookeeper_close::testCloseFromWatcher1 : assertion : elapsed 1076
 Zookeeper_simpleSystem::testAsyncWatcherAutoReset ZooKeeper server started : 
 elapsed 12155 : OK
 Zookeeper_simpleSystem::testDeserializeString : elapsed 0 : OK
 Zookeeper_simpleSystem::testNullData : elapsed 1031 : OK
 Zookeeper_simpleSystem::testIPV6 : elapsed 1005 : OK
 Zookeeper_simpleSystem::testPath : elapsed 1024 : OK
 Zookeeper_simpleSystem::testPathValidation : elapsed 1053 : OK
 Zookeeper_simpleSystem::testPing : elapsed 17287 : OK
 Zookeeper_simpleSystem::testAcl : elapsed 1019 : OK
 Zookeeper_simpleSystem::testChroot : elapsed 3052 : OK
 Zookeeper_simpleSystem::testAuth : assertion : elapsed 7010
 Zookeeper_simpleSystem::testHangingClient : elapsed 1015 : OK
 Zookeeper_simpleSystem::testWatcherAutoResetWithGlobal ZooKeeper server 
 started ZooKeeper server started ZooKeeper server started : elapsed 20556 : OK
 Zookeeper_simpleSystem::testWatcherAutoResetWithLocal ZooKeeper server 
 started ZooKeeper server started ZooKeeper server started : elapsed 20563 : OK
 Zookeeper_simpleSystem::testGetChildren2 : elapsed 1041 : OK
 Zookeeper_multi::testCreate : elapsed 1017 : OK
 Zookeeper_multi::testCreateDelete : elapsed 1007 : OK
 Zookeeper_multi::testInvalidVersion : elapsed 1011 : OK
 Zookeeper_multi::testNestedCreate : elapsed 1009 : OK
 Zookeeper_multi::testSetData : elapsed 6019 : OK
 Zookeeper_multi::testUpdateConflict : elapsed 1014 : OK
 Zookeeper_multi::testDeleteUpdateConflict : elapsed 1007 : OK
 Zookeeper_multi::testAsyncMulti : elapsed 2001 : OK
 Zookeeper_multi::testMultiFail : elapsed 1006 : OK
 Zookeeper_multi::testCheck : elapsed 1020 : OK
 Zookeeper_multi::testWatch : elapsed 2013 : OK
 Zookeeper_watchers::testDefaultSessionWatcher1 zktest-mt: 
 tests/ZKMocks.cc:271: SyncedBoolCondition 
 DeliverWatchersWrapper::isDelivered() const: Assertion `i<1000' failed.
 Aborted (core dumped)
 It would appear that the zookeeper connection does not transition to 
 connected within the required time; I increased the time allowed but no 
 change.
 Ubuntu raring has glibc 2.17; the test suite works fine on previous Ubuntu 
 releases and this is the only difference that stood out.
 Interestingly the cli_mt worked just fine connecting to the same zookeeper 
 instance that the tests left lying around so I'm assuming this is a test 
 error rather than an actual bug.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (ZOOKEEPER-442) need a way to remove watches that are no longer of interest

2013-10-10 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791557#comment-13791557
 ] 

Mahadev konar commented on ZOOKEEPER-442:
-

Thanks Rakesh. Good to see the initiative. I'll read through the doc and get 
back to you. 



 need a way to remove watches that are no longer of interest
 ---

 Key: ZOOKEEPER-442
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-442
 Project: ZooKeeper
  Issue Type: New Feature
Reporter: Benjamin Reed
Assignee: Daniel Gómez Ferro
Priority: Critical
 Fix For: 3.5.0

 Attachments: Remove Watch API.pdf, ZOOKEEPER-442.patch, 
 ZOOKEEPER-442.patch, ZOOKEEPER-442.patch, ZOOKEEPER-442.patch, 
 ZOOKEEPER-442.patch, ZOOKEEPER-442.patch, ZOOKEEPER-442.patch


 Currently the only way a watch is cleared is to trigger it. We need a way to 
 enumerate the outstanding watch objects, find the watch events the objects are 
 watching for, and remove interest in an event.
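A sketch of the requested capability (names are illustrative only, not the API that any patch on this issue defines): track outstanding watch interests per path so they can be enumerated and removed without being triggered.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class WatchRegistry {
    // path -> ids of watchers still interested in that path
    private final Map<String, Set<String>> watches = new HashMap<>();

    void add(String path, String watcherId) {
        watches.computeIfAbsent(path, p -> new HashSet<>()).add(watcherId);
    }

    // Enumerate the outstanding watches on a path.
    Set<String> outstanding(String path) {
        return watches.getOrDefault(path, Collections.emptySet());
    }

    // Remove interest without triggering the watch; true if it was present.
    boolean remove(String path, String watcherId) {
        Set<String> ids = watches.get(path);
        return ids != null && ids.remove(watcherId);
    }
}
```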



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (ZOOKEEPER-1791) ZooKeeper package includes unnecessary jars that are part of the package.

2013-10-10 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1791:
-

Attachment: ZOOKEEPER-1791.patch

 ZooKeeper package includes unnecessary jars that are part of the package.
 -

 Key: ZOOKEEPER-1791
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1791
 Project: ZooKeeper
  Issue Type: Bug
  Components: build
Affects Versions: 3.5.0
Reporter: Mahadev konar
Assignee: Mahadev konar
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1791.patch


 ZooKeeper package includes unnecessary jars that are part of the package.
 Packages like fatjar and 
 {code}
 maven-ant-tasks-2.1.3.jar
 maven-artifact-2.2.1.jar
 maven-artifact-manager-2.2.1.jar
 maven-error-diagnostics-2.2.1.jar
 maven-model-2.2.1.jar
 maven-plugin-registry-2.2.1.jar
 maven-profile-2.2.1.jar
 maven-project-2.2.1.jar
 maven-repository-metadata-2.2.1.jar
 {code}
 are part of the zookeeper package and rpm (via bigtop). 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (ZOOKEEPER-1791) ZooKeeper package includes unnecessary jars that are part of the package.

2013-10-10 Thread Mahadev konar (JIRA)
Mahadev konar created ZOOKEEPER-1791:


 Summary: ZooKeeper package includes unnecessary jars that are part 
of the package.
 Key: ZOOKEEPER-1791
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1791
 Project: ZooKeeper
  Issue Type: Bug
  Components: build
Affects Versions: 3.5.0
Reporter: Mahadev konar
Assignee: Mahadev konar
 Fix For: 3.5.0
 Attachments: ZOOKEEPER-1791.patch

ZooKeeper package includes unnecessary jars that are part of the package.

Packages like fatjar and 

{code}
maven-ant-tasks-2.1.3.jar
maven-artifact-2.2.1.jar
maven-artifact-manager-2.2.1.jar
maven-error-diagnostics-2.2.1.jar
maven-model-2.2.1.jar
maven-plugin-registry-2.2.1.jar
maven-profile-2.2.1.jar
maven-project-2.2.1.jar
maven-repository-metadata-2.2.1.jar
{code}

are part of the zookeeper package and rpm (via bigtop). 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets

2013-10-09 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13790067#comment-13790067
 ] 

Mahadev konar commented on ZOOKEEPER-900:
-

[~phunt] I think we can close this one in favor of another jira.


 FLE implementation should be improved to use non-blocking sockets
 -

 Key: ZOOKEEPER-900
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-900
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Vishal Kher
Assignee: Vishal Kher
Priority: Critical
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-900.patch, ZOOKEEPER-900.patch1, 
 ZOOKEEPER-900.patch2


 From earlier email exchanges:
 1. Blocking connects and accepts:
 a) The first problem is in manager.toSend(). This invokes connectOne(), which 
 does a blocking connect. While testing, I changed the code so that 
 connectOne() starts a new thread called AsyncConnect(). AsyncConnect.run() 
 does a socketChannel.connect(). After starting AsyncConnect, connectOne 
 starts a timer. connectOne continues with normal operations if the connection 
 is established before the timer expires, otherwise, when the timer expires it 
 interrupts AsyncConnect() thread and returns. In this way, I can have an 
 upper bound on the amount of time we need to wait for connect to succeed. Of 
 course, this was a quick fix for my testing. Ideally, we should use Selector 
 to do non-blocking connects/accepts. I am planning to do that later once we 
 at least have a quick fix for the problem and consensus from others for the 
 real fix (this problem is a big blocker for us). Note that it is OK to do 
 blocking IO in SenderWorker and RecvWorker threads since they block IO to the 
 respective peer.
 b) The blocking IO problem is not just restricted to connectOne(), but also 
 in receiveConnection(). The Listener thread calls receiveConnection() for 
 each incoming connection request. receiveConnection does blocking IO to get 
 peer's info (s.read(msgBuffer)). Worse, it invokes connectOne() back to the 
 peer that had sent the connection request. All of this is happening from the 
 Listener. In short, if a peer fails after initiating a connection, the 
 Listener thread won't be able to accept connections from other peers, because 
 it would be stuck in read() or connectOne(). Also the code has an inherent 
 cycle. initiateConnection() and receiveConnection() will have to be very 
 carefully synchronized otherwise, we could run into deadlocks. This code is 
 going to be difficult to maintain/modify.
 Also see: https://issues.apache.org/jira/browse/ZOOKEEPER-822
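The Selector-based approach suggested above can be sketched as follows (a minimal illustration with an upper bound on the wait, not the actual QuorumCnxManager code): the connect never blocks the calling thread, so a Listener-style thread cannot get stuck in connect().

```java
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class NonBlockingConnect {
    // Connect without blocking the caller; returns true only if the
    // connection completes within timeoutMs.
    static boolean connectWithTimeout(InetSocketAddress addr, long timeoutMs) {
        try (SocketChannel ch = SocketChannel.open();
             Selector sel = Selector.open()) {
            ch.configureBlocking(false);
            if (ch.connect(addr)) {
                return true;                   // connected immediately
            }
            ch.register(sel, SelectionKey.OP_CONNECT);
            if (sel.select(timeoutMs) == 0) {
                return false;                  // timer expired, give up
            }
            return ch.finishConnect();
        } catch (Exception e) {
            return false;
        }
    }

    // Self-contained demo: connect to a local listening socket.
    static boolean selfTest() {
        try (ServerSocketChannel srv = ServerSocketChannel.open()) {
            srv.socket().bind(new InetSocketAddress("127.0.0.1", 0));
            int port = srv.socket().getLocalPort();
            return connectWithTimeout(new InetSocketAddress("127.0.0.1", port), 1000);
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("connected = " + selfTest());
    }
}
```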



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (ZOOKEEPER-1147) Add support for local sessions

2013-10-08 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1147:
-

Attachment: ZOOKEEPER-1147.patch

The current patch has a minor conflict and fails to apply against 
QuorumPeerMain.java - attaching a new one which fixes the conflict.

 Add support for local sessions
 --

 Key: ZOOKEEPER-1147
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1147
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.3.3
Reporter: Vishal Kathuria
Assignee: Thawan Kooburat
  Labels: api-change, scaling
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, 
 ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, 
 ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, 
 ZOOKEEPER-1147.patch

   Original Estimate: 840h
  Remaining Estimate: 840h

 This improvement is in the bucket of making ZooKeeper work at a large scale. 
 We are planning on having about 1 million clients connect to a ZooKeeper 
 ensemble through a set of 50-100 observers. The majority of these clients are 
 read only - i.e., they do not do any updates or create ephemeral nodes.
 In ZooKeeper today, the client creates a session and the session creation is 
 handled like any other update. In the above use case, the session create/drop 
 workload can easily overwhelm an ensemble. The following is a proposal for a 
 local session, to support a larger number of connections.
 1.   The idea is to introduce a new type of session - the local session. A 
 local session doesn't have the full functionality of a normal session.
 2.   Local sessions cannot create ephemeral nodes.
 3.   Once a local session is lost, you cannot re-establish it using the 
 session-id/password. The session and its watches are gone for good.
 4.   When a local session connects, the session info is maintained only 
 on the zookeeper server (in this case, an observer) that it is connected to. 
 The leader is not aware of the creation of such a session and there is no 
 state written to disk.
 5.   The pings and expiration are handled by the server that the session 
 is connected to.
 With the above changes, we can make ZooKeeper scale to a much larger number 
 of clients without making the core ensemble a bottleneck.
 In terms of API, there are two options being considered:
 1. Let the client specify at connect time which kind of session they 
 want.
 2. All sessions connect as local sessions and automatically get promoted to 
 global sessions when they do an operation that requires a global session 
 (e.g. creating an ephemeral node)
 Chubby took the approach of lazily promoting all sessions to global, but I 
 don't think that would work in our case, where we want to keep sessions which 
 never create ephemeral nodes as always local. Option 2 would make it more 
 broadly usable but option 1 would be easier to implement.
 We are thinking of implementing option 1 as the first cut. There would be a 
 client flag, IsLocalSession (much like the current readOnly flag) that would 
 be used to determine whether to create a local session or a global session.
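Option 1 could look roughly like this on the client side (purely illustrative names; the proposal only commits to a connect-time flag analogous to readOnly):

```java
public class SessionType {
    // Illustrative decision logic per the proposal: local sessions are chosen
    // at connect time and cannot create ephemeral nodes.
    static String sessionKind(boolean isLocalSession, boolean needsEphemerals) {
        if (isLocalSession && needsEphemerals) {
            return "error: local sessions cannot create ephemeral nodes";
        }
        return isLocalSession ? "local" : "global";
    }

    public static void main(String[] args) {
        System.out.println(sessionKind(true, false));  // read-only observer client
        System.out.println(sessionKind(false, true));  // client that owns ephemerals
    }
}
```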



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (ZOOKEEPER-1147) Add support for local sessions

2013-10-08 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788998#comment-13788998
 ] 

Mahadev konar commented on ZOOKEEPER-1147:
--

[~fpj] Looks like the patch is ready to go in. Do you want to look through it 
before we commit? 


 Add support for local sessions
 --

 Key: ZOOKEEPER-1147
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1147
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.3.3
Reporter: Vishal Kathuria
Assignee: Thawan Kooburat
  Labels: api-change, scaling
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, 
 ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, 
 ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, 
 ZOOKEEPER-1147.patch

   Original Estimate: 840h
  Remaining Estimate: 840h

 This improvement is in the bucket of making ZooKeeper work at a large scale. 
 We are planning on having about 1 million clients connect to a ZooKeeper 
 ensemble through a set of 50-100 observers. The majority of these clients are 
 read only - i.e., they do not do any updates or create ephemeral nodes.
 In ZooKeeper today, the client creates a session and the session creation is 
 handled like any other update. In the above use case, the session create/drop 
 workload can easily overwhelm an ensemble. The following is a proposal for a 
 local session, to support a larger number of connections.
 1.   The idea is to introduce a new type of session - the local session. A 
 local session doesn't have the full functionality of a normal session.
 2.   Local sessions cannot create ephemeral nodes.
 3.   Once a local session is lost, you cannot re-establish it using the 
 session-id/password. The session and its watches are gone for good.
 4.   When a local session connects, the session info is maintained only 
 on the zookeeper server (in this case, an observer) that it is connected to. 
 The leader is not aware of the creation of such a session and there is no 
 state written to disk.
 5.   The pings and expiration are handled by the server that the session 
 is connected to.
 With the above changes, we can make ZooKeeper scale to a much larger number 
 of clients without making the core ensemble a bottleneck.
 In terms of API, there are two options being considered:
 1. Let the client specify at connect time which kind of session they 
 want.
 2. All sessions connect as local sessions and automatically get promoted to 
 global sessions when they do an operation that requires a global session 
 (e.g. creating an ephemeral node)
 Chubby took the approach of lazily promoting all sessions to global, but I 
 don't think that would work in our case, where we want to keep sessions which 
 never create ephemeral nodes as always local. Option 2 would make it more 
 broadly usable but option 1 would be easier to implement.
 We are thinking of implementing option 1 as the first cut. There would be a 
 client flag, IsLocalSession (much like the current readOnly flag) that would 
 be used to determine whether to create a local session or a global session.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (ZOOKEEPER-442) need a way to remove watches that are no longer of interest

2013-10-07 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788503#comment-13788503
 ] 

Mahadev konar commented on ZOOKEEPER-442:
-

[~eribeiro] if you are interested, feel free to take it up. I'd be happy to 
provide guidance/other help on this.

Thanks

 need a way to remove watches that are no longer of interest
 ---

 Key: ZOOKEEPER-442
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-442
 Project: ZooKeeper
  Issue Type: New Feature
Reporter: Benjamin Reed
Assignee: Daniel Gómez Ferro
Priority: Critical
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-442.patch, ZOOKEEPER-442.patch, 
 ZOOKEEPER-442.patch, ZOOKEEPER-442.patch, ZOOKEEPER-442.patch, 
 ZOOKEEPER-442.patch, ZOOKEEPER-442.patch


 Currently the only way a watch is cleared is to trigger it. We need a way to 
 enumerate the outstanding watch objects, find the watch events the objects are 
 watching for, and remove interest in an event.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (ZOOKEEPER-1696) Fail to run zookeeper client on Weblogic application server

2013-09-17 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1696:
-

Fix Version/s: 3.4.6

 Fail to run zookeeper client on Weblogic application server
 ---

 Key: ZOOKEEPER-1696
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1696
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.5
 Environment: Java version: jdk170_06
 WebLogic Server Version: 10.3.6.0 
Reporter: Dmitry Konstantinov
Assignee: Jeffrey Zhong
Priority: Critical
 Fix For: 3.4.6

 Attachments: zookeeper-1696.patch


 The problem is described in detail here: 
 http://comments.gmane.org/gmane.comp.java.zookeeper.user/2897
 The provided link also contains a reference to a fix implementation.
 {noformat}
 Apr 24, 2013 1:03:28 PM MSK Warning org.apache.zookeeper.ClientCnxn 
 devapp090 clust2 [ACTIVE] ExecuteThread: '2' for queue: 
 'weblogic.kernel.Default (devapp090:2182) internal   1366794208810 
 BEA-00 WARN  org.apache.zookeeper.ClientCnxn - Session 0x0 for server 
 null, unexpected error, closing socket connection and attempting reconnect
 java.lang.IllegalArgumentException: No Configuration was registered that can 
 handle the configuration named Client
 at 
 com.bea.common.security.jdkutils.JAASConfiguration.getAppConfigurationEntry(JAASConfiguration.java:130)
 at 
 org.apache.zookeeper.client.ZooKeeperSaslClient.init(ZooKeeperSaslClient.java:97)
 at 
 org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:943)
 at 
 org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:993)
 
 {noformat}
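One plausible shape for the fix (an assumption about the eventual patch, not a quote from it): treat a container JAAS Configuration that throws for an unknown section name the same as "no SASL configuration present", instead of letting the client's send thread die on the exception.

```java
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

public class SaslConfigCheck {
    // Returns true only if a JAAS section with the given name is available.
    // WebLogic's JAASConfiguration throws IllegalArgumentException for unknown
    // names, where the JDK default simply returns null (or throws
    // SecurityException when no login config exists); all of these are mapped
    // to "no SASL config".
    static boolean hasSaslSection(String name) {
        try {
            AppConfigurationEntry[] entries =
                    Configuration.getConfiguration().getAppConfigurationEntry(name);
            return entries != null;
        } catch (RuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("SASL section present: " + hasSaslSection("Client"));
    }
}
```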

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes

2013-09-17 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1733:
-

Fix Version/s: 3.4.6

 FLETest#testLE is flaky on windows boxes
 

 Key: ZOOKEEPER-1733
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.4.5
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 3.4.6

 Attachments: zookeeper-1733.patch


 FLETest#testLE fails intermittently on windows boxes. The reason is that in 
 LEThread#run() we have:
 {code}
 if(leader == i){
 synchronized(finalObj){
 successCount++;
 if(successCount > (count/2)) 
 finalObj.notify();
 }
 break;
 }
 {code}
 Basically, once we have a confirmed leader, the leader thread dies due to 
 breaking out of the while loop. 
 In the verification step, we check whether the leader thread is alive as 
 follows:
 {code}
if(threads.get((int) leader).isAlive()){
Assert.fail("Leader hasn't joined: " + leader);
}
 {code}
 On windows boxes, the above verification step fails frequently because the 
 leader thread has most likely already exited.
 Do we know why we have the leader alive verification step, when only the lead 
 thread can bump successCount up to >= count/2?
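Since the leader thread legitimately exits after the break, the verification could wait for it instead of asserting liveness. A sketch of that idea (not the actual FLETest patch):

```java
public class JoinInsteadOfIsAlive {
    // Instead of failing when isAlive() is true (which races with the
    // thread's natural exit), join with a timeout and treat a thread that is
    // still alive afterwards as the failure.
    static boolean leaderFinished(Thread leader, long timeoutMs) {
        try {
            leader.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !leader.isAlive();
    }

    public static void main(String[] args) {
        // Stands in for the leader thread that breaks out of its loop and returns.
        Thread leader = new Thread(() -> { });
        leader.start();
        System.out.println("leader finished = " + leaderFinished(leader, 5000));
    }
}
```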

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes

2013-09-17 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1733:
-

Fix Version/s: (was: 3.4.6)
   3.5.0

 FLETest#testLE is flaky on windows boxes
 

 Key: ZOOKEEPER-1733
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.4.5
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 3.5.0

 Attachments: zookeeper-1733.patch


 FLETest#testLE fails intermittently on windows boxes. The reason is that in 
 LEThread#run() we have:
 {code}
 if(leader == i){
 synchronized(finalObj){
 successCount++;
 if(successCount > (count/2)) 
 finalObj.notify();
 }
 break;
 }
 {code}
 Basically, once we have a confirmed leader, the leader thread dies due to 
 breaking out of the while loop. 
 In the verification step, we check whether the leader thread is alive as 
 follows:
 {code}
if(threads.get((int) leader).isAlive()){
Assert.fail("Leader hasn't joined: " + leader);
}
 {code}
 On windows boxes, the above verification step fails frequently because the 
 leader thread has most likely already exited.
 Do we know why we have the leader alive verification step, when only the lead 
 thread can bump successCount up to >= count/2?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes

2013-09-17 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770309#comment-13770309
 ] 

Mahadev konar commented on ZOOKEEPER-1733:
--

Running this through jenkins.

 FLETest#testLE is flaky on windows boxes
 

 Key: ZOOKEEPER-1733
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.4.5
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 3.5.0

 Attachments: zookeeper-1733.patch


 FLETest#testLE fails intermittently on windows boxes. The reason is that in 
 LEThread#run() we have:
 {code}
 if(leader == i){
 synchronized(finalObj){
 successCount++;
 if(successCount > (count/2)) 
 finalObj.notify();
 }
 break;
 }
 {code}
 Basically, once we have a confirmed leader, the leader thread dies due to 
 breaking out of the while loop. 
 In the verification step, we check whether the leader thread is alive as 
 follows:
 {code}
if(threads.get((int) leader).isAlive()){
Assert.fail("Leader hasn't joined: " + leader);
}
 {code}
 On windows boxes, the above verification step fails frequently because the 
 leader thread has most likely already exited.
 Do we know why we have the leader alive verification step, when only the lead 
 thread can bump successCount up to >= count/2?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (ZOOKEEPER-1751) ClientCnxn#run could miss the second ping or connection get dropped before a ping

2013-09-17 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1751:
-

Fix Version/s: 3.4.6

 ClientCnxn#run could miss the second ping or connection get dropped before a 
 ping
 -

 Key: ZOOKEEPER-1751
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1751
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.4.5
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 3.4.6

 Attachments: zookeeper-1751.patch


 We could throw a SessionTimeoutException even when timeToNextPing is negative, 
 depending on when the following line is executed by the thread, because we 
 check the timeout before sending a ping.
 {code}
 to = readTimeout - clientCnxnSocket.getIdleRecv();
 {code}
 In addition, we only ping twice no matter how long the session timeout is. For 
 example, if we set the session timeout to 60 minutes, we only try to ping 
 twice in the 40-minute window. Therefore, the connection could be dropped by 
 the OS after its idle timeout.
 The issue causes random connection-loss or session-expired events on the 
 client side, which is bad for applications like HBase.
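 One way to address the second problem is to cap the ping interval so it no 
 longer scales with the session timeout. A minimal sketch, assuming a 10-second 
 cap (the constant and method names are illustrative, not ZooKeeper's actual 
 code):

```java
public class PingSchedule {
    // Assumed cap: never wait longer than this between pings, so very long
    // session timeouts still produce frequent pings instead of only two per
    // timeout window.
    static final int MAX_PING_INTERVAL_MS = 10_000;

    // Time until the next ping, given the read timeout and how long the
    // socket has been idle on the send side.
    public static int timeToNextPing(int readTimeoutMs, int idleSendMs) {
        int interval = Math.min(readTimeoutMs / 2, MAX_PING_INTERVAL_MS);
        return Math.max(0, interval - idleSendMs);
    }

    public static void main(String[] args) {
        // 40-minute read timeout: without the cap we would ping only every
        // 20 minutes; with it, at most every 10 seconds.
        System.out.println(timeToNextPing(40 * 60 * 1000, 0)); // prints 10000
    }
}
```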



[jira] [Commented] (ZOOKEEPER-1696) Fail to run zookeeper client on Weblogic application server

2013-09-17 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770315#comment-13770315
 ] 

Mahadev konar commented on ZOOKEEPER-1696:
--

+1 for the patch. Given that it ran through Jenkins, committing this to 3.4 and trunk.

 Fail to run zookeeper client on Weblogic application server
 ---

 Key: ZOOKEEPER-1696
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1696
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.5
 Environment: Java version: jdk170_06
 WebLogic Server Version: 10.3.6.0 
Reporter: Dmitry Konstantinov
Assignee: Jeffrey Zhong
Priority: Critical
 Fix For: 3.4.6

 Attachments: zookeeper-1696.patch


 The problem is described in detail here: 
 http://comments.gmane.org/gmane.comp.java.zookeeper.user/2897
 The provided link also contains a reference to fix implementation.
 {noformat}
 Apr 24, 2013 1:03:28 PM MSK Warning org.apache.zookeeper.ClientCnxn 
 devapp090 clust2 [ACTIVE] ExecuteThread: '2' for queue: 
 'weblogic.kernel.Default (devapp090:2182) internal   1366794208810 
 BEA-00 WARN  org.apache.zookeeper.ClientCnxn - Session 0x0 for server 
 null, unexpected error, closing socket connection and attempting reconnect
 java.lang.IllegalArgumentException: No Configuration was registered that can 
 handle the configuration named Client
 at 
 com.bea.common.security.jdkutils.JAASConfiguration.getAppConfigurationEntry(JAASConfiguration.java:130)
 at 
 org.apache.zookeeper.client.ZooKeeperSaslClient.init(ZooKeeperSaslClient.java:97)
 at 
 org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:943)
 at 
 org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:993)
 
 {noformat}
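 The IllegalArgumentException above comes from a container JAAS Configuration 
 that throws, rather than returning null, when the named login section is 
 absent. A hedged sketch of the defensive pattern (the class and method names 
 here are hypothetical, not the actual patch):

```java
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

public class SaslConfigProbe {
    // Sketch: some containers' JAAS Configuration implementations (e.g.
    // WebLogic's JAASConfiguration in the stack trace above) throw an
    // IllegalArgumentException instead of returning null when the named
    // login section ("Client" here) is absent. Treat both cases as
    // "no SASL configured" instead of failing the connection.
    public static boolean hasLoginSection(String name) {
        try {
            AppConfigurationEntry[] entries =
                    Configuration.getConfiguration().getAppConfigurationEntry(name);
            return entries != null;
        } catch (RuntimeException e) {
            // Missing section reported by exception rather than null.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasLoginSection("Client"));
    }
}
```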



[jira] [Commented] (ZOOKEEPER-1657) Increased CPU usage by unnecessary SASL checks

2013-09-08 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761595#comment-13761595
 ] 

Mahadev konar commented on ZOOKEEPER-1657:
--

+1 for the patch. Looks good. Thanks Eugene/Flavio.

 Increased CPU usage by unnecessary SASL checks
 --

 Key: ZOOKEEPER-1657
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1657
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.5
Reporter: Gunnar Wagenknecht
Assignee: Philip K. Warren
  Labels: performance
 Fix For: 3.5.0, 3.4.6

 Attachments: ZOOKEEPER-1657.patch, ZOOKEEPER-1657.patch, 
 ZOOKEEPER-1657.patch, ZOOKEEPER-1657.patch, ZOOKEEPER-1657.patch, 
 zookeeper-hotspot-gone.png, zookeeper-hotspot.png


 I did some profiling in one of our Java environments and found an interesting 
 footprint in ZooKeeper. The SASL support seems to trigger a lot of times on 
 the client although it's not even in use.
 Is there a switch to disable SASL completely?
 The attached screenshot shows a 10-minute profiling session on one of our 
 production Jetty servers. The Jetty server handles ~1k web requests per 
 minute. The average response time per web request is a few milliseconds. The 
 profiling was performed on a machine running for 24h. 
 We noticed a significant CPU increase on our servers when deploying an update 
 from ZooKeeper 3.3.2 to ZooKeeper 3.4.5. Thus, we started investigating. The 
 screenshot shows that only 32% of CPU time is spent in Jetty; in contrast, 
 65% is spent in ZooKeeper. 
 A few notes/thoughts:
 * {{ClientCnxn$SendThread.clientTunneledAuthenticationInProgress}} seems to 
 be the culprit
 * {{javax.security.auth.login.Configuration.getConfiguration}} seems to be 
 called very often
 * There is quite a bit of reflection involved in 
 {{java.security.AccessController.doPrivileged}}
 * No security manager is active in the JVM: I tend to place an if-check in 
 the code before calling {{AccessController.doPrivileged}}. When no SM is 
 installed, the runnable can be called directly, which saves cycles.
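 The last bullet can be sketched as a small helper; this is the reporter's 
 suggested optimization, not the committed patch, and the class and method 
 names are illustrative:

```java
import java.security.AccessController;
import java.security.PrivilegedAction;

public class PrivilegedShortcut {
    // Sketch of the suggested optimization: skip the doPrivileged machinery
    // (and its reflection overhead) when no SecurityManager is installed,
    // since the action can then be run directly.
    public static <T> T doPrivilegedIfNeeded(PrivilegedAction<T> action) {
        if (System.getSecurityManager() == null) {
            return action.run(); // fast path: no security checks needed
        }
        return AccessController.doPrivileged(action);
    }

    public static void main(String[] args) {
        String version = doPrivilegedIfNeeded(() -> System.getProperty("java.version"));
        System.out.println(version != null); // prints true
    }
}
```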



[jira] [Commented] (ZOOKEEPER-767) Submitting Demo/Recipe Shared / Exclusive Lock Code

2013-05-15 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658530#comment-13658530
 ] 

Mahadev konar commented on ZOOKEEPER-767:
-

Flavio,
 Agreed, I think it's definitely a better match for Curator. 

 Submitting Demo/Recipe Shared / Exclusive Lock Code
 ---

 Key: ZOOKEEPER-767
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-767
 Project: ZooKeeper
  Issue Type: Improvement
  Components: recipes
Affects Versions: 3.3.0
Reporter: Sam Baskinger
Assignee: Sam Baskinger
Priority: Minor
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-767.patch, ZOOKEEPER-767.patch, 
 ZOOKEEPER-767.patch, ZOOKEEPER-767.patch, ZOOKEEPER-767.patch, 
 ZOOKEEPER-767.patch

  Time Spent: 8h

 Networked Insights would like to share-back some code for shared/exclusive 
 locking that we are using in our labs.



[jira] [Updated] (ZOOKEEPER-1686) Publish ZK 3.4.5 test jar

2013-04-07 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1686:
-

Assignee: Mahadev konar

 Publish ZK 3.4.5 test jar
 -

 Key: ZOOKEEPER-1686
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1686
 Project: ZooKeeper
  Issue Type: Bug
  Components: build, tests
Affects Versions: 3.4.5
Reporter: Todd Lipcon
Assignee: Mahadev konar

 ZooKeeper 3.4.2 used to publish a jar with the tests classifier for use by 
 downstream project tests. It seems this didn't get published for 3.4.4 or 
 3.4.5 (see 
 https://repository.apache.org/index.html#nexus-search;quick~org.apache.zookeeper).
  Would someone please publish these artifacts?



[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2013-03-03 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592041#comment-13592041
 ] 

Mahadev konar commented on ZOOKEEPER-1382:
--

Michael,
 Would you be able to upload a patch for trunk as well?

 Zookeeper server holds onto dead/expired session ids in the watch data 
 structures
 -

 Key: ZOOKEEPER-1382
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Neha Narkhede
Priority: Critical
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1382_3.3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch


 I've observed that zookeeper server holds onto expired session ids in the 
 watcher data structures. The result is the wchp command reports session ids 
 that cannot be found through cons/dump and those expired session ids sit 
 there maybe until the server is restarted. Here are snippets from the client 
 and the server logs that lead to this state, for one particular session id 
 0x134485fd7bcb26f -
 There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
 and I'm using ZkClient to connect to the cluster
 From the application log -
 application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application Session establishment complete 
 on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
 negotiated timeout = 6000
 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application] Client session timed out, 
 have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
 socket connection and attempting reconnect
 application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
 [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
 ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
 connection
 On the leader zk, 225 -
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
 timeout of 6000ms exceeded
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
 for sessionid: 0x134485fd7bcb26f
 On the server, the client was initially connected to, 223 -
 zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
 /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
 Here are the log snippets from 226, which is the server, the client 
 reconnected to, before getting session expired event -
 2012-01-27 09:52:38,190 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
 attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
 2012-01-27 09:52:38,191 - INFO  
 [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
 2012-01-27 09:52:38,191 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
 socket connection for client /172.17.136.82:49367 which had sessionid 
 0x134485fd7bcb26f
 wchp output from 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*wchp* | wc -l
 3
 wchp output from 223, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*wchp* | wc -l
 0
 cons output from 223 and 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*cons* | wc -l
 0
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*cons* | wc -l
 0
 So, what seems to have happened is that the client was able to re-register 
 the watches on the new server (226) after it got disconnected from 223, in 
 spite of having an expired session id. 
 In NIOServerCnxn, I saw that after suspecting that a session is expired, a 
 server removes the cnxn and its watches from its internal data structures. 
 But before that, it allows more requests to be processed even if the session 
 is expired:
 {code}
 // Now that the session is ready we can start receiving packets
 synchronized (this.factory) {
     sk.selector().wakeup();
     enableRecv();
 }
 } catch 
 {code}

[jira] [Commented] (ZOOKEEPER-1551) Observer ignore txns that comes after snapshot and UPTODATE

2013-03-03 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592043#comment-13592043
 ] 

Mahadev konar commented on ZOOKEEPER-1551:
--

[~fpj] would you be able to review the latest patch?

 Observer ignore txns that comes after snapshot and UPTODATE 
 

 Key: ZOOKEEPER-1551
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1551
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum, server
Affects Versions: 3.4.3
Reporter: Thawan Kooburat
Assignee: Thawan Kooburat
Priority: Blocker
 Fix For: 3.5.0, 3.4.6

 Attachments: ZOOKEEPER-1551.patch, ZOOKEEPER-1551.patch


 In Learner.java, txns that arrive after the learner has taken its snapshot 
 (after the NEWLEADER packet) are stored in packetsNotCommitted. The follower 
 has special logic to apply these txns at the end of the syncWithLeader() 
 method. However, the observer ignores these txns completely, causing data 
 inconsistency. 
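 The shape of the fix can be sketched as draining the queued txns at the end of 
 sync for every learner type. This is a hypothetical model, not the actual 
 Learner code; the type and method names are illustrative:

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class SyncSketch {
    // Hypothetical model of the fix: txns received after the snapshot
    // (NEWLEADER) are queued, and applied at the end of sync regardless of
    // learner type. Before the fix, an observer skipped this drain and
    // silently dropped the queued txns, diverging from the leader.
    public interface Txn { void apply(); }

    public static void finishSync(Queue<Txn> packetsNotCommitted) {
        while (!packetsNotCommitted.isEmpty()) {
            packetsNotCommitted.poll().apply();
        }
    }

    public static void main(String[] args) {
        Queue<Txn> queued = new ArrayDeque<>();
        int[] applied = {0};
        queued.add(() -> applied[0]++);
        queued.add(() -> applied[0]++);
        finishSync(queued); // drain for follower AND observer alike
        System.out.println(applied[0]); // prints 2
    }
}
```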



[jira] [Updated] (ZOOKEEPER-1657) Increased CPU usage by unnecessary SASL checks

2013-03-03 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1657:
-

Fix Version/s: 3.4.6
   3.5.0

 Increased CPU usage by unnecessary SASL checks
 --

 Key: ZOOKEEPER-1657
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1657
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.5
Reporter: Gunnar Wagenknecht
  Labels: performance
 Fix For: 3.5.0, 3.4.6

 Attachments: ZOOKEEPER-1657.patch, ZOOKEEPER-1657.patch, 
 ZOOKEEPER-1657.patch, zookeeper-hotspot.png


 I did some profiling in one of our Java environments and found an interesting 
 footprint in ZooKeeper. The SASL support seems to trigger a lot of times on 
 the client although it's not even in use.
 Is there a switch to disable SASL completely?
 The attached screenshot shows a 10-minute profiling session on one of our 
 production Jetty servers. The Jetty server handles ~1k web requests per 
 minute. The average response time per web request is a few milliseconds. The 
 profiling was performed on a machine running for 24h. 
 We noticed a significant CPU increase on our servers when deploying an update 
 from ZooKeeper 3.3.2 to ZooKeeper 3.4.5. Thus, we started investigating. The 
 screenshot shows that only 32% of CPU time is spent in Jetty; in contrast, 
 65% is spent in ZooKeeper. 
 A few notes/thoughts:
 * {{ClientCnxn$SendThread.clientTunneledAuthenticationInProgress}} seems to 
 be the culprit
 * {{javax.security.auth.login.Configuration.getConfiguration}} seems to be 
 called very often
 * There is quite a bit of reflection involved in 
 {{java.security.AccessController.doPrivileged}}
 * No security manager is active in the JVM: I tend to place an if-check in 
 the code before calling {{AccessController.doPrivileged}}. When no SM is 
 installed, the runnable can be called directly, which saves cycles.



[jira] [Commented] (ZOOKEEPER-1147) Add support for local sessions

2013-01-17 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556316#comment-13556316
 ] 

Mahadev konar commented on ZOOKEEPER-1147:
--

bq. Yes, a session retains the same ID when it is upgraded from local session 
to global session. I think this is desirable. Can you elaborate why this may 
cause problem?

Yes, it's desirable. Before I comment on what I think might be wrong, when does 
the server that has the local session id remove it from its data structures? Is 
it when it gets a response from the final request processor about the session 
creation? Until then, is the session a local session? 


 Add support for local sessions
 --

 Key: ZOOKEEPER-1147
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1147
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.3.3
Reporter: Vishal Kathuria
Assignee: Thawan Kooburat
  Labels: api-change, scaling
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1147.patch

   Original Estimate: 840h
  Remaining Estimate: 840h

 This improvement is in the bucket of making ZooKeeper work at a large scale. 
 We are planning on having about 1 million clients connect to a ZooKeeper 
 ensemble through a set of 50-100 observers. The majority of these clients are 
 read-only, i.e. they do not do any updates or create ephemeral nodes.
 In ZooKeeper today, the client creates a session and the session creation is 
 handled like any other update. In the above use case, the session create/drop 
 workload can easily overwhelm an ensemble. The following is a proposal for a 
 local session, to support a larger number of connections.
 1.   The idea is to introduce a new type of session - local session. A 
 local session doesn't have the full functionality of a normal session.
 2.   Local sessions cannot create ephemeral nodes.
 3.   Once a local session is lost, you cannot re-establish it using the 
 session-id/password. The session and its watches are gone for good.
 4.   When a local session connects, the session info is only maintained 
 on the zookeeper server (in this case, an observer) that it is connected to. 
 The leader is not aware of the creation of such a session and there is no 
 state written to disk.
 5.   The pings and expiration are handled by the server that the session 
 is connected to.
 With the above changes, we can make ZooKeeper scale to a much larger number 
 of clients without making the core ensemble a bottleneck.
 In terms of API, there are two options that are being considered
 1. Let the client specify at connect time which kind of session they 
 want.
 2. All sessions connect as local sessions and automatically get promoted to 
 global sessions when they do an operation that requires a global session 
 (e.g. creating an ephemeral node)
 Chubby took the approach of lazily promoting all sessions to global, but I 
 don't think that would work in our case, where we want to keep sessions which 
 never create ephemeral nodes as always local. Option 2 would make it more 
 broadly usable but option 1 would be easier to implement.
 We are thinking of implementing option 1 as the first cut. There would be a 
 client flag, IsLocalSession (much like the current readOnly flag) that would 
 be used to determine whether to create a local session or a global session.
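 Point 4 and 5 above can be sketched as purely server-local session state. This 
 is a hypothetical illustration of the proposal, not ZooKeeper code; all names 
 here are made up:

```java
import java.util.concurrent.ConcurrentHashMap;

public class LocalSessionTracker {
    // Hypothetical sketch of points 4 and 5: local sessions live only in the
    // memory of the server (observer) they connect to; nothing reaches the
    // leader or disk. Expiry is a local last-seen timestamp check, so a lost
    // local session cannot be re-established (point 3).
    private final ConcurrentHashMap<Long, Long> lastSeenMs = new ConcurrentHashMap<>();
    private final long timeoutMs;

    public LocalSessionTracker(long timeoutMs) { this.timeoutMs = timeoutMs; }

    // Called on connect and on every ping from the client.
    public void touch(long sessionId, long nowMs) { lastSeenMs.put(sessionId, nowMs); }

    public boolean isExpired(long sessionId, long nowMs) {
        Long seen = lastSeenMs.get(sessionId);
        return seen == null || nowMs - seen > timeoutMs;
    }

    public static void main(String[] args) {
        LocalSessionTracker tracker = new LocalSessionTracker(6000);
        tracker.touch(0x1L, 0);
        System.out.println(tracker.isExpired(0x1L, 5000)); // prints false
        System.out.println(tracker.isExpired(0x1L, 7000)); // prints true
    }
}
```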



[jira] [Comment Edited] (ZOOKEEPER-1147) Add support for local sessions

2013-01-17 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556316#comment-13556316
 ] 

Mahadev konar edited comment on ZOOKEEPER-1147 at 1/17/13 3:42 PM:
---

bq. Yes, a session retains the same ID when it is upgraded from local session 
to global session. I think this is desirable. Can you elaborate why this may 
cause problem?

Yes, it's desirable. Before I comment on what I think might be wrong, when does 
the server that has the local session id remove it from its data structures? Is 
it when it gets a create-session in the final request processor? Until then, is 
the session a local session? 


  was (Author: mahadev):
bq. Yes, a session retains the same ID when it is upgraded from local 
session to global session. I think this is desirable. Can you elaborate why 
this may cause problem?

Yes its desirable. Before I comment on what I think might be wrong, when does 
the server who has the local sessionid remove it from its data structures? Is 
it when it gets a response from in final request processor about the session 
creation? Until then the session is in a local session? 

  
 Add support for local sessions
 --

 Key: ZOOKEEPER-1147
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1147
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.3.3
Reporter: Vishal Kathuria
Assignee: Thawan Kooburat
  Labels: api-change, scaling
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1147.patch

   Original Estimate: 840h
  Remaining Estimate: 840h

 This improvement is in the bucket of making ZooKeeper work at a large scale. 
 We are planning on having about 1 million clients connect to a ZooKeeper 
 ensemble through a set of 50-100 observers. The majority of these clients are 
 read-only, i.e. they do not do any updates or create ephemeral nodes.
 In ZooKeeper today, the client creates a session and the session creation is 
 handled like any other update. In the above use case, the session create/drop 
 workload can easily overwhelm an ensemble. The following is a proposal for a 
 local session, to support a larger number of connections.
 1.   The idea is to introduce a new type of session - local session. A 
 local session doesn't have the full functionality of a normal session.
 2.   Local sessions cannot create ephemeral nodes.
 3.   Once a local session is lost, you cannot re-establish it using the 
 session-id/password. The session and its watches are gone for good.
 4.   When a local session connects, the session info is only maintained 
 on the zookeeper server (in this case, an observer) that it is connected to. 
 The leader is not aware of the creation of such a session and there is no 
 state written to disk.
 5.   The pings and expiration are handled by the server that the session 
 is connected to.
 With the above changes, we can make ZooKeeper scale to a much larger number 
 of clients without making the core ensemble a bottleneck.
 In terms of API, there are two options that are being considered
 1. Let the client specify at connect time which kind of session they 
 want.
 2. All sessions connect as local sessions and automatically get promoted to 
 global sessions when they do an operation that requires a global session 
 (e.g. creating an ephemeral node)
 Chubby took the approach of lazily promoting all sessions to global, but I 
 don't think that would work in our case, where we want to keep sessions which 
 never create ephemeral nodes as always local. Option 2 would make it more 
 broadly usable but option 1 would be easier to implement.
 We are thinking of implementing option 1 as the first cut. There would be a 
 client flag, IsLocalSession (much like the current readOnly flag) that would 
 be used to determine whether to create a local session or a global session.



[jira] [Commented] (ZOOKEEPER-1147) Add support for local sessions

2013-01-17 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557012#comment-13557012
 ] 

Mahadev konar commented on ZOOKEEPER-1147:
--

[~thawan] I think the above scenario is OK. The only issue I think we have is 
the sensitive local sessions. Since we have had too many issues with 
disconnects and session expiry, I think this might cause more issues than we 
already have. Is there something we can do here? I can't seem to find a way 
around it without doing client-side changes.


 Add support for local sessions
 --

 Key: ZOOKEEPER-1147
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1147
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.3.3
Reporter: Vishal Kathuria
Assignee: Thawan Kooburat
  Labels: api-change, scaling
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1147.patch

   Original Estimate: 840h
  Remaining Estimate: 840h

 This improvement is in the bucket of making ZooKeeper work at a large scale. 
 We are planning on having about 1 million clients connect to a ZooKeeper 
 ensemble through a set of 50-100 observers. The majority of these clients are 
 read-only, i.e. they do not do any updates or create ephemeral nodes.
 In ZooKeeper today, the client creates a session and the session creation is 
 handled like any other update. In the above use case, the session create/drop 
 workload can easily overwhelm an ensemble. The following is a proposal for a 
 local session, to support a larger number of connections.
 1.   The idea is to introduce a new type of session - local session. A 
 local session doesn't have the full functionality of a normal session.
 2.   Local sessions cannot create ephemeral nodes.
 3.   Once a local session is lost, you cannot re-establish it using the 
 session-id/password. The session and its watches are gone for good.
 4.   When a local session connects, the session info is only maintained 
 on the zookeeper server (in this case, an observer) that it is connected to. 
 The leader is not aware of the creation of such a session and there is no 
 state written to disk.
 5.   The pings and expiration are handled by the server that the session 
 is connected to.
 With the above changes, we can make ZooKeeper scale to a much larger number 
 of clients without making the core ensemble a bottleneck.
 In terms of API, there are two options that are being considered
 1. Let the client specify at connect time which kind of session they 
 want.
 2. All sessions connect as local sessions and automatically get promoted to 
 global sessions when they do an operation that requires a global session 
 (e.g. creating an ephemeral node)
 Chubby took the approach of lazily promoting all sessions to global, but I 
 don't think that would work in our case, where we want to keep sessions which 
 never create ephemeral nodes as always local. Option 2 would make it more 
 broadly usable but option 1 would be easier to implement.
 We are thinking of implementing option 1 as the first cut. There would be a 
 client flag, IsLocalSession (much like the current readOnly flag) that would 
 be used to determine whether to create a local session or a global session.



[jira] [Updated] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2013-01-17 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1621:
-

Assignee: Mahadev konar

 ZooKeeper does not recover from crash when disk was full
 

 Key: ZOOKEEPER-1621
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.3
 Environment: Ubuntu 12.04, Amazon EC2 instance
Reporter: David Arthur
Assignee: Mahadev konar
 Fix For: 3.5.0

 Attachments: zookeeper.log.gz


 The disk that ZooKeeper was using filled up. During a snapshot write, I got 
 the following exception
 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
 Severe unrecoverable error, exiting
 java.io.IOException: No space left on device
 at java.io.FileOutputStream.writeBytes(Native Method)
 at java.io.FileOutputStream.write(FileOutputStream.java:282)
 at 
 java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
 at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
 at 
 org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
 at 
 org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
 at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
 at 
 org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
 at 
 org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
 Then many subsequent exceptions like:
 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
 partial.
 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
 exception, exiting abnormally
 java.io.EOFException
 at java.io.DataInputStream.readInt(DataInputStream.java:375)
 at 
 org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
 at 
 org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
 at 
 org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
 at 
 org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
 at 
 org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
 at 
 org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
 at 
 org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
 at 
 org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:504)
 at 
 org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
 at 
 org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
 at 
 org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
 at 
 org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
 at 
 org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
 at 
 org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
 at 
 org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
 at 
 org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
 at 
 org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
 It seems to me that writing the transaction log should be fully atomic to 
 avoid such situations. Is this not the case?



[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2013-01-17 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557022#comment-13557022
 ] 

Mahadev konar commented on ZOOKEEPER-1621:
--

Looks like the header was incomplete. Unfortunately we do not handle a corrupt 
header, but we do handle corrupt txns later. I am surprised that this happened 
twice in a row for 2 users. I'll upload a patch and a test case.
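A tolerant header read can be sketched as follows. This is an illustration of 
the idea (treating a truncated header like an empty log rather than a fatal 
EOFException at startup), not the actual patch; the class name and the 
4+4+8-byte header layout are assumptions for the example:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class HeaderRead {
    // Sketch: a log header truncated mid-write (e.g. when the disk filled
    // up, as in the exception above) should be detected and the file
    // skipped, instead of letting EOFException abort server startup.
    // Assumed layout for illustration: int magic, int version, long dbId.
    public static boolean readHeader(byte[] fileBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(fileBytes))) {
            in.readInt();  // magic   (throws EOFException if truncated)
            in.readInt();  // version
            in.readLong(); // dbId
            return true;   // header complete, safe to replay this log
        } catch (IOException e) {
            return false;  // partial header: treat the log file as empty
        }
    }

    public static void main(String[] args) {
        System.out.println(readHeader(new byte[4]));  // truncated -> prints false
        System.out.println(readHeader(new byte[16])); // complete  -> prints true
    }
}
```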

 ZooKeeper does not recover from crash when disk was full
 

 Key: ZOOKEEPER-1621
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.3
 Environment: Ubuntu 12.04, Amazon EC2 instance
Reporter: David Arthur
 Fix For: 3.5.0

 Attachments: zookeeper.log.gz


 The disk that ZooKeeper was using filled up. During a snapshot write, I got 
 the following exception
 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
 Severe unrecoverable error, exiting
 java.io.IOException: No space left on device
 at java.io.FileOutputStream.writeBytes(Native Method)
 at java.io.FileOutputStream.write(FileOutputStream.java:282)
 at 
 java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
 at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
 at 
 org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
 at 
 org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
 at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
 at 
 org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
 at 
 org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
 Then many subsequent exceptions like:
 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
 partial.
 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
 exception, exiting abnormally
 java.io.EOFException
 at java.io.DataInputStream.readInt(DataInputStream.java:375)
 at 
 org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
 at 
 org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
 at 
 org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
 at 
 org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
 at 
 org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
 at 
 org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
 at 
 org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
 at 
 org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:504)
 at 
 org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
 at 
 org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
 at 
 org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
 at 
 org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
 at 
 org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
 at 
 org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
 at 
 org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
 at 
 org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
 at 
 org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
 It seems to me that writing the transaction log should be fully atomic to 
 avoid such situations. Is this not the case?



[jira] [Updated] (ZOOKEEPER-1624) PrepRequestProcessor abort multi-operation incorrectly

2013-01-17 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1624:
-

Fix Version/s: 3.5.0

 PrepRequestProcessor abort multi-operation incorrectly
 --

 Key: ZOOKEEPER-1624
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1624
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Reporter: Thawan Kooburat
Assignee: Thawan Kooburat
Priority: Critical
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1624.patch


 We found this issue when trying to issue multiple instances of the following 
 multi-op concurrently
 multi {
 1. create sequential node /a- 
 2. create node /b
 }
 The expected result is that only the first multi-op request should succeed 
 and the rest should fail because /b already exists.
 However, the observed result is that the subsequent multi-ops failed because 
 sequential node creation failed, which should not be possible.
 Below is the return code for each sub-op when issuing 3 instances of the 
 above multi-op asynchronously
 1. ZOK, ZOK
 2. ZOK, ZNODEEXISTS,
 3. ZNODEEXISTS, ZRUNTIMEINCONSISTENCY,
 After adding more debug logging, I found the cause: PrepRequestProcessor rolls 
 back the outstandingChanges of the second multi-op incorrectly, causing 
 sequential node name generation to be wrong. Below are the sequential node 
 names generated by PrepRequestProcessor
 1. create /a-0001
 2. create /a-0003
 3. create /a-0001
 The bug is in the getPendingChanges() method: it failed to copy the 
 ChangeRecord for the parent node (/), so rollbackPendingChanges() cannot 
 restore the right previous change record of the parent node when aborting the 
 second multi-op.
 The impact of this bug is that sequential node creation on the same parent 
 node may fail until the previous one is committed. I am not sure if there are 
 other implications or not.



[jira] [Updated] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2013-01-16 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1621:
-

Fix Version/s: 3.4.6

 ZooKeeper does not recover from crash when disk was full
 

 Key: ZOOKEEPER-1621
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.3
 Environment: Ubuntu 12.04, Amazon EC2 instance
Reporter: David Arthur
 Fix For: 3.4.6


 The disk that ZooKeeper was using filled up. During a snapshot write, I got 
 the following exception
 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
 Severe unrecoverable error, exiting
 java.io.IOException: No space left on device
 Then many subsequent exceptions like:
 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
 partial.
 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
 exception, exiting abnormally
 java.io.EOFException
 It seems to me that writing the transaction log should be fully atomic to 
 avoid such situations. Is this not the case?



[jira] [Updated] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2013-01-16 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1621:
-

Priority: Major  (was: Critical)

 ZooKeeper does not recover from crash when disk was full
 

 Key: ZOOKEEPER-1621
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.3
 Environment: Ubuntu 12.04, Amazon EC2 instance
Reporter: David Arthur
 Fix For: 3.4.6


 The disk that ZooKeeper was using filled up. During a snapshot write, I got 
 the following exception
 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
 Severe unrecoverable error, exiting
 java.io.IOException: No space left on device
 Then many subsequent exceptions like:
 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
 partial.
 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
 exception, exiting abnormally
 java.io.EOFException
 It seems to me that writing the transaction log should be fully atomic to 
 avoid such situations. Is this not the case?



[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2013-01-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13555169#comment-13555169
 ] 

Mahadev konar commented on ZOOKEEPER-1621:
--

David,
 So these exceptions are thrown while ZooKeeper is running? I'm not sure why it's 
exiting so many times. Do you guys restart the ZK server if it dies?

 ZooKeeper does not recover from crash when disk was full
 

 Key: ZOOKEEPER-1621
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.3
 Environment: Ubuntu 12.04, Amazon EC2 instance
Reporter: David Arthur
 Fix For: 3.5.0


 The disk that ZooKeeper was using filled up. During a snapshot write, I got 
 the following exception
 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
 Severe unrecoverable error, exiting
 java.io.IOException: No space left on device
 Then many subsequent exceptions like:
 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
 partial.
 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
 exception, exiting abnormally
 java.io.EOFException
 It seems to me that writing the transaction log should be fully atomic to 
 avoid such situations. Is this not the case?



[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2013-01-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13555192#comment-13555192
 ] 

Mahadev konar commented on ZOOKEEPER-1621:
--

David,
 I thought you said it does not recover when the disk was full, but it looks like 
the disk is still full? No?

 ZooKeeper does not recover from crash when disk was full
 

 Key: ZOOKEEPER-1621
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.3
 Environment: Ubuntu 12.04, Amazon EC2 instance
Reporter: David Arthur
 Fix For: 3.5.0


 The disk that ZooKeeper was using filled up. During a snapshot write, I got 
 the following exception
 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
 Severe unrecoverable error, exiting
 java.io.IOException: No space left on device
 Then many subsequent exceptions like:
 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
 partial.
 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
 exception, exiting abnormally
 java.io.EOFException
 It seems to me that writing the transaction log should be fully atomic to 
 avoid such situations. Is this not the case?



[jira] [Resolved] (ZOOKEEPER-1612) Zookeeper unable to recover and start once datadir disk is full and disk space cleared

2013-01-16 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar resolved ZOOKEEPER-1612.
--

Resolution: Duplicate

Duplicate of ZOOKEEPER-1621.

 Zookeeper unable to recover and start once datadir disk is full and disk 
 space cleared
 --

 Key: ZOOKEEPER-1612
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1612
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.4.3
Reporter: suja s

 Once zookeeper data dir disk becomes full, the process gets shut down.
 {noformat}
 2012-12-14 13:22:26,959 [myid:2] - ERROR 
 [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@276] - Severe 
 unrecoverable error, exiting
 java.io.IOException: No space left on device
   at java.io.FileOutputStream.writeBytes(Native Method)
   at java.io.FileOutputStream.write(FileOutputStream.java:282)
   at 
 java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
   at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
   at java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:56)
   at java.io.DataOutputStream.write(DataOutputStream.java:90)
   at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
   at 
 org.apache.jute.BinaryOutputArchive.writeBuffer(BinaryOutputArchive.java:119)
   at org.apache.zookeeper.server.DataNode.serialize(DataNode.java:168)
   at 
 org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:123)
   at 
 org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:1115)
   at 
 org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:1130)
   at 
 org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:1130)
   at org.apache.zookeeper.server.DataTree.serialize(DataTree.java:1179)
   at 
 org.apache.zookeeper.server.util.SerializeUtils.serializeSnapshot(SerializeUtils.java:138)
   at 
 org.apache.zookeeper.server.persistence.FileSnap.serialize(FileSnap.java:213)
   at 
 org.apache.zookeeper.server.persistence.FileSnap.serialize(FileSnap.java:230)
   at 
 org.apache.zookeeper.server.persistence.FileTxnSnapLog.save(FileTxnSnapLog.java:242)
   at 
 org.apache.zookeeper.server.ZooKeeperServer.takeSnapshot(ZooKeeperServer.java:274)
   at 
 org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:407)
   at 
 org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:82)
   at 
 org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:759)
 {noformat}
 Later the disk space was cleared and ZK was started again. Startup of ZK fails 
 as it is not able to read the snapshot properly. (Since the load from disk 
 failed, it is not able to join the peers in the quorum and get a snapshot diff.)
 {noformat}
 2012-12-14 16:20:31,489 [myid:2] - INFO  [main:FileSnap@83] - Reading 
 snapshot ../dataDir/version-2/snapshot.100042
 2012-12-14 16:20:31,564 [myid:2] - ERROR [main:QuorumPeer@472] - Unable to 
 load database on disk
 java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at 
 org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
   at 
 org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
   at 
 org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
   at 
 org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
   at 
 org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
   at 
 org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
   at 
 org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
   at 
 org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:504)
   at 
 org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
   at 
 org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:132)
   at 
 org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
   at 
 org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:436)
   at 
 org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:428)
   at 
 org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:152)
   at 
 org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
   at 
 org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
 2012-12-14 16:20:31,566 [myid:2] - ERROR [main:QuorumPeerMain@89] - 
 

[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full

2013-01-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13555318#comment-13555318
 ] 

Mahadev konar commented on ZOOKEEPER-1621:
--

I'll mark 1612 as a dup. Thanks for pointing that out, Edward.



 ZooKeeper does not recover from crash when disk was full
 

 Key: ZOOKEEPER-1621
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.3
 Environment: Ubuntu 12.04, Amazon EC2 instance
Reporter: David Arthur
 Fix For: 3.5.0

 Attachments: zookeeper.log.gz


 The disk that ZooKeeper was using filled up. During a snapshot write, I got 
 the following exception
 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - 
 Severe unrecoverable error, exiting
 java.io.IOException: No space left on device
 Then many subsequent exceptions like:
 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was 
 partial.
 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected 
 exception, exiting abnormally
 java.io.EOFException
 It seems to me that writing the transaction log should be fully atomic to 
 avoid such situations. Is this not the case?



[jira] [Commented] (ZOOKEEPER-1622) session ids will be negative in the year 2022

2013-01-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13555698#comment-13555698
 ] 

Mahadev konar commented on ZOOKEEPER-1622:
--

Nice catch, Eric! I think we do document that id should be between 0 and 255, 
but maybe we should error out if that is not the case.


 session ids will be negative in the year 2022
 -

 Key: ZOOKEEPER-1622
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1622
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Eric Newton
Priority: Trivial

 Someone decided to use a large number for their myid file.  This caused 
 session ids to go negative, and our software (Apache Accumulo) did not handle 
 this very well.  While diagnosing the problem, I noticed this in SessionImpl:
 {noformat}
 public static long initializeNextSession(long id) {
     long nextSid = 0;
     nextSid = (System.currentTimeMillis() << 24) >> 8;
     nextSid = nextSid | (id << 56);
     return nextSid;
 }
 {noformat}
 When the 40th bit in System.currentTimeMillis() is a one, sign extension will 
 fill the upper 8 bits of nextSid, and id will not make the session id 
 unique.  I recommend changing the arithmetic right shift to the logical right shift:
 {noformat}
 public static long initializeNextSession(long id) {
     long nextSid = 0;
     nextSid = (System.currentTimeMillis() << 24) >>> 8;
     nextSid = nextSid | (id << 56);
     return nextSid;
 }
 {noformat}
 But, we have until the year 2022 before we have to worry about it.
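The sign-extension effect is easy to demonstrate in isolation. The sketch below (class and method names are mine, not ZooKeeper's; the two method bodies mirror the quoted snippet) contrasts the current arithmetic shift with the proposed logical shift:

```java
/**
 * Toy demonstration of the sign-extension bug described above.
 * Class and method names are hypothetical, not ZooKeeper's.
 */
public class SessionIdShiftDemo {
    /** Current code: arithmetic >> sign-extends once bit 39 of millis is one. */
    static long arithmeticShift(long millis, long id) {
        long nextSid = (millis << 24) >> 8;
        return nextSid | (id << 56);
    }

    /** Proposed fix: logical >>> keeps the top byte clear for the server id. */
    static long logicalShift(long millis, long id) {
        long nextSid = (millis << 24) >>> 8;
        return nextSid | (id << 56);
    }

    public static void main(String[] args) {
        long millis = 1L << 39;  // a timestamp whose 40th bit is one (~year 2022)
        System.out.println(arithmeticShift(millis, 1) < 0);  // true: sign bits swamp the id
        System.out.println(logicalShift(millis, 1) < 0);     // false: the id survives in the top byte
    }
}
```

With the arithmetic shift, the copied sign bits overwrite the byte reserved for the server id, so distinct ids no longer guarantee distinct session ids.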



[jira] [Commented] (ZOOKEEPER-1147) Add support for local sessions

2013-01-15 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554819#comment-13554819
 ] 

Mahadev konar commented on ZOOKEEPER-1147:
--

[~thawan] this helps. Thanks for the information. I still have a couple more 
questions:

- Will a read-only client always get a session expiration on a disconnect, 
even though it has not tried all the other servers? 
- Is the local session id the same as the global session id when it is created (I 
mean as the long value)? If it is the same, I think we have a problem with the 
shifting of clients between servers.

bq. When a client reconnects to B, its sessionId won’t exist in B’s local 
session tracker, so B will send a validation packet. If the CreateSession issued 
by A is committed before the validation packet arrives, the client will be able 
to connect. Otherwise, the client will get session expired because the quorum 
doesn’t know about this session yet. If the client also tries to connect back to 
A again, the session is already removed from the local session tracker, so A will 
need to send a validation packet to the leader. The outcome should be the same 
as with B, depending on the timing of the request.

 Add support for local sessions
 --

 Key: ZOOKEEPER-1147
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1147
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.3.3
Reporter: Vishal Kathuria
Assignee: Thawan Kooburat
  Labels: api-change, scaling
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1147.patch

   Original Estimate: 840h
  Remaining Estimate: 840h

 This improvement is in the bucket of making ZooKeeper work at large scale. 
 We are planning on having about 1 million clients connect to a ZooKeeper 
 ensemble through a set of 50-100 observers. The majority of these clients are 
 read-only, i.e. they do not do any updates or create ephemeral nodes.
 In ZooKeeper today, the client creates a session and the session creation is 
 handled like any other update. In the above use case, the session create/drop 
 workload can easily overwhelm an ensemble. The following is a proposal for a 
 local session, to support a larger number of connections.
 1.   The idea is to introduce a new type of session - local session. A 
 local session doesn't have a full functionality of a normal session.
 2.   Local sessions cannot create ephemeral nodes.
 3.   Once a local session is lost, you cannot re-establish it using the 
 session-id/password. The session and its watches are gone for good.
 4.   When a local session connects, the session info is only maintained 
 on the zookeeper server (in this case, an observer) that it is connected to. 
 The leader is not aware of the creation of such a session and there is no 
 state written to disk.
 5.   The pings and expiration are handled by the server that the session 
 is connected to.
 With the above changes, we can make ZooKeeper scale to a much larger number 
 of clients without making the core ensemble a bottleneck.
 In terms of API, there are two options that are being considered
 1. Let the client specify at the connect time which kind of session do they 
 want.
 2. All sessions connect as local sessions and automatically get promoted to 
 global sessions when they do an operation that requires a global session 
 (e.g. creating an ephemeral node)
 Chubby took the approach of lazily promoting all sessions to global, but I 
 don't think that would work in our case, where we want to keep sessions which 
 never create ephemeral nodes as always local. Option 2 would make it more 
 broadly usable but option 1 would be easier to implement.
 We are thinking of implementing option 1 as the first cut. There would be a 
 client flag, IsLocalSession (much like the current readOnly flag) that would 
 be used to determine whether to create a local session or a global session.
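As a rough illustration of the promotion flow in option 2, here is a toy in-memory model (all names are hypothetical, not the actual implementation; in the real proposal, promotion would go through the leader as a CreateSession transaction rather than a local flag flip):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Toy model of the proposed local/global session split. A local session
 * lives only on the server it connected to; an operation that needs
 * quorum state (e.g. creating an ephemeral node) promotes it to global.
 */
class SessionModel {
    enum Kind { LOCAL, GLOBAL }

    private final ConcurrentMap<Long, Kind> sessions = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(1);

    /** Option 1 in the proposal: the client chooses the kind at connect time. */
    long connect(boolean localRequested) {
        long id = nextId.getAndIncrement();
        sessions.put(id, localRequested ? Kind.LOCAL : Kind.GLOBAL);
        return id;
    }

    /** Option 2: called before any operation that requires a global session. */
    void promoteIfLocal(long id) {
        // The real system would issue a quorum CreateSession txn here;
        // this sketch just flips the flag.
        sessions.replace(id, Kind.LOCAL, Kind.GLOBAL);
    }

    Kind kindOf(long id) {
        return sessions.get(id);
    }
}
```

The sketch shows why option 2 is broader: any session can start cheap and pay the quorum cost only when it first needs durable state.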



[jira] [Updated] (ZOOKEEPER-1549) Data inconsistency when follower is receiving a DIFF with a dirty snapshot

2013-01-14 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1549:
-

Fix Version/s: 3.4.6

 Data inconsistency when follower is receiving a DIFF with a dirty snapshot
 --

 Key: ZOOKEEPER-1549
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1549
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.4.3
Reporter: Jacky007
Priority: Blocker
 Fix For: 3.4.6

 Attachments: case.patch, ZOOKEEPER-1549-learner.patch


 the trunc code (from ZOOKEEPER-1154?) cannot work correctly if the snapshot is 
 not correct.
 Here is the scenario (similar to 1154):
 Initial Condition
 1. Let's say there are three nodes in the ensemble, A, B, C, with A being the 
 leader.
 2. The current epoch is 7.
 3. For simplicity of the example, let's say a zxid is a two-digit number, 
 with the epoch being the first digit.
 4. The zxid is 73.
 5. All the nodes have seen the change 73 and have persistently logged it.
 Step 1
 A request with zxid 74 is issued. The leader A writes it to the log, but the 
 entire ensemble crashes and B, C never write the change 74 to their logs.
 Step 2
 A, B restart, A is elected as the new leader, and A loads its data and takes a 
 clean snapshot (change 74 is in it), then sends a DIFF to B, but B dies before 
 syncing with A. A dies later.
 Step 3
 B, C restart while A is still down.
 B, C form the quorum.
 B is the new leader. Let's say B's minCommitLog is 71 and maxCommitLog is 73.
 The epoch is now 8, the zxid is 80.
 A request with zxid 81 succeeds. On B, minCommitLog is now 71, 
 maxCommitLog is 81.
 Step 4
 A starts up. It applies the change in the request with zxid 74 to its in-memory 
 data tree.
 A contacts B to registerAsFollower and provides 74 as its zxid.
 Since 71 <= 74 <= 81, B decides to send A the DIFF.
 Problem:
 The problem with the above sequence is that after truncating the log, A will 
 load the snapshot again, which is not correct.
 In the 3.3 branch, FileTxnSnapLog.restore does not call the listener (ZOOKEEPER-874); 
 the leader will send a snapshot to the follower, so it will not be a problem.
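The DIFF decision at the heart of this scenario can be sketched as follows (a simplified, hypothetical helper; the real logic lives in the leader's LearnerHandler and handles more cases):

```java
/**
 * Simplified sketch of the leader's follower-sync decision. Zxids here
 * are the two-digit values from the scenario above; names are
 * hypothetical, not ZooKeeper's actual code.
 */
class SyncDecision {
    static String decide(long peerLastZxid, long minCommitLog, long maxCommitLog) {
        if (peerLastZxid >= minCommitLog && peerLastZxid <= maxCommitLog) {
            return "DIFF";   // replay committed txns after peerLastZxid
        } else if (peerLastZxid > maxCommitLog) {
            return "TRUNC";  // peer is ahead; tell it to truncate its log
        } else {
            return "SNAP";   // peer is too far behind; send a full snapshot
        }
    }
}
```

With the values from Step 4, decide(74, 71, 81) returns "DIFF" even though change 74 was never committed by the quorum: the DIFF never undoes 74, which is exactly why A ends up inconsistent.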



[jira] [Updated] (ZOOKEEPER-1549) Data inconsistency when follower is receiving a DIFF with a dirty snapshot

2013-01-14 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1549:
-

Assignee: Thawan Kooburat

 Data inconsistency when follower is receiving a DIFF with a dirty snapshot
 --

 Key: ZOOKEEPER-1549
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1549
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.4.3
Reporter: Jacky007
Assignee: Thawan Kooburat
Priority: Blocker
 Fix For: 3.4.6

 Attachments: case.patch, ZOOKEEPER-1549-learner.patch





[jira] [Commented] (ZOOKEEPER-1549) Data inconsistency when follower is receiving a DIFF with a dirty snapshot

2013-01-14 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13553574#comment-13553574
 ] 

Mahadev konar commented on ZOOKEEPER-1549:
--

Thanks [~thawan]!

 Data inconsistency when follower is receiving a DIFF with a dirty snapshot
 --

 Key: ZOOKEEPER-1549
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1549
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.4.3
Reporter: Jacky007
Assignee: Thawan Kooburat
Priority: Blocker
 Fix For: 3.4.6

 Attachments: case.patch, ZOOKEEPER-1549-learner.patch





[jira] [Commented] (ZOOKEEPER-1603) StaticHostProviderTest testUpdateClientMigrateOrNot hangs

2012-12-19 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536229#comment-13536229
 ] 

Mahadev konar commented on ZOOKEEPER-1603:
--

Pat,
 Not sure why we had this. Seems like an oversight.

 StaticHostProviderTest testUpdateClientMigrateOrNot hangs
 -

 Key: ZOOKEEPER-1603
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1603
 Project: ZooKeeper
  Issue Type: Bug
  Components: tests
Affects Versions: 3.5.0
Reporter: Patrick Hunt
Assignee: Alexander Shraer
Priority: Blocker
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1603-ver1.patch, ZOOKEEPER-1603-ver2.patch


 StaticHostProviderTest's testUpdateClientMigrateOrNot method hangs forever.
 On my laptop, getHostName for 10.10.10.* takes 5+ seconds per call. As a 
 result this method effectively runs forever.
 Every time I run this test it hangs. Consistently.



[jira] [Commented] (ZOOKEEPER-1504) Multi-thread NIOServerCnxn

2012-12-17 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534246#comment-13534246
 ] 

Mahadev konar commented on ZOOKEEPER-1504:
--

Pat,
 Makes sense. We can do it in a separate jira.

 Multi-thread NIOServerCnxn
 --

 Key: ZOOKEEPER-1504
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1504
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.3, 3.4.4, 3.5.0
Reporter: Jay Shrauner
Assignee: Jay Shrauner
  Labels: performance, scaling
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1504.patch, ZOOKEEPER-1504.patch, 
 ZOOKEEPER-1504.patch, ZOOKEEPER-1504.patch, ZOOKEEPER-1504.patch


 NIOServerCnxnFactory is single threaded, which doesn't scale well to large 
 numbers of clients. This is particularly noticeable when thousands of clients 
 connect. I propose multi-threading this code as follows:
 - 1   acceptor thread, for accepting new connections
 - 1-N selector threads
 - 0-M I/O worker threads
 Numbers of threads are configurable, with defaults scaling according to 
 number of cores. Communication with the selector threads is handled via 
 LinkedBlockingQueues, and connections are permanently assigned to a 
 particular selector thread so that all potentially blocking SelectionKey 
 operations can be performed solely by the selector thread. An ExecutorService 
 is used for the worker threads.
 On a 32 core machine running Linux 2.6.38, achieved best performance with 4 
 selector threads and 64 worker threads for a 70% +/- 5% improvement in 
 throughput.
 This patch incorporates and supersedes the patches for
 https://issues.apache.org/jira/browse/ZOOKEEPER-517
 https://issues.apache.org/jira/browse/ZOOKEEPER-1444
 New classes introduced in this patch are:
   - ExpiryQueue (from ZOOKEEPER-1444): factor out the logic from 
 SessionTrackerImpl used to expire sessions so that the same logic can be used 
 to expire connections
   - RateLogger (from ZOOKEEPER-517): rate limit error message logging, 
 currently only used to throttle rate of logging out of file descriptors 
 errors
   - WorkerService (also in ZOOKEEPER-1505): an ExecutorService wrapper that 
 makes worker threads daemon threads and names them in an easily debuggable 
 manner. Supports assignable threads (as used by CommitProcessor) and 
 non-assignable threads (as used here).
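The permanent connection-to-selector assignment described above can be sketched like this (hypothetical names, not the patch itself): the acceptor hands each new connection to exactly one selector's queue, so only that selector thread ever touches the connection's SelectionKey.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Structural sketch of the acceptor-to-selector handoff. One acceptor
 * thread calls assign(); each selector thread drains only its own queue.
 */
class ConnectionRouter {
    private final List<LinkedBlockingQueue<Integer>> selectorQueues = new ArrayList<>();

    ConnectionRouter(int nSelectors) {
        for (int i = 0; i < nSelectors; i++) {
            selectorQueues.add(new LinkedBlockingQueue<>());
        }
    }

    /** Permanent assignment: the same connection always maps to the same selector. */
    int assign(int connId) {
        int slot = Math.floorMod(connId, selectorQueues.size());
        selectorQueues.get(slot).offer(connId);
        return slot;
    }
}
```

Because the mapping is a pure function of the connection, no lock is needed around SelectionKey operations: per-connection state is confined to one thread, while blocking I/O work is farmed out to the shared worker pool.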



[jira] [Updated] (ZOOKEEPER-575) remove System.exit calls to make the server more container friendly

2012-12-16 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-575:


Attachment: ZOOKEEPER-575_4.patch

Updated the patch for trunk. This would really be nice to get in, and it would make 
it cleaner to embed ZK.

 remove System.exit calls to make the server more container friendly
 ---

 Key: ZOOKEEPER-575
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-575
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.0
Reporter: Patrick Hunt
Assignee: Andrew Finnell
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-575-2.patch, ZOOKEEPER-575-3.patch, 
 ZOOKEEPER-575_4.patch, ZOOKEEPER-575.patch


 There are a handful of places left in the code that still use System.exit; we 
 should remove these to make the server more container friendly.
 There are some legitimate places for the exits - those in *Main.java, for 
 example, should be fine - these are the command line main routines. Containers 
 should be embedding code that runs just below this layer (or we should refactor 
 so that it would).
 The tricky bit is ensuring the server shuts down in case of an unrecoverable 
 error occurring; afaik these are the locations where we still have sys exit 
 calls.



[jira] [Commented] (ZOOKEEPER-1335) Add support for --config to zkEnv.sh to specify a config directory different than what is expected

2012-12-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533666#comment-13533666
 ] 

Mahadev konar commented on ZOOKEEPER-1335:
--

+1 for the patch. Looks good to me. Pat, it doesn't look like we have much 
documentation in Forrest for zkServer.sh, so I don't think we need any Forrest 
docs update. 

 Add support for --config to zkEnv.sh to specify a config directory different 
 than what is expected
 --

 Key: ZOOKEEPER-1335
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1335
 Project: ZooKeeper
  Issue Type: Improvement
Reporter: Arpit Gupta
Assignee: Arpit Gupta
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1335.patch, ZOOKEEPER-1335.patch


 zkEnv.sh expects ZOOCFGDIR env variable set. If not it looks for the conf dir 
 in the ZOOKEEPER_PREFIX dir or in /etc/zookeeper. It would be great if we can 
 support --config option where at run time you could specify a different 
 config directory. We do the same thing in hadoop.
 With this you should be able to do
 /usr/sbin/zkServer.sh --config /some/conf/dir start|stop
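A minimal sketch of how such a --config option could be handled (an assumption; the actual zkEnv.sh/zkServer.sh logic may differ): an explicit --config wins over an exported ZOOCFGDIR, which wins over the packaged default.

```shell
# Hypothetical --config handling; function name and default path are
# assumptions, not the real script.
parse_config_dir() {
  if [ "$1" = "--config" ] && [ -n "$2" ]; then
    ZOOCFGDIR="$2"
  fi
  ZOOCFGDIR="${ZOOCFGDIR:-/etc/zookeeper/conf}"
  echo "$ZOOCFGDIR"
}

parse_config_dir --config /some/conf/dir   # prints /some/conf/dir
```

This mirrors the Hadoop convention the comment refers to: the flag is consumed before the remaining start|stop arguments are processed.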



[jira] [Commented] (ZOOKEEPER-1593) Add Debian style /etc/default/zookeeper support to init script

2012-12-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533673#comment-13533673
 ] 

Mahadev konar commented on ZOOKEEPER-1593:
--

Michi/Dirkjan,
 Unfortunately these package files are mostly unused, and we probably should be 
getting rid of them given that BigTop is doing all the packaging work. Dirkjan, are 
you using the packaging in production? Do you think BigTop packaging might be 
of help to you?

 Add Debian style /etc/default/zookeeper support to init script
 --

 Key: ZOOKEEPER-1593
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1593
 Project: ZooKeeper
  Issue Type: Improvement
  Components: scripts
Affects Versions: 3.4.5
 Environment: Debian Linux 6.0
Reporter: Dirkjan Bussink
Priority: Minor
 Attachments: zookeeper_debian_default.patch


 In our configuration we use a different data directory for ZooKeeper. The 
 problem is that the current Debian init.d script has the default location 
 hardcoded:
 ZOOPIDDIR=/var/lib/zookeeper/data
 ZOOPIDFILE=${ZOOPIDDIR}/zookeeper_server.pid
 By using the standard Debian practice of allowing for a 
 /etc/default/zookeeper we can redefine these variables to point to the 
 correct location:
 ZOOPIDDIR=/var/lib/zookeeper/data
 ZOOPIDFILE=${ZOOPIDDIR}/zookeeper_server.pid
 [ -r /etc/default/zookeeper ] && . /etc/default/zookeeper
 This currently can't be done through /usr/libexec/zkEnv.sh, since that is 
 loaded before ZOOPIDDIR and ZOOPIDFILE are set. Any change there would 
 therefore undo the setup made in for example /etc/zookeeper/zookeeper-env.sh.
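The proposed pattern can be demonstrated end to end with a temporary file standing in for /etc/default/zookeeper (the override path /srv/zk/data is a made-up example):

```shell
# Demonstration of the Debian /etc/default override pattern.
DEFAULTS=$(mktemp)
echo 'ZOOPIDDIR=/srv/zk/data' > "$DEFAULTS"    # site-specific override

ZOOPIDDIR=/var/lib/zookeeper/data              # hardcoded default
[ -r "$DEFAULTS" ] && . "$DEFAULTS"            # source the override if present
ZOOPIDFILE=${ZOOPIDDIR}/zookeeper_server.pid   # derived *after* sourcing

echo "$ZOOPIDFILE"    # prints /srv/zk/data/zookeeper_server.pid
rm -f "$DEFAULTS"
```

The ordering is the whole point of the patch: ZOOPIDFILE must be derived after the defaults file is sourced, otherwise the override has no effect.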



[jira] [Commented] (ZOOKEEPER-1488) Some links are not working in the Zookeeper Documentation

2012-12-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533674#comment-13533674
 ] 

Mahadev konar commented on ZOOKEEPER-1488:
--

bq. By the way, I have just seen that the PDF generated in the docs 
section still has a 2008 copyright notice (Copyright © 2008 The Apache 
Software Foundation. All rights reserved). Should I open a ticket to update 
this? Or may I try to include it in this patch?


Thanks for pointing that out Edward. Please open a jira for that.

 Some links are not working in the Zookeeper Documentation
 -

 Key: ZOOKEEPER-1488
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1488
 Project: ZooKeeper
  Issue Type: Bug
  Components: documentation
Affects Versions: 3.4.3
Reporter: Kiran BC
Assignee: Edward Ribeiro
Priority: Minor
 Attachments: ZOOKEEPER-1488.patch, ZOOKEEPER-1488.patch


 There are some internal link errors in the Zookeeper documentation. The list 
 is as follows:
 docs\zookeeperAdmin.html - tickTime and datadir
 docs\zookeeperOver.html - fg_zkComponents, fg_zkPerfReliability and 
 fg_zkPerfRW
 docs\zookeeperStarted.html - Logging



[jira] [Commented] (ZOOKEEPER-1552) Enable sync request processor in Observer

2012-12-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533676#comment-13533676
 ] 

Mahadev konar commented on ZOOKEEPER-1552:
--

Thawan,
 This is a good idea. As for the patch, I think we have too many system 
properties spread around in the source code. It's best if we can use the 
ZooKeeper config file for this. What do others think? 


 Enable sync request processor in Observer
 -

 Key: ZOOKEEPER-1552
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1552
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.3
Reporter: Thawan Kooburat
Assignee: Thawan Kooburat
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1552.patch, ZOOKEEPER-1552.patch


 Observer doesn't forward its txns to SyncRequestProcessor, so it never 
 persists the txns onto disk or periodically creates snapshots. This increases 
 the start-up time, since it will get the entire snapshot if the observer has 
 been running for a long time. 



[jira] [Comment Edited] (ZOOKEEPER-1552) Enable sync request processor in Observer

2012-12-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533676#comment-13533676
 ] 

Mahadev konar edited comment on ZOOKEEPER-1552 at 12/17/12 6:33 AM:


Thawan,
 This is a good idea. As for the patch, I think we have too many system 
properties spread around in the source code. It's best if we can use the 
ZooKeeper config file for this. What do others think? Other than that, the 
patch looks good.


  was (Author: mahadev):
Thawan,
 This is a good idea. As for the patch, I think we have too many system 
properties spread around in the source code. Its best if we can use the 
ZooKeeper config file for this. What do others think? 

  
 Enable sync request processor in Observer
 -

 Key: ZOOKEEPER-1552
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1552
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.3
Reporter: Thawan Kooburat
Assignee: Thawan Kooburat
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1552.patch, ZOOKEEPER-1552.patch





[jira] [Commented] (ZOOKEEPER-1480) ClientCnxn(1161) can't get the current zk server add, so that - Session 0x for server null, unexpected error

2012-12-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533678#comment-13533678
 ] 

Mahadev konar commented on ZOOKEEPER-1480:
--

Hey Leader,
 There are quite a few Chinese characters in the patch. Can you please remove 
those? Also, can you please create a patch against trunk? 

Thanks

 ClientCnxn(1161) can't get the current zk server add, so that - Session 0x 
 for server null, unexpected error
 

 Key: ZOOKEEPER-1480
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1480
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.3
Reporter: Leader Ni
Assignee: Leader Ni
  Labels: client, getCurrentZooKeeperAddr
 Fix For: 3.5.0

 Attachments: getCurrentZooKeeperAddr_for_3.4.3.patch, 
 getCurrentZooKeeperAddr_for_branch3.4.patch


   When ZooKeeper encounters an unexpected error (not SessionExpiredException, 
 SessionTimeoutException or EndOfStreamException), ClientCnxn (line 1161) will log 
 a message of the format "Session 0x for server null, unexpected error, closing 
 socket connection and attempting reconnect". The log is at line 1161 in 
 zookeeper-3.3.3.
   We found that ZooKeeper uses 
 ((SocketChannel)sockKey.channel()).socket().getRemoteSocketAddress() to get the 
 ZooKeeper address. But sometimes it logs "Session 0x for server null"; if it 
 logs null, the developer can't determine the current ZooKeeper address that the 
 client is connected or connecting to.
   I added a method in class SendThread: InetSocketAddress 
 org.apache.zookeeper.ClientCnxn.SendThread.getCurrentZooKeeperAddr().
   Here:
 /**
  * Returns the address to which the socket is connected.
  *
  * @return ip address of the remote side of the connection or null if not
  *         connected
  */
 @Override
 SocketAddress getRemoteSocketAddress() {
     // a lot could go wrong here, so rather than put in a bunch of code
     // to check for nulls all down the chain let's do it the simple
     // yet bulletproof way
     ...



[jira] [Commented] (ZOOKEEPER-1504) Multi-thread NIOServerCnxn

2012-12-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533687#comment-13533687
 ] 

Mahadev konar commented on ZOOKEEPER-1504:
--

Thawan,
 I was looking at the patch, and it looks like you always have one acceptor 
thread. Is one acceptor thread enough when we have thousands of simultaneous 
connections to the ZK servers, as during bootstrap or network glitches? Have you 
ever seen an issue with this?

 Read through the patch as well. Looks good to me otherwise.

 Multi-thread NIOServerCnxn
 --

 Key: ZOOKEEPER-1504
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1504
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.3, 3.4.4, 3.5.0
Reporter: Jay Shrauner
Assignee: Jay Shrauner
  Labels: performance, scaling
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1504.patch, ZOOKEEPER-1504.patch, 
 ZOOKEEPER-1504.patch, ZOOKEEPER-1504.patch, ZOOKEEPER-1504.patch





[jira] [Commented] (ZOOKEEPER-1569) support upsert: setData if the node exists, otherwise, create a new node

2012-12-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533692#comment-13533692
 ] 

Mahadev konar commented on ZOOKEEPER-1569:
--

Jimmy,
 Can you please explain the semantics of such an operation? What would the 
return value be? When would this operation fail? When would it succeed?

 support upsert: setData if the node exists, otherwise, create a new node
 

 Key: ZOOKEEPER-1569
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1569
 Project: ZooKeeper
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: zk-1569.patch, zk-1569_v1.1.patch, zk-1569_v2.patch


 Currently, ZooKeeper supports setData and create. It would be great if it 
 could also support an upsert, as in SQL.
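One way to frame the semantics question above is the retry loop clients write today to emulate upsert: try setData, fall back to create on "no node", and retry on the create/delete race. The sketch below illustrates that loop against a toy in-memory store; KVStore and its exception types are stand-ins invented for the example, not the ZooKeeper client API:

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a store with create/setData semantics, used only to
// illustrate the client-side upsert retry loop.
class KVStore {
    static class NoNodeException extends Exception {}
    static class NodeExistsException extends Exception {}

    private final Map<String, String> data = new HashMap<>();

    void create(String path, String value) throws NodeExistsException {
        if (data.containsKey(path)) throw new NodeExistsException();
        data.put(path, value);
    }

    void setData(String path, String value) throws NoNodeException {
        if (!data.containsKey(path)) throw new NoNodeException();
        data.put(path, value);
    }

    String get(String path) { return data.get(path); }

    // Upsert: setData if the node exists, otherwise create it. The loop
    // covers the race where another client creates or deletes the node
    // between our two calls.
    void upsert(String path, String value) {
        while (true) {
            try {
                setData(path, value);
                return;
            } catch (NoNodeException e) {
                try {
                    create(path, value);
                    return;
                } catch (NodeExistsException e2) {
                    // Created concurrently; retry setData.
                }
            }
        }
    }
}
```

A server-side upsert would collapse this loop into one atomic operation, which is exactly where the questions about return value and failure modes arise.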



[jira] [Commented] (ZOOKEEPER-1578) org.apache.zookeeper.server.quorum.Zab1_0Test failed due to hard code with 33556 port

2012-12-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13533695#comment-13533695
 ] 

Mahadev konar commented on ZOOKEEPER-1578:
--

+1 the patch looks good.

 org.apache.zookeeper.server.quorum.Zab1_0Test failed due to hard code with 
 33556 port
 -

 Key: ZOOKEEPER-1578
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1578
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.4.3
Reporter: Li Ping Zhang
Assignee: Li Ping Zhang
  Labels: patch
 Attachments: ZOOKEEPER-1578-branch-3.4.patch, 
 ZOOKEEPER-1578-trunk.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 org.apache.zookeeper.server.quorum.Zab1_0Test failed with both the Sun JDK 
 and OpenJDK.
 [junit] Running org.apache.zookeeper.server.quorum.Zab1_0Test
 [junit] Tests run: 8, Failures: 0, Errors: 1, Time elapsed: 18.334 sec
 [junit] Test org.apache.zookeeper.server.quorum.Zab1_0Test FAILED 
 Zab1_0Test log:
 Zab1_0Test log:
 2012-07-11 23:17:15,579 [myid:] - INFO  [main:Leader@427] - Shutdown called
 java.lang.Exception: shutdown Leader! reason: end of test
 at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:427)
 at 
 org.apache.zookeeper.server.quorum.Zab1_0Test.testLastAcceptedEpoch(Zab1_0Test.java:211)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:48)
 2012-07-11 23:17:15,584 [myid:] - ERROR [main:Leader@139] - Couldn't bind to 
 port 33556
 java.net.BindException: Address already in use
 at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:402)
 at java.net.ServerSocket.bind(ServerSocket.java:328)
 at java.net.ServerSocket.bind(ServerSocket.java:286)
 at org.apache.zookeeper.server.quorum.Leader.init(Leader.java:137)
 at 
 org.apache.zookeeper.server.quorum.Zab1_0Test.createLeader(Zab1_0Test.java:810)
 at 
 org.apache.zookeeper.server.quorum.Zab1_0Test.testLeaderInElectingFollowers(Zab1_0Test.java:224)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 2012-07-11 23:17:20,202 [myid:] - ERROR 
 [LearnerHandler-bdvm039.svl.ibm.com/9.30.122.48:40153:LearnerHandler@559] - 
 Unexpected exception causing shutdown while sock still open
 java.net.SocketTimeoutException: Read timed out
 at java.net.SocketInputStream.read(SocketInputStream.java:129)
 at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
 at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
 at java.io.DataInputStream.readInt(DataInputStream.java:370)
 at 
 org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
 at 
 org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
 at 
 org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
 at 
 org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:291)
 2012-07-11 23:17:20,203 [myid:] - WARN  
 [LearnerHandler-bdvm039.svl.ibm.com/9.30.122.48:40153:LearnerHandler@569] - 
 
 *** GOODBYE bdvm039.svl.ibm.com/9.30.122.48:40153 
 2012-07-11 23:17:20,204 [myid:] - INFO  [Thread-20:Leader@421] - Shutting down
 2012-07-11 23:17:20,204 [myid:] - INFO  [Thread-20:Leader@427] - Shutdown 
 called
 java.lang.Exception: shutdown Leader! reason: lead ended
 This failure suggests that port 33556 is already in use, but checking with 
 system commands shows that it is not. The port is hard-coded in the unit 
 test; we can improve this with a patch.
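A common fix for this class of failure is to let the OS pick a free ephemeral port by binding to port 0, instead of hard-coding 33556. A minimal sketch (the helper name is invented for the example):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Bind to port 0 so the kernel assigns a free ephemeral port, avoiding
// collisions between concurrent or leftover test processes.
class PortAllocator {
    static int allocateFreePort() {
        try (ServerSocket s = new ServerSocket(0)) {
            return s.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that there is still a small window between closing this probe socket and the test rebinding the port; tests that can pass the already-bound socket (or its channel) to the code under test avoid even that window.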



[jira] [Commented] (ZOOKEEPER-1574) mismatched CR/LF endings in text files

2012-12-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13533708#comment-13533708
 ] 

Mahadev konar commented on ZOOKEEPER-1574:
--

Nikita/Raja,
 So we can just do a prop set and commit then? I tried this:

find * | grep java$ | xargs  svn propset -R svn:eol-style native

and it's only changing the properties. Is this all we need to do on 3.4 and 
trunk? This is definitely better than committing the diff.

 mismatched CR/LF endings in text files
 --

 Key: ZOOKEEPER-1574
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1574
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raja Aluri
Assignee: Raja Aluri
 Attachments: ZOOKEEPER-1574.branch-3.4.patch, 
 ZOOKEEPER-1574.trunk.patch


 Source code in the zookeeper repo has a bunch of files that have CRLF endings.
 With more development happening on Windows, there is a higher chance of more 
 CRLF files getting into the source tree.
 I would like to avoid that by creating a .gitattributes file, which prevents 
 sources from having CRLF entries in text files.
 But before adding the .gitattributes file we need to normalize the existing 
 tree, so that people who sync after the .gitattributes change won't end up 
 with a bunch of modified files in their workspace.
 I am adding a couple of links here to give more of a primer on what exactly 
 the issue is and how we are trying to fix it.
 [http://git-scm.com/docs/gitattributes#_checking_out_and_checking_in]
 [http://stackoverflow.com/questions/170961/whats-the-best-crlf-handling-strategy-with-git]
 I will submit a separate bug and patch for .gitattributes
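The .gitattributes approach described above typically looks something like the following. This is a sketch; the exact patterns the project eventually chose may differ:

```
# Normalize line endings for text files on checkin; leave binaries alone.
* text=auto
*.java text
*.sh  text eol=lf
*.bat text eol=crlf
```

With `text=auto`, git stores text files with LF endings in the repository regardless of the platform they were committed from, which is what makes normalizing the existing tree a one-time operation.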



[jira] [Updated] (ZOOKEEPER-1572) Add an async interface for multi request

2012-12-16 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1572:
-

Fix Version/s: (was: 3.4.6)

 Add an async interface for multi request
 

 Key: ZOOKEEPER-1572
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1572
 Project: ZooKeeper
  Issue Type: Improvement
  Components: java client
Reporter: Sijie Guo
Assignee: Sijie Guo
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1572.diff, ZOOKEEPER-1572.diff


 Currently there is no async interface for multi request in ZooKeeper java 
 client.



[jira] [Commented] (ZOOKEEPER-1572) Add an async interface for multi request

2012-12-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13533710#comment-13533710
 ] 

Mahadev konar commented on ZOOKEEPER-1572:
--

Removing it from the 3.4 branch. We shouldn't commit new features to the 3.4 branch.

 Add an async interface for multi request
 

 Key: ZOOKEEPER-1572
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1572
 Project: ZooKeeper
  Issue Type: Improvement
  Components: java client
Reporter: Sijie Guo
Assignee: Sijie Guo
 Fix For: 3.5.0, 3.4.6

 Attachments: ZOOKEEPER-1572.diff, ZOOKEEPER-1572.diff


 Currently there is no async interface for multi request in ZooKeeper java 
 client.



[jira] [Commented] (ZOOKEEPER-1572) Add an async interface for multi request

2012-12-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13533712#comment-13533712
 ] 

Mahadev konar commented on ZOOKEEPER-1572:
--

Flavio/Sijie,
 I am taking a look at this. I might need a day or two (until Tuesday at the 
latest) to review it. 




[jira] [Commented] (ZOOKEEPER-1557) jenkins jdk7 test failure in testBadSaslAuthNotifiesWatch

2012-10-08 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471676#comment-13471676
 ] 

Mahadev konar commented on ZOOKEEPER-1557:
--

Thanks, Eugene. Interesting.

 jenkins jdk7 test failure in testBadSaslAuthNotifiesWatch
 -

 Key: ZOOKEEPER-1557
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1557
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.5.0, 3.4.5
Reporter: Patrick Hunt
Assignee: Eugene Koontz
 Fix For: 3.5.0, 3.4.6

 Attachments: jstack.out, SaslAuthFailTest.log, ZOOKEEPER-1557.patch


 Failure of testBadSaslAuthNotifiesWatch on the jenkins jdk7 job:
 https://builds.apache.org/job/ZooKeeper-trunk-jdk7/407/
 haven't seen this before.



[jira] [Updated] (ZOOKEEPER-1557) jenkins jdk7 test failure in testBadSaslAuthNotifiesWatch

2012-10-07 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1557:
-

Fix Version/s: (was: 3.4.5)
   3.4.6




[jira] [Comment Edited] (ZOOKEEPER-1557) jenkins jdk7 test failure in testBadSaslAuthNotifiesWatch

2012-10-07 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471403#comment-13471403
 ] 

Mahadev konar edited comment on ZOOKEEPER-1557 at 10/8/12 5:04 AM:
---

Thanks Eugene for taking a look at it. Given your analysis above, it doesn't 
look like we have full knowledge of what's causing the issue. Given that this 
is not SASL related and could be related to how our test framework runs, I 
think we can move this out to 3.4.6 and get 3.4.5 out the door with what we 
have now. What do you think?

  was (Author: mahadev):
Thanks Eugene for taking a look at it. Given your any analysis above it 
doesnt look like we have a full knowledge of whats causing the issue. Given 
that this is not SASL related and could be related to how our test framework 
runs, I think we can move this out to 3.4.6 and get 3.4.5 out the door with 
what we have now. What do you think?
  



[jira] [Updated] (ZOOKEEPER-1477) Test failures with Java 7 on Mac OS X

2012-09-28 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1477:
-

Priority: Major  (was: Blocker)

Downgrading to Major given the recent updates on this jira.

 Test failures with Java 7 on Mac OS X
 -

 Key: ZOOKEEPER-1477
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1477
 Project: ZooKeeper
  Issue Type: Bug
  Components: server, tests
Affects Versions: 3.4.3
 Environment: Mac OS X Lion (10.7.4)
 Java version:
 java version 1.7.0_04
 Java(TM) SE Runtime Environment (build 1.7.0_04-b21)
 Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode)
Reporter: Diwaker Gupta
 Fix For: 3.4.6

 Attachments: with-ZK-1550.txt


 I downloaded ZK 3.4.3 sources and ran {{ant test}}. Many of the tests failed, 
 including ZooKeeperTest. A common symptom was spurious 
 {{ConnectionLossException}}:
 {code}
 2012-06-01 12:01:23,420 [myid:] - INFO  
 [main:JUnit4ZKTestRunner$LoggedInvokeMethod@54] - TEST METHOD FAILED 
 testDeleteRecursiveAsync
 org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
 = ConnectionLoss for /
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
 at 
 org.apache.zookeeper.ZooKeeperTest.testDeleteRecursiveAsync(ZooKeeperTest.java:77)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 ... (snipped)
 {code}
 As background, I was actually investigating some non-deterministic failures 
 when using Netflix's Curator with Java 7 (see 
 https://github.com/Netflix/curator/issues/79). After a while, I figured I 
 should establish a clean ZK baseline first and realized it is actually a ZK 
 issue, not a Curator issue.
 We are trying to migrate to Java 7 but this is a blocking issue for us right 
 now.



[jira] [Commented] (ZOOKEEPER-1477) Test failures with Java 7 on Mac OS X

2012-09-27 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13465077#comment-13465077
 ] 

Mahadev konar commented on ZOOKEEPER-1477:
--

Diwaker, 
 Would you be able to run the tests along with Eugene's patch on 
ZOOKEEPER-1550? If not, please let me know; I can go ahead and run it.



 Test failures with Java 7 on Mac OS X
 -

 Key: ZOOKEEPER-1477
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1477
 Project: ZooKeeper
  Issue Type: Bug
  Components: server, tests
Affects Versions: 3.4.3
 Environment: Mac OS X Lion (10.7.4)
 Java version:
 java version 1.7.0_04
 Java(TM) SE Runtime Environment (build 1.7.0_04-b21)
 Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode)
Reporter: Diwaker Gupta
Priority: Blocker
 Fix For: 3.4.5





[jira] [Commented] (ZOOKEEPER-1477) Test failures with Java 7 on Mac OS X

2012-09-27 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13465097#comment-13465097
 ] 

Mahadev konar commented on ZOOKEEPER-1477:
--

Thanks Diwaker. Could you please upload a summary of the tests failing and the 
logs as well?



 Test failures with Java 7 on Mac OS X
 -

 Key: ZOOKEEPER-1477
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1477
 Project: ZooKeeper
  Issue Type: Bug
  Components: server, tests
Affects Versions: 3.4.3
 Environment: Mac OS X Lion (10.7.4)
 Java version:
 java version 1.7.0_04
 Java(TM) SE Runtime Environment (build 1.7.0_04-b21)
 Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode)
Reporter: Diwaker Gupta
Priority: Blocker
 Fix For: 3.4.5





[jira] [Commented] (ZOOKEEPER-1477) Test failures with Java 7 on Mac OS X

2012-09-27 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13465108#comment-13465108
 ] 

Mahadev konar commented on ZOOKEEPER-1477:
--

Diwaker,
 The usual time on a linux box is around 40 mins or so.



 Test failures with Java 7 on Mac OS X
 -

 Key: ZOOKEEPER-1477
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1477
 Project: ZooKeeper
  Issue Type: Bug
  Components: server, tests
Affects Versions: 3.4.3
 Environment: Mac OS X Lion (10.7.4)
 Java version:
 java version 1.7.0_04
 Java(TM) SE Runtime Environment (build 1.7.0_04-b21)
 Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode)
Reporter: Diwaker Gupta
Priority: Blocker
 Fix For: 3.4.5





[jira] [Commented] (ZOOKEEPER-1477) Test failures with Java 7 on Mac OS X

2012-09-27 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13465243#comment-13465243
 ] 

Mahadev konar commented on ZOOKEEPER-1477:
--

That's fine, Diwaker. I'll downgrade this jira to Major and mark it for the 
next release. We can just ship 3.4.5 with the fix for ZOOKEEPER-1550.
 
It'll be good to upload the test logs for the failing tests, but it's not 
urgent. We can do it later for 3.4.6.

Thanks.

 Test failures with Java 7 on Mac OS X
 -

 Key: ZOOKEEPER-1477
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1477
 Project: ZooKeeper
  Issue Type: Bug
  Components: server, tests
Affects Versions: 3.4.3
 Environment: Mac OS X Lion (10.7.4)
 Java version:
 java version 1.7.0_04
 Java(TM) SE Runtime Environment (build 1.7.0_04-b21)
 Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode)
Reporter: Diwaker Gupta
Priority: Blocker
 Fix For: 3.4.5

 Attachments: with-ZK-1550.txt





[jira] [Updated] (ZOOKEEPER-1550) ZooKeeperSaslClient does not finish anonymous login on OpenJDK

2012-09-26 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1550:
-

Fix Version/s: 3.4.5

 ZooKeeperSaslClient does not finish anonymous login on OpenJDK
 --

 Key: ZOOKEEPER-1550
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1550
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.4
Reporter: Robert Macomber
 Fix For: 3.4.5


 On OpenJDK, {{javax.security.auth.login.Configuration.getConfiguration}} does 
 not throw an exception.  
 {{ZooKeeperSaslClient.clientTunneledAuthenticationInProgress}} uses an 
 exception from that method as a proxy for "this client is not configured to 
 use SASL", and as a result no commands can be sent, since the client is still 
 waiting for auth to complete.
 [Link to mailing list 
 discussion|http://comments.gmane.org/gmane.comp.java.zookeeper.user/2667]
 The relevant bit of logs from OpenJDK and Oracle versions of 'connect and do 
 getChildren(/)':
 {code:title=OpenJDK}
 INFO [main] 2012-09-25 14:02:24,545 com.socrata.Main Waiting for connection...
 DEBUG [main] 2012-09-25 14:02:24,548 com.socrata.zookeeper.ZooKeeperProvider 
 Waiting for connected-state...
 INFO [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,576 
 org.apache.zookeeper.ClientCnxn Opening socket connection to server 
 mike.local/10.0.2.106:2181. Will not attempt to authenticate using SASL 
 (unknown error)
 INFO [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,584 
 org.apache.zookeeper.ClientCnxn Socket connection established to 
 mike.local/10.0.2.106:2181, initiating session
 DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,586 
 org.apache.zookeeper.ClientCnxn Session establishment request sent on 
 mike.local/10.0.2.106:2181
 INFO [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,600 
 org.apache.zookeeper.ClientCnxn Session establishment complete on server 
 mike.local/10.0.2.106:2181, sessionid = 0x139ff2e85b60005, negotiated timeout 
 = 4
 DEBUG [main-EventThread] 2012-09-25 14:02:24,614 
 com.socrata.zookeeper.ZooKeeperProvider ConnectionStateChanged(Connected)
 DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,636 
 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
 clientPath:/ serverPath:/ finished:false header:: 0,12  replyHeader:: 0,0,0 
 request:: '/,F  response:: v{} until SASL authentication completes.
 DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:37,923 
 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
 clientPath:/ serverPath:/ finished:false header:: 0,12  replyHeader:: 0,0,0 
 request:: '/,F  response:: v{} until SASL authentication completes.
 DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:37,924 
 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
 clientPath:/ serverPath:/ finished:false header:: 0,12  replyHeader:: 0,0,0 
 request:: '/,F  response:: v{} until SASL authentication completes.
 DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:37,924 
 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
 clientPath:null serverPath:null finished:false header:: -2,11  replyHeader:: 
 null request:: null response:: nulluntil SASL authentication completes.
 DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,260 
 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
 clientPath:/ serverPath:/ finished:false header:: 0,12  replyHeader:: 0,0,0 
 request:: '/,F  response:: v{} until SASL authentication completes.
 DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,260 
 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
 clientPath:null serverPath:null finished:false header:: -2,11  replyHeader:: 
 null request:: null response:: nulluntil SASL authentication completes.
 DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,261 
 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
 clientPath:/ serverPath:/ finished:false header:: 0,12  replyHeader:: 0,0,0 
 request:: '/,F  response:: v{} until SASL authentication completes.
 DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,261 
 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
 clientPath:null serverPath:null finished:false header:: -2,11  replyHeader:: 
 null request:: null response:: nulluntil SASL authentication completes.
 DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,261 
 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
 clientPath:null serverPath:null finished:false header:: -2,11  replyHeader:: 
 null request:: null response:: nulluntil SASL authentication completes.
 DEBUG 

[jira] [Updated] (ZOOKEEPER-1550) ZooKeeperSaslClient does not finish anonymous login on OpenJDK

2012-09-26 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1550:
-

Priority: Blocker  (was: Major)

 ZooKeeperSaslClient does not finish anonymous login on OpenJDK
 --

 Key: ZOOKEEPER-1550
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1550
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.4
Reporter: Robert Macomber
Priority: Blocker
 Fix For: 3.4.5


 DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,260 
 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
 clientPath:null serverPath:null finished:false header:: -2,11  replyHeader:: 
 null request:: null response:: nulluntil SASL authentication completes.
 DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,261 
 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
 clientPath:/ serverPath:/ finished:false header:: 0,12  replyHeader:: 0,0,0 
 request:: '/,F  response:: v{} until SASL authentication completes.
 DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,261 
 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
 clientPath:null serverPath:null finished:false header:: -2,11  replyHeader:: 
 null request:: null response:: nulluntil SASL authentication completes.
 DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,261 
 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: 
 clientPath:null serverPath:null finished:false header:: -2,11  replyHeader:: 
 null request:: null response:: nulluntil SASL authentication completes.
 {code}
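
The fragile pattern described above (inferring "SASL is not configured" from whether {{Configuration.getConfiguration}} throws) can be contrasted with an explicit check in a short Java sketch. This is illustrative only: the class and method names are hypothetical, not the actual ZooKeeperSaslClient source, and the demo installs its own in-memory JAAS Configuration so the result does not depend on JDK-specific default behavior.

```java
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;
import java.util.HashMap;

public class SaslConfigCheck {

    // Robust check: ask the Configuration for the named login section
    // directly, instead of inferring "SASL not configured" from whether
    // Configuration.getConfiguration() throws (which varies by JDK).
    static boolean hasLoginSection(String name) {
        Configuration conf = Configuration.getConfiguration();
        return conf != null && conf.getAppConfigurationEntry(name) != null;
    }

    public static void main(String[] args) {
        // Install a deterministic in-memory JAAS configuration with a
        // single "Client" section, mirroring a typical ZK client setup.
        Configuration.setConfiguration(new Configuration() {
            @Override
            public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
                if (!"Client".equals(name)) {
                    return null; // no such login section
                }
                return new AppConfigurationEntry[] {
                    new AppConfigurationEntry(
                        "com.sun.security.auth.module.Krb5LoginModule",
                        AppConfigurationEntry.LoginModuleControlFlag.OPTIONAL,
                        new HashMap<String, Object>())
                };
            }
        });
        System.out.println(hasLoginSection("Client")); // true
        System.out.println(hasLoginSection("Server")); // false
    }
}
```

Checking for the login section by name gives the same answer on every JDK, which is the property the exception-as-proxy approach lacks.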

[jira] [Commented] (ZOOKEEPER-1550) ZooKeeperSaslClient does not finish anonymous login on OpenJDK

2012-09-26 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13464339#comment-13464339
 ] 

Mahadev konar commented on ZOOKEEPER-1550:
--

Thanks Eugene.

Robert, can you verify this patch as well? 

Thanks

 ZooKeeperSaslClient does not finish anonymous login on OpenJDK
 --

 Key: ZOOKEEPER-1550
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1550
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.4
Reporter: Robert Macomber
Assignee: Eugene Koontz
Priority: Blocker
 Fix For: 3.4.5

 Attachments: ZOOKEEPER-1550.patch



[jira] [Commented] (ZOOKEEPER-1550) ZooKeeperSaslClient does not finish anonymous login on OpenJDK

2012-09-26 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13464357#comment-13464357
 ] 

Mahadev konar commented on ZOOKEEPER-1550:
--

Awesome, I'll check this in and kick off the builds on JDK 7 and see if it all 
works.


 ZooKeeperSaslClient does not finish anonymous login on OpenJDK
 --

 Key: ZOOKEEPER-1550
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1550
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.4
Reporter: Robert Macomber
Assignee: Eugene Koontz
Priority: Blocker
 Fix For: 3.4.5

 Attachments: ZOOKEEPER-1550.patch



[jira] [Commented] (ZOOKEEPER-1550) ZooKeeperSaslClient does not finish anonymous login on OpenJDK

2012-09-26 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13464370#comment-13464370
 ] 

Mahadev konar commented on ZOOKEEPER-1550:
--

Eugene,
 Looks like the SASL test failed. Can you please take a look?

 ZooKeeperSaslClient does not finish anonymous login on OpenJDK
 --

 Key: ZOOKEEPER-1550
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1550
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.4
Reporter: Robert Macomber
Assignee: Eugene Koontz
Priority: Blocker
 Fix For: 3.4.5

 Attachments: ZOOKEEPER-1550.patch



[jira] [Commented] (ZOOKEEPER-1550) ZooKeeperSaslClient does not finish anonymous login on OpenJDK

2012-09-26 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13464387#comment-13464387
 ] 

Mahadev konar commented on ZOOKEEPER-1550:
--

Eugene,
 Still failing :)...

 ZooKeeperSaslClient does not finish anonymous login on OpenJDK
 --

 Key: ZOOKEEPER-1550
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1550
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.4
Reporter: Robert Macomber
Assignee: Eugene Koontz
Priority: Blocker
 Fix For: 3.4.5

 Attachments: ZOOKEEPER-1550.patch, ZOOKEEPER-1550.patch



[jira] [Updated] (ZOOKEEPER-1477) Test failures with Java 7 on Mac OS X

2012-09-25 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1477:
-

Priority: Blocker  (was: Critical)

 Test failures with Java 7 on Mac OS X
 -

 Key: ZOOKEEPER-1477
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1477
 Project: ZooKeeper
  Issue Type: Bug
  Components: server, tests
Affects Versions: 3.4.3
 Environment: Mac OS X Lion (10.7.4)
 Java version:
 java version 1.7.0_04
 Java(TM) SE Runtime Environment (build 1.7.0_04-b21)
 Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode)
Reporter: Diwaker Gupta
Priority: Blocker
 Fix For: 3.4.5


 I downloaded ZK 3.4.3 sources and ran {{ant test}}. Many of the tests failed, 
 including ZooKeeperTest. A common symptom was spurious 
 {{ConnectionLossException}}:
 {code}
 2012-06-01 12:01:23,420 [myid:] - INFO  
 [main:JUnit4ZKTestRunner$LoggedInvokeMethod@54] - TEST METHOD FAILED 
 testDeleteRecursiveAsync
 org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
 = ConnectionLoss for /
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
 at 
 org.apache.zookeeper.ZooKeeperTest.testDeleteRecursiveAsync(ZooKeeperTest.java:77)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 ... (snipped)
 {code}
 As background, I was actually investigating some non-deterministic failures 
 when using Netflix's Curator with Java 7 (see 
 https://github.com/Netflix/curator/issues/79). After a while, I figured I 
 should establish a clean ZK baseline first and realized it is actually a ZK 
 issue, not a Curator issue.
 We are trying to migrate to Java 7 but this is a blocking issue for us right 
 now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

2012-09-17 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456860#comment-13456860
 ] 

Mahadev konar commented on ZOOKEEPER-1496:
--

Rakesh,
 The patch looks good to me. I'll wait for Hudson, then check this in. We are 
good to go for the 3.4 RC now! Thanks, Rakesh!

 Ephemeral node not getting cleared even after client has exited
 ---

 Key: ZOOKEEPER-1496
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.3
Reporter: suja s
Assignee: Rakesh R
Priority: Critical
 Fix For: 3.4.4, 3.5.0

 Attachments: Logs.rar, ZOOKEEPER-1496.1.patch, 
 ZOOKEEPER-1496.2.patch, ZOOKEEPER-1496.3.patch, ZOOKEEPER-1496.patch


 In one of the tests we performed, we came across a case where an ephemeral 
 node was not cleared from ZooKeeper even though the client had exited.
 Zk version: 3.4.3
 Ephemeral node still exists in Zookeeper: 
 HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
 Tue Jun 26 16:07:04 IST 2012 
 HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server 
 xx.xx.xx.55:2182 
 Connecting to xx.xx.xx.55:2182 
 Welcome to ZooKeeper! 
 JLine support is enabled 
 [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
 WATCHER:: 
 WatchedEvent state:SyncConnected type:None path:null 
 [zk: xx.xx.xx.55:2182(CONNECTED) 0] get 
 /hadoop-ha/hacluster/ActiveStandbyElectorLock 
 haclusternn2HOSt-xx-xx-xx-102 �� 
 cZxid = 0x20075 
 ctime = Tue Jun 26 13:10:19 IST 2012 
 mZxid = 0x20075 
 mtime = Tue Jun 26 13:10:19 IST 2012 
 pZxid = 0x20075 
 cversion = 0 
 dataVersion = 0 
 aclVersion = 0 
 ephemeralOwner = 0x1382791d4e50004 
 dataLength = 42 
 numChildren = 0 
 [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
 Grepped the logs on the ZK side for session 0x1382791d4e50004 - the close 
 session arrived first, and a later create arrived before the close session 
 was processed. 
 HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E 
 /hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004 *|grep 
 0x20074 
 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 
 cport:-1)::CommitProcessor@171] - Processing request:: 
 sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x20074 
 txntype:-11 reqpath:n/a 
 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 
 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 
 type:closeSession cxid:0x0 zxid:0x20074 txntype:-11 reqpath:n/a 
 2012-06-26 13:10:19,919 [myid:3] - DEBUG 
 [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing 
 request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 
 zxid:0x20074 txntype:-11 reqpath:n/a 
 2012-06-26 13:10:20,608 [myid:3] - DEBUG 
 [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: 
 sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x20074 
 txntype:-11 reqpath:n/a 
 HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E 
 /hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004 *|grep 
 0x20075 
 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 
 cport:-1)::CommitProcessor@171] - Processing request:: 
 sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x20075 txntype:1 
 reqpath:n/a 
 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 
 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create 
 cxid:0x2 zxid:0x20075 txntype:1 reqpath:n/a 
 2012-06-26 13:10:20,278 [myid:3] - DEBUG 
 [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing 
 request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x20075 
 txntype:1 reqpath:n/a 
 2012-06-26 13:10:20,752 [myid:3] - DEBUG 
 [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: 
 sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x20075 txntype:1 
 reqpath:n/a 
   Close-session and create requests arrived almost in parallel. 
 Env:
 Hadoop setup.
 We were using Namenode HA with bookkeeper as shared storage and auto failover 
 enabled.
 NN102 was active and NN55 was standby. 
 FailoverController at 102 got shut down due to ZK connection error. 
 The lock-ActiveStandbyElectorLock created (ephemeral node) by this 
 failovercontroller is not cleared from ZK
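
The race above - a create for a session being committed after that session's closeSession (zxid 0x20075 after 0x20074 in the grepped logs) - can be modeled with a tiny in-memory sketch. This is a hypothetical Java illustration, not the ZooKeeper server code; the session table, method names, and checks are invented for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class EphemeralRace {
    final Set<Long> liveSessions = new HashSet<>();
    final Map<String, Long> ephemerals = new HashMap<>(); // path -> owner session

    void openSession(long id) { liveSessions.add(id); }

    // closeSession removes the session and cleans up its ephemerals.
    void closeSession(long id) {
        liveSessions.remove(id);
        ephemerals.values().removeIf(owner -> owner == id);
    }

    // Buggy ordering: a create committed AFTER the closeSession is applied
    // unconditionally, leaving an ephemeral owned by a dead session.
    void createBuggy(long session, String path) {
        ephemerals.put(path, session);
    }

    // Safe variant: reject creates from sessions that are already closed.
    void createFixed(long session, String path) {
        if (!liveSessions.contains(session)) {
            throw new IllegalStateException(
                "session closed: 0x" + Long.toHexString(session));
        }
        ephemerals.put(path, session);
    }

    public static void main(String[] args) {
        EphemeralRace zk = new EphemeralRace();
        long session = 0x1382791d4e50004L;
        zk.openSession(session);
        zk.closeSession(session);                             // zxid 0x20074
        zk.createBuggy(session, "/ActiveStandbyElectorLock"); // zxid 0x20075
        // Orphan ephemeral: the owner session is gone, but the node remains.
        System.out.println(zk.ephemerals.containsKey("/ActiveStandbyElectorLock")); // true
    }
}
```

The orphaned lock node in the report matches the buggy ordering: once the create outlives the session cleanup, nothing ever deletes the ephemeral.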



[jira] [Commented] (ZOOKEEPER-1105) c client zookeeper_close not send CLOSE_OP request to server

2012-09-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456681#comment-13456681
 ] 

Mahadev konar commented on ZOOKEEPER-1105:
--

Nice catch, Michi. I think I'll revert the patch for 3.4 and trunk and we can 
fix it later. I don't think this is a blocker for the 3.4 release. Michi, what 
do you think?

 c client zookeeper_close not send CLOSE_OP request to server
 

 Key: ZOOKEEPER-1105
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1105
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.3.2, 3.4.3
Reporter: jiang guangran
Assignee: lincoln.lee
 Fix For: 3.4.4, 3.5.0

 Attachments: zklog.txt, zktest.c, zktest.java, ZOOKEEPER-1105.patch


 In the zookeeper_close function, adaptor_finish is done before the CLOSE_OP 
 request is sent to the server, so the CLOSE_OP request cannot be sent.
 The server's zookeeper.log then contains many entries like:
 2011-06-22 00:23:02,323 - WARN  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@634] - 
 EndOfStreamException: Unable to read additional data from client sessionid 
 0x1305970d66d2224, likely client has closed socket
 2011-06-22 00:23:02,324 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1435] - Closed 
 socket connection for client /10.250.8.123:60257 which had sessionid 
 0x1305970d66d2224
 2011-06-22 00:23:02,325 - ERROR [CommitProcessor:1:NIOServerCnxn@445] - 
 Unexpected Exception:
 java.nio.channels.CancelledKeyException
 at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
 at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
 at 
 org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418)
 at 
 org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509)
 at 
 org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367)
 at 
 org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
 The Java client does not have this problem.
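
The ordering problem can be shown with a toy model. The actual client is C; this Java sketch invents its queue, "wire", and method names purely for illustration (adaptor_finish appears only as a comment marking where the real IO thread stops).

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class CloseOrdering {
    // Toy model: the outgoing request queue and the bytes the server sees.
    static final BlockingQueue<String> outgoing = new LinkedBlockingQueue<>();
    static final StringBuilder wire = new StringBuilder();

    // Flush queued requests to the "wire" (in the real C client the
    // adaptor/IO thread does this; once it is finished, nothing flushes).
    static void flush() {
        String req;
        while ((req = outgoing.poll()) != null) {
            wire.append(req).append(';');
        }
    }

    // Buggy order described in the report: the adaptor is finished first,
    // so the CLOSE_OP queued afterwards is never written; the server just
    // sees the socket close and logs EndOfStreamException.
    static void closeBuggy() {
        // adaptor_finish() happens here: no further flush() calls occur.
        outgoing.add("CLOSE_OP");
    }

    // Fixed order: queue CLOSE_OP and flush it BEFORE finishing the adaptor.
    static void closeFixed() {
        outgoing.add("CLOSE_OP");
        flush();
        // adaptor_finish() happens here, after CLOSE_OP is on the wire.
    }

    public static void main(String[] args) {
        closeBuggy();
        System.out.println("buggy: server saw '" + wire + "'"); // empty
        outgoing.clear();
        wire.setLength(0);
        closeFixed();
        System.out.println("fixed: server saw '" + wire + "'"); // CLOSE_OP;
    }
}
```

With the buggy order the server only ever observes an abrupt socket close, which is exactly the EndOfStreamException / CancelledKeyException noise quoted above.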



[jira] [Commented] (ZOOKEEPER-1105) c client zookeeper_close not send CLOSE_OP request to server

2012-09-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456682#comment-13456682
 ] 

Mahadev konar commented on ZOOKEEPER-1105:
--

Michi, it looks like you reverted the patch for trunk. Can you do that for the 
3.4 branch as well? If not, let me know and I can do it.

 c client zookeeper_close not send CLOSE_OP request to server
 

 Key: ZOOKEEPER-1105
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1105
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.3.2, 3.4.3
Reporter: jiang guangran
Assignee: lincoln.lee
 Fix For: 3.4.4, 3.5.0

 Attachments: zklog.txt, zktest.c, zktest.java, ZOOKEEPER-1105.patch





[jira] [Commented] (ZOOKEEPER-1105) c client zookeeper_close not send CLOSE_OP request to server

2012-09-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456684#comment-13456684
 ] 

Mahadev konar commented on ZOOKEEPER-1105:
--

Thanks Michi!

 c client zookeeper_close not send CLOSE_OP request to server
 

 Key: ZOOKEEPER-1105
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1105
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.3.2, 3.4.3
Reporter: jiang guangran
Assignee: lincoln.lee
 Fix For: 3.5.0

 Attachments: zklog.txt, zktest.c, zktest.java, ZOOKEEPER-1105.patch





[jira] [Updated] (ZOOKEEPER-1448) Node+Quota creation in transaction log can crash leader startup

2012-09-16 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1448:
-


I looked at the patch. The patch looks good except for the point Pat mentioned 
above: it moves the test toward log4j rather than using slf4j. For now I am 
moving this out to 3.4.5 to get it done right. Botond, if you have some time, 
would you please update the patch to use slf4j in the test case.

 Node+Quota creation in transaction log can crash leader startup
 ---

 Key: ZOOKEEPER-1448
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1448
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.3.5
Reporter: Botond Hejj
Assignee: Botond Hejj
Priority: Critical
 Fix For: 3.5.0, 3.3.7, 3.4.5

 Attachments: ZOOKEEPER-1448_branch3.3.patch, ZOOKEEPER-1448.patch, 
 ZOOKEEPER-1448.patch, ZOOKEEPER-1448.patch, ZOOKEEPER-1448.patch


 Hi,
 I've found a bug in zookeeper related to quota creation which can shut down the 
 zookeeper leader on startup.
 Steps to reproduce:
 1. create /quota_bug
 2. setquota -n 1 /quota_bug
 3. stop the whole ensemble (the previous operations should be in the 
 transaction log)
 4. start all the servers
 5. the elected leader will shut down with an exception (Missing stat node for 
 count /zookeeper/quota/quota_bug/zookeeper_stats)
 I've debugged a bit of what is happening and found the following problem:
 On startup each server loads the last snapshot and replays the last 
 transaction log. While doing this it fills up the pTrie variable of the 
 DataTree with the path of the nodes which have quota.
 After the leader is elected, the leader server loads the snapshot and last 
 transaction log but does not clean up the pTrie variable. This means it 
 still contains the /quota_bug path. Now when the create /quota_bug is 
 processed from the transaction log the DataTree already thinks that the quota 
 nodes (/zookeeper/quota/quota_bug/zookeeper_limits and 
 /zookeeper/quota/quota_bug/zookeeper_stats) are created, but those node 
 creations actually come later in the transaction log. This leads to the 
 missing stat node exception.
 I think clearing the pTrie should solve this problem.
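 As a toy sketch of the proposed fix (the class and method names below are 
 illustrative, not ZooKeeper's actual DataTree/PathTrie API): clearing the set 
 of quota'd paths before each database load keeps a stale /quota_bug entry from 
 the first load from surviving into the leader's replay.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the ZOOKEEPER-1448 fix: the set of quota'd paths (standing in
// for DataTree's pTrie) must be cleared before the snapshot + txn log are
// replayed, or the replay sees stale quota state from the previous load.
public class QuotaTrieReload {
    private final Set<String> quotaPaths = new HashSet<>();

    // Simulates loading the database; 'clearFirst' toggles the fix.
    public void loadDatabase(List<String> txnLogQuotaPaths, boolean clearFirst) {
        if (clearFirst) {
            quotaPaths.clear();   // the fix: forget paths from the earlier load
        }
        quotaPaths.addAll(txnLogQuotaPaths);
    }

    // True when the set contains entries not present in the latest replay.
    public boolean hasStaleEntry(List<String> txnLogQuotaPaths) {
        return !quotaPaths.equals(new HashSet<>(txnLogQuotaPaths));
    }
}
```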



[jira] [Updated] (ZOOKEEPER-1448) Node+Quota creation in transaction log can crash leader startup

2012-09-16 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1448:
-

Fix Version/s: (was: 3.4.4)
   3.4.5

 Node+Quota creation in transaction log can crash leader startup
 ---

 Key: ZOOKEEPER-1448
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1448
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.3.5
Reporter: Botond Hejj
Assignee: Botond Hejj
Priority: Critical
 Fix For: 3.5.0, 3.3.7, 3.4.5

 Attachments: ZOOKEEPER-1448_branch3.3.patch, ZOOKEEPER-1448.patch, 
 ZOOKEEPER-1448.patch, ZOOKEEPER-1448.patch, ZOOKEEPER-1448.patch





[jira] [Updated] (ZOOKEEPER-1548) Cluster fails election loop in new and interesting way

2012-09-16 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1548:
-

Fix Version/s: 3.4.5
   3.5.0

 Cluster fails election loop in new and interesting way
 --

 Key: ZOOKEEPER-1548
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1548
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection
Affects Versions: 3.4.3
Reporter: Alan Horn
 Fix For: 3.5.0, 3.4.5

 Attachments: 1-follower, 2-follower, 3-leader


 Hi,
 We have a five node cluster, recently upgraded from 3.3.5 to 3.4.3. Was 
 running fine for a few weeks after the upgrade, then the following sequence 
 of events occurred :
 1. All servers stopped responding to 'ruok' at the same time
 2. Our local supervisor process restarted all of them at the same time 
 (yes, this is bad, we didn't expect it to fail this way :)
 3. The cluster would not serve requests after this. Appeared to be unable to 
 complete an election.
 We tried various things at this point, none of which worked :
 * Moved around the restart order of the nodes (e.g. 4 thru 0, instead of 0 
 thru 4)
 * Reduced the number of running nodes from 5 to 3 to simplify the quorum, by only 
 starting up 0, 1 and 2 in one test, and 0, 2 and 4 in the other
 * Removed the *Epoch files from version-2/ snapshot directory
 * Put the same version2/snapshot.x file on each server in the cluster
 * Added the (same on all nodes) last txlog onto each cluster
 * Kept only the last snapshot plus txlog unique on each server
 * Moved leaderServes=no to leaderServes=yes
 * Removed all files and started up with empty data as a control. This worked, 
 but of course isn't terribly useful :)
 Finally, I brought the data up on a single node running in standalone and 
 this worked (yay!) So at this point we brought the single node back into 
 service and have kept the other four available to debug why the election is 
 failing.
 We downgraded the four nodes to 3.3.5, and then they completed the election 
 and started serving as expected.
 We did a rolling upgrade to 3.4.3, and everything was fine until we restarted 
 the leader, whereupon we encountered the same re-election loop as before.
 We're a bit out of ideas at this point, so I was hoping someone from this 
 list might have some useful input.
 Output from two followers and a leader during this condition are attached.
 Cheers,
 Al



[jira] [Commented] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

2012-09-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456758#comment-13456758
 ] 

Mahadev konar commented on ZOOKEEPER-1496:
--

Rakesh,
 I looked at the patch and it looks good, except for this one:

{code}
-set = sessionSets.remove(nextExpirationTime);
+SessionSet set = sessionSets.get(nextExpirationTime);
{code}

I think the remove still needs to happen, otherwise the session sets will keep 
growing in the map.

Also, 

{code}
 if (s != null) {
-sessionSets.get(s.tickTime).sessions.remove(s);
+SessionSet sessionSet = sessionSets.get(s.tickTime);
+sessionSet.sessions.remove(s);
+// Cleanup sessionSets, if no session exists
+if (sessionSet.sessions.size() == 0) {
+sessionSets.remove(s.tickTime);
+}
 }
 }

{code}

I see that you are removing the sessionSet once the session is cleaned up, but I 
think we still need to remove the session set when iterating for expiry.

Does that make sense?
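To make the two cleanup points concrete, here is a minimal sketch (class and 
method names are illustrative stand-ins, not the real SessionTrackerImpl): the 
bucket for an expiration time is removed when it is processed for expiry, and 
an emptied bucket is removed when its last session closes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the session-bucket cleanup discussed above: remove buckets
// both on expiry and when the last session in a bucket closes, so the map
// of expiration-time buckets cannot grow without bound.
public class SessionBuckets {
    private final Map<Long, Set<String>> sessionSets = new HashMap<>();

    public void add(long expirationTime, String sessionId) {
        sessionSets.computeIfAbsent(expirationTime, k -> new HashSet<>())
                   .add(sessionId);
    }

    // Expiry path: remove (not get) the whole bucket while processing it.
    public Set<String> expire(long expirationTime) {
        Set<String> s = sessionSets.remove(expirationTime);
        return s == null ? new HashSet<>() : s;
    }

    // Close path: drop the session, and drop the bucket if it became empty.
    public void close(long expirationTime, String sessionId) {
        Set<String> s = sessionSets.get(expirationTime);
        if (s != null) {
            s.remove(sessionId);
            if (s.isEmpty()) {
                sessionSets.remove(expirationTime);
            }
        }
    }

    public int bucketCount() {
        return sessionSets.size();
    }
}
```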


 Ephemeral node not getting cleared even after client has exited
 ---

 Key: ZOOKEEPER-1496
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.3
Reporter: suja s
Assignee: Rakesh R
Priority: Critical
 Fix For: 3.4.4, 3.5.0

 Attachments: Logs.rar, ZOOKEEPER-1496.1.patch, 
 ZOOKEEPER-1496.2.patch, ZOOKEEPER-1496.patch


 In one of the tests we performed, we came across a case where an ephemeral node 
 was not getting cleared from ZooKeeper even though the client had exited.
 Zk version: 3.4.3
 Ephemeral node still exists in Zookeeper: 
 HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
 Tue Jun 26 16:07:04 IST 2012 
 HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server 
 xx.xx.xx.55:2182 
 Connecting to xx.xx.xx.55:2182 
 Welcome to ZooKeeper! 
 JLine support is enabled 
 [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
 WATCHER:: 
 WatchedEvent state:SyncConnected type:None path:null 
 [zk: xx.xx.xx.55:2182(CONNECTED) 0] get 
 /hadoop-ha/hacluster/ActiveStandbyElectorLock 
 haclusternn2HOSt-xx-xx-xx-102 �� 
 cZxid = 0x20075 
 ctime = Tue Jun 26 13:10:19 IST 2012 
 mZxid = 0x20075 
 mtime = Tue Jun 26 13:10:19 IST 2012 
 pZxid = 0x20075 
 cversion = 0 
 dataVersion = 0 
 aclVersion = 0 
 ephemeralOwner = 0x1382791d4e50004 
 dataLength = 42 
 numChildren = 0 
 [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
 Grepped logs on the ZK side for session 0x1382791d4e50004 - the close session 
 comes first, and the later create arrives before the closeSession is processed. 
 HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E 
 "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" * | grep 
 0x20074 
 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 
 cport:-1)::CommitProcessor@171] - Processing request:: 
 sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x20074 
 txntype:-11 reqpath:n/a 
 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 
 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 
 type:closeSession cxid:0x0 zxid:0x20074 txntype:-11 reqpath:n/a 
 2012-06-26 13:10:19,919 [myid:3] - DEBUG 
 [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing 
 request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 
 zxid:0x20074 txntype:-11 reqpath:n/a 
 2012-06-26 13:10:20,608 [myid:3] - DEBUG 
 [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: 
 sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x20074 
 txntype:-11 reqpath:n/a 
 HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E 
 "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" * | grep 
 0x20075 
 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 
 cport:-1)::CommitProcessor@171] - Processing request:: 
 sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x20075 txntype:1 
 reqpath:n/a 
 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 
 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create 
 cxid:0x2 zxid:0x20075 txntype:1 reqpath:n/a 
 2012-06-26 13:10:20,278 [myid:3] - DEBUG 
 [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing 
 request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x20075 
 txntype:1 reqpath:n/a 
 2012-06-26 13:10:20,752 [myid:3] - DEBUG 
 [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: 
 sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x20075 txntype:1 
 reqpath:n/a 
  The close-session and create requests arrive almost in parallel. 
 

[jira] [Updated] (ZOOKEEPER-1361) Leader.lead iterates over 'learners' set without proper synchronisation

2012-09-16 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1361:
-

Attachment: ZOOKEEPER-1361-3.4.patch

Thanks Camille/Ross/Henry,
 I am committing Ross's patch that is a straightforward port from trunk to the 
3.4 branch. Attaching a cleaned up version of Ross's patch (removing 
CHANGES.txt changes).

 Leader.lead iterates over 'learners' set without proper synchronisation
 ---

 Key: ZOOKEEPER-1361
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1361
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.4.2
Reporter: Henry Robinson
Assignee: Henry Robinson
 Fix For: 3.4.4, 3.5.0

 Attachments: zk-memory-leak-fix.patch, ZOOKEEPER-1361-3.4.patch, 
 ZOOKEEPER-1361-3.4.patch, ZOOKEEPER-1361-no-whitespace.patch, 
 ZOOKEEPER-1361.patch


 This block:
 {code}
 HashSet<Long> followerSet = new HashSet<Long>();
 for(LearnerHandler f : learners)
 followerSet.add(f.getSid());
 {code}
 is executed without holding the lock on learners, so if there were ever a 
 condition where a new learner was added during the initial sync phase, I'm 
 pretty sure we'd see a concurrent modification exception. Certainly other 
 parts of the code are very careful to lock on learners when iterating. 
 It would be nice to use a {{ConcurrentHashMap}} to hold the learners instead, 
 but I can't convince myself that this wouldn't introduce some correctness 
 bugs. For example the following:
 Learners contains A, B, C, D
 Thread 1 iterates over learners, and gets as far as B.
 Thread 2 removes A, and adds E.
 Thread 1 continues iterating and sees a learner view of A, B, C, D, E
 This may be a bug if Thread 1 is counting the number of synced followers for 
 a quorum count, since at no point was A, B, C, D, E a correct view of the 
 quorum.
 In practice, I think this is actually ok, because I don't think ZK makes any 
 strong ordering guarantees on learners joining or leaving (so we don't need a 
 strong serialisability guarantee on learners) but I don't think I'll make 
 that change for this patch. Instead I want to clean up the locking protocols 
 on the follower / learner sets - to avoid another easy deadlock like the one 
 we saw in ZOOKEEPER-1294 - and to do less with the lock held; i.e. to copy 
 and then iterate over the copy rather than iterate over a locked set. 
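 The copy-then-iterate pattern argued for above can be sketched as follows (a 
 minimal model; LearnerHandler is stubbed as a bare sid, unlike the real class): 
 the lock is held only long enough to copy the learner set, and the iteration 
 happens on the copy, avoiding ConcurrentModificationException and long lock 
 hold times.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of copy-then-iterate: synchronize only while copying the shared
// learner set, then build the follower-sid set from the private copy.
public class LearnerSnapshot {
    private final Set<Long> learners = new HashSet<>();

    public void addLearner(long sid) {
        synchronized (learners) {
            learners.add(sid);
        }
    }

    public Set<Long> syncedFollowerSids() {
        List<Long> copy;
        synchronized (learners) {
            copy = new ArrayList<>(learners);   // copy under the lock...
        }
        Set<Long> followerSet = new HashSet<>();
        for (Long sid : copy) {                 // ...iterate without it
            followerSet.add(sid);
        }
        return followerSet;
    }
}
```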



[jira] [Updated] (ZOOKEEPER-1469) Adding Cross-Realm support for secure Zookeeper client authentication

2012-09-14 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1469:
-

Fix Version/s: (was: 3.4.4)

Moving it out of 3.4 release. 

 Adding Cross-Realm support for secure Zookeeper client authentication
 -

 Key: ZOOKEEPER-1469
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1469
 Project: ZooKeeper
  Issue Type: Improvement
  Components: documentation
Affects Versions: 3.4.3
Reporter: Himanshu Vashishtha
Assignee: Eugene Koontz
 Fix For: 3.5.0

 Attachments: SaslServerCallBackHandlerException.patch


 There is a use case where one needs to support cross-realm authentication for 
 a zookeeper cluster. One use case is HBase replication: HBase supports 
 replicating data to multiple slave clusters, where the latter might be running 
 in different realms. With current zookeeper security, the region servers of 
 the master HBase cluster are not able to query the zookeeper quorum members of 
 the slave cluster. This jira is about adding such cross-realm support.



[jira] [Updated] (ZOOKEEPER-1478) Small bug in QuorumTest.testFollowersStartAfterLeader( )

2012-09-10 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1478:
-

Fix Version/s: (was: 3.4.4)

Moving it out to 3.5 since the bugfix isn't a really critical one. 

 Small bug in QuorumTest.testFollowersStartAfterLeader( )
 

 Key: ZOOKEEPER-1478
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1478
 Project: ZooKeeper
  Issue Type: Bug
  Components: tests
Affects Versions: 3.4.3
Reporter: Alexander Shraer
Assignee: Alexander Shraer
Priority: Minor
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1478.patch, ZOOKEEPER-1478.patch, 
 ZOOKEEPER-1478.patch, ZOOKEEPER-1478.patch


 The following code appears in QuorumTest.testFollowersStartAfterLeader( ):
 for (int i = 0; i < 30; i++) {
 try {
zk.create("/test", "test".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE,
  CreateMode.PERSISTENT);
break;
  } catch(KeeperException.ConnectionLossException e) {
Thread.sleep(1000);
  }
 // test fails if we still can't connect to the quorum after 30 seconds.
 Assert.fail("client could not connect to reestablished quorum: giving up "
 + "after 30+ seconds.");
 }
 From the comment it looks like the intention was to try to reconnect 30 times 
 and only then trigger the Assert, but that's not what this does.
 After we fail to connect once and Thread.sleep is executed, Assert.fail will 
 be executed without retrying the create. 
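 A minimal sketch of the intended control flow (the Op interface and retry 
 helper are illustrative stand-ins for zk.create and ConnectionLossException, 
 not ZooKeeper APIs): the failure is raised only after the loop has exhausted 
 all attempts, which is what moving Assert.fail outside the loop achieves.

```java
// Retry an operation up to maxTries times; only signal failure after every
// attempt has been used up, never after the first transient error.
public class RetryLoop {
    public interface Op {
        void run() throws Exception;
    }

    // Returns the number of attempts used; throws only when all attempts fail.
    public static int retry(Op op, int maxTries, long sleepMs) {
        for (int i = 0; i < maxTries; i++) {
            try {
                op.run();
                return i + 1;              // success: stop retrying
            } catch (Exception e) {
                try {
                    Thread.sleep(sleepMs); // transient failure: wait, retry
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        // Reached only after maxTries failures -- the Assert.fail equivalent
        throw new IllegalStateException(
                "could not connect after " + maxTries + " tries");
    }
}
```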



[jira] [Updated] (ZOOKEEPER-1478) Small bug in QuorumTest.testFollowersStartAfterLeader( )

2012-09-10 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1478:
-

Attachment: ZOOKEEPER-1478.patch

Re uploading the patch for hudson.

 Small bug in QuorumTest.testFollowersStartAfterLeader( )
 

 Key: ZOOKEEPER-1478
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1478
 Project: ZooKeeper
  Issue Type: Bug
  Components: tests
Affects Versions: 3.4.3
Reporter: Alexander Shraer
Assignee: Alexander Shraer
Priority: Minor
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1478.patch, ZOOKEEPER-1478.patch, 
 ZOOKEEPER-1478.patch, ZOOKEEPER-1478.patch, ZOOKEEPER-1478.patch





[jira] [Updated] (ZOOKEEPER-1494) C client: socket leak after receive timeout in zookeeper_interest()

2012-09-09 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1494:
-

Attachment: ZOOKEEPER-1494.patch

 C client: socket leak after receive timeout in zookeeper_interest()
 ---

 Key: ZOOKEEPER-1494
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1494
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.4.2, 3.3.5
Reporter: Michi Mutsuzaki
Assignee: Michi Mutsuzaki
 Fix For: 3.4.4, 3.5.0

 Attachments: ZOOKEEPER-1494-3.4.patch, ZOOKEEPER-1494.patch, 
 ZOOKEEPER-1494.patch


 In zookeeper_interest(), we set zk->fd to -1 without closing it when timeout 
 happens. Instead we should let handle_socket_error_msg() function take care 
 of closing the socket properly.
 --Michi



[jira] [Updated] (ZOOKEEPER-1538) Improve space handling in zkServer.sh and zkEnv.sh

2012-09-07 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1538:
-

Fix Version/s: (was: 3.4.4)

Moving it out of 3.4 branch. The patch looks good. Ill go ahead and commit this 
to trunk.

 Improve space handling in zkServer.sh and zkEnv.sh
 --

 Key: ZOOKEEPER-1538
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1538
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.4.3
Reporter: Andrew Ferguson
Assignee: Andrew Ferguson
Priority: Trivial
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1538.patch


 Running `bin/zkServer.sh start` from a freshly-built copy of trunk fails if 
 the source code is checked out to a directory with spaces in the name. I'll 
 include a small patch to fix this problem.
 thanks!



[jira] [Updated] (ZOOKEEPER-1462) Read-only server does not initialize database properly

2012-09-06 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1462:
-

Fix Version/s: (was: 3.4.4)
   3.4.5

Moving it out since we do not have a patch.

 Read-only server does not initialize database properly
 --

 Key: ZOOKEEPER-1462
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1462
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.3
Reporter: Thawan Kooburat
Assignee: Thawan Kooburat
Priority: Critical
 Fix For: 3.5.0, 3.4.5

 Attachments: ZOOKEEPER-1462.patch


 Brief Description:
 When a participant or observer gets partitioned and restarts as a read-only 
 server, the ZkDb does not get reinitialized. This causes the RO server to drop 
 any incoming request with zxid > 0 
 Error message:
 Refusing session request for client /xx.xx.xx.xx:39875 
 as it has seen zxid 0x2e00405fd9 our last zxid is 0x0 client must try another 
 server
 Steps to reproduce:
 Start an RO-enabled observer connecting to an ensemble. Kill the ensemble and 
 wait until the observer restarts in RO mode. The zxid of this observer should be 0.
 Description:
 Before a server transition into LOOKING state, its database get closed as 
 part of shutdown sequence. The database of leader, follower and observer get 
 initialized as a side effect of participating in leader election protocol. 
 (eg. observer will call registerWithLeader() and call getLastLoggedZxid() 
 which initialize the db if not already).
 However, the RO server does not participate in this protocol, so its DB doesn't 
 get initialized properly.
  



[jira] [Updated] (ZOOKEEPER-1387) Wrong epoch file created

2012-09-06 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1387:
-

Fix Version/s: (was: 3.4.4)
   3.4.5

Moving it out since it's not a blocker.

 Wrong epoch file created
 

 Key: ZOOKEEPER-1387
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1387
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.4.2
Reporter: Benjamin Busjaeger
Assignee: Benjamin Reed
Priority: Minor
 Fix For: 3.5.0, 3.4.5

 Attachments: ZOOKEEPER-1387.patch


 It looks like line 443 in QuorumPeer [1] may need to change from:
 writeLongToFile(CURRENT_EPOCH_FILENAME, acceptedEpoch);
 to
 writeLongToFile(ACCEPTED_EPOCH_FILENAME, acceptedEpoch);
 I only noticed this while reading the code, so I may be wrong, and I don't know 
 yet if/how this affects the runtime.
 [1] 
 https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java#L443



[jira] [Commented] (ZOOKEEPER-1328) Misplaced assertion for the test case 'FLELostMessageTest' and not identifying misfunctions

2012-08-31 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13446038#comment-13446038
 ] 

Mahadev konar commented on ZOOKEEPER-1328:
--

Thanks for fixing that, Rakesh. I'll run it through hudson again and will commit 
as soon as it +1's.

 Misplaced assertion for the test case 'FLELostMessageTest' and not 
 identifying misfunctions
 ---

 Key: ZOOKEEPER-1328
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1328
 Project: ZooKeeper
  Issue Type: Test
  Components: leaderElection
Affects Versions: 3.4.0
Reporter: Rakesh R
Assignee: Rakesh R
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1328.1.patch, ZOOKEEPER-1328.2.patch, 
 ZOOKEEPER-1328.patch


 The assertion for testLostMessage is kept inside the thread's run() method. Due 
 to this, an assertion failure will not be reflected in the main test case. 
 I have observed that the test case still passes in case of an assertion failure 
 or misfunction. Instead, the assertion can be moved into the test case itself - 
 testLostMessage.
 {noformat}
 class LEThread extends Thread {
   public void run(){
 peer.setCurrentVote(v);
 LOG.info("Finished election: " + i + ", " + v.getId());
 Assert.assertTrue("State is not leading.", 
 peer.getPeerState() == ServerState.LEADING);
  } 
 {noformat}
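 One way to sketch the suggested restructuring (a toy helper, not the actual 
 FLELostMessageTest code; JUnit's Assert is replaced by a recorded Throwable): 
 the worker thread records its outcome instead of asserting, and the test 
 thread joins and then inspects the result, so a failure in run() actually 
 fails the test.

```java
// Capture a worker thread's failure and hand it back to the caller, so the
// main test thread (not the worker) performs the final pass/fail check.
public class ThreadOutcome {
    public static Throwable runAndCollect(Runnable body) {
        final Throwable[] failure = new Throwable[1];
        Thread t = new Thread(() -> {
            try {
                body.run();
            } catch (Throwable e) {
                failure[0] = e;   // remember the failure instead of losing it
            }
        });
        t.start();
        try {
            t.join();             // the test thread waits for the worker...
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return failure[0];        // ...then inspects the outcome itself
    }
}
```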



[jira] [Commented] (ZOOKEEPER-1328) Misplaced assertion for the test case 'FLELostMessageTest' and not identifying misfunctions

2012-08-30 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445060#comment-13445060
 ] 

Mahadev konar commented on ZOOKEEPER-1328:
--

Thanks for the patience and quick response, Rakesh. Really appreciate that. The 
patch looks good to me. I'll let Hudson run through it and will go ahead and 
commit once it +1's.

 Misplaced assertion for the test case 'FLELostMessageTest' and not 
 identifying misfunctions
 ---

 Key: ZOOKEEPER-1328
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1328
 Project: ZooKeeper
  Issue Type: Test
  Components: leaderElection
Affects Versions: 3.4.0
Reporter: Rakesh R
Assignee: Rakesh R
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1328.1.patch, ZOOKEEPER-1328.patch


 The assertion for testLostMessage is kept inside the Thread.run() method. Because 
 of this, an assertion failure will not be reflected in the main test case; I have 
 observed that the test still passes even when the assertion fails or the code 
 misbehaves. Instead, the assertion can be moved to the test case itself, 
 testLostMessage.
 {noformat}
 class LEThread extends Thread {
     public void run() {
         peer.setCurrentVote(v);
         LOG.info("Finished election: " + i + ", " + v.getId());
         Assert.assertTrue("State is not leading.",
                 peer.getPeerState() == ServerState.LEADING);
     }
 }
 {noformat}



[jira] [Commented] (ZOOKEEPER-1536) c client : memory leak in winport.c

2012-08-30 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445172#comment-13445172
 ] 

Mahadev konar commented on ZOOKEEPER-1536:
--

Michi, is this committed to the 3.4 branch as well? 

 c client : memory leak in winport.c
 ---

 Key: ZOOKEEPER-1536
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1536
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.4.3
 Environment: windows7
Reporter: brooklin
Assignee: brooklin
 Fix For: 3.4.4

 Attachments: winport.c.patch


 At line 99 in winport.c, the Windows API InitializeCriticalSection is used, but 
 DeleteCriticalSection is never called.



[jira] [Commented] (ZOOKEEPER-1497) Allow server-side SASL login with JAAS configuration to be programmatically set (rather than only by reading JAAS configuration file)

2012-08-30 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445180#comment-13445180
 ] 

Mahadev konar commented on ZOOKEEPER-1497:
--

Pat, was this committed to the 3.4 branch? I don't see it. Maybe I missed it?

 Allow server-side SASL login with JAAS configuration to be programmatically 
 set (rather than only by reading JAAS configuration file)
 -

 Key: ZOOKEEPER-1497
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1497
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.3, 3.5.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
  Labels: security
 Fix For: 3.4.4, 3.5.0

 Attachments: ZOOKEEPER-1497-v1.patch, ZOOKEEPER-1497-v2.patch, 
 ZOOKEEPER-1497-v3.patch, ZOOKEEPER-1497-v4.patch, ZOOKEEPER-1497-v5.patch


 Currently the CnxnFactory checks for java.security.auth.login.config to 
 decide whether or not to enable SASL.
 * zookeeper/server/NIOServerCnxnFactory.java
 * zookeeper/server/NettyServerCnxnFactory.java
 ** configure() checks for java.security.auth.login.config
 *** If present, it starts a new Login("Server", new SaslServerCallbackHandler(conf))
 But since the SaslServerCallbackHandler does the right thing by just checking 
 whether getAppConfigurationEntry() is empty, we can allow the SASL JAAS 
 configuration to be set programmatically, by checking whether or not a 
 configuration entry is present instead of checking java.security.auth.login.config.
 (Something quite similar was done for the SaslClient in ZOOKEEPER-1373)
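To make "programmatically set" concrete: JAAS lets an application install a Configuration object in code instead of pointing java.security.auth.login.config at a file. A minimal sketch on the stock JDK API follows; the "Server" section name matches what the ZooKeeper server looks up, while the login module class name is a placeholder, not a real module:

```java
import java.util.Collections;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

// Sketch: install a JAAS Configuration in code. The login module name below
// is hypothetical; only the Configuration/AppConfigurationEntry API is real.
public class ProgrammaticJaasDemo {
    public static void main(String[] args) {
        Configuration.setConfiguration(new Configuration() {
            @Override
            public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
                if (!"Server".equals(name)) {
                    return null; // unknown section behaves like a missing entry
                }
                Map<String, ?> options = Collections.emptyMap();
                return new AppConfigurationEntry[] {
                    new AppConfigurationEntry(
                        "com.example.PlaceholderLoginModule", // hypothetical module
                        AppConfigurationEntry.LoginModuleControlFlag.REQUIRED,
                        options)
                };
            }
        });

        // The kind of check the patch relies on: is a "Server" entry present?
        AppConfigurationEntry[] entries =
            Configuration.getConfiguration().getAppConfigurationEntry("Server");
        System.out.println("serverEntryPresent="
                + (entries != null && entries.length == 1));
    }
}
```

Checking getAppConfigurationEntry("Server") for a non-empty result covers both the file-based and the programmatic case, which is the point of the change.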



[jira] [Commented] (ZOOKEEPER-1497) Allow server-side SASL login with JAAS configuration to be programmatically set (rather than only by reading JAAS configuration file)

2012-08-30 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445394#comment-13445394
 ] 

Mahadev konar commented on ZOOKEEPER-1497:
--

Nevermind, I see it now. Mistake on my side!

 Allow server-side SASL login with JAAS configuration to be programmatically 
 set (rather than only by reading JAAS configuration file)
 -

 Key: ZOOKEEPER-1497
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1497
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.3, 3.5.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
  Labels: security
 Fix For: 3.4.4, 3.5.0

 Attachments: ZOOKEEPER-1497-v1.patch, ZOOKEEPER-1497-v2.patch, 
 ZOOKEEPER-1497-v3.patch, ZOOKEEPER-1497-v4.patch, ZOOKEEPER-1497-v5.patch


 Currently the CnxnFactory checks for java.security.auth.login.config to 
 decide whether or not to enable SASL.
 * zookeeper/server/NIOServerCnxnFactory.java
 * zookeeper/server/NettyServerCnxnFactory.java
 ** configure() checks for java.security.auth.login.config
 *** If present, it starts a new Login("Server", new SaslServerCallbackHandler(conf))
 But since the SaslServerCallbackHandler does the right thing by just checking 
 whether getAppConfigurationEntry() is empty, we can allow the SASL JAAS 
 configuration to be set programmatically, by checking whether or not a 
 configuration entry is present instead of checking java.security.auth.login.config.
 (Something quite similar was done for the SaslClient in ZOOKEEPER-1373)



[jira] [Updated] (ZOOKEEPER-1359) ZkCli create command data and acl parts should be optional.

2012-08-30 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1359:
-

Fix Version/s: (was: 3.4.4)
   3.4.5

Moving it out since it's not a blocker.

 ZkCli create command data and acl parts should be optional.
 ---

 Key: ZOOKEEPER-1359
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1359
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Reporter: kavita sharma
Assignee: kavita sharma
Priority: Trivial
  Labels: new
 Fix For: 3.5.0, 3.4.5


 In zkCli, if we create a node without data, the node still gets created; but the 
 commandMap shows
 {noformat}
  commandMap.put("create", "[-s] [-e] path data acl");
 {noformat}
 which means the data and acl parts are not optional. We need to make these 
 parts optional.
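The fix amounts to treating the trailing arguments as optional with sensible defaults. A hypothetical parsing sketch (not zkCli's actual parser; the default ACL string is illustrative):

```java
// Hypothetical sketch of parsing "create [-s] [-e] path [data] [acl]",
// where data and acl get defaults when omitted. Not zkCli's real code.
public class CreateArgsDemo {
    static String describe(String[] args) {
        int i = 0;
        boolean sequential = false, ephemeral = false;
        while (i < args.length && args[i].startsWith("-")) { // consume flags first
            if (args[i].equals("-s")) sequential = true;
            if (args[i].equals("-e")) ephemeral = true;
            i++;
        }
        String path = args[i++];                                    // required
        String data = i < args.length ? args[i++] : "";             // optional
        String acl  = i < args.length ? args[i++] : "world:anyone:cdrwa"; // optional
        return path + "|" + data + "|" + acl + "|" + sequential + "|" + ephemeral;
    }

    public static void main(String[] args) {
        System.out.println(describe(new String[] {"/a"}));
        System.out.println(describe(new String[] {"-e", "/a", "hello"}));
    }
}
```

The first call shows a bare path working with defaults; the second shows flags and data supplied while acl still defaults.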



[jira] [Updated] (ZOOKEEPER-1378) Provide option to turn off sending of diffs

2012-08-30 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1378:
-

Fix Version/s: (was: 3.4.4)
   3.5.0

Moving it out to 3.5. I think we should mark it as won't fix, but I'll keep it 
open for now.

 Provide option to turn off sending of diffs
 ---

 Key: ZOOKEEPER-1378
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1378
 Project: ZooKeeper
  Issue Type: Task
Reporter: Zhihong Ted Yu
 Fix For: 3.5.0


 From Patrick:
 we need to have an option to turn off sending of diffs. There are a couple of 
 really strong reasons I can think of to do this:
 1) 3.3.x is broken in a similar way; there is an upgrade problem we can't 
 solve short of having people first upgrade to a fixed 3.3 (3.3.5, say) and then 
 upgrade to 3.4.x. If we could turn off diff sending, this would address the 
 problem.
 2) Safety valve. Say we find another new problem with diff sending in 
 3.4/3.5. Having an option to turn it off would be useful for people as a 
 workaround until a fix is found and released.



[jira] [Updated] (ZOOKEEPER-1462) Read-only server does not initialize database properly

2012-08-29 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1462:
-

Fix Version/s: (was: 3.4.3)
   3.4.4

Thawan, would you be able to add a unit test for this?

 Read-only server does not initialize database properly
 --

 Key: ZOOKEEPER-1462
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1462
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.3
Reporter: Thawan Kooburat
Assignee: Thawan Kooburat
Priority: Critical
 Fix For: 3.4.4, 3.5.0

 Attachments: ZOOKEEPER-1462.patch


 Brief Description:
 When a participant or observer gets partitioned and restarts as a read-only 
 server, the ZkDb doesn't get reinitialized. This causes the RO server to drop 
 any incoming request with zxid > 0.
 Error message:
 Refusing session request for client /xx.xx.xx.xx:39875 
 as it has seen zxid 0x2e00405fd9 our last zxid is 0x0 client must try another 
 server
 Steps to reproduce:
 Start an RO-enabled observer connecting to an ensemble. Kill the ensemble and 
 wait until the observer restarts in RO mode. The zxid of this observer should be 0.
 Description:
 Before a server transitions into the LOOKING state, its database gets closed as 
 part of the shutdown sequence. The databases of the leader, follower, and 
 observer get initialized as a side effect of participating in the leader 
 election protocol (e.g. an observer will call registerWithLeader(), which calls 
 getLastLoggedZxid(), which initializes the DB if it isn't already).
 However, an RO server does not participate in this protocol, so its DB doesn't 
 get initialized properly.
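The failure mode described here is a lazy-initialization gap: one code path (leader election) initializes the database as a side effect, and another (the RO server) never triggers it. A minimal self-contained sketch of the guard pattern follows; FakeDb and its contents are stand-ins, not ZooKeeper's ZKDatabase internals:

```java
// Sketch of the lazy-initialization guard the description implies:
// initialize on first use if no other code path has done so already.
public class LazyDbGuardDemo {
    static class FakeDb {
        private boolean initialized;
        private long lastZxid;

        synchronized long getLastLoggedZxid() {
            if (!initialized) {
                load(); // the side effect leader election normally relies on
            }
            return lastZxid;
        }

        private void load() {
            lastZxid = 0x20075L; // pretend we replayed the transaction log
            initialized = true;
        }
    }

    public static void main(String[] args) {
        FakeDb db = new FakeDb();
        // A read-only server skips leader election, so nothing has called
        // getLastLoggedZxid() yet; with the guard, the first call initializes.
        System.out.println("zxid=0x" + Long.toHexString(db.getLastLoggedZxid()));
    }
}
```

Without such a guard, the RO server would report a last zxid of 0 and refuse clients that have seen later transactions, which is exactly the error message quoted above.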
  



[jira] [Updated] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

2012-08-29 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1496:
-

Priority: Critical  (was: Major)

 Ephemeral node not getting cleared even after client has exited
 ---

 Key: ZOOKEEPER-1496
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.3
Reporter: suja s
Assignee: Rakesh R
Priority: Critical
 Fix For: 3.4.4, 3.5.0

 Attachments: Logs.rar, ZOOKEEPER-1496.1.patch, 
 ZOOKEEPER-1496.2.patch, ZOOKEEPER-1496.patch


 In one of the tests we performed, we came across a case where an ephemeral node 
 was not getting cleared from ZooKeeper even though the client had exited.
 Zk version: 3.4.3
 Ephemeral node still exists in Zookeeper: 
 HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
 Tue Jun 26 16:07:04 IST 2012 
 HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server 
 xx.xx.xx.55:2182 
 Connecting to xx.xx.xx.55:2182 
 Welcome to ZooKeeper! 
 JLine support is enabled 
 [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
 WATCHER:: 
 WatchedEvent state:SyncConnected type:None path:null 
 [zk: xx.xx.xx.55:2182(CONNECTED) 0] get 
 /hadoop-ha/hacluster/ActiveStandbyElectorLock 
 haclusternn2HOSt-xx-xx-xx-102 �� 
 cZxid = 0x20075 
 ctime = Tue Jun 26 13:10:19 IST 2012 
 mZxid = 0x20075 
 mtime = Tue Jun 26 13:10:19 IST 2012 
 pZxid = 0x20075 
 cversion = 0 
 dataVersion = 0 
 aclVersion = 0 
 ephemeralOwner = 0x1382791d4e50004 
 dataLength = 42 
 numChildren = 0 
 [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
 Grepped the logs on the ZK side for session 0x1382791d4e50004 - the close-session 
 and a later create arrived before the close-session was processed. 
 HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E 
 "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" * | grep 
 0x20074 
 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 
 cport:-1)::CommitProcessor@171] - Processing request:: 
 sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x20074 
 txntype:-11 reqpath:n/a 
 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 
 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 
 type:closeSession cxid:0x0 zxid:0x20074 txntype:-11 reqpath:n/a 
 2012-06-26 13:10:19,919 [myid:3] - DEBUG 
 [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing 
 request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 
 zxid:0x20074 txntype:-11 reqpath:n/a 
 2012-06-26 13:10:20,608 [myid:3] - DEBUG 
 [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: 
 sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x20074 
 txntype:-11 reqpath:n/a 
 HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E 
 "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" * | grep 
 0x20075 
 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 
 cport:-1)::CommitProcessor@171] - Processing request:: 
 sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x20075 txntype:1 
 reqpath:n/a 
 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 
 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create 
 cxid:0x2 zxid:0x20075 txntype:1 reqpath:n/a 
 2012-06-26 13:10:20,278 [myid:3] - DEBUG 
 [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing 
 request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x20075 
 txntype:1 reqpath:n/a 
 2012-06-26 13:10:20,752 [myid:3] - DEBUG 
 [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: 
 sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x20075 txntype:1 
 reqpath:n/a 
  The close-session and create requests arrived almost in parallel. 
 Env:
 Hadoop setup.
 We were using Namenode HA with bookkeeper as shared storage and auto failover 
 enabled.
 NN102 was active and NN55 was standby. 
 The FailoverController at 102 got shut down due to a ZK connection error. 
 The ActiveStandbyElectorLock (an ephemeral node) created by this 
 FailoverController was not cleared from ZK.



[jira] [Updated] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

2012-08-29 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1496:
-

Component/s: server

 Ephemeral node not getting cleared even after client has exited
 ---

 Key: ZOOKEEPER-1496
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.3
Reporter: suja s
Assignee: Rakesh R
Priority: Critical
 Fix For: 3.4.4, 3.5.0

 Attachments: Logs.rar, ZOOKEEPER-1496.1.patch, 
 ZOOKEEPER-1496.2.patch, ZOOKEEPER-1496.patch





[jira] [Updated] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

2012-08-29 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1496:
-

Fix Version/s: 3.4.4

This looks like a critical bugfix.

 Ephemeral node not getting cleared even after client has exited
 ---

 Key: ZOOKEEPER-1496
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.3
Reporter: suja s
Assignee: Rakesh R
 Fix For: 3.4.4, 3.5.0

 Attachments: Logs.rar, ZOOKEEPER-1496.1.patch, 
 ZOOKEEPER-1496.2.patch, ZOOKEEPER-1496.patch





[jira] [Updated] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

2012-08-29 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1496:
-

Fix Version/s: 3.5.0

 Ephemeral node not getting cleared even after client has exited
 ---

 Key: ZOOKEEPER-1496
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.3
Reporter: suja s
Assignee: Rakesh R
Priority: Critical
 Fix For: 3.4.4, 3.5.0

 Attachments: Logs.rar, ZOOKEEPER-1496.1.patch, 
 ZOOKEEPER-1496.2.patch, ZOOKEEPER-1496.patch




