ZooKeeper-trunk-jdk7 - Build # 430 - Failure
See https://builds.apache.org/job/ZooKeeper-trunk-jdk7/430/

### ## LAST 60 LINES OF THE CONSOLE ###
[...truncated 252338 lines...]
    [junit] 2012-10-24 09:55:57,187 [myid:] - INFO [main:ClientBase@427] - STOPPING server
    [junit] 2012-10-24 09:55:57,188 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@224] - NIOServerCnxn factory exited run method
    [junit] 2012-10-24 09:55:57,188 [myid:] - INFO [main:ZooKeeperServer@399] - shutting down
    [junit] 2012-10-24 09:55:57,188 [myid:] - INFO [main:SessionTrackerImpl@225] - Shutting down
    [junit] 2012-10-24 09:55:57,188 [myid:] - INFO [main:PrepRequestProcessor@733] - Shutting down
    [junit] 2012-10-24 09:55:57,188 [myid:] - INFO [main:SyncRequestProcessor@175] - Shutting down
    [junit] 2012-10-24 09:55:57,188 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@142] - PrepRequestProcessor exited loop!
    [junit] 2012-10-24 09:55:57,188 [myid:] - INFO [SyncThread:0:SyncRequestProcessor@155] - SyncRequestProcessor exited!
    [junit] 2012-10-24 09:55:57,189 [myid:] - INFO [main:FinalRequestProcessor@411] - shutdown of request processor complete
    [junit] 2012-10-24 09:55:57,189 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
    [junit] 2012-10-24 09:55:57,190 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[]
    [junit] 2012-10-24 09:55:57,191 [myid:] - INFO [main:ClientBase@420] - STARTING server
    [junit] 2012-10-24 09:55:57,191 [myid:] - INFO [main:ZooKeeperServer@147] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-jdk7/trunk/build/test/tmp/test6841446492964033919.junit.dir/version-2 snapdir /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-jdk7/trunk/build/test/tmp/test6841446492964033919.junit.dir/version-2
    [junit] 2012-10-24 09:55:57,192 [myid:] - INFO [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:11221
    [junit] 2012-10-24 09:55:57,192 [myid:] - INFO [main:FileSnap@83] - Reading snapshot /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-jdk7/trunk/build/test/tmp/test6841446492964033919.junit.dir/version-2/snapshot.b
    [junit] 2012-10-24 09:55:57,194 [myid:] - INFO [main:FileTxnSnapLog@270] - Snapshotting: 0xb to /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-jdk7/trunk/build/test/tmp/test6841446492964033919.junit.dir/version-2/snapshot.b
    [junit] 2012-10-24 09:55:57,196 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
    [junit] 2012-10-24 09:55:57,197 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:49287
    [junit] 2012-10-24 09:55:57,197 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@821] - Processing stat command from /127.0.0.1:49287
    [junit] 2012-10-24 09:55:57,198 [myid:] - INFO [Thread-4:NIOServerCnxn$StatCommand@655] - Stat command output
    [junit] 2012-10-24 09:55:57,198 [myid:] - INFO [Thread-4:NIOServerCnxn@1001] - Closed socket connection for client /127.0.0.1:49287 (no session established for client)
    [junit] 2012-10-24 09:55:57,198 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[InMemoryDataTree, StandaloneServer_port]
    [junit] 2012-10-24 09:55:57,200 [myid:] - INFO [main:JMXEnv@105] - expect:InMemoryDataTree
    [junit] 2012-10-24 09:55:57,200 [myid:] - INFO [main:JMXEnv@108] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
    [junit] 2012-10-24 09:55:57,200 [myid:] - INFO [main:JMXEnv@105] - expect:StandaloneServer_port
    [junit] 2012-10-24 09:55:57,200 [myid:] - INFO [main:JMXEnv@108] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1
    [junit] 2012-10-24 09:55:57,201 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD testQuota
    [junit] 2012-10-24 09:55:57,201 [myid:] - INFO [main:ClientBase@457] - tearDown starting
    [junit] 2012-10-24 09:55:57,273 [myid:] - INFO [main:ZooKeeper@684] - Session: 0x13a923327c8 closed
    [junit] 2012-10-24 09:55:57,273 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@509] - EventThread shut down
    [junit] 2012-10-24 09:55:57,273 [myid:] - INFO [main:ClientBase@427] - STOPPING server
    [junit] 2012-10-24 09:55:57,274 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@224] - NIOServerCnxn factory exited run method
    [junit] 2012-10-24 09:55:57,274 [myid:] - INFO [main:ZooKeeperServer@399] - shutting down
    [junit] 2012-10-24 09:55:57,275 [myid:] - INFO [main:SessionTrackerImpl@225] - Shutting down
    [junit] 2012-10-24 09:55:57,275 [myid:] -
[jira] [Commented] (ZOOKEEPER-1568) multi should have a non-transaction version
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483136#comment-13483136 ]

Flavio Junqueira commented on ZOOKEEPER-1568:
---------------------------------------------

Hi Jimmy, I'm trying to understand why submitting operations asynchronously is not sufficient for your case. Why do you need to use multi in this case?

multi should have a non-transaction version
-------------------------------------------

    Key: ZOOKEEPER-1568
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1568
    Project: ZooKeeper
    Issue Type: Improvement
    Reporter: Jimmy Xiang

Currently multi is transactional, i.e. all or none. However, sometimes we don't want that: we want all operations to be executed. Even if some operation(s) fail, it is ok; we just need to know the result of each operation.
[jira] [Commented] (ZOOKEEPER-1568) multi should have a non-transaction version
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483311#comment-13483311 ]

Jimmy Xiang commented on ZOOKEEPER-1568:
----------------------------------------

Hi Flavio, for our use case we need to create/setData hundreds or thousands of znodes. By submitting operations asynchronously, we still have to issue them one by one. If we could do it in batches, we would save lots of network trips.
[jira] [Commented] (ZOOKEEPER-1568) multi should have a non-transaction version
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483347#comment-13483347 ]

Flavio Junqueira commented on ZOOKEEPER-1568:
---------------------------------------------

In my view, the asynchronous API was designed to address exactly use cases like yours. I don't think you should suffer any severe penalty by using the asynchronous API. Have you actually tried it and had any issues with it?
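For readers following the thread: the pattern Flavio refers to relies on ZooKeeper's asynchronous create, which queues every request on the session socket without waiting for individual replies. A minimal sketch of that pattern, assuming illustrative paths and data (not taken from the thread):

{code}
import java.util.List;
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.AsyncCallback.StringCallback;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException.Code;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

// Queue every create on the session socket without waiting for replies;
// the client pipelines them, so there is no per-op round-trip stall.
void createAll(ZooKeeper zk, List<String> paths, final byte[] data)
        throws InterruptedException {
    final CountDownLatch done = new CountDownLatch(paths.size());
    StringCallback cb = new StringCallback() {
        public void processResult(int rc, String path, Object ctx, String name) {
            if (rc != Code.OK.intValue()) {
                System.err.println("create failed for " + path + ": " + Code.get(rc));
            }
            done.countDown(); // each op still reports its own result
        }
    };
    for (String path : paths) {
        zk.create(path, data, Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT, cb, null);
    }
    done.await(); // wait for all outstanding replies
}
{code}

Note that even pipelined this way, each op is still one request on the wire, which is the overhead Jimmy's batching proposal targets.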
[jira] [Commented] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483566#comment-13483566 ]

Patrick Hunt commented on ZOOKEEPER-1560:
-----------------------------------------

That's a good point; the while loop in the patch seems like it would block when the tcp buffer is full (e.g. if the server is slow to read). I don't think that's a good idea. Rather, we should have the code structured similarly to what it was before: write as much as possible and then use the selector to wait for the socket to become writeable again. Eventually the send buffer will drain and we can remove the packet from the queue.

Zookeeper client hangs on creation of large nodes
-------------------------------------------------

    Key: ZOOKEEPER-1560
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
    Project: ZooKeeper
    Issue Type: Bug
    Components: java client
    Affects Versions: 3.4.4, 3.5.0
    Reporter: Igor Motov
    Assignee: Ted Yu
    Fix For: 3.5.0, 3.4.5
    Attachments: ZOOKEEPER-1560.patch, zookeeper-1560-v1.txt, zookeeper-1560-v2.txt, zookeeper-1560-v3.txt, zookeeper-1560-v4.txt, zookeeper-1560-v5.txt, zookeeper-1560-v6.txt, zookeeper-1560-v7.txt

To reproduce, try creating a node with 0.5M of data using the java client. The test will hang waiting for a response from the server. See the attached patch for the test that reproduces the issue. It seems that ZOOKEEPER-1437 introduced a few issues to {{ClientCnxnSocketNIO.doIO}} that prevent {{ClientCnxnSocketNIO}} from sending large packets that require several invocations of {{SocketChannel.write}} to complete. The first issue is that the call to {{outgoingQueue.removeFirstOccurrence(p);}} removes the packet from the queue even if the packet wasn't completely sent yet; it looks to me that this call should be moved under {{if (!pbb.hasRemaining())}}. The second issue is that {{p.createBB()}} reinitializes the {{ByteBuffer}} on every iteration, which confuses {{SocketChannel.write}}. And the third issue is caused by extra calls to {{cnxn.getXid()}} that increment the xid on every iteration and confuse the server.
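The "write as much as possible, then let the selector wake us up" structure Patrick describes is the standard non-blocking NIO idiom. A generic sketch of it, deliberately independent of the ZooKeeper client classes under discussion (the queue and key handling here are illustrative, not the actual client code):

{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.SocketChannel;
import java.util.Queue;

// Called when the selector reports the socket writable: write what the TCP
// buffer will take, keep the packet queued until fully sent, and use the
// OP_WRITE interest bit - not a busy loop - to resume later.
void handleWritable(SocketChannel sock, SelectionKey key, Queue<ByteBuffer> outgoing)
        throws IOException {
    ByteBuffer bb = outgoing.peek();
    if (bb != null) {
        sock.write(bb);          // may send only part of the buffer
        if (!bb.hasRemaining()) {
            outgoing.poll();     // fully drained: safe to remove now
        }
    }
    if (outgoing.isEmpty()) {
        key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
    } else {
        key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
    }
}
{code}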
[jira] [Commented] (ZOOKEEPER-1568) multi should have a non-transaction version
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483590#comment-13483590 ]

Marshall McMullen commented on ZOOKEEPER-1568:
----------------------------------------------

I actually think there is a valid use case for this, mostly for performance reasons. Because a multi is one transaction, it causes less perturbation of the distributed and replicated state of zookeeper than multiple individual operations outside a multi. With a multi:

- You only pay the cost of the RPC overhead once rather than on each individual operation
- You get one flush of the leader channel rather than multiple ones, one for each write operation
- A multi will case one new snapshot/log to be generated rather than multiple ones, one for each operation

There are other, non-performance reasons that make this worthwhile too; e.g., if it makes the programmer's job easier to use a multi with these semantics, then that's a win. In other distributed databases I've worked on, we used different terminology to distinguish between a multi op where everything succeeds or fails together and one where that is not guaranteed. We used the term "Batch" to imply we were batching up operations but there was no guarantee they'd all succeed/fail.
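For contrast with the proposal, this is what today's transactional multi looks like from the Java client; paths and data are placeholders. All three ops travel in one request and commit, or fail, as a unit:

{code}
import java.util.Arrays;
import java.util.List;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.OpResult;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

// Three writes travel in one request and commit as one quorum proposal;
// if any op fails, none are applied (the all-or-nothing semantics at issue).
List<OpResult> batchWrite(ZooKeeper zk, byte[] data)
        throws KeeperException, InterruptedException {
    return zk.multi(Arrays.asList(
            Op.create("/app/a", data, Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT),
            Op.setData("/app/b", data, -1),   // -1: match any version
            Op.delete("/app/c", -1)));
}
{code}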
[jira] [Commented] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483592#comment-13483592 ]

Nikita Vetoshkin commented on ZOOKEEPER-1560:
---------------------------------------------

If no one can prepend {{outgoingQueue}} with a packet, a straightforward implementation like this should work:

{noformat}
diff --git a/src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java b/src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java
index 70d8538..457c8cc 100644
--- a/src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java
+++ b/src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java
@@ -111,17 +111,20 @@ public class ClientCnxnSocketNIO extends ClientCnxnSocket {
                     cnxn.sendThread.clientTunneledAuthenticationInProgress());
             if (p != null) {
-                outgoingQueue.removeFirstOccurrence(p);
                 updateLastSend();
-                if ((p.requestHeader != null) &&
-                        (p.requestHeader.getType() != OpCode.ping) &&
-                        (p.requestHeader.getType() != OpCode.auth)) {
-                    p.requestHeader.setXid(cnxn.getXid());
+                if (p.bb != null) {
+                    if ((p.requestHeader != null) &&
+                            (p.requestHeader.getType() != OpCode.ping) &&
+                            (p.requestHeader.getType() != OpCode.auth)) {
+                        p.requestHeader.setXid(cnxn.getXid());
+                    }
+                    p.createBB();
+                    // otherwise we're in the middle of sending packet
                 }
-                p.createBB();
                 ByteBuffer pbb = p.bb;
                 sock.write(pbb);
                 if (!pbb.hasRemaining()) {
+                    outgoingQueue.removeFirstOccurrence(p);
                     sentCount++;
                     if (p.requestHeader != null
                             && p.requestHeader.getType() != OpCode.ping
{noformat}
[jira] [Commented] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483597#comment-13483597 ]

Ted Yu commented on ZOOKEEPER-1560:
-----------------------------------

Looking at createBB(), upon exit the field bb wouldn't be null. I wonder why p.createBB() is enclosed in the if (p.bb != null) block above?
[jira] [Commented] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483601#comment-13483601 ]

Ted Yu commented on ZOOKEEPER-1560:
-----------------------------------

bq. similar to what it was before - write as much as possible and then use the selector to wait for the socket to become writeable again

I looked at the svn log for ClientCnxnSocketNIO.java back to 2011-04-12 and didn't seem to find the above change. FYI
[jira] [Commented] (ZOOKEEPER-1568) multi should have a non-transaction version
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483618#comment-13483618 ]

Ted Yu commented on ZOOKEEPER-1568:
-----------------------------------

bq. A multi will case one new snapshot/log to be generated

I guess you meant 'cause' above.

bq. but there was no guarantee they'd all succeed/fail.

I think we need to formalize how success / failure status for individual operations in this new multi API should be delivered back to the client.
[jira] [Commented] (ZOOKEEPER-1568) multi should have a non-transaction version
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483626#comment-13483626 ]

Marshall McMullen commented on ZOOKEEPER-1568:
----------------------------------------------

Yes, I meant 'cause' :). The existing multi code fills in a list of results, one for each op. Right now it aborts on the first op that fails and rolls back the data tree to what it was before it started, and it explicitly marks all ops after that point in the results list with a runtime exception. So the mechanism is already there to communicate the errors back to the client. I suppose the multi code would then need to take a bool to indicate whether it is all-or-nothing or not.
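A sketch of how a client could consume the per-op results machinery Marshall describes, assuming a hypothetical non-transactional multi that returns the same List<OpResult>, with failed or skipped ops represented as OpResult.ErrorResult entries:

{code}
import java.util.List;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.OpResult;

// Inspect a results list in which failed/skipped ops appear as ErrorResult
// entries ('results' would come from the hypothetical batch variant).
void report(List<OpResult> results) {
    for (OpResult r : results) {
        if (r instanceof OpResult.ErrorResult) {
            int err = ((OpResult.ErrorResult) r).getErr();
            System.err.println("op failed: " + KeeperException.Code.get(err));
        }
    }
}
{code}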
[jira] [Commented] (ZOOKEEPER-1568) multi should have a non-transaction version
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483635#comment-13483635 ]

Ted Yu commented on ZOOKEEPER-1568:
-----------------------------------

bq. it aborts on the first op that fails and rolls back

Should we allow operations after the failed operation to continue? The rationale is that the operations in the batch may not have dependencies among them.
[jira] [Commented] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483725#comment-13483725 ]

Patrick Hunt commented on ZOOKEEPER-1560:
-----------------------------------------

bq. PDH: similar to what it was before - write as much as possible and then use the selector to wait for the socket to become writeable again

bq. Ted: I looked at svn log for ClientCnxnSocketNIO.java back to 2011-04-12 and didn't seem to find the above change. FYI

Hi Ted, the following is what I was referring to. This is from the latest on branch-3.3; 3.4.4 has a similar (although broken) block where it's a bit less obvious what's happening, branch-3.3 is clearer. Notice that we first attempt to write; if !remaining then we remove from the queue, otherwise we'll wait till the next time the selector wakes us up (the final isEmpty check is pretty critical here as well, to set interest correctly) and retry until the buffer is drained.

{noformat}
if (sockKey.isWritable()) {
    synchronized (outgoingQueue) {
        if (!outgoingQueue.isEmpty()) {
            ByteBuffer pbb = outgoingQueue.getFirst().bb;
            sock.write(pbb);
            if (!pbb.hasRemaining()) {
                sentCount++;
                Packet p = outgoingQueue.removeFirst();
                if (p.header != null
                        && p.header.getType() != OpCode.ping
                        && p.header.getType() != OpCode.auth) {
                    pendingQueue.add(p);
                }
            }
        }
    }
}
if (outgoingQueue.isEmpty()) {
    disableWrite();
} else {
    enableWrite();
}
{noformat}
[jira] [Commented] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483737#comment-13483737 ]

Ted Yu commented on ZOOKEEPER-1560:
-----------------------------------

I got the following based on the above code snippet:

{code}
Index: src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java
===================================================================
--- src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java (revision 1401904)
+++ src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java (working copy)
@@ -111,18 +111,18 @@
                     cnxn.sendThread.clientTunneledAuthenticationInProgress());
             if (p != null) {
-                outgoingQueue.removeFirstOccurrence(p);
                 updateLastSend();
                 if ((p.requestHeader != null) &&
                         (p.requestHeader.getType() != OpCode.ping) &&
                         (p.requestHeader.getType() != OpCode.auth)) {
                     p.requestHeader.setXid(cnxn.getXid());
                 }
-                p.createBB();
+                if (p.bb == null) p.createBB();
                 ByteBuffer pbb = p.bb;
                 sock.write(pbb);
                 if (!pbb.hasRemaining()) {
                     sentCount++;
+                    outgoingQueue.removeFirstOccurrence(p);
                     if (p.requestHeader != null
                             && p.requestHeader.getType() != OpCode.ping
                             && p.requestHeader.getType() != OpCode.auth) {
@@ -141,8 +141,12 @@
             synchronized(pendingQueue) {
                 pendingQueue.addAll(pending);
             }
-        }
+            if (outgoingQueue.isEmpty()) {
+                disableWrite();
+            } else {
+                enableWrite();
+            }
         }

     private Packet findSendablePacket(LinkedList<Packet> outgoingQueue,
{code}

I still saw testLargeNodeData fail:

{code}
Testcase: testLargeNodeData took 0.714 sec
	Caused an ERROR
KeeperErrorCode = ConnectionLoss for /large
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /large
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
	at org.apache.zookeeper.test.ClientTest.testLargeNodeData(ClientTest.java:61)
{code}
[jira] [Commented] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483748#comment-13483748 ]

Igor Motov commented on ZOOKEEPER-1560:
---------------------------------------

{quote}
For problem #3, I only found one call to getXid() in doIO:
{code}
p.requestHeader.setXid(cnxn.getXid());
{code}
which is not in a loop. Some clarification would be nice.
{quote}

It's in the outer loop, so to speak. If the packet is large and is sent in chunks, the xid is incremented for every chunk. Before ZOOKEEPER-1437 it was incremented once per packet.
Re: Review Request: patch for ZOOKEEPER-1560: Zookeeper client hangs on creation of large nodes
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7730/#review12743
-----------------------------------------------------------

src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java
https://reviews.apache.org/r/7730/#comment27199

    Don't setXid >1x for the same packet

src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java
https://reviews.apache.org/r/7730/#comment27200

    Don't createBB >1x for the same packet

src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java
https://reviews.apache.org/r/7730/#comment27203

    Remove p from outgoingQueue only after we have finished writing it

src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java
https://reviews.apache.org/r/7730/#comment27205

    Always pick the first packet if we already started writing it, so we finish writing it

- Skye Wanderman-Milne

On Oct. 25, 2012, 12:50 a.m., Skye Wanderman-Milne wrote:

    -----------------------------------------------------------
    This is an automatically generated e-mail. To reply, visit:
    https://reviews.apache.org/r/7730/
    -----------------------------------------------------------

    (Updated Oct. 25, 2012, 12:50 a.m.)

    Review request for zookeeper, Patrick Hunt and Ted Yu.

    Description
    -----------

    see ZOOKEEPER-1560 JIRA

    This addresses bug ZOOKEEPER-1560.
        https://issues.apache.org/jira/browse/ZOOKEEPER-1560

    Diffs
    -----

      src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java 70d8538

    Diff: https://reviews.apache.org/r/7730/diff/

    Testing
    -------

    unit tests (including testLargeNodeData from ZOOKEEPER-1560 JIRA)

    Thanks,

    Skye Wanderman-Milne
[jira] [Updated] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Skye Wanderman-Milne updated ZOOKEEPER-1560:
--------------------------------------------

    Attachment: ZOOKEEPER-1560-v8.patch

I've created a new patch (ZOOKEEPER-1560-v8.patch) that incorporates what we have so far (moving removeFirstOccurrence to after the packet is completely written, only calling createBB when a BB doesn't already exist, and only calling setXid when no xid is already set). It also modifies findSendablePacket to always choose the first packet if it is partially written. The only place that a packet is prepended to outgoingQueue is ClientCnxn.primeConnection, which should only happen at the very beginning, so a partially-written packet should remain at the beginning of the queue until it is removed. I also cleaned up some of the code, so the changes look more extensive than they really are :) Posted at https://reviews.apache.org/r/7730. I added comments to mark the important parts (as opposed to the clean up).
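The findSendablePacket rule Skye describes, reduced to a standalone sketch; this mirrors, but is not, the v8 patch, and the Packet class here is a stand-in with only the field the rule needs:

{code}
import java.util.LinkedList;

// Stand-in for the client's internal packet type.
class Packet {
    java.nio.ByteBuffer bb; // null until the packet is serialized for sending
}

Packet findSendablePacket(LinkedList<Packet> outgoingQueue) {
    if (outgoingQueue.isEmpty()) {
        return null;
    }
    Packet first = outgoingQueue.getFirst();
    if (first.bb != null && first.bb.position() > 0) {
        // Mid-send: the head packet must be finished before anything else,
        // otherwise its remaining bytes would interleave with another packet.
        return first;
    }
    // ... otherwise fall through to the normal selection logic ...
    return first;
}
{code}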
Failed: ZOOKEEPER-1560 PreCommit Build #1238
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1238/

### ## LAST 60 LINES OF THE CONSOLE ###
[...truncated 257285 lines...]
     [exec]
     [exec] -1 overall. Here are the results of testing the latest attachment
     [exec] http://issues.apache.org/jira/secure/attachment/12550725/ZOOKEEPER-1560-v8.patch
     [exec] against trunk revision 1391526.
     [exec]
     [exec] +1 @author. The patch does not contain any @author tags.
     [exec]
     [exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
     [exec] Please justify why no new tests are needed for this patch.
     [exec] Also please list what manual steps were performed to verify this patch.
     [exec]
     [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
     [exec]
     [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
     [exec]
     [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
     [exec]
     [exec] +1 core tests. The patch passed core unit tests.
     [exec]
     [exec] +1 contrib tests. The patch passed contrib unit tests.
     [exec]
     [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1238//testReport/
     [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1238//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
     [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1238//console
     [exec]
     [exec] This message is automatically generated.
     [exec]
     [exec] ==
     [exec] ==
     [exec] Adding comment to Jira.
     [exec] ==
     [exec] ==
     [exec]
     [exec] Comment added.
     [exec] 61ekK3nG4J logged out
     [exec]
     [exec] ==
     [exec] ==
     [exec] Finished build.
     [exec] ==
     [exec] ==

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1568: exec returned: 1

Total time: 27 minutes 35 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Description set: ZOOKEEPER-1560
Email was triggered for: Failure
Sending email for trigger: Failure

### ## FAILED TESTS (if any) ##
All tests passed
[jira] [Commented] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483809#comment-13483809 ]

Hadoop QA commented on ZOOKEEPER-1560:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12550725/ZOOKEEPER-1560-v8.patch
against trunk revision 1391526.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
    Please justify why no new tests are needed for this patch.
    Also please list what manual steps were performed to verify this patch.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    +1 core tests. The patch passed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1238//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1238//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1238//console

This message is automatically generated.
Re: Review Request: patch for ZOOKEEPER-1560: Zookeeper client hangs on creation of large nodes
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7730/#review12751
-----------------------------------------------------------

Ship it!

I think this patch nicely summarizes the collective feedback for this JIRA. Minor comments below.

src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java
https://reviews.apache.org/r/7730/#comment27226

    'p.bb will already exist' - 'p.bb would not be null'

src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java
https://reviews.apache.org/r/7730/#comment27227

    'we already starting' - 'we have already started'

src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java
https://reviews.apache.org/r/7730/#comment27228

    Remove the whitespace introduced.

- Ted Yu
[jira] [Commented] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483823#comment-13483823 ]

Ted Yu commented on ZOOKEEPER-1560:
-----------------------------------

I left some minor comments on the review board. Nice work, Skye.
[jira] [Commented] (ZOOKEEPER-1437) Client uses session before SASL authentication complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483888#comment-13483888 ]

Eugene Koontz commented on ZOOKEEPER-1437:
------------------------------------------

Hi Jordan, what version of Java are you using to run the Java client? If it's Java 7, your problem in fact might be ZOOKEEPER-1550.
-Eugene

Client uses session before SASL authentication complete
-------------------------------------------------------

    Key: ZOOKEEPER-1437
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1437
    Project: ZooKeeper
    Issue Type: Bug
    Components: java client
    Affects Versions: 3.4.3
    Reporter: Thomas Weise
    Assignee: Eugene Koontz
    Fix For: 3.4.4, 3.5.0
    Attachments: getXidCallHierarchy.png, ZOOKEEPER-1437.patch, ZOOKEEPER-1437.patch, ZOOKEEPER-1437.patch, ZOOKEEPER-1437.patch, ZOOKEEPER-1437.patch, ZOOKEEPER-1437.patch, ZOOKEEPER-1437.patch, ZOOKEEPER-1437.patch, ZOOKEEPER-1437.patch, ZOOKEEPER-1437.patch, ZOOKEEPER-1437.patch, ZOOKEEPER-1437.patch, ZOOKEEPER-1437.patch, ZOOKEEPER-1437.patch, ZOOKEEPER-1437.patch, ZOOKEEPER-1437.patch, ZOOKEEPER-1437.patch

Found issue in the context of hbase region server startup, but it can be reproduced w/ zkCli alone. getData may occur prior to SaslAuthenticated and fail with NoAuth. This is not expected behavior when the client is configured to use SASL.
[jira] [Commented] (BOOKKEEPER-362) Local subscriptions fail if remote region is down
[ https://issues.apache.org/jira/browse/BOOKKEEPER-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483033#comment-13483033 ]

Sijie Guo commented on BOOKKEEPER-362:
--------------------------------------

It seems that the new patch could not be applied to the latest trunk. I took a look at the patch and had some comments, as below:

1) #checkTopicSubscribedFromRegion / #setTopicUnsubscribedFromRegion / #setTopicSubscribedFromRegion

1.1) Where to manage this information? I don't think it is a good idea to put it in TopicManager. TopicManager's role is to manage the ownership of topics, while these three methods interact with metadata related to region info, which is similar to SubscriptionDataManager and TopicPersistenceInfoManager. So I would suggest you move it to a brand new metadata manager like 'TopicRemoteSubscriptionDataManager' and put this manager under MetadataManagerFactory. You could then provide a ZKTopicRemoteSubscriptionDataManager in ZKMetadataManagerFactory, so you don't need to worry about the TODO issue you described in MMTopicManager, and it would make the responsibilities clearer.

1.2) The names of these methods: from my understanding of this issue, the data to record is which regions the hub has subscribed to the topic remotely, so a better name might be 'To', not 'From', e.g. '#checkTopicSubscribedToRegion', '#setTopicUnsubscribedToRegion', '#setTopicSubscribedToRegion'. If my understanding is not right, please correct me.

1.3) It would be better to use the colo name rather than regionAddress, I think, since regionAddress might be changed to a different address. Besides that, ZooKeeperServiceDown is not a good name; I would suggest using a name like 'MetadataServiceDown'.

2)

{code}
-        // no subscriptions now, it may be removed by other release ops
-        if (null != topicSubscriptions) {
-            for (ByteString subId : topicSubscriptions.keySet()) {
-                if (logger.isDebugEnabled()) {
-                    logger.debug("Stop serving subscriber (" + topic.toStringUtf8() + ", "
-                                 + subId.toStringUtf8() + ") when losing topic");
-                }
-                if (null != dm) {
-                    dm.stopServingSubscriber(topic, subId);
-                }
-            }
-        }
-        if (logger.isDebugEnabled()) {
-            logger.debug("Stop serving topic " + topic.toStringUtf8());
-        }
-        // Since we decrement local count when some of remote subscriptions failed,
-        // while we don't unsubscribe those succeed subscriptions. so we can't depends
-        // on local count, just try to notify unsubscribe.
-        notifyLastLocalUnsubscribe(topic);
+        cb.operationFinished(ctx, null);
{code}

It seems that you removed some logic (stop serving subscriber when unsubscribing) when you rebased the patch.

3) It was great to clean up the boolean flag in ReleaseOp by adding a backup map, but this fix doesn't seem to be related to this jira. So if it is convenient, when you generate a new patch, could you split this part into a separate JIRA?

{code}
 public class InMemorySubscriptionManager extends AbstractSubscriptionManager {
+    // Backup for top2sub2seq
+    final ConcurrentHashMap<ByteString, Map<ByteString, InMemorySubscriptionState>> _top2sub2seq =
+        new ConcurrentHashMap<ByteString, Map<ByteString, InMemorySubscriptionState>>();
{code}

4) Indent issue:

{code}
                     if (LOGGER.isDebugEnabled())
-                        LOGGER.debug("[" + myRegion + "] cross-region recv-fwd succeeded for topic "
+                            LOGGER.debug("[" + myRegion + "] cross-region recv-fwd succeeded for topic "
                                  + topic.toStringUtf8());
{code}

I saw some code with wrong indentation. I think it might have been introduced by the rebase.
Local subscriptions fail if remote region is down
-------------------------------------------------

    Key: BOOKKEEPER-362
    URL: https://issues.apache.org/jira/browse/BOOKKEEPER-362
    Project: Bookkeeper
    Issue Type: Bug
    Components: hedwig-server
    Affects Versions: 4.2.0
    Reporter: Aniruddha
    Assignee: Aniruddha
    Priority: Critical
    Labels: hedwig
    Attachments: 0001-Ignore-hub-client-remote-subscription-failure-if-we-.patch, rebase_remoteregion.patch

Currently, local subscriptions fail if the remote region hubs are down, even if the local hub has subscribed to the remote topic previously. Because of this, one region cannot function
[jira] [Commented] (BOOKKEEPER-336) bookie readEntries is taking more time if the ensemble has failed bookie(s)
[ https://issues.apache.org/jira/browse/BOOKKEEPER-336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483042#comment-13483042 ]

Rakesh R commented on BOOKKEEPER-336:
-------------------------------------

Thanks Ivan, Stud for the participation. I also agree to go with read timeouts. I can see that a parallel read to the quorum bookies would make the bkserver busy/exhausted with many read requests and might affect write latency in the worst case. But the good thing is, we have the BOOKKEEPER-429 idea of separate read/write threads; this would help us with latency, and IMO this issue should go together with it.

bookie readEntries is taking more time if the ensemble has failed bookie(s)
---------------------------------------------------------------------------

    Key: BOOKKEEPER-336
    URL: https://issues.apache.org/jira/browse/BOOKKEEPER-336
    Project: Bookkeeper
    Issue Type: Bug
    Affects Versions: 4.1.0
    Reporter: Brahma Reddy Battula
    Attachments: BOOKKEEPER-336.1.patch, BOOKKEEPER-336.draft1.diff, BOOKKEEPER-336.patch

Scenario:
1) Start three bookies. Create a ledger with ensemblesize=3, quorumsize=2
2) Add 100 entries to this ledger
3) Bring the first bookie down and read entries 0-99

Output: each entry is first fetched from the failed bookie, and only after waiting for the bookie connection timeout does the read move on to the next bookie. This is hurting read-entry performance.

Impact: Namenode switching time will be affected by adding this failed bookie readTimeOut as well.
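The "read timeout" alternative being agreed on here, sketched with a plain scheduler; sendRead, the answered flag, and the string ensemble are hypothetical stand-ins for illustration, not the BookKeeper client API:

{code}
import java.util.List;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper: issue the read to the first bookie, and fall back to
// the next one if no reply lands within a short per-read timeout (instead of
// waiting out the long connection timeout).
void readWithTimeout(final List<String> ensemble, final long entryId,
                     final AtomicBoolean answered,
                     ScheduledExecutorService timer, long readTimeoutMs) {
    sendRead(ensemble.get(0), entryId);
    timer.schedule(new Runnable() {
        public void run() {
            if (!answered.get()) {                  // first bookie silent so far
                sendRead(ensemble.get(1), entryId); // retry on the next bookie
            }
        }
    }, readTimeoutMs, TimeUnit.MILLISECONDS);
}

void sendRead(String bookie, long entryId) {
    // hypothetical: issue the read RPC to the given bookie
}
{code}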
Review Request: BOOKKEEPER-368: Implementing multiplexing java client.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7724/
-----------------------------------------------------------

Review request for bookkeeper.

Description
-----------

Implement a multiplexing java client.

This addresses bug BOOKKEEPER-368.
    https://issues.apache.org/jira/browse/BOOKKEEPER-368

Diffs
-----

  hedwig-client/src/main/java/org/apache/hedwig/client/conf/ClientConfiguration.java fa2c6d6
  hedwig-client/src/main/java/org/apache/hedwig/client/handlers/SubscribeResponseHandler.java b8e5aec
  hedwig-client/src/main/java/org/apache/hedwig/client/netty/HedwigClientImpl.java 1724e04
  hedwig-client/src/main/java/org/apache/hedwig/client/netty/impl/HChannelHandler.java 7753c6e
  hedwig-client/src/main/java/org/apache/hedwig/client/netty/impl/multiplex/MultiplexHChannelManager.java PRE-CREATION
  hedwig-client/src/main/java/org/apache/hedwig/client/netty/impl/multiplex/MultiplexSubscribeResponseHandler.java PRE-CREATION
  hedwig-client/src/main/java/org/apache/hedwig/client/netty/impl/multiplex/MultiplexSubscriptionChannelPipelineFactory.java PRE-CREATION
  hedwig-client/src/main/java/org/apache/hedwig/client/netty/impl/multiplex/ResubscribeCallback.java PRE-CREATION
  hedwig-client/src/main/java/org/apache/hedwig/client/netty/impl/simple/SimpleSubscribeResponseHandler.java a426a7b
  hedwig-protocol/src/main/java/org/apache/hedwig/protocol/PubSubProtocol.java 8d8f2ac
  hedwig-protocol/src/main/java/org/apache/hedwig/protoextensions/PubSubResponseUtils.java af69043
  hedwig-protocol/src/main/protobuf/PubSubProtocol.proto 7fafcce
  hedwig-server/src/main/java/org/apache/hedwig/server/delivery/FIFODeliveryManager.java fd5f448
  hedwig-server/src/main/java/org/apache/hedwig/server/handlers/SubscribeHandler.java dfcde9f
  hedwig-server/src/main/java/org/apache/hedwig/server/handlers/SubscriptionChannelManager.java 2a8d093
  hedwig-server/src/main/java/org/apache/hedwig/server/proxy/ChannelTracker.java 5bfd898
  hedwig-server/src/main/java/org/apache/hedwig/server/proxy/HedwigProxy.java 35f8b64
  hedwig-server/src/main/java/org/apache/hedwig/server/proxy/ProxyCloseSubscriptionHandler.java PRE-CREATION
  hedwig-server/src/test/java/org/apache/hedwig/client/TestPubSubClient.java ce0f3f6
  hedwig-server/src/test/java/org/apache/hedwig/client/netty/TestCloseSubscription.java bf74df1
  hedwig-server/src/test/java/org/apache/hedwig/client/netty/TestMultiplexing.java PRE-CREATION
  hedwig-server/src/test/java/org/apache/hedwig/server/HedwigRegionTestBase.java 4ec0d50
  hedwig-server/src/test/java/org/apache/hedwig/server/delivery/TestThrottlingDelivery.java 4338825
  hedwig-server/src/test/java/org/apache/hedwig/server/handlers/TestSubUnsubHandler.java 5bbf603
  hedwig-server/src/test/java/org/apache/hedwig/server/integration/TestHedwigHub.java 02b4503
  hedwig-server/src/test/java/org/apache/hedwig/server/integration/TestHedwigRegion.java 0b1851e

Diff: https://reviews.apache.org/r/7724/diff/

Testing
-------

Passed all testing.

Thanks,

Sijie Guo
[jira] [Updated] (BOOKKEEPER-368) Implementing multiplexing java client.
[ https://issues.apache.org/jira/browse/BOOKKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sijie Guo updated BOOKKEEPER-368:
---------------------------------

    Attachment: BOOKKEEPER-368.diff

Attached a new patch rebased to the latest trunk. Also put it on review board: https://reviews.apache.org/r/7724/

Implementing multiplexing java client.
--------------------------------------

    Key: BOOKKEEPER-368
    URL: https://issues.apache.org/jira/browse/BOOKKEEPER-368
    Project: Bookkeeper
    Issue Type: Sub-task
    Reporter: Sijie Guo
    Assignee: Sijie Guo
    Fix For: 4.2.0
    Attachments: BOOKKEEPER-368.diff, BOOKKEEPER-368.diff

Implement a multiplexing java client.
Re: Review Request: [BOOKKEEPER-204] Provide a MetaStore interface, and a mock implementation.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7314/
-----------------------------------------------------------

(Updated Oct. 24, 2012, 12:07 p.m.)

Review request for bookkeeper.

Changes
-------

Made some changes following Ivan's comments and refined the test case. BTW, although MetastoreTable#put is removed, MetastoreTable#versionedPut(.., Version.ANY) can also update data without comparing the version. I'm not quite sure whether to disable this usage or not.

Description
-----------

We need a MetaStore interface which makes it easy for us to plug in different scalable k/v storage, such as HBase.

This addresses bug BOOKKEEPER-204.
    https://issues.apache.org/jira/browse/BOOKKEEPER-204

Diffs (updated)
-----

  bookkeeper-server/src/main/java/org/apache/bookkeeper/metastore/MSException.java PRE-CREATION
  bookkeeper-server/src/main/java/org/apache/bookkeeper/metastore/MetaStore.java PRE-CREATION
  bookkeeper-server/src/main/java/org/apache/bookkeeper/metastore/MetastoreCallback.java PRE-CREATION
  bookkeeper-server/src/main/java/org/apache/bookkeeper/metastore/MetastoreCursor.java PRE-CREATION
  bookkeeper-server/src/main/java/org/apache/bookkeeper/metastore/MetastoreException.java PRE-CREATION
  bookkeeper-server/src/main/java/org/apache/bookkeeper/metastore/MetastoreFactory.java PRE-CREATION
  bookkeeper-server/src/main/java/org/apache/bookkeeper/metastore/MetastoreScannableTable.java PRE-CREATION
  bookkeeper-server/src/main/java/org/apache/bookkeeper/metastore/MetastoreTable.java PRE-CREATION
  bookkeeper-server/src/main/java/org/apache/bookkeeper/metastore/MetastoreTableItem.java PRE-CREATION
  bookkeeper-server/src/main/java/org/apache/bookkeeper/metastore/Value.java PRE-CREATION
  bookkeeper-server/src/main/java/org/apache/bookkeeper/metastore/mock/MockMetaStore.java PRE-CREATION
  bookkeeper-server/src/main/java/org/apache/bookkeeper/metastore/mock/MockMetastoreCursor.java PRE-CREATION
  bookkeeper-server/src/main/java/org/apache/bookkeeper/metastore/mock/MockMetastoreTable.java PRE-CREATION
  bookkeeper-server/src/test/java/org/apache/bookkeeper/metastore/MetastoreScannableTableAsyncToSyncConverter.java PRE-CREATION
  bookkeeper-server/src/test/java/org/apache/bookkeeper/metastore/MetastoreTableAsyncToSyncConverter.java PRE-CREATION
  bookkeeper-server/src/test/java/org/apache/bookkeeper/metastore/TestMetaStore.java PRE-CREATION

Diff: https://reviews.apache.org/r/7314/diff/

Testing
-------

Thanks,

Jiannan Wang
Re: Review Request: [BOOKKEEPER-204] Provide a MetaStore interface, and a mock implementation.
On Oct. 22, 2012, 2:11 p.m., Ivan Kelly wrote:

    bookkeeper-server/src/main/java/org/apache/bookkeeper/metastore/Value.java, line 37
    https://reviews.apache.org/r/7314/diff/1/?file=160309#file160309line37

    value should only be a byte[]. Adding fields like this overexpands the scope of the change without a strong need.

Jiannan Wang wrote:

    Currently, SubscriptionData contains preference and state information, where the state is updated frequently while the preference changes only on subscribe. For better performance, SubscriptionDataManager supports a partial update operation. This is the reason why we introduced fields in Value: to support updating a specific field.

This is a lot of complexity to add for a single corner case. A simpler solution would be to split SubscriptionData before writing to the metadata interface and to write to 2 separate keys, subid-prefs and subid-state for example.

- Ivan

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/7314/#review12653
---
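Ivan's split-keys suggestion is easy to picture. A minimal sketch, assuming a plain key-value table in place of the real MetastoreTable; the -prefs/-state suffixes come from his comment, everything else is invented:

{code}
// Sketch of the two-key layout Ivan suggests: preferences and state live under
// separate keys, so the frequently updated state can be written on its own and
// Value stays a plain byte[]. The table is a stand-in for a MetastoreTable.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SubscriptionDataSplitter {
    private final Map<String, byte[]> table = new ConcurrentHashMap<>();

    // Rarely changes: written only on (re)subscribe.
    void writePreferences(String subId, byte[] prefs) {
        table.put(subId + "-prefs", prefs);
    }

    // Changes frequently: written as the consume position advances.
    void writeState(String subId, byte[] state) {
        table.put(subId + "-state", state);
    }
}
{code}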
[jira] [Commented] (BOOKKEEPER-390) Provide support for ZooKeeper authentication
[ https://issues.apache.org/jira/browse/BOOKKEEPER-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483285#comment-13483285 ]

Flavio Junqueira commented on BOOKKEEPER-390:
---------------------------------------------

I'd like to ask a couple of questions just for my own understanding; this is not (yet) a criticism of this approach:
# When creating a bookkeeper object, we have the option of passing a zookeeper object. What if we require that, in the case of zookeeper authentication being enabled, the application creates a zookeeper object before using bookkeeper?
# We are moving towards having a MetaStore interface (BOOKKEEPER-204) so that we can use different backends to store metadata. Should we be looking into implementing a more general approach that fits into the MetaStore interface and enables authentication for anything that supports SASL?

Provide support for ZooKeeper authentication
--------------------------------------------
Key: BOOKKEEPER-390
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-390
Project: Bookkeeper
Issue Type: New Feature
Components: bookkeeper-client, bookkeeper-server
Affects Versions: 4.0.0
Reporter: Rakesh R
Assignee: Rakesh R
Attachments: BOOKKEEPER-390-Acl-draftversion.patch

This JIRA adds support for protecting the state of Bookkeeper znodes on a multi-tenant ZooKeeper cluster.

Use case: a user runs a ZK cluster in multi-tenant mode, where more than one client service shares a single ZK service instance (cluster). In this case the client services typically want to protect their data (ZK znodes) from access by other services (tenants) on the cluster. Say you are running BK, HBase or ZKFC instances, etc.; having authentication/authorization on the znodes is important both for security and for helping to ensure that services don't interact negatively (touch each other's data).

Presently Bookkeeper does not have support for authentication or authorization when accessing ZK. This should be added to the BK clients/servers that access the ZK cluster. In general it means calling addAuthInfo once after a session is established.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
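For context on the last sentence of the description: with the stock ZooKeeper client, the whole scheme amounts to one addAuthInfo call after the session is established, plus creating znodes with a creator-only ACL. A minimal sketch, with placeholder credentials and paths rather than what the attached patch uses:

{code}
// Minimal sketch of "calling addAuthInfo once after a session is established"
// with the stock ZooKeeper client. Credentials and the znode path are
// placeholders, not what the attached patch uses.
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZkAuthSketch {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, null);
        // Authenticate the session once...
        zk.addAuthInfo("digest", "bookkeeper:secret".getBytes());
        // ...then create znodes that only the creator's identity can touch,
        // so other tenants on the shared cluster cannot read or modify them.
        zk.create("/bookkeeper-demo", new byte[0],
                ZooDefs.Ids.CREATOR_ALL_ACL, CreateMode.PERSISTENT);
        zk.close();
    }
}
{code}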
Jenkins build is still unstable: bookkeeper-trunk » bookkeeper-server #769
See https://builds.apache.org/job/bookkeeper-trunk/org.apache.bookkeeper$bookkeeper-server/769/
Jenkins build is still unstable: bookkeeper-trunk » hedwig-server #769
See https://builds.apache.org/job/bookkeeper-trunk/org.apache.bookkeeper$hedwig-server/769/
[jira] [Comment Edited] (BOOKKEEPER-390) Provide support for ZooKeeper authentication
[ https://issues.apache.org/jira/browse/BOOKKEEPER-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483285#comment-13483285 ]

Flavio Junqueira edited comment on BOOKKEEPER-390 at 10/24/12 3:05 PM:
-----------------------------------------------------------------------

I'd like to ask a couple of questions just for my own understanding; this is not (yet) a criticism of this approach:
# When creating a bookkeeper object, we have the option of passing a zookeeper object. What if we require that, in the case of zookeeper authentication being enabled, the application creates a zookeeper object before using bookkeeper?
# We are moving towards having a MetaStore interface (BOOKKEEPER-204) so that we can use different backends to store metadata. Should we be looking into implementing a more general approach that fits into the MetaStore interface and enables authentication for anything that supports SASL?

was (Author: fpj):
I'd like to ask a couple of questions just for my own understanding, it is not (yet) a criticism to this approach:
# When creating a bookkeeper object, we have the option of passing a zookeeper object. What if we require that, in the case of zookeeper authentication enabled, the application creates a zookeeper object before using bookkeeper?
# We are moving towards having a MetaStore interface (BOOKKEEPER-204) so that we can use different backends to store metadata. Should we be looking into implementing a more general approach that fits into the MetaStore interface an enables authentication anything that supports SASL?

Provide support for ZooKeeper authentication
--------------------------------------------
Key: BOOKKEEPER-390
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-390
Project: Bookkeeper
Issue Type: New Feature
Components: bookkeeper-client, bookkeeper-server
Affects Versions: 4.0.0
Reporter: Rakesh R
Assignee: Rakesh R
Attachments: BOOKKEEPER-390-Acl-draftversion.patch

This JIRA adds support for protecting the state of Bookkeeper znodes on a multi-tenant ZooKeeper cluster.

Use case: a user runs a ZK cluster in multi-tenant mode, where more than one client service shares a single ZK service instance (cluster). In this case the client services typically want to protect their data (ZK znodes) from access by other services (tenants) on the cluster. Say you are running BK, HBase or ZKFC instances, etc.; having authentication/authorization on the znodes is important both for security and for helping to ensure that services don't interact negatively (touch each other's data).

Presently Bookkeeper does not have support for authentication or authorization when accessing ZK. This should be added to the BK clients/servers that access the ZK cluster. In general it means calling addAuthInfo once after a session is established.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
Jenkins build is still unstable: bookkeeper-trunk #769
See https://builds.apache.org/job/bookkeeper-trunk/changes
[jira] [Created] (BOOKKEEPER-441) InMemorySubscriptionManager should back up top2sub2seq before changing it
Yixue (Andrew) Zhu created BOOKKEEPER-441:
------------------------------------------

Summary: InMemorySubscriptionManager should back up top2sub2seq before changing it
Key: BOOKKEEPER-441
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-441
Project: Bookkeeper
Issue Type: Bug
Components: hedwig-server
Affects Versions: 4.3.0
Environment: unix
Reporter: Yixue (Andrew) Zhu
Priority: Minor
Fix For: 4.3.0

On topic loss, InMemorySubscriptionManager currently does not clear top2sub2seq. The intent is to allow readSubscription to get the information there. This introduces a dependency outside the class; the evidence is that the general ReleaseOp has to use a boolean parameter which targets this implementation detail. Further, this prevents Acquire-topic from notifying listeners (notifyFirstLocalSubscribe is not called) of the first subscription so they can act appropriately.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (BOOKKEEPER-346) Detect IOExceptions in LedgerCache and bookie should look at next ledger dir (if any)
[ https://issues.apache.org/jira/browse/BOOKKEEPER-346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483373#comment-13483373 ]

Ivan Kelly commented on BOOKKEEPER-346:
---------------------------------------

I don't like the coupling of the journal with ledger storage. It makes things hard to test and benchmark in isolation. What's more, it's unnecessary. If we have the copy sequence as above:
1) copy A.idx to A.idx.rloc
2) delete A.idx
3) rename A.idx.rloc to A.idx
the problematic case is if we crash after 2), before 3) completes. But on initialization of the LedgerCacheImpl we can scan all directories for A.idx.rloc: if A.idx exists, the copy was incomplete, so remove A.idx.rloc; if A.idx does not exist, rename A.idx.rloc to A.idx. There's no need to mess with the journal at all.

Detect IOExceptions in LedgerCache and bookie should look at next ledger dir (if any)
--------------------------------------------------------------------------------------
Key: BOOKKEEPER-346
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-346
Project: Bookkeeper
Issue Type: Sub-task
Components: bookkeeper-server
Affects Versions: 4.1.0
Reporter: Rakesh R
Assignee: Vinay
Fix For: 4.2.0
Attachments: BOOKKEEPER-346.patch, BOOKKEEPER-346.patch, BOOKKEEPER-346.patch, BOOKKEEPER-346.patch, BOOKKEEPER-346.patch, BOOKKEEPER-346.patch

This jira is to detect IOExceptions in the LedgerCache and iterate over all the configured ledger dirs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
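Ivan's recovery rule translates almost directly into a startup scan. A sketch, simplified to a single directory (the real scan would cover all ledger directories) and using the A.idx/A.idx.rloc naming from his comment; everything else is assumed:

{code}
// Sketch of the startup scan Ivan describes. If A.idx still exists, the crash
// happened before the delete and the .rloc copy may be incomplete, so it is
// discarded; if A.idx is gone, the delete succeeded, so finishing the rename
// completes the move. The journal is never involved.
import java.io.File;

class RlocRecoverySketch {
    static void recover(File ledgerDir) {
        File[] rlocs = ledgerDir.listFiles((dir, name) -> name.endsWith(".idx.rloc"));
        if (rlocs == null) return;
        for (File rloc : rlocs) {
            String idxName = rloc.getName().substring(0, rloc.getName().length() - ".rloc".length());
            File idx = new File(rloc.getParentFile(), idxName);
            if (idx.exists()) {
                rloc.delete();      // crashed after 1), before 2): drop the partial copy
            } else {
                rloc.renameTo(idx); // crashed after 2), before 3): finish the move
            }
        }
    }
}
{code}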
[jira] [Commented] (BOOKKEEPER-441) InMemorySubscriptionManager should back up top2sub2seq before changing it
[ https://issues.apache.org/jira/browse/BOOKKEEPER-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483378#comment-13483378 ]

Yixue (Andrew) Zhu commented on BOOKKEEPER-441:
-----------------------------------------------

I cannot assign the issue to myself for some reason (Edit/More Actions do not have the option).

InMemorySubscriptionManager should back up top2sub2seq before changing it
--------------------------------------------------------------------------
Key: BOOKKEEPER-441
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-441
Project: Bookkeeper
Issue Type: Bug
Components: hedwig-server
Affects Versions: 4.3.0
Environment: unix
Reporter: Yixue (Andrew) Zhu
Priority: Minor
Labels: patch
Fix For: 4.3.0

On topic loss, InMemorySubscriptionManager currently does not clear top2sub2seq. The intent is to allow readSubscription to get the information there. This introduces a dependency outside the class; the evidence is that the general ReleaseOp has to use a boolean parameter which targets this implementation detail. Further, this prevents Acquire-topic from notifying listeners (notifyFirstLocalSubscribe is not called) of the first subscription so they can act appropriately.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (BOOKKEEPER-441) InMemorySubscriptionManager should back up top2sub2seq before changing it
[ https://issues.apache.org/jira/browse/BOOKKEEPER-441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yixue (Andrew) Zhu updated BOOKKEEPER-441:
------------------------------------------
Attachment: BackupTop2Sub2Seq.patch

InMemorySubscriptionManager should back up top2sub2seq before changing it
--------------------------------------------------------------------------
Key: BOOKKEEPER-441
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-441
Project: Bookkeeper
Issue Type: Bug
Components: hedwig-server
Affects Versions: 4.3.0
Environment: unix
Reporter: Yixue (Andrew) Zhu
Assignee: Yixue (Andrew) Zhu
Priority: Minor
Labels: patch
Fix For: 4.3.0
Attachments: BackupTop2Sub2Seq.patch

On topic loss, InMemorySubscriptionManager currently does not clear top2sub2seq. The intent is to allow readSubscription to get the information there. This introduces a dependency outside the class; the evidence is that the general ReleaseOp has to use a boolean parameter which targets this implementation detail. Further, this prevents Acquire-topic from notifying listeners (notifyFirstLocalSubscribe is not called) of the first subscription so they can act appropriately.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (BOOKKEEPER-428) Expose command options in bookie scripts to disable/enable auto recovery temporarily
[ https://issues.apache.org/jira/browse/BOOKKEEPER-428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh R updated BOOKKEEPER-428:
--------------------------------
Attachment: BOOKKEEPER-428.patch

Expose command options in bookie scripts to disable/enable auto recovery temporarily
-------------------------------------------------------------------------------------
Key: BOOKKEEPER-428
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-428
Project: Bookkeeper
Issue Type: Sub-task
Components: bookkeeper-auto-recovery
Affects Versions: 4.0.0
Reporter: Rakesh R
Assignee: Rakesh R
Fix For: 4.2.0
Attachments: BOOKKEEPER-428.patch

Administrators can invoke disable/enable autorecovery options through the bookie shell.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (BOOKKEEPER-428) Expose command options in bookie scripts to disable/enable auto recovery temporarily
[ https://issues.apache.org/jira/browse/BOOKKEEPER-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483492#comment-13483492 ]

Rakesh R commented on BOOKKEEPER-428:
-------------------------------------

Attached a patch, which adds the following command options for toggling autorecovery. Could you please review? Thanks.
{code}
-d,--disable   Disable auto recovery of underreplicated ledgers
-e,--enable    Enable auto recovery of underreplicated ledgers
{code}

Expose command options in bookie scripts to disable/enable auto recovery temporarily
-------------------------------------------------------------------------------------
Key: BOOKKEEPER-428
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-428
Project: Bookkeeper
Issue Type: Sub-task
Components: bookkeeper-auto-recovery
Affects Versions: 4.0.0
Reporter: Rakesh R
Assignee: Rakesh R
Fix For: 4.2.0
Attachments: BOOKKEEPER-428.patch

Administrators can invoke disable/enable autorecovery options through the bookie shell.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
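For readers wondering how a temporary, cluster-wide toggle like this can work at all: one plausible mechanism (an assumption, not necessarily what the attached patch does) is a marker znode that auto-recovery daemons check or watch. A sketch with a hypothetical path:

{code}
// Hypothetical sketch of one way such a toggle could work: the shell flips a
// marker znode and auto-recovery daemons treat its presence as "disabled".
// The path and the mechanism are assumptions, not necessarily what the patch does.
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

class AutoRecoveryToggleSketch {
    static final String DISABLE_NODE = "/demo/autorecovery-disabled"; // hypothetical path

    static void disable(ZooKeeper zk) throws Exception {
        try {
            zk.create(DISABLE_NODE, new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        } catch (KeeperException.NodeExistsException e) {
            // already disabled; nothing to do
        }
    }

    static void enable(ZooKeeper zk) throws Exception {
        try {
            zk.delete(DISABLE_NODE, -1); // version -1: unconditional delete
        } catch (KeeperException.NoNodeException e) {
            // already enabled; nothing to do
        }
    }
}
{code}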
[jira] [Commented] (BOOKKEEPER-441) InMemorySubscriptionManager should back up top2sub2seq before changing it
[ https://issues.apache.org/jira/browse/BOOKKEEPER-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483741#comment-13483741 ]

Yixue (Andrew) Zhu commented on BOOKKEEPER-441:
-----------------------------------------------

4.2.0 sounds good. Will update it.

InMemorySubscriptionManager should back up top2sub2seq before changing it
--------------------------------------------------------------------------
Key: BOOKKEEPER-441
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-441
Project: Bookkeeper
Issue Type: Bug
Components: hedwig-server
Affects Versions: 4.3.0
Environment: unix
Reporter: Yixue (Andrew) Zhu
Assignee: Yixue (Andrew) Zhu
Priority: Minor
Labels: patch
Fix For: 4.3.0
Attachments: BackupTop2Sub2Seq.patch

On topic loss, InMemorySubscriptionManager currently does not clear top2sub2seq. The intent is to allow readSubscription to get the information there. This introduces a dependency outside the class; the evidence is that the general ReleaseOp has to use a boolean parameter which targets this implementation detail. Further, this prevents Acquire-topic from notifying listeners (notifyFirstLocalSubscribe is not called) of the first subscription so they can act appropriately.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (BOOKKEEPER-441) InMemorySubscriptionManager should back up top2sub2seq before changing it
[ https://issues.apache.org/jira/browse/BOOKKEEPER-441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yixue (Andrew) Zhu updated BOOKKEEPER-441:
------------------------------------------
Attachment: BackupTop2Sub2Seq.patch

InMemorySubscriptionManager should back up top2sub2seq before changing it
--------------------------------------------------------------------------
Key: BOOKKEEPER-441
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-441
Project: Bookkeeper
Issue Type: Bug
Components: hedwig-server
Affects Versions: 4.2.0
Environment: unix
Reporter: Yixue (Andrew) Zhu
Assignee: Yixue (Andrew) Zhu
Priority: Minor
Labels: patch
Fix For: 4.2.0
Attachments: BackupTop2Sub2Seq.patch, BackupTop2Sub2Seq.patch

On topic loss, InMemorySubscriptionManager currently does not clear top2sub2seq. The intent is to allow readSubscription to get the information there. This introduces a dependency outside the class; the evidence is that the general ReleaseOp has to use a boolean parameter which targets this implementation detail. Further, this prevents Acquire-topic from notifying listeners (notifyFirstLocalSubscribe is not called) of the first subscription so they can act appropriately.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
Review Request: Backup topic-sub inside InMemorySubscriptionManager
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/7731/
---

Review request for bookkeeper, Ivan Kelly, Sijie Guo, and Aniruddha Laud.

Description
---
On topic loss, InMemorySubscriptionManager currently does not clear top2sub2seq. The intent is to allow readSubscription to get the information there. This introduces a dependency outside the class; the evidence is that the general ReleaseOp has to use a boolean parameter which targets this implementation detail. Further, this prevents Acquire-topic from notifying listeners (notifyFirstLocalSubscribe is not called) of the first subscription so they can act appropriately.

This change addresses the issue.

This addresses bug BOOKKEEPER-441.
https://issues.apache.org/jira/browse/BOOKKEEPER-441

Diffs
---
hedwig-server/src/main/java/org/apache/hedwig/server/subscriptions/AbstractSubscriptionManager.java 5552265
hedwig-server/src/main/java/org/apache/hedwig/server/subscriptions/InMemorySubscriptionManager.java 1400e49

Diff: https://reviews.apache.org/r/7731/diff/

Testing
---
Unit tests

Thanks,
Yixue (Andrew) Zhu
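A minimal sketch of the shape of such a fix, with assumed types and method names rather than the patch's actual code (the real change is in the two files above): move the topic's entry to a backup map on loss, and let reads fall back to it.

{code}
// Illustrative sketch: on topic loss the live entry is moved into a backup
// map instead of being left in top2sub2seq. Types and names are assumed.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SubStateBackupSketch<State> {
    private final Map<String, Map<String, State>> top2sub2seq = new ConcurrentHashMap<>();
    private final Map<String, Map<String, State>> backup = new ConcurrentHashMap<>();

    void lostTopic(String topic) {
        Map<String, State> subs = top2sub2seq.remove(topic); // live map is cleared...
        if (subs != null) {
            backup.put(topic, subs); // ...but the state survives for readSubscription
        }
        // With the topic gone from the live map, the next subscribe is "first"
        // again, so first-local-subscribe listeners can be notified.
    }

    Map<String, State> readSubscriptions(String topic) {
        Map<String, State> live = top2sub2seq.get(topic);
        return live != null ? live : backup.get(topic);
    }
}
{code}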
[jira] [Commented] (BOOKKEEPER-362) Local subscriptions fail if remote region is down
[ https://issues.apache.org/jira/browse/BOOKKEEPER-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483843#comment-13483843 ]

Yixue (Andrew) Zhu commented on BOOKKEEPER-362:
-----------------------------------------------

I will change 1) and 3). As to 2), ZooKeeperServiceDown is a more general exception, not specific to this issue.

Local subscriptions fail if remote region is down
-------------------------------------------------
Key: BOOKKEEPER-362
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-362
Project: Bookkeeper
Issue Type: Bug
Components: hedwig-server
Affects Versions: 4.2.0
Reporter: Aniruddha
Assignee: Yixue (Andrew) Zhu
Priority: Critical
Labels: hedwig
Attachments: 0001-Ignore-hub-client-remote-subscription-failure-if-we-.patch, rebase_remoteregion.patch

Currently, local subscriptions fail if the remote region hubs are down, even if the local hub has subscribed to the remote topic previously. Because of this, one region cannot function independently of the other. A more detailed discussion related to this can be found here: http://mail-archives.apache.org/mod_mbox/zookeeper-bookkeeper-dev/201208.mbox/%3cCAOLhyDQSOF+Y+pvnyrd-HJRq1YEr=c8ok_b3_mr81r1g-9m...@mail.gmail.com%3e

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (BOOKKEEPER-362) Local subscriptions fail if remote region is down
[ https://issues.apache.org/jira/browse/BOOKKEEPER-362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yixue (Andrew) Zhu updated BOOKKEEPER-362:
------------------------------------------
Attachment: rebase_remoteregion.patch

Local subscriptions fail if remote region is down
-------------------------------------------------
Key: BOOKKEEPER-362
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-362
Project: Bookkeeper
Issue Type: Bug
Components: hedwig-server
Affects Versions: 4.2.0
Reporter: Aniruddha
Assignee: Yixue (Andrew) Zhu
Priority: Critical
Labels: hedwig
Fix For: 4.2.0
Attachments: 0001-Ignore-hub-client-remote-subscription-failure-if-we-.patch, rebase_remoteregion.patch, rebase_remoteregion.patch

Currently, local subscriptions fail if the remote region hubs are down, even if the local hub has subscribed to the remote topic previously. Because of this, one region cannot function independently of the other. A more detailed discussion related to this can be found here: http://mail-archives.apache.org/mod_mbox/zookeeper-bookkeeper-dev/201208.mbox/%3cCAOLhyDQSOF+Y+pvnyrd-HJRq1YEr=c8ok_b3_mr81r1g-9m...@mail.gmail.com%3e

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
Review Request: Track remote region subscribed status in zookeeper node under topic
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/7732/
---

Review request for bookkeeper, Ivan Kelly and Sijie Guo.

Description
---
Ignore hub-client remote subscription failure if we have succeeded before, to handle the transient remote-region-unavailable scenario.

The following invariants are maintained:
1. Track remote region subscribed status in a zookeeper node under the topic, which is best effort for now.
2. Untrack remote region subscribed status once the last local subscription for the topic is removed.
3. The remote subscription is still attempted if there are no active local subscriptions on bootstrap, and retried until it succeeds.

This addresses bug BOOKKEEPER-362.
https://issues.apache.org/jira/browse/BOOKKEEPER-362

Diffs
---
hedwig-client/src/main/java/org/apache/hedwig/util/HedwigSocketAddress.java 8bfdada
hedwig-protocol/src/main/java/org/apache/hedwig/exceptions/PubSubException.java f2a20d0
hedwig-protocol/src/main/java/org/apache/hedwig/protocol/PubSubProtocol.java 8d8f2ac
hedwig-protocol/src/main/protobuf/PubSubProtocol.proto 7fafcce
hedwig-server/src/main/java/org/apache/hedwig/server/meta/MetadataManagerFactory.java bca37d2
hedwig-server/src/main/java/org/apache/hedwig/server/meta/RemoteSubscriptionDataManager.java PRE-CREATION
hedwig-server/src/main/java/org/apache/hedwig/server/meta/ZkMetadataManagerFactory.java e65ad78
hedwig-server/src/main/java/org/apache/hedwig/server/netty/PubSubServer.java c06f03a
hedwig-server/src/main/java/org/apache/hedwig/server/regions/HedwigHubClient.java 063a99c
hedwig-server/src/main/java/org/apache/hedwig/server/regions/HedwigHubClientFactory.java 68d317e
hedwig-server/src/main/java/org/apache/hedwig/server/regions/HedwigHubSubscriber.java 7055251
hedwig-server/src/main/java/org/apache/hedwig/server/regions/NoOpRemoteSubscriptionManager.java PRE-CREATION
hedwig-server/src/main/java/org/apache/hedwig/server/regions/RegionManager.java bae960b
hedwig-server/src/main/java/org/apache/hedwig/server/regions/ZKRemoteSubscriptionManager.java PRE-CREATION
hedwig-server/src/main/java/org/apache/hedwig/server/subscriptions/AbstractSubscriptionManager.java 5552265
hedwig-server/src/main/java/org/apache/hedwig/server/subscriptions/InMemorySubscriptionManager.java 1400e49
hedwig-server/src/main/java/org/apache/hedwig/server/subscriptions/SubscriptionEventListener.java 6c6e626
hedwig-server/src/test/java/org/apache/hedwig/server/TestRegionSubscribe.java PRE-CREATION
hedwig-server/src/test/java/org/apache/hedwig/server/meta/TestMetadataManagerFactory.java 44c30d7
hedwig-server/src/test/java/org/apache/hedwig/server/regions/TestNoOpRemoteSubManager.java PRE-CREATION
hedwig-server/src/test/java/org/apache/hedwig/server/regions/TestZkRemoteSubManager.java PRE-CREATION

Diff: https://reviews.apache.org/r/7732/diff/

Testing
---
Added unit tests + existing

Thanks,
Yixue (Andrew) Zhu
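Invariants 1 and 2 can be pictured against the plain ZooKeeper API. A sketch with an invented znode layout; the patch's ZKRemoteSubscriptionManager is the real implementation:

{code}
// Hypothetical sketch of invariants 1 and 2: record a per-region marker under
// the topic once the remote subscribe succeeds (best effort), and remove it
// when the last local subscription goes away. The znode layout is invented.
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

class RemoteSubTrackerSketch {
    private final ZooKeeper zk;
    RemoteSubTrackerSketch(ZooKeeper zk) { this.zk = zk; }

    private static String node(String topic, String region) {
        return "/demo/topics/" + topic + "/remote-subscribed/" + region; // invented layout
    }

    // Invariant 1: best-effort marker once the remote subscribe succeeds.
    void markSubscribed(String topic, String region) {
        try {
            zk.create(node(topic, region), new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        } catch (KeeperException | InterruptedException e) {
            // best effort: a lost marker only costs an extra resubscribe later
        }
    }

    // Invariant 2: untrack when the last local subscription is removed.
    void unmarkSubscribed(String topic, String region) throws Exception {
        try {
            zk.delete(node(topic, region), -1);
        } catch (KeeperException.NoNodeException e) {
            // already untracked
        }
    }
}
{code}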
Re: Review Request: Track remote region subscribed status in zookeeper node under topic
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/7732/#review12752
---

hedwig-server/src/main/java/org/apache/hedwig/server/subscriptions/AbstractSubscriptionManager.java
https://reviews.apache.org/r/7732/#comment27229

    This change is covered by BOOKKEEPER-441, for which I already sent a separate code review out. It is needed here, but will be checked in separately.

- Yixue (Andrew) Zhu
[jira] [Commented] (BOOKKEEPER-346) Detect IOExceptions in LedgerCache and bookie should look at next ledger dir (if any)
[ https://issues.apache.org/jira/browse/BOOKKEEPER-346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483899#comment-13483899 ]

Vinay commented on BOOKKEEPER-346:
----------------------------------

Oh! Then I will post a patch with just the file movement using the above-mentioned steps. I hope that would be fine.

Detect IOExceptions in LedgerCache and bookie should look at next ledger dir (if any)
--------------------------------------------------------------------------------------
Key: BOOKKEEPER-346
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-346
Project: Bookkeeper
Issue Type: Sub-task
Components: bookkeeper-server
Affects Versions: 4.1.0
Reporter: Rakesh R
Assignee: Vinay
Fix For: 4.2.0
Attachments: BOOKKEEPER-346.patch, BOOKKEEPER-346.patch, BOOKKEEPER-346.patch, BOOKKEEPER-346.patch, BOOKKEEPER-346.patch, BOOKKEEPER-346.patch

This jira is to detect IOExceptions in the LedgerCache and iterate over all the configured ledger dirs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira