[jira] [Comment Edited] (BOOKKEEPER-604) Ledger storage can log an exception if GC happens concurrently.
[ https://issues.apache.org/jira/browse/BOOKKEEPER-604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719265#comment-13719265 ] Sijie Guo edited comment on BOOKKEEPER-604 at 7/25/13 5:58 AM: --- Attaching a patch that implements what I described here: https://issues.apache.org/jira/browse/BOOKKEEPER-604?focusedCommentId=13696005page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13696005 - obtain the fileinfo before flushing - pass the fileinfo to flush - finally, release the fileinfo after the flush
was (Author: hustlmsp): Attaching a patch: - obtain the fileinfo before flushing - pass the fileinfo to flush - finally, release the fileinfo after the flush
Ledger storage can log an exception if GC happens concurrently. --- Key: BOOKKEEPER-604 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-604 Project: Bookkeeper Issue Type: Bug Reporter: Ivan Kelly Assignee: Ivan Kelly Fix For: 4.2.2, 4.3.0 Attachments: 0001-BOOKKEEPER-604-Ledger-storage-can-log-an-exception-i.patch, 0001-BOOKKEEPER-604-Ledger-storage-can-log-an-exception-i.patch, 0001-BOOKKEEPER-604-Ledger-storage-can-log-an-exception-i.patch, 0001-BOOKKEEPER-604-Ledger-storage-can-log-an-exception-i.patch, BOOKKEEPER-604.diff, BOOKKEEPER-604.diff If a ledger is being flushed and GC kicks in partway through, GC can delete the index file before we try to flush it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
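The three steps in the patch description (obtain the fileinfo, pass it to flush, release it afterwards) amount to pinning the index file for the duration of the flush. Below is a minimal sketch of that idea, assuming a reference-counted handle where GC only *requests* deletion and the file is physically removed when the last holder releases it. `RefCountedIndexFile` and `FlushSketch` are hypothetical names, not BookKeeper's actual fileinfo or ledger cache classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical reference-counted index file handle (not BookKeeper's real API).
class RefCountedIndexFile {
    private final long ledgerId;
    private int refCount = 0;
    private boolean deleteRequested = false;
    private boolean deleted = false;
    private final List<String> writtenPages = new ArrayList<>();

    RefCountedIndexFile(long ledgerId) { this.ledgerId = ledgerId; }

    // Pin the file for a flush; fails if GC has already removed it.
    synchronized boolean tryUse() {
        if (deleted) return false;
        refCount++;
        return true;
    }

    synchronized void release() {
        refCount--;
        maybeDelete();
    }

    // Called by GC: deletion is deferred while any flush still holds the file.
    synchronized void requestDelete() {
        deleteRequested = true;
        maybeDelete();
    }

    private void maybeDelete() {
        if (deleteRequested && refCount == 0) deleted = true;
    }

    synchronized void writePage(String page) {
        if (deleted) throw new IllegalStateException("index file of ledger " + ledgerId + " was deleted");
        writtenPages.add(page);
    }

    synchronized boolean isDeleted() { return deleted; }
    synchronized List<String> pagesWritten() { return new ArrayList<>(writtenPages); }
}

class FlushSketch {
    final ConcurrentHashMap<Long, RefCountedIndexFile> files = new ConcurrentHashMap<>();

    // 1) obtain the handle before flushing, 2) pass it to flush, 3) release it in finally.
    void flushLedger(long ledgerId, List<String> dirtyPages, Runnable concurrentGc) {
        RefCountedIndexFile fi = files.get(ledgerId);
        if (fi == null || !fi.tryUse()) return; // ledger already garbage collected
        try {
            flush(fi, dirtyPages, concurrentGc);
        } finally {
            fi.release();
        }
    }

    private void flush(RefCountedIndexFile fi, List<String> pages, Runnable concurrentGc) {
        for (int i = 0; i < pages.size(); i++) {
            fi.writePage(pages.get(i));
            if (i == 0) concurrentGc.run(); // simulate GC kicking in partway through the flush
        }
    }
}
```

Without the reference count, the simulated GC call after the first page would remove the file and the next `writePage` would throw, which is analogous to the exception described in the issue.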
[jira] [Commented] (BOOKKEEPER-658) ledger cache refactor
[ https://issues.apache.org/jira/browse/BOOKKEEPER-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719272#comment-13719272 ] Hadoop QA commented on BOOKKEEPER-658: -- Testing JIRA BOOKKEEPER-658 Patch [BOOKKEEPER-658.patch|https://issues.apache.org/jira/secure/attachment/12594096/BOOKKEEPER-658.patch] downloaded at Thu Jul 25 05:37:43 UTC 2013 {color:green}+1 PATCH_APPLIES{color} {color:green}+1 CLEAN{color} {color:green}+1 RAW_PATCH_ANALYSIS{color} .{color:green}+1{color} the patch does not introduce any @author tags .{color:green}+1{color} the patch does not introduce any tabs .{color:green}+1{color} the patch does not introduce any trailing spaces .{color:green}+1{color} the patch does not introduce any line longer than 120 .{color:green}+1{color} the patch adds/modifies 4 testcase(s) {color:green}+1 RAT{color} .{color:green}+1{color} the patch does not seem to introduce new RAT warnings {color:green}+1 JAVADOC{color} .{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings {color:green}+1 COMPILE{color} .{color:green}+1{color} HEAD compiles .{color:green}+1{color} patch compiles .{color:green}+1{color} the patch does not seem to introduce new javac warnings {color:green}+1 FINDBUGS{color} .{color:green}+1{color} the patch does not seem to introduce new Findbugs warnings {color:green}+1 TESTS{color} .Tests run: 860 {color:green}+1 DISTRO{color} .{color:green}+1{color} distro tarball builds with the patch {color:green}*+1 Overall result, good!, no -1s*{color} The full output of the test-patch run is available at https://builds.apache.org/job/bookkeeper-trunk-precommit-build/436/
ledger cache refactor - Key: BOOKKEEPER-658 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-658 Project: Bookkeeper Issue Type: Sub-task Components: bookkeeper-server Reporter: Sijie Guo Assignee: Robin Dhamankar Fix For: 4.3.0 Attachments: BOOKKEEPER-658.patch Refactor the ledger cache to separate in-memory page management from persistence management.
[jira] [Commented] (BOOKKEEPER-659) LRU page management in ledger cache.
[ https://issues.apache.org/jira/browse/BOOKKEEPER-659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719275#comment-13719275 ] Hadoop QA commented on BOOKKEEPER-659: -- Testing JIRA BOOKKEEPER-659 Patch [BOOKKEEPER-659.diff|https://issues.apache.org/jira/secure/attachment/12594100/BOOKKEEPER-659.diff] downloaded at Thu Jul 25 06:04:32 UTC 2013 {color:red}-1{color} Patch failed to apply to head of branch. LRU page management in ledger cache. Key: BOOKKEEPER-659 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-659 Project: Bookkeeper Issue Type: Sub-task Components: bookkeeper-server Reporter: Sijie Guo Assignee: Robin Dhamankar Fix For: 4.3.0 Attachments: BOOKKEEPER-659.diff Better ledger page management.
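One common way to get LRU page eviction in Java is a `LinkedHashMap` in access order with `removeEldestEntry` overridden. The sketch below assumes index pages are keyed by (ledger id, first entry of the page) and that a dirty page must be written out before it is dropped; `LruPageCache` and its methods are hypothetical and are not taken from the BOOKKEEPER-659 patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical LRU cache of index pages (illustrative; not the BOOKKEEPER-659 patch).
class LruPageCache {
    static final class Page {
        final long ledgerId;
        final long firstEntry;
        boolean dirty;
        Page(long ledgerId, long firstEntry) {
            this.ledgerId = ledgerId;
            this.firstEntry = firstEntry;
        }
    }

    private final List<String> evictedDirty = new ArrayList<>();
    private final Map<String, Page> pages;

    LruPageCache(final int capacity) {
        // accessOrder = true: iteration order runs from least to most recently accessed.
        this.pages = new LinkedHashMap<String, Page>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Page> eldest) {
                if (size() <= capacity) return false;
                if (eldest.getValue().dirty) {
                    // Stand-in for flushing the page to its index file before dropping it.
                    evictedDirty.add(eldest.getKey());
                }
                return true;
            }
        };
    }

    private static String key(long ledgerId, long firstEntry) {
        return ledgerId + ":" + firstEntry;
    }

    // get() refreshes recency; put() may evict the least recently used page.
    Page getOrLoad(long ledgerId, long firstEntry) {
        String k = key(ledgerId, firstEntry);
        Page p = pages.get(k);
        if (p == null) {
            p = new Page(ledgerId, firstEntry); // stand-in for reading the page from disk
            pages.put(k, p);
        }
        return p;
    }

    boolean contains(long ledgerId, long firstEntry) {
        return pages.containsKey(key(ledgerId, firstEntry)); // does not change recency
    }

    List<String> evictedDirtyPages() { return new ArrayList<>(evictedDirty); }
}
```

With a capacity of 2, touching page `1:0` after loading `1:1024` makes `1:1024` the eviction victim when a third page is loaded; only dirty victims are recorded as needing a flush.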
[jira] [Commented] (BOOKKEEPER-429) Provide separate read and write threads in the bookkeeper server
[ https://issues.apache.org/jira/browse/BOOKKEEPER-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719279#comment-13719279 ] Sijie Guo commented on BOOKKEEPER-429: --
bq. MultipleThreadReadTest.java needs a license, it should probably be in the proto package as it's testing functionality there.
I think it tests overall functionality, not just the protocol. If you feel strongly about it, I can move it; just let me know.
bq. RequestProcessor.java should be in proto package also. It deals with RPC stuff.
In the future, it could be used to support chained request processors, for example for authentication handling, so I put the interface in a separate processor package to keep things clean. I don't want to make proto too fat.
bq. I don't think we should allow 0 worker threads to be configured. It only exists for one test. It'd be better to fix the test.
A user could configure 0 write threads, which leverages the Netty threads to handle writes, and configure a sensible number of read threads, so reads would not block writes. That way we don't have to spawn too many threads. The test problem could be addressed in a separate ticket.
I will address comment 3).
Provide separate read and write threads in the bookkeeper server Key: BOOKKEEPER-429 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-429 Project: Bookkeeper Issue Type: Improvement Components: bookkeeper-server Affects Versions: 4.2.0 Reporter: Aniruddha Assignee: Aniruddha Fix For: 4.3.0 Attachments: BK-429.patch, BOOKKEEPER-429.diff The current bookkeeper server is single threaded. The same thread handles reads and writes. When reads are slow (possibly because of excessive seeks), add entry operations suffer from higher latencies. Providing separate read and write threads helps reduce add entry latencies and increase throughput even when reads are slow. Having a single read thread also results in low disk utilization because seeks can't be ordered efficiently by the OS. Multiple read threads would help improve read throughput. Discussion on this can be found at http://mail-archives.apache.org/mod_mbox/zookeeper-bookkeeper-dev/201209.mbox/%3ccaolhydqpzn-v10zynfwud_h0qzrxtmjgttx7a9eofohyyty...@mail.gmail.com%3e Reviewboard: https://reviews.apache.org/r/7560/
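A minimal sketch of the threading model discussed above, assuming separate fixed-size pools for reads and writes and treating a thread count of 0 as "run the request inline on the calling I/O thread" (the comment describes this as leveraging the Netty threads for writes). `ReadWriteDispatcher` and its methods are hypothetical and are not BookKeeper's actual request processor.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical request dispatcher (illustrative names, not BookKeeper's actual classes).
class ReadWriteDispatcher {
    private final ExecutorService readPool;  // null: run reads on the calling I/O thread
    private final ExecutorService writePool; // null: run writes on the calling I/O thread

    ReadWriteDispatcher(int numReadThreads, int numWriteThreads) {
        this.readPool = numReadThreads > 0 ? Executors.newFixedThreadPool(numReadThreads) : null;
        this.writePool = numWriteThreads > 0 ? Executors.newFixedThreadPool(numWriteThreads) : null;
    }

    void submitRead(Runnable request) { dispatch(readPool, request); }

    void submitWrite(Runnable request) { dispatch(writePool, request); }

    private static void dispatch(ExecutorService pool, Runnable request) {
        if (pool == null) {
            request.run(); // zero threads configured: handle inline on the caller's thread
        } else {
            pool.execute(request);
        }
    }

    void shutdown() {
        for (ExecutorService pool : new ExecutorService[] {readPool, writePool}) {
            if (pool == null) continue;
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Demo: with 2 read threads and 0 write threads, a stalled read does not delay a write.
    static boolean writeNotBlockedBySlowRead() {
        CountDownLatch readStarted = new CountDownLatch(1);
        CountDownLatch releaseRead = new CountDownLatch(1);
        CountDownLatch writeDone = new CountDownLatch(1);
        ReadWriteDispatcher d = new ReadWriteDispatcher(2, 0);
        try {
            d.submitRead(() -> {
                readStarted.countDown();
                awaitQuietly(releaseRead); // simulate a slow, seek-bound read
            });
            awaitQuietly(readStarted);
            d.submitWrite(writeDone::countDown);
            return writeDone.getCount() == 0;
        } finally {
            releaseRead.countDown();
            d.shutdown();
        }
    }
}
```

The trade-off of 0 write threads is that any slow write then runs on the I/O thread itself, so the option suits write paths that are cheap on the calling thread.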
[jira] [Updated] (BOOKKEEPER-632) AutoRecovery should consider read only bookies
[ https://issues.apache.org/jira/browse/BOOKKEEPER-632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated BOOKKEEPER-632: - Attachment: BOOKKEEPER-632.patch Fixed test failures. Previously I had missed watching available bookies. AutoRecovery should consider read only bookies -- Key: BOOKKEEPER-632 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-632 Project: Bookkeeper Issue Type: Bug Components: bookkeeper-auto-recovery, bookkeeper-server Affects Versions: 4.2.1, 4.3.0 Reporter: Vinay Assignee: Vinay Fix For: 4.2.2, 4.3.0 Attachments: BOOKKEEPER-632.patch, BOOKKEEPER-632.patch, BOOKKEEPER-632.patch, BOOKKEEPER-632.patch, BOOKKEEPER-632.patch The AutoRecovery Auditor should consider readonly bookies as available bookies when publishing under-replicated ledgers. Also, AutoRecoveryDaemon should shut down if the local bookie is readonly.
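A sketch of the availability rule in the description, assuming bookies are identified by `host:port` strings and that the writable and readonly sets are obtained separately (for example, by watching two registration paths). `AuditorSketch` and its methods are hypothetical and do not reflect the real Auditor code.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical auditor helpers (illustrative; not the real Auditor or AutoRecovery code).
class AuditorSketch {
    // A bookie is available if it is registered as writable OR as readonly.
    static Set<String> availableBookies(Collection<String> writable, Collection<String> readOnly) {
        Set<String> available = new HashSet<>(writable);
        available.addAll(readOnly);
        return available;
    }

    // Bookies that hold ledger data but are in neither set are treated as lost; only
    // ledgers on those bookies would be published as under-replicated.
    static Set<String> lostBookies(Collection<String> bookiesWithLedgers,
                                   Collection<String> writable,
                                   Collection<String> readOnly) {
        Set<String> lost = new TreeSet<>(bookiesWithLedgers);
        lost.removeAll(availableBookies(writable, readOnly));
        return lost;
    }

    // Daemon side: a readonly local bookie should stop running auto-recovery work.
    static boolean shouldShutDownAutoRecovery(String localBookie, Collection<String> readOnly) {
        return readOnly.contains(localBookie);
    }
}
```

Treating readonly bookies as available matters because their data is still readable; re-replicating it would only add load without improving durability.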
[jira] [Commented] (BOOKKEEPER-602) we should have request timeouts rather than channel timeout in PerChannelBookieClient
[ https://issues.apache.org/jira/browse/BOOKKEEPER-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719292#comment-13719292 ] Sijie Guo commented on BOOKKEEPER-602: -- The reason we set getAddEntryTimeout to 1 is to use the best value for most cases, so users don't need to tune it much. Based on this consideration, we'd prefer to make the tuned value the default and have TestBKConfiguration handle the low-throughput case (TestBKConfiguration also exists in the journal improvements we made in BOOKKEEPER-657). If you feel strongly about it, I can change it; let me know. The 1 second value is based on performance evaluation: 1) we don't want a slow add request to cause too many pending requests to accumulate in the client (which causes bad GC behavior); 2) latency. The test failure is also related to the configuration setting. I forgot to include the changes for hedwig when generating the patch; I will add them soon. we should have request timeouts rather than channel timeout in PerChannelBookieClient - Key: BOOKKEEPER-602 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-602 Project: Bookkeeper Issue Type: Bug Affects Versions: 4.2.0, 4.2.1 Reporter: Sijie Guo Assignee: Sijie Guo Fix For: 4.3.0 Attachments: BOOKKEEPER-602.diff Currently we only have a readTimeout on the netty channel. It times out only when there is no activity on that channel, so it can't track timeouts of individual requests. If a channel keeps having read entry activity, it might mask a slow add entry response, which hurts add latency.
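A minimal sketch of per-request timeouts, assuming each request is recorded with its send time and a periodic sweep fails any request older than the timeout. Unlike an idle-channel read timeout, steady read traffic does not reset the clock of a slow add. `PendingRequestTracker` is hypothetical (not PerChannelBookieClient's real code) and takes explicit timestamps so it can be tested deterministically.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of per-request timeouts; not PerChannelBookieClient's real code.
class PendingRequestTracker {
    static final class Pending {
        final String op;         // e.g. "ADD" or "READ"
        final long startNanos;   // when the request was written to the channel
        Pending(String op, long startNanos) { this.op = op; this.startNanos = startNanos; }
    }

    private final Map<Long, Pending> pending = new ConcurrentHashMap<>();
    private final long timeoutNanos;

    PendingRequestTracker(long timeoutNanos) { this.timeoutNanos = timeoutNanos; }

    void sent(long requestId, String op, long nowNanos) {
        pending.put(requestId, new Pending(op, nowNanos));
    }

    // A response removes only its own request; other traffic does not reset any clock.
    boolean responded(long requestId) {
        return pending.remove(requestId) != null;
    }

    /** Called periodically (e.g. from a timer); returns the ids that timed out. */
    List<Long> expire(long nowNanos) {
        List<Long> expired = new ArrayList<>();
        Iterator<Map.Entry<Long, Pending>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Pending> e = it.next();
            if (nowNanos - e.getValue().startNanos >= timeoutNanos) {
                it.remove(); // a real client would complete the callback with a timeout error here
                expired.add(e.getKey());
            }
        }
        return expired;
    }
}
```

A 1 second timeout, as discussed in the comment, would be `new PendingRequestTracker(TimeUnit.SECONDS.toNanos(1))`. An add sent at t=0 then expires at t=1s even if reads keep completing on the same channel in between.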
[jira] [Updated] (BOOKKEEPER-602) we should have request timeouts rather than channel timeout in PerChannelBookieClient
[ https://issues.apache.org/jira/browse/BOOKKEEPER-602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sijie Guo updated BOOKKEEPER-602: - Assignee: Aniruddha (was: Sijie Guo) we should have request timeouts rather than channel timeout in PerChannelBookieClient - Key: BOOKKEEPER-602 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-602 Project: Bookkeeper Issue Type: Bug Affects Versions: 4.2.0, 4.2.1 Reporter: Sijie Guo Assignee: Aniruddha Fix For: 4.3.0 Attachments: BOOKKEEPER-602.diff Currently we only have a readTimeout on the netty channel. It times out only when there is no activity on that channel, so it can't track timeouts of individual requests. If a channel keeps having read entry activity, it might mask a slow add entry response, which hurts add latency.
[jira] [Commented] (BOOKKEEPER-632) AutoRecovery should consider read only bookies
[ https://issues.apache.org/jira/browse/BOOKKEEPER-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719300#comment-13719300 ] Sijie Guo commented on BOOKKEEPER-632: -- Please add a timeout option. I guess we need to modify the precommit hook to check '@Test' annotations and ensure a timeout is provided. AutoRecovery should consider read only bookies -- Key: BOOKKEEPER-632 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-632 Project: Bookkeeper Issue Type: Bug Components: bookkeeper-auto-recovery, bookkeeper-server Affects Versions: 4.2.1, 4.3.0 Reporter: Vinay Assignee: Vinay Fix For: 4.2.2, 4.3.0 Attachments: BOOKKEEPER-632.patch, BOOKKEEPER-632.patch, BOOKKEEPER-632.patch, BOOKKEEPER-632.patch, BOOKKEEPER-632.patch The AutoRecovery Auditor should consider readonly bookies as available bookies when publishing under-replicated ledgers. Also, AutoRecoveryDaemon should shut down if the local bookie is readonly.
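The precommit check suggested above could look roughly like this. It is a sketch in Java, assuming JUnit 4-style `@Test(timeout = ...)` annotations written on a single line; the actual precommit hook is a separate script that is not shown here, and `TestTimeoutCheck` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical precommit-style check; not the actual Apache precommit script.
class TestTimeoutCheck {
    // Matches "@Test" optionally followed by a parenthesized argument list (group 2).
    private static final Pattern TEST_ANNOTATION =
        Pattern.compile("@Test\\b(\\s*\\(([^)]*)\\))?");

    /** Returns the 1-based line numbers of @Test annotations that declare no timeout. */
    static List<Integer> missingTimeouts(String source) {
        List<Integer> offenders = new ArrayList<>();
        String[] lines = source.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            Matcher m = TEST_ANNOTATION.matcher(lines[i]);
            while (m.find()) {
                String args = m.group(2);
                if (args == null || !args.contains("timeout")) {
                    offenders.add(i + 1);
                }
            }
        }
        return offenders;
    }
}
```

Being regex-based, it will also flag `@Test` inside comments and miss annotations split across lines; a sturdier check would inspect compiled classes or use a Java parser.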
[jira] [Created] (BOOKKEEPER-661) Turn readonly back to writable if spaces are reclaimed.
Sijie Guo created BOOKKEEPER-661: Summary: Turn readonly back to writable if spaces are reclaimed. Key: BOOKKEEPER-661 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-661 Project: Bookkeeper Issue Type: Improvement Components: bookkeeper-server Reporter: Sijie Guo Assignee: Sijie Guo Fix For: 4.3.0 We should be able to turn a bookie from readonly back to writable once disk space is reclaimed.
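A minimal sketch of the proposed transition, assuming the decision is driven by the fraction of disk space used and that two different thresholds are used (hysteresis) so a bookie hovering near the limit does not flap between modes. `BookieModeController` and its thresholds are hypothetical and are not existing BookKeeper configuration.

```java
// Hypothetical bookie mode controller; thresholds and names are illustrative assumptions.
class BookieModeController {
    enum Mode { WRITABLE, READ_ONLY }

    private final double readOnlyAbove;  // switch to readonly when usage exceeds this fraction
    private final double writableBelow;  // switch back to writable once usage drops below this
    private Mode mode = Mode.WRITABLE;

    BookieModeController(double readOnlyAbove, double writableBelow) {
        if (writableBelow >= readOnlyAbove) {
            throw new IllegalArgumentException("writableBelow must be lower than readOnlyAbove");
        }
        this.readOnlyAbove = readOnlyAbove;
        this.writableBelow = writableBelow;
    }

    /** Re-evaluates the mode from the current disk usage fraction (0.0 to 1.0). */
    Mode onDiskUsage(double usedFraction) {
        if (mode == Mode.WRITABLE && usedFraction > readOnlyAbove) {
            mode = Mode.READ_ONLY;
        } else if (mode == Mode.READ_ONLY && usedFraction < writableBelow) {
            mode = Mode.WRITABLE; // space was reclaimed (e.g. by GC/compaction)
        }
        return mode;
    }
}
```

With thresholds 0.95 and 0.90, usage of 0.96 turns the bookie readonly, 0.93 keeps it readonly, and only a drop below 0.90 makes it writable again.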
[jira] [Created] (BOOKKEEPER-662) Major GC should kick in immediately if remaining space reaches a warning threshold
Sijie Guo created BOOKKEEPER-662: Summary: Major GC should kick in immediately if remaining space reaches a warning threshold Key: BOOKKEEPER-662 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-662 Project: Bookkeeper Issue Type: Improvement Components: bookkeeper-server Reporter: Sijie Guo Assignee: Aniruddha Fix For: 4.3.0 In high-throughput cases, major GC should kick in immediately if the remaining space reaches a warning threshold.
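A sketch of the trigger described in the issue, assuming major GC normally runs on a fixed interval and that disk usage is available as a fraction. `GcScheduler`, the interval and the warning threshold are hypothetical illustrations, not BookKeeper's actual garbage collector settings.

```java
// Hypothetical GC trigger (illustrative; interval and threshold are assumptions).
class GcScheduler {
    private final long majorIntervalMillis;
    private final double warnUsedFraction;
    private long lastMajorMillis;

    GcScheduler(long majorIntervalMillis, double warnUsedFraction, long nowMillis) {
        this.majorIntervalMillis = majorIntervalMillis;
        this.warnUsedFraction = warnUsedFraction;
        this.lastMajorMillis = nowMillis;
    }

    /** Returns true if major GC should run now, either on schedule or because space is low. */
    boolean shouldRunMajor(long nowMillis, double usedFraction) {
        boolean intervalElapsed = nowMillis - lastMajorMillis >= majorIntervalMillis;
        boolean spaceWarning = usedFraction >= warnUsedFraction;
        if (intervalElapsed || spaceWarning) {
            lastMajorMillis = nowMillis; // restart the regular schedule after any major run
            return true;
        }
        return false;
    }
}
```

As written, the warning condition fires on every check while usage stays high; a real implementation would likely rate-limit back-to-back major runs.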
[jira] [Updated] (BOOKKEEPER-632) AutoRecovery should consider read only bookies
[ https://issues.apache.org/jira/browse/BOOKKEEPER-632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated BOOKKEEPER-632: - Attachment: BOOKKEEPER-632.patch Added a timeout. AutoRecovery should consider read only bookies -- Key: BOOKKEEPER-632 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-632 Project: Bookkeeper Issue Type: Bug Components: bookkeeper-auto-recovery, bookkeeper-server Affects Versions: 4.2.1, 4.3.0 Reporter: Vinay Assignee: Vinay Fix For: 4.2.2, 4.3.0 Attachments: BOOKKEEPER-632.patch, BOOKKEEPER-632.patch, BOOKKEEPER-632.patch, BOOKKEEPER-632.patch, BOOKKEEPER-632.patch, BOOKKEEPER-632.patch The AutoRecovery Auditor should consider readonly bookies as available bookies when publishing under-replicated ledgers. Also, AutoRecoveryDaemon should shut down if the local bookie is readonly.
[jira] [Commented] (BOOKKEEPER-648) BasicJMSTest failed
[ https://issues.apache.org/jira/browse/BOOKKEEPER-648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719324#comment-13719324 ] Mridul Muralidharan commented on BOOKKEEPER-648: On going over the code again, the testcase is extremely simple: it tests the sync and async APIs.
a) Create one publisher and two subscriber sessions for the topic (one via the sync API, the other via the async API of JMS).
b) Send 4 messages through the publisher session.
c) Sleep for 10 ms to ensure all messages have been sent. This should not be required, but hedwig sometimes takes a lot of time.
d) For the sync test, wait 100 ms for each message to be received: null is returned if the call times out without receiving a message. The number of times receive is called equals the number of messages sent; the test expects in-order delivery without message loss, as per the hedwig API contract. This is what is failing: we are not receiving all the messages we sent.
e) The async session listener does the same as (d), but on the async listener. This did not get tested in this specific case because of the validation failure in (d).
Looking further, we should probably add a sleep before checking the 'messageCount.getValue() != CHAT_MESSAGES.length' condition, in case the async listener is still running in parallel, though this is not the failure we are observing.
Assuming you saw the same assertion stacktrace each time it failed, it means no message was received before the timeout in the sync invocation. This can be due to:
1) Hedwig or bookkeeper being inordinately slow for some reason (slow HDD, a full /tmp, low memory, TLB thrashing, etc.), in which case simply increasing the sleep time and the receive timeout parameter will circumvent the issue.
2) A bug somewhere in the chain that causes message drops, either at publish time, while sending messages to subscribers, or somewhere else.
To get additional debugging info, there are log messages in the jms module, but (particularly for this testcase) the jms module is a thin wrapper delegating to the corresponding hedwig client API, so enabling debug logging in the hedwig client would be more helpful. Actually, I would validate whether the server actually sent the messages to both subscribers and whether the client received them; if so, the rest would be a client-side bug (hedwig client or jms). If you can reproduce the issue with debug logging enabled for the root logger, I can definitely help narrow down the issue with those logs! Unfortunately, there are almost no testcases in the hedwig client, so I am not sure what design or implementation changes happened in the client (or server?), since I am no longer keeping track of bookkeeper/hedwig. BasicJMSTest failed --- Key: BOOKKEEPER-648 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-648 Project: Bookkeeper Issue Type: Bug Components: hedwig-client Reporter: Flavio Junqueira Assignee: Mridul Muralidharan While running tests, I once got a failure in this hedwig-client-jms test: BasicJMSTest.
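The sync half of the test (steps b to d) can be modeled without a JMS provider to show what a more descriptive failure could report. In this sketch a `BlockingQueue` stands in for the JMS consumer: `poll(timeout, unit)` returns null on timeout, like `MessageConsumer.receive(long)`. `SyncReceiveCheck` is a hypothetical helper, not part of BasicJMSTest.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// A BlockingQueue stands in for the JMS consumer; poll(timeout) returns null on timeout,
// like MessageConsumer.receive(long). Names here are hypothetical.
class SyncReceiveCheck {
    /** Returns null if every sent message arrived in order, else a description of the first failure. */
    static String verify(BlockingQueue<String> consumer, List<String> sent, long perMessageTimeoutMs) {
        for (int i = 0; i < sent.size(); i++) {
            String got;
            try {
                got = consumer.poll(perMessageTimeoutMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return "interrupted while waiting for message " + i;
            }
            if (got == null) {
                return "no message received within " + perMessageTimeoutMs + " ms for message " + i
                        + " (expected " + sent.get(i) + ")";
            }
            if (!got.equals(sent.get(i))) {
                return "out of order at message " + i + ": expected " + sent.get(i) + ", got " + got;
            }
        }
        return null;
    }
}
```

Reporting the index and the expected payload distinguishes a lost message from a reordered one, although, as noted in the thread, finding where a message was dropped still requires debug logs from the server and client.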
[jira] [Commented] (BOOKKEEPER-648) BasicJMSTest failed
[ https://issues.apache.org/jira/browse/BOOKKEEPER-648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719348#comment-13719348 ] Mridul Muralidharan commented on BOOKKEEPER-648: Flavio, I am not sure what the expectation for this bug is: is it an appropriate debug message for this testcase? As I detailed above, the only possible message is "no message was received before timeout", and unfortunately I don't think that is going to help us much in debugging it. I did not add a descriptive message for every assertion in the tests because the corresponding JMS APIs document the error conditions in detail (for example, the JMS API javadocs describe when receive can return null). To actually debug/fix the issue, we will need to enable debug logging in the server, the client API and the jms module. Then, when the issue is observed, we will need to trace whether the message was actually sent to the server, whether the server dispatched it to both subscribers, whether the client API received it, and whether it was dispatched to jms. The jms module does nothing different from any other API user of hedwig, barring bugs in it of course :-) This is an assertion that is typically not expected to fail unless something broken elsewhere is causing message loss. BasicJMSTest failed --- Key: BOOKKEEPER-648 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-648 Project: Bookkeeper Issue Type: Bug Components: hedwig-client Reporter: Flavio Junqueira Assignee: Mridul Muralidharan While running tests, I once got a failure in this hedwig-client-jms test: BasicJMSTest.
[jira] [Commented] (BOOKKEEPER-632) AutoRecovery should consider read only bookies
[ https://issues.apache.org/jira/browse/BOOKKEEPER-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719445#comment-13719445 ] Hadoop QA commented on BOOKKEEPER-632: -- Testing JIRA BOOKKEEPER-632 Patch [BOOKKEEPER-632.patch|https://issues.apache.org/jira/secure/attachment/12594120/BOOKKEEPER-632.patch] downloaded at Thu Jul 25 09:17:47 UTC 2013 {color:green}+1 PATCH_APPLIES{color} {color:green}+1 CLEAN{color} {color:green}+1 RAW_PATCH_ANALYSIS{color} .{color:green}+1{color} the patch does not introduce any @author tags .{color:green}+1{color} the patch does not introduce any tabs .{color:green}+1{color} the patch does not introduce any trailing spaces .{color:green}+1{color} the patch does not introduce any line longer than 120 .{color:green}+1{color} the patch adds/modifies 2 testcase(s) {color:green}+1 RAT{color} .{color:green}+1{color} the patch does not seem to introduce new RAT warnings {color:green}+1 JAVADOC{color} .{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings {color:green}+1 COMPILE{color} .{color:green}+1{color} HEAD compiles .{color:green}+1{color} patch compiles .{color:green}+1{color} the patch does not seem to introduce new javac warnings {color:green}+1 FINDBUGS{color} .{color:green}+1{color} the patch does not seem to introduce new Findbugs warnings {color:green}+1 TESTS{color} .Tests run: 866 {color:green}+1 DISTRO{color} .{color:green}+1{color} distro tarball builds with the patch {color:green}*+1 Overall result, good!, no -1s*{color} The full output of the test-patch run is available at https://builds.apache.org/job/bookkeeper-trunk-precommit-build/438/
AutoRecovery should consider read only bookies -- Key: BOOKKEEPER-632 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-632 Project: Bookkeeper Issue Type: Bug Components: bookkeeper-auto-recovery, bookkeeper-server Affects Versions: 4.2.1, 4.3.0 Reporter: Vinay Assignee: Vinay Fix For: 4.2.2, 4.3.0 Attachments: BOOKKEEPER-632.patch, BOOKKEEPER-632.patch, BOOKKEEPER-632.patch, BOOKKEEPER-632.patch, BOOKKEEPER-632.patch, BOOKKEEPER-632.patch The AutoRecovery Auditor should consider readonly bookies as available bookies when publishing under-replicated ledgers. Also, AutoRecoveryDaemon should shut down if the local bookie is readonly.
[jira] [Commented] (BOOKKEEPER-604) Ledger storage can log an exception if GC happens concurrently.
[ https://issues.apache.org/jira/browse/BOOKKEEPER-604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719470#comment-13719470 ] Hadoop QA commented on BOOKKEEPER-604: -- Testing JIRA BOOKKEEPER-604 Patch [BOOKKEEPER-604.diff|https://issues.apache.org/jira/secure/attachment/12594107/BOOKKEEPER-604.diff] downloaded at Thu Jul 25 09:44:58 UTC 2013 {color:green}+1 PATCH_APPLIES{color} {color:green}+1 CLEAN{color} {color:green}+1 RAW_PATCH_ANALYSIS{color} .{color:green}+1{color} the patch does not introduce any @author tags .{color:green}+1{color} the patch does not introduce any tabs .{color:green}+1{color} the patch does not introduce any trailing spaces .{color:green}+1{color} the patch does not introduce any line longer than 120 .{color:green}+1{color} the patch adds/modifies 1 testcase(s) {color:green}+1 RAT{color} .{color:green}+1{color} the patch does not seem to introduce new RAT warnings {color:green}+1 JAVADOC{color} .{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings {color:green}+1 COMPILE{color} .{color:green}+1{color} HEAD compiles .{color:green}+1{color} patch compiles .{color:green}+1{color} the patch does not seem to introduce new javac warnings {color:green}+1 FINDBUGS{color} .{color:green}+1{color} the patch does not seem to introduce new Findbugs warnings {color:green}+1 TESTS{color} .Tests run: 861 {color:green}+1 DISTRO{color} .{color:green}+1{color} distro tarball builds with the patch {color:green}*+1 Overall result, good!, no -1s*{color} The full output of the test-patch run is available at https://builds.apache.org/job/bookkeeper-trunk-precommit-build/439/
Ledger storage can log an exception if GC happens concurrently. --- Key: BOOKKEEPER-604 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-604 Project: Bookkeeper Issue Type: Bug Reporter: Ivan Kelly Assignee: Ivan Kelly Fix For: 4.2.2, 4.3.0 Attachments: 0001-BOOKKEEPER-604-Ledger-storage-can-log-an-exception-i.patch, 0001-BOOKKEEPER-604-Ledger-storage-can-log-an-exception-i.patch, 0001-BOOKKEEPER-604-Ledger-storage-can-log-an-exception-i.patch, 0001-BOOKKEEPER-604-Ledger-storage-can-log-an-exception-i.patch, BOOKKEEPER-604.diff, BOOKKEEPER-604.diff If a ledger is being flushed and GC kicks in partway through, GC can delete the index file before we try to flush it.
[jira] [Commented] (BOOKKEEPER-596) Ledgers are gc'ed by mistake in MSLedgerManagerFactory.
[ https://issues.apache.org/jira/browse/BOOKKEEPER-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720342#comment-13720342 ] Sijie Guo commented on BOOKKEEPER-596: -- [~merlimat] I think I found the root cause. It is a bug introduced in this patch, where Ivan refactored LedgerRange. In HierarchicalLedgerManager, subSet is misused: it excludes the last ledger id. http://docs.oracle.com/javase/6/docs/api/java/util/SortedSet.html#subSet(E, E)
{code}
return new LedgerRange(zkActiveLedgers.subSet(getStartLedgerIdByLevel(level1, level2),
                                              getEndLedgerIdByLevel(level1, level2)));
{code}
So the last ledger in each level would be gc'ed. This issue is easy to reproduce and fix. Ledgers are gc'ed by mistake in MSLedgerManagerFactory. --- Key: BOOKKEEPER-596 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-596 Project: Bookkeeper Issue Type: Bug Affects Versions: 4.2.0, 4.2.1 Reporter: Sijie Guo Assignee: Sijie Guo Priority: Blocker Fix For: 4.2.2, 4.3.0 Attachments: 0001-BOOKKEEPER-596-Ledgers-are-gc-ed-by-mistake-in-MSLed.patch, 0001-BOOKKEEPER-596-Ledgers-are-gc-ed-by-mistake-in-MSLed.patch, BOOKKEEPER-596.patch, BOOKKEEPER-596.patch, BOOKKEEPER-596.patch details: https://issues.apache.org/jira/browse/BOOKKEEPER-590?focusedCommentId=13616397page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13616397
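The off-by-one can be reproduced with the JDK alone: `SortedSet.subSet(from, to)` is half-open, while `NavigableSet.subSet(from, true, to, true)` includes both ends. The ledger ids and range bounds below are made up for illustration. How `getStartLedgerIdByLevel`/`getEndLedgerIdByLevel` compute the real bounds is not shown here, and the closed-range variant is one possible fix, not necessarily the committed one.

```java
import java.util.NavigableSet;
import java.util.SortedSet;

class SubSetBoundsDemo {
    // SortedSet.subSet(from, to) is half-open, [from, to): the upper bound is EXCLUDED,
    // so a ledger whose id equals the range end is silently left out.
    static SortedSet<Long> halfOpenRange(NavigableSet<Long> activeLedgers, long start, long end) {
        return activeLedgers.subSet(start, end);
    }

    // NavigableSet.subSet(from, true, to, true) is closed, [from, to]: both ends are included.
    static NavigableSet<Long> closedRange(NavigableSet<Long> activeLedgers, long start, long end) {
        return activeLedgers.subSet(start, true, end, true);
    }
}
```

For example, with active ledgers {0, 5000, 9999} and a range of 0 to 9999, `halfOpenRange` returns {0, 5000} while `closedRange` returns {0, 5000, 9999}. If GC treats any ledger missing from the returned range as deleted, ledger 9999 would be collected by mistake.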
[jira] [Commented] (BOOKKEEPER-654) Bookkeeper client operations are allowed even after its closure, bk#close()
[ https://issues.apache.org/jira/browse/BOOKKEEPER-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720403#comment-13720403 ] Rakesh R commented on BOOKKEEPER-654: - Thanks [~hustlmsp] for the comments. A few clarifications:
bq. 1. in OrderedSafeExecutor, why not just catch the rejected exception rather than adding an extra boolean flag. since this flag doesn't avoid throwing rejected exception.
The ThreadPoolExecutor Javadoc says tasks "will be rejected when the Executor has been shut down, and also when the Executor uses finite bounds for both maximum threads and work queue capacity, and is saturated."
ThreadPoolExecutor.java
{code}
public void execute(Runnable command) {
    // ...
    reject(command); // is shutdown or saturated
    // ...
}
{code}
I've added the flag to tell the user the actual cause (either bk.close() or some other reason). Otherwise we would need to iterate over 'executor.threads[i]' and check whether each one is shut down, as below. Is there a better way to handle this?
OrderedSafeExecutor.java
{code}
for (int i = 0; i < executor.threads.length; i++) {
    if (executor.threads[i].isShutdown()) {
        safeOperationComplete(BKException.Code.BkClientClosedException, result);
        return;
    }
}
{code}
bq. 2. in LedgerOpenOp, why we need #readComplete here? an unscheduled speculative task doesn't affect any logic.
If we throw the exception back to the caller (the callers are LedgerRecovery#doRecoveryRead() and LedgerHandle.asyncReadEntries()), the callers need to duplicate the exception-handling logic and return the BkClientClosedException. What's your opinion? Also, I need to remove the 'if(bk.bookieClient.isClosed())' check added in LedgerHandle.asyncReadEntries(); it's not required.
Bookkeeper client operations are allowed even after its closure, bk#close() --- Key: BOOKKEEPER-654 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-654 Project: Bookkeeper Issue Type: Bug Components: bookkeeper-client Affects Versions: 4.2.0 Reporter: Rakesh R Assignee: Rakesh R Fix For: 4.2.2, 4.3.0 Attachments: 0001-BOOKKEEPER-654-testcase-to-understand-more.patch, 0002-BOOKKEEPER-654.patch, 0003-BOOKKEEPER-654.patch Users can perform the following operations with a closed bookkeeper client that was instantiated with an external zkclient: - open a closed ledger - create a new ledger Also, ledgerhandle operations like fencing/add/write hang indefinitely.
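A sketch of catching `RejectedExecutionException` at submission time and turning it into an error callback, so the exception never escapes to the caller. It also shows one way to report the cause without a separate flag: checking `ExecutorService.isShutdown()` inside the catch block. That refinement is an assumption, not something proposed in the thread. `SafeSubmit` and its integer codes are hypothetical, not OrderedSafeExecutor's or BKException's real API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.function.IntConsumer;

// Hypothetical sketch, not OrderedSafeExecutor's real code; the return codes are illustrative.
class SafeSubmit {
    static final int OK = 0;
    static final int CLIENT_CLOSED = -1; // executor was shut down (e.g. after bk.close())
    static final int REJECTED = -2;      // executor saturated for some other reason

    /**
     * Submits the task; if the executor rejects it, completes the callback with an error
     * code instead of letting RejectedExecutionException escape to the caller.
     */
    static void submitOrFail(ExecutorService executor, Runnable task, IntConsumer callback) {
        try {
            executor.execute(() -> {
                task.run();
                callback.accept(OK);
            });
        } catch (RejectedExecutionException ree) {
            // isShutdown() distinguishes "closed" from "saturated" without a separate flag.
            callback.accept(executor.isShutdown() ? CLIENT_CLOSED : REJECTED);
        }
    }
}
```

Because the check happens after the rejection, there is no window in which a task is submitted to an executor that shuts down between a flag check and `execute()`.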
[jira] [Commented] (BOOKKEEPER-654) Bookkeeper client operations are allowed even after its closure, bk#close()
[ https://issues.apache.org/jira/browse/BOOKKEEPER-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720414#comment-13720414 ] Sijie Guo commented on BOOKKEEPER-654: --
bq. From ThreadPoolExecutor JavaDoc, it says, will be rejected when the Executor has been shut down, and also when the Executor uses finite bounds for both maximum threads and work queue capacity, and is saturated.
But you still need to catch the exception, as I commented. A shutdown flag can't prevent tasks from being submitted to a shut-down scheduler; that's the point.
bq. If we throws exception back to the caller, (callers are: LedgerRecovery#doRecoveryRead() and LedgerHandle.asyncReadEntries()) callers need to duplicate the logic of handling the exception and return the BkClientClosedException. Whats your opinion?
My point is that a failed speculative task doesn't affect anything, since the original read request would fail anyway because the bookie client is closed. The isClosed check is also unnecessary: the errors are already propagated from either the worker pool callback or the bookie client.
Bookkeeper client operations are allowed even after its closure, bk#close() --- Key: BOOKKEEPER-654 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-654 Project: Bookkeeper Issue Type: Bug Components: bookkeeper-client Affects Versions: 4.2.0 Reporter: Rakesh R Assignee: Rakesh R Fix For: 4.2.2, 4.3.0 Attachments: 0001-BOOKKEEPER-654-testcase-to-understand-more.patch, 0002-BOOKKEEPER-654.patch, 0003-BOOKKEEPER-654.patch Users can perform the following operations with a closed bookkeeper client that was instantiated with an external zkclient: - open a closed ledger - create a new ledger Also, ledgerhandle operations like fencing/add/write hang indefinitely.
[jira] [Updated] (BOOKKEEPER-602) we should have request timeouts rather than channel timeout in PerChannelBookieClient
[ https://issues.apache.org/jira/browse/BOOKKEEPER-602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sijie Guo updated BOOKKEEPER-602: - Attachment: BOOKKEEPER-602.diff Added the missing config in TestDeadLock to fix the failing test. we should have request timeouts rather than channel timeout in PerChannelBookieClient - Key: BOOKKEEPER-602 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-602 Project: Bookkeeper Issue Type: Bug Affects Versions: 4.2.0, 4.2.1 Reporter: Sijie Guo Assignee: Aniruddha Fix For: 4.3.0 Attachments: BOOKKEEPER-602.diff, BOOKKEEPER-602.diff Currently we only have a readTimeout on the Netty channel. It fires only when there is no activity on that channel, so it can't track timeouts of individual requests. If a channel keeps seeing read-entry activity, that activity can mask a slow add-entry response, which hurts add latency.
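To illustrate the difference, a hedged Java sketch of per-request timeouts: each outstanding request carries its own deadline, and a periodic expire() call fails only the requests whose deadlines have passed, no matter how much other traffic the channel carries. RequestTimeoutTracker and its methods are hypothetical, not PerChannelBookieClient's code; time is passed in explicitly (e.g. from System.nanoTime()) to keep the sketch deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of per-request timeouts (not PerChannelBookieClient's code).
final class RequestTimeoutTracker {
    private static final class Pending {
        final long deadlineNanos;
        final Runnable onTimeout;

        Pending(long deadlineNanos, Runnable onTimeout) {
            this.deadlineNanos = deadlineNanos;
            this.onTimeout = onTimeout;
        }
    }

    private final Map<Long, Pending> pending = new ConcurrentHashMap<>();
    private final long timeoutNanos;

    RequestTimeoutTracker(long timeoutNanos) {
        this.timeoutNanos = timeoutNanos;
    }

    /** Records a request sent at nowNanos; it must complete within timeoutNanos. */
    void register(long requestId, long nowNanos, Runnable onTimeout) {
        pending.put(requestId, new Pending(nowNanos + timeoutNanos, onTimeout));
    }

    /** Called when a response arrives; returns false if the request already expired. */
    boolean complete(long requestId) {
        return pending.remove(requestId) != null;
    }

    /** Called periodically; expires only requests whose own deadline has passed. */
    int expire(long nowNanos) {
        int expired = 0;
        for (Map.Entry<Long, Pending> e : pending.entrySet()) {
            Pending p = e.getValue();
            // Overflow-safe comparison, as recommended for System.nanoTime() values.
            if (nowNanos - p.deadlineNanos >= 0 && pending.remove(e.getKey(), p)) {
                p.onTimeout.run();
                expired++;
            }
        }
        return expired;
    }
}
```

With only a channel-level idle timeout, a steady stream of read responses keeps the channel "active" and a slow add never times out; in this sketch the add expires at its own deadline regardless.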
[jira] [Updated] (BOOKKEEPER-429) Provide separate read and write threads in the bookkeeper server
[ https://issues.apache.org/jira/browse/BOOKKEEPER-429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sijie Guo updated BOOKKEEPER-429: - Attachment: BOOKKEEPER-429.diff Addressed comment 3). Provide separate read and write threads in the bookkeeper server Key: BOOKKEEPER-429 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-429 Project: Bookkeeper Issue Type: Improvement Components: bookkeeper-server Affects Versions: 4.2.0 Reporter: Aniruddha Assignee: Aniruddha Fix For: 4.3.0 Attachments: BK-429.patch, BOOKKEEPER-429.diff, BOOKKEEPER-429.diff The current BookKeeper server is single-threaded: the same thread handles both reads and writes. When reads are slow (possibly because of excessive seeks), add-entry latencies suffer. Separate read and write threads would help reduce add-entry latency and increase throughput even when reads are slow. A single read thread also results in low disk utilization, because the OS can't order seeks efficiently. Multiple read threads would help improve read throughput. Discussion on this can be found at http://mail-archives.apache.org/mod_mbox/zookeeper-bookkeeper-dev/201209.mbox/%3ccaolhydqpzn-v10zynfwud_h0qzrxtmjgttx7a9eofohyyty...@mail.gmail.com%3e Reviewboard: https://reviews.apache.org/r/7560/
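A rough Java sketch of the proposed split, assuming requests can be classified as adds or reads before dispatch: adds and reads go to separate fixed-size pools, so a seek-bound read cannot delay an add queued behind it on the same thread, and a read pool with several threads keeps multiple reads in flight. ReadWriteDispatcher and OpType are hypothetical names, and the thread counts are illustrative rather than recommended values.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical dispatcher sketch, not the bookie server's actual request handling.
final class ReadWriteDispatcher implements AutoCloseable {
    enum OpType { ADD_ENTRY, READ_ENTRY }

    private final ExecutorService writePool;
    private final ExecutorService readPool;

    ReadWriteDispatcher(int writeThreads, int readThreads) {
        this.writePool = Executors.newFixedThreadPool(writeThreads);
        // Several read threads keep multiple reads outstanding, giving the OS
        // more pending requests to order seeks across.
        this.readPool = Executors.newFixedThreadPool(readThreads);
    }

    /** Routes the operation to the pool for its type. */
    Future<?> submit(OpType type, Runnable op) {
        ExecutorService pool = (type == OpType.ADD_ENTRY) ? writePool : readPool;
        return pool.submit(op);
    }

    @Override
    public void close() {
        // Lets already-submitted operations finish, then stops both pools.
        writePool.shutdown();
        readPool.shutdown();
    }
}
```

The key property is isolation: with one shared thread, an add submitted after a slow read waits for that read; here it only competes with other adds.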
[jira] [Updated] (BOOKKEEPER-164) Add checksumming for ledger index files
[ https://issues.apache.org/jira/browse/BOOKKEEPER-164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sijie Guo updated BOOKKEEPER-164: - Fix Version/s: (was: 4.3.0) Add checksumming for ledger index files --- Key: BOOKKEEPER-164 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-164 Project: Bookkeeper Issue Type: Improvement Components: bookkeeper-server Affects Versions: 4.0.0 Reporter: Sijie Guo Assignee: Sijie Guo Currently, bookie ledger index files lack checksums to guard against truncation/corruption. If a ledger index file is truncated, it still works but returns a wrong response when reading the last confirmed entry.
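One way checksumming could catch the truncation case described above, sketched in Java under an assumed page layout of [length][CRC32][payload]. ChecksummedPage and the layout are hypothetical, not BookKeeper's index file format. A page cut short fails the length check and a corrupted payload fails the CRC comparison, so the reader gets an error instead of silently returning a wrong answer.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical on-disk layout: [length:int][crc32:long][payload bytes].
final class ChecksummedPage {
    private static final int HEADER = Integer.BYTES + Long.BYTES;

    /** Prefixes the payload with its length and CRC32. */
    static byte[] encode(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        ByteBuffer buf = ByteBuffer.allocate(HEADER + payload.length);
        buf.putInt(payload.length).putLong(crc.getValue()).put(payload);
        return buf.array();
    }

    /** Returns the payload, or throws if the page is truncated or corrupted. */
    static byte[] decode(byte[] page) {
        if (page.length < HEADER) {
            throw new IllegalStateException("truncated header");
        }
        ByteBuffer buf = ByteBuffer.wrap(page);
        int length = buf.getInt();
        long expected = buf.getLong();
        if (length < 0 || buf.remaining() < length) {
            throw new IllegalStateException("truncated payload");
        }
        byte[] payload = new byte[length];
        buf.get(payload);
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        if (crc.getValue() != expected) {
            throw new IllegalStateException("checksum mismatch");
        }
        return payload;
    }
}
```

Note that per-page checks like this only catch damage inside a page; a file truncated exactly at a page boundary would additionally need something like a recorded page count.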