[jira] [Comment Edited] (BOOKKEEPER-604) Ledger storage can log an exception if GC happens concurrently.

2013-07-25 Thread Sijie Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719265#comment-13719265
 ] 

Sijie Guo edited comment on BOOKKEEPER-604 at 7/25/13 5:58 AM:
---

Attached a patch implementing what I described in this comment: 
https://issues.apache.org/jira/browse/BOOKKEEPER-604?focusedCommentId=13696005&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13696005

- obtain fileinfo before flushing
- pass fileinfo to flush
- finally release fileinfo after flush
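
The three steps above amount to pinning the index file for the duration of the flush, so that GC cannot delete it part way through. A minimal sketch of that pattern follows. It assumes a reference-counted file handle; FileInfoSketch, FlushSketch and all of their methods are hypothetical names for illustration and are not taken from the patch.

```java
// Hypothetical stand-in for the index file handle; not the real BookKeeper class.
class FileInfoSketch {
    private int useCount = 0;
    private boolean deleted = false;
    private int flushes = 0;

    // Pin the file. Fails if GC has already deleted it.
    synchronized boolean use() {
        if (deleted) {
            return false;
        }
        useCount++;
        return true;
    }

    // Unpin the file so GC may delete it again.
    synchronized void release() {
        useCount--;
    }

    // What GC would call: refuses to delete while a flush holds the file.
    synchronized boolean tryDelete() {
        if (useCount > 0) {
            return false;
        }
        deleted = true;
        return true;
    }

    synchronized void flush() {
        if (deleted) {
            throw new IllegalStateException("index file already deleted");
        }
        flushes++;
    }

    synchronized int flushCount() {
        return flushes;
    }
}

class FlushSketch {
    // Obtain the file info before flushing, pass it to the flush, release it in finally.
    static boolean flushLedger(FileInfoSketch fi) {
        if (!fi.use()) {
            return false; // ledger was already garbage collected, nothing to flush
        }
        try {
            flushPages(fi);
            return true;
        } finally {
            fi.release();
        }
    }

    static void flushPages(FileInfoSketch fi) {
        fi.flush();
    }
}
```

Because release happens in a finally block, the handle is unpinned even if the flush throws, and a concurrent delete attempt simply fails until then.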



  was (Author: hustlmsp):
attach a patch 

- obtain fileinfo before flushing
- pass fileinfo to flush
- finally release fileinfo after flush


  
 Ledger storage can log an exception if GC happens concurrently.
 ---

 Key: BOOKKEEPER-604
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-604
 Project: Bookkeeper
  Issue Type: Bug
Reporter: Ivan Kelly
Assignee: Ivan Kelly
 Fix For: 4.2.2, 4.3.0

 Attachments: 
 0001-BOOKKEEPER-604-Ledger-storage-can-log-an-exception-i.patch, 
 0001-BOOKKEEPER-604-Ledger-storage-can-log-an-exception-i.patch, 
 0001-BOOKKEEPER-604-Ledger-storage-can-log-an-exception-i.patch, 
 0001-BOOKKEEPER-604-Ledger-storage-can-log-an-exception-i.patch, 
 BOOKKEEPER-604.diff, BOOKKEEPER-604.diff


 If a ledger is flushing and, part way through, GC kicks in, it can delete the 
 index file before we try to flush it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (BOOKKEEPER-658) ledger cache refactor

2013-07-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719272#comment-13719272
 ] 

Hadoop QA commented on BOOKKEEPER-658:
--

Testing JIRA BOOKKEEPER-658


Patch 
[BOOKKEEPER-658.patch|https://issues.apache.org/jira/secure/attachment/12594096/BOOKKEEPER-658.patch]
 downloaded at Thu Jul 25 05:37:43 UTC 2013



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 
120
.{color:green}+1{color} the patch does adds/modifies 4 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:green}+1 FINDBUGS{color}
.{color:green}+1{color} the patch does not seem to introduce new Findbugs 
warnings
{color:green}+1 TESTS{color}
.Tests run: 860
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:green}*+1 Overall result, good!, no -1s*{color}


The full output of the test-patch run is available at

.   https://builds.apache.org/job/bookkeeper-trunk-precommit-build/436/

 ledger cache refactor
 -

 Key: BOOKKEEPER-658
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-658
 Project: Bookkeeper
  Issue Type: Sub-task
  Components: bookkeeper-server
Reporter: Sijie Guo
Assignee: Robin Dhamankar
 Fix For: 4.3.0

 Attachments: BOOKKEEPER-658.patch


 refactor ledger cache to separate in-memory page management from persistent 
 management.



[jira] [Commented] (BOOKKEEPER-659) LRU page management in ledger cache.

2013-07-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719275#comment-13719275
 ] 

Hadoop QA commented on BOOKKEEPER-659:
--

Testing JIRA BOOKKEEPER-659


Patch 
[BOOKKEEPER-659.diff|https://issues.apache.org/jira/secure/attachment/12594100/BOOKKEEPER-659.diff]
 downloaded at Thu Jul 25 06:04:32 UTC 2013



{color:red}-1{color} Patch failed to apply to head of branch



 LRU page management in ledger cache.
 

 Key: BOOKKEEPER-659
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-659
 Project: Bookkeeper
  Issue Type: Sub-task
  Components: bookkeeper-server
Reporter: Sijie Guo
Assignee: Robin Dhamankar
 Fix For: 4.3.0

 Attachments: BOOKKEEPER-659.diff


 better ledger page management.



[jira] [Commented] (BOOKKEEPER-429) Provide separate read and write threads in the bookkeeper server

2013-07-25 Thread Sijie Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719279#comment-13719279
 ] 

Sijie Guo commented on BOOKKEEPER-429:
--

 MultipleThreadReadTest.java needs a license, it should probably be in the 
 proto package as it's testing functionality there.

I think it tests overall functionality, not just the protocol. If you feel strongly 
about it, I can move it; just let me know.

 RequestProcessor.java should be in proto package also. It deals with RPC 
 stuff.

In the future, it could be used to support chained request processors, for example 
to handle authentication. So I put the interface in a processor package to 
keep things clean. I don't want to make proto too fat.

 I don't think we should allow 0 worker threads to be configured. It only 
 exists for one test. It'd be better to fix the test.

A user could configure write threads to 0, which leverages the netty threads to 
handle writes, and configure read threads to a sensible number, so reads would not 
block writes. That way we don't have to spawn too many threads.

The test problem could be addressed in a different ticket. 
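
To make the "0 threads" case concrete, here is a minimal sketch using only java.util.concurrent. It assumes that a count of 0 means "run the request on the calling I/O thread"; WorkerThreadsSketch and forThreads are hypothetical names, not the ones used in the patch.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

class WorkerThreadsSketch {
    // With 0 threads, run the request directly on the calling (I/O) thread;
    // otherwise hand it to a dedicated fixed-size pool.
    static Executor forThreads(int numThreads) {
        if (numThreads == 0) {
            return new Executor() {
                public void execute(Runnable r) {
                    r.run();
                }
            };
        }
        // A real server would keep the ExecutorService so it can shut the pool down.
        return Executors.newFixedThreadPool(numThreads);
    }
}
```

With this shape, writes configured with 0 threads never leave the I/O thread, while reads configured with a positive count are moved off it and cannot block writes.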

I will address comment 3).

 Provide separate read and write threads in the bookkeeper server
 

 Key: BOOKKEEPER-429
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-429
 Project: Bookkeeper
  Issue Type: Improvement
  Components: bookkeeper-server
Affects Versions: 4.2.0
Reporter: Aniruddha
Assignee: Aniruddha
 Fix For: 4.3.0

 Attachments: BK-429.patch, BOOKKEEPER-429.diff


 The current bookkeeper server is single threaded. The same thread handles 
 reads and writes. When reads are slow (possibly because of excessive seeks), 
 add entry operations suffer in terms of latencies. Providing separate read 
 and write threads helps in reducing add entry latencies and increasing 
 throughput even when we're facing slow reads. Having a single read thread 
 also results in low disk utilization because seeks can't be ordered 
 efficiently by the OS. Multiple read threads would help in improving the read 
 throughput. 
 Discussion on this can be found at 
 http://mail-archives.apache.org/mod_mbox/zookeeper-bookkeeper-dev/201209.mbox/%3ccaolhydqpzn-v10zynfwud_h0qzrxtmjgttx7a9eofohyyty...@mail.gmail.com%3e
 Reviewboard : https://reviews.apache.org/r/7560/



[jira] [Updated] (BOOKKEEPER-632) AutoRecovery should consider read only bookies

2013-07-25 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BOOKKEEPER-632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated BOOKKEEPER-632:
-

Attachment: BOOKKEEPER-632.patch

Fixed test failures.

Previously I had missed watching available bookies.

 AutoRecovery should consider read only bookies
 --

 Key: BOOKKEEPER-632
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-632
 Project: Bookkeeper
  Issue Type: Bug
  Components: bookkeeper-auto-recovery, bookkeeper-server
Affects Versions: 4.2.1, 4.3.0
Reporter: Vinay
Assignee: Vinay
 Fix For: 4.2.2, 4.3.0

 Attachments: BOOKKEEPER-632.patch, BOOKKEEPER-632.patch, 
 BOOKKEEPER-632.patch, BOOKKEEPER-632.patch, BOOKKEEPER-632.patch


 Autorecovery Auditor should consider the readonly bookies as Available 
 Bookies  while publishing the under-replicated ledgers.
 Also AutoRecoveryDaemon should shutdown if the local bookie is readonly



[jira] [Commented] (BOOKKEEPER-602) we should have request timeouts rather than channel timeout in PerChannelBookieClient

2013-07-25 Thread Sijie Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719292#comment-13719292
 ] 

Sijie Guo commented on BOOKKEEPER-602:
--

We set getAddEntryTimeout to 1 by default so that the default is the best value 
for most cases and users don't need to tune it much. For that reason, we'd 
prefer to keep the tuned value as the default and have TestBKConfiguration 
handle the low-throughput case (TestBKConfiguration also exists in the journal 
improvements we made in BOOKKEEPER-657). If you feel strongly about it, I can 
change it; let me know.

The 1 second value is based on performance evaluation: 1) we don't want a slow 
add request to cause too many pending requests to accumulate in the client 
(which causes bad GC behavior); 2) latency considerations.

The test failure is also related to the configuration setting. I forgot to 
include the hedwig changes when generating the patch; I will add them soon.

 we should have request timeouts rather than channel timeout in 
 PerChannelBookieClient
 -

 Key: BOOKKEEPER-602
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-602
 Project: Bookkeeper
  Issue Type: Bug
Affects Versions: 4.2.0, 4.2.1
Reporter: Sijie Guo
Assignee: Sijie Guo
 Fix For: 4.3.0

 Attachments: BOOKKEEPER-602.diff


 Currently we only have a readTimeout on the netty channel. It times out only 
 when there is no activity on that channel, so it can't track timeouts of 
 individual requests. If a channel keeps having read entry activity, it might 
 mask a slow add entry response, which hurts add latency.
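
The difference between a channel timeout and a request timeout can be sketched as follows. This is only an illustration under stated assumptions: each request carries a numeric id, and something calls expire periodically. RequestTimeoutSketch and its methods are hypothetical and are not the PerChannelBookieClient code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class RequestTimeoutSketch {
    // request id -> time (in ms) at which the request was sent
    private final Map<Long, Long> sentAt = new ConcurrentHashMap<Long, Long>();
    private final long timeoutMs;

    RequestTimeoutSketch(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    void onSend(long requestId, long nowMs) {
        sentAt.put(requestId, nowMs);
    }

    void onResponse(long requestId) {
        sentAt.remove(requestId);
    }

    // Called periodically. Each request expires based on its own age,
    // regardless of how much other traffic the channel is carrying.
    List<Long> expire(long nowMs) {
        List<Long> timedOut = new ArrayList<Long>();
        Iterator<Map.Entry<Long, Long>> it = sentAt.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Long> e = it.next();
            if (nowMs - e.getValue() >= timeoutMs) {
                timedOut.add(e.getKey());
                it.remove();
            }
        }
        return timedOut;
    }
}
```

A slow add entry is reported once it exceeds its own deadline, even if read responses keep arriving on the same channel and keep an idle-based readTimeout from ever firing.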



[jira] [Updated] (BOOKKEEPER-602) we should have request timeouts rather than channel timeout in PerChannelBookieClient

2013-07-25 Thread Sijie Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/BOOKKEEPER-602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sijie Guo updated BOOKKEEPER-602:
-

Assignee: Aniruddha  (was: Sijie Guo)

 we should have request timeouts rather than channel timeout in 
 PerChannelBookieClient
 -

 Key: BOOKKEEPER-602
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-602
 Project: Bookkeeper
  Issue Type: Bug
Affects Versions: 4.2.0, 4.2.1
Reporter: Sijie Guo
Assignee: Aniruddha
 Fix For: 4.3.0

 Attachments: BOOKKEEPER-602.diff


 Currently we only have a readTimeout on the netty channel. It times out only 
 when there is no activity on that channel, so it can't track timeouts of 
 individual requests. If a channel keeps having read entry activity, it might 
 mask a slow add entry response, which hurts add latency.



[jira] [Commented] (BOOKKEEPER-632) AutoRecovery should consider read only bookies

2013-07-25 Thread Sijie Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719300#comment-13719300
 ] 

Sijie Guo commented on BOOKKEEPER-632:
--

Please add a timeout option. I guess we need to modify the precommit hook to check 
each '@Test' and ensure a timeout is provided.

 AutoRecovery should consider read only bookies
 --

 Key: BOOKKEEPER-632
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-632
 Project: Bookkeeper
  Issue Type: Bug
  Components: bookkeeper-auto-recovery, bookkeeper-server
Affects Versions: 4.2.1, 4.3.0
Reporter: Vinay
Assignee: Vinay
 Fix For: 4.2.2, 4.3.0

 Attachments: BOOKKEEPER-632.patch, BOOKKEEPER-632.patch, 
 BOOKKEEPER-632.patch, BOOKKEEPER-632.patch, BOOKKEEPER-632.patch


 Autorecovery Auditor should consider the readonly bookies as Available 
 Bookies  while publishing the under-replicated ledgers.
 Also AutoRecoveryDaemon should shutdown if the local bookie is readonly



[jira] [Created] (BOOKKEEPER-661) Turn readonly back to writable if spaces are reclaimed.

2013-07-25 Thread Sijie Guo (JIRA)
Sijie Guo created BOOKKEEPER-661:


 Summary: Turn readonly back to writable if spaces are reclaimed.
 Key: BOOKKEEPER-661
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-661
 Project: Bookkeeper
  Issue Type: Improvement
  Components: bookkeeper-server
Reporter: Sijie Guo
Assignee: Sijie Guo
 Fix For: 4.3.0


We should be able to turn a bookie from readonly back to writable if space 
is reclaimed.



[jira] [Created] (BOOKKEEPER-662) Major GC should kick in immediately if remaining space reaches a warning threshold

2013-07-25 Thread Sijie Guo (JIRA)
Sijie Guo created BOOKKEEPER-662:


 Summary: Major GC should kick in immediately if remaining space 
reaches a warning threshold
 Key: BOOKKEEPER-662
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-662
 Project: Bookkeeper
  Issue Type: Improvement
  Components: bookkeeper-server
Reporter: Sijie Guo
Assignee: Aniruddha
 Fix For: 4.3.0


In a high throughput case, major GC should kick in immediately if the remaining 
space reaches a warning threshold.



[jira] [Updated] (BOOKKEEPER-632) AutoRecovery should consider read only bookies

2013-07-25 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BOOKKEEPER-632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated BOOKKEEPER-632:
-

Attachment: BOOKKEEPER-632.patch

Added timeout

 AutoRecovery should consider read only bookies
 --

 Key: BOOKKEEPER-632
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-632
 Project: Bookkeeper
  Issue Type: Bug
  Components: bookkeeper-auto-recovery, bookkeeper-server
Affects Versions: 4.2.1, 4.3.0
Reporter: Vinay
Assignee: Vinay
 Fix For: 4.2.2, 4.3.0

 Attachments: BOOKKEEPER-632.patch, BOOKKEEPER-632.patch, 
 BOOKKEEPER-632.patch, BOOKKEEPER-632.patch, BOOKKEEPER-632.patch, 
 BOOKKEEPER-632.patch


 Autorecovery Auditor should consider the readonly bookies as Available 
 Bookies  while publishing the under-replicated ledgers.
 Also AutoRecoveryDaemon should shutdown if the local bookie is readonly



[jira] [Commented] (BOOKKEEPER-648) BasicJMSTest failed

2013-07-25 Thread Mridul Muralidharan (JIRA)

[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719324#comment-13719324
 ] 

Mridul Muralidharan commented on BOOKKEEPER-648:



On going over the code again, the testcase is extremely simple - it tests the 
sync and async api.

a) Create one publisher and two subscriber sessions for the topic (one via sync 
api, another async api of jms).

b) Send 4 messages through publisher session.

c) Sleep for 10 ms to ensure all messages have been sent - this should not be 
required, but hedwig sometimes takes a lot of time.

d) For sync test, wait for 100 ms for each message to be received : null is 
returned if it times out without receiving any message.
The number of times receive is called is equal to the number of times we sent 
messages - the test expects in-order-delivery without loss of messages as per 
hedwig api contract.
This is what is failing : we are not receiving all the messages we sent.

e) The async session listener does the same as (d) - except on the async 
listener : this did not get tested in this specific case due to validation 
failure in (d).
Looking more, we should probably add a sleep before checking the 
'messageCount.getValue() != CHAT_MESSAGES.length' condition - in case async 
listener is still running in parallel : though this is not the failure we are 
observing ...



Assuming you noticed the same assertion stacktrace in each case when it failed, 
it means no message was received before timeout in sync invocation.
This can be due to :

1) Hedwig or bookkeeper is inordinately slow for some reason (slow hdd, full 
/tmp, low memory, TLB thrashing, etc.?): in which case, simply bumping up the 
sleep time and receive timeout param will circumvent the issue.

2) There is some bug somewhere in the chain which is causing message drops - 
either at publish time or while sending it to subscribers or somewhere else ?

To get additional debugging info, there are log messages in jms module : but 
(particularly for this testcase) the jms module is a thin wrapper delegating to 
corresponding hedwig client api - so enabling debug there would be more helpful.
Actually, I would validate if the server actually sent messages to both the 
subscribers and they were received by the client - if yes, rest would be a 
client side bug (hedwig client or jms).


If you could reproduce the issue with debug logging enabled for root logger, I 
can definitely help narrow down the issue with those logs !
Unfortunately, there are almost no testcases in hedwig client : so I am not 
sure what design or implementation changes happened in client (or server ?) - 
since I am not keeping track of bookkeeper/hedwig anymore.


 BasicJMSTest failed
 ---

 Key: BOOKKEEPER-648
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-648
 Project: Bookkeeper
  Issue Type: Bug
  Components: hedwig-client
Reporter: Flavio Junqueira
Assignee: Mridul Muralidharan

 While running tests, I got once a failure for this hedwig-client-jms test: 
 BasicJMSTest.



[jira] [Commented] (BOOKKEEPER-648) BasicJMSTest failed

2013-07-25 Thread Mridul Muralidharan (JIRA)

[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719348#comment-13719348
 ] 

Mridul Muralidharan commented on BOOKKEEPER-648:



Flavio, I am not sure what the expectation of this bug is - is it an appropriate 
debug message for this testcase?
As I detailed above, the only message possible is: no message was received 
before timeout. Unfortunately, I don't think that is going to help us much in 
debugging it.
The reason I did not add a descriptive message for every assertion in the tests 
is that the corresponding JMS APIs describe the error conditions in detail 
(for example, when a null can be returned from receive is covered in the JMS 
API javadocs).


To actually debug/fix the issue, we will need to enable debug logging in the 
server, the client api and the jms module.
Subsequently, when the issue is observed, we will need to trace whether the message 
was actually sent to the server, whether the server dispatched it to both subscribers, 
whether the client api received it, and whether it was dispatched to jms.
The jms module does nothing different from any other api user of hedwig - 
barring bugs in it, of course :-)

This is an assertion which is typically not expected to fail unless there is 
something broken elsewhere which is causing message loss.

 BasicJMSTest failed
 ---

 Key: BOOKKEEPER-648
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-648
 Project: Bookkeeper
  Issue Type: Bug
  Components: hedwig-client
Reporter: Flavio Junqueira
Assignee: Mridul Muralidharan

 While running tests, I got once a failure for this hedwig-client-jms test: 
 BasicJMSTest.



[jira] [Commented] (BOOKKEEPER-632) AutoRecovery should consider read only bookies

2013-07-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719445#comment-13719445
 ] 

Hadoop QA commented on BOOKKEEPER-632:
--

Testing JIRA BOOKKEEPER-632


Patch 
[BOOKKEEPER-632.patch|https://issues.apache.org/jira/secure/attachment/12594120/BOOKKEEPER-632.patch]
 downloaded at Thu Jul 25 09:17:47 UTC 2013



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 
120
.{color:green}+1{color} the patch does adds/modifies 2 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:green}+1 FINDBUGS{color}
.{color:green}+1{color} the patch does not seem to introduce new Findbugs 
warnings
{color:green}+1 TESTS{color}
.Tests run: 866
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:green}*+1 Overall result, good!, no -1s*{color}


The full output of the test-patch run is available at

.   https://builds.apache.org/job/bookkeeper-trunk-precommit-build/438/

 AutoRecovery should consider read only bookies
 --

 Key: BOOKKEEPER-632
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-632
 Project: Bookkeeper
  Issue Type: Bug
  Components: bookkeeper-auto-recovery, bookkeeper-server
Affects Versions: 4.2.1, 4.3.0
Reporter: Vinay
Assignee: Vinay
 Fix For: 4.2.2, 4.3.0

 Attachments: BOOKKEEPER-632.patch, BOOKKEEPER-632.patch, 
 BOOKKEEPER-632.patch, BOOKKEEPER-632.patch, BOOKKEEPER-632.patch, 
 BOOKKEEPER-632.patch


 Autorecovery Auditor should consider the readonly bookies as Available 
 Bookies  while publishing the under-replicated ledgers.
 Also AutoRecoveryDaemon should shutdown if the local bookie is readonly



[jira] [Commented] (BOOKKEEPER-604) Ledger storage can log an exception if GC happens concurrently.

2013-07-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719470#comment-13719470
 ] 

Hadoop QA commented on BOOKKEEPER-604:
--

Testing JIRA BOOKKEEPER-604


Patch 
[BOOKKEEPER-604.diff|https://issues.apache.org/jira/secure/attachment/12594107/BOOKKEEPER-604.diff]
 downloaded at Thu Jul 25 09:44:58 UTC 2013



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 
120
.{color:green}+1{color} the patch does adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:green}+1 FINDBUGS{color}
.{color:green}+1{color} the patch does not seem to introduce new Findbugs 
warnings
{color:green}+1 TESTS{color}
.Tests run: 861
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:green}*+1 Overall result, good!, no -1s*{color}


The full output of the test-patch run is available at

.   https://builds.apache.org/job/bookkeeper-trunk-precommit-build/439/

 Ledger storage can log an exception if GC happens concurrently.
 ---

 Key: BOOKKEEPER-604
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-604
 Project: Bookkeeper
  Issue Type: Bug
Reporter: Ivan Kelly
Assignee: Ivan Kelly
 Fix For: 4.2.2, 4.3.0

 Attachments: 
 0001-BOOKKEEPER-604-Ledger-storage-can-log-an-exception-i.patch, 
 0001-BOOKKEEPER-604-Ledger-storage-can-log-an-exception-i.patch, 
 0001-BOOKKEEPER-604-Ledger-storage-can-log-an-exception-i.patch, 
 0001-BOOKKEEPER-604-Ledger-storage-can-log-an-exception-i.patch, 
 BOOKKEEPER-604.diff, BOOKKEEPER-604.diff


 If a ledger is flushing and, part way through, GC kicks in, it can delete the 
 index file before we try to flush it.



[jira] [Commented] (BOOKKEEPER-596) Ledgers are gc'ed by mistake in MSLedgerManagerFactory.

2013-07-25 Thread Sijie Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720342#comment-13720342
 ] 

Sijie Guo commented on BOOKKEEPER-596:
--

[~merlimat]

I think I found the root cause. It is a bug introduced in this patch, where 
Ivan refactored the LedgerRange.

In HierarchicalLedgerManager, subSet is misused: its upper bound is exclusive, 
so the last ledger id is excluded. 
http://docs.oracle.com/javase/6/docs/api/java/util/SortedSet.html#subSet(E, E)
{code}
return new LedgerRange(zkActiveLedgers.subSet(getStartLedgerIdByLevel(level1, level2),
                                              getEndLedgerIdByLevel(level1, level2)));
{code}

So the last ledger in each level would be gc'ed. It is easy to reproduce and 
fix this issue.
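
To illustrate the off-by-one: SortedSet.subSet(from, to) includes from but excludes to, while the four-argument NavigableSet.subSet lets both bounds be inclusive. The sketch below assumes ledger ids are Long values held in a TreeSet. SubSetSketch and its helpers are hypothetical names, and the closed-range variant is only one possible way to include the last id, not necessarily the fix that was committed.

```java
import java.util.NavigableSet;
import java.util.SortedSet;

class SubSetSketch {
    // Half-open range [start, end): a ledger whose id equals end is dropped.
    static SortedSet<Long> halfOpen(NavigableSet<Long> ids, long start, long end) {
        return ids.subSet(start, end);
    }

    // Closed range [start, end]: both bounds are inclusive.
    static SortedSet<Long> closed(NavigableSet<Long> ids, long start, long end) {
        return ids.subSet(start, true, end, true);
    }
}
```

For a set holding ids 0, 5000 and 9999, halfOpen(ids, 0, 9999) returns only 0 and 5000, so ledger 9999 looks inactive to the garbage collector; closed(ids, 0, 9999) returns all three.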

 Ledgers are gc'ed by mistake in MSLedgerManagerFactory.
 ---

 Key: BOOKKEEPER-596
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-596
 Project: Bookkeeper
  Issue Type: Bug
Affects Versions: 4.2.0, 4.2.1
Reporter: Sijie Guo
Assignee: Sijie Guo
Priority: Blocker
 Fix For: 4.2.2, 4.3.0

 Attachments: 
 0001-BOOKKEEPER-596-Ledgers-are-gc-ed-by-mistake-in-MSLed.patch, 
 0001-BOOKKEEPER-596-Ledgers-are-gc-ed-by-mistake-in-MSLed.patch, 
 BOOKKEEPER-596.patch, BOOKKEEPER-596.patch, BOOKKEEPER-596.patch


 details: 
 https://issues.apache.org/jira/browse/BOOKKEEPER-590?focusedCommentId=13616397&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13616397



[jira] [Commented] (BOOKKEEPER-654) Bookkeeper client operations are allowed even after its closure, bk#close()

2013-07-25 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720403#comment-13720403
 ] 

Rakesh R commented on BOOKKEEPER-654:
-

Thanks [~hustlmsp] for the comments. A few clarifications:
bq.1. in OrderedSafeExecutor, why not just catch the rejected exception rather 
than adding an extra boolean flag. since this flag doesn't avoid throwing 
rejected exception.

The ThreadPoolExecutor Javadoc says a task will be rejected when the Executor 
has been shut down, and also when the Executor uses finite bounds for both 
maximum threads and work queue capacity, and is saturated.

ThreadPoolExecutor.java
{code}
public void execute(Runnable command) {
    // ...
    reject(command); // is shutdown or saturated
}
{code}

I've added the flag to tell the user the actual cause (either bk.close() or 
some other reason). Otherwise we need to iterate over 'executor.threads[i]' 
and check whether each one is shut down, as below. Is there a better way to 
handle this?
OrderedSafeExecutor.java
{code}
for (int i = 0; i < executor.threads.length; i++) {
    if (executor.threads[i].isShutdown()) {
        safeOperationComplete(BKException.Code.BkClientClosedException, result);
        return;
    }
}
{code}

bq.2. in LedgerOpenOp, why we need #readComplete here? an unscheduled 
speculative task doesn't affect any logic. 
If we throw the exception back to the caller (the callers are 
LedgerRecovery#doRecoveryRead() and LedgerHandle.asyncReadEntries()), the callers 
need to duplicate the logic of handling the exception and returning the 
BkClientClosedException. What's your opinion?

Also, I need to remove the 'if(bk.bookieClient.isClosed())' check added in 
LedgerHandle.asyncReadEntries(); it's not required.



 Bookkeeper client operations are allowed even after its closure, bk#close()
 ---

 Key: BOOKKEEPER-654
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-654
 Project: Bookkeeper
  Issue Type: Bug
  Components: bookkeeper-client
Affects Versions: 4.2.0
Reporter: Rakesh R
Assignee: Rakesh R
 Fix For: 4.2.2, 4.3.0

 Attachments: 0001-BOOKKEEPER-654-testcase-to-understand-more.patch, 
 0002-BOOKKEEPER-654.patch, 0003-BOOKKEEPER-654.patch


 User can perform below operations with the closed bookkeeper client, which 
 was instantiated with external zkclient.
 - open a closed ledger 
 - create a new ledger 
 Also, ledgerhandle operations like fencing/add/write are infinitely hanging.



[jira] [Commented] (BOOKKEEPER-654) Bookkeeper client operations are allowed even after its closure, bk#close()

2013-07-25 Thread Sijie Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720414#comment-13720414
 ] 

Sijie Guo commented on BOOKKEEPER-654:
--

 The ThreadPoolExecutor JavaDoc says that new tasks submitted in 
 execute(Runnable) will be rejected when the Executor has been shut down, and 
 also when the Executor uses finite bounds for both maximum threads and work 
 queue capacity, and is saturated.

But you still need to catch the exception, as I commented. A shutdown flag 
can't prevent a task from being submitted to a scheduler that has already been 
shut down. That's the point.
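
A minimal sketch of that point, assuming a plain java.util.concurrent executor rather than BookKeeper's OrderedSafeExecutor. The class ShutdownAwareSubmit, the Callback interface, and the CLIENT_CLOSED value are hypothetical stand-ins, not the real BKException codes or client API. It only shows that the rejection has to be caught at submission time and turned into an error for the callback.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

final class ShutdownAwareSubmit {
    // Hypothetical return codes; the real values are defined by BKException.Code.
    static final int OK = 0;
    static final int CLIENT_CLOSED = -1;

    /** Hypothetical callback, loosely modelled on safeOperationComplete(rc, result). */
    interface Callback {
        void operationComplete(int rc);
    }

    /**
     * Submits the task. If the executor rejects it, the callback is completed
     * with CLIENT_CLOSED instead of letting the exception escape to the caller.
     */
    static void submit(ExecutorService executor, Runnable task, Callback cb) {
        try {
            executor.execute(task);
        } catch (RejectedExecutionException e) {
            // A flag checked before execute() could still race with shutdown(),
            // so the rejection has to be handled here in any case.
            // Note: a saturated bounded executor would also end up here.
            cb.operationComplete(CLIENT_CLOSED);
        }
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.shutdown();
        submit(executor,
               () -> System.out.println("task ran"),
               rc -> System.out.println("callback rc=" + rc));
    }
}
```

With this shape the callers do not need their own try/catch; they only see the return code passed to the callback.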


 If we throw the exception back to the caller (the callers are 
 LedgerRecovery#doRecoveryRead() and LedgerHandle.asyncReadEntries()), the 
 callers need to duplicate the logic of handling the exception and returning 
 the BkClientClosedException. What's your opinion?

My point is that a failed speculative task doesn't affect anything, since the 
original read request would fail anyway because the bookie client is closed.

The isClosed check is also not necessary. The errors are already propagated 
from either the worker pool callback or the bookie client.




[jira] [Updated] (BOOKKEEPER-602) we should have request timeouts rather than channel timeout in PerChannelBookieClient

2013-07-25 Thread Sijie Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/BOOKKEEPER-602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sijie Guo updated BOOKKEEPER-602:
-

Attachment: BOOKKEEPER-602.diff

Added the missing config in TestDeadLock to fix the failed test.

 we should have request timeouts rather than channel timeout in 
 PerChannelBookieClient
 -

 Key: BOOKKEEPER-602
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-602
 Project: Bookkeeper
  Issue Type: Bug
Affects Versions: 4.2.0, 4.2.1
Reporter: Sijie Guo
Assignee: Aniruddha
 Fix For: 4.3.0

 Attachments: BOOKKEEPER-602.diff, BOOKKEEPER-602.diff


 Currently we only have a readTimeout on the netty channel. It fires only when 
 there is no activity at all on that channel, so it can't track timeouts of 
 individual requests. If a channel keeps seeing read entry activity, that 
 activity can mask a slow add entry response, which hurts add latency.
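
 A minimal sketch of per-request timeout tracking, under the assumption that each outstanding request is registered with its own deadline and a periodic sweep fails the ones that are overdue. The class RequestTimeoutTracker and its methods are hypothetical and are not taken from PerChannelBookieClient or the attached patch. The clock is passed in explicitly so the behaviour is easy to follow.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class RequestTimeoutTracker {
    /** Deadline and timeout action for one outstanding request. */
    private static final class Pending {
        final long deadlineNanos;
        final Runnable onTimeout;

        Pending(long deadlineNanos, Runnable onTimeout) {
            this.deadlineNanos = deadlineNanos;
            this.onTimeout = onTimeout;
        }
    }

    private final Map<Long, Pending> pending = new ConcurrentHashMap<>();

    /** Records a request together with its individual deadline. */
    void register(long requestId, long nowNanos, long timeoutNanos, Runnable onTimeout) {
        pending.put(requestId, new Pending(nowNanos + timeoutNanos, onTimeout));
    }

    /** Called when a response arrives; returns false if the request had already timed out. */
    boolean complete(long requestId) {
        return pending.remove(requestId) != null;
    }

    /** Fails every request whose deadline has passed, regardless of other channel activity. */
    int expire(long nowNanos) {
        int expired = 0;
        for (Map.Entry<Long, Pending> e : pending.entrySet()) {
            Pending p = e.getValue();
            // remove(key, value) ensures only one thread fires the timeout for a request.
            if (nowNanos - p.deadlineNanos >= 0 && pending.remove(e.getKey(), p)) {
                p.onTimeout.run();
                expired++;
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        RequestTimeoutTracker tracker = new RequestTimeoutTracker();
        tracker.register(1L, 0L, 5L, () -> System.out.println("add entry 1 timed out"));
        tracker.register(2L, 3L, 5L, () -> System.out.println("read entry 2 timed out"));
        tracker.complete(2L); // the read response keeps the channel "active"
        System.out.println("expired: " + tracker.expire(6L));
    }
}
```

 In the example the read response arrives in time, so a channel-level idle timeout would not fire, yet the slow add entry request is still failed by its own deadline.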



[jira] [Updated] (BOOKKEEPER-429) Provide separate read and write threads in the bookkeeper server

2013-07-25 Thread Sijie Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/BOOKKEEPER-429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sijie Guo updated BOOKKEEPER-429:
-

Attachment: BOOKKEEPER-429.diff

addressed comment 3).

 Provide separate read and write threads in the bookkeeper server
 

 Key: BOOKKEEPER-429
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-429
 Project: Bookkeeper
  Issue Type: Improvement
  Components: bookkeeper-server
Affects Versions: 4.2.0
Reporter: Aniruddha
Assignee: Aniruddha
 Fix For: 4.3.0

 Attachments: BK-429.patch, BOOKKEEPER-429.diff, BOOKKEEPER-429.diff


 The current bookkeeper server is single threaded. The same thread handles 
 reads and writes. When reads are slow (possibly because of excessive seeks), 
 add entry operations suffer in terms of latencies. Providing separate read 
 and write threads helps in reducing add entry latencies and increasing 
 throughput even when we're facing slow reads. Having a single read thread 
 also results in low disk utilization because seeks can't be ordered 
 efficiently by the OS. Multiple read threads would help in improving the read 
 throughput. 
 Discussion on this can be found at 
 http://mail-archives.apache.org/mod_mbox/zookeeper-bookkeeper-dev/201209.mbox/%3ccaolhydqpzn-v10zynfwud_h0qzrxtmjgttx7a9eofohyyty...@mail.gmail.com%3e
 Reviewboard : https://reviews.apache.org/r/7560/
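
 A minimal sketch of the idea, assuming reads and writes are simply dispatched to two independent thread pools. The class SplitReadWriteDispatcher and its methods are hypothetical and do not reflect the attached patch or the bookie server code. The example blocks a read on a latch to stand in for a slow seek and shows that a write still completes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class SplitReadWriteDispatcher {
    private final ExecutorService readPool;
    private final ExecutorService writePool;

    SplitReadWriteDispatcher(int readThreads, int writeThreads) {
        this.readPool = Executors.newFixedThreadPool(readThreads);
        this.writePool = Executors.newFixedThreadPool(writeThreads);
    }

    /** Read requests only ever occupy read threads. */
    Future<?> submitRead(Runnable read) {
        return readPool.submit(read);
    }

    /** Write (add entry) requests run on their own threads. */
    Future<?> submitWrite(Runnable write) {
        return writePool.submit(write);
    }

    void shutdown() {
        readPool.shutdown();
        writePool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        SplitReadWriteDispatcher dispatcher = new SplitReadWriteDispatcher(1, 1);
        CountDownLatch slowDisk = new CountDownLatch(1);

        // A read stuck on a slow seek occupies the only read thread.
        Future<?> read = dispatcher.submitRead(() -> {
            try {
                slowDisk.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // The write is not queued behind the read because it uses a separate pool.
        dispatcher.submitWrite(() -> System.out.println("add entry done"))
                  .get(5, TimeUnit.SECONDS);
        System.out.println("read still pending: " + !read.isDone());

        slowDisk.countDown();
        read.get(5, TimeUnit.SECONDS);
        dispatcher.shutdown();
    }
}
```

 Using more than one read thread would additionally let the OS see several outstanding seeks at once, which is the disk utilization point made above.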



[jira] [Updated] (BOOKKEEPER-164) Add checksumming for ledger index files

2013-07-25 Thread Sijie Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/BOOKKEEPER-164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sijie Guo updated BOOKKEEPER-164:
-

Fix Version/s: (was: 4.3.0)

 Add checksumming for ledger index files
 ---

 Key: BOOKKEEPER-164
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-164
 Project: Bookkeeper
  Issue Type: Improvement
  Components: bookkeeper-server
Affects Versions: 4.0.0
Reporter: Sijie Guo
Assignee: Sijie Guo

 Bookie ledger index files currently lack checksumming to guard against 
 truncation/corruption. If a ledger index file is truncated, it still appears 
 to work but returns a wrong response when reading last confirmed.
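
 A minimal sketch of how a checksum can expose truncation, assuming a CRC32 trailer appended after the payload. The class ChecksummedIndex and this layout are hypothetical and are not the bookie's actual index file format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

final class ChecksummedIndex {
    private static final int TRAILER_BYTES = Long.BYTES;

    /** Returns the payload followed by an 8-byte trailer holding its CRC32. */
    static byte[] seal(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + TRAILER_BYTES)
                .put(payload)
                .putLong(crc.getValue())
                .array();
    }

    /** Returns true only if the trailer matches the CRC32 of the bytes before it. */
    static boolean verify(byte[] sealed) {
        if (sealed.length < TRAILER_BYTES) {
            return false; // too short to even contain a trailer
        }
        int payloadLength = sealed.length - TRAILER_BYTES;
        CRC32 crc = new CRC32();
        crc.update(sealed, 0, payloadLength);
        long stored = ByteBuffer.wrap(sealed, payloadLength, TRAILER_BYTES).getLong();
        return stored == crc.getValue();
    }

    public static void main(String[] args) {
        byte[] sealed = seal("ledger-index-page".getBytes(StandardCharsets.UTF_8));
        byte[] truncated = Arrays.copyOf(sealed, sealed.length - 3);
        System.out.println("intact verifies: " + verify(sealed));
        System.out.println("truncated verifies: " + verify(truncated));
    }
}
```

 A reader that calls verify() before trusting the file can refuse to answer a last confirmed query from a damaged index instead of returning a wrong value.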
