[jira] [Updated] (BOOKKEEPER-164) Add checksumming for ledger index files

2013-07-25 Thread Sijie Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/BOOKKEEPER-164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sijie Guo updated BOOKKEEPER-164:
-

Fix Version/s: (was: 4.3.0)

> Add checksumming for ledger index files
> ---
>
> Key: BOOKKEEPER-164
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-164
> Project: Bookkeeper
>  Issue Type: Improvement
>  Components: bookkeeper-server
>Affects Versions: 4.0.0
>Reporter: Sijie Guo
>Assignee: Sijie Guo
>
> Currently, bookie ledger index files lack checksums to detect truncation or
> corruption. If a ledger index file is truncated, it still appears to work but
> returns a wrong response to read-last-confirmed requests.
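Here is a minimal sketch of how a checksum could catch the truncation described above. It assumes a hypothetical layout in which each index payload is stored after its length and its CRC32. The class ChecksummedIndex and this layout are for illustration only; they are not BookKeeper's actual index file format.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical on-disk layout (illustrative only):
// [payload length (long)][CRC32 of payload (long)][payload bytes]
final class ChecksummedIndex {

    static long crc(byte[] data, int off, int len) {
        CRC32 crc = new CRC32();
        crc.update(data, off, len);
        return crc.getValue();
    }

    // Prefixes an index payload with its length and checksum.
    static byte[] encode(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(16 + payload.length);
        buf.putLong(payload.length);
        buf.putLong(crc(payload, 0, payload.length));
        buf.put(payload);
        return buf.array();
    }

    // Returns true only if the stored bytes are complete and uncorrupted.
    static boolean verify(byte[] stored) {
        if (stored.length < 16) {
            return false; // truncated inside the header
        }
        ByteBuffer buf = ByteBuffer.wrap(stored);
        long length = buf.getLong();
        long expectedCrc = buf.getLong();
        if (length != stored.length - 16) {
            return false; // truncated (or padded) payload
        }
        return crc(stored, 16, (int) length) == expectedCrc;
    }
}
```

The explicit length distinguishes a truncated file from one that simply holds fewer entries. The CRC32 catches bit-level corruption.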

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (BOOKKEEPER-521) Move metastore and versioning package to bookkeeper-common module

2013-07-25 Thread Sijie Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/BOOKKEEPER-521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sijie Guo updated BOOKKEEPER-521:
-

Fix Version/s: (was: 4.3.0)

> Move metastore and versioning package to bookkeeper-common module
> -
>
> Key: BOOKKEEPER-521
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-521
> Project: Bookkeeper
>  Issue Type: Task
>Reporter: Sijie Guo
>
> It would be better to move the versioning and metastore packages, and other
> common code, to a separate module, 'bookkeeper-common'. The classes in this
> module could then be shared across bookkeeper-server and hedwig.



[jira] [Commented] (BOOKKEEPER-648) BasicJMSTest failed

2013-07-25 Thread Sijie Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720434#comment-13720434
 ] 

Sijie Guo commented on BOOKKEEPER-648:
--

// d) For sync test, wait for 100 ms for each message to be received : null is 
returned if it times out without receiving any message.

I think a more reliable way to receive the message is to poll in a while loop
until the message is received. Otherwise, the test may be affected by slow
bookies.
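Here is a sketch of the polling approach. It assumes a hypothetical helper that wraps a single bounded receive and retries it until an overall deadline passes. For JMS, that single receive would be MessageConsumer.receive(long), which returns null on timeout.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongFunction;

final class PollingReceive {

    // Hypothetical helper: calls receiveOnce(pollMs) repeatedly until it returns
    // a non-null message or the overall deadline passes. receiveOnce stands in
    // for a JMS-style receive(timeout) that returns null when it times out.
    static <T> T receiveUntil(LongFunction<T> receiveOnce, long pollMs, long deadlineMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(deadlineMs);
        while (System.nanoTime() - deadline < 0) {
            T msg = receiveOnce.apply(pollMs);
            if (msg != null) {
                return msg;
            }
        }
        return null; // overall deadline exceeded without a message
    }
}
```

Only the overall deadline bounds the test. A briefly slow bookie therefore no longer turns into a spurious null.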

> BasicJMSTest failed
> ---
>
> Key: BOOKKEEPER-648
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-648
> Project: Bookkeeper
>  Issue Type: Bug
>  Components: hedwig-client
>Reporter: Flavio Junqueira
>Assignee: Mridul Muralidharan
>
> While running tests, I once got a failure in this hedwig-client-jms test:
> BasicJMSTest.



[jira] [Updated] (BOOKKEEPER-429) Provide separate read and write threads in the bookkeeper server

2013-07-25 Thread Sijie Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/BOOKKEEPER-429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sijie Guo updated BOOKKEEPER-429:
-

Attachment: BOOKKEEPER-429.diff

Addressed comment 3).

> Provide separate read and write threads in the bookkeeper server
> 
>
> Key: BOOKKEEPER-429
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-429
> Project: Bookkeeper
>  Issue Type: Improvement
>  Components: bookkeeper-server
>Affects Versions: 4.2.0
>Reporter: Aniruddha
>Assignee: Aniruddha
> Fix For: 4.3.0
>
> Attachments: BK-429.patch, BOOKKEEPER-429.diff, BOOKKEEPER-429.diff
>
>
> The current bookkeeper server is single threaded: the same thread handles
> both reads and writes. When reads are slow (possibly because of excessive
> seeks), add entry latencies suffer. Providing separate read and write threads
> helps reduce add entry latencies and increase throughput even when reads are
> slow. Having a single read thread also results in low disk utilization,
> because the OS can't order seeks efficiently. Multiple read threads would
> help improve read throughput.
> Discussion on this can be found at 
> http://mail-archives.apache.org/mod_mbox/zookeeper-bookkeeper-dev/201209.mbox/%3ccaolhydqpzn-v10zynfwud_h0qzrxtmjgttx7a9eofohyyty...@mail.gmail.com%3e
> Reviewboard : https://reviews.apache.org/r/7560/
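Here is a minimal sketch of the separation proposed above. It uses a hypothetical RequestDispatcher, not the bookie's actual request handling. Add-entry requests go to a single write thread, and reads go to a fixed pool.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical dispatcher: adds run on one dedicated write thread, reads on a pool.
final class RequestDispatcher implements AutoCloseable {
    private final ExecutorService writeExecutor =
            Executors.newSingleThreadExecutor(r -> new Thread(r, "bookie-write"));
    private final ExecutorService readExecutor;

    RequestDispatcher(int readThreads) {
        readExecutor = Executors.newFixedThreadPool(readThreads, r -> new Thread(r, "bookie-read"));
    }

    // Add-entry requests keep their order on the single write thread and
    // never queue behind slow reads.
    <T> Future<T> submitAdd(Callable<T> add) {
        return writeExecutor.submit(add);
    }

    // Read requests run concurrently, so several seeks can be outstanding.
    <T> Future<T> submitRead(Callable<T> read) {
        return readExecutor.submit(read);
    }

    @Override
    public void close() {
        writeExecutor.shutdown();
        readExecutor.shutdown();
    }
}
```

A single write thread preserves the ordering of adds. The read pool size controls how many seeks can be outstanding at once.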



[jira] [Updated] (BOOKKEEPER-602) we should have request timeouts rather than channel timeout in PerChannelBookieClient

2013-07-25 Thread Sijie Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/BOOKKEEPER-602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sijie Guo updated BOOKKEEPER-602:
-

Attachment: BOOKKEEPER-602.diff

Added the missing config in TestDeadLock to fix the failing test.

> we should have request timeouts rather than channel timeout in 
> PerChannelBookieClient
> -
>
> Key: BOOKKEEPER-602
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-602
> Project: Bookkeeper
>  Issue Type: Bug
>Affects Versions: 4.2.0, 4.2.1
>Reporter: Sijie Guo
>Assignee: Aniruddha
> Fix For: 4.3.0
>
> Attachments: BOOKKEEPER-602.diff, BOOKKEEPER-602.diff
>
>
> Currently we only have a readTimeout on the Netty channel. It fires only when
> there is no activity on that channel, so it can't track timeouts of
> individual requests. If a channel keeps having read entry activity, it might
> mask a slow add entry response, which badly impacts add latency.
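Here is a sketch of per-request timeouts. It assumes a hypothetical tracker that records a deadline for each outstanding request and is swept periodically. Times are passed in explicitly to keep the sketch testable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongConsumer;

// Hypothetical tracker: each outstanding request records its own deadline, so a
// slow add is detected even while read traffic keeps the channel busy.
final class RequestTimeoutTracker {
    private final long timeoutNanos;
    private final Map<Long, Long> deadlines = new ConcurrentHashMap<>();

    RequestTimeoutTracker(long timeoutNanos) {
        this.timeoutNanos = timeoutNanos;
    }

    void onSend(long requestId, long nowNanos) {
        deadlines.put(requestId, nowNanos + timeoutNanos);
    }

    // Returns false if the request had already been timed out.
    boolean onResponse(long requestId) {
        return deadlines.remove(requestId) != null;
    }

    // Called periodically (e.g. from a timer); fails every request past its deadline.
    List<Long> expire(long nowNanos, LongConsumer onTimeout) {
        List<Long> expired = new ArrayList<>();
        for (Map.Entry<Long, Long> e : deadlines.entrySet()) {
            if (nowNanos - e.getValue() >= 0 && deadlines.remove(e.getKey(), e.getValue())) {
                expired.add(e.getKey());
                onTimeout.accept(e.getKey());
            }
        }
        return expired;
    }
}
```

Because each request carries its own deadline, a steady stream of read responses on the channel no longer hides a stalled add.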



[jira] [Commented] (BOOKKEEPER-654) Bookkeeper client operations are allowed even after its closure, bk#close()

2013-07-25 Thread Sijie Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720414#comment-13720414
 ] 

Sijie Guo commented on BOOKKEEPER-654:
--

> From the ThreadPoolExecutor Javadoc: new tasks will be rejected when the
> Executor has been shut down, and also when the Executor uses finite bounds for
> both maximum threads and work queue capacity, and is saturated.

But you still need to catch the exception, as I commented. A shutdown flag can't
prevent tasks from being submitted to a scheduler that has already been shut
down. That's the point.
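Here is a sketch of catching the rejection instead of relying on a flag. It assumes a hypothetical callback-style API, with a stand-in error code in place of BkClientClosedException. With the unbounded queue used by Executors.newSingleThreadExecutor(), a RejectedExecutionException in practice means the executor was shut down.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.function.IntConsumer;

final class SafeSubmit {
    // Hypothetical codes; CLIENT_CLOSED stands in for BkClientClosedException.
    static final int OK = 0;
    static final int CLIENT_CLOSED = -1;

    // A flag checked before execute() still races with a concurrent shutdown(),
    // so the rejection itself is what has to be handled.
    static void submitOrFail(ExecutorService executor, Runnable task, IntConsumer callback) {
        try {
            executor.execute(() -> {
                task.run();
                callback.accept(OK);
            });
        } catch (RejectedExecutionException e) {
            callback.accept(CLIENT_CLOSED);
        }
    }
}
```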


> If we throw the exception back to the caller (the callers are
> LedgerRecovery#doRecoveryRead() and LedgerHandle.asyncReadEntries()), the
> callers need to duplicate the logic of handling the exception and returning
> BkClientClosedException. What's your opinion?

My point is that a failed speculative task doesn't affect anything, since the
original read request would fail anyway because the bookie client is closed.

The isClosed check is also unnecessary: the errors are already propagated from
either the worker pool callback or the bookie client.

> Bookkeeper client operations are allowed even after its closure, bk#close()
> ---
>
> Key: BOOKKEEPER-654
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-654
> Project: Bookkeeper
>  Issue Type: Bug
>  Components: bookkeeper-client
>Affects Versions: 4.2.0
>Reporter: Rakesh R
>Assignee: Rakesh R
> Fix For: 4.2.2, 4.3.0
>
> Attachments: 0001-BOOKKEEPER-654-testcase-to-understand-more.patch, 
> 0002-BOOKKEEPER-654.patch, 0003-BOOKKEEPER-654.patch
>
>
> A user can perform the following operations with a closed BookKeeper client
> that was instantiated with an external ZooKeeper client:
> - open a closed ledger
> - create a new ledger
> Also, LedgerHandle operations such as fencing/add/write hang indefinitely.
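Here is a minimal sketch of the behaviour the issue asks for. It uses a hypothetical client skeleton: once close() has run, new operations complete immediately with a client-closed error. As the comments on this issue point out, such a check alone still races with a concurrent close(), so rejected executor submissions need handling too.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.IntConsumer;

// Hypothetical client skeleton: after close(), new operations fail fast with an
// error code instead of being accepted or hanging.
final class ClosableClient {
    static final int OK = 0;
    static final int CLIENT_CLOSED = -1; // stand-in for a "client closed" error code

    private final AtomicBoolean closed = new AtomicBoolean(false);

    void asyncCreateLedger(IntConsumer cb) {
        if (closed.get()) {
            cb.accept(CLIENT_CLOSED);
            return;
        }
        // ... real ledger creation would happen here ...
        cb.accept(OK);
    }

    void close() {
        closed.set(true);
    }
}
```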



[jira] [Commented] (BOOKKEEPER-654) Bookkeeper client operations are allowed even after its closure, bk#close()

2013-07-25 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720403#comment-13720403
 ] 

Rakesh R commented on BOOKKEEPER-654:
-

Thanks [~hustlmsp] for the comments. A few clarifications:
bq.1. in OrderedSafeExecutor, why not just catch the rejected exception rather
than adding an extra boolean flag, since this flag doesn't avoid throwing the
rejected exception?

From the ThreadPoolExecutor Javadoc: new tasks will be rejected when the
Executor has been shut down, and also when the Executor uses finite bounds for
both maximum threads and work queue capacity, and is saturated.

ThreadPoolExecutor.java
{code}
public void execute(Runnable command) {
    // ... (other checks elided)
    reject(command); // is shutdown or saturated
    // ...
}
{code}

I've added the flag to tell the user the actual cause (either bk.close() or some
other reason). Otherwise we need to iterate over 'executor.threads[i]' and check
whether each one is shut down, as below. Is there a better way to handle this?
OrderedSafeExecutor.java
{code}
for (int i = 0; i < executor.threads.length; i++) {
    if (executor.threads[i].isShutdown()) {
        safeOperationComplete(BKException.Code.BkClientClosedException, result);
        return;
    }
}
{code}

bq.2. in LedgerOpenOp, why do we need #readComplete here? An unscheduled
speculative task doesn't affect any logic.

If we throw the exception back to the caller (the callers are
LedgerRecovery#doRecoveryRead() and LedgerHandle.asyncReadEntries()), the
callers need to duplicate the logic of handling the exception and returning
BkClientClosedException. What's your opinion?

Also, I need to remove the 'if(bk.bookieClient.isClosed())' check added in
LedgerHandle.asyncReadEntries(); it's not required.



> Bookkeeper client operations are allowed even after its closure, bk#close()
> ---
>
> Key: BOOKKEEPER-654
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-654
> Project: Bookkeeper
>  Issue Type: Bug
>  Components: bookkeeper-client
>Affects Versions: 4.2.0
>Reporter: Rakesh R
>Assignee: Rakesh R
> Fix For: 4.2.2, 4.3.0
>
> Attachments: 0001-BOOKKEEPER-654-testcase-to-understand-more.patch, 
> 0002-BOOKKEEPER-654.patch, 0003-BOOKKEEPER-654.patch
>
>
> A user can perform the following operations with a closed BookKeeper client
> that was instantiated with an external ZooKeeper client:
> - open a closed ledger
> - create a new ledger
> Also, LedgerHandle operations such as fencing/add/write hang indefinitely.



[jira] [Commented] (BOOKKEEPER-596) Ledgers are gc'ed by mistake in MSLedgerManagerFactory.

2013-07-25 Thread Sijie Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720342#comment-13720342
 ] 

Sijie Guo commented on BOOKKEEPER-596:
--

[~merlimat]

I think I found the root cause. It is a bug introduced in this patch, where
Ivan refactored LedgerRange.

In HierarchicalLedgerManager, subSet is misused: it excludes the last ledger
id.
http://docs.oracle.com/javase/6/docs/api/java/util/SortedSet.html#subSet(E, E)
{code}
return new LedgerRange(zkActiveLedgers.subSet(getStartLedgerIdByLevel(level1, level2),
        getEndLedgerIdByLevel(level1, level2)));
{code}

So the last ledger in each level would be gc'ed. This issue is easy to
reproduce and fix.
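Here is a small demonstration of the half-open semantics. It assumes, as the comment implies, that getEndLedgerIdByLevel returns the last ledger id of the level inclusively. Under that assumption, NavigableSet.subSet(from, true, to, true) is one way to keep that last id. The class and method names below are hypothetical.

```java
import java.util.NavigableSet;
import java.util.SortedSet;
import java.util.TreeSet;

final class LedgerRangeDemo {
    // SortedSet.subSet(from, to) is half-open, [from, to): the end id is dropped.
    static SortedSet<Long> halfOpen(TreeSet<Long> ids, long start, long end) {
        return ids.subSet(start, end);
    }

    // NavigableSet.subSet(from, true, to, true) is closed, [from, to]: the end id is kept.
    static NavigableSet<Long> inclusive(TreeSet<Long> ids, long start, long end) {
        return ids.subSet(start, true, end, true);
    }
}
```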

> Ledgers are gc'ed by mistake in MSLedgerManagerFactory.
> ---
>
> Key: BOOKKEEPER-596
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-596
> Project: Bookkeeper
>  Issue Type: Bug
>Affects Versions: 4.2.0, 4.2.1
>Reporter: Sijie Guo
>Assignee: Sijie Guo
>Priority: Blocker
> Fix For: 4.2.2, 4.3.0
>
> Attachments: 
> 0001-BOOKKEEPER-596-Ledgers-are-gc-ed-by-mistake-in-MSLed.patch, 
> 0001-BOOKKEEPER-596-Ledgers-are-gc-ed-by-mistake-in-MSLed.patch, 
> BOOKKEEPER-596.patch, BOOKKEEPER-596.patch, BOOKKEEPER-596.patch
>
>
> details: 
> https://issues.apache.org/jira/browse/BOOKKEEPER-590?focusedCommentId=13616397&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13616397



[jira] [Reopened] (BOOKKEEPER-596) Ledgers are gc'ed by mistake in MSLedgerManagerFactory.

2013-07-25 Thread Matteo Merli (JIRA)

 [ 
https://issues.apache.org/jira/browse/BOOKKEEPER-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Merli reopened BOOKKEEPER-596:
-


It seems the issue is not completely fixed by this patch; it may even be
worse.

I've verified that after a garbage collection run, some ledgers (whose
metadata is untouched in ZK) were deleted from the bookies. This eventually
triggers errors either when adding entries to these ledgers (infinite loop in
the client) or when reading (entries not found).

This was using the HierarchicalLedgerManager (not sure whether the issue applies
to all ledger managers).

I'm trying to isolate a simple way to reproduce the issue. For now, I think it's
more likely to happen when a large number of ledgers are deleted in a short time
and hence collected in the same gc cycle.

> Ledgers are gc'ed by mistake in MSLedgerManagerFactory.
> ---
>
> Key: BOOKKEEPER-596
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-596
> Project: Bookkeeper
>  Issue Type: Bug
>Affects Versions: 4.2.0, 4.2.1
>Reporter: Sijie Guo
>Assignee: Sijie Guo
>Priority: Blocker
> Fix For: 4.2.2, 4.3.0
>
> Attachments: 
> 0001-BOOKKEEPER-596-Ledgers-are-gc-ed-by-mistake-in-MSLed.patch, 
> 0001-BOOKKEEPER-596-Ledgers-are-gc-ed-by-mistake-in-MSLed.patch, 
> BOOKKEEPER-596.patch, BOOKKEEPER-596.patch, BOOKKEEPER-596.patch
>
>
> details: 
> https://issues.apache.org/jira/browse/BOOKKEEPER-590?focusedCommentId=13616397&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13616397



[jira] [Commented] (BOOKKEEPER-604) Ledger storage can log an exception if GC happens concurrently.

2013-07-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719470#comment-13719470
 ] 

Hadoop QA commented on BOOKKEEPER-604:
--

Testing JIRA BOOKKEEPER-604


Patch 
[BOOKKEEPER-604.diff|https://issues.apache.org/jira/secure/attachment/12594107/BOOKKEEPER-604.diff]
 downloaded at Thu Jul 25 09:44:58 UTC 2013



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 
120
.{color:green}+1{color} the patch does adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:green}+1 FINDBUGS{color}
.{color:green}+1{color} the patch does not seem to introduce new Findbugs 
warnings
{color:green}+1 TESTS{color}
.Tests run: 861
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:green}*+1 Overall result, good!, no -1s*{color}


The full output of the test-patch run is available at

.   https://builds.apache.org/job/bookkeeper-trunk-precommit-build/439/

> Ledger storage can log an exception if GC happens concurrently.
> ---
>
> Key: BOOKKEEPER-604
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-604
> Project: Bookkeeper
>  Issue Type: Bug
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Fix For: 4.2.2, 4.3.0
>
> Attachments: 
> 0001-BOOKKEEPER-604-Ledger-storage-can-log-an-exception-i.patch, 
> 0001-BOOKKEEPER-604-Ledger-storage-can-log-an-exception-i.patch, 
> 0001-BOOKKEEPER-604-Ledger-storage-can-log-an-exception-i.patch, 
> 0001-BOOKKEEPER-604-Ledger-storage-can-log-an-exception-i.patch, 
> BOOKKEEPER-604.diff, BOOKKEEPER-604.diff
>
>
> If a ledger is being flushed and GC kicks in partway through, GC can delete
> the index file before we try to flush it.
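Here is one possible way to avoid the race, sketched with a hypothetical IndexFlushCoordinator (not the actual patch). GC marks a ledger deleted under the same lock the flusher holds. A flush therefore either completes before the index file is removed, or notices the deletion and skips quietly.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical coordination between the flusher and GC: both take the same lock,
// so a flush never writes an index file that GC has already removed.
final class IndexFlushCoordinator {
    private final Object lock = new Object();
    private final Set<Long> deleted = new HashSet<>();

    // Returns true if the index was flushed, false if the ledger was gc'ed first.
    boolean flush(long ledgerId, Runnable writeIndex) {
        synchronized (lock) {
            if (deleted.contains(ledgerId)) {
                return false; // ledger gone: skip instead of logging an exception
            }
            writeIndex.run();
            return true;
        }
    }

    void gcDelete(long ledgerId, Runnable removeIndexFile) {
        synchronized (lock) {
            deleted.add(ledgerId);
            removeIndexFile.run();
        }
    }
}
```

A single lock serializes all flushes. A per-ledger lock would reduce contention at the cost of more bookkeeping.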



[jira] [Commented] (BOOKKEEPER-632) AutoRecovery should consider read only bookies

2013-07-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719445#comment-13719445
 ] 

Hadoop QA commented on BOOKKEEPER-632:
--

Testing JIRA BOOKKEEPER-632


Patch 
[BOOKKEEPER-632.patch|https://issues.apache.org/jira/secure/attachment/12594120/BOOKKEEPER-632.patch]
 downloaded at Thu Jul 25 09:17:47 UTC 2013



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 
120
.{color:green}+1{color} the patch does adds/modifies 2 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:green}+1 FINDBUGS{color}
.{color:green}+1{color} the patch does not seem to introduce new Findbugs 
warnings
{color:green}+1 TESTS{color}
.Tests run: 866
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:green}*+1 Overall result, good!, no -1s*{color}


The full output of the test-patch run is available at

.   https://builds.apache.org/job/bookkeeper-trunk-precommit-build/438/

> AutoRecovery should consider read only bookies
> --
>
> Key: BOOKKEEPER-632
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-632
> Project: Bookkeeper
>  Issue Type: Bug
>  Components: bookkeeper-auto-recovery, bookkeeper-server
>Affects Versions: 4.2.1, 4.3.0
>Reporter: Vinay
>Assignee: Vinay
> Fix For: 4.2.2, 4.3.0
>
> Attachments: BOOKKEEPER-632.patch, BOOKKEEPER-632.patch, 
> BOOKKEEPER-632.patch, BOOKKEEPER-632.patch, BOOKKEEPER-632.patch, 
> BOOKKEEPER-632.patch
>
>
> The AutoRecovery Auditor should consider read-only bookies as available
> bookies when publishing under-replicated ledgers.
> Also, AutoRecoveryDaemon should shut down if the local bookie is read-only.
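Here is a sketch of the auditor-side change. It assumes bookies are identified by hypothetical "host:port" strings. Read-only bookies still serve reads, so they count as available when deciding whether a ledger is under-replicated.

```java
import java.util.HashSet;
import java.util.Set;

final class AuditorSketch {
    // Hypothetical helper: read-only bookies are still available for reads.
    static Set<String> availableBookies(Set<String> writable, Set<String> readOnly) {
        Set<String> available = new HashSet<>(writable);
        available.addAll(readOnly);
        return available;
    }

    // A ledger is under-replicated if any bookie in its ensemble is unavailable.
    static boolean isUnderReplicated(Set<String> ensemble, Set<String> available) {
        return !available.containsAll(ensemble);
    }
}
```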



[jira] [Commented] (BOOKKEEPER-648) BasicJMSTest failed

2013-07-25 Thread Mridul Muralidharan (JIRA)

[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719348#comment-13719348
 ] 

Mridul Muralidharan commented on BOOKKEEPER-648:



Flavio, I am not sure what the expectation for this bug is: is it about adding
an appropriate debug message for this testcase?
As I detailed above, the only possible message is "no message was received
before timeout"; unfortunately, I don't think that is going to help us much in
debugging it.
The reason I did not add a descriptive message for every assertion in the tests
is that the corresponding JMS APIs document the error conditions in detail (for
example, the JMS API javadocs describe when receive can return null).


To actually debug/fix the issue, we will need to enable debug logging in the
server, the client API and the JMS module.
Then, when the issue is observed, we will need to trace whether the message was
actually sent to the server, whether the server dispatched it to both
subscribers, whether the client API received it, and whether it was dispatched
to JMS.
The JMS module does nothing different from any other API user of hedwig,
barring bugs in it, of course :-)

This assertion is typically not expected to fail unless something broken
elsewhere is causing message loss.

> BasicJMSTest failed
> ---
>
> Key: BOOKKEEPER-648
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-648
> Project: Bookkeeper
>  Issue Type: Bug
>  Components: hedwig-client
>Reporter: Flavio Junqueira
>Assignee: Mridul Muralidharan
>
> While running tests, I once got a failure in this hedwig-client-jms test:
> BasicJMSTest.



[jira] [Commented] (BOOKKEEPER-648) BasicJMSTest failed

2013-07-25 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719329#comment-13719329
 ] 

Flavio Junqueira commented on BOOKKEEPER-648:
-

Part of the issue is that we don't have enough log messages to determine what
went wrong. All I have is in the first comment of this jira, which is not
sufficient to track down the problem.

> BasicJMSTest failed
> ---
>
> Key: BOOKKEEPER-648
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-648
> Project: Bookkeeper
>  Issue Type: Bug
>  Components: hedwig-client
>Reporter: Flavio Junqueira
>Assignee: Mridul Muralidharan
>
> While running tests, I once got a failure in this hedwig-client-jms test:
> BasicJMSTest.



[jira] [Commented] (BOOKKEEPER-648) BasicJMSTest failed

2013-07-25 Thread Mridul Muralidharan (JIRA)

[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719324#comment-13719324
 ] 

Mridul Muralidharan commented on BOOKKEEPER-648:



Going over the code again, the testcase is extremely simple: it tests the sync
and async APIs.

a) Create one publisher and two subscriber sessions for the topic (one via the
sync API and the other via the async API of JMS).

b) Send 4 messages through the publisher session.

c) Sleep for 10 ms to ensure all messages have been sent. This should not be
required, but hedwig sometimes takes a lot of time.

d) For the sync test, wait up to 100 ms for each message to be received: null
is returned if the receive times out without getting any message.
The number of times receive is called equals the number of messages sent; the
test expects in-order delivery without message loss, as per the hedwig API
contract.
This is what is failing: we are not receiving all the messages we sent.

e) The async session listener does the same as (d), but on the async listener;
this was not exercised in this specific case because validation failed in (d).
Looking more closely, we should probably add a sleep before checking the
'messageCount.getValue() != CHAT_MESSAGES.length' condition, in case the async
listener is still running in parallel, though this is not the failure we are
observing...
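Instead of a fixed sleep before checking 'messageCount.getValue() != CHAT_MESSAGES.length', the test could wait on a latch that the async listener counts down once per message. Here is a sketch with a hypothetical CountingListener. In the real test, onMessage would be called from a JMS MessageListener.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical listener: counts down once per delivered message, so the test can
// wait for exactly the expected number instead of sleeping for a fixed time.
final class CountingListener {
    private final CountDownLatch remaining;

    CountingListener(int expectedMessages) {
        remaining = new CountDownLatch(expectedMessages);
    }

    // Would be invoked from MessageListener.onMessage(...) in the real test.
    void onMessage(Object message) {
        remaining.countDown();
    }

    // Returns true if all expected messages arrived before the timeout.
    boolean awaitAll(long timeout, TimeUnit unit) throws InterruptedException {
        return remaining.await(timeout, unit);
    }
}
```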



Assuming you saw the same assertion stack trace each time it failed, it means
no message was received before the timeout in the sync invocation.
This can be due to:

1) Hedwig or bookkeeper being inordinately slow for some reason (slow HDD, full
/tmp, low memory, TLB thrashing, etc.): in that case, simply increasing the
sleep time and the receive timeout parameter will work around the issue.

2) A bug somewhere in the chain that is causing message drops, either at publish
time, while sending to subscribers, or somewhere else.

To get additional debugging info, there are log messages in the jms module; but
(particularly for this testcase) the jms module is a thin wrapper delegating to
the corresponding hedwig client API, so enabling debug logging there would be
more helpful.
Actually, I would validate whether the server actually sent the messages to both
subscribers and whether the client received them; if so, the rest would be a
client-side bug (hedwig client or jms).


If you can reproduce the issue with debug logging enabled for the root logger,
I can definitely help narrow down the issue with those logs!
Unfortunately, there are almost no testcases in the hedwig client, so I am not
sure what design or implementation changes happened in the client (or server?),
since I am no longer keeping track of bookkeeper/hedwig.


> BasicJMSTest failed
> ---
>
> Key: BOOKKEEPER-648
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-648
> Project: Bookkeeper
>  Issue Type: Bug
>  Components: hedwig-client
>Reporter: Flavio Junqueira
>Assignee: Mridul Muralidharan
>
> While running tests, I once got a failure in this hedwig-client-jms test:
> BasicJMSTest.



[jira] [Updated] (BOOKKEEPER-632) AutoRecovery should consider read only bookies

2013-07-25 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BOOKKEEPER-632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated BOOKKEEPER-632:
-

Attachment: BOOKKEEPER-632.patch

Added a timeout.

> AutoRecovery should consider read only bookies
> --
>
> Key: BOOKKEEPER-632
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-632
> Project: Bookkeeper
>  Issue Type: Bug
>  Components: bookkeeper-auto-recovery, bookkeeper-server
>Affects Versions: 4.2.1, 4.3.0
>Reporter: Vinay
>Assignee: Vinay
> Fix For: 4.2.2, 4.3.0
>
> Attachments: BOOKKEEPER-632.patch, BOOKKEEPER-632.patch, 
> BOOKKEEPER-632.patch, BOOKKEEPER-632.patch, BOOKKEEPER-632.patch, 
> BOOKKEEPER-632.patch
>
>
> The AutoRecovery Auditor should consider read-only bookies as available
> bookies when publishing under-replicated ledgers.
> Also, AutoRecoveryDaemon should shut down if the local bookie is read-only.
