[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719324#comment-13719324
 ] 

Mridul Muralidharan commented on BOOKKEEPER-648:
------------------------------------------------


On going over the code again, the testcase is extremely simple - it tests the 
sync and async api.

a) Create one publisher and two subscriber sessions for the topic (one via sync 
api, another async api of jms).

b) Send 4 messages through publisher session.

c) Sleep for 10 ms to ensure all messages have been sent - this should not be 
required, but hedwig sometimes takes a lot of time.

d) For sync test, wait for 100 ms for each message to be received : null is 
returned if it times out without receiving any message.
The number of times receive is called is equal to the number of times we sent 
messages - the test expects in-order-delivery without loss of messages as per 
hedwig api contract.
This is what is failing : we are not receiving all the messages we sent.

e) The async session listener does the same as (d) - except on the async 
listener : this did not get tested in this specific case due to validation 
failure in (d).
Looking more, we should probably add a sleep before checking the 
'messageCount.getValue() != CHAT_MESSAGES.length' condition - in case async 
listener is still running in parallel : though this is not the failure we are 
observing ...



Assuming you noticed the same assertion stacktrace in each case when it failed, 
it means no message was received before timeout in sync invocation.
This can be due to :

1) Hedwig or bookkeeper is inordinately slow for some reasons (slow hdd, filled 
up /tmp, low mem, tlb thrashing, etc ?) : in which case, simply bumping up the 
sleep time and receive timeout param will circumvent the issue.

2) There is some bug somewhere in the chain which is causing message drops - 
either at publish time or while sending it to subscribers or somewhere else ?

To get additional debugging info, there are log messages in jms module : but 
(particularly for this testcase) the jms module is a thin wrapper delegating to 
corresponding hedwig client api - so enabling debug there would be more helpful.
Actually, I would validate if the server actually sent messages to both the 
subscribers and they were received by the client - if yes, rest would be a 
client side bug (hedwig client or jms).


If you could reproduce the issue with debug logging enabled for root logger, I 
can definitely help narrow down the issue with those logs !
Unfortunately, there are almost no testcases in hedwig client : so I am not 
sure what design or implementation changes happened in client (or server ?) - 
since I am not keeping track of bookkeeper/hedwig anymore.

                
> BasicJMSTest failed
> -------------------
>
>                 Key: BOOKKEEPER-648
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-648
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: hedwig-client
>            Reporter: Flavio Junqueira
>            Assignee: Mridul Muralidharan
>
> While running tests, I got once a failure for this hedwig-client-jms test: 
> BasicJMSTest.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to