On Thu, Dec 22, 2011 at 10:13:13AM -0600, Daniel S. Kim wrote:
> I thought the messages persisted in the bookies even after someone consumes
> it. I had one test topic with one publisher and one subscriber. I published
> about 5 messages to the topic. I subscribed and consumed messages from my
> listener, which just prints out the message along with its sequence number.
> When I get rid of this listener and start another one, this new listener
> will get all previous messages from the topic. How is this possible if
> messages are not being piled up somewhere (bookies)? Does the hub keep all
> the messages? I am somewhat confused how consuming messages get rid of old
> messages. In my thought, they persisted in the bookies. Correct me if I am
> wrong.
The cleanup is lazy. Messages are stored in ledgers, one ledger per
topic. To clean up a message, you have to delete the ledger, so
obviously we can't do it for each individual message. Have a look at
MessageConsumedTask in
hedwig-server/src/main/java/org/apache/hedwig/server/subscriptions/AbstractSubscriptionManager.java
 
This runs every minute by default.
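
As a rough illustration of the lazy cleanup idea (this is not the actual
Hedwig code; the class LazyLedgerCleaner and all of its methods are made
up for this sketch), a periodic task could delete a ledger only once
every subscriber has consumed past its last message. The sketch assumes
a topic's messages can end up spread over more than one ledger over
time, and that each subscriber reports the highest sequence number it
has consumed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Hedwig code: every name here is an assumption.
class LazyLedgerCleaner {
    // A ledger holding the inclusive sequence-number range [firstSeq, lastSeq].
    static final class LedgerRange {
        final long ledgerId;
        final long firstSeq;
        final long lastSeq;

        LedgerRange(long ledgerId, long firstSeq, long lastSeq) {
            this.ledgerId = ledgerId;
            this.firstSeq = firstSeq;
            this.lastSeq = lastSeq;
        }
    }

    private final List<LedgerRange> ledgers = new ArrayList<>();
    // Highest sequence number each subscriber has reported as consumed.
    private final Map<String, Long> consumedUpTo = new HashMap<>();

    void addLedger(long ledgerId, long firstSeq, long lastSeq) {
        ledgers.add(new LedgerRange(ledgerId, firstSeq, lastSeq));
    }

    // Consuming only moves a pointer; nothing is deleted here.
    void consume(String subscriberId, long seq) {
        consumedUpTo.merge(subscriberId, seq, Long::max);
    }

    // Meant to be called periodically. Deletes whole ledgers whose last
    // message has been consumed by every known subscriber, and returns
    // the ids of the ledgers it deleted.
    List<Long> cleanup() {
        List<Long> deleted = new ArrayList<>();
        if (consumedUpTo.isEmpty()) {
            return deleted; // no subscribers known, keep everything
        }
        long minConsumed = Collections.min(consumedUpTo.values());
        Iterator<LedgerRange> it = ledgers.iterator();
        while (it.hasNext()) {
            LedgerRange range = it.next();
            if (range.lastSeq <= minConsumed) {
                deleted.add(range.ledgerId);
                it.remove();
            }
        }
        return deleted;
    }
}
```

With two ledgers covering messages 1-5 and 6-10, the first ledger is
only removed once the slowest subscriber has consumed message 5, no
matter how far ahead the others are. Until the next cleanup run, the
consumed messages are still sitting in the ledger.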

That said, you shouldn't be getting the consumed messages again if you
are using the same subscription id. If the subscription id is
different, then you're getting the messages which haven't been cleaned
up yet, which is fine as you will also get all messages after that in
order.

> 
> Also I would like to contribute by adding delete method (if it is possible)
> and topic eviction, etc. However, I feel that I need to study its system,
> but I am not seeing very much information at
> http://zookeeper.apache.org/bookkeeper/docs/trunk/hedwigDesign.html. Is
> there any other design documentation with more details? Where is the best
> place to learn how hedwig is built without 100% digging through codes?
There are some extra docs on the wiki:
https://cwiki.apache.org/confluence/display/BOOKKEEPER/HedWig

-Ivan
