On Thu, Dec 22, 2011 at 10:13:13AM -0600, Daniel S. Kim wrote:
> I thought the messages persisted in the bookies even after someone consumes
> it. I had one test topic with one publisher and one subscriber. I published
> about 5 messages to the topic. I subscribed and consumed messages from my
> listener, which just prints out the message along with its sequence number.
> When I get rid of this listener and start another one, this new listener
> will get all previous messages from the topic. How is this possible if
> messages are not being piled up somewhere (bookies)? Does the hub keep all
> the messages? I am somewhat confused how consuming messages get rid of old
> messages. In my thought, they persisted in the bookies. Correct me if I am
> wrong.

The cleanup is lazy. Messages are stored in ledgers, one ledger per topic. To
clean up a message, you have to delete the ledger, so obviously we can't do it
for each individual message. Have a look at MessageConsumedTask in
hedwig-server/src/main/java/org/apache/hedwig/server/subscriptions/AbstractSubscriptionManager.java
This runs every minute by default.
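
To illustrate the general idea of lazy cleanup, here is a rough sketch. It is
not the Hedwig code: LazyCleanupSketch, Segment, consume and cleanupPass are
all made-up names, and it assumes storage is split into units that can only be
deleted whole. Consuming only moves a per-subscriber pointer; a separate pass,
which would be run periodically, deletes the units every subscriber has
already passed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical model of lazy cleanup; none of these names come from Hedwig.
class LazyCleanupSketch {
    // A range of sequence ids that is stored together and deleted together.
    static final class Segment {
        final long firstSeqId;
        final long lastSeqId;

        Segment(long firstSeqId, long lastSeqId) {
            this.firstSeqId = firstSeqId;
            this.lastSeqId = lastSeqId;
        }
    }

    private final List<Segment> segments = new ArrayList<>();
    // Highest sequence id each subscriber has reported as consumed.
    private final Map<String, Long> consumedUpTo = new HashMap<>();

    void addSegment(long firstSeqId, long lastSeqId) {
        segments.add(new Segment(firstSeqId, lastSeqId));
    }

    // Consuming only records a pointer; nothing is deleted here.
    void consume(String subscriberId, long seqId) {
        consumedUpTo.merge(subscriberId, seqId, Long::max);
    }

    // Intended to run periodically. Deletes every segment that all known
    // subscribers have fully consumed and returns how many were deleted.
    int cleanupPass() {
        if (consumedUpTo.isEmpty()) {
            return 0;
        }
        long minConsumed = Collections.min(consumedUpTo.values());
        int deleted = 0;
        Iterator<Segment> it = segments.iterator();
        while (it.hasNext()) {
            if (it.next().lastSeqId <= minConsumed) {
                it.remove();
                deleted++;
            }
        }
        return deleted;
    }

    int segmentCount() {
        return segments.size();
    }

    public static void main(String[] args) {
        LazyCleanupSketch topic = new LazyCleanupSketch();
        topic.addSegment(1, 5);
        topic.addSegment(6, 10);
        topic.consume("sub-a", 7);
        // Consumed messages are still stored until the next cleanup pass.
        System.out.println("before cleanup: " + topic.segmentCount());
        topic.cleanupPass();
        System.out.println("after cleanup: " + topic.segmentCount());
    }
}
```

In this model a subscriber that shows up between two cleanup passes can still
read data that others have already consumed, which matches the behaviour
described above.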

That said, you shouldn't be getting the consumed messages again if you are
using the same subscription id. If the subscription id is different, then
you're getting the messages which haven't been cleaned up yet, which is fine,
as you will also get all messages after that in order.

>
> Also I would like to contribute by adding delete method (if it is possible)
> and topic eviction, etc. However, I feel that I need to study its system,
> but I am not seeing very much information at
> http://zookeeper.apache.org/bookkeeper/docs/trunk/hedwigDesign.html. Is
> there any other design documentation with more details? Where is the best
> place to learn how hedwig is built without 100% digging through codes?

There are some extra docs in the wiki:
https://cwiki.apache.org/confluence/display/BOOKKEEPER/HedWig

-Ivan
