i'm not sure cross-posting is such a good idea; it is probably better to discuss these comparisons from each project's own viewpoint.

the problem space is similar, and i imagine as kafka develops the implementations may become even closer.

there are a couple of obvious things:

1) hedwig has strong durability guarantees. kafka can lose data due to failures.
2) hedwig was designed for lots of topics (100,000s) with low fan-out (few subscribers/publishers). i think kafka is designed for a few topics with lots of subscribers and publishers.
3) hedwig tracks subscribers' progress for gc of publishes. kafka uses time-based gc.
4) hedwig will replay messages to subscribers starting from the last message they explicitly consumed. kafka allows subscribers to replay messages that they have already consumed.
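
to make points 3 and 4 concrete, here is a minimal sketch of the two gc policies on a toy in-memory topic. everything in it (ToyTopic, the method names, the sequence numbers and timestamps) is hypothetical and made up for illustration; it is not the hedwig or kafka api, and the real systems are certainly more involved.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTopic:
    """Hypothetical in-memory topic; NOT the real Hedwig or Kafka API."""
    messages: list = field(default_factory=list)  # (seq, timestamp, payload)
    consumed: dict = field(default_factory=dict)  # subscriber -> last consumed seq
    next_seq: int = 0

    def publish(self, payload, now):
        self.messages.append((self.next_seq, now, payload))
        self.next_seq += 1

    def consume(self, subscriber, seq):
        # Record that `subscriber` has explicitly consumed everything up to `seq`.
        self.consumed[subscriber] = max(seq, self.consumed.get(subscriber, -1))

    def gc_by_subscriber_progress(self):
        # Progress-based policy: drop only what every subscriber has consumed.
        if not self.consumed:
            return
        slowest = min(self.consumed.values())
        self.messages = [m for m in self.messages if m[0] > slowest]

    def gc_by_age(self, now, retention):
        # Time-based policy: drop anything older than `retention`,
        # regardless of whether anyone has consumed it.
        self.messages = [m for m in self.messages if now - m[1] < retention]

    def replay_from(self, seq):
        # Return payloads with sequence number >= seq that are still retained.
        return [m[2] for m in self.messages if m[0] >= seq]


def make_topic():
    t = ToyTopic()
    for ts, payload in [(0, "a"), (10, "b"), (20, "c")]:
        t.publish(payload, now=ts)
    return t


# Progress-based gc: s2 has only consumed seq 0, so "b" and "c" must be kept.
by_progress = make_topic()
by_progress.consume("s1", 2)
by_progress.consume("s2", 0)
by_progress.gc_by_subscriber_progress()

# Time-based gc: only messages younger than 10 time units survive,
# even though nobody has consumed anything.
by_age = make_topic()
by_age.gc_by_age(now=25, retention=10)
```

with progress-based gc a slow subscriber pins old messages, while with time-based gc a subscriber can re-read anything still inside the retention window but may miss messages that have aged out.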

there are probably others.

i really like the kafka design choices made for 3 and 4. hedwig will work on scaling to more subscribers/publishers per topic. i imagine, if needed, kafka will work on its durability guarantees and support for a large number of topics.

ben

On 02/10/2011 01:27 AM, Thomas Koch wrote:
Flavio Junqueira:
Thomas, Did you mean to say Hedwig instead of BookKeeper?
Oh sh..ugar yeah. Thanks. Start over again:

I've just had a look at the kafka slides[1] from January HUG. It seems to me,
that Hedwig[2] and kafka are quite similar in their problem space. Is that
so? What are notable differences?
(Kafka is written in scala and therefore must be a lot cooler :-)

[1] <http://developer.yahoo.com/blogs/hadoop/posts/2011/02/hadoop-user-group-january-2011-recap/>
[2] http://cwiki.apache.org/confluence/display/ZOOKEEPER/HedWig

Thomas Koch, http://www.koch.ro
