Hi Taylor,

I've added a mention of Kafka's lack of an index to the client/driver
doc, since it might confuse new users. I'll include your methods on how
to cope when I write more end-user documentation.

FWIW, we ended up going with option 1, storing the history in a DB.
Unlike your N-messages need, our need was primarily time based
("re-process all the messages received from time X to time Y", where X
and Y may be separated by hours). In that respect, we'll be quite happy
when this one gets implemented:

https://issues.apache.org/jira/browse/KAFKA-87

Please pardon the lack of updates to the doc in the past week. I haven't
abandoned it -- we just really need to get ZooKeeper-aware
producers/consumers working properly in brod, and that's where much of
my time has gone in the last week.

Thank you.

Dave

On Thu, Dec 1, 2011 at 10:22 PM, Taylor Gautier <tgaut...@tagged.com> wrote:
> One thing we should make clear somewhere is that while Kafka has a history
> mechanism, it doesn't provide an index.
>
> I probably moved forward in my implementation (and selection) to use Kafka
> for 3-4 weeks before realizing that I would not be able to efficiently
> query Kafka for the N-1000th message.
>
> This was nearly a deal killer for us, but there are several available
> workarounds/solutions:
>
>    - Keep the history somewhere outside of Kafka, e.g. in a DB, memcache,
>    in memory, whatever, if you need to rewind N messages. This kind of
>    assumes you have clients that are always making forward progress and
>    working against the Kafka stream. If you have ephemeral clients that
>    come and go, and don't have history with the stream, it doesn't work
>    so well
>    - Make a minor modification to Kafka to have it implement a reverse
>    linked list - where each message also stores the offset of the previous
>    message
>    - Make a medium change to Kafka to have it store an index of message
>    offsets in a secondary topic
>
> We went with option #3...
>
> On Tue, Nov 29, 2011 at 9:06 AM, David Ormsbee <d...@datadoghq.com> wrote:
>
> > Hi Taylor,
> >
> > Yeah, Joe brought up the need for this distinction as well.
> > When I move the doc over to the wiki, I'll try to consistently use
> > "driver" to clear up ambiguities. The higher-level, client-oriented
> > bits are really just there for context, to explain why the network
> > protocol is what it is. Things like the fetch and offsets requests are
> > much easier to explain if you show how they connect to the
> > implementation in the back. I wanted to create a single document that
> > would take people 90% of the way to writing a driver while assuming
> > minimal prior knowledge, because it's the document I really wish I had
> > last month.
> >
> > I always intended to write a separate document that would more
> > comprehensively cover how to use our Python driver, but I imagine that
> > part will vary substantially from one implementation to the next. I
> > haven't started on that one yet just because our driver's API likely
> > won't stabilize for another couple of weeks.
> >
> > Thank you.
> >
> > Dave
> >
> >
> > On Tue, Nov 29, 2011 at 10:40 AM, Taylor Gautier <tgaut...@tagged.com>
> > wrote:
> > > Just wanted to add my $0.02 - I'm glad David wrote this - excellent
> > > job sir!
> > >
> > > My comment is this (I think it might have already been mentioned,
> > > however I will re-iterate it): the document as is covers two
> > > audiences - those that are writing Kafka "drivers" and those that are
> > > writing clients that publish and consume to Kafka (using a "driver").
> > > Most of the document is geared for the former, however there are some
> > > bits that are meant for or are useful also to the latter.
> > >
> > > I would like to suggest that we split the document up and address each
> > > audience separately. As great as it is that David wrote a lot of great
> > > information for the "driver" writers, the need for that will slowly
> > > decline, as the drivers slowly become more available and more stable
> > > (there's only so many languages in the world).
> > >
> > > On the other hand, people will be writing their own "clients" using
> > > the drivers far more often, so the latter audience will, assuming
> > > Kafka becomes wildly successful, increase in need. Beefing up this
> > > part of the document, by focusing on that audience, will be incredibly
> > > useful to new adopters.
> > >
> > > Incidentally, it might behoove us as a community to have strong
> > > language that separates these two activities. I used "driver" and
> > > "client" - I am not necessarily advocating for these terms but rather
> > > just that there is a need for terms that are distinct - it is
> > > important to separate the concepts using language/syntax so that
> > > people do not get confused.
> > >
> > > On Tue, Nov 29, 2011 at 7:27 AM, David Ormsbee <d...@datadoghq.com>
> > > wrote:
> > >
> > >> Hi Jay,
> > >>
> > >> > 1. Would you be willing to add this to the kafka wiki so we could
> > >> > make this the official howto doc?
> > >>
> > >> Absolutely.
> > >>
> > >> > 2. It might be good to add a "how to contribute your client"
> > >> > section. This would be hard to write right now because we haven't
> > >> > given anyone any guidelines for doing it. We have been pretty
> > >> > liberal in accepting clients kind of proceeding on the "something
> > >> > is better than nothing" theory. But this leads to clients of mixed
> > >> > quality and little documentation, as you and Joe noted. I will
> > >> > break this into a separate thread to broaden the discussion.
> > >>
> > >> I'll be happy to add it as soon as we have consensus on what the
> > >> guidelines should be.
> > >>
> > >> Thank you.
> > >>
> > >> Dave
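
Option 1 from the thread (keeping the history outside of Kafka) can be
sketched roughly as follows. This is only an illustration under stated
assumptions, not code from Kafka or brod: the OffsetHistory class, its
method names, and the table layout are all hypothetical, and SQLite
stands in for whatever DB is actually used. The idea is to record a
(timestamp, offset) pair for every message consumed, so that both the
time-based lookup ("from time X to time Y") and the "N messages back"
lookup become plain queries against the side store.

```python
import sqlite3


class OffsetHistory:
    """Hypothetical side index of (timestamp, offset) pairs kept outside Kafka."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        # seq preserves arrival order; ts is indexed for time-range queries.
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS history ("
            " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
            " ts REAL NOT NULL,"
            " msg_offset INTEGER NOT NULL)"
        )
        self.db.execute(
            "CREATE INDEX IF NOT EXISTS history_ts ON history (ts)"
        )

    def record(self, ts, msg_offset):
        """Remember that the message at msg_offset was received at time ts."""
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT INTO history (ts, msg_offset) VALUES (?, ?)",
                (ts, msg_offset),
            )

    def offsets_between(self, start_ts, end_ts):
        """Offsets of messages received from start_ts to end_ts, inclusive."""
        rows = self.db.execute(
            "SELECT msg_offset FROM history"
            " WHERE ts >= ? AND ts <= ? ORDER BY seq",
            (start_ts, end_ts),
        )
        return [row[0] for row in rows]

    def offset_n_back(self, n):
        """Offset of the message n positions before the latest (0 = latest)."""
        row = self.db.execute(
            "SELECT msg_offset FROM history"
            " ORDER BY seq DESC LIMIT 1 OFFSET ?",
            (n,),
        ).fetchone()
        return row[0] if row else None
```

The offsets returned would then be handed to the driver's fetch request;
that step is left out here because it depends on the driver in use. As
noted in the thread, this only helps clients that have been consuming
the stream all along and recording as they go.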