Hi Taylor,

I've added a mention of Kafka's lack of an index to the client/driver
doc, since it might confuse new users. I'll include your workarounds
when I write more end-user documentation.
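
As a rough illustration for the doc, the in-memory variant of your first workaround could be sketched like this. It is a minimal sketch, assuming a consumer that reads every message in log order and can report each message's starting offset. `OffsetHistory`, `record`, and `offset_n_back` are hypothetical names, not part of brod or any other Kafka driver:

```python
from collections import deque
from typing import Deque, Optional


class OffsetHistory:
    """Hypothetical client-side record of the offsets a consumer has seen.

    Kafka has no index for "the message N back", so the consumer remembers
    the starting offset of each message it reads. Only the most recent
    `capacity` offsets are kept; older ones fall off the front.
    """

    def __init__(self, capacity: int) -> None:
        self._offsets: Deque[int] = deque(maxlen=capacity)

    def record(self, offset: int) -> None:
        # Call once per consumed message, in log order.
        self._offsets.append(offset)

    def offset_n_back(self, n: int) -> Optional[int]:
        # n=0 is the most recent message; None means it's outside the window.
        if n < 0 or n >= len(self._offsets):
            return None
        return self._offsets[-1 - n]


# Usage: the offsets are made up and unevenly spaced, as they would be if
# messages vary in size.
history = OffsetHistory(capacity=1000)
for offset in (0, 120, 245, 390):
    history.record(offset)
print(history.offset_n_back(2))  # 120
```

As you pointed out, this only helps clients that have been following the stream all along; an ephemeral client starts with an empty history.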

FWIW, we ended up going with option 1, storing the history in a DB. Unlike
your need to rewind N messages, ours was primarily time-based ("re-process
all the messages received from time X to time Y", where X and Y may be
hours apart). In that respect, we'll be quite happy when this one gets
implemented:

   https://issues.apache.org/jira/browse/KAFKA-87
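
For the time-based case, the DB approach boils down to recording when each message arrived alongside its offset, then asking the DB which offsets fall between X and Y. The sketch below is illustrative only, not our actual schema: it uses Python's built-in sqlite3, assumes a single topic/partition and a consumer that reports each message's offset, and the table and function names (`message_history`, `record`, `offsets_between`) are hypothetical:

```python
import sqlite3
import time
from typing import List, Optional


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    # Hypothetical schema: one row per consumed message, recording when we
    # received it and where it lives in the Kafka log.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS message_history ("
        " received_at REAL NOT NULL,"
        " kafka_offset INTEGER NOT NULL)"
    )
    # Index the timestamp so range queries stay cheap as the table grows.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_received_at"
        " ON message_history (received_at)"
    )
    return conn


def record(conn: sqlite3.Connection, offset: int,
           received_at: Optional[float] = None) -> None:
    # Call once per consumed message; defaults to the current wall-clock time.
    if received_at is None:
        received_at = time.time()
    conn.execute(
        "INSERT INTO message_history (received_at, kafka_offset) VALUES (?, ?)",
        (received_at, offset),
    )
    conn.commit()


def offsets_between(conn: sqlite3.Connection,
                    start: float, end: float) -> List[int]:
    # Offsets of messages received in [start, end), oldest first. A
    # re-processing job can fetch from the first one and stop after the last.
    rows = conn.execute(
        "SELECT kafka_offset FROM message_history"
        " WHERE received_at >= ? AND received_at < ?"
        " ORDER BY received_at, kafka_offset",
        (start, end),
    ).fetchall()
    return [row[0] for row in rows]


# Usage with made-up timestamps and offsets:
conn = open_history()
for ts, off in [(100.0, 0), (200.0, 120), (300.0, 245), (400.0, 390)]:
    record(conn, off, received_at=ts)
print(offsets_between(conn, 150.0, 350.0))  # [120, 245]
```

A real deployment would likely also need a topic/partition column and some retention policy so the table doesn't outgrow the Kafka log it indexes.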

Please pardon the lack of updates to the doc in the past week. I haven't
abandoned it; we just really need to get ZooKeeper-aware
producers/consumers working properly in brod, and that's where most of my
time has gone.

Thank you.

Dave

On Thu, Dec 1, 2011 at 10:22 PM, Taylor Gautier <tgaut...@tagged.com> wrote:

> One thing we should make clear somewhere is that while Kafka has a history
> mechanism, it doesn't provide an index.
>
> I was probably 3-4 weeks into selecting and implementing Kafka before
> realizing that I would not be able to efficiently query Kafka for the
> N-1000th message.
>
> This was nearly a deal killer for us, but there are several available
> workarounds/solutions:
>
>   - Keep the history somewhere outside of Kafka (e.g. in a DB, memcache,
>     in memory, whatever) if you need to rewind N messages. This more or
>     less assumes you have clients that are always making forward progress
>     and working against the Kafka stream. If you have ephemeral clients
>     that come and go and don't have history with the stream, it doesn't
>     work so well.
>   - Make a minor modification to Kafka to have it implement a reverse
>     linked list, where each message also stores the offset of the
>     previous message.
>   - Make a medium change to Kafka to have it store an index of message
>     offsets in a secondary topic.
>
> We went with option #3...
>
> On Tue, Nov 29, 2011 at 9:06 AM, David Ormsbee <d...@datadoghq.com> wrote:
>
> > Hi Taylor,
> >
> > Yeah, Joe brought up the need for this distinction as well. When I
> > move the doc over to the wiki, I'll try to use "driver" consistently
> > to clear up ambiguities. The higher-level, client-oriented bits are
> > really just there for context, to explain why the network protocol is
> > what it is. Things like the fetch and offsets requests are much easier
> > to explain if you show how they connect to the implementation in the
> > back. I wanted to create a single document that would take people 90%
> > of the way to writing a driver while assuming minimal prior knowledge,
> > because it's the document I really wish I'd had last month.
> >
> > I always intended to write a separate document that would more
> > comprehensively cover how to use our Python driver, but I imagine that
> > part will vary substantially from one implementation to the next. I
> > haven't started on that one yet just because our driver's API likely
> > won't stabilize for another couple of weeks.
> >
> > Thank you.
> >
> > Dave
> >
> >
> > On Tue, Nov 29, 2011 at 10:40 AM, Taylor Gautier <tgaut...@tagged.com> wrote:
> > > Just wanted to add my $0.02: I'm glad David wrote this. Excellent job,
> > > sir!
> > >
> > > My comment is this (I think it might have already been mentioned, but I
> > > will reiterate it): the document as is covers two audiences, those that
> > > are writing Kafka "drivers" and those that are writing clients that
> > > publish to and consume from Kafka (using a "driver"). Most of the
> > > document is geared toward the former, but some bits are meant for, or
> > > are also useful to, the latter.
> > >
> > > I would like to suggest that we split the document up and address each
> > > audience separately. As great as it is that David wrote so much useful
> > > information for the "driver" writers, the need for that will slowly
> > > decline as drivers become more available and more stable (there are
> > > only so many languages in the world).
> > >
> > > On the other hand, people will be writing their own "clients" using the
> > > drivers far more often, so the latter audience will, assuming Kafka
> > > becomes wildly successful, keep growing. Beefing up this part of the
> > > document, by focusing on that audience, will be incredibly useful to
> > > new adopters.
> > >
> > > Incidentally, it might behoove us as a community to adopt strong
> > > language that separates these two activities. I used "driver" and
> > > "client"; I am not necessarily advocating for these terms, just for the
> > > need for distinct terms. It is important to separate the concepts in
> > > our language so that people do not get confused.
> > >
> > > On Tue, Nov 29, 2011 at 7:27 AM, David Ormsbee <d...@datadoghq.com> wrote:
> > >
> > >> Hi Jay,
> > >>
> > >> >   1. Would you be willing to add this to the Kafka wiki so we could
> > >> >   make this the official howto doc?
> > >>
> > >> Absolutely.
> > >>
> > >> >   2. It might be good to add a "how to contribute your client"
> > >> >   section. This would be hard to write right now because we haven't
> > >> >   given anyone any guidelines for doing it. We have been pretty
> > >> >   liberal in accepting clients, kind of proceeding on the "something
> > >> >   is better than nothing" theory. But this leads to clients of mixed
> > >> >   quality and little documentation, as you and Joe noted. I will
> > >> >   break this into a separate thread to broaden the discussion.
> > >>
> > >> I'll be happy to add it as soon as we have consensus on what the
> > >> guidelines should be.
> > >>
> > >> Thank you.
> > >>
> > >> Dave
> > >>
> > >
> >
>
