Today, the only safe way to control consumer state is to use the SimpleConsumer, where the application is responsible for checkpointing offsets itself. So, in your example, when you commit a database transaction, you can store your consumer's offset as part of that same transaction: either the transaction succeeds and the offset moves ahead, or it fails and the offset stays where it is.
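For concreteness, here is a minimal sketch of that pattern in plain JDBC. The class, table, and column names are hypothetical, and the SimpleConsumer fetch loop is elided; the only point it illustrates is that the payload insert and the offset update share a single database transaction.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical sketch: commits each processed message and the consumer's
// offset in one transaction, so they succeed or fail together.
public class TransactionalOffsetSink {

    private final Connection conn;

    public TransactionalOffsetSink(String jdbcUrl) throws SQLException {
        conn = DriverManager.getConnection(jdbcUrl);
        conn.setAutoCommit(false); // we manage transaction boundaries ourselves
    }

    // Read the last committed offset; the consumer resumes fetching at offset + 1.
    // Returns -1 if this (topic, partition) has never been consumed.
    public long lastCommittedOffset(String topic, int partition) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT kafka_offset FROM consumer_offsets" +
                " WHERE topic = ? AND partition_id = ?")) {
            ps.setString(1, topic);
            ps.setInt(2, partition);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getLong(1) : -1L;
            }
        }
    }

    // Write one processed message and advance the offset atomically.
    // Assumes one row per (topic, partition) was seeded into consumer_offsets.
    public void store(String topic, int partition, long offset, byte[] payload)
            throws SQLException {
        try (PreparedStatement insert = conn.prepareStatement(
                 "INSERT INTO events (payload) VALUES (?)");
             PreparedStatement advance = conn.prepareStatement(
                 "UPDATE consumer_offsets SET kafka_offset = ?" +
                 " WHERE topic = ? AND partition_id = ?")) {
            insert.setBytes(1, payload);
            insert.executeUpdate();
            advance.setLong(1, offset);
            advance.setString(2, topic);
            advance.setInt(3, partition);
            advance.executeUpdate();
            conn.commit();   // both writes become visible together...
        } catch (SQLException e) {
            conn.rollback(); // ...or neither does, and the batch is re-fetched
            throw e;
        }
    }
}

On restart, the consumer calls lastCommittedOffset() and resumes fetching from offset + 1, so a crash between the DB commit and anything else simply replays the uncommitted batch; you get at-least-once delivery with no lost data.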
Kafka 0.9 is when we will attempt to merge the high-level and low-level consumer APIs, move offset management to the broker, and offer stronger offset-checkpointing guarantees.

Thanks,
Neha

On Mon, Mar 25, 2013 at 11:36 AM, Darren Sargent <dsarg...@richrelevance.com> wrote:
> This is where you are reading messages from a broker, doing something with
> the messages, then committing them to some permanent storage such as HBase.
> There is a race condition in committing the offsets to ZooKeeper: if the DB
> write succeeds but the ZK commit fails for any reason, you'll get a
> duplicate batch the next time you query the broker. If you commit to ZK
> first and the commit to the DB then fails, you lose data.
>
> The Kafka white paper mentions that Kafka stays agnostic about the
> distributed commit problem. There has been some prior discussion about
> this, but I haven't seen any solid solutions. If you're using something
> like PostgreSQL, which supports two-phase commit, you can roll the offset
> into the DB transaction, assuming you're okay with storing offsets in the
> DB rather than in ZK, but that's not a general solution.
>
> Is there anything in Kafka 0.8.x that helps address this issue?
>
> --Darren Sargent
> RichRelevance (www.richrelevance.com)