This is the case where you are reading messages from a broker, doing something 
with them, and then committing the results to some permanent storage such as 
HBase. There is a race condition in committing the offsets to Zookeeper: if 
the DB write succeeds but the ZK commit fails for any reason, you'll get a 
duplicate batch the next time you query the broker. If you commit to ZK first 
and the DB write then fails, you lose data.
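
To make the trade-off concrete, here's a minimal sketch of the two orderings. 
The method names (writeBatchToDb, commitOffsetToZk) are hypothetical stand-ins 
for the HBase write and the Zookeeper offset commit, not real Kafka or HBase 
API calls:

import java.util.List;

public class OffsetCommitRace {

    // Order A: write to the DB first, then commit the offset to ZK.
    // If the offset commit fails after the write succeeds, the same batch
    // is re-read from the broker on the next run -> duplicates.
    static void writeThenCommit(List<String> batch, long nextOffset) {
        writeBatchToDb(batch);
        commitOffsetToZk(nextOffset); // failure here => duplicate batch later
    }

    // Order B: commit the offset to ZK first, then write to the DB.
    // If the DB write fails after the offset commit succeeds, the batch is
    // never persisted and never re-read -> data loss.
    static void commitThenWrite(List<String> batch, long nextOffset) {
        commitOffsetToZk(nextOffset);
        writeBatchToDb(batch); // failure here => batch is lost
    }

    static void writeBatchToDb(List<String> batch) { /* e.g. HBase puts */ }

    static void commitOffsetToZk(long offset) { /* e.g. ZK znode update */ }
}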

The Kafka white paper mentions that Kafka stays agnostic about the distributed 
commit problem. There has been some prior discussion about this, but I haven't 
seen any solid solutions. If you're using something like PostgreSQL that 
supports transactions spanning your data and your offsets, you can roll the 
offset into the DB transaction, assuming you're okay with storing offsets in 
the DB rather than in ZK, but that's not a general solution.
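
For what it's worth, a rough sketch of that PostgreSQL variant, with the 
offset kept in the DB so the payload and the offset commit atomically in one 
transaction. The table and column names (events, consumer_offsets, 
partition_id, next_offset) are made up for illustration:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class TransactionalOffsetStore {

    // Persists the processed payload and the new offset in a single DB
    // transaction, so either both are stored or neither is.
    static void storeWithOffset(Connection conn, String topic, int partitionId,
                                long nextOffset, String payload) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement insert = conn.prepareStatement(
                 "INSERT INTO events (payload) VALUES (?)");
             PreparedStatement update = conn.prepareStatement(
                 "UPDATE consumer_offsets SET next_offset = ? "
                 + "WHERE topic = ? AND partition_id = ?")) {
            insert.setString(1, payload);
            insert.executeUpdate();

            update.setLong(1, nextOffset);
            update.setString(2, topic);
            update.setInt(3, partitionId);
            update.executeUpdate();

            conn.commit();   // payload and offset become visible together
        } catch (SQLException e) {
            conn.rollback(); // neither the payload nor the offset is persisted
            throw e;
        }
    }
}

On restart the consumer would read next_offset back from the same table and 
resume fetching from there, rather than trusting whatever is in ZK.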

Is there anything in Kafka 0.8.x that helps address this issue?

--Darren Sargent
RichRelevance (www.richrelevance.com)
