Today, the only safe way to control consumer state is to use the SimpleConsumer, where the application is responsible for checkpointing offsets itself. So, in your example, when you commit a database transaction, you can store your consumer's offset as part of that same transaction: either the transaction succeeds and the offset moves ahead, or it fails and the offset stays where it is.
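For concreteness, here is a minimal sketch of that pattern in plain JDBC. The class, table, and column names are hypothetical, and the SimpleConsumer fetch loop is elided; the only point it illustrates is that the payload insert and the offset update share a single database transaction.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical sketch: commits each processed message and the consumer's
// offset in one transaction, so they succeed or fail together.
public class TransactionalOffsetSink {

    private final Connection conn;

    public TransactionalOffsetSink(String jdbcUrl) throws SQLException {
        conn = DriverManager.getConnection(jdbcUrl);
        conn.setAutoCommit(false); // we manage transaction boundaries ourselves
    }

    // Read the last committed offset; the consumer resumes fetching at offset + 1.
    // Returns -1 if this (topic, partition) has never been consumed.
    public long lastCommittedOffset(String topic, int partition) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT kafka_offset FROM consumer_offsets" +
                " WHERE topic = ? AND partition_id = ?")) {
            ps.setString(1, topic);
            ps.setInt(2, partition);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getLong(1) : -1L;
            }
        }
    }

    // Write one processed message and advance the offset atomically.
    // Assumes one row per (topic, partition) was seeded into consumer_offsets.
    public void store(String topic, int partition, long offset, byte[] payload)
            throws SQLException {
        try (PreparedStatement insert = conn.prepareStatement(
                 "INSERT INTO events (payload) VALUES (?)");
             PreparedStatement advance = conn.prepareStatement(
                 "UPDATE consumer_offsets SET kafka_offset = ?" +
                 " WHERE topic = ? AND partition_id = ?")) {
            insert.setBytes(1, payload);
            insert.executeUpdate();
            advance.setLong(1, offset);
            advance.setString(2, topic);
            advance.setInt(3, partition);
            advance.executeUpdate();
            conn.commit();   // both writes become visible together...
        } catch (SQLException e) {
            conn.rollback(); // ...or neither does, and the batch is re-fetched
            throw e;
        }
    }
}

On restart, the consumer calls lastCommittedOffset() and resumes fetching from offset + 1, so a crash between the DB commit and anything else simply replays the uncommitted batch; you get at-least-once delivery with no lost data.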
Kafka 0.9 is when we will attempt to merge the high-level and low-level consumer APIs, move offset management to the broker, and offer stronger offset-checkpointing guarantees.

Thanks,
Neha

On Mon, Mar 25, 2013 at 11:36 AM, Darren Sargent <dsarg...@richrelevance.com> wrote:
> This is where you are reading messages from a broker, doing something with
> the messages, then committing them to some permanent storage such as HBase.
> There is a race condition in committing the offsets to ZooKeeper: if the DB
> write succeeds but the ZK commit fails for any reason, you'll get a
> duplicate batch the next time you query the broker. If you commit to ZK
> first and the commit to the DB then fails, you lose data.
>
> The Kafka white paper mentions that Kafka stays agnostic about the
> distributed commit problem. There has been some prior discussion about
> this, but I haven't seen any solid solutions. If you're using something
> like PostgreSQL, which supports two-phase commit, you can roll the offset
> into the DB transaction, assuming you're okay with storing offsets in the
> DB rather than in ZK, but that's not a general solution.
>
> Is there anything in Kafka 0.8.x that helps address this issue?
>
> --Darren Sargent
> RichRelevance (www.richrelevance.com)