Hi Ben,

Those are both great questions. I will tackle the second one now, and
address the first one a bit later.

AppIds are a prerequisite for using transactions, and must be consistent
across application sessions. They are the mechanism by which transaction
recovery can occur across sessions, and they assume that the application
has state which persists across sessions.

PIDs without AppIds are ephemeral and are valid for a single session. In
this context, they are only used to ensure idempotent message delivery.
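
As a rough illustration of the idempotence-only case, here is a toy model
(not the KIP-98 implementation). It assumes the broker deduplicates on a
per-partition sequence number attached to each PID; the class and method
names (ToyBroker, new_session, append) are hypothetical.

```python
import itertools


class ToyBroker:
    """Toy model: drops duplicate sends per (PID, partition) by sequence number."""

    def __init__(self):
        self._next_pid = itertools.count(1)
        self._last_seq = {}  # (pid, partition) -> highest sequence appended
        self.log = {}        # partition -> list of appended messages

    def new_session(self):
        # No AppId: every session simply gets a fresh, ephemeral PID.
        return next(self._next_pid)

    def append(self, pid, partition, seq, message):
        key = (pid, partition)
        if seq <= self._last_seq.get(key, -1):
            return False  # a retry of something already appended
        self._last_seq[key] = seq
        self.log.setdefault(partition, []).append(message)
        return True


broker = ToyBroker()
pid = broker.new_session()
broker.append(pid, 0, 0, "a")
broker.append(pid, 0, 0, "a")  # retried send within the session, ignored
broker.append(pid, 0, 1, "b")
```

In this model a restarted application gets a new PID, so the broker cannot
tie its sends to those of the previous session. Deduplication holds only
within one session.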

Keeping them separate caters to those applications which don't care about
transactions but do care about idempotent message delivery guarantees,
which we assume will be the vast majority of the applications out there.

Further, from an API perspective, separating out AppId from PID means that
applications get idempotent message delivery for free once they upgrade
their Kafka version on the client and server and switch to the new message
format: there are no additional code or config changes needed, since PID is
a completely internal concept.

Hope this clarifies the reason for separating these two things out.

Regards,
Apurva

On Tue, Dec 6, 2016 at 3:22 PM, Ben Kirwin <b...@kirw.in> wrote:

> Thanks for this! I'm looking forward to going through the full proposal in
> detail soon; a few early questions:
>
> First: what happens when a consumer rebalances in the middle of a
> transaction? The full documentation suggests that such a transaction ought
> to be rejected:
>
> > [...] if a rebalance has happened and this consumer
> > instance becomes a zombie, even if this offset message is appended in the
> > offset topic, the transaction will be rejected later on when it tries to
> > commit the transaction via the EndTxnRequest.
>
> ...but it's unclear to me how we ensure that a transaction can't complete
> if a rebalance has happened. (It's quite possible I'm missing something
> obvious!)
>
> As a concrete example: suppose a process with PID 1 adds offsets for some
> partition to a transaction; a consumer rebalance happens that assigns the
> partition to a process with PID 2, which adds some offsets to its current
> transaction; both processes try and commit. Allowing both commits would
> cause the messages to be processed twice -- how is that avoided?
>
> Second: App IDs normally map to a single PID. It seems like one could do
> away with the PID concept entirely, and just use App IDs in most places
> that require a PID. This feels like it would be significantly simpler,
> though it does increase the message size. Are there other reasons why the
> App ID / PID split is necessary?
>
> On Wed, Nov 30, 2016 at 2:19 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Hi all,
> >
> > I have just created KIP-98 to enhance Kafka with exactly once delivery
> > semantics:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging
> >
> > This KIP adds a transactional messaging mechanism along with an idempotent
> > producer implementation to make sure that 1) duplicated messages sent from
> > the same identified producer can be detected on the broker side, and 2) a
> > group of messages sent within a transaction will atomically be either
> > reflected and fetchable to consumers or not as a whole.
> >
> > The above wiki page provides a high-level view of the proposed changes as
> > well as summarized guarantees. Initial draft of the detailed implementation
> > design is described in this Google doc:
> >
> > https://docs.google.com/document/d/11Jqy_GjUGtdXJK94XGsEIK7CP1SnQGdp2eF0wSw9ra8
> >
> >
> > We would love to hear your comments and suggestions.
> >
> > Thanks,
> >
> > -- Guozhang
> >
>
