> this does not guarantee that the
> offsets of R have been written/flushed by the next commit() call

True, but does it matter? So long as you can guarantee the records are
delivered to the downstream Kafka cluster, it shouldn't matter whether
their offsets have been committed yet or not.

The worst that can happen is that the worker gets bounced and asks for the
same records a second time. Even if those records have since been dropped
from the upstream data source, it doesn't matter, because you know they
were previously delivered successfully.

I do agree that commit() and commitRecord() are poorly named, but I don't
think there's anything fundamentally missing from the API.
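
For concreteness, here is a rough sketch of the pattern (the SourceTask
callbacks are real; the "position" offset key and trimUpstreamSource()
are made up for illustration):

    import java.util.concurrent.ConcurrentSkipListSet;

    import org.apache.kafka.connect.source.SourceRecord;
    import org.apache.kafka.connect.source.SourceTask;

    // Sketch: track producer-ack'd offsets in commitRecord(), and use
    // commit() as the cue to trim the upstream source.
    public abstract class AckTrackingSourceTask extends SourceTask {

        // Offsets of records the downstream producer has ack'd.
        private final ConcurrentSkipListSet<Long> acked =
                new ConcurrentSkipListSet<>();

        @Override
        public void commitRecord(SourceRecord record) {
            // Invoked once the producer has ack'd this record, i.e. it
            // was delivered to the downstream Kafka cluster.
            Object position = record.sourceOffset().get("position");
            if (position != null)
                acked.add((Long) position);
        }

        @Override
        public void commit() {
            // Per's caveat applies here: commit() does not guarantee
            // that every offset ack'd above was covered by the flush
            // that just completed. If the worker bounces, some of these
            // records may be polled again (at-least-once), or may be
            // gone upstream; harmless either way, since they were
            // already delivered.
            if (!acked.isEmpty())
                trimUpstreamSource(acked.last());
        }

        // Hypothetical hook: the upstream system may drop everything at
        // or below this offset.
        protected abstract void trimUpstreamSource(long upToOffset);
    }

Trimming on ack rather than on flush is the point: even if the flushed
offsets lag behind, the worst case after a restart is re-delivery of
records you already know made it downstream.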

Ryanne

On Wed, Oct 17, 2018 at 10:24 AM Per Steffensen <perst...@gmail.com> wrote:

> On 17/10/2018 16.43, Ryanne Dolan wrote:
> > I see, thanks.
> > On the other hand, the commitRecord() callback provides the functionality
> > you require in this case. In commitRecord() your SourceTask can track the
> > offsets of records that have been ack'd by the producer client, and then
> > in commit() you can be sure that those offsets have been flushed.
> That is the trick I am currently using - more or less.
> But unfortunately it does not work 100% either. It is possible that
> commitRecord() is called with a record R, and then commit() is called
> after that, without the offsets of R having been written/flushed. The
> call to commitRecord() means that the "actual data" of R has been
> sent/acknowledged, but unfortunately this does not guarantee that the
> offsets of R have been written/flushed by the next commit() call.
>
>
