Thanks, Aakash, for updating the KIP.

On Tue, May 19, 2020 at 2:18 AM Arjun Satish <arjun.sat...@gmail.com> wrote:

> Hi Randall,
>
> Thanks for the explanation! Excellent point about guaranteeing offsets in
> the async case.
>
> If we can guarantee that the offsets will be advanced only after the bad
> records are reported, then is there any value is the Future<> return type?
> I feel we can declare the function with a void return type:
>
> void report(SinkRecord failedRecord, Throwable error)
>
> that works asynchronously, and advances offsets only after the DLQ producer
> (and other reporters) complete successfully (as you explained).
>
> This actually alleviates my concern of what this Future<> actually means.
> Since a failure to report should kill the tasks, there is no reason for the
> connector to ever wait on the get().


We should not say "there is no reason", because we don't know all of the
requirements that might exist for sending records to external systems. The
additional guarantee regarding error records being fully recorded before
`preCommit(...)` is called is a minimal guarantee that Connect provides the
sink task, and returning a Future allow a sink task to have *stronger*
guarantees than what Connect provides by default.

Once again:
1. we need an async API to allow the sink task to report problem records
and then immediately continue doing more work.
2. Connect should guarantee to the sink task that all reported records will
actually be recorded before `preCommit(...)` is called
3. a sink task *might* need stronger guarantees, and may need to block on
the reported records some time before `preCommit(...)`, and we should allow
them to do this.
4. Future and callbacks are common techniques, but there are significant
runtime risks of using callbacks, whereas Future is a common/standard
pattern that is straightforward to use.

This *exactly* matches the current KIP, which is why I plan to vote for
this valuable and well-thought out KIP.


> And if we are guaranteeing that the
> offsets are only advanced when the errors are reported, then this becomes a
> double win:
>
> 1. connector developers can literally fire and forget failed records.
> 2. offsets are correctly advanced on errors being reported. Failure to
> report error will kill the task, and the last committed offset will be the
> correct one.


> The main contract would simply be to call report() before preCommit() or
> before put() returns in the task, so the framework knows that that there
> are error records reported, and those need to finish before the offsets can
> be advanced.
>
> I think I'd be pretty excited about this API. and if we all agree, then
> let's go ahead with this?


> Best,
>
>
>

Reply via email to