Re: [DISCUSS] KIP-610: Error Reporting in Sink Connectors

Konstantine Karantasis Fri, 15 May 2020 11:46:24 -0700

Thanks for the quick response Aakash.

To your last point, modern APIs like this tend to be asynchronous (see
admin, producer in Kafka) and such definition results in more expressive
and well defined APIs.


What you describe is easily an opt-in feature for the connector developer.
At the same time, the latest description above, gives us better chances for
this API to remain like this for longer, because it covers both the sync
and async per `put` user cases. Given how simple the sync implementation
is, just by complying with the return type of the method, I still think the
BiFunction definition that returns a Future makes sense.

Konstantine



On Fri, May 15, 2020 at 11:27 AM Aakash Shah <as...@confluent.io> wrote:

> Thanks for the additional feedback.
>
> I see the benefits of adding an overloaded put(...) over alternatives and I
> am on board going forward with this approach. It will definitely set forth
> a contract of where the reporter will be used with better aesthetics.
>
> The original idea of going with a synchronous approach for the error
> reporter was to ease the connector developer's job interacting with and
> handling the error reporter. The tradeoff for having a synchronous-only
> reporter would be lower throughput on the reporter; this was thought to be
> fine since arguably most circumstances would not include consistently large
> amounts of records being sent to the error reporter. Even if this was the
> case, an argument can be made that the lower throughput would be of
> assistance in this case, as it would allow more time for the user to
> realize the connector is having records sent to the error reporter before
> many are sent. However, if we are strongly in favor of having the option of
> asynchronous functionality available for the developer, then I am fine with
> that as well.
>
> Lastly, I am on board with changing the name to failedRecordReporter,
>
> Please let me know your thoughts.
>
> Thanks,
> Aakash
>
> On Fri, May 15, 2020 at 9:10 AM Randall Hauch <rha...@gmail.com> wrote:
>
> > Konstantine said:
> >
> > > I notice Randall also used BiFunction in his example, I wonder if it's
> > for
> > > similar reasons.
> > >
> >
> > Nope. Just a typo on my part.
> >
> > There appear to be three outstanding questions.
> >
> > First, Konstantine suggested calling this "failedRecordReporter". I think
> > this is minor, but using this new name may be a bit more precise and I'd
> be
> > fine with this.
> >
> > Second, should the reporter method be synchronous? I think the two
> options
> > are:
> >
> > 2a. Use `BiConsumer<SinkRecord, Throwable>` that returns nothing and
> blocks
> > (at this time).
> > 2b. Use `BiFunction<SinkRecord, Throwable, Future<Void>>` that returns a
> > future that the user can optionally use to be synchronous.
> >
> > I do agree with Konstantine that option 2b gives us more room for future
> > semantic changes, and since the producer write is already asynchronous
> this
> > should be straightforward to implement. I think the concern here is that
> if
> > the sink task does not *use* the future to make this synchronous, it is
> > very possible that the error records could be written out of order (due
> to
> > retries). But this won't be an issue if the implementation uses
> > `max.in.flight.requests.per.connection=1` for writing the error records.
> > It's a little less clear, but honestly IMO passing the reporter in the
> > `put(...)` method helps make this lambda easier to understand, for some
> > strange reason. So unless there are good reasons to avoid this, I'd be in
> > favor of 2b and returning a Future.
> >
> > Third, how do we pass the reporter lambda / method reference to the task?
> > My proposal to pass the reporter via an overload `put(...)` still is the
> > most attractive to me, for several reasons:
> >
> > 3a. There's no need to pass the reporter separately *and* to describe the
> > changes in method call ordering.
> > 3b. As mentioned above, for some reason passing it via `put(...)` makes
> the
> > intent more clear that it be used when processing the SinkRecord, and
> that
> > it shouldn't be used in `start(...)`, `preCommit(...)`,
> > `onPartitionsAssigned(...)`, or any of the other task methods. As Andrew
> > pointed out earlier, *describing* this in the KIP and in JavaDoc will be
> > tough to be exact yet succinct.
> > 3c. There is already precedence for evolving
> > `SourceTask.commitRecord(...)`, and the pattern is identical.
> > 3d. Backward compatibility is easy to understand, and at the same time
> it's
> > pretty easy to describe what implementations that want to take advantage
> of
> > this feature should do.
> > 3e. Minimal changes to the interface: we're just *adding* one default
> > method that calls the existing method and deprecating the existing
> > `put(...)`.
> > 3f. Deprecating the existing `put(...)` makes it more clear in a
> > programmatic sense that new sink implementations should use the reporter,
> > and that we recommend old sinks evolve to use it.
> >
> > Some of these benefits apply to some of the other suggestions, but I
> think
> > none of the other suggestions have all of these benefits. For example,
> > overloading `initialize(...)` is more difficult since most sink
> connectors
> > don't override it and therefore would be less subject to deprecations
> > warnings. Overloading `start(...)` is less attractive. Adding a method
> IMO
> > shares the fewest of these benefits.
> >
> > The one disadvantage of this approach is that sink task implementations
> > can't rely upon the reporter upon startup. IMO that's an acceptable
> > tradeoff to get the cleaner and more explicit API, especially if the API
> > contract is that Connect will pass the same reporter instance to each
> call
> > to `put(...)` on a single task instance.
> >
> > Best regards,
> >
> > Randall
> >
> > On Fri, May 15, 2020 at 6:59 AM Andrew Schofield <
> > andrew_schofi...@live.com>
> > wrote:
> >
> > > Hi,
> > > Randall's suggestion is really good. I think it gives the flexibility
> > > required and also
> > > keeps the interface the right way round.
> > >
> > > Thanks,
> > > Andrew Schofield
> > >
> > > On 15/05/2020, 02:07, "Aakash Shah" <as...@confluent.io> wrote:
> > >
> > > > Hi Randall,
> > > >
> > > > Thanks for the feedback.
> > > >
> > > > 1. This is a great suggestion, but I find that adding an overloaded
> > > > put(...) which essentially deprecates the old put(...) to only be
> used
> > > when
> > > > a connector is deployed on older versions of Connect adds enough of a
> > > > complication that could cause connectors to break if the old put(...)
> > > > doesn't correctly invoke the overloaded put(...); either that, or it
> > will
> > > > add duplication of functionality across the two put(...) methods. I
> > think
> > > > the older method simplifies things with the idea that a DLQ/error
> > > reporter
> > > > will or will not be passed into the method depending on the version
> of
> > > AK.
> > > > However, I also understand the aesthetic advantage of this method vs
> > the
> > > > setter method, so I am okay with going in this direction if others
> > agree
> > > > with adding the overloaded put(...).
> > > >
> > > > 2. Yes, your assumption is correct. Yes, we can remove the "Order of
> > > > Operations" if we go with the overloaded put(...) direction.
> > > >
> > > > 3. Great point, I will remove them from the KIP.
> > > >
> > > > 4. Yeah, accept(...) will be synchronous. I will change it to be
> > clearer,
> > > > thanks.
> > > >
> > > > 5. This KIP will use existing metrics as well introduce new metrics.
> I
> > > will
> > > > update this section to fully specify the metrics.
> > > >
> > > > Please let me know what you think.
> > > >
> > > > Thanks,
> > > > Aakash
> > > >
> > > > On Thu, May 14, 2020 at 3:52 PM Randall Hauch <rha...@gmail.com>
> > wrote:
> > > >
> > > > > Hi, Aakash.
> > > > >
> > > > > Thanks for the KIP. Connect does need an improved ability for sink
> > > > > connectors to report individual records as being problematic, and
> > this
> > > > > integrates nicely with the existing DLQ feature.
> > > > >
> > > > > I also appreciate the desire to maintain compatibility so that
> > > connectors
> > > > > can take advantage of this feature when deployed in a runtime that
> > > supports
> > > > > this feature, but can safely and easily do without the feature when
> > > > > deployed to an older runtime. But I do understand Andrew's concern
> > > about
> > > > > the aesthetics. Have you considered overloading the `put(...)`
> method
> > > and
> > > > > adding the `reporter` as a second parameter? Essentially it would
> add
> > > the
> > > > > one method (with proper JavaDoc) to `SinkTask` only:
> > > > >
> > > > > ```
> > > > >     public void put(Collection<SinkRecord> records,
> > > BiFunction<SinkRecord,
> > > > > Throwable> reporter) {
> > > > >         put(records);
> > > > >     }
> > > > > ```
> > > > > and the WorkerSinkTask would be changed to call `put(Collection,
> > > > > BiFunction)` instead.
> > > > >
> > > > > Sink connector implementations that don't do anything different can
> > > still
> > > > > override `put(Collection)`, and it still works as before.
> Developers
> > > that
> > > > > want to change their sink connector implementations to support this
> > new
> > > > > feature would do the following, which would work in older and newer
> > > Connect
> > > > > runtimes:
> > > > > ```
> > > > >     public void put(Collection<SinkRecord> records) {
> > > > >         put(records, null);
> > > > >     }
> > > > >     public void put(Collection<SinkRecord> records,
> > > BiFunction<SinkRecord,
> > > > > Throwable> reporter) {
> > > > >         // the normal `put(Collection)` logic goes here, but can
> > > optionally
> > > > > use `reporter` if non-null
> > > > >     }
> > > > > ```
> > > > >
> > > > > I think this has all the same benefits of the current KIP, but
> > > > > it's noticeably simpler and hopefully more aesthetically pleasing.
> > > > >
> > > > > As for Andrew's second concern about "the task can send errant
> > records
> > > to
> > > > > it within put(...)" being too restrictive. My guess is that this
> was
> > > more
> > > > > an attempt at describing the basic behavior, and less about
> requiring
> > > the
> > > > > reporter only being called within the `put(...)` method and not by
> > > methods
> > > > > to which `put(...)` synchronously or asynchronously delegates. Can
> > you
> > > > > confirm whether my assumption is correct? If so, then perhaps my
> > > suggestion
> > > > > helps work around this issue as well, since there would be no
> > > restriction
> > > > > on when the reporter is called, and the whole "Order of Operations"
> > > section
> > > > > could potentially be removed.
> > > > >
> > > > > Third, it's not clear to me why the "Error Reporter Object"
> > subsection
> > > in
> > > > > the "Proposal" section lists the worker configuration properties
> that
> > > were
> > > > > previously introduced with
> > > > >
> > > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-298%3A+Error+Handling+in+Connect
> > > > > .
> > > > > Maybe it's worth mentioning that the error reporter functionality
> > will
> > > > > reuse or build upon KIP-298, including reusing the configuration
> > > properties
> > > > > defined in KIP-298. But IIUC, the KIP does not propose changing any
> > > > > technical or semantic aspect of these configuration properties, and
> > > > > therefore the KIP would be more clear and succinct without them.
> > > *That* the
> > > > > error reporter will use these properties is part of the UX and
> > > therefore
> > > > > necessary to mention, but *how* it uses those properties is really
> up
> > > to
> > > > > the implementation.
> > > > >
> > > > > Fourth, the "Synchrony" section has a sentence that is confusing,
> or
> > > not as
> > > > > clear as it could be.
> > > > >
> > > > >     "If a record is sent to the error reporter, processing of the
> > next
> > > > > errant record in accept(...) will not begin until the producer
> > > successfully
> > > > > sends the errant record to Kafka."
> > > > >
> > > > > This sentence is a bit difficult to understand, but IIUC this
> really
> > > just
> > > > > means that "accept(...)" will be synchronous and will block until
> the
> > > > > errant record has been successfully written to Kafka. If so, let's
> > say
> > > > > that. The rest of the paragraph is fine.
> > > > >
> > > > > Finally, is this KIP proposing new metrics, or that existing
> metrics
> > > would
> > > > > be used to track the error reporter usage? If the former, then
> please
> > > > > fully-specify what these metrics will be, similarly to how metrics
> > are
> > > > > specified in
> > > > >
> > > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-196%3A+Add+metrics+to+Kafka+Connect+framework
> > > > > .
> > > > >
> > > > > Thoughts?
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Randall
> > > > >
> > > > > On Mon, May 11, 2020 at 4:49 PM Andrew Schofield <
> > > > > andrew_schofi...@live.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Aakash,
> > > > > > Thanks for sorting out the replies to the mailing list.
> > > > > >
> > > > > > First, I do like the idea of improving error reporting in sink
> > > > > connectors.
> > > > > > I'd like a simple
> > > > > > way to put bad records onto the DLQ.
> > > > > >
> > > > > > I think this KIP is considerably more complicated than it seems.
> > The
> > > > > > guidance on the
> > > > > > SinkTask.put() method is that it should send the records
> > > asynchronously
> > > > > > and immediately
> > > > > > return, so the task is likely to want to report errors
> > asynchronously
> > > > > > too.  Currently the KIP
> > > > > > states that "the task can send errant records to it within
> > put(...)"
> > > and
> > > > > > that's too restrictive.
> > > > > > The task ought to be able to report any unflushed records, but
> the
> > > > > > synchronisation of this is going
> > > > > > to be tricky. I suppose the connector author needs to make sure
> > that
> > > all
> > > > > > errant records have
> > > > > > been reported before returning control from SinkTask.flush(...)
> or
> > > > > perhaps
> > > > > > SinkTask.preCommit(...).
> > > > > >
> > > > > > I think the interface is a little strange too. I can see that
> this
> > > was
> > > > > > done so it's possible to deliver a connector
> > > > > > that supports error reporting but it can also work in earlier
> > > versions of
> > > > > > the KC runtime. But, the
> > > > > > pattern so far is that the task uses the methods of
> SinkTaskContext
> > > to
> > > > > > access utilities in the Kafka
> > > > > > Connect runtime, and I suggest that reporting a bad record is
> such
> > a
> > > > > > utility. SinkTaskContext has
> > > > > > changed before when the configs() methods was added, so I think
> > > there is
> > > > > > precedent for adding a method.
> > > > > > The way the KIP adds a method to SinkTask that the KC runtime
> calls
> > > to
> > > > > > provide the error reporting utility
> > > > > > seems not to match what has gone before.
> > > > > >
> > > > > > Thanks,
> > > > > > Andrew
> > > > > >
> > > > > > On 11/05/2020, 19:05, "Aakash Shah" <as...@confluent.io> wrote:
> > > > > >
> > > > > >     I wasn't previously added to the dev mailing list, so I'd
> like
> > to
> > > > > post
> > > > > > my
> > > > > >     discussion with Andrew Schofield below for visibility and
> > further
> > > > > >     discussion:
> > > > > >
> > > > > >     Hi Andrew,
> > > > > >
> > > > > >     Thanks for the reply. The main concern with this approach
> would
> > > be
> > > > > its
> > > > > >     backward compatibility. I’ve highlighted the thoughts around
> > the
> > > > > > backwards
> > > > > >     compatibility of the initial approach, please let me know
> what
> > > you
> > > > > > think.
> > > > > >
> > > > > >     Thanks,
> > > > > >     Aakash
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > >
> >
> ____________________________________________________________________________________________________________________________
> > > > > >
> > > > > >     Hi,
> > > > > >     By adding a new method to the SinkContext interface in say
> > Kafka
> > > > > 2.6, a
> > > > > >     connector that calls it would require a Kafka 2.6 connect
> > > runtime. I
> > > > > > don't
> > > > > >     quite see how that's a backward compatibility problem. It's
> > just
> > > that
> > > > > > new
> > > > > >     connectors need the latest interface. I might not quite be
> > > > > > understanding,
> > > > > >     but I think it would be fine.
> > > > > >
> > > > > >     Thanks,
> > > > > >     Andrew
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > >
> >
> ____________________________________________________________________________________________________________________________
> > > > > >
> > > > > >     Hi Andrew,
> > > > > >
> > > > > >     I apologize for the way the reply was sent. I just subscribed
> > to
> > > the
> > > > > > dev
> > > > > >     mailing list so it should be resolved now.
> > > > > >
> > > > > >     You are correct, new connectors would simply require the
> latest
> > > > > > interface.
> > > > > >     However, we want to remove that requirement - in other words,
> > we
> > > want
> > > > > > to
> > > > > >     allow the possibility that someone wants the latest
> > connector/to
> > > > > > upgrade to
> > > > > >     the latest version, but deploys it on an older version of AK.
> > > > > > Basically, we
> > > > > >     don't want to enforce the necessity of upgrading AK to get
> the
> > > latest
> > > > > >     interface. In the current approach, there would be no issue
> of
> > > > > > deploying a
> > > > > >     new connector on an older version of AK, as the Connect
> > framework
> > > > > would
> > > > > >     simply not invoke the new method.
> > > > > >
> > > > > >     Please let me know what you think and if I need to clarify
> > > anything.
> > > > > >
> > > > > >     Thanks,
> > > > > >     Aakash
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> >
>

Re: [DISCUSS] KIP-610: Error Reporting in Sink Connectors

Reply via email to