Re: [DISCUSS] KIP-745: Connect API to restart connector and tasks

Randall Hauch Sat, 05 Jun 2021 13:57:11 -0700

Thanks for the review and feedback, Konstantine. I think I've addressed all
of your concerns below.


On Fri, Jun 4, 2021 at 7:55 PM Konstantine Karantasis
<konstant...@confluent.io.invalid> wrote:

> Thanks for the KIP Randall!
> Really happy to see this feature coming along finally. Will indeed resolve
> an unintuitive aspect of Connect's REST API.
>
> I'm in favor of the current approach overall. Just a few comments below
> that could use a brief discussion:
>
> 1) Is there a specific reason why you choose the config topic to persist
> the restart requests?
> I know it doesn't really matter which internal topic is used for backwards
> compatibility, because the worker ignores records that does not understand.
>

I think there's value in the restart request record being ordered along
with the other config topic records, including configuration records,
target state change records, and session key updates. This also aligns with
how the distributed herder will process the restart requests within its
`tick()` method, again making sure there are no ongoing starts and stops
from rebalances or new connectors while the restarts are taking place. Plus
there is already a built-in config topic listener mechanism in the
distributed herder.

I think those are good reasons to use the config topic. But there also is
not a viable alternative: the status topic has no existing listener
mechanism in the StatusBackingStore, and the offset topic is clearly not
applicable.


> But are there any implications upon worker restart? I read you specifically
> mention that a worker will read restart requests upon its own restart. But
> is this really desirable?
>

It will read them, just like all of the config records, target state change
records, and session key updates. However, upon worker restart the herder
will ignore any restart requests, just like it ignores session key update
and task state change records.


> The config topic offers a way to persist configurations of the latest
> running connectors and upon cluster restart to bring these connectors back
> up.
> However, I'm not sure we want to persist recent restart attempts, the same
> way we persist configurations. Additionally configurations might get
> additional backups in order to allow for the config topic to be recovered
> for disaster recovery reasons. To that respect, I'm not sure that the
> restart requests require the same guarantees.
>

This is not the first or only non-config record written to this topic, so
there's already precedent.


> Given the above, would the status topic offer a better alternative in order
> to broadcast the restart requests across the workers? Haven't looked
> closely what would be the implications of writing these new records to the
> status topic, but this topic tends to be more transient.
> Or is it that you've preferred the config topic for the workers to be aware
> of the interleaving between restart requests and requests for
> reconfigurations? If that's the case an explicit call out might be
> worthwhile in the KIP itself.
>

I guess I thought I already tried to do that. But I can add a bit more
detail to the KIP to call this out even more.


>
> 2) You mention in the KIP the status STOPPED. But this is not currently
> part of the task or connector states that get persisted to the status
> topic, as these are defined in AbstractStatus#State enum. Should we remove
> any reference to a STOPPED state to avoid confusion? The only new state
> seems to be the RESTARTING state.
>

Good catch. Removed references of the STOPPED state.

And a few minor comments.
> a) The sentence "We need to keep the existing behavior of the method to
> just restart the Connector object must be retained for backward
> compatibility," doesn't read well for me.
>

Reworded.


> b) Subsection title: "Use REST API for Worker-to-Worker Communication and"
> is possibly an unfinished sentence.
>

Ah, that was just a typo. Fixed.


> c) In the paragraph right below, the last sentence "especially when the
> number of workers " might also be missing something.
>

Fixed.

>
> Best,
> Konstantine
>
> On Thu, Jun 3, 2021 at 8:42 AM Kalpesh Patel <kpa...@confluent.io.invalid>
> wrote:
>
> > Agreed, this would be a great enhancement.
> >
> > On Thu, Jun 3, 2021 at 9:26 AM Nikita Kretov <kretov...@gmail.com>
> wrote:
> >
> > > Hello! It's really not intuitive that you need to restart connect and
> > > tasks separately! I'd like to see this KIP landed in 3.0.0 release!
> > >
> > > On 6/2/21 7:40 PM, Randall Hauch wrote:
> > > > Hello all,
> > > >
> > > > Many users struggle with the connector restart REST API only
> restarting
> > > the
> > > > Connector instance rather than everything they associated with a
> > "named"
> > > > connector. I've created a KIP that attempts to solve this problem via
> > new
> > > > query parameters on an existing REST API method, though by default it
> > > > remains backward compatible with older behavior:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-745%3A+Connect+API+to+restart+connector+and+tasks
> > > >
> > > > Please take a look and respond here with questions. I'd love to get
> > this
> > > > into AK 3.0.0, and the KIP freeze is coming up soon.
> > > >
> > > > Best regards,
> > > >
> > > > Randall
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-745: Connect API to restart connector and tasks

Reply via email to