Thanks, Konstantine. I've incorporated your suggestion about the `tick()`
mention, and it sounds like my recent changes satisfy all of your other
concerns (which seem relatively minor). So I'll go ahead and start a vote
and we can continue to fine tune the wording.

Best regards,

Randall

On Mon, Jun 7, 2021 at 2:31 PM Konstantine Karantasis
<konstant...@confluent.io.invalid> wrote:

> Thanks Randall. I reply inline as well.
>
>
> On Sat, Jun 5, 2021 at 1:57 PM Randall Hauch <rha...@apache.org> wrote:
>
> > Thanks for the review and feedback, Konstantine. I think I've addressed
> all
> > of your concerns below.
> >
> > On Fri, Jun 4, 2021 at 7:55 PM Konstantine Karantasis
> > <konstant...@confluent.io.invalid> wrote:
> >
> > > Thanks for the KIP Randall!
> > > Really happy to see this feature coming along finally. Will indeed
> > resolve
> > > an unintuitive aspect of Connect's REST API.
> > >
> > > I'm in favor of the current approach overall. Just a few comments below
> > > that could use a brief discussion:
> > >
> > > 1) Is there a specific reason why you choose the config topic to
> persist
> > > the restart requests?
> > > I know it doesn't really matter which internal topic is used for
> > backwards
> > > compatibility, because the worker ignores records that does not
> > understand.
> > >
> >
> > I think there's value in the restart request record being ordered along
> > with the other config topic records, including configuration records,
> > target state change records, and session key updates. This also aligns
> with
> > how the distributed herder will process the restart requests within its
> > `tick()` method, again making sure there are no ongoing starts and stops
> > from rebalances or new connectors while the restarts are taking place.
> Plus
> > there is already a built-in config topic listener mechanism in the
> > distributed herder.
> >
> > I think those are good reasons to use the config topic. But there also is
> > not a viable alternative: the status topic has no existing listener
> > mechanism in the StatusBackingStore, and the offset topic is clearly not
> > applicable.
> >
>
> These are good points that in practice make the config topic preferrable.
> Even though some sound more like implementation details, I think saying
> explicitly why the config topic gives us a good tradeoff (you mostly say
> it) and why the status topic is not preferred (some info is missing here,
> especially regarding the listener) is valuable.
>

I've tried to mention why the config topic is applicable here in the final
paragraph of the "Distributed Runtime" section. And I previously added a
third Rejected Alternative that mentions the cons of the status topic and
explicitly mentions "the StatusBackingStore interface does not define a
listener mechanism". Let me know if you still are looking for other
clarification.


> Also a bit of a nit but mentioning the "tick()" method might not be clear
> for the uninitiated with the internals of the Connect framework.
> Calling this the main thread loop of the workers might be simpler.
>

Good idea. I've updated the KIP.

>
> > > But are there any implications upon worker restart? I read you
> > specifically
> > > mention that a worker will read restart requests upon its own restart.
> > But
> > > is this really desirable?
> > >
> >
> > It will read them, just like all of the config records, target state
> change
> > records, and session key updates. However, upon worker restart the herder
> > will ignore any restart requests, just like it ignores session key update
> > and task state change records.
> >
> >
> > > The config topic offers a way to persist configurations of the latest
> > > running connectors and upon cluster restart to bring these connectors
> > back
> > > up.
> > > However, I'm not sure we want to persist recent restart attempts, the
> > same
> > > way we persist configurations. Additionally configurations might get
> > > additional backups in order to allow for the config topic to be
> recovered
> > > for disaster recovery reasons. To that respect, I'm not sure that the
> > > restart requests require the same guarantees.
> > >
> >
> > This is not the first or only non-config record written to this topic, so
> > there's already precedent.
> >
>
> I acknowledge the precedent but we might need to draw a line at some point
> and make sure we maintain more permanent information in the config topic
> and more transient in the status topic, modulo any other concerns or
> trade-offs.
> As discussed above, I agree that the config topic is a better option here.
>
>
> >
> > > Given the above, would the status topic offer a better alternative in
> > order
> > > to broadcast the restart requests across the workers? Haven't looked
> > > closely what would be the implications of writing these new records to
> > the
> > > status topic, but this topic tends to be more transient.
> > > Or is it that you've preferred the config topic for the workers to be
> > aware
> > > of the interleaving between restart requests and requests for
> > > reconfigurations? If that's the case an explicit call out might be
> > > worthwhile in the KIP itself.
> > >
> >
> > I guess I thought I already tried to do that. But I can add a bit more
> > detail to the KIP to call this out even more.
> >
>
> +1. Indeed there's some explanation on that already. Maybe a subsection
> header and a juxtaposition with the status topic could help to highlight
> the reason behind this selection in a more clear way.
>

I feel like the third Rejected Alternative (which I added after your first
email review) is a better place for why we chose not to do something.
Apologies for not mentioning that new Rejected Alternative in my previous
email.


>
>
> >
> >
> > >
> > > 2) You mention in the KIP the status STOPPED. But this is not currently
> > > part of the task or connector states that get persisted to the status
> > > topic, as these are defined in AbstractStatus#State enum. Should we
> > remove
> > > any reference to a STOPPED state to avoid confusion? The only new state
> > > seems to be the RESTARTING state.
> > >
> >
> > Good catch. Removed references of the STOPPED state.
> >
> > And a few minor comments.
> > > a) The sentence "We need to keep the existing behavior of the method to
> > > just restart the Connector object must be retained for backward
> > > compatibility," doesn't read well for me.
> > >
> >
> > Reworded.
> >
> >
> > > b) Subsection title: "Use REST API for Worker-to-Worker Communication
> > and"
> > > is possibly an unfinished sentence.
> > >
> >
> > Ah, that was just a typo. Fixed.
> >
> >
> > > c) In the paragraph right below, the last sentence "especially when the
> > > number of workers " might also be missing something.
> > >
> >
> > Fixed.
> >
>
>
> Thanks for the replies.
> +1 overall.
>
> Konstantine
>
>
>
>
>
> > >
> > > Best,
> > > Konstantine
> > >
> > > On Thu, Jun 3, 2021 at 8:42 AM Kalpesh Patel
> <kpa...@confluent.io.invalid
> > >
> > > wrote:
> > >
> > > > Agreed, this would be a great enhancement.
> > > >
> > > > On Thu, Jun 3, 2021 at 9:26 AM Nikita Kretov <kretov...@gmail.com>
> > > wrote:
> > > >
> > > > > Hello! It's really not intuitive that you need to restart connect
> and
> > > > > tasks separately! I'd like to see this KIP landed in 3.0.0 release!
> > > > >
> > > > > On 6/2/21 7:40 PM, Randall Hauch wrote:
> > > > > > Hello all,
> > > > > >
> > > > > > Many users struggle with the connector restart REST API only
> > > restarting
> > > > > the
> > > > > > Connector instance rather than everything they associated with a
> > > > "named"
> > > > > > connector. I've created a KIP that attempts to solve this problem
> > via
> > > > new
> > > > > > query parameters on an existing REST API method, though by
> default
> > it
> > > > > > remains backward compatible with older behavior:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-745%3A+Connect+API+to+restart+connector+and+tasks
> > > > > >
> > > > > > Please take a look and respond here with questions. I'd love to
> get
> > > > this
> > > > > > into AK 3.0.0, and the KIP freeze is coming up soon.
> > > > > >
> > > > > > Best regards,
> > > > > >
> > > > > > Randall
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to