Hi Yash,

RE 2: That's a great point about validations already being performed by the leader. For completeness' sake, I'd like to note that this only holds for valid configurations; invalid ones are currently caught before being forwarded to the leader. Still, I think it's fine to forward to the leader for now and optimize further in the future if necessary. If frequent validations are taking place, they should be conducted via the `PUT /connector-plugins/{pluginName}/config/validate` endpoint, which won't do any forwarding at all.
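As a sketch of what hitting that validation endpoint directly looks like (the worker address and connector config below are hypothetical; the endpoint path and flat-map request body are the standard Kafka Connect ones):

```python
import json

# Hypothetical worker address; any worker can serve this request, since
# config validation is not forwarded to the leader.
WORKER = "http://localhost:8083"

def build_validate_request(plugin_name, config):
    """Build the URL and JSON body for a Kafka Connect config validation.

    The body is the flat map of connector properties; the response reports
    per-property errors without creating or modifying any connector.
    """
    url = f"{WORKER}/connector-plugins/{plugin_name}/config/validate"
    body = json.dumps(config)
    return url, body

url, body = build_validate_request(
    "FileStreamSinkConnector",
    {
        "connector.class": "FileStreamSinkConnector",
        "tasks.max": "1",
        "topics": "test-topic",
    },
)
print(url)
# A real client would now issue: PUT <url> with Content-Type: application/json
```

This is only the request-construction half; actually sending it requires a running Connect worker.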
RE 3: Yes, those endpoints LGTM. And yes, bounds on the timeout also seem reasonable... maybe a low-importance worker property could work for that? Not sure what would make sense for a default; probably somewhere in the 10-60 minute range, but I'd be interested in others' thoughts.

Thanks for the clarification on the zombie fencing logic. I think we might want to have some more subtle logic around the interaction between calls to Admin::fenceProducers and a worker-level timeout property if we go that route, but we can cross that particular bridge if we get back to it.

Cheers,

Chris

On Wed, Nov 2, 2022 at 1:48 PM Yash Mayya <yash.ma...@gmail.com> wrote:

> Hi Chris,
>
> Thanks a lot for the super quick response and the great feedback!
>
> 1. I think that makes a lot of sense, and I'd be happy to update the KIP to include this change in the scope. The current behavior, where the API response indicates a timeout but the connector is created/updated eventually anyway, can be pretty confusing and is generally not a good user experience IMO.
>
> 2. Wow, thanks for pointing this out - it's a really good catch and something I hadn't noticed was happening with the current implementation. While I do like the idea of having a query parameter that determines whether validations can be skipped, I'm wondering if it might not be easier and cleaner to just do the leader check earlier and avoid doing the unnecessary config validation on the first worker? Since each config validation happens on its own thread, I'm not so sure about the concern of overloading the leader even on larger clusters, especially since validations aren't typically long-running operations. Furthermore, even with the current implementation, the leader will always be doing a config validation for connector create/update REST API requests on any worker.
>
> 3. That's a good point, and this way we can also restrict the APIs whose timeouts are configurable - I'm thinking `PUT /connector-plugins/{pluginName}/config/validate`, `POST /connectors` and `PUT /connectors/{connector}/config` are the ones where such a timeout parameter could be useful. Also, do you think we should enforce some reasonable bounds for the timeout config?
>
> On the zombie fencing point, the implication was that the new worker property would not control the timeout used for the call to Admin::fenceProducers. However, if we go with a timeout query parameter approach, even the timeout for the `PUT /connectors/{connector}/fence` endpoint will remain unaffected.
>
> Thanks,
> Yash
>
> On Wed, Nov 2, 2022 at 8:13 PM Chris Egerton <chr...@aiven.io.invalid> wrote:
>
> > Hi Yash,
> >
> > Thanks for the KIP. It's a nice, focused change. Initially I was hesitant to support cases where connector validation takes this long, but considering the alternative is that we give users a 500 error response but leave the request to create/modify the connector queued up in the herder, I think I can get behind the motivation here. There's also an argument to be made about keeping Kafka Connect available even when the systems that it connects to are in a degraded state.
> >
> > I have a few alternatives I'd be interested in your thoughts on:
> >
> > 1. Since the primary concern here seems to be that custom connector validation logic can take too long, do we have any thoughts on adding logic to check for request timeout after validation has completed and, if it has, aborting the attempt to create/modify the connector?
> >
> > 2. Right now it's possible that we'll perform two connector config validations per create/modify request; once on the worker that initially receives the request, and then again if that worker is not the leader of the cluster and has to forward the request to the leader. Any thoughts on optimizing this to only require a single validation per request? We probably wouldn't want to force all validations to take place on the leader (could lead to overloading it pretty quickly in large clusters), but we could add an internal-only query parameter to skip validation and then use that parameter when forwarding requests from followers to the leader.
> >
> > 3. A worker property is pretty coarse-grained, and difficult to change. We might allow per-request toggling of the timeout by adding a URL query parameter like '?timeout=90s' to the REST API to allow tweaking of the timeout on a more granular basis, and without having to perform a worker restart.
> >
> > I'd also like to clarify a point about the rejected alternative "Allow configuring producer zombie fencing admin request timeout" -- is the implication here that the "rest.api.request.timeout.ms" property will not control the REST timeout for requests to the `PUT /connectors/{connector}/fence` endpoint, or just that it won't control the timeout that we use for the call to Admin::fenceProducers?
> >
> > Cheers,
> >
> > Chris
> >
> > On Wed, Nov 2, 2022 at 10:07 AM Yash Mayya <yash.ma...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > I'd like to start a discussion thread on this small KIP -
> > >
> > > https://eu01.z.antigena.com/l/8mqVuSEhdZF05LKGk7J1Cb6esUWht6ps4-73ocd196fgyxC~lGhTLqBEt9BPJy1MBx6RzKVi7qRYYpJwCHRixlmdcayj7YMv~-0fIZJ2f8EjEhp-Qck3~O3Ga1JBTgaY4tTAlwh0VjF6217iBaGVG2PNa-2TO5z8zC5s0Nb28kcG2mUg1tcEfzsAtmWEXWRECoeD4cdUXyPvWeFtYL-ccVv-PmQydgSIYkdeR7~XItAXezft9Mfs5pzjejXSK~MhWW
> > >
> > > It proposes the addition of a new Kafka Connect worker configuration to allow configuring REST API request timeouts.
> > >
> > > Thanks,
> > > Yash
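If the per-request `?timeout=90s` query parameter and worker-enforced bounds discussed above were adopted, the server-side handling could look roughly like the following sketch. The bound values and the parsing grammar are illustrative only (nothing here is specified by the KIP); the 90-second fallback mirrors the current Connect REST request timeout.

```python
import re

# Illustrative bounds; the discussion above floats a default somewhere in
# the 10-60 minute range for validation-heavy requests, with min/max
# possibly set by low-importance worker properties. None of these names
# or numbers are final.
DEFAULT_TIMEOUT_MS = 90_000          # current Connect REST default (90s)
MIN_TIMEOUT_MS = 1_000
MAX_TIMEOUT_MS = 60 * 60 * 1_000     # one hour

_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000}

def effective_timeout_ms(raw):
    """Resolve a request's '?timeout=' parameter to a bounded value in ms.

    Accepts values like '90s', '5m', or '1500ms'; a missing parameter
    falls back to the worker default, and out-of-range values are clamped
    to the configured bounds rather than rejected.
    """
    if raw is None:
        return DEFAULT_TIMEOUT_MS
    m = re.fullmatch(r"(\d+)(ms|s|m)", raw)
    if not m:
        raise ValueError(f"malformed timeout parameter: {raw!r}")
    requested = int(m.group(1)) * _UNIT_MS[m.group(2)]
    return max(MIN_TIMEOUT_MS, min(requested, MAX_TIMEOUT_MS))

print(effective_timeout_ms("90s"))   # 90000
print(effective_timeout_ms(None))    # 90000 (worker default)
print(effective_timeout_ms("120m"))  # 3600000 (clamped to the maximum)
```

Clamping instead of rejecting keeps misbehaving clients from either monopolizing a request thread for hours or being broken outright, which matches the spirit of enforcing "reasonable bounds" on the timeout.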