Hi Yash,

RE 2: That's a great point about validations already being performed by the leader. For completeness' sake, I'd like to note that this only holds for valid configurations; invalid ones are currently caught before being forwarded to the leader. Still, I think it's fine to forward to the leader for now and optimize further in the future if necessary. If frequent validations are taking place, they should be conducted via the `PUT /connector-plugins/{pluginName}/config/validate` endpoint, which won't do any forwarding at all.
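As a sketch of what hitting that validation endpoint directly looks like (the worker address and connector config below are hypothetical; the endpoint path and flat-map request body are the standard Kafka Connect ones):

```python
import json

# Hypothetical worker address; any worker can serve this request, since
# config validation is not forwarded to the leader.
WORKER = "http://localhost:8083"

def build_validate_request(plugin_name, config):
    """Build the URL and JSON body for a Kafka Connect config validation.

    The body is the flat map of connector properties; the response reports
    per-property errors without creating or modifying any connector.
    """
    url = f"{WORKER}/connector-plugins/{plugin_name}/config/validate"
    body = json.dumps(config)
    return url, body

url, body = build_validate_request(
    "FileStreamSinkConnector",
    {
        "connector.class": "FileStreamSinkConnector",
        "tasks.max": "1",
        "topics": "test-topic",
    },
)
print(url)
# A real client would now issue: PUT <url> with Content-Type: application/json
```

This is only the request-construction half; actually sending it requires a running Connect worker.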
RE 3: Yes, those endpoints LGTM. And yes, bounds on the timeout also seem reasonable... maybe a low-importance worker property could work for that? Not sure what would make sense for a default; probably somewhere in the 10-60 minute range, but I'd be interested in others' thoughts.

Thanks for the clarification on the zombie fencing logic. I think we might want to have some more subtle logic around the interaction between calls to Admin::fenceProducers and a worker-level timeout property if we go that route, but we can cross that particular bridge if we get back to it.

Cheers,

Chris

On Wed, Nov 2, 2022 at 1:48 PM Yash Mayya <yash.ma...@gmail.com> wrote:

> Hi Chris,
>
> Thanks a lot for the super quick response and the great feedback!
>
> 1. I think that makes a lot of sense, and I'd be happy to update the KIP to include this change in the scope. The current behavior, where the API response indicates a timeout but the connector is created/updated eventually anyway, can be pretty confusing and is generally not a good user experience IMO.
>
> 2. Wow, thanks for pointing this out - it's a really good catch and something I hadn't noticed was happening with the current implementation. While I do like the idea of having a query parameter that determines whether validations can be skipped, I'm wondering if it might not be easier and cleaner to just do the leader check earlier and avoid doing the unnecessary config validation on the first worker? Since each config validation happens on its own thread, I'm not so sure about the concern of overloading the leader even on larger clusters, especially since validations aren't typically long-running operations. Furthermore, even with the current implementation, the leader will always be doing a config validation for connector create/update REST API requests on any worker.
>
> 3. That's a good point, and this way we can also restrict the APIs whose timeouts are configurable - I'm thinking `PUT /connector-plugins/{pluginName}/config/validate`, `POST /connectors` and `PUT /connectors/{connector}/config` are the ones where such a timeout parameter could be useful. Also, do you think we should enforce some reasonable bounds for the timeout config?
>
> On the zombie fencing point, the implication was that the new worker property would not control the timeout used for the call to Admin::fenceProducers. However, if we go with a timeout query parameter approach, even the timeout for the `PUT /connectors/{connector}/fence` endpoint will remain unaffected.
>
> Thanks,
> Yash
>
> On Wed, Nov 2, 2022 at 8:13 PM Chris Egerton <chr...@aiven.io.invalid> wrote:
>
> > Hi Yash,
> >
> > Thanks for the KIP. It's a nice, focused change. Initially I was hesitant to support cases where connector validation takes this long, but considering the alternative is that we give users a 500 error response but leave the request to create/modify the connector queued up in the herder, I think I can get behind the motivation here. There's also an argument to be made about keeping Kafka Connect available even when the systems that it connects to are in a degraded state.
> >
> > I have a few alternatives I'd be interested in your thoughts on:
> >
> > 1. Since the primary concern here seems to be that custom connector validation logic can take too long, do we have any thoughts on adding logic to check for request timeout after validation has completed and, if it has, aborting the attempt to create/modify the connector?
> >
> > 2. Right now it's possible that we'll perform two connector config validations per create/modify request; once on the worker that initially receives the request, and then again if that worker is not the leader of the cluster and has to forward the request to the leader. Any thoughts on optimizing this to only require a single validation per request? We probably wouldn't want to force all validations to take place on the leader (could lead to overloading it pretty quickly in large clusters), but we could add an internal-only query parameter to skip validation and then use that parameter when forwarding requests from followers to the leader.
> >
> > 3. A worker property is pretty coarse-grained, and difficult to change. We might allow per-request toggling of the timeout by adding a URL query parameter like '?timeout=90s' to the REST API to allow tweaking of the timeout on a more granular basis, and without having to perform a worker restart.
> >
> > I'd also like to clarify a point about the rejected alternative "Allow configuring producer zombie fencing admin request timeout" -- is the implication here that the "rest.api.request.timeout.ms" property will not control the REST timeout for requests to the `PUT /connectors/{connector}/fence` endpoint, or just that it won't control the timeout that we use for the call to Admin::fenceProducers?
> >
> > Cheers,
> >
> > Chris
> >
> > On Wed, Nov 2, 2022 at 10:07 AM Yash Mayya <yash.ma...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > I'd like to start a discussion thread on this small KIP -
> > >
> > > https://eu01.z.antigena.com/l/8mqVuSEhdZF05LKGk7J1Cb6esUWht6ps4-73ocd196fgyxC~lGhTLqBEt9BPJy1MBx6RzKVi7qRYYpJwCHRixlmdcayj7YMv~-0fIZJ2f8EjEhp-Qck3~O3Ga1JBTgaY4tTAlwh0VjF6217iBaGVG2PNa-2TO5z8zC5s0Nb28kcG2mUg1tcEfzsAtmWEXWRECoeD4cdUXyPvWeFtYL-ccVv-PmQydgSIYkdeR7~XItAXezft9Mfs5pzjejXSK~MhWW
> > >
> > > It proposes the addition of a new Kafka Connect worker configuration to allow configuring REST API request timeouts.
> > >
> > > Thanks,
> > > Yash
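If the per-request `?timeout=90s` query parameter and worker-enforced bounds discussed above were adopted, the server-side handling could look roughly like the following sketch. The bound values and the parsing grammar are illustrative only (nothing here is specified by the KIP); the 90-second fallback mirrors the current Connect REST request timeout.

```python
import re

# Illustrative bounds; the discussion above floats a default somewhere in
# the 10-60 minute range for validation-heavy requests, with min/max
# possibly set by low-importance worker properties. None of these names
# or numbers are final.
DEFAULT_TIMEOUT_MS = 90_000          # current Connect REST default (90s)
MIN_TIMEOUT_MS = 1_000
MAX_TIMEOUT_MS = 60 * 60 * 1_000     # one hour

_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000}

def effective_timeout_ms(raw):
    """Resolve a request's '?timeout=' parameter to a bounded value in ms.

    Accepts values like '90s', '5m', or '1500ms'; a missing parameter
    falls back to the worker default, and out-of-range values are clamped
    to the configured bounds rather than rejected.
    """
    if raw is None:
        return DEFAULT_TIMEOUT_MS
    m = re.fullmatch(r"(\d+)(ms|s|m)", raw)
    if not m:
        raise ValueError(f"malformed timeout parameter: {raw!r}")
    requested = int(m.group(1)) * _UNIT_MS[m.group(2)]
    return max(MIN_TIMEOUT_MS, min(requested, MAX_TIMEOUT_MS))

print(effective_timeout_ms("90s"))   # 90000
print(effective_timeout_ms(None))    # 90000 (worker default)
print(effective_timeout_ms("120m"))  # 3600000 (clamped to the maximum)
```

Clamping instead of rejecting keeps misbehaving clients from either monopolizing a request thread for hours or being broken outright, which matches the spirit of enforcing "reasonable bounds" on the timeout.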