yashmayya commented on PR #13530:
URL: https://github.com/apache/kafka/pull/13530#issuecomment-1660303528

   Hey Chris, thanks for the reviews and apologies for the late response.
   
   > Honestly, I think the biggest obstacle to preventing pause/resume for 
connectors that have failed to generate task configs is that it'll look uglier 
in the code base, especially if we take a naive approach and introduce some 
kind of public WorkerConnector::setState method (gross).
   
   Haha that was exactly the reason I tried to avoid doing so but you're right 
about the delegate status listener bit, thanks for pointing that out! In fact, 
it looks like the herder's status listener is wrapped twice - 
   
   - Once in the `Worker` here - 
https://github.com/apache/kafka/blob/660e6fe8108e8a9b3481ea1ec20327a099dd8310/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/Worker.java#L295
   - And then again in the `WorkerConnector` itself like you pointed out - 
https://github.com/apache/kafka/blob/660e6fe8108e8a9b3481ea1ec20327a099dd8310/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerConnector.java#L98-L99
   
   > Perhaps we could add a WorkerConnector::taskConfigs method that accepts a 
boolean reportFailure parameter (feel free to rename), which controls whether 
any exceptions encountered while invoking Connector::taskConfigs (or anywhere 
else in the method, really) should be reported to the connector's status 
listener and result in a state change? It's still a little unclean, but it does 
reflect the needs of our internal API: in one mode, we want these errors to be 
thrown to the caller without affecting any other state for the connector, and 
in another mode, we still want these errors to be thrown (so as to surface in 
REST responses as 500 errors), but also to result in a change in state for the 
connector.
   
   Hm that's an interesting idea but I agree that it feels a little unclean - 
we'd also need to expand the `WorkerConnector`'s public interface since the 
method that updates its internal state and calls the wrapped status listener's 
`onFailure` hook is currently internal and exposing it could lead to questions 
and confusion around when it can safely be invoked externally (i.e. without 
affecting any other status updates that could occur internally). WDYT?
   
   > Finally, regarding augmenting error messages with a note that tasks may 
still be running--as long as we don't have to wrap any connector-thrown 
exceptions in order to introduce this additional wording, I'm in favor of it. I 
just don't want to add another set of stack frames and another link in the 
"caused by" chain without good reason since it makes things harder to read 
(I've seen people copy+paste stack traces that end with ["Tolerance exceeded in 
error 
handler"](https://github.com/apache/kafka/blob/30b087ead967b28d459945fe90c80545bf189d1f/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L591)
 and omit the actual root cause, which is... less than helpful).
   
   With the current changes, we're already wrapping the connector thrown 
exception to use `Failed to update tasks after connector restart / startup` as 
the top level error message. I definitely get where you're coming from since 
I've seen the same issue in the past too (with `Tolerance exceeded in error 
handler`) but I do think that this case is different - users who aren't too 
familiar with the nuances between a connector and its tasks (and their 
associated statuses) would probably be fairly confused seeing a failed 
connector with running tasks. I feel like this situation would be significantly 
improved by using a message that talks about the task reconfiguration error for 
the connector status error message instead of using whatever the connector 
throws.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to