C0urante commented on PR #13530:
URL: https://github.com/apache/kafka/pull/13530#issuecomment-1613554645

   Hmmm... is there an advantage to enabling users to pause/resume a failed 
connector? I feel like restarting it is still the natural corrective action in 
that case, which will also cause task config generation to be retried.
   
   Honestly, I think the biggest obstacle to preventing pause/resume for 
connectors that have failed to generate task configs is that it'll look uglier 
in the code base, especially if we take a naive approach and introduce some 
kind of public `WorkerConnector::setState` method (gross).
   
   Perhaps we could add a `WorkerConnector::taskConfigs` method that accepts a 
`boolean reportFailure` parameter (feel free to rename), which controls whether 
any exceptions encountered while invoking `Connector::taskConfigs` (or anywhere 
else in the method, really) should be reported to the connector's status 
listener and result in a state change? It's still a little unclean, but it does 
reflect the needs of our internal API: in one mode, we want these errors to be 
thrown to the caller without affecting any other state for the connector, and 
in another mode, we still want these errors to be thrown (so as to surface in 
REST responses as 500 errors), but also to result in a change in state for the 
connector.
   
   Also, one more advantage of keeping the state tracking (and reporting) 
inside the `WorkerConnector` class is that we currently use a [delegating 
status 
listener](https://github.com/apache/kafka/blob/30b087ead967b28d459945fe90c80545bf189d1f/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerConnector.java#L98-L99)
 to record metrics whenever the connector's status is updated. I believe with 
the current implementation in this PR, there would be a bug in metrics 
reporting for connectors (LMK if I'm mistaken, though).
   
   Finally, regarding augmenting error messages with a note that tasks may 
still be running--as long as we don't have to wrap any connector-thrown 
exceptions in order to introduce this additional wording, I'm in favor of it. I 
just don't want to add another set of stack frames and another link in the 
"caused by" chain without good reason since it makes things harder to read 
(I've seen people copy+paste stack traces that end with ["Tolerance exceeded in 
error 
handler"](https://github.com/apache/kafka/blob/30b087ead967b28d459945fe90c80545bf189d1f/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L591)
 and omit the actual root cause, which is... less than helpful).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to