Hi Sagar, Thanks for your feedback! I actually renamed the method from "deleted()" to "destroyed()", which I think conveys the intention more clearly. I can certainly rename it to be 'onDeleted()', although I feel any method named onXXX() belongs to a listener class :)
Regarding failure scenarios, an option I'm considering is to just provide an overloaded Connector#stop(boolean deleted) method that is called during WorkerConnector#doShutdown(). This has the advantage of providing the same semantics that the current Connector#stop() has, with the caveat that the API won't be as expressive. Also, the extra 'cleanup' bits that were supposed to happen when a connector is deleted might not to happen at all if the connector doesn't stop before the configured timeout (and is therefore cancelled). At this point I think the simplest option would be to provide an overloaded method (with a default implementation) that connectors can override. Wdyt? From: dev@kafka.apache.org At: 11/15/22 11:40:26 UTC-5:00To: dev@kafka.apache.org Subject: Re: [DISCUSS] KIP-883: Add delete callback method to Connector API Hey Hector, Thanks for the KIP. I have a minor suggestion in terms of naming. Since this is a callback method, would it make sense to call it onDelete()? Also, the failure scenarios discussed by Greg would need handling. Among other things, I like the idea of having a timeout for graceful shutdown or else try a force shutdown. What do you think about that approach? Thanks! Sagar. On Sat, Nov 12, 2022 at 1:53 AM Hector Geraldino (BLOOMBERG/ 919 3RD A) < hgerald...@bloomberg.net> wrote: > Thanks Greg for taking your time to review not just the KIP but also the > PR. > > 1. You made very valid points regarding the behavior of the destroy() > callback for connectors that don't follow the happy path. After thinking > about it, I decided to tweak the implementation a bit and have the > destroy() method be called during the worker shutdown: this means it will > share the same guarantees the connector#stop() method has. An alternative > implementation can be to have an overloaded connector#stop(boolean deleted) > method that signals a connector that it is being stopped due to deletion, > but I think that having a separate destroy() method provides clearer > semantics. > > I'll make sure to ammend the KIP with these details. > > 3. Without going too deep on the types of operations that can be performed > by a connector when it's being deleted, I can imagine the > org.apache.kafka.connect.source.SourceConnector base class having a default > implementation that deletes the connector's offsets automatically > (controlled by a property); this is in the context of KIP-875 (first-class > offsets support in Kafka Connect). Similar behaviors can be introduced for > the SinkConnector, however I'm not sure if this KIP is the right place to > discuss all the possibilities, or if we shoold keeping it more > narrow-focused on providing a callback mechanism for when connectors are > deleted, and what the expectations are around this newly introduced method. > What do you think? > > > From: dev@kafka.apache.org At: 11/09/22 16:55:04 UTC-5:00To: > dev@kafka.apache.org > Subject: Re: [DISCUSS] KIP-883: Add delete callback method to Connector API > > Hi Hector, > > Thanks for the KIP! > > This is certainly missing functionality from the native Connect framework, > and we should try to make it possible to inform connectors about this part > of their lifecycle. > However, as with most functionality that was left out of the initial > implementation of the framework, the details are more challenging to work > out. > > 1. What happens when the destroy call throws an error, how does the > framework respond? > > This is unspecified in the KIP, and it appears that your proposed changes > could cause the herder to fail. > From the perspective of operators & connector developers, what is a > reasonable expectation to have for failure of a destroy? > I could see operators wanting both a graceful-delete to make use of this > new feature, and a force-delete for when the graceful-delete fails. > A connector developer could choose to swallow all errors encountered, or > fail-fast to indicate to the operator that there is an issue with the > graceful-delete flow. > If the alternative is crashing the herder, connector developers may choose > to hide serious errors, which is undesirable. > > 2. What happens when the destroy() call takes a long time to complete, or > is interrupted? > > It appears that your implementation serially destroy()s each appropriate > connector, and may prevent the herder thread from making progress while the > operation is ongoing. > We have previously had to patch Connect to perform all connector and task > operations on a background thread, because some connector method > implementations can stall indefinitely. > Connect also has the notion of "cancelling" a connector/task if a graceful > shutdown timeout operation takes too long. Perhaps some of that design or > machinery may be useful to protect this method call as well. > > More specific to the destroy() call itself, what happens when a connector > completes part of a destroy operation and then cannot complete the > remainder, either due to timing out or a worker crashing? > What is the contract with the connector developer about this method? Is the > destroy() only started exactly once during the lifetime of the connector, > or may it be retried? > > 3. What should be considered a reasonable custom implementation of the > destroy() call? What resources should it clean up by default? > > I think we can broadly categorize the state a connector mutates among the > following > * Framework-managed state (e.g. source offsets, consumer offsets) > * Implementation detail state (e.g. debezium db history topic, audit > tables, temporary accounts) > * Third party system data (e.g. the actual data being written by a sink > connector) > * Third party system metadata (e.g. tables in a database, delivery > receipts, permissions) > > I think it's apparent that the framework-managed state cannot/should not be > interacted with by the destroy() call. However, the framework could be > changed to clean up these resources at the same time that destroy() is > called. Is that out-of-scope of this proposal, and better handled by manual > intervention? > From the text of the KIP, I think it explicitly includes the Implementation > detail state, which should not be depended on externally and should be safe > to clean up during a destroy(). I think this is completely reasonable. > Are the third-party data and metadata out-of-scope for this proposal? Can > we officially recommend against it, or should we accommodate users and > connector developers that wish to clean up data/metadata during destroy()? > > 4. How should connector implementations of destroy handle backwards > compatibility? > > In terms of backward-compatibility for the framework vs connector versions, > I think the default-noop method is very reasonable. > However, what happens when someone upgrades from a version of a connector > without a destroy() implementation to one with an implementation, and > maintain backwards compatibility? > To replicate the same behavior, the connector might include something like > an `enable.cleanup` config which allows users to opt-in to the new > behavior. This could mean the proliferation of many different > configurations to handle this behavior. > Maybe we can provide some recommendations to developers, or some mechanism > to standardize this opt-in behavior. > > I'm interested to hear if you have any experience with the above, if you've > experimented with this feature in your fork. > > Thanks, > Greg > > > On Thu, Nov 3, 2022 at 11:55 AM Hector Geraldino (BLOOMBERG/ 919 3RD A) < > hgerald...@bloomberg.net> wrote: > > > Hi everyone, > > > > I've submitted KIP-883, which introduces a callback to the public > > Connector API called when deleting a connector: > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-883%3A+Add+delete+callback > +method+to+Connector+API > > > > It adds a new `deleted()` method (open to better naming suggestions) to > > the org.apache.kafka.connect.connector.Connector abstract class, which > will > > be invoked by connect Workers when a connector is being deleted. > > > > Feedback and comments are welcome. > > > > Thank you! > > Hector > > > > > > >