Thanks for continuing to work on this KIP, Chris.

On Fri, May 7, 2021 at 12:58 PM Chris Egerton <chr...@confluent.io.invalid> wrote:
> b) An annotation is a really cool way to allow connector developers to
> signal eligibility for exactly-once to Connect (and possibly, through
> Connect, to users). Like you mention, connectors both with and without the
> annotation could still run on both pre- and post-upgrade workers with no
> worries about missing class errors. And it only requires a single-line
> change to each connector. My only concern is that with some connectors,
> exactly-once support might not be all-or-nothing and might be dependent on
> how the connector is configured. For a practical example, Confluent's JDBC
> source connector would likely be eligible for exactly-once when run in
> incrementing mode (where it tracks offsets based on the value of a
> monotonically-increasing table column), but not in bulk mode (where it
> doesn't provide offsets for its records at all). With that in mind, what do
> you think about a new "exactlyOnce()" method to the SourceConnector class
> that can return a new ExactlyOnce enum with options of "SUPPORTED",
> "UNSUPPORTED", and "UNKNOWN", with a default implementation that returns
> "UNKNOWN"? This can be invoked by Connect after start() has been called to
> give the connector a chance to choose its response based on its
> configuration.
>
> As far as what to do with this information goes--I think it'd go pretty
> nicely in the response from the GET /connectors/{connector} endpoint, which
> currently includes information about the connector's name, configuration,
> and task IDs. We could store the info in the config topic in the same
> record that contains the connector's configuration whenever a connector is
> (re)configured, which would guarantee that the information provided about
> eligibility for exactly-once matches the configuration it was derived from,
> and would present no compatibility issues (older workers would be able to
> read records written by new workers and vice-versa). Thoughts?

The worker has to do (or plan to do) something with the information about a connector's support for EOS, whether that comes via an annotation or a method. Otherwise, what's the point of requiring the connector to expose this information? The problem I see with this whole concept is that there will still be ambiguity. For example, if the method returns `UNKNOWN` by default, a connector that never overrides it could still be written in a way where EOS does work. Yet what are users supposed to do when they see "UNKNOWN"?
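
To make the ambiguity concrete, here is a minimal sketch of the method-based proposal as I read it. The `ExactlyOnce` enum and the `exactlyOnce()` method name come from the quoted message; everything else is an assumption for illustration only. `SketchSourceConnector` is a hypothetical stand-in for Connect's `SourceConnector` (not the real class), and `JdbcLikeSourceConnector`, `LegacySourceConnector`, and the `"mode"` property handling are invented to mirror the JDBC example above.

```java
import java.util.Map;

// Enum as proposed in the quoted message.
enum ExactlyOnce { SUPPORTED, UNSUPPORTED, UNKNOWN }

// Hypothetical stand-in for Connect's SourceConnector; NOT the real class.
abstract class SketchSourceConnector {
    abstract void start(Map<String, String> props);

    // Proposed default: a connector that never overrides this reports UNKNOWN.
    ExactlyOnce exactlyOnce() {
        return ExactlyOnce.UNKNOWN;
    }
}

// Hypothetical connector whose answer depends on how it is configured.
class JdbcLikeSourceConnector extends SketchSourceConnector {
    private String mode = "";

    @Override
    void start(Map<String, String> props) {
        // "mode" is used here only to mirror the incrementing/bulk example.
        mode = props.getOrDefault("mode", "");
    }

    @Override
    ExactlyOnce exactlyOnce() {
        switch (mode) {
            case "incrementing":
                return ExactlyOnce.SUPPORTED;   // offsets track a monotonic column
            case "bulk":
                return ExactlyOnce.UNSUPPORTED; // no offsets provided for records
            default:
                return ExactlyOnce.UNKNOWN;     // other modes are not decided here
        }
    }
}

// Hypothetical connector written before the method existed. It may well work
// with EOS, but it still reports UNKNOWN because it inherits the default.
class LegacySourceConnector extends SketchSourceConnector {
    @Override
    void start(Map<String, String> props) { }
}

class ExactlyOnceSketchDemo {
    public static void main(String[] args) {
        JdbcLikeSourceConnector jdbc = new JdbcLikeSourceConnector();
        jdbc.start(Map.of("mode", "incrementing"));
        System.out.println(jdbc.exactlyOnce());

        LegacySourceConnector legacy = new LegacySourceConnector();
        legacy.start(Map.of());
        System.out.println(legacy.exactlyOnce());
    }
}
```

In this sketch the worker would call `exactlyOnce()` only after `start()`, as proposed. The `LegacySourceConnector` case is the one I'm worried about: its `UNKNOWN` tells the user nothing about whether EOS actually works.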