On 18/10/2018 16.46, Ryanne Dolan wrote:
Per Steffenson, getting sequence numbers correct is definitely difficult,
but this is not Connect's fault. I'd like to see Connect implement
exactly-once from end-to-end, but that requires coordination between
sources and sinks along the lines that you allude to, using sequence
numbers and transactions and whatnot.

The problem with commit() is knowing when it's okay to delete the files in
your example. I don't believe that issue has anything to do with avoiding
dupes or assigning unique sequence numbers. I believe it is safe to delete
a file if you know it has been delivered successfully, which the present
API exposes.
Ok, I believe you read too much into my example. It was just what I was able to come up with that was simple enough to be explained fairly easily, and where it would be important to know for which records offsets have been flushed. The example may not have been good enough to fulfill its purpose of showing that knowing exactly for which records offsets have been flushed is important.

You might argue that you, as a source-connector developer, do not need to know when offsets are flushed at all. I will argue that in many cases you do, and that you may need to know exactly which records have had their offsets flushed. In that case the current "commit" method is useless, because you do not know which records had their offsets flushed:

* It is not necessarily all the records returned from "poll" at the point where "commit" is called, even though the current JavaDoc claims so.
* It is not necessarily all the records for which "commitRecord" has been called. It may be more records, or it may be fewer.

The bottom line is that the current "commit" cannot be used for much. You may argue that it should just be removed, but I definitely would not like to see that. I use "commit" in several of my source-connectors (pretending that it works), and could not live without it.

As I see it, the offsets are kind of your accounting: "information about how to proceed from where you got to". Kafka Connect offers to help me keep that accounting aligned with the outgoing data it relates to. Of course I could deal with all that myself, but then a lot of the niceness of Kafka Connect would be gone, and I might as well do everything myself.
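To make the two bullet points above concrete, here is a toy model in plain Java. It is only a sketch: it uses no Kafka Connect classes, and the class name, the "offsetsFlushed" callback and the timeline in "scenario" are all made up for illustration. It assumes one possible interleaving in which, by the time "commit" would be called, the records polled, the records acknowledged and the records whose offsets were flushed are three different sets.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model only: plain Java, no Kafka Connect classes involved.
class CommitAmbiguitySketch {
    final Set<Long> polled = new LinkedHashSet<>();   // ids returned from poll()
    final Set<Long> acked = new LinkedHashSet<>();    // ids seen by commitRecord()
    final Set<Long> flushed = new LinkedHashSet<>();  // ids whose offsets were flushed

    void poll(Long... ids) { Collections.addAll(polled, ids); }

    void commitRecord(long id) { acked.add(id); }

    // Hypothetical callback that names the flushed records explicitly.
    void offsetsFlushed(Long... ids) { Collections.addAll(flushed, ids); }

    // One assumed interleaving of events leading up to a commit() call.
    static CommitAmbiguitySketch scenario() {
        CommitAmbiguitySketch t = new CommitAmbiguitySketch();
        t.poll(1L, 2L, 3L);
        t.commitRecord(1L);
        t.commitRecord(2L);
        t.offsetsFlushed(1L, 2L);  // assumed: the flush covers records 1 and 2 only
        t.commitRecord(3L);        // acknowledged after the flush
        t.poll(4L);                // polled after the flush
        return t;                  // commit() would be invoked at this point
    }

    public static void main(String[] args) {
        CommitAmbiguitySketch t = scenario();
        System.out.println("polled  = " + t.polled);
        System.out.println("acked   = " + t.acked);
        System.out.println("flushed = " + t.flushed);
    }
}
```

In this assumed timeline, a task that treats "everything polled" or "everything passed to commitRecord" as flushed would be wrong about records 3 and 4. Only a callback that is handed the flushed records themselves removes the guesswork.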

That said, I'm not opposed to your proposed callbacks, and I agree that
commit() and commitRecord() are poorly named. I just don't believe the
present API is incorrect.
I definitely do. Currently there is a callback "commit" that lies in its JavaDoc, and that essentially cannot be used for anything except confusing you. You know nothing about the state when it is called.

But as long as you do not oppose the proposed solution, we probably should not spend too much time arguing back and forth about opinions.

Ryanne
Regards, and thanks for participating in the discussion
Per Steffensen
