Sophie, Thanks for the nice summary. Originally I'm leaning towards option 2) since changing this name to others could be a pretty large conceptual change to users, and I think the concerns you raised is good as well that we cannot reuse the `taskId()` function names any more..
After pondering on it a bit, what about 1) keeping `TaskId` in the current package as that's not our main concern anyways, 2) changing `TaskId` to an abstract class and deprecate all util functions / fields that we do not want to expose any longer, and 3) introducing a new class that extends the o.a.k.streams.processor.TaskId for internal implementations? Guozhang On Sun, May 16, 2021 at 7:19 PM Sophie Blee-Goldman <sop...@confluent.io.invalid> wrote: > Sorry, one correction regarding the discussion on naming below: the other > con for reusing TaskId as the interface name is that we would not be able > to follow the standard getter naming conventions and introduce new taskId() > APIs to replace the deprecated ones, as the deprecated methods have already > claimed that name. So we would have to come up with a different name for > these APIs, for example taskInfo() would require the TaskId interface > which is a bit confusing. Or using getTaskId() temporarily until the > deprecated methods can be removed, which goes against the established > convention of no "get" in the getters. Another possibility is something > like "taskIdInfo()" or "taskIdMetadata()" so that it's still mentioning > taskId somewhere in the name. > > Thoughts? Given this awkwardness I'm somewhat leaning towards renaming the > interface after all, as in the example above to "TaskInfo". Or we could > take a similar approach and just build on the original TaskId name, calling > it TaskIdInfo or something like that. Any thoughts? As always the KIP has > been updated with what I am proposing, so you can take a look yourself. > > On Sun, May 16, 2021 at 7:07 PM Sophie Blee-Goldman <sop...@confluent.io> > wrote: > > > I'm moving the discussion on KIP-740 to an actual [DISCUSS] thread since > > it turned out to be more nuanced > > and grew in scope since I initially proposed it. Thanks John for > > the suggestions/questions on the [VOTE] thread, > > I have copied them over and addressed them below. > > > > Before that, I just want to apologize to everyone for bringing this to a > > vote immediately. I truly thought this would > > be a trivial KIP that just fixed what seemed like a bug in the API, but > it > > slowly grew in scope as I wrangled with > > the question of whether TaskId was already a public API. To clarify, it > is > > -- a TaskId is returned by the > > Processor/StateStoreContext.taskId() methods, I just hadn't noticed those > > APIs at the time of writing this KIP. > > After finding those APIs and confirming that it was public, I began to > > wonder whether it really should be. > > > > 1. "Topic Group" is a vestigial term from the early days of > >> the codebase. We call a "topic group" a "subtopology" in the > >> public interface now (although "topic group" is still used > >> internally some places). For user-facing consistency, we > >> should also use "subtopologyId" in your proposal. > > > > > > 100% agree, I remember being very confused by the term "topic group" > when > > I was first getting to know > > the Streams code base, and wondering why it wasn't just called > > "subtopology". This would be the perfect > > opportunity to migrate to the more clear terminology where appropriate. > > I've updated the KIP to reflect this. > > > > 2. I'm wondering if it's really necessary to introduce this > >> interface at all. I think your motivation is to be able to > >> get the subtopologyId and partition via TaskMetadata, right? > >> Why not just add those methods to TaskMetadata? Stepping > >> back, the concept of metadata about an identifier is a bit > >> elaborate. > > > > > > I think if the only issue we wanted to address was the TaskMetadata > > returning a string representation of a > > TaskId rather than the real thing, as was originally the case, then I > > would agree with just unfolding the API > > to expose the TaskId fields directly. However while digging into the > > question of whether TaskId was public, it > > became clear that the class had some awkward features for a public API: > > public fields with no getters, many > > utility methods for internal functionality that had to be made public and > > cluttered up the API, etc. So I believe > > this KIP can and should take the opportunity to clean up the TaskId API > > altogether, not just for one getter in > > TaskMetadata. And we can't deprecate an API without adding a replacement > > for users to migrate onto, since > > the concept of a TaskId itself is still very much relevant. That's the > > primary motivation behind introducing this > > new interface. Of course technically this could be done by retaining the > > existing TaskId class and just moving > > the things that should not be public to the new internal class, but > that's > > just a worse version of the same idea, > > and would require a somewhat awkward deprecation cycle. Plus, given that > > TaskId is intended to just be a > > wrapper for some misc information about a Task, a public interface feels > > more natural than a class. > > > > It also gives us more flexibility in the future, as it's much easier to > > just add a field to the TaskId (with a new > > getter on the interface) then to track down all places where we return > > the TaskId fields, ie every topicGroupId() > > and partition() getters on an API, and add a new getter API for this > field. > > > > Please refer to the KIP for the current proposal, including all > deprecated > > and new APIs. I hope the latest > > iteration will make everyone happy. The only potentially controversial > > question remaining is how to name > > the new interface. Here are all the options that seem reasonable to me, > > though chose to go with the 2nd one: > > > > 1) Keep the name TaskId for the interface, put the interface in the > > o.a.k.streams package, and call the internal class > > TaskIdImpl. This will require a fully qualified class in any file > that > > references both the new interface and the > > deprecated public TaskId class. > > 2) Same as the above, but allow the internal class to also retain the > name > > TaskId. This will also require a fully qualified > > class name whenever any of the three are referenced in the same file, > > though I don't think there is likely to be a single > > file where the new internal TaskId class appears along with the old > > public TaskId class but not the new TaskId interface, > > or vice versa, so the impact is probably exactly the same as the > above > > option. > > 3) Name the interface something else entirely (eg TaskInfo) and keep the > > name TaskId for the new internal class. The > > new internal TaskId and the deprecated public TaskId class most > likely > > appear together. This option is probably the > > least annoying with respect to requiring fully qualified class names, > > but I expect it's still on par with how often this > > would be required in option 1 or 2. > > > > Given that both the new public interface and the deprecated public TaskId > > class are almost always going to appear together, > > we can't avoid fully qualified path names until the deprecated class can > > be removed, unless we give the interface a different > > name. And most of the time only the internal TaskId class will be > > referenced anyways, so the scope of fully qualified names > > should still be small with any option. In fact the vast majority of the > > codebase will be using the new internal class, so leaving it > > as "TaskId" will help prevent a truly massive PR, as only the import line > > will need to be changed, vs hundreds/thousands of lines > > in which the class name of the TaskId object appears. So I would rather > > avoid option 1. > > > > With option 2 we'll need to suppress a spotbugs warning, while with > option > > 3 we are changing the name of a pretty fundamental > > concept throughout Streams, so users will need to mentally adapt to > > thinking about TaskInfo (or whatever) instead of in terms > > of TaskId. That is why I personally prefer option 2. > > > > > > Thanks for the discussion, sorry this turned into such a long response. I > > think we're on the same page now but I want to make > > sure the background and my general thought process behind all this was > > clear to all. I'll kick off the vote again but please respond > > here with any further concerns and save the [VOTE] thread for voting and > > maybe the occasional small question. > > > > -Sophie > > > -- -- Guozhang