Sophie,

Thanks for the nice summary. Originally I'm leaning towards option 2) since
changing this name to others could be a pretty large conceptual change to
users, and I think the concerns you raised is good as well that we cannot
reuse the `taskId()` function names any more..

After pondering on it a bit, what about 1) keeping `TaskId` in the current
package as that's not our main concern anyways, 2) changing `TaskId` to an
abstract class and deprecate all util functions / fields that we do not
want to expose any longer, and 3) introducing a new class that extends the
o.a.k.streams.processor.TaskId for internal implementations?


Guozhang


On Sun, May 16, 2021 at 7:19 PM Sophie Blee-Goldman
<sop...@confluent.io.invalid> wrote:

> Sorry, one correction regarding the discussion on naming below: the other
> con for reusing TaskId as the interface name is that we would not be able
> to follow the standard getter naming conventions and introduce new taskId()
> APIs to replace the deprecated ones, as the deprecated methods have already
> claimed that name. So we would have to come up with a different name for
> these APIs, for example taskInfo()  would require the TaskId interface
> which is a bit confusing. Or using getTaskId() temporarily until the
> deprecated methods can be removed, which goes against the established
> convention of no "get" in the getters. Another possibility is something
> like "taskIdInfo()" or "taskIdMetadata()" so that it's still mentioning
> taskId somewhere in the name.
>
> Thoughts? Given this awkwardness I'm somewhat leaning towards renaming the
> interface after all, as in the example above to "TaskInfo". Or we could
> take a similar approach and just build on the original TaskId name, calling
> it TaskIdInfo or something like that. Any thoughts? As always the KIP has
> been updated with what I am proposing, so you can take a look yourself.
>
> On Sun, May 16, 2021 at 7:07 PM Sophie Blee-Goldman <sop...@confluent.io>
> wrote:
>
> > I'm moving the discussion on KIP-740 to an actual [DISCUSS] thread since
> > it turned out to be more nuanced
> > and grew in scope since I initially proposed it. Thanks John for
> > the suggestions/questions on the [VOTE] thread,
> > I have copied them over and addressed them below.
> >
> > Before that, I just want to apologize to everyone for bringing this to a
> > vote immediately. I truly thought this would
> > be a trivial KIP that just fixed what seemed like a bug in the API, but
> it
> > slowly grew in scope as I wrangled with
> > the question of whether TaskId was already a public API. To clarify, it
> is
> > -- a TaskId is returned by the
> > Processor/StateStoreContext.taskId() methods, I just hadn't noticed those
> > APIs at the time of writing this KIP.
> > After finding those APIs and confirming that it was public, I began to
> > wonder  whether it really should be.
> >
> > 1. "Topic Group" is a vestigial term from the early days of
> >> the codebase. We call a "topic group" a "subtopology" in the
> >> public interface now (although "topic group" is still used
> >> internally some places). For user-facing consistency, we
> >> should also use "subtopologyId" in your proposal.
> >
> >
> > 100% agree, I remember being very confused by the term "topic group"
> when
> > I was first getting to know
> > the Streams code base, and wondering why it wasn't just called
> > "subtopology". This would be the perfect
> > opportunity to migrate to the more clear terminology where appropriate.
> > I've updated the KIP to reflect this.
> >
> > 2. I'm wondering if it's really necessary to introduce this
> >> interface at all. I think your motivation is to be able to
> >> get the subtopologyId and partition via TaskMetadata, right?
> >> Why not just add those methods to TaskMetadata? Stepping
> >> back, the concept of metadata about an identifier is a bit
> >> elaborate.
> >
> >
> > I think if the only issue we wanted to address was the TaskMetadata
> > returning a string representation of a
> > TaskId rather than the real thing, as was originally the case, then I
> > would agree with just  unfolding the API
> > to expose the TaskId fields directly. However while digging into the
> > question of whether TaskId was public, it
> > became clear that the class had some awkward features for a public API:
> > public fields with no getters, many
> > utility methods for internal functionality that had to be made public and
> > cluttered up the API, etc. So I believe
> > this KIP can and should take the opportunity to clean up the TaskId API
> > altogether, not just for one getter in
> > TaskMetadata. And we can't deprecate an API without adding a replacement
> > for users to migrate onto, since
> > the concept of a TaskId itself is still very much relevant. That's the
> > primary motivation behind introducing this
> > new interface. Of course technically this could be done by retaining the
> > existing TaskId class and just moving
> > the things that should not be public to the new internal class, but
> that's
> > just a worse version of the same idea,
> > and would require a somewhat awkward deprecation cycle. Plus, given that
> > TaskId is intended to just be a
> > wrapper for some misc information about a Task, a public interface feels
> > more natural than a class.
> >
> > It also gives us more flexibility in the future, as it's much easier to
> > just add a field to the TaskId (with a new
> > getter on the interface) then to track  down all places where we return
> > the TaskId fields, ie every topicGroupId()
> > and partition() getters on an API, and add a new getter API for this
> field.
> >
> > Please refer to the KIP for the current proposal, including all
> deprecated
> > and new APIs. I hope the latest
> > iteration will make everyone happy. The only potentially controversial
> > question remaining is how to name
> > the new interface. Here are all the options that seem reasonable to me,
> > though chose to go with the 2nd one:
> >
> > 1) Keep the name TaskId for the interface, put the interface in the
> > o.a.k.streams package, and call the internal class
> >     TaskIdImpl. This will require a fully qualified class in any file
> that
> > references both the new interface and the
> >     deprecated public TaskId class.
> > 2) Same as the above, but allow the internal class to also retain the
> name
> > TaskId. This will also require a fully qualified
> >     class name whenever any of the three are referenced in the same file,
> > though I don't think there is likely to be a single
> >     file where the new internal TaskId class appears along with the old
> > public TaskId class but not the new TaskId interface,
> >     or vice versa, so the impact is probably exactly the same as the
> above
> > option.
> > 3) Name the interface something else entirely (eg TaskInfo) and keep the
> > name TaskId for the new internal class. The
> >     new internal TaskId and the deprecated public TaskId class most
> likely
> > appear together. This option is probably the
> >     least annoying with respect to requiring fully qualified class names,
> > but I expect it's still on par with how often this
> >     would be required in option 1 or 2.
> >
> > Given that both the new public interface and the deprecated public TaskId
> > class are almost always going to appear together,
> > we can't avoid fully qualified path names until the deprecated class can
> > be removed, unless we give the interface a different
> > name. And most of the time only the internal TaskId class will be
> > referenced anyways, so the scope of fully qualified names
> > should still be small with any option. In fact the vast majority of the
> > codebase will be using the new internal class, so leaving it
> > as "TaskId" will help prevent a truly massive PR, as only the import line
> > will need to be changed, vs hundreds/thousands of lines
> > in which the class name of the TaskId object appears. So I would rather
> > avoid option 1.
> >
> > With option 2 we'll need to suppress a spotbugs warning, while with
> option
> > 3 we are changing the name of a pretty fundamental
> > concept throughout Streams, so users will need to mentally adapt to
> > thinking about TaskInfo (or whatever) instead of in terms
> > of TaskId. That is why I personally prefer option 2.
> >
> >
> > Thanks for the discussion, sorry this turned into such a long response. I
> > think we're on the same page now but I want to make
> > sure the background and my general thought process behind all this was
> > clear to all. I'll kick off the vote again but please respond
> > here with any further concerns and save the [VOTE] thread for voting and
> > maybe the occasional small question.
> >
> > -Sophie
> >
>


-- 
-- Guozhang

Reply via email to