Re: [DISCUSS] KIP-740: Replace the public TaskId class with an interface

2021-05-18 Thread Sophie Blee-Goldman
Another update:

After going through all the options currently on the table, I've decided to
go back to an earlier version of the
proposal, in which we just fix the TaskMetadata API to return a TaskId
object rather than its String representation,
and just clean up the aspects of the TaskId class which are not suitable
for a public API. The full details are in
the Motivation and Rejected Alternatives sections of the KIP.

KIP-740: Clean up public API in TaskId and fix TaskMetadata#taskId()


Also, in a previous message I asserted that changing from a class to an
abstract class or interface was not a
compatible change. However we've relaxed the compatibility contract in
Streams a bit to guarantee source
compatibility rather than binary compatibility, in which case this kind of
change does in fact satisfy our policy.
However this was ultimately rejected for other reasons, see the KIP.

Thanks all,
Sophie

On Sun, May 16, 2021 at 11:40 PM Sophie Blee-Goldman 
wrote:

> Changing `TaskId` to an abstract class and deprecate all util functions /
>> fields that we do not
>> want to expose any longer
>
>
> I had considered doing exactly this, where we just convert the existing
> public TaskId class to an interface
> or an abstract class rather than introducing a new one, but it turns out
> that modifying a class in this way is
> not backwards compatible (see Evolving API packages
> 
>  and Evolving API Classes
> ).
> So it's off the table.
>
> However, that still leaves the door open for reclaiming the existing
> TaskId class and just deprecating the
> inappropriate public APIs, then extending this with a subclass. But this
> just does not seem like an appropriate
> application of subclassing (to me), as that hierarchy structure is exactly
> what interfaces are for. I tried to touch
> on this in my last reply but admittedly did so only briefly, and rather
> dismissively. I would be happy to consider
> this option as well, though as is probably clear by now, it's not my
> personal preference.
>
> Thoughts?
>
> On Sun, May 16, 2021, 9:17 PM Guozhang Wang  wrote:
>
>> Sophie,
>>
>> Thanks for the nice summary. Originally I'm leaning towards option 2)
>> since
>> changing this name to others could be a pretty large conceptual change to
>> users, and I think the concerns you raised is good as well that we cannot
>> reuse the `taskId()` function names any more..
>>
>> After pondering on it a bit, what about 1) keeping `TaskId` in the current
>> package as that's not our main concern anyways, 2) changing `TaskId` to an
>> abstract class and deprecate all util functions / fields that we do not
>> want to expose any longer, and 3) introducing a new class that extends the
>> o.a.k.streams.processor.TaskId for internal implementations?
>>
>>
>> Guozhang
>>
>>
>> On Sun, May 16, 2021 at 7:19 PM Sophie Blee-Goldman
>>  wrote:
>>
>> > Sorry, one correction regarding the discussion on naming below: the
>> other
>> > con for reusing TaskId as the interface name is that we would not be
>> able
>> > to follow the standard getter naming conventions and introduce new
>> taskId()
>> > APIs to replace the deprecated ones, as the deprecated methods have
>> already
>> > claimed that name. So we would have to come up with a different name for
>> > these APIs, for example taskInfo()  would require the TaskId interface
>> > which is a bit confusing. Or using getTaskId() temporarily until the
>> > deprecated methods can be removed, which goes against the established
>> > convention of no "get" in the getters. Another possibility is something
>> > like "taskIdInfo()" or "taskIdMetadata()" so that it's still mentioning
>> > taskId somewhere in the name.
>> >
>> > Thoughts? Given this awkwardness I'm somewhat leaning towards renaming
>> the
>> > interface after all, as in the example above to "TaskInfo". Or we could
>> > take a similar approach and just build on the original TaskId name,
>> calling
>> > it TaskIdInfo or something like that. Any thoughts? As always the KIP
>> has
>> > been updated with what I am proposing, so you can take a look yourself.
>> >
>> > On Sun, May 16, 2021 at 7:07 PM Sophie Blee-Goldman <
>> sop...@confluent.io>
>> > wrote:
>> >
>> > > I'm moving the discussion on KIP-740 to an actual [DISCUSS] thread
>> since
>> > > it turned out to be more nuanced
>> > > and grew in scope since I initially proposed it. Thanks John for
>> > > the suggestions/questions on the [VOTE] thread,
>> > > I have copied them over and addressed them below.
>> > >
>> > > Before that, I just want to apologize to everyone for bringing this
>> to a
>> > > vote immediately. I truly thought this would
>> > > be a trivial KIP that just fixed what 

Re: [DISCUSS] KIP-740: Replace the public TaskId class with an interface

2021-05-17 Thread Sophie Blee-Goldman
>
> Changing `TaskId` to an abstract class and deprecate all util functions /
> fields that we do not
> want to expose any longer


I had considered doing exactly this, where we just convert the existing
public TaskId class to an interface
or an abstract class rather than introducing a new one, but it turns out
that modifying a class in this way is
not backwards compatible (see Evolving API packages

 and Evolving API Classes
).
So it's off the table.

However, that still leaves the door open for reclaiming the existing TaskId
class and just deprecating the
inappropriate public APIs, then extending this with a subclass. But this
just does not seem like an appropriate
application of subclassing (to me), as that hierarchy structure is exactly
what interfaces are for. I tried to touch
on this in my last reply but admittedly did so only briefly, and rather
dismissively. I would be happy to consider
this option as well, though as is probably clear by now, it's not my
personal preference.

Thoughts?

On Sun, May 16, 2021, 9:17 PM Guozhang Wang  wrote:

> Sophie,
>
> Thanks for the nice summary. Originally I'm leaning towards option 2) since
> changing this name to others could be a pretty large conceptual change to
> users, and I think the concerns you raised is good as well that we cannot
> reuse the `taskId()` function names any more..
>
> After pondering on it a bit, what about 1) keeping `TaskId` in the current
> package as that's not our main concern anyways, 2) changing `TaskId` to an
> abstract class and deprecate all util functions / fields that we do not
> want to expose any longer, and 3) introducing a new class that extends the
> o.a.k.streams.processor.TaskId for internal implementations?
>
>
> Guozhang
>
>
> On Sun, May 16, 2021 at 7:19 PM Sophie Blee-Goldman
>  wrote:
>
> > Sorry, one correction regarding the discussion on naming below: the other
> > con for reusing TaskId as the interface name is that we would not be able
> > to follow the standard getter naming conventions and introduce new
> taskId()
> > APIs to replace the deprecated ones, as the deprecated methods have
> already
> > claimed that name. So we would have to come up with a different name for
> > these APIs, for example taskInfo()  would require the TaskId interface
> > which is a bit confusing. Or using getTaskId() temporarily until the
> > deprecated methods can be removed, which goes against the established
> > convention of no "get" in the getters. Another possibility is something
> > like "taskIdInfo()" or "taskIdMetadata()" so that it's still mentioning
> > taskId somewhere in the name.
> >
> > Thoughts? Given this awkwardness I'm somewhat leaning towards renaming
> the
> > interface after all, as in the example above to "TaskInfo". Or we could
> > take a similar approach and just build on the original TaskId name,
> calling
> > it TaskIdInfo or something like that. Any thoughts? As always the KIP has
> > been updated with what I am proposing, so you can take a look yourself.
> >
> > On Sun, May 16, 2021 at 7:07 PM Sophie Blee-Goldman  >
> > wrote:
> >
> > > I'm moving the discussion on KIP-740 to an actual [DISCUSS] thread
> since
> > > it turned out to be more nuanced
> > > and grew in scope since I initially proposed it. Thanks John for
> > > the suggestions/questions on the [VOTE] thread,
> > > I have copied them over and addressed them below.
> > >
> > > Before that, I just want to apologize to everyone for bringing this to
> a
> > > vote immediately. I truly thought this would
> > > be a trivial KIP that just fixed what seemed like a bug in the API, but
> > it
> > > slowly grew in scope as I wrangled with
> > > the question of whether TaskId was already a public API. To clarify, it
> > is
> > > -- a TaskId is returned by the
> > > Processor/StateStoreContext.taskId() methods, I just hadn't noticed
> those
> > > APIs at the time of writing this KIP.
> > > After finding those APIs and confirming that it was public, I began to
> > > wonder  whether it really should be.
> > >
> > > 1. "Topic Group" is a vestigial term from the early days of
> > >> the codebase. We call a "topic group" a "subtopology" in the
> > >> public interface now (although "topic group" is still used
> > >> internally some places). For user-facing consistency, we
> > >> should also use "subtopologyId" in your proposal.
> > >
> > >
> > > 100% agree, I remember being very confused by the term "topic group"
> > when
> > > I was first getting to know
> > > the Streams code base, and wondering why it wasn't just called
> > > "subtopology". This would be the perfect
> > > opportunity to migrate to the more clear terminology where appropriate.
> > > I've updated the KIP to reflect this.
> > >
> > > 2. I'm wondering if it's really 

Re: [DISCUSS] KIP-740: Replace the public TaskId class with an interface

2021-05-16 Thread Guozhang Wang
Sophie,

Thanks for the nice summary. Originally I'm leaning towards option 2) since
changing this name to others could be a pretty large conceptual change to
users, and I think the concerns you raised is good as well that we cannot
reuse the `taskId()` function names any more..

After pondering on it a bit, what about 1) keeping `TaskId` in the current
package as that's not our main concern anyways, 2) changing `TaskId` to an
abstract class and deprecate all util functions / fields that we do not
want to expose any longer, and 3) introducing a new class that extends the
o.a.k.streams.processor.TaskId for internal implementations?


Guozhang


On Sun, May 16, 2021 at 7:19 PM Sophie Blee-Goldman
 wrote:

> Sorry, one correction regarding the discussion on naming below: the other
> con for reusing TaskId as the interface name is that we would not be able
> to follow the standard getter naming conventions and introduce new taskId()
> APIs to replace the deprecated ones, as the deprecated methods have already
> claimed that name. So we would have to come up with a different name for
> these APIs, for example taskInfo()  would require the TaskId interface
> which is a bit confusing. Or using getTaskId() temporarily until the
> deprecated methods can be removed, which goes against the established
> convention of no "get" in the getters. Another possibility is something
> like "taskIdInfo()" or "taskIdMetadata()" so that it's still mentioning
> taskId somewhere in the name.
>
> Thoughts? Given this awkwardness I'm somewhat leaning towards renaming the
> interface after all, as in the example above to "TaskInfo". Or we could
> take a similar approach and just build on the original TaskId name, calling
> it TaskIdInfo or something like that. Any thoughts? As always the KIP has
> been updated with what I am proposing, so you can take a look yourself.
>
> On Sun, May 16, 2021 at 7:07 PM Sophie Blee-Goldman 
> wrote:
>
> > I'm moving the discussion on KIP-740 to an actual [DISCUSS] thread since
> > it turned out to be more nuanced
> > and grew in scope since I initially proposed it. Thanks John for
> > the suggestions/questions on the [VOTE] thread,
> > I have copied them over and addressed them below.
> >
> > Before that, I just want to apologize to everyone for bringing this to a
> > vote immediately. I truly thought this would
> > be a trivial KIP that just fixed what seemed like a bug in the API, but
> it
> > slowly grew in scope as I wrangled with
> > the question of whether TaskId was already a public API. To clarify, it
> is
> > -- a TaskId is returned by the
> > Processor/StateStoreContext.taskId() methods, I just hadn't noticed those
> > APIs at the time of writing this KIP.
> > After finding those APIs and confirming that it was public, I began to
> > wonder  whether it really should be.
> >
> > 1. "Topic Group" is a vestigial term from the early days of
> >> the codebase. We call a "topic group" a "subtopology" in the
> >> public interface now (although "topic group" is still used
> >> internally some places). For user-facing consistency, we
> >> should also use "subtopologyId" in your proposal.
> >
> >
> > 100% agree, I remember being very confused by the term "topic group"
> when
> > I was first getting to know
> > the Streams code base, and wondering why it wasn't just called
> > "subtopology". This would be the perfect
> > opportunity to migrate to the more clear terminology where appropriate.
> > I've updated the KIP to reflect this.
> >
> > 2. I'm wondering if it's really necessary to introduce this
> >> interface at all. I think your motivation is to be able to
> >> get the subtopologyId and partition via TaskMetadata, right?
> >> Why not just add those methods to TaskMetadata? Stepping
> >> back, the concept of metadata about an identifier is a bit
> >> elaborate.
> >
> >
> > I think if the only issue we wanted to address was the TaskMetadata
> > returning a string representation of a
> > TaskId rather than the real thing, as was originally the case, then I
> > would agree with just  unfolding the API
> > to expose the TaskId fields directly. However while digging into the
> > question of whether TaskId was public, it
> > became clear that the class had some awkward features for a public API:
> > public fields with no getters, many
> > utility methods for internal functionality that had to be made public and
> > cluttered up the API, etc. So I believe
> > this KIP can and should take the opportunity to clean up the TaskId API
> > altogether, not just for one getter in
> > TaskMetadata. And we can't deprecate an API without adding a replacement
> > for users to migrate onto, since
> > the concept of a TaskId itself is still very much relevant. That's the
> > primary motivation behind introducing this
> > new interface. Of course technically this could be done by retaining the
> > existing TaskId class and just moving
> > the things that should not be public to the new internal class, but
> that's
> 

Re: [DISCUSS] KIP-740: Replace the public TaskId class with an interface

2021-05-16 Thread Sophie Blee-Goldman
Sorry, one correction regarding the discussion on naming below: the other
con for reusing TaskId as the interface name is that we would not be able
to follow the standard getter naming conventions and introduce new taskId()
APIs to replace the deprecated ones, as the deprecated methods have already
claimed that name. So we would have to come up with a different name for
these APIs, for example taskInfo()  would require the TaskId interface
which is a bit confusing. Or using getTaskId() temporarily until the
deprecated methods can be removed, which goes against the established
convention of no "get" in the getters. Another possibility is something
like "taskIdInfo()" or "taskIdMetadata()" so that it's still mentioning
taskId somewhere in the name.

Thoughts? Given this awkwardness I'm somewhat leaning towards renaming the
interface after all, as in the example above to "TaskInfo". Or we could
take a similar approach and just build on the original TaskId name, calling
it TaskIdInfo or something like that. Any thoughts? As always the KIP has
been updated with what I am proposing, so you can take a look yourself.

On Sun, May 16, 2021 at 7:07 PM Sophie Blee-Goldman 
wrote:

> I'm moving the discussion on KIP-740 to an actual [DISCUSS] thread since
> it turned out to be more nuanced
> and grew in scope since I initially proposed it. Thanks John for
> the suggestions/questions on the [VOTE] thread,
> I have copied them over and addressed them below.
>
> Before that, I just want to apologize to everyone for bringing this to a
> vote immediately. I truly thought this would
> be a trivial KIP that just fixed what seemed like a bug in the API, but it
> slowly grew in scope as I wrangled with
> the question of whether TaskId was already a public API. To clarify, it is
> -- a TaskId is returned by the
> Processor/StateStoreContext.taskId() methods, I just hadn't noticed those
> APIs at the time of writing this KIP.
> After finding those APIs and confirming that it was public, I began to
> wonder  whether it really should be.
>
> 1. "Topic Group" is a vestigial term from the early days of
>> the codebase. We call a "topic group" a "subtopology" in the
>> public interface now (although "topic group" is still used
>> internally some places). For user-facing consistency, we
>> should also use "subtopologyId" in your proposal.
>
>
> 100% agree, I remember being very confused by the term "topic group"  when
> I was first getting to know
> the Streams code base, and wondering why it wasn't just called
> "subtopology". This would be the perfect
> opportunity to migrate to the more clear terminology where appropriate.
> I've updated the KIP to reflect this.
>
> 2. I'm wondering if it's really necessary to introduce this
>> interface at all. I think your motivation is to be able to
>> get the subtopologyId and partition via TaskMetadata, right?
>> Why not just add those methods to TaskMetadata? Stepping
>> back, the concept of metadata about an identifier is a bit
>> elaborate.
>
>
> I think if the only issue we wanted to address was the TaskMetadata
> returning a string representation of a
> TaskId rather than the real thing, as was originally the case, then I
> would agree with just  unfolding the API
> to expose the TaskId fields directly. However while digging into the
> question of whether TaskId was public, it
> became clear that the class had some awkward features for a public API:
> public fields with no getters, many
> utility methods for internal functionality that had to be made public and
> cluttered up the API, etc. So I believe
> this KIP can and should take the opportunity to clean up the TaskId API
> altogether, not just for one getter in
> TaskMetadata. And we can't deprecate an API without adding a replacement
> for users to migrate onto, since
> the concept of a TaskId itself is still very much relevant. That's the
> primary motivation behind introducing this
> new interface. Of course technically this could be done by retaining the
> existing TaskId class and just moving
> the things that should not be public to the new internal class, but that's
> just a worse version of the same idea,
> and would require a somewhat awkward deprecation cycle. Plus, given that
> TaskId is intended to just be a
> wrapper for some misc information about a Task, a public interface feels
> more natural than a class.
>
> It also gives us more flexibility in the future, as it's much easier to
> just add a field to the TaskId (with a new
> getter on the interface) then to track  down all places where we return
> the TaskId fields, ie every topicGroupId()
> and partition() getters on an API, and add a new getter API for this field.
>
> Please refer to the KIP for the current proposal, including all deprecated
> and new APIs. I hope the latest
> iteration will make everyone happy. The only potentially controversial
> question remaining is how to name
> the new interface. Here are all the options that seem reasonable to me,
> though 

[DISCUSS] KIP-740: Replace the public TaskId class with an interface

2021-05-16 Thread Sophie Blee-Goldman
I'm moving the discussion on KIP-740 to an actual [DISCUSS] thread since it
turned out to be more nuanced
and grew in scope since I initially proposed it. Thanks John for
the suggestions/questions on the [VOTE] thread,
I have copied them over and addressed them below.

Before that, I just want to apologize to everyone for bringing this to a
vote immediately. I truly thought this would
be a trivial KIP that just fixed what seemed like a bug in the API, but it
slowly grew in scope as I wrangled with
the question of whether TaskId was already a public API. To clarify, it is
-- a TaskId is returned by the
Processor/StateStoreContext.taskId() methods, I just hadn't noticed those
APIs at the time of writing this KIP.
After finding those APIs and confirming that it was public, I began to
wonder  whether it really should be.

1. "Topic Group" is a vestigial term from the early days of
> the codebase. We call a "topic group" a "subtopology" in the
> public interface now (although "topic group" is still used
> internally some places). For user-facing consistency, we
> should also use "subtopologyId" in your proposal.


100% agree, I remember being very confused by the term "topic group"  when
I was first getting to know
the Streams code base, and wondering why it wasn't just called
"subtopology". This would be the perfect
opportunity to migrate to the more clear terminology where appropriate.
I've updated the KIP to reflect this.

2. I'm wondering if it's really necessary to introduce this
> interface at all. I think your motivation is to be able to
> get the subtopologyId and partition via TaskMetadata, right?
> Why not just add those methods to TaskMetadata? Stepping
> back, the concept of metadata about an identifier is a bit
> elaborate.


I think if the only issue we wanted to address was the TaskMetadata
returning a string representation of a
TaskId rather than the real thing, as was originally the case, then I would
agree with just  unfolding the API
to expose the TaskId fields directly. However while digging into the
question of whether TaskId was public, it
became clear that the class had some awkward features for a public API:
public fields with no getters, many
utility methods for internal functionality that had to be made public and
cluttered up the API, etc. So I believe
this KIP can and should take the opportunity to clean up the TaskId API
altogether, not just for one getter in
TaskMetadata. And we can't deprecate an API without adding a replacement
for users to migrate onto, since
the concept of a TaskId itself is still very much relevant. That's the
primary motivation behind introducing this
new interface. Of course technically this could be done by retaining the
existing TaskId class and just moving
the things that should not be public to the new internal class, but that's
just a worse version of the same idea,
and would require a somewhat awkward deprecation cycle. Plus, given that
TaskId is intended to just be a
wrapper for some misc information about a Task, a public interface feels
more natural than a class.

It also gives us more flexibility in the future, as it's much easier to
just add a field to the TaskId (with a new
getter on the interface) then to track  down all places where we return the
TaskId fields, ie every topicGroupId()
and partition() getters on an API, and add a new getter API for this field.

Please refer to the KIP for the current proposal, including all deprecated
and new APIs. I hope the latest
iteration will make everyone happy. The only potentially controversial
question remaining is how to name
the new interface. Here are all the options that seem reasonable to me,
though chose to go with the 2nd one:

1) Keep the name TaskId for the interface, put the interface in the
o.a.k.streams package, and call the internal class
TaskIdImpl. This will require a fully qualified class in any file that
references both the new interface and the
deprecated public TaskId class.
2) Same as the above, but allow the internal class to also retain the name
TaskId. This will also require a fully qualified
class name whenever any of the three are referenced in the same file,
though I don't think there is likely to be a single
file where the new internal TaskId class appears along with the old
public TaskId class but not the new TaskId interface,
or vice versa, so the impact is probably exactly the same as the above
option.
3) Name the interface something else entirely (eg TaskInfo) and keep the
name TaskId for the new internal class. The
new internal TaskId and the deprecated public TaskId class most likely
appear together. This option is probably the
least annoying with respect to requiring fully qualified class names,
but I expect it's still on par with how often this
would be required in option 1 or 2.

Given that both the new public interface and the deprecated public TaskId
class are almost always going to appear together,
we can't avoid fully qualified