I think `RecordProcessor` is a good name.

-Matthias

On 6/21/19 5:09 PM, John Roesler wrote:
> After kicking the naming around a bit more, it seems like any package
> name change is a bit "weird" because it fragments the package and
> directory structure. If we can come up with a reasonable name for the
> interface after all, it seems like the better choice.
> 
> The real challenge is that the existing name "Processor" seems just
> about perfect. In picking a new name, we need to consider the ultimate
> state, after the deprecation period, when we entirely remove
> Processor. In this context, TypedProcessor seems a little odd to me,
> because it seems to imply that there should also be an "untyped
> processor".
> 
> After kicking around a few other ideas, what does everyone think about
> "RecordProcessor"? I _think_ maybe it stands on its own just fine,
> because it's a thing that processes... records?
> 
> If others agree with this, I can change the proposal to RecordProcessor.
> 
> Thanks,
> -John
> 
> On Fri, Jun 21, 2019 at 6:42 PM John Roesler <j...@confluent.io> wrote:
>>
>> Hi all,
>>
>> I've updated the KIP with the feedback so far.
>>
>> The naming question is still the biggest (only?) outstanding issue. It
>> would be good to hear some more thoughts on it.
>>
>> As we stand now, there's one vote for changing the package name to
>> something like 'typedprocessor', one for changing the interface to
>> TypedProcessor (as in the PoC), and one for just changing the
>> Processor interface in-place, breaking source compatibility.
>>
>> How can we resolve this decision?
>>
>> Thanks,
>> -John
>>
>> On Thu, Jun 20, 2019 at 5:44 PM John Roesler <j...@confluent.io> wrote:
>>>
>>> Thanks for the feedback, Guozhang and Matthias,
>>>
>>> Regarding motivation: I'll update the wiki. Briefly:
>>> * Any processor can benefit. Imagine a pure user of the ProcessorAPI
>>> who has very complex processing logic. I have seen several processor
>>> implementation that are hundreds of lines long and call
>>> `context.forward` in many different locations and branches. In such an
>>> implementation, it would be very easy to have a bug in a rarely used
>>> branch that forwards the wrong kind of value. This would structurally
>>> prevent that from happening.
>>> * Also, anyone who heavily uses the ProcessorAPI would likely have
>>> developed helper methods to wire together processors, just as we have
>>> in the DSL implementation. This change would enable them to ensure at
>>> compile time that they are actually wiring together compatible types.
>>> This was actually _my_ original motivation, since I found it very
>>> difficult and time consuming to follow the Streams DSL internal
>>> builders.
>>>
>>> Regarding breaking the source compatibility of Processor: I would
>>> _love_ to side-step the naming problem, but I really don't know if
>>> it's excusable to break compatibility. I suspect that our oldest and
>>> dearest friends are using the ProcessorAPI in some form or another,
>>> and all their source code would break. It sucks to have to create a
>>> whole new interface to get around this, but it feels like the right
>>> thing to do. Would be nice to get even more feedback on this point,
>>> though.
>>>
>>> Regarding the types of stores, as I said in my response to Sophie,
>>> it's not an issue.
>>>
>>> Regarding the change to StreamsBuilder, it doesn't pin the types in
>>> any way, since all the types are bounded by Object only, and there are
>>> no extra constraints between arguments (each type is used only once in
>>> one argument). But maybe I missed the point you were asking about.
>>> Since the type takes generic paramters, we should allow users to pass
>>> in parameterized arguments. Otherwise, they would _have to_ give us a
>>> raw type, and they would be forced to get a "rawtyes" warning from the
>>> compiler. So, it's our obligation in any API that accepts a
>>> parameterized-type parameter to allow people to actually pass a
>>> parameterized type, even if we don't actually use the parameters.
>>>
>>> The naming question is a complex one, as I took pains to detail
>>> previously. Please don't just pick out one minor point, call it weak,
>>> and then claim that it invalidates the whole decision. I don't think
>>> there's a clear best choice, so I'm more than happy for someone to
>>> advocate for renaming the class instead of the package. Can you
>>> provide some reasons why you think that would be better?
>>>
>>> Regarding the deprecated methods, you're absolutely right. I'll update the 
>>> KIP.
>>>
>>> Thanks again for all the feedback!
>>> -John
>>>
>>> On Thu, Jun 20, 2019 at 4:34 PM Matthias J. Sax <matth...@confluent.io> 
>>> wrote:
>>>>
>>>> Just want to second what Sophie said about the stores. The type of a
>>>> used stores is completely independent of input/output types.
>>>>
>>>> This related to change `addGlobalStore()` method. Why do you want to pin
>>>> the types? In fact, people request the ability to filter() and maybe
>>>> even map() the data before they are put into the global store. Limiting
>>>> the types seems to be a step backward here?
>>>>
>>>>
>>>>
>>>> Also, the pack name is questionable.
>>>>
>>>>> This wouldn't be the first project to do something like this...
>>>>
>>>> Not a strong argument. I would actually propose to not a a new package,
>>>> but just a new class `TypedProcessor`.
>>>>
>>>>
>>>> For `ProcessorContext#forward` methods -- some of those methods are
>>>> already deprecated. While the will still be affected, it would be worth
>>>> to mark them as deprecated in the wiki page, too.
>>>>
>>>>
>>>> @Guozhang: I dont' think we should break source compatibility in a minor
>>>> release.
>>>>
>>>>
>>>>
>>>> -Matthias
>>>>
>>>>
>>>>
>>>> On 6/20/19 1:43 PM, Guozhang Wang wrote:
>>>>> Hi John,
>>>>>
>>>>> Thanks for KIP! I've a few comments below:
>>>>>
>>>>> 1. So far the "Motivation" section is very general, and the only concrete
>>>>> example that I have in mind is `TransformValues#punctuate`. Do we have any
>>>>> other concrete issues that drive this KIP? If not then I feel better to
>>>>> narrow the scope of this KIP to:
>>>>>
>>>>> 1.a) modifying ProcessorContext only with the output types on forward.
>>>>> 1.b) modifying Transformer signature to have generics of ProcessorContext,
>>>>> and then lift the restricting of not using punctuate: if user did not
>>>>> follow the enforced typing and just code without generics, they will get
>>>>> warning at compile time and get run-time error if they forward wrong-typed
>>>>> records, which I think would be acceptable.
>>>>>
>>>>> I feel this would be a good solution for this specific issue; again, feel
>>>>> free to update the wiki page with other known issues that cannot be
>>>>> resolved.
>>>>>
>>>>> 2. If, we want to go with the current scope then my next question would 
>>>>> be,
>>>>> how much breakage we would introducing if we just modify the Processor
>>>>> signature directly? My feeling is that DSL users would be most likely not
>>>>> affected and PAPI users only need to modify a few lines on class
>>>>> declaration. I feel it worth doing some research on this part and then
>>>>> decide if we really want to bite the bullet of duplicated Processor /
>>>>> ProcessorSupplier classes for maintaining compatibility.
>>>>>
>>>>>
>>>>> Guozhang
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jun 19, 2019 at 12:21 PM John Roesler <j...@confluent.io> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> In response to the feedback so far, I changed the package name from
>>>>>> `processor2` to `processor.generic`.
>>>>>>
>>>>>> Thanks,
>>>>>> -John
>>>>>>
>>>>>> On Mon, Jun 17, 2019 at 4:49 PM John Roesler <j...@confluent.io> wrote:
>>>>>>>
>>>>>>> Thanks for the feedback, Sophie!
>>>>>>>
>>>>>>> I actually felt a little uneasy when I wrote that remark, because it's
>>>>>>> not restricted at all in the API, it's just available to you if you
>>>>>>> choose to give your stores and context the same parameters. So, I
>>>>>>> think your use case is valid, and also perfectly permissable under the
>>>>>>> current KIP. Sorry for sowing confusion on my own discussion thread!
>>>>>>>
>>>>>>> I'm not crazy about the package name, either. I went with it only
>>>>>>> because there's seemingly nothing special about the new package except
>>>>>>> that it can't have the same name as the old one. Otherwise, the
>>>>>>> existing "processor" and "Processor" names for the package and class
>>>>>>> are perfectly satisfying. Rather than pile on additional semantics, it
>>>>>>> seemed cleaner to just add a number to the package name.
>>>>>>>
>>>>>>> This wouldn't be the first project to do something like this... Apache
>>>>>>> Commons, for example, has added a "2" to the end of some of their
>>>>>>> packages for exactly the same reason.
>>>>>>>
>>>>>>> I'm open to any suggestions. For example, we could do something like
>>>>>>> org.apache.kafka.streams.typedprocessor.Processor or
>>>>>>> org.apache.kafka.streams.processor.typed.Processor , which would have
>>>>>>> just about the same effect. One microscopic thought is that, if
>>>>>>> there's another interface in the "processor" package that we wish to
>>>>>>> do the same thing to, would _could_ pile it in to "processor2", but we
>>>>>>> couldn't do the same if we use a package that has "typed" in the name,
>>>>>>> unless that change is _also_ related to types in some way. But this
>>>>>>> seems like a very minor concern.
>>>>>>>
>>>>>>> What's your preference?
>>>>>>> -John
>>>>>>>
>>>>>>> On Mon, Jun 17, 2019 at 3:56 PM Sophie Blee-Goldman 
>>>>>>> <sop...@confluent.io>
>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hey John, thanks for writing this up! I like the proposal but there's
>>>>>> one
>>>>>>>> point that I think may be too restrictive:
>>>>>>>>
>>>>>>>> "A processor that happens to use a typed store is actually emitting the
>>>>>>>> same types that it is storing."
>>>>>>>>
>>>>>>>> I can imagine someone could want to leverage this new type safety
>>>>>> without
>>>>>>>> also limiting how they can interact with/use their store. As an
>>>>>> (admittedly
>>>>>>>> contrived) example, say you have an input stream of purchases of a
>>>>>> certain
>>>>>>>> type (entertainment, food, etc), and on seeing a new record you want to
>>>>>>>> output how many types of purchase a shopper has made more than 5
>>>>>> purchases
>>>>>>>> of in the last month. Your state store will probably be holding some
>>>>>> more
>>>>>>>> complicated PurchaseHistory object (keyed by user), but your output is
>>>>>> just
>>>>>>>> a <User, Long>
>>>>>>>>
>>>>>>>> I'm also not crazy about "processor2" as the package name ... not sure
>>>>>> what
>>>>>>>> a better one would be though (something with "typed"?)
>>>>>>>>
>>>>>>>> On Mon, Jun 17, 2019 at 12:47 PM John Roesler <j...@confluent.io>
>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I'd like to propose KIP-478 (
>>>>>> https://cwiki.apache.org/confluence/x/2SkLBw
>>>>>>>>> ).
>>>>>>>>>
>>>>>>>>> This proposal would add output type bounds to the Processor interface
>>>>>>>>> in Kafka Streams, which enables static checking of a number of useful
>>>>>>>>> properties:
>>>>>>>>> * A processor B that consumes the output of processor A is actually
>>>>>>>>> expecting the same types that processor A produces.
>>>>>>>>> * A processor that happens to use a typed store is actually emitting
>>>>>>>>> the same types that it is storing.
>>>>>>>>> * A processor is simply forwarding the expected types in all code
>>>>>> paths.
>>>>>>>>> * Processors added via the Streams DSL, which are not permitted to
>>>>>>>>> forward results at all are statically prevented from doing so by the
>>>>>>>>> compiler
>>>>>>>>>
>>>>>>>>> Internally, we can use the above properties to achieve a much higher
>>>>>>>>> level of confidence in the Streams DSL implementation's correctness.
>>>>>>>>> Actually, while doing the POC, I found a few bugs and mistakes, which
>>>>>>>>> become structurally impossible with KIP-478.
>>>>>>>>>
>>>>>>>>> Additionally, the stronger types dramatically improve the
>>>>>>>>> self-documentation of our Streams internal implementations, which
>>>>>>>>> makes it much easier for new contributors to ramp up with confidence.
>>>>>>>>>
>>>>>>>>> Thanks so much for your consideration!
>>>>>>>>> -John
>>>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to