Re: [DISCUSS]: KIP-149: Enabling key access in ValueTransformer, ValueMapper, and ValueJoiner

Matthias J. Sax Thu, 18 May 2017 16:00:34 -0700

Thanks for the update and explanations. The KIP is quite improved now --
great job!


One more question: Why are RichValueMapper etc abstract classes and not
interfaces?


Overall, I like the KIP a lot!


-Matthias


On 5/16/17 7:03 AM, Jeyhun Karimov wrote:
> Hi,
> 
> Thanks for your comments.
> 
> I think supporting Lambdas for `withKey` and `AbstractRichFunction`
>> don't go together, as Lambdas are only supported for interfaces AFAIK.
> 
> 
> Maybe I misunderstood your comment.
> *withKey* and and *withOnlyValue* are interfaces. So they don't have direct
> relation with *AbstractRichFunction*.
> *withKey* and and *withOnlyValue* interfaces have only one  method , so we
> can use lambdas.
> Where does the *AbstractRichFunction* comes to play? Inside Rich functions.
> And we use Rich functions in 2 places:
> 
> 1. User doesn't use rich functions. Just regular *withKey* and and
> *withOnlyValue* interfaces(both support lambdas) . We get those interfaces
> and wrap into Rich function while building the topology, and send to
> Processor.
> 2. User does use rich functions (Rich functions implement *withKey*
> interface). As a result no lamdas here by definition. In this case, while
> building the topology we do a type check if the object type is
> *withKey* or *RichFunction*.
> 
> So *AbstractRichFunction* is just syntactic sugar for Rich functions and
> does not affect using lambdas.
> 
> Thus, if we want to support Lambdas for `withKey`, we need to have a
>> interface approach like this
>>   - RichFunction -> only adding init() and close()
>>   - ValueMapper
>>   - ValueMapperWithKey
>>   - RichValueMapper extends ValueMapperWithKey, RichFunction
> 
> 
> As I said above, currently we support lambdas for *withKey* interfaces as
> well.  However, I agree with your idea and I will remove the
> AbstractRichFunction from the design.
> 
> As an alternative, we could argue, that it is sufficient to support
>> Lambdas for the "plain" API only, but not for any "extended API". For
>> this, RichFunction could add key+init+close and AbstractRichFunction
>> would allow to only care about getting the key.
>> Not sure, which one is better. I don't like the idea of more overloaded
>> methods to get Lambdas for `withKey` interfaces too much because we have
>> already so many overlaods. On the other hand, I do see value in
>> supporting Lambdas for `withKey`.
> 
> 
> Just to clarify, with current design we have only one extra overloaded
> method per *withOnlyValue* interface:  which is *withKey* version of
> particular interface.
> We don't need extra overload for Rich function as Rich function implements
> *withKey* interface as a result they have same type. We differentiate them
> while building the topology.
> We supported lambdas for *withKey* APIs because of the comment:
> 
> @Jeyhun: I did not put any thought into this, but can we have a design
>> that allows for both? Also, with regard to lambdas, it might make sense
>> to allow for both `V -> newV` and `(K, V) -> newV` ?
> 
> 
> However, I don't think that this complicates the overall design
> significantly.
> 
> 
> Depending on what we want to support, it might make sense to
>> include/exclude RichFunctions from this KIP -- and thus, this also
>> determines if we should have a "ProcessorContext KIP" before driving
>> this KIP further.
> 
> 
> Based on our discussion I think we should keep Rich functions as I don't
> think that they bring extra layer of overhead to library.
> 
> Any comments are appreciated.
> 
> Cheers,
> Jeyhun
> 
> 
> On Tue, May 16, 2017 at 12:10 AM Matthias J. Sax <[email protected]>
> wrote:
> 
>> Jeyhun,
>>
>> thanks for the update.
>>
>> I think supporting Lambdas for `withKey` and `AbstractRichFunction`
>> don't go together, as Lambdas are only supported for interfaces AFAIK.
>>
>> Thus, if we want to support Lambdas for `withKey`, we need to have a
>> interface approach like this
>>
>>   - RichFunction -> only adding init() and close()
>>
>>   - ValueMapper
>>   - ValueMapperWithKey
>>
>>   - RichValueMapper extends ValueMapperWithKey, RichFunction
>>
>> For this approach, AbstractRichFunction does not make sense anymore, as
>> the only purpose of `RichFunction` is to allow the implementation of
>> init() and close() -- if you don't want those, you would implement a
>> different interface (ie, ValueMapperWithKey)
>>
>> As an alternative, we could argue, that it is sufficient to support
>> Lambdas for the "plain" API only, but not for any "extended API". For
>> this, RichFunction could add key+init+close and AbstractRichFunction
>> would allow to only care about getting the key.
>>
>> Not sure, which one is better. I don't like the idea of more overloaded
>> methods to get Lambdas for `withKey` interfaces too much because we have
>> already so many overlaods. On the other hand, I do see value in
>> supporting Lambdas for `withKey`.
>>
>> Depending on what we want to support, it might make sense to
>> include/exclude RichFunctions from this KIP -- and thus, this also
>> determines if we should have a "ProcessorContext KIP" before driving
>> this KIP further.
>>
>> Thoughts?
>>
>>
>>
>>
>> -Matthias
>>
>>
>> On 5/15/17 11:01 AM, Jeyhun Karimov wrote:
>>> Hi,
>>>
>>> Sorry for super late response. Thanks for your comments.
>>>
>>> I am not an expert on Lambdas. Can you elaborate a little bit? I cannot
>>>> follow the explanation in the KIP to see what the problem is.
>>>
>>>
>>> - From [1] says "A functional interface is an interface that has just one
>>> abstract method, and thus represents a single function contract".
>>> So basically once we extend some interface from another (in our case,
>>> ValueMapperWithKey from ValueMapper) we cannot use lambdas in the
>> extended
>>> interface.
>>>
>>>
>>> Further comments:
>>>>  - The KIP get a little hard to read -- can you maybe reformat the wiki
>>>> page a little bit? I think using `CodeBlock` would help.
>>>
>>>
>>> - I will work on the KIP.
>>>
>>>  - What about KStream-KTable joins? You don't have overlaods added for
>>>> them. Why? (Even if I still hope that we don't need to add any new
>>>> overloads)
>>>
>>>
>>> - Actually there are more than one Processor and public APIs to be
>>> changed (KStream-KTable
>>> joins is one case). However all of them has similar structure: we
>> overload
>>> the *method* with  *methodWithKey*,
>>> wrap it into the Rich function, send to processor and inside the
>> processor
>>> call *init* and *close* methods of the Rich function.
>>> As I wrote in KIP, I wanted to demonstrate the overall idea with only
>>> *ValueMapper* as the same can be applied to all changes.
>>> Anyway I will update the KIP.
>>>
>>>  - Why do we need `AbstractRichFunction`?
>>>
>>>
>>> Instead of overriding the *init(ProcessorContext p)* and* close()*
>> methods
>>> in every Rich function with empty body like:
>>>
>>> @Override
>>> void init(ProcessorContext context) {}
>>>
>>> @Override
>>> void close () {}
>>>
>>> I thought that we can override them once in *AbstractRichFunction* and
>>> extent new Rich functions from *AbstractRichFunction*.
>>> Basically this can eliminate code copy-paste and ease the maintenance.
>>>
>>>  - What about interfaces Initializer, ForeachAction, Merger, Predicate,
>>>> Reducer? I don't want to say we should/need to add to all, but we should
>>>> discuss all of them and add where it does make sense (e.g.,
>>>> RichForachAction does make sense IMHO)
>>>
>>>
>>> Definitely agree. As I said, the same technique applies to all this
>>> interfaces and I didn't want to explode the KIP, just wanted to give the
>>> overall intuition.
>>> However, I will update the KIP as I said.
>>>
>>>
>>> Btw: I like the hierarchy `ValueXX` -- `ValueXXWithKey` -- `RichValueXX`
>>>> in general -- but why can't we do all this with interfaces only?
>>>
>>>
>>> Sure we can. However the main intuition is we should not force users to
>>> implement *init(ProcessorContext)* and *close()* functions every time
>> they
>>> use Rich functions.
>>> If one needs, she can override the respective methods. However, I am open
>>> for discussion.
>>>
>>>
>>> I'd rather not see the use of  `ProcessorContext` spread any further than
>>>> it currently is. So maybe we need another KIP that is done before this?
>>>> Otherwise i think the scope of this KIP is becoming too large.
>>>
>>>
>>> That is good point. I wanted to make *init(ProcessorContext)* method
>>> persistent among the library (which use ProcessorContext as an input),
>>> therefore I put *ProcessorContext* as an input.
>>> So the important question is that (as @dguy and @mjsax mentioned) whether
>>> continue this KIP without providing users an access to *ProcessorContext*
>>> (change *init (ProcessorContext)* to * init()* ) or
>>> initiate another KIP before this.
>>>
>>> [1]
>>> http://cr.openjdk.java.net/~mr/se/8/java-se-8-pfd-spec/java-
>> se-8-jls-pfd-diffs.pdf
>>>
>>>
>>> Cheers,
>>> Jeyhun
>>>
>>> On Mon, May 15, 2017 at 7:15 PM, Damian Guy <[email protected]>
>> wrote:
>>>
>>>> I'd rather not see the use of  `ProcessorContext` spread any further
>> than
>>>> it currently is. So maybe we need another KIP that is done before this?
>>>> Otherwise i think the scope of this KIP is becoming too large.
>>>>
>>>>
>>>> On Mon, 15 May 2017 at 18:06 Matthias J. Sax <[email protected]>
>>>> wrote:
>>>>
>>>>> I agree that that `ProcessorContext` interface is too broad in general
>>>>> -- this is even true for transform/process, and it's also reflected in
>>>>> the API improvement list we want to do.
>>>>>
>>>>> https://cwiki.apache.org/confluence/display/KAFKA/
>>>> Kafka+Streams+Discussions
>>>>>
>>>>> So I am wondering, if you question the `RichFunction` approach in
>>>>> general? Or if you suggest to either extend the scope of this KIP to
>>>>> include this---or maybe better, do another KIP for it and delay this
>> KIP
>>>>> until the other one is done?
>>>>>
>>>>>
>>>>> -Matthias
>>>>>
>>>>> On 5/15/17 2:35 AM, Damian Guy wrote:
>>>>>> Thanks for the KIP.
>>>>>>
>>>>>> I'm not convinced on the `RichFunction` approach. Do we really want to
>>>>> give
>>>>>> every DSL method access to the `ProcessorContext` ? It has a bunch of
>>>>>> methods on it that seem in-appropriate for some of the DSL methods,
>>>> i.e,
>>>>>> `register`, `getStateStore`, `forward`, `schedule` etc. It is far too
>>>>>> broad. I think it would be better to have a narrower interface like
>> the
>>>>>> `RecordContext`  - remembering it is easier to add methods/interfaces
>>>>> later
>>>>>> than to remove them
>>>>>>
>>>>>> On Sat, 13 May 2017 at 22:26 Matthias J. Sax <[email protected]>
>>>>> wrote:
>>>>>>
>>>>>>> Jeyhun,
>>>>>>>
>>>>>>> I am not an expert on Lambdas. Can you elaborate a little bit? I
>>>> cannot
>>>>>>> follow the explanation in the KIP to see what the problem is.
>>>>>>>
>>>>>>> For updating the KIP title I don't know -- guess it's up to you.
>>>> Maybe a
>>>>>>> committer can comment on this?
>>>>>>>
>>>>>>>
>>>>>>> Further comments:
>>>>>>>
>>>>>>>  - The KIP get a little hard to read -- can you maybe reformat the
>>>> wiki
>>>>>>> page a little bit? I think using `CodeBlock` would help.
>>>>>>>
>>>>>>>  - What about KStream-KTable joins? You don't have overlaods added
>> for
>>>>>>> them. Why? (Even if I still hope that we don't need to add any new
>>>>>>> overloads)
>>>>>>>
>>>>>>>  - Why do we need `AbstractRichFunction`?
>>>>>>>
>>>>>>>  - What about interfaces Initializer, ForeachAction, Merger,
>>>> Predicate,
>>>>>>> Reducer? I don't want to say we should/need to add to all, but we
>>>> should
>>>>>>> discuss all of them and add where it does make sense (e.g.,
>>>>>>> RichForachAction does make sense IMHO)
>>>>>>>
>>>>>>>
>>>>>>> Btw: I like the hierarchy `ValueXX` -- `ValueXXWithKey` --
>>>> `RichValueXX`
>>>>>>> in general -- but why can't we do all this with interfaces only?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 5/11/17 7:00 AM, Jeyhun Karimov wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Thanks for your comments. I think we cannot extend the two
>> interfaces
>>>>> if
>>>>>>> we
>>>>>>>> want to keep lambdas. I updated the KIP [1]. Maybe I should change
>>>> the
>>>>>>>> title, because now we are not limiting the KIP to only ValueMapper,
>>>>>>>> ValueTransformer and ValueJoiner.
>>>>>>>> Please feel free to comment.
>>>>>>>>
>>>>>>>> [1]
>>>>>>>>
>>>>>>>
>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>> 149%3A+Enabling+key+access+in+ValueTransformer%2C+
>>>> ValueMapper%2C+and+ValueJoiner
>>>>>>>>
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Jeyhun
>>>>>>>>
>>>>>>>> On Tue, May 9, 2017 at 7:36 PM Matthias J. Sax <
>>>> [email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> If `ValueMapperWithKey` extends `ValueMapper` we don't need the new
>>>>>>>>> overlaod.
>>>>>>>>>
>>>>>>>>> And yes, we need to do one check -- but this happens when building
>>>> the
>>>>>>>>> topology. At runtime (I mean after KafkaStream#start() we don't
>> need
>>>>> any
>>>>>>>>> check as we will always use `ValueMapperWithKey`)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Matthias
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 5/9/17 2:55 AM, Jeyhun Karimov wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Thanks for feedback.
>>>>>>>>>> Then we need to overload method
>>>>>>>>>>   <VR> KStream<K, VR> mapValues(ValueMapper<? super V, ? extends
>>>> VR>
>>>>>>>>>> mapper);
>>>>>>>>>> with
>>>>>>>>>>   <VR> KStream<K, VR> mapValues(ValueMapperWithKey<? super V, ?
>>>>> extends
>>>>>>>>> VR>
>>>>>>>>>> mapper);
>>>>>>>>>>
>>>>>>>>>> and in runtime (inside processor) we still have to check it is
>>>>>>>>> ValueMapper
>>>>>>>>>> or ValueMapperWithKey before wrapping it into the rich function.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Please correct me if I am wrong.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Jeyhun
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, May 9, 2017 at 10:56 AM Michal Borowiecki <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1 :)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 08/05/17 23:52, Matthias J. Sax wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I was reading the updated KIP and I am wondering, if we should
>> do
>>>>> the
>>>>>>>>>>>> design a little different.
>>>>>>>>>>>>
>>>>>>>>>>>> Instead of distinguishing between a RichFunction and
>>>>> non-RichFunction
>>>>>>>>> at
>>>>>>>>>>>> runtime level, we would use RichFunctions all the time. Thus, on
>>>>> the
>>>>>>>>> DSL
>>>>>>>>>>>> entry level, if a user provides a non-RichFunction, we wrap it
>>>> by a
>>>>>>>>>>>> RichFunction that is fully implemented by Streams. This
>>>>> RichFunction
>>>>>>>>>>>> would just forward the call omitting the key:
>>>>>>>>>>>>
>>>>>>>>>>>> Just to sketch the idea (incomplete code snippet):
>>>>>>>>>>>>
>>>>>>>>>>>>> public StreamsRichValueMapper implements RichValueMapper() {
>>>>>>>>>>>>>    private ValueMapper userProvidedMapper; // set by
>> constructor
>>>>>>>>>>>>>
>>>>>>>>>>>>>    public VR apply(final K key, final V1 value1, final V2
>>>> value2)
>>>>> {
>>>>>>>>>>>>>        return userProvidedMapper(value1, value2);
>>>>>>>>>>>>>    }
>>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>>  From a performance point of view, I am not sure if the
>>>>>>>>>>>> "if(isRichFunction)" including casts etc would have more
>> overhead
>>>>>>> than
>>>>>>>>>>>> this approach (we would do more nested method call for
>>>>>>> non-RichFunction
>>>>>>>>>>>> which should be more common than RichFunctions).
>>>>>>>>>>>>
>>>>>>>>>>>> This approach should not effect lambdas (or do I miss
>> something?)
>>>>> and
>>>>>>>>>>>> might be cleaner, as we could have one more top level interface
>>>>>>>>>>>> `RichFunction` with methods `init()` and `close()` and also
>>>>>>> interfaces
>>>>>>>>>>>> for `RichValueMapper` etc. (thus, no abstract classes required).
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Any thoughts?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 5/6/17 5:29 PM, Jeyhun Karimov wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for comments. I extended PR and KIP to include rich
>>>>>>> functions.
>>>>>>>>> I
>>>>>>>>>>>>> will still have to evaluate the cost of deep copying of keys.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Jeyhun
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, May 5, 2017 at 8:02 PM Mathieu Fenniak <
>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hey Matthias,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My opinion would be that documenting the immutability of the
>>>> key
>>>>> is
>>>>>>>>> the
>>>>>>>>>>>>>> best approach available.  Better than requiring the key to be
>>>>>>>>>>> serializable
>>>>>>>>>>>>>> (as with Jeyhun's last pass at the PR), no performance risk.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It'd be different if Java had immutable type constraints of
>>>> some
>>>>>>>>> kind.
>>>>>>>>>>> :-)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Mathieu
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, May 5, 2017 at 11:31 AM, Matthias J. Sax <
>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Agreed about RichFunction. If we follow this path, it should
>>>>> cover
>>>>>>>>>>>>>>> all(?) DSL interfaces.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> About guarding the key -- I am still not sure what to do
>> about
>>>>>>> it...
>>>>>>>>>>>>>>> Maybe it might be enough to document it (and name the key
>>>>>>> parameter
>>>>>>>>>>> like
>>>>>>>>>>>>>>> `readOnlyKey` to make is very clear). Ultimately, I would
>>>> prefer
>>>>>>> to
>>>>>>>>>>>>>>> guard against any modification, but I have no good idea how
>> to
>>>>> do
>>>>>>>>>>> this.
>>>>>>>>>>>>>>> Not sure what others think about the risk of corrupted
>>>>>>> partitioning
>>>>>>>>>>>>>>> (what would be a user error and we could say, well, bad luck,
>>>>> you
>>>>>>>>> got
>>>>>>>>>>> a
>>>>>>>>>>>>>>> bug in your code, that's not our fault), vs deep copy with a
>>>>>>>>> potential
>>>>>>>>>>>>>>> performance hit (that we can't quantity atm without any
>>>>>>> performance
>>>>>>>>>>>>>> test).
>>>>>>>>>>>>>>> We do have a performance system test. Maybe it's worth for
>> you
>>>>> to
>>>>>>>>>>> apply
>>>>>>>>>>>>>>> the deep copy strategy and run the test. It's very basic
>>>>>>> performance
>>>>>>>>>>>>>>> test only, but might give some insight. If you want to do
>>>> this,
>>>>>>> look
>>>>>>>>>>>>>>> into folder "tests" for general test setup, and into
>>>>>>>>>>>>>>> "tests/kafaktests/benchmarks/streams" to find find the perf
>>>>> test.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 5/5/17 8:55 AM, Jeyhun Karimov wrote:
>>>>>>>>>>>>>>>> Hi Matthias,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think extending KIP to include RichFunctions totally
>> makes
>>>>>>>>> sense.
>>>>>>>>>>>>>> So,
>>>>>>>>>>>>>>>>   we don't want to guard the keys because it is costly.
>>>>>>>>>>>>>>>> If we introduce RichFunctions I think it should not be
>>>> limited
>>>>>>> only
>>>>>>>>>>>>>> the 3
>>>>>>>>>>>>>>>> interfaces proposed in KIP but to wide range of interfaces.
>>>>>>>>>>>>>>>> Please correct me if I am wrong.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>> Jeyhun
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, May 5, 2017 at 12:04 AM Matthias J. Sax <
>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> One follow up. There was this email on the user list:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> http://search-hadoop.com/m/Kafka/uyzND17KhCaBzPSZ1?subj=
>>>>>>>>>>>>>>> Shouldn+t+the+initializer+of+a+stream+aggregate+accept+the+
>>>> key+
>>>>>>>>>>>>>>>>> It might make sense so include Initializer, Adder, and
>>>>>>> Substractor
>>>>>>>>>>>>>>>>> inferface, too.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> And we should double check if there are other interface we
>>>>> might
>>>>>>>>>>> miss
>>>>>>>>>>>>>>> atm.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 5/4/17 1:31 PM, Matthias J. Sax wrote:
>>>>>>>>>>>>>>>>>> Thanks for updating the KIP.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Deep copying the key will work for sure, but I am actually
>>>> a
>>>>>>>>> little
>>>>>>>>>>>>>> bit
>>>>>>>>>>>>>>>>>> worried about performance impact... We might want to do
>>>> some
>>>>>>> test
>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>> quantify this impact.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Btw: this remind me about the idea of `RichFunction`
>>>>> interface
>>>>>>>>> that
>>>>>>>>>>>>>>>>>> would allow users to access record metadata (like
>>>> timestamp,
>>>>>>>>>>> offset,
>>>>>>>>>>>>>>>>>> partition etc) within DSL. This would be a similar
>> concept.
>>>>>>>>> Thus, I
>>>>>>>>>>>>>> am
>>>>>>>>>>>>>>>>>> wondering, if it would make sense to enlarge the scope of
>>>>> this
>>>>>>>>> KIP
>>>>>>>>>>> by
>>>>>>>>>>>>>>>>>> that? WDYT?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 5/3/17 2:08 AM, Jeyhun Karimov wrote:
>>>>>>>>>>>>>>>>>>> Hi Mathieu,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks for feedback. I followed similar approach and
>>>> updated
>>>>>>> PR
>>>>>>>>>>> and
>>>>>>>>>>>>>>> KIP
>>>>>>>>>>>>>>>>>>> accordingly. I tried to guard the key in Processors
>>>> sending
>>>>> a
>>>>>>>>> copy
>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>> actual key.
>>>>>>>>>>>>>>>>>>> Because I am doing deep copy of an object, I think memory
>>>>> can
>>>>>>> be
>>>>>>>>>>>>>>>>> bottleneck
>>>>>>>>>>>>>>>>>>> in some use-cases.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>> Jeyhun
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Tue, May 2, 2017 at 5:10 PM Mathieu Fenniak <
>>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi Jeyhun,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> This approach would change ValueMapper (...etc) to be
>>>>>>> classes,
>>>>>>>>>>>>>> rather
>>>>>>>>>>>>>>>>> than
>>>>>>>>>>>>>>>>>>>> interfaces, which is also a backwards incompatible
>>>> change.
>>>>>>> An
>>>>>>>>>>>>>>>>> alternative
>>>>>>>>>>>>>>>>>>>> approach that would be backwards compatible would be to
>>>>>>> define
>>>>>>>>>>> new
>>>>>>>>>>>>>>>>>>>> interfaces, and provide overrides where those interfaces
>>>>> are
>>>>>>>>>>> used.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Unfortunately, making the key parameter as "final"
>>>> doesn't
>>>>>>>>> change
>>>>>>>>>>>>>>> much
>>>>>>>>>>>>>>>>>>>> about guarding against key change.  It only prevents the
>>>>>>>>>>> parameter
>>>>>>>>>>>>>>>>> variable
>>>>>>>>>>>>>>>>>>>> from being reassigned.  If the key type is a mutable
>>>> object
>>>>>>>>> (eg.
>>>>>>>>>>>>>>>>> byte[]),
>>>>>>>>>>>>>>>>>>>> it can still be mutated. (eg. key[0] = 0).  But I'm not
>>>>>>> really
>>>>>>>>>>> sure
>>>>>>>>>>>>>>>>> there's
>>>>>>>>>>>>>>>>>>>> much that can be done about that.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Mathieu
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Mon, May 1, 2017 at 5:39 PM, Jeyhun Karimov <
>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks for comments.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The concerns makes sense. Although we can guard for
>>>>>>> immutable
>>>>>>>>>>> keys
>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>> current implementation (with few changes), I didn't
>>>>> consider
>>>>>>>>>>>>>>> backward
>>>>>>>>>>>>>>>>>>>>> compatibility.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> In this case 2 solutions come to my mind. In both
>> cases,
>>>>>>> user
>>>>>>>>>>>>>>> accesses
>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> key in Object type, as passing extra type parameter
>> will
>>>>>>> break
>>>>>>>>>>>>>>>>>>>>> backwards-compatibility.  So user has to cast to actual
>>>>> key
>>>>>>>>>>> type.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 1. Firstly, We can overload apply method with 2
>> argument
>>>>>>> (key
>>>>>>>>>>> and
>>>>>>>>>>>>>>>>> value)
>>>>>>>>>>>>>>>>>>>>> and force key to be *final*. By doing this,  I think we
>>>>> can
>>>>>>>>>>>>>> address
>>>>>>>>>>>>>>>>> both
>>>>>>>>>>>>>>>>>>>>> backward-compatibility and guarding against key change.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 2. Secondly, we can create class KeyAccess like:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> public class KeyAccess {
>>>>>>>>>>>>>>>>>>>>>      Object key;
>>>>>>>>>>>>>>>>>>>>>      public void beforeApply(final Object key) {
>>>>>>>>>>>>>>>>>>>>>          this.key = key;
>>>>>>>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>>>>>>>>      public Object getKey() {
>>>>>>>>>>>>>>>>>>>>>          return key;
>>>>>>>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> We can extend *ValueMapper, ValueJoiner* and
>>>>>>>>> *ValueTransformer*
>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>>>>>> *KeyAccess*. Inside processor (for example
>>>>>>>>>>>>>>> *KTableMapValuesProcessor*)
>>>>>>>>>>>>>>>>>>>>> before calling *mapper.apply(value)* we can set the
>>>> *key*
>>>>> by
>>>>>>>>>>>>>>>>>>>>> *mapper.beforeApply(key)*. As a result, user can use
>>>>>>>>> *getKey()*
>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> access
>>>>>>>>>>>>>>>>>>>>> the key inside *apply(value)* method.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>> Jeyhun
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Mon, May 1, 2017 at 7:24 PM Matthias J. Sax <
>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Jeyhun,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> thanks a lot for the KIP!
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I think there are two issues we need to address:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> (1) The KIP does not consider backward compatibility.
>>>>> Users
>>>>>>>>> did
>>>>>>>>>>>>>>>>>>>> complain
>>>>>>>>>>>>>>>>>>>>>> about this in past releases already, and as the user
>>>> base
>>>>>>>>>>> grows,
>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>>>>> should not break backward compatibility in future
>>>>> releases
>>>>>>>>>>>>>> anymore.
>>>>>>>>>>>>>>>>>>>>>> Thus, we should think of a better way to allow key
>>>>> access.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Mathieu's comment goes into the same direction
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On the other hand, the number of compile failures
>>>> that
>>>>>>>>> would
>>>>>>>>>>>>>> need
>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>>>> fixed from this change is unfortunate. :-)
>>>>>>>>>>>>>>>>>>>>>> (2) Another concern is, that there is no guard to
>>>> prevent
>>>>>>>>> user
>>>>>>>>>>>>>> code
>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>> modify the key. This might corrupt partitioning if
>>>> users
>>>>> do
>>>>>>>>>>> alter
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> key (accidentally -- or users are just not aware that
>>>>> they
>>>>>>>>> are
>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>>> allowed to modify the provided key object) and thus
>>>> break
>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> application. (This was the original motivation to not
>>>>>>> provide
>>>>>>>>>>> the
>>>>>>>>>>>>>>> key
>>>>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>>> the first place -- it's guards against modification.)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On 5/1/17 6:31 AM, Mathieu Fenniak wrote:
>>>>>>>>>>>>>>>>>>>>>>> Hi Jeyhun,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I just want to add my voice that, I too, have wished
>>>> for
>>>>>>>>>>> access
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>> record key during a mapValues or similar operation.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On the other hand, the number of compile failures
>> that
>>>>>>> would
>>>>>>>>>>>>>> need
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>>> fixed from this change is unfortunate. :-)  But at
>>>> least
>>>>>>> it
>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>> all
>>>>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>>> pretty clear and easy change.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Mathieu
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Mon, May 1, 2017 at 6:55 AM, Jeyhun Karimov <
>>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> Dear community,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I want to share KIP-149 [1] based on issues
>>>> KAFKA-4218
>>>>>>> [2],
>>>>>>>>>>>>>>>>>>>> KAFKA-4726
>>>>>>>>>>>>>>>>>>>>>> [3],
>>>>>>>>>>>>>>>>>>>>>>>> KAFKA-3745 [4]. The related PR can be found at [5].
>>>>>>>>>>>>>>>>>>>>>>>> I would like to get your comments.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/
>>>> confluence/display/KAFKA/KIP-
>>>>>>>>>>>>>>>>>>>>>>>> 149%3A+Enabling+key+access+in+ValueTransformer%2C+
>>>>>>>>>>>>>>>>>>>>>>>> ValueMapper%2C+and+ValueJoiner
>>>>>>>>>>>>>>>>>>>>>>>> [2] https://issues.apache.org/jira
>> /browse/KAFKA-4218
>>>>>>>>>>>>>>>>>>>>>>>> [3] https://issues.apache.org/jira
>> /browse/KAFKA-4726
>>>>>>>>>>>>>>>>>>>>>>>> [4] https://issues.apache.org/jira
>> /browse/KAFKA-3745
>>>>>>>>>>>>>>>>>>>>>>>> [5] https://github.com/apache/kafka/pull/2946
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>> Jeyhun
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>> -Cheers
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Jeyhun
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>> -Cheers
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Jeyhun
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> -Cheers
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Jeyhun
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>> -Cheers
>>>>>>>>>>
>>>>>>>>>> Jeyhun
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>> -Cheers
>>>>>>>>
>>>>>>>> Jeyhun
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>

signature.asc
Description: OpenPGP digital signature

Re: [DISCUSS]: KIP-149: Enabling key access in ValueTransformer, ValueMapper, and ValueJoiner

Reply via email to