Re: [DISCUSS] KIP-221: Repartition Topic Hints in Streams

Matthias J. Sax Fri, 19 Jul 2019 17:54:13 -0700

Thanks for driving the KIP.

I agree that users need to be able to specify a partitioning strategy.
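
As a rough illustration of why the strategy matters (the join case John
describes further down the thread), the sketch below compares a hash-style
assignment with a custom one. Everything in it is hypothetical: the
`user-<id>` key format, the method names, and the use of `String.hashCode()`
as a stand-in for the default hashing (the real default partitioner hashes
the serialized key bytes with murmur2). It only shows that two strategies can
place the same key in different partitions, which is why both sides of a join
need the same one.

```java
// Hypothetical sketch: two partitioning strategies applied to the same key.
final class CoPartitioningSketch {

    // Stand-in for hash-based partitioning (not Kafka's actual murmur2 hashing).
    static int byHash(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Hypothetical custom strategy: route keys like "user-42" by their numeric id.
    static int byNumericSuffix(String key, int numPartitions) {
        int id = Integer.parseInt(key.substring(key.lastIndexOf('-') + 1));
        return Math.floorMod(id, numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        for (String key : new String[] {"user-1", "user-2", "user-42"}) {
            // If the two numbers differ, records for this key would not be co-located.
            System.out.printf("%s -> hash: %d, custom: %d%n",
                    key, byHash(key, numPartitions), byNumericSuffix(key, numPartitions));
        }
    }
}
```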


Sophie raises a fair point about topic configs and producer configs. My
take is that considering `Repartitioned` an "extension" of `Produced`
that adds topic configuration is a good way to think about it and helps
to keep the API "clean".


With regard to method names, I would prefer to avoid abbreviations. Can
we rename:

`withNumOfPartitions` -> `withNumberOfPartitions`

Furthermore, it might be good to add some more `static` methods:

 - Repartitioned.with(Serde<K>, Serde<V>)
 - Repartitioned.withNumberOfPartitions(int)
 - Repartitioned.streamPartitioner(StreamPartitioner)
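
A rough sketch of how such static factories could look follows. It is
illustrative only: `RepartitionedSketch`, `PartitionerSketch`, and the
accessor names are made-up stand-ins so the snippet compiles without Kafka on
the classpath, and serdes are left out for brevity. One assumption to flag:
Java does not allow a static and an instance method with the same name and
parameter list in one class, so the sketch names the static factory
`numberOfPartitions` and keeps `withNumberOfPartitions` for chaining.

```java
import java.util.Objects;

// Hypothetical stand-in for StreamPartitioner so the sketch compiles without Kafka.
interface PartitionerSketch<K, V> {
    Integer partition(String topic, K key, V value, int numPartitions);
}

// Hypothetical, simplified shape of a Repartitioned-style config object (serdes omitted).
final class RepartitionedSketch<K, V> {
    private final Integer numberOfPartitions;          // null means "let Streams decide"
    private final PartitionerSketch<K, V> partitioner; // null means "default partitioning"

    private RepartitionedSketch(Integer numberOfPartitions, PartitionerSketch<K, V> partitioner) {
        this.numberOfPartitions = numberOfPartitions;
        this.partitioner = partitioner;
    }

    // Static entry point: start from a partition count.
    static <K, V> RepartitionedSketch<K, V> numberOfPartitions(int n) {
        return new RepartitionedSketch<K, V>(null, null).withNumberOfPartitions(n);
    }

    // Static entry point: start from a partitioner.
    static <K, V> RepartitionedSketch<K, V> streamPartitioner(PartitionerSketch<K, V> p) {
        return new RepartitionedSketch<K, V>(null, null).withStreamPartitioner(p);
    }

    // Instance "with" methods return a new immutable copy, so calls can be chained.
    RepartitionedSketch<K, V> withNumberOfPartitions(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("number of partitions must be positive");
        }
        return new RepartitionedSketch<>(n, partitioner);
    }

    RepartitionedSketch<K, V> withStreamPartitioner(PartitionerSketch<K, V> p) {
        return new RepartitionedSketch<>(numberOfPartitions, Objects.requireNonNull(p));
    }

    // Accessors (names are illustrative only).
    Integer partitionCount() { return numberOfPartitions; }

    PartitionerSketch<K, V> partitioner() { return partitioner; }
}

final class RepartitionedSketchDemo {
    public static void main(String[] args) {
        // Start from a static factory, then chain an instance method.
        RepartitionedSketch<String, String> config =
                RepartitionedSketch.<String, String>numberOfPartitions(4)
                        .withStreamPartitioner((topic, key, value, n) ->
                                Math.floorMod(value.hashCode(), n));
        System.out.println("partitions: " + config.partitionCount());
    }
}
```

With factories like these, a caller can start from whichever setting matters
most and chain the rest, without first constructing an "empty" object.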


-Matthias

On 7/19/19 3:33 PM, Levani Kokhreidze wrote:
> Totally agree. I think in the KStream interface it makes sense to have some
> duplicate configurations between operators in order to keep the API simple and
> usable.
> Also, the larger the API surface, the harder it is to maintain proper backward
> compatibility.
> While the initial idea of keeping topic-level configs separate was exciting,
> having the Repartitioned class encapsulate some producer-level configs makes
> the API more readable.
> 
> Regards,
> Levani
> 
>> On Jul 20, 2019, at 1:15 AM, Sophie Blee-Goldman <[email protected]> wrote:
>>
>> I think that is a good point about trying to keep producer level
>> configurations and (repartition) topic level considerations separate.
>> Number of partitions is definitely purely a topic level configuration. But
>> on some level, serdes and partitioners are just as much a topic
>> configuration as a producer one. You could have two producers configured
>> with different serdes and/or partitioners, but if they are writing to the
>> same topic the result would be very difficult to parse. So in a sense, these
>> are configurations of topics in Streams, not just producers.
>>
>> Another way to think of it: while the Streams API is not always true to
>> this, ideally all the relevant configs for an operator are wrapped into a
>> single object (in this case, Repartitioned). We could instead split out the
>> fields in common with Produced into a separate parameter to keep topic and
>> producer level configurations separate, but this increases the API surface
>> area by a lot. It's much more straightforward to just say "this is
>> everything that this particular operator needs" without worrying about what
>> exactly you're specifying.
>>
>> I suppose you could alternatively make Produced a field of Repartitioned,
>> but I don't think we do this kind of composition elsewhere in Streams at
>> the moment
>>
>> On Fri, Jul 19, 2019 at 1:45 PM Levani Kokhreidze <[email protected]>
>> wrote:
>>
>>> Hi Bill,
>>>
>>> Thanks a lot for the feedback.
>>> Yes, that makes sense. I’ve updated the KIP with a `Repartitioned#partitioner`
>>> configuration.
>>> In the beginning, I wanted to introduce a class for topic-level
>>> configuration and keep topic-level and producer-level configurations (such
>>> as Produced) separate (see my second email in this thread).
>>> But while looking at the semantics of the KStream interface, I couldn’t really
>>> figure out a good operation name for a topic-level configuration class, and just
>>> introducing a `Topic` config class was kind of breaking the semantics.
>>> So I think having a Repartitioned class that encapsulates topic-level and
>>> producer-level configurations for internal topics is a viable thing to do.
>>>
>>> Regards,
>>> Levani
>>>
>>>> On Jul 19, 2019, at 7:47 PM, Bill Bejeck <[email protected]> wrote:
>>>>
>>>> Hi Levani,
>>>>
>>>> Thanks for resurrecting this KIP.
>>>>
>>>> I'm also a +1 for adding a partition option.  In addition to the reason
>>>> provided by John, my reasoning is:
>>>>
>>>>  1. Users may want to use something other than hash-based partitioning
>>>>  2. Users may wish to partition on something different than the key
>>>>  without having to change the key.  For example:
>>>>     1. A combination of fields in the value in conjunction with the key
>>>>     2. Something other than the key
>>>>  3. We allow users to specify a partitioner on Produced hence in
>>>>  KStream.to and KStream.through, so it makes sense for API consistency.
>>>>
>>>> Just my 2 cents.
>>>>
>>>> Thanks,
>>>> Bill
>>>>
>>>>
>>>>
>>>> On Fri, Jul 19, 2019 at 5:46 AM Levani Kokhreidze <
>>> [email protected]>
>>>> wrote:
>>>>
>>>>> Hi John,
>>>>>
>>>>> In my mind it makes sense.
>>>>> If we add a partitioner configuration to the Repartitioned class, then,
>>>>> combined with specifying the number of partitions for internal topics, the
>>>>> user will have the opportunity to ensure co-partitioning before a join operation.
>>>>> I think this can be quite a powerful feature.
>>>>> What do others think about this?
>>>>>
>>>>> Regards,
>>>>> Levani
>>>>>
>>>>>> On Jul 18, 2019, at 1:20 AM, John Roesler <[email protected]> wrote:
>>>>>>
>>>>>> Yes, I believe that's what I had in mind. Again, not totally sure it
>>>>>> makes sense, but I believe something similar is the rationale for
>>>>>> having the partitioner option in Produced.
>>>>>>
>>>>>> Thanks,
>>>>>> -John
>>>>>>
>>>>>> On Wed, Jul 17, 2019 at 3:20 PM Levani Kokhreidze
>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>> Hey John,
>>>>>>>
>>>>>>> Oh, that’s an interesting use case.
>>>>>>> Do I understand this correctly: in your example I would first issue
>>>>>>> repartition(Repartitioned) with a partitioner that is essentially the
>>>>>>> same as the one used by the topic I want to join with, and then do the
>>>>>>> KStream#join with the DSL?
>>>>>>>
>>>>>>> Regards,
>>>>>>> Levani
>>>>>>>
>>>>>>>> On Jul 17, 2019, at 11:11 PM, John Roesler <[email protected]>
>>> wrote:
>>>>>>>>
>>>>>>>> Hey, all, just to chime in,
>>>>>>>>
>>>>>>>> I think it might be useful to have an option to specify the
>>>>>>>> partitioner. The case I have in mind is that some data may get
>>>>>>>> repartitioned and then joined with an input topic. If the right-side
>>>>>>>> input topic uses a custom partitioning strategy, then the
>>>>>>>> repartitioned stream also needs to be partitioned with the same
>>>>>>>> strategy.
>>>>>>>>
>>>>>>>> Does that make sense, or did I maybe miss something important?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> -John
>>>>>>>>
>>>>>>>> On Wed, Jul 17, 2019 at 2:48 PM Levani Kokhreidze
>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Yes, I was thinking about it as well. To be honest, I’m not sure about
>>>>>>>>> it yet.
>>>>>>>>> As a Kafka Streams DSL user, I don’t really think I would need control
>>>>>>>>> over the partitioner for internal topics.
>>>>>>>>> As a user, I would assume that Kafka Streams knows best how to
>>>>>>>>> partition data for internal topics.
>>>>>>>>> In this KIP I wrote that Produced should be used only for topics that
>>>>>>>>> are created by the user in advance.
>>>>>>>>> In those cases maybe it makes sense to be able to specify the
>>>>>>>>> partitioner.
>>>>>>>>> I don’t have a clear answer on that yet, but I guess specifying the
>>>>>>>>> partitioner can be added as well if there’s agreement on this.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Levani
>>>>>>>>>
>>>>>>>>>> On Jul 17, 2019, at 10:42 PM, Sophie Blee-Goldman <
>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> Thanks for clearing that up. I agree that Repartitioned would be a
>>>>>>>>>> useful addition. I'm wondering if it might also need to have
>>>>>>>>>> a withStreamPartitioner method/field, similar to Produced? I'm not
>>>>>>>>>> sure how widely this feature is really used, but seems it should be
>>>>>>>>>> available for repartition topics.
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 17, 2019 at 11:26 AM Levani Kokhreidze <
>>>>> [email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hey Sophie,
>>>>>>>>>>>
>>>>>>>>>>> In both cases, KStream#repartition and KStream#repartition(Repartitioned),
>>>>>>>>>>> the topic will be created and managed by Kafka Streams.
>>>>>>>>>>> The idea of Repartitioned is to give the user more control over the
>>>>>>>>>>> topic, such as the number of partitions.
>>>>>>>>>>> I feel like a Repartitioned parameter is something that is missing in
>>>>>>>>>>> the current DSL design.
>>>>>>>>>>> Essentially, it gives the user control over parallelism by configuring
>>>>>>>>>>> the number of partitions for internal topics.
>>>>>>>>>>>
>>>>>>>>>>> Hope this answers your question.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Levani
>>>>>>>>>>>
>>>>>>>>>>>> On Jul 17, 2019, at 9:02 PM, Sophie Blee-Goldman <
>>>>> [email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hey Levani,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the KIP! Can you clarify one thing for me -- for the
>>>>>>>>>>>> KStream#repartition signature taking a Repartitioned, will the
>>>>>>>>>>>> topic be auto-created by Streams (which seems to be the case for the
>>>>>>>>>>>> signature without a Repartitioned) or does it have to be pre-created?
>>>>>>>>>>>> The wording in the KIP makes it seem like one version of the method
>>>>>>>>>>>> will auto-create topics while the other will not.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Sophie
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jul 17, 2019 at 10:15 AM Levani Kokhreidze <
>>>>>>>>>>> [email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>
>>>>>>>>>>>>> One more bump about KIP-221 (
>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>>>>>>>>>>>>> )
>>>>>>>>>>>>> so it doesn’t get lost in the mailing list :)
>>>>>>>>>>>>> Would love to hear the community’s opinions/concerns about this KIP.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Jul 12, 2019, at 5:27 PM, Levani Kokhreidze <
>>>>> [email protected]
>>>>>>>>>>>>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Kind reminder about this KIP:
>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Jul 9, 2019, at 11:38 AM, Levani Kokhreidze <
>>>>>>>>>>> [email protected]
>>>>>>>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In order to move this KIP forward, I’ve updated the confluence page
>>>>>>>>>>>>>>> with the new proposal:
>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I’ve also filled in the “Rejected Alternatives” section.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Looking forward to discussing this KIP :)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Jul 3, 2019, at 1:08 PM, Levani Kokhreidze <
>>>>>>>>>>> [email protected]
>>>>>>>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hello Matthias,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for the feedback and ideas.
>>>>>>>>>>>>>>>> I like the idea of introducing a dedicated `Topic` class for
>>>>>>>>>>>>>>>> topic configuration for internal operators like `groupBy`.
>>>>>>>>>>>>>>>> It would be great to hear others’ opinions about this as well.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Jul 3, 2019, at 7:00 AM, Matthias J. Sax <
>>>>> [email protected]
>>>>>>>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Levani,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for picking up this KIP! And thanks for summarizing everything.
>>>>>>>>>>>>>>>>> Even if some points may have been discussed already (can't really
>>>>>>>>>>>>>>>>> remember), it's helpful to get a good summary to refresh the discussion.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think your reasoning makes sense. With regard to the distinction
>>>>>>>>>>>>>>>>> between operators that manage topics and operators that use user-created
>>>>>>>>>>>>>>>>> topics: Following this argument, it might indicate that leaving
>>>>>>>>>>>>>>>>> `through()` as-is (as an operator that uses user-defined topics) and
>>>>>>>>>>>>>>>>> introducing a new `repartition()` operator (an operator that manages
>>>>>>>>>>>>>>>>> topics itself) might be good. Otherwise, there is one operator
>>>>>>>>>>>>>>>>> `through()` that sometimes manages topics but sometimes not; a different
>>>>>>>>>>>>>>>>> name, i.e., a new operator, would make the distinction clearer.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> About adding `numOfPartitions` to `Grouped`: I am wondering if the same
>>>>>>>>>>>>>>>>> argument as for `Produced` applies and adding it is semantically
>>>>>>>>>>>>>>>>> questionable? Might be good to get opinions of others on this, too. I am
>>>>>>>>>>>>>>>>> not sure myself what solution I prefer atm.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So far, KS uses configuration objects that allow configuring a certain
>>>>>>>>>>>>>>>>> "entity" like a consumer, producer, or store. If we assume that a topic is
>>>>>>>>>>>>>>>>> a similar entity, I am wondering if we should have a `Topic` class with a
>>>>>>>>>>>>>>>>> `withNumberOfPartitions()` method instead of a plain
>>>>>>>>>>>>>>>>> integer? This would allow us to add other configs, like replication
>>>>>>>>>>>>>>>>> factor, retention time, etc., easily, without the need to change the
>>>>>>>>>>>>>>>>> "main API".
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Just want to give some ideas. Not sure if I like them myself. :)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 7/1/19 1:04 AM, Levani Kokhreidze wrote:
>>>>>>>>>>>>>>>>>> Actually, giving it more thought - maybe enhancing Produced with a
>>>>>>>>>>>>>>>>>> number-of-partitions configuration is not the best approach. Let me
>>>>>>>>>>>>>>>>>> explain why:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1) If we enhance the Produced class with this configuration, this will
>>>>>>>>>>>>>>>>>> also affect the KStream#to operation. Since KStream#to is the final sink
>>>>>>>>>>>>>>>>>> of the topology, it seems to me a reasonable assumption that the user
>>>>>>>>>>>>>>>>>> needs to manually create the sink topic in advance. And in that case,
>>>>>>>>>>>>>>>>>> having a number-of-partitions configuration doesn’t make much sense.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2) Looking at the Produced class, based on the API contract, it seems
>>>>>>>>>>>>>>>>>> that Produced is designed to be something that is explicitly for the
>>>>>>>>>>>>>>>>>> producer (key serializer, value serializer, and partitioner are all
>>>>>>>>>>>>>>>>>> producer-specific configurations), whereas the number of partitions is a
>>>>>>>>>>>>>>>>>> topic-level configuration. And I don’t think mixing topic-level and
>>>>>>>>>>>>>>>>>> producer-level configurations together in one class is a good approach.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 3) Looking at the KStream interface, it seems that the Produced parameter
>>>>>>>>>>>>>>>>>> is for operations that work with non-internal topics (i.e., topics that
>>>>>>>>>>>>>>>>>> are not created and managed internally by Kafka Streams), and I think we
>>>>>>>>>>>>>>>>>> should leave it as it is in that case.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Taking all these things into account, I think we should distinguish
>>>>>>>>>>>>>>>>>> between DSL operations where Kafka Streams should create and manage
>>>>>>>>>>>>>>>>>> internal topics (e.g. KStream#groupBy) and operations whose topics should
>>>>>>>>>>>>>>>>>> be created in advance (e.g. KStream#to).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> To sum it up, I think adding a numPartitions configuration to Produced
>>>>>>>>>>>>>>>>>> will result in mixing topic-level and producer-level configuration in one
>>>>>>>>>>>>>>>>>> class, and it’s going to break existing API semantics.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Regarding making the topic name optional in KStream#through - I think the
>>>>>>>>>>>>>>>>>> underlying idea is very useful, and giving users the possibility to
>>>>>>>>>>>>>>>>>> specify the number of partitions there is even more useful :) Considering
>>>>>>>>>>>>>>>>>> the arguments against adding the number of partitions to the Produced
>>>>>>>>>>>>>>>>>> class, I see two options here:
>>>>>>>>>>>>>>>>>> 1) Add the following method overloads
>>>>>>>>>>>>>>>>>> * through() - topic will be auto-generated and the number of partitions
>>>>>>>>>>>>>>>>>> will be taken from the source topic
>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions) - topic will be auto-generated
>>>>>>>>>>>>>>>>>> with the specified number of partitions
>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions, final Produced<K, V> produced) -
>>>>>>>>>>>>>>>>>> topic will be auto-generated with the specified number of partitions
>>>>>>>>>>>>>>>>>> and configuration taken from the produced parameter.
>>>>>>>>>>>>>>>>>> 2) Leave KStream#through as it is and introduce a new method -
>>>>>>>>>>>>>>>>>> KStream#repartition (I think Matthias suggested this in one of the
>>>>>>>>>>>>>>>>>> threads)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Considering everything mentioned above, I propose the following plan:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Option A:
>>>>>>>>>>>>>>>>>> 1) Leave Produced as it is
>>>>>>>>>>>>>>>>>> 2) Add a number-of-partitions configuration to the Grouped class (as
>>>>>>>>>>>>>>>>>> mentioned in the KIP)
>>>>>>>>>>>>>>>>>> 3) Add the following method overloads to KStream#through
>>>>>>>>>>>>>>>>>> * through() - topic will be auto-generated and the number of partitions
>>>>>>>>>>>>>>>>>> will be taken from the source topic
>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions) - topic will be auto-generated
>>>>>>>>>>>>>>>>>> with the specified number of partitions
>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions, final Produced<K, V> produced) -
>>>>>>>>>>>>>>>>>> topic will be auto-generated with the specified number of partitions
>>>>>>>>>>>>>>>>>> and configuration taken from the produced parameter.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Option B:
>>>>>>>>>>>>>>>>>> 1) Leave Produced as it is
>>>>>>>>>>>>>>>>>> 2) Add a number-of-partitions configuration to the Grouped class (as
>>>>>>>>>>>>>>>>>> mentioned in the KIP)
>>>>>>>>>>>>>>>>>> 3) Add a new operator KStream#repartition for creating and managing
>>>>>>>>>>>>>>>>>> internal repartition topics
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> P.S. I’m sorry if all of this was already discussed on the mailing
>>>>>>>>>>>>>>>>>> list, but I kind of got lost with all the threads that were about this
>>>>>>>>>>>>>>>>>> KIP :(
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Jul 1, 2019, at 9:56 AM, Levani Kokhreidze <
>>>>>>>>>>>>> [email protected] <mailto:[email protected]>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I would like to resurrect the discussion around KIP-221. Going through
>>>>>>>>>>>>>>>>>>> the discussion thread, there seems to be agreement around the usefulness
>>>>>>>>>>>>>>>>>>> of this feature.
>>>>>>>>>>>>>>>>>>> Regarding the implementation, as far as I understood, the most
>>>>>>>>>>>>>>>>>>> optimal solution seems to me to be the following:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 1) Add two method overloads to the KStream#through method (essentially
>>>>>>>>>>>>>>>>>>> making the topic name optional)
>>>>>>>>>>>>>>>>>>> 2) Enhance the Produced class with a numOfPartitions configuration field.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Those two changes will allow DSL users to control parallelism and
>>>>>>>>>>>>>>>>>>> trigger a repartition without doing stateful operations.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I will update the KIP with interface changes around KStream#through if
>>>>>>>>>>>>>>>>>>> these changes sound sensible.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>
>>>
> 
>
